statistical classification methods: Topics by WorldWideScience.org

Sample records for statistical classification methods

Classification of Specialized Farms Applying Multivariate Statistical Methods

Directory of Open Access Journals (Sweden)

Zuzana Hloušková

2017-01-01

Full Text Available The paper applies advanced multivariate statistical methods to classify cattle-breeding farming enterprises by their economic size. The advantage of the model is its ability to use a few selected indicators, in contrast to the complex methodology of the current classification model, which requires knowledge of the detailed structure of the herd turnover and of the cultivated crops. The output of the paper is intended to be applied within farm structure research focused on the future development of Czech agriculture. The data source is the farming enterprises database for 2014 from the FADN CZ system. The proposed predictive model exploits knowledge of the actual size classes of the farms tested. Outcomes of the linear discriminant analysis multifactor classification method support the feasibility of assigning farming enterprises to the group of Small farms (98 % classified correctly) and to the Large and Very Large enterprises (100 % classified correctly). The Medium Size farms were classified correctly in only 58.11 % of cases. Partial shortcomings of the presented process were found when discriminating between Medium and Small farms.
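The per-class rates quoted in this abstract (98 %, 58.11 %, 100 %) are the share of farms in each actual size class that the model assigned to that same class. A minimal sketch of that calculation follows; the function name and the labels are invented for illustration and are not data from the FADN CZ system.

```python
from collections import Counter

def per_class_rate(actual, predicted):
    """Return {class: share of its members classified correctly}."""
    totals = Counter(actual)
    hits = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}

# Hypothetical labels, not data from the paper.
actual    = ["small", "small", "medium", "medium", "medium", "large"]
predicted = ["small", "small", "small",  "medium", "medium", "large"]

rates = per_class_rate(actual, predicted)
for cls, rate in rates.items():
    print(f"{cls}: {rate:.0%} classified correctly")
```

Reporting the rate per class rather than one overall accuracy is what exposes a weak class such as the Medium Size farms.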
Application of statistical classification methods for predicting the acceptability of well-water quality

Science.gov (United States)

Cameron, Enrico; Pilla, Giorgio; Stella, Fabio A.

2018-01-01

The application of statistical classification methods is investigated—in comparison also to spatial interpolation methods—for predicting the acceptability of well-water quality in a situation where an effective quantitative model of the hydrogeological system under consideration cannot be developed. In the example area in northern Italy, in particular, the aquifer is locally affected by saline water and the concentration of chloride is the main indicator of both saltwater occurrence and groundwater quality. The goal is to predict if the chloride concentration in a water well will exceed the allowable concentration so that the water is unfit for the intended use. A statistical classification algorithm achieved the best predictive performances and the results of the study show that statistical classification methods provide further tools for dealing with groundwater quality problems concerning hydrogeological systems that are too difficult to describe analytically or to simulate effectively.
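The prediction task described here can be read as a binary classification: each well is labelled by whether its chloride concentration exceeds the allowable limit, and a classifier predicts that label for a new well. The abstract does not name the algorithm that performed best, so the sketch below uses a k-nearest-neighbour vote on well coordinates purely as a stand-in; the limit value, coordinates and concentrations are all hypothetical.

```python
import math
from collections import Counter

LIMIT_MG_L = 250.0  # placeholder threshold, not taken from the paper

def label(chloride_mg_l):
    """Turn a measured concentration into a binary class."""
    return "unfit" if chloride_mg_l > LIMIT_MG_L else "acceptable"

def knn_predict(train, query_xy, k=3):
    """Majority vote among the k wells closest to query_xy.

    train is a list of ((x, y), class_label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query_xy))[:k]
    votes = Counter(cls for _, cls in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical wells: (x, y) in km and chloride in mg/L.
wells = [((0.0, 0.0), 40.0), ((0.5, 0.2), 90.0), ((0.3, 0.6), 120.0),
         ((5.0, 5.0), 700.0), ((5.2, 4.7), 410.0), ((4.8, 5.3), 380.0)]
train = [(xy, label(cl)) for xy, cl in wells]

print(knn_predict(train, (0.2, 0.3)))  # query near the low-chloride wells
print(knn_predict(train, (5.1, 5.0)))  # query near the saline wells
```

Predicting the class directly avoids having to interpolate the concentration itself, which is the contrast with spatial interpolation that the abstract draws.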
Statistical methods of discrimination and classification advances in theory and applications

CERN Document Server

Choi, Sung C

1986-01-01

Statistical Methods of Discrimination and Classification: Advances in Theory and Applications is a collection of papers that tackles the multivariate problems of discriminating and classifying subjects into exclusive populations. The book presents 13 papers that cover advances in the statistical procedures of discriminating and classifying. The studies in the text primarily focus on various methods of discriminating and classifying variables, such as multiple discriminant analysis in the presence of mixed continuous and categorical data; choice of the smoothing parameter and efficiency o
Classification of Underlying Causes of Power Quality Disturbances: Deterministic versus Statistical Methods

Directory of Open Access Journals (Sweden)

Emmanouil Styvaktakis

2007-01-01

Full Text Available This paper presents the two main types of classification methods for power quality disturbances based on underlying causes: deterministic classification, giving an expert system as an example, and statistical classification, with support vector machines (a novel method) as an example. An expert system is suitable when one has a limited amount of data and sufficient power system expert knowledge; however, its application requires a set of threshold values. Statistical methods are suitable when a large amount of data is available for training. Two issues that are important to the effectiveness of a classifier, data segmentation and feature extraction, are discussed. Segmentation of a recorded data sequence is a preprocessing step that partitions the data into segments, each representing a duration containing either an event or a transition between two events. Extraction of features is applied to each segment individually. Some useful features and their effectiveness are then discussed. Some experimental results are included to demonstrate the effectiveness of both systems. Finally, conclusions are given together with a discussion of some future research directions.
A Classification of Statistics Courses (A Framework for Studying Statistical Education)

Science.gov (United States)

Turner, J. C.

1976-01-01

A classification of statistics courses is presented, with main categories of "course type," "methods of presentation," "objectives," and "syllabus." Examples and suggestions for uses of the classification are given. (DT)
Statistical methods for segmentation and classification of images

DEFF Research Database (Denmark)

Rosholm, Anders

1997-01-01

The central matter of the present thesis is Bayesian statistical inference applied to classification of images. An initial review of Markov Random Fields relates to the modeling aspect of the indicated main subject. In that connection, emphasis is put on the relatively unknown sub-class of Pickard...... with a Pickard Random Field modeling of a considered (categorical) image phenomenon. An extension of the fast PRF based classification technique is presented. The modification introduces auto-correlation into the model of an involved noise process, which previously has been assumed independent. The suitability...... of the extended model is documented by tests on controlled image data containing auto-correlated noise....
Texture classification by texton: statistical versus binary.

Directory of Open Access Journals (Sweden)

Zhenhua Guo

Full Text Available Using statistical textons for texture classification has shown great success recently. The maximal response 8 (Statistical_MR8), image patch (Statistical_Joint) and locally invariant fractal (Statistical_Fractal) are typical statistical texton algorithms and state-of-the-art texture classification methods. However, these methods have two limitations. First, they need a training stage to build a texton library, so the recognition accuracy depends heavily on the training samples; second, during feature extraction, each local feature is assigned to a texton by searching for the nearest texton in the whole library, which is time consuming when the library is large and the feature dimension is high. To address these two issues, this paper proposes three binary texton counterparts: Binary_MR8, Binary_Joint, and Binary_Fractal. These methods do not require any training step but encode local features into a binary representation directly. The experimental results on the CUReT, UIUC and KTH-TIPS databases show that binary textons achieve sound results with fast feature extraction, especially when the image size is not big and the quality of the image is not poor.
A statistical approach to root system classification.

Directory of Open Access Journals (Sweden)

Gernot Bodner

2013-08-01

Full Text Available Plant root systems have a key role in ecology and agronomy. In spite of a fast increase in root studies, there is still no classification that allows distinguishing among distinctive characteristics within the diversity of rooting strategies. Our hypothesis is that a multivariate approach for plant functional type identification in ecology can be applied to the classification of root systems. We demonstrate that combining principal component and cluster analysis yields a meaningful classification of rooting types based on morphological traits. The classification method presented is based on a data-defined statistical procedure without a priori decision on the classifiers. Biplot inspection is used to determine key traits and to ensure stability in cluster-based grouping. The classification method is exemplified with simulated root architectures and morphological field data. Simulated root architectures showed that morphological attributes with spatial distribution parameters capture the most distinctive features within root system diversity. While developmental type (tap vs. shoot-borne systems) is a strong but coarse classifier, topological traits provide the most detailed differentiation among distinctive groups. Adequacy of commonly available morphological traits for classification is supported by field data. Three rooting types emerged from measured data, distinguished by diameter/weight, density and spatial distribution respectively. Similarity of root systems within distinctive groups was the joint result of phylogenetic relation and environmental as well as human selection pressure. We concluded that the data-defined classification is appropriate for integration of knowledge obtained with different root measurement methods and at various scales. Currently root morphology is the most promising basis for classification due to widely used common measurement protocols. To capture details of root diversity efforts in architectural measurement
14 CFR Section 19 - Uniform Classification of Operating Statistics

Science.gov (United States)

2010-01-01

... Statistics Section 19 Section 19 Aeronautics and Space OFFICE OF THE SECRETARY, DEPARTMENT OF TRANSPORTATION... AIR CARRIERS Operating Statistics Classifications Section 19 Uniform Classification of Operating Statistics ...
Classification, (big) data analysis and statistical learning

CERN Document Server

Conversano, Claudio; Vichi, Maurizio

2018-01-01

This edited book focuses on the latest developments in classification, statistical learning, data analysis and related areas of data science, including statistical analysis of large datasets, big data analytics, time series clustering, integration of data from different sources, as well as social networks. It covers both methodological aspects as well as applications to a wide range of areas such as economics, marketing, education, social sciences, medicine, environmental sciences and the pharmaceutical industry. In addition, it describes the basic features of the software behind the data analysis results, and provides links to the corresponding codes and data sets where necessary. This book is intended for researchers and practitioners who are interested in the latest developments and applications in the field. The peer-reviewed contributions were presented at the 10th Scientific Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society, held in Santa Margherita di Pul...
Statistical Fractal Models Based on GND-PCA and Its Application on Classification of Liver Diseases

Directory of Open Access Journals (Sweden)

Huiyan Jiang

2013-01-01

Full Text Available A new method is proposed to establish the statistical fractal model for liver disease classification. First, fractal theory is used to construct the high-order tensor; then Generalized N-dimensional Principal Component Analysis (GND-PCA) is used to establish the statistical fractal model and select features from the liver region, with different features given different weights; finally, the Support Vector Machine Optimized Ant Colony (ACO-SVM) algorithm is used to establish the classifier for the recognition of liver disease. In order to verify the effectiveness of the proposed method, the PCA eigenface method and the normal SVM method are chosen as contrast methods. The experimental results show that the proposed method can reconstruct the liver volume better and improve the classification accuracy of liver diseases.
Statistical Emulator for Expensive Classification Simulators

Science.gov (United States)

Ross, Jerret; Samareh, Jamshid A.

2016-01-01

Expensive simulators prevent any kind of meaningful analysis from being performed on the phenomena they model. To get around this problem, the concept of using a statistical emulator as a surrogate representation of the simulator was introduced in the 1980s. Simulators have since become more and more complex, and as a result running a single example on them is very expensive and can take days to weeks or even months. Many new techniques, termed criteria, have been introduced that sequentially select the next best (most informative to the emulator) point to run on the simulator. These criteria methods allow for the creation of an emulator with only a small number of simulator runs. We follow and extend this framework to expensive classification simulators.
A Comparison of Machine Learning Methods in a High-Dimensional Classification Problem

Directory of Open Access Journals (Sweden)

Zekić-Sušac Marijana

2014-09-01

Full Text Available Background: Large-dimensional data modelling often relies on variable reduction methods in the pre-processing and in the post-processing stage. However, such a reduction usually provides less information and yields a lower accuracy of the model. Objectives: The aim of this paper is to assess the high-dimensional classification problem of recognizing entrepreneurial intentions of students by machine learning methods. Methods/Approach: Four methods were tested: artificial neural networks, CART classification trees, support vector machines, and k-nearest neighbour on the same dataset in order to compare their efficiency in the sense of classification accuracy. The performance of each method was compared on ten subsamples in a 10-fold cross-validation procedure in order to assess computing sensitivity and specificity of each model. Results: The artificial neural network model based on multilayer perceptron yielded a higher classification rate than the models produced by other methods. The pairwise t-test showed a statistical significance between the artificial neural network and the k-nearest neighbour model, while the difference among other methods was not statistically significant. Conclusions: Tested machine learning methods are able to learn fast and achieve high classification accuracy. However, further advancement can be assured by testing a few additional methodological refinements in machine learning methods.
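The evaluation protocol in this abstract, 10-fold cross-validation scored by sensitivity and specificity, can be sketched as follows. The one-feature data set and the threshold "model" are hypothetical stand-ins for the paper's data and its four learning methods; only the fold handling and the two metrics are the point of the example.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def sensitivity_specificity(actual, predicted):
    """Binary labels: 1 is the positive class, 0 the negative class."""
    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
    fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
    fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
    return tp / (tp + fn), tn / (tn + fp)

def fit_threshold(xs, ys):
    """Toy model: midpoint between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

# Hypothetical data, interleaved so that every fold holds both classes.
xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6, 0.15, 0.85,
      0.25, 0.75, 0.35, 0.65, 0.05, 0.95, 0.45, 0.55, 0.3, 0.7]
ys = [0, 1] * 10

actual, predicted = [], []
for test_idx in k_fold_indices(len(xs), 10):
    train_idx = [i for i in range(len(xs)) if i not in test_idx]
    cut = fit_threshold([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    actual += [ys[i] for i in test_idx]            # held-out truth
    predicted += [1 if xs[i] > cut else 0 for i in test_idx]

sens, spec = sensitivity_specificity(actual, predicted)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Each observation is predicted exactly once by a model that never saw it, and the metrics are computed on the pooled held-out predictions.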
Estimation of Lithological Classification in Taipei Basin: A Bayesian Maximum Entropy Method

Science.gov (United States)

Wu, Meng-Ting; Lin, Yuan-Chien; Yu, Hwa-Lung

2015-04-01

In environmental or other scientific applications, we must have a certain understanding of geological lithological composition. Because of restrictions of real conditions, only a limited amount of data can be acquired. To find out the lithological distribution in the study area, many spatial statistical methods are used to estimate the lithological composition at unsampled points or grids. This study applied the Bayesian Maximum Entropy (BME) method, which is an emerging method in the field of geological spatiotemporal statistics. The BME method can identify the spatiotemporal correlation of the data, and combine not only the hard data but also the soft data to improve estimation. The data of lithological classification are discrete categorical data. Therefore, this research applied Categorical BME to establish a complete three-dimensional lithological estimation model. The limited hard data from the cores and the soft data generated from the geological dating data and the virtual wells were used to estimate the three-dimensional lithological classification in Taipei Basin. Keywords: Categorical Bayesian Maximum Entropy method, Lithological Classification, Hydrogeological Setting
Empirical evaluation of data normalization methods for molecular classification.

Science.gov (United States)

Huang, Huei-Chung; Qin, Li-Xuan

2018-01-01

Data artifacts due to variations in experimental handling are ubiquitous in microarray studies, and they can lead to biased and irreproducible findings. A popular approach to correct for such artifacts is through post hoc data adjustment such as data normalization. Statistical methods for data normalization have been developed and evaluated primarily for the discovery of individual molecular biomarkers. Their performance has rarely been studied for the development of multi-marker molecular classifiers, an increasingly important application of microarrays in the era of personalized medicine. In this study, we set out to evaluate the performance of three commonly used methods for data normalization in the context of molecular classification, using extensive simulations based on re-sampling from a unique pair of microRNA microarray datasets for the same set of samples. The data and code for our simulations are freely available as R packages at GitHub. In the presence of confounding handling effects, all three normalization methods tended to improve the accuracy of the classifier when evaluated on independent test data. The level of improvement and the relative performance among the normalization methods depended on the relative level of molecular signal, the distributional pattern of handling effects (e.g., location shift vs scale change), and the statistical method used for building the classifier. In addition, cross-validation was associated with biased estimation of classification accuracy in the over-optimistic direction for all three normalization methods. Normalization may improve the accuracy of molecular classification for data with confounding handling effects; however, it cannot circumvent the over-optimistic findings associated with cross-validation for assessing classification accuracy.
The ability of current statistical classifications to separate services and manufacturing

DEFF Research Database (Denmark)

Christensen, Jesper Lindgaard

2013-01-01

This paper explores the performance of current statistical classification systems in classifying firms and, in particular, their ability to distinguish between firms that provide services and firms that provide manufacturing. We find that a large share of firms, almost 20%, are not classified...... as expected based on a comparison of their statements of activities with the assigned industry codes. This result is robust to analyses on different levels of aggregation and is validated in an additional survey. It is well known from earlier literature that industry classification systems are not perfect....... This paper provides a quantification of the flaws in classifications of firms. Moreover, it is explained why the classifications of firms are imprecise. The increasing complexity of production, inertia in changes to statistical systems and the increasing integration of manufacturing products and services...
Evaluating the statistical performance of less applied algorithms in classification of worldview-3 imagery data in an urbanized landscape

Science.gov (United States)

Ranaie, Mehrdad; Soffianian, Alireza; Pourmanafi, Saeid; Mirghaffari, Noorollah; Tarkesh, Mostafa

2018-03-01

In the recent decade, analysis of remotely sensed imagery has become one of the most common and widely used procedures in environmental studies, and supervised image classification techniques play a central role in it. Hence, taking a high-resolution Worldview-3 image over a mixed urbanized landscape in Iran, three less applied image classification methods (Bagged CART, stochastic gradient boosting model and neural network with feature extraction) were tested and compared with two prevalent methods: random forest and support vector machine with linear kernel. To do so, each method was run ten times and three validation techniques were used to estimate the accuracy statistics: cross validation, independent validation and validation with the total of the training data. Moreover, ANOVA and the Tukey test were used to assess the statistical significance of differences between the classification methods. In general, the results showed that random forest, by a marginal difference compared to Bagged CART and the stochastic gradient boosting model, is the best performing method, whilst based on independent validation there was no significant difference between the performances of the classification methods. Finally, it should be noted that the neural network with feature extraction and the linear support vector machine had better processing speed than the other methods.
BASIC METHODS OF CLASSIFICATION AND CHARACTERISTICS OF METHODS OF PRICING IN UKRAINE

Directory of Open Access Journals (Sweden)

A. Boguslavskiy

2014-12-01

Full Text Available The article provides definitions and shows the need to use different methods of pricing in enterprises. It exposes the reasons for the absence of a universal classification of pricing methods. The approaches of different authors classify pricing methods into groups: 1) the cost method; 2) methods with a focus on competition; 3) methods for pricing based on demand; 4) pricing with a focus on maximum profit; 5) parametric methods; 6) pricing under risk and uncertainty, etc. An improved classification of pricing methods is proposed with the following groups: 1) methods of cost pricing; 2) methods based on demand; 3) methods based on competition; 4) microeconomic methods; 5) methods based on product life cycles; 6) methods depending on economic conditions; 7) econometric and statistical techniques; 8) methods of transfer pricing; 9) methods in accordance with the terms of agreements; 10) methods of assortment pricing; 11) combined methods of pricing, and so on. The basic directions of use of combined methods of pricing and an analysis of their possible use in Ukraine are shown.
Statistic methods for searching inundated radioactive entities

International Nuclear Information System (INIS)

Dubasov, Yu.V.; Krivokhatskij, A.S.; Khramov, N.N.

1993-01-01

The problem of searching for a flooded radioactive object in a given area is considered. Various models for plotting the search route are discussed. It is shown that a spiral route through random points from the centre of the examined area is the most efficient one. The conclusion is made that, when searching for flooded radioactive objects, it is advisable to use multidimensional statistical methods of classification.
Ice Water Classification Using Statistical Distribution Based Conditional Random Fields in RADARSAT-2 Dual Polarization Imagery

Science.gov (United States)

Zhang, Y.; Li, F.; Zhang, S.; Hao, W.; Zhu, T.; Yuan, L.; Xiao, F.

2017-09-01

In this paper, the Statistical Distribution based Conditional Random Fields (STA-CRF) algorithm is exploited for improving marginal ice-water classification. Pixel-level ice concentration is presented as the comparison of methods based on CRF. Furthermore, in order to explore the effective statistical distribution model to be integrated into STA-CRF, five statistical distribution models are investigated. The STA-CRF methods are tested on 2 scenes around Prydz Bay and Adélie Depression, which contain a variety of ice types during the melt season. Experimental results indicate that the proposed method can resolve the sea ice edge well in the Marginal Ice Zone (MIZ) and show a robust distinction of ice and water.

Statistics of Monte Carlo methods used in radiation transport calculation

International Nuclear Information System (INIS)

Datta, D.

2009-01-01

Radiation transport calculation can be carried out by using either deterministic or statistical methods. Radiation transport calculation based on statistical methods is the basic theme of the Monte Carlo methods. The aim of this lecture is to describe the fundamental statistics required to build the foundations of the Monte Carlo technique for radiation transport calculation. The lecture note is organized in the following way. Section (1) introduces basic Monte Carlo and its classification with respect to the field. Section (2) describes the random sampling methods, a key component of Monte Carlo radiation transport calculation. Section (3) covers the statistical uncertainty of Monte Carlo estimates. Section (4) briefly describes the importance of variance reduction techniques while sampling particles such as photons or neutrons in the process of radiation transport.
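Two of the topics listed in this lecture note, random sampling and the statistical uncertainty of the estimate, can be illustrated with a toy problem that is not taken from the lecture itself: the probability that a particle crosses a purely absorbing slab without colliding. The attenuation coefficient, slab thickness and number of histories below are arbitrary assumptions, and the analytic answer exp(-mu * thickness) is used only as a check.

```python
import math
import random

def transmission_estimate(mu, thickness, n_histories, seed=0):
    """Estimate the uncollided transmission probability through a slab.

    Each history samples a free path s = -ln(u) / mu (inverse-transform
    sampling of the exponential distribution) and scores 1 if the particle
    crosses the slab without colliding.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_histories):
        u = 1.0 - rng.random()          # u in (0, 1], avoids log(0)
        path = -math.log(u) / mu
        if path > thickness:
            hits += 1
    p_hat = hits / n_histories
    # Standard error of a binomial proportion.
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / n_histories)
    return p_hat, std_err

p_hat, std_err = transmission_estimate(mu=1.0, thickness=1.0, n_histories=20000)
exact = math.exp(-1.0)                  # analytic answer for comparison
print(f"estimate={p_hat:.4f} +/- {std_err:.4f}, exact={exact:.4f}")
```

The standard error shrinks as one over the square root of the number of histories, which is why variance reduction techniques matter when each history is costly.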
Revisiting Classification of Eating Disorders-toward Diagnostic and Statistical Manual of Mental Disorders-5 and International Statistical Classification of Diseases and Related Health Problems-11.

Science.gov (United States)

Goyal, Shrigopal; Balhara, Yatan Pal Singh; Khandelwal, S K

2012-07-01

Two of the most commonly used nosological systems, the International Statistical Classification of Diseases and Related Health Problems (ICD)-10 and the Diagnostic and Statistical Manual of Mental Disorders (DSM)-IV, are under revision. This process has generated a lot of interesting debate with regard to the future of the current diagnostic categories. In fact, the status of the categorical approach in the upcoming versions of ICD and DSM is also being debated. The current article focuses on the debate with regard to the eating disorders. The existing classification of eating disorders has been criticized for its limitations. A host of new diagnostic categories have been recommended for inclusion in the upcoming revisions. The structure of the existing categories has also been put under scrutiny.
Assessment of statistical methods used in library-based approaches to microbial source tracking.

Science.gov (United States)

Ritter, Kerry J; Carruthers, Ethan; Carson, C Andrew; Ellender, R D; Harwood, Valerie J; Kingsley, Kyle; Nakatsu, Cindy; Sadowsky, Michael; Shear, Brian; West, Brian; Whitlock, John E; Wiggins, Bruce A; Wilbur, Jayson D

2003-12-01

Several commonly used statistical methods for fingerprint identification in microbial source tracking (MST) were examined to assess the effectiveness of pattern-matching algorithms to correctly identify sources. Although numerous statistical methods have been employed for source identification, no widespread consensus exists as to which is most appropriate. A large-scale comparison of several MST methods, using identical fecal sources, presented a unique opportunity to assess the utility of several popular statistical methods. These included discriminant analysis, nearest neighbour analysis, maximum similarity and average similarity, along with several measures of distance or similarity. Threshold criteria for excluding uncertain or poorly matched isolates from final analysis were also examined for their ability to reduce false positives and increase prediction success. Six independent libraries used in the study were constructed from indicator bacteria isolated from fecal materials of humans, seagulls, cows and dogs. Three of these libraries were constructed using the rep-PCR technique and three relied on antibiotic resistance analysis (ARA). Five of the libraries were constructed using Escherichia coli and one using Enterococcus spp. (ARA). Overall, the outcome of this study suggests a high degree of variability across statistical methods. Despite large differences in correct classification rates among the statistical methods, no single statistical approach emerged as superior. Thresholds failed to consistently increase rates of correct classification and improvement was often associated with substantial effective sample size reduction. Recommendations are provided to aid in selecting appropriate analyses for these types of data.
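One of the approaches named in this abstract, maximum similarity combined with a threshold for excluding poorly matched isolates, can be sketched as below. The Jaccard coefficient on sets of band positions, the threshold value and the library entries are illustrative assumptions; the study compared several distance and similarity measures that the abstract does not list.

```python
def jaccard(a, b):
    """Similarity between two band-presence fingerprints (sets of bands)."""
    return len(a & b) / len(a | b)

def classify(isolate, library, threshold=0.0):
    """Assign the source of the most similar library fingerprint.

    Returns None when the best similarity is below the threshold,
    i.e. the isolate is excluded as poorly matched.
    """
    best_source, best_sim = None, -1.0
    for fingerprint, source in library:
        sim = jaccard(isolate, fingerprint)
        if sim > best_sim:
            best_source, best_sim = source, sim
    return best_source if best_sim >= threshold else None

# Hypothetical library: sets of band positions with known sources.
library = [({1, 2, 3, 4}, "human"), ({1, 2, 5, 6}, "cow"),
           ({7, 8, 9}, "dog"), ({3, 4, 9, 10}, "seagull")]

print(classify({1, 2, 3}, library))                   # closest entry is "human"
print(classify({1, 11, 12}, library, threshold=0.5))  # None: no close match
```

Every isolate returned as None is dropped from the analysis, which is the effective sample size reduction the abstract associates with thresholds.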
Classification Methods for High-Dimensional Genetic Data

Czech Academy of Sciences Publication Activity Database

Kalina, Jan

2014-01-01

Vol. 34, No. 1 (2014), pp. 10-18 ISSN 0208-5216 Institutional support: RVO:67985807 Keywords: multivariate statistics * classification analysis * shrinkage estimation * dimension reduction * data mining Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.646, year: 2014
Methods and statistics for combining motif match scores.

Science.gov (United States)

Bailey, T L; Gribskov, M

1998-01-01

Position-specific scoring matrices are useful for representing and searching for protein sequence motifs. A sequence family can often be described by a group of one or more motifs, and an effective search must combine the scores for matching a sequence to each of the motifs in the group. We describe three methods for combining match scores and estimating the statistical significance of the combined scores and evaluate the search quality (classification accuracy) and the accuracy of the estimate of statistical significance of each. The three methods are: 1) sum of scores, 2) sum of reduced variates, 3) product of score p-values. We show that method 3) is superior to the other two methods in both regards, and that combining motif scores indeed gives better search accuracy. The MAST sequence homology search algorithm utilizing the product of p-values scoring method is available for interactive use and downloading at URL http://www.sdsc.edu/MEME.
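Method 3), the product of score p-values, relies on a standard result: if n p-values are independent and uniform on (0, 1) under the null hypothesis, the probability that their product is at most p equals p times the sum of (-ln p)^i / i! for i from 0 to n-1. The sketch below implements that formula; the function name is hypothetical and the independence assumption is stated here, not verified against the MAST implementation.

```python
import math

def combined_p_value(p_values):
    """P-value of the product of n independent uniform(0, 1) p-values.

    For a product p, P(product <= p) = p * sum_{i=0}^{n-1} (-ln p)^i / i!.
    """
    n = len(p_values)
    product = math.prod(p_values)
    if product == 0.0:
        return 0.0
    log_term = -math.log(product)
    return product * sum(log_term ** i / math.factorial(i) for i in range(n))

print(combined_p_value([0.5]))          # a single motif leaves the p-value unchanged
print(combined_p_value([0.01, 0.02]))   # two motifs combined
```

The correction matters because a raw product of several p-values is always smaller than any one of them, so it would overstate significance if read as a p-value directly.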
DNA barcode analysis: a comparison of phylogenetic and statistical classification methods.

Science.gov (United States)

Austerlitz, Frederic; David, Olivier; Schaeffer, Brigitte; Bleakley, Kevin; Olteanu, Madalina; Leblois, Raphael; Veuille, Michel; Laredo, Catherine

2009-11-10

DNA barcoding aims to assign individuals to given species according to their sequence at a small locus, generally part of the CO1 mitochondrial gene. Amongst other issues, this raises the question of how to deal with within-species genetic variability and potential transpecific polymorphism. In this context, we examine several assignation methods belonging to two main categories: (i) phylogenetic methods (neighbour-joining and PhyML) that attempt to account for the genealogical framework of DNA evolution and (ii) supervised classification methods (k-nearest neighbour, CART, random forest and kernel methods). These methods range from basic to elaborate. We investigated the ability of each method to correctly classify query sequences drawn from samples of related species using both simulated and real data. Simulated data sets were generated using coalescent simulations in which we varied the genealogical history, mutation parameter, sample size and number of species. No method was found to be the best in all cases. The simplest method of all, "one nearest neighbour", was found to be the most reliable with respect to changes in the parameters of the data sets. The parameter most influencing the performance of the various methods was molecular diversity of the data. Addition of genetically independent loci--nuclear genes--improved the predictive performance of most methods. The study implies that taxonomists can influence the quality of their analyses either by choosing a method best-adapted to the configuration of their sample, or, given a certain method, increasing the sample size or altering the amount of molecular diversity. This can be achieved either by sequencing more mtDNA or by sequencing additional nuclear genes. In the latter case, they may also have to modify their data analysis method.
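The method this study found most reliable, "one nearest neighbour", assigns a query sequence to the species of the closest reference sequence. A minimal sketch follows, assuming aligned sequences of equal length and a plain count of differing sites as the distance; the paper's actual distance measure is not given in the abstract, and the sequences below are invented.

```python
def hamming(a, b):
    """Number of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def one_nearest_neighbour(query, reference):
    """Return the species of the reference sequence closest to the query."""
    _, species = min(reference, key=lambda item: hamming(query, item[0]))
    return species

# Hypothetical aligned barcode fragments, not real CO1 sequences.
reference = [("ACGTACGTAC", "species_A"), ("ACGTACGTTC", "species_A"),
             ("TCGAACGGAC", "species_B"), ("TCGAACGGAT", "species_B")]

print(one_nearest_neighbour("ACGTACGAAC", reference))
print(one_nearest_neighbour("TCGAACGGAA", reference))
```

Ties are broken by reference order here, which is an arbitrary choice; a real analysis would need an explicit rule for equidistant references from different species.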
DNA barcode analysis: a comparison of phylogenetic and statistical classification methods

Directory of Open Access Journals (Sweden)

Leblois Raphael

2009-11-01

Full Text Available Abstract Background: DNA barcoding aims to assign individuals to given species according to their sequence at a small locus, generally part of the CO1 mitochondrial gene. Amongst other issues, this raises the question of how to deal with within-species genetic variability and potential transpecific polymorphism. In this context, we examine several assignation methods belonging to two main categories: (i) phylogenetic methods (neighbour-joining and PhyML) that attempt to account for the genealogical framework of DNA evolution and (ii) supervised classification methods (k-nearest neighbour, CART, random forest and kernel methods). These methods range from basic to elaborate. We investigated the ability of each method to correctly classify query sequences drawn from samples of related species using both simulated and real data. Simulated data sets were generated using coalescent simulations in which we varied the genealogical history, mutation parameter, sample size and number of species. Results: No method was found to be the best in all cases. The simplest method of all, "one nearest neighbour", was found to be the most reliable with respect to changes in the parameters of the data sets. The parameter most influencing the performance of the various methods was molecular diversity of the data. Addition of genetically independent loci - nuclear genes - improved the predictive performance of most methods. Conclusion: The study implies that taxonomists can influence the quality of their analyses either by choosing a method best-adapted to the configuration of their sample, or, given a certain method, increasing the sample size or altering the amount of molecular diversity. This can be achieved either by sequencing more mtDNA or by sequencing additional nuclear genes. In the latter case, they may also have to modify their data analysis method.
Tridimensional statistic analysis of cooling tower plumes. Methods and results relating to power effect and disposal conditions

International Nuclear Information System (INIS)

Sabaton, M.; Viollet, P.L.; Darles, A.; Gland, H.

1980-07-01

The PANACH three-dimensional calculation code, developed from tests on a small-scale model and validated against full-scale measurement campaigns, was used to estimate three-dimensional statistics of plumes. As the calculation times make it impractical to run a calculation for each radiosonde sounding, a classification method was adopted. This method, developed by the French National Meteorological Office, is based on a double classification comprising basic classes in which the plumes are assumed to be dynamically similar and a sub-classification to take better account of the true moisture profiles. This statistical method was then applied to the case of 2 or 4 1300 MWe units fitted with natural draught cooling towers of the wet, dry or wet-dry types [fr]
A DATA FIELD METHOD FOR URBAN REMOTELY SENSED IMAGERY CLASSIFICATION CONSIDERING SPATIAL CORRELATION

Directory of Open Access Journals (Sweden)

Y. Zhang

2016-06-01

Full Text Available Spatial correlation between pixels is important information for remotely sensed imagery classification. The data field method and spatial autocorrelation statistics have been utilized to describe and model spatial information of local pixels. The original data field method can represent the spatial interactions of neighbourhood pixels effectively. However, its focus on measuring the grey level change between the central pixel and the neighbourhood pixels results in exaggerating the contribution of the central pixel to the whole local window. Besides, Geary's C has also been proven to characterise and qualify the spatial correlation between each pixel and its neighbourhood pixels well. But the extracted object is badly delineated, with the distracting salt-and-pepper effect of isolated misclassified pixels. To correct this defect, we introduce the data field method for filtering and noise limitation. Moreover, the original data field method is enhanced by considering each pixel in the window as the central pixel to compute statistical characteristics between it and its neighbourhood pixels. The last step employs a support vector machine (SVM) for the classification of multi-features (e.g. the spectral feature and spatial correlation feature). In order to validate the effectiveness of the developed method, experiments are conducted on different remotely sensed images containing multiple complex object classes. The results show that the developed method outperforms the traditional method in terms of classification accuracy.
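For readers unfamiliar with Geary's C, the sketch below computes the standard global form of the statistic on a small grey-level grid, C = (N - 1) Σ w_ij (x_i - x_j)² / (2 W Σ (x_i - mean)²), with binary weights for the four edge-sharing neighbours. This is an assumption-laden illustration: the abstract uses a local, per-pixel variant inside a moving window, and the function name and toy grids here are hypothetical.

```python
def gearys_c(grid):
    """Global Geary's C for a 2D grid using 4-neighbour (rook) binary weights.

    Values below 1 indicate positive spatial autocorrelation (neighbours are
    similar); values above 1 indicate negative autocorrelation.
    """
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    spread = sum((v - mean) ** 2 for v in values)
    if spread == 0:
        raise ValueError("Geary's C is undefined for a constant grid")

    squared_diffs = 0.0
    weight_sum = 0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    squared_diffs += (grid[r][c] - grid[rr][cc]) ** 2
                    weight_sum += 1  # each ordered neighbour pair has weight 1

    return (n - 1) * squared_diffs / (2 * weight_sum * spread)


# Two made-up 2x2 windows: a checkerboard and a smooth two-band pattern.
print(gearys_c([[0, 1], [1, 0]]))
print(gearys_c([[0, 0], [1, 1]]))
```

The checkerboard window scores above 1 and the banded window below 1, which is the contrast a spatial correlation feature is meant to capture.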
Optimal statistical damage detection and classification in an experimental wind turbine blade using minimum instrumentation

Science.gov (United States)

Hoell, Simon; Omenzetter, Piotr

2017-04-01

The increasing demand for carbon neutral energy in a challenging economic environment is a driving factor for erecting ever larger wind turbines in harsh environments using novel wind turbine blade (WTB) designs characterized by high flexibilities and lower buckling capacities. To counteract the resulting increase in operation and maintenance costs, efficient structural health monitoring systems can be employed to prevent dramatic failures and to schedule maintenance actions according to the true structural state. This paper presents a novel methodology for classifying structural damages using vibrational responses from a single sensor. The method is based on statistical classification using Bayes' theorem and an advanced statistic, which allows controlling the performance by varying the number of samples which represent the current state. This is done for multivariate damage sensitive features (DSFs) defined as partial autocorrelation coefficients (PACCs) estimated from vibrational responses and principal component analysis scores from PACCs. Additionally, optimal DSFs are composed not only for damage classification but also for damage detection based on binary statistical hypothesis testing, where feature selections are found with a fast forward procedure. The method is applied to laboratory experiments with a small-scale WTB with wind-like excitation and non-destructive damage scenarios. The obtained results demonstrate the advantages of the proposed procedure and are promising for future applications of vibration-based structural health monitoring in WTBs.
Visual classification of very fine-grained sediments: Evaluation through univariate and multivariate statistics

Science.gov (United States)

Hohn, M. Ed; Nuhfer, E.B.; Vinopal, R.J.; Klanderman, D.S.

1980-01-01

Classifying very fine-grained rocks through fabric elements provides information about depositional environments, but is subject to the biases of visual taxonomy. To evaluate the statistical significance of an empirical classification of very fine-grained rocks, samples from Devonian shales in four cored wells in West Virginia and Virginia were measured for 15 variables: quartz, illite, pyrite and expandable clays determined by X-ray diffraction; total sulfur, organic content, inorganic carbon, matrix density, bulk density, porosity, silt, as well as density, sonic travel time, resistivity, and γ-ray response measured from well logs. The four lithologic types comprised: (1) sharply banded shale, (2) thinly laminated shale, (3) lenticularly laminated shale, and (4) nonbanded shale. Univariate and multivariate analyses of variance showed that the lithologic classification reflects significant differences for the variables measured, differences that can be detected independently of stratigraphic effects. Little-known statistical methods found useful in this work included: the multivariate analysis of variance with more than one effect, simultaneous plotting of samples and variables on canonical variates, and the use of parametric ANOVA and MANOVA on ranked data. © 1980 Plenum Publishing Corporation.
Machine learning methods can replace 3D profile method in classification of amyloidogenic hexapeptides

Directory of Open Access Journals (Sweden)

Stanislawski Jerzy

2013-01-01

Full Text Available Abstract Background: Amyloids are proteins capable of forming fibrils. Many of them underlie serious diseases, like Alzheimer disease. The number of amyloid-associated diseases is constantly increasing. Recent studies indicate that amyloidogenic properties can be associated with short segments of aminoacids, which transform the structure when exposed. A few hundreds of such peptides have been experimentally found. Experimental testing of all possible aminoacid combinations is currently not feasible. Instead, they can be predicted by computational methods. 3D profile is a physicochemical-based method that has generated the most numerous dataset - ZipperDB. However, it is computationally very demanding. Here, we show that dataset generation can be accelerated. Two methods to increase the classification efficiency of amyloidogenic candidates are presented and tested: simplified 3D profile generation and machine learning methods. Results: We generated a new dataset of hexapeptides, using more economical 3D profile algorithm, which showed very good classification overlap with ZipperDB (93.5%). The new part of our dataset contains 1779 segments, with 204 classified as amyloidogenic. The dataset of 6-residue sequences with their binary classification, based on the energy of the segment, was applied for training machine learning methods. A separate set of sequences from ZipperDB was used as a test set. The most effective methods were Alternating Decision Tree and Multilayer Perceptron. Both methods obtained area under ROC curve of 0.96, accuracy 91%, true positive rate ca. 78%, and true negative rate 95%. A few other machine learning methods also achieved a good performance. The computational time was reduced from 18-20 CPU-hours (full 3D profile) to 0.5 CPU-hours (simplified 3D profile) to seconds (machine learning). Conclusions: We showed that the simplified profile generation method does not introduce an error with regard to the original method, while
Machine learning methods can replace 3D profile method in classification of amyloidogenic hexapeptides.

Science.gov (United States)

Stanislawski, Jerzy; Kotulska, Malgorzata; Unold, Olgierd

2013-01-17

Amyloids are proteins capable of forming fibrils. Many of them underlie serious diseases, like Alzheimer disease. The number of amyloid-associated diseases is constantly increasing. Recent studies indicate that amyloidogenic properties can be associated with short segments of aminoacids, which transform the structure when exposed. A few hundreds of such peptides have been experimentally found. Experimental testing of all possible aminoacid combinations is currently not feasible. Instead, they can be predicted by computational methods. 3D profile is a physicochemical-based method that has generated the most numerous dataset - ZipperDB. However, it is computationally very demanding. Here, we show that dataset generation can be accelerated. Two methods to increase the classification efficiency of amyloidogenic candidates are presented and tested: simplified 3D profile generation and machine learning methods. We generated a new dataset of hexapeptides, using more economical 3D profile algorithm, which showed very good classification overlap with ZipperDB (93.5%). The new part of our dataset contains 1779 segments, with 204 classified as amyloidogenic. The dataset of 6-residue sequences with their binary classification, based on the energy of the segment, was applied for training machine learning methods. A separate set of sequences from ZipperDB was used as a test set. The most effective methods were Alternating Decision Tree and Multilayer Perceptron. Both methods obtained area under ROC curve of 0.96, accuracy 91%, true positive rate ca. 78%, and true negative rate 95%. A few other machine learning methods also achieved a good performance. The computational time was reduced from 18-20 CPU-hours (full 3D profile) to 0.5 CPU-hours (simplified 3D profile) to seconds (machine learning). We showed that the simplified profile generation method does not introduce an error with regard to the original method, while increasing the computational efficiency. Our new dataset
A study of several CAD methods for classification of clustered microcalcifications

Science.gov (United States)

Wei, Liyang; Yang, Yongyi; Nishikawa, Robert M.; Jiang, Yulei

2005-04-01

In this paper we investigate several state-of-the-art machine-learning methods for automated classification of clustered microcalcifications (MCs), aimed at assisting radiologists in more accurate diagnosis of breast cancer in a computer-aided diagnosis (CADx) scheme. The methods we consider include: support vector machine (SVM), kernel Fisher discriminant (KFD), and committee machines (ensemble averaging and AdaBoost), most of which have been developed recently in statistical learning theory. We formulate differentiation of malignant from benign MCs as a supervised learning problem, and apply these learning methods to develop the classification algorithms. As input, these methods use image features automatically extracted from clustered MCs. We test these methods using a database of 697 clinical mammograms from 386 cases, which include a wide spectrum of difficult-to-classify cases. We use receiver operating characteristic (ROC) analysis to evaluate and compare the classification performance of the different methods. In addition, we investigate how to combine information from multiple-view mammograms of the same case so that the best decision can be made by a classifier. In our experiments, the kernel-based methods (i.e., SVM, KFD) yield the best performance, significantly outperforming a well-established CADx approach based on neural network learning.
Using support vector machines with tract-based spatial statistics for automated classification of Tourette syndrome children

Science.gov (United States)

Wen, Hongwei; Liu, Yue; Wang, Jieqiong; Zhang, Jishui; Peng, Yun; He, Huiguang

2016-03-01

Tourette syndrome (TS) is a developmental neuropsychiatric disorder with the cardinal symptoms of motor and vocal tics, which emerges in early childhood and fluctuates in severity in later years. To date, the neural basis of TS is not fully understood, and its long-term prognosis is difficult to estimate accurately. Few studies have looked at the potential of using diffusion tensor imaging (DTI) in conjunction with machine learning algorithms in order to automate the classification of healthy children and TS children. Here we apply the Tract-Based Spatial Statistics (TBSS) method to 44 TS children and 48 age- and gender-matched healthy children in order to extract the diffusion values from each voxel in the white matter (WM) skeleton, and a feature selection algorithm (ReliefF) was used to select the most salient voxels for subsequent classification with a support vector machine (SVM). We use nested cross validation to yield an unbiased assessment of the classification method and prevent overestimation. Peak performance of the SVM classifier (accuracy 88.04%, sensitivity 88.64%, specificity 87.50%) was achieved using the axial diffusion (AD) metric, demonstrating the potential of a joint TBSS and SVM pipeline for fast, objective classification of healthy and TS children. These results suggest that our methods may be useful for the early identification of subjects with TS, and hold promise for predicting prognosis and treatment outcome for individuals with TS.
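The three reported figures can be checked against the group sizes with a short calculation. The sketch below assumes TS is the positive class and that 39 of the 44 TS children and 42 of the 48 healthy children were classified correctly; these counts are not stated in the abstract and are inferred here only because they reproduce the reported percentages. The function name is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of positive (TS) cases found
        "specificity": tn / (tn + fp),  # share of negative (healthy) cases found
    }


# Assumed counts (inferred, not reported): 44 TS children, 48 healthy children.
metrics = classification_metrics(tp=39, fn=5, tn=42, fp=6)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under these assumed counts the accuracy is 81/92, the sensitivity 39/44 and the specificity 42/48, which round to the 88.04%, 88.64% and 87.50% quoted above.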
Urban Image Classification: Per-Pixel Classifiers, Sub-Pixel Analysis, Object-Based Image Analysis, and Geospatial Methods. 10; Chapter

Science.gov (United States)

Myint, Soe W.; Mesev, Victor; Quattrochi, Dale; Wentz, Elizabeth A.

2013-01-01

Remote sensing methods used to generate base maps to analyze the urban environment rely predominantly on digital sensor data from space-borne platforms. This is due in part to new sources of high spatial resolution data covering the globe, a variety of multispectral and multitemporal sources, sophisticated statistical and geospatial methods, and compatibility with GIS data sources and methods. The goal of this chapter is to review the four groups of classification methods for digital sensor data from space-borne platforms: per-pixel, sub-pixel, object-based (spatial-based), and geospatial methods. Per-pixel methods are widely used methods that classify pixels into distinct categories based solely on the spectral and ancillary information within that pixel. They are used for tasks ranging from simple calculations of environmental indices (e.g., NDVI) to sophisticated expert systems that assign urban land covers. Researchers recognize, however, that even with the smallest pixel size the spectral information within a pixel is really a combination of multiple urban surfaces. Sub-pixel classification methods therefore aim to statistically quantify the mixture of surfaces to improve overall classification accuracy. While within-pixel variations exist, there is also significant evidence that groups of nearby pixels have similar spectral information and therefore belong to the same classification category. Object-oriented methods have emerged that group pixels prior to classification based on spectral similarity and spatial proximity. Classification accuracy using object-based methods shows significant success and promise for numerous urban applications. Like the object-oriented methods that recognize the importance of spatial proximity, geospatial methods for urban mapping also utilize neighboring pixels in the classification process. The primary difference though is that geostatistical methods (e.g., spatial autocorrelation methods) are utilized during both the pre- and post-classification
Statistical methods

CERN Document Server

Szulc, Stefan

1965-01-01

Statistical Methods provides a discussion of the principles of the organization and technique of research, with emphasis on its application to the problems in social statistics. This book discusses branch statistics, which aims to develop practical ways of collecting and processing numerical data and to adapt general statistical methods to the objectives in a given field. Organized into five parts encompassing 22 chapters, this book begins with an overview of how to organize the collection of such information on individual units, primarily as accomplished by government agencies. This text then
A Quantile Mapping Bias Correction Method Based on Hydroclimatic Classification of the Guiana Shield.

Science.gov (United States)

Ringard, Justine; Seyler, Frederique; Linguet, Laurent

2017-06-16

Satellite precipitation products (SPPs) provide alternative precipitation data for regions with sparse rain gauge measurements. However, SPPs are subject to different types of error that need correction. Most SPP bias correction methods use the statistical properties of the rain gauge data to adjust the corresponding SPP data. The statistical adjustment does not make it possible to correct the pixels of SPP data for which there is no rain gauge data. The solution proposed in this article is to correct the daily SPP data for the Guiana Shield using a novel two set approach, without taking into account the daily gauge data of the pixel to be corrected, but the daily gauge data from surrounding pixels. In this case, a spatial analysis must be involved. The first step defines hydroclimatic areas using a spatial classification that considers precipitation data with the same temporal distributions. The second step uses the Quantile Mapping bias correction method to correct the daily SPP data contained within each hydroclimatic area. We validate the results by comparing the corrected SPP data and daily rain gauge measurements using relative RMSE and relative bias statistical errors. The results show that analysis scale variation reduces rBIAS and rRMSE significantly. The spatial classification avoids mixing rainfall data with different temporal characteristics in each hydroclimatic area, and the defined bias correction parameters are more realistic and appropriate. This study demonstrates that hydroclimatic classification is relevant for implementing bias correction methods at the local scale.
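The Quantile Mapping step described in this abstract can be sketched as an empirical transfer function between matched quantiles of the satellite and gauge samples. This is a minimal illustration under stated assumptions, not the authors' implementation: it pools all data of one hydroclimatic area, uses linearly interpolated quantiles, clamps values outside the calibration range, and the function names and toy numbers are hypothetical.

```python
def quantile(sorted_values, p):
    """Linearly interpolated quantile of a sorted sample, for 0 <= p <= 1."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def build_transfer(satellite_sample, gauge_sample, n_quantiles=11):
    """Return matched quantiles (satellite, gauge) at evenly spaced probabilities."""
    sat = sorted(satellite_sample)
    gauge = sorted(gauge_sample)
    probs = [i / (n_quantiles - 1) for i in range(n_quantiles)]
    return [quantile(sat, p) for p in probs], [quantile(gauge, p) for p in probs]


def correct(value, sat_q, gauge_q):
    """Map a satellite value onto the gauge distribution by interpolation."""
    if value <= sat_q[0]:
        return gauge_q[0]  # clamp below the calibration range
    if value >= sat_q[-1]:
        return gauge_q[-1]  # clamp above the calibration range
    for i in range(1, len(sat_q)):
        if value <= sat_q[i]:
            weight = (value - sat_q[i - 1]) / (sat_q[i] - sat_q[i - 1])
            return gauge_q[i - 1] + weight * (gauge_q[i] - gauge_q[i - 1])


# Made-up daily totals (mm) for one area: the satellite product reads half
# of what the surrounding gauges record.
satellite = [float(x) for x in range(11)]
gauges = [2.0 * x for x in satellite]
sat_q, gauge_q = build_transfer(satellite, gauges)
print(correct(5.0, sat_q, gauge_q))
```

Because the transfer function is fitted per area rather than per pixel, it can be applied to pixels that have no gauge of their own, which is the situation the abstract targets.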
75 FR 39265 - National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards...

Science.gov (United States)

2010-07-08

... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Disease Control and Prevention National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards Staff, Announces the... Prevention, Classifications and Public Health Data Standards, 3311 Toledo Road, Room 2337, Hyattsville, MD...
78 FR 53148 - National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards...

Science.gov (United States)

2013-08-28

... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Disease Control and Prevention National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards Staff, Announces the... Administrator, Classifications and Public Health Data Standards Staff, NCHS, 3311 Toledo Road, Room 2337...

78 FR 9055 - National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards...

Science.gov (United States)

2013-02-07

... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Disease Control and Prevention National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards Staff, Announces the..., Medical Systems Administrator, Classifications and Public Health Data Standards Staff, NCHS, 3311 Toledo...
Multi-agent Negotiation Mechanisms for Statistical Target Classification in Wireless Multimedia Sensor Networks

Science.gov (United States)

Wang, Xue; Bi, Dao-wei; Ding, Liang; Wang, Sheng

2007-01-01

The recent availability of low cost and miniaturized hardware has allowed wireless sensor networks (WSNs) to retrieve audio and video data in real world applications, which has fostered the development of wireless multimedia sensor networks (WMSNs). Resource constraints and challenging multimedia data volume make development of efficient algorithms to perform in-network processing of multimedia contents imperative. This paper proposes solving problems in the domain of WMSNs from the perspective of multi-agent systems. The multi-agent framework enables flexible network configuration and efficient collaborative in-network processing. The focus is placed on target classification in WMSNs where audio information is retrieved by microphones. To deal with the uncertainties related to audio information retrieval, the statistical approaches of power spectral density estimates, principal component analysis and Gaussian process classification are employed. A multi-agent negotiation mechanism is specially developed to efficiently utilize limited resources and simultaneously enhance classification accuracy and reliability. The negotiation is composed of two phases, where an auction based approach is first exploited to allocate the classification task among the agents and then individual agent decisions are combined by the committee decision mechanism. Simulation experiments with real world data are conducted and the results show that the proposed statistical approaches and negotiation mechanism not only reduce memory and computation requirements in WMSNs but also significantly enhance classification accuracy and reliability. PMID:28903223
Selecting statistical models and variable combinations for optimal classification using otolith microchemistry.

Science.gov (United States)

Mercier, Lény; Darnaude, Audrey M; Bruguier, Olivier; Vasconcelos, Rita P; Cabral, Henrique N; Costa, Maria J; Lara, Monica; Jones, David L; Mouillot, David

2011-06-01

Reliable assessment of fish origin is of critical importance for exploited species, since nursery areas must be identified and protected to maintain recruitment to the adult stock. During the last two decades, otolith chemical signatures (or "fingerprints") have been increasingly used as tools to discriminate between coastal habitats. However, correct assessment of fish origin from otolith fingerprints depends on various environmental and methodological parameters, including the choice of the statistical method used to assign fish to unknown origin. Among the available methods of classification, Linear Discriminant Analysis (LDA) is the most frequently used, although it assumes data are multivariate normal with homogeneous within-group dispersions, conditions that are not always met by otolith chemical data, even after transformation. Other less constrained classification methods are available, but there is a current lack of comparative analysis in applications to otolith microchemistry. Here, we assessed stock identification accuracy for four classification methods (LDA, Quadratic Discriminant Analysis [QDA], Random Forests [RF], and Artificial Neural Networks [ANN]), through the use of three distinct data sets. In each case, all possible combinations of chemical elements were examined to identify the elements to be used for optimal accuracy in fish assignment to their actual origin. Our study shows that accuracy differs according to the model and the number of elements considered. Best combinations did not include all the elements measured, and it was not possible to define an ad hoc multielement combination for accurate site discrimination. Among all the models tested, RF and ANN performed best, especially for complex data sets (e.g., with numerous fish species and/or chemical elements involved). However, for these data, RF was less time-consuming and more interpretable than ANN, and far more efficient and less demanding in terms of assumptions than LDA or QDA
A novel statistical method for classifying habitat generalists and specialists

DEFF Research Database (Denmark)

Chazdon, Robin L; Chao, Anne; Colwell, Robert K

2011-01-01

in second-growth (SG) and old-growth (OG) rain forests in the Caribbean lowlands of northeastern Costa Rica. We evaluate the multinomial model in detail for the tree data set. Our results for birds were highly concordant with a previous nonstatistical classification, but our method classified a higher......: (1) generalist; (2) habitat A specialist; (3) habitat B specialist; and (4) too rare to classify with confidence. We illustrate our multinomial classification method using two contrasting data sets: (1) bird abundance in woodland and heath habitats in southeastern Australia and (2) tree abundance...... fraction (57.7%) of bird species with statistical confidence. Based on a conservative specialization threshold and adjustment for multiple comparisons, 64.4% of tree species in the full sample were too rare to classify with confidence. Among the species classified, OG specialists constituted the largest...
Random forests for classification in ecology

Science.gov (United States)

Cutler, D.R.; Edwards, T.C.; Beard, K.H.; Cutler, A.; Hess, K.T.; Gibson, J.; Lawler, J.J.

2007-01-01

Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature. © 2007 by the Ecological Society of America.
Galaxy And Mass Assembly: automatic morphological classification of galaxies using statistical learning

Science.gov (United States)

Sreejith, Sreevarsha; Pereverzyev, Sergiy, Jr.; Kelvin, Lee S.; Marleau, Francine R.; Haltmeier, Markus; Ebner, Judith; Bland-Hawthorn, Joss; Driver, Simon P.; Graham, Alister W.; Holwerda, Benne W.; Hopkins, Andrew M.; Liske, Jochen; Loveday, Jon; Moffett, Amanda J.; Pimbblet, Kevin A.; Taylor, Edward N.; Wang, Lingyu; Wright, Angus H.

2018-03-01

We apply four statistical learning methods to a sample of 7941 galaxies (z ...) to test the feasibility of using automated algorithms to classify galaxies. Using 10 features measured for each galaxy (sizes, colours, shape parameters, and stellar mass), we apply the techniques of Support Vector Machines, Classification Trees, Classification Trees with Random Forest (CTRF) and Neural Networks, which return True Prediction Ratios (TPRs) of 75.8 per cent, 69.0 per cent, 76.2 per cent, and 76.0 per cent, respectively. Those occasions whereby all four algorithms agree with each other yet disagree with the visual classification ('unanimous disagreement') serve as a potential indicator of human error in classification, occurring in ~9 per cent of ellipticals, ~9 per cent of little blue spheroids, ~14 per cent of early-type spirals, ~21 per cent of intermediate-type spirals, and ~4 per cent of late-type spirals and irregulars. We observe that the choice of parameters rather than that of algorithms is more crucial in determining classification accuracy. Due to its simplicity in formulation and implementation, we recommend the CTRF algorithm for classifying future galaxy data sets. Adopting the CTRF algorithm, the TPRs of the five galaxy types are: E, 70.1 per cent; LBS, 75.6 per cent; S0-Sa, 63.6 per cent; Sab-Scd, 56.4 per cent; and Sd-Irr, 88.9 per cent. Further, we train a binary classifier using this CTRF algorithm that divides galaxies into spheroid-dominated (E, LBS, and S0-Sa) and disc-dominated (Sab-Scd and Sd-Irr), achieving an overall accuracy of 89.8 per cent. This translates into an accuracy of 84.9 per cent for spheroid-dominated systems and 92.5 per cent for disc-dominated systems.
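The "unanimous disagreement" check in this abstract amounts to a simple filter over per-galaxy predictions. The sketch below is an illustration only: the function name, classifier keys and toy labels are hypothetical, and it assumes each classifier's predictions are stored in the same order as the visual labels.

```python
def unanimous_disagreements(visual_labels, predictions_by_classifier):
    """Indices where every classifier gives the same label and that label
    differs from the visual classification."""
    flagged = []
    for i, visual in enumerate(visual_labels):
        votes = {preds[i] for preds in predictions_by_classifier.values()}
        if len(votes) == 1 and visual not in votes:
            flagged.append(i)
    return flagged


# Made-up labels for five galaxies using the abstract's class names.
visual = ["E", "LBS", "S0-Sa", "Sab-Scd", "Sd-Irr"]
predictions = {
    "svm":            ["E", "E", "S0-Sa", "Sd-Irr", "Sd-Irr"],
    "tree":           ["E", "E", "Sab-Scd", "Sd-Irr", "Sd-Irr"],
    "random_forest":  ["E", "E", "S0-Sa", "Sd-Irr", "Sd-Irr"],
    "neural_network": ["E", "E", "S0-Sa", "Sd-Irr", "Sd-Irr"],
}

print(unanimous_disagreements(visual, predictions))
```

In this toy set the second and fourth galaxies are flagged: all four classifiers agree on a label that differs from the visual one, so those objects would be candidates for a second visual inspection.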
A method for classification of network traffic based on C5.0 Machine Learning Algorithm

DEFF Research Database (Denmark)

Bujlow, Tomasz; Riaz, M. Tahir; Pedersen, Jens Myrup

2012-01-01

current network traffic. To overcome the drawbacks of existing methods for traffic classification, usage of C5.0 Machine Learning Algorithm (MLA) was proposed. On the basis of statistical traffic information received from volunteers and C5.0 algorithm we constructed a boosted classifier, which was shown...... and classification, an algorithm for recognizing flow direction and the C5.0 itself. Classified applications include Skype, FTP, torrent, web browser traffic, web radio, interactive gaming and SSH. We performed subsequent tries using different sets of parameters and both training and classification options...
The ability of current statistical classifications to separate services and manufacturing

DEFF Research Database (Denmark)

Christensen, Jesper Lindgaard

The paper explores how well our statistical classification systems perform in classifying firms and in particular how they distinguish firms doing services and/or manufacturing. It is found that a large share, almost 20%, of firms can be said to be misclassified based on their statements on activ...
A statistically harmonized alignment-classification in image space enables accurate and robust alignment of noisy images in single particle analysis.

Science.gov (United States)

Kawata, Masaaki; Sato, Chikara

2007-06-01

In determining the three-dimensional (3D) structure of macromolecular assemblies in single particle analysis, a large representative dataset of two-dimensional (2D) average images from a huge number of raw images is key to high resolution. Because alignments prior to averaging are computationally intensive, currently available multireference alignment (MRA) software does not survey every possible alignment. This leads to misaligned images, creating blurred averages and reducing the quality of the final 3D reconstruction. We present a new method, in which multireference alignment is harmonized with classification (multireference multiple alignment: MRMA). This method enables a statistical comparison of multiple alignment peaks, reflecting the similarities between each raw image and a set of reference images. Among the selected alignment candidates for each raw image, misaligned images are statistically excluded, based on the principle that aligned raw images of similar projections have a dense distribution around the correctly aligned coordinates in image space. This newly developed method was examined for accuracy and speed using model image sets with various signal-to-noise ratios, and with electron microscope images of the Transient Receptor Potential C3 and the sodium channel. In every data set, the newly developed method outperformed conventional methods in robustness against noise and in speed, creating 2D average images of higher quality. This statistically harmonized alignment-classification combination should greatly improve the quality of single particle analysis.
A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data

KAUST Repository

Abusamra, Heba

2013-05-01

Microarray technology has enriched the study of gene expression in such a way that scientists are now able to measure the expression levels of thousands of genes in a single experiment. Microarray gene expression data have gained great importance in recent years due to their role in disease diagnosis and prognosis, which helps in choosing the appropriate treatment plan for patients. This technology has ushered in a new era in molecular classification. Interpreting gene expression data remains a difficult problem and an active research area due to their inherent “high dimensional low sample size” nature. Such problems pose great challenges to existing classification methods. Thus, effective feature selection techniques are often needed to help correctly classify different tumor types and consequently lead to a better understanding of genetic signatures as well as improved treatment strategies. This thesis presents a comparative study of state-of-the-art feature selection methods, classification methods, and their combination, based on gene expression data. We compared the efficiency of three different classification methods (support vector machines, k-nearest neighbor and random forest) and eight different feature selection methods (information gain, twoing rule, sum minority, max minority, gini index, sum of variances, t-statistics, and one-dimension support vector machine). Five-fold cross validation was used to evaluate the classification performance. Two publicly available gene expression data sets of glioma were used for this study. Different experiments were carried out to compare the performance of the classification methods with and without feature selection. Results revealed the important role of feature selection in classifying gene expression data. By performing feature selection, the classification accuracy can be significantly boosted by using a small number of genes. The relationship of features selected in
Classification of lung sounds using higher-order statistics: A divide-and-conquer approach.

Science.gov (United States)

Naves, Raphael; Barbosa, Bruno H G; Ferreira, Danton D

2016-06-01

Lung sound auscultation is one of the most commonly used methods to evaluate respiratory diseases. However, the effectiveness of this method depends on the physician's training. If the physician does not have the proper training, he/she will be unable to distinguish between normal and abnormal sounds generated by the human body. Thus, the aim of this study was to implement a pattern recognition system to classify lung sounds. We used a dataset composed of five types of lung sounds: normal, coarse crackle, fine crackle, monophonic and polyphonic wheezes. We used higher-order statistics (HOS) to extract features (second-, third- and fourth-order cumulants), Genetic Algorithms (GA) and Fisher's Discriminant Ratio (FDR) to reduce dimensionality, and k-Nearest Neighbors and Naive Bayes classifiers to recognize the lung sound events in a tree-based system. We used the cross-validation procedure to analyze the classifiers' performance and Tukey's Honestly Significant Difference criterion to compare the results. Our results showed that the Genetic Algorithms outperformed the Fisher's Discriminant Ratio for feature selection. Moreover, each lung sound class had a different signature pattern according to its cumulants, showing that HOS is a promising feature extraction tool for lung sounds. Besides, the proposed divide-and-conquer approach can accurately classify different types of lung sounds. The best tree-based classifier obtained a classification accuracy of 98.1% on training data and 94.6% on validation data. The proposed approach achieved good results even using only one feature extraction tool (higher-order statistics). Additionally, the implementation of the proposed classifier in an embedded system is feasible. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
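As a rough illustration of the feature extraction step described above, the sketch below computes zero-lag second-, third- and fourth-order cumulants of a one-dimensional signal. It is a minimal sketch under stated assumptions: the abstract does not say which lags or estimators the authors used, and the function name `zero_lag_cumulants` is hypothetical.

```python
from statistics import fmean


def zero_lag_cumulants(signal):
    """Return the zero-lag cumulants (c2, c3, c4) of a 1-D signal.

    Assumption: only lag zero is considered; lagged cumulants would
    need products of shifted copies of the signal.
    """
    mu = fmean(signal)
    centered = [s - mu for s in signal]
    m2 = fmean([c ** 2 for c in centered])  # second central moment
    m3 = fmean([c ** 3 for c in centered])  # third central moment
    m4 = fmean([c ** 4 for c in centered])  # fourth central moment
    # For a zero-mean signal: c2 = m2, c3 = m3, c4 = m4 - 3 * m2^2.
    return m2, m3, m4 - 3 * m2 ** 2


features = zero_lag_cumulants([-1, 1, -1, 1])
```

Each sound segment would yield one such feature tuple, which a dimensionality-reduction step and a classifier could then consume.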
Toward optimal feature selection using ranking methods and classification algorithms

Directory of Open Access Journals (Sweden)

Novaković Jasmina

2011-01-01

Full Text Available We presented a comparison between several feature ranking methods used on two real datasets. We considered six ranking methods that can be divided into two broad categories: statistical and entropy-based. Four supervised learning algorithms are adopted to build models, namely, IB1, Naive Bayes, C4.5 decision tree and the RBF network. We showed that the selection of ranking methods could be important for classification accuracy. In our experiments, ranking methods with different supervised learning algorithms give quite different results for balanced accuracy. Our cases confirm that, in order to be sure that a subset of features giving the highest accuracy has been selected, the use of many different indices is recommended.
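The balanced accuracy mentioned above is commonly defined as the mean of the per-class recalls. The following minimal sketch uses that common definition, which is an assumption here since the abstract does not spell it out; the function name is illustrative.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# A classifier that always predicts the majority class scores 0.8 plain
# accuracy on this imbalanced sample but only 0.5 balanced accuracy.
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 10
score = balanced_accuracy(y_true, y_pred)
```

Unlike plain accuracy, this measure does not reward a model for ignoring a minority class, which matters when comparing feature rankings across datasets with different class proportions.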
ACCUWIND - Methods for classification of cup anemometers

Energy Technology Data Exchange (ETDEWEB)

Dahlberg, J.Aa.; Friis Pedersen, T.; Busche, P.

2006-05-15

Errors associated with the measurement of wind speed are the major sources of uncertainties in power performance testing of wind turbines. Field comparisons of well-calibrated anemometers show significant and unacceptable differences. The European CLASSCUP project set the objectives to quantify the errors associated with the use of cup anemometers, and to develop a classification system for quantification of systematic errors of cup anemometers. This classification system has now been implemented in the IEC 61400-12-1 standard on power performance measurements in annex I and J. The classification of cup anemometers requires general external climatic operational ranges to be applied for the analysis of systematic errors. A Class A category classification is connected to reasonably flat sites, and another Class B category is connected to complex terrain. General classification indices are the result of assessment of systematic deviations. The present report focuses on methods that can be applied for assessment of such systematic deviations. A new alternative method for torque coefficient measurements at inclined flow has been developed, which has then been applied and compared to the existing methods developed in the CLASSCUP project and earlier. A number of approaches including the use of two cup anemometer models, two methods of torque coefficient measurement, two angular response measurements, and inclusion and exclusion of influence of friction have been implemented in the classification process in order to assess the robustness of methods. The results of the analysis are presented as classification indices, which are compared and discussed. (au)
Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry.

Science.gov (United States)

Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y; Drake, Steven K; Gucek, Marjan; Sacks, David B; Yu, Yi-Kuo

2018-06-05

Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.
Application of Cocktail method in vegetation classification

Directory of Open Access Journals (Sweden)

Hamed Asadi

2016-09-01

Full Text Available This study assesses the application of the Cocktail method to the classification of large vegetation databases. For this purpose, a Buxus hyrcana dataset consisting of 442 relevés with 89 species was used. To run the Cocktail method, a preliminary classification was first done with modified TWINSPAN, and phi analysis of the resulting groups selected the five species with the highest fidelity values. Sociological species groups were then formed by examining the co-occurrence of these five species with other species in the database. By assigning 379 relevés to the sociological species groups using logical formulas, 21 plant communities belonging to 6 variants, 17 subassociations, 11 associations, 4 alliances, 1 order and 1 class were recognized. The 63 relevés that the logical formulas did not assign to any sociological species group were assigned, using the FPFI index, to the group with the highest index value. Given the 91% agreement between the Braun-Blanquet and Cocktail classifications, we suggest the Cocktail method to vegetation scientists as an efficient alternative to the Braun-Blanquet method for classifying large vegetation databases.
Improving Classification of Airborne Laser Scanning Echoes in the Forest-Tundra Ecotone Using Geostatistical and Statistical Measures

Directory of Open Access Journals (Sweden)

Nadja Stumberg

2014-05-01

Full Text Available The vegetation in the forest-tundra ecotone zone is expected to be highly affected by climate change and requires effective monitoring techniques. Airborne laser scanning (ALS) has been proposed as a tool for the detection of small pioneer trees for such vast areas using laser height and intensity data. The main objective of the present study was to assess a possible improvement in the performance of classifying tree and nontree laser echoes from high-density ALS data. The data were collected along a 1000 km long transect stretching from southern to northern Norway. Different geostatistical and statistical measures derived from laser height and intensity values were used to extend and potentially improve simpler models that ignore the spatial context. Generalised linear models (GLM) and support vector machines (SVM) were employed as classification methods. Total accuracies and Cohen’s kappa coefficients were calculated and compared to those of simpler models from a previous study. For both classification methods, all models revealed total accuracies similar to the results of the simpler models. Concerning classification performance, however, the comparison of the kappa coefficients indicated a significant improvement for some models both using GLM and SVM, with classification accuracies >94%.
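The abstract above reports both total accuracy and Cohen's kappa, which corrects observed agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes both from label lists; the sample data are invented for illustration and are not taken from the study.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (total accuracy, Cohen's kappa) for two label sequences.

    Note: kappa is undefined (division by zero) when chance agreement
    p_e equals 1, e.g. when both sequences contain a single class.
    """
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal class frequencies.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return p_o, (p_o - p_e) / (1 - p_e)


# Illustrative echoes: 5 tree and 5 nontree, one tree echo misclassified.
y_true = ["tree"] * 5 + ["nontree"] * 5
y_pred = ["tree"] * 4 + ["nontree"] * 6
acc, kappa = accuracy_and_kappa(y_true, y_pred)
```

Because kappa discounts chance agreement, two models with similar total accuracy can still differ noticeably in kappa, which is consistent with the comparison described in the abstract.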
[Correlation coefficient-based principle and method for the classification of jump degree in hydrological time series].

Science.gov (United States)

Wu, Zi Yi; Xie, Ping; Sang, Yan Fang; Gu, Hai Ting

2018-04-01

The phenomenon of jump is one of the important external forms of hydrological variability under environmental changes, representing the adaptation of hydrological nonlinear systems to the influence of external disturbances. Presently, the related studies mainly focus on the methods for identifying the jump positions and jump times in hydrological time series. In contrast, few studies have focused on the quantitative description and classification of jump degree in hydrological time series, which makes it difficult to understand environmental changes and evaluate their potential impacts. Here, we proposed a theoretically reliable and easy-to-apply method for the classification of jump degree in hydrological time series, using the correlation coefficient as a basic index. The statistical tests verified the accuracy, reasonability, and applicability of this method. The relationship between the correlation coefficient and the jump degree of a series was derived as a mathematical equation. After that, several thresholds of correlation coefficients under different statistical significance levels were chosen, based on which the jump degree could be classified into five levels: no, weak, moderate, strong and very strong. Finally, our method was applied to five different observed hydrological time series, with diverse geographic and hydrological conditions in China. The results of the classification of jump degrees in those series accorded closely with their physical hydrological mechanisms, indicating the practicability of our method.
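One way to picture the idea above is to correlate a series with a step function placed at a known jump position and map the absolute correlation onto the five levels. This is only a sketch under strong assumptions: the abstract gives neither the exact correlation construction nor the thresholds, so the step-function choice and the threshold values below are hypothetical placeholders, not the authors' values.

```python
from math import sqrt

LEVELS = ["no", "weak", "moderate", "strong", "very strong"]


def pearson(x, y):
    """Pearson correlation; fails with ZeroDivisionError on constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def jump_degree(series, jump_index, thresholds=(0.2, 0.4, 0.6, 0.8)):
    """Classify jump degree from |r| between the series and a step function.

    The thresholds are hypothetical; the paper derives its own from
    statistical significance levels.
    """
    step = [0.0 if i < jump_index else 1.0 for i in range(len(series))]
    r = abs(pearson(series, step))
    level = sum(r >= t for t in thresholds)  # number of thresholds reached
    return r, LEVELS[level]


r, label = jump_degree([1, 1, 1, 1, 5, 5, 5, 5], jump_index=4)
```

A series that merely oscillates around a constant mean gives a correlation near zero and falls in the "no" level under the same placeholder thresholds.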
Extension classification method for low-carbon product cases

Directory of Open Access Journals (Sweden)

Yanwei Zhao

2016-05-01

Full Text Available In product low-carbon design, intelligent decision systems integrated with certain classification algorithms recommend the existing design cases to designers. However, these systems mostly depend on prior experience, and product designers not only expect to get a satisfactory case from an intelligent system but also hope to receive assistance in modifying unsatisfactory cases. In this article, we propose a new categorization method composed of static and dynamic classification based on extension theory. This classification method can be integrated into a case-based reasoning system to get accurate classification results and to inform designers of detailed information about unsatisfactory cases. First, we establish the static classification model for cases by dependent function in a hierarchical structure. Then, for dynamic classification, we transform cases based on the case model, attributes, attribute values, and dependent function, so that cases can undergo qualitative changes. Finally, the applicability of the proposed method is demonstrated through a case study of screw air compressor cases.
A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data of Glioma

KAUST Repository

Abusamra, Heba

2013-11-01

Microarray gene expression data have gained great importance in recent years due to their role in disease diagnosis and prognosis, which helps in choosing the appropriate treatment plan for patients. This technology has ushered in a new era in molecular classification. Interpreting gene expression data remains a difficult problem and an active research area due to their inherent “high dimensional low sample size” nature. Such problems pose great challenges to existing classification methods. Thus, effective feature selection techniques are often needed to help correctly classify different tumor types and consequently lead to a better understanding of genetic signatures as well as improved treatment strategies. This paper presents a comparative study of state-of-the-art feature selection methods, classification methods, and their combination, based on gene expression data. We compared the efficiency of three different classification methods (support vector machines, k-nearest neighbor and random forest) and eight different feature selection methods (information gain, twoing rule, sum minority, max minority, gini index, sum of variances, t-statistics, and one-dimension support vector machine). Five-fold cross validation was used to evaluate the classification performance. Two publicly available gene expression data sets of glioma were used in the experiments. Results revealed the important role of feature selection in classifying gene expression data. By performing feature selection, the classification accuracy can be significantly boosted by using a small number of genes. The relationship of features selected in different feature selection methods is investigated and the most frequent features selected in each fold among all methods for both datasets are evaluated.
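Of the feature selection methods listed above, the t-statistic filter is the simplest to sketch: score each gene by the absolute two-sample t-statistic between the two classes and keep the top-scoring genes. The version below uses the Welch form of the statistic, which is an assumption because the abstract does not say which variant was used; the function names and the toy data are illustrative only.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Welch two-sample t-statistic for one feature in two classes.

    Fails with ZeroDivisionError if both groups have zero variance.
    """
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_features(samples, labels, top_k):
    """Return indices of the top_k features by absolute t-statistic."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    scores = []
    for j in range(len(samples[0])):
        a = [row[j] for row, y in zip(samples, labels) if y == classes[0]]
        b = [row[j] for row, y in zip(samples, labels) if y == classes[1]]
        scores.append((abs(t_statistic(a, b)), j))
    return [j for _, j in sorted(scores, reverse=True)[:top_k]]


# Toy expression matrix: rows are samples, columns are genes.
samples = [
    [1.0, 10.0, 5.0], [1.2, 10.5, 4.0], [0.9, 9.8, 6.0],
    [1.1, 20.0, 5.5], [1.0, 20.4, 4.5], [1.3, 19.9, 5.0],
]
labels = [0, 0, 0, 1, 1, 1]
selected = rank_features(samples, labels, top_k=1)
```

When such a filter is combined with cross validation, the ranking is normally recomputed on the training part of each fold so that held-out samples do not influence which genes are kept.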
Highly Robust Statistical Methods in Medical Image Analysis

Czech Academy of Sciences Publication Activity Database

Kalina, Jan

2012-01-01

Vol. 32, No. 2 (2012), pp. 3-16 ISSN 0208-5216 R&D Projects: GA MŠk(CZ) 1M06014 Institutional research plan: CEZ:AV0Z10300504 Keywords : robust statistics * classification * faces * robust image analysis * forensic science Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.208, year: 2012 http://www.ibib.waw.pl/bbe/bbefulltext/BBE_32_2_003_FT.pdf
Evaluation of normalization methods for cDNA microarray data by k-NN classification

Energy Technology Data Exchange (ETDEWEB)

Wu, Wei; Xing, Eric P; Myers, Connie; Mian, Saira; Bissell, Mina J

2004-12-17

-bias-removal normalization strategies, IGLOESS-SLFILTERW7, ISTSPLINE-SLLOESS and IGLOESS-SLLOESS, outperform other strategies for removing spatial effect, intensity effect and scale differences from cDNA microarray data. The apparent sensitivity of k-NN LOOCV classification error to dye biases suggests that this criterion provides an informative measure for evaluating normalization methods. All the computational tools used in this study were implemented using the R language for statistical computing and graphics.
AN OBJECT-BASED METHOD FOR CHINESE LANDFORM TYPES CLASSIFICATION

Directory of Open Access Journals (Sweden)

H. Ding

2016-06-01

Full Text Available Landform classification is a necessary task for various fields of landscape and regional planning, for example for landscape evaluation, erosion studies, hazard prediction, etc. This study proposes an improved object-based classification for Chinese landform types using the factor importance analysis of random forest and the gray-level co-occurrence matrix (GLCM). In this research, based on a 1 km DEM of China, the combination of the terrain factors extracted from the DEM is selected by correlation analysis and Sheffield's entropy method. A random forest classification tree is applied to evaluate the importance of the terrain factors, which are used as multi-scale segmentation thresholds. Then the GLCM is conducted for the knowledge base of classification. The classification result was checked by using the 1:4,000,000 Chinese Geomorphological Map as reference. The overall classification accuracy of the proposed method is 5.7% higher than ISODATA unsupervised classification, and 15.7% higher than the traditional object-based classification method.
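For readers unfamiliar with the gray-level co-occurrence matrix named above, the sketch below counts how often a pixel of gray level i has a neighbor of gray level j at a fixed offset, and derives the standard contrast texture feature from it. The single offset, the lack of symmetrization, and the tiny sample image are simplifying assumptions; the abstract does not describe the settings the authors used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Unnormalized co-occurrence counts for offset (dx, dy).

    image is a list of rows holding integer gray levels in [0, levels).
    """
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix):
    """GLCM contrast: sum of p(i, j) * (i - j)^2 over normalized counts."""
    total = sum(map(sum, matrix))
    return sum(
        v * (i - j) ** 2
        for i, row in enumerate(matrix)
        for j, v in enumerate(row)
    ) / total


image = [[0, 0, 1],
         [1, 1, 0]]
counts = glcm(image, levels=2)   # horizontal neighbor pairs
texture = contrast(counts)
```

In an object-based workflow such texture features would be computed per segment and fed to the classifier alongside the terrain factors.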
Application of PCA and SIMCA statistical analysis of FT-IR spectra for the classification and identification of different slag types with environmental origin.

Science.gov (United States)

Stumpe, B; Engel, T; Steinweg, B; Marschner, B

2012-04-03

In the past, different slag materials were often used for landscaping and construction purposes or simply dumped. Nowadays German environmental laws strictly control the use of slags, but a remaining 35% is still dumped in landfills without control. Since some slags have high heavy metal contents and different slag types have typical chemical and physical properties that will influence the risk potential and other characteristics of the deposits, an identification of the slag types is needed. We developed an FT-IR-based statistical method to identify different slag classes. Slag samples were collected at different sites throughout various cities within the industrial Ruhr area. Then, spectra of 35 samples from four different slag classes, ladle furnace (LF), blast furnace (BF), oxygen furnace steel (OF), and zinc furnace slags (ZF), were determined in the mid-infrared region (4000-400 cm⁻¹). The spectral data sets were subjected to statistical classification methods to separate the spectral data of the different slag classes. Principal component analysis (PCA) models for each slag class were developed and further used for soft independent modeling of class analogy (SIMCA). Precise classification of slag samples into four different slag classes was achieved using two different SIMCA models stepwise. At first, SIMCA 1 was used for classification of ZF as well as OF slags over the total spectral range. If no correct classification was found, then the spectrum was analyzed with SIMCA 2 at reduced wavenumbers for the classification of LF as well as BF spectra. As a result, we provide a time- and cost-efficient method based on FT-IR spectroscopy for processing and identifying large numbers of environmental slag samples.
Multi-agent Negotiation Mechanisms for Statistical Target Classification in Wireless Multimedia Sensor Networks

Directory of Open Access Journals (Sweden)

Sheng Wang

2007-10-01

Full Text Available The recent availability of low cost and miniaturized hardware has allowed wireless sensor networks (WSNs) to retrieve audio and video data in real world applications, which has fostered the development of wireless multimedia sensor networks (WMSNs). Resource constraints and challenging multimedia data volume make development of efficient algorithms to perform in-network processing of multimedia contents imperative. This paper proposes solving problems in the domain of WMSNs from the perspective of multi-agent systems. The multi-agent framework enables flexible network configuration and efficient collaborative in-network processing. The focus is placed on target classification in WMSNs where audio information is retrieved by microphones. To deal with the uncertainties related to audio information retrieval, the statistical approaches of power spectral density estimates, principal component analysis and Gaussian process classification are employed. A multi-agent negotiation mechanism is specially developed to efficiently utilize limited resources and simultaneously enhance classification accuracy and reliability. The negotiation is composed of two phases, where an auction based approach is first exploited to allocate the classification task among the agents and then individual agent decisions are combined by the committee decision mechanism. Simulation experiments with real world data are conducted and the results show that the proposed statistical approaches and negotiation mechanism not only reduce memory and computation requi
Tweet-based Target Market Classification Using Ensemble Method

Directory of Open Access Journals (Sweden)

Muhammad Adi Khairul Anshary

2016-09-01

Full Text Available Target market classification is aimed at focusing marketing activities on the right targets. Classification of target markets can be done through data mining and by utilizing data from social media, e.g. Twitter. The end results of data mining are learning models that can classify new data. Ensemble methods can improve the accuracy of the models and therefore provide better results. In this study, classification of target markets was conducted on a dataset of 3000 tweets in order to extract features. Classification models were constructed to manipulate the training data using two ensemble methods (bagging and boosting). To investigate the effectiveness of the ensemble methods, this study used the CART (classification and regression tree) algorithm for comparison. Three categories of consumer goods (computers, mobile phones and cameras) and three categories of sentiments (positive, negative and neutral) were classified towards three target-market categories. Machine learning was performed using Weka 3.6.9. The results of the test data showed that the bagging method improved the accuracy of CART by 1.9% (to 85.20%). On the other hand, for sentiment classification, the ensemble methods were not successful in increasing the accuracy of CART. The results of this study may be taken into consideration by companies that approach their customers through social media, especially Twitter.
Statistical analysis of textural features for improved classification of oral histopathological images.

Science.gov (United States)

Muthu Rama Krishnan, M; Shah, Pratik; Chakraborty, Chandan; Ray, Ajoy K

2012-04-01

The objective of this paper is to provide an improved technique, which can assist oncopathologists in correct screening of oral precancerous conditions, especially oral submucous fibrosis (OSF), with significant accuracy on the basis of collagen fibres in the sub-epithelial connective tissue. The proposed scheme is composed of collagen fibre segmentation, textural feature extraction and selection, screening performance enhancement under Gaussian transformation and finally classification. In this study, collagen fibres are segmented on R,G,B color channels using a back-propagation neural network from 60 normal and 59 OSF histological images followed by histogram specification for reducing the stain intensity variation. Textural features of the collagen area are then extracted using fractal approaches, viz., differential box counting and Brownian motion curve. Feature selection is done using the Kullback-Leibler (KL) divergence criterion and the screening performance is evaluated based on various statistical tests to confirm Gaussian nature. Here, the screening performance is enhanced under Gaussian transformation of the non-Gaussian features using hybrid distribution. Moreover, the routine screening is designed based on two statistical classifiers, viz., Bayesian classification and support vector machines (SVM), to classify normal and OSF. It is observed that SVM with linear kernel function provides better classification accuracy (91.64%) as compared to the Bayesian classifier. The addition of fractal features of collagen under Gaussian transformation improves the Bayesian classifier's performance from 80.69% to 90.75%. Results are here studied and discussed.
Statistical Analysis of Categorical Time Series of Atmospheric Elementary Circulation Mechanisms - Dzerdzeevski Classification for the Northern Hemisphere.

Science.gov (United States)

Brenčič, Mihael

2016-01-01

Northern hemisphere elementary circulation mechanisms, defined with the Dzerdzeevski classification and published on a daily basis for 1899-2012, are analysed with statistical methods as continuous categorical time series. Classification consists of 41 elementary circulation mechanisms (ECM), which are assigned to calendar days. Empirical marginal probabilities of each ECM were determined. Seasonality and the periodicity effect were investigated with moving dispersion filters and randomisation procedure on the ECM categories as well as with the time analyses of the ECM mode. The time series were determined as being non-stationary with strong time-dependent trends. During the investigated period, periodicity interchanges with periods when no seasonality is present. In the time series structure, the strongest division is visible at the milestone of 1986, showing that the atmospheric circulation pattern reflected in the ECM has significantly changed. This change is the result of the change in the frequency of ECM categories; before 1986, the appearance of ECM was more diverse, and afterwards fewer ECMs appear. The statistical approach applied to the categorical climatic time series opens up new potential insight into climate variability and change studies that have to be performed in the future.
Classification of high resolution satellite images

OpenAIRE

Karlsson, Anders

2003-01-01

In this thesis the Support Vector Machine (SVM) is applied to the classification of high resolution satellite images. Several different measures for classification, including texture measures, 1st order statistics, and simple contextual information, were evaluated. Additionally, the image was segmented, using an enhanced watershed method, in order to improve the classification accuracy.
The research on business rules classification and specification methods

OpenAIRE

Baltrušaitis, Egidijus

2005-01-01

The work is based on the research of business rules classification and specification methods. The basics of the business rules approach are discussed. The most common business rules classification and modeling methods are analyzed. Business rules modeling techniques and tools for supporting them in information systems are presented. Based on the analysis results, a business rules classification method is proposed. Templates for every business rule type are presented. Business rules structuring ...
Statistical control chart and neural network classification for improving human fall detection

KAUST Repository

Harrou, Fouzi; Zerrouki, Nabil; Sun, Ying; Houacine, Amrane

2017-01-01

This paper proposes a statistical approach to detect and classify human falls based on both visual data from a camera and accelerometric data captured by an accelerometer. Specifically, we first use a Shewhart control chart to detect the presence of potential falls by using accelerometric data. Unfortunately, this chart cannot distinguish real falls from fall-like actions, such as lying down. To bypass this difficulty, a neural network classifier is then applied only on the detected cases through visual data. To assess the performance of the proposed method, experiments are conducted on a publicly available fall detection database, the University of Rzeszow's fall detection (URFD) dataset. Results demonstrate that the detection phase plays a key role in reducing the number of sequences used as input into the neural network classifier for classification, significantly reducing computational burden and achieving better accuracy.
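The detection stage described above can be pictured with a basic Shewhart chart: control limits are set at the mean plus or minus k standard deviations of fall-free reference data, and any new sample outside the limits is flagged for the second-stage classifier. This is a minimal sketch; the choice k = 3, the function names, and the accelerometer magnitudes are illustrative assumptions, not values from the paper.

```python
from statistics import mean, stdev


def shewhart_limits(reference, k=3.0):
    """Lower and upper control limits from in-control reference data."""
    center = mean(reference)
    spread = stdev(reference)
    return center - k * spread, center + k * spread


def flag_out_of_control(samples, limits):
    """Indices of samples that fall outside the control limits."""
    lower, upper = limits
    return [i for i, x in enumerate(samples) if x < lower or x > upper]


# Hypothetical acceleration magnitudes recorded during normal activity.
reference = [9.8, 10.1, 9.9, 10.2, 10.0]
limits = shewhart_limits(reference)

# New readings: indices 2 and 4 deviate strongly and become candidates.
candidates = flag_out_of_control([10.0, 10.3, 14.5, 9.9, 5.0], limits)
```

Only the flagged candidates would be passed on to the more expensive visual classifier, which is how a cheap first stage can cut the computational burden.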
Advanced Steel Microstructural Classification by Deep Learning Methods.

Science.gov (United States)

Azimi, Seyed Majid; Britz, Dominik; Engstler, Michael; Fritz, Mario; Mücklich, Frank

2018-02-01

The inner structure of a material is called its microstructure. It stores the genesis of a material and determines all its physical and chemical properties. While microstructural characterization is widespread and well known, microstructural classification is mostly done manually by human experts, which gives rise to uncertainties due to subjectivity. Since the microstructure can be a combination of different phases or constituents with complex substructures, its automatic classification is very challenging and only a few prior studies exist. Prior works focused on features designed and engineered by experts and classified microstructures separately from the feature extraction step. Recently, Deep Learning methods have shown strong performance in vision applications by learning the features from data together with the classification step. In this work, we propose a Deep Learning method for microstructural classification, using certain microstructural constituents of low-carbon steel as examples. This novel method employs pixel-wise segmentation via a Fully Convolutional Neural Network (FCNN) accompanied by a max-voting scheme. Our system achieves 93.94% classification accuracy, drastically outperforming the state-of-the-art method, which reaches 48.89% accuracy. Beyond the strong performance of our method, this line of research offers a more robust and, above all, objective way to carry out the difficult task of steel quality appreciation.
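The max-voting step mentioned in this abstract can be sketched as follows, assuming that the segmentation network has already produced one class label per pixel for an object and that the object receives the most frequent label. The class names and the tiny label grid are illustrative only; the network itself is not shown.

```python
from collections import Counter

def max_vote(pixel_labels):
    """Return the most frequent class label among per-pixel predictions.

    pixel_labels is a 2-D list (rows of labels) for one segmented object.
    Ties are resolved in favour of the label encountered first.
    """
    counts = Counter(label for row in pixel_labels for label in row)
    label, _ = counts.most_common(1)[0]
    return label

# Hypothetical per-pixel predictions for a single object.
segmentation = [
    ["martensite", "martensite", "bainite"],
    ["martensite", "pearlite", "martensite"],
]
object_class = max_vote(segmentation)
```

Voting over pixels makes the object-level decision tolerant of isolated pixel errors, which is one plausible reason for pairing it with pixel-wise segmentation.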
Microvariability in AGNs: study of different statistical methods - I. Observational analysis

Science.gov (United States)

Zibecchi, L.; Andruchow, I.; Cellone, S. A.; Carpintero, D. D.; Romero, G. E.; Combi, J. A.

2017-05-01

We present the results of a study of different statistical methods currently used in the literature to analyse the (micro)variability of active galactic nuclei (AGNs) from ground-based optical observations. In particular, we focus on the comparison between the results obtained by applying the so-called C and F statistics, which are based on the ratio of standard deviations and variances, respectively. The motivation for this is that the implementation of these methods leads to different and contradictory results, making the variability classification of the light curves of a certain source dependent on the statistics implemented. For this purpose, we re-analyse the results on an AGN sample observed over several sessions with the 2.15 m 'Jorge Sahade' telescope (CASLEO), San Juan, Argentina. For each AGN, we constructed the nightly differential light curves. We thus obtained a total of 78 light curves for 39 AGNs, and we then applied the statistical tests mentioned above, in order to re-classify the variability state of these light curves and in an attempt to find a suitable statistical methodology to study photometric (micro)variations. We conclude that, although the C criterion is not a proper statistical test, it could still be a suitable parameter to detect variability and that its application allows us to get more reliable variability results, in contrast with the F test.
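The relationship between the two statistics can be made concrete with a short sketch. It assumes the usual setup in which one differential light curve involves the target and a second one involves only comparison and control stars; the function names and magnitudes below are hypothetical, and no significance thresholds are applied because the abstract does not state them.

```python
from statistics import stdev, variance

def c_statistic(target_diff, control_diff):
    """Ratio of standard deviations of the two differential light curves."""
    return stdev(target_diff) / stdev(control_diff)

def f_statistic(target_diff, control_diff):
    """Ratio of variances of the two differential light curves."""
    return variance(target_diff) / variance(control_diff)

# Hypothetical differential magnitudes for one night.
target_diff = [0.00, 0.04, -0.04, 0.02, -0.02]   # target minus comparison star
control_diff = [0.00, 0.01, -0.01, 0.01, -0.01]  # comparison minus control star

c = c_statistic(target_diff, control_diff)
f = f_statistic(target_diff, control_diff)
```

By construction F equals the square of C on the same data, so any disagreement between the two classifications comes from the decision rule applied to each value rather than from the values themselves.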
Statistical methods of evaluating and comparing imaging techniques

International Nuclear Information System (INIS)

Freedman, L.S.

1987-01-01

Over the past 20 years several new methods of generating images of internal organs and the anatomy of the body have been developed and used to enhance the accuracy of diagnosis and treatment. These include ultrasonic scanning, radioisotope scanning, computerised X-ray tomography (CT) and magnetic resonance imaging (MRI). The new techniques have made a considerable impact on radiological practice in hospital departments, not least on the investigational process for patients suspected or known to have malignant disease. As a consequence of the increased range of imaging techniques now available, there has developed a need to evaluate and compare their usefulness. Over the past 10 years formal studies of the application of imaging technology have been conducted and many reports have appeared in the literature. These studies cover a range of clinical situations. Likewise, the methodologies employed for evaluating and comparing the techniques in question have differed widely. While not attempting an exhaustive review of the clinical studies which have been reported, this paper aims to examine the statistical designs and analyses which have been used. First a brief review of the different types of study is given. Examples of each type are then chosen to illustrate statistical issues related to their design and analysis. In the final sections it is argued that a form of classification for these different types of study might be helpful in clarifying relationships between them and bringing a perspective to the field. A classification based upon a limited analogy with clinical trials is suggested.
Binary Classification Method of Social Network Users

Directory of Open Access Journals (Sweden)

I. A. Poryadin

2017-01-01

The subject of research is a binary classification method for social network users based on analysis of the data they have posted. The relevance of the task of gaining information about a person by examining the content of his/her pages in social networks is illustrated with examples. The most common approach to its solution is visual browsing. An order of the regional authority in our country illustrates that its use in school education is needed. The article shows the restrictions on visual browsing of pupils' pages in social networks as a tool for the teacher and the school psychologist, and justifies that the analysis of social network users' data should be automated. It explores publications that describe such data acquisition, processing, and analysis methods and considers their advantages and disadvantages. The article also gives arguments to support a proposal to study a classification method for social network users. One such method is credit scoring, which is used in banks and credit institutions to assess the solvency of clients. Based on the high efficiency of the method, a significant expansion of its use into other areas of society is proposed. The possibility of using logistic regression as the mathematical apparatus of the proposed binary classification method is justified. Such an approach makes it possible to take into account the different types of data extracted from social networks, among them personal user data, information about hobbies and friends, graphic and text information, and behaviour characteristics. The article describes a number of existing data transformation methods that can be applied to solve the problem. An experiment on binary gender-based classification of social network users is described. The logistic model obtained for this example includes multiple logical variables obtained by transforming the user surnames. This experiment confirms the feasibility of the proposed method. Further work is to define a system
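A minimal sketch of the kind of model this abstract describes is given below: logistic regression fitted to logical (0/1) variables derived from surnames. The specific suffix rules, the label encoding (1 for female, 0 for male), the training names, and the learning-rate settings are all assumptions made for illustration; the abstract does not state which surname transformations the author used.

```python
import math

def surname_features(surname):
    """Hypothetical logical variables: a bias term plus two suffix indicators."""
    s = surname.lower()
    return [1.0, float(s.endswith("a")), float(s.endswith(("ov", "ev", "in")))]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, x):
    """Probability of class 1 under the logistic model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit the weights by batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xij in enumerate(xi):
                grad[j] += err * xij
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

# Hypothetical labelled surnames (1 = female, 0 = male).
train = [("Ivanova", 1), ("Petrova", 1), ("Smirnova", 1), ("Pushkina", 1),
         ("Ivanov", 0), ("Petrov", 0), ("Kuznetsov", 0), ("Pushkin", 0)]
X = [surname_features(name) for name, _ in train]
y = [label for _, label in train]
weights = fit_logistic(X, y)

p_female = predict_proba(weights, surname_features("Sidorova"))
p_male = predict_proba(weights, surname_features("Sidorov"))
```

Because every input is a 0/1 indicator, each fitted weight can be read directly as a log-odds contribution, which is the same interpretability argument usually made for credit-scoring models.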
Application of pedagogy reflective in statistical methods course and practicum statistical methods

Science.gov (United States)

Julie, Hongki

2017-08-01

The courses Elementary Statistics, Statistical Methods, and Statistical Methods Practicum aim to equip Mathematics Education students with descriptive and inferential statistics. An understanding of descriptive and inferential statistics is important for students in the Mathematics Education Department, especially for those whose final project involves quantitative research. In quantitative research, students are required to present and describe quantitative data in an appropriate manner, to draw conclusions from their quantitative data, and to establish relationships between the independent and dependent variables defined in their research. In practice, when students carried out final projects involving quantitative research, it was not rare to find them making mistakes in the steps of drawing conclusions and in choosing the hypothesis testing procedure. As a result, they reached incorrect conclusions. This is a very serious mistake for those doing quantitative research. Several things were gained from the implementation of reflective pedagogy in the teaching and learning process of the Statistical Methods and Statistical Methods Practicum courses, namely: 1. Twenty-two students passed the course and one student did not. 2. The highest grade, A, was achieved by 18 students. 3. According to all students, they were able to develop their critical stance and to build care for one another through the learning process in this course. 4. All students agreed that, through the learning process they underwent in the course, they could build care for one another.
Data preprocessing methods of FT-NIR spectral data for the classification cooking oil

Science.gov (United States)

Ruah, Mas Ezatul Nadia Mohd; Rasaruddin, Nor Fazila; Fong, Sim Siong; Jaafar, Mohd Zuli

2014-12-01

This work describes data pre-processing methods for FT-NIR spectroscopy datasets of cooking oil and its quality parameters using chemometric methods. Pre-processing of near-infrared (NIR) spectral data has become an integral part of chemometric modelling. Hence, this work investigates the utility and effectiveness of pre-processing algorithms, namely row scaling, column scaling, and a single scaling process with Standard Normal Variate (SNV). The combinations of these scaling methods have an impact on exploratory analysis and classification via Principal Component Analysis (PCA) plots. The samples were divided into palm oil and non-palm cooking oil. The classification model was built using FT-NIR cooking oil spectra datasets in absorbance mode in the range 4000 cm-1 to 14000 cm-1. A Savitzky-Golay derivative was applied before developing the classification model. Then, the data were separated into a training set and a test set using the Duplex method. The number of samples in each class was kept equal to 2/3 of the number in the class with the fewest samples. Then, the t-statistic was employed as a variable selection method in order to select the variables that are significant for the classification models. The data pre-processing was evaluated using the modified silhouette width (mSW), PCA, and the Percentage Correctly Classified (%CC). The results show that different data processing strategies result in substantial differences in model performance. The effects of the data pre-processing methods, i.e. row scaling, column standardisation, and the single scaling process with Standard Normal Variate, are indicated by mSW and %CC. With a two-PC model, all five classifiers gave high %CC except Quadratic Distance Analysis.
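As an illustration of the row-scaling step, the sketch below applies the Standard Normal Variate transform in its usual form: each spectrum is centred on its own mean and divided by its own standard deviation. The absorbance values are made up, and the use of the sample standard deviation is an assumption; the abstract does not give the exact variant used.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard Normal Variate: centre and scale one spectrum (one row)."""
    mu = mean(spectrum)
    sd = stdev(spectrum)  # sample standard deviation; undefined for flat spectra
    return [(x - mu) / sd for x in spectrum]

# Hypothetical absorbance readings of the same sample measured twice, the
# second time with a multiplicative scatter effect and a baseline offset.
spectrum_a = [0.2, 0.4, 0.6]
spectrum_b = [2.0 * x + 1.0 for x in spectrum_a]

scaled_a = snv(spectrum_a)
scaled_b = snv(spectrum_b)
```

After the transform the two rows coincide, which shows why SNV is used to remove per-sample offset and scaling effects before exploratory analysis or classification.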
Segmentation and classification of biological objects

DEFF Research Database (Denmark)

Schultz, Nette

1995-01-01

The present thesis is on segmentation and classification of biological objects using statistical methods. It is based on case studies dealing with different kinds of pork meat images, and we introduce appropriate statistical methods to solve the tasks in the case studies. The case studies concern...
Classification of Noisy Data: An Approach Based on Genetic Algorithms and Voronoi Tessellation

DEFF Research Database (Denmark)

Khan, Abdul Rauf; Schiøler, Henrik; Knudsen, Torben

Classification is one of the major constituents of the data-mining toolkit. The well-known methods for classification are built on either the principle of logic or statistical/mathematical reasoning for classification. In this article we propose: (1) a different strategy, which is based on the po...

Statistical classification of road pavements using near field vehicle rolling noise measurements.

Science.gov (United States)

Paulo, Joel Preto; Coelho, J L Bento; Figueiredo, Mário A T

2010-10-01

Low noise surfaces have been increasingly considered as a viable and cost-effective alternative to acoustical barriers. However, road planners and administrators frequently lack information on the correlation between the type of road surface and the resulting noise emission profile. To address this problem, a method to identify and classify different types of road pavements was developed, whereby near field road noise is analyzed using statistical learning methods. The vehicle rolling sound signal near the tires and close to the road surface was acquired by two microphones in a special arrangement which implements the Close-Proximity method. A set of features, characterizing the properties of the road pavement, was extracted from the corresponding sound profiles. A feature selection method was used to automatically select those that are most relevant in predicting the type of pavement, while reducing the computational cost. A set of different types of road pavement segments were tested and the performance of the classifier was evaluated. Results of pavement classification performed during a road journey are presented on a map, together with geographical data. This procedure leads to a considerable improvement in the quality of road pavement noise data, thereby increasing the accuracy of road traffic noise prediction models.
A Spectral-Texture Kernel-Based Classification Method for Hyperspectral Images

Directory of Open Access Journals (Sweden)

Yi Wang

2016-11-01

Classification of hyperspectral images always suffers from high dimensionality and very limited labeled samples. Recently, spectral-spatial classification has attracted considerable attention and can achieve higher classification accuracy and smoother classification maps. In this paper, a novel spectral-spatial classification method for hyperspectral images using kernel methods is investigated. For a given hyperspectral image, the principal component analysis (PCA) transform is first performed. Then, the first principal component of the input image is segmented into non-overlapping homogeneous regions using the entropy rate superpixel (ERS) algorithm. Next, the local spectral histogram model is applied to each homogeneous region to obtain the corresponding texture features. Because this step is performed within each homogeneous region, instead of within a fixed-size image window, the obtained local texture features are more accurate, which benefits classification accuracy. In the following step, a contextual spectral-texture kernel is constructed by combining the spectral information in the image and the extracted texture information using the linearity property of kernel methods. Finally, the classification map is obtained by the support vector machine (SVM) classifier using the proposed spectral-texture kernel. Experiments on two benchmark airborne hyperspectral datasets demonstrate that our method can effectively improve classification accuracy, even when only a very limited training sample is available. Specifically, our method achieves an overall accuracy 8.26% to 15.1% higher than that of the traditional SVM classifier. The performance of our method was further compared to several state-of-the-art classification methods for hyperspectral images using objective quantitative measures and a visual qualitative evaluation.
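The kernel-combination step can be sketched under a common assumption: since a non-negative weighted sum of valid kernels is itself a valid kernel, a spectral kernel and a texture kernel can be blended with a single weight. The RBF form, the weight `mu`, the `gamma` values, and the feature vectors below are illustrative choices, not the paper's settings.

```python
import math

def rbf(u, v, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def spectral_texture_kernel(spec_i, tex_i, spec_j, tex_j,
                            mu=0.5, gamma_spec=1.0, gamma_tex=1.0):
    """Weighted sum of a spectral kernel and a texture kernel.

    mu in [0, 1] balances the two sources of information (assumed weighting).
    """
    return (mu * rbf(spec_i, spec_j, gamma_spec)
            + (1.0 - mu) * rbf(tex_i, tex_j, gamma_tex))

# Hypothetical pixels: spectral vector plus texture vector of the pixel's region.
pixel_a = ([0.1, 0.3, 0.5], [0.2, 0.8])
pixel_b = ([0.1, 0.4, 0.5], [0.6, 0.4])

k_ab = spectral_texture_kernel(pixel_a[0], pixel_a[1], pixel_b[0], pixel_b[1])
k_ba = spectral_texture_kernel(pixel_b[0], pixel_b[1], pixel_a[0], pixel_a[1])
k_aa = spectral_texture_kernel(pixel_a[0], pixel_a[1], pixel_a[0], pixel_a[1])
```

A Gram matrix built from such a function could then be handed to any SVM implementation that accepts precomputed kernels.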
SEGMENTATION AND CLASSIFICATION OF CERVICAL CYTOLOGY IMAGES USING MORPHOLOGICAL AND STATISTICAL OPERATIONS

Directory of Open Access Journals (Sweden)

S Anantha Sivaprakasam

2017-02-01

Cervical cancer, a disease in which malignant (cancer) cells form in the tissues of the cervix, is the fourth leading cause of cancer death among women worldwide. Cervical cancer can be prevented and/or cured if it is diagnosed in the pre-cancerous lesion stage or earlier. A common examination technique widely used in screening is the Papanicolaou test, or Pap test, which is used to detect cell abnormalities. Due to the intricacy of the cell nature, automating this procedure is still a herculean task for the pathologist. This paper addresses these challenges with a simple and novel method to segment and classify cervical cells automatically. The first step of the procedure is pre-processing, in which de-noising, a de-correlation operation, and segregation of the colour components are carried out. Then, two new techniques put forward in this paper, Morphological and Statistical Edge-based segmentation and Morphological and Statistical Region-based segmentation, are applied to each colour component of the image to segment the nuclei from the cervical image. Finally, all segmented colour components are combined to produce the final segmentation result. After the nuclei are extracted, morphological features are extracted from them. The two techniques mentioned above outperformed standard segmentation techniques. In addition, Morphological and Statistical Edge-based segmentation outperformed Morphological and Statistical Region-based segmentation. Finally, the nuclei are classified based on the morphological values. The segmentation accuracy is reflected in the classification accuracy. The overall segmentation accuracy is 97%.
THE GROWTH POINTS OF STATISTICAL METHODS

OpenAIRE

Orlov A. I.

2014-01-01

On the basis of a new paradigm of applied mathematical statistics, data analysis, and economic-mathematical methods, we identify and discuss five topical areas in which modern applied statistics, as well as other statistical methods, is developing, i.e. five "growth points": nonparametric statistics, robustness, computer-statistical methods, statistics of interval data, and statistics of non-numeric data.
Global Optimization Ensemble Model for Classification Methods

Science.gov (United States)

Anwar, Hina; Qamar, Usman; Muzaffar Qureshi, Abdul Wahab

2014-01-01

Supervised learning is the process of data mining for deducing rules from training datasets. A broad array of supervised learning algorithms exists, every one of them with its own advantages and drawbacks. There are some basic issues that affect the accuracy of classifier while solving a supervised learning problem, like bias-variance tradeoff, dimensionality of input space, and noise in the input data space. All these problems affect the accuracy of classifier and are the reason that there is no global optimal method for classification. There is not any generalized improvement method that can increase the accuracy of any classifier while addressing all the problems stated above. This paper proposes a global optimization ensemble model for classification methods (GMC) that can improve the overall accuracy for supervised learning problems. The experimental results on various public datasets showed that the proposed model improved the accuracy of the classification models from 1% to 30% depending upon the algorithm complexity. PMID:24883382
Video genre classification using multimodal features

Science.gov (United States)

Jin, Sung Ho; Bae, Tae Meon; Choo, Jin Ho; Ro, Yong Man

2003-12-01

We propose a video genre classification method using multimodal features. The proposed method is applied for the preprocessing of automatic video summarization or the retrieval and classification of broadcasting video contents. Through a statistical analysis of low-level and middle-level audio-visual features in video, the proposed method can achieve good performance in classifying several broadcasting genres such as cartoon, drama, music video, news, and sports. In this paper, we adopt MPEG-7 audio-visual descriptors as multimodal features of video contents and evaluate the performance of the classification by feeding the features into a decision tree-based classifier which is trained by CART. The experimental results show that the proposed method can recognize several broadcasting video genres with a high accuracy and the classification performance with multimodal features is superior to the one with unimodal features in the genre classification.
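Since the classifier in this abstract is a decision tree trained by CART, a small sketch of CART's core operation may help: choosing a threshold on one numeric feature so that the weighted Gini impurity of the two resulting groups is lowest. The feature (a motion-activity score) and the values are hypothetical; a full tree would apply this search recursively over all descriptors.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one feature."""
    pairs = sorted(zip(values, labels))
    best_threshold, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Hypothetical motion-activity scores for four video clips.
motion = [0.1, 0.2, 0.8, 0.9]
genre = ["news", "news", "sports", "sports"]
threshold, impurity = best_split(motion, genre)
```

Here a single threshold separates the two genres completely, so the weighted impurity drops to zero; with real multimodal descriptors several such splits would be chained.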
Web Page Classification Method Using Neural Networks

Science.gov (United States)

Selamat, Ali; Omatu, Sigeru; Yanagimoto, Hidekazu; Fujinaka, Toru; Yoshioka, Michifumi

Automatic categorization is the only viable method to deal with the scaling problem of the World Wide Web (WWW). In this paper, we propose a news web page classification method (WPCM). The WPCM uses a neural network with inputs obtained from both the principal components and class profile-based features (CPBF). Each news web page is represented by a term-weighting scheme. As the number of unique words in the collection is large, principal component analysis (PCA) is used to select the most relevant features for the classification. The final output of the PCA is then combined with the feature vectors from the class profile, which contains the most regular words in each class, before being fed to the neural network. We manually selected the most regular words in each class and weighted them using an entropy weighting scheme. A fixed number of regular words from each class is used as a feature vector together with the reduced principal components from the PCA. These feature vectors are then used as the input to the neural network for classification. The experimental evaluation demonstrates that the WPCM provides acceptable classification accuracy on the sports news datasets.
Statistical methods and materials characterisation

International Nuclear Information System (INIS)

Wallin, K.R.W.

2010-01-01

Statistics is a wide mathematical area, which covers a myriad of analysis and estimation options, some of which suit special cases better than others. A comprehensive coverage of the whole area of statistics would be an enormous effort and would also be outside the capabilities of this author. Therefore, this does not intend to be a textbook on statistical methods available for general data analysis and decision making. Instead it will highlight a certain special statistical case applicable to mechanical materials characterization. The methods presented here do not in any way rule out other statistical methods by which to analyze mechanical property material data. (orig.)
75 FR 56549 - National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards...

Science.gov (United States)

2010-09-16

... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Disease Control and Prevention National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards Staff, Announces the... Public Health Data Standards Staff, NCHS, 3311 Toledo Road, Room 2337, Hyattsville, Maryland 20782, e...
Brake fault diagnosis using Clonal Selection Classification Algorithm (CSCA) – A statistical learning approach

Directory of Open Access Journals (Sweden)

R. Jegadeeshwaran

2015-03-01

In an automobile, the brake system is an essential part responsible for control of the vehicle. Any failure in the brake system impacts the vehicle's motion and can have catastrophic effects on the safety of the vehicle and its passengers. The brake system thus plays a vital role in an automobile, and condition monitoring of the brake system is essential. Vibration-based condition monitoring using machine learning techniques is gaining momentum. This study is one such attempt to perform condition monitoring of a hydraulic brake system through vibration analysis. In this research, the performance of a Clonal Selection Classification Algorithm (CSCA) for brake fault diagnosis is reported. A hydraulic brake system test rig was fabricated. Under good and faulty conditions of the brake system, the vibration signals were acquired using a piezoelectric transducer. Statistical parameters were extracted from the vibration signals. The best feature set for classification was identified using an attribute evaluator. The selected features were then classified using CSCA. The classification accuracy of this artificial intelligence technique has been compared with that of other machine learning approaches and discussed. The Clonal Selection Classification Algorithm performs better and gives the maximum classification accuracy (96%) for the fault diagnosis of a hydraulic brake system.
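The feature-extraction stage can be illustrated with a sketch that computes a few descriptive statistics from one vibration record. The abstract does not list which statistical parameters were used, so the set below (mean, standard deviation, skewness, kurtosis, minimum, maximum, range) is an assumption based on common practice, and the signal is synthetic.

```python
from statistics import mean, pstdev

def statistical_features(signal):
    """Return descriptive statistics of one vibration signal as a dict.

    Uses population moments; a constant signal (zero spread) is not handled.
    """
    n = len(signal)
    mu = mean(signal)
    sigma = pstdev(signal)
    third_moment = sum((x - mu) ** 3 for x in signal) / n
    fourth_moment = sum((x - mu) ** 4 for x in signal) / n
    return {
        "mean": mu,
        "std": sigma,
        "skewness": third_moment / sigma ** 3,
        "kurtosis": fourth_moment / sigma ** 4,
        "minimum": min(signal),
        "maximum": max(signal),
        "range": max(signal) - min(signal),
    }

# Synthetic, perfectly symmetric square-wave-like record.
signal = [1.0, -1.0, 1.0, -1.0]
features = statistical_features(signal)
```

Each record then becomes one row of numbers, which is the form a feature selector and a classifier such as CSCA would consume.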
Permutation statistical methods: an integrated approach

CERN Document Server

Berry, Kenneth J; Johnston, Janis E

2016-01-01

This research monograph provides a synthesis of a number of statistical tests and measures, which, at first consideration, appear disjoint and unrelated. Numerous comparisons of permutation and classical statistical methods are presented, and the two methods are compared via probability values and, where appropriate, measures of effect size. Permutation statistical methods, compared to classical statistical methods, do not rely on theoretical distributions, avoid the usual assumptions of normality and homogeneity of variance, and depend only on the data at hand. This text takes a unique approach to explaining statistics by integrating a large variety of statistical methods, and establishing the rigor of a topic that to many may seem to be a nascent field in statistics. This topic is new in that it took modern computing power to make permutation methods available to people working in the mainstream of research. This research monograph addresses a statistically-informed audience, and can also easily serve as a ...
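The point that permutation methods depend only on the data at hand can be shown with a small exact test. The sketch below enumerates every way of reassigning the pooled observations to two groups and reports the share of assignments whose absolute difference in means is at least as large as the observed one. The choice of test statistic and the data are illustrative, and exhaustive enumeration is practical only for small samples.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    n = len(pooled)
    extreme = 0
    total = 0
    for indices in combinations(range(n), len(group_a)):
        chosen = set(indices)
        first = [pooled[i] for i in indices]
        second = [pooled[i] for i in range(n) if i not in chosen]
        # Small tolerance so that ties with the observed value are counted.
        if abs(mean(first) - mean(second)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical measurements from two small groups.
p_value = permutation_p_value([1, 2, 3], [4, 5, 6])
```

With these six values there are 20 possible assignments, and only the observed split and its mirror image are as extreme, so the p-value is 2/20 = 0.1 without any appeal to normality or equal variances.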
ACCUWIND - Methods for classification of cup anemometers

DEFF Research Database (Denmark)

Dahlberg, J.-Å.; Friis Pedersen, Troels; Busche, P.

2006-01-01

the errors associated with the use of cup anemometers, and to develop a classification system for quantification of systematic errors of cup anemometers. This classification system has now been implemented in the IEC 61400-12-1 standard on power performance measurements in annex I and J. The classification...... of cup anemometers requires general external climatic operational ranges to be applied for the analysis of systematic errors. A Class A category classification is connected to reasonably flat sites, and another Class B category is connected to complex terrain. General classification indices are the result...... developed in the CLASSCUP project and earlier. A number of approaches including the use of two cup anemometer models, two methods of torque coefficient measurement, two angular response measurements, and inclusion and exclusion of influence of friction have been implemented in the classification process...
An Efficient Ensemble Learning Method for Gene Microarray Classification

Directory of Open Access Journals (Sweden)

Alireza Osareh

2013-01-01

Gene microarray analysis and classification have proven to be an effective approach to the diagnosis of diseases and cancers. However, it has also been revealed that basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using the RotBoost ensemble methodology. This method is a combination of the Rotation Forest and AdaBoost techniques, which in turn preserves both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of RotBoost, other nonensemble/ensemble techniques including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging are also deployed. Experimental results reveal that the combination of the fast correlation-based feature selection method with the ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers that outperform not only the classifiers produced by conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost.
Statistical-mechanics analysis of Gaussian labeled-unlabeled classification problems

International Nuclear Information System (INIS)

Tanaka, Toshiyuki

2013-01-01

The labeled-unlabeled classification problem in semi-supervised learning is studied via a statistical-mechanics approach. We analytically investigate the performance of a learner with an equal-weight mixture of two symmetrically located Gaussians, performing posterior mean estimation of the parameter vector on the basis of a dataset consisting of labeled and unlabeled data generated from the same probability model as that assumed by the learner. Under the assumption of replica symmetry, we have analytically obtained a set of saddle-point equations, which allows us to numerically evaluate the performance of the learner. On the basis of the analytical result we have observed interesting phenomena, in particular the coexistence of good and bad solutions, which may happen when the number of unlabeled data is relatively large compared with that of labeled data.
A classification scheme for risk assessment methods.

Energy Technology Data Exchange (ETDEWEB)

Stamp, Jason Edwin; Campbell, Philip LaRoche

2004-08-01

This report presents a classification scheme for risk assessment methods. This scheme, like all classification schemes, provides meaning by imposing a structure that identifies relationships. Our scheme is based on two orthogonal aspects: level of detail, and approach. The resulting structure is shown in Table 1 and is explained in the body of the report. Each cell in the table represents a different arrangement of strengths and weaknesses. Those arrangements shift gradually as one moves through the table, each cell optimal for a particular situation. The intention of this report is to enable informed use of the methods so that a method chosen is optimal for a situation given. This report imposes structure on the set of risk assessment methods in order to reveal their relationships and thus optimize their usage. We present a two-dimensional structure in the form of a matrix, using three abstraction levels for the rows and three approaches for the columns. For each of the nine cells in the matrix we identify the method type by name and example. The matrix helps the user understand: (1) what to expect from a given method, (2) how it relates to other methods, and (3) how best to use it. The matrix, with type names in the cells, is introduced in Table 2 on page 13 below. Unless otherwise stated we use the word 'method' in this report to refer to a 'risk assessment method', though oftentimes we use the full phrase. The terms 'risk assessment' and 'risk management' are close enough in use that we do not attempt to distinguish them in this report. The remainder of this report is organized as follows. In
Sparse Classification - Methods & Applications

DEFF Research Database (Denmark)

Einarsson, Gudmundur

for analysing such data carry the potential to revolutionize tasks such as medical diagnostics where often decisions need to be based on only a few high-dimensional observations. This explosion in data dimensionality has sparked the development of novel statistical methods. In contrast, classical statistics...
How to Move Beyond the Diagnostic and Statistical Manual of Mental Disorders/International Classification of Diseases.

Science.gov (United States)

Schildkrout, Barbara

2016-10-01

A new nosology for mental disorders is needed as a basis for effective scientific inquiry. Diagnostic and Statistical Manual of Mental Disorders and International Classification of Diseases diagnoses are not natural, biological categories, and these diagnostic systems do not address mental phenomena that exist on a spectrum. Advances in neuroscience offer the hope of breakthroughs for diagnosing and treating major mental illness in the future. At present, a neuroscience-based understanding of brain/behavior relationships can reshape clinical thinking. Neuroscience literacy allows psychiatrists to formulate biologically informed psychological theories, to follow neuroscientific literature pertinent to psychiatry, and to embark on a path toward neurologically informed clinical thinking that can help move the field away from Diagnostic and Statistical Manual of Mental Disorders and International Classification of Diseases conceptualizations. Psychiatrists are urged to work toward attaining neuroscience literacy to prepare for and contribute to the development of a new nosology.
Register-based statistics statistical methods for administrative data

CERN Document Server

Wallgren, Anders

2014-01-01

This book provides a comprehensive and up to date treatment of theory and practical implementation in Register-based statistics. It begins by defining the area, before explaining how to structure such systems, as well as detailing alternative approaches. It explains how to create statistical registers, how to implement quality assurance, and the use of IT systems for register-based statistics. Further to this, clear details are given about the practicalities of implementing such statistical methods, such as protection of privacy and the coordination and coherence of such an undertaking. Thi
Seismic texture classification. Final report

Energy Technology Data Exchange (ETDEWEB)

Vinther, R.

1997-12-31

The seismic texture classification method is a seismic attribute that can both recognize the general reflectivity styles and locate variations from these. The seismic texture classification performs a statistical analysis of the seismic section (or volume), aiming at describing the reflectivity. Based on a set of reference reflectivities the seismic textures are classified. The result of the seismic texture classification is a display of seismic texture categories showing both the styles of reflectivity from the reference set and interpolations and extrapolations from these. The display is interpreted as statistical variations in the seismic data. The seismic texture classification is applied to seismic sections and volumes from the Danish North Sea representing both horizontal stratifications and salt diapirs. The attribute succeeded in recognizing both the general structure of successions and variations from these. Also, the seismic texture classification is not only able to display variations in prospective areas (1-7 sec. TWT) but can also be applied to deep seismic sections. The seismic texture classification is tested on a deep reflection seismic section (13-18 sec. TWT) from the Baltic Sea. Applied to this section the seismic texture classification succeeded in locating the Moho, which could not be located using conventional interpretation tools. The seismic texture classification is a seismic attribute which can display general reflectivity styles and deviations from these and enhance variations not found by conventional interpretation tools. (LN)

[Classification of local anesthesia methods].

Science.gov (United States)

Petricas, A Zh; Medvedev, D V; Olkhovskaya, E B

The traditional classification methods of dental local anesthesia must be modified. In this paper we proved that the vascular mechanism is the leading component of spongy injection. It is necessary to take into account the high effectiveness and relative safety of spongy anesthesia, as well as its versatility, ease of implementation and growing prevalence in the world. The essence of the proposed modification is to divide the methods into diffusive (including surface anesthesia, infiltration and conductive anesthesia) and vascular-diffusive (including intraosseous, intraligamentary, intraseptal and intrapulpal anesthesia). For the last four methods the common term «spongy (intraosseous) anesthesia» may be used.
Coding and classification in drug statistics – From national to global application

Directory of Open Access Journals (Sweden)

Marit Rønning

2009-11-01

Full Text Available SUMMARY The Anatomical Therapeutic Chemical (ATC) classification system and the defined daily dose (DDD) were developed in Norway in the early seventies. The creation of the ATC/DDD methodology was an important basis for presenting drug utilisation statistics in a sensible way. Norway was in 1977 also the first country to publish national drug utilisation statistics from wholesalers on an annual basis. The combination of these activities in Norway in the seventies made us a pioneer country in the area of drug utilisation research. Over the years, the use of the ATC/DDD methodology has gradually increased in countries outside Norway. Since 1996, the methodology has been recommended by WHO for use in international drug utilisation studies. The WHO Collaborating Centre for Drug Statistics Methodology in Oslo handles the maintenance and development of the ATC/DDD system. The Centre is now responsible for the global co-ordination. After nearly 30 years of experience with ATC/DDD, the methodology has demonstrated its suitability in drug use research. The main challenge in the coming years is to educate the users worldwide in how to use the methodology properly.
Statistical methods for nuclear material management

International Nuclear Information System (INIS)

Bowen, W.M.; Bennett, C.A.

1988-12-01

This book is intended as a reference manual of statistical methodology for nuclear material management practitioners. It describes statistical methods currently or potentially important in nuclear material management, explains the choice of methods for specific applications, and provides examples of practical applications to nuclear material management problems. Together with the accompanying training manual, which contains fully worked out problems keyed to each chapter, this book can also be used as a textbook for courses in statistical methods for nuclear material management. It should provide increased understanding and guidance to help improve the application of statistical methods to nuclear material management problems
Statistical methods for nuclear material management

Energy Technology Data Exchange (ETDEWEB)

Bowen W.M.; Bennett, C.A. (eds.)

1988-12-01

This book is intended as a reference manual of statistical methodology for nuclear material management practitioners. It describes statistical methods currently or potentially important in nuclear material management, explains the choice of methods for specific applications, and provides examples of practical applications to nuclear material management problems. Together with the accompanying training manual, which contains fully worked out problems keyed to each chapter, this book can also be used as a textbook for courses in statistical methods for nuclear material management. It should provide increased understanding and guidance to help improve the application of statistical methods to nuclear material management problems.
PROGRESSIVE DENSIFICATION AND REGION GROWING METHODS FOR LIDAR DATA CLASSIFICATION

Directory of Open Access Journals (Sweden)

J. L. Pérez-García

2012-07-01

Full Text Available At present, airborne laser scanner systems are one of the most frequently used methods for obtaining digital terrain elevation models. While they have the advantage of measuring the object directly, the points of the resulting point cloud must be classified according to whether they belong to the ground. This need to classify raw data has led to the appearance of multiple filters focused on classifying LiDAR information. Following this approach, this paper presents a classification method that combines LiDAR data segmentation techniques and progressive densification to locate the points belonging to the ground. The proposed methodology is tested on several datasets with different terrain characteristics and data availability. In all cases, we analyze the advantages and disadvantages obtained compared with applying the individual techniques and, in particular, the benefits derived from integrating both classification techniques. To provide a more comprehensive quality control of the classification process, the results have been compared with those derived from a manual procedure, which is used as the reference classification. The results are also compared with other automatic classification methodologies included in some commercial software packages that are well proven by users for LiDAR data treatment.
Classification Using Markov Blanket for Feature Selection

DEFF Research Database (Denmark)

Zeng, Yifeng; Luo, Jian

2009-01-01

Selecting relevant features is in demand when a large data set is of interest in a classification task. It produces a tractable number of features that are sufficient and possibly improve the classification performance. This paper studies a statistical method of Markov blanket induction algorithm...... for filtering features and then applies a classifier using the Markov blanket predictors. The Markov blanket contains a minimal subset of relevant features that yields optimal classification performance. We experimentally demonstrate the improved performance of several classifiers using a Markov blanket...... induction as a feature selection method. In addition, we point out an important assumption behind the Markov blanket induction algorithm and show its effect on the classification performance....
A Hyperspectral Image Classification Method Using ISOMAP and RVM

Science.gov (United States)

Chang, H.; Wang, T.; Fang, H.; Su, Y.

2018-04-01

Classification is one of the most significant applications of hyperspectral image processing and even remote sensing. Though various algorithms have been proposed to implement and improve this application, there are still drawbacks in traditional classification methods. Thus further investigations on some aspects, such as dimension reduction, data mining, and rational use of spatial information, should be developed. In this paper, we used a widely utilized global manifold learning approach, isometric feature mapping (ISOMAP), to address the intrinsic nonlinearities of hyperspectral images for dimension reduction. Considering the unsuitability of Euclidean distance for spectral measurement, we substituted the spectral angle (SA) when constructing the neighbourhood graph. Then, relevance vector machines (RVM) were introduced to implement classification instead of support vector machines (SVM) for simplicity, generalization and sparsity. Therefore, a probability result could be obtained rather than a less convincing binary result. Moreover, taking into account the spatial information of the hyperspectral image, we employed a spatial vector formed by the ratios of different classes around the pixel. Finally, we combined the probability results and spatial factors with a criterion to decide the final classification result. To verify the proposed method, we ran multiple experiments on standard hyperspectral images and compared it with some other methods. The results and different evaluation indexes illustrated the effectiveness of our method.
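As an illustration of the spectral angle mentioned in this abstract, the following is a minimal sketch of the standard definition (the angle between two spectra treated as vectors). It is not the authors' implementation; the function name and the sample spectra are hypothetical.

```python
import math


def spectral_angle(x, y):
    """Angle in radians between two spectra given as equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("spectra must have the same number of bands")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("spectral angle is undefined for an all-zero spectrum")
    # Clamp so rounding error cannot push the cosine outside acos's domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cos_angle)


# Hypothetical four-band spectra, for illustration only.
pixel = [0.12, 0.30, 0.45, 0.50]
brighter = [2.0 * v for v in pixel]   # same spectral shape, doubled intensity
other = [0.50, 0.45, 0.30, 0.12]      # different spectral shape

print(spectral_angle(pixel, brighter))  # close to zero
print(spectral_angle(pixel, other))     # clearly larger
```

Because the angle ignores vector length, a uniformly brighter copy of a spectrum stays at an angle near zero, whereas its Euclidean distance grows with the brightness difference.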
Statistical methods in quality assurance

International Nuclear Information System (INIS)

Eckhard, W.

1980-01-01

During the different phases of a production process (planning, development and design, manufacturing, assembling, etc.), most decisions rest on statistics: the collection, analysis and interpretation of data. Statistical methods can be thought of as a kit of tools that help solve problems in the quality functions of the quality loop, with the aim of producing quality products and reducing quality costs. Various statistical methods are presented, and typical examples of their practical application are demonstrated. (RW)
A simple method to combine multiple molecular biomarkers for dichotomous diagnostic classification

Directory of Open Access Journals (Sweden)

Amin Manik A

2006-10-01

Full Text Available Abstract. Background: In spite of the recognized diagnostic potential of biomarkers, the quest for squelching noise and wringing in information from a given set of biomarkers continues. Here, we suggest a statistical algorithm that, assuming each molecular biomarker to be a diagnostic test, enriches the diagnostic performance of an optimized set of independent biomarkers employing established statistical techniques. We validated the proposed algorithm using several simulation datasets in addition to four publicly available real datasets that compared (i) subjects having cancer with those without; (ii) subjects with two different cancers; (iii) subjects with two different types of one cancer; and (iv) subjects with the same cancer resulting in differential time to metastasis. Results: Our algorithm comprises three steps: estimating the area under the receiver operating characteristic curve for each biomarker, identifying a subset of biomarkers using linear regression, and combining the chosen biomarkers using linear discriminant function analysis. Combining these established statistical methods, which are available in most statistical packages, we observed that the diagnostic accuracy of our approach was 100%, 99.94%, 96.67% and 93.92% for the real datasets used in the study. These estimates were comparable to or better than the ones previously reported using alternative methods. In a synthetic dataset, we also observed that all the biomarkers chosen by our algorithm were indeed truly differentially expressed. Conclusion: The proposed algorithm can be used for accurate diagnosis in the setting of dichotomous classification of disease states.
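The first of the three steps in this abstract (a per-biomarker area under the ROC curve) can be sketched as follows. This is only an illustration under stated assumptions: the AUC is computed with the Mann-Whitney pairwise formulation, the function names and the two-marker data are hypothetical, and the regression-based selection and linear discriminant steps are not shown.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count half."""
    if not scores_pos or not scores_neg:
        raise ValueError("both groups need at least one sample")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def rank_biomarkers(cases, controls):
    """Rank markers by discriminative AUC.

    cases and controls map a marker name to its list of measurements.
    A marker with AUC below 0.5 is informative in the opposite direction,
    so max(a, 1 - a) is used (an assumption of this sketch).
    """
    ranked = []
    for name in cases:
        a = auc(cases[name], controls[name])
        ranked.append((name, max(a, 1.0 - a)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical measurements, for illustration only.
cases = {"marker_a": [5.1, 6.0, 5.8, 6.3], "marker_b": [2.0, 2.2, 1.9, 2.1]}
controls = {"marker_a": [4.0, 4.2, 5.0, 4.4], "marker_b": [2.1, 2.0, 2.3, 1.8]}

for name, value in rank_biomarkers(cases, controls):
    print(name, value)
```

The pairwise loop is quadratic in sample size, which is acceptable for a sketch; a rank-based computation would be the usual choice for larger datasets.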
Statistical methods for ranking data

CERN Document Server

Alvo, Mayer

2014-01-01

This book introduces advanced undergraduate, graduate students and practitioners to statistical methods for ranking data. An important aspect of nonparametric statistics is oriented towards the use of ranking data. Rank correlation is defined through the notion of distance functions and the notion of compatibility is introduced to deal with incomplete data. Ranking data are also modeled using a variety of modern tools such as CART, MCMC, EM algorithm and factor analysis. This book deals with statistical methods used for analyzing such data and provides a novel and unifying approach for hypotheses testing. The techniques described in the book are illustrated with examples and the statistical software is provided on the authors’ website.
Statistical methods in nuclear theory

International Nuclear Information System (INIS)

Shubin, Yu.N.

1974-01-01

The paper outlines statistical methods which are widely used for describing properties of excited states of nuclei and nuclear reactions. It discusses the physical assumptions lying at the basis of known distributions of level spacings (Wigner, Poisson distributions) and of widths of highly excited states (Porter-Thomas distribution), as well as assumptions used in the statistical theory of nuclear reactions and in the fluctuation analysis. The author considers the random matrix method, which consists in replacing the matrix elements of a residual interaction by random variables with a simple statistical distribution. Experimental data are compared with results of calculations using the statistical model. The superfluid nucleus model is considered with regard to superconducting-type pair correlations.
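For reference, the distributions named in this abstract are usually written in the following textbook forms. The notation is an assumption of this note and not taken from the paper: s is the nearest-neighbour level spacing in units of the mean spacing, and x is a level width divided by the mean width.

```latex
% Poisson spacing distribution (uncorrelated levels)
P_{\mathrm{Poisson}}(s) = e^{-s}

% Wigner surmise (level repulsion, P(0) = 0)
P_{\mathrm{Wigner}}(s) = \frac{\pi}{2}\, s \, \exp\!\left(-\frac{\pi s^{2}}{4}\right)

% Porter-Thomas distribution of reduced widths (chi-squared with one degree of freedom)
P_{\mathrm{PT}}(x) = \frac{1}{\sqrt{2\pi x}} \, \exp\!\left(-\frac{x}{2}\right),
\qquad x = \frac{\Gamma}{\langle \Gamma \rangle}
```

All three are normalized to unit area and unit mean, so the Poisson and Wigner forms can be compared directly on the same spacing axis.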
Using machine learning, neural networks and statistics to predict bankruptcy

NARCIS (Netherlands)

Pompe, P.P.M.; Feelders, A.J.

1997-01-01

Recent literature strongly suggests that machine learning approaches to classification outperform "classical" statistical methods. We make a comparison between the performance of linear discriminant analysis, classification trees, and neural networks in predicting corporate bankruptcy. Linear
A New Method for Solving Supervised Data Classification Problems

Directory of Open Access Journals (Sweden)

Parvaneh Shabanzadeh

2014-01-01

Full Text Available Supervised data classification is one of the techniques used to extract nontrivial information from data. Classification is a widely used technique in various fields, including data mining, industry, medicine, science, and law. This paper considers a new algorithm for supervised data classification problems associated with the cluster analysis. The mathematical formulations for this algorithm are based on nonsmooth, nonconvex optimization. A new algorithm for solving this optimization problem is utilized. The new algorithm uses a derivative-free technique, with robustness and efficiency. To improve classification performance and efficiency in generating classification model, a new feature selection algorithm based on techniques of convex programming is suggested. Proposed methods are tested on real-world datasets. Results of numerical experiments have been presented which demonstrate the effectiveness of the proposed algorithms.
15th Conference of the International Federation of Classification Societies

CERN Document Server

Montanari, Angela; Vichi, Maurizio

2017-01-01

This edited volume on the latest advances in data science covers a wide range of topics in the context of data analysis and classification. In particular, it includes contributions on classification methods for high-dimensional data, clustering methods, multivariate statistical methods, and various applications. The book gathers a selection of peer-reviewed contributions presented at the Fifteenth Conference of the International Federation of Classification Societies (IFCS2015), which was hosted by the Alma Mater Studiorum, University of Bologna, from July 5 to 8, 2015.
Characteristics and application study of AP1000 NPPs equipment reliability classification method

International Nuclear Information System (INIS)

Guan Gao

2013-01-01

The AP1000 nuclear power plant applies an integrated approach to establish equipment reliability classification, which includes the probabilistic risk assessment technique, maintenance rule administration, power production reliability classification and the functional equipment group bounding method, and eventually classifies equipment reliability into 4 levels. This classification process and its result are very different from classical RCM and streamlined RCM. The paper studies the characteristics of the AP1000 equipment reliability classification approach, considers that equipment reliability classification should effectively support maintenance strategy development and work process control, and recommends using a combined RCM method to establish the future equipment reliability program of AP1000 nuclear power plants. (authors)
A SEMI-AUTOMATIC RULE SET BUILDING METHOD FOR URBAN LAND COVER CLASSIFICATION BASED ON MACHINE LEARNING AND HUMAN KNOWLEDGE

Directory of Open Access Journals (Sweden)

H. Y. Gu

2017-09-01

Full Text Available A classification rule set, which comprises features and decision rules, is important for land cover classification. Features and decision rules are usually selected through the iterative trial-and-error approach often utilized in GEOBIA; however, this is time-consuming and has poor versatility. This study puts forward a rule set building method for land cover classification based on human knowledge and machine learning. Machine learning is used to build rule sets effectively, avoiding the iterative trial-and-error approach. Human knowledge is used to overcome the insufficient use of prior knowledge in existing machine learning methods and to improve the versatility of rule sets. A two-step workflow has been introduced. First, an initial rule is built based on Random Forest and a CART decision tree. Second, the initial rule is analyzed and validated based on human knowledge, where we use a statistical confidence interval to determine its threshold. The test site is located in Potsdam City. We utilised the TOP, DSM and ground truth data. The results show that the method could determine the rule set for land cover classification semi-automatically, and that there are static features for different land cover classes.
On the Evaluation of Outlier Detection and One-Class Classification Methods

DEFF Research Database (Denmark)

Swersky, Lorne; Marques, Henrique O.; Sander, Jörg

2016-01-01

It has been shown that unsupervised outlier detection methods can be adapted to the one-class classification problem. In this paper, we focus on the comparison of one-class classification algorithms with such adapted unsupervised outlier detection methods, improving on previous comparison studies ...
Classification of bladder cancer cell lines using Raman spectroscopy: a comparison of excitation wavelength, sample substrate and statistical algorithms

Science.gov (United States)

Kerr, Laura T.; Adams, Aine; O'Dea, Shirley; Domijan, Katarina; Cullen, Ivor; Hennelly, Bryan M.

2014-05-01

Raman microspectroscopy can be applied to the urinary bladder for highly accurate classification and diagnosis of bladder cancer. This technique can be applied in vitro to bladder epithelial cells obtained from urine cytology or in vivo as an "optical biopsy" to provide results in real time with higher sensitivity and specificity than current clinical methods. However, there exists a high degree of variability across experimental parameters, which need to be standardised before this technique can be utilized in an everyday clinical environment. In this study, we investigate different laser wavelengths (473 nm and 532 nm), sample substrates (glass, fused silica and calcium fluoride) and multivariate statistical methods in order to gain insight into how these various experimental parameters impact the sensitivity and specificity of Raman cytology.
Statistical Methods in Integrative Genomics

Science.gov (United States)

Richardson, Sylvia; Tseng, George C.; Sun, Wei

2016-01-01

Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions. PMID:27482531
Mapping patent classifications: portfolio and statistical analysis, and the comparison of strengths and weaknesses.

Science.gov (United States)

Leydesdorff, Loet; Kogler, Dieter Franz; Yan, Bowen

2017-01-01

The Cooperative Patent Classifications (CPC) recently developed cooperatively by the European and US Patent Offices provide a new basis for mapping patents and portfolio analysis. CPC replaces International Patent Classifications (IPC) of the World Intellectual Property Organization. In this study, we update our routines previously based on IPC for CPC and use the occasion for rethinking various parameter choices. The new maps are significantly different from the previous ones, although this may not always be obvious on visual inspection. We provide nested maps online and a routine for generating portfolio overlays on the maps; a new tool is provided for "difference maps" between patent portfolios of organizations or firms. This is illustrated by comparing the portfolios of patents granted to two competing firms (Novartis and MSD) in 2016. Furthermore, the data is organized for the purpose of statistical analysis.

Land-Use and Land-Cover Mapping Using a Gradable Classification Method

Directory of Open Access Journals (Sweden)

Keigo Kitada

2012-05-01

Full Text Available Conventional spectral-based classification methods have significant limitations in the digital classification of urban land-use and land-cover classes from high-resolution remotely sensed data because of the lack of consideration given to the spatial properties of images. To recognize the complex distribution of urban features in high-resolution image data, texture information consisting of a group of pixels should be considered. Lacunarity is an index used to characterize different texture appearances. It is often reported that the land-use and land-cover in urban areas can be effectively classified using the lacunarity index with high-resolution images. However, the applicability of the maximum-likelihood approach for hybrid analysis has not been reported. A more effective approach that employs the original spectral data and lacunarity index can be expected to improve the accuracy of the classification. A new classification procedure referred to as the “gradable classification method” is proposed in this study. This method improves the classification accuracy in incremental steps. The proposed classification approach integrates several classification maps created from original images and lacunarity maps, which consist of lacunarity values, to create a new classification map. The results of this study confirm the suitability of the gradable classification approach, which produced a higher overall accuracy (68%) and kappa coefficient (0.64) than those (65% and 0.60, respectively) obtained with the maximum-likelihood approach.
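To make the lacunarity index in this abstract concrete, here is a minimal sketch of the common gliding-box definition, in which the index for a box size is the mean squared box mass divided by the squared mean box mass. The abstract does not say which variant the authors used, so this choice, the function name and the toy grids are assumptions.

```python
def lacunarity(grid, box):
    """Gliding-box lacunarity of a 2D grid of non-negative values for one box size."""
    rows, cols = len(grid), len(grid[0])
    if box < 1 or box > rows or box > cols:
        raise ValueError("box size must fit inside the grid")
    masses = []
    # Slide the box over every position and record the sum of values inside it.
    for top in range(rows - box + 1):
        for left in range(cols - box + 1):
            mass = sum(
                grid[r][c]
                for r in range(top, top + box)
                for c in range(left, left + box)
            )
            masses.append(mass)
    mean = sum(masses) / len(masses)
    if mean == 0:
        raise ValueError("lacunarity is undefined for an empty grid")
    mean_sq = sum(m * m for m in masses) / len(masses)
    return mean_sq / (mean * mean)


# Two hypothetical 4x4 binary textures with the same fill fraction (one half).
clumped = [[1, 1, 0, 0] for _ in range(4)]
checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]

print(lacunarity(clumped, 2))  # above 1: the texture has large gaps
print(lacunarity(checker, 2))  # exactly 1: every 2x2 box holds the same mass
```

The two grids have identical mean brightness, so a purely spectral measure cannot separate them, while the lacunarity values differ; this is the kind of texture information the abstract argues for.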
Methods of statistical physics

CERN Document Server

Akhiezer, Aleksandr I

1981-01-01

Methods of Statistical Physics is an exposition of the tools of statistical mechanics, which evaluates the kinetic equations of classical and quantized systems. The book also analyzes the equations of macroscopic physics, such as the equations of hydrodynamics for normal and superfluid liquids and macroscopic electrodynamics. The text gives particular attention to the study of quantum systems. This study begins with a discussion of problems of quantum statistics with a detailed description of the basics of quantum mechanics along with the theory of measurement. An analysis of the asymptotic be
Statistical Analysis of the Labor Market in Ukraine Using Multidimensional Classification Methods: the Regional Aspect

Directory of Open Access Journals (Sweden)

Korepanov Oleksiy S.

2017-12-01

Full Text Available The aim of the article is to study the labor market in Ukraine in the regional context using cluster analysis methods. The current state of the labor market in regions of Ukraine is analyzed, and a system of statistical indicators that influence the state and development of this market is formed. The expediency of using cluster analysis for grouping regions according to the level of development of the labor market is substantiated. The essence of cluster analysis is revealed, its main goal, key tasks, which can be solved by means of such analysis, are presented, basic stages of the analysis are considered. The main methods of clustering are described and, based on the results of the simulation, the advantages and disadvantages of each method are justified. In the work the clustering of regions of Ukraine by the level of labor market development using different methods of cluster analysis is carried out, conclusions on the results of the calculations performed are presented, and the main directions for further research are outlined.
Volunteer-Based System for classification of traffic in computer networks

DEFF Research Database (Denmark)

Bujlow, Tomasz; Balachandran, Kartheepan; Riaz, M. Tahir

2011-01-01

To overcome the drawbacks of existing methods for traffic classification (by ports, Deep Packet Inspection, statistical classification) a new system was developed, in which the data are collected from client machines. This paper presents design of the system, implementation, initial runs and obta...
NIM: A Node Influence Based Method for Cancer Classification

Directory of Open Access Journals (Sweden)

Yiwen Wang

2014-01-01

Full Text Available The classification of different cancer types is of great significance in the medical field. However, the great majority of existing cancer classification methods are clinical-based and have relatively weak diagnostic ability. With the rapid development of gene expression technology, it has become possible to classify different kinds of cancers using DNA microarrays. Our main idea is to confront the problem of cancer classification using gene expression data from a graph-based view. Based on a new node influence model that we propose, this paper presents a novel high-accuracy method for cancer classification, which is composed of four parts: the first is to calculate the similarity matrix of all samples, the second is to compute the node influence of training samples, the third is to obtain the similarity between every test sample and each class using a weighted sum of node influence and similarity matrix, and the last is to classify each test sample based on its similarity to every class. The data sets used in our experiments are breast cancer, central nervous system, colon tumor, prostate cancer, acute lymphoblastic leukemia, and lung cancer. Experimental results showed that our node influence based method (NIM) is more efficient and robust than the support vector machine, K-nearest neighbor, C4.5, naive Bayes, and CART.
Multivariate statistical methods a first course

CERN Document Server

Marcoulides, George A

2014-01-01

Multivariate statistics refer to an assortment of statistical methods that have been developed to handle situations in which multiple variables or measures are involved. Any analysis of more than two variables or measures can loosely be considered a multivariate statistical analysis. An introductory text for students learning multivariate statistical methods for the first time, this book keeps mathematical details to a minimum while conveying the basic principles. One of the principal strategies used throughout the book--in addition to the presentation of actual data analyses--is poin
Statistical methods for physical science

CERN Document Server

Stanford, John L

1994-01-01

This volume of Methods of Experimental Physics provides an extensive introduction to probability and statistics in many areas of the physical sciences, with an emphasis on the emerging area of spatial statistics. The scope of topics covered is wide-ranging: the text discusses a variety of the most commonly used classical methods and addresses newer methods that are applicable or potentially important. The chapter authors motivate readers with their insightful discussions, augmenting their material with Key Features * Examines basic probability, including coverage of standard distributions, time s
Application of FT-IR Classification Method in Silica-Plant Extracts Composites Quality Testing

Science.gov (United States)

Bicu, A.; Drumea, V.; Mihaiescu, D. E.; Purcareanu, B.; Florea, M. A.; Trică, B.; Vasilievici, G.; Draga, S.; Buse, E.; Olariu, L.

2018-06-01

Our present work is concerned with the validation and quality testing of mesoporous silica - plant extract composites, in order to support the standardization process of plant-based pharmaceutical products. The synthesis of the silica support was performed using a TEOS-based synthetic route and CTAB as a template, at room temperature and normal pressure. The silica support was analyzed by advanced characterization methods (SEM, TEM, BET, DLS and FT-IR), and loaded with Calendula officinalis and Salvia officinalis standardized extracts. Further desorption studies were performed in order to prove the sustained release properties of the final materials. Intermediate and final product identification was performed by an FT-IR classification method, using the mid-range of the IR spectra and statistically representative samples from repetitive synthetic stages. The obtained results recommend this analytical method as a fast and cost-effective alternative to the classic identification methods.
Multi-element neutron activation analysis and solution of classification problems using multidimensional statistics

International Nuclear Information System (INIS)

Vaganov, P.A.; Kol'tsov, A.A.; Kulikov, V.D.; Mejer, V.A.

1983-01-01

Multi-element instrumental neutron activation analysis of rock samples (sandstones, aleurolites and shales from one of the gold deposits) is performed. The spectra of the irradiated samples are measured with a Ge(Li) detector with a volume of 35 mm³. The content of 22 chemical elements is determined in each sample. The results of the analysis serve as a reliable basis for multidimensional statistical information processing; they form the basis for generalized characteristics of the rocks, which leads to the solution of the classification problem for rocks of different deposits.
Classification of methods for measuring current-voltage characteristics of semiconductor devices

Directory of Open Access Journals (Sweden)

Iermolenko Ia. O.

2014-06-01

Full Text Available It is shown that computer systems for measuring current-voltage characteristics are very important for semiconductor device production. The main criteria of efficiency of such systems are defined. It is shown that the efficiency of such systems depends significantly on the methods for measuring current-voltage characteristics of semiconductor devices. The aim of this work is to analyze existing methods for measuring current-voltage characteristics of semiconductor devices and to create a classification of these methods in order to specify the most effective solutions in terms of the defined criteria. To achieve this aim, the most common classifications of methods for measuring current-voltage characteristics of semiconductor devices and their main disadvantages are considered. Automated and manual, continuous, pulse, mixed, isothermal and isodynamic methods for measuring current-voltage characteristics are analyzed. As a result of the analysis and generalization of existing methods, the following classification criteria are defined: the level of automation, the form of measurement signals, the condition of the semiconductor device during the measurements, and the use of mathematical processing of the measurement results. Using these criteria, a classification scheme of methods for measuring current-voltage characteristics of semiconductor devices is composed and the most effective methods are specified.
Statistical Methods for Environmental Pollution Monitoring

Energy Technology Data Exchange (ETDEWEB)

Gilbert, Richard O. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

1987-01-01

The application of statistics to environmental pollution monitoring studies requires a knowledge of statistical analysis methods particularly well suited to pollution data. This book fills that need by providing sampling plans, statistical tests, parameter estimation procedures, and references to pertinent publications. Most of the statistical techniques are relatively simple, and examples, exercises, and case studies are provided to illustrate procedures. The book is logically divided into three parts. Chapters 1, 2, and 3 are introductory chapters. Chapters 4 through 10 discuss field sampling designs and Chapters 11 through 18 deal with a broad range of statistical analysis procedures. Some statistical techniques given here are not commonly seen in statistics books. For example, see methods for handling correlated data (Sections 4.5 and 11.12), for detecting hot spots (Chapter 10), and for estimating a confidence interval for the mean of a lognormal distribution (Section 13.2). Also, Appendix B lists a computer code that estimates and tests for trends over time at one or more monitoring stations using nonparametric methods (Chapters 16 and 17). Unfortunately, some important topics could not be included because of their complexity and the need to limit the length of the book. For example, only brief mention could be made of time series analysis using Box-Jenkins methods and of kriging techniques for estimating spatial and spatial-time patterns of pollution, although multiple references on these topics are provided. Also, no discussion of methods for assessing risks from environmental pollution could be included.
Classification of Children Intelligence with Fuzzy Logic Method

Science.gov (United States)

Syahminan; ika Hidayati, Permata

2018-04-01

A child's intelligence is an important thing for parents to know early on. Typing can be done by grouping the dominant characteristics of each type of intelligence. To make it easier for parents to determine the type of their children's intelligence and how to address it, a classification system was created that groups children using the fuzzy logic method to determine the degree of each intelligence type in a child. From the analysis, we concluded that with the fuzzy logic intelligence classification system, determining the type of a child's intelligence can be done more easily and the results yield more accurate conclusions than manual tests.
Robust statistical methods with R

CERN Document Server

Jureckova, Jana

2005-01-01

Robust statistical methods were developed to supplement the classical procedures when the data violate classical assumptions. They are ideally suited to applied research across a broad spectrum of study, yet most books on the subject are narrowly focused, overly theoretical, or simply outdated. Robust Statistical Methods with R provides a systematic treatment of robust procedures with an emphasis on practical application. The authors work from underlying mathematical tools to implementation, paying special attention to the computational aspects. They cover the whole range of robust methods, including differentiable statistical functions, distance of measures, influence functions, and asymptotic distributions, in a rigorous yet approachable manner. Highlighting hands-on problem solving, many examples and computational algorithms using the R software supplement the discussion. The book examines the characteristics of robustness, estimators of real parameter, large sample properties, and goodness-of-fit tests. It...
A method to incorporate uncertainty in the classification of remote sensing images

OpenAIRE

Gonçalves, Luísa M. S.; Fonte, Cidália C.; Júlio, Eduardo N. B. S.; Caetano, Mario

2009-01-01

The aim of this paper is to investigate if the incorporation of the uncertainty associated with the classification of surface elements into the classification of landscape units (LUs) increases the results accuracy. To this end, a hybrid classification method is developed, including uncertainty information in the classification of very high spatial resolution multi-spectral satellite images, to obtain a map of LUs. The developed classification methodology includes the following...
Classification of methods for annual energy harvesting calculations of photovoltaic generators

International Nuclear Information System (INIS)

Rus-Casas, C.; Aguilar, J.D.; Rodrigo, P.; Almonacid, F.; Pérez-Higueras, P.J.

2014-01-01

Highlights: • The paper presents a novel classification of methods for annual energy harvesting calculation of grid-connected PV systems. • The methods are classified in direct and indirect methods. • Direct methods directly calculate the energy. Indirect methods calculate the energy from the power. • The classification can help the PV professionals in order to choose the most suitable method for each application. - Abstract: Estimating the energy provided by the generators of grid-connected photovoltaic systems is important in order to analyze their economic viability and supervise their operation. The energy harvesting calculation of a photovoltaic generator is not trivial; there are a lot of methods for this calculation. The aim of this paper is to develop a novel classification of methods for annual energy harvesting calculation of a generator of a grid-connected photovoltaic system. The methods are classified in two groups: (1) those that indirectly calculate the energy, i.e. they first calculate the power and from this, they calculate the energy, and (2) those that directly calculate the energy. Furthermore, the indirect methods are grouped in two categories: those that first calculate the I–V curve of the generator and from this, they calculate the power, and those that directly calculate the power. The study has shown that the existing methods differ in simplicity and accuracy, so that the proposed classification is useful in order to choose the most suitable method for each specific application
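The distinction between indirect and direct methods drawn in this abstract can be illustrated with a small sketch. The indirect route integrates a sampled power series; the direct route here uses a widely used yield approximation (peak power times irradiation over the reference irradiance times a performance ratio), which is an assumption chosen for illustration and not necessarily one of the methods the paper classifies. Function names and input values are hypothetical.

```python
def energy_indirect(power_w, step_hours):
    """Indirect route: power samples first, then energy (Wh) by the trapezoidal rule."""
    return sum((a + b) / 2 * step_hours for a, b in zip(power_w, power_w[1:]))

def energy_direct(peak_power_kw, irradiation_kwh_m2, performance_ratio,
                  g_stc_kw_m2=1.0):
    """Direct route (ASSUMED formula): E = P_peak * (H / G_STC) * PR, in kWh."""
    return peak_power_kw * (irradiation_kwh_m2 / g_stc_kw_m2) * performance_ratio

# Hypothetical hourly generator power over part of a day, in watts.
hourly_power_w = [0.0, 100.0, 200.0, 100.0, 0.0]
daily_wh = energy_indirect(hourly_power_w, step_hours=1.0)

# Hypothetical 5 kWp generator, 1800 kWh/m2 annual irradiation, PR = 0.8.
annual_kwh = energy_direct(5.0, 1800.0, 0.8)
print(daily_wh, annual_kwh)
```

The indirect route needs a power model or measurements at every time step, while the direct route trades that detail for simplicity, which matches the simplicity-versus-accuracy trade-off the abstract mentions.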
Hydrologic landscape regionalisation using deductive classification and random forests.

Directory of Open Access Journals (Sweden)

Stuart C Brown

Full Text Available Landscape classification and hydrological regionalisation studies are being increasingly used in ecohydrology to aid in the management and research of aquatic resources. We present a methodology for classifying hydrologic landscapes based on spatial environmental variables by employing non-parametric statistics and hybrid image classification. Our approach differed from previous classifications which have required the use of an a priori spatial unit (e.g. a catchment), which necessarily results in the loss of variability that is known to exist within those units. The use of a simple statistical approach to identify an appropriate number of classes eliminated the need for large amounts of post-hoc testing with different numbers of groups, or the selection and justification of an arbitrary number. Using statistical clustering, we identified 23 distinct groups within our training dataset. The use of a hybrid classification employing random forests extended this statistical clustering to an area of approximately 228,000 km² of south-eastern Australia without the need to rely on catchments, landscape units or stream sections. This extension resulted in a highly accurate regionalisation at both 30-m and 2.5-km resolution, and a less-accurate 10-km classification that would be more appropriate for use at a continental scale. A smaller case study, of an area covering 27,000 km², demonstrated that the method preserved the intra- and inter-catchment variability that is known to exist in local hydrology, based on previous research. Preliminary analysis linking the regionalisation to streamflow indices is promising, suggesting that the method could be used to predict streamflow behaviour in ungauged catchments. Our work therefore simplifies current classification frameworks that are becoming more popular in ecohydrology, while better retaining small-scale variability in hydrology, thus enabling future attempts to explain and visualise broad-scale hydrologic
Classification of Malaysia aromatic rice using multivariate statistical analysis

Energy Technology Data Exchange (ETDEWEB)

Abdullah, A. H.; Adom, A. H.; Shakaff, A. Y. Md; Masnan, M. J.; Zakaria, A.; Rahim, N. A. [School of Mechatronic Engineering, Universiti Malaysia Perlis, Kampus Pauh Putra, 02600 Arau, Perlis (Malaysia); Omar, O. [Malaysian Agriculture Research and Development Institute (MARDI), Persiaran MARDI-UPM, 43400 Serdang, Selangor (Malaysia)

2015-05-15

Aromatic rice (Oryza sativa L.) is considered the best-quality premium rice. These varieties are preferred by consumers because of criteria such as shape, colour, distinctive aroma and flavour. The price of aromatic rice is higher than that of ordinary rice because of its special growth requirements, for instance specific climate and soil. Presently, aromatic rice quality is identified using its key elements and isotopic variables. The rice can also be classified via Gas Chromatography Mass Spectrometry (GC-MS) or human sensory panels. However, human sensory panels have significant drawbacks, such as lengthy training time, proneness to fatigue as the number of samples increases, and inconsistency. GC-MS analysis techniques, on the other hand, require detailed procedures and lengthy analysis and are quite costly. This paper presents the application of an in-house developed Electronic Nose (e-nose) to classify new aromatic rice varieties. The e-nose is used to classify the variety of aromatic rice based on the odour of the samples. The samples were taken from the variety of rice. The instrument utilizes multivariate statistical data analysis, including Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and K-Nearest Neighbours (KNN), to classify the unknown rice samples. The Leave-One-Out (LOO) validation approach is applied to evaluate the ability of KNN to perform recognition and classification of the unspecified samples. Visual observation of the PCA and LDA plots of the rice shows that the instrument was able to separate the samples into different clusters accordingly. The results of LDA and KNN, with low misclassification error, support the above findings, and we may conclude that the e-nose is successfully applied to the classification of the aromatic rice varieties.
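The leave-one-out evaluation of a K-nearest-neighbour classifier described in this abstract can be sketched in a few lines. This is only an illustration of the validation idea, assuming Euclidean distance and majority voting; the sensor readings, variety labels and function names are hypothetical and do not come from the paper.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def leave_one_out_error(samples, k=3):
    """Hold out each sample in turn, classify it from the rest, count mistakes."""
    errors = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if knn_predict(rest, features, k) != label:
            errors += 1
    return errors / len(samples)

# Hypothetical two-sensor e-nose readings for two rice varieties.
samples = [
    ((1.0, 1.1), "variety_1"), ((1.2, 0.9), "variety_1"),
    ((0.9, 1.0), "variety_1"), ((1.1, 1.2), "variety_1"),
    ((3.0, 3.1), "variety_2"), ((3.2, 2.9), "variety_2"),
    ((2.9, 3.0), "variety_2"), ((3.1, 3.2), "variety_2"),
]
loo_error = leave_one_out_error(samples, k=3)
print(loo_error)
```

Leave-one-out is attractive for small sample sets such as sensor experiments because every sample is used for testing exactly once while the classifier is always trained on all the others.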
Workshop on Analytical Methods in Statistics

CERN Document Server

Jurečková, Jana; Maciak, Matúš; Pešta, Michal

2017-01-01

This volume collects authoritative contributions on analytical methods and mathematical statistics. The methods presented include resampling techniques; the minimization of divergence; estimation theory and regression, possibly under shape or other constraints or long memory; and iterative approximations when the optimal solution is difficult to achieve. It also investigates probability distributions with respect to their stability, heavy-tailedness, Fisher information and other aspects, both asymptotically and non-asymptotically. The book not only presents the latest mathematical and statistical methods and their extensions, but also offers solutions to real-world problems including option pricing. The selected, peer-reviewed contributions were originally presented at the workshop on Analytical Methods in Statistics, AMISTAT 2015, held in Prague, Czech Republic, November 10-13, 2015.

Flash Flood Hazard Susceptibility Mapping Using Frequency Ratio and Statistical Index Methods in Coalmine Subsidence Areas

Directory of Open Access Journals (Sweden)

Chen Cao

2016-09-01

Full Text Available This study focused on producing flash flood hazard susceptibility maps (FFHSM) using frequency ratio (FR) and statistical index (SI) models in the Xiqu Gully (XQG) of Beijing, China. First, a total of 85 flash flood hazard locations (n = 85) were surveyed in the field and plotted using geographic information system (GIS) software. Based on the flash flood hazard locations, a flood hazard inventory map was built. Seventy percent (n = 60) of the flooding hazard locations were randomly selected for building the models. The remaining 30% (n = 25) of the flooded hazard locations were used for validation. Because the XQG used to be a coal mining area, coalmine caves and subsidence caused by coal mining exist in this catchment, as well as many ground fissures. Thus, this study took the subsidence risk level into consideration for FFHSM. The ten conditioning parameters were elevation, slope, curvature, land use, geology, soil texture, subsidence risk area, stream power index (SPI), topographic wetness index (TWI), and short-term heavy rain. This study also tested different classification schemes for the values of each conditioning parameter and checked their impacts on the results. The accuracy of the FFHSM was validated using area under the curve (AUC) analysis. Classification accuracies were 86.61%, 83.35%, and 78.52% using frequency ratio (FR)-natural breaks, statistical index (SI)-natural breaks and FR-manual classification schemes, respectively. Associated prediction accuracies were 83.69%, 81.22%, and 74.23%, respectively. It was found that FR modeling using a natural breaks classification method was more appropriate for generating FFHSM for the Xiqu Gully.
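As a rough illustration of the two models named in this abstract, the sketch below computes a frequency ratio and a statistical index for each class of one conditioning parameter. It assumes the usual definitions (FR as the share of flood locations in a class divided by the share of map cells in that class, and SI as the natural logarithm of that density ratio); the abstract itself does not spell out the formulas, and the slope classes and counts are hypothetical.

```python
import math

def frequency_ratio(flood_in_class, flood_total, cells_in_class, cells_total):
    """FR = (share of flood locations in the class) / (share of map cells in the class)."""
    return (flood_in_class / flood_total) / (cells_in_class / cells_total)

def statistical_index(fr):
    """SI = ln(FR); undefined when a class contains no flood locations (FR = 0)."""
    return math.log(fr)

# Hypothetical slope classes: (map cells in class, training flood locations in class).
slope_classes = {
    "0-5 deg": (5000, 30),
    "5-15 deg": (3000, 20),
    "15+ deg": (2000, 10),
}
cells_total = sum(cells for cells, _ in slope_classes.values())
flood_total = sum(floods for _, floods in slope_classes.values())

fr = {name: frequency_ratio(floods, flood_total, cells, cells_total)
      for name, (cells, floods) in slope_classes.items()}
si = {name: statistical_index(value) for name, value in fr.items()}
print(fr, si)
```

An FR above 1 (SI above 0) marks a class where flood locations are over-represented. A susceptibility map is then typically obtained by summing the class weights of all conditioning parameters for each cell.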
METHODOLOGICAL PRINCIPLES AND METHODS OF TERMS OF TRADE STATISTICAL EVALUATION

Directory of Open Access Journals (Sweden)

N. Kovtun

2014-09-01

Full Text Available The paper studies the methodological principles and guidance of the statistical evaluation of terms of trade for the United Nations classification model – Harmonized Commodity Description and Coding System (HS. The practical implementation of the proposed three-stage model of index analysis and estimation of terms of trade for Ukraine's commodity-members for the period of 2011-2012 are realized.
Classification of forest development stages from national low-density lidar datasets: a comparison of machine learning methods

Directory of Open Access Journals (Sweden)

R. Valbuena

2016-02-01

Full Text Available The area-based method has become a widespread approach in airborne laser scanning (ALS), being mainly employed for the estimation of continuous variables describing forest attributes: biomass, volume, density, etc. However, to date, classification methods based on machine learning, which are fairly common in other remote sensing fields, such as land use / land cover classification using multispectral sensors, have been largely overlooked in forestry applications of ALS. In this article, we wish to draw attention to statistical methods predicting discrete responses, for supervised classification of ALS datasets. A wide spectrum of approaches is reviewed: discriminant analysis (DA) using various classifiers (maximum likelihood, minimum volume ellipsoid, naïve Bayes), support vector machine (SVM), artificial neural networks (ANN), random forest (RF) and nearest neighbour (NN) methods. They are compared in the context of a classification of forest areas into development classes (DC) used in practical silvicultural management in Finland, using the country's low-density national ALS dataset. We observed that RF and NN had the most balanced error matrices, with cross-validated predictions which were mainly unbiased for all DCs. Although overall accuracies were higher for SVM and ANN, their results were very dissimilar across DCs, and they can therefore only be advantageous if certain DCs are targeted. DA methods underperformed in comparison to other alternatives, and were only advantageous for the detection of seedling stands. These results show that, besides the well demonstrated capacity of ALS for quantifying forest stocks, there is a great deal of potential for predicting categorical variables in general, and forest types in particular. In conclusion, we consider that the presented methodology shall also be adapted to the type of forest classes that can be relevant to Mediterranean ecosystems, opening a range of possibilities for future research, in which
Statistical methods in personality assessment research.

Science.gov (United States)

Schinka, J A; LaLone, L; Broeckel, J A

1997-06-01

Emerging models of personality structure and advances in the measurement of personality and psychopathology suggest that research in personality and personality assessment has entered a stage of advanced development. In this article we examine whether researchers in these areas have taken advantage of new and evolving statistical procedures. We conducted a review of articles published in the Journal of Personality Assessment during the past 5 years. Of the 449 articles that included some form of data analysis, 12.7% used only descriptive statistics, most employed only univariate statistics, and fewer than 10% used multivariate methods of data analysis. We discuss the cost of using limited statistical methods, the possible reasons for the apparent reluctance to employ advanced statistical procedures, and potential solutions to this technical shortcoming.
Statistical Methods in Psychology Journals.

Science.gov (United States)

Wilkinson, Leland

1999-01-01

Proposes guidelines for revising the American Psychological Association (APA) publication manual or other APA materials to clarify the application of statistics in research reports. The guidelines are intended to induce authors and editors to recognize the thoughtless application of statistical methods. Contains 54 references. (SLD)
MAXIMUM LIKELIHOOD CLASSIFICATION OF HIGH-RESOLUTION SAR IMAGES IN URBAN AREA

Directory of Open Access Journals (Sweden)

M. Soheili Majd

2012-09-01

Full Text Available In this work, we present a state-of-the-art review of the statistical analysis of polarimetric synthetic aperture radar (SAR) data, through the modeling of several indices. We concentrate on eight ground classes, which have been characterized from amplitudes, co-polarisation ratio, depolarization ratios, and other polarimetric descriptors. To study their different statistical behaviours, we consider Gauss, log-normal, Beta I, Weibull, Gamma, and Fisher statistical models and estimate their parameters using three methods: the method of moments (MoM), the maximum-likelihood (ML) methodology, and the method of log-cumulants (MoML). Then, we study the opportunity of introducing this information into an adapted supervised classification scheme based on maximum likelihood and the Fisher pdf. Our work relies on an image of a suburban area, acquired by the airborne RAMSES SAR sensor of ONERA. The results prove the potential of such data to discriminate urban surfaces and show the usefulness of adapting any classical classification algorithm; however, the classification maps present a persistent class confusion between flat gravelled or concrete roofs and trees.
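The general idea of fitting a statistical model per class and then classifying by maximum likelihood can be sketched as below. To keep the example short it uses the log-normal model, one of the candidate distributions listed in the abstract, rather than the Fisher pdf on which the paper bases its classification scheme; the class names and amplitude values are hypothetical.

```python
import math

def fit_lognormal(values):
    """Maximum-likelihood estimates (mu, sigma) of a log-normal distribution."""
    logs = [math.log(v) for v in values]
    mu = sum(logs) / len(logs)
    var = sum((x - mu) ** 2 for x in logs) / len(logs)  # ML estimate divides by n
    return mu, math.sqrt(var)

def lognormal_logpdf(x, mu, sigma):
    """Log of the log-normal density at x > 0."""
    return (-math.log(x * sigma * math.sqrt(2 * math.pi))
            - (math.log(x) - mu) ** 2 / (2 * sigma ** 2))

def ml_classify(x, models):
    """Assign x to the class whose fitted density gives it the highest likelihood."""
    return max(models, key=lambda name: lognormal_logpdf(x, *models[name]))

# Hypothetical training amplitudes for two ground classes.
training = {
    "vegetation": [0.8, 1.0, 1.2, 0.9, 1.1],
    "roof": [4.0, 5.0, 6.0, 4.5, 5.5],
}
models = {name: fit_lognormal(vals) for name, vals in training.items()}
labels = [ml_classify(x, models) for x in (1.05, 5.2)]
print(labels)
```

This rule assumes equal prior probabilities for all classes; with unequal priors, the log prior would be added to each class score before taking the maximum.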
Statistical methods for quality improvement

National Research Council Canada - National Science Library

Ryan, Thomas P

2011-01-01

...."-Technometrics This new edition continues to provide the most current, proven statistical methods for quality control and quality improvement. The use of quantitative methods offers numerous benefits...
Statistical learning methods: Basics, control and performance

Energy Technology Data Exchange (ETDEWEB)

Zimmermann, J. [Max-Planck-Institut fuer Physik, Foehringer Ring 6, 80805 Munich (Germany)]. E-mail: zimmerm@mppmu.mpg.de

2006-04-01

The basics of statistical learning are reviewed with a special emphasis on general principles and problems for all different types of learning methods. Different aspects of controlling these methods in a physically adequate way will be discussed. All principles and guidelines will be exercised on examples for statistical learning methods in high energy and astrophysics. These examples prove in addition that statistical learning methods very often lead to a remarkable performance gain compared to the competing classical algorithms.
Classification of astrocytomas and meningiomas using statistical discriminant analysis on MRI data

International Nuclear Information System (INIS)

Siromoney, Anna; Prasad, G.N.S.; Raghuram, Lakshminarayan; Korah, Ipeson; Siromoney, Arul; Chandrasekaran, R.

2001-01-01

The objective of this study was to investigate the usefulness of Multivariate Discriminant Analysis for classifying two groups of primary brain tumours, astrocytomas and meningiomas, from Magnetic Resonance Images. Discriminant analysis is a multivariate technique concerned with separating distinct sets of objects and with allocating new objects to previously defined groups. Allocation or classification rules are usually developed from learning examples in a supervised learning environment. Data from signal intensity measurements in the multiple scan performed on each patient in routine clinical scanning was analysed using Fisher's Classification, which is one method of discriminant analysis
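A two-group allocation rule of the kind this abstract describes can be sketched with Fisher's linear discriminant. The sketch assumes two features per case, equal priors and equal misclassification costs; the feature values and group contents are hypothetical and are not taken from the study's MRI data.

```python
def mean_vector(rows):
    """Mean of a list of 2-feature observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, mean):
    """2x2 within-group scatter matrix around `mean`."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s

def fisher_rule(group1, group2):
    """Return direction w = Sw^-1 (m1 - m2) and the midpoint cut-off on w.x."""
    m1, m2 = mean_vector(group1), mean_vector(group2)
    s1, s2 = scatter(group1, m1), scatter(group2, m2)
    sw = [[s1[a][b] + s2[a][b] for b in range(2)] for a in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    cut = sum(w[j] * (m1[j] + m2[j]) / 2 for j in range(2))
    return w, cut

def allocate(x, w, cut):
    """Allocate a new case to group 1 if its discriminant score exceeds the cut-off."""
    return 1 if w[0] * x[0] + w[1] * x[1] > cut else 2

# Hypothetical signal intensities on two scan types for two tumour groups.
group1 = [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0), (3.5, 4.5)]
group2 = [(6.0, 1.0), (7.0, 1.5), (6.5, 2.0), (7.5, 2.5)]
w, cut = fisher_rule(group1, group2)
print(allocate((3.0, 4.0), w, cut), allocate((7.0, 2.0), w, cut))
```

Using the pooled scatter matrix rather than the pooled covariance matrix changes only the scale of w, not the allocation, because the cut-off is scaled by the same factor.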
Statistical methods in nonlinear dynamics

Indian Academy of Sciences (India)

Sensitivity to initial conditions in nonlinear dynamical systems leads to exponential divergence of trajectories that are initially arbitrarily close, and hence to unpredictability. Statistical methods have been found to be helpful in extracting useful information about such systems. In this paper, we review briefly some statistical ...
Hierarchically structured identification and classification method for vibrational monitoring of reactor components

International Nuclear Information System (INIS)

Saedtler, E.

1981-01-01

The dissertation discusses: 1. Approximative filter algorithms for identification of systems and hierarchical structures. 2. Adaptive statistical pattern recognition and classification. 3. Parameter selection, extraction, and modelling for an automatic control system. 4. Design of a decision tree and an adaptive diagnostic system. (orig./RW) [de
BASIC METHODS OF CLASSIFICATION AND CHARACTERISTICS OF METHODS OF PRICING IN UKRAINE

OpenAIRE

A. Boguslavskiy

2014-01-01

The article provides definitions and shows the need for enterprises to use different pricing methods. It sets out the reasons for the absence of a universal classification of pricing methods and reviews the approaches of different authors, who classify pricing methods into groups: 1) the cost method; 2) methods with a focus on competition; 3) methods for pricing based on demand; 4) pricing with a focus on maximum profit; 5) parametric methods; 6) pricing under risk and uncertainty, etc. An improved classificat...
Classification differences and maternal mortality

DEFF Research Database (Denmark)

Salanave, B; Bouvier-Colle, M H; Varnoux, N

1999-01-01

OBJECTIVES: To compare the ways maternal deaths are classified in national statistical offices in Europe and to evaluate the ways classification affects published rates. METHODS: Data on pregnancy-associated deaths were collected in 13 European countries. Cases were classified by a European panel....... This change was substantial in three countries (P statistical offices appeared to attribute fewer deaths to obstetric causes. In the other countries, no differences were detected. According to official published data, the aggregated maternal mortality rate for participating countries was 7.7 per...... of experts into obstetric or non-obstetric causes. An ICD-9 code (International Classification of Diseases) was attributed to each case. These were compared to the codes given in each country. Correction indices were calculated, giving new estimates of maternal mortality rates. SUBJECTS: There were...
Supervised Classification in the Presence of Misclassified Training Data: A Monte Carlo Simulation Study in the Three Group Case

Directory of Open Access Journals (Sweden)

Jocelyn E Bolin

2014-02-01

Full Text Available Statistical classification of phenomena into observed groups is very common in the social and behavioral sciences. Statistical classification methods, however, are affected by the characteristics of the data under study. Statistical classification can be further complicated by initial misclassification of the observed groups. The purpose of this study is to investigate the impact of initial training data misclassification on several statistical classification and data mining techniques. Misclassification conditions in the three-group case will be simulated and results will be presented in terms of overall as well as subgroup classification accuracy. Results show decreased classification accuracy as sample size, group separation and group size ratio decrease and as misclassification percentage increases with random forests demonstrating the highest accuracy across conditions.
Statistical data analysis using SAS intermediate statistical methods

CERN Document Server

Marasinghe, Mervyn G

2018-01-01

The aim of this textbook (previously titled SAS for Data Analytics) is to teach the use of SAS for statistical analysis of data for advanced undergraduate and graduate students in statistics, data science, and disciplines involving analyzing data. The book begins with an introduction beyond the basics of SAS, illustrated with non-trivial, real-world, worked examples. It proceeds to SAS programming and applications, SAS graphics, statistical analysis of regression models, analysis of variance models, analysis of variance with random and mixed effects models, and then takes the discussion beyond regression and analysis of variance to conclude. Pedagogically, the authors introduce theory and methodological basis topic by topic, present a problem as an application, followed by a SAS analysis of the data provided and a discussion of results. The text focuses on applied statistical problems and methods. Key features include: end of chapter exercises, downloadable SAS code and data sets, and advanced material suitab...
Automotive System for Remote Surface Classification.

Science.gov (United States)

Bystrov, Aleksandr; Hoare, Edward; Tran, Thuy-Yung; Clarke, Nigel; Gashinova, Marina; Cherniakov, Mikhail

2017-04-01

In this paper we discuss a novel approach to road surface recognition, based on the analysis of backscattered microwave and ultrasonic signals. The novelty of our method lies in sonar and polarimetric radar data fusion, extraction of features for separate swathes of illuminated surface (segmentation), and the use of a multi-stage artificial neural network for surface classification. The developed system consists of a 24 GHz radar and a 40 kHz ultrasonic sensor. The features are extracted from backscattered signals, and then principal component analysis and supervised classification are applied to the feature data. Special attention is paid to the multi-stage artificial neural network, which allows an overall increase in classification accuracy. The proposed technique was tested for recognition of a large number of real surfaces in different weather conditions, with an average classification accuracy of 95%. The obtained results thereby demonstrate that the proposed system architecture and statistical methods allow for reliable discrimination of various road surfaces in real conditions.
New casemix classification as an alternative method for budget allocation in Thai oral healthcare service: a pilot study.

Science.gov (United States)

Wisaijohn, Thunthita; Pimkhaokham, Atiphan; Lapying, Phenkhae; Itthichaisri, Chumpot; Pannarunothai, Supasit; Igarashi, Isao; Kawabuchi, Koichi

2010-01-01

This study aimed to develop a new casemix classification system as an alternative method for the budget allocation of oral healthcare service (OHCS). Initially, the International Statistical Classification of Diseases and Related Health Problems, 10th revision, Thai Modification (ICD-10-TM) related to OHCS was used for developing the software "Grouper". This model was designed to allow the translation of dental procedures into eight-digit codes. Multiple regression analysis was used to analyze the relationship between the factors used for developing the model and the resource consumption. Furthermore, the coefficient of variance, reduction in variance, and relative weight (RW) were applied to test the validity. The results demonstrated that 1,624 OHCS classifications, according to the diagnoses and the procedures performed, showed high homogeneity within groups and heterogeneity between groups. Moreover, the RW of the OHCS could be used to predict and control the production costs. In conclusion, this new OHCS casemix classification has potential use in global decision making.
New Casemix Classification as an Alternative Method for Budget Allocation in Thai Oral Healthcare Service: A Pilot Study

Directory of Open Access Journals (Sweden)

Thunthita Wisaijohn

2010-01-01

Full Text Available This study aimed to develop a new casemix classification system as an alternative method for the budget allocation of oral healthcare service (OHCS). Initially, the International Statistical Classification of Diseases and Related Health Problems, 10th revision, Thai Modification (ICD-10-TM) related to OHCS was used for developing the software “Grouper”. This model was designed to allow the translation of dental procedures into eight-digit codes. Multiple regression analysis was used to analyze the relationship between the factors used for developing the model and the resource consumption. Furthermore, the coefficient of variance, reduction in variance, and relative weight (RW) were applied to test the validity. The results demonstrated that 1,624 OHCS classifications, according to the diagnoses and the procedures performed, showed high homogeneity within groups and heterogeneity between groups. Moreover, the RW of the OHCS could be used to predict and control the production costs. In conclusion, this new OHCS casemix classification has potential use in global decision making.
Advanced statistical methods in data science

CERN Document Server

Chen, Jiahua; Lu, Xuewen; Yi, Grace; Yu, Hao

2016-01-01

This book gathers invited presentations from the 2nd Symposium of the ICSA-CANADA Chapter held at the University of Calgary from August 4-6, 2015. The aim of this Symposium was to promote advanced statistical methods in big-data sciences and to allow researchers to exchange ideas on statistics and data science and to embrace the challenges and opportunities of statistics and data science in the modern world. It addresses diverse themes in advanced statistical analysis in big-data sciences, including methods for administrative data analysis, survival data analysis, missing data analysis, high-dimensional and genetic data analysis, longitudinal and functional data analysis, the design and analysis of studies with response-dependent and multi-phase designs, time series and robust statistics, statistical inference based on likelihood, empirical likelihood and estimating functions. The editorial group selected 14 high-quality presentations from this successful symposium and invited the presenters to prepare a fu...

A novel Neuro-fuzzy classification technique for data mining

Directory of Open Access Journals (Sweden)

Soumadip Ghosh

2014-11-01

Full Text Available In our study, we proposed a novel Neuro-fuzzy classification technique for data mining. The inputs to the Neuro-fuzzy classification system were fuzzified by applying a generalized bell-shaped membership function. The proposed method utilized a fuzzification matrix in which the input patterns were associated with a degree of membership to different classes. Based on the value of the degree of membership, a pattern would be attributed to a specific category or class. We applied our method to ten benchmark data sets from the UCI machine learning repository for classification. Our objective was to analyze the proposed method and to compare its performance with two powerful supervised classification algorithms, Radial Basis Function Neural Network (RBFNN) and Adaptive Neuro-fuzzy Inference System (ANFIS). We assessed the performance of these classification methods in terms of different performance measures such as accuracy, root-mean-square error, kappa statistic, true positive rate, false positive rate, precision, recall, and f-measure. In every aspect the proposed method proved to be superior to the RBFNN and ANFIS algorithms.
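The fuzzification step described in this abstract can be illustrated with the standard generalized bell-shaped membership function, 1 / (1 + |(x - c) / a|^(2b)). The sketch below is only an illustration under stated assumptions: the function names `gbell` and `fuzzify`, the class labels, and the (a, b, c) parameter values are hypothetical and are not taken from the paper, which builds a full fuzzification matrix over all input features.

```python
def gbell(x, a, b, c):
    """Generalized bell membership: 1 / (1 + |(x - c) / a|^(2b)).

    a controls the width, b the steepness of the shoulders, c the centre.
    """
    if a == 0:
        raise ValueError("width parameter a must be non-zero")
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def fuzzify(x, class_params):
    """Return {class_label: degree of membership} for a single input value."""
    return {label: gbell(x, *abc) for label, abc in class_params.items()}


# Hypothetical per-class (a, b, c) parameters, chosen only for illustration.
params = {"low": (2.0, 2.0, 0.0), "high": (2.0, 2.0, 5.0)}

memberships = fuzzify(1.0, params)
# Attribute the pattern to the class with the largest degree of membership.
predicted = max(memberships, key=memberships.get)
print(memberships, predicted)
```

With these made-up parameters, an input near the centre of the "low" curve receives a membership close to 1 for that class and a small membership for "high", so the pattern is attributed to "low".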
REAL-TIME INTELLIGENT MULTILAYER ATTACK CLASSIFICATION SYSTEM

Directory of Open Access Journals (Sweden)

T. Subbhulakshmi

2014-01-01

Full Text Available Intrusion Detection Systems (IDS) take the lion’s share of the current security infrastructure. Detection of intrusions is vital for initiating defensive procedures. Intrusion detection has been done by statistical and distance-based methods. A threshold value is used in these methods to indicate the level of normalcy; when network traffic crosses that level, it is flagged as anomalous. When new intrusion events occur, which are increasingly a key part of system security, the statistical techniques cannot detect them. To overcome this issue, learning techniques are used, which help in identifying new intrusion activities in a computer system. The objective of the system proposed in this paper is to classify intrusions using an Intelligent Multi Layered Attack Classification System (IMLACS), which helps in detecting and classifying intrusions with improved classification accuracy. The intelligent multi-layered approach contains three intelligent layers. The first layer uses binary Support Vector Machine classification to distinguish normal traffic from attacks. The second layer uses neural network classification to classify the attacks into classes of attacks. The third layer uses a fuzzy inference system to classify the attacks into various subclasses. The proposed IMLACS is able to detect intrusion behavior in networks because the system contains three intelligent classification layers and a better set of rules. Feature selection is also used to improve detection time. The experimental results show that IMLACS achieves a classification rate of 97.31%.
Fruit Detachment and Classification Method for Strawberry Harvesting Robot

Directory of Open Access Journals (Sweden)

Guo Feng

2008-03-01

Full Text Available Fruit detachment and on-line classification are important for the development of harvesting robots. To meet the specific requirements of a robot used for harvesting strawberries growing on the ground, a fruit detachment and classification method is introduced in this paper. An image segmentation algorithm based on the OHTA color space is used to extract the strawberry from the background; the principal inertia axis of the binary strawberry blob is calculated to give the pose information of the fruit. Strawberries are picked selectively according to ripeness and classified according to shape features. A histogram-matching-based method for fruit shape judgment is first introduced. Experimental results show that this method can achieve 93% accuracy in strawberry stem detection and above 90% accuracy in ripeness and shape quality judgment on black and white backgrounds. With improvements in harvesting mechanism design, this method has application potential in field operation.
Fruit Detachment and Classification Method for Strawberry Harvesting Robot

Directory of Open Access Journals (Sweden)

Guo Feng

2008-11-01

Full Text Available Fruit detachment and on-line classification are important for the development of harvesting robots. To meet the specific requirements of a robot used for harvesting strawberries growing on the ground, a fruit detachment and classification method is introduced in this paper. An image segmentation algorithm based on the OHTA color space is used to extract the strawberry from the background; the principal inertia axis of the binary strawberry blob is calculated to give the pose information of the fruit. Strawberries are picked selectively according to ripeness and classified according to shape features. A histogram-matching-based method for fruit shape judgment is first introduced. Experimental results show that this method can achieve 93% accuracy in strawberry stem detection and above 90% accuracy in ripeness and shape quality judgment on black and white backgrounds. With improvements in harvesting mechanism design, this method has application potential in field operation.
Forensic classification of counterfeit banknote paper by X-ray fluorescence and multivariate statistical methods.

Science.gov (United States)

Guo, Hongling; Yin, Baohua; Zhang, Jie; Quan, Yangke; Shi, Gaojun

2016-09-01

Counterfeiting of banknotes is a crime and seriously harmful to the economy. Examination of the paper, ink and toners used to make counterfeit banknotes can provide useful information to classify and link different cases in which the suspects use the same raw materials. In this paper, 21 paper samples of counterfeit banknotes seized from 13 cases were analyzed by wavelength dispersive X-ray fluorescence. After measuring the elemental composition of the paper semi-quantitatively, the normalized weight percentage data of 10 elements were processed by the multivariate statistical methods of cluster analysis and principal component analysis. The paper samples were classified into 3 main groups. Nine separate cases were successfully linked. It is demonstrated that elemental composition measured by XRF is a useful way to compare and classify papers used in different cases. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Statistical Methods for Fuzzy Data

CERN Document Server

Viertl, Reinhard

2011-01-01

Statistical data are not always precise numbers, or vectors, or categories. Real data are frequently what is called fuzzy. Examples where this fuzziness is obvious are quality of life data, environmental, biological, medical, sociological and economics data. Also the results of measurements can be best described by using fuzzy numbers and fuzzy vectors respectively. Statistical analysis methods have to be adapted for the analysis of fuzzy data. In this book, the foundations of the description of fuzzy data are explained, including methods on how to obtain the characterizing function of fuzzy m
Machine-learning methods in the classification of water bodies

Directory of Open Access Journals (Sweden)

Sołtysiak Marek

2016-06-01

Full Text Available Amphibian species have been considered as useful ecological indicators. They are used as indicators of environmental contamination, ecosystem health and habitat quality. Amphibian species are sensitive to changes in the aquatic environment and therefore may form the basis for the classification of water bodies. Water bodies in which there are a large number of amphibian species are especially valuable even if they are located in urban areas. The automation of the classification process allows for a faster evaluation of the presence of amphibian species in the water bodies. Three machine-learning methods (artificial neural networks, decision trees and the k-nearest neighbours algorithm) have been used to classify water bodies in Chorzów – one of 19 cities in the Upper Silesia Agglomeration. In this case, classification is a supervised data mining method consisting of several stages such as building the model, the testing phase and the prediction. Seven natural and anthropogenic features of water bodies (e.g. the type of water body, aquatic plants, the purpose of the water body (destination), position of the water body in relation to any possible buildings, condition of the water body, the degree of littering, the shore type and fishing activities) have been taken into account in the classification. The data set used in this study involved information about 71 different water bodies and 9 amphibian species living in them. The results showed that the best average classification accuracy was obtained with the multilayer perceptron neural network.
Simple Fully Automated Group Classification on Brain fMRI

International Nuclear Information System (INIS)

Honorio, J.; Goldstein, R.; Samaras, D.; Tomasi, D.; Goldstein, R.Z.

2010-01-01

We propose a simple, well-grounded classification technique which is suited for group classification on brain fMRI data sets that have high dimensionality, small number of subjects, high noise level, high subject variability, imperfect registration and capture subtle cognitive effects. We propose threshold-split region as a new feature selection method and majority vote as the classification technique. Our method does not require a predefined set of regions of interest. We use average across sessions, only one feature per experimental condition, feature independence assumption, and simple classifiers. The seemingly counter-intuitive approach of using a simple design is supported by signal processing and statistical theory. Experimental results in two block design data sets that capture brain function under distinct monetary rewards for cocaine addicted and control subjects show that our method exhibits increased generalization accuracy compared to commonly used feature selection and classification techniques.
Simple Fully Automated Group Classification on Brain fMRI

Energy Technology Data Exchange (ETDEWEB)

Honorio, J.; Goldstein, R.; Honorio, J.; Samaras, D.; Tomasi, D.; Goldstein, R.Z.

2010-04-14

We propose a simple, well-grounded classification technique which is suited for group classification on brain fMRI data sets that have high dimensionality, small number of subjects, high noise level, high subject variability, imperfect registration and capture subtle cognitive effects. We propose threshold-split region as a new feature selection method and majority vote as the classification technique. Our method does not require a predefined set of regions of interest. We use average across sessions, only one feature per experimental condition, feature independence assumption, and simple classifiers. The seemingly counter-intuitive approach of using a simple design is supported by signal processing and statistical theory. Experimental results in two block design data sets that capture brain function under distinct monetary rewards for cocaine addicted and control subjects show that our method exhibits increased generalization accuracy compared to commonly used feature selection and classification techniques.
Statistical inference for remote sensing-based estimates of net deforestation

Science.gov (United States)

Ronald E. McRoberts; Brian F. Walters

2012-01-01

Statistical inference requires expression of an estimate in probabilistic terms, usually in the form of a confidence interval. An approach to constructing confidence intervals for remote sensing-based estimates of net deforestation is illustrated. The approach is based on post-classification methods using two independent forest/non-forest classifications because...
Generalized t-statistic for two-group classification.

Science.gov (United States)

Komori, Osamu; Eguchi, Shinto; Copas, John B

2015-06-01

In the classic discriminant model of two multivariate normal distributions with equal variance matrices, the linear discriminant function is optimal both in terms of the log likelihood ratio and in terms of maximizing the standardized difference (the t-statistic) between the means of the two distributions. In a typical case-control study, normality may be sensible for the control sample but heterogeneity and uncertainty in diagnosis may suggest that a more flexible model is needed for the cases. We generalize the t-statistic approach by finding the linear function which maximizes a standardized difference but with data from one of the groups (the cases) filtered by a possibly nonlinear function U. We study conditions for consistency of the method and find the function U which is optimal in the sense of asymptotic efficiency. Optimality may also extend to other measures of discriminatory efficiency such as the area under the receiver operating characteristic curve. The optimal function U depends on a scalar probability density function which can be estimated non-parametrically using a standard numerical algorithm. A lasso-like version for variable selection is implemented by adding L1-regularization to the generalized t-statistic. Two microarray data sets in the study of asthma and various cancers are used as motivating examples. © 2014, The International Biometric Society.
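For orientation, the classical result that this abstract starts from can be written out as a short worked formula. The notation is an assumption made here for illustration (group means μ₀ and μ₁, common variance matrix Σ, coefficient vector β) and is not taken from the paper; the generalized version with the filtering function U is not reproduced.

```latex
% Standardized difference (population t-statistic) of the linear score \beta^\top x
% between two groups with means \mu_0, \mu_1 and common variance matrix \Sigma:
T(\beta) = \frac{\beta^{\top}(\mu_1 - \mu_0)}{\sqrt{\beta^{\top}\Sigma\,\beta}}

% By the Cauchy--Schwarz inequality, T is maximized (up to a positive scale factor) by
\beta^{*} \propto \Sigma^{-1}(\mu_1 - \mu_0),
\qquad
T(\beta^{*}) = \sqrt{(\mu_1 - \mu_0)^{\top}\Sigma^{-1}(\mu_1 - \mu_0)}

% Under normality with equal variance matrices, the log likelihood ratio is linear in x
% with the same coefficient vector:
\log\frac{f_1(x)}{f_0(x)}
  = (\mu_1 - \mu_0)^{\top}\Sigma^{-1}x
  - \tfrac{1}{2}\left(\mu_1^{\top}\Sigma^{-1}\mu_1 - \mu_0^{\top}\Sigma^{-1}\mu_0\right)
```

Both criteria therefore pick out the same direction, Fisher's linear discriminant, and the maximized value is the Mahalanobis distance between the two group means.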
Applied multivariate statistics with R

CERN Document Server

Zelterman, Daniel

2015-01-01

This book brings the power of multivariate statistics to graduate-level practitioners, making these analytical methods accessible without lengthy mathematical derivations. Using the open source, shareware program R, Professor Zelterman demonstrates the process and outcomes for a wide array of multivariate statistical applications. Chapters cover graphical displays, linear algebra, univariate, bivariate and multivariate normal distributions, factor methods, linear regression, discrimination and classification, clustering, time series models, and additional methods. Zelterman uses practical examples from diverse disciplines to welcome readers from a variety of academic specialties. Those with backgrounds in statistics will learn new methods while they review more familiar topics. Chapters include exercises, real data sets, and R implementations. The data are interesting, real-world topics, particularly from health and biology-related contexts. As an example of the approach, the text examines a sample from the B...
Classification of Patients Treated for Infertility Using the IVF Method

Directory of Open Access Journals (Sweden)

Malinowski Paweł

2015-12-01

Full Text Available One of the most effective methods of infertility treatment is in vitro fertilization (IVF). Effectiveness of the treatment, as well as classification of the data obtained from it, is still an ongoing issue. Classifiers obtained so far are powerful, but even the best ones do not exhibit equal quality concerning possible treatment outcome predictions. Usually, lack of pregnancy is predicted far too often. This creates a constant need for further exploration of this issue. Careful use of different classification methods can, however, help to achieve that goal.
A comparison of change detection measurements using object-based and pixel-based classification methods on western juniper dominated woodlands in eastern Oregon

Directory of Open Access Journals (Sweden)

Ryan G. Howell

2017-03-01

Full Text Available Encroachment of pinyon (Pinus spp.) and juniper (Juniperus spp.) woodlands in western North America is considered detrimental due to its effects on ecohydrology, plant community structure, and soil stability. Management plans at the federal, state, and private level often include juniper removal for improving habitat of sensitive species and maintaining sustainable ecosystem processes. Remote sensing has become a useful tool in determining changes in juniper woodland structure because of its uses in comparing archived historic imagery with newly available multispectral images to provide information on changes that are no longer detectable by field measurements. Change in western juniper (J. occidentalis) cover was detected following juniper removal treatments between 1995 and 2011 using panchromatic 1-meter NAIP and 4-band 1-meter NAIP imagery, respectively. Image classification was conducted using remotely sensed images taken at the Roaring Springs Ranch in southeastern Oregon. Feature Analyst for ArcGIS (object-based extraction) and a supervised classification with ENVI 5.2 (pixel-based extraction) were used to delineate juniper canopy cover. Image classification accuracy was calculated using an Accuracy Assessment and Kappa Statistic. Both methods showed approximately a 76% decrease in western juniper cover, although differing in total canopy cover area, with object-based classification being more accurate. Classification results for the 2011 imagery were much more accurate (0.99 Kappa statistic) because of its low juniper density and the presence of an infrared band. The development of methods for detecting change in juniper cover can lead to more accurate and efficient data acquisition and subsequently improved land management and monitoring practices. These data can subsequently be used to assess and quantify juniper invasion and succession, potential ecological impacts, and plant community resilience.
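The Kappa statistic used for the accuracy assessment above can be computed from a confusion matrix as (observed agreement - chance agreement) / (1 - chance agreement). The following is a minimal sketch of that calculation; the function name `cohen_kappa` and the 2x2 juniper / non-juniper counts are made up for illustration and are not data from the study.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix given as a list of rows.

    confusion[i][j] = number of samples with reference class i mapped to class j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    if n == 0:
        raise ValueError("confusion matrix is empty")
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n
    # Chance agreement: product of row and column marginals, summed over classes.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Made-up counts: rows = reference (juniper, non-juniper), columns = classified.
example = [[45, 5],
           [10, 40]]
kappa = cohen_kappa(example)
print(round(kappa, 3))
```

Here the observed agreement is 0.85 and the chance agreement is 0.50, so kappa is 0.35 / 0.50 = 0.70. Overall accuracy alone (0.85) would overstate performance when one class dominates, which is the reason kappa is commonly reported alongside it.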
A Two-Layer Method for Sedentary Behaviors Classification Using Smartphone and Bluetooth Beacons.

Science.gov (United States)

Cerón, Jesús D; López, Diego M; Hofmann, Christian

2017-01-01

Among the factors that shape the health of populations, a person's lifestyle is the most important one. This work focuses on the characterization and prevention of sedentary lifestyles. A sedentary behavior is defined as "any waking behavior characterized by an energy expenditure of 1.5 METs (Metabolic Equivalent) or less while in a sitting or reclining posture". The objective is to propose a method for sedentary behaviors classification using a smartphone and Bluetooth beacons, considering different types of classification models: personal, hybrid or impersonal. Following the CRISP-DM methodology, a method based on a two-layer approach for the classification of sedentary behaviors is proposed. Using data collected from a smartphone's accelerometer, gyroscope and barometer, the first layer classifies between performing a sedentary behavior and not. The second layer classifies the specific sedentary activity performed using only the smartphone's accelerometer and barometer data, but adding indoor location data obtained with Bluetooth Low Energy (BLE) beacons. To improve the precision of the classification, both layers implemented the Random Forest algorithm and the personal model. This study presents the first available method for the automatic classification of specific sedentary behaviors. The layered classification approach has the potential to improve processing, memory and energy consumption of the mobile devices and wearables used.
A Novel Classification Method for Syndrome Differentiation of Patients with AIDS

Directory of Open Access Journals (Sweden)

Yufeng Zhao

2015-01-01

Full Text Available We consider the analysis of an AIDS dataset where each patient is characterized by a list of symptoms and is labeled with one or more TCM syndromes. The task is to build a classifier that maps symptoms to TCM syndromes. We use the minimum reference set-based multiple instance learning (MRS-MIL) method. The method identifies a list of representative symptoms for each syndrome and builds a Gaussian mixture model based on them. The models for all syndromes are then used for classification via Bayes rule. By relying on a subset of key symptoms for classification, MRS-MIL can produce reliable and high quality classification rules even on datasets with small sample size. On the AIDS dataset, it achieves average precision and recall of 0.7736 and 0.7111, respectively. Those are superior to results achieved by alternative methods.
New Statistics for Texture Classification Based on Gabor Filters

Directory of Open Access Journals (Sweden)

J. Pavlovicova

2007-09-01

Full Text Available The paper introduces a new method for evaluating texture segmentation efficiency. One of the well-known texture segmentation methods is based on Gabor filters because of their orientation and spatial frequency characteristics. Several statistics are used to extract more information from the results obtained by Gabor filtering. The large number of input parameters produces a wide set of results which need to be evaluated. The evaluation method is based on assessing the intersection of the Gaussian curves of normal distributions and provides a new point of view on the selection of a segmentation method.
A survey of available margin in a PWR RIA with statistical methods and 3D kinetics

International Nuclear Information System (INIS)

Riverola Gurruchaga, J.; Nunez Rodriguez, T.

2010-01-01

This paper investigates the recovery of margin in a PWR RIA simulation with 3D kinetics, due to statistical techniques. The chosen reference core is a typical 12-foot, 17×17 PWR, with a very low leakage loading pattern strategy and gadolinium oxide as burnable poison. The PARCS calculated average nuclear power and nodal power are transferred to a hot spot model for a sequential calculation of fuel temperature and enthalpy responses, allowing for independent hypotheses in both calculations. The hot spot analysis is done with a pellet type model with RELAP. The analysis is done at HZP and EOC, since this state is the most limiting one with respect to the enthalpy rise criterion, compared to other burn-up conditions or initial power cases. In this work, the enthalpy increase is estimated with several statistical methods of propagation of uncertainties: order statistics, parametric statistics, surface response and sensitivities. A discussion on the advantages and disadvantages of each method is also presented. This statistical analysis is also useful to confirm a previous classification of parameters and assumptions according to their importance for the simulation, which is found to be consistent with the state of the art in the published literature. These parameters include ejected rod worth and ejection time, delayed neutron fraction and yields, nuclear power peaking factor, and Doppler. (authors)
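The abstract does not say which order-statistics formula the authors used. A common choice in best-estimate-plus-uncertainty safety analysis is Wilks' first-order, one-sided tolerance limit, where the largest of n independent code runs bounds a given coverage of the output distribution with a given confidence. The sketch below assumes that formulation; the function name `wilks_sample_size` is hypothetical.

```python
def wilks_sample_size(coverage=0.95, confidence=0.95):
    """Smallest n such that the maximum of n independent runs exceeds the
    `coverage` quantile with probability at least `confidence`.

    First-order, one-sided Wilks criterion: 1 - coverage**n >= confidence.
    """
    if not (0.0 < coverage < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("coverage and confidence must lie strictly between 0 and 1")
    n = 1
    while 1.0 - coverage ** n < confidence:
        n += 1
    return n


# Number of randomly sampled runs needed for a 95 % / 95 % upper bound.
n_runs = wilks_sample_size(0.95, 0.95)
print(n_runs)
```

Under this assumption, the 95 % / 95 % case requires 59 runs, and the largest enthalpy rise among those runs would serve as the upper tolerance limit. The appeal of the approach is that the required number of runs does not depend on how many uncertain input parameters are sampled.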
New KF-PP-SVM classification method for EEG in brain-computer interfaces.

Science.gov (United States)

Yang, Banghua; Han, Zhijun; Zan, Peng; Wang, Qian

2014-01-01

Classification methods are a crucial direction in the current study of brain-computer interfaces (BCIs). To improve the classification accuracy for electroencephalogram (EEG) signals, a novel KF-PP-SVM (kernel Fisher, posterior probability, and support vector machine) classification method is developed. Its detailed process entails the use of common spatial patterns to obtain features, based on which the within-class scatter is calculated. Then the scatter is added into the kernel function of a radial basis function to construct a new kernel function. This new kernel is integrated into the SVM to obtain a new classification model. Finally, the output of SVM is calculated based on posterior probability and the final recognition result is obtained. To evaluate the effectiveness of the proposed KF-PP-SVM method, EEG data collected from laboratory are processed with four different classification schemes (KF-PP-SVM, KF-SVM, PP-SVM, and SVM). The results showed that the overall average improvements arising from the use of the KF-PP-SVM scheme as opposed to KF-SVM, PP-SVM and SVM schemes are 2.49%, 5.83% and 6.49%, respectively.
A novel fruit shape classification method based on multi-scale analysis

Science.gov (United States)

Gui, Jiangsheng; Ying, Yibin; Rao, Xiuqin

2005-11-01

Shape is a major concern, and still a difficult problem, in the automated inspection and sorting of fruits. In this research, we propose the multi-scale energy distribution (MSED) for object shape description; the relationship between an object's shape and its boundary energy distribution at multiple scales is explored for shape extraction. MSED offers not only the main energy, which represents primary shape information at the lower scales, but also the subordinate energy, which represents local shape information at higher differential scales. Thus, it provides a natural tool for multi-resolution representation and can be used as a feature for shape classification. We address the three main processing steps in MSED-based shape classification: 1) image preprocessing and citrus shape extraction, 2) shape resampling and shape feature normalization, and 3) energy decomposition by wavelet and classification by BP neural network. Here, shape resampling means resampling 256 boundary pixels from a curve that approximates the original boundary using a cubic spline, in order to obtain uniform raw data. A probability function is defined and an effective method for selecting a start point through maximal expectation is given, which overcomes the inconvenience of traditional methods and provides rotation invariance. The experimental results are relatively good for normal citrus and seriously abnormal citrus, with a classification rate above 91.2%. The global correct classification rate is 89.77%, and our method is more effective than the traditional method. The global result can meet the requirements of fruit grading.

Classification and regression trees

CERN Document Server

Breiman, Leo; Olshen, Richard A; Stone, Charles J

1984-01-01

The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.
Speech emotion recognition based on statistical pitch model

Institute of Scientific and Technical Information of China (English)

WANG Zhiping; ZHAO Li; ZOU Cairong

2006-01-01

A modified Parzen-window method, which keeps high resolution at low frequencies and smoothness at high frequencies, is proposed to obtain a statistical model. A gender classification method utilizing the statistical model is then proposed, which achieves 98% accuracy in gender classification when long sentences are processed. After separating the male and female voices, the mean and standard deviation of speech training samples with different emotions are used to create the corresponding emotion models. The Bhattacharyya distances between the test sample and the statistical models of pitch are then used for emotion recognition in speech. The normalization of pitch for the male and female voices is also considered, in order to map them into a uniform space. Finally, the speech emotion recognition experiment based on K Nearest Neighbor shows that a correct rate of 81% is achieved, whereas it is only 73.85% if the traditional parameters are used.
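The distance step described above can be sketched as follows, assuming each emotion model is a univariate Gaussian summarized by a pitch mean and standard deviation. The closed-form Bhattacharyya distance for two Gaussians is standard; the nearest-model decision rule, the function names and the numbers are hypothetical simplifications (the abstract reports a K Nearest Neighbor experiment and gives no model parameters).

```python
import math


def bhattacharyya_gaussian(mean_a, std_a, mean_b, std_b):
    """Bhattacharyya distance between two univariate Gaussian distributions."""
    var_a, var_b = std_a ** 2, std_b ** 2
    # Contribution from the separation of the means.
    mean_term = 0.25 * (mean_a - mean_b) ** 2 / (var_a + var_b)
    # Contribution from the mismatch of the spreads.
    spread_term = 0.5 * math.log((var_a + var_b) / (2.0 * std_a * std_b))
    return mean_term + spread_term


def closest_emotion(sample_stats, emotion_models):
    """Return the model name whose (mean, std) is nearest to the sample's."""
    return min(
        emotion_models,
        key=lambda name: bhattacharyya_gaussian(*sample_stats, *emotion_models[name]),
    )


if __name__ == "__main__":
    # Hypothetical pitch statistics in Hz, for illustration only.
    models = {"neutral": (120.0, 15.0), "angry": (180.0, 40.0)}
    print(closest_emotion((170.0, 35.0), models))
```

The distance is zero only when both the means and the standard deviations agree, so it separates emotions that differ in pitch variability even when their average pitch is similar.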
An object-oriented classification method of high resolution imagery based on improved AdaTree

International Nuclear Information System (INIS)

Xiaohe, Zhang; Liang, Zhai; Jixian, Zhang; Huiyong, Sang

2014-01-01

With the growing popularity of applications that use high spatial resolution remote sensing images, more and more studies have paid attention to object-oriented classification, covering both image segmentation and automatic classification after segmentation. This paper proposes a fast method for object-oriented automatic classification. First, edge-based or FNEA-based segmentation is used to identify image objects, and the values of the attributes of image objects most suitable for classification are calculated. Then a certain number of samples from the image objects are selected as training data for an improved AdaTree algorithm to obtain classification rules. Finally, the image objects can be classified easily using these rules. In the AdaTree, we mainly modified the final hypothesis to obtain classification rules. In the experiment with a WorldView2 image, the method based on AdaTree showed a clear improvement in accuracy and efficiency compared with the method based on SVM, with the kappa coefficient reaching 0.9242.
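For readers unfamiliar with the kappa coefficient quoted above, here is a minimal sketch of how Cohen's kappa is computed from a confusion matrix. The matrix values are invented for illustration and are not the study's data.

```python
def cohens_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: reference, columns: predicted)."""
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    size = len(confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Agreement expected by chance from the row and column marginals.
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[r][c] for r in range(size)) for c in range(size)]
    expected = sum(row_sums[i] * col_sums[i] for i in range(size)) / total ** 2
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical two-class confusion matrix.
    print(cohens_kappa([[45, 5], [10, 40]]))
```

Unlike overall accuracy, kappa discounts the agreement that would occur by chance, which matters when class areas are unbalanced.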
Protein structure: geometry, topology and classification

Energy Technology Data Exchange (ETDEWEB)

Taylor, William R.; May, Alex C.W.; Brown, Nigel P.; Aszodi, Andras [Division of Mathematical Biology, National Institute for Medical Research, London (United Kingdom)]

2001-04-01

The structural principles of proteins are reviewed and analysed from a geometric perspective with a view to revealing the underlying regularities in their construction. Computer methods for the automatic comparison and classification of these structures are then reviewed with an analysis of the statistical significance of comparing different shapes. Following an analysis of the current state of the classification of proteins, more abstract geometric and topological representations are explored, including the occurrence of knotted topologies. The review concludes with a consideration of the origin of higher-level symmetries in protein structure. (author)
Statistical Redundancy Testing for Improved Gene Selection in Cancer Classification Using Microarray Data

Directory of Open Access Journals (Sweden)

J. Sunil Rao

2007-01-01

In gene selection for cancer classification using microarray data, we define an eigenvalue-ratio statistic to measure a gene’s contribution to the joint discriminability when this gene is included in a set of genes. Based on this eigenvalue-ratio statistic, we define a novel hypothesis test for gene statistical redundancy and propose two gene selection methods. Simulation studies illustrate the agreement between statistical redundancy testing and the gene selection methods. Real data examples show that the proposed gene selection methods can select a compact gene subset which can not only be used to build high quality cancer classifiers but also shows biological relevance.
A minimum spanning forest based classification method for dedicated breast CT images

International Nuclear Information System (INIS)

Pike, Robert; Sechopoulos, Ioannis; Fei, Baowei

2015-01-01

Purpose: To develop and test an automated algorithm to classify different types of tissue in dedicated breast CT images. Methods: Images of a single breast of five different patients were acquired with a dedicated breast CT clinical prototype. The breast CT images were processed by a multiscale bilateral filter to reduce noise while keeping edge information and were corrected to overcome cupping artifacts. As skin and glandular tissue have similar CT values on breast CT images, morphologic processing is used to identify the skin based on its position information. A support vector machine (SVM) is trained and the resulting model used to create a pixelwise classification map of fat and glandular tissue. By combining the results of the skin mask with the SVM results, the breast tissue is classified as skin, fat, and glandular tissue. This map is then used to identify markers for a minimum spanning forest that is grown to segment the image using spatial and intensity information. To evaluate the authors’ classification method, they use DICE overlap ratios to compare the results of the automated classification to those obtained by manual segmentation on five patient images. Results: Comparison between the automatic and the manual segmentation shows that the minimum spanning forest based classification method was able to successfully classify dedicated breast CT images with average DICE ratios of 96.9%, 89.8%, and 89.5% for fat, glandular, and skin tissue, respectively. Conclusions: A 2D minimum spanning forest based classification method was proposed and evaluated for classifying the fat, skin, and glandular tissue in dedicated breast CT images. The classification method can be used for dense breast tissue quantification, radiation dose assessment, and other applications in breast imaging.
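The DICE overlap ratio used for the evaluation above can be sketched in a few lines. The label maps and the integer tissue codes below are hypothetical; the abstract does not describe the authors' data layout.

```python
def dice_ratio(automatic, manual, label):
    """Dice overlap 2|A∩B| / (|A| + |B|) for one label in two same-shaped 2D label maps."""
    auto_count = manual_count = overlap = 0
    for auto_row, manual_row in zip(automatic, manual):
        for a, m in zip(auto_row, manual_row):
            auto_count += a == label
            manual_count += m == label
            overlap += a == label and m == label
    if auto_count + manual_count == 0:
        raise ValueError("label is absent from both maps")
    return 2.0 * overlap / (auto_count + manual_count)


if __name__ == "__main__":
    # Hypothetical codes: 0 = background, 1 = fat, 2 = glandular tissue.
    automatic = [[1, 1, 0], [0, 1, 2]]
    manual = [[1, 0, 0], [0, 1, 2]]
    print(dice_ratio(automatic, manual, 1), dice_ratio(automatic, manual, 2))
```

A ratio of 1.0 means the automated and manual regions coincide exactly; the per-tissue averages in the abstract are this quantity expressed as a percentage.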
Identification of AE Bursts by Classification of Physical and Statistical Parameters

International Nuclear Information System (INIS)

Mieza, J.I.; Oliveto, M.E.; Lopez Pumarega, M.I.; Armeite, M.; Ruzzante, J.E.; Piotrkowski, R.

2005-01-01

Physical and statistical parameters obtained with the Principal Components method, extracted from Acoustic Emission bursts coming from triaxial deformation tests, were analyzed. The samples came from seamless steel tubes used in the petroleum industry, and some of them were provided with a protective coating. The purpose of our work was to distinguish bursts originating in the breakage of the coating from those originating in damage mechanisms in the bulk steel matrix. Analysis was performed by statistical distributions, fractal analysis and clustering methods.
A Method of Spatial Mapping and Reclassification for High-Spatial-Resolution Remote Sensing Image Classification

Directory of Open Access Journals (Sweden)

Guizhou Wang

2013-01-01

This paper presents a new classification method for high-spatial-resolution remote sensing images based on a strategic mechanism of spatial mapping and reclassification. The proposed method includes four steps. First, the multispectral image is classified by a traditional pixel-based classification method (support vector machine). Second, the panchromatic image is subdivided by watershed segmentation. Third, the pixel-based multispectral image classification result is mapped to the panchromatic segmentation result based on a spatial mapping mechanism and the area dominant principle. During the mapping process, an area proportion threshold is set, and the regional property is defined as unclassified if the maximum area proportion does not surpass the threshold. Finally, unclassified regions are reclassified based on spectral information using the minimum distance to mean algorithm. Experimental results show that the classification method for high-spatial-resolution remote sensing images based on the spatial mapping mechanism and reclassification strategy can make use of both panchromatic and multispectral information, integrate the pixel- and object-based classification methods, and improve classification accuracy.
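The third and fourth steps above (area-dominant mapping with a threshold, then minimum-distance-to-mean reclassification) can be sketched as follows. This is a simplified illustration on flattened pixel lists; the function names, the threshold value, the class names and the spectral means are assumptions, not details from the paper.

```python
import math
from collections import Counter, defaultdict

UNCLASSIFIED = None


def map_labels_to_segments(pixel_labels, pixel_segments, threshold):
    """Assign each segment its dominant pixel class, or UNCLASSIFIED.

    A segment stays unclassified when the largest area proportion
    does not surpass `threshold`.
    """
    votes = defaultdict(Counter)
    for label, segment in zip(pixel_labels, pixel_segments):
        votes[segment][label] += 1
    result = {}
    for segment, counts in votes.items():
        label, count = counts.most_common(1)[0]
        proportion = count / sum(counts.values())
        result[segment] = label if proportion > threshold else UNCLASSIFIED
    return result


def minimum_distance_to_mean(spectrum, class_means):
    """Return the class whose mean spectrum is nearest (Euclidean) to `spectrum`."""
    return min(class_means, key=lambda name: math.dist(spectrum, class_means[name]))


if __name__ == "__main__":
    labels = ["water", "water", "water", "road", "road", "grass", "grass", "road"]
    segments = [1, 1, 1, 1, 2, 2, 2, 2]
    mapped = map_labels_to_segments(labels, segments, threshold=0.6)
    # Hypothetical two-band class means and region mean spectrum.
    means = {"grass": (0.25, 0.55), "road": (0.60, 0.60)}
    for segment, label in mapped.items():
        if label is UNCLASSIFIED:
            mapped[segment] = minimum_distance_to_mean((0.30, 0.50), means)
    print(mapped)
```

The threshold controls how much of the pixel-based result is trusted: a higher value sends more mixed segments to the spectral reclassification step.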
Applied Statistics Using SPSS, STATISTICA, MATLAB and R

CERN Document Server

De Sá, Joaquim P Marques

2007-01-01

This practical reference provides a comprehensive introduction and tutorial on the main statistical analysis topics, demonstrating their solution with the most common software packages. Intended for anyone needing to apply statistical analysis to a large variety of science and engineering problems, the book explains and shows how to use SPSS, MATLAB, STATISTICA and R for analysis such as data description, statistical inference, classification and regression, factor analysis, survival data and directional statistics. It concisely explains key concepts and methods, illustrated by practical examp
Evaluating Method Engineer Performance: an error classification and preliminary empirical study

Directory of Open Access Journals (Sweden)

Steven Kelly

1998-11-01

We describe an approach to empirically test the use of metaCASE environments to model methods. Both diagrams and matrices have been proposed as a means for presenting the methods. These different paradigms may have their own effects on how easily and well users can model methods. We extend Batra's classification of errors in data modelling to cover metamodelling, and use it to measure the performance of a group of metamodellers using either diagrams or matrices. The tentative results from this pilot study confirm the usefulness of the classification, and show some interesting differences between the paradigms.
Comparison of an automated classification system with an empirical classification of circulation patterns over the Pannonian basin, Central Europe

Science.gov (United States)

Maheras, Panagiotis; Tolika, Konstantia; Tegoulias, Ioannis; Anagnostopoulou, Christina; Szpirosz, Klicász; Károssy, Csaba; Makra, László

2018-04-01

The aim of the study is to compare the performance of two classification methods based on the atmospheric circulation types over the Pannonian basin in Central Europe. Relationships including seasonal occurrences and correlation coefficients, as well as comparative diagrams of the seasonal occurrences of the circulation types of the two classification systems, are also presented. When comparing the automated (objective) and empirical (subjective) classification methods, it was found that the frequency of the empirical anticyclonic (cyclonic) types is much higher (lower) than that of the automated anticyclonic (cyclonic) types on both an annual and a seasonal basis. The highest and statistically significant correlations between the circulation types of the two classification systems, as well as those between the cumulated seasonal anticyclonic and cyclonic types, occur in winter for both classifications, since the weather-influencing effect of the atmospheric circulation is the most prevalent in this season. Precipitation amounts in Budapest display a decreasing trend in accordance with the decrease in the occurrence of the automated cyclonic types. In contrast, the occurrence of the empirical cyclonic types displays an increasing trend. Some types in a given classification are usually accompanied by high ratios of certain types in the other classification.
Using methods from the data mining and machine learning literature for disease classification and prediction: A case study examining classification of heart failure sub-types

Science.gov (United States)

Austin, Peter C.; Tu, Jack V.; Ho, Jennifer E.; Levy, Daniel; Lee, Douglas S.

2014-01-01

Objective: Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines. Study design and setting: We compared the performance of these classification methods with those of conventional classification trees to classify patients with heart failure according to the following sub-types: heart failure with preserved ejection fraction (HFPEF) vs. heart failure with reduced ejection fraction (HFREF). We also compared the ability of these methods to predict the probability of the presence of HFPEF with that of conventional logistic regression. Results: We found that modern, flexible tree-based methods from the data mining literature offer substantial improvement in prediction and classification of heart failure sub-type compared to conventional classification and regression trees. However, conventional logistic regression had superior performance for predicting the probability of the presence of HFPEF compared to the methods proposed in the data mining literature. Conclusion: The use of tree-based methods offers superior performance over conventional classification and regression trees for predicting and classifying heart failure subtypes in a population-based sample of patients from Ontario. However, these methods do not offer substantial improvements over logistic regression for predicting the presence of HFPEF. PMID:23384592
Statistical limitations in functional neuroimaging. I. Non-inferential methods and statistical models.

Science.gov (United States)

Petersson, K M; Nichols, T E; Poline, J B; Holmes, A P

1999-01-01

Functional neuroimaging (FNI) provides experimental access to the intact living brain, making it possible to study higher cognitive functions in humans. In this review and in a companion paper in this issue, we discuss some common methods used to analyse FNI data. The emphasis in both papers is on assumptions and limitations of the methods reviewed. There are several methods available to analyse FNI data, indicating that none is optimal for all purposes. In order to make optimal use of the methods available it is important to know the limits of applicability. For the interpretation of FNI results it is also important to take into account the assumptions, approximations and inherent limitations of the methods used. This paper gives a brief overview of some non-inferential descriptive methods and common statistical models used in FNI. Issues relating to the complex problem of model selection are discussed. In general, proper model selection is a necessary prerequisite for the validity of the subsequent statistical inference. The non-inferential section describes methods that, combined with inspection of parameter estimates and other simple measures, can aid in the process of model selection and verification of assumptions. The section on statistical models covers approaches to global normalization and some aspects of univariate, multivariate, and Bayesian models. Finally, approaches to functional connectivity and effective connectivity are discussed. In the companion paper we review issues related to signal detection and statistical inference. PMID:10466149
Understanding common statistical methods, Part I: descriptive methods, probability, and continuous data.

Science.gov (United States)

Skinner, Carl G; Patel, Manish M; Thomas, Jerry D; Miller, Michael A

2011-01-01

Statistical methods are pervasive in medical research and general medical literature. Understanding general statistical concepts will enhance our ability to critically appraise the current literature and ultimately improve the delivery of patient care. This article intends to provide an overview of the common statistical methods relevant to medicine.
Cloud field classification based on textural features

Science.gov (United States)

Sengupta, Sailes Kumar

1989-01-01

An essential component in global climate research is accurate cloud cover and type determination. Of the two approaches to texture-based classification (statistical and structural), only the former is effective in the classification of natural scenes such as land, ocean, and atmosphere. In the statistical approach that was adopted, parameters characterizing the stochastic properties of the spatial distribution of grey levels in an image are estimated and then used as features for cloud classification. Two types of textural measures were used. One is based on the distribution of the grey level difference vector (GLDV), and the other on a set of textural features derived from the MaxMin cooccurrence matrix (MMCM). The GLDV method looks at the difference D of grey levels at pixels separated by a horizontal distance d and computes several statistics based on this distribution. These are then used as features in subsequent classification. The MaxMin textural features, on the other hand, are based on the MMCM, a matrix whose (I,J)th entry gives the relative frequency of occurrences of the grey level pair (I,J) that are consecutive and thresholded local extremes separated by a given pixel distance d. Textural measures are then computed based on this matrix in much the same manner as is done in texture computation using the grey level cooccurrence matrix. The database consists of 37 cloud field scenes from LANDSAT imagery using a near IR visible channel. The classification algorithm used is the well known Stepwise Discriminant Analysis. The overall accuracy was estimated by the percentage of correct classifications in each case. It turns out that both types of classifiers, at their best combination of features, and at any given spatial resolution, give approximately the same classification accuracy. A neural network based classifier with a feed forward architecture and a back propagation training algorithm is used to increase the classification accuracy, using these two classes
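A minimal sketch of the GLDV idea described above: collect the grey level differences D between pixels a horizontal distance d apart, form their distribution, and summarize it with a few statistics. The abstract does not list which statistics were computed or whether D is signed; the absolute difference and the four statistics below (mean, contrast, angular second moment, entropy) are common choices assumed here for illustration.

```python
import math
from collections import Counter


def gldv_features(image, d=1):
    """Texture statistics of absolute grey level differences at horizontal distance d.

    `image` is a list of rows of integer grey levels.
    """
    counts = Counter()
    for row in image:
        for col in range(len(row) - d):
            counts[abs(row[col] - row[col + d])] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("image is too narrow for the chosen distance d")
    # Estimated probability of each difference value D.
    probs = {diff: n / total for diff, n in counts.items()}
    return {
        "mean": sum(diff * p for diff, p in probs.items()),
        "contrast": sum(diff * diff * p for diff, p in probs.items()),
        "angular_second_moment": sum(p * p for p in probs.values()),
        "entropy": -sum(p * math.log2(p) for p in probs.values()),
    }


if __name__ == "__main__":
    # Tiny hypothetical image, for illustration only.
    print(gldv_features([[0, 1, 3], [2, 2, 5]], d=1))
```

Each scene then contributes one such feature vector per distance d, which is the kind of input a discriminant analysis classifier expects.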
Statistical models and methods for reliability and survival analysis

CERN Document Server

Couallier, Vincent; Huber-Carol, Catherine; Mesbah, Mounir; Limnios, Nikolaos; Gerville-Reache, Leo

2013-01-01

Statistical Models and Methods for Reliability and Survival Analysis brings together contributions by specialists in statistical theory as they discuss their applications providing up-to-date developments in methods used in survival analysis, statistical goodness of fit, stochastic processes for system reliability, amongst others. Many of these are related to the work of Professor M. Nikulin in statistics over the past 30 years. The authors gather together various contributions with a broad array of techniques and results, divided into three parts - Statistical Models and Methods, Statistical
Classification-based comparison of pre-processing methods for interpretation of mass spectrometry generated clinical datasets

Directory of Open Access Journals (Sweden)

Hoefsloot Huub CJ

2009-05-01

Background: Mass spectrometry is increasingly being used to discover proteins or protein profiles associated with disease. Experimental design of mass-spectrometry studies has come under close scrutiny and the importance of strict protocols for sample collection is now understood. However, the question of how best to process the large quantities of data generated is still unanswered. Main challenges for the analysis are the choice of proper pre-processing and classification methods. While these two issues have been investigated in isolation, we propose to use the classification of patient samples as a clinically relevant benchmark for the evaluation of pre-processing methods. Results: Two in-house generated clinical SELDI-TOF MS datasets are used in this study as an example of high throughput mass-spectrometry data. We perform a systematic comparison of two commonly used pre-processing methods as implemented in Ciphergen ProteinChip Software and in the Cromwell package. With respect to reproducibility, Ciphergen and Cromwell pre-processing are largely comparable. We find that the overlap between peaks detected by either Ciphergen ProteinChip Software or Cromwell is large. This is especially the case for the more stringent peak detection settings. Moreover, similarity of the estimated intensities between matched peaks is high. We evaluate the pre-processing methods using five different classification methods. Classification is done in a double cross-validation protocol using repeated random sampling to obtain an unbiased estimate of classification accuracy. No pre-processing method significantly outperforms the other for all peak detection settings evaluated. Conclusion: We use classification of patient samples as a clinically relevant benchmark for the evaluation of pre-processing methods. Both pre-processing methods lead to similar classification results on an ovarian cancer and a Gaucher disease dataset. However, the settings for pre
Statistical learning in high energy and astrophysics

International Nuclear Information System (INIS)

Zimmermann, J.

2005-01-01

This thesis studies the performance of statistical learning methods in high energy and astrophysics, where they have become a standard tool in physics analysis. They are used to perform complex classification or regression by intelligent pattern recognition. This kind of artificial intelligence is achieved by the principle of "learning from examples": the examples describe the relationship between detector events and their classification. The application of statistical learning methods is motivated either by the lack of knowledge about this relationship or by tight time restrictions. In the first case, learning from examples is the only possibility, since no theory is available that would allow an algorithm to be built in the classical way. In the second case, a classical algorithm exists but is too slow to cope with the time restrictions. It is therefore replaced by a pattern recognition machine which implements a fast statistical learning method. But even in applications where some kind of classical algorithm had done a good job, statistical learning methods have proved convincing through their remarkable performance. This thesis gives an introduction to statistical learning methods and how they are applied correctly in physics analysis. Their flexibility and high performance are discussed by showing intriguing results from high energy and astrophysics. These include the development of highly efficient triggers, powerful purification of event samples and exact reconstruction of hidden event parameters. The presented studies also show typical problems in the application of statistical learning methods. They should be only the second choice in all cases where an algorithm based on prior knowledge exists. Some examples in physics analyses are found where these methods are not used in the right way, leading either to wrong predictions or to bad performance. Physicists also often hesitate to profit from these methods because they fear that statistical learning methods cannot be controlled in a
Statistical learning in high energy and astrophysics

Energy Technology Data Exchange (ETDEWEB)

Zimmermann, J.

2005-06-16

This thesis studies the performance of statistical learning methods in high energy and astrophysics, where they have become a standard tool in physics analysis. They are used to perform complex classification or regression by intelligent pattern recognition. This kind of artificial intelligence is achieved by the principle of "learning from examples": the examples describe the relationship between detector events and their classification. The application of statistical learning methods is motivated either by the lack of knowledge about this relationship or by tight time restrictions. In the first case, learning from examples is the only possibility, since no theory is available that would allow an algorithm to be built in the classical way. In the second case, a classical algorithm exists but is too slow to cope with the time restrictions. It is therefore replaced by a pattern recognition machine which implements a fast statistical learning method. But even in applications where some kind of classical algorithm had done a good job, statistical learning methods have proved convincing through their remarkable performance. This thesis gives an introduction to statistical learning methods and how they are applied correctly in physics analysis. Their flexibility and high performance are discussed by showing intriguing results from high energy and astrophysics. These include the development of highly efficient triggers, powerful purification of event samples and exact reconstruction of hidden event parameters. The presented studies also show typical problems in the application of statistical learning methods. They should be only the second choice in all cases where an algorithm based on prior knowledge exists. Some examples in physics analyses are found where these methods are not used in the right way, leading either to wrong predictions or to bad performance. Physicists also often hesitate to profit from these methods because they fear that statistical learning methods cannot
Discriminant forest classification method and system

Science.gov (United States)

Chen, Barry Y.; Hanley, William G.; Lemmond, Tracy D.; Hiller, Lawrence J.; Knapp, David A.; Mugge, Marshall J.

2012-11-06

A hybrid machine learning methodology and system for classification that combines classical random forest (RF) methodology with discriminant analysis (DA) techniques to provide enhanced classification capability. A DA technique which uses feature measurements of an object to predict its class membership, such as linear discriminant analysis (LDA) or Andersen-Bahadur linear discriminant technique (AB), is used to split the data at each node in each of its classification trees to train and grow the trees and the forest. When training is finished, a set of n DA-based decision trees of a discriminant forest is produced for use in predicting the classification of new samples of unknown class.

Improving Hyperspectral Image Classification Method for Fine Land Use Assessment Application Using Semisupervised Machine Learning

Directory of Open Access Journals (Sweden)

Chunyang Wang

2015-01-01

The study of land use/cover can reflect changing patterns of population, economy, agricultural structure adjustment, policy, and traffic, and can provide better service for regional economic development and urban evolution. Fine land use/cover assessment using hyperspectral image classification is a growing focus in many fields. Semisupervised learning, which uses a large number of unlabeled samples together with a small number of labeled samples to improve classification and prediction accuracy effectively, has become a new research direction. In this paper, we propose improving fine land use/cover assessment with a semisupervised hyperspectral classification method. The test analysis of the study area showed that the semisupervised classification method could improve the overall classification precision and the objective assessment of land use/cover results.
Secondary structure classification of amino-acid sequences using state-space modeling

OpenAIRE

Brunnert, Marcus; Krahnke, Tillmann; Urfer, Wolfgang

2001-01-01

The secondary structure classification of amino acid sequences can be carried out by a statistical analysis of sequence and structure data using state-space models. Aiming at this classification, a modified filter algorithm programmed in S is applied to data of three proteins. The application leads to correct classifications of two proteins even when using relatively simple estimation methods for the parameters of the state-space models. Furthermore, it has been shown that the assumed initial...
Prediction and classification of respiratory motion

CERN Document Server

Lee, Suk Jin

2014-01-01

This book describes recent radiotherapy technologies including tools for measuring target position during radiotherapy and tracking-based delivery systems. This book presents a customized prediction of respiratory motion with clustering from multiple patient interactions. The proposed method contributes to the improvement of patient treatments by considering breathing patterns for accurate dose calculation in radiotherapy systems. Real-time tumor-tracking, where the prediction of irregularities becomes relevant, has yet to be clinically established. The statistical quantitative modeling for irregular breathing classification, in which commercial respiration traces are retrospectively categorized into several classes based on breathing pattern, is discussed as well. The proposed statistical classification may provide clinical advantages to adjust the dose rate before and during the external beam radiotherapy for minimizing the safety margin. In the first chapter following the Introduction to this book, we...
Measuring methods and classification in the musculoskeletal radiology

International Nuclear Information System (INIS)

Waldt, Simone; Eiber, Matthias; Woertler, Klaus

2011-01-01

The book on measuring methods and classification in the musculoskeletal radiology covers the following topics: legs; hip joint; knee joint; foot; shoulder joint; elbow joint; wrist joint; spinal column; craniocervical transition region and cervical spine; muscular-skeletal carcinomas; osteoporosis; arthrosis; articular cartilage; hemophilia; rheumatic arthritis; muscular injuries; skeleton age.
HEp-2 cell image classification method based on very deep convolutional networks with small datasets

Science.gov (United States)

Lu, Mengchi; Gao, Long; Guo, Xifeng; Liu, Qiang; Yin, Jianping

2017-07-01

Classification of staining patterns in Human Epithelial-2 (HEp-2) cell images has been widely used to identify autoimmune diseases through the anti-nuclear antibody (ANA) test in the Indirect Immunofluorescence (IIF) protocol. Because the manual test is time consuming, subjective and labor intensive, image-based Computer Aided Diagnosis (CAD) systems for HEp-2 cell classification are being developed. However, recently proposed methods mostly rely on manual feature extraction and achieve low accuracy. Besides, the available benchmark datasets are small, which makes them poorly suited to deep learning methods. This issue directly affects the accuracy of cell classification even after data augmentation. To address these issues, this paper presents a high-accuracy automatic HEp-2 cell classification method for small datasets that uses very deep convolutional networks (VGGNet). Specifically, the proposed method consists of three main phases, namely image preprocessing, feature extraction and classification. Moreover, an improved VGGNet is presented to address the challenges of small-scale datasets. Experimental results on two benchmark datasets demonstrate that the proposed method achieves superior accuracy compared with existing methods.
A hierarchical approach of hybrid image classification for land use and land cover mapping

Directory of Open Access Journals (Sweden)

Rahdari Vahid

2018-01-01

Remote sensing data analysis can provide thematic maps describing land use and land cover (LULC) in a short period. Using a proper image classification method for an area is important to overcome the possible limitations of satellite imagery when producing land-use and land-cover maps. In the present study, a hierarchical hybrid image classification method was used to produce LULC maps using Landsat Thematic Mapper (TM) for 1998 and Operational Land Imager (OLI) for 2016. Images were classified using the proposed hybrid image classification method, a vegetation cover crown percentage map derived from the normalized difference vegetation index, Fisher supervised classification and object-based image classification methods. Accuracy assessment results showed that the hybrid classification method produced maps with total accuracy up to 84 percent and a kappa statistic of 0.81. Results of this study showed that the proposed classification method worked better with the OLI sensor than with TM. Although OLI has a higher radiometric resolution than TM, the LULC map produced using TM is almost as accurate as the one from OLI, because of the LULC definitions and image classification methods used.
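The overall accuracy and kappa statistic quoted in accuracy assessments like the one above are both derived from a confusion matrix. The following is a minimal sketch of that calculation, assuming the usual definition of Cohen's kappa; the function name and the example counts are hypothetical and are not taken from the study.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples whose reference class is i
    and whose mapped class is j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n
    # Chance agreement: product of row and column marginals, summed over classes.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Made-up 3-class matrix, for illustration only.
example = [
    [50, 3, 2],
    [4, 40, 6],
    [1, 5, 39],
]
acc, kappa = overall_accuracy_and_kappa(example)
print(f"overall accuracy = {acc:.3f}, kappa = {kappa:.3f}")
```

Kappa is lower than overall accuracy because it discounts the agreement expected by chance from the class proportions alone.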
Statistical Methods for Stochastic Differential Equations

CERN Document Server

Kessler, Mathieu; Sorensen, Michael

2012-01-01

The seventh volume in the SemStat series, Statistical Methods for Stochastic Differential Equations presents current research trends and recent developments in statistical methods for stochastic differential equations. Written to be accessible to both new students and seasoned researchers, each self-contained chapter starts with introductions to the topic at hand and builds gradually towards discussing recent research. The book covers Wiener-driven equations as well as stochastic differential equations with jumps, including continuous-time ARMA processes and COGARCH processes. It presents a sp
Simple statistical methods for software engineering data and patterns

CERN Document Server

Pandian, C Ravindranath

2015-01-01

Although there are countless books on statistics, few are dedicated to the application of statistical methods to software engineering. Simple Statistical Methods for Software Engineering: Data and Patterns fills that void. Instead of delving into overly complex statistics, the book details simpler solutions that are just as effective and connect with the intuition of problem solvers. Sharing valuable insights into software engineering problems and solutions, the book not only explains the required statistical methods, but also provides many examples, review questions, and case studies that prov
Diffeomorphic Statistical Deformation Models

DEFF Research Database (Denmark)

Hansen, Michael Sass; Hansen, Mads/Fogtman; Larsen, Rasmus

2007-01-01

In this paper we present a new method for constructing diffeomorphic statistical deformation models in arbitrary dimensional images with a nonlinear generative model and a linear parameter space. Our deformation model is a modified version of the diffeomorphic model introduced by Cootes et al....... The modifications ensure that no boundary restriction has to be enforced on the parameter space to prevent folds or tears in the deformation field. For straightforward statistical analysis, principal component analysis and sparse methods, we assume that the parameters for a class of deformations lie on a linear...... with ground truth in form of manual expert annotations, and compared to Cootes's model. We anticipate applications in unconstrained diffeomorphic synthesis of images, e.g. for tracking, segmentation, registration or classification purposes....
Application of blended learning in teaching statistical methods

Directory of Open Access Journals (Sweden)

Barbara Dębska

2012-12-01

The paper presents the application of a hybrid method (blended learning, linking traditional education with on-line education) to teach selected problems of mathematical statistics. This includes teaching the application of mathematical statistics to evaluate laboratory experimental results. An on-line statistics course was developed to form an integral part of the module 'methods of statistical evaluation of experimental results'. The course complies with the principles outlined in the Polish National Framework of Qualifications with respect to the scope of knowledge, skills and competencies that students should have acquired at course completion. The paper presents the structure of the course and the educational content provided through multimedia lessons made accessible on the Moodle platform. Students' test results from courses taught with the traditional method and from courses taught with the hybrid method were compared and discussed to evaluate the effectiveness of the hybrid method relative to the traditional method.
Improving Land Use/Land Cover Classification by Integrating Pixel Unmixing and Decision Tree Methods

Directory of Open Access Journals (Sweden)

Chao Yang

2017-11-01

Decision tree classification is one of the most efficient methods for obtaining land use/land cover (LULC) information from remotely sensed imagery. However, traditional decision tree classification methods cannot effectively eliminate the influence of mixed pixels. This study aimed to integrate pixel unmixing and decision tree classification to improve LULC classification by removing mixed pixel influence. The abundance and minimum noise fraction (MNF) results that were obtained from mixed pixel decomposition were added to decision tree multi-features using a three-dimensional (3D) terrain model, which was created using an image fusion digital elevation model (DEM), to select training samples (ROIs) and improve ROI separability. A Landsat-8 OLI image of the Yunlong Reservoir Basin in Kunming was used to test this proposed method. Study results showed that the Kappa coefficient and the overall accuracy of integrated pixel unmixing and decision tree method increased by 0.093% and 10%, respectively, as compared with the original decision tree method. This proposed method could effectively eliminate the influence of mixed pixels and improve the accuracy in complex LULC classifications.
Semi-Automated Classification of Seafloor Data Collected on the Delmarva Inner Shelf

Science.gov (United States)

Sweeney, E. M.; Pendleton, E. A.; Brothers, L. L.; Mahmud, A.; Thieler, E. R.

2017-12-01

We tested automated classification methods on acoustic bathymetry and backscatter data collected by the U.S. Geological Survey (USGS) and National Oceanic and Atmospheric Administration (NOAA) on the Delmarva inner continental shelf to efficiently and objectively identify sediment texture and geomorphology. Automated classification techniques are generally less subjective and take significantly less time than manual classification methods. We used a semi-automated process combining unsupervised and supervised classification techniques to characterize seafloor based on bathymetric slope and relative backscatter intensity. Statistical comparison of our automated classification results with those of a manual classification conducted on a subset of the acoustic imagery indicates that our automated method was highly accurate (95% total accuracy and 93% Kappa). Our methods resolve sediment ridges, zones of flat seafloor and areas of high and low backscatter. We compared our classification scheme with mean grain size statistics of samples collected in the study area and found that strong correlations between backscatter intensity and sediment texture exist. High backscatter zones are associated with the presence of gravel and shells mixed with sand, and low backscatter areas are primarily clean sand or sand mixed with mud. Slope classes further elucidate textural and geomorphologic differences in the seafloor, such that steep slopes (>0.35°) with high backscatter are most often associated with the updrift side of sand ridges and bedforms, whereas low slope with high backscatter correspond to coarse lag or shell deposits. Low backscatter and high slopes are most often found on the downdrift side of ridges and bedforms, and low backscatter and low slopes identify swale areas and sand sheets. We found that poor acoustic data quality was the most significant cause of inaccurate classification results, which required additional user input to mitigate. Our method worked well
A decision-theoretic approach for segmental classification

OpenAIRE

Yau, Christopher; Holmes, Christopher C.

2013-01-01

This paper is concerned with statistical methods for the segmental classification of linear sequence data where the task is to segment and classify the data according to an underlying hidden discrete state sequence. Such analysis is commonplace in the empirical sciences including genomics, finance and speech processing. In particular, we are interested in answering the following question: given data $y$ and a statistical model $\pi(x,y)$ of the hidden states $x$, what should we report as the ...
Geospatial Method for Computing Supplemental Multi-Decadal U.S. Coastal Land-Use and Land-Cover Classification Products, Using Landsat Data and C-CAP Products

Science.gov (United States)

Spruce, J. P.; Smoot, James; Ellis, Jean; Hilbert, Kent; Swann, Roberta

2012-01-01

This paper discusses the development and implementation of a geospatial data processing method and multi-decadal Landsat time series for computing general coastal U.S. land-use and land-cover (LULC) classifications and change products consisting of seven classes (water, barren, upland herbaceous, non-woody wetland, woody upland, woody wetland, and urban). Use of this approach extends the observational period of the NOAA-generated Coastal Change and Analysis Program (C-CAP) products by almost two decades, assuming the availability of one cloud-free Landsat scene from any season for each targeted year. The Mobile Bay region in Alabama was used as a study area to develop, demonstrate, and validate the method, which was applied to derive LULC products for nine dates at approximately five-year intervals across a 34-year time span, using single dates of data for each classification in which forests were in leaf-on, leaf-off, or mixed senescent conditions. Classifications were computed and refined using decision rules in conjunction with unsupervised classification of Landsat data and C-CAP value-added products. Each classification's overall accuracy was assessed by comparing stratified random locations to available reference data, including higher spatial resolution satellite and aerial imagery, field survey data, and raw Landsat RGBs. Overall classification accuracies ranged from 83 to 91% with overall Kappa statistics ranging from 0.78 to 0.89. The accuracies are comparable to those from similar, generalized LULC products derived from C-CAP data. The Landsat MSS-based LULC product accuracies are similar to those from Landsat TM or ETM+ data. Accurate classifications were computed for all nine dates, yielding effective results regardless of season. This classification method yielded products that were used to compute LULC change products via additive GIS overlay techniques.
Seeing It All: Evaluating Supervised Machine Learning Methods for the Classification of Diverse Otariid Behaviours.

Directory of Open Access Journals (Sweden)

Monique A Ladds

Constructing activity budgets for marine animals when they are at sea and cannot be directly observed is challenging, but recent advances in bio-logging technology offer solutions to this problem. Accelerometers can potentially identify a wide range of behaviours for animals based on unique patterns of acceleration. However, when analysing data derived from accelerometers, there are many statistical techniques available which, when applied to different data sets, produce different classification accuracies. We investigated a selection of supervised machine learning methods for interpreting behavioural data from captive otariids (fur seals and sea lions). We conducted controlled experiments with 12 seals, where their behaviours were filmed while they were wearing 3-axis accelerometers. From video we identified 26 behaviours that could be grouped into one of four categories (foraging, resting, travelling and grooming) representing key behaviour states for wild seals. We used data from 10 seals to train four predictive classification models: stochastic gradient boosting (GBM), random forests, support vector machine using four different kernels and a baseline model: penalised logistic regression. We then took the best parameters from each model and cross-validated the results on the two seals unseen so far. We also investigated the influence of feature statistics (describing some characteristic of the seal), testing the models both with and without these. Cross-validation accuracies were lower than training accuracy, but the SVM with a polynomial kernel was still able to classify seal behaviour with high accuracy (>70%). Adding feature statistics improved accuracies across all models tested. Most categories of behaviour (resting, grooming and feeding) were predicted with reasonable accuracy (52-81%) by the SVM, while travelling was poorly categorised (31-41%). These results show that model selection is important when classifying behaviour and that by using
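The evaluation design described above, training on 10 animals and testing on 2 animals the models have never seen, amounts to splitting the data by individual rather than by observation. A minimal sketch of such a split follows; the record layout and the field name `animal_id` are hypothetical and only illustrate the idea.

```python
def split_by_individual(records, held_out_ids):
    """Split records so that no held-out individual appears in training."""
    held_out = set(held_out_ids)
    train = [r for r in records if r["animal_id"] not in held_out]
    test = [r for r in records if r["animal_id"] in held_out]
    return train, test


# Hypothetical records: 12 animals, 3 labelled accelerometer windows each.
records = [
    {"animal_id": animal, "window": w, "behaviour": "resting"}
    for animal in range(12)
    for w in range(3)
]
train, test = split_by_individual(records, held_out_ids=[10, 11])
print(len(train), len(test))
```

Splitting by individual keeps windows from the same animal out of both sets, which would otherwise tend to inflate the measured accuracy.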
Development of a Research Methods and Statistics Concept Inventory

Science.gov (United States)

Veilleux, Jennifer C.; Chapman, Kate M.

2017-01-01

Research methods and statistics are core courses in the undergraduate psychology major. To assess learning outcomes, it would be useful to have a measure that assesses research methods and statistical literacy beyond course grades. In two studies, we developed and provided initial validation results for a research methods and statistical knowledge…
Statistical error estimation of the Feynman-α method using the bootstrap method

International Nuclear Information System (INIS)

Endo, Tomohiro; Yamamoto, Akio; Yagi, Takahiro; Pyeon, Cheol Ho

2016-01-01

The applicability of the bootstrap method to estimating the statistical error of the Feynman-α method, one of the subcritical measurement techniques based on reactor noise analysis, is investigated. In the Feynman-α method, the statistical error can be simply estimated from multiple measurements of reactor noise; however, repeating the measurements requires additional measurement time. Using a resampling technique called the 'bootstrap method', the standard deviation and confidence interval of measurement results obtained by the Feynman-α method can be estimated as the statistical error using only a single measurement of reactor noise. In order to validate our proposed technique, we carried out a passive measurement of reactor noise without any external source, i.e. with only the inherent neutron source from spontaneous fission and (α,n) reactions in nuclear fuels, at the Kyoto University Criticality Assembly. Through the actual measurement, it is confirmed that the bootstrap method is applicable to approximately estimate the statistical error of measurement results obtained by the Feynman-α method. (author)
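The resampling idea in this abstract can be illustrated with a generic bootstrap. The sketch below assumes the statistic of interest is the variance-to-mean ratio of gate counts minus one and that counts can be resampled independently with replacement; both are simplifying assumptions for illustration, and the authors' actual resampling procedure may differ. The function names and the synthetic counts are hypothetical.

```python
import random
import statistics


def feynman_y(counts):
    """Variance-to-mean ratio of the gate counts, minus one."""
    return statistics.variance(counts) / statistics.fmean(counts) - 1.0


def bootstrap_error(counts, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Bootstrap standard deviation and percentile confidence interval.

    Resamples `counts` with replacement (assumes independent samples).
    """
    rng = random.Random(seed)
    n = len(counts)
    replicates = sorted(
        statistic(rng.choices(counts, k=n)) for _ in range(n_resamples)
    )
    sd = statistics.stdev(replicates)
    lower = replicates[int((alpha / 2) * n_resamples)]
    upper = replicates[int((1 - alpha / 2) * n_resamples) - 1]
    return sd, (lower, upper)


# Synthetic counts from a single "measurement" (illustration only).
data_rng = random.Random(1)
counts = [data_rng.randint(5, 15) for _ in range(200)]
sd, (lower, upper) = bootstrap_error(counts, feynman_y)
print(f"Y = {feynman_y(counts):.3f}, bootstrap SD = {sd:.3f}, "
      f"95% CI = [{lower:.3f}, {upper:.3f}]")
```

The point of the technique is that the spread of the replicates stands in for the spread that repeated measurements would otherwise have to provide.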
Application of Classification Methods for Forecasting Mid-Term Power Load Patterns

Science.gov (United States)

Piao, Minghao; Lee, Heon Gyu; Park, Jin Hyoung; Ryu, Keun Ho

An automated methodology based on data mining techniques is presented for the prediction of customer load patterns in long-duration load profiles. The proposed approach consists of three stages: (i) data preprocessing: noise and outliers are removed and continuous attribute-valued features are transformed to discrete values; (ii) cluster analysis: k-means clustering is used to create load pattern classes and the representative load profiles for each class; and (iii) classification: several supervised learning methods are evaluated in order to select a suitable prediction method. According to the proposed methodology, power load measured by an AMR (automatic meter reading) system, as well as customer indexes, were used as inputs for clustering. The output of clustering was the classification of representative load profiles (or classes). In order to evaluate the forecasting of load patterns, several classification methods were applied to a set of high-voltage customers of the Korean power system, with the class labels derived from clustering and other features used as input to produce classifiers. Lastly, the results of our experiments are presented.
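The three stages described above can be sketched end to end on toy data. The example below is only an illustration under stated assumptions: the load profiles, the contract-demand feature, the bin edges and the choice of a 1-nearest-neighbour classifier are all hypothetical, and the abstract does not say which supervised methods or features the authors actually used.

```python
import math


def discretize(value, edges):
    """Stage (i): map a continuous value to the index of its bin."""
    return sum(value >= e for e in edges)


def kmeans(points, k, n_iter=20):
    """Stage (ii): plain Lloyd's k-means, seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def nearest_neighbour(train_x, train_y, x):
    """Stage (iii): 1-nearest-neighbour prediction of the cluster label."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return train_y[best]


# Hypothetical normalised load profiles (4 time slots per customer).
profiles = [
    [0.2, 0.9, 1.0, 0.3],  # daytime peak
    [0.9, 0.2, 0.1, 1.0],  # night-time peak
    [0.1, 1.0, 0.9, 0.2],
    [1.0, 0.1, 0.2, 0.9],
    [0.3, 0.8, 0.9, 0.2],
    [0.8, 0.3, 0.2, 0.8],
]
# Hypothetical customer index: contract demand in kW, binned into 3 levels.
EDGES = [300, 1000]
contract_kw = [120, 900, 150, 1100, 90, 950]
features = [[discretize(kw, EDGES)] for kw in contract_kw]

labels, centroids = kmeans(profiles, k=2)


def predict_load_class(kw):
    """Predict the load-pattern class of a new customer from its contract demand."""
    return nearest_neighbour(features, labels, [discretize(kw, EDGES)])


print(labels, predict_load_class(1000), predict_load_class(100))
```

The cluster labels from stage (ii) become the training targets in stage (iii), so a new customer can be assigned a representative load profile from index features alone.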
Statistical Models and Methods for Lifetime Data

CERN Document Server

Lawless, Jerald F

2011-01-01

Praise for the First Edition: "An indispensable addition to any serious collection on lifetime data analysis and . . . a valuable contribution to the statistical literature. Highly recommended . . ." -Choice. "This is an important book, which will appeal to statisticians working on survival analysis problems." -Biometrics. "A thorough, unified treatment of statistical models and methods used in the analysis of lifetime data . . . this is a highly competent and agreeable statistical textbook." -Statistics in Medicine. The statistical analysis of lifetime or response time data is a key tool in engineering,
Hyperspectral Image Classification Using Discriminative Dictionary Learning

International Nuclear Information System (INIS)

Zongze, Y; Hao, S; Kefeng, J; Huanxin, Z

2014-01-01

The hyperspectral image (HSI) processing community has witnessed a surge of papers focusing on the use of sparse priors for effective HSI classification. Sparse-representation-based HSI classification has two phases: sparse coding with an over-complete dictionary, and classification. In this paper, we first apply a novel Fisher discriminative dictionary learning method, which captures the relative differences between classes. The competitive selection strategy ensures that atoms in the resulting over-complete dictionary are the most discriminative. Secondly, motivated by the assumption that spatially adjacent samples are statistically related and often belong to the same material (same class), we propose a majority voting scheme incorporating contextual information to predict the category label. Experimental results show that the proposed method effectively strengthens the relative discrimination of the constructed dictionary and that, combined with the majority voting scheme, it generally achieves improved prediction performance.
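The contextual step in this abstract rests on the idea that neighbouring pixels usually share a class. A generic way to express that is a majority vote over a small spatial window, sketched below. This is an assumption-laden illustration, not the authors' scheme: the square window, its radius and the tie-breaking rule are all choices made here for the example.

```python
from collections import Counter


def majority_vote(label_map, radius=1):
    """Relabel each pixel with the most common label in its square neighbourhood.

    Ties are resolved in favour of the label encountered first in the window.
    """
    rows, cols = len(label_map), len(label_map[0])
    smoothed = [row[:] for row in label_map]
    for r in range(rows):
        for c in range(cols):
            window = [
                label_map[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            smoothed[r][c] = Counter(window).most_common(1)[0][0]
    return smoothed


# Made-up per-pixel predictions with one isolated outlier in the centre.
noisy = [
    [1, 1, 1],
    [1, 2, 1],
    [1, 1, 1],
]
print(majority_vote(noisy))
```

Votes are always read from the original map rather than the partially smoothed one, so the result does not depend on the order in which pixels are visited.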

Statistical methods in spatial genetics

DEFF Research Database (Denmark)

Guillot, Gilles; Leblois, Raphael; Coulon, Aurelie

2009-01-01

The joint analysis of spatial and genetic data is rapidly becoming the norm in population genetics. More and more studies explicitly describe and quantify the spatial organization of genetic variation and try to relate it to underlying ecological processes. As it has become increasingly difficult...... to keep abreast with the latest methodological developments, we review the statistical toolbox available to analyse population genetic data in a spatially explicit framework. We mostly focus on statistical concepts but also discuss practical aspects of the analytical methods, highlighting not only...
Visuanimation in statistics

KAUST Repository

Genton, Marc G.

2015-04-14

This paper explores the use of visualization through animations, coined visuanimation, in the field of statistics. In particular, it illustrates the embedding of animations in the paper itself and the storage of larger movies in the online supplemental material. We present results from statistics research projects using a variety of visuanimations, ranging from exploratory data analysis of image data sets to spatio-temporal extreme event modelling; these include a multiscale analysis of classification methods, the study of the effects of a simulated explosive volcanic eruption and an emulation of climate model output. This paper serves as an illustration of visuanimation for future publications in Stat. Copyright © 2015 John Wiley & Sons, Ltd.
Pixel-Wise Classification Method for High Resolution Remote Sensing Imagery Using Deep Neural Networks

Directory of Open Access Journals (Sweden)

Rui Guo

2018-03-01

This paper presents a novel deep neural network method for classifying high spatial resolution remote sensing imagery. Deep learning methods, such as a fully convolutional network (FCN) model, achieve state-of-the-art performance in natural image semantic segmentation when provided with large-scale datasets and respective labels. To use data efficiently in the training stage, we first pre-segment training images and their labels into small patches as supplements of training data using graph-based segmentation and the selective search method. Subsequently, FCN with atrous convolution is used to perform pixel-wise classification. In the testing stage, post-processing with fully connected conditional random fields (CRFs) is used to refine results. Extensive experiments based on the Vaihingen dataset demonstrate that our method performs better than the reference state-of-the-art networks when applied to high-resolution remote sensing imagery classification.
Why is the Diagnostic and Statistical Manual of Mental Disorders so hard to revise? Path-dependence and "lock-in" in classification.

Science.gov (United States)

Cooper, Rachel

2015-06-01

The latest edition of the Diagnostic and Statistical Manual of Mental Disorders, the D.S.M.-5, was published in May 2013. In the lead up to publication, radical changes to the classification were anticipated; there was widespread dissatisfaction with the previous edition and it was accepted that a "paradigm shift" might be required. In the end, however, and despite huge efforts at revision, the published D.S.M.-5 differs far less than originally envisaged from its predecessor. This paper considers why it is that revising the D.S.M. has become so difficult. The D.S.M. is such an important classification that this question is worth asking in its own right. The case of the D.S.M. can also serve as a study for considering stasis in classification more broadly; why and how can classifications become resistant to change? I suggest that classifications like the D.S.M. can be thought of as forming part of the infrastructure of science, and have much in common with material infrastructure. In particular, as with material technologies, it is possible for "path dependent" development to cause a sub-optimal classification to become "locked in" and hard to replace. Copyright © 2015 Elsevier Ltd. All rights reserved.
Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

Directory of Open Access Journals (Sweden)

Santana Isabel

2011-08-01

Background: Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but presently has limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning, such as Neural Networks, Support Vector Machines and Random Forests, can improve the accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees, and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. Results: Press' Q test showed that all classifiers performed better than chance alone (p ... Conclusions: When taking into account sensitivity, specificity and overall classification accuracy, Random Forests and Linear Discriminant Analysis rank first among all the classifiers tested in the prediction of dementia using several neuropsychological tests. These methods may be used to improve the accuracy, sensitivity and specificity of dementia predictions from neuropsychological testing.
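Several of the performance measures named in this abstract follow directly from a two-class confusion table. The sketch below computes sensitivity, specificity, overall accuracy and Press's Q, using the form of Q commonly given in multivariate-analysis textbooks, Q = (N - nK)^2 / (N(K - 1)), where N is the sample size, n the number of correct classifications and K the number of groups. The function names and the counts are hypothetical and are not results from the study.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and overall accuracy from a 2x2 confusion table."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def press_q(n_total, n_correct, n_groups):
    """Press's Q; compared against a chi-square critical value with 1 df."""
    return (n_total - n_correct * n_groups) ** 2 / (n_total * (n_groups - 1))


# Made-up counts for illustration only.
sens, spec, acc = binary_metrics(tp=40, fn=10, tn=35, fp=15)
q = press_q(n_total=100, n_correct=75, n_groups=2)
print(sens, spec, acc, q)
```

A Q value above the chi-square critical value (3.84 at the 0.05 level with one degree of freedom) indicates classification better than chance.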
Statistical learning methods in high-energy and astrophysics analysis

Energy Technology Data Exchange (ETDEWEB)

Zimmermann, J. [Forschungszentrum Juelich GmbH, Zentrallabor fuer Elektronik, 52425 Juelich (Germany) and Max-Planck-Institut fuer Physik, Foehringer Ring 6, 80805 Munich (Germany)]. E-mail: zimmerm@mppmu.mpg.de; Kiesling, C. [Max-Planck-Institut fuer Physik, Foehringer Ring 6, 80805 Munich (Germany)

2004-11-21

We discuss several popular statistical learning methods used in high-energy- and astro-physics analysis. After a short motivation for statistical learning we present the most popular algorithms and discuss several examples from current research in particle- and astro-physics. The statistical learning methods are compared with each other and with standard methods for the respective application.
Statistical learning methods in high-energy and astrophysics analysis

International Nuclear Information System (INIS)

Zimmermann, J.; Kiesling, C.

2004-01-01

We discuss several popular statistical learning methods used in high-energy- and astro-physics analysis. After a short motivation for statistical learning we present the most popular algorithms and discuss several examples from current research in particle- and astro-physics. The statistical learning methods are compared with each other and with standard methods for the respective application
A preliminary study for investigating idiopathic normal pressure hydrocephalus by means of statistical parameters classification of intracranial pressure recordings.

Science.gov (United States)

Calisto, A; Bramanti, A; Galeano, M; Angileri, F; Campobello, G; Serrano, S; Azzerboni, B

2009-01-01

The objective of this study is to investigate Idiopathic Normal Pressure Hydrocephalus (INPH) through a multidimensional and multiparameter analysis of statistical data obtained from accurate analysis of Intracranial Pressure (ICP) recordings. Such a study could make it possible to detect new factors, correlated with therapeutic response, that can validate the predictive significance of the infusion test. The algorithm developed by the authors computes 13 ICP parameter trends for each recording; afterward, 9 statistical measures are determined from each trend. All data are transferred to the data-mining software WEKA. According to the feature-selection techniques employed, WEKA revealed that the most significant statistical parameter is the maximum of Single-Wave-Amplitude: setting a 27 mmHg threshold leads to over 90% correct classification.
A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data

KAUST Repository

Abusamra, Heba

2013-01-01

Different experiments have been applied to compare the performance of the classification methods with and without performing feature selection. Results revealed the important role of feature selection in classifying gene expression data. By performing feature selection, the classification accuracy can be significantly boosted by using a small number of genes. The relationship of features selected in different feature selection methods is investigated and the most frequent features selected in each fold among all methods for both datasets are evaluated.
Zubarev's Nonequilibrium Statistical Operator Method in the Generalized Statistics of Multiparticle Systems

Science.gov (United States)

Glushak, P. A.; Markiv, B. B.; Tokarchuk, M. V.

2018-01-01

We present a generalization of Zubarev's nonequilibrium statistical operator method based on the principle of maximum Renyi entropy. In the framework of this approach, we obtain transport equations for the basic set of parameters of the reduced description of nonequilibrium processes in a classical system of interacting particles using Liouville equations with fractional derivatives. For a classical system of particles in a medium with a fractal structure, we obtain a non-Markovian diffusion equation with fractional spatial derivatives. For a concrete model of the frequency dependence of a memory function, we obtain a generalized Cattaneo-type diffusion equation with the spatial and temporal fractality taken into account. We present a generalization of nonequilibrium thermofield dynamics in Zubarev's nonequilibrium statistical operator method in the framework of Renyi statistics.
Statistical methods and their applications in constructional engineering

International Nuclear Information System (INIS)

1977-01-01

An introduction to the basic terms of statistics is followed by a discussion of elements of probability theory, customary discrete and continuous distributions, simulation methods, statistical supporting framework dynamics, and a cost-benefit analysis of the methods introduced. (RW) [de]
Embedding filtering criteria into a wrapper marker selection method for brain tumor classification: an application on metabolic peak area ratios

International Nuclear Information System (INIS)

Kounelakis, M G; Zervakis, M E; Giakos, G C; Postma, G J; Buydens, L M C; Kotsiakis, X

2011-01-01

The purpose of this study is to identify reliable sets of metabolic markers that provide accurate classification of complex brain tumors and facilitate the process of clinical diagnosis. Several ratios of metabolites are tested alone or in combination with imaging markers. A wrapper feature selection and classification methodology is studied, employing Fisher's criterion for ranking the markers. The set of extracted markers that express statistical significance is further studied in terms of biological behavior with respect to the brain tumor type and grade. The outcome of this study indicates that the proposed method, by exploiting the intrinsic properties of the data, can reveal reliable and biologically relevant sets of metabolic markers, which form an important adjunct toward a more accurate type and grade discrimination of complex brain tumors.
Comparison of Danish dichotomous and BI-RADS classifications of mammographic density

DEFF Research Database (Denmark)

Hodge, Rebecca; Hellmann, Sophie Sell; von Euler-Chelpin, My

2014-01-01

BACKGROUND: In the Copenhagen mammography screening program from 1991 to 2001, mammographic density was classified either as fatty or mixed/dense. This dichotomous mammographic density classification system is unique internationally, and has not been validated before. PURPOSE: To compare the Danish dichotomous mammographic density classification system from 1991 to 2001 with the density BI-RADS classifications, in an attempt to validate the Danish classification system. MATERIAL AND METHODS: The study sample consisted of 120 mammograms taken in Copenhagen in 1991-2001, which tested false positive, and which were in 2012 re-assessed and classified according to the BI-RADS classification system. We calculated inter-rater agreement between the Danish dichotomous mammographic classification as fatty or mixed/dense and the four-level BI-RADS classification by the linear weighted Kappa statistic. RESULTS...
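The linear weighted Kappa statistic named in this record can be sketched as follows. This is a minimal illustration for two raters scoring the same items on one ordinal scale coded 0..k-1; the function name and the toy ratings are hypothetical, and the excerpt does not say how the study aligned the two-level Danish scale with the four-level BI-RADS scale, so that step is not modeled here.

```python
def linear_weighted_kappa(ratings_a, ratings_b, n_categories):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1).

    ratings_a, ratings_b: equal-length lists of integer categories 0..k-1.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    n = len(ratings_a)
    k = n_categories

    # Joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted disagreement actually observed vs. expected by chance.
    disagree_obs = 0.0
    disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            disagree_obs += weight * observed[i][j]
            disagree_exp += weight * row[i] * col[j]

    if disagree_exp == 0.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - disagree_obs / disagree_exp
```

A value of 1 means perfect agreement and 0 means agreement no better than chance; with linear weights, a disagreement of two scale steps counts twice as much as a disagreement of one step.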
An enhanced data visualization method for diesel engine malfunction classification using multi-sensor signals.

Science.gov (United States)

Li, Yiqing; Wang, Yu; Zi, Yanyang; Zhang, Mingquan

2015-10-21

The various multi-sensor signal features from a diesel engine constitute a complex high-dimensional dataset. The non-linear dimensionality reduction method, t-distributed stochastic neighbor embedding (t-SNE), provides an effective way to implement data visualization for complex high-dimensional data. However, irrelevant features can deteriorate the performance of data visualization, and thus should be eliminated a priori. This paper proposes a feature subset score based t-SNE (FSS-t-SNE) data visualization method to deal with the high-dimensional data that are collected from multi-sensor signals. In this method, the optimal feature subset is constructed by a feature subset score criterion. Then the high-dimensional data are visualized in two-dimensional space. According to the UCI dataset test, FSS-t-SNE can effectively improve the classification accuracy. An experiment was performed with a large power marine diesel engine to validate the proposed method for diesel engine malfunction classification. Multi-sensor signals were collected by a cylinder vibration sensor and a cylinder pressure sensor. Compared with other conventional data visualization methods, the proposed method shows good visualization performance and high classification accuracy in multi-malfunction classification of a diesel engine.
[Galaxy/quasar classification based on nearest neighbor method].

Science.gov (United States)

Li, Xiang-Ru; Lu, Yu; Zhou, Jian-Ming; Wang, Yong-Jun

2011-09-01

With the wide application of high-quality CCDs in celestial spectrum imagery and the implementation of many large sky survey programs (e.g., the Sloan Digital Sky Survey (SDSS), the Two-degree-Field Galaxy Redshift Survey (2dF), the Spectroscopic Survey Telescope (SST), the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) program and the Large Synoptic Survey Telescope (LSST) program), celestial observational data are arriving in torrents. To use them effectively and fully, research on automated processing methods for celestial data is imperative. In the present work, we investigated how to recognize galaxies and quasars from spectra using the nearest neighbor method. Galaxies and quasars are extragalactic objects; they are far away from Earth, and their spectra are usually contaminated by various kinds of noise. Recognizing these two types of spectra is therefore a typical problem in automatic spectral classification. Furthermore, the method used, nearest neighbor, is one of the most typical, classic and mature algorithms in pattern recognition and data mining, and is often used as a benchmark when developing novel algorithms. Regarding practical applicability, it is shown that the recognition rate of the nearest neighbor method (NN) is comparable to the best results reported in the literature for more complicated methods. The advantage of NN is that it does not need to be trained, which is useful for incremental learning and parallel computation in massive spectral data processing. In conclusion, the results of this work are helpful for studying the classification of galaxy and quasar spectra.
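The nearest neighbor rule described in this record can be sketched in a few lines. This is only an illustration under stated assumptions: spectra are represented as short fixed-length flux vectors, the distance is Euclidean, and the function name and toy values are hypothetical rather than taken from the paper.

```python
import math

def nearest_neighbor_label(train_vectors, train_labels, query):
    """Return the label of the training vector closest to `query`.

    No training step is needed: the labeled examples are the model,
    so new labeled spectra can simply be appended to the lists.
    """
    best_label = None
    best_distance = math.inf
    for vector, label in zip(train_vectors, train_labels):
        distance = math.dist(vector, query)  # Euclidean distance (Python 3.8+)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label

# Hypothetical three-bin "spectra" for illustration only.
train_vectors = [
    [1.0, 0.2, 0.1],
    [0.9, 0.3, 0.2],
    [0.1, 0.8, 0.9],
    [0.2, 0.7, 1.0],
]
train_labels = ["galaxy", "galaxy", "quasar", "quasar"]
```

Because each query is compared against the stored examples independently, the loop splits naturally across workers, which is the property the abstract points to for parallel processing of large spectral archives.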
A hierarchical inferential method for indoor scene classification

Directory of Open Access Journals (Sweden)

Jiang Jingzhe

2017-12-01

Full Text Available Indoor scene classification forms a basis for scene interaction for service robots. The task is challenging because the layout and decoration of a scene vary considerably. Previous studies on knowledge-based methods commonly ignore the importance of visual attributes when constructing the knowledge base. These shortcomings restrict the performance of classification. The structure of a semantic hierarchy was proposed to describe similarities of different parts of scenes in a fine-grained way. Besides the commonly used semantic features, visual attributes were also introduced to construct the knowledge base. Inspired by the processes of human cognition and the characteristics of indoor scenes, we proposed an inferential framework based on the Markov logic network. The framework is evaluated on a popular indoor scene dataset, and the experimental results demonstrate its effectiveness.
Methodical approaches to development of classification state methods of regulation business activity in fishery

OpenAIRE

She Son Gun

2014-01-01

Approaches to developing a classification of state methods for regulating the economy are considered. On the basis of this review, a comprehensive method of state regulation of business activity is substantiated. The proposed principles can improve public administration and can be used in industry concepts and state programs to support small business in the fishery sector.
CCM: A Text Classification Method by Clustering

DEFF Research Database (Denmark)

Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock

2011-01-01

In this paper, a new Cluster based Classification Model (CCM) for suspicious email detection and other text classification tasks is presented. Comparative experiments of the proposed model against traditional classification models and the boosting algorithm are also discussed. Experimental results show that the CCM outperforms traditional classification models as well as the boosting algorithm for the task of suspicious email detection on terrorism domain email dataset and topic categorization on the Reuters-21578 and 20 Newsgroups datasets. The overall finding is that applying a cluster based...
Online Statistics Labs in MSW Research Methods Courses: Reducing Reluctance toward Statistics

Science.gov (United States)

Elliott, William; Choi, Eunhee; Friedline, Terri

2013-01-01

This article presents results from an evaluation of an online statistics lab as part of a foundations research methods course for master's-level social work students. The article discusses factors that contribute to an environment in social work that fosters attitudes of reluctance toward learning and teaching statistics in research methods…
Spatial analysis statistics, visualization, and computational methods

CERN Document Server

Oyana, Tonny J

2015-01-01

An introductory text for the next generation of geospatial analysts and data scientists, Spatial Analysis: Statistics, Visualization, and Computational Methods focuses on the fundamentals of spatial analysis using traditional, contemporary, and computational methods. Outlining both non-spatial and spatial statistical concepts, the authors present practical applications of geospatial data tools, techniques, and strategies in geographic studies. They offer a problem-based learning (PBL) approach to spatial analysis-containing hands-on problem-sets that can be worked out in MS Excel or ArcGIS-as well as detailed illustrations and numerous case studies. The book enables readers to: Identify types and characterize non-spatial and spatial data Demonstrate their competence to explore, visualize, summarize, analyze, optimize, and clearly present statistical data and results Construct testable hypotheses that require inferential statistical analysis Process spatial data, extract explanatory variables, conduct statisti...

Dermal and inhalation acute toxic class methods: test procedures and biometric evaluations for the Globally Harmonized Classification System.

Science.gov (United States)

Holzhütter, H G; Genschow, E; Diener, W; Schlede, E

2003-05-01

The acute toxic class (ATC) methods were developed for determining LD(50)/LC(50) estimates of chemical substances with significantly fewer animals than needed when applying conventional LD(50)/LC(50) tests. The ATC methods are sequential stepwise procedures with fixed starting doses/concentrations and a maximum of six animals used per dose/concentration. The numbers of dead/moribund animals determine whether further testing is necessary or whether the test is terminated. In recent years we have developed classification procedures for the oral, dermal and inhalation routes of administration by using biometric methods. The biometric approach assumes a probit model for the mortality probability of a single animal and assigns the chemical to that toxicity class for which the best concordance is achieved between the statistically expected and the observed numbers of dead/moribund animals at the various steps of the test procedure. In previous publications we have demonstrated the validity of the biometric ATC methods on the basis of data obtained for the oral ATC method in two-animal ring studies with 15 participants from six countries. Although the test procedures and biometric evaluations for the dermal and inhalation ATC methods have already been published, there was a need for an adaptation of the classification schemes to the starting doses/concentrations of the Globally Harmonized Classification System (GHS) recently adopted by the Organization for Economic Co-operation and Development (OECD). Here we present the biometric evaluation of the dermal and inhalation ATC methods for the starting doses/concentrations of the GHS and of some other international classification systems still in use. We have developed new test procedures and decision rules for the dermal and inhalation ATC methods, which require significantly fewer animals to provide predictions of toxicity classes, that are equally good or even better than those achieved by using the conventional LD(50)/LC
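The probit assumption mentioned in this record can be illustrated with a short sketch. It assumes the common log-dose form, in which the mortality probability of one animal is the standard normal CDF of slope × (log10 dose − log10 LD50); the slope value, the group size, and the function names are hypothetical and are not taken from the paper, whose classification rules are not reproduced here.

```python
import math

def probit_mortality(dose, ld50, slope):
    """Mortality probability of a single animal under a log-dose probit model.

    Assumed form: P = Phi(slope * (log10(dose) - log10(ld50))).
    """
    z = slope * (math.log10(dose) - math.log10(ld50))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

def expected_deaths(dose, ld50, slope, animals):
    """Statistically expected number of dead animals in one test step."""
    return animals * probit_mortality(dose, ld50, slope)
```

Comparing such expected counts with the observed counts at each step is the kind of concordance check the abstract describes; for instance, at a dose equal to the LD50 the model expects half of the animals in a step to die.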
A Multi-Classification Method of Improved SVM-based Information Fusion for Traffic Parameters Forecasting

Directory of Open Access Journals (Sweden)

Hongzhuan Zhao

2016-04-01

Full Text Available With the enrichment of perception methods, the modern transportation system has many physical objects whose states are influenced by many information factors, so that it is a typical Cyber-Physical System (CPS). Thus, the traffic information is generally multi-sourced, heterogeneous and hierarchical. Existing research results show that multi-sourced traffic information that is accurately classified during information fusion can achieve better parameter forecasting performance. To solve the problem of accurate traffic information classification, a multi-classification method using an improved SVM in information fusion for traffic parameter forecasting is proposed; it analyses the characteristics of the multi-sourced traffic information and uses a redefined binary tree to overcome the shortcomings of the original Support Vector Machine (SVM) classification in information fusion. An experiment was conducted to examine the performance of the proposed scheme, and the results reveal that the method can obtain more accurate and practical outcomes.
Site Classification using Multichannel Channel Analysis of Surface Wave (MASW) method on Soft and Hard Ground

Science.gov (United States)

Ashraf, M. A. M.; Kumar, N. S.; Yusoh, R.; Hazreek, Z. A. M.; Aziman, M.

2018-04-01

Site classification commonly uses the average shear wave velocity down to 30 meters depth (Vs(30)) as a parameter. Numerous geophysical methods have been proposed for estimating shear wave velocity, using a variety of testing configurations, processing methods, and inversion algorithms. The Multichannel Analysis of Surface Wave (MASW) method has been used by numerous specialists and professionals in geotechnical engineering for local site characterization and classification. This study aims to determine the site classification on soft and hard ground using the MASW method. The subsurface classification was made using the National Earthquake Hazards Reduction Program (NEHRP) and International Building Code (IBC) classifications. Two sites were chosen to acquire the shear wave velocity: one in the state of Pulau Pinang for soft soil and one in Perlis for hard rock. Results suggest that the MASW technique can be used to spatially calculate the distribution of shear wave velocity (Vs(30)) in soil and rock to characterize areas.
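How Vs(30) leads to a site class can be sketched as follows. The time-averaged velocity is 30 m divided by the total vertical travel time through the top 30 m, and the class thresholds used below are the commonly cited NEHRP velocity ranges. The function names and the layered profile are hypothetical, boundary conventions differ slightly between code editions, and classes E and F also depend on soil criteria that this sketch ignores.

```python
def vs30(thicknesses_m, velocities_m_s):
    """Time-averaged shear wave velocity over the top 30 m of a layered profile."""
    depth = 0.0
    travel_time = 0.0
    for thickness, velocity in zip(thicknesses_m, velocities_m_s):
        if depth >= 30.0:
            break
        used = min(thickness, 30.0 - depth)  # clip the layer at 30 m depth
        travel_time += used / velocity
        depth += used
    if depth < 30.0:
        raise ValueError("profile is shallower than 30 m")
    return 30.0 / travel_time

def nehrp_site_class(vs30_m_s):
    """Map Vs(30) in m/s to a NEHRP site class letter (velocity criterion only)."""
    if vs30_m_s > 1500.0:
        return "A"  # hard rock
    if vs30_m_s > 760.0:
        return "B"  # rock
    if vs30_m_s > 360.0:
        return "C"  # very dense soil and soft rock
    if vs30_m_s >= 180.0:
        return "D"  # stiff soil
    return "E"      # soft soil
```

For example, a hypothetical profile of 10 m at 200 m/s over 20 m at 400 m/s has a travel time of 0.1 s, so Vs(30) is about 300 m/s, which falls in class D.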
Intelligent Computer Vision System for Automated Classification

International Nuclear Information System (INIS)

Jordanov, Ivan; Georgieva, Antoniya

2010-01-01

In this paper we investigate an Intelligent Computer Vision System applied for recognition and classification of commercially available cork tiles. The system is capable of acquiring and processing gray images using several feature generation and analysis techniques. Its functionality includes image acquisition, feature extraction and preprocessing, and feature classification with neural networks (NN). We also discuss system test and validation results from the recognition and classification tasks. The system investigation also includes statistical feature processing (features number and dimensionality reduction techniques) and classifier design (NN architecture, target coding, learning complexity and performance, and training with our own metaheuristic optimization method). The NNs trained with our genetic low-discrepancy search method (GLPτS) for global optimisation demonstrated very good generalisation abilities. In our view, the reported testing success rate of up to 95% is due to several factors: combination of feature generation techniques; application of Analysis of Variance (ANOVA) and Principal Component Analysis (PCA), which appeared to be very efficient for preprocessing the data; and use of suitable NN design and learning method.
Faults Classification Of Power Electronic Circuits Based On A Support Vector Data Description Method

Directory of Open Access Journals (Sweden)

Cui Jiang

2015-06-01

Full Text Available Power electronic circuits (PECs) are prone to various failures, whose classification is of paramount importance. This paper presents a data-driven fault diagnosis technique, which employs a support vector data description (SVDD) method to perform fault classification of PECs. In the presented method, fault signals (e.g. currents, voltages, etc.) are collected from accessible nodes of circuits, and then signal processing techniques (e.g. Fourier analysis, wavelet transform, etc.) are adopted to extract feature samples, which are subsequently used to perform offline machine learning. Finally, the SVDD classifier is used to implement the fault classification task. However, in some cases, the conventional SVDD cannot achieve good classification performance, because this classifier may generate some so-called refusal areas (RAs), and in our design these RAs are resolved with the one-against-one support vector machine (SVM) classifier. The experimental results obtained from simulated and actual circuits demonstrate that the improved SVDD has a classification performance close to the conventional one-against-one SVM, and can be applied to fault classification of PECs in practice.
Application of texture analysis method for mammogram density classification

Science.gov (United States)

Nithya, R.; Santhi, B.

2017-07-01

Mammographic density is considered a major risk factor for developing breast cancer. This paper proposes an automated approach to classify breast tissue types in digital mammogram. The main objective of the proposed Computer-Aided Diagnosis (CAD) system is to investigate various feature extraction methods and classifiers to improve the diagnostic accuracy in mammogram density classification. Texture analysis methods are used to extract the features from the mammogram. Texture features are extracted by using histogram, Gray Level Co-Occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Difference Matrix (GLDM), Local Binary Pattern (LBP), Entropy, Discrete Wavelet Transform (DWT), Wavelet Packet Transform (WPT), Gabor transform and trace transform. These extracted features are selected using Analysis of Variance (ANOVA). The features selected by ANOVA are fed into the classifiers to characterize the mammogram into two-class (fatty/dense) and three-class (fatty/glandular/dense) breast density classification. This work has been carried out by using the mini-Mammographic Image Analysis Society (MIAS) database. Five classifiers are employed namely, Artificial Neural Network (ANN), Linear Discriminant Analysis (LDA), Naive Bayes (NB), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM). Experimental results show that ANN provides better performance than LDA, NB, KNN and SVM classifiers. The proposed methodology has achieved 97.5% accuracy for three-class and 99.37% for two-class density classification.
Building an asynchronous web-based tool for machine learning classification.

Science.gov (United States)

Weber, Griffin; Vinterbo, Staal; Ohno-Machado, Lucila

2002-01-01

Various unsupervised and supervised learning methods including support vector machines, classification trees, linear discriminant analysis and nearest neighbor classifiers have been used to classify high-throughput gene expression data. Simpler and more widely accepted statistical tools have not yet been used for this purpose, hence proper comparisons between classification methods have not been conducted. We developed free software that implements logistic regression with stepwise variable selection as a quick and simple method for initial exploration of important genetic markers in disease classification. To implement the algorithm and allow our collaborators in remote locations to evaluate and compare its results against those of other methods, we developed a user-friendly asynchronous web-based application with a minimal amount of programming using free, downloadable software tools. With this program, we show that classification using logistic regression can perform as well as other more sophisticated algorithms, and it has the advantages of being easy to interpret and reproduce. By making the tool freely and easily available, we hope to promote the comparison of classification methods. In addition, we believe our web application can be used as a model for other bioinformatics laboratories that need to develop web-based analysis tools in a short amount of time and on a limited budget.
Data Processing And Machine Learning Methods For Multi-Modal Operator State Classification Systems

Science.gov (United States)

Hearn, Tristan A.

2015-01-01

This document is intended as an introduction to a set of common signal processing learning methods that may be used in the software portion of a functional crew state monitoring system. This includes overviews of both the theory of the methods involved, as well as examples of implementation. Practical considerations are discussed for implementing modular, flexible, and scalable processing and classification software for a multi-modal, multi-channel monitoring system. Example source code is also given for all of the discussed processing and classification methods.
Statistical-mechanical entropy by the thin-layer method

International Nuclear Information System (INIS)

Feng, He; Kim, Sung Won

2003-01-01

G. 't Hooft first studied the statistical-mechanical entropy of a scalar field in a Schwarzschild black hole background by the brick-wall method and hinted that the statistical-mechanical entropy is the statistical origin of the Bekenstein-Hawking entropy of the black hole. However, according to our viewpoint, the statistical-mechanical entropy is only a quantum correction to the Bekenstein-Hawking entropy of the black hole. The brick-wall method based on thermal equilibrium at a large scale cannot be applied to cases out of equilibrium such as a nonstationary black hole. The statistical-mechanical entropy of a scalar field in a nonstationary black hole background is calculated by the thin-layer method. The condition of local equilibrium near the horizon of the black hole is used as a working postulate and is maintained for a black hole which evaporates slowly enough and whose mass is far greater than the Planck mass. The statistical-mechanical entropy is also proportional to the area of the black hole horizon. The difference from the stationary black hole is that the result relies on a time-dependent cutoff.
Improving Classification of Protein Interaction Articles Using Context Similarity-Based Feature Selection.

Science.gov (United States)

Chen, Yifei; Sun, Yuxing; Han, Bing-Qing

2015-01-01

Protein interaction article classification is a text classification task in the biological domain to determine which articles describe protein-protein interactions. Since the feature space in text classification is high-dimensional, feature selection is widely used for reducing the dimensionality of features to speed up computation without sacrificing classification performance. Many existing feature selection methods are based on the statistical measure of document frequency and term frequency. One potential drawback of these methods is that they treat features separately. Hence, first we design a similarity measure between the context information to take word cooccurrences and phrase chunks around the features into account. Then we introduce the similarity of context information to the importance measure of the features to substitute the document and term frequency. Hence we propose new context similarity-based feature selection methods. Their performance is evaluated on two protein interaction article collections and compared against the frequency-based methods. The experimental results reveal that the context similarity-based methods perform better in terms of the F1 measure and the dimension reduction rate. Benefiting from the context information surrounding the features, the proposed methods can select distinctive features effectively for protein interaction article classification.
Method for statistical data analysis of multivariate observations

CERN Document Server

Gnanadesikan, R

1997-01-01

A practical guide for multivariate statistical techniques, now updated and revised. In recent years, innovations in computer technology and statistical methodologies have dramatically altered the landscape of multivariate data analysis. This new edition of Methods for Statistical Data Analysis of Multivariate Observations explores current multivariate concepts and techniques while retaining the same practical focus of its predecessor. It integrates methods and data-based interpretations relevant to multivariate analysis in a way that addresses real-world problems arising in many areas of inte
The Statistics of Health and Longevity

DEFF Research Database (Denmark)

Zarulli, Virginia

Increases in human longevity have made it critical to distinguish healthy longevity from longevity without regard to health. We present a new method for calculating the statistics of healthy longevity which extends, in several directions, current calculations of health expectancy (HE) and disability-adjusted life years (DALYs), from data on prevalence of health conditions. Current methods focus on binary conditions (e.g., disabled or not disabled) or on categorical classifications (e.g. in good, poor, or very bad health) and report only expectations. Our method, based on Markov chain theory...
Scaling theory and the classification of phase transitions

International Nuclear Information System (INIS)

Hilfer, R.

1992-01-01

In this paper, the recent classification theory for phase transitions and its relation with the foundations of statistical physics is reviewed. First it is outlined how Ehrenfest's classification scheme can be generalized into a general thermodynamic classification theory for phase transitions. The classification theory implies scaling and multiscaling, thereby eliminating the need to postulate the scaling hypothesis as a fourth law of thermodynamics. The new classification has also led to the discovery and distinction of nonequilibrium transitions within equilibrium statistical physics. Nonequilibrium phase transitions are distinguished from equilibrium transitions by orders less than unity and by the fact that equilibrium thermodynamics and statistical mechanics become inapplicable at the critical point. The latter fact requires a change in the Gibbs assumption underlying the canonical and grandcanonical ensembles in order to recover the thermodynamic description in the critical limit.
Analysis of Statistical Methods Currently used in Toxicology Journals.

Science.gov (United States)

Na, Jihye; Yang, Hyeri; Bae, SeungJin; Lim, Kyung-Min

2014-09-01

Statistical methods are frequently used in toxicology, yet it is not clear whether the methods employed by the studies are used consistently and conducted on sound statistical grounds. The purpose of this paper is to describe statistical methods used in top toxicology journals. More specifically, we sampled 30 papers published in 2014 from Toxicology and Applied Pharmacology, Archives of Toxicology, and Toxicological Science and described the methodologies used to provide descriptive and inferential statistics. One hundred thirteen endpoints were observed in those 30 papers, and most studies had a sample size of less than 10, with the median and the mode being 6 and 3 & 6, respectively. The mean (105/113, 93%) was predominantly used to measure central tendency, and the standard error of the mean (64/113, 57%) and standard deviation (39/113, 34%) were used to measure dispersion, while few studies provided justifications for why the methods were selected. Inferential statistics were frequently conducted (93/113, 82%), with one-way ANOVA being most popular (52/93, 56%), yet few studies conducted either a normality or an equal variance test. These results suggest that more consistent and appropriate use of statistical methods is necessary, which may enhance the role of toxicology in public health.
Tumor Classification Using High-Order Gene Expression Profiles Based on Multilinear ICA

Directory of Open Access Journals (Sweden)

Ming-gang Du

2009-01-01

Full Text Available Motivation. Independent Components Analysis (ICA) maximizes the statistical independence of the representational components of a training gene expression profiles (GEP) ensemble, but it cannot distinguish relations between the different factors, or different modes, and it is not applicable to high-order GEP data mining. In order to generalize ICA, we introduce Multilinear-ICA and apply it to tumor classification using high-order GEP. Firstly, we introduce the basic concepts and operations of tensors and describe the Support Vector Machine (SVM) classifier and Multilinear-ICA. Secondly, the higher-scoring genes of the original high-order GEP are selected using t-statistics and tabulated as tensors. Thirdly, the tensors are processed by Multilinear-ICA. Finally, the SVM is used to classify the tumor subtypes. Results. To show the validity of the proposed method, we apply it to tumor classification using high-order GEP. Though we only use three datasets, the experimental results show that the method is effective and feasible. Through this survey, we hope to gain some insight into the problem of high-order GEP tumor classification, in aid of further developing more effective tumor classification algorithms.
Statistical inference for template aging

Science.gov (United States)

Schuckers, Michael E.

2006-04-01

A change in classification error rates for a biometric device is often referred to as template aging. Here we offer two methods for determining whether the effect of time is statistically significant. The first of these is the use of a generalized linear model to determine if these error rates change linearly over time. This approach generalizes previous work assessing the impact of covariates using generalized linear models. The second approach uses likelihood ratio test methodology. The focus here is on statistical methods for estimation, not the underlying cause of the change in error rates over time. These methodologies are applied to data from the National Institute of Standards and Technology Biometric Score Set Release 1. The results of these applications are discussed.
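A likelihood ratio test for a change in error rate can be sketched in its simplest form as below. This is an assumption-laden illustration, not the paper's procedure: it compares just two time periods, treats every comparison as an independent Bernoulli trial, and uses the chi-square approximation with one degree of freedom. All function names and counts are hypothetical.

```python
import math

def _xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def binomial_loglik(errors, trials, rate):
    """Binomial log-likelihood up to a constant that cancels in the ratio."""
    return _xlogy(errors, rate) + _xlogy(trials - errors, 1.0 - rate)

def lrt_equal_error_rates(errors_early, trials_early, errors_late, trials_late):
    """Test H0: same error rate in both periods vs. H1: different rates.

    Returns (test statistic, approximate p-value).
    """
    rate_early = errors_early / trials_early
    rate_late = errors_late / trials_late
    rate_pooled = (errors_early + errors_late) / (trials_early + trials_late)

    loglik_alt = (binomial_loglik(errors_early, trials_early, rate_early)
                  + binomial_loglik(errors_late, trials_late, rate_late))
    loglik_null = (binomial_loglik(errors_early, trials_early, rate_pooled)
                   + binomial_loglik(errors_late, trials_late, rate_pooled))

    statistic = max(0.0, 2.0 * (loglik_alt - loglik_null))
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

A small p-value suggests the error rate differs between the two periods; real biometric scores from the same individuals are correlated, so the independence assumption here would overstate significance.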
Effects of Feature Extraction and Classification Methods on Cyberbully Detection

Directory of Open Access Journals (Sweden)

Esra SARAÇ

2016-12-01

Full Text Available Cyberbullying is defined as an aggressive, intentional action against a defenseless person by using the Internet, or other electronic contents. Researchers have found that many of the bullying cases have tragically ended in suicides; hence automatic detection of cyberbullying has become important. In this study we show the effects of feature extraction, feature selection, and classification methods that are used, on the performance of automatic detection of cyberbullying. To perform the experiments FormSpring.me dataset is used and the effects of preprocessing methods; several classifiers like C4.5, Naïve Bayes, kNN, and SVM; and information gain and chi square feature selection methods are investigated. Experimental results indicate that the best classification results are obtained when alphabetic tokenization, no stemming, and no stopwords removal are applied. Using feature selection also improves cyberbully detection performance. When classifiers are compared, C4.5 performs the best for the used dataset.
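Information gain, one of the two feature selection criteria named in this record, can be sketched as follows. The example scores a single token by how much knowing its presence reduces the entropy of the class label; the function names, the toy labels, and the binary presence encoding are hypothetical choices for illustration and are not taken from the study.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(group) / total * entropy(group)
                      for group in groups.values())
    return entropy(labels) - conditional

# Hypothetical toy data: 1 means the token occurs in the post.
labels = ["bully", "bully", "ok", "ok"]
token_predictive = [1, 1, 0, 0]   # occurs only in bullying posts
token_irrelevant = [1, 0, 1, 0]   # occurs equally in both classes
```

Ranking all tokens by this score and keeping the highest-scoring ones is the usual way such a criterion is turned into a feature selection step.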
Application of nonparametric statistic method for DNBR limit calculation

International Nuclear Information System (INIS)

Dong Bo; Kuang Bo; Zhu Xuenong

2013-01-01

Background: The nonparametric statistical method is a kind of statistical inference method that does not depend on a particular distribution; it calculates tolerance limits at a given probability level and confidence level through sampling. The DNBR margin is an important parameter of NPP design, which reflects the safety level of the NPP. Purpose and Methods: This paper uses a nonparametric statistical method based on the Wilks formula and the VIPER-01 subchannel analysis code to calculate the DNBR design limit (DL) of a 300 MW NPP (Nuclear Power Plant) during a complete loss of flow accident, and compares it with the DNBR DL obtained by means of ITDP to determine the DNBR margin. Results: The results indicate that this method can gain 2.96% more DNBR margin than that obtained by the ITDP methodology. Conclusions: Because conservatism is reduced during the analysis process, the nonparametric statistical method can provide a greater DNBR margin, and the increase in DNBR margin is beneficial for upgrading the core refueling scheme. (authors)
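The sample size implied by the Wilks formula can be sketched as below. The sketch assumes the first-order, one-sided form, in which the most extreme of N random code runs bounds a fraction γ of the population with confidence β when 1 − γ^N ≥ β; the record does not state which order or sidedness the authors used, and the function name is hypothetical.

```python
def wilks_one_sided_sample_size(coverage, confidence):
    """Smallest N with 1 - coverage**N >= confidence (first-order, one-sided).

    coverage:   fraction of the population to be bounded, e.g. 0.95
    confidence: confidence level of the bound, e.g. 0.95
    """
    if not (0.0 < coverage < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("coverage and confidence must lie strictly between 0 and 1")
    runs = 1
    while 1.0 - coverage ** runs < confidence:
        runs += 1
    return runs
```

For the familiar 95%/95% criterion this gives 59 runs, and for 95% coverage at 99% confidence it gives 90 runs. For a quantity such as minimum DNBR, where the concern is a lower bound, the smallest value among those runs would serve as the tolerance limit under this assumption.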
Conjugate-Gradient Neural Networks in Classification of Multisource and Very-High-Dimensional Remote Sensing Data

Science.gov (United States)

Benediktsson, J. A.; Swain, P. H.; Ersoy, O. K.

1993-01-01

Application of neural networks to classification of remote sensing data is discussed. Conventional two-layer backpropagation is found to give good results in classification of remote sensing data but is not efficient in training. A more efficient variant, based on conjugate-gradient optimization, is used for classification of multisource remote sensing and geographic data and very-high-dimensional data. The conjugate-gradient neural networks give excellent performance in classification of multisource data, but do not compare as well with statistical methods in classification of very-high-dimensional data.
The paradox of atheoretical classification

DEFF Research Database (Denmark)

Hjørland, Birger

2016-01-01

A distinction can be made between “artificial classifications” and “natural classifications,” where artificial classifications may adequately serve some limited purposes, but natural classifications are overall most fruitful by allowing inference and thus many different purposes. There is strong support for the view that a natural classification should be based on a theory (and, of course, that the most fruitful theory provides the most fruitful classification). Nevertheless, atheoretical (or “descriptive”) classifications are often produced. Paradoxically, atheoretical classifications may be very successful. The best example of a successful “atheoretical” classification is probably the prestigious Diagnostic and Statistical Manual of Mental Disorders (DSM) since its third edition from 1980. Based on such successes one may ask: Should the claim that classifications ideally are natural...

Classification across gene expression microarray studies

Directory of Open Access Journals (Sweden)

Kuner Ruprecht

2009-12-01

Full Text Available Abstract Background: The increasing number of gene expression microarray studies represents an important resource in biomedical research. As a result, gene expression based diagnosis has entered clinical practice for patient stratification in breast cancer. However, the integration and combined analysis of microarray studies remains a challenge. We assessed the potential benefit of data integration on the classification accuracy and systematically evaluated the generalization performance of selected methods on four breast cancer studies comprising almost 1000 independent samples. To this end, we introduced an evaluation framework which aims to establish good statistical practice and a graphical way to monitor differences. The classification goal was to correctly predict estrogen receptor status (negative/positive) and histological grade (low/high) of each tumor sample in an independent study which was not used for the training. For the classification we chose support vector machines (SVM), predictive analysis of microarrays (PAM), random forest (RF) and k-top scoring pairs (kTSP). Guided by considerations relevant for classification across studies we developed a generalization of kTSP which we evaluated in addition. Our derived version (DV) aims to improve the robustness of the intrinsic invariance of kTSP with respect to technologies and preprocessing. Results: For each individual study the generalization error was benchmarked via complete cross-validation and was found to be similar for all classification methods. The misclassification rates were substantially higher in classification across studies, when each single study was used as an independent test set while all remaining studies were combined for the training of the classifier. However, with increasing number of independent microarray studies used in the training, the overall classification performance improved. DV performed better than the average and showed slightly less variance. In
Thermodynamics, Gibbs Method and Statistical Physics of Electron Gases

CERN Document Server

Askerov, Bahram M

2010-01-01

This book deals with theoretical thermodynamics and the statistical physics of electron and particle gases. While treating the laws of thermodynamics from both classical and quantum theoretical viewpoints, it posits that the basis of the statistical theory of macroscopic properties of a system is the microcanonical distribution of isolated systems, from which all canonical distributions stem. To calculate the free energy, the Gibbs method is applied to ideal and non-ideal gases, and also to a crystalline solid. Considerable attention is paid to the Fermi-Dirac and Bose-Einstein quantum statistics and its application to different quantum gases, and electron gas in both metals and semiconductors is considered in a nonequilibrium state. A separate chapter treats the statistical theory of thermodynamic properties of an electron gas in a quantizing magnetic field.
Recognition Number of The Vehicle Plate Using Otsu Method and K-Nearest Neighbour Classification

Directory of Open Access Journals (Sweden)

Maulidia Rahmah Hidayah

2017-05-01

Full Text Available License plate recognition (LPR) is a topic of current interest as a way to improve public services related to vehicles, but LPR methods still need further research. Some previous studies showed that K-Nearest Neighbour (KNN) succeeds in car license plate recognition. The objective of this research was to determine the implementation and accuracy of the Otsu method for license plate recognition. The research used the Otsu method to extract features and convert the plate image into a binary image, and KNN as the classification method for recognizing each character. The license plate recognition program built on the Otsu method and KNN classification follows the steps of pattern recognition: input and sensing, pre-processing, feature extraction by Otsu binarization, segmentation, KNN classification, and post-processing by calculating the level of accuracy. The study showed that the program recognized 82% of 100 test plates, with 93.75% number recognition accuracy and 91.92% letter recognition accuracy.
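The binarization step mentioned above can be sketched as follows. This is a minimal pure-Python version of Otsu thresholding on a flat list of 8-bit grey levels, not the authors' program; the function names `otsu_threshold` and `binarize` and the toy pixel values are assumptions for illustration.

```python
def otsu_threshold(pixels):
    """Return the grey level t that maximizes between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0         # sum of their grey levels
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight0 += hist[t]
        sum0 += t * hist[t]
        weight1 = total - weight0
        if weight0 == 0:
            continue
        if weight1 == 0:
            break
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        between_var = weight0 * weight1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 1 (above threshold) or 0 (at or below it)."""
    return [1 if p > threshold else 0 for p in pixels]


# Hypothetical plate strip: dark characters (10) on a bright background (200).
toy_pixels = [10] * 50 + [200] * 50
toy_threshold = otsu_threshold(toy_pixels)
toy_binary = binarize(toy_pixels, toy_threshold)
```

In a full pipeline the binary image would then be segmented into characters before each one is passed to the KNN classifier.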
The Novel Quantitative Technique for Assessment of Gait Symmetry Using Advanced Statistical Learning Algorithm

Directory of Open Access Journals (Sweden)

Jianning Wu

2015-01-01

Full Text Available The accurate identification of gait asymmetry is very beneficial to the assessment of at-risk gait in clinical applications. This paper investigated the application of a classification method based on a statistical learning algorithm to quantify gait symmetry, on the assumption that the degree of intrinsic change in the dynamical system of gait is associated with differences between the statistical distributions of gait variables from the left and right lower limbs; that is, discriminating small differences in similarity between the lower limbs is treated as the recognition of their different probability distributions. The kinetic gait data of 60 participants were recorded using a strain gauge force platform during normal walking. The classification method is designed around an advanced statistical learning algorithm, such as the support vector machine for binary classification, and is adopted to quantitatively evaluate gait symmetry. The experimental results showed that the proposed method could capture more of the intrinsic dynamic information hidden in gait variables and recognize right-left gait patterns with superior generalization performance. Moreover, the proposed technique could identify small but significant differences between the lower limbs when compared with the traditional symmetry index method for gait. The proposed algorithm could become an effective tool for early identification of gait asymmetry in the elderly in clinical diagnosis.
The novel quantitative technique for assessment of gait symmetry using advanced statistical learning algorithm.

Science.gov (United States)

Wu, Jianning; Wu, Bin

2015-01-01

The accurate identification of gait asymmetry is very beneficial to the assessment of at-risk gait in clinical applications. This paper investigated the application of a classification method based on a statistical learning algorithm to quantify gait symmetry, on the assumption that the degree of intrinsic change in the dynamical system of gait is associated with differences between the statistical distributions of gait variables from the left and right lower limbs; that is, discriminating small differences in similarity between the lower limbs is treated as the recognition of their different probability distributions. The kinetic gait data of 60 participants were recorded using a strain gauge force platform during normal walking. The classification method is designed around an advanced statistical learning algorithm, such as the support vector machine for binary classification, and is adopted to quantitatively evaluate gait symmetry. The experimental results showed that the proposed method could capture more of the intrinsic dynamic information hidden in gait variables and recognize right-left gait patterns with superior generalization performance. Moreover, the proposed technique could identify small but significant differences between the lower limbs when compared with the traditional symmetry index method for gait. The proposed algorithm could become an effective tool for early identification of gait asymmetry in the elderly in clinical diagnosis.
Improved algorithms for the classification of rough rice using a bionic electronic nose based on PCA and the Wilks distribution.

Science.gov (United States)

Xu, Sai; Zhou, Zhiyan; Lu, Huazhong; Luo, Xiwen; Lan, Yubin

2014-03-19

Principal Component Analysis (PCA) is one of the main methods used for electronic nose pattern recognition. However, poor classification performance is common in classification and recognition when using regular PCA. This paper aims to improve the classification performance of regular PCA based on the existing Wilks Λ-statistic (i.e., combined PCA with the Wilks distribution). The improved algorithms, which combine regular PCA with the Wilks Λ-statistic, were developed after analysing the functionality and defects of PCA. Verification tests were conducted using a PEN3 electronic nose. The collected samples consisted of the volatiles of six varieties of rough rice (Zhongxiang1, Xiangwan13, Yaopingxiang, WufengyouT025, Pin 36, and Youyou122), grown in the same area and season. With regular PCA, the first two principal components used as analysis vectors cannot separate the rough rice varieties. Using the improved algorithms, which combine the regular PCA with the Wilks Λ-statistic, many different principal components were selected as analysis vectors. The set of Mahalanobis distances between each pair of rough rice varieties was used to estimate the performance of the classification. The result illustrates that the rough rice varieties classification task is achieved well using the improved algorithm. A Probabilistic Neural Network (PNN) was also established to test the effectiveness of the improved algorithms. The first two principal components (namely PC1 and PC2) and the first and fifth principal components (namely PC1 and PC5) were selected as the inputs of the PNN for the classification of the six rough rice varieties. The results indicate that the classification accuracy based on the improved algorithm was improved by 6.67% compared to the results of the regular method. These results prove the effectiveness of using the Wilks Λ-statistic to improve the classification accuracy of the regular PCA approach. The results
Methods library of embedded R functions at Statistics Norway

Directory of Open Access Journals (Sweden)

Øyvind Langsrud

2017-11-01

Full Text Available Statistics Norway is modernising its production processes. An important element in this work is a library of functions for statistical computations. In principle, the functions in such a methods library can be programmed in several languages. A modernised production environment demands that these functions can be reused for different statistics products, and that they are embedded within a common IT system. The embedding should be done in such a way that the users of the methods do not need to know the underlying programming language. As a proof of concept, Statistics Norway will soon have established a methods library offering a limited number of methods for macro-editing, imputation and confidentiality. This is done within an area of municipal statistics with R as the only programming language. This paper presents the details and experiences from this work. The problem of fitting real-world applications to simple and strict standards is discussed and exemplified by the development of solutions to regression imputation and table suppression.
Complex Data Modeling and Computationally Intensive Statistical Methods

CERN Document Server

Mantovan, Pietro

2010-01-01

The last years have seen the advent and development of many devices able to record and store an always increasing amount of complex and high dimensional data; 3D images generated by medical scanners or satellite remote sensing, DNA microarrays, real time financial data, system control datasets. The analysis of this data poses new challenging problems and requires the development of novel statistical models and computational methods, fueling many fascinating and fast growing research areas of modern statistics. The book offers a wide variety of statistical methods and is addressed to statistici
Controlling a human-computer interface system with a novel classification method that uses electrooculography signals.

Science.gov (United States)

Wu, Shang-Lin; Liao, Lun-De; Lu, Shao-Wei; Jiang, Wei-Ling; Chen, Shi-An; Lin, Chin-Teng

2013-08-01

Electrooculography (EOG) signals can be used to control human-computer interface (HCI) systems, if properly classified. The ability to measure and process these signals may help HCI users to overcome many of the physical limitations and inconveniences in daily life. However, there are currently no effective multidirectional classification methods for monitoring eye movements. Here, we describe a classification method used in a wireless EOG-based HCI device for detecting eye movements in eight directions. This device includes wireless EOG signal acquisition components, wet electrodes and an EOG signal classification algorithm. The EOG classification algorithm is based on extracting features from the electrical signals corresponding to eight directions of eye movement (up, down, left, right, up-left, down-left, up-right, and down-right) and blinking. The recognition and processing of these eight different features were achieved in real-life conditions, demonstrating that this device can reliably measure the features of EOG signals. This system and its classification procedure provide an effective method for identifying eye movements. Additionally, it may be applied to study eye functions in real-life conditions in the near future.
Image Classification Workflow Using Machine Learning Methods

Science.gov (United States)

Christoffersen, M. S.; Roser, M.; Valadez-Vergara, R.; Fernández-Vega, J. A.; Pierce, S. A.; Arora, R.

2016-12-01

Recent increases in the availability and quality of remote sensing datasets have fueled an increasing number of scientifically significant discoveries based on land use classification and land use change analysis. However, much of the software made to work with remote sensing data products, specifically multispectral images, is commercial and often prohibitively expensive. The free to use solutions that are currently available come bundled up as small parts of much larger programs that are very susceptible to bugs and difficult to install and configure. What is needed is a compact, easy to use set of tools to perform land use analysis on multispectral images. To address this need, we have developed software using the Python programming language with the sole function of land use classification and land use change analysis. We chose Python to develop our software because it is relatively readable, has a large body of relevant third party libraries such as GDAL and Spectral Python, and is free to install and use on Windows, Linux, and Macintosh operating systems. In order to test our classification software, we performed a K-means unsupervised classification, Gaussian Maximum Likelihood supervised classification, and a Mahalanobis Distance based supervised classification. The images used for testing were three Landsat rasters of Austin, Texas with a spatial resolution of 60 meters for the years of 1984 and 1999, and 30 meters for the year 2015. The testing dataset was easily downloaded using the Earth Explorer application produced by the USGS. The software should be able to perform classification based on any set of multispectral rasters with little to no modification. Our software makes the ease of land use classification using commercial software available without an expensive license.
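One of the supervised classifiers named above, the Mahalanobis-distance rule, can be sketched in a few lines. This is not the authors' software, which builds on GDAL and Spectral Python; it is a pure-Python illustration restricted to two spectral bands so the covariance matrix can be inverted by hand. The class names, training pixels, and function names are hypothetical.

```python
def mean_and_cov(samples):
    """Mean vector and sample covariance (sxx, sxy, syy) of two-band pixels."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(pixel, mean, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = pixel[0] - mean[0], pixel[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def classify(pixel, class_stats):
    """Assign the pixel to the class with the smallest Mahalanobis distance."""
    return min(class_stats, key=lambda name: mahalanobis_sq(pixel, *class_stats[name]))


# Hypothetical two-band training pixels for two land-use classes.
training = {
    "water": [(10, 5), (12, 6), (11, 4), (9, 7)],
    "vegetation": [(40, 80), (42, 85), (38, 78), (41, 83)],
}
class_stats = {name: mean_and_cov(pixels) for name, pixels in training.items()}
label_a = classify((11, 5), class_stats)
label_b = classify((39, 80), class_stats)
```

For real multispectral rasters with more than two bands, a linear algebra library would replace the hand-written inverse.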
Deep-learnt classification of light curves

DEFF Research Database (Denmark)

Mahabal, Ashish; Gieseke, Fabian; Pai, Akshay Sadananda Uppinakudru

2017-01-01

Astronomy light curves are sparse, gappy, and heteroscedastic. As a result standard time series methods regularly used for financial and similar datasets are of little help and astronomers are usually left to their own instruments and techniques to classify light curves. A common approach is to derive statistical features from the time series and to use machine learning methods, generally supervised, to separate objects into a few of the standard classes. In this work, we transform the time series to two-dimensional light curve representations in order to classify them using modern deep learning techniques. In particular, we show that convolutional neural networks based classifiers work well for broad characterization and classification. We use labeled datasets of periodic variables from CRTS survey and show how this opens doors for a quick classification of diverse classes with several...
A NEW CLASSIFICATION METHOD FOR GAMMA-RAY BURSTS

International Nuclear Information System (INIS)

Lue Houjun; Liang Enwei; Zhang Binbin; Zhang Bing

2010-01-01

Recent Swift observations suggest that the traditional long versus short gamma-ray burst (GRB) classification scheme does not always associate GRBs to the two physically motivated model types, i.e., Type II (massive star origin) versus Type I (compact star origin). We propose a new phenomenological classification method of GRBs by introducing a new parameter $\varepsilon = E_{\gamma,\mathrm{iso},52} / E_{p,z,2}^{5/3}$, where $E_{\gamma,\mathrm{iso}}$ is the isotropic gamma-ray energy (in units of $10^{52}$ erg) and $E_{p,z}$ is the cosmic rest-frame spectral peak energy (in units of 100 keV). For those short GRBs with 'extended emission', both quantities are defined for the short/hard spike only. With the current complete sample of GRBs with redshift and $E_p$ measurements, the $\varepsilon$ parameter shows a clear bimodal distribution with a separation at $\varepsilon \sim 0.03$. The high-$\varepsilon$ region encloses the typical long GRBs with high luminosity, some high-z 'rest-frame-short' GRBs (such as GRB 090423 and GRB 080913), as well as some high-z short GRBs (such as GRB 090426). All these GRBs have been claimed to be of Type II origin based on other observational properties in the literature. All the GRBs that are argued to be of Type I origin are found to be clustered in the low-$\varepsilon$ region. They can be separated from some nearby low-luminosity long GRBs (in $3\sigma$) by an additional $T_{90}$ criterion, i.e., $T_{90,z} \lesssim 5$ s in the Swift/BAT band. We suggest that this new classification scheme can better match the physically motivated Type II/I classification scheme.
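The parameter defined above is simple enough to evaluate directly. The sketch below computes it from an isotropic energy in erg and a rest-frame peak energy in keV and places the burst on either side of the quoted separation at about 0.03. The function names and the two input bursts are hypothetical, and the additional duration criterion is left out.

```python
def grb_epsilon(e_iso_erg: float, e_peak_rest_kev: float) -> float:
    """epsilon = E_iso,52 / E_p,z,2^(5/3), with the unit scalings from the abstract."""
    e_iso_52 = e_iso_erg / 1e52            # isotropic energy in units of 10^52 erg
    e_peak_2 = e_peak_rest_kev / 100.0     # rest-frame peak energy in units of 100 keV
    return e_iso_52 / e_peak_2 ** (5.0 / 3.0)


def epsilon_region(epsilon: float, separation: float = 0.03) -> str:
    """Return which side of the bimodal separation the burst falls on."""
    return "high" if epsilon > separation else "low"


# Hypothetical bursts, not taken from the paper's sample.
eps_bright = grb_epsilon(1e52, 100.0)    # energetic burst with a moderate peak energy
eps_faint = grb_epsilon(1e50, 1000.0)    # weak burst with a hard spectrum
print(epsilon_region(eps_bright), epsilon_region(eps_faint))
```

In the scheme described, the high region is associated with Type II candidates and the low region with Type I candidates.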
Comparison between Possibilistic c-Means (PCM) and Artificial Neural Network (ANN) Classification Algorithms in Land use/ Land cover Classification

Directory of Open Access Journals (Sweden)

Ganchimeg Ganbold

2017-03-01

Full Text Available There are several statistical classification algorithms available for land use/land cover classification. However, each has a certain bias or compromise. Some methods, like the parallelepiped approach in supervised classification, cannot classify continuous regions within a feature. On the other hand, while the unsupervised classification method takes maximum advantage of spectral variability in an image, the maximally separable clusters in spectral space may not do much for our perception of important classes in a given study area. In this research, the output of an ANN algorithm was compared with the Possibilistic c-Means, an improvement of the fuzzy c-Means, on both moderate-resolution Landsat 8 and high-resolution Formosat 2 images. The Formosat 2 image comes with an 8 m spatial resolution on the multispectral data. This multispectral image data was resampled to 10 m in order to maintain a uniform ratio of 1:3 against the Landsat 8 image. Six classes were chosen for analysis: dense forest, eucalyptus, water, grassland, wheat and riverine sand. Using a standard false color composite (FCC), the six features reflected differently in the infrared region, with wheat producing the brightest pixel values. Signature collection per class was therefore easily obtained for all classifications. The output of both ANN and FCM were analyzed separately for accuracy and an error matrix generated to assess the quality and accuracy of the classification algorithms. When the results of the two methods are compared on a per-class basis, ANN had a crisper output compared to PCM, which yielded clusters with pixels especially on the moderate-resolution Landsat 8 imagery.
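The error matrix used for accuracy assessment can be sketched as a simple cross-tabulation of reference labels against classifier output. The class names below come from the abstract, but the label sequences and the function names are hypothetical and chosen only to make the example small.

```python
def error_matrix(reference, predicted, classes):
    """Rows are reference classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for ref, pred in zip(reference, predicted):
        matrix[index[ref]][index[pred]] += 1
    return matrix


def overall_accuracy(matrix):
    """Share of samples on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Hypothetical check pixels for three of the six classes.
classes = ["water", "wheat", "grassland"]
reference = ["water", "water", "wheat", "wheat", "grassland", "grassland"]
predicted = ["water", "water", "wheat", "grassland", "grassland", "grassland"]

matrix = error_matrix(reference, predicted, classes)
accuracy = overall_accuracy(matrix)
```

Reading along a row shows how the pixels of one reference class were distributed among the predicted classes, which is what supports a per-class comparison.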
On the statistical assessment of classifiers using DNA microarray data

Directory of Open Access Journals (Sweden)

Carella M

2006-08-01

Full Text Available Abstract Background: In this paper we present a method for the statistical assessment of cancer predictors which make use of gene expression profiles. The methodology is applied to a new data set of microarray gene expression data collected in Casa Sollievo della Sofferenza Hospital, Foggia, Italy. The data set is made up of normal (22) and tumor (25) specimens extracted from 25 patients affected by colon cancer. We propose to give answers to some questions which are relevant for the automatic diagnosis of cancer such as: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways can accuracy be considered dependent on the adopted classification scheme? How many genes are correlated with the pathology and how many are sufficient for an accurate colon cancer classification? The method we propose answers these questions whilst avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data. Results: We estimate the generalization error, evaluated through the Leave-K-Out Cross Validation error, for three different classification schemes by varying the number of training examples and the number of the genes used. The statistical significance of the error rate is measured by using a permutation test. We provide a statistical analysis in terms of the frequencies of the genes involved in the classification. Using the whole set of genes, we found that the Weighted Voting Algorithm (WVA) classifier learns the distinction between normal and tumor specimens with 25 training examples, providing e = 21% (p = 0.045) as an error rate. This remains constant even when the number of examples increases. Moreover, Regularized Least Squares (RLS) and Support Vector Machines (SVM) classifiers can learn with only 15 training examples, with an error rate of e = 19% (p = 0.035) and e = 18% (p = 0.037) respectively. Moreover, the error rate
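The idea of attaching a p-value to a cross-validated error rate by permuting labels can be sketched as follows. To keep the example short it swaps in a one-dimensional nearest-centroid classifier with leave-one-out validation in place of the WVA, RLS, and SVM classifiers and the Leave-K-Out scheme of the paper; the data, function names, and number of permutations are assumptions.

```python
import random


def loo_error(values, labels):
    """Leave-one-out error of a 1-D nearest-centroid classifier (labels 0/1)."""
    errors = 0
    for i, x in enumerate(values):
        rest = [(v, y) for j, (v, y) in enumerate(zip(values, labels)) if j != i]
        class0 = [v for v, y in rest if y == 0]
        class1 = [v for v, y in rest if y == 1]
        centroid0 = sum(class0) / len(class0)
        centroid1 = sum(class1) / len(class1)
        predicted = 0 if abs(x - centroid0) <= abs(x - centroid1) else 1
        errors += predicted != labels[i]
    return errors / len(values)


def permutation_p_value(values, labels, n_perm=200, seed=0):
    """Fraction of label permutations whose error is at most the observed one."""
    rng = random.Random(seed)
    observed = loo_error(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if loo_error(values, shuffled) <= observed:
            hits += 1
    # The +1 terms count the unpermuted labelling itself.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical expression values for one gene: 5 normal (0) and 5 tumor (1) specimens.
values = [0.1, 0.3, 0.2, 0.4, 0.0, 2.1, 2.3, 1.9, 2.2, 2.0]
labels = [0] * 5 + [1] * 5
observed_error, p_value = permutation_p_value(values, labels)
```

A small p-value indicates that an error rate as low as the observed one is rarely reached when the labels carry no information about the data.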
Descriptive and inferential statistical methods used in burns research.

Science.gov (United States)

Al-Benna, Sammy; Al-Ajam, Yazan; Way, Benjamin; Steinstraesser, Lars

2010-05-01

Burns research articles utilise a variety of descriptive and inferential methods to present and analyse data. The aim of this study was to determine the descriptive methods (e.g. mean, median, SD, range, etc.) and survey the use of inferential methods (statistical tests) used in articles in the journal Burns. This study defined its population as all original articles published in the journal Burns in 2007. Letters to the editor, brief reports, reviews, and case reports were excluded. Study characteristics, use of descriptive statistics and the number and types of statistical methods employed were evaluated. Of the 51 articles analysed, 11 (22%) were randomised controlled trials, 18 (35%) were cohort studies, 11 (22%) were case control studies and 11 (22%) were case series. The study design and objectives were defined in all articles. All articles made use of continuous and descriptive data. Inferential statistics were used in 49 (96%) articles. Data dispersion was calculated by standard deviation in 30 (59%). Standard error of the mean was quoted in 19 (37%). The statistical software product was named in 33 (65%). Of the 49 articles that used inferential statistics, the tests were named in 47 (96%). The 6 most common tests used (Student's t-test (53%), analysis of variance/co-variance (33%), chi-squared test (27%), Wilcoxon & Mann-Whitney tests (22%), Fisher's exact test (12%)) accounted for the majority (72%) of statistical methods employed. A specified significance level was named in 43 (88%) and the exact significance levels were reported in 28 (57%). Descriptive analysis and basic statistical techniques account for most of the statistical tests reported. This information should prove useful in deciding which tests should be emphasised in educating burn care professionals. These results highlight the need for burn care professionals to have a sound understanding of basic statistics, which is crucial in interpreting and reporting data. Advice should be sought from professionals
Automated classification of Permanent Scatterers time-series based on statistical characterization tests

Science.gov (United States)

Berti, Matteo; Corsini, Alessandro; Franceschini, Silvia; Iannacone, Jean Pascal

2013-04-01

The application of spaceborne synthetic aperture radar interferometry has progressed, over the last two decades, from the pioneering use of single interferograms for analyzing changes on the earth's surface to the development of advanced multi-interferogram techniques to analyze any sort of natural phenomenon that involves movements of the ground. The success of multi-interferogram techniques in the analysis of natural hazards such as landslides and subsidence is widely documented in the scientific literature and demonstrated by the consensus among end-users. Despite the great potential of this technique, radar interpretation of slope movements is generally based on the sole analysis of average displacement velocities, while the information contained in multi-interferogram time series is often overlooked if not completely neglected. The underuse of PS time series is probably due to the detrimental effect of residual atmospheric errors, which give the PS time series erratic, irregular fluctuations that are often difficult to interpret, and also to the difficulty of performing a visual, supervised analysis of the time series for a large dataset. In this work we present a procedure for automatic classification of PS time series based on a series of statistical characterization tests. The procedure classifies the time series into six distinctive target trends (0=uncorrelated; 1=linear; 2=quadratic; 3=bilinear; 4=discontinuous without constant velocity; 5=discontinuous with change in velocity) and retrieves for each trend a series of descriptive parameters which can be efficiently used to characterize the temporal changes of ground motion. The classification algorithms were developed and tested using an ENVISAT dataset available in the frame of the EPRS-E project (Extraordinary Plan of Environmental Remote Sensing) of the Italian Ministry of Environment (track "Modena", Northern Apennines). This dataset was generated using standard processing, then the
High Dimensional Classification Using Features Annealed Independence Rules.

Science.gov (United States)

Fan, Jianqing; Fan, Yingying

2008-01-01

Classification using high-dimensional features arises frequently in many contemporary statistical studies such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification is poorly understood. In a seminal paper, Bickel and Levina (2004) show that the Fisher discriminant performs poorly due to diverging spectra and they propose to use the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as poor as random guessing due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as poorly as random guessing. Thus, it is of paramount importance to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The choice of the optimal number of features, or equivalently, the threshold value of the test statistic, is proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.
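The two ingredients described above, ranking features by a two-sample t-statistic and then applying an independence (diagonal) rule to the retained features, can be sketched as follows. This is an illustrative simplification: the number of retained features `m` is passed in by hand rather than chosen from the error bound derived in the paper, and the function names and toy data are assumptions.

```python
from statistics import mean, variance


def two_sample_t(col1, col2):
    """Unpooled two-sample t-statistic for a single feature."""
    se = (variance(col1) / len(col1) + variance(col2) / len(col2)) ** 0.5
    return (mean(col1) - mean(col2)) / se if se > 0 else 0.0


def fit_independence_rule(class1, class2, m):
    """Keep the m features with the largest |t| and store their class statistics."""
    cols1, cols2 = list(zip(*class1)), list(zip(*class2))
    t_values = [two_sample_t(a, b) for a, b in zip(cols1, cols2)]
    ranked = sorted(range(len(t_values)), key=lambda j: abs(t_values[j]), reverse=True)
    n1, n2 = len(class1), len(class2)
    model = []
    for j in ranked[:m]:
        pooled_var = ((n1 - 1) * variance(cols1[j]) + (n2 - 1) * variance(cols2[j])) / (n1 + n2 - 2)
        model.append((j, mean(cols1[j]), mean(cols2[j]), pooled_var))
    return model


def predict(model, x):
    """Independence rule: sum per-feature contributions, ignoring correlations."""
    score = sum((x[j] - (m1 + m2) / 2.0) * (m1 - m2) / var for j, m1, m2, var in model)
    return 1 if score > 0 else 2


# Hypothetical data: rows are samples, columns are features; only feature 0 separates the classes.
class1 = [[5.0, 0.1, 1.0], [5.2, -0.2, 0.0], [4.8, 0.3, 0.5], [5.1, -0.1, 0.8]]
class2 = [[1.0, 0.2, 0.9], [1.2, -0.3, 0.1], [0.9, 0.1, 0.6], [1.1, 0.0, 0.4]]

model = fit_independence_rule(class1, class2, m=1)
selected_features = [j for j, _, _, _ in model]
```

Dropping the low-ranked features removes their estimation noise from the score, which is the motivation given above for selecting features before classifying.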
Statistical Methods for Particle Physics (4/4)

CERN Multimedia

CERN. Geneva

2012-01-01

The series of four lectures will introduce some of the important statistical methods used in Particle Physics, and should be particularly relevant to those involved in the analysis of LHC data. The lectures will include an introduction to statistical tests, parameter estimation, and the application of these tools to searches for new phenomena. Both frequentist and Bayesian methods will be described, with particular emphasis on treatment of systematic uncertainties. The lectures will also cover unfolding, that is, estimation of a distribution in binned form where the variable in question is subject to measurement errors.
Statistical Methods for Particle Physics (1/4)

CERN Multimedia

CERN. Geneva

2012-01-01

The series of four lectures will introduce some of the important statistical methods used in Particle Physics, and should be particularly relevant to those involved in the analysis of LHC data. The lectures will include an introduction to statistical tests, parameter estimation, and the application of these tools to searches for new phenomena. Both frequentist and Bayesian methods will be described, with particular emphasis on treatment of systematic uncertainties. The lectures will also cover unfolding, that is, estimation of a distribution in binned form where the variable in question is subject to measurement errors.
Statistical Methods for Particle Physics (2/4)

CERN Multimedia

CERN. Geneva

2012-01-01

The series of four lectures will introduce some of the important statistical methods used in Particle Physics, and should be particularly relevant to those involved in the analysis of LHC data. The lectures will include an introduction to statistical tests, parameter estimation, and the application of these tools to searches for new phenomena. Both frequentist and Bayesian methods will be described, with particular emphasis on treatment of systematic uncertainties. The lectures will also cover unfolding, that is, estimation of a distribution in binned form where the variable in question is subject to measurement errors.

Statistical Methods for Particle Physics (3/4)

CERN Multimedia

CERN. Geneva

2012-01-01

The series of four lectures will introduce some of the important statistical methods used in Particle Physics, and should be particularly relevant to those involved in the analysis of LHC data. The lectures will include an introduction to statistical tests, parameter estimation, and the application of these tools to searches for new phenomena. Both frequentist and Bayesian methods will be described, with particular emphasis on treatment of systematic uncertainties. The lectures will also cover unfolding, that is, estimation of a distribution in binned form where the variable in question is subject to measurement errors.
Statistical methods for spatio-temporal systems

CERN Document Server

Finkenstadt, Barbel

2006-01-01

Statistical Methods for Spatio-Temporal Systems presents current statistical research issues on spatio-temporal data modeling and will promote advances in research and a greater understanding between the mechanistic and the statistical modeling communities. Contributed by leading researchers in the field, each self-contained chapter starts with an introduction of the topic and progresses to recent research results. Presenting specific examples of epidemic data of bovine tuberculosis, gastroenteric disease, and the U.K. foot-and-mouth outbreak, the first chapter uses stochastic models, such as point process models, to provide the probabilistic backbone that facilitates statistical inference from data. The next chapter discusses the critical issue of modeling random growth objects in diverse biological systems, such as bacteria colonies, tumors, and plant populations. The subsequent chapter examines data transformation tools using examples from ecology and air quality data, followed by a chapter on space-time co...
Mental Task Classification Scheme Utilizing Correlation Coefficient Extracted from Interchannel Intrinsic Mode Function.

Science.gov (United States)

Rahman, Md Mostafizur; Fattah, Shaikh Anowarul

2017-01-01

In view of the recent increase in brain-computer interface (BCI) based applications, the importance of efficient classification of various mental tasks has increased considerably. To obtain effective classification, an efficient feature extraction scheme is necessary; in the proposed method, the interchannel relationship among electroencephalogram (EEG) data is utilized for this purpose. It is expected that the correlation obtained from different combinations of channels will differ between mental tasks, which can be exploited to extract distinctive features. The empirical mode decomposition (EMD) technique is applied to a test EEG signal obtained from a channel, which provides a number of intrinsic mode functions (IMFs), and the correlation coefficient is extracted from interchannel IMF data. Simultaneously, different statistical features are also obtained from each IMF. Finally, the feature matrix is formed from the interchannel correlation features and intrachannel statistical features of the selected IMFs of the EEG signal. Different kernels of the support vector machine (SVM) classifier are used to carry out the classification task. An EEG dataset containing ten different combinations of five different mental tasks is used to demonstrate the classification performance, and a very high level of accuracy is achieved by the proposed scheme compared to existing methods.
Mapping US Urban Extents from MODIS Data Using One-Class Classification Method

Directory of Open Access Journals (Sweden)

Bo Wan

2015-08-01

Full Text Available Urban areas are one of the most important components of human society. Their extents have been continuously growing during the last few decades. Accurate and timely measurements of the extents of urban areas can help in analyzing population densities and urban sprawls and in studying environmental issues related to urbanization. Urban extents detected from remotely sensed data are usually a by-product of land use classification results, and their interpretation requires a full understanding of land cover types. In this study, for the first time, we mapped urban extents in the continental United States using a novel one-class classification method, i.e., positive and unlabeled learning (PUL), with multi-temporal Moderate Resolution Imaging Spectroradiometer (MODIS) data for the year 2010. The Defense Meteorological Satellite Program Operational Linescan System (DMSP-OLS) night stable light data were used to calibrate the urban extents obtained from the one-class classification scheme. Our results demonstrated the effectiveness of the use of the PUL algorithm in mapping large-scale urban areas from coarse remote-sensing images, for the first time. The total accuracy of mapped urban areas was 92.9% and the kappa coefficient was 0.85. The use of DMSP-OLS night stable light data can significantly reduce false detection rates from bare land and cropland far from cities. Compared with traditional supervised classification methods, the one-class classification scheme can greatly reduce the effort involved in collecting training datasets, without losing predictive accuracy.
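The abstract above reports a total accuracy and a kappa coefficient. As a minimal sketch of how these two figures relate, the following computes overall accuracy and Cohen's kappa from a confusion matrix. The function name and the 2x2 counts are hypothetical illustrations, not data from the study.

```python
def accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] counts samples of reference class i assigned to class j.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(n_classes)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(n_classes)
    ) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical urban / non-urban validation counts (not from the paper).
toy_matrix = [[45, 5],
              [5, 45]]
accuracy, kappa = accuracy_and_kappa(toy_matrix)
```

For this toy matrix the accuracy is 0.9 and kappa is approximately 0.8, which illustrates why kappa is typically lower than raw accuracy: it discounts the agreement expected by chance.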
Statistical methods for forecasting

CERN Document Server

Abraham, Bovas

2009-01-01

The Wiley-Interscience Paperback Series consists of selected books that have been made more accessible to consumers in an effort to increase global appeal and general circulation. With these new unabridged softcover volumes, Wiley hopes to extend the lives of these works by making them available to future generations of statisticians, mathematicians, and scientists. "This book, it must be said, lives up to the words on its advertising cover: 'Bridging the gap between introductory, descriptive approaches and highly advanced theoretical treatises, it provides a practical, intermediate level discussion of a variety of forecasting tools, and explains how they relate to one another, both in theory and practice.' It does just that!" - Journal of the Royal Statistical Society. "A well-written work that deals with statistical methods and models that can be used to produce short-term forecasts, this book has wide-ranging applications. It could be used in the context of a study of regression, forecasting, and time series ...
A Feature Selection Method for Large-Scale Network Traffic Classification Based on Spark

Directory of Open Access Journals (Sweden)

Yong Wang

2016-02-01

Full Text Available Currently, with the rapid increase in the scale of data in network traffic classification, how to select traffic features efficiently is becoming a big challenge. Although a number of traditional feature selection methods using the Hadoop-MapReduce framework have been proposed, the execution time was still unsatisfactory because of the numerous iterative computations during processing. To address this issue, an efficient feature selection method for network traffic based on a new parallel computing framework called Spark is proposed in this paper. In our approach, the complete feature set is first preprocessed based on the Fisher score, and a sequential forward search strategy is employed to build candidate subsets. The optimal feature subset is then selected using the continuous iterations of the Spark computing framework. The implementation demonstrates that, while maintaining classification accuracy, our method reduces the time cost of modeling and classification and significantly improves the execution efficiency of feature selection.
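The Fisher-score preprocessing step mentioned above can be sketched as follows. This is a single-machine illustration in plain Python under the common textbook definition of the Fisher score (between-class scatter of the class means divided by within-class scatter); it does not reproduce the paper's Spark implementation or its sequential forward search, and the function name, traffic labels, and data are hypothetical.

```python
from collections import defaultdict


def fisher_scores(rows, labels):
    """Return one Fisher score per feature column.

    Score = sum_k n_k * (mean_k - mean)^2 / sum_k sum_i (x_i - mean_k)^2,
    where k runs over classes. Higher scores indicate better class separation.
    """
    n_features = len(rows[0])
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        overall_mean = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for members in by_class.values():
            values = [row[j] for row in members]
            class_mean = sum(values) / len(values)
            between += len(values) * (class_mean - overall_mean) ** 2
            within += sum((v - class_mean) ** 2 for v in values)
        # A feature that is constant within every class separates perfectly.
        scores.append(between / within if within > 0 else float("inf"))
    return scores


# Hypothetical flows: feature 0 separates the classes, feature 1 does not.
rows = [[1.0, 5.0], [1.2, 3.0], [3.0, 5.1], [3.2, 2.9]]
labels = ["web", "web", "p2p", "p2p"]
scores = fisher_scores(rows, labels)
# Feature indices ordered from most to least discriminative.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

A ranking like this could serve as the starting order for a forward search, which would then add features one at a time and keep each only if a chosen classifier's accuracy does not degrade.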
Advances in Statistical Methods for Substance Abuse Prevention Research

Science.gov (United States)

MacKinnon, David P.; Lockwood, Chondra M.

2010-01-01

The paper describes advances in statistical methods for prevention research with a particular focus on substance abuse prevention. Standard analysis methods are extended to the typical research designs and characteristics of the data collected in prevention research. Prevention research often includes longitudinal measurement, clustering of data in units such as schools or clinics, missing data, and categorical as well as continuous outcome variables. Statistical methods to handle these features of prevention data are outlined. Developments in mediation, moderation, and implementation analysis allow for the extraction of more detailed information from a prevention study. Advancements in the interpretation of prevention research results include more widespread calculation of effect size and statistical power, the use of confidence intervals as well as hypothesis testing, detailed causal analysis of research findings, and meta-analysis. The increased availability of statistical software has contributed greatly to the use of new methods in prevention research. It is likely that the Internet will continue to stimulate the development and application of new methods. PMID:12940467
Statistical Discriminability Estimation for Pattern Classification Based on Neural Incremental Attribute Learning

DEFF Research Database (Denmark)

Wang, Ting; Guan, Sheng-Uei; Puthusserypady, Sadasivan

2014-01-01

Feature ordering is a significant data preprocessing method in Incremental Attribute Learning (IAL), a novel machine learning approach which gradually trains features according to a given order. Previous research has shown that, similar to feature selection, feature ordering is also important based...... estimation. Moreover, a criterion that summarizes all the produced values of AD is employed with a GA (Genetic Algorithm)-based approach to obtain the optimum feature ordering for classification problems based on neural networks by means of IAL. Compared with the feature ordering obtained by other approaches...
The Monte Carlo method: the method of statistical trials

CERN Document Server

Shreider, YuA

1966-01-01

The Monte Carlo Method: The Method of Statistical Trials is a systematic account of the fundamental concepts and techniques of the Monte Carlo method, together with its range of applications. These applications include the computation of definite integrals, neutron physics, and the investigation of servicing processes. This volume comprises seven chapters and begins with an overview of the basic features of the Monte Carlo method and typical examples of its application to simple problems in computational mathematics. The next chapter examines the computation of multi-dimensio
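One of the applications named in this record, the computation of definite integrals, can be illustrated with a short sketch. It estimates an integral as the interval length times the average of the integrand at uniformly drawn points; the function name, the integrand, and the sample count are illustrative choices, not taken from the book.

```python
import random


def mc_integrate(f, a, b, n_samples=100_000, seed=0):
    """Estimate the integral of f over [a, b] from uniform random samples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = 0.0
    for _ in range(n_samples):
        total += f(rng.uniform(a, b))
    # The mean value of f on [a, b] times the interval length.
    return (b - a) * total / n_samples


# Integral of x^2 over [0, 1]; the exact value is 1/3.
estimate = mc_integrate(lambda x: x * x, 0.0, 1.0)
```

The statistical error of such an estimate shrinks in proportion to one over the square root of the number of samples, independently of the dimension, which is why the approach becomes attractive for multi-dimensional integrals.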
A simple and robust method for automated photometric classification of supernovae using neural networks

Science.gov (United States)

Karpenka, N. V.; Feroz, F.; Hobson, M. P.

2013-02-01

A method is presented for automated photometric classification of supernovae (SNe) as Type Ia or non-Ia. A two-step approach is adopted in which (i) the SN light curve flux measurements in each observing filter are fitted separately to an analytical parametrized function that is sufficiently flexible to accommodate virtually all types of SNe and (ii) the fitted function parameters and their associated uncertainties, along with the number of flux measurements, the maximum-likelihood value of the fit and Bayesian evidence for the model, are used as the input feature vector to a classification neural network that outputs the probability that the SN under consideration is of Type Ia. The method is trained and tested using data released following the Supernova Photometric Classification Challenge (SNPCC), consisting of light curves for 20 895 SNe in total. We consider several random divisions of the data into training and testing sets: for instance, for our sample D_1 (D_4), a total of 10 (40) per cent of the data are involved in training the algorithm and the remainder used for blind testing of the resulting classifier; we make no selection cuts. Assigning a canonical threshold probability of pth = 0.5 on the network output to class an SN as Type Ia, for the sample D_1 (D_4) we obtain a completeness of 0.78 (0.82), purity of 0.77 (0.82) and SNPCC figure of merit of 0.41 (0.50). Including the SN host-galaxy redshift and its uncertainty as additional inputs to the classification network results in a modest 5-10 per cent increase in these values. We find that the quality of the classification does not vary significantly with SN redshift. Moreover, our probabilistic classification method allows one to calculate the expected completeness, purity and figure of merit (or other measures of classification quality) as a function of the threshold probability pth, without knowing the true classes of the SNe in the testing sample, as is the case in the classification of real SNe
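The quality measures quoted in this abstract can be sketched as follows. The first function computes completeness, purity, and a figure of merit from known labels. The figure of merit is written as completeness times a pseudo-purity in which false positives carry a penalty weight, here assumed to be 3; that form is an assumption of this sketch, although it is numerically consistent with the completeness, purity, and figure-of-merit values quoted above. The second function shows one plausible way to obtain expected values without true labels, by treating the network outputs as calibrated probabilities. It is an illustration, not necessarily the authors' procedure, and all names and toy numbers are hypothetical.

```python
def classification_quality(probs, is_ia, p_th=0.5, w_false=3.0):
    """Completeness, purity and figure of merit when true classes are known."""
    selected = [p >= p_th for p in probs]
    n_true = sum(1 for s, t in zip(selected, is_ia) if s and t)
    n_false = sum(1 for s, t in zip(selected, is_ia) if s and not t)
    n_ia = sum(1 for t in is_ia if t)
    if n_ia == 0 or n_true + n_false == 0:
        return 0.0, 0.0, 0.0
    completeness = n_true / n_ia
    purity = n_true / (n_true + n_false)
    # Assumed figure of merit: false positives are penalised by w_false.
    fom = completeness * n_true / (n_true + w_false * n_false)
    return completeness, purity, fom


def expected_quality(probs, p_th=0.5, w_false=3.0):
    """Expected measures using only the output probabilities (no labels)."""
    exp_true = sum(p for p in probs if p >= p_th)
    exp_false = sum(1.0 - p for p in probs if p >= p_th)
    exp_ia = sum(probs)
    if exp_ia == 0 or exp_true + exp_false == 0:
        return 0.0, 0.0, 0.0
    completeness = exp_true / exp_ia
    purity = exp_true / (exp_true + exp_false)
    fom = completeness * exp_true / (exp_true + w_false * exp_false)
    return completeness, purity, fom


# Hypothetical network outputs and true classes for five objects.
probs = [0.9, 0.8, 0.6, 0.4, 0.2]
is_ia = [True, True, False, True, False]
actual = classification_quality(probs, is_ia)
expected = expected_quality(probs)
```

Scanning `p_th` over a grid with `expected_quality` gives the expected trade-off between completeness and purity as a function of the threshold.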
Feature-Based Classification of Amino Acid Substitutions outside Conserved Functional Protein Domains

Directory of Open Access Journals (Sweden)

Branislava Gemovic

2013-01-01

Full Text Available There are more than 500 amino acid substitutions in each human genome, and bioinformatics tools irreplaceably contribute to determination of their functional effects. We have developed a feature-based algorithm for the detection of mutations outside conserved functional domains (CFDs) and compared its classification efficacy with the most commonly used phylogeny-based tools, PolyPhen-2 and SIFT. The new algorithm is based on the informational spectrum method (ISM), a feature-based technique, and statistical analysis. Our dataset contained neutral polymorphisms and mutations associated with myeloid malignancies from epigenetic regulators ASXL1, DNMT3A, EZH2, and TET2. PolyPhen-2 and SIFT had significantly lower accuracies in predicting the effects of amino acid substitutions outside CFDs than expected, with especially low sensitivity. On the other hand, only the ISM algorithm showed statistically significant classification of these sequences. It outperformed PolyPhen-2 and SIFT by 15% and 13%, respectively. These results suggest that feature-based methods, like ISM, are more suitable for the classification of amino acid substitutions outside CFDs than phylogeny-based tools.
Couinaud's classification v.s. Cho's classification. Their feasibility in the right hepatic lobe

International Nuclear Information System (INIS)

Shioyama, Yasukazu; Ikeda, Hiroaki; Sato, Motohito; Yoshimi, Fuyo; Kishi, Kazushi; Sato, Morio; Kimura, Masashi

2008-01-01

The objective of this study was to investigate whether the new classification system proposed by Cho is feasible for clinical use compared with the classical Couinaud classification. One hundred consecutive abdominal CT cases, acquired with a 64-slice or an 8-slice multislice CT scanner, were studied, and three-dimensional portal vein images were created on a workstation for analysis. We applied both Cho's classification and the classical Couinaud classification to each case according to their definitions. Three diagnostic radiologists rated feasibility from category one (unable to classify) to category five (clearly classifiable in full agreement with the original classification criteria). In each case, we also judged whether Cho's or the classical Couinaud classification conveyed anatomical information more easily. The readers could classify the portal veins clearly (category 5) in 77 to 80% of cases, and clearly (category 5) or almost clearly (category 4) in 86 to 93% of cases, with both classifications. There was no statistically significant difference in feasibility between the two classifications. In 15 cases we felt that Couinaud's classification was more convenient than Cho's for conveying anatomical information to physicians, because two large portal veins ramified cranially and caudally from the right main portal vein and we could not classify P5 as a branch of the antero-ventral segment (AVS). Conversely, in 17 cases we felt that Cho's classification was more convenient, because the right posterior portal vein ramified into several small branches and we could not divide the right posterior branch into P6 and P7. The anterior fissure vein was clearly identified in only 60 cases. Comparing the classical Couinaud classification and Cho's classification in terms of feasibility, there was no statistically significant difference. We propose that hepatic anatomy be routinely reported with the classical Couinaud classification and in the preoperative cases we
Academic Training Lecture: Statistical Methods for Particle Physics

CERN Multimedia

PH Department

2012-01-01

2, 3, 4 and 5 April 2012 Academic Training Lecture Regular Programme from 11:00 to 12:00 - Bldg. 222-R-001 - Filtration Plant Statistical Methods for Particle Physics by Glen Cowan (Royal Holloway) The series of four lectures will introduce some of the important statistical methods used in Particle Physics, and should be particularly relevant to those involved in the analysis of LHC data. The lectures will include an introduction to statistical tests, parameter estimation, and the application of these tools to searches for new phenomena. Both frequentist and Bayesian methods will be described, with particular emphasis on treatment of systematic uncertainties. The lectures will also cover unfolding, that is, estimation of a distribution in binned form where the variable in question is subject to measurement errors.
Assessment of Quadrivalent Human Papillomavirus Vaccine Safety Using the Self-Controlled Tree-Temporal Scan Statistic Signal-Detection Method in the Sentinel System.

Science.gov (United States)

Yih, W Katherine; Maro, Judith C; Nguyen, Michael; Baker, Meghan A; Balsbaugh, Carolyn; Cole, David V; Dashevsky, Inna; Mba-Jonas, Adamma; Kulldorff, Martin

2018-06-01

The self-controlled tree-temporal scan statistic, a new signal-detection method, can evaluate whether any of a wide variety of health outcomes are temporally associated with receipt of a specific vaccine, while adjusting for multiple testing. Neither health outcomes nor postvaccination potential periods of increased risk need be prespecified. Using US medical claims data in the Food and Drug Administration's Sentinel system, we employed the method to evaluate adverse events occurring after receipt of quadrivalent human papillomavirus vaccine (4vHPV). Incident outcomes recorded in emergency department or inpatient settings within 56 days after first doses of 4vHPV received by 9- through 26.9-year-olds in 2006-2014 were identified using International Classification of Diseases, Ninth Revision, diagnosis codes and analyzed by pairing the new method with a standard hierarchical classification of diagnoses. On scanning diagnoses of 1.9 million 4vHPV recipients, 2 statistically significant categories of adverse events were found: cellulitis on days 2-3 after vaccination and "other complications of surgical and medical procedures" on days 1-3 after vaccination. Cellulitis is a known adverse event. Clinically informed investigation of electronic claims records of the patients with "other complications" did not suggest any previously unknown vaccine safety problem. Considering that thousands of potential short-term adverse events and hundreds of potential risk intervals were evaluated, these findings add significantly to the growing safety record of 4vHPV.
A hierarchical classification method for finger knuckle print recognition

Science.gov (United States)

Kong, Tao; Yang, Gongping; Yang, Lu

2014-12-01

Finger knuckle print has recently been seen as an effective biometric technique. In this paper, we propose a hierarchical classification method for finger knuckle print recognition, which is rooted in traditional score-level fusion methods. In the proposed method, we firstly take Gabor feature as the basic feature for finger knuckle print recognition and then a new decision rule is defined based on the predefined threshold. Finally, the minor feature speeded-up robust feature is conducted for these users, who cannot be recognized by the basic feature. Extensive experiments are performed to evaluate the proposed method, and experimental results show that it can achieve a promising performance.
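The two-stage idea described above, in which a basic feature decides the clear cases and a secondary feature is consulted only for the remaining users, can be sketched as follows. The abstract does not give the actual decision rule, so the thresholds, the score convention (higher means a better match), and all names here are hypothetical.

```python
def hierarchical_decision(gabor_score, surf_score_fn,
                          accept_th=0.8, reject_th=0.3, surf_th=0.5):
    """Return (is_match, stage) for one verification attempt.

    gabor_score: similarity from the basic (Gabor) feature, assumed in [0, 1].
    surf_score_fn: zero-argument callable that computes the secondary (SURF)
        similarity; it is only called when the basic feature is inconclusive.
    """
    if gabor_score >= accept_th:
        return True, "gabor"    # confident accept from the basic feature
    if gabor_score <= reject_th:
        return False, "gabor"   # confident reject from the basic feature
    # Inconclusive band: fall back to the secondary feature.
    return surf_score_fn() >= surf_th, "surf"


# Hypothetical scores for three attempts.
clear_match = hierarchical_decision(0.92, lambda: 0.0)
clear_reject = hierarchical_decision(0.10, lambda: 1.0)
borderline = hierarchical_decision(0.55, lambda: 0.7)
```

Passing the secondary score as a callable keeps the more expensive feature from being computed for users the basic feature already resolves.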
Statistical Methods for Unusual Count Data

DEFF Research Database (Denmark)

Guthrie, Katherine A.; Gammill, Hilary S.; Kamper-Jørgensen, Mads

2016-01-01

microchimerism data present challenges for statistical analysis, including a skewed distribution, excess zero values, and occasional large values. Methods for comparing microchimerism levels across groups while controlling for covariates are not well established. We compared statistical models for quantitative...... microchimerism values, applied to simulated data sets and 2 observed data sets, to make recommendations for analytic practice. Modeling the level of quantitative microchimerism as a rate via Poisson or negative binomial model with the rate of detection defined as a count of microchimerism genome equivalents per...
Comparison analysis for classification algorithm in data mining and the study of model use

Science.gov (United States)

Chen, Junde; Zhang, Defu

2018-04-01

As a key technique in data mining, classification algorithms have received extensive attention. Through experiments with classification algorithms on UCI data, we present a comparative analysis method for the different algorithms, supported by statistical tests. In addition, an adaptive diagnosis model for preventing electricity theft and leakage is given as a specific case study.
Signal classification using global dynamical models, Part I: Theory

International Nuclear Information System (INIS)

Kadtke, J.; Kremliovsky, M.

1996-01-01

Detection and classification of signals is one of the principal areas of signal processing, and the utilization of nonlinear information has long been considered as a way of improving performance beyond standard linear (e.g. spectral) techniques. Here, we develop a method for using global models of chaotic dynamical systems theory to define a signal classification processing chain, which is sensitive to nonlinear correlations in the data. We use it to demonstrate classification in high noise regimes (negative SNR), and argue that classification probabilities can be directly computed from ensemble statistics in the model coefficient space. We also develop a modification for non-stationary signals (i.e. transients) using non-autonomous ODEs. In Part II of this paper, we demonstrate the analysis on actual open ocean acoustic data from marine biologics. copyright 1996 American Institute of Physics
Differential privacy-based evaporative cooling feature selection and classification with relief-F and random forests.

Science.gov (United States)

Le, Trang T; Simmons, W Kyle; Misaki, Masaya; Bodurka, Jerzy; White, Bill C; Savitz, Jonathan; McKinney, Brett A

2017-09-15

Classification of individuals into disease or clinical categories from high-dimensional biological data with low prediction error is an important challenge of statistical learning in bioinformatics. Feature selection can improve classification accuracy but must be incorporated carefully into cross-validation to avoid overfitting. Recently, feature selection methods based on differential privacy, such as differentially private random forests and reusable holdout sets, have been proposed. However, for domains such as bioinformatics, where the number of features is much larger than the number of observations p≫n , these differential privacy methods are susceptible to overfitting. We introduce private Evaporative Cooling, a stochastic privacy-preserving machine learning algorithm that uses Relief-F for feature selection and random forest for privacy preserving classification that also prevents overfitting. We relate the privacy-preserving threshold mechanism to a thermodynamic Maxwell-Boltzmann distribution, where the temperature represents the privacy threshold. We use the thermal statistical physics concept of Evaporative Cooling of atomic gases to perform backward stepwise privacy-preserving feature selection. On simulated data with main effects and statistical interactions, we compare accuracies on holdout and validation sets for three privacy-preserving methods: the reusable holdout, reusable holdout with random forest, and private Evaporative Cooling, which uses Relief-F feature selection and random forest classification. In simulations where interactions exist between attributes, private Evaporative Cooling provides higher classification accuracy without overfitting based on an independent validation set. In simulations without interactions, thresholdout with random forest and private Evaporative Cooling give comparable accuracies. We also apply these privacy methods to human brain resting-state fMRI data from a study of major depressive disorder. Code
Semi-supervised vibration-based classification and condition monitoring of compressors

Science.gov (United States)

Potočnik, Primož; Govekar, Edvard

2017-09-01

Semi-supervised vibration-based classification and condition monitoring of the reciprocating compressors installed in refrigeration appliances is proposed in this paper. The method addresses the problem of industrial condition monitoring where prior class definitions are often not available or difficult to obtain from local experts. The proposed method combines feature extraction, principal component analysis, and statistical analysis for the extraction of initial class representatives, and compares the capability of various classification methods, including discriminant analysis (DA), neural networks (NN), support vector machines (SVM), and extreme learning machines (ELM). The use of the method is demonstrated on a case study which was based on industrially acquired vibration measurements of reciprocating compressors during the production of refrigeration appliances. The paper presents a comparative qualitative analysis of the applied classifiers, confirming the good performance of several nonlinear classifiers. If the model parameters are properly selected, then very good classification performance can be obtained from NN trained by Bayesian regularization, SVM and ELM classifiers. The method can be effectively applied for the industrial condition monitoring of compressors.

An inter-comparison of similarity-based methods for organisation and classification of groundwater hydrographs

Science.gov (United States)

Haaf, Ezra; Barthel, Roland

2018-04-01

Classification and similarity based methods, which have recently received major attention in the field of surface water hydrology, namely through the PUB (prediction in ungauged basins) initiative, have not yet been applied to groundwater systems. However, it can be hypothesised, that the principle of "similar systems responding similarly to similar forcing" applies in subsurface hydrology as well. One fundamental prerequisite to test this hypothesis and eventually to apply the principle to make "predictions for ungauged groundwater systems" is efficient methods to quantify the similarity of groundwater system responses, i.e. groundwater hydrographs. In this study, a large, spatially extensive, as well as geologically and geomorphologically diverse dataset from Southern Germany and Western Austria was used, to test and compare a set of 32 grouping methods, which have previously only been used individually in local-scale studies. The resulting groupings are compared to a heuristic visual classification, which serves as a baseline. A performance ranking of these classification methods is carried out and differences in homogeneity of grouping results were shown, whereby selected groups were related to hydrogeological indices and geological descriptors. This exploratory empirical study shows that the choice of grouping method has a large impact on the object distribution within groups, as well as on the homogeneity of patterns captured in groups. The study provides a comprehensive overview of a large number of grouping methods, which can guide researchers when attempting similarity-based groundwater hydrograph classification.
Classification of methods and equipment recovery secondary waters

Directory of Open Access Journals (Sweden)

G. V. Kalashnikov

2017-01-01

Full Text Available The issues of purification of secondary waters of industrial production have an important place and are relevant in the environmental activities of all food and chemical industries. For cleaning the transporter-washing water of beet-sugar production, the key role is played by the equipment of treatment plants. A wide variety of wastewater treatment equipment is classified according to various methods. Typical structures used are sedimentation tanks, hydrocyclones, separators, and centrifuges. In turn, they have different degrees of purification and different productivity with respect to the incoming suspension and the purified secondary water. This equipment is divided into designs, depending on the range of particles to be removed. A general classification of methods for cleaning the transporter-washing water, as well as the corresponding equipment, is made. Based on the analysis of processes and instrumentation, the main methods of wastewater treatment are identified: mechanical, physicochemical, combined, biological and disinfection. To increase the degree of purification and reduce technical and economic costs, a combined method is widely used. The main task of the site for cleaning the transporter-washing waters of sugar beet production is to provide the enterprise with water in the required quantity and quality, with economical use of water resources, taking into account the absence of pollution of surface and groundwater by industrial wastewater. In the sugar industry, new types of foreign-made washing equipment are currently widely used, which require high quality and a large amount of purified transporter-washing water for normal operation. The proposed classification makes it possible to carry out a comparative technical and economic analysis when choosing the methods and equipment for recuperation of secondary waters. The main secondary water recovery equipment used at the beet-sugar plant is considered. The most common beet processing plant is a
Nonequilibrium statistical mechanics ensemble method

CERN Document Server

Eu, Byung Chan

1998-01-01

In this monograph, nonequilibrium statistical mechanics is developed by means of ensemble methods on the basis of the Boltzmann equation, the generic Boltzmann equations for classical and quantum dilute gases, and a generalised Boltzmann equation for dense simple fluids. The theories are developed in forms parallel with the equilibrium Gibbs ensemble theory in a way fully consistent with the laws of thermodynamics. The generalised hydrodynamics equations are the integral part of the theory and describe the evolution of macroscopic processes in accordance with the laws of thermodynamics of systems far removed from equilibrium. Audience: This book will be of interest to researchers in the fields of statistical mechanics, condensed matter physics, gas dynamics, fluid dynamics, rheology, irreversible thermodynamics and nonequilibrium phenomena.
Possible classification of the methods of operational research applicable in the field of defense

Directory of Open Access Journals (Sweden)

Mučibabić Spasoje

2006-01-01

Full Text Available The overall dynamic development of operational research in various fields of human activities urges the need for a clearer and mathematically more explicit classification of its methods. This need is also very urgent in the field of defense, particularly because of the complications of modern conflicts, as well as of new security requirements. One of the possible classifications of methods based on the theory of games as a mathematical model for solving conflict situations is presented in this paper. The connections between methods and their mathematical description are underlined.
Mass detection, localization and estimation for wind turbine blades based on statistical pattern recognition

DEFF Research Database (Denmark)

Colone, L.; Hovgaard, K.; Glavind, Lars

2018-01-01

A method for mass change detection on wind turbine blades using natural frequencies is presented. The approach is based on two statistical tests. The first test decides if there is a significant mass change and the second test is a statistical group classification based on Linear Discriminant Ana...
Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics

Science.gov (United States)

Faye, Ibrahima; Samir, Brahim Belhaouari; Md Said, Abas

2014-01-01

Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics were to store and manage the biological data, and develop and analyze computational tools to enhance their understanding. The size of data accumulated under various sequencing projects is increasing exponentially, which presents difficulties for the experimental methods. To reduce the gap between newly sequenced protein and proteins with known functions, many computational techniques involving classification and clustering algorithms were proposed in the past. The classification of protein sequences into existing superfamilies is helpful in predicting the structure and function of large amount of newly discovered proteins. The existing classification results are unsatisfactory due to a huge size of features obtained through various feature encoding methods. In this work, a statistical metric-based feature selection technique has been proposed in order to reduce the size of the extracted feature vector. The proposed method of protein classification shows significant improvement in terms of performance measure metrics: accuracy, sensitivity, specificity, recall, F-measure, and so forth. PMID:25045727
An NCME Instructional Module on Data Mining Methods for Classification and Regression

Science.gov (United States)

Sinharay, Sandip

2016-01-01

Data mining methods for classification and regression are becoming increasingly popular in various scientific fields. However, these methods have not been explored much in educational measurement. This module first provides a review, which should be accessible to a wide audience in education measurement, of some of these methods. The module then…
Hyperparameterization of soil moisture statistical models for North America with Ensemble Learning Models (Elm)

Science.gov (United States)

Steinberg, P. D.; Brener, G.; Duffy, D.; Nearing, G. S.; Pelissier, C.

2017-12-01

Hyperparameterization of statistical models, i.e. automated model scoring and selection by means such as evolutionary algorithms, grid searches, and randomized searches, can improve forecast model skill by reducing errors associated with model parameterization, model structure, and statistical properties of training data. Ensemble Learning Models (Elm), and the related Earthio package, provide a flexible interface for automating the selection of parameters and model structure for machine learning models common in climate science and land cover classification, offering convenient tools for loading NetCDF, HDF, Grib, or GeoTiff files, decomposition methods like PCA and manifold learning, and parallel training and prediction with unsupervised and supervised classification, clustering, and regression estimators. Continuum Analytics is using Elm to experiment with statistical soil moisture forecasting based on meteorological forcing data from NASA's North American Land Data Assimilation System (NLDAS). There, Elm is using the NSGA-2 multiobjective optimization algorithm for optimizing statistical preprocessing of forcing data to improve goodness-of-fit for statistical models (i.e. feature engineering). This presentation will discuss Elm and its components, including dask (distributed task scheduling), xarray (data structures for n-dimensional arrays), and scikit-learn (statistical preprocessing, clustering, classification, regression), and it will show how NSGA-2 is being used to automate the selection of soil moisture forecast statistical models for North America.
Statistical method for resolving the photon-photoelectron-counting inversion problem

International Nuclear Information System (INIS)

Wu Jinlong; Li Tiejun; Peng, Xiang; Guo Hong

2011-01-01

A statistical inversion method is proposed for the photon-photoelectron-counting statistics in quantum key distribution experiment. With the statistical viewpoint, this problem is equivalent to the parameter estimation for an infinite binomial mixture model. The coarse-graining idea and Bayesian methods are applied to deal with this ill-posed problem, which is a good simple example to show the successful application of the statistical methods to the inverse problem. Numerical results show the applicability of the proposed strategy. The coarse-graining idea for the infinite mixture models should be general to be used in the future.
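The abstract above does not state the mixture model explicitly. As a hedged illustration only, a binomial mixture of the kind it describes is commonly written as follows, where the symbols are assumptions introduced here: $p(n)$ is the unknown photon-number distribution, $\eta$ is the detection efficiency, and $P(m)$ is the observed photoelectron-count distribution.

```latex
P(m) \;=\; \sum_{n=m}^{\infty} \binom{n}{m}\, \eta^{m}\, (1-\eta)^{\,n-m}\, p(n),
\qquad m = 0, 1, 2, \dots
```

Recovering $p(n)$ from measured $P(m)$ is the inversion problem; because the sum runs over infinitely many components, it is natural that the abstract calls the problem ill-posed and treats it with coarse-graining and Bayesian estimation.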
Comparison of Single and Multi-Scale Method for Leaf and Wood Points Classification from Terrestrial Laser Scanning Data

Science.gov (United States)

Wei, Hongqiang; Zhou, Guiyun; Zhou, Junjie

2018-04-01

The classification of leaf and wood points is an essential preprocessing step for extracting inventory measurements and canopy characterization of trees from terrestrial laser scanning (TLS) data. The geometry-based approach is one of the widely used classification methods. In the geometry-based method, it is common practice to extract salient features at one single scale before the features are used for classification. It remains unclear how the scale(s) used affect the classification accuracy and efficiency. To assess the scale effect on the classification accuracy and efficiency, we extracted the single-scale and multi-scale salient features from the point clouds of two oak trees of different sizes and conducted the classification on leaf and wood. Our experimental results show that the balanced accuracy of the multi-scale method is higher than the average balanced accuracy of the single-scale method by about 10 % for both trees. The average speed-up ratio of single-scale classifiers over the multi-scale classifier for each tree is higher than 30.
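The balanced accuracy reported above is commonly defined as the mean of the per-class recalls, which keeps a dominant class (here, typically leaf points) from masking poor performance on the smaller class. The following is a minimal sketch of that common definition, not the authors' code; the function name, label strings, and the ten example points are hypothetical.

```python
def balanced_accuracy(y_true, y_pred, labels=("leaf", "wood")):
    """Mean of per-class recall (common definition of balanced accuracy)."""
    recalls = []
    for label in labels:
        total = sum(1 for t in y_true if t == label)
        if total == 0:
            continue  # skip classes that never occur in the reference labels
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Hypothetical reference and predicted labels for ten points.
y_true = ["leaf"] * 8 + ["wood"] * 2
y_pred = ["leaf"] * 8 + ["leaf", "wood"]

score = balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(score, plain)
```

In this toy case the plain accuracy is 0.9 while the balanced accuracy is 0.75, because one of the two wood points is misclassified.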
Evaluation of Different Methods for Soil Classifications by Using Geographic Information Systems and Remote Sensing

Directory of Open Access Journals (Sweden)

S. H Sanaeinejad

2012-12-01

Full Text Available Soil salinity is an important factor that affects plant growth and reduces plant production at different growth stages. Remote sensing technology and GIS have great potential for monitoring dynamic soil processes such as salinity. In the present study, the efficiency of remote sensing technology and its integration with GIS was examined to estimate soil salinity for the Neyshabour basin. Different classification methods for soil salinity were also investigated. We used 6 bands of LandSat ETM+ for this study. Classification results obtained from applying mathematical models to the images were compared with results from different band combinations. The areas of the saline and non-saline soil classes were identified in the study area based on both methods and also based on the combination of the two methods. The results showed that the best method for soil classification was to use the two methods in the first stage to separate the two classes of saline and non-saline soils and then to classify the non-saline soils in the second stage. As the variation in the numerical values of the image for different soil salinity in the study area was small, it was concluded that LandSat ETM+ images have limited potential for identifying and classifying soil salinity in such an area.
A new method to determine the number of experimental data using statistical modeling methods

Energy Technology Data Exchange (ETDEWEB)

Jung, Jung-Ho; Kang, Young-Jin; Lim, O-Kaung; Noh, Yoojeong [Pusan National University, Busan (Korea, Republic of)

2017-06-15

For analyzing the statistical performance of physical systems, statistical characteristics of physical parameters such as material properties need to be estimated by collecting experimental data. For accurate statistical modeling, many such experiments may be required, but data are usually quite limited owing to the cost and time constraints of experiments. In this study, a new method for determining a reasonable number of experimental data is proposed using an area metric, after obtaining statistical models using the information on the underlying distribution, the Sequential statistical modeling (SSM) approach, and the Kernel density estimation (KDE) approach. The area metric is used as a convergence criterion to determine the necessary and sufficient number of experimental data to be acquired. The proposed method is validated in simulations, using different statistical modeling methods, different true models, and different convergence criteria. An example data set with 29 data describing the fatigue strength coefficient of SAE 950X is used for demonstrating the performance of the obtained statistical models that use a pre-determined number of experimental data in predicting the probability of failure for a target fatigue life.
Gene Expression Profiles for Predicting Metastasis in Breast Cancer: A Cross-Study Comparison of Classification Methods

Directory of Open Access Journals (Sweden)

Mark Burton

2012-01-01

Full Text Available Machine learning has increasingly been used with microarray gene expression data and for the development of classifiers using a variety of methods. However, method comparisons in cross-study datasets are very scarce. This study compares the performance of seven classification methods and the effect of voting for predicting metastasis outcome in breast cancer patients, in three situations: within the same dataset or across datasets on similar or dissimilar microarray platforms. Combining classification results from seven classifiers into one voting decision performed significantly better during internal validation as well as external validation in similar microarray platforms than the underlying classification methods. When validating between different microarray platforms, random forest, another voting-based method, proved to be the best performing method. We conclude that voting based classifiers provided an advantage with respect to classifying metastasis outcome in breast cancer patients.
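The voting step described above can be sketched as a simple majority vote over per-classifier predictions. This is an illustrative sketch only, assuming unweighted voting over binary labels; the abstract does not specify the voting rule, and the prediction table below is hypothetical rather than taken from the study.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label predictions from several classifiers.

    predictions: list with one entry per classifier, each a list of
    predicted labels for the same samples in the same order.
    """
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(clf_preds[i] for clf_preds in predictions)
        voted.append(votes.most_common(1)[0][0])  # most frequent label wins
    return voted


# Hypothetical outputs of seven classifiers on three patients
# (1 = metastasis predicted, 0 = no metastasis predicted).
seven_classifiers = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 1],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
]

combined = majority_vote(seven_classifiers)
print(combined)
```

With an odd number of voters and two classes, ties cannot occur, which is one practical reason to combine an odd number of classifiers such as seven.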
Audio Classification in Speech and Music: A Comparison between a Statistical and a Neural Approach

Directory of Open Access Journals (Sweden)

Alessandro Bugatti

2002-04-01

Full Text Available We focus on the problem of audio classification into speech and music for multimedia applications. In particular, we present a comparison between two different techniques for speech/music discrimination. The first method is based on the zero-crossing rate and Bayesian classification. It is very simple from a computational point of view, and gives good results in the case of pure music or speech. The simulation results show that some performance degradation arises when the music segment also contains some speech superimposed on music, or strong rhythmic components. To overcome these problems, we propose a second method that uses more features and is based on neural networks (specifically a multi-layer Perceptron). In this case we obtain better performance, at the expense of a limited growth in the computational complexity. In practice, the proposed neural network is simple to implement if a suitable polynomial is used as the activation function, and a real-time implementation is possible even if low-cost embedded systems are used.
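The zero-crossing rate used by the first method can be illustrated with a short sketch. The abstract does not give the exact definition used in the paper, so the version below assumes a common one: the fraction of adjacent sample pairs in a frame whose signs differ. The function name and the sample frames are hypothetical.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Zero is treated as non-negative, which is one common convention.
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


# Hypothetical frames: a rapidly alternating signal and a slowly varying one.
alternating = [1, -1, 1, -1, 1]
smooth = [1, 2, 3, 2, 1]

zcr_alternating = zero_crossing_rate(alternating)
zcr_smooth = zero_crossing_rate(smooth)
print(zcr_alternating, zcr_smooth)
```

A per-frame feature of this kind would then be passed to the classifier; the Bayesian decision stage mentioned in the abstract is not reproduced here.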
The application of statistical methods to assess economic assets

Directory of Open Access Journals (Sweden)

D. V. Dianov

2017-01-01

Full Text Available The article is devoted to consideration and evaluation of machinery, equipment and special equipment, methodological aspects of the use of standards for assessment of buildings and structures in current prices, the valuation of residential, specialized houses, office premises, assessment and reassessment of existing and inactive military assets, and the application of statistical methods to obtain the relevant cost estimates. The objective of the article is to consider possible applications of statistical tools in the valuation of the assets composing the core group of elements of national wealth: the fixed assets. Firstly, capital tangible assets constitute the material base for creating new value, products and non-financial services. The accumulated gain in tangible assets of a capital nature is a part of the gross domestic product, and from its volume and specific weight in the composition of GDP we can judge the scope of reproductive processes in the country. Based on the methodological materials of the state statistics bodies of the Russian Federation and on the principles of the theory of statistics, which describe methods of statistical analysis such as indices, average values, and regression, a methodical approach is structured for applying statistical tools to obtain value estimates of property, plant and equipment with significant accumulated depreciation. Until now, the use of statistical methodology in the practice of economic assessment of assets has been only fragmentary. This applies to both Federal Legislation (Federal law № 135 «On valuation activities in the Russian Federation» dated 16.07.1998 in edition 05.07.2016) and the methodological documents and regulations of the estimated activities, in particular, the valuation activities’ standards. A particular problem is the use of a digital database of Rosstat (Federal State Statistics Service), as to the specific fixed assets the comparison should be carried
Classification methods to detect sleep apnea in adults based on respiratory and oximetry signals: a systematic review.

Science.gov (United States)

Uddin, M B; Chow, C M; Su, S W

2018-03-26

Sleep apnea (SA), a common sleep disorder, can significantly decrease the quality of life, and is closely associated with major health risks such as cardiovascular disease, sudden death, depression, and hypertension. The normal diagnostic process of SA using polysomnography is costly and time consuming. In addition, the accuracy of different classification methods to detect SA varies with the use of different physiological signals. If an effective, reliable, and accurate classification method is developed, then the diagnosis of SA and its associated treatment will be time-efficient and economical. This study aims to systematically review the literature and present an overview of classification methods to detect SA using respiratory and oximetry signals and address the automated detection approach. Sixty-two included studies revealed the application of single and multiple signals (respiratory and oximetry) for the diagnosis of SA. Both airflow and oxygen saturation signals alone were effective in detecting SA in the case of binary decision-making, whereas multiple signals were good for multi-class detection. In addition, some machine learning methods were superior to the other classification methods for SA detection using respiratory and oximetry signals. To deal with the respiratory and oximetry signals, a good choice of classification method as well as the consideration of associated factors would result in high accuracy in the detection of SA. An accurate classification method should provide a high detection rate with an automated (independent of human action) analysis of respiratory and oximetry signals. Future high-quality automated studies using large samples of data from multiple patient groups or record batches are recommended.
Screening tests for hazard classification of complex waste materials – Selection of methods

International Nuclear Information System (INIS)

Weltens, R.; Vanermen, G.; Tirez, K.; Robbens, J.; Deprez, K.; Michiels, L.

2012-01-01

In this study we describe the development of an alternative methodology for hazard characterization of waste materials. Such an alternative methodology for hazard assessment of complex waste materials is urgently needed, because the lack of a validated instrument leads to arbitrary hazard classification of such complex waste materials. False classification can lead to human and environmental health risks and also has important financial consequences for the waste owner. The Hazardous Waste Directive (HWD) describes the methodology for hazard classification of waste materials. For mirror entries the HWD classification is based upon the hazardous properties (H1–15) of the waste which can be assessed from the hazardous properties of individual identified waste compounds or – if not all compounds are identified – from test results of hazard assessment tests performed on the waste material itself. For the latter the HWD recommends toxicity tests that were initially designed for risk assessment of chemicals in consumer products (pharmaceuticals, cosmetics, biocides, food, etc.). These tests (often using mammals) are not designed nor suitable for the hazard characterization of waste materials. With the present study we want to contribute to the development of an alternative and transparent test strategy for hazard assessment of complex wastes that is in line with the HWD principles for waste classification. It is necessary to cope with this important shortcoming in hazardous waste classification and to demonstrate that alternative methods are available that can be used for hazard assessment of waste materials. Next, by describing the pros and cons of the available methods, and by identifying the needs for additional or further development of test methods, we hope to stimulate research efforts and development in this direction. In this paper we describe promising techniques and argument on the test selection for the pilot study that we have performed on different
Application of texture analysis method for classification of benign and malignant thyroid nodules in ultrasound images.

Science.gov (United States)

Abbasian Ardakani, Ali; Gharbali, Akbar; Mohammadi, Afshin

2015-01-01

The aim of this study was to evaluate a computer-aided diagnosis (CAD) system with texture analysis (TA) to improve radiologists' accuracy in identifying thyroid nodules as malignant or benign. A total of 70 cases (26 benign and 44 malignant) were analyzed in this study. We extracted up to 270 statistical texture features as a descriptor for each selected region of interest (ROI) in three normalization schemes (default, 3s and 1%-99%). The features were then reduced to the 10 best and most effective ones using the lowest probability of classification error and average correlation coefficients (POE+ACC) and the Fisher coefficient (Fisher). These features were analyzed under standard and nonstandard states. For TA of the thyroid nodules, Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and Non-Linear Discriminant Analysis (NDA) were applied. A First Nearest-Neighbour (1-NN) classifier was applied to the features resulting from PCA and LDA. NDA features were classified by an artificial neural network (A-NN). Receiver operating characteristic (ROC) curve analysis was used for examining the performance of the TA methods. The best results were obtained with 1%-99% normalization, with features extracted by the POE+ACC algorithm and analyzed by NDA, giving an area under the ROC curve (Az) of 0.9722, which corresponds to a sensitivity of 94.45%, specificity of 100%, and accuracy of 97.14%. Our results indicate that TA is a reliable method that can provide useful information to help radiologists in the detection and classification of benign and malignant thyroid nodules.
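The first-nearest-neighbour (1-NN) step mentioned above assigns each new sample the label of its closest training sample. The following is a minimal sketch assuming Euclidean distance on already reduced feature vectors; the feature values and the function name are hypothetical and do not come from the study.

```python
import math


def predict_1nn(train_X, train_y, x):
    """Return the label of the training sample closest to x (Euclidean)."""
    nearest = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[nearest]


# Hypothetical two-dimensional feature vectors (e.g. after PCA or LDA).
train_X = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
train_y = ["benign", "benign", "malignant", "malignant"]

label_a = predict_1nn(train_X, train_y, (0.15, 0.15))
label_b = predict_1nn(train_X, train_y, (0.85, 0.85))
print(label_a, label_b)
```

Because 1-NN has no parameters to fit, it is often used in this way to judge how well a feature reduction step separates the classes.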
A New Ensemble Method with Feature Space Partitioning for High-Dimensional Data Classification

Directory of Open Access Journals (Sweden)

Yongjun Piao

2015-01-01

Full Text Available Ensemble data mining methods, also known as classifier combination, are often used to improve the performance of classification. Various classifier combination methods such as bagging, boosting, and random forest have been devised and have received considerable attention in the past. However, data dimensionality increases rapidly day by day. Such a trend poses various challenges as these methods are not suitable to directly apply to high-dimensional datasets. In this paper, we propose an ensemble method for classification of high-dimensional data, with each classifier constructed from a different set of features determined by partitioning of redundant features. In our method, the redundancy of features is considered to divide the original feature space. Then, each generated feature subset is trained by a support vector machine, and the results of each classifier are combined by majority voting. The efficiency and effectiveness of our method are demonstrated through comparisons with other ensemble techniques, and the results show that our method outperforms other methods.
MRI Brain Images Healthy and Pathological Tissues Classification with the Aid of Improved Particle Swarm Optimization and Neural Network

Science.gov (United States)

Sheejakumari, V.; Sankara Gomathi, B.

2015-01-01

The advantages of magnetic resonance imaging (MRI) over other diagnostic imaging modalities are its higher spatial resolution and its better discrimination of soft tissue. In the previous tissues classification method, the healthy and pathological tissues are classified from the MRI brain images using HGANN. But the method lacks sensitivity and accuracy measures. The classification method is inadequate in its performance in terms of these two parameters. So, to avoid these drawbacks, a new classification method is proposed in this paper. Here, new tissues classification method is proposed with improved particle swarm optimization (IPSO) technique to classify the healthy and pathological tissues from the given MRI images. Our proposed classification method includes the same four stages, namely, tissue segmentation, feature extraction, heuristic feature selection, and tissue classification. The method is implemented and the results are analyzed in terms of various statistical performance measures. The results show the effectiveness of the proposed classification method in classifying the tissues and the achieved improvement in sensitivity and accuracy measures. Furthermore, the performance of the proposed technique is evaluated by comparing it with the other segmentation methods. PMID:25977706

Different methods for quark/gluon jet classification on real data from the DELPHI detector

Energy Technology Data Exchange (ETDEWEB)

Transtroemer, G

1999-05-01

Different methods to separate quark jets from gluon jets have been investigated and tested on data from the DELPHI experiment. A test sample of gluon jets was selected from bb-barg threejet events where the two b-jets had been identified using a lifetime tag, and a quark jet sample was obtained from qq-barγ events where the photon was required to have a high energy and to be well separated from the two jets. Three types of tests were made. Firstly, the jet energy, which is the variable most frequently used for quark/gluon jet separation, was compared with methods based on the differences in the fragmentation of quark and gluon jets. It was found that the fragmentation based classification provides significantly better identification than the jet energy only in events where the jets all have approximately the same energy. In Monte Carlo generated symmetric e+e- → qq-barg threejet events, where the jet energy does not provide any identification at all, the gluon jet was correctly assigned in 58 % of the events. More important, however, is that the identification has been divided into two independent parts, the energy part and the fragmentation part. Secondly, two different sets of fragmentation sensitive variables were tested. It was found that a slightly better identification could be achieved using information from all the particles of the jet rather than using only the leading ones. Thirdly, three types of statistical discrimination methods were compared: a cut on a single fragmentation variable; a cut on the Fisher statistical discriminant calculated from one set of variables; a cut on the output from an Artificial Neural Network (ANN) trained on different sets of variables. The three types of classifiers gave about the same performance, and one conclusion from this study was that the use of ANNs or Fisher statistical discrimination does not seem to improve the results significantly in quark/gluon jet separation on a jet to jet basis. 45 refs
Different methods for quark/gluon jet classification on real data from the DELPHI detector

International Nuclear Information System (INIS)

Transtroemer, G.

1999-05-01

Different methods to separate quark jets from gluon jets have been investigated and tested on data from the DELPHI experiment. A test sample of gluon jets was selected from bb-barg threejet events where the two b-jets had been identified using a lifetime tag, and a quark jet sample was obtained from qq-barγ events where the photon was required to have a high energy and to be well separated from the two jets. Three types of tests were made. Firstly, the jet energy, which is the variable most frequently used for quark/gluon jet separation, was compared with methods based on the differences in the fragmentation of quark and gluon jets. It was found that the fragmentation based classification provides significantly better identification than the jet energy only in events where the jets all have approximately the same energy. In Monte Carlo generated symmetric e+e- → qq-barg threejet events, where the jet energy does not provide any identification at all, the gluon jet was correctly assigned in 58 % of the events. More important, however, is that the identification has been divided into two independent parts, the energy part and the fragmentation part. Secondly, two different sets of fragmentation sensitive variables were tested. It was found that a slightly better identification could be achieved using information from all the particles of the jet rather than using only the leading ones. Thirdly, three types of statistical discrimination methods were compared: a cut on a single fragmentation variable; a cut on the Fisher statistical discriminant calculated from one set of variables; a cut on the output from an Artificial Neural Network (ANN) trained on different sets of variables. The three types of classifiers gave about the same performance, and one conclusion from this study was that the use of ANNs or Fisher statistical discrimination does not seem to improve the results significantly in quark/gluon jet separation on a jet to jet basis.
SVM-based Partial Discharge Pattern Classification for GIS

Science.gov (United States)

Ling, Yin; Bai, Demeng; Wang, Menglin; Gong, Xiaojin; Gu, Chao

2018-01-01

Partial discharges (PD) occur when there are localized dielectric breakdowns in small regions of gas insulated substations (GIS). It is of high importance to recognize the PD patterns, through which we can diagnose the defects caused by different sources so that predictive maintenance can be conducted to prevent unplanned power outages. In this paper, we propose an approach to perform partial discharge pattern classification. It first recovers the PRPD matrices from the PRPD2D images; then statistical features are extracted from the recovered PRPD matrix and fed into an SVM for classification. Experiments conducted on a dataset containing thousands of images demonstrate the high effectiveness of the method.
Chemometrics Methods for Specificity, Authenticity and Traceability Analysis of Olive Oils: Principles, Classifications and Applications

Directory of Open Access Journals (Sweden)

Habib Messai

2016-11-01

Full Text Available Background. Olive oils (OOs) show high chemical variability due to several factors of genetic, environmental and anthropic types. Genetic and environmental factors are responsible for natural compositions and polymorphic diversification resulting in different varietal patterns and phenotypes. Anthropic factors, however, are at the origin of different blends’ preparation leading to normative, labelled or adulterated commercial products. Control of complex OO samples requires their (i) characterization by specific markers; (ii) authentication by fingerprint patterns; and (iii) monitoring by traceability analysis. Methods. These quality control and management aims require the use of several multivariate statistical tools: specificity highlighting requires ordination methods; authentication checking calls for classification and pattern recognition methods; traceability analysis implies the use of network-based approaches able to separate or extract mixed information and memorized signals from complex matrices. Results. This chapter presents a review of different chemometrics methods applied for the control of OO variability from metabolic and physical-chemical measured characteristics. The different chemometrics methods are illustrated by different study cases on monovarietal and blended OO originated from different countries. Conclusion. Chemometrics tools offer multiple ways for quantitative evaluations and qualitative control of complex chemical variability of OO in relation to several intrinsic and extrinsic factors.
Chemometrics Methods for Specificity, Authenticity and Traceability Analysis of Olive Oils: Principles, Classifications and Applications

Science.gov (United States)

Messai, Habib; Farman, Muhammad; Sarraj-Laabidi, Abir; Hammami-Semmar, Asma; Semmar, Nabil

2016-01-01

Background. Olive oils (OOs) show high chemical variability due to several factors of genetic, environmental and anthropic types. Genetic and environmental factors are responsible for natural compositions and polymorphic diversification resulting in different varietal patterns and phenotypes. Anthropic factors, however, are at the origin of different blends’ preparation leading to normative, labelled or adulterated commercial products. Control of complex OO samples requires their (i) characterization by specific markers; (ii) authentication by fingerprint patterns; and (iii) monitoring by traceability analysis. Methods. These quality control and management aims require the use of several multivariate statistical tools: specificity highlighting requires ordination methods; authentication checking calls for classification and pattern recognition methods; traceability analysis implies the use of network-based approaches able to separate or extract mixed information and memorized signals from complex matrices. Results. This chapter presents a review of different chemometrics methods applied for the control of OO variability from metabolic and physical-chemical measured characteristics. The different chemometrics methods are illustrated by different study cases on monovarietal and blended OO originated from different countries. Conclusion. Chemometrics tools offer multiple ways for quantitative evaluations and qualitative control of complex chemical variability of OO in relation to several intrinsic and extrinsic factors. PMID:28231172
Slip estimation methods for proprioceptive terrain classification using tracked mobile robots

CSIR Research Space (South Africa)

Masha, Ditebogo F

2017-11-01

Full Text Available Recent work has shown that proprioceptive measurements such as terrain slip can be used for terrain classification. This paper investigates the suitability of four simple slip estimation methods for differentiating between indoor and outdoor terrain...
Optimizing Multiple Kernel Learning for the Classification of UAV Data

Directory of Open Access Journals (Sweden)

Caroline M. Gevaert

2016-12-01

Full Text Available Unmanned Aerial Vehicles (UAVs) are capable of providing high-quality orthoimagery and 3D information in the form of point clouds at a relatively low cost. Their increasing popularity stresses the necessity of understanding which algorithms are especially suited for processing the data obtained from UAVs. The features that are extracted from the point cloud and imagery have different statistical characteristics and can be considered as heterogeneous, which motivates the use of Multiple Kernel Learning (MKL) for classification problems. In this paper, we illustrate the utility of applying MKL for the classification of heterogeneous features obtained from UAV data through a case study of an informal settlement in Kigali, Rwanda. Results indicate that MKL can achieve a classification accuracy of 90.6%, a 5.2% increase over a standard single-kernel Support Vector Machine (SVM). A comparison of seven MKL methods indicates that linearly-weighted kernel combinations based on simple heuristics are competitive with respect to computationally-complex, non-linear kernel combination methods. We further underline the importance of utilizing appropriate feature grouping strategies for MKL, which has not been directly addressed in the literature, and we propose a novel, automated feature grouping method that achieves a high classification accuracy for various MKL methods.
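A linearly-weighted kernel combination of the kind compared above can be sketched as a weighted sum of one kernel matrix per feature group. This is an illustrative sketch only: the weights, the two feature groups (image and point cloud), and the matrix values are hypothetical, and the weighting heuristics used in the paper are not reproduced.

```python
def combine_kernels(kernels, weights):
    """Return sum_m w_m * K_m with the weights normalised to sum to one.

    kernels: list of equally sized square matrices (lists of lists).
    weights: one non-negative weight per kernel.
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(norm, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 2x2 kernel matrices for two feature groups.
K_image = [[1.0, 0.2], [0.2, 1.0]]
K_points = [[1.0, 0.6], [0.6, 1.0]]

K_combined = combine_kernels([K_image, K_points], weights=[1.0, 1.0])
print(K_combined)
```

A non-negative weighted sum of valid (positive semi-definite) kernels is itself a valid kernel, so the combined matrix can be passed to any kernel classifier that accepts a precomputed kernel.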
SoFoCles: feature filtering for microarray classification based on gene ontology.

Science.gov (United States)

Papachristoudis, Georgios; Diplaris, Sotiris; Mitkas, Pericles A

2010-02-01

Marker gene selection has been an important research topic in the classification analysis of gene expression data. Current methods try to reduce the "curse of dimensionality" by using statistical intra-feature set calculations, or classifiers that are based on the given dataset. In this paper, we present SoFoCles, an interactive tool that enables semantic feature filtering in microarray classification problems with the use of external, well-defined knowledge retrieved from the Gene Ontology. The notion of semantic similarity is used to derive genes that are involved in the same biological path during the microarray experiment, by enriching a feature set that has been initially produced with legacy methods. Among its other functionalities, SoFoCles offers a large repository of semantic similarity methods that are used in order to derive feature sets and marker genes. The structure and functionality of the tool are discussed in detail, as well as its ability to improve classification accuracy. Through experimental evaluation, SoFoCles is shown to outperform other classification schemes in terms of classification accuracy in two real datasets using different semantic similarity computation approaches.
Brief guidelines for methods and statistics in medical research

CERN Document Server

Ab Rahman, Jamalludin

2015-01-01

This book serves as a practical guide to methods and statistics in medical research. It includes step-by-step instructions on using SPSS software for statistical analysis, as well as relevant examples to help those readers who are new to research in health and medical fields. Simple texts and diagrams are provided to help explain the concepts covered, and print screens for the statistical steps and the SPSS outputs are provided, together with interpretations and examples of how to report on findings. Brief Guidelines for Methods and Statistics in Medical Research offers a valuable quick reference guide for healthcare students and practitioners conducting research in health related fields, written in an accessible style.
Fast-HPLC Fingerprinting to Discriminate Olive Oil from Other Edible Vegetable Oils by Multivariate Classification Methods.

Science.gov (United States)

Jiménez-Carvelo, Ana M; González-Casado, Antonio; Pérez-Castaño, Estefanía; Cuadros-Rodríguez, Luis

2017-03-01

A new analytical method for the differentiation of olive oil from other vegetable oils using reversed-phase LC and applying chemometric techniques was developed. A 3 cm short column was used to obtain the chromatographic fingerprint of the methyl-transesterified fraction of each vegetable oil. The chromatographic analysis took only 4 min. The multivariate classification methods used were k-nearest neighbors, partial least-squares (PLS) discriminant analysis, one-class PLS, support vector machine classification, and soft independent modeling of class analogies. The discrimination of olive oil from other vegetable edible oils was evaluated by several classification quality metrics. Several strategies for the classification of the olive oil were used: one input-class, two input-class, and pseudo two input-class.
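Of the classifiers listed, k-nearest neighbors is the simplest to sketch. The example below is a minimal version that assumes each fingerprint has been reduced to a short numeric vector; the vectors and the class labels are invented for illustration and are not chromatographic data.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training vectors closest to the query (Euclidean)."""
    distances = sorted((math.dist(x, query), label)
                       for x, label in zip(train_X, train_y))
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-dimensional fingerprints
train_X = [[1.0, 0.2], [0.9, 0.3], [1.1, 0.25],
           [0.2, 1.0], [0.3, 0.9], [0.25, 1.1]]
train_y = ["olive", "olive", "olive", "other", "other", "other"]

label_a = knn_predict(train_X, train_y, [0.95, 0.3])
label_b = knn_predict(train_X, train_y, [0.2, 0.95])
```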
Fractal dimension and image statistics of anal intraepithelial neoplasia

International Nuclear Information System (INIS)

Ahammer, H.; Kroepfl, J.M.; Hackl, Ch.; Sedivy, R.

2011-01-01

Research Highlights: → Human papillomaviruses cause anal intraepithelial neoplasia (AIN). → Digital image processing was carried out to classify the grades of AIN quantitatively. → The fractal dimension as well as grey value statistics was calculated. → Higher grades of AIN yielded higher values of the fractal dimension. → An automatic detection system is feasible. - Abstract: It is well known that human papillomaviruses (HPV) induce a variety of tumorous lesions of the skin. HPV-subtypes also cause premalignant lesions which are termed anal intraepithelial neoplasia (AIN). The clinical classification of AIN is of growing interest in clinical practice, due to increasing HPV infection rates throughout human population. The common classification approach is based on subjective inspections of histological slices of anal tissues with all the drawbacks of depending on the status and individual variances of the trained pathologists. Therefore, a nonlinear quantitative classification method including the calculation of the fractal dimension and first order as well as second order image statistical parameters was developed. The absolute values of these quantitative parameters reflected the distinct grades of AIN very well. The quantitative approach has the potential to decrease classification errors significantly and it could be used as a widely applied screening technique.
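The abstract does not say how the fractal dimension was computed. One common estimator is box counting, sketched below under that assumption for a binary image stored as nested lists. The image must contain at least one foreground pixel, and the box sizes are an arbitrary choice.

```python
import math

def box_count(image, size):
    """Number of size x size boxes that contain at least one foreground pixel."""
    rows, cols = len(image), len(image[0])
    count = 0
    for r in range(0, rows, size):
        for c in range(0, cols, size):
            if any(image[i][j]
                   for i in range(r, min(r + size, rows))
                   for j in range(c, min(c + size, cols))):
                count += 1
    return count

def box_counting_dimension(image, sizes=(1, 2, 4, 8)):
    """Least-squares slope of log N(s) against log(1/s)."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(image, s)) for s in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Two synthetic 16 x 16 test images: a filled square and a single row
filled = [[1] * 16 for _ in range(16)]
line = [[1] * 16] + [[0] * 16 for _ in range(15)]

dim_filled = box_counting_dimension(filled)
dim_line = box_counting_dimension(line)
```

A filled region should give a value close to 2 and a straight line a value close to 1; textured tissue images would fall between such extremes.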
Data classification and MTBF prediction with a multivariate analysis approach

International Nuclear Information System (INIS)

Braglia, Marcello; Carmignani, Gionata; Frosolini, Marco; Zammori, Francesco

2012-01-01

The paper presents a multivariate statistical approach that supports the classification of mechanical components, subjected to specific operating conditions, in terms of the Mean Time Between Failure (MTBF). Assessing the influence of working conditions and/or environmental factors on the MTBF is a prerequisite for the development of an effective preventive maintenance plan. However, this task may be demanding and it is generally performed with ad-hoc experimental methods that lack statistical rigor. To solve this common problem, a step-by-step multivariate data classification technique is proposed. Specifically, a set of structured failure data is classified in a meaningful way by means of: (i) cluster analysis, (ii) multivariate analysis of variance, (iii) feature extraction and (iv) predictive discriminant analysis. This makes it possible not only to define the MTBF of the analyzed components, but also to identify the working parameters that explain most of the variability of the observed data. The approach is finally demonstrated on 126 centrifugal pumps installed in an oil refinery plant; the results obtained demonstrate the quality of the final discrimination, in terms of data classification and failure prediction.
A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data

Directory of Open Access Journals (Sweden)

Himmelreich Uwe

2009-07-01

Full Text Available Background. Regularized regression methods such as principal component or partial least squares regression perform well in learning tasks on high dimensional spectral data, but cannot explicitly eliminate irrelevant features. The random forest classifier with its associated Gini feature importance, on the other hand, allows for an explicit feature elimination, but may not be optimally adapted to spectral data due to the topology of its constituent classification trees, which are based on orthogonal splits in feature space. Results. We propose to combine the best of both approaches, and evaluated the joint use of a feature selection based on a recursive feature elimination using the Gini importance of random forests together with regularized classification methods on spectral data sets from medical diagnostics, chemotaxonomy, biomedical analytics, food science, and synthetically modified spectral data. Here, a feature selection using the Gini feature importance with a regularized classification by discriminant partial least squares regression performed as well as or better than a filtering according to different univariate statistical tests, or using regression coefficients in a backward feature elimination. It outperformed the direct application of the random forest classifier, or the direct application of the regularized classifiers on the full set of features. Conclusion. The Gini importance of the random forest provided superior means for measuring feature relevance on spectral data, but, on an optimal subset of features, the regularized classifiers might be preferable over the random forest classifier, in spite of their limitation to model linear dependencies only. A feature selection based on Gini importance, however, may precede a regularized linear classification to identify this optimal subset of features, and to earn a double benefit of both dimensionality reduction and the elimination of noise from the classification task.
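For readers unfamiliar with the quantity, the Gini importance of a feature is accumulated from the impurity decreases of the tree splits that use that feature. The sketch below computes the decrease for a single threshold split; the spectral intensities and class labels are made-up values, and a real random forest would sum such decreases over many trees.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_decrease(values, labels, threshold):
    """Impurity decrease obtained by splitting the samples at values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted

# Hypothetical intensities of one spectral channel for six samples
intensity = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
classes = ["a", "a", "a", "b", "b", "b"]

separating_split = gini_decrease(intensity, classes, threshold=0.5)
weak_split = gini_decrease(intensity, classes, threshold=0.15)
```

A split that separates the classes cleanly yields a much larger decrease than one that isolates a single sample, which is why informative channels accumulate high importance.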
The estimation of the measurement results with using statistical methods

International Nuclear Information System (INIS)

Velychko, O (State Enterprise Ukrmetrteststandard, 4, Metrologichna Str., 03680, Kyiv (Ukraine)); Gordiyenko, T (State Scientific Institution UkrNDIspirtbioprod, 3, Babushkina Lane, 03190, Kyiv (Ukraine))

2015-01-01

A number of international standards and guides describe various statistical methods that are applied to the management, control and improvement of processes for the purpose of analysing technical measurement results. The paper analyses international standards and guides on statistical methods for estimating measurement results and describes recommendations for their application in laboratories. To support this analysis, cause-and-effect Ishikawa diagrams concerning the application of statistical methods for the estimation of measurement results are constructed.
The estimation of the measurement results with using statistical methods

Science.gov (United States)

Velychko, O.; Gordiyenko, T.

2015-02-01

A number of international standards and guides describe various statistical methods that are applied to the management, control and improvement of processes for the purpose of analysing technical measurement results. The paper analyses international standards and guides on statistical methods for estimating measurement results and describes recommendations for their application in laboratories. To support this analysis, cause-and-effect Ishikawa diagrams concerning the application of statistical methods for the estimation of measurement results are constructed.
Structured Literature Review of Electricity Consumption Classification Using Smart Meter Data

Directory of Open Access Journals (Sweden)

Alexander Martin Tureczek

2017-04-01

Full Text Available Smart meters for measuring electricity consumption are fast becoming prevalent in households. The meters measure consumption on a very fine scale, usually on a 15 min basis, and the data give unprecedented granularity of consumption patterns at household level. A multitude of papers have emerged utilizing smart meter data for deepening our knowledge of consumption patterns. This paper applies a modification of Okoli’s method for conducting structured literature reviews to generate an overview of research in electricity customer classification using smart meter data. The process assessed 2099 papers before identifying 34 significant papers, and highlights three key points: prominent methods, datasets and application. Three important findings are outlined. First, only a few papers contemplate future applications of the classification, rendering papers relevant only in a classification setting. Second, the encountered classification methods do not consider correlation or time series analysis when classifying. The identified papers fail to thoroughly analyze the statistical properties of the data, investigations that could potentially improve classification performance. Third, the description of the data utilized is of varying quality, with only 50% acknowledging the impact of missing values on the final sample size. A data description score for assessing the quality of data description has been developed and applied to all papers reviewed.
SOME ASPECTS OF THE USE OF MATHEMATICAL-STATISTICAL METHODS IN THE ANALYSIS OF SOCIO-HUMANISTIC TEXTS (Keywords: humanities and social text, mathematics, method, statistics, probability)

Directory of Open Access Journals (Sweden)

Zaira M Alieva

2016-01-01

Full Text Available The article analyzes the application of mathematical and statistical methods in the analysis of socio-humanistic texts. It describes the essence of mathematical and statistical methods and presents examples of their use in the study of humanities and social phenomena. It considers the key issues faced by an expert applying mathematical-statistical methods in the socio-humanitarian sphere, including the persistent opposition between the socio-humanitarian sciences and mathematics, the difficulty of isolating the object that carries the problem, and the need to use a probabilistic approach. Conclusions are drawn from the results of the study.
Cutting-edge statistical methods for a life-course approach.

Science.gov (United States)

Bub, Kristen L; Ferretti, Larissa K

2014-01-01

Advances in research methods, data collection and record keeping, and statistical software have substantially increased our ability to conduct rigorous research across the lifespan. In this article, we review a set of cutting-edge statistical methods that life-course researchers can use to rigorously address their research questions. For each technique, we describe the method, highlight the benefits and unique attributes of the strategy, offer a step-by-step guide on how to conduct the analysis, and illustrate the technique using data from the National Institute of Child Health and Human Development Study of Early Child Care and Youth Development. In addition, we recommend a set of technical and empirical readings for each technique. Our goal was not to address a substantive question of interest but instead to provide life-course researchers with a useful reference guide to cutting-edge statistical methods.
Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions

KAUST Repository

Najibi, Seyed Morteza; Maadooliat, Mehdi; Zhou, Lan; Huang, Jianhua Z.; Gao, Xin

2017-01-01

Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is how to efficiently model the continuous conformational space of the protein structures based on the differences and similarities between different Ramachandran plots. Despite the presence of statistical methods for modeling angular data of proteins, there is still a substantial need for more sophisticated and faster statistical tools to model the large-scale circular datasets. To address this need, we have developed a nonparametric method for collective estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The proposed method takes into account the circular nature of the angular data using trigonometric spline which is more efficient compared to existing methods. This collective density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of adaptive basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. The proposed method provides a novel and unique perspective to two important and challenging problems in protein structure research: structure-based protein classification and angular-sampling-based protein loop structure prediction.
Advanced data analysis in neuroscience integrating statistical and computational models

CERN Document Server

Durstewitz, Daniel

2017-01-01

This book is intended for use in advanced graduate courses in statistics / machine learning, as well as for all experimental neuroscientists seeking to understand statistical methods at a deeper level, and theoretical neuroscientists with a limited background in statistics. It reviews almost all areas of applied statistics, from basic statistical estimation and test theory, linear and nonlinear approaches for regression and classification, to model selection and methods for dimensionality reduction, density estimation and unsupervised clustering. Its focus, however, is linear and nonlinear time series analysis from a dynamical systems perspective, based on which it aims to convey an understanding also of the dynamical mechanisms that could have generated observed time series. Further, it integrates computational modeling of behavioral and neural dynamics with statistical estimation and hypothesis testing. This way, computational models in neuroscience are not only explanatory frameworks, but become powerfu...
A comparison of supervised classification methods for the prediction of substrate type using multibeam acoustic and legacy grain-size data.

Directory of Open Access Journals (Sweden)

David Stephens

Full Text Available Detailed seabed substrate maps are increasingly in demand for effective planning and management of marine ecosystems and resources. It has become common to use remotely sensed multibeam echosounder data in the form of bathymetry and acoustic backscatter in conjunction with ground-truth sampling data to inform the mapping of seabed substrates. Whilst, until recently, such data sets have typically been classified by expert interpretation, it is now obvious that more objective, faster and repeatable methods of seabed classification are required. This study compares the performances of a range of supervised classification techniques for predicting substrate type from multibeam echosounder data. The study area is located in the North Sea, off the north-east coast of England. A total of 258 ground-truth samples were classified into four substrate classes. Multibeam bathymetry and backscatter data, and a range of secondary features derived from these datasets, were used in this study. Six supervised classification techniques were tested: Classification Trees, Support Vector Machines, k-Nearest Neighbour, Neural Networks, Random Forest and Naive Bayes. Each classifier was trained multiple times using different input features, including (i) the two primary features of bathymetry and backscatter, (ii) a subset of the features chosen by a feature selection process and (iii) all of the input features. The predictive performances of the models were validated using a separate test set of ground-truth samples. The statistical significance of model performances relative to a simple baseline model (Nearest Neighbour predictions on bathymetry and backscatter) was tested to assess the benefits of using more sophisticated approaches. The best performing models were tree-based methods and Naive Bayes, which achieved accuracies of around 0.8 and kappa coefficients of up to 0.5 on the test set. The models that used all input features didn't generally perform well
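Accuracy and the kappa coefficient answer different questions: kappa discounts the agreement expected by chance from the class frequencies. The sketch below computes both for a small set of labels. The substrate names and predictions are hypothetical, and the function assumes chance agreement is below 1 so that the denominator is non-zero.

```python
from collections import Counter

def accuracy_and_kappa(truth, predicted):
    """Overall accuracy and Cohen's kappa for two equally long label lists."""
    n = len(truth)
    observed = sum(t == p for t, p in zip(truth, predicted)) / n
    truth_counts = Counter(truth)
    pred_counts = Counter(predicted)
    # Agreement expected by chance from the marginal class frequencies
    expected = sum(truth_counts[k] * pred_counts[k] for k in truth_counts) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# Hypothetical ground-truth and predicted substrate classes for eight samples
truth = ["sand", "sand", "mud", "mud", "gravel", "gravel", "sand", "mud"]
predicted = ["sand", "sand", "mud", "sand", "gravel", "gravel", "sand", "mud"]

accuracy, kappa = accuracy_and_kappa(truth, predicted)
```

With unbalanced classes a classifier can reach a high accuracy while its kappa stays modest, which is consistent with the gap between the two figures reported in the abstract.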
A robust statistical method for association-based eQTL analysis.

Directory of Open Access Journals (Sweden)

Ning Jiang

Full Text Available It has been well established that the theoretical kernel of the recently surging genome-wide association study (GWAS) is statistical inference of linkage disequilibrium (LD) between a tested genetic marker and a putative locus affecting a disease trait. However, LD analysis is vulnerable to several confounding factors, of which population stratification is the most prominent. Whilst many methods have been proposed to correct for its influence, either through predicting the structure parameters or correcting inflation in the test statistic due to the stratification, these may not be feasible or may impose further statistical problems in practical implementation. We propose here a novel statistical method to control spurious LD in GWAS arising from population structure by incorporating a control marker into testing for significance of genetic association of a polymorphic marker with phenotypic variation of a complex trait. The method avoids the need for structure prediction, which may be infeasible or inadequate in practice, and accounts properly for a varying effect of population stratification on different regions of the genome under study. Utility and statistical properties of the new method were tested through an intensive computer simulation study and an association-based genome-wide mapping of expression quantitative trait loci in genetically divergent human populations. The analyses show that the new method confers improved statistical power for detecting genuine genetic association in subpopulations and effective control of spurious associations stemming from population structure when compared with two other popularly implemented methods in the GWAS literature.
Automatic parquet block sorting using real-time spectral classification

Science.gov (United States)

Astrom, Anders; Astrand, Erik; Johansson, Magnus

1999-03-01

This paper presents a real-time spectral classification system based on the PGP spectrograph and a smart image sensor. The PGP is a spectrograph which extracts the spectral information from a scene and projects the information on an image sensor, which is a method often referred to as Imaging Spectroscopy. The classification is based on linear models and categorizes a number of pixels along a line. Previous systems adopting this method have used standard sensors, which often resulted in poor performance. The new system, however, is based on a patented near-sensor classification method, which exploits analogue features on the smart image sensor. The method reduces the enormous amount of data to be processed at an early stage, thus making true real-time spectral classification possible. The system has been evaluated on hardwood parquet boards showing very good results. The color defects considered in the experiments were blue stain, white sapwood, yellow decay and red decay. In addition to these four defect classes, a reference class was used to indicate correct surface color. The system calculates a statistical measure for each parquet block, giving the pixel defect percentage. The patented method makes it possible to run at very high speeds with a high spectral discrimination ability. Using a powerful illuminator, the system can run with a line frequency exceeding 2000 line/s. This opens up the possibility to maintain high production speed and still measure with good resolution.
Statistical Sensitive Data Protection and Inference Prevention with Decision Tree Methods

National Research Council Canada - National Science Library

Chang, LiWu

2003-01-01

.... We consider inference as correct classification and approach it with decision tree methods. As in our previous work, sensitive data are viewed as classes of those test data and non-sensitive data are the rest attribute values...
Nonlinear estimation and classification

CERN Document Server

Hansen, Mark; Holmes, Christopher; Mallick, Bani; Yu, Bin

2003-01-01

Researchers in many disciplines face the formidable task of analyzing massive amounts of high-dimensional and highly-structured data. This is due in part to recent advances in data collection and computing technologies. As a result, fundamental statistical research is being undertaken in a variety of different fields. Driven by the complexity of these new problems, and fueled by the explosion of available computer power, highly adaptive, non-linear procedures are now essential components of modern "data analysis," a term that we liberally interpret to include speech and pattern recognition, classification, data compression and signal processing. The development of new, flexible methods combines advances from many sources, including approximation theory, numerical analysis, machine learning, signal processing and statistics. The proposed workshop intends to bring together eminent experts from these fields in order to exchange ideas and forge directions for the future.
Gender differences of athletes in different classification groups of sports and sport disciplines

Directory of Open Access Journals (Sweden)

Olena Tarasevych

2016-04-01

Full Text Available Purpose: to identify the percentage of masculine, androgynous and feminine personalities in different classification groups of sports and sports disciplines, depending on sport qualification. Material & Methods: the study was conducted at the Kharkiv State Academy of Physical Culture among students representing different sports and having different athletic skill levels, using analysis and synthesis of scientific and methodical literature, a survey, testing by the S. Bam "Masculinity / femininity" procedure, and statistical data processing. Results: based on testing by the S. Bam method, the percentages of masculine, androgynous and feminine personalities were established among male and female athletes in various classification groups of sports, depending on their athletic skills. Conclusions: no feminine personalities were found among sportsmen and sportswomen in the various classification groups of sports; masculine personalities predominate among both men and women in sports; the share of androgynous personalities differs between men and women.
Advances in statistical models for data analysis

CERN Document Server

Minerva, Tommaso; Vichi, Maurizio

2015-01-01

This edited volume focuses on recent research results in classification, multivariate statistics and machine learning and highlights advances in statistical models for data analysis. The volume provides both methodological developments and contributions to a wide range of application areas such as economics, marketing, education, social sciences and environment. The papers in this volume were first presented at the 9th biannual meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society, held in September 2013 at the University of Modena and Reggio Emilia, Italy.
Analysis and application of classification methods of complex carbonate reservoirs

Science.gov (United States)

Li, Xiongyan; Qin, Ruibao; Ping, Haitao; Wei, Dan; Liu, Xiaomei

2018-06-01

There are abundant carbonate reservoirs from the Cenozoic to Mesozoic era in the Middle East. Due to variation in the sedimentary environment and diagenetic process of carbonate reservoirs, several porosity types coexist in them. As a result of the complex lithologies and pore types, as well as the impact of microfractures, the pore structure is very complicated, and it is therefore difficult to calculate the reservoir parameters accurately. In order to evaluate carbonate reservoirs accurately, classification methods based on capillary pressure curves and on flow units are analyzed, building on a pore structure evaluation of the reservoirs. Carbonate reservoirs can be classified using capillary pressure curves, but the relationship between porosity and permeability after classification is not ideal. On the basis of flow units, a high-precision functional relationship between porosity and permeability after classification can be established. Therefore, carbonate reservoirs can be quantitatively evaluated based on the classification of flow units. In the dolomite reservoirs, the average absolute error of calculated permeability decreases from 15.13 to 7.44 mD. Similarly, the average absolute error of calculated permeability of limestone reservoirs is reduced from 20.33 to 7.37 mD. Only by accurately characterizing pore structures and classifying reservoir types can reservoir parameters be calculated accurately. Therefore, characterizing pore structures and classifying reservoir types are very important for the accurate evaluation of complex carbonate reservoirs in the Middle East.
Recharge and discharge of near-surface groundwater in Forsmark. Comparison of classification methods

International Nuclear Information System (INIS)

Werner, Kent; Johansson, Per-Olof; Brydsten, Lars; Bosson, Emma; Berglund, Sten

2007-03-01

This report presents and compares data and models for identification of near-surface groundwater recharge and discharge (RD) areas in Forsmark. The general principles of groundwater recharge and discharge are demonstrated and applied to interpret hydrological and hydrogeological observations made in the Forsmark area. 'Continuous' RD classification methods considered in the study include topographical modelling, map overlays, and hydrological-hydrogeological flow modelling. 'Discrete' (point) methods include field-based and hydrochemistry-based RD classifications of groundwater monitoring well locations. The topographical RD modelling uses the digital elevation model as the only input. The map overlays use background maps of Quaternary deposits, soils, and ground- and field layers of the vegetation/land use map. Further, the hydrological-hydrogeological modelling is performed using the MIKE SHE-MIKE 11 software packages, taking into account e.g. topography, meteorology, hydrogeology, and geometry of watercourses and lakes. The best between-model agreement is found for the topography-based model and the MIKE SHE-MIKE 11 model. The agreement between the topographical model and the map overlays is less good. The agreement between the map overlays on the one hand, and the MIKE SHE and field-based RD classifications on the other, is thought to be less good, as inferred from the comparison made with the topography-based model. However, much improvement of the map overlays can likely be obtained, e.g. by using 'weights' and calibration (such exercises were outside the scope of the present study). For field-classified 'recharge wells', there is a good agreement to the hydrochemistry-based (Piper plot) well classification, but less good for the field-classified 'discharge wells'. In addition, the concentration of the age-dating parameter tritium shows low variability among recharge wells, but a large spread among discharge wells. The usefulness of hydrochemistry-based RD
Inter- and intraobserver reliability of the MTM-classification for proximal humeral fractures

DEFF Research Database (Denmark)

Bahrs, Christian; Schmal, Hagen; Lingenfelter, Erich

2008-01-01

tool. METHODS: Three observers classified plain radiographs of 22 fractures using both a simple version (fracture displacement, number of parts) and an extensive version (individual topographic fracture type and morphology) of the MTM classification. Kappa-statistics were used to determine reliability....... RESULTS: An acceptable reliability was found for the simple version classifying fracture displacement and fractured main parts. Fair interobserver agreement was found for the extensive version with individual topographic fracture type and morphology. CONCLUSION: Although the MTM-classification covers...
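The kappa statistic mentioned in this record corrects raw observer agreement for the agreement expected by chance. The sketch below shows Cohen's kappa for a pair of observers. It is an illustration only: the record does not say which kappa variant the authors used for their three observers, and the function name and ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of items on which both raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of six radiographs by two observers.
observer_1 = ["displaced", "displaced", "undisplaced",
              "displaced", "undisplaced", "undisplaced"]
observer_2 = ["displaced", "undisplaced", "undisplaced",
              "displaced", "undisplaced", "displaced"]
kappa = cohens_kappa(observer_1, observer_2)
```

For more than two observers, pairwise kappas can be averaged, or a multi-rater statistic such as Fleiss' kappa can be used instead.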
Improved classification of Alzheimer's disease data via removal of nuisance variability.

Directory of Open Access Journals (Sweden)

Juha Koikkalainen

Full Text Available Diagnosis of Alzheimer's disease is based on the results of neuropsychological tests and available supporting biomarkers such as the results of imaging studies. The results of the tests and the values of biomarkers are dependent on the nuisance features, such as age and gender. In order to improve diagnostic power, the effects of the nuisance features have to be removed from the data. In this paper, four types of interactions between classification features and nuisance features were identified. Three methods were tested to remove these interactions from the classification data. In stratified analysis, a homogeneous subgroup was generated from a training set. Data correction method utilized linear regression model to remove the effects of nuisance features from data. The third method was a combination of these two methods. The methods were tested using all the baseline data from the Alzheimer's Disease Neuroimaging Initiative database in two classification studies: classifying control subjects from Alzheimer's disease patients and discriminating stable and progressive mild cognitive impairment subjects. The results show that both stratified analysis and data correction are able to statistically significantly improve the classification accuracy of several neuropsychological tests and imaging biomarkers. The improvements were especially large for the classification of stable and progressive mild cognitive impairment subjects, where the best improvements observed were 6% units. The data correction method gave better results for imaging biomarkers, whereas stratified analysis worked well with the neuropsychological tests. In conclusion, the study shows that the excess variability caused by nuisance features should be removed from the data to improve the classification accuracy, and therefore, the reliability of diagnosis making.
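The data correction idea in this record, using a linear regression model to remove the effect of a nuisance feature, can be sketched for a single nuisance variable. This is a minimal illustration, not the authors' implementation. It assumes that the regression is fitted on a reference group (for example control subjects) and that corrected values are re-centred at the reference mean of the nuisance variable. All names and numbers are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def remove_nuisance(x_ref, y_ref, x_new, y_new):
    """Subtract the linear effect of nuisance x from feature y.

    The effect is estimated on the reference group (x_ref, y_ref) and
    removed from (x_new, y_new), so corrected values are comparable at
    the reference group's mean nuisance value.
    """
    slope, _ = fit_line(x_ref, y_ref)
    x_bar = mean(x_ref)
    return [yi - slope * (xi - x_bar) for xi, yi in zip(x_new, y_new)]


# Hypothetical example: a biomarker that declines linearly with age.
ages = [60.0, 70.0, 80.0]
biomarker = [10.0, 9.0, 8.0]
corrected = remove_nuisance(ages, biomarker, ages, biomarker)
```

After correction the age trend is gone, so any remaining differences between subjects are not explained by age alone.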
Comparative analysis of methods for classification in predicting the quality of bread

OpenAIRE

E. A. Balashova; V. K. Bitjukov; E. A. Savvina

2013-01-01

The comparative analysis of classification methods of two-stage cluster and discriminant analysis and neural networks was performed. System of informative signs which classifies with a minimum of errors has been proposed.
Application of statistical method for FBR plant transient computation

International Nuclear Information System (INIS)

Kikuchi, Norihiro; Mochizuki, Hiroyasu

2014-01-01

Highlights: • A statistical method with a large trial number up to 10,000 is applied to the plant system analysis. • A turbine trip test conducted at the “Monju” reactor is selected as a plant transient. • A reduction method of trial numbers is discussed. • The result with reduced trial number can express the base regions of the computed distribution. -- Abstract: It is obvious that design tolerances, errors included in operation, and statistical errors in empirical correlations affect the transient behavior. The purpose of the present study is to apply the above-mentioned statistical errors to a plant system computation in order to evaluate the statistical distribution contained in the transient evolution. The selected computation case is the turbine trip test conducted at 40% electric power of the prototype fast reactor “Monju”. All of the heat transport systems of “Monju” are modeled with the NETFLOW++ system code, which has been validated using the plant transient tests of the experimental fast reactor Joyo and of “Monju”. The effects of parameters on upper plenum temperature are confirmed by sensitivity analyses, and dominant parameters are chosen. The statistical errors are applied to each computation deck by using a pseudorandom number and the Monte-Carlo method. The dSFMT (Double precision SIMD-oriented Fast Mersenne Twister), a developed version of the Mersenne Twister (MT), is adopted as the pseudorandom number generator. In the present study, uniform random numbers are generated by dSFMT, and these random numbers are transformed to the normal distribution by the Box–Muller method. Ten thousand different computations are performed at once. In every computation case, the steady calculation is performed for 12,000 s, and the transient calculation is performed for 4000 s. For the purpose of the present statistical computation, it is important that the base regions of distribution functions be calculated precisely. A large number of
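The sampling step described in this record, uniform pseudorandom numbers transformed to a normal distribution by the Box–Muller method, can be sketched as follows. This is an illustration under stated assumptions. Python's built-in generator (a Mersenne Twister) stands in for dSFMT, and the function names, seed, and parameter values are hypothetical rather than taken from the study.

```python
import math
import random


def box_muller(u1, u2):
    """Map two independent uniforms (u1 in (0, 1]) to two standard normals."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def normal_samples(n, mu, sigma, seed=0):
    """Draw n normal deviates with mean mu and standard deviation sigma."""
    rng = random.Random(seed)  # Mersenne Twister, standing in for dSFMT
    samples = []
    while len(samples) < n:
        u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log(u1) is finite
        u2 = rng.random()
        z0, z1 = box_muller(u1, u2)
        samples.extend((mu + sigma * z0, mu + sigma * z1))
    return samples[:n]


# Hypothetical use: 10,000 perturbations of one uncertain input parameter.
perturbations = normal_samples(10_000, mu=0.0, sigma=1.0, seed=42)
```

Each perturbed value would then feed one computation case, so the spread of the outputs approximates the distribution of the transient response.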
A Comparative Analysis of Classification Algorithms on Diverse Datasets

Directory of Open Access Journals (Sweden)

M. Alghobiri

2018-04-01

Full Text Available Data mining involves the computational process to find patterns from large data sets. Classification, one of the main domains of data mining, involves known structure generalizing to apply to a new dataset and predict its class. There are various classification algorithms being used to classify various data sets. They are based on different methods such as probability, decision tree, neural network, nearest neighbor, boolean and fuzzy logic, kernel-based etc. In this paper, we apply three diverse classification algorithms on ten datasets. The datasets have been selected based on their size and/or number and nature of attributes. Results have been discussed using some performance evaluation measures like precision, accuracy, F-measure, Kappa statistics, mean absolute error, relative absolute error, ROC Area etc. Comparative analysis has been carried out using the performance evaluation measures of accuracy, precision, and F-measure. We specify features and limitations of the classification algorithms for the diverse nature datasets.
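Three of the evaluation measures named in this record (accuracy, precision, and F-measure) follow directly from the confusion-matrix counts. The sketch below covers the binary case. The labels are hypothetical, and F-measure is taken here as the balanced F1 score, which is an assumption about the variant the authors used.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical true and predicted labels for eight test instances.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

For multi-class datasets these measures are usually computed per class and then averaged.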
Comparative study of SVM methods combined with voxel selection for object category classification on fMRI data.

Science.gov (United States)

Song, Sutao; Zhan, Zhichao; Long, Zhiying; Zhang, Jiacai; Yao, Li

2011-02-16

Support vector machine (SVM) has been widely used as an accurate and reliable method to decipher brain patterns from functional MRI (fMRI) data. Previous studies have not found a clear benefit for non-linear (polynomial kernel) SVM versus linear SVM. Here, a more effective non-linear SVM using a radial basis function (RBF) kernel is compared with linear SVM. Unlike traditional studies, which focused merely on the evaluation of either different types of SVM or the voxel selection methods, we aimed to investigate the overall performance of linear and RBF SVM for fMRI classification together with voxel selection schemes, in terms of classification accuracy and computation time. Six different voxel selection methods were employed to decide which voxels of fMRI data would be included in SVM classifiers with linear and RBF kernels in classifying 4-category objects. Then the overall performances of voxel selection and classification methods were compared. Results showed that: (1) Voxel selection had an important impact on the classification accuracy of the classifiers: in a relatively low dimensional feature space, RBF SVM outperformed linear SVM significantly; in a relatively high dimensional space, linear SVM performed better than its counterpart; (2) Considering classification accuracy and computation time together, linear SVM with relatively more voxels as features and RBF SVM with a small set of voxels (after PCA) could achieve better accuracy in less time. The present work provides the first empirical result of linear and RBF SVM in classification of fMRI data, combined with voxel selection methods. Based on the findings, if only classification accuracy is of concern, RBF SVM with an appropriately small set of voxels and linear SVM with relatively more voxels are the two suggested solutions; if users are more concerned about computational time, RBF SVM with a relatively small set of voxels, with part of the principal components kept as features, is the better choice.
Methods for estimating low-flow statistics for Massachusetts streams

Science.gov (United States)

Ries, Kernell G.; Friesz, Paul J.

2000-01-01

Methods and computer software are described in this report for determining flow duration, low-flow frequency statistics, and August median flows. These low-flow statistics can be estimated for unregulated streams in Massachusetts using different methods depending on whether the location of interest is at a streamgaging station, a low-flow partial-record station, or an ungaged site where no data are available. Low-flow statistics for streamgaging stations can be estimated using standard U.S. Geological Survey methods described in the report. The MOVE.1 mathematical method and a graphical correlation method can be used to estimate low-flow statistics for low-flow partial-record stations. The MOVE.1 method is recommended when the relation between measured flows at a partial-record station and daily mean flows at a nearby, hydrologically similar streamgaging station is linear, and the graphical method is recommended when the relation is curved. Equations are presented for computing the variance and equivalent years of record for estimates of low-flow statistics for low-flow partial-record stations when either a single or multiple index stations are used to determine the estimates. The drainage-area ratio method or regression equations can be used to estimate low-flow statistics for ungaged sites where no data are available. The drainage-area ratio method is generally as accurate as or more accurate than regression estimates when the drainage-area ratio for an ungaged site is between 0.3 and 1.5 times the drainage area of the index data-collection site. Regression equations were developed to estimate the natural, long-term 99-, 98-, 95-, 90-, 85-, 80-, 75-, 70-, 60-, and 50-percent duration flows; the 7-day, 2-year and the 7-day, 10-year low flows; and the August median flow for ungaged sites in Massachusetts. Streamflow statistics and basin characteristics for 87 to 133 streamgaging stations and low-flow partial-record stations were used to develop the equations. The
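The drainage-area ratio method mentioned in this record is commonly applied as a simple proportional scaling of the index-site statistic by the ratio of drainage areas. The sketch below assumes that common form. The abstract does not give the equation, so the report itself may use a variation, and the function name and numbers are hypothetical. The 0.3 to 1.5 range check follows the applicability range quoted in the abstract.

```python
def drainage_area_ratio_estimate(q_index, area_index, area_ungaged,
                                 valid_ratio=(0.3, 1.5)):
    """Scale a low-flow statistic from an index site to an ungaged site.

    Returns (estimate, within_range), where within_range reports whether
    the area ratio lies inside the range for which the abstract says the
    method is generally at least as accurate as regression estimates.
    """
    if q_index < 0 or area_index <= 0 or area_ungaged <= 0:
        raise ValueError("flow must be non-negative and areas positive")
    ratio = area_ungaged / area_index
    estimate = q_index * ratio  # assumed proportional scaling
    within_range = valid_ratio[0] <= ratio <= valid_ratio[1]
    return estimate, within_range


# Hypothetical values: index-site statistic of 2.0 flow units, index
# drainage area of 10.0 area units, ungaged drainage area of 5.0.
estimate, ok = drainage_area_ratio_estimate(2.0, 10.0, 5.0)
```

When the flag comes back false, the abstract suggests that the regression equations are the better choice for the ungaged site.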
A Comparison of Machine Learning Methods in a High-Dimensional Classification Problem

OpenAIRE

Zekić-Sušac, Marijana; Pfeifer, Sanja; Šarlija, Nataša

2014-01-01

Background: Large-dimensional data modelling often relies on variable reduction methods in the pre-processing and in the post-processing stage. However, such a reduction usually provides less information and yields a lower accuracy of the model. Objectives: The aim of this paper is to assess the high-dimensional classification problem of recognizing entrepreneurial intentions of students by machine learning methods. Methods/Approach: Four methods were tested: artificial neural networks, CART ...
Multivariate statistical methods and data mining in particle physics (4/4)

CERN Multimedia

CERN. Geneva

2008-01-01

The lectures will cover multivariate statistical methods and their applications in High Energy Physics. The methods will be viewed in the framework of a statistical test, as used e.g. to discriminate between signal and background events. Topics will include an introduction to the relevant statistical formalism, linear test variables, neural networks, probability density estimation (PDE) methods, kernel-based PDE, decision trees and support vector machines. The methods will be evaluated with respect to criteria relevant to HEP analyses such as statistical power, ease of computation and sensitivity to systematic effects. Simple computer examples that can be extended to more complex analyses will be presented.
Multivariate statistical methods and data mining in particle physics (2/4)

CERN Multimedia

CERN. Geneva

2008-01-01

The lectures will cover multivariate statistical methods and their applications in High Energy Physics. The methods will be viewed in the framework of a statistical test, as used e.g. to discriminate between signal and background events. Topics will include an introduction to the relevant statistical formalism, linear test variables, neural networks, probability density estimation (PDE) methods, kernel-based PDE, decision trees and support vector machines. The methods will be evaluated with respect to criteria relevant to HEP analyses such as statistical power, ease of computation and sensitivity to systematic effects. Simple computer examples that can be extended to more complex analyses will be presented.

Multivariate statistical methods and data mining in particle physics (1/4)

CERN Multimedia

CERN. Geneva

2008-01-01

The lectures will cover multivariate statistical methods and their applications in High Energy Physics. The methods will be viewed in the framework of a statistical test, as used e.g. to discriminate between signal and background events. Topics will include an introduction to the relevant statistical formalism, linear test variables, neural networks, probability density estimation (PDE) methods, kernel-based PDE, decision trees and support vector machines. The methods will be evaluated with respect to criteria relevant to HEP analyses such as statistical power, ease of computation and sensitivity to systematic effects. Simple computer examples that can be extended to more complex analyses will be presented.
Acute leukemia classification by ensemble particle swarm model selection.

Science.gov (United States)

Escalante, Hugo Jair; Montes-y-Gómez, Manuel; González, Jesús A; Gómez-Gil, Pilar; Altamirano, Leopoldo; Reyes, Carlos A; Reta, Carolina; Rosales, Alejandro

2012-07-01

Acute leukemia is a malignant disease that affects a large proportion of the world population. Different types and subtypes of acute leukemia require different treatments. In order to assign the correct treatment, a physician must identify the leukemia type or subtype. Advanced and precise methods are available for identifying leukemia types, but they are very expensive and not available in most hospitals in developing countries. Thus, alternative methods have been proposed. An option explored in this paper is based on the morphological properties of bone marrow images, where features are extracted from medical images and standard machine learning techniques are used to build leukemia type classifiers. This paper studies the use of ensemble particle swarm model selection (EPSMS), which is an automated tool for the selection of classification models, in the context of acute leukemia classification. EPSMS is the application of particle swarm optimization to the exploration of the search space of ensembles that can be formed by heterogeneous classification models in a machine learning toolbox. EPSMS does not require prior domain knowledge and it is able to select highly accurate classification models without user intervention. Furthermore, specific models can be used for different classification tasks. We report experimental results for acute leukemia classification with real data and show that EPSMS outperformed the best results obtained using manually designed classifiers with the same data. The highest performance using EPSMS was of 97.68% for two-type classification problems and of 94.21% for more than two types problems. To the best of our knowledge, these are the best results reported for this data set. Compared with previous studies, these improvements were consistent among different type/subtype classification tasks, different features extracted from images, and different feature extraction regions. The performance improvements were statistically significant
Estimating Classification Errors Under Edit Restrictions in Composite Survey-Register Data Using Multiple Imputation Latent Class Modelling (MILC)

Directory of Open Access Journals (Sweden)

Boeschoten Laura

2017-12-01

Full Text Available Both registers and surveys can contain classification errors. These errors can be estimated by making use of a composite data set. We propose a new method based on latent class modelling to estimate the number of classification errors across several sources while taking into account impossible combinations with scores on other variables. Furthermore, the latent class model, by multiply imputing a new variable, enhances the quality of statistics based on the composite data set. The performance of this method is investigated by a simulation study, which shows that whether or not the method can be applied depends on the entropy R2 of the latent class model and the type of analysis a researcher is planning to do. Finally, the method is applied to public data from Statistics Netherlands.
Comparisons of likelihood and machine learning methods of individual classification

Science.gov (United States)

Guinand, B.; Topchy, A.; Page, K.S.; Burnham-Curtis, M. K.; Punch, W.F.; Scribner, K.T.

2002-01-01

Classification methods used in machine learning (e.g., artificial neural networks, decision trees, and k-nearest neighbor clustering) are rarely used with population genetic data. We compare different nonparametric machine learning techniques with parametric likelihood estimations commonly employed in population genetics for purposes of assigning individuals to their population of origin (“assignment tests”). Classifier accuracy was compared across simulated data sets representing different levels of population differentiation (low and high FST), number of loci surveyed (5 and 10), and allelic diversity (average of three or eight alleles per locus). Empirical data for the lake trout (Salvelinus namaycush) exhibiting levels of population differentiation comparable to those used in simulations were examined to further evaluate and compare classification methods. Classification error rates associated with artificial neural networks and likelihood estimators were lower for simulated data sets compared to k-nearest neighbor and decision tree classifiers over the entire range of parameters considered. Artificial neural networks only marginally outperformed the likelihood method for simulated data (0–2.8% lower error rates). The performance of each machine learning classifier improved relative to likelihood estimators for empirical data sets, suggesting an ability to “learn” and utilize properties of empirical genotypic arrays intrinsic to each population. Likelihood-based estimation methods provide a more accessible option for reliable assignment of individuals to the population of origin due to the intricacies in development and evaluation of artificial neural networks. In recent years, characterization of highly polymorphic molecular markers such as mini- and microsatellites and development of novel methods of analysis have enabled researchers to extend investigations of ecological and evolutionary processes below the population level to the level of
Statistical methods for evaluating the attainment of cleanup standards

Energy Technology Data Exchange (ETDEWEB)

Gilbert, R.O.; Simpson, J.C.

1992-12-01

This document is the third volume in a series of volumes sponsored by the US Environmental Protection Agency (EPA), Statistical Policy Branch, that provide statistical methods for evaluating the attainment of cleanup standards at Superfund sites. Volume 1 (USEPA 1989a) provides sampling designs and tests for evaluating attainment of risk-based standards for soils and solid media. Volume 2 (USEPA 1992) provides designs and tests for evaluating attainment of risk-based standards for groundwater. The purpose of this third volume is to provide statistical procedures for designing sampling programs and conducting statistical tests to determine whether pollution parameters in remediated soils and solid media at Superfund sites attain site-specific reference-based standards. This document is written for individuals who may not have extensive training or experience with statistical methods. The intended audience includes EPA regional remedial project managers, Superfund-site potentially responsible parties, state environmental protection agencies, and contractors for these groups.
Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data.

Science.gov (United States)

Becker, Natalia; Toedt, Grischa; Lichter, Peter; Benner, Axel

2011-05-09

Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods with a wide range of scientific applications, the SVM does not include automatic feature selection and therefore a number of feature selection procedures have been developed. Regularisation approaches extend SVM to a feature selection method in a flexible way using penalty functions like LASSO, SCAD and Elastic Net. We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of SCAD and ridge penalties which overcomes the limitations of each penalty alone. Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which in comparison to a fixed grid search finds rapidly and more precisely a global optimal solution. Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to a change of the model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers in terms of median number of features selected than Elastic Net SVM and often better predicted than Elastic Net in terms of misclassification error. Finally, we applied the penalization methods described above on four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in sparse and non-sparse situations. The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty and at the same time avoids sparsity limitations for non-sparse data. We were first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions on the optimization of tuning parameters. The penalized SVM
G0-WISHART Distribution Based Classification from Polarimetric SAR Images

Science.gov (United States)

Hu, G. C.; Zhao, Q. H.

2017-09-01

Enormous scientific and technical developments have been carried out over decades to further improve remote sensing, particularly the Polarimetric Synthetic Aperture Radar (PolSAR) technique, so classification methods based on PolSAR images have received much attention from scholars and related departments around the world. The multilook polarimetric G0-Wishart model is a more flexible model which describes homogeneous, heterogeneous and extremely heterogeneous regions in the image. Moreover, the polarimetric G0-Wishart distribution does not include the modified Bessel function of the second kind. It is a simple statistical distribution model with fewer parameters. To prove its feasibility, a classification process has been tested on a fully polarimetric Synthetic Aperture Radar (SAR) image. First, multilook polarimetric SAR data processing and a speckle filter are applied to reduce the influence of speckle on the classification result. The image is then initially classified into sixteen classes by H/A/α decomposition. Finally, the ICM algorithm is used to classify features based on the G0-Wishart distance. Qualitative and quantitative results show that the proposed method can classify polarimetric SAR data effectively and efficiently.
Automated artery-venous classification of retinal blood vessels based on structural mapping method

Science.gov (United States)

Joshi, Vinayak S.; Garvin, Mona K.; Reinhardt, Joseph M.; Abramoff, Michael D.

2012-03-01

Retinal blood vessels show morphologic modifications in response to various retinopathies. However, the specific responses exhibited by arteries and veins may provide more precise diagnostic information, i.e., diabetic retinopathy may be detected more accurately from venous dilatation than from average vessel dilatation. In order to analyze the vessel-type-specific morphologic modifications, the classification of a vessel network into arteries and veins is required. We previously described a method for identification and separation of retinal vessel trees, i.e. structural mapping. Therefore, we propose an artery-venous classification based on structural mapping and identification of color properties prominent to the vessel types. The mean and standard deviation of each of green channel intensity and hue channel intensity are analyzed in a region of interest around each centerline pixel of a vessel. Using the vector of color properties extracted from each centerline pixel, the pixel is classified into one of the two clusters (artery and vein) obtained by fuzzy C-means clustering. According to the proportion of clustered centerline pixels in a particular vessel, and utilizing the artery-venous crossing property of retinal vessels, each vessel is assigned a label of an artery or a vein. The classification results are compared with the manually annotated ground truth (gold standard). We applied the proposed method to a dataset of 15 retinal color fundus images, resulting in an accuracy of 88.28% correctly classified vessel pixels. The automated classification results match well with the gold standard, suggesting its potential in artery-venous classification and the respective morphology analysis.
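The clustering step in this record assigns each centerline pixel's color-feature vector to one of two fuzzy clusters. The sketch below shows the standard fuzzy C-means iteration on generic feature vectors. It is a generic illustration, not the authors' pipeline. The fuzzifier m = 2, the fixed iteration count, the initial centers, and the one-dimensional toy features are all assumptions.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Standard fuzzy C-means: alternate membership and center updates.

    points and centers are sequences of equal-length tuples. Returns the
    final centers and the memberships computed in the last iteration.
    """
    centers = [tuple(c) for c in centers]
    exponent = 2.0 / (m - 1.0)
    dims = len(points[0])
    memberships = []
    for _ in range(iterations):
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if any(d == 0.0 for d in dists):
                # Point coincides with a center: give it full membership there.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                       for d_i in dists]
            memberships.append(row)
        # Each center is the mean of all points weighted by membership ** m.
        new_centers = []
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            total_weight = sum(weights)
            new_centers.append(tuple(
                sum(w * p[k] for w, p in zip(weights, points)) / total_weight
                for k in range(dims)))
        centers = new_centers
    return centers, memberships


# Hypothetical one-dimensional features forming two well-separated groups.
features = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
centers, memberships = fuzzy_c_means(features, centers=[(1.0,), (4.0,)])
```

A vessel-level label could then be derived from the proportion of its centerline pixels whose larger membership falls in each cluster, in the spirit of the voting step the abstract describes.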
Statistical methods for accurately determining criticality code bias

International Nuclear Information System (INIS)

Trumble, E.F.; Kimball, K.D.

1997-01-01

A system of statistically treating validation calculations for the purpose of determining computer code bias is provided in this paper. The following statistical treatments are described: weighted regression analysis, lower tolerance limit, lower tolerance band, and lower confidence band. These methods meet the criticality code validation requirements of ANS 8.1. 8 refs., 5 figs., 4 tabs
The Classification of Tongue Colors with Standardized Acquisition and ICC Profile Correction in Traditional Chinese Medicine.

Science.gov (United States)

Qi, Zhen; Tu, Li-Ping; Chen, Jing-Bo; Hu, Xiao-Juan; Xu, Jia-Tuo; Zhang, Zhi-Feng

2016-01-01

Background and Goal. The application of digital image processing techniques and machine learning methods in tongue image classification in Traditional Chinese Medicine (TCM) has been widely studied nowadays. However, it is difficult for the outcomes to generalize because of lack of color reproducibility and image standardization. Our study aims at the exploration of tongue colors classification with a standardized tongue image acquisition process and color correction. Methods. Three traditional Chinese medical experts are chosen to identify the selected tongue pictures taken by the TDA-1 tongue imaging device in TIFF format through ICC profile correction. Then we compare the mean value of L*a*b* of different tongue colors and evaluate the effect of the tongue color classification by machine learning methods. Results. The L*a*b* values of the five tongue colors are statistically different. Random forest method has a better performance than SVM in classification. SMOTE algorithm can increase classification accuracy by solving the imbalance of the varied color samples. Conclusions. At the premise of standardized tongue acquisition and color reproduction, preliminary objectification of tongue color classification in Traditional Chinese Medicine (TCM) is feasible.
A chronicle of permutation statistical methods 1920–2000, and beyond

CERN Document Server

Berry, Kenneth J; Mielke Jr , Paul W

2014-01-01

The focus of this book is on the birth and historical development of permutation statistical methods from the early 1920s to the near present. Beginning with the seminal contributions of R.A. Fisher, E.J.G. Pitman, and others in the 1920s and 1930s, permutation statistical methods were initially introduced to validate the assumptions of classical statistical methods. Permutation methods have advantages over classical methods in that they are optimal for small data sets and non-random samples, are data-dependent, and are free of distributional assumptions. Permutation probability values may be exact, or estimated via moment- or resampling-approximation procedures. Because permutation methods are inherently computationally-intensive, the evolution of computers and computing technology that made modern permutation methods possible accompanies the historical narrative. Permutation analogs of many well-known statistical tests are presented in a historical context, including multiple correlation and regression, ana...
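The exact permutation probability value mentioned in this record can be illustrated with a two-sample test on the difference in means. The sketch enumerates every way of splitting the pooled observations into two groups of the original sizes. The function name and data are hypothetical, and full enumeration is only practical for small samples.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_pvalue(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    extreme = 0
    arrangements = 0
    # Every subset of size n_a is one equally likely relabelling.
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
        arrangements += 1
    return extreme / arrangements


# Hypothetical small samples: 20 possible arrangements in total.
p_value = exact_permutation_pvalue([1, 2, 3], [4, 5, 6])
```

With larger samples the enumeration is replaced by a random sample of relabellings, which corresponds to the resampling approximation the abstract mentions.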
Statistical methods for quality assurance

International Nuclear Information System (INIS)

Rinne, H.; Mittag, H.J.

1989-01-01

This is the first German-language textbook on quality assurance and the fundamental statistical methods that is suitable for private study. The material for this book has been developed from a course of Hagen Open University and is characterized by a particularly careful didactical design which is achieved and supported by numerous illustrations and photographs, more than 100 exercises with complete problem solutions, many fully displayed calculation examples, surveys fostering a comprehensive approach, bibliography with comments. The textbook has an eye to practice and applications, and great care has been taken by the authors to avoid abstraction wherever appropriate, to explain the proper conditions of application of the testing methods described, and to give guidance for suitable interpretation of results. The testing methods explained also include latest developments and research results in order to foster their adoption in practice. (orig.) [de
Comparative analysis of methods for classification in predicting the quality of bread

Directory of Open Access Journals (Sweden)

E. A. Balashova

2013-01-01

Full Text Available The comparative analysis of classification methods of two-stage cluster and discriminant analysis and neural networks was performed. System of informative signs which classifies with a minimum of errors has been proposed.
SAW Classification Algorithm for Chinese Text Classification

OpenAIRE

Xiaoli Guo; Huiyu Sun; Tiehua Zhou; Ling Wang; Zhaoyang Qu; Jiannan Zang

2015-01-01

Given the explosive growth of data, the increasing volume of text places higher demands on the performance of text categorization, which existing classification methods cannot satisfy. Based on the study of existing text classification technology and semantics, this paper puts forward a SAW (Structural Auxiliary Word) algorithm oriented to Chinese text classification. The algorithm uses the special space effect of Chinese text where words...
Arabic text classification using Polynomial Networks

Directory of Open Access Journals (Sweden)

Mayy M. Al-Tahrawi

2015-10-01

Full Text Available In this paper, an Arabic statistical learning-based text classification system has been developed using Polynomial Neural Networks. Polynomial Networks have been recently applied to English text classification, but they were never used for Arabic text classification. In this research, we investigate the performance of Polynomial Networks in classifying Arabic texts. Experiments are conducted on a widely used Arabic dataset in text classification: Al-Jazeera News dataset. We chose this dataset to enable direct comparisons of the performance of Polynomial Networks classifier versus other well-known classifiers on this dataset in the literature of Arabic text classification. Results of experiments show that Polynomial Networks classifier is a competitive algorithm to the state-of-the-art ones in the field of Arabic text classification.
Understanding advanced statistical methods

CERN Document Server

Westfall, Peter

2013-01-01

Introduction: Probability, Statistics, and Science; Reality, Nature, Science, and Models; Statistical Processes: Nature, Design and Measurement, and Data; Models; Deterministic Models; Variability; Parameters; Purely Probabilistic Statistical Models; Statistical Models with Both Deterministic and Probabilistic Components; Statistical Inference; Good and Bad Models; Uses of Probability Models; Random Variables and Their Probability Distributions; Introduction; Types of Random Variables: Nominal, Ordinal, and Continuous; Discrete Probability Distribution Functions; Continuous Probability Distribution Functions; Some Calculus-Derivatives and Least Squares; More Calculus-Integrals and Cumulative Distribution Functions; Probability Calculation and Simulation; Introduction; Analytic Calculations, Discrete and Continuous Cases; Simulation-Based Approximation; Generating Random Numbers; Identifying Distributions; Introduction; Identifying Distributions from Theory Alone; Using Data: Estimating Distributions via the Histogram; Quantiles: Theoretical and Data-Based Estimate...
An Outlyingness Matrix for Multivariate Functional Data Classification

KAUST Repository

Dai, Wenlin; Genton, Marc G.

2017-01-01

outlyingness with conventional statistical depth. We propose two classifiers based on directional outlyingness and the outlyingness matrix, respectively. Our classifiers provide better performance compared with existing depth-based classifiers when applied on both univariate and multivariate functional data from simulation studies. We also test our methods on two data problems: speech recognition and gesture classification, and obtain results that are consistent with the findings from the simulated data.
Macroscopic Rock Texture Image Classification Using a Hierarchical Neuro-Fuzzy Class Method

Directory of Open Access Journals (Sweden)

Laercio B. Gonçalves

2010-01-01

Full Text Available We used a Hierarchical Neuro-Fuzzy Class Method based on binary space partitioning (NFHB-Class Method) for macroscopic rock texture classification. The relevance of this study is in helping geologists in the diagnosis and planning of oil reservoir exploration. The proposed method is capable of generating its own decision structure, with automatic extraction of fuzzy rules. These rules are linguistically interpretable, thus explaining the obtained data structure. The presented image classification for macroscopic rocks is based on texture descriptors, such as spatial variation coefficient, Hurst coefficient, entropy, and cooccurrence matrix. Four rock classes have been evaluated by the NFHB-Class Method: gneiss (two subclasses), basalt (four subclasses), diabase (five subclasses), and rhyolite (five subclasses). These four rock classes are of great interest in the evaluation of oil boreholes, which is considered a complex task by geologists. We present a computer method to solve this problem. In order to evaluate system performance, we used 50 RGB images for each rock class and subclass, thus producing a total of 800 images. For all rock classes, the NFHB-Class Method achieved a percentage of correct hits over 73%. The proposed method converged for all tests presented in the case study.
Recharge and discharge of near-surface groundwater in Forsmark. Comparison of classification methods

Energy Technology Data Exchange (ETDEWEB)

Werner, Kent [Golder Associates AB, Uppsala (Sweden); Johansson, Per-Olof [Artesia Grundvattenkonsult AB, Taeby (Sweden); Brydsten, Lars [Umeaa University, Dept. of Ecology and Environmental Science (Sweden); Bosson, Emma; Berglund, Sten [Swedish Nuclear Fuel and Waste Management Co., Stockholm (Sweden)

2007-03-15

This report presents and compares data and models for identification of near-surface groundwater recharge and discharge (RD) areas in Forsmark. The general principles of groundwater recharge and discharge are demonstrated and applied to interpret hydrological and hydrogeological observations made in the Forsmark area. 'Continuous' RD classification methods considered in the study include topographical modelling, map overlays, and hydrological-hydrogeological flow modelling. 'Discrete' (point) methods include field-based and hydrochemistry-based RD classifications of groundwater monitoring well locations. The topographical RD modelling uses the digital elevation model as the only input. The map overlays use background maps of Quaternary deposits, soils, and ground- and field layers of the vegetation/land use map. Further, the hydrological-hydrogeological modelling is performed using the MIKE SHE-MIKE 11 software packages, taking into account e.g. topography, meteorology, hydrogeology, and geometry of watercourses and lakes. The best between-model agreement is found for the topography-based model and the MIKE SHE-MIKE 11 model. The agreement between the topographical model and the map overlays is less good. The agreement between the map overlays on the one hand, and the MIKE SHE and field-based RD classifications on the other, is thought to be less good, as inferred from the comparison made with the topography-based model. However, much improvement of the map overlays can likely be obtained, e.g. by using 'weights' and calibration (such exercises were outside the scope of the present study). For field-classified 'recharge wells', there is a good agreement to the hydrochemistry-based (Piper plot) well classification, but less good for the field-classified 'discharge wells'. In addition, the concentration of the age-dating parameter tritium shows low variability among recharge wells, but a large spread among discharge
Radar Target Classification using Recursive Knowledge-Based Methods

DEFF Research Database (Denmark)

Jochumsen, Lars Wurtz

The topic of this thesis is target classification of radar tracks from a 2D mechanically scanning coastal surveillance radar. The measurements provided by the radar are position data and therefore the classification is mainly based on kinematic data, which is deduced from the position. The target...... been terminated. Therefore, an update of the classification results must be made for each measurement of the target. The data for this work were collected throughout the PhD, both from radars and from other sensors such as GPS....

Statistical trend analysis methods for temporal phenomena

Energy Technology Data Exchange (ETDEWEB)

Lehtinen, E.; Pulkkinen, U. [VTT Automation, (Finland); Poern, K. [Poern Consulting, Nykoeping (Sweden)

1997-04-01

We consider point events occurring in a random way in time. In many applications the pattern of occurrence is of intrinsic interest as indicating a trend or some other systematic feature in the rate of occurrence. The purpose of this report is to survey briefly different statistical trend analysis methods and illustrate their applicability to temporal phenomena in particular. The trend testing of point events is usually seen as the testing of the hypotheses concerning the intensity of the occurrence of events. When the intensity function is parametrized, the testing of trend is a typical parametric testing problem. In industrial applications the operational experience generally does not suggest any specified model and method in advance. Therefore, and particularly if the Poisson process assumption is very questionable, it is desirable to apply tests that are valid for a wide variety of possible processes. The alternative approach for trend testing is to use some non-parametric procedure. In this report we have presented four non-parametric tests: the Cox-Stuart test, the Wilcoxon signed ranks test, the Mann test, and the exponential ordered scores test. In addition to the classical parametric and non-parametric approaches we have also considered the Bayesian trend analysis. First we discuss a Bayesian model, which is based on a power law intensity model. The Bayesian statistical inferences are based on the analysis of the posterior distribution of the trend parameters, and the probability of trend is immediately seen from these distributions. We applied some of the methods discussed in an example case. It should be noted that this report is a feasibility study rather than a scientific evaluation of statistical methods, and the examples can only be seen as demonstrations of the methods. 14 refs, 10 figs.
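As an illustration of the non-parametric trend tests named in this record, the following is a minimal sketch of a two-sided Cox-Stuart sign test. It is not taken from the report: the function name `cox_stuart_test`, the choice to apply it to times between successive events, and the sample values are all assumptions made for the example.

```python
from math import comb


def cox_stuart_test(values):
    """Two-sided Cox-Stuart sign test for a monotonic trend.

    Returns (n_increases, n_decreases, p_value).
    """
    n = len(values)
    half = n // 2
    offset = n - half  # for odd n this skips the middle observation
    # Pair each value in the first half with its partner in the second half.
    diffs = [values[i + offset] - values[i] for i in range(half)]
    increases = sum(d > 0 for d in diffs)
    decreases = sum(d < 0 for d in diffs)
    informative = increases + decreases  # tied pairs are dropped
    if informative == 0:
        return increases, decreases, 1.0
    # Under "no trend" the sign counts follow Binomial(informative, 0.5).
    k = min(increases, decreases)
    tail = sum(comb(informative, j) for j in range(k + 1)) / 2 ** informative
    return increases, decreases, min(1.0, 2 * tail)


# Hypothetical times between successive events (illustrative values only).
gaps = [12.0, 15.0, 11.0, 14.0, 9.0, 8.0, 10.0, 6.0, 7.0, 5.0]
print(cox_stuart_test(gaps))
```

Shrinking gaps between events would suggest an increasing rate of occurrence; with so few pairs the exact binomial p-value is coarse, which is one reason a sketch like this is only a demonstration.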
A kernel-based multivariate feature selection method for microarray data classification.

Directory of Open Access Journals (Sweden)

Shiquan Sun

Full Text Available High dimensionality and small sample sizes, and their inherent risk of overfitting, pose great challenges for constructing efficient classifiers in microarray data classification. Therefore a feature selection technique should be conducted prior to data classification to enhance prediction performance. In general, filter methods can be considered as principal or auxiliary selection mechanisms because of their simplicity, scalability, and low computational complexity. However, a series of trivial examples show that filter methods result in less accurate performance because they ignore the dependencies of features. Although a few publications have devoted their attention to revealing the relationships among features by multivariate-based methods, these methods describe relationships among features only by linear methods, and simple linear combination relationships restrict the improvement in performance. In this paper, we used a kernel method to discover inherent nonlinear correlations among features as well as between features and the target. Moreover, the number of orthogonal components was determined by kernel Fisher's linear discriminant analysis (FLDA) in a self-adaptive manner rather than by manual parameter settings. In order to reveal the effectiveness of our method we performed several experiments and compared the results between our method and other competitive multivariate-based feature selectors. In our comparison, we used two classifiers (support vector machine, k-nearest neighbor) on two groups of datasets, namely two-class and multi-class datasets. Experimental results demonstrate that the performance of our method is better than others, especially on three hard-to-classify datasets, namely Wang's Breast Cancer, Gordon's Lung Adenocarcinoma and Pomeroy's Medulloblastoma.
Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification.

Science.gov (United States)

Wang, Yin; Li, Rudong; Zhou, Yuhua; Ling, Zongxin; Guo, Xiaokui; Xie, Lu; Liu, Lei

2016-01-01

Text data of 16S rRNA are informative for classifications of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. We extend the phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states for pneumonia and dental caries. The results have shown that PMF may enhance the efficiency and reliability in analyzing high-dimension text data.
Fault classification method for the driving safety of electrified vehicles

Science.gov (United States)

Wanner, Daniel; Drugge, Lars; Stensson Trigell, Annika

2014-05-01

A fault classification method is proposed which has been applied to an electric vehicle. Potential faults in the different subsystems that can affect the vehicle directional stability were collected in a failure mode and effect analysis. Similar driveline faults were grouped together if they resembled each other with respect to their influence on the vehicle dynamic behaviour. The faults were physically modelled in a simulation environment before they were induced in a detailed vehicle model under normal driving conditions. A special focus was placed on faults in the driveline of electric vehicles employing in-wheel motors of the permanent magnet type. Several failures caused by mechanical and other faults were analysed as well. The fault classification method consists of a controllability ranking developed according to the functional safety standard ISO 26262. The controllability of a fault was determined with three parameters covering the influence of the longitudinal, lateral and yaw motion of the vehicle. The simulation results were analysed and the faults were classified according to their controllability using the proposed method. It was shown that the controllability decreased specifically with increasing lateral acceleration and increasing speed. The results for the electric driveline faults show that this trend cannot be generalised for all the faults, as the controllability deteriorated for some faults during manoeuvres with low lateral acceleration and low speed. The proposed method is generic and can be applied to various other types of road vehicles and faults.
Post-boosting of classification boundary for imbalanced data using geometric mean.

Science.gov (United States)

Du, Jie; Vong, Chi-Man; Pun, Chi-Man; Wong, Pak-Kin; Ip, Weng-Fai

2017-12-01

In this paper, a novel imbalance learning method for binary classes is proposed, named Post-Boosting of classification boundary for Imbalanced data (PBI), which can significantly improve the performance of the classification boundary of any trained neural network (NN). The procedure of PBI simply consists of two steps: an (imbalanced) NN learning method is first applied to produce a classification boundary, which is then adjusted by PBI under the geometric mean (G-mean). For data imbalance, the geometric mean of the accuracies of both minority and majority classes is considered, which is statistically more suitable than the common accuracy metric. PBI also has the following advantages over traditional imbalance methods: (i) PBI can significantly improve the classification accuracy on minority class while improving or keeping that on majority class as well; (ii) PBI is suitable for large data even with high imbalance ratio (up to 0.001). For evaluation of (i), a new metric called Majority loss/Minority advance ratio (MMR) is proposed that evaluates the loss ratio of majority class to minority class. Experiments have been conducted for PBI and several imbalance learning methods over benchmark datasets of different sizes, different imbalance ratios, and different dimensionalities. By analyzing the experimental results, PBI is shown to outperform other imbalance learning methods on almost all datasets. Copyright © 2017 Elsevier Ltd. All rights reserved.
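To make the G-mean criterion concrete, here is a minimal sketch that computes the geometric mean of the two per-class accuracies and sweeps a decision threshold to maximize it. This is not the PBI procedure itself, only a simple stand-in for "adjusting a boundary under G-mean"; the function names, scores, and labels are hypothetical.

```python
from math import sqrt


def g_mean(labels, preds):
    """Geometric mean of minority-class (1) and majority-class (0) accuracy."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    return sqrt((tp / pos) * (tn / neg))


def best_threshold(labels, scores):
    """Pick the score threshold (predict 1 if score >= t) with the highest G-mean."""
    best_t, best_g = None, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        g = g_mean(labels, preds)
        if g > best_g:
            best_t, best_g = t, g
    return best_t, best_g


# Hypothetical classifier scores: 8 majority samples (0), 2 minority samples (1).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.45, 0.4, 0.6]

default_preds = [1 if s >= 0.5 else 0 for s in scores]
print(g_mean(labels, default_preds))   # G-mean at the default 0.5 threshold
print(best_threshold(labels, scores))  # threshold chosen under G-mean
```

In this toy data the default threshold reaches 90% plain accuracy while missing half of the minority class, which is the situation where accuracy is misleading and G-mean is more informative.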
Classification of ADHD children through multimodal Magnetic Resonance Imaging

Directory of Open Access Journals (Sweden)

Dai Dai

2012-09-01

Full Text Available Attention deficit/hyperactivity disorder (ADHD) is one of the most common diseases in school-age children. To date, the diagnosis of ADHD is mainly subjective and studies of objective diagnostic methods are of great importance. Although many efforts have been made recently to investigate the use of structural and functional brain images for diagnostic purposes, few of them are related to ADHD. In this paper, we introduce an automatic classification framework based on brain imaging features of ADHD patients, and present in detail the feature extraction, feature selection and classifier training methods. The effects of using different features are compared against each other. In addition, we integrate multimodal image features using multi-kernel learning (MKL). The performance of our framework has been validated in the ADHD-200 Global Competition, which is a world-wide classification contest on the ADHD-200 datasets. In this competition, our classification framework using features of resting-state functional connectivity was ranked 6th out of 21 participants under the competition scoring policy, and performed the best in terms of sensitivity and J-statistic.
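The two measures on which the framework is reported to perform best can be computed from a confusion matrix. The sketch below assumes the J-statistic means Youden's J (sensitivity + specificity - 1), which the abstract does not spell out; the function name and the labels are hypothetical and unrelated to the ADHD-200 data.

```python
def sensitivity_specificity_j(labels, preds):
    """Return (sensitivity, specificity, Youden's J) for binary labels (1 = patient)."""
    tp = fn = tn = fp = 0
    for y, p in zip(labels, preds):
        if y == 1:
            if p == 1:
                tp += 1
            else:
                fn += 1
        else:
            if p == 0:
                tn += 1
            else:
                fp += 1
    sensitivity = tp / (tp + fn)  # fraction of patients correctly detected
    specificity = tn / (tn + fp)  # fraction of controls correctly cleared
    return sensitivity, specificity, sensitivity + specificity - 1


# Hypothetical diagnoses (1 = ADHD, 0 = control) and classifier outputs.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(sensitivity_specificity_j(labels, preds))
```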
Classification of Vessels in Single-Pol COSMO-SkyMed Images Based on Statistical and Structural Features

Directory of Open Access Journals (Sweden)

Fan Wu

2015-05-01

Full Text Available Vessel monitoring is one of the most important maritime applications of Synthetic Aperture Radar (SAR) data. Because of the dihedral reflections between the vessel hull and sea surface and the trihedral reflections among superstructures, vessels usually have strong backscattering in SAR images. Furthermore, in high-resolution SAR images, detailed information on vessel structures can be observed, allowing for vessel classification in high-resolution SAR images. This paper focuses on the feature analysis of merchant vessels, including bulk carriers, container ships and oil tankers, in 3 m resolution COSMO-SkyMed stripmap HIMAGE mode images and proposes a method for vessel classification. After preprocessing, a feature vector is estimated by calculating the average value of the kernel density estimation, three structural features and the mean backscattering coefficient. A support vector machine (SVM) classifier is used for the vessel classification, and the results are compared with traditional methods, such as the K-nearest neighbor algorithm (K-NN) and minimum distance classifier (MDC). In situ investigations are conducted during the SAR data acquisition. Corresponding Automatic Identification System (AIS) reports are also obtained as ground truth to evaluate the effectiveness of the classifier. The preliminary results show that the combination of the average value of the kernel density estimation and mean backscattering coefficient has good ability for classifying the three types of vessels. When adding the three structural features, the results slightly improve. The result of the SVM classifier is better than that of K-NN and MDC. However, the SVM requires more time when the parameters of the kernel are estimated.
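The two baseline methods mentioned in this record are simple enough to sketch. The example below implements a minimum distance classifier (nearest class mean, Euclidean distance) and a K-NN majority vote; the two-dimensional feature vectors and vessel labels are hypothetical placeholders, not the features used in the paper.

```python
from collections import Counter
from math import dist


def fit_class_means(X, y):
    """Compute the mean feature vector of each class."""
    grouped = {}
    for features, label in zip(X, y):
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }


def predict_mdc(means, x):
    """Minimum distance classifier: assign x to the class with the nearest mean."""
    return min(means, key=lambda label: dist(means[label], x))


def predict_knn(X, y, x, k=3):
    """K-NN: majority vote among the k training samples closest to x."""
    nearest = sorted(zip(X, y), key=lambda pair: dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors (e.g. two normalized descriptors per vessel).
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
y = ["tanker", "tanker", "tanker", "container", "container", "container"]

means = fit_class_means(X, y)
print(predict_mdc(means, (1.1, 1.0)), predict_knn(X, y, (4.9, 5.0)))
```

The minimum distance classifier stores only one mean per class, whereas K-NN keeps every training sample, which is the main practical difference between the two baselines.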
Evaluation of interobserver agreement in Albertoni's classification for mallet finger

Directory of Open Access Journals (Sweden)

Vinícius Alexandre de Souza Almeida

Full Text Available Objective: To measure the reliability of Albertoni's classification for mallet finger. Methods: Agreement study. Forty-three radiographs of patients with mallet finger were assessed by 19 responders (12 hand surgeons and seven residents). Injuries were classified by Albertoni's classification. For agreement comparison, lesions were grouped as: (A) tendon avulsion; (B) avulsion fracture; (C) fracture of the dorsal lip; and (D) physis injury, and into subgroups (each group divided into two subgroups). Agreement was assessed by Fleiss's modification for kappa statistics. Results: Agreement was excellent for Group A (k = 0.95 (0.93-0.97)) and remained good when separated into A1 and A2. Group B was moderate (k = 0.42 (0.39-0.44)) and poor when separated into B1 and B2. In Group C, agreement was good (k = 0.72 (0.70-0.74)), but when separated into C1 and C2, it became moderate. Group D was always poor (k = 0.16 (0.14-0.19)). The general agreement was moderate (k = 0.57 (0.56-0.58)). Conclusion: Albertoni's classification evaluated for interobserver agreement is considered a reproducible classification by the method used in the research.
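For readers unfamiliar with the agreement statistic used here, the following is a minimal sketch of Fleiss' kappa for several raters assigning items to categories. The input layout (one row per radiograph, one count per category) and the tiny ratings table are assumptions for illustration; they are not the study's data, and confidence intervals are not computed.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from a table counts[i][j] = raters placing item i in category j.

    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Observed agreement: average proportion of agreeing rater pairs per item.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement from the overall share of each category.
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_chance = sum(s * s for s in shares)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical table: 5 radiographs, 4 raters, categories A, B, C, D.
ratings = [
    [4, 0, 0, 0],
    [0, 3, 1, 0],
    [0, 1, 3, 0],
    [0, 0, 4, 0],
    [1, 0, 0, 3],
]
print(round(fleiss_kappa(ratings), 3))
```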
Electronic nose with a new feature reduction method and a multi-linear classifier for Chinese liquor classification

Energy Technology Data Exchange (ETDEWEB)

Jing, Yaqi; Meng, Qinghao, E-mail: qh-meng@tju.edu.cn; Qi, Peifeng; Zeng, Ming; Li, Wei; Ma, Shugen [Tianjin Key Laboratory of Process Measurement and Control, Institute of Robotics and Autonomous Systems, School of Electrical Engineering and Automation, Tianjin University, Tianjin 300072 (China)

2014-05-15

An electronic nose (e-nose) was designed to classify Chinese liquors of the same aroma style. A new method of feature reduction which combined feature selection with feature extraction was proposed. Feature selection method used 8 feature-selection algorithms based on information theory and reduced the dimension of the feature space to 41. Kernel entropy component analysis was introduced into the e-nose system as a feature extraction method and the dimension of feature space was reduced to 12. Classification of Chinese liquors was performed by using back propagation artificial neural network (BP-ANN), linear discrimination analysis (LDA), and a multi-linear classifier. The classification rate of the multi-linear classifier was 97.22%, which was higher than LDA and BP-ANN. Finally the classification of Chinese liquors according to their raw materials and geographical origins was performed using the proposed multi-linear classifier and classification rate was 98.75% and 100%, respectively.
Statistical methods and challenges in connectome genetics

KAUST Repository

Pluta, Dustin; Yu, Zhaoxia; Shen, Tong; Chen, Chuansheng; Xue, Gui; Ombao, Hernando

2018-01-01

The study of genetic influences on brain connectivity, known as connectome genetics, is an exciting new direction of research in imaging genetics. We here review recent results and current statistical methods in this area, and discuss some
A fuzzy decision tree method for fault classification in the steam generator of a pressurized water reactor

International Nuclear Information System (INIS)

Zio, Enrico; Baraldi, Piero; Popescu, Irina Crenguta

2009-01-01

This paper extends a method previously introduced by the authors for building a transparent fault classification algorithm by combining fuzzy clustering, fuzzy logic and decision tree techniques. The baseline method transforms an opaque, fuzzy clustering-based classification model into a fuzzy logic inference model based on linguistic rules which can be represented by a decision tree formalism. The classification model thereby obtained is transparent in that it allows direct interpretation and inspection of the model. An extension in the procedure for the development of the fuzzy logic inference model is introduced to allow the treatment of more complicated cases, e.g. split and overlapping clusters. The corresponding computational tool relies on a number of parameters which can be tuned by the user to trade off the level of transparency of the classification process against its efficiency. A numerical application is presented with regard to fault classification in the Steam Generator of a Pressurized Water Reactor.
Quantitative EEG Applying the Statistical Recognition Pattern Method

DEFF Research Database (Denmark)

Engedal, Knut; Snaedal, Jon; Hoegh, Peter

2015-01-01

BACKGROUND/AIM: The aim of this study was to examine the discriminatory power of quantitative EEG (qEEG) applying the statistical pattern recognition (SPR) method to separate Alzheimer's disease (AD) patients from elderly individuals without dementia and from other dementia patients. METHODS...
Comparative Estimation of Russia’s Regions Investment Potential on the Base of the Multivariate Statistical Analysis

Directory of Open Access Journals (Sweden)

Victor V. Nikitin

2013-01-01

Full Text Available The article introduces an algorithm for estimating the investment potential of Russia's regions, developed by means of multivariate statistical methods, and determines the factors that reflect the investment state of the regions. An integral indicator was developed on the basis of these factors, using statistical data. The article presents a classification of the regions on the basis of the integral index.
Classification and Analysis of Computer Network Traffic

DEFF Research Database (Denmark)

Bujlow, Tomasz

2014-01-01

various classification modes (decision trees, rulesets, boosting, softening thresholds) regarding the classification accuracy and the time required to create the classifier. We showed how to use our VBS tool to obtain per-flow, per-application, and per-content statistics of traffic in computer networks...
32nd Annual Conference of the Gesellschaft für Klassifikation e.V., Joint Conference with the British Classification Society (BCS) and the Dutch/Flemish Classification Society (VOC), Helmut-Schmidt-University

CERN Document Server

Lausen, Berthold; Seidel, Wilfried; Ultsch, Alfred

2010-01-01

Data Analysis, Data Handling and Business Intelligence are research areas at the intersection of computer science, artificial intelligence, mathematics, and statistics. They cover general methods and techniques that can be applied to a vast set of applications such as in marketing, finance, economics, engineering, linguistics, archaeology, musicology, medical science, and biology. This volume contains the revised versions of selected papers presented during the 32nd Annual Conference of the German Classification Society (Gesellschaft für Klassifikation, GfKl). The conference, which was organized in cooperation with the British Classification Society (BCS) and the Dutch/Flemish Classification Society (VOC), was hosted by Helmut-Schmidt-University, Hamburg, Germany, in July 2008.
Multinomial Response Models, for Modeling and Determining Important Factors in Different Contraceptive Methods in Women

Directory of Open Access Journals (Sweden)

E Haji Nejad

2001-06-01

Full Text Available Different aspects of multinomial statistical modeling and its classifications have been studied so far. In this type of problem, Y is a qualitative random variable with T possible states, which are considered as classes. The goal is to predict Y based on a random vector X ∈ ℝ^m. Many methods for analyzing these problems have been considered. One modern and general method of classification is Classification and Regression Trees (CART). Another is recursive partitioning, which has a strange relationship with nonparametric regression. Classical discriminant analysis is a standard method for analyzing this type of data. Flexible discriminant analysis is a combination of nonparametric regression and discriminant analysis, with classification using splines, including least squares regression and additive cubic splines. Neural networks are an advanced statistical method for analyzing these types of data. In this paper, properties of multinomial logistic regression were investigated, and this method was used for modeling effective factors in selecting contraceptive methods in Ghom province for married women aged 15-49. The response variable has a tetranomial distribution. The levels of this variable are: nothing, pills, traditional, and a collection of other contraceptive methods. The significant independent variables were: place, age of the woman, education, history of pregnancy, and family size. Menstruation age and age at marriage were not statistically significant.
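The prediction step of a multinomial logistic model with four response levels can be sketched as follows. The sketch assumes a baseline-category parameterization with "nothing" as the reference level; the coefficient values and the two predictors are invented for illustration and are not estimates from the study.

```python
from math import exp


def multinomial_probs(x, coefs, baseline="nothing"):
    """Class probabilities of a baseline-category multinomial logit model.

    coefs maps each non-baseline category to (intercept, weights); the
    baseline category has a linear predictor fixed at 0.
    """
    scores = {baseline: 0.0}
    for category, (intercept, weights) in coefs.items():
        scores[category] = intercept + sum(w * v for w, v in zip(weights, x))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores.values())
    expd = {c: exp(s - top) for c, s in scores.items()}
    total = sum(expd.values())
    return {c: e / total for c, e in expd.items()}


# Hypothetical coefficients for predictors (age in decades, family size).
coefs = {
    "pills": (0.5, [-0.2, 0.1]),
    "traditional": (-1.0, [0.3, 0.0]),
    "other": (0.2, [0.1, 0.2]),
}
probs = multinomial_probs([3.0, 4.0], coefs)
print(max(probs, key=probs.get), probs)
```

Each coefficient describes the log-odds of one method relative to the baseline, so the ratio of the "pills" probability to the "nothing" probability equals the exponential of the "pills" linear predictor.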
Longitudinal data analysis a handbook of modern statistical methods

CERN Document Server

Fitzmaurice, Garrett; Verbeke, Geert; Molenberghs, Geert

2008-01-01

Although many books currently available describe statistical models and methods for analyzing longitudinal data, they do not highlight connections between various research threads in the statistical literature. Responding to this void, Longitudinal Data Analysis provides a clear, comprehensive, and unified overview of state-of-the-art theory and applications. It also focuses on the assorted challenges that arise in analyzing longitudinal data. After discussing historical aspects, leading researchers explore four broad themes: parametric modeling, nonparametric and semiparametric methods, joint
Statistical methods for assessing agreement between continuous measurements

DEFF Research Database (Denmark)

Sokolowski, Ineta; Hansen, Rikke Pilegaard; Vedsted, Peter

Background: Clinical research often involves study of agreement amongst observers. Agreement can be measured in different ways, and one can obtain quite different values depending on which method one uses. Objective: We review the approaches that have been discussed to assess the agreement between...... continuous measures and discuss their strengths and weaknesses. Different methods are illustrated using actual data from the 'Delay in diagnosis of cancer in general practice' project in Aarhus, Denmark. Subjects and Methods: We use weighted kappa-statistic, intraclass correlation coefficient (ICC......), concordance coefficient, Bland-Altman limits of agreement and percentage of agreement to assess the agreement between patient reported delay and doctor reported delay in diagnosis of cancer in general practice. Key messages: The correct statistical approach is not obvious. Many studies give the product...
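Of the agreement measures listed in this record, the Bland-Altman limits of agreement are the quickest to illustrate. The sketch below uses the usual definition (mean difference plus or minus 1.96 sample standard deviations of the differences); the function name and the paired delay values are hypothetical, not data from the Aarhus project.

```python
from statistics import mean, stdev


def bland_altman_limits(first, second):
    """Return (bias, lower_limit, upper_limit) for paired continuous measurements."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)  # systematic difference between the two sources
    spread = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * spread, bias + 1.96 * spread


# Hypothetical delays in days: patient-reported versus doctor-reported.
patient = [10, 12, 14, 16]
doctor = [9, 12, 13, 17]
print(bland_altman_limits(patient, doctor))
```

Unlike a correlation coefficient, the limits are expressed in the original measurement units, so they can be judged directly against what counts as a clinically acceptable disagreement.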

Digitisation of films and texture analysis for digital classification of pulmonary opacities

International Nuclear Information System (INIS)

Desaga, J.F.; Dengler, J.; Wolf, T.; Engelmann, U.; Scheppelmann, D.; Meinzer, H.P.

1988-01-01

The study aimed at evaluating the effect of different methods of digitisation of radiographic films on the digital classification of pulmonary opacities. Test sets from the standard of the International Labour Office (ILO) Classification of Radiographs of Pneumoconiosis were prepared by film digitisation using a scanning microdensitometer or a video digitiser based on a personal computer equipped with a real time digitiser board and a vidicon or a Charge Coupled Device (CCD) camera. Seven different algorithms were used for texture analysis, resulting in 16 texture parameters for each region. All methods used for texture analysis were independent of the mean grey value level and the size of the image analysed. Classification was performed by discriminant analysis using the classes from the ILO classification. A hit ratio of at least 85% was achieved for digitisation by the scanner or the vidicon, while the corresponding results of the CCD camera were significantly worse. Classification by texture analysis of opacities of chest X-rays of pneumoconiosis digitised by a personal computer based video digitiser and a vidicon is of equal quality to that obtained after digitisation by a scanning microdensitometer. Correct classification of 90% was achieved via the described statistical approach. (orig.)
Beyond Zar: the use and abuse of classification statistics for otolith chemistry.

Science.gov (United States)

Jones, C M; Palmer, M; Schaffler, J J

2017-02-01

Classification method performance was evaluated using otolith chemistry of juvenile Atlantic menhaden Brevoortia tyrannus when assumptions of data normality were met and were violated. Four methods were tested [linear discriminant function analysis (LDFA), quadratic discriminant function analysis (QDFA), random forest (RF) and artificial neural networks (ANN)] using computer simulation to determine their performance when variable-group means ranged from small to large and their performance under conditions of typical skewness to double the amount of skewness typically observed. Using the kappa index, the parametric methods performed best after applying appropriate data transformation, gaining 2% better performance with LDFA performing slightly better than QDFA. RF performed as well as QDFA and showed no difference in performance between raw and transformed data while the performance of ANN was the poorest and worse with raw data. All methods performed well when group differences were large, but parametric methods outperformed machine-learning methods. When data were skewed the performance of all methods declined and worsened with greater skewness, but RF performed consistently as well as or better than the other methods in the presence of skewness. The parametric methods were found to be more powerful when assumptions of normality can be met and can be used confidently when skewness and kurtosis are minimized. When these assumptions cannot be met, then machine-learning methods should also be tried. © 2016 The Fisheries Society of the British Isles.
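The kappa index used to compare the classifiers can be computed from a confusion matrix. The sketch below implements the standard unweighted Cohen's kappa; the matrix layout (rows are true groups, columns are predicted groups) is an assumption made for illustration.

```python
def cohens_kappa(confusion):
    """Unweighted Cohen's kappa for a square confusion matrix.

    confusion[i][j] is the number of cases from true group i
    that were assigned to predicted group j.
    """
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column marginals for each group.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(size)
    ) / (total * total)
    return (observed - expected) / (1.0 - expected)
```

Kappa is 1 for perfect classification and 0 when the classifier does no better than chance assignment with the same marginals.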
Comparison of four statistical and machine learning methods for crash severity prediction.

Science.gov (United States)

Iranitalab, Amirfarrokh; Khattak, Aemal

2017-11-01

Crash severity prediction models enable different agencies to predict the severity of a reported crash with unknown severity or the severity of crashes that may be expected to occur sometime in the future. This paper had three main objectives: comparison of the performance of four statistical and machine learning methods including Multinomial Logit (MNL), Nearest Neighbor Classification (NNC), Support Vector Machines (SVM) and Random Forests (RF), in predicting traffic crash severity; developing a crash costs-based approach for comparison of crash severity prediction methods; and investigating the effects of data clustering methods comprising K-means Clustering (KC) and Latent Class Clustering (LCC), on the performance of crash severity prediction models. The 2012-2015 reported crash data from Nebraska, United States was obtained and two-vehicle crashes were extracted as the analysis data. The dataset was split into training/estimation (2012-2014) and validation (2015) subsets. The four prediction methods were trained/estimated using the training/estimation dataset and the correct prediction rates for each crash severity level, overall correct prediction rate and a proposed crash costs-based accuracy measure were obtained for the validation dataset. The correct prediction rates and the proposed approach showed NNC had the best prediction performance overall and in more severe crashes. RF and SVM had the next-best performances and MNL was the weakest method. Data clustering did not affect the prediction results of SVM, but KC improved the prediction performance of MNL, NNC and RF, while LCC caused improvement in MNL and RF but weakened the performance of NNC. Overall correct prediction rate had almost the exact opposite results compared to the proposed approach, showing that neglecting the crash costs can lead to misjudgment in choosing the right prediction method. Copyright © 2017 Elsevier Ltd. All rights reserved.
An Object-Oriented Classification Method on High Resolution Satellite Data

Science.gov (United States)

2004-11-01

Proceedings of the 25th Asian Conference on Remote Sensing (ACRS 2004), held in Chiang Mai, Thailand, on 22-26 November 2004; Data Processing, B-4.6.
Statistical and Machine Learning forecasting methods: Concerns and ways forward

Science.gov (United States)

Makridakis, Spyros; Assimakopoulos, Vassilios

2018-01-01

Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions. PMID:29584784
Statistical and Machine Learning forecasting methods: Concerns and ways forward.

Science.gov (United States)

Makridakis, Spyros; Spiliotis, Evangelos; Assimakopoulos, Vassilios

2018-01-01

Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions.
A neural network-based optimal spatial filter design method for motor imagery classification.

Directory of Open Access Journals (Sweden)

Ayhan Yuksel

Full Text Available In this study, a novel spatial filter design method is introduced. Spatial filtering is an important processing step for feature extraction in motor imagery-based brain-computer interfaces. This paper introduces a new motor imagery signal classification method combined with spatial filter optimization. We simultaneously train the spatial filter and the classifier using a neural network approach. The proposed spatial filter network (SFN) is composed of two layers: a spatial filtering layer and a classifier layer. These two layers are linked to each other with non-linear mapping functions. The proposed method addresses two shortcomings of the common spatial patterns (CSP) algorithm. First, CSP aims to maximize the between-classes variance while ignoring the minimization of within-classes variances. Consequently, the features obtained using the CSP method may have large within-classes variances. Second, the maximizing optimization function of CSP increases the classification accuracy indirectly because an independent classifier is used after the CSP method. With SFN, we aimed to maximize the between-classes variance while minimizing within-classes variances and simultaneously optimizing the spatial filter and the classifier. To classify motor imagery EEG signals, we modified the well-known feed-forward structure and derived forward and backward equations that correspond to the proposed structure. We tested our algorithm on simple toy data. Then, we compared the SFN with conventional CSP and its multi-class version, called one-versus-rest CSP, on two data sets from BCI competition III. The evaluation results demonstrate that SFN is a good alternative for classifying motor imagery EEG signals with increased classification accuracy.
Computerized statistical analysis with bootstrap method in nuclear medicine

International Nuclear Information System (INIS)

Zoccarato, O.; Sardina, M.; Zatta, G.; De Agostini, A.; Barbesti, S.; Mana, O.; Tarolo, G.L.

1988-01-01

Statistical analysis of data samples involves some hypotheses about the features of the data themselves. The accuracy of these hypotheses can influence the results of statistical inference. Among the new methods of computer-aided statistical analysis, the bootstrap method appears to be one of the most powerful, thanks to its ability to reproduce many artificial samples starting from a single original sample and because it works without hypotheses about the data distribution. The authors applied the bootstrap method to two typical situations in a nuclear medicine department. The determination of the normal range of serum ferritin, as assessed by radioimmunoassay and defined by the mean value ±2 standard deviations, starting from an experimental sample of small dimension, shows an unacceptable lower limit (ferritin plasmatic levels below zero). In contrast, the results obtained by elaborating 5000 bootstrap samples give an interval of values (10.95 ng/ml - 72.87 ng/ml) corresponding to the normal ranges commonly reported. Moreover, the authors applied the bootstrap method in evaluating the possible error associated with the correlation coefficient determined between left ventricular ejection fraction (LVEF) values obtained by first pass radionuclide angiocardiography with 99mTc and 195mAu. The results obtained indicate a high degree of statistical correlation and give the range of r^2 values to be considered acceptable for this type of study.
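The resampling idea described above can be sketched with a percentile bootstrap interval. This is a generic illustration, not necessarily the procedure the authors used for the ferritin range; the function name, the default of 5000 resamples, the 95% level and the fixed seed are all assumptions.

```python
import random
import statistics


def bootstrap_percentile_interval(sample, stat=statistics.mean,
                                  n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for a statistic of a single sample.

    Each artificial sample is drawn with replacement from the original
    sample and has the same size; no distributional form is assumed.
    """
    rng = random.Random(seed)
    size = len(sample)
    estimates = sorted(
        stat([rng.choice(sample) for _ in range(size)])
        for _ in range(n_boot)
    )
    low_index = int((1.0 - level) / 2.0 * n_boot)
    high_index = int((1.0 + level) / 2.0 * n_boot) - 1
    return estimates[low_index], estimates[high_index]
```

Because every resampled value comes from the observed data, the interval for a mean cannot fall below the smallest observation, which avoids impossible results such as negative concentrations.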
Statistical methods to monitor the West Valley off-gas system

International Nuclear Information System (INIS)

Eggett, D.L.

1990-01-01

This paper reports on the off-gas system for the ceramic melter operated at the West Valley Demonstration Project at West Valley, NY, which is monitored during melter operation. A one-at-a-time method of monitoring the parameters of the off-gas system is not statistically sound. Therefore, multivariate statistical methods appropriate for the monitoring of many correlated parameters will be used. Monitoring a large number of parameters increases the probability of a false out-of-control signal. If the parameters being monitored are statistically independent, the control limits can be easily adjusted to obtain the desired probability of a false out-of-control signal. The principal component (PC) scores have desirable statistical properties when the original variables are distributed as multivariate normals. Two statistics derived from the PC scores and used to form multivariate control charts are outlined and their distributional properties reviewed.
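For the independent-parameter case mentioned above, the adjustment of control limits can be illustrated with the standard relation between the per-chart and the overall false-alarm probability. The function names are hypothetical, and the sketch applies only when the monitored parameters really are statistically independent.

```python
def overall_false_alarm(per_chart_alpha, n_parameters):
    """Probability that at least one of n independent charts signals falsely."""
    return 1.0 - (1.0 - per_chart_alpha) ** n_parameters


def per_chart_alpha(overall_alpha, n_parameters):
    """Per-chart false-alarm probability that yields the desired overall rate."""
    return 1.0 - (1.0 - overall_alpha) ** (1.0 / n_parameters)
```

For example, ten independent charts each run at a 5% false-alarm rate give an overall rate of about 40%, which is why one-at-a-time monitoring of many parameters signals far too often.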
Statistical methods of parameter estimation for deterministically chaotic time series

Science.gov (United States)

Pisarenko, V. F.; Sornette, D.

2004-03-01

We discuss the possibility of applying some standard statistical methods (the least-square method, the maximum likelihood method, and the method of statistical moments for estimation of parameters) to deterministically chaotic low-dimensional dynamic system (the logistic map) containing an observational noise. A “segmentation fitting” maximum likelihood (ML) method is suggested to estimate the structural parameter of the logistic map along with the initial value x1 considered as an additional unknown parameter. The segmentation fitting method, called “piece-wise” ML, is similar in spirit but simpler and has smaller bias than the “multiple shooting” previously proposed. Comparisons with different previously proposed techniques on simulated numerical examples give favorable results (at least, for the investigated combinations of sample size N and noise level). Besides, unlike some suggested techniques, our method does not require the a priori knowledge of the noise variance. We also clarify the nature of the inherent difficulties in the statistical analysis of deterministically chaotic time series and the status of previously proposed Bayesian approaches. We note the trade off between the need of using a large number of data points in the ML analysis to decrease the bias (to guarantee consistency of the estimation) and the unstable nature of dynamical trajectories with exponentially fast loss of memory of the initial condition. The method of statistical moments for the estimation of the parameter of the logistic map is discussed. This method seems to be the unique method whose consistency for deterministically chaotic time series is proved so far theoretically (not only numerically).
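To make the setting concrete, the sketch below simulates the logistic map x(t+1) = r·x(t)·(1 − x(t)) with additive observational noise and applies a naive one-step least-squares estimate of r. This is not the piece-wise ML method proposed by the authors; it is only the simplest baseline, and the function names, seed and parameter values are assumptions. Because the noisy observations enter the regressor as well as the response, this naive estimate is biased once noise is present.

```python
import random


def simulate_logistic_map(r, x1, n, noise_sd=0.0, seed=0):
    """Return n observations of the logistic map started at x1.

    Gaussian noise is added to the observations only; the underlying
    trajectory itself stays deterministic.
    """
    rng = random.Random(seed)
    state = x1
    observed = []
    for _ in range(n):
        observed.append(state + rng.gauss(0.0, noise_sd))
        state = r * state * (1.0 - state)
    return observed


def one_step_least_squares(observed):
    """Least-squares fit of r in y(t+1) = r * y(t) * (1 - y(t))."""
    numerator = 0.0
    denominator = 0.0
    for current, following in zip(observed, observed[1:]):
        regressor = current * (1.0 - current)
        numerator += following * regressor
        denominator += regressor * regressor
    return numerator / denominator
```

On noise-free data the estimate recovers r essentially exactly; comparing it with the result for a positive `noise_sd` shows the bias that motivates more careful estimators.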
An Overview of Short-term Statistical Forecasting Methods

DEFF Research Database (Denmark)

Elias, Russell J.; Montgomery, Douglas C.; Kulahci, Murat

2006-01-01

An overview of statistical forecasting methodology is given, focusing on techniques appropriate to short- and medium-term forecasts. Topics include basic definitions and terminology, smoothing methods, ARIMA models, regression methods, dynamic regression models, and transfer functions. Techniques for evaluating and monitoring forecast performance are also summarized.
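As an illustration of the smoothing methods mentioned above, simple exponential smoothing can be sketched in a few lines. Initialising the level with the first observation is one common convention and is an assumption here, as is the function name.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast after processing the whole series.

    alpha in (0, 1] controls how quickly older observations are discounted.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    level = series[0]  # initial level: first observation
    for value in series[1:]:
        level = alpha * value + (1.0 - alpha) * level
    return level
```

With alpha equal to 1 the forecast is simply the last observation; smaller values average over a longer history and react more slowly to change.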
An analysis of feature relevance in the classification of astronomical transients with machine learning methods

Science.gov (United States)

D'Isanto, A.; Cavuoti, S.; Brescia, M.; Donalek, C.; Longo, G.; Riccio, G.; Djorgovski, S. G.

2016-04-01

The exploitation of present and future synoptic (multiband and multi-epoch) surveys requires an extensive use of automatic methods for data processing and data interpretation. In this work, using data extracted from the Catalina Real Time Transient Survey (CRTS), we investigate the classification performance of some well tested methods: Random Forest, MultiLayer Perceptron with Quasi Newton Algorithm and K-Nearest Neighbours, paying special attention to the feature selection phase. In order to do so, several classification experiments were performed. Namely: identification of cataclysmic variables, separation between galactic and extragalactic objects and identification of supernovae.
Non-Statistical Methods of Analysing of Bankruptcy Risk

Directory of Open Access Journals (Sweden)

Pisula Tomasz

2015-06-01

Full Text Available The article focuses on assessing the effectiveness of a non-statistical approach to bankruptcy modelling in enterprises operating in the logistics sector. In order to describe the issue more comprehensively, the aforementioned prediction of the possible negative results of business operations was carried out for companies functioning in the Polish region of Podkarpacie, and in Slovakia. The bankruptcy predictors selected for the assessment of companies operating in the logistics sector included 28 financial indicators characterizing these enterprises in terms of their financial standing and management effectiveness. The purpose of the study was to identify factors (models) describing the bankruptcy risk in enterprises in the context of their forecasting effectiveness in a one-year and two-year time horizon. In order to assess their practical applicability the models were carefully analysed and validated. The usefulness of the models was assessed in terms of their classification properties, and the capacity to accurately identify enterprises at risk of bankruptcy and healthy companies as well as proper calibration of the models to the data from training sample sets.
An Outlyingness Matrix for Multivariate Functional Data Classification

KAUST Repository

Dai, Wenlin

2017-08-25

The classification of multivariate functional data is an important task in scientific research. Unlike point-wise data, functional data are usually classified by their shapes rather than by their scales. We define an outlyingness matrix by extending directional outlyingness, an effective measure of the shape variation of curves that combines the direction of outlyingness with conventional statistical depth. We propose two classifiers based on directional outlyingness and the outlyingness matrix, respectively. Our classifiers provide better performance compared with existing depth-based classifiers when applied on both univariate and multivariate functional data from simulation studies. We also test our methods on two data problems: speech recognition and gesture classification, and obtain results that are consistent with the findings from the simulated data.
Direct Learning of Systematics-Aware Summary Statistics

CERN Multimedia

CERN. Geneva

2018-01-01

Complex machine learning tools, such as deep neural networks and gradient boosting algorithms, are increasingly being used to construct powerful discriminative features for High Energy Physics analyses. These methods are typically trained with simulated or auxiliary data samples by optimising some classification or regression surrogate objective. The learned feature representations are then used to build a sample-based statistical model to perform inference (e.g. interval estimation or hypothesis testing) over a set of parameters of interest. However, the effectiveness of the mentioned approach can be reduced by the presence of known uncertainties that cause differences between training and experimental data, included in the statistical model via nuisance parameters. This work presents an end-to-end algorithm, which leverages on existing deep learning technologies but directly aims to produce inference-optimal sample-summary statistics. By including the statistical model and a differentiable approximation of ...
Order statistics & inference estimation methods

CERN Document Server

Balakrishnan, N

1991-01-01

The literature on order statistics and inference is quite extensive and covers a large number of fields, but most of it is dispersed throughout numerous publications. This volume is the consolidation of the most important results and places an emphasis on estimation. Both theoretical and computational procedures are presented to meet the needs of researchers, professionals, and students. The methods of estimation discussed are well-illustrated with numerous practical examples from both the physical and life sciences, including sociology, psychology, and electrical and chemical engineering. A co
Computerized Classification Testing with the Rasch Model

Science.gov (United States)

Eggen, Theo J. H. M.

2011-01-01

If classification in a limited number of categories is the purpose of testing, computerized adaptive tests (CATs) with algorithms based on sequential statistical testing perform better than estimation-based CATs (e.g., Eggen & Straetmans, 2000). In these computerized classification tests (CCTs), the Sequential Probability Ratio Test (SPRT) (Wald,…
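A sequential probability ratio test for a two-category classification under the Rasch model can be sketched as follows. The indifference-region bounds `theta_low` and `theta_high`, the error rates and all function names are illustrative assumptions rather than details from the article.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def sprt_classify(responses, difficulties, theta_low, theta_high,
                  alpha=0.05, beta=0.05):
    """Wald's SPRT between ability theta_low and ability theta_high.

    responses    : sequence of booleans (True means a correct answer).
    difficulties : Rasch difficulty of each administered item.
    Returns (decision, number of items used).
    """
    upper = math.log((1.0 - beta) / alpha)
    lower = math.log(beta / (1.0 - alpha))
    log_ratio = 0.0
    for used, (correct, difficulty) in enumerate(
            zip(responses, difficulties), start=1):
        p_high = rasch_probability(theta_high, difficulty)
        p_low = rasch_probability(theta_low, difficulty)
        if correct:
            log_ratio += math.log(p_high / p_low)
        else:
            log_ratio += math.log((1.0 - p_high) / (1.0 - p_low))
        if log_ratio >= upper:
            return "above", used
        if log_ratio <= lower:
            return "below", used
    return "undecided", len(responses)
```

Testing stops as soon as the accumulated log-likelihood ratio crosses either bound, so clear-cut examinees are classified after only a few items.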
Classification of EMG signals using PSO optimized SVM for diagnosis of neuromuscular disorders.

Science.gov (United States)

Subasi, Abdulhamit

2013-06-01

Support vector machine (SVM) is an extensively used machine learning method with many biomedical signal classification applications. In this study, a novel PSO-SVM model has been proposed that hybridized the particle swarm optimization (PSO) and SVM to improve the EMG signal classification accuracy. This optimization mechanism involves kernel parameter setting in the SVM training procedure, which significantly influences the classification accuracy. The experiments were conducted on the basis of EMG signal to classify into normal, neurogenic or myopathic. In the proposed method the EMG signals were decomposed into the frequency sub-bands using discrete wavelet transform (DWT) and a set of statistical features were extracted from these sub-bands to represent the distribution of wavelet coefficients. The obtained results obviously validate the superiority of the SVM method compared to conventional machine learning methods, and suggest that further significant enhancements in terms of classification accuracy can be achieved by the proposed PSO-SVM classification system. The PSO-SVM yielded an overall accuracy of 97.41% on 1200 EMG signals selected from 27 subject records against 96.75%, 95.17% and 94.08% for the SVM, the k-NN and the RBF classifiers, respectively. PSO-SVM is developed as an efficient tool so that various SVMs can be used conveniently as the core of PSO-SVM for diagnosis of neuromuscular disorders. Copyright © 2013 Elsevier Ltd. All rights reserved.
Selective ablation of Copper-Indium-Diselenide solar cells monitored by laser-induced breakdown spectroscopy and classification methods

Energy Technology Data Exchange (ETDEWEB)

Diego-Vallejo, David [Technische Universität Berlin, Institute of Optics and Atomic Physics, Straße des 17, Juni 135, 10623 Berlin (Germany); Laser- und Medizin- Technologie Berlin GmbH (LMTB), Applied Laser Technology, Fabeckstr. 60-62, 14195 Berlin (Germany); Ashkenasi, David, E-mail: d.ashkenasi@lmtb.de [Laser- und Medizin- Technologie Berlin GmbH (LMTB), Applied Laser Technology, Fabeckstr. 60-62, 14195 Berlin (Germany); Lemke, Andreas [Laser- und Medizin- Technologie Berlin GmbH (LMTB), Applied Laser Technology, Fabeckstr. 60-62, 14195 Berlin (Germany); Eichler, Hans Joachim [Technische Universität Berlin, Institute of Optics and Atomic Physics, Straße des 17, Juni 135, 10623 Berlin (Germany); Laser- und Medizin- Technologie Berlin GmbH (LMTB), Applied Laser Technology, Fabeckstr. 60-62, 14195 Berlin (Germany)

2013-09-01

Laser-induced breakdown spectroscopy (LIBS) and two classification methods, i.e. linear correlation and artificial neural networks (ANN), are used to monitor P1, P2 and P3 scribing steps of Copper-Indium-Diselenide (CIS) solar cells. Narrow channels featuring complete removal of desired layers with minimum damage on the underlying film are expected to enhance efficiency of solar cells. The monitoring technique is intended to determine that enough material has been removed to reach the desired layer based on the analysis of plasma emission acquired during multiple pass laser scribing. When successful selective scribing is achieved, a high degree of similarity between test and reference spectra has to be identified by classification methods in order to stop the scribing procedure and avoid damaging the bottom layer. Performance of linear correlation and artificial neural networks is compared and evaluated for two spectral bandwidths. By using experimentally determined combinations of classifier and analyzed spectral band for each step, classification performance achieves errors of 7, 1 and 4% for steps P1, P2 and P3, respectively. The feasibility of using plasma emission for the supervision of processing steps of solar cell manufacturing is demonstrated. This method has the potential to be implemented as an online monitoring procedure assisting the production of solar cells. - Highlights: • LIBS and two classification methods were used to monitor CIS solar cells processing. • Selective ablation of thin-film solar cells was improved with inspection system. • Customized classification method and analyzed spectral band enhanced performance.
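The linear-correlation classifier described above can be sketched as a similarity check between a measured spectrum and a reference spectrum of the target layer. The threshold of 0.9 and the function names are hypothetical; the paper determines its classifier settings experimentally.

```python
import math


def pearson_correlation(first, second):
    """Pearson correlation coefficient of two equally long sequences."""
    size = len(first)
    mean_first = sum(first) / size
    mean_second = sum(second) / size
    covariance = sum((a - mean_first) * (b - mean_second)
                     for a, b in zip(first, second))
    spread_first = math.sqrt(sum((a - mean_first) ** 2 for a in first))
    spread_second = math.sqrt(sum((b - mean_second) ** 2 for b in second))
    return covariance / (spread_first * spread_second)


def layer_reached(test_spectrum, reference_spectrum, threshold=0.9):
    """Signal that scribing should stop once the spectra are similar enough.

    threshold is a hypothetical value chosen only for illustration.
    """
    return pearson_correlation(test_spectrum, reference_spectrum) >= threshold
```

Because the correlation coefficient is insensitive to overall intensity scaling, the check responds to the shape of the emission spectrum rather than to pulse-to-pulse brightness changes.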
A pentatonic classification of extreme events

International Nuclear Information System (INIS)

Eliazar, Iddo; Cohen, Morrel H.

2015-01-01

In this paper we present a classification of the extreme events – very small and very large outcomes – of positive-valued random variables. The classification distinguishes five different categories of randomness, ranging from the very ‘mild’ to the very ‘wild’. In analogy with the common five-tone musical scale we term the classification ‘pentatonic’. The classification is based on the analysis of the inherent Gibbsian ‘forces’ and ‘temperatures’ existing on the logarithmic scale of the random variables under consideration, and provides a statistical-physics insight regarding the nature of these random variables. The practical application of the pentatonic classification is remarkably straightforward, it can be performed by non-experts, and it is demonstrated via an array of examples

Waste classification and methods applied to specific disposal sites

International Nuclear Information System (INIS)

Rogers, V.C.

1979-01-01

An adequate definition of the classes of radioactive wastes is necessary for regulating the disposal of radioactive wastes. A classification system is proposed in which wastes are classified according to characteristics relating to their disposal. Several specific sites are analyzed with the methodology in order to gain insights into the classification of radioactive wastes. Also presented is the analysis of ocean dumping as it applies to waste classification. 5 refs
Short text sentiment classification based on feature extension and ensemble classifier

Science.gov (United States)

Liu, Yang; Zhu, Xie

2018-05-01

With the rapid development of Internet social media, mining the emotional tendencies of short texts on the Internet to acquire useful information has attracted the attention of researchers. At present, the commonly used approaches fall into rule-based classification and statistical machine learning classification methods. Although micro-blog sentiment analysis has made good progress, shortcomings remain, such as insufficient accuracy and a strong dependence on the sentiment classification effect. Aiming at the characteristics of Chinese short texts, such as little information, sparse features, and diverse expressions, this paper considers expanding the original text by mining related semantic information from reviews, forwards and other related information. First, Word2vec is used to compute word similarity and extend the feature words. Then an ensemble classifier composed of SVM, KNN and HMM is used to analyze the sentiment of micro-blog short texts. The experimental results show that the proposed method can make good use of comment and forwarding information to extend the original features. Compared with the traditional method, the accuracy, recall and F1 value obtained by this method are improved.
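The abstract does not say how the three base classifiers are combined. One common choice, shown here purely as an assumption, is a simple majority vote over the labels predicted by each model.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by most base classifiers.

    predictions holds one label per classifier, for example the outputs
    of an SVM, a KNN and an HMM model for the same short text. If there
    is a tie, the label that appears first in the list is returned.
    """
    return Counter(predictions).most_common(1)[0][0]
```

With three voters and two sentiment classes a tie cannot occur, which is one practical reason for using an odd number of base classifiers.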
Fault Diagnosis for Rotating Machinery Using Vibration Measurement Deep Statistical Feature Learning

Directory of Open Access Journals (Sweden)

Chuan Li

2016-06-01

Full Text Available Fault diagnosis is important for the maintenance of rotating machinery. The detection of faults and fault patterns is a challenging part of machinery fault diagnosis. To tackle this problem, a model for deep statistical feature learning from vibration measurements of rotating machinery is presented in this paper. Vibration sensor signals collected from rotating mechanical systems are represented in the time, frequency, and time-frequency domains, each of which is then used to produce a statistical feature set. For learning statistical features, real-value Gaussian-Bernoulli restricted Boltzmann machines (GRBMs) are stacked to develop a Gaussian-Bernoulli deep Boltzmann machine (GDBM). The suggested approach is applied as a deep statistical feature learning tool for both gearbox and bearing systems. The fault classification performances in experiments using this approach are 95.17% for the gearbox, and 91.75% for the bearing system. The proposed approach is compared to such standard methods as a support vector machine, GRBM and a combination model. In experiments, the best fault classification rate was detected using the proposed model. The results show that deep learning with statistical feature extraction has an essential improvement potential for diagnosing rotating machinery faults.
Fault Diagnosis for Rotating Machinery Using Vibration Measurement Deep Statistical Feature Learning.

Science.gov (United States)

Li, Chuan; Sánchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Cabrera, Diego

2016-06-17

Fault diagnosis is important for the maintenance of rotating machinery. The detection of faults and fault patterns is a challenging part of machinery fault diagnosis. To tackle this problem, a model for deep statistical feature learning from vibration measurements of rotating machinery is presented in this paper. Vibration sensor signals collected from rotating mechanical systems are represented in the time, frequency, and time-frequency domains, each of which is then used to produce a statistical feature set. For learning statistical features, real-value Gaussian-Bernoulli restricted Boltzmann machines (GRBMs) are stacked to develop a Gaussian-Bernoulli deep Boltzmann machine (GDBM). The suggested approach is applied as a deep statistical feature learning tool for both gearbox and bearing systems. The fault classification performances in experiments using this approach are 95.17% for the gearbox, and 91.75% for the bearing system. The proposed approach is compared to such standard methods as a support vector machine, GRBM and a combination model. In experiments, the best fault classification rate was detected using the proposed model. The results show that deep learning with statistical feature extraction has an essential improvement potential for diagnosing rotating machinery faults.
A simple semi-automatic approach for land cover classification from multispectral remote sensing imagery.

Directory of Open Access Journals (Sweden)

Dong Jiang

Full Text Available Land cover data represent a fundamental data source for various types of scientific research. The classification of land cover based on satellite data is a challenging task, and an efficient classification method is needed. In this study, an automatic scheme is proposed for the classification of land use using multispectral remote sensing images based on change detection and a semi-supervised classifier. The satellite image can be automatically classified using only the prior land cover map and existing images; therefore human involvement is reduced to a minimum, ensuring the operability of the method. The method was tested in the Qingpu District of Shanghai, China. Using Environment Satellite 1 (HJ-1) images of 2009 with 30 m spatial resolution, the areas were classified into five main types of land cover based on previous land cover data and spectral features. The results agreed well with the validation land cover maps, with a Kappa value of 0.79 and statistical area biases of less than 6% in proportion. This study proposed a simple semi-automatic approach for land cover classification using prior maps, with satisfactory accuracy, which integrates the accuracy of visual interpretation and the performance of automatic classification methods. The method can conveniently be used for land cover mapping in areas lacking ground reference information or for identifying regions of rapid land cover change (such as rapid urbanization).
Statistical classification techniques in high energy physics (SDDT algorithm)

International Nuclear Information System (INIS)

Bouř, Petr; Kůs, Václav; Franc, Jiří

2016-01-01

We present our proposal of the supervised binary divergence decision tree with nested separation method based on the generalized linear models. A key insight we provide is the clustering driven only by a few selected physical variables. The proper selection consists of the variables achieving the maximal divergence measure between two different classes. Further, we apply our method to Monte Carlo simulations of physics processes corresponding to a data sample of top quark-antiquark pair candidate events in the lepton+jets decay channel. The data sample is produced in pp̅ collisions at √s = 1.96 TeV. It corresponds to an integrated luminosity of 9.7 fb⁻¹ recorded with the D0 detector during Run II of the Fermilab Tevatron Collider. The efficiency of our algorithm achieves 90% AUC in separating signal from background. We also briefly deal with the modification of statistical tests applicable to weighted data sets in order to test homogeneity of the Monte Carlo simulations and measured data. The justification of these modified tests is proposed through the divergence tests. (paper)
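The AUC figure can be read as the probability that a randomly chosen signal event receives a higher classifier score than a randomly chosen background event. The sketch below computes it by direct pairwise comparison; the scores are made up for illustration and event weights, which the abstract mentions, are ignored here.

```python
def auc(signal_scores, background_scores):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal_scores) * len(background_scores))

# Hypothetical classifier outputs, not values from the analysis
sig = [0.9, 0.8, 0.4]
bkg = [0.7, 0.3, 0.2]
value = auc(sig, bkg)
```

The double loop is quadratic in the number of events; for large samples a rank-based formulation gives the same number more cheaply.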
Threshold selection for classification of MR brain images by clustering method

Energy Technology Data Exchange (ETDEWEB)

Moldovanu, Simona [Faculty of Sciences and Environment, Department of Chemistry, Physics and Environment, Dunărea de Jos University of Galaţi, 47 Domnească St., 800008, Romania, Phone: +40 236 460 780 (Romania); Dumitru Moţoc High School, 15 Milcov St., 800509, Galaţi (Romania); Obreja, Cristian; Moraru, Luminita, E-mail: luminita.moraru@ugal.ro [Faculty of Sciences and Environment, Department of Chemistry, Physics and Environment, Dunărea de Jos University of Galaţi, 47 Domnească St., 800008, Romania, Phone: +40 236 460 780 (Romania)

2015-12-07

Given a grey-intensity image, our method detects the optimal threshold for a suitable binarization of MR brain images. In MR brain image processing, the grey levels of pixels belonging to the object are not substantially different from the grey levels belonging to the background. Threshold optimization is an effective tool to separate objects from the background and, further, in classification applications. This paper gives a detailed investigation of the selection of thresholds. Our method does not use the well-known method for binarization. Instead, we perform a simple threshold optimization which, in turn, allows the best classification of the analyzed images into healthy and multiple sclerosis groups. The dissimilarity (or the distance between classes) has been established using the clustering method based on dendrograms. We tested our method using two classes of images: the first consists of 20 T2-weighted and 20 proton density PD-weighted scans from two healthy subjects and from two patients with multiple sclerosis. For each image and for each threshold, the number of white pixels (or the area of white objects in the binary image) has been determined. These pixel numbers represent the objects in the clustering operation. The following optimum threshold values are obtained: T = 80 for PD images and T = 30 for T2w images. Each of these thresholds clearly separates the clusters belonging to the studied groups, healthy subjects and patients with multiple sclerosis.
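The per-threshold white-pixel count that feeds the clustering step can be sketched as below. The tiny 3x3 "image" and the strict greater-than convention for a white pixel are assumptions made for illustration; the abstract does not state the convention used.

```python
def white_pixel_count(image, threshold):
    """Count pixels whose grey level is strictly above the threshold."""
    return sum(1 for row in image for pixel in row if pixel > threshold)

# Hypothetical 8-bit grey levels, not taken from any MR scan
image = [[10, 40, 90],
         [85, 20, 200],
         [30, 79, 81]]

# One count per candidate threshold; these counts are the objects to be clustered
counts = {t: white_pixel_count(image, t) for t in (30, 80)}
```

Repeating this over all images and thresholds gives one feature vector per image, which a hierarchical clustering routine could then group into a dendrogram.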
Statistical analysis of the DIAMOND MI study by the multipole method

DEFF Research Database (Denmark)

Olesen, R.M.; Thomsen, P.E.B.; Særmark, Knud

2005-01-01

We present a new method to describe the dynamics of the beat-to-beat RR time series. The classification of the phase-space plots obtained from RR time series is performed by a calculation of parameters which describe the features of the two-dimensional plot. We demonstrate that every parameter has...
Fundamentals of modern statistical methods substantially improving power and accuracy

CERN Document Server

Wilcox, Rand R

2001-01-01

Conventional statistical methods have a very serious flaw. They routinely miss differences among groups or associations among variables that are detected by more modern techniques - even under very small departures from normality. Hundreds of journal articles have described the reasons standard techniques can be unsatisfactory, but simple, intuitive explanations are generally unavailable. Improved methods have been derived, but they are far from obvious or intuitive based on the training most researchers receive. Situations arise where even highly nonsignificant results become significant when analyzed with more modern methods. Without assuming any prior training in statistics, Part I of this book describes basic statistical principles from a point of view that makes their shortcomings intuitive and easy to understand. The emphasis is on verbal and graphical descriptions of concepts. Part II describes modern methods that address the problems covered in Part I. Using data from actual studies, many examples are include...
Statistical methods and challenges in connectome genetics

KAUST Repository

Pluta, Dustin

2018-03-12

The study of genetic influences on brain connectivity, known as connectome genetics, is an exciting new direction of research in imaging genetics. We here review recent results and current statistical methods in this area, and discuss some of the persistent challenges and possible directions for future work.
A comprehensive simulation study on classification of RNA-Seq data.

Directory of Open Access Journals (Sweden)

Gökmen Zararsız

Full Text Available RNA sequencing (RNA-Seq) is a powerful technique for the gene-expression profiling of organisms that uses the capabilities of next-generation sequencing technologies. Developing gene-expression-based classification algorithms is an emerging powerful method for diagnosis, disease classification and monitoring at molecular level, as well as providing potential markers of diseases. Most of the statistical methods proposed for the classification of gene-expression data are either based on a continuous scale (e.g., microarray data) or require a normal distribution assumption. Hence, these methods cannot be directly applied to RNA-Seq data since they violate both data structure and distributional assumptions. However, it is possible to apply these algorithms with appropriate modifications to RNA-Seq data. One way is to develop count-based classifiers, such as Poisson linear discriminant analysis and negative binomial linear discriminant analysis. Another way is to bring the data closer to microarrays and apply microarray-based classifiers. In this study, we compared several classifiers including PLDA with and without power transformation, NBLDA, single SVM, bagging SVM (bagSVM), classification and regression trees (CART), and random forests (RF). We also examined the effect of several parameters such as overdispersion, sample size, number of genes, number of classes, differential-expression rate, and the transformation method on model performances. A comprehensive simulation study is conducted and the results are compared with the results of two miRNA and two mRNA experimental datasets. The results revealed that increasing the sample size and differential-expression rate, and decreasing the dispersion parameter and number of groups, lead to an increase in classification accuracy. Similar to differential-expression studies, the classification of RNA-Seq data requires careful attention when handling data overdispersion. We conclude that, as a count
Literature in Focus: Statistical Methods in Experimental Physics

CERN Multimedia

2007-01-01

Frederick James was a high-energy physicist who became the CERN "expert" on statistics and is now well-known around the world, in part for this famous text. The first edition of Statistical Methods in Experimental Physics was originally co-written with four other authors and was published in 1971 by North Holland (now an imprint of Elsevier). It became such an important text that demand for it has continued for more than 30 years. Fred has updated it and it was released in a second edition by World Scientific in 2006. It is still a top seller and there is no exaggeration in calling it «the» reference on the subject. A full review of the title appeared in the October CERN Courier. Come and meet the author to hear more about how this book has flourished during its 35-year lifetime. Frederick James, Statistical Methods in Experimental Physics. Monday, 26th of November, 4 p.m., Council Chamber (Bldg. 503-1-001). The author will be introduced...
Heterogeneous Rock Simulation Using DIP-Micromechanics-Statistical Methods

Directory of Open Access Journals (Sweden)

H. Molladavoodi

2018-01-01

Full Text Available Rock as a natural material is heterogeneous. Rock material consists of minerals, crystals, cement, grains, and microcracks. Each component of rock has a different mechanical behavior under applied loading condition. Therefore, rock component distribution has an important effect on rock mechanical behavior, especially in the postpeak region. In this paper, the rock sample was studied by digital image processing (DIP, micromechanics, and statistical methods. Using image processing, volume fractions of the rock minerals composing the rock sample were evaluated precisely. The mechanical properties of the rock matrix were determined based on upscaling micromechanics. In order to consider the rock heterogeneities effect on mechanical behavior, the heterogeneity index was calculated in a framework of statistical method. A Weibull distribution function was fitted to the Young modulus distribution of minerals. Finally, statistical and Mohr–Coulomb strain-softening models were used simultaneously as a constitutive model in DEM code. The acoustic emission, strain energy release, and the effect of rock heterogeneities on the postpeak behavior process were investigated. The numerical results are in good agreement with experimental data.
The Playground Game: Inquiry‐Based Learning About Research Methods and Statistics

NARCIS (Netherlands)

Westera, Wim; Slootmaker, Aad; Kurvers, Hub

2014-01-01

The Playground Game is a web-based game that was developed for teaching research methods and statistics to nursing and social sciences students in higher education and vocational training. The complexity and abstract nature of research methods and statistics poses many challenges for students. The
Development of a new reconstruction and classification method for Tau leptons and its application in the ATLAS detector at the LHC

International Nuclear Information System (INIS)

Limbach, Christian

2015-05-01

This thesis presents a new method for the reconstruction and classification of hadronically decaying tau leptons in the ATLAS detector at the LHC. It also presents a possible application of the new methods. The new reconstruction method follows the energy flow approach, which aims at reconstructing every single particle in a collision, and applies it to hadronically decaying tau leptons. This provides access to the tau decay mode and also improves the energy and spatial resolution of the tau. The new classification method makes use of so-called kinematic tau variables, which capture the kinematics of the tau decay. By combining several of these variables, it is possible to further improve the decay mode classification of the tau leptons. By taking into account the decay mode, the new classification method is also capable of improving the spatial and energy resolution of reconstructed tau leptons. In a simulation-based study, it is shown that the new reconstruction and classification methods are also capable of measuring the mean tau polarisation in the decays of a Z-Boson into two taus.
Bayes linear statistics, theory & methods

CERN Document Server

Goldstein, Michael

2007-01-01

Bayesian methods combine information available from data with any prior information available from expert knowledge. The Bayes linear approach follows this path, offering a quantitative structure for expressing beliefs, and systematic methods for adjusting these beliefs, given observational data. The methodology differs from the full Bayesian methodology in that it establishes simpler approaches to belief specification and analysis based around expectation judgements. Bayes Linear Statistics presents an authoritative account of this approach, explaining the foundations, theory, methodology, and practicalities of this important field. The text provides a thorough coverage of Bayes linear analysis, from the development of the basic language to the collection of algebraic results needed for efficient implementation, with detailed practical examples. The book covers:The importance of partial prior specifications for complex problems where it is difficult to supply a meaningful full prior probability specification...
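For orientation, the adjustment at the centre of the Bayes linear approach can be written compactly. The block below states the standard adjusted expectation and adjusted variance of a quantity $X$ given observed data $D$; the symbols are chosen here for illustration and are not quoted from the book, and $\mathrm{Var}(D)$ is assumed invertible.

```latex
% Bayes linear adjustment of beliefs about X by data D
\begin{align}
  \mathrm{E}_D(X)   &= \mathrm{E}(X)
      + \mathrm{Cov}(X, D)\,\mathrm{Var}(D)^{-1}\,\bigl(D - \mathrm{E}(D)\bigr), \\
  \mathrm{Var}_D(X) &= \mathrm{Var}(X)
      - \mathrm{Cov}(X, D)\,\mathrm{Var}(D)^{-1}\,\mathrm{Cov}(D, X).
\end{align}
```

Only means, variances and covariances need to be specified, which is the sense in which the approach works from partial prior specifications rather than a full prior distribution.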
Instrumental and statistical methods for the comparison of class evidence

Science.gov (United States)

Liszewski, Elisa Anne

Trace evidence is a major field within forensic science. Association of trace evidence samples can be problematic due to sample heterogeneity and a lack of quantitative criteria for comparing spectra or chromatograms. The aim of this study is to evaluate different types of instrumentation for their ability to discriminate among samples of various types of trace evidence. Chemometric analysis, including techniques such as Agglomerative Hierarchical Clustering, Principal Components Analysis, and Discriminant Analysis, was employed to evaluate instrumental data. First, automotive clear coats were analyzed by using microspectrophotometry to collect UV absorption data. In total, 71 samples were analyzed with classification accuracy of 91.61%. An external validation was performed, resulting in a prediction accuracy of 81.11%. Next, fiber dyes were analyzed using UV-Visible microspectrophotometry. While several physical characteristics of cotton fiber can be identified and compared, fiber color is considered to be an excellent source of variation, and thus was examined in this study. Twelve dyes were employed, some being visually indistinguishable. Several different analyses and comparisons were done, including an inter-laboratory comparison and external validations. Lastly, common plastic samples and other polymers were analyzed using pyrolysis-gas chromatography/mass spectrometry, and their pyrolysis products were then analyzed using multivariate statistics. The classification accuracy varied dependent upon the number of classes chosen, but the plastics were grouped based on composition. The polymers were used as an external validation and misclassifications occurred with chlorinated samples all being placed into the category containing PVC.
Mapping patent classifications: Portfolio and statistical analysis, and the comparison of strengths and weaknesses

NARCIS (Netherlands)

Leydesdorff, L.; Kogler, D.F.; Yan, B.

The Cooperative Patent Classifications (CPC) recently developed cooperatively by the European and US Patent Offices provide a new basis for mapping patents and portfolio analysis. CPC replaces International Patent Classifications (IPC) of the World Intellectual Property Organization. In this study,
Predictive Manufacturing: A Classification Strategy to Predict Product Failures

DEFF Research Database (Denmark)

Khan, Abdul Rauf; Schiøler, Henrik; Kulahci, Murat

2018-01-01

manufacturing analytics model that employs a big data approach to predicting product failures; third, we illustrate the issue of high dimensionality, along with statistically redundant information; and, finally, our proposed method will be compared against the well-known classification methods (SVM, K......-nearest neighbor, artificial neural networks). The results from real data show that our predictive manufacturing analytics approach, using genetic algorithms and Voronoi tessellations, is capable of predicting product failure with reasonable accuracy. The potential application of this method contributes...... to accurately predicting product failures, which would enable manufacturers to reduce production costs without compromising product quality....
All of statistics a concise course in statistical inference

CERN Document Server

Wasserman, Larry

2004-01-01

This book is for people who want to learn probability and statistics quickly. It brings together many of the main ideas in modern statistics in one place. The book is suitable for students and researchers in statistics, computer science, data mining and machine learning. This book covers a much wider range of topics than a typical introductory text on mathematical statistics. It includes modern topics like nonparametric curve estimation, bootstrapping and classification, topics that are usually relegated to follow-up courses. The reader is assumed to know calculus and a little linear algebra. No previous knowledge of probability and statistics is required. The text can be used at the advanced undergraduate and graduate level. Larry Wasserman is Professor of Statistics at Carnegie Mellon University. He is also a member of the Center for Automated Learning and Discovery in the School of Computer Science. His research areas include nonparametric inference, asymptotic theory, causality, and applications to astrophysics, bi...

Effects of Feature Extraction and Classification Methods on Cyberbully Detection

OpenAIRE

ÖZEL, Selma Ayşe; SARAÇ, Esra

2016-01-01

Cyberbullying is defined as an aggressive, intentional action against a defenseless person by using the Internet, or other electronic contents. Researchers have found that many of the bullying cases have tragically ended in suicides; hence automatic detection of cyberbullying has become important. In this study we show the effects of feature extraction, feature selection, and classification methods that are used, on the performance of automatic detection of cyberbullying. To perform the exper...
Classification Method in Integrated Information Network Using Vector Image Comparison

Directory of Open Access Journals (Sweden)

Zhou Yuan

2014-05-01

Full Text Available A Wireless Integrated Information Network (WMN) consists of integrated information that can be gathered from its surroundings, such as images and voice. Transmitting this information requires substantial resources, which decreases the service time of the network. In this paper we present a classification approach based on Vector Image Comparison (VIC) for WMN that improves the service time of the network. Available methods for sub-region selection and conversion are also proposed.
Effective selection of informative SNPs and classification on the HapMap genotype data

Directory of Open Access Journals (Sweden)

Wang Lipo

2007-12-01

Full Text Available Abstract Background. Since the single nucleotide polymorphisms (SNPs) are genetic variations which determine the difference between any two unrelated individuals, the SNPs can be used to identify the correct source population of an individual. For efficient population identification with the HapMap genotype data, as few informative SNPs as possible are required from the original 4 million SNPs. Recently, Park et al. (2006) adopted the nearest shrunken centroid method to classify the three populations, i.e., Utah residents with ancestry from Northern and Western Europe (CEU), Yoruba in Ibadan, Nigeria in West Africa (YRI), and Han Chinese in Beijing together with Japanese in Tokyo (CHB+JPT), from which 100,736 SNPs were obtained and the top 82 SNPs could completely classify the three populations. Results. In this paper, we propose to first rank each feature (SNP) using a ranking measure, i.e., a modified t-test or F-statistics. Then from the ranking list, we form different feature subsets by sequentially choosing different numbers of features (e.g., 1, 2, 3, ..., 100) with top ranking values, train and test them by a classifier, e.g., the support vector machine (SVM), thereby finding the subset which has the highest classification accuracy. Compared to the classification method of Park et al., we obtain a better result, i.e., good classification of the 3 populations using on average 64 SNPs. Conclusion. Experimental results show that both the modified t-test and the F-statistics method are very effective in ranking SNPs by their classification capabilities. Combined with the SVM classifier, a desirable feature subset (with the minimum size and most informativeness) can be quickly found in a greedy manner after ranking all SNPs. Our method is able to identify a very small number of important SNPs that can determine the populations of individuals.
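The rank-then-grow procedure described above can be sketched as follows. The abstract does not define its "modified t-test", so an ordinary two-sample Welch t-statistic stands in for it here; the genotype coding (0/1/2), the toy data and the function names are likewise assumptions. Training a classifier on each nested subset is left out.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Absolute Welch t-statistic between two samples of one feature."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se

def rank_features(class_a, class_b):
    """Return feature indices ordered from most to least discriminative."""
    scores = []
    for j in range(len(class_a[0])):
        col_a = [row[j] for row in class_a]
        col_b = [row[j] for row in class_b]
        scores.append((welch_t(col_a, col_b), j))
    return [j for _, j in sorted(scores, reverse=True)]

# Hypothetical genotypes (rows: individuals, columns: SNPs), two populations
class_a = [[0, 1, 2], [0, 2, 1], [1, 0, 2], [0, 1, 2]]
class_b = [[2, 1, 1], [2, 0, 1], [1, 2, 2], [2, 1, 1]]

ranking = rank_features(class_a, class_b)
# Nested candidate subsets: top-1, top-2, ... features, each to be scored by a classifier
subsets = [ranking[:k] for k in range(1, len(ranking) + 1)]
```

Each subset would then be evaluated, for example by cross-validated SVM accuracy, and the smallest subset reaching the best accuracy kept.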
Influence of stability classification on atmospheric diffusion calculations for elevated releases over a terrain of major roughness

International Nuclear Information System (INIS)

Hu Erbang

1988-01-01

A series of 22 atmospheric tracer experiments with a 100 m release height was performed at the Kernforschungszentrum Karlsruhe (KfK) in West Germany over a terrain of major roughness (z0 = 1.5 m). The tracer concentration data are statistically analysed using 5 methods of stability classification. The results show that the normalized diffusion factors predicted by the Gaussian plume dispersion model are in good agreement with the observed ones for elevated releases over a terrain of major roughness. Different sets of dispersion parameters can be obtained for the same series of atmospheric tracer experiments if different methods of classification are applied. The same method of stability classification should therefore be used when these dispersion parameters are applied to evaluate the environmental impact.
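To show where the stability class enters, here is a sketch of the standard Gaussian plume formula for an elevated release with ground reflection. The dispersion parameters sigma_y and sigma_z depend on downwind distance and on the stability class; the numeric values below are placeholders, not the parameter sets derived from the KfK experiments.

```python
import math

def plume_concentration(q, u, sigma_y, sigma_z, y, z, h):
    """Gaussian plume concentration with ground reflection.

    q: source strength, u: wind speed at release height,
    sigma_y, sigma_z: dispersion parameters at the downwind distance of interest,
    y, z: crosswind and vertical receptor coordinates, h: effective release height.
    """
    lateral = math.exp(-y ** 2 / (2 * sigma_y ** 2))
    # Real source plus its mirror image below the ground
    vertical = (math.exp(-(z - h) ** 2 / (2 * sigma_z ** 2))
                + math.exp(-(z + h) ** 2 / (2 * sigma_z ** 2)))
    return q / (2 * math.pi * u * sigma_y * sigma_z) * lateral * vertical

# Placeholder inputs: unit source, 5 m/s wind, 100 m release height
ground_centerline = plume_concentration(q=1.0, u=5.0, sigma_y=80.0, sigma_z=50.0,
                                        y=0.0, z=0.0, h=100.0)
```

With a unit source strength the result is the normalized concentration, so two stability schemes that assign different sigma values to the same hour give different predictions for the same release.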
The statistical process control methods - SPC

Directory of Open Access Journals (Sweden)

Floreková Ľubica

1998-03-01

Full Text Available Methods of statistical evaluation of quality, SPC (item 20 of the quality control documentation system of the ISO 9000 series of standards), applied to various processes, products and services, belong among the basic qualitative methods that enable us to analyse and compare data pertaining to various quantitative parameters. Based on such analysis, they also make it possible to propose suitable interventions aimed at improving these processes, products and services. The theoretical basis and applicability of the following principles are presented in the contribution: cause-and-effect diagnostics; Pareto analysis and the Lorenz curve; frequency distributions and frequency curves of random variables; and Shewhart control charts.
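As a concrete instance of a Shewhart chart, the sketch below derives a centre line and three-sigma control limits from in-control baseline data and flags new measurements outside them. Estimating sigma with the plain sample standard deviation is a simplification (charts for individual values often use moving ranges instead), and the measurements are invented.

```python
from statistics import mean, stdev

def control_limits(baseline, k=3.0):
    """Return (lower limit, centre line, upper limit) from in-control baseline data."""
    center = mean(baseline)
    sigma = stdev(baseline)
    return center - k * sigma, center, center + k * sigma

def out_of_control(baseline, new_points, k=3.0):
    """Return the new measurements that fall outside the control limits."""
    lcl, _, ucl = control_limits(baseline, k)
    return [x for x in new_points if x < lcl or x > ucl]

# Hypothetical measurements of one quality parameter
baseline = [10.0, 10.2, 9.8, 10.1, 9.9]
flagged = out_of_control(baseline, [10.3, 10.6, 9.4, 10.0])
```

With these numbers the limits are roughly 9.53 and 10.47, so 10.6 and 9.4 are flagged while 10.3 and 10.0 are not.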
Decision tree methods: applications for classification and prediction.

Science.gov (United States)

Song, Yan-Yan; Lu, Ying

2015-04-25

Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets: the training dataset is used to build a decision tree model, and the validation dataset is used to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms for developing decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.
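A single node split of the kind CART performs can be sketched as below: every candidate threshold on one numeric covariate is scored by the weighted Gini impurity of the two child nodes, and the lowest score wins. The variable names and the toy data are hypothetical; a full tree would apply this recursively and prune using the validation set.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted Gini) for the best split 'value <= threshold'."""
    best = (None, float("inf"))
    # The largest value is skipped because it would leave the right child empty
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical covariate and binary target
ages = [22, 25, 31, 38, 45, 52]
outcome = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(ages, outcome)
```

Here the split at 31 separates the classes perfectly, so the weighted impurity is 0.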
THE BUDGET CLASSIFICATION AS THE BASIS OF THE USAGE OF THE METHOD OF ACCRUAL IN THE PUBLIC SECTOR

Directory of Open Access Journals (Sweden)

Y. Kaliuha

2016-02-01

Full Text Available Components of the budget classification of Ukraine, which are used for accounting and administration of budgets of different levels, have been investigated. A five-level hierarchy of normative documents in accordance with international practice has been proposed. The budget classification of Ukraine and the classification developed in accordance with the IMF GFSM 2001 have been analysed as the basis for implementing the accrual method in the public sector. Proposals on the improvement of the budget classification and the accrual method have been made.
Nuclear Power Plant Thermocouple Sensor-Fault Detection and Classification Using Deep Learning and Generalized Likelihood Ratio Test

Science.gov (United States)

Mandal, Shyamapada; Santhi, B.; Sridhar, S.; Vinolia, K.; Swaminathan, P.

2017-06-01

In this paper, an online fault detection and classification method is proposed for thermocouples used in nuclear power plants. In the proposed method, the fault data are detected by the classification method, which separates the fault data from the normal data. A deep belief network (DBN), a deep learning technique, is applied to classify the fault data. The DBN has a multilayer feature extraction scheme, which is highly sensitive to small variations in the data. Because the classification method alone cannot identify which sensor is faulty, a technique is proposed to identify the faulty sensor from the fault data. Finally, a composite statistical hypothesis test, namely the generalized likelihood ratio test, is applied to compute the fault pattern of the faulty sensor signal based on the magnitude of the fault. The performance of the proposed method is validated by field data obtained from thermocouple sensors of the fast breeder test reactor.
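A generalized likelihood ratio test in its simplest form can be sketched as follows. The example assumes sensor residuals that are Gaussian with known standard deviation and tests for a constant bias of unknown size; this model, the residual values and the names are illustrative assumptions, since the abstract does not specify the fault models used.

```python
from statistics import mean

# 95th percentile of the chi-square distribution with 1 degree of freedom
CHI2_1_95 = 3.841

def glrt_mean_shift(residuals, sigma, mu0=0.0):
    """GLRT statistic 2*log(likelihood ratio) for H0: mean = mu0 vs. H1: mean unknown.

    For Gaussian data with known sigma this reduces to n * (sample mean - mu0)^2 / sigma^2.
    """
    n = len(residuals)
    return n * (mean(residuals) - mu0) ** 2 / sigma ** 2

# Hypothetical residuals (measured minus expected temperature)
healthy = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15]
biased = [x + 1.0 for x in healthy]  # same noise plus a constant bias fault

healthy_alarm = glrt_mean_shift(healthy, sigma=0.5) > CHI2_1_95
biased_alarm = glrt_mean_shift(biased, sigma=0.5) > CHI2_1_95
```

The sample mean that maximizes the likelihood under H1 doubles as the estimate of the fault magnitude, which is one way a fault pattern could be characterized once a faulty sensor has been singled out.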
Neutron Activation Analysis and Moessbauer Correlations of Archaeological Pottery from Amazon Basin for Classification Studies

International Nuclear Information System (INIS)

Bellido, A. V. B.; Latini, R. M.; Nicoli, I.; Scorzelli, R. B.; Solorzano, P. M.

2011-01-01

The aim of the present work was to investigate the correlation between data obtained from pottery samples by two analytical methods, instrumental neutron activation analysis (INAA) and Moessbauer spectroscopy, combined with multivariate statistical analysis, in order to optimize quantitative analysis in classification studies. The ceramics were recently discovered at archaeological sites with circular earth structures in Acre state, Brazil. 199 samples were analyzed by INAA, allowing simultaneous determination of the chemical concentrations of twenty elements, and 44 samples by Moessbauer spectroscopy, allowing the determination of fourteen hyperfine parameters. For the correlation study, data were treated by two multivariate statistical methods: cluster analysis for the classification and principal component analysis for the data correlations. INAA data show that some of the rare earth elements (REE) were the discriminating variables for this technique. Moessbauer parameters that exhibit the same behavior are being investigated; a remarkable improvement can be seen for the combined REE and Moessbauer variables, showing good results considering the limited number of samples. This data matrix is being used in studies of the classification and provenance of prehistoric ceramics of the Amazon basin.
Landslide Susceptibility Statistical Methods: A Critical and Systematic Literature Review

Science.gov (United States)

Mihir, Monika; Malamud, Bruce; Rossi, Mauro; Reichenbach, Paola; Ardizzone, Francesca

2014-05-01

Landslide susceptibility assessment, the subject of this systematic review, is aimed at understanding the spatial probability of slope failures under a set of geomorphological and environmental conditions. It is estimated that about 375 landslides that occur globally each year are fatal, with around 4600 people killed per year. Past studies have brought out the increasing cost of landslide damages, which can primarily be attributed to human occupation and increased human activities in vulnerable environments. To evaluate and reduce landslide risk, many scientists have made an effort to efficiently map landslide susceptibility using different statistical methods. In this paper, we carry out a critical and systematic review of the landslide susceptibility literature in terms of the different statistical methods used. For each of a broad set of studies reviewed we note: (i) geographic region and areal extent of the study, (ii) landslide types, (iii) inventory type and temporal period covered, (iv) mapping technique, (v) thematic variables used, (vi) statistical models, (vii) assessment of model skill, (viii) uncertainty assessment methods, and (ix) validation methods. We then drew out broad trends within our review of landslide susceptibility, particularly regarding the statistical methods. We found that the most common statistical methods used in the study of landslide susceptibility include logistic regression, artificial neural networks, discriminant analysis and weight of evidence. Although most of the studies we reviewed assessed the model skill, very few assessed model uncertainty. In terms of geographic extent, the largest number of landslide susceptibility zonations were in Turkey, Korea, Spain, Italy and Malaysia. However, there are also many landslides and fatalities in other localities, particularly India, China, the Philippines, Nepal, Indonesia, Guatemala, and Pakistan, where far fewer landslide susceptibility studies are available in the peer-reviewed literature. This
Which method of posttraumatic stress disorder classification best predicts psychosocial function in children with traumatic brain injury?

Science.gov (United States)

Iselin, Greg; Le Brocque, Robyne; Kenardy, Justin; Anderson, Vicki; McKinlay, Lynne

2010-10-01

Controversy surrounds the classification of posttraumatic stress disorder (PTSD), particularly in children and adolescents with traumatic brain injury (TBI). In these populations, it is difficult to differentiate TBI-related organic memory loss from dissociative amnesia. Several alternative PTSD classification algorithms have been proposed for use with children. This paper investigates DSM-IV-TR and alternative PTSD classification algorithms, including and excluding the dissociative amnesia item, in terms of their ability to predict psychosocial function following pediatric TBI. A sample of 184 children aged 6-14 years were recruited following emergency department presentation and/or hospital admission for TBI. PTSD was assessed via semi-structured clinical interview (CAPS-CA) with the child at 3 months post-injury. Psychosocial function was assessed using the parent report CHQ-PF50. Two alternative classification algorithms, the PTSD-AA and 2 of 3 algorithms, reached statistical significance. While the inclusion of the dissociative amnesia item increased prevalence rates across algorithms, it generally resulted in weaker associations with psychosocial function. The PTSD-AA algorithm appears to have the strongest association with psychosocial function following TBI in children and adolescents. Removing the dissociative amnesia item from the diagnostic algorithm generally results in improved validity. Copyright 2010 Elsevier Ltd. All rights reserved.
Statistical methods with applications to demography and life insurance

CERN Document Server

Khmaladze, Estáte V

2013-01-01

Suitable for statisticians, mathematicians, actuaries, and students interested in the problems of insurance and analysis of lifetimes, Statistical Methods with Applications to Demography and Life Insurance presents contemporary statistical techniques for analyzing life distributions and life insurance problems. It not only contains traditional material but also incorporates new problems and techniques not discussed in existing actuarial literature. The book mainly focuses on the analysis of an individual life and describes statistical methods based on empirical and related processes. Coverage ranges from analyzing the tails of distributions of lifetimes to modeling population dynamics with migrations. To help readers understand the technical points, the text covers topics such as the Stieltjes, Wiener, and Itô integrals. It also introduces other themes of interest in demography, including mixtures of distributions, analysis of longevity and extreme value theory, and the age structure of a population. In addi...
Robust Automatic Modulation Classification Technique for Fading Channels via Deep Neural Network

Directory of Open Access Journals (Sweden)

Jung Hwan Lee

2017-08-01

Full Text Available In this paper, we propose a deep neural network (DNN)-based automatic modulation classification (AMC) technique for digital communications. While conventional AMC techniques perform well for additive white Gaussian noise (AWGN) channels, classification accuracy degrades for fading channels where the amplitude and phase of channel gain change in time. The key contributions of this paper are in two phases. First, we analyze the effectiveness of a variety of statistical features for the AMC task in fading channels. We reveal that the features that are shown to be effective for fading channels are different from those known to be good for AWGN channels. Second, we introduce a new enhanced AMC technique based on the DNN method. We use the extensive and diverse set of statistical features found in our study for the DNN-based classifier. The fully connected feedforward network with four hidden layers is trained to classify the modulation class for several fading scenarios. Numerical evaluation shows that the proposed technique offers significant performance gain over the existing AMC methods in fading channels.
Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification

Directory of Open Access Journals (Sweden)

Yin Wang

2016-01-01

Full Text Available Background. Text data of 16S rRNA are informative for classifications of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty. Results. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. Conclusions. We extend the phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states for pneumonia and dental caries. The results have shown that PMF may enhance the efficiency and reliability in analyzing high-dimension text data.
Reproducibility of the Lauge-Hansen, Danis-Weber, and AO classifications for ankle fractures

Directory of Open Access Journals (Sweden)

Lucas Lopes da Fonseca

Full Text Available ABSTRACT Objective: This study evaluated the reproducibility of the three main classifications of ankle fractures most commonly used in emergency clinical practice: Lauge-Hansen, Danis-Weber, and AO-OTA. The secondary objective was to assess whether the level of professional experience influenced the interobserver agreement for the classification of this pathology. Methods: The study included 83 digitized preoperative radiographic images of ankle fractures, in anteroposterior and lateral views, of different adults that had occurred between January and December 2013. For sample calculation, the estimated accuracy was approximately 15%, with a sampling error of 5% and a sampling power of 80%. The images were analyzed and classified by six different observers: two foot and ankle surgeons, two general orthopedic surgeons, and two second-year residents in orthopedics and traumatology. The Kappa statistical method of multiple variances was used to assess the variations. Results: The Danis-Weber classification indicated that 40% of the agreements among all observers were good or excellent, whereas only 20% of good and excellent agreements were obtained using the AO and Lauge-Hansen classifications. The Kappa index was 0.49 for the Danis-Weber classification, 0.32 for Lauge-Hansen, and 0.38 for AO. Conclusion: The Lauge-Hansen classification presented the poorest interobserver agreement among the three systems. The AO classification demonstrated a moderate agreement and the Danis-Weber classification presented an excellent interobserver agreement index, regardless of professional experience.
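The abstract above reports a Kappa index for six observers but does not say which multi-rater variant was used. As a minimal sketch, assuming Fleiss' kappa (a common choice for more than two raters, not confirmed by the source), agreement could be computed as follows. The function name and the sample ratings are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of subjects, each rated by the same number of raters.

    ratings: list of lists; ratings[i] holds the category labels that the
    raters assigned to subject i (assumed layout, one label per rater).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]

    # Observed agreement: mean over subjects of the share of agreeing rater pairs.
    per_subject = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement from the overall proportion of each category.
    proportions = [
        sum(c[cat] for c in counts) / (n_subjects * n_raters) for cat in categories
    ]
    p_expected = sum(p ** 2 for p in proportions)

    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical data: four radiographs, six observers, Danis-Weber types A/B.
perfect = [["A"] * 6, ["B"] * 6, ["A"] * 6, ["B"] * 6]
kappa_perfect = fleiss_kappa(perfect)
```

The formula is undefined when every rating falls in a single category (chance agreement equals 1), so real use would need a guard for that case.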
Object-based methods for individual tree identification and tree species classification from high-spatial resolution imagery

Science.gov (United States)

Wang, Le

2003-10-01

Modern forest management poses an increasing need for detailed knowledge of forest information at different spatial scales. At the forest level, the information for tree species assemblage is desired whereas at or below the stand level, individual tree related information is preferred. Remote Sensing provides an effective tool to extract the above information at multiple spatial scales in the continuous time domain. To date, the increasing volume and ready availability of high-spatial-resolution data have led to a much wider application of remotely sensed products. Nevertheless, to make effective use of the improving spatial resolution, conventional pixel-based classification methods are far from satisfactory. Correspondingly, developing object-based methods becomes a central challenge for researchers in the field of Remote Sensing. This thesis focuses on the development of methods for accurate individual tree identification and tree species classification. We develop a method in which individual tree crown boundaries and treetop locations are derived under a unified framework. We apply a two-stage approach with edge detection followed by marker-controlled watershed segmentation. Treetops are modeled from radiometry and geometry aspects. Specifically, treetops are assumed to be represented by local radiation maxima and to be located near the center of the tree-crown. As a result, a marker image was created from the derived treetop to guide a watershed segmentation to further differentiate overlapping trees and to produce a segmented image comprised of individual tree crowns. The image segmentation method developed achieves a promising result for a 256 x 256 CASI image. Then further effort is made to extend our methods to the multiscales which are constructed from a wavelet decomposition. A scale consistency and geometric consistency are designed to examine the gradients along the scale-space for the purpose of separating true crown boundary from unwanted
The Methods of Stress Management and Their Classification

Directory of Open Access Journals (Sweden)

Honchar Mykhailo F.

2017-12-01

Full Text Available The article considers the content and classification of methods of stress management. The classification systematizes their varieties by existing attributes (character, time interval of application, direction of impact, period of action, way of accounting for the interests of employees, level of formation, method of substantiation, content) and by newly allocated attributes (scale of changes in terms of stress management systems, level of novelty at the enterprise, consistency). This allows choosing the appropriate types of such methods for overcoming undesirable deviations that have a significant negative impact on the functioning of economic entities. It has been determined that such methods are formed in implementing the technology of stress management; are the result of management activities of the steering subsystem of the organization at each level of management; have an alternative nature; and form an information-management base for the adoption of managerial decisions in terms of the systems of stress administration. It has been specified that, with the assistance of certain methods in terms of stress management systems, managers can track existing and potential problems in the complex and dynamic environment of the organization, identify their relationships, identify «weak signals», adjust goals and tasks of management of critical undesirable deviations, determine indicators and criteria of stress management, etc.
Integrated statistical learning of metabolic ion mobility spectrometry profiles for pulmonary disease identification

DEFF Research Database (Denmark)

Hauschild, A.C.; Baumbach, Jan; Baumbach, J.

2012-01-01

sophisticated statistical learning techniques for VOC-based feature selection and supervised classification into patient groups. We analyzed breath data from 84 volunteers, each of them either suffering from chronic obstructive pulmonary disease (COPD), or both COPD and bronchial carcinoma (COPD + BC), as well...... as from 35 healthy volunteers, comprising a control group (CG). We standardized and integrated several statistical learning methods to provide a broad overview of their potential for distinguishing the patient groups. We found that there is strong potential for separating MCC/IMS chromatograms of healthy...... patients from healthy controls. We conclude that these statistical learning methods have a generally high accuracy when applied to well-structured, medical MCC/IMS data....
Effective Packet Number for 5G IM WeChat Application at Early Stage Traffic Classification

Directory of Open Access Journals (Sweden)

Muhammad Shafiq

2017-01-01

Full Text Available Accurate network traffic classification at an early stage is very important for 5G network applications. During the last few years, researchers have worked hard to propose effective machine learning models for classifying Internet traffic applications at an early stage with few packets. Nevertheless, this essential problem still needs to be studied thoroughly to find the effective packet number as well as an effective machine learning (ML) model. In this paper, we try to solve the above-mentioned problem. For this purpose, five Internet traffic datasets are utilized. Initially, we extract the packet sizes of the first 20 packets, and then mutual information analysis is carried out to find the mutual information of each packet with respect to the flow type. Thereafter, we execute 10 well-known machine learning algorithms using the crossover classification method. Two statistical analysis tests, the Friedman and Wilcoxon pairwise tests, are applied to the experimental results. Moreover, we also apply the statistical tests to the classifiers to find an effective ML classifier. Our experimental results show that 13–19 packets are the effective packet numbers for the 5G IM WeChat application at early stage network traffic classification. We also find that the Random Forest ML classifier is an effective classifier for early stage Internet traffic classification.
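The mutual information analysis mentioned in the abstract above can be illustrated with a small sketch. It assumes packet sizes have already been discretized into bins and that each flow carries a single type label; the function name and the sample values are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), joint in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts.
        mi += (joint / n) * math.log2(joint * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical binned size of the k-th packet in four flows, and the flow types.
packet_size_bin = ["small", "small", "large", "large"]
flow_type = ["chat", "chat", "video", "video"]
informative = mutual_information(packet_size_bin, flow_type)

# A packet position whose size carries no information about the flow type.
uninformative = mutual_information(["small", "large", "small", "large"], flow_type)
```

Ranking packet positions by this score is one way to judge how many leading packets are worth keeping as features.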
CIN classification and prediction using machine learning methods

Science.gov (United States)

Chirkina, Anastasia; Medvedeva, Marina; Komotskiy, Evgeny

2017-06-01

The aim of this paper is to compare existing classification algorithms with different parameters and to select those that allow solving the problem of primary diagnosis of cervical intraepithelial neoplasia (CIN), as it characterizes the condition of the body in the precancerous stage. The paper describes a feature selection process, as well as selection of the best models for a multiclass classification.

Feature Selection Method Based on Artificial Bee Colony Algorithm and Support Vector Machines for Medical Datasets Classification

Directory of Open Access Journals (Sweden)

Mustafa Serter Uzer

2013-01-01

Full Text Available This paper offers a hybrid approach that uses the artificial bee colony (ABC) algorithm for feature selection and support vector machines (SVM) for classification. The purpose of this paper is to test the effect of eliminating the unimportant and obsolete features of the datasets on the success of the classification, using the SVM classifier. The approach is developed for the diagnosis of liver diseases and diabetes, which are commonly observed and reduce the quality of life. For the diagnosis of these diseases, the hepatitis, liver disorders and diabetes datasets from the UCI database were used, and the proposed system reached classification accuracies of 94.92%, 74.81%, and 79.29%, respectively. For these datasets, the classification accuracies were obtained with the 10-fold cross-validation method. The results show that the performance of the method is highly successful compared to other results attained and seems very promising for pattern recognition applications.
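The 10-fold cross-validation procedure used above to report accuracies can be sketched as follows. This is only an illustration under stated assumptions: the nearest-mean classifier is a simple stand-in, not the SVM or the ABC feature selection from the paper, and all names and data are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Mean held-out accuracy over k folds for any fit/predict pair."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


def fit_nearest_mean(X, y):
    """Stand-in classifier: store the mean feature value of each class."""
    means = {}
    for label in set(y):
        values = [x for x, lab in zip(X, y) if lab == label]
        means[label] = sum(values) / len(values)
    return means


def predict_nearest_mean(means, x):
    """Assign x to the class whose stored mean is closest."""
    return min(means, key=lambda label: abs(x - means[label]))


# Hypothetical one-feature dataset with two well-separated classes.
X = [i * 0.1 for i in range(20)] + [10 + i * 0.1 for i in range(20)]
y = [0] * 20 + [1] * 20
accuracy = cross_val_accuracy(X, y, fit_nearest_mean, predict_nearest_mean, k=10)
```

Passing `fit` and `predict` as arguments keeps the splitting logic independent of the classifier, so a different model could be swapped in without touching the fold handling.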
Establishing structure-property correlations and classification of base oils using statistical techniques and artificial neural networks

International Nuclear Information System (INIS)

Kapur, G.S.; Sastry, M.I.S.; Jaiswal, A.K.; Sarpal, A.S.

2004-01-01

The present paper describes various classification techniques like cluster analysis, principal component (PC)/factor analysis to classify different types of base stocks. The API classification of base oils (Group I-III) has been compared to a more detailed NMR derived chemical compositional and molecular structural parameters based classification in order to point out the similarities of the base oils in the same group and the differences between the oils placed in different groups. The detailed compositional parameters have been generated using 1H and 13C nuclear magnetic resonance (NMR) spectroscopic methods. Further, oxidation stability, measured in terms of rotating bomb oxidation test (RBOT) life, of non-conventional base stocks and their blends with conventional base stocks, has been quantitatively correlated with their 1H NMR and elemental (sulphur and nitrogen) data with the help of multiple linear regression (MLR) and artificial neural networks (ANN) techniques. The MLR based model developed using NMR and elemental data showed a high correlation between the 'measured' and 'estimated' RBOT values for both training (R=0.859) and validation (R=0.880) data sets. The ANN based model, developed using a smaller number of input variables (only 1H NMR data), also showed high correlation between the 'measured' and 'estimated' RBOT values for training (R=0.881), validation (R=0.860) and test (R=0.955) data sets.
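The R values quoted above compare 'measured' and 'estimated' RBOT lives. Assuming R denotes the Pearson correlation coefficient (the abstract does not define it), a minimal sketch of the calculation looks like this; the function name and the sample values are hypothetical.

```python
import math


def pearson_r(measured, estimated):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_e = sum(estimated) / n
    covariance = sum((m - mean_m) * (e - mean_e) for m, e in zip(measured, estimated))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return covariance / math.sqrt(var_m * var_e)


# Hypothetical RBOT lives in minutes: measured values and model estimates.
measured_rbot = [100.0, 200.0, 300.0, 400.0]
estimated_rbot = [110.0, 210.0, 310.0, 410.0]
r = pearson_r(measured_rbot, estimated_rbot)
```

A constant offset between the two series still gives R = 1, which is why R alone does not show whether the estimates are biased.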
Three-dimensional textural features of conventional MRI improve diagnostic classification of childhood brain tumours.

Science.gov (United States)

Fetit, Ahmed E; Novak, Jan; Peet, Andrew C; Arvanitis, Theodoros N

2015-09-01

The aim of this study was to assess the efficacy of three-dimensional texture analysis (3D TA) of conventional MR images for the classification of childhood brain tumours in a quantitative manner. The dataset comprised pre-contrast T1- and T2-weighted MRI series obtained from 48 children diagnosed with brain tumours (medulloblastoma, pilocytic astrocytoma and ependymoma). 3D and 2D TA were carried out on the images using first-, second- and higher order statistical methods. Six supervised classification algorithms were trained with the most influential 3D and 2D textural features, and their performances in the classification of tumour types, using the two feature sets, were compared. Model validation was carried out using the leave-one-out cross-validation (LOOCV) approach, as well as stratified 10-fold cross-validation, in order to provide additional reassurance. McNemar's test was used to test the statistical significance of any improvements demonstrated by 3D-trained classifiers. Supervised learning models trained with 3D textural features showed improved classification performance compared with those trained with conventional 2D features. For instance, a neural network classifier showed 12% improvement in area under the receiver operating characteristic curve (AUC) and 19% in overall classification accuracy. These improvements were statistically significant for four of the tested classifiers, as per McNemar's tests. This study shows that 3D textural features extracted from conventional T1- and T2-weighted images can improve the diagnostic classification of childhood brain tumours. Long-term benefits of accurate, yet non-invasive, diagnostic aids include a reduction in surgical procedures, improvement in surgical and therapy planning, and support of discussions with patients' families. It remains necessary, however, to extend the analysis to a multicentre cohort in order to assess the scalability of the techniques used. Copyright © 2015 John Wiley & Sons, Ltd.
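McNemar's test, used above to compare 3D-trained and 2D-trained classifiers on the same cases, depends only on the discordant pairs. The sketch below uses the exact binomial form; whether the study used this or the chi-square approximation is not stated, so treat the choice as an assumption. The function name and counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: cases the first classifier got right and the second got wrong.
    c: cases the second classifier got right and the first got wrong.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis each discordant pair is a fair coin flip.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts: the 3D-trained model alone is correct on 10 cases,
# the 2D-trained model alone is correct on none.
p_value = mcnemar_exact(10, 0)
```

Cases that both classifiers get right or both get wrong do not enter the statistic, which is why only `b` and `c` are needed.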
Deep convolutional neural networks for automatic classification of gastric carcinoma using whole slide images in digital histopathology.

Science.gov (United States)

Sharma, Harshita; Zerbe, Norman; Klempert, Iris; Hellwich, Olaf; Hufnagl, Peter

2017-11-01

Deep learning using convolutional neural networks is an actively emerging field in histological image analysis. This study explores deep learning methods for computer-aided classification in H&E stained histopathological whole slide images of gastric carcinoma. An introductory convolutional neural network architecture is proposed for two computerized applications, namely, cancer classification based on immunohistochemical response and necrosis detection based on the existence of tumor necrosis in the tissue. Classification performance of the developed deep learning approach is quantitatively compared with traditional image analysis methods in digital histopathology requiring prior computation of handcrafted features, such as statistical measures using gray level co-occurrence matrix, Gabor filter-bank responses, LBP histograms, gray histograms, HSV histograms and RGB histograms, followed by random forest machine learning. Additionally, the widely known AlexNet deep convolutional framework is comparatively analyzed for the corresponding classification problems. The proposed convolutional neural network architecture reports favorable results, with an overall classification accuracy of 0.6990 for cancer classification and 0.8144 for necrosis detection. Copyright © 2017 Elsevier Ltd. All rights reserved.
The reliability of AO classification for distal radius fracture, using CT findings

International Nuclear Information System (INIS)

Nakanishi, Yasuaki; Ono, Hiroshi; Furuta, Kazuhiko; Fujitani, Ryoutarou; Ota, Hiroyoshi

2006-01-01

The purpose of this study was to assess the reliability of the AO (Association for the Study of Internal Fixation) classification of distal radius fracture, using plain radiographs and 2 cross-sectional computed tomographic (CT) surface images. Five observers independently classified 32 distal radius fractures into 9 groups under AO classification. We established 4 methods for observation: first, using only two-directional radiographs; second, four-directional radiographs; third, CT (axial view) with four-directional radiographs; and fourth, CT (axial and sagittal views) with four-directional radiographs. Kappa statistics were used to establish the relative level of agreement between the observers. Interobserver reliability was poor in both the first and second methods, in which only plain radiographs were used (κ=0.30 and 0.23, respectively). Furthermore, reliability did not increase in the third method with the addition of 1 CT surface image (κ=0.29). In the fourth method, with the addition of 2 cross-sectional CT surface images, the reliability increased to a moderate level (κ=0.44). Moderate interobserver reliability of the AO classification of distal radius fractures was thus obtained only when 2 cross-sectional CT surface images were used with four-directional radiographs. (author)
Improving the Classification Accuracy for Near-Infrared Spectroscopy of Chinese Salvia miltiorrhiza Using Local Variable Selection

Directory of Open Access Journals (Sweden)

Lianqing Zhu

2018-01-01

Full Text Available In order to improve the classification accuracy of Chinese Salvia miltiorrhiza using near-infrared spectroscopy, a novel local variable selection strategy is proposed. Combining the strengths of the local algorithm and interval partial least squares, the spectral data are first divided into several pairs of classes in the sample direction and equidistant subintervals in the variable direction. Then, a local classification model is built, and the most suitable spectral region is selected based on a new evaluation criterion that considers both classification error rate and best predictive ability under the leave-one-out cross-validation scheme for each pair of classes. Finally, each observation is assigned to a class according to the statistical analysis of the classification results of the local classification model built on the selected variables. The performance of the proposed method was demonstrated through near-infrared spectra of cultivated or wild Salvia miltiorrhiza, collected from 8 geographical origins in 5 provinces of China. For comparison, soft independent modelling of class analogy and partial least squares discriminant analysis are each employed as the classification model. Experimental results showed that the classification performance of the model with local variable selection was clearly better than that without variable selection.
Identification of mine waters by statistical multivariate methods

Energy Technology Data Exchange (ETDEWEB)

Mali, N [IGGG, Ljubljana (Slovenia)

1992-01-01

Three water-bearing aquifers are present in the Velenje lignite mine. The aquifer waters have differing chemical composition; a geochemical water analysis can therefore determine the source of mine water influx. Mine water samples from different locations in the mine were analyzed; the results for chemical content and electric conductivity of mine water were statistically processed by means of MICROGAS, SPSS-X and IN STATPAC computer programs, which apply three multivariate statistical methods (discriminant, cluster and factor analysis). Reliability of calculated values was determined with the Kolmogorov and Smirnov tests. It is concluded that laboratory analysis of single water samples can produce measurement errors, but statistical processing of water sample data can identify origin and movement of mine water. 15 refs.
Classification of analysis methods for characterization of magnetic nanoparticle properties

DEFF Research Database (Denmark)

Posth, O.; Hansen, Mikkel Fougt; Steinhoff, U.

2015-01-01

The aim of this paper is to provide a roadmap for the standardization of magnetic nanoparticle (MNP) characterization. We have assessed common MNP analysis techniques under various criteria in order to define the methods that can be used as either standard techniques for magnetic particle...... characterization or those that can be used to obtain a comprehensive picture of a MNP system. This classification is the first step on the way to develop standards for nanoparticle characterization....
Heuristic approach to the classification of postpartum endometritis and its forms

Directory of Open Access Journals (Sweden)

E. A. Balashova

2017-01-01

Full Text Available The work is dedicated to the development of a method of automated medical diagnosis based on the description of biomedical systems using two parameters: energy, reflecting the interaction of the system's elements, and entropy, characterizing the organization of the system. Violations of the energy-entropy cycle of biomedical systems are reflected in the symptoms of the disease. The statistical link between the symptoms of the condition of the body and the nature of excitation of its elements is best expressed in a heuristic description of the system state. High-accuracy classification of the patient's condition is achieved by using heuristic detection methods. The proposed approach allows the probability of correct diagnosis to be estimated, which increases the accuracy of the classification, and also allows estimation of the minimum size of the training sample and the capacity of its constituent signs. The classification technique consists of averaging the characteristic values in the selected classes, compiling a symptom complex from the most important signs of the disease, conducting a "rough" diagnosis with threshold rules that distinguish severe forms of the disease, and then performing a differential diagnosis of the severity of the disease. The proposed method was tested for classification of the forms of puerperal endometritis (mild, moderate, severe). The training sample contained 70 case histories. The symptom complex used to classify the patient's condition was composed of 17 characteristics. Threshold diagnosis made it possible to establish the presence of disease and to separate out severe cases. Differential diagnosis was used to classify mild and moderate postpartum endometritis. The accuracy of the classification of forms of postpartum endometritis amounted to 97.1%.
Analysis and Evaluation of IKONOS Image Fusion Algorithm Based on Land Cover Classification

Institute of Scientific and Technical Information of China (English)

Xia JING; Yan BAO

2015-01-01

Each fusion algorithm has its own advantages and limitations, so it is very difficult to simply evaluate the good and bad points of a fusion algorithm. Whether an algorithm is selected to fuse object images also depends on the sensor types and specific research purposes. Firstly, five fusion methods, i.e. IHS, Brovey, PCA, SFIM and Gram-Schmidt, were briefly described in the paper. Then visual judgment and quantitative statistical parameters were used to assess the five algorithms. Finally, in order to determine which is the most suitable fusion method for land cover classification of IKONOS images, maximum likelihood classification (MLC) was applied to the above five fusion images. The results showed that the fusion effects of the SFIM transform and the Gram-Schmidt transform were better than those of the other three image fusion methods in spatial detail improvement and spectral information fidelity, and the Gram-Schmidt technique was superior to the SFIM transform in expressing image details. The classification accuracy of the fused images using the Gram-Schmidt and SFIM algorithms was higher than that of the other three image fusion methods, and the overall accuracy was greater than 98%. The IHS-fused image classification accuracy was the lowest; the overall accuracy and kappa coefficient were 83.14% and 0.76, respectively. Thus the IKONOS fusion images obtained by Gram-Schmidt and SFIM were better for improving the land cover classification accuracy.
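The overall accuracy and kappa coefficient quoted above are both derived from a classification confusion matrix. A minimal sketch of that calculation follows, assuming rows are reference classes and columns are predicted classes; the function name and the example matrix are hypothetical and unrelated to the IKONOS results.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j]: number of samples of reference class i labelled as class j.
    """
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class land cover result with 100 validation pixels.
overall_accuracy, kappa = accuracy_and_kappa([[45, 5], [5, 45]])
```

Kappa is lower than overall accuracy here because it discounts the agreement that a random labelling with the same class proportions would achieve.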
A Novel Texture Classification Procedure by using Association Rules

Directory of Open Access Journals (Sweden)

L. Jaba Sheela

2008-11-01

Full Text Available Texture can be defined as a local statistical pattern of texture primitives in the observer’s domain of interest. Texture classification aims to assign texture labels to unknown textures, according to training samples and classification rules. Association rules have been used in various applications during the past decades. Association rules capture both structural and statistical information, and automatically identify the structures that occur most frequently and relationships that have significant discriminative power. So, association rules can be adapted to capture frequently occurring local structures in textures. This paper describes the use of association rules for the texture classification problem. The experimental studies performed show the effectiveness of the association rules. The overall success rate is about 98%.
Students' Attitudes toward Statistics across the Disciplines: A Mixed-Methods Approach

Science.gov (United States)

Griffith, James D.; Adams, Lea T.; Gu, Lucy L.; Hart, Christian L.; Nichols-Whitehead, Penney

2012-01-01

Students' attitudes toward statistics were investigated using a mixed-methods approach including a discovery-oriented qualitative methodology among 684 undergraduate students across business, criminal justice, and psychology majors where at least one course in statistics was required. Students were asked about their attitudes toward statistics and…
Automated classification of mouse pup isolation syllables: from cluster analysis to an Excel based ‘mouse pup syllable classification calculator’

Directory of Open Access Journals (Sweden)

Jasmine Grimsley

2013-01-01

Full Text Available Mouse pups vocalize at high rates when they are cold or isolated from the nest. The proportions of each syllable type produced carry information about disease state and are being used as behavioral markers for the internal state of animals. Manual classifications of these vocalizations identified ten syllable types based on their spectro-temporal features. However, manual classification of mouse syllables is time consuming and vulnerable to experimenter bias. This study uses an automated cluster analysis to identify acoustically distinct syllable types produced by CBA/CaJ mouse pups, and then compares the results to prior manual classification methods. The cluster analysis identified two syllable types, based on their frequency bands, that have continuous frequency-time structure, and two syllable types featuring abrupt frequency transitions. Although cluster analysis computed fewer syllable types than manual classification, the clusters represented well the probability distributions of the acoustic features within syllables. These probability distributions indicate that some of the manually classified syllable types are not statistically distinct. The characteristics of the four classified clusters were used to generate a Microsoft Excel-based mouse syllable classifier that rapidly categorizes syllables, with over a 90% match, into the syllable types determined by cluster analysis.
Machine learning in infrared object classification - an all-sky selection of YSO candidates

Science.gov (United States)

Marton, Gabor; Zahorecz, Sarolta; Toth, L. Viktor; Magnus McGehee, Peregrine; Kun, Maria

2015-08-01

Object classification is a fundamental and challenging problem in the era of big data. I will discuss up-to-date methods and their application to classify infrared point sources. We analysed the ALLWISE catalogue, the most recent public source catalogue of the Wide-field Infrared Survey Explorer (WISE), to compile a reliable list of Young Stellar Object (YSO) candidates. We tested and compared classical and up-to-date statistical methods to discriminate source types like extragalactic objects, evolved stars, main sequence stars, objects related to the interstellar medium and YSO candidates by using their mid-IR WISE properties and associated near-IR 2MASS data. In this particular classification problem, Support Vector Machines (SVM), a class of supervised learning algorithms, turned out to be the best tool. As a result we classify Class I and II YSOs with >90% accuracy while the fraction of contaminating extragalactic objects remains well below 1%, based on the number of known objects listed in the SIMBAD and VizieR databases. We compare our results to other classification schemes from the literature and show that the SVM outperforms methods that apply linear cuts on the colour-colour and colour-magnitude space. Our homogeneous YSO candidate catalog can serve as an excellent pathfinder for future detailed observations of individual objects and a starting point of statistical studies that aim to add pieces to the big picture of star formation theory.
A computer method for spectral classification

International Nuclear Information System (INIS)

Appenzeller, I.; Zekl, H.

1978-01-01

The authors describe the start of an attempt to improve the accuracy of spectroscopic parallaxes by evaluating spectroscopic temperature and luminosity criteria such as those of the MK classification spectrograms which were analyzed automatically by means of a suitable computer program. (Auth.)
Neutral face classification using personalized appearance models for fast and robust emotion detection.

Science.gov (United States)

Chiranjeevi, Pojala; Gopalakrishnan, Viswanath; Moogi, Pratibha

2015-09-01

Facial expression recognition is one of the open problems in computer vision. Robust neutral face recognition in real time is a major challenge for various supervised learning-based facial expression recognition methods. This is because supervised methods cannot accommodate all appearance variability across faces with respect to race, pose, lighting, facial biases, and so on, in the limited amount of training data. Moreover, processing each and every frame to classify emotions is not required, as the user stays neutral for the majority of the time in usual applications like video chat or photo album/web browsing. Detecting the neutral state at an early stage, so that those frames bypass emotion classification, would save computational power. In this paper, we propose a light-weight neutral versus emotion classification engine, which acts as a pre-processor to the traditional supervised emotion classification approaches. It dynamically learns neutral appearance at key emotion (KE) points using a statistical texture model, constructed from a set of reference neutral frames for each user. The proposed method is made robust to various types of user head motions by accounting for affine distortions based on a statistical texture model. Robustness to dynamic shift of KE points is achieved by evaluating the similarities on a subset of neighborhood patches around each KE point using the prior information regarding the directionality of specific facial action units acting on the respective KE point. The proposed method, as a result, improves emotion recognition (ER) accuracy and simultaneously reduces the computational complexity of the ER system, as validated on multiple databases.
Identifying User Profiles from Statistical Grouping Methods

Directory of Open Access Journals (Sweden)

Francisco Kelsen de Oliveira

2018-02-01

Full Text Available This research aimed to group users into subgroups according to their levels of knowledge about technology. Statistical hierarchical and non-hierarchical clustering methods were studied, compared and used to create the subgroups from the similarities in these users' skill levels with technology. The research sample consisted of teachers who answered online questionnaires about their skills in using software and hardware with an educational focus. The statistical grouping methods were applied and showed the possible groupings of the users. Analysis of these groups made it possible to identify the common characteristics among the individuals of each subgroup. It was therefore possible to define two subgroups of users, one with skill with technology and another with little skill. The partial results of the research showed that the two main grouping algorithms had 92% similarity in forming the group of users with skill with technology and the group with little skill, confirming the accuracy of the techniques for discriminating between individuals.
Comparison of Classification Methods for Detecting Emotion from Mandarin Speech

Science.gov (United States)

Pao, Tsang-Long; Chen, Yu-Te; Yeh, Jun-Heng

It is said that technology comes out from humanity. What is humanity? The very definition of humanity is emotion. Emotion is the basis for all human expression and the underlying theme behind everything that is done, said, thought or imagined. If computers are able to perceive and respond to human emotion, human-computer interaction will be more natural. Several classifiers have been adopted for automatically assigning an emotion category, such as anger, happiness or sadness, to a speech utterance. These classifiers were designed independently and tested on various emotional speech corpora, making it difficult to compare and evaluate their performance. In this paper, we first compared several popular classification methods and evaluated their performance by applying them to a Mandarin speech corpus consisting of five basic emotions: anger, happiness, boredom, sadness and neutral. The extracted feature streams contain MFCC, LPCC, and LPC. The experimental results show that the proposed WD-MKNN classifier achieves an accuracy of 81.4% for the 5-class emotion recognition and outperforms other classification techniques, including KNN, MKNN, DW-KNN, LDA, QDA, GMM, HMM, SVM, and BPNN. Then, to verify the advantage of the proposed method, we compared these classifiers by applying them to another Mandarin expressive speech corpus consisting of two emotions. The experimental results still show that the proposed WD-MKNN outperforms the others.
Genome profiling (GP) method based classification of insects: congruence with that of classical phenotype-based one.

Directory of Open Access Journals (Sweden)

Shamim Ahmed

Full Text Available Ribosomal RNAs have been widely used for identification and classification of species, and have produced data giving new insights into phylogenetic relationships. Recently, multilocus genotyping and even whole genome sequencing-based technologies have been adopted in ambitious comparative biology studies. However, such technologies are still far from routine use in species classification studies due to their high costs in terms of labor, equipment and consumables. Here, we describe a simple and powerful approach for species classification called genome profiling (GP). The GP method, composed of random PCR, temperature gradient gel electrophoresis (TGGE) and computer-aided gel image processing, is highly informative and less laborious. For demonstration, we classified 26 species of insects using GP and 18S rDNA-sequencing approaches. Using a congruence value, the GP method was found to give a better correspondence to the classical phenotype-based approach than did 18S rDNA sequencing. To our surprise, use of a single probe in GP was sufficient to identify the relationships between the insect species, making this approach more straightforward. The data gathered here, together with those of previous studies, show that GP is a simple and powerful method that can be applied to universally identify and classify species. The current success supports our previous proposal that a GP-based web database can be constructed and would be effective for the global identification/classification of species.
A Method of Particle Swarm Optimized SVM Hyper-spectral Remote Sensing Image Classification

International Nuclear Information System (INIS)

Liu, Q J; Jing, L H; Wang, L M; Lin, Q Z

2014-01-01

Support Vector Machine (SVM) has been proved to be suitable for classification of remote sensing images and has been proposed to overcome the Hughes phenomenon. Hyper-spectral sensors are intrinsically designed to discriminate among a broad range of land cover classes, which may lead to high computational time in SVM multi-class algorithms. Model selection for SVM, which involves selecting the kernel and the margin parameter values, is usually time-consuming and greatly impacts both the training efficiency of the SVM model and the final classification accuracy of the SVM hyper-spectral remote sensing image classifier. Firstly, based on combinatorial optimization theory and the cross-validation method, a particle swarm algorithm is introduced for the optimal selection of the SVM (PSSVM) kernel parameter σ and margin parameter C to improve the modelling efficiency of the SVM model. Then an experiment classifying AVIRIS data of the Indian Pine site in the USA was performed to evaluate the novel PSSVM, as well as a traditional SVM classifier with the general Grid-Search cross-validation method (GSSVM). Evaluation indexes including SVM model training time, classification Overall Accuracy (OA) and Kappa index of both PSSVM and GSSVM are all analyzed quantitatively. It is demonstrated that the OA of PSSVM on test samples and on the whole image are 85% and 82% respectively, and the differences from those of GSSVM are both within 0.08%. The Kappa indexes reach 0.82 and 0.77, and the differences from those of GSSVM are both within 0.001. The modelling time of PSSVM, however, can be only 1/10 of that of GSSVM. Therefore, PSSVM is a fast and accurate algorithm for hyper-spectral image classification and is superior to GSSVM.
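
The two accuracy indexes named in this abstract, Overall Accuracy (OA) and the Kappa index, can both be read off a confusion matrix. The following is a minimal sketch in plain Python, assuming that "Kappa index" means Cohen's kappa; the function names and the 2x2 matrix are hypothetical illustrations and are not taken from the paper.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's kappa: agreement beyond what row/column totals give by chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(n)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(cm[i]) * sum(cm[r][i] for r in range(n)) for i in range(n)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical confusion matrix (rows: reference class, columns: predicted class).
example_cm = [[45, 5],
              [10, 40]]
oa = overall_accuracy(example_cm)
kappa = cohen_kappa(example_cm)
```

Kappa is always at most OA because it discounts chance agreement, which is consistent with the abstract reporting Kappa values slightly below the corresponding OA values.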

The Novel Quantitative Technique for Assessment of Gait Symmetry Using Advanced Statistical Learning Algorithm

OpenAIRE

Wu, Jianning; Wu, Bin

2015-01-01

The accurate identification of gait asymmetry is very beneficial to the assessment of at-risk gait in the clinical applications. This paper investigated the application of classification method based on statistical learning algorithm to quantify gait symmetry based on the assumption that the degree of intrinsic change in dynamical system of gait is associated with the different statistical distributions between gait variables from left-right side of lower limbs; that is, the discrimination of...
Vortex methods and vortex statistics

International Nuclear Information System (INIS)

Chorin, A.J.

1993-05-01

Vortex methods originated from the observation that in incompressible, inviscid, isentropic flow vorticity (or, more accurately, circulation) is a conserved quantity, as can be readily deduced from the absence of tangential stresses. Thus if the vorticity is known at time t = 0, one can deduce the flow at a later time by simply following it around. In this narrow context, a vortex method is a numerical method that makes use of this observation. Even more generally, the analysis of vortex methods leads to problems that are closely related to problems in quantum physics and field theory, as well as in harmonic analysis. A broad enough definition of vortex methods ends up encompassing much of science. Even the purely computational aspects of vortex methods encompass a range of ideas for which vorticity may not be the best unifying theme. The author restricts himself in these lectures to a special class of numerical vortex methods, those that are based on a Lagrangian transport of vorticity in hydrodynamics by smoothed particles ("blobs") and those whose understanding contributes to the understanding of blob methods. Vortex methods for inviscid flow lead to systems of ordinary differential equations that can be readily clothed in Hamiltonian form, both in three and two space dimensions, and they can preserve exactly a number of invariants of the Euler equations, including topological invariants. Their viscous versions resemble Langevin equations. As a result, they provide a very useful cartoon of statistical hydrodynamics, i.e., of turbulence, one that can to some extent be analyzed analytically and, more importantly, explored numerically, with important implications also for superfluids, superconductors, and even polymers. In the author's view, vortex "blob" methods provide the most promising path to the understanding of these phenomena.
Mathematical and Statistical Methods for Actuarial Sciences and Finance

CERN Document Server

Legros, Florence; Perna, Cira; Sibillo, Marilena

2017-01-01

This volume gathers selected peer-reviewed papers presented at the international conference "MAF 2016 – Mathematical and Statistical Methods for Actuarial Sciences and Finance", held in Paris (France) at the Université Paris-Dauphine from March 30 to April 1, 2016. The contributions highlight new ideas on mathematical and statistical methods in actuarial sciences and finance. The cooperation between mathematicians and statisticians working in insurance and finance is a very fruitful field, one that yields unique theoretical models and practical applications, as well as new insights in the discussion of problems of national and international interest. This volume is addressed to academicians, researchers, Ph.D. students and professionals.
Multi-class Mode of Action Classification of Toxic Compounds Using Logic Based Kernel Methods.

Science.gov (United States)

Lodhi, Huma; Muggleton, Stephen; Sternberg, Mike J E

2010-09-17

Toxicity prediction is essential for drug design and development of effective therapeutics. In this paper we present an in silico strategy, to identify the mode of action of toxic compounds, that is based on the use of a novel logic based kernel method. The technique uses support vector machines in conjunction with the kernels constructed from first order rules induced by an Inductive Logic Programming system. It constructs multi-class models by using a divide and conquer reduction strategy that splits multi-classes into binary groups and solves each individual problem recursively hence generating an underlying decision list structure. In order to evaluate the effectiveness of the approach for chemoinformatics problems like predictive toxicology, we apply it to toxicity classification in aquatic systems. The method is used to identify and classify 442 compounds with respect to the mode of action. The experimental results show that the technique successfully classifies toxic compounds and can be useful in assessing environmental risks. Experimental comparison of the performance of the proposed multi-class scheme with the standard multi-class Inductive Logic Programming algorithm and multi-class Support Vector Machine yields statistically significant results and demonstrates the potential power and benefits of the approach in identifying compounds of various toxic mechanisms. Copyright © 2010 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
METHODS OF ANALYSIS AND CLASSIFICATION OF THE COMPONENTS OF GRAIN MIXTURES BASED ON MEASURING THE REFLECTION AND TRANSMISSION SPECTRA

Directory of Open Access Journals (Sweden)

Artem O. Donskikh*

2017-10-01

Full Text Available The paper considers methods of classification of grain mixture components based on spectral analysis in visible and near-infrared wavelength ranges using various measurement approaches - reflection, transmission and combined spectrum methods. It also describes the experimental measuring units used and suggests the prototype of a multispectral grain mixture analyzer. The results of the spectral measurement were processed using neural network based classification algorithms. The probabilities of incorrect recognition for various numbers of spectral parts and combinations of spectral methods were estimated. The paper demonstrates that combined usage of two spectral analysis methods leads to higher classification accuracy and allows for reducing the number of the analyzed spectral parts. A detailed description of the proposed measurement device for high-performance real-time multispectral analysis of the components of grain mixtures is given.
FACET CLASSIFICATIONS OF E-LEARNING TOOLS

Directory of Open Access Journals (Sweden)

Olena Yu. Balalaieva

2013-12-01

Full Text Available The article deals with the classification of e-learning tools based on the facet method, which separates a parallel set of objects into independent classification groups. No rigid classification structure or pre-built finite groups are assumed; classification groups are formed by a combination of values taken from the relevant facets. An attempt to systematize the existing classifications of e-learning tools from the standpoint of classification theory is made for the first time. Modern Ukrainian and foreign facet classifications of e-learning tools are described, and their positive and negative features compared to classifications based on a hierarchical method are analyzed. The author's original facet classification of e-learning tools is proposed.
Classification with support hyperplanes

NARCIS (Netherlands)

G.I. Nalbantov (Georgi); J.C. Bioch (Cor); P.J.F. Groenen (Patrick)

2006-01-01

A new classification method is proposed, called Support Hyperplanes (SHs). To solve the binary classification task, SHs consider the set of all hyperplanes that do not make classification mistakes, referred to as semi-consistent hyperplanes. A test object is classified using
A method for statistical steady state thermal analysis of reactor cores

International Nuclear Information System (INIS)

Whetton, P.A.

1980-01-01

This paper presents a method for performing a statistical steady state thermal analysis of a reactor core. The technique is only outlined here since detailed thermal equations are dependent on the core geometry. The method has been applied to a pressurised water reactor core and the results are presented for illustration purposes. Random hypothetical cores are generated using the Monte-Carlo method. The technique shows that by splitting the parameters into two types, denoted core-wise and in-core, the Monte Carlo method may be used inexpensively. The idea of using extremal statistics to characterise the low probability events (i.e. the tails of a distribution) is introduced together with a method of forming the final probability distribution. After establishing an acceptable probability of exceeding a thermal design criterion, the final probability distribution may be used to determine the corresponding thermal response value. If statistical and deterministic (i.e. conservative) thermal response values are compared, information on the degree of pessimism in the deterministic method of analysis may be inferred and the restrictive performance limitations imposed by this method relieved. (orig.)
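
The basic Monte Carlo step described in this abstract, generating random hypothetical cores and counting how often a thermal design criterion is exceeded, can be illustrated with a toy model. This is a sketch under stated assumptions only: the response function, the 600.0 nominal value, the two Gaussian uncertainty factors and the limit values are all hypothetical, and the paper's extremal-statistics treatment of the distribution tails is not reproduced here.

```python
import random


def simulate_peak_response(rng):
    """Toy thermal response for one random hypothetical core (assumed model)."""
    core_wise = rng.gauss(1.0, 0.02)  # assumed core-wise uncertainty factor
    in_core = rng.gauss(1.0, 0.03)    # assumed in-core (local) uncertainty factor
    return 600.0 * core_wise * in_core  # 600.0 is a hypothetical nominal value


def exceedance_probability(limit, n_samples, seed=0):
    """Estimate P(response > limit) by counting exceedances over random cores."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    hits = sum(simulate_peak_response(rng) > limit for _ in range(n_samples))
    return hits / n_samples


# Hypothetical design limits: a looser limit is exceeded less often.
p_tight = exceedance_probability(620.0, 5000)
p_loose = exceedance_probability(650.0, 5000)
```

Plain counting of this kind becomes expensive for very small probabilities, which is presumably why the abstract turns to extremal statistics for the tails of the distribution.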
Active learning methods for interactive image retrieval.

Science.gov (United States)

Gosselin, Philippe Henri; Cord, Matthieu

2008-07-01

Active learning methods have been considered with increased interest in the statistical learning community. Initially developed within a classification framework, a lot of extensions are now being proposed to handle multimedia applications. This paper provides algorithms within a statistical framework to extend active learning for online content-based image retrieval (CBIR). The classification framework is presented with experiments to compare several powerful classification techniques in this information retrieval context. Focusing on interactive methods, active learning strategy is then described. The limitations of this approach for CBIR are emphasized before presenting our new active selection process RETIN. First, as any active method is sensitive to the boundary estimation between classes, the RETIN strategy carries out a boundary correction to make the retrieval process more robust. Second, the criterion of generalization error to optimize the active learning selection is modified to better represent the CBIR objective of database ranking. Third, a batch processing of images is proposed. Our strategy leads to a fast and efficient active learning scheme to retrieve sets of online images (query concept). Experiments on large databases show that the RETIN method performs well in comparison to several other active strategies.
REMOTE SENSING IMAGE CLASSIFICATION APPLIED TO THE FIRST NATIONAL GEOGRAPHICAL INFORMATION CENSUS OF CHINA

Directory of Open Access Journals (Sweden)

X. Yu

2016-06-01

Full Text Available Image classification still has a long way to go, although it has been studied for almost half a century. Researchers have achieved many results in the image classification domain, but there is still a long distance between theory and practice. However, some new methods from the artificial intelligence domain will be absorbed into the image classification domain, each drawing on the strengths of the other to offset its own weaknesses, which will open up new prospects. Usually, networks play the role of a high-level language, as is seen in Artificial Intelligence and statistics, because networks are used to build complex models from simple components. In recent years, Bayesian Networks, one kind of probabilistic network, have become a powerful data mining technique for handling uncertainty in complex domains. In this paper, we apply Tree Augmented Naive Bayesian Networks (TAN) to texture classification of high-resolution remote sensing images and put forward a new method to construct the network topology structure in terms of training accuracy based on the training samples. Since 2013, the Chinese government has been running the first national geographical information census project, which mainly interprets geographical information based on high-resolution remote sensing images. Therefore, this paper tries to apply Bayesian networks to remote sensing image classification, in order to improve image interpretation in the first national geographical information census project. In the experiment, we choose some remote sensing images of Beijing. Experimental results demonstrate that TAN outperforms the Naive Bayesian Classifier (NBC) and the Maximum Likelihood Classification Method (MLC) in overall classification accuracy. In addition, the proposed method can reduce the workload of field workers and improve work efficiency. Although it is time consuming, it will be an attractive and effective method for assisting office operation of image interpretation.
Comprehensive Study on Lexicon-based Ensemble Classification Sentiment Analysis

Directory of Open Access Journals (Sweden)

Łukasz Augustyniak

2015-12-01

Full Text Available We propose a novel method for counting sentiment orientation that outperforms supervised learning approaches in time and memory complexity and is not statistically significantly different from them in accuracy. Our method consists of a novel approach to generating unigram, bigram and trigram lexicons. The proposed method, called frequentiment, is based on calculating the frequency of features (words) in the document and averaging their impact on the sentiment score as opposed to documents that do not contain these features. Afterwards, we use ensemble classification to improve the overall accuracy of the method. What is important is that the frequentiment-based lexicons with sentiment threshold selection outperform other popular lexicons and some supervised learners, while being 3–5 times faster than the supervised approach. We compare 37 methods (lexicons, ensembles with lexicon's predictions as input, and supervised learners) applied to 10 Amazon review data sets and provide the first statistical comparison of the sentiment annotation methods that include ensemble approaches. It is one of the most comprehensive comparisons of domain sentiment analysis in the literature.
Applied systems ecology: models, data, and statistical methods

Energy Technology Data Exchange (ETDEWEB)

Eberhardt, L L

1976-01-01

In this report, systems ecology is largely equated to mathematical or computer simulation modelling. The need for models in ecology stems from the necessity to have an integrative device for the diversity of ecological data, much of which is observational, rather than experimental, as well as from the present lack of a theoretical structure for ecology. Different objectives in applied studies require specialized methods. The best predictive devices may be regression equations, often non-linear in form, extracted from much more detailed models. A variety of statistical aspects of modelling, including sampling, are discussed. Several aspects of population dynamics and food-chain kinetics are described, and it is suggested that the two presently separated approaches should be combined into a single theoretical framework. It is concluded that future efforts in systems ecology should emphasize actual data and statistical methods, as well as modelling.
Mathematical methods in quantum and statistical mechanics

International Nuclear Information System (INIS)

Fishman, L.

1977-01-01

The mathematical structure and closed-form solutions pertaining to several physical problems in quantum and statistical mechanics are examined in some detail. The J-matrix method, introduced previously for s-wave scattering and based upon well-established Hilbert Space theory and related generalized integral transformation techniques, is extended to treat the lth partial wave kinetic energy and Coulomb Hamiltonians within the context of square integrable (L²), Laguerre (Slater), and oscillator (Gaussian) basis sets. The theory of relaxation in statistical mechanics within the context of the theory of linear integro-differential equations of the Master Equation type and their corresponding Markov processes is examined. Several topics of a mathematical nature concerning various computational aspects of the L² approach to quantum scattering theory are discussed.
Selection of hidden layer nodes in neural networks by statistical tests

International Nuclear Information System (INIS)

Ciftcioglu, Ozer

1992-05-01

A statistical methodology for selection of the number of hidden layer nodes in feedforward neural networks is described. The method considers the network as an empirical model for the experimental data set subject to pattern classification, so that the selection process becomes a model estimation through parameter identification. The solution is performed for an overdetermined estimation problem for identification using a nonlinear least squares minimization technique. The number of hidden layer nodes is determined as a result of hypothesis testing. Accordingly, a network structure that is redundant with respect to the number of parameters is avoided and the classification error is kept to a minimum. (author). 11 refs.; 4 figs.; 1 tab
Tissue Classification

DEFF Research Database (Denmark)

Van Leemput, Koen; Puonti, Oula

2015-01-01

Computational methods for automatically segmenting magnetic resonance images of the brain have seen tremendous advances in recent years. So-called tissue classification techniques, aimed at extracting the three main brain tissue classes (white matter, gray matter, and cerebrospinal fluid), are now...... well established. In their simplest form, these methods classify voxels independently based on their intensity alone, although much more sophisticated models are typically used in practice. This article aims to give an overview of often-used computational techniques for brain tissue classification...
Remote sensing mapping of macroalgal farms by modifying thresholds in the classification tree

KAUST Repository

Zheng, Yuhan

2018-05-07

Remote sensing is the main approach used to classify and map aquatic vegetation, and classification tree (CT) analysis is superior to various classification methods. Based on previous studies, modified CT can be developed from traditional CT by adjusting the thresholds based on the statistical relationship between spectral features to classify different images without ground-truth data. However, no studies have yet employed this method to resolve marine vegetation. In this study, three Gao-Fen 1 satellite images obtained with the same sensor on January 30, 2014, November 5, 2014, and January 21, 2015 were selected, and two features were then employed to extract macroalgae from aquaculture farms from the seawater background. Besides, object-based classification and other image analysis methods were adopted to improve the classification accuracy in this study. Results show that the overall accuracies of traditional CTs for three images are 92.0%, 94.2% and 93.9%, respectively, whereas the overall accuracies of the two corresponding modified CTs for images obtained on January 21, 2015 and November 5, 2014 are 93.1% and 89.5%, respectively. This indicates modified CTs can help map macroalgae with multi-date imagery and monitor the spatiotemporal distribution of macroalgae in coastal environments.
Remote sensing mapping of macroalgal farms by modifying thresholds in the classification tree

KAUST Repository

Zheng, Yuhan; Duarte, Carlos M.; Chen, Jiang; Li, Dan; Lou, Zhaohan; Wu, Jiaping

2018-01-01

Remote sensing is the main approach used to classify and map aquatic vegetation, and classification tree (CT) analysis is superior to various classification methods. Based on previous studies, modified CT can be developed from traditional CT by adjusting the thresholds based on the statistical relationship between spectral features to classify different images without ground-truth data. However, no studies have yet employed this method to resolve marine vegetation. In this study, three Gao-Fen 1 satellite images obtained with the same sensor on January 30, 2014, November 5, 2014, and January 21, 2015 were selected, and two features were then employed to extract macroalgae from aquaculture farms from the seawater background. Besides, object-based classification and other image analysis methods were adopted to improve the classification accuracy in this study. Results show that the overall accuracies of traditional CTs for three images are 92.0%, 94.2% and 93.9%, respectively, whereas the overall accuracies of the two corresponding modified CTs for images obtained on January 21, 2015 and November 5, 2014 are 93.1% and 89.5%, respectively. This indicates modified CTs can help map macroalgae with multi-date imagery and monitor the spatiotemporal distribution of macroalgae in coastal environments.
Coefficient of variation for use in crop area classification across multiple climates

Science.gov (United States)

Whelen, Tracy; Siqueira, Paul

2018-05-01

In this study, the coefficient of variation (CV) is introduced as a unitless statistical measurement for the classification of croplands using synthetic aperture radar (SAR) data. As a measurement of change, the CV is able to capture changing backscatter responses caused by cycles of planting, growing, and harvesting, and thus is able to differentiate these areas from a more static forest or urban area. Pixels with CV values above a given threshold are classified as crops, and below the threshold are non-crops. This paper uses cross-polarized L-band SAR data from the ALOS PALSAR satellite to classify eleven regions across the United States, covering a wide range of major crops and climates. Two separate sets of classification were done, with the first targeting the optimum classification thresholds for each dataset, and the second using a generalized threshold for all datasets to simulate a large-scale operationalized situation. Overall accuracies for the first phase of classification ranged from 66%-81%, and 62%-84% for the second phase. Visual inspection of the results shows numerous possibilities for improving the classifications while still using the same classification method, including increasing the number and temporal frequency of input images in order to better capture phenological events and mitigate the effects of major precipitation events, as well as more accurate ground truth data. These improvements would make the CV method a viable tool for monitoring agriculture throughout the year on a global scale.
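
The thresholding rule described in this abstract can be sketched in a few lines of plain Python: compute the coefficient of variation (standard deviation divided by mean) of each pixel's backscatter time series and label pixels above a threshold as crops. The sample values, the 0.3 threshold and the function names are hypothetical, and the use of the population standard deviation is an assumption; the paper's actual thresholds and preprocessing are not given in the abstract.

```python
from statistics import mean, pstdev


def coefficient_of_variation(series):
    """Unitless CV of one pixel's time series: population std / mean."""
    m = mean(series)
    if m == 0:
        return 0.0  # avoid division by zero for an all-zero series
    return pstdev(series) / m


def classify_pixels(stack, threshold):
    """Label each pixel 'crop' if its CV exceeds the threshold, else 'non-crop'."""
    return [
        "crop" if coefficient_of_variation(series) > threshold else "non-crop"
        for series in stack
    ]


# Hypothetical backscatter series (linear power) for two pixels over four dates.
field_pixel = [0.02, 0.05, 0.11, 0.04]       # strong seasonal change
forest_pixel = [0.080, 0.082, 0.079, 0.081]  # nearly static
labels = classify_pixels([field_pixel, forest_pixel], threshold=0.3)
```

Because the CV is a ratio, it does not depend on the overall brightness of a scene, which fits the abstract's goal of applying a single generalized threshold across regions.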
An Efficient Graph-based Method for Long-term Land-use Change Statistics

Directory of Open Access Journals (Sweden)

Yipeng Zhang

2015-12-01

Full Text Available Statistical analysis of land-use change plays an important role in sustainable land management and has received increasing attention from scholars and administrative departments. However, the statistical process involving spatial overlay analysis remains difficult and needs improvement to deal with mass land-use data. In this paper, we introduce a spatio-temporal flow network model to reveal the hidden relational information among spatio-temporal entities. Based on graph theory, the constant condition of saturated multi-commodity flow is derived. A new method based on a network partition technique for the spatio-temporal flow network is proposed to optimize the transition statistical process. The effectiveness and efficiency of the proposed method are verified through experiments using land-use data in Hunan from 2009 to 2014. In the comparison among three different land-use change statistical methods, the proposed method exhibits remarkable superiority in efficiency.
GA Based Optimal Feature Extraction Method for Functional Data Classification

OpenAIRE

Jun Wan; Zehua Chen; Yingwu Chen; Zhidong Bai

2010-01-01

Classification is an interesting problem in functional data analysis (FDA), because many science and application problems end up as classification problems, such as recognition, prediction, control, decision making, management, etc. Because of the high dimension and high correlation in functional data (FD), a key problem is to extract features from FD while keeping its global characteristics, which strongly affects classification efficiency and precision. In this paper...
