sampling classifying multiclass: Topics by WorldWideScience.org

Sample records for sampling classifying multiclass

A fast learning method for large scale and multi-class samples of SVM

Science.gov (United States)

Fan, Yu; Guo, Huiming

2017-06-01

A fast learning method for multi-class SVM (Support Vector Machine) classification based on a binary tree is presented to address the low learning efficiency of SVMs when processing large-scale multi-class samples. The paper builds the binary-tree hierarchy bottom-up; following this hierarchy, the sub-classifier at each node learns from the samples belonging to that node. During learning, a first clustering of the training samples produces several class clusters. Central points are extracted directly from the clusters that contain samples of only one class. For clusters that contain samples of two classes, the numbers of clusters for the positive and negative samples are set according to their degree of mixing, a secondary clustering is performed, and central points are then extracted from the resulting sub-class clusters. The sub-classifiers are trained on the reduced sample set formed by combining all the extracted central points. Simulation experiments show that this fast learning method, based on multi-level clustering, achieves high classification accuracy while greatly reducing the number of samples and effectively improving learning efficiency.
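The sample-reduction step can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes plain k-means with deterministic initialisation for the first clustering, and it replaces the mixture-dependent secondary clustering of mixed clusters with one centre per class. All function names are hypothetical.

```python
import math
from collections import defaultdict


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points are the initial centres (deterministic)."""
    centres = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        assign = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def centroid(points):
    return tuple(sum(col) / len(points) for col in zip(*points))


def reduce_samples(points, labels, k):
    """Replace every cluster by labelled centre points.

    A pure cluster yields one centre; a mixed cluster yields one centre per class
    (a simplification of the paper's secondary clustering of mixed clusters).
    """
    groups = defaultdict(list)
    for p, y, a in zip(points, labels, kmeans(points, k)):
        groups[a].append((p, y))
    reduced = []
    for members in groups.values():
        by_class = defaultdict(list)
        for p, y in members:
            by_class[y].append(p)
        for y, pts in by_class.items():
            reduced.append((centroid(pts), y))
    return reduced
```

Each reduced point keeps its class label, so a node's sub-classifier would be trained on the output of `reduce_samples` rather than on every original sample.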
A Supervised Multiclass Classifier for an Autocoding System

Directory of Open Access Journals (Sweden)

Yukako Toko

2017-11-01

Classification is often required in various contexts, including official statistics. In a previous study, we developed a multiclass classifier that classifies short text descriptions with high accuracy. The algorithm borrows the concept of the naïve Bayes classifier and is simple enough that its structure is easy to understand. The proposed classifier has two advantages. First, the processing times for both learning and classification are highly practical. Second, it yields high-accuracy results for a large portion of a dataset. We previously developed an autocoding system with a better-performing classifier for the Family Income and Expenditure Survey in Japan. Whereas the original system was developed in Perl to improve the efficiency of coding short Japanese texts, the proposed system is implemented in the R programming language to explore its versatility, and it has been modified to be easily applicable to English text descriptions, in view of the growing number of R users in official statistics. We plan to publish the proposed classifier as an R package. The proposed classifier should be generally applicable to other classification tasks, including coding activities in official statistics, and could contribute greatly to improving their efficiency.
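A minimal sketch of the underlying idea, a multinomial naive Bayes classifier over word tokens with add-one smoothing, is shown below. It is the textbook variant, not the author's algorithm (which only borrows the naive Bayes concept); whitespace tokenisation and the example categories are assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial naive Bayes over lower-cased whitespace tokens with Laplace smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, y in zip(texts, labels):
            self.word_counts[y].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n = len(labels)
        return self

    def log_posterior(self, text, y):
        counts = self.word_counts[y]
        total = sum(counts.values())
        score = math.log(self.class_counts[y] / self.n)  # log prior of class y
        for w in text.lower().split():
            # Add-one smoothing so an unseen word does not rule the class out entirely.
            score += math.log((counts[w] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda y: self.log_posterior(text, y))
```

For example, after fitting on `["fresh apple juice", "apple pie", "bus fare", "train fare"]` with labels `food, food, transport, transport`, the description `"Bus ticket"` is coded as `transport`.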
SVM-RFE based feature selection and Taguchi parameters optimization for multiclass SVM classifier.

Science.gov (United States)

Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W M; Li, R K; Jiang, Bo-Ru

2014-01-01

Support vector machines (SVMs) perform well in classification and prediction and are widely used in disease diagnosis and medical assistance. However, the basic SVM is designed for two-group classification problems. This study combines feature selection by SVM recursive feature elimination (SVM-RFE) with SVM classification to investigate classification accuracy on multiclass problems using the Dermatology and Zoo databases. The Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order of explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. In addition, the Taguchi method was combined with the SVM classifier to optimize the parameters C and γ and so increase accuracy for multiclass classification. The experimental results show that classification accuracy can exceed 95% after SVM-RFE feature selection and Taguchi parameter optimization for the Dermatology and Zoo databases.
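The elimination loop in SVM-RFE can be sketched as below. In SVM-RFE proper, the weight vector comes from a trained linear SVM; to keep the sketch dependency-free, the hypothetical `mean_difference_weights` stands in for it with a difference of class means, and labels are assumed to be binary (+1/-1). The Taguchi parameter search is not shown.

```python
def rfe_ranking(X, y, fit_weights):
    """Recursive feature elimination: repeatedly drop the feature with the smallest w_j^2.

    fit_weights(X_sub, y) must return one weight per remaining column; in SVM-RFE it
    would be a linear SVM retrained at every step. Returns feature indices, best first.
    """
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = fit_weights(X_sub, y)
        worst = min(range(len(remaining)), key=lambda i: w[i] ** 2)
        eliminated.append(remaining.pop(worst))
    # The last feature to survive is the most important one.
    return eliminated[::-1]


def mean_difference_weights(X, y):
    """Stand-in for a linear SVM (an assumption): class-mean difference, labels +1 / -1."""
    pos = [row for row, t in zip(X, y) if t == 1]
    neg = [row for row, t in zip(X, y) if t == -1]
    return [sum(c) / len(pos) - sum(d) / len(neg) for c, d in zip(zip(*pos), zip(*neg))]
```

Because the weights are refitted after every removal, a pluggable `fit_weights` lets the same loop be reused with a real linear SVM.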
Classifying Physical Morphology of Cocoa Beans Digital Images using Multiclass Ensemble Least-Squares Support Vector Machine

Science.gov (United States)

Lawi, Armin; Adhitya, Yudhi

2018-03-01

The objective of this research is to determine the quality of cocoa beans from the morphology of their digital images. Samples of cocoa beans were scattered on bright white paper under controlled lighting, and a compact digital camera was used to capture the images. The images were then processed to extract their morphological parameters. The classification process begins with an analysis of the cocoa bean images based on morphological feature extraction. The extracted morphological (physical) features are Area, Perimeter, Major Axis Length, Minor Axis Length, Aspect Ratio, Circularity, Roundness, and Feret Diameter. The cocoa beans are classified into four groups: Normal Beans, Broken Beans, Fractured Beans, and Skin Damaged Beans. The classification model used in this paper is the Multiclass Ensemble Least-Squares Support Vector Machine (MELS-SVM), a proposed improvement of the SVM using an ensemble method in which the separating hyperplanes are obtained by a least-squares approach and the multiclass procedure uses the One-Against-All method. With morphological features as input, the proposed model achieved a classification accuracy of 99.705% for the four classes.
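The abstract lists the shape features without formulas. The sketch below computes three of the derived ones using the definitions common in ImageJ-style image analysis tools, which is an assumption about what the authors used; area, perimeter and axis lengths are taken as already measured from the segmented bean.

```python
import math


def shape_descriptors(area, perimeter, major_axis, minor_axis):
    """Derived shape factors from basic measurements of one segmented object."""
    return {
        "aspect_ratio": major_axis / minor_axis,
        # 1.0 for a perfect circle, smaller for elongated or ragged outlines.
        "circularity": 4 * math.pi * area / perimeter ** 2,
        # Area relative to a circle whose diameter equals the major axis.
        "roundness": 4 * area / (math.pi * major_axis ** 2),
    }
```

A unit circle (area π, perimeter 2π, both axes 2) scores 1 on all three factors, whereas an ellipse with axes 4 and 2 has aspect ratio 2 and roundness 0.5.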
Application of machine learning on brain cancer multiclass classification

Science.gov (United States)

Panca, V.; Rustam, Z.

2017-07-01

Classification of brain cancer is a multiclass classification problem. One approach is to first transform it into several binary problems. Microarray gene expression datasets have the two main characteristics of medical data: a very large number of features (genes) and only a small number of samples. Applying machine learning to microarray gene expression data mainly involves two steps: feature selection and classification. In this paper, features are selected using a method based on the support vector machine recursive feature elimination (SVM-RFE) principle, extended to multiclass classification and called multiple multiclass SVM-RFE. Instead of using only the selected features in a single classifier, this method combines the results of multiple classifiers: the features are divided into subsets, SVM-RFE is applied to each subset, and the features selected from each subset are fed to separate classifiers. This enhances the feature selection ability of each single SVM-RFE. A twin support vector machine (TWSVM) is used as the classifier to reduce computational complexity. Whereas an ordinary SVM finds a single optimal hyperplane, the main objective of TWSVM is to find two non-parallel optimal hyperplanes. Experiments on the brain cancer microarray gene expression dataset show that this method correctly classifies 71.4% of the overall test data using 100 and 1000 genes selected by the multiple multiclass SVM-RFE feature selection method. Furthermore, the per-class results show that the method classifies data of the normal and MD classes with 100% accuracy.
Identification and optimization of classifier genes from multi-class earthworm microarray dataset.

Directory of Open Access Journals (Sweden)

Ying Li

Monitoring, assessment and prediction of the environmental risks that chemicals pose demand rapid and accurate diagnostic assays. A variety of toxicological effects have been associated with the explosive compounds TNT and RDX. One important goal of microarray experiments is to discover novel biomarkers for toxicity evaluation. We have developed an earthworm microarray containing 15,208 unique oligo probes and used it to profile gene expression in 248 earthworms exposed to TNT, RDX or neither. We assembled a new machine learning pipeline, consisting of several well-established feature filtering/selection and classification techniques, to analyze the 248-array dataset and construct classifier models that separate earthworm samples into three groups: control, TNT-treated, and RDX-treated. First, a total of 869 genes differentially expressed in response to TNT or RDX exposure were identified using a univariate statistical class-comparison algorithm. Then, decision tree-based algorithms were applied to select a subset of 354 classifier genes, ranked by their overall weight of significance. A multiclass support vector machine (MC-SVM) method and an unsupervised K-means clustering method were applied independently to refine the classifier, producing smaller subsets of 39 and 30 classifier genes, respectively, with 11 genes in common as potential biomarkers. The combined 58 genes were taken as the refined subset and used to build MC-SVM and clustering models with classification accuracies of 83.5% and 56.9%, respectively. This study demonstrates that the machine learning approach can be used to identify and optimize a small subset of classifier/biomarker genes from high-dimensional datasets and to generate classification models of acceptable precision for multiple classes.
Case based reasoning applied to medical diagnosis using multi-class classifier: A preliminary study

Directory of Open Access Journals (Sweden)

D. Viveros-Melo

2017-02-01

Case-based reasoning (CBR) is a computational process that tries to mimic the behavior of a human expert in making decisions on a subject and to learn from the experience of past cases. CBR has proven appropriate for unstructured domain data and for situations where knowledge acquisition is difficult, such as medical diagnosis, where it has been applied to cancer diagnosis, epilepsy prediction and appendicitis diagnosis. Some of the trends that may be developed for CBR in the health sciences aim to reduce the number of features in high-dimensional data. An important contribution may be the estimation of the probability that a new case belongs to each class. In this paper, to represent the database adequately and to avoid the problems caused by high dimensionality, noise and redundancy, several algorithms are used in the preprocessing stage to perform both variable selection and dimension reduction. In addition, the performance of several representative multi-class classifiers is compared to identify the most effective one to include within a CBR scheme. Specifically, four classification techniques and two reduction techniques are employed in a comparative study of multiclass classifiers on CBR.
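One common way for a CBR system to estimate the probability that a new case belongs to each class is to retrieve its k most similar past cases and use the class frequencies among them. The sketch below assumes Euclidean similarity and a hypothetical case base; the abstract does not say which four classifiers were compared, so this is not presented as one of them.

```python
import math
from collections import Counter


def retrieve_and_estimate(case_base, query, k=3):
    """Retrieve the k most similar past cases and estimate class-membership probabilities.

    case_base: list of (feature_vector, diagnosis) pairs.
    Returns {diagnosis: fraction of the k retrieved cases with that diagnosis}.
    """
    nearest = sorted(case_base, key=lambda case: math.dist(case[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return {label: count / k for label, count in votes.items()}
```

The retrieved cases double as the "explanation" a CBR system can show alongside the estimate, which is one reason this retrieval step is popular in medical settings.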
Action Recognition Using 3D Histograms of Texture and A Multi-Class Boosting Classifier.

Science.gov (United States)

Zhang, Baochang; Yang, Yun; Chen, Chen; Yang, Linlin; Han, Jungong; Shao, Ling

2017-10-01

Human action recognition is an important yet challenging task. This paper presents a low-cost descriptor called 3D histograms of texture (3DHoTs) to extract discriminant features from a sequence of depth maps. 3DHoTs are derived from projecting depth frames onto three orthogonal Cartesian planes, i.e., the frontal, side, and top planes, and thus compactly characterize the salient information of a specific action, on which texture features are calculated to represent the action. Besides this fast feature descriptor, a new multi-class boosting classifier (MBC) is also proposed to efficiently exploit different kinds of features in a unified framework for action classification. Compared with existing boosting frameworks, we add a new multi-class constraint to the objective function, which helps maintain a better margin distribution by maximizing the mean of the margin while still minimizing its variance. Experiments on the MSRAction3D, MSRGesture3D, MSRActivity3D, and UTD-MHAD data sets demonstrate that the proposed system combining 3DHoTs and MBC is superior to the state of the art.
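The margin statistics mentioned above can be made concrete. The sketch assumes a common multi-class margin definition (true-class score minus the best competing score) and a hypothetical trade-off weight `lam`; it only evaluates the quantity, whereas the paper builds the constraint into the boosting objective itself.

```python
from statistics import mean, pvariance


def multiclass_margins(scores, labels):
    """Margin of each sample: its true-class score minus the best competing score."""
    margins = []
    for row, y in zip(scores, labels):
        best_other = max(s for k, s in enumerate(row) if k != y)
        margins.append(row[y] - best_other)
    return margins


def margin_objective(scores, labels, lam=1.0):
    """Larger is better: reward a high mean margin, penalise the margin variance."""
    m = multiclass_margins(scores, labels)
    return mean(m) - lam * pvariance(m)
```

A negative margin marks a misclassified sample, so raising the mean while shrinking the variance pushes the whole margin distribution to the positive side rather than only improving the easy samples.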
An overview of sample preparation procedures for LC-MS multiclass antibiotic determination in environmental and food samples.

Science.gov (United States)

Moreno-Bondi, María Cruz; Marazuela, María Dolores; Herranz, Sonia; Rodriguez, Erika

2009-10-01

Antibiotics are a class of pharmaceuticals that are of great interest due to the large volumes of these substances that are consumed in both human and veterinary medicine, and due to their status as the agents responsible for bacterial resistance. They can be present in foodstuffs and in environmental samples as multicomponent chemical mixtures that exhibit a wide range of mechanisms of action. Moreover, they can be transformed into different metabolites by the action of microorganisms, as well as by other physical or chemical means, resulting in mixtures with higher ecotoxicities and risks to human health than those of the individual compounds. Therefore, there is growing interest in the availability of multiclass methods for the analysis of antimicrobial mixtures in environmental and food samples at very low concentrations. Liquid chromatography (LC) has become the technique of choice for multiclass analysis, especially when coupled to mass spectrometry (LC-MS) and tandem MS (LC-MS²). However, due to the complexity of the matrix, in most cases an extraction step for sample clean-up and preconcentration is required before analysis in order to achieve the required sensitivities. This paper reviews the most recent developments and applications of multiclass antimicrobial determination in environmental and food matrices, emphasizing the practical aspects of sample preparation for the simultaneous extraction of antimicrobials from the selected samples. Future trends in the application of LC-MS-based techniques to multiclass antibiotic analysis are also presented.
Multiclass semi-supervised learning for animal behavior recognition from accelerometer data

NARCIS (Netherlands)

Tanha, J.; van Someren, M.; de Bakker, M.; Bouten, W.; Shamoun-Baranes, J.; Afsarmanesh, H.

2012-01-01

In this paper we present a new multiclass semi-supervised learning algorithm that uses a base classifier in combination with a similarity function applied to all data to find a classifier that maximizes the margin and consistency over all data. A novel multiclass loss function is presented and used
The use of hyperspectral data for tree species discrimination: Combining binary classifiers

CSIR Research Space (South Africa)

Dastile, X

2010-11-01

... For 5-nearest-neighbour classification: assign the new sample to class 1. ... Given a learning task $\{(\mathbf{x}_1,t_1),(\mathbf{x}_2,t_2),\dots,(\mathbf{x}_p,t_p)\}$ ($\mathbf{x}_i \in \mathbb{R}^n$ feature vectors, $t_i \in \{\omega_1,\dots,\omega_c\}$...). ... A review on the combination of binary classifiers in multiclass problems. Springer Science and Business Media B.V. [7] Dietterich, T.G. and Bakiri, G. (1995). Solving Multiclass Learning Problems via Error-Correcting Output Codes. AI Access Foundation...
EEG classification for motor imagery and resting state in BCI applications using multi-class Adaboost extreme learning machine

Science.gov (United States)

Gao, Lin; Cheng, Wei; Zhang, Jinhua; Wang, Jue

2016-08-01

Brain-computer interface (BCI) systems provide an alternative communication and control approach for people with limited motor function. The feature extraction and classification approach must therefore distinguish the relatively unusual state of motion intention from the common resting state. In this paper, we sought a novel approach for multi-class classification in BCI applications. We collected electroencephalographic (EEG) signals, recorded by electrodes placed over the scalp, during left-hand motor imagery, right-hand motor imagery, and the resting state for ten healthy human subjects. We proposed using Kolmogorov complexity (Kc) for feature extraction and a multi-class AdaBoost classifier with an extreme learning machine as the base classifier to classify the three-class EEG samples. An average classification accuracy of 79.5% was obtained across the ten subjects, which greatly outperformed commonly used approaches. We therefore conclude that the proposed method could improve the classification of motor imagery tasks for multi-class samples. It could be applied in further studies to generate control commands that initiate the movement of a robotic exoskeleton or orthosis, ultimately facilitating the rehabilitation of disabled people.
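The abstract names Kolmogorov complexity as the EEG feature but not how it is estimated. A frequent practical estimate is the Lempel-Ziv (LZ76) complexity of the signal after binarising it around its median; the sketch below follows that convention, which is an assumption here, and does not cover the AdaBoost and extreme-learning-machine classifier.

```python
from statistics import median


def binarize(signal):
    """Map each sample to '1' if it lies above the signal's median, else '0'."""
    m = median(signal)
    return "".join("1" if v > m else "0" for v in signal)


def lz76_complexity(s):
    """Number of distinct words in the LZ76 parsing (Kaspar-Schuster counting scheme)."""
    n = len(s)
    if n < 2:
        return n
    i, k, l = 0, 1, 1
    c, k_max = 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:  # ran off the end while copying: count the final word
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:  # no earlier start reproduces this component: new word
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c
```

For instance, `"0001101001000101"` parses as 0 · 001 · 10 · 100 · 1000 · 101, giving a complexity of 6, while a constant string scores only 2. In practice the count is often normalised by n / log2(n) so that windows of different lengths are comparable.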
Single classifier, OvO, OvA and RCC multiclass classification method in handheld based smartphone gait identification

Science.gov (United States)

Raziff, Abdul Rafiez Abdul; Sulaiman, Md Nasir; Mustapha, Norwati; Perumal, Thinagaran

2017-10-01

Gait recognition is widely used in many applications. In gait-based identification of people, the number of classes (people) is large and may exceed 20. Because of the large number of classes, a single classification mapping (direct classification) may not be suitable, as most existing algorithms are designed for binary classification. Furthermore, having many classes in a dataset may lead to a high degree of overlap between class boundaries. This paper discusses the application of multiclass classifier mappings such as one-vs-all (OvA), one-vs-one (OvO) and random correction code (RCC) to handheld smartphone gait signals for person identification. The results are then compared with a single J48 decision tree as a benchmark. The results indicate that multiclass classification mapping partially improved overall accuracy, especially for OvO and for RCC with a width factor greater than 4. For OvA, accuracy was worse than that of a single J48 tree because of the high number of classes.
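One-vs-one mapping trains one binary classifier per class pair, that is c(c-1)/2 of them (190 for 20 people), and predicts by majority vote. A minimal sketch follows; the 1-D "gait feature" prototypes and the nearest-prototype pairwise rule in the usage example are hypothetical stand-ins for trained J48 trees.

```python
from collections import Counter
from itertools import combinations


def ovo_predict(classes, pairwise_decide, x):
    """One-vs-one: query every class pair and return the class with the most votes.

    pairwise_decide(a, b, x) returns a or b. Ties go to the class listed first.
    """
    votes = Counter({c: 0 for c in classes})
    for a, b in combinations(classes, 2):
        votes[pairwise_decide(a, b, x)] += 1
    return max(classes, key=lambda c: votes[c])


# Usage with hypothetical prototypes: each pairwise "classifier" picks the nearer one.
protos = {"alice": 1.0, "bob": 2.0, "carol": 3.0}


def nearer(a, b, x):
    return a if abs(x - protos[a]) <= abs(x - protos[b]) else b


print(ovo_predict(list(protos), nearer, 2.2))  # bob wins 2 of its 2 duels
```

The quadratic growth in the number of pairwise models is the price OvO pays; each model, however, only sees the data of two classes, which can ease the overlapping-boundary problem described above.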
Gene-Based Multiclass Cancer Diagnosis with Class-Selective Rejections

Science.gov (United States)

Jrad, Nisrine; Grall-Maës, Edith; Beauseroy, Pierre

2009-01-01

Supervised learning of microarray data has received much attention in recent years. Multiclass cancer diagnosis based on selected gene profiles is used as an adjunct to clinical diagnosis. However, supervised diagnosis may hinder patient care, add expense or confound a result. To avoid such misleading outcomes, a multiclass cancer diagnosis with class-selective rejection is proposed. It rejects some patients from one, some, or all classes in order to ensure higher reliability while reducing time and expense. Moreover, this classifier takes into account asymmetric penalties that depend on each class and on each wrong or partially correct decision. It is based on the ν-1-SVM coupled with its regularization path, and it minimizes a general loss function defined in the class-selective rejection scheme. State-of-the-art multiclass algorithms can be considered particular cases of the proposed algorithm in which the number of decisions is given by the classes and the loss function is defined by the Bayesian risk. Two experiments are carried out, in the Bayesian and the class-selective rejection frameworks. Five datasets of selected genes are used to assess the performance of the proposed method. Results are discussed, and accuracies are compared with those of the Naive Bayes, Nearest Neighbor, Linear Perceptron, Multilayer Perceptron, and Support Vector Machine classifiers. PMID:19584932
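The decision side of class-selective rejection can be illustrated with a Bayes rule that may return several classes instead of one. This sketch assumes calibrated posterior probabilities and a simple symmetric loss with a hypothetical penalty `alpha` per extra class; the paper instead uses a ν-1-SVM and class-dependent, asymmetric penalties.

```python
def class_selective_decision(posteriors, alpha):
    """Bayes decision with class-selective rejection under a simple symmetric loss.

    Deciding a subset S costs 1 if the true class is outside S and alpha*(|S|-1)
    if it is inside (a penalty for every extra class kept). Only the m most probable
    classes need checking for each size m: any subset whose penalty reaches 1 has
    risk >= 1, which the single most probable class always beats.
    Returns the chosen class indices, most probable first.
    """
    order = sorted(range(len(posteriors)), key=lambda k: -posteriors[k])
    best_subset, best_risk = None, float("inf")
    covered = 0.0
    for m in range(1, len(order) + 1):
        covered += posteriors[order[m - 1]]
        risk = (1 - covered) + covered * alpha * (m - 1)
        if risk < best_risk:
            best_subset, best_risk = order[:m], risk
    return best_subset
```

With posteriors `[0.5, 0.45, 0.05]` and `alpha=0.1`, the rule keeps the two competing diagnoses `[0, 1]` rather than guessing; with `[0.9, 0.05, 0.05]` it commits to class `0`.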
Multi-class Mode of Action Classification of Toxic Compounds Using Logic Based Kernel Methods.

Science.gov (United States)

Lodhi, Huma; Muggleton, Stephen; Sternberg, Mike J E

2010-09-17

Toxicity prediction is essential for drug design and the development of effective therapeutics. In this paper we present an in silico strategy for identifying the mode of action of toxic compounds, based on a novel logic-based kernel method. The technique uses support vector machines in conjunction with kernels constructed from first-order rules induced by an Inductive Logic Programming system. It constructs multi-class models using a divide-and-conquer reduction strategy that splits the classes into binary groups and solves each individual problem recursively, thereby generating an underlying decision-list structure. To evaluate the effectiveness of the approach on chemoinformatics problems such as predictive toxicology, we apply it to toxicity classification in aquatic systems. The method is used to identify and classify 442 compounds with respect to their mode of action. The experimental results show that the technique successfully classifies toxic compounds and can be useful in assessing environmental risks. Experimental comparison of the proposed multi-class scheme with the standard multi-class Inductive Logic Programming algorithm and a multi-class Support Vector Machine yields statistically significant results and demonstrates the potential power and benefits of the approach in identifying compounds with various toxic mechanisms. Copyright © 2010 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
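The divide-and-conquer reduction can be sketched as a recursive split of the class set into two groups, with one binary classifier per internal node. The default split below peels off one class at a time, which yields a decision-list shape; the class names, the nearest-prototype `go_left` rule, and the split rule itself are assumptions, and the ILP-derived kernels are not shown.

```python
def peel_one(classes):
    """Default split: first class versus the rest (produces a decision list)."""
    return classes[:1], classes[1:]


def build_tree(classes, split=peel_one):
    """Recursively split the class list until every leaf holds a single class."""
    if len(classes) == 1:
        return classes[0]
    left, right = split(classes)
    return (build_tree(left, split), build_tree(right, split))


def leaves(tree):
    return leaves(tree[0]) + leaves(tree[1]) if isinstance(tree, tuple) else [tree]


def predict(tree, x, go_left):
    """Walk down the tree; go_left(left_classes, right_classes, x) is a binary classifier."""
    while isinstance(tree, tuple):
        left, right = tree
        tree = left if go_left(leaves(left), leaves(right), x) else right
    return tree


# Usage with hypothetical 1-D prototypes for four modes of action.
protos = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0}


def go_left(left, right, x):
    return min(abs(x - protos[c]) for c in left) < min(abs(x - protos[c]) for c in right)


tree = build_tree(["A", "B", "C", "D"])  # ("A", ("B", ("C", "D")))
print(predict(tree, 2.1, go_left))      # "C"
```

Any split rule gives c - 1 binary problems for c classes; passing a halving rule instead of `peel_one` produces a balanced tree like the one used in the binary-tree SVM above.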
Multiclass Boosting with Adaptive Group-Based kNN and Its Application in Text Categorization

Directory of Open Access Journals (Sweden)

Lei La

2012-01-01

AdaBoost is an excellent committee-based tool for classification. However, its effectiveness and efficiency in multiclass categorization are challenged by methods based on support vector machines (SVM), neural networks (NN), naïve Bayes, and k-nearest neighbor (kNN). This paper uses a novel multi-class AdaBoost algorithm that avoids reducing the multi-class classification problem to multiple two-class classification problems; this makes the method more effective while keeping the accuracy advantage of existing AdaBoost. An adaptive group-based kNN method is proposed to build more accurate weak classifiers and thereby keep the number of base classifiers within an acceptable range. To further enhance performance, the weak classifiers are combined into a strong classifier through double iterative weighting, yielding an adaptive group-based kNN boosting algorithm (AGkNN-AdaBoost). We implement AGkNN-AdaBoost in a Chinese text categorization system. Experimental results show that the proposed classification algorithm achieves better precision and recall than many other text categorization methods, including traditional AdaBoost. In addition, its processing speed is significantly higher than that of the original AdaBoost and many other classic categorization algorithms.
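The abstract does not give the weighting formulas of AGkNN-AdaBoost. For comparison, SAMME is a well-known multi-class AdaBoost variant that likewise avoids reduction to binary problems; one round of its update is sketched below as an illustration, not as the paper's algorithm. It assumes a weighted error strictly between 0 and 1.

```python
import math


def samme_round(weights, misclassified, n_classes):
    """One SAMME boosting round: the weak learner's vote alpha and re-normalised weights.

    misclassified[i] is True when the weak learner got sample i wrong.
    """
    err = sum(w for w, miss in zip(weights, misclassified) if miss) / sum(weights)
    # The log(K - 1) term keeps alpha > 0 whenever the learner beats random guessing
    # (err < 1 - 1/K), instead of requiring err < 1/2 as binary AdaBoost does.
    alpha = math.log((1 - err) / err) + math.log(n_classes - 1)
    # Only misclassified samples are up-weighted; then everything is normalised.
    new = [w * math.exp(alpha) if miss else w for w, miss in zip(weights, misclassified)]
    total = sum(new)
    return alpha, [w / total for w in new]
```

With four equally weighted samples, one of them misclassified, and three classes, alpha is log 6 and the misclassified sample's weight rises from 1/4 to 2/3.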
Detection of circuit-board components with an adaptive multiclass correlation filter

Science.gov (United States)

Diaz-Ramirez, Victor H.; Kober, Vitaly

2008-08-01

A new method for reliable detection of circuit-board components is proposed. The method is based on an adaptive multiclass composite correlation filter. The filter is designed with the help of an iterative algorithm using complex synthetic discriminant functions. The impulse response of the filter contains information needed to localize and classify geometrically distorted circuit-board components belonging to different classes. Computer simulation results obtained with the proposed method are provided and compared with those of known multiclass correlation based techniques in terms of performance criteria for recognition and classification of objects.
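The abstract does not spell out the filter design. For orientation, the classical equal-correlation-peak synthetic discriminant function (SDF) filter, which composite correlation filters generalise, is written below. The symbols are introduced here: the columns of $\mathbf{X}$ are the training images (assumed linearly independent), $\mathbf{c}$ holds the desired correlation-peak values, and $\mathbf{X}^{+}$ denotes the conjugate transpose.

```latex
% Conventional SDF filter: its correlation peak with the i-th training image equals c_i.
\mathbf{h} = \mathbf{X}\left(\mathbf{X}^{+}\mathbf{X}\right)^{-1}\mathbf{c},
\qquad
\mathbf{X}^{+}\mathbf{h}
  = \mathbf{X}^{+}\mathbf{X}\left(\mathbf{X}^{+}\mathbf{X}\right)^{-1}\mathbf{c}
  = \mathbf{c}.
```

Choosing complex entries in $\mathbf{c}$ is one way a single filter can encode several classes, which is the setting the adaptive multiclass filter above works in.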
Circular blurred shape model for multiclass symbol recognition.

Science.gov (United States)

Escalera, Sergio; Fornés, Alicia; Pujol, Oriol; Lladós, Josep; Radeva, Petia

2011-04-01

In this paper, we propose a circular blurred shape model descriptor to deal with the problem of symbol detection and classification as a particular case of object recognition. The feature extraction is performed by capturing the spatial arrangement of significant object characteristics in a correlogram structure. The shape information from objects is shared among correlogram regions, where a prior blurring degree defines the level of distortion allowed in the symbol, making the descriptor tolerant to irregular deformations. Moreover, the descriptor is rotation invariant by definition. We validate the effectiveness of the proposed descriptor in both the multiclass symbol recognition and symbol detection domains. In order to perform the symbol detection, the descriptors are learned using a cascade of classifiers. In the case of multiclass categorization, the new feature space is learned using a set of binary classifiers which are embedded in an error-correcting output code design. The results over four symbol data sets show the significant improvements of the proposed descriptor compared to the state-of-the-art descriptors. In particular, the results are even more significant in those cases where the symbols suffer from elastic deformations.
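In an error-correcting output code design, each class is assigned a codeword over the binary classifiers, and a sample is assigned to the class with the nearest codeword. The sketch below uses Hamming decoding and the 7-bit exhaustive code for four classes; the abstract does not say which code matrix or decoding rule the authors used, so both are assumptions.

```python
# 7-bit exhaustive code for four classes, written with +1/-1 entries.
# Every pair of codewords differs in 4 positions, so one wrong bit is still decoded correctly.
EXHAUSTIVE_4 = {
    "c0": (1, 1, 1, 1, 1, 1, 1),
    "c1": (-1, -1, -1, -1, 1, 1, 1),
    "c2": (-1, -1, 1, 1, -1, -1, 1),
    "c3": (-1, 1, -1, 1, -1, 1, -1),
}


def ecoc_decode(code_matrix, bit_predictions):
    """Return the class whose codeword disagrees with the fewest binary predictions."""
    def hamming(codeword):
        return sum(c != b for c, b in zip(codeword, bit_predictions))

    return min(code_matrix, key=lambda cls: hamming(code_matrix[cls]))
```

If the first binary classifier errs on a sample of class `c2`, the predictions `(1, -1, 1, 1, -1, -1, 1)` are still one bit from `c2` and at least three bits from every other codeword, so the sample is decoded as `c2`.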
SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition.

Science.gov (United States)

Melvin, Iain; Ie, Eugene; Kuang, Rui; Weston, Jason; Stafford, William Noble; Leslie, Christina

2007-05-22

Predicting a protein's structural class from its amino acid sequence is a fundamental problem in computational biology. Much recent work has focused on developing new representations for protein sequences, called string kernels, for use with support vector machine (SVM) classifiers. However, while some of these approaches exhibit state-of-the-art performance at the binary protein classification problem, i.e. discriminating between a particular protein class and all other classes, few of these studies have addressed the real problem of multi-class superfamily or fold recognition. Moreover, there are only limited software tools and systems for SVM-based protein classification available to the bioinformatics community. We present a new multi-class SVM-based protein fold and superfamily recognition system and web server called SVM-Fold, which can be found at http://svm-fold.c2b2.columbia.edu. Our system uses an efficient implementation of a state-of-the-art string kernel for sequence profiles, called the profile kernel, where the underlying feature representation is a histogram of inexact matching k-mer frequencies. We also employ a novel machine learning approach to solve the difficult multi-class problem of classifying a sequence of amino acids into one of many known protein structural classes. Binary one-vs-the-rest SVM classifiers that are trained to recognize individual structural classes yield prediction scores that are not comparable, so that standard "one-vs-all" classification fails to perform well. Moreover, SVMs for classes at different levels of the protein structural hierarchy may make useful predictions, but one-vs-all does not try to combine these multiple predictions. To deal with these problems, our method learns relative weights between one-vs-the-rest classifiers and encodes information about the protein structural hierarchy for multi-class prediction. In large-scale benchmark results based on the SCOP database, our code weighting approach
Multiclass gene selection using Pareto-fronts.

Science.gov (United States)

Rajapakse, Jagath C; Mundra, Piyushkumar A

2013-01-01

Filter methods are often used to select genes for multiclass sample classification using microarray data. Such techniques tend to be biased toward a few classes that are easily distinguishable from other classes, owing to imbalances in strong features and in the sample sizes of different classes. This can lead to the selection of redundant genes while relevant genes are missed, resulting in poor classification of tissue samples. In this manuscript, we propose to decompose multiclass ranking statistics into class-specific statistics and then use Pareto-front analysis to select genes. This alleviates the bias induced by the intrinsic characteristics of dominating classes. The use of Pareto-front analysis is demonstrated on two filter criteria commonly used for gene selection: F-score and KW-score. A significant improvement in classification performance and a reduction in redundancy among top-ranked genes were achieved in experiments with both synthetic and real benchmark data sets.
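The Pareto-front step can be sketched directly: a gene is kept if no other gene is at least as good on every class-specific statistic and strictly better on at least one. Scores and gene names below are hypothetical; repeatedly removing the front and recomputing would give a ranking, which is one possible use rather than necessarily the authors' procedure.

```python
def pareto_front(scores):
    """Return the genes not dominated on their class-specific statistics.

    scores: {gene: tuple of class-specific scores, higher = more discriminative}.
    Gene g dominates h if g >= h for every class and g > h for at least one.
    """
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return sorted(g for g, s in scores.items()
                  if not any(dominates(t, s) for h, t in scores.items() if h != g))
```

For `{"g1": (5, 1, 1), "g2": (1, 5, 1), "g3": (4, 0, 0), "g4": (1, 1, 5), "g5": (2, 2, 2)}` the front is `["g1", "g2", "g4", "g5"]`: each class keeps its own specialist gene, and only `g3`, which `g1` beats on every class, is dropped.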

Evaluation of Classifier Performance for Multiclass Phenotype Discrimination in Untargeted Metabolomics.

Science.gov (United States)

Trainor, Patrick J; DeFilippis, Andrew P; Rai, Shesh N

2017-06-21

Statistical classification is a critical component of utilizing metabolomics data for examining the molecular determinants of phenotypes. Despite this, a comprehensive and rigorous evaluation of the accuracy of classification techniques for phenotype discrimination given metabolomics data has not been conducted. We conducted such an evaluation using both simulated and real metabolomics datasets, comparing Partial Least Squares-Discriminant Analysis (PLS-DA), Sparse PLS-DA, Random Forests, Support Vector Machines (SVM), Artificial Neural Network, k-Nearest Neighbors (k-NN), and Naïve Bayes classification techniques for discrimination. We evaluated the techniques on simulated data generated to mimic global untargeted metabolomics data by incorporating realistic block-wise correlation and partial correlation structures for mimicking the correlations and metabolite clustering generated by biological processes. Over the simulation studies, covariance structures, means, and effect sizes were stochastically varied to provide consistent estimates of classifier performance over a wide range of possible scenarios. The effects of the presence of non-normal error distributions, the introduction of biological and technical outliers, unbalanced phenotype allocation, missing values due to abundances below a limit of detection, and the effect of prior-significance filtering (dimension reduction) were evaluated via simulation. In each simulation, classifier parameters, such as the number of hidden nodes in a Neural Network, were optimized by cross-validation to minimize the probability of detecting spurious results due to poorly tuned classifiers. Classifier performance was then evaluated using real metabolomics datasets of varying sample medium, sample size, and experimental design. We report that in the most realistic simulation studies that incorporated non-normal error distributions, unbalanced phenotype allocation, outliers, missing values, and dimension reduction
Universum Learning for Multiclass SVM

OpenAIRE

Dhar, Sauptik; Ramakrishnan, Naveen; Cherkassky, Vladimir; Shah, Mohak

2016-01-01

We introduce Universum learning for multiclass problems and propose a novel formulation for multiclass universum SVM (MU-SVM). We also propose a span bound for MU-SVM that can be used for model selection thereby avoiding resampling. Empirical results demonstrate the effectiveness of MU-SVM and the proposed bound.
Fisher classifier and its probability of error estimation

Science.gov (United States)

Chittineni, C. B.

1979-01-01

Computationally efficient expressions are derived for estimating the probability of error using the leave-one-out method. The optimal threshold for the classification of patterns projected onto Fisher's direction is derived. A simple generalization of the Fisher classifier to multiple classes is presented. Computational expressions are developed for estimating the probability of error of the multiclass Fisher classifier.
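A two-class Fisher classifier in two dimensions can be sketched in plain Python. The direction is $\mathbf{w} = S_W^{-1}(\mathbf{m}_1 - \mathbf{m}_2)$, with $S_W$ the within-class scatter matrix; the threshold here is simply the midpoint of the projected class means, an equal-prior assumption rather than the optimal threshold derived in the paper, and the leave-one-out error estimate is not shown.

```python
def fisher_direction(class1, class2):
    """Two-class Fisher discriminant for 2-D points: w = S_W^{-1} (m1 - m2), plus a threshold."""
    def mean(pts):
        return [sum(c) / len(pts) for c in zip(*pts)]

    m1, m2 = mean(class1), mean(class2)
    # Within-class scatter S_W = sum over both classes of (x - m)(x - m)^T.
    sxx = sxy = syy = 0.0
    for pts, m in ((class1, m1), (class2, m2)):
        for x, y in pts:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    det = sxx * syy - sxy * sxy
    # Closed-form 2x2 inverse of S_W applied to (m1 - m2).
    d0, d1 = m1[0] - m2[0], m1[1] - m2[1]
    w = [(syy * d0 - sxy * d1) / det, (-sxy * d0 + sxx * d1) / det]
    # Midpoint of the projected class means (equal-prior assumption).
    threshold = (sum(a * b for a, b in zip(w, m1)) + sum(a * b for a, b in zip(w, m2))) / 2
    return w, threshold


def classify(w, threshold, point):
    """Class 1 projects above the threshold, since w^T (m1 - m2) > 0 by construction."""
    return 1 if w[0] * point[0] + w[1] * point[1] > threshold else 2
```

For two square clusters centred at (1, 1) and (5, 5), the sketch returns w = [-0.5, -0.5] and threshold -3.0, so (1, 1) is assigned to class 1 and (5, 5) to class 2.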
Least Square Support Vector Machine Classifier vs a Logistic Regression Classifier on the Recognition of Numeric Digits

Directory of Open Access Journals (Sweden)

Danilo A. López-Sarmiento

2013-11-01

Full Text Available This paper compares the performance of a multi-class least squares support vector machine (LSSVM mc) with that of a multi-class logistic regression classifier on the problem of recognizing handwritten numeric digits (0-9). The comparison used a data set of 5000 images of handwritten numeric digits (500 images for each digit from 0 to 9), each image being 20 x 20 pixels. The inputs to each system were 400-dimensional vectors, one per image (no feature extraction was performed). Both classifiers used the OneVsAll strategy to enable multi-class classification and a random cross-validation function in the process of minimizing the cost function. The comparison metrics were precision and training time under the same computational conditions. Both techniques achieved a precision above 95%, with LS-SVM slightly more accurate. However, we found a marked difference in computational cost: LS-SVM training requires 16.42% less time than the logistic regression model under the same low computational conditions.
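The OneVsAll strategy mentioned above trains one binary classifier per class ("this class" versus "all others") and predicts the class whose classifier scores highest. The sketch below shows that strategy with a binary logistic regression trained by plain batch gradient descent. The training procedure, learning rate, epoch count and the tiny 2-D data set are assumptions for illustration only; they are not the paper's setup, and the LS-SVM side of the comparison is not sketched.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_binary(xs, ys, lr=0.1, epochs=2000):
    # Batch gradient descent on the logistic (cross-entropy) loss; w[0] is the bias.
    w = [0.0] * (len(xs[0]) + 1)
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w

def train_one_vs_all(xs, labels):
    # One binary model per class: that class (target 1) versus all the others (target 0).
    return {c: train_binary(xs, [1 if l == c else 0 for l in labels])
            for c in sorted(set(labels))}

def predict(models, x):
    # Choose the class whose one-versus-rest model gives the highest linear score.
    def score(w):
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return max(models, key=lambda c: score(models[c]))
```

With K classes this trains K binary models; for the ten digits in the study that would mean ten models, each seeing all 5000 images.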
Gene-Based Multiclass Cancer Diagnosis with Class-Selective Rejections

Directory of Open Access Journals (Sweden)

Nisrine Jrad

2009-01-01

rejection scheme. State-of-the-art multiclass algorithms can be considered a particular case of the proposed algorithm in which the number of decisions is given by the classes and the loss function is defined by the Bayesian risk. Two experiments are carried out, in the Bayesian and in the class-selective rejection frameworks. Five datasets of selected genes are used to assess the performance of the proposed method. Results are discussed, and accuracies are compared with those obtained by the Naive Bayes, Nearest Neighbor, Linear Perceptron, Multilayer Perceptron, and Support Vector Machines classifiers.
Computationally efficient SVM multi-class image recognition with confidence measures

International Nuclear Information System (INIS)

Makili, Lazaro; Vega, Jesus; Dormido-Canto, Sebastian; Pastor, Ignacio; Murari, Andrea

2011-01-01

Typically, machine learning methods produce non-qualified estimates, i.e., the accuracy and reliability of the predictions are not provided. Transductive predictors are very recent classifiers able to provide, simultaneously with the prediction, a pair of values (confidence and credibility) that reflect the quality of the prediction. A common drawback of transductive techniques on huge, high-dimensional datasets is their high computation time. To overcome this issue, a more efficient classifier has been used in a multi-class image classification problem in the TJ-II stellarator database. It is based on the creation of a hash function to generate several 'one versus the rest' classifiers for every class. Using Support Vector Machines as the underlying classifier, the pure transductive approach and the new method were compared. In both cases the success rates are high, and the computation time with the new method is up to 0.4 times that of the old one.
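The confidence and credibility values come from conformal (transductive) prediction: each candidate label gets a p-value from nonconformity scores, the prediction is the label with the largest p-value, credibility is that largest p-value, and confidence is one minus the second-largest. The sketch below shows only that final step and assumes the nonconformity scores are already available (for example derived from SVM outputs). How they are computed, and the paper's hashing scheme, are not reproduced; the input format is a hypothetical choice.

```python
def p_value(other_scores, test_score):
    # Fraction of nonconformity scores at least as large as the test example's,
    # counting the test example itself in both numerator and denominator.
    at_least = sum(1 for s in other_scores if s >= test_score)
    return (at_least + 1) / (len(other_scores) + 1)

def conformal_predict(scores_by_label):
    """scores_by_label maps each candidate label to
    (nonconformity scores of the other examples, score of the test example under that label)."""
    p = {label: p_value(others, s) for label, (others, s) in scores_by_label.items()}
    ranked = sorted(p.values(), reverse=True)
    prediction = max(p, key=p.get)
    credibility = ranked[0]       # largest p-value
    confidence = 1.0 - ranked[1]  # one minus the second-largest p-value
    return prediction, confidence, credibility
```

A high confidence says the alternatives are implausible; a low credibility warns that even the chosen label fits the training data poorly.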
The Construction of Support Vector Machine Classifier Using the Firefly Algorithm

Directory of Open Access Journals (Sweden)

Chih-Feng Chao

2015-01-01

Full Text Available The setting of parameters in support vector machines (SVMs) is very important for their accuracy and efficiency. In this paper, we employ the firefly algorithm to train all parameters of the SVM simultaneously, including the penalty parameter, smoothness parameter, and Lagrangian multiplier. The proposed method is called the firefly-based SVM (firefly-SVM). Feature selection is not considered in this tool, because an SVM combined with feature selection is not suitable for multiclass classification, especially for the one-against-all multiclass SVM. In the experiments, both binary and multiclass classification are explored. For binary classification, ten benchmark data sets from the University of California, Irvine (UCI) machine learning repository are used; in addition, the firefly-SVM is applied to the multiclass diagnosis of ultrasonic supraspinatus images. The classification performance of firefly-SVM is also compared with the original LIBSVM method combined with grid search and with the particle swarm optimization based SVM (PSO-SVM). The experimental results support the use of firefly-SVM for pattern classification to maximize accuracy.
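The core move of the firefly algorithm is simple: each firefly moves toward every brighter (lower-cost) one with attractiveness beta0 * exp(-gamma * r^2), plus a shrinking random step. The sketch below applies that rule to a generic objective. In the paper the parameters being searched include the SVM penalty, smoothness parameter and Lagrange multipliers; here the objective is a hypothetical stand-in (a quadratic in place of, say, cross-validation error over two hyperparameters), and all default settings are assumptions.

```python
import math
import random

def firefly_minimize(objective, bounds, n_fireflies=15, generations=100,
                     beta0=1.0, gamma=0.01, alpha=0.25, alpha_decay=0.97, seed=0):
    rng = random.Random(seed)
    widths = [hi - lo for lo, hi in bounds]
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_fireflies)]
    cost = [objective(p) for p in pos]
    best_i = min(range(n_fireflies), key=cost.__getitem__)
    best, best_cost = list(pos[best_i]), cost[best_i]
    for _ in range(generations):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # j is brighter (lower cost), so i moves toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                    pos[i] = [
                        # attraction + random step scaled to the search range, clamped to bounds
                        min(max(xi + beta * (xj - xi) + alpha * (rng.random() - 0.5) * w, lo), hi)
                        for xi, xj, w, (lo, hi) in zip(pos[i], pos[j], widths, bounds)
                    ]
                    cost[i] = objective(pos[i])
                    if cost[i] < best_cost:
                        best, best_cost = list(pos[i]), cost[i]
        alpha *= alpha_decay  # shrink the random step over time
    return best, best_cost
```

Usage, with a toy objective whose minimum is at (1, -2):

    best, c = firefly_minimize(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2, [(-5, 5), (-5, 5)])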
GenSVM: a generalized multiclass support vector machine

NARCIS (Netherlands)

G.J.J. van den Burg (Gertjan); P.J.F. Groenen (Patrick)

2016-01-01

Traditional extensions of the binary support vector machine (SVM) to multiclass problems are either heuristics or require solving a large dual optimization problem. Here, a generalized multiclass SVM called GenSVM is proposed. In this method classification boundaries for a K-class
Building gene expression profile classifiers with a simple and efficient rejection option in R.

Science.gov (United States)

Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco; Savino, Alessandro; Hafeezurrehman, Hafeez

2011-01-01

The collection of gene expression profiles from DNA microarrays and their analysis with pattern recognition algorithms is a powerful technology applied to several biological problems. Common pattern recognition systems classify samples by assigning them to a set of known classes. However, in a clinical diagnostics setup, novel and unknown classes (new pathologies) may appear, and one must be able to reject the samples that do not fit the trained model. The problem of implementing a rejection option in a multi-class classifier has not been widely addressed in the statistical literature. Gene expression profiles represent a critical case study, since they suffer from the curse of dimensionality, which negatively affects the reliability of both traditional rejection models and more recent approaches such as one-class classifiers. This paper presents a set of empirical decision rules that can be used to implement a rejection option in a set of multi-class classifiers widely used for the analysis of gene expression profiles. In particular, we focus on the classifiers implemented in the R Language and Environment for Statistical Computing (R for short in the remainder of this paper). The main contribution of the proposed rules is their simplicity, which enables easy integration with available data analysis environments. Since tuning the parameters of a rejection model is often a complex and delicate task, in this paper we exploit an evolutionary strategy to automate this process. This allows the final user to maximize the rejection accuracy with minimum manual intervention. This paper shows how simple decision rules can support the use of complex machine learning algorithms in real experimental setups. The proposed approach is almost completely automated and therefore a good candidate for being integrated in data analysis flows in labs where the machine learning expertise required to tune traditional
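A rejection option can sit on top of any multi-class classifier that outputs class posteriors. The sketch below uses one hypothetical pair of empirical rules, rejecting when the top posterior is too low or too close to the runner-up. These specific rules and their default thresholds are assumptions for illustration, not the rules proposed in the paper; in the paper's setting the thresholds would be tuned automatically (there, by an evolutionary strategy).

```python
def classify_with_rejection(posteriors, min_top=0.6, min_margin=0.2):
    """posteriors: dict label -> probability from any multi-class classifier.
    Returns the winning label, or None when the sample is rejected.
    The thresholds are hypothetical defaults and would normally be tuned."""
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    (top_label, top_p), (_, second_p) = ranked[0], ranked[1]
    if top_p < min_top:
        return None  # no class is convincing enough: possibly an unknown class
    if top_p - second_p < min_margin:
        return None  # two classes are too close to call
    return top_label
```

Rejected samples (None) would be flagged for expert review rather than forced into a known class.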
Improved Classification by Non Iterative and Ensemble Classifiers in Motor Fault Diagnosis

Directory of Open Access Journals (Sweden)

PANIGRAHY, P. S.

2018-02-01

Full Text Available A data-driven approach to multi-class fault diagnosis of induction motors using motor current signature analysis (MCSA) at steady state is a complex pattern classification problem. This investigation exploits the built-in ensemble process of non-iterative classifiers to resolve the most challenging issues in this area, including bearing and stator fault detection. Non-iterative techniques exhibit, on average, 15% higher fault classification accuracy than their iterative counterparts. In particular, random forest (RF) has shown outstanding performance even with fewer training samples and a noisy feature space because of its distributive feature model. The robustness of the results, backed by experimental verification, shows that non-iterative individual classifiers such as RF are the optimum choice for automatic fault diagnosis of induction motors.
Multiclass Posterior Probability Twin SVM for Motor Imagery EEG Classification.

Science.gov (United States)

She, Qingshan; Ma, Yuliang; Meng, Ming; Luo, Zhizeng

2015-01-01

Motor imagery electroencephalography is widely used in brain-computer interface systems. Due to inherent characteristics of electroencephalography signals, accurate and real-time multiclass classification is always challenging. To address this problem, this paper proposes a multiclass posterior probability solution for twin SVM based on ranking continuous output and pairwise coupling. First, a two-class posterior probability model is constructed to approximate the posterior probability using ranking continuous output techniques and Platt's estimating method. Second, a solution for multiclass probabilistic outputs for twin SVM is provided by combining every pair of class probabilities according to the method of pairwise coupling. Finally, the proposed method is compared with multiclass SVM and twin SVM via voting, and with multiclass posterior probability SVM using different coupling approaches. The efficacy of the proposed method in terms of classification accuracy and time complexity has been demonstrated on both the UCI benchmark datasets and real-world EEG data from BCI Competition IV Dataset 2a.
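The two steps described above can be sketched generically. Platt's method maps a binary classifier output f to a probability with a sigmoid 1 / (1 + exp(A*f + B)), where A and B are fitted on held-out data (fitting is not shown; the values below are assumed). Pairwise coupling then combines the pairwise estimates r[i][j] of P(class i | class i or j) into one distribution. The coupling rule used here is the one attributed to Price et al. (1995), chosen for brevity; it is not necessarily the coupling used in the paper, and the ranking-based continuous output of twin SVM is not reproduced.

```python
import math

def platt_probability(f, a, b):
    # Platt's sigmoid: estimated P(class i | pair (i, j), output f) = 1 / (1 + exp(a*f + b)).
    # a and b are assumed to have been fitted beforehand (a is typically negative).
    return 1.0 / (1.0 + math.exp(a * f + b))

def couple_pairwise(r):
    """r[i][j] estimates P(class i | class i or j) for i != j, with r[j][i] = 1 - r[i][j].
    Price et al. (1995) rule: p_i = 1 / (sum_{j != i} 1 / r[i][j] - (k - 2)), then renormalize."""
    k = len(r)
    raw = [1.0 / (sum(1.0 / r[i][j] for j in range(k) if j != i) - (k - 2)) for i in range(k)]
    total = sum(raw)
    return [p / total for p in raw]
```

When the pairwise estimates are exactly consistent (r[i][j] = p_i / (p_i + p_j)), this rule recovers p exactly; with noisy estimates the final renormalization keeps the output a valid distribution.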
Detection of surface cracking in steel pipes based on vibration data using a multi-class support vector machine classifier

Science.gov (United States)

Mustapha, S.; Braytee, A.; Ye, L.

2017-04-01

In this study, we focused on the development and verification of a robust framework for surface crack detection in steel pipes using measured vibration responses, in the presence of multiple progressive damage occurring at different locations within the structure. Feature selection, dimensionality reduction, and a multi-class support vector machine were established for this purpose. Nine damage cases, with different locations, orientations, and lengths, were introduced into the pipe structure. For each damage case, the pipe was impacted 300 times using an impact hammer, and the vibration data were collected using 3 PZT wafers installed on the outer surface of the pipe. First, damage-sensitive features were extracted using the frequency response function approach, followed by recursive feature elimination for dimensionality reduction. Then, a multi-class support vector machine learning algorithm was employed to train on the data and generate a statistical model. Once the model was established, decision values and distances from the hyperplane were generated for newly collected data using the trained model. This process was repeated on the data collected from each sensor. Overall, using a single sensor for training and testing led to a very high accuracy, reaching 98% in the assessment of the 9 damage cases used in this study.
Multi-view Multi-sparsity Kernel Reconstruction for Multi-class Image Classification

KAUST Repository

Zhu, Xiaofeng

2015-05-28

This paper addresses the problem of multi-class image classification by proposing a novel multi-view multi-sparsity kernel reconstruction (MMKR for short) model. Given images (including test images and training images) represented by multiple visual features, MMKR first maps them into a high-dimensional space, e.g., a reproducing kernel Hilbert space (RKHS), where test images are then linearly reconstructed from some representative training images rather than all of them. Furthermore, a classification rule is proposed to classify test images. Experimental results on real datasets show the effectiveness of the proposed MMKR compared with state-of-the-art algorithms.
Non-Mutually Exclusive Deep Neural Network Classifier for Combined Modes of Bearing Fault Diagnosis

Directory of Open Access Journals (Sweden)

Bach Phi Duong

2018-04-01

Full Text Available The simultaneous occurrence of various types of defects in bearings makes their diagnosis more challenging owing to the resultant complexity of the constituent parts of the acoustic emission (AE) signals. To address this issue, a new approach is proposed in this paper for the detection of multiple combined faults in bearings. The proposed methodology uses a deep neural network (DNN) architecture to effectively diagnose the combined defects. The DNN structure is based on the stacked denoising autoencoder non-mutually exclusive classifier (NMEC) method for combined modes. The NMEC-DNN is trained using data for a single fault and it classifies both single faults and multiple combined faults. The results of experiments conducted on AE data collected through an experimental test-bed demonstrate that the DNN achieves good classification performance with a maximum accuracy of 95%. The proposed method is compared with a multi-class classifier based on support vector machines (SVMs). The NMEC-DNN yields better diagnostic performance in comparison to the multi-class classifier based on SVM. The NMEC-DNN reduces the number of necessary data collections and improves the bearing fault diagnosis performance.
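The difference between a mutually exclusive multi-class decision and a non-mutually exclusive one can be shown at the output layer alone. A mutually exclusive classifier returns exactly one label (argmax), while giving each output its own sigmoid and threshold lets the network report several co-occurring faults, or none. The sketch below illustrates only that decoding step under assumed labels, logits and a 0.5 threshold; the stacked denoising autoencoder itself is not reproduced.

```python
import math

def argmax_decision(logits, labels):
    # Mutually exclusive: exactly one label wins (softmax would not change the argmax).
    return labels[max(range(len(labels)), key=lambda i: logits[i])]

def non_mutually_exclusive_decision(logits, labels, threshold=0.5):
    # Each output has its own sigmoid, so several faults can be reported at once;
    # an empty set means no fault exceeded the threshold.
    return {lab for z, lab in zip(logits, labels) if 1.0 / (1.0 + math.exp(-z)) >= threshold}
```

With hypothetical outputs [2.0, 1.5, -3.0] for the labels "inner race", "outer race" and "roller", the argmax reports only "inner race", whereas the non-exclusive decoder reports the combined inner- and outer-race fault.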
Non-Mutually Exclusive Deep Neural Network Classifier for Combined Modes of Bearing Fault Diagnosis.

Science.gov (United States)

Duong, Bach Phi; Kim, Jong-Myon

2018-04-07

The simultaneous occurrence of various types of defects in bearings makes their diagnosis more challenging owing to the resultant complexity of the constituent parts of the acoustic emission (AE) signals. To address this issue, a new approach is proposed in this paper for the detection of multiple combined faults in bearings. The proposed methodology uses a deep neural network (DNN) architecture to effectively diagnose the combined defects. The DNN structure is based on the stacked denoising autoencoder non-mutually exclusive classifier (NMEC) method for combined modes. The NMEC-DNN is trained using data for a single fault and it classifies both single faults and multiple combined faults. The results of experiments conducted on AE data collected through an experimental test-bed demonstrate that the DNN achieves good classification performance with a maximum accuracy of 95%. The proposed method is compared with a multi-class classifier based on support vector machines (SVMs). The NMEC-DNN yields better diagnostic performance in comparison to the multi-class classifier based on SVM. The NMEC-DNN reduces the number of necessary data collections and improves the bearing fault diagnosis performance.
Non-Mutually Exclusive Deep Neural Network Classifier for Combined Modes of Bearing Fault Diagnosis

Science.gov (United States)

Kim, Jong-Myon

2018-01-01

The simultaneous occurrence of various types of defects in bearings makes their diagnosis more challenging owing to the resultant complexity of the constituent parts of the acoustic emission (AE) signals. To address this issue, a new approach is proposed in this paper for the detection of multiple combined faults in bearings. The proposed methodology uses a deep neural network (DNN) architecture to effectively diagnose the combined defects. The DNN structure is based on the stacked denoising autoencoder non-mutually exclusive classifier (NMEC) method for combined modes. The NMEC-DNN is trained using data for a single fault and it classifies both single faults and multiple combined faults. The results of experiments conducted on AE data collected through an experimental test-bed demonstrate that the DNN achieves good classification performance with a maximum accuracy of 95%. The proposed method is compared with a multi-class classifier based on support vector machines (SVMs). The NMEC-DNN yields better diagnostic performance in comparison to the multi-class classifier based on SVM. The NMEC-DNN reduces the number of necessary data collections and improves the bearing fault diagnosis performance. PMID:29642466
A Pareto-based Ensemble with Feature and Instance Selection for Learning from Multi-Class Imbalanced Datasets.

Science.gov (United States)

Fernández, Alberto; Carmona, Cristobal José; José Del Jesus, María; Herrera, Francisco

2017-09-01

Imbalanced classification concerns problems with an uneven distribution of instances among classes. In addition, when instances are located in overlapping areas, correctly modeling the problem becomes harder. Current solutions for both issues often focus on the binary case, as multi-class datasets require additional effort to address. In this research, we overcome these problems by combining feature selection and instance selection. Feature selection simplifies the overlapping areas, easing the generation of rules to distinguish among the classes. Selecting instances from all classes addresses the imbalance itself by finding the most appropriate class distribution for the learning task, and may also remove noise and difficult borderline examples. To obtain an optimal joint set of features and instances, we embedded the search for both in a Multi-Objective Evolutionary Algorithm, using the C4.5 decision tree as the baseline classifier in this wrapper approach. The multi-objective scheme offers a twofold advantage: the search space becomes broader, and we can provide a set of different solutions from which to build an ensemble of classifiers. This proposal has been compared against several state-of-the-art solutions for imbalanced classification, showing excellent results in both binary and multi-class problems.
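A multi-objective search keeps the Pareto front: the solutions that no other solution beats on every objective at once. That front is what supplies the "set of different solutions" for the ensemble. The sketch below extracts the non-dominated set from candidate objective vectors. The two objectives and their values are hypothetical (for example, an accuracy measure and a reduction rate, both maximized), since the abstract does not state them.

```python
def dominates(a, b):
    # a dominates b if it is at least as good on every objective and strictly better on one
    # (all objectives are maximized here).
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(solutions):
    # Keep every solution that no other solution dominates (order is preserved).
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]
```

For example, among (0.9, 0.2), (0.8, 0.5), (0.7, 0.4) and (0.6, 0.9), only (0.7, 0.4) is dominated (by (0.8, 0.5)), so the other three form the front.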
Multiclass Classification of Cardiac Arrhythmia Using Improved Feature Selection and SVM Invariants.

Science.gov (United States)

Mustaqeem, Anam; Anwar, Syed Muhammad; Majid, Muahammad

2018-01-01

Arrhythmia is considered a life-threatening disease that causes serious health issues in patients when left untreated. An early diagnosis of arrhythmias would help save lives. This study classifies patients into one of sixteen subclasses, among which one class represents absence of disease and the other fifteen classes represent electrocardiogram records of various subtypes of arrhythmias. The research is carried out on a dataset taken from the University of California at Irvine Machine Learning Data Repository. The dataset has a large number of feature dimensions, which are reduced using a wrapper-based feature selection technique. For multiclass classification, support vector machine (SVM) based approaches, including one-against-one (OAO), one-against-all (OAA), and error-correcting code (ECC), are employed to detect the presence and absence of arrhythmias. The results of the SVM methods are compared with other standard machine learning classifiers using varying parameters, and the performance of the classifiers is evaluated using accuracy, kappa statistics, and root mean square error. The results show that the OAO method of SVM outperforms all other classifiers, achieving an accuracy of 81.11% with an 80/20 data split and 92.07% with a 90/10 data split.
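One-against-one (OAO) trains one binary classifier for every pair of classes, k(k-1)/2 of them, which is 120 for the sixteen classes here, and predicts by majority vote. The sketch below shows the voting step. The binary_winner callable is a hypothetical stand-in for a trained pairwise SVM, and the tie-breaking rule (earliest class wins) is an assumption.

```python
from collections import Counter
from itertools import combinations

def one_against_one_predict(classes, binary_winner, x):
    """binary_winner(i, j, x) returns i or j; it stands in for a trained pairwise SVM.
    Each of the k*(k-1)/2 pairwise classifiers casts one vote."""
    votes = Counter(binary_winner(i, j, x) for i, j in combinations(classes, 2))
    return max(classes, key=lambda c: votes[c])  # ties go to the earliest class in `classes`
```

A toy winner that picks whichever class index is numerically closer to x is enough to check the voting: class 7 wins all 15 of its pairings when x = 7.2.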
78 FR 21393 - Notice of Submission of Proposed Information Collection to OMB Ginnie Mae Multiclass Securities...

Science.gov (United States)

2013-04-10

..., Ginnie Mae has already guaranteed the collateral for the multiclass instruments. The Ginnie Mae... mortgage market and to attract new sources of capital for federally insured or guaranteed loans. Under this... guaranteed the collateral for the multiclass instruments. The Ginnie Mae Multiclass Securities Program...
Combination of minimum enclosing balls classifier with SVM in coal-rock recognition

Science.gov (United States)

Song, QingJun; Jiang, HaiYan; Song, Qinghui; Zhao, XieGuang; Wu, Xiaoxuan

2017-01-01

Top-coal caving technology is a productive and efficient method in modern mechanized coal mining, and the study of coal-rock recognition is key to realizing automation in comprehensive mechanized coal mining. In this paper we propose a new discriminant analysis framework for coal-rock recognition. In the framework, a data acquisition model with vibration and acoustic signals is designed, and a caving dataset with 10 feature variables and three classes is obtained. The optimal combination of feature variables can then be selected automatically using multi-class F-score (MF-Score) feature selection. To handle nonlinear mapping in real-world optimization problems, an effective minimum enclosing ball (MEB) algorithm combined with a support vector machine (SVM) is proposed for rapid detection of coal and rock in the caving process. In particular, we illustrate how to construct the MEB-SVM classifier for coal-rock recognition, where the data exhibit an inherently complex distribution. The proposed method is examined on UCI data sets and the caving dataset, and compared with several recent, high-performing SVM classifiers. We conduct experiments using accuracy and the Friedman test to compare multiple classifiers over multiple UCI data sets. Experimental results demonstrate that the proposed algorithm has good robustness and generalization ability. The results of experiments on the caving dataset show better performance, which makes the method promising for feature selection and multi-class coal-rock recognition. PMID:28937987
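A minimum enclosing ball (MEB) is the smallest ball that contains a point set. A common way to approximate it is the Bădoiu-Clarkson core-set iteration: repeatedly step the centre toward the current farthest point, with step size 1/(t+1). The sketch below uses that iteration in plain input space. It is an assumed, generic approximation; the kernelized MEB and its coupling with the SVM in the proposed MEB-SVM are not reproduced here.

```python
import math

def minimum_enclosing_ball(points, iterations=1000):
    """Approximate minimum enclosing ball via the Badoiu-Clarkson iteration:
    move the centre a fraction 1/(t+1) of the way toward the farthest point."""
    center = list(points[0])
    for t in range(1, iterations + 1):
        far = max(points, key=lambda p: math.dist(center, p))
        center = [c + (f - c) / (t + 1) for c, f in zip(center, far)]
    radius = max(math.dist(center, p) for p in points)
    return center, radius
```

For the four corners of the square with vertices (±1, ±1), the centre converges toward the origin and the radius toward sqrt(2). `math.dist` requires Python 3.8 or later.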

Predicting Tunnel Squeezing Using Multiclass Support Vector Machines

Directory of Open Access Journals (Sweden)

Yang Sun

2018-01-01

Full Text Available Tunnel squeezing is one of the major geological disasters that often occur during the construction of tunnels in weak rock masses subjected to high in situ stresses. It could cause shield jamming, budget overruns, and construction delays and could even lead to tunnel instability and casualties. Therefore, accurate prediction or identification of tunnel squeezing is extremely important in the design and construction of tunnels. This study presents a modified application of a multiclass support vector machine (SVM) to predict tunnel squeezing based on four parameters, that is, diameter (D), buried depth (H), support stiffness (K), and rock tunneling quality index (Q). We compiled a database from the literature, including 117 case histories obtained from different countries such as India, Nepal, and Bhutan, to train the multiclass SVM model. The proposed model was validated using 8-fold cross validation, and the average error percentage was approximately 11.87%. Compared with existing approaches, the proposed multiclass SVM model yields a better performance in predictive accuracy. More importantly, one could estimate the severity of potential squeezing problems based on the predicted squeezing categories/classes.
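8-fold cross validation on 117 case histories means partitioning the cases into 8 folds of nearly equal size (here five folds of 15 and three of 14), training on 7 folds and testing on the remaining one, in turn. The reported average error would then be the mean of the per-fold error percentages. The sketch below shows only the index bookkeeping. Contiguous folds are used so the output is deterministic; in practice the cases would be shuffled (or stratified by squeezing class) first, which is an assumption about good practice rather than a detail from the study.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one.
    (Shuffle or stratify the cases first in practice; omitted to keep this deterministic.)"""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)  # the first `extra` folds get one more case
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validation_splits(n, k):
    # Yield (train_indices, test_indices) with each fold used exactly once as the test set.
    folds = k_fold_indices(n, k)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```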
Combination of minimum enclosing balls classifier with SVM in coal-rock recognition.

Directory of Open Access Journals (Sweden)

QingJun Song

Full Text Available Top-coal caving technology is a productive and efficient method in modern mechanized coal mining, and the study of coal-rock recognition is key to realizing automation in comprehensive mechanized coal mining. In this paper we propose a new discriminant analysis framework for coal-rock recognition. In the framework, a data acquisition model with vibration and acoustic signals is designed, and a caving dataset with 10 feature variables and three classes is obtained. The optimal combination of feature variables can then be selected automatically using multi-class F-score (MF-Score) feature selection. To handle nonlinear mapping in real-world optimization problems, an effective minimum enclosing ball (MEB) algorithm combined with a support vector machine (SVM) is proposed for rapid detection of coal and rock in the caving process. In particular, we illustrate how to construct the MEB-SVM classifier for coal-rock recognition, where the data exhibit an inherently complex distribution. The proposed method is examined on UCI data sets and the caving dataset, and compared with several recent, high-performing SVM classifiers. We conduct experiments using accuracy and the Friedman test to compare multiple classifiers over multiple UCI data sets. Experimental results demonstrate that the proposed algorithm has good robustness and generalization ability. The results of experiments on the caving dataset show better performance, which makes the method promising for feature selection and multi-class coal-rock recognition.
Discovery of dominant and dormant genes from expression data using a novel generalization of SNR for multi-class problems

Directory of Open Access Journals (Sweden)

Chung I-Fang

2008-10-01

Full Text Available Abstract Background The Signal-to-Noise-Ratio (SNR) is often used for identification of biomarkers for two-class problems and no formal and useful generalization of SNR is available for multiclass problems. We propose innovative generalizations of SNR for multiclass cancer discrimination through introduction of two indices, Gene Dominant Index and Gene Dormant Index (GDIs). These two indices lead to the concepts of dominant and dormant genes with biological significance. We use these indices to develop methodologies for discovery of dominant and dormant biomarkers with interesting biological significance. The dominancy and dormancy of the identified biomarkers and their excellent discriminating power are also demonstrated pictorially using the scatterplot of individual gene and 2-D Sammon's projection of the selected set of genes. Using information from the literature we have shown that the GDI based method can identify dominant and dormant genes that play significant roles in cancer biology. These biomarkers are also used to design diagnostic prediction systems. Results and discussion To evaluate the effectiveness of the GDIs, we have used four multiclass cancer data sets (Small Round Blue Cell Tumors, Leukemia, Central Nervous System Tumors, and Lung Cancer). For each data set we demonstrate that the new indices can find biologically meaningful genes that can act as biomarkers. We then use six machine learning tools, Nearest Neighbor Classifier (NNC), Nearest Mean Classifier (NMC), Support Vector Machine (SVM) classifier with linear kernel, and SVM classifier with Gaussian kernel, where both SVMs are used in conjunction with one-vs-all (OVA) and one-vs-one (OVO) strategies. We found GDIs to be very effective in identifying biomarkers with strong class specific signatures. With all six tools and for all data sets we could achieve better or comparable prediction accuracies usually with fewer marker genes than results reported in the literature using the
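For a single gene in a two-class problem, the signal-to-noise ratio is commonly computed Golub-style as (mu_1 - mu_2) / (sigma_1 + sigma_2): the difference in class means divided by the sum of the class standard deviations. The sketch below shows that formula and a naive multiclass use that scores each class against all remaining samples. This one-vs-rest score is only a baseline illustration; it is not the paper's Gene Dominant or Gene Dormant Index. The use of the sample standard deviation is also an assumption.

```python
from statistics import mean, stdev

def snr(class_values, other_values):
    # Two-class signal-to-noise ratio for one gene: (mu_1 - mu_2) / (sigma_1 + sigma_2).
    return (mean(class_values) - mean(other_values)) / (stdev(class_values) + stdev(other_values))

def one_vs_rest_snr(values_by_class):
    """Naive multiclass use: SNR of each class against all remaining samples.
    NOT the paper's Gene Dominant/Dormant Index, only a baseline illustration."""
    result = {}
    for c, vals in values_by_class.items():
        rest = [v for other, vs in values_by_class.items() if other != c for v in vs]
        result[c] = snr(vals, rest)
    return result
```

A strongly positive score flags a gene that is over-expressed in that class relative to the rest; a strongly negative score flags under-expression.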
On the role of cost-sensitive learning in multi-class brain-computer interfaces.

Science.gov (United States)

Devlaminck, Dieter; Waegeman, Willem; Wyns, Bart; Otte, Georges; Santens, Patrick

2010-06-01

Brain-computer interfaces (BCIs) present an alternative means of communication for people with severe disabilities. One of the shortcomings of current BCI systems, recently put forward in the fourth BCI competition, is the asynchronous detection of motor imagery versus the resting state. We investigated this extension to the three-class case, in which the resting state is considered to lie virtually between the two motor classes, so that misclassifying one motor task as the other incurs a large penalty. We focus in particular on the behavior of different machine-learning techniques and on the role of multi-class cost-sensitive learning in this context. To this end, four different kernel methods are compared empirically, namely pairwise multi-class support vector machines (SVMs), two cost-sensitive multi-class SVMs, and kernel-based ordinal regression. The experimental results illustrate that ordinal regression performs better than the other three approaches when a cost-sensitive performance measure such as the mean-squared error is considered. By contrast, multi-class cost-sensitive learning enables us to control the number of large errors made between the two motor tasks.
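The effect of a cost matrix is easiest to see at decision time. Given class posteriors, a cost-sensitive classifier picks the prediction with the lowest expected cost rather than the highest posterior. The sketch assumes an ordered class layout (0 = one motor task, 1 = rest, 2 = the other motor task) and a squared-error cost (i - j)^2, so confusing the two motor tasks costs four times as much as confusing either with rest. Both the layout and the cost values are illustrative assumptions, not the paper's exact setup.

```python
def min_expected_cost_decision(posteriors, cost):
    """posteriors[i] = P(true class i | x); cost[i][j] = cost of predicting j when the truth is i.
    Returns the prediction j that minimizes sum_i posteriors[i] * cost[i][j]."""
    k = len(posteriors)
    expected = [sum(posteriors[i] * cost[i][j] for i in range(k)) for j in range(k)]
    return min(range(k), key=lambda j: expected[j])

# Squared-error cost on the ordered classes 0 (motor task A), 1 (rest), 2 (motor task B).
SQUARED_COST = [[(i - j) ** 2 for j in range(3)] for i in range(3)]
```

With posteriors [0.45, 0.1, 0.45], argmax would commit to a motor class, whereas the expected-cost rule chooses rest (expected cost 0.9 versus 1.9), avoiding a potentially large motor-to-motor error.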
Towards an Erlang formula for multiclass networks

NARCIS (Netherlands)

Jonckheere, M.; Mairesse, J.

2010-01-01

Consider a multiclass stochastic network with state-dependent service rates and arrival rates describing bandwidth-sharing mechanisms as well as admission control and/or load balancing schemes. Given Poisson arrival and exponential service requirements, the number of customers in the network evolves
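For background, the classical single-class Erlang formula (Erlang B) gives the blocking probability of a loss system with c servers and offered load a. It can be computed with the numerically stable recursion B(0) = 1, B(n) = a*B(n-1) / (n + a*B(n-1)). This is only the textbook single-class case that the title alludes to; the multiclass, state-dependent generalization studied in the paper is not sketched here.

```python
def erlang_b(servers, offered_load):
    """Blocking probability of the classical single-class Erlang loss system (M/M/c/c),
    computed with the stable recursion B(0) = 1, B(n) = a*B(n-1) / (n + a*B(n-1))."""
    b = 1.0
    for n in range(1, servers + 1):
        b = offered_load * b / (n + offered_load * b)
    return b
```

As a check against the closed form (a^c / c!) / sum_{k=0..c} a^k / k!, two servers with an offered load of 1 give (1/2) / (1 + 1 + 1/2) = 0.2.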
A Realistic Seizure Prediction Study Based on Multiclass SVM.

Science.gov (United States)

Direito, Bruno; Teixeira, César A; Sales, Francisco; Castelo-Branco, Miguel; Dourado, António

2017-05-01

A patient-specific algorithm for epileptic seizure prediction, based on multiclass support vector machines (SVM) and using multi-channel high-dimensional feature sets, is presented. The feature sets, combined with multiclass classification and post-processing schemes, aim at generating alarms while reducing the influence of false positives. This study considers 216 patients from the European Epilepsy Database, including 185 patients with scalp EEG recordings and 31 with intracranial data. The strategy was tested over a total of 16,729.80 h of inter-ictal data, including 1206 seizures. We found an overall sensitivity of 38.47% and a false positive rate of 0.20 per hour. The performance of the method achieved statistical significance in 24 patients (11% of the patients). Despite encouraging results previously reported on specific datasets, prospective demonstration on long-term EEG recordings has been limited. Our study presents a prospective analysis of a large, heterogeneous, multicentric dataset. The statistical framework, based on conservative assumptions, reflects a realistic approach compared with constrained datasets and/or in-sample evaluations. Improving these results, by defining an appropriate set of features that better distinguishes the pre-ictal from the non-pre-ictal state and hence minimizes the effect of confounding variables, remains a key aspect.
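The two reported metrics, sensitivity and false positive rate per hour, depend on how per-epoch classifier outputs are turned into alarms. The sketch below uses one hypothetical post-processing scheme and the two metric definitions. An alarm is raised when the fraction of "pre-ictal" outputs in a sliding window reaches a threshold, followed by a refractory period of one window. This scheme and all numbers in the check are illustrative assumptions, not the study's actual post-processing or data.

```python
def alarms_from_outputs(labels, window, threshold):
    """labels: per-epoch classifier outputs (1 = pre-ictal, 0 = otherwise).
    Raise an alarm at epoch t when the fraction of pre-ictal outputs in the last
    `window` epochs reaches `threshold`, then skip `window` epochs (refractory period)."""
    alarms, t = [], window - 1
    while t < len(labels):
        if sum(labels[t - window + 1:t + 1]) / window >= threshold:
            alarms.append(t)
            t += window
        else:
            t += 1
    return alarms

def seizure_prediction_metrics(predicted_seizures, total_seizures, false_alarms, interictal_hours):
    # Sensitivity: fraction of seizures preceded by a correct alarm.
    # False positive rate: false alarms per hour of inter-ictal recording.
    return predicted_seizures / total_seizures, false_alarms / interictal_hours
```

For instance, with hypothetical counts of 3 predicted seizures out of 4 and 10 false alarms over 50 inter-ictal hours, the sensitivity is 0.75 and the false positive rate is 0.2 per hour.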
Multi-class oscillating systems of interacting neurons

DEFF Research Database (Denmark)

Ditlevsen, Susanne; Löcherbach, Eva

2017-01-01

We consider multi-class systems of interacting nonlinear Hawkes processes modeling several large families of neurons and study their mean field limits. As the total number of neurons goes to infinity we prove that the evolution within each class can be described by a nonlinear limit differential...
Classifier-Guided Sampling for Complex Energy System Optimization

Energy Technology Data Exchange (ETDEWEB)

Backlund, Peter B. [Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States); Eddy, John P. [Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)

2015-09-01

This report documents the results of a Laboratory Directed Research and Development (LDRD) effort entitled "Classifier-Guided Sampling for Complex Energy System Optimization", conducted during FY 2014 and FY 2015. The goal of this project was to develop, implement, and test major improvements to the classifier-guided sampling (CGS) algorithm. CGS is a type of evolutionary algorithm for search and optimization over a set of discrete design variables in the face of one or more objective functions. Existing evolutionary algorithms, such as genetic algorithms, may require a large number of objective function evaluations to identify optimal or near-optimal solutions. Reducing the number of evaluations can save significant time, especially if the objective function is computationally expensive. CGS reduces the evaluation count by using a Bayesian network classifier to filter out non-promising candidate designs, prior to evaluation, based on their posterior probabilities. In this project, both the single-objective and multi-objective versions of CGS are developed and tested on a set of benchmark problems. As a domain-specific case study, CGS is used to design a microgrid for use in islanded mode during an extended bulk power grid outage.
Feature selection and classification of mechanical fault of an induction motor using random forest classifier

OpenAIRE

Patel, Raj Kumar; Giri, V.K.

2016-01-01

Fault detection and diagnosis is the most important technology in condition-based maintenance (CBM) systems for rotating machinery. This paper experimentally explores the development of a random forest (RF) classifier, a recently emerged machine learning technique, for multi-class mechanical fault diagnosis in the bearing of an induction motor. Firstly, the vibration signals are collected from the bearing using an accelerometer sensor. Parameters from the vibration signal are extracted in the form of...
Multiclass Prediction with Partial Least Square Regression for Gene Expression Data: Applications in Breast Cancer Intrinsic Taxonomy

Directory of Open Access Journals (Sweden)

Chi-Cheng Huang

2013-01-01

Multiclass prediction remains an obstacle for high-throughput data analysis such as microarray gene expression profiling. Despite recent advances in machine learning and bioinformatics, most classification tools have been limited to binary responses. Our aim was to apply partial least squares (PLS) regression to the breast cancer intrinsic taxonomy, in which five distinct molecular subtypes were identified. The PAM50 signature genes were used as predictive variables in the PLS analysis, and the latent gene component scores were used in a binary logistic regression for each molecular subtype. The 139 prototypical arrays used in PAM50 development served as the training dataset, and three independent microarray studies of Han Chinese origin were used for independent validation (n=535). Agreement between PAM50 centroid-based single sample prediction (SSP) and PLS-regression was excellent within the training samples (weighted Kappa: 0.988) but deteriorated substantially in the independent samples. This could be attributed to the much larger number of samples left unclassified by PLS-regression. When these unclassified samples were removed, agreement between PAM50 SSP and PLS-regression improved markedly (weighted Kappa: 0.829, as opposed to 0.541 when unclassified samples were included). Our study established the feasibility of PLS-regression for multi-class prediction, and distinct clinical presentations and prognostic differences were observed across the breast cancer molecular subtypes.
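The abstract reports agreement between the two subtype predictors as a weighted Kappa, but it does not state the weighting scheme. The sketch below assumes linear disagreement weights over the order in which the categories are listed. For nominal subtypes, that ordering is itself an assumption. With two categories the linear form reduces to ordinary Cohen's kappa.

```python
def weighted_kappa(rater_a, rater_b, categories, weighting="linear"):
    """Weighted Cohen's kappa between two label sequences.

    Weights are |i - j| / (k - 1) ("linear") or its square ("quadratic"),
    where i and j are positions in `categories`. At least two categories
    must be present overall, otherwise the expected disagreement is zero.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    # Observed joint proportions of (rater_a, rater_b) label pairs.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    obs = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1.0 - obs / exp
```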
Multiclass methods for the analysis of antibiotic residues in milk by liquid chromatography coupled to mass spectrometry: A review.

Science.gov (United States)

Rossi, Rosanna; Saluti, Giorgio; Moretti, Simone; Diamanti, Irene; Giusepponi, Danilo; Galarini, Roberta

2018-02-01

Milk is an important and beneficial food from a nutritional point of view, being an indispensable source of high-quality proteins. Furthermore, it is a raw material for many dairy products, such as yoghurt, cheese, and cream. Before reaching consumers, milk goes through production, processing, and distribution. Each step involves potentially unsafe factors, such as chemical contamination that can affect milk quality. Antibiotics are widely used in veterinary medicine for dry cow therapy and mastitis treatment in lactating cows, which can leave antimicrobial residues in milk. To ensure consumers' safety, milk is analyzed to confirm that the fixed Maximum Residue Limits (MRLs) for antibiotics are not exceeded. Multiclass methods can monitor several drug classes in a single analysis, so they are faster and cheaper than traditional single-class methods. This advantage is particularly important for milk, a highly perishable food. Nevertheless, multiclass methods for veterinary drug residues in foodstuffs pose real analytical challenges. This article reviews the major multiclass methods published for the determination of antibiotic residues in milk by liquid chromatography coupled to mass spectrometry, with a special focus on sample preparation approaches.
A Multi-Class, Interdisciplinary Project Using Elementary Statistics

Science.gov (United States)

Reese, Margaret

2012-01-01

This article describes a multi-class project that employs statistical computing and writing in a statistics class. Three courses, General Ecology, Meteorology, and Introductory Statistics, cooperated on a project for the EPA's Student Design Competition. The continuing investigation has also spawned several undergraduate research projects in…
Throughput Maximization Using an SVM for Multi-Class Hypothesis-Based Spectrum Sensing in Cognitive Radio

Directory of Open Access Journals (Sweden)

Sana Ullah Jan

2018-03-01

A framework for spectrum sensing with a multi-class hypothesis is proposed to maximize the achievable throughput in cognitive radio networks. In a conventional two-class hypothesis, one hypothesis is that the primary user is absent and the other is that it is present. Here, the energy range of the sensing signal under the "absent" hypothesis is further divided into quantized regions, while the "present" hypothesis is retained. The secondary user is equipped with non-radio-frequency energy harvesting. It transmits only when the primary user is absent. Its transmission power depends on the hypothesis result (the energy level of the sensed signal) and on the residual energy in the battery: the lower the energy of the received signal, the higher the transmission power, and vice versa. Likewise, the lower the residual energy in the node, the lower the transmission power. This technique increases the throughput of a secondary link by providing more transmission events than the conventional two-class hypothesis. Furthermore, transmitting at low power when the sensed signal has a higher energy level reduces the probability of interference with primary users if, for instance, a detection was missed. A support vector machine (SVM) is used in a one-versus-rest approach to classify the input signal into predefined classes. The input to the SVM consists of three statistical features extracted from the sensed signal, plus a number from 0 to 100 representing the percentage of residual energy in the node's battery. To improve the generalization of the classifier, k-fold cross-validation is used in the training phase. The experimental results show that an SVM with the given features performs satisfactorily for all kernels, but an SVM with a polynomial kernel outperforms linear and radial-basis function kernels in terms of accuracy. Furthermore, the proposed multi-class hypothesis achieves higher throughput compared to the
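The power rule described above can be sketched as a small lookup. In the paper an SVM assigns the hypothesis class. Here a plain energy-threshold quantizer stands in for it. The thresholds, the per-region power levels and the linear scaling by battery percentage are hypothetical choices made only for illustration.

```python
def classify_energy(energy, thresholds):
    """Quantize the sensed energy into a hypothesis class.

    Classes 0..len(thresholds)-1 are sub-regions of 'primary user absent';
    class len(thresholds) is 'primary user present'. `thresholds` must be
    ascending, and its last entry acts as the detection threshold.
    """
    for region, upper in enumerate(thresholds):
        if energy < upper:
            return region
    return len(thresholds)


def transmit_power(energy, battery_pct, thresholds, power_levels):
    """Pick a transmit power from the sensed-energy class and battery level.

    power_levels[i] is the (hypothetical) full-battery power for region i,
    listed from highest to lowest, so weaker sensed energy gets more power.
    """
    region = classify_energy(energy, thresholds)
    if region == len(thresholds):
        return 0.0  # primary user present: the secondary user stays silent
    # Less residual battery energy -> proportionally lower power.
    return power_levels[region] * battery_pct / 100.0
```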
ALGORITHM OF PREPARATION OF THE TRAINING SAMPLE USING 3D-FACE MODELING

Directory of Open Access Journals (Sweden)

D. I. Samal

2016-01-01

An algorithm is presented for preparing and sampling the training set of a multiclass support vector machine (SVM) classifier. The approach is based on modeling possible changes in the facial features of the person to be recognized. Additional variations were introduced to improve identification results, such as shooting perspective, lighting conditions and tilt angles. These synthetically generated changes influence classifier training by expanding the range of possible variations of the initial image. A classifier trained on such an extended sample is better able to recognize unknown objects. The key parameters chosen for modeling were age, emotional expression, head turns, various lighting conditions, noise, and some combinations of these. For 3D modeling, the third-party software 'FaceGen' is used. It can model up to 150 parameters and is available as a free demo version. The SVM classifier was chosen to test the impact of the introduced modifications to the training sample. Image preparation and preprocessing comprise the following steps:
- detection and localization of the face region in the image;
- estimation of the rotation and tilt angles;
- extension of the pixel brightness range and histogram equalization, to smooth the brightness and contrast characteristics of the processed images;
- scaling of the localized and processed face region;
- creation of a feature vector from the scaled and processed face image by principal component analysis (NIPALS algorithm);
- training of the multiclass SVM classifier.
The proposed algorithm for expanding the training sample is intended for practical use. Using 3D models, it extends the range of 2D photographs of persons that can be processed, which improves identification results in a face recognition system. This approach makes it possible to compensate
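The histogram-equalization step in the preprocessing chain can be sketched for an 8-bit grayscale image stored as a flat list of pixel values. This is the textbook CDF-remapping form. The abstract does not say which variant or library the authors used, so treat this as an assumption.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram equalization for a flat list of integer gray levels in [0, levels)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function of the gray levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:  # a single gray level: nothing to stretch
        return list(pixels)
    # Map each level so the output CDF is approximately uniform over [0, levels - 1].
    lut = [round((c - cdf_min) / (total - cdf_min) * (levels - 1)) for c in cdf]
    return [lut[p] for p in pixels]
```

Stretching the occupied gray levels across the full range is what smooths out brightness and contrast differences between the synthetic and real training images.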
Approximations for Markovian multi-class queues with preemptive priorities

NARCIS (Netherlands)

van der Heijden, Matthijs C.; van Harten, Aart; Sleptchenko, Andrei

2004-01-01

We discuss the approximation of performance measures in multi-class M/M/k queues with preemptive priorities for large problem instances (many classes and servers) using class aggregation and server reduction. We compared our approximations to exact and simulation results and found that our approach
Discrimination of Breast Tumors in Ultrasonic Images by Classifier Ensemble Trained with AdaBoost

Science.gov (United States)

Takemura, Atsushi; Shimizu, Akinobu; Hamamoto, Kazuhiko

In this paper, we propose a novel method for accurate automated discrimination of breast tumors (carcinoma, fibroadenoma, and cyst). We defined 199 features related to the diagnostic observations a doctor relies on when judging breast tumors, such as internal echo, shape, and boundary echo. These include novel features based on a parameter of the log-compressed K distribution, which reflect physical characteristics of ultrasonic B-mode imaging. Furthermore, we propose a method for discriminating breast tumors with an ensemble classifier based on the multi-class AdaBoost algorithm with effective feature selection. Verification on 200 carcinomas, 30 fibroadenomas, and 30 cysts showed the usefulness of the newly defined features and the effectiveness of discrimination with an ensemble classifier trained by AdaBoost.
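The abstract names multi-class AdaBoost without specifying the variant. The sketch below assumes SAMME, the multi-class extension by Zhu et al. It differs from binary AdaBoost by adding log(K - 1) to each weak learner's weight, so a learner only needs to beat random guessing among K classes. Decision stumps serve as hypothetical weak learners. The paper's own features and feature selection are not reproduced.

```python
import math


def train_stump(X, y, w, classes):
    """Weighted decision stump: one feature, one threshold, one class per side."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            left = {c: 0.0 for c in classes}
            right = {c: 0.0 for c in classes}
            for xi, yi, wi in zip(X, y, w):
                (left if xi[f] <= thr else right)[yi] += wi
            cl = max(left, key=left.get)
            cr = max(right, key=right.get)
            err = sum(w) - left[cl] - right[cr]  # weight of misclassified samples
            if best is None or err < best[0]:
                best = (err, f, thr, cl, cr)
    _, f, thr, cl, cr = best
    return lambda x: cl if x[f] <= thr else cr


def samme(X, y, rounds=10):
    """Train a SAMME ensemble and return a predict(x) function."""
    classes = sorted(set(y))
    K = len(classes)
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        h = train_stump(X, y, w, classes)
        err = sum(wi for xi, yi, wi in zip(X, y, w) if h(xi) != yi) / sum(w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = math.log((1 - err) / err) + math.log(K - 1)
        if alpha <= 0:
            break  # weak learner no better than chance among K classes
        ensemble.append((alpha, h))
        # Up-weight misclassified samples, then renormalize.
        w = [wi * math.exp(alpha) if h(xi) != yi else wi for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]

    def predict(x):
        votes = {c: 0.0 for c in classes}
        for alpha, h in ensemble:
            votes[h(x)] += alpha
        return max(votes, key=votes.get)

    return predict
```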
Correcting Classifiers for Sample Selection Bias in Two-Phase Case-Control Studies

Science.gov (United States)

Theis, Fabian J.

2017-01-01

Epidemiological studies often utilize stratified data in which rare outcomes or exposures are artificially enriched. This design can increase precision in association tests but distorts predictions when applying classifiers on nonstratified data. Several methods correct for this so-called sample selection bias, but their performance remains unclear especially for machine learning classifiers. With an emphasis on two-phase case-control studies, we aim to assess which corrections to perform in which setting and to obtain methods suitable for machine learning techniques, especially the random forest. We propose two new resampling-based methods to resemble the original data and covariance structure: stochastic inverse-probability oversampling and parametric inverse-probability bagging. We compare all techniques for the random forest and other classifiers, both theoretically and on simulated and real data. Empirical results show that the random forest profits from only the parametric inverse-probability bagging proposed by us. For other classifiers, correction is mostly advantageous, and methods perform uniformly. We discuss consequences of inappropriate distribution assumptions and reason for different behaviors between the random forest and other classifiers. In conclusion, we provide guidance for choosing correction methods when training classifiers on biased samples. For random forests, our method outperforms state-of-the-art procedures if distribution assumptions are roughly fulfilled. We provide our implementation in the R package sambia. PMID:29312464
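The abstract names "stochastic inverse-probability oversampling" but does not define it. The sketch below is a hypothetical reading of the idea, not the sambia implementation. Each record with known phase-two selection probability p is replicated floor(1/p) times. One extra copy is added with probability equal to the fractional part of 1/p. The expected number of copies is therefore exactly 1/p, which undoes the artificial enrichment in expectation.

```python
import math
import random


def inverse_probability_oversample(records, selection_probs, seed=0):
    """Hypothetical sketch: replicate each record about 1/p times.

    Each record is copied floor(1/p) times, plus one extra copy with
    probability frac(1/p), so its expected number of copies is 1/p.
    """
    rng = random.Random(seed)
    out = []
    for rec, p in zip(records, selection_probs):
        weight = 1.0 / p
        copies = math.floor(weight)
        if rng.random() < weight - copies:  # stochastic rounding
            copies += 1
        out.extend([rec] * copies)
    return out
```

Stochastic rounding is used here instead of always rounding up or down, so that fractional weights are not systematically biased.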
Multi-view Multi-sparsity Kernel Reconstruction for Multi-class Image Classification

KAUST Repository

Zhu, Xiaofeng; Xie, Qing; Zhu, Yonghua; Liu, Xingyi; Zhang, Shichao

2015-01-01

This paper addresses the problem of multi-class image classification by proposing a novel multi-view multi-sparsity kernel reconstruction (MMKR for short) model. Given images (including test images and training images) representing with multiple
Online scheduling policies for multiclass call centers with impatient customers

NARCIS (Netherlands)

Jouini, O.; Pot, S.A.; Koole, G.M.; Dallery, Y.

2010-01-01

We consider a call center with two classes of impatient customers: premium and regular classes. Modeling our call center as a multiclass GI / GI / s + M queue, we focus on developing scheduling policies that satisfy a target ratio constraint on the abandonment probabilities of premium customers to

Intelligent Agent-Based Intrusion Detection System Using Enhanced Multiclass SVM

Science.gov (United States)

Ganapathy, S.; Yogesh, P.; Kannan, A.

2012-01-01

Intrusion detection systems have long been combined with various techniques to detect network intrusions effectively. However, most of these systems detect intruders only at a high false alarm rate. In this paper, we propose a new intelligent agent-based intrusion detection model for mobile ad hoc networks that combines attribute selection, outlier detection, and enhanced multiclass SVM classification. For this purpose, an effective preprocessing technique is proposed that improves detection accuracy and reduces processing time. Moreover, two new algorithms are proposed for detecting intruders: an Intelligent Agent Weighted Distance Outlier Detection algorithm and an Intelligent Agent-based Enhanced Multiclass Support Vector Machine algorithm. They operate in a distributed database environment that uses intelligent agents for trust management and coordination in transaction processing. The experimental results show that the proposed system detects anomalies with a low false alarm rate and a high detection rate when tested on the KDD Cup 99 data set. PMID:23056036
Multi-Class load balancing scheme for QoS and energy ...

African Journals Online (AJOL)

Multi-Class load balancing scheme for QoS and energy conservation in cloud computing. ...
Single-Pol Synthetic Aperture Radar Terrain Classification using Multiclass Confidence for One-Class Classifiers

Energy Technology Data Exchange (ETDEWEB)

Koch, Mark William [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Steinbach, Ryan Matthew [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Moya, Mary M [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

2015-10-01

Synthetic aperture radar (SAR) is a remote sensing technology that can operate day or night in all but the most extreme conditions. A SAR can provide surveillance over a long time period by making multiple passes over a wide area. For object-based intelligence, it is convenient to segment and classify the SAR images into objects that identify various terrains and man-made structures, which we call "static features." In this paper, we introduce a novel SAR image product that captures how different regions decorrelate at different rates. Using superpixels and their first two moments, we develop a series of one-class classification algorithms based on a goodness-of-fit metric. P-value fusion is used to combine the results from different classes. We also show how to combine multiple one-class classifiers to obtain a confidence for a classification. This confidence can be used by downstream algorithms, such as a conditional random field, to enforce spatial constraints.
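The idea of one-class goodness-of-fit tests combined into a multiclass confidence can be sketched as follows. Several choices here are hypothetical and not taken from the paper:
- each terrain class is modeled as a diagonal Gaussian over a superpixel's two moments (mean, variance);
- the p-value comes from the chi-square distribution with 2 degrees of freedom, whose survival function is exactly exp(-d²/2);
- the "fusion" normalizes the per-class p-values into a confidence.

The paper's actual fusion rule and decorrelation features may differ.

```python
import math


def gof_p_value(features, mean, std):
    """Goodness-of-fit p-value of a 2-D feature vector under a diagonal Gaussian.

    d2 is a squared Mahalanobis distance; for exactly 2 degrees of freedom
    the chi-square survival function is exp(-d2 / 2).
    """
    d2 = sum(((f - m) / s) ** 2 for f, m, s in zip(features, mean, std))
    return math.exp(-d2 / 2.0)


def classify_superpixel(features, class_models, alpha=0.01):
    """Run one one-class test per class and fuse the p-values.

    class_models maps a class name to (mean, std) tuples of length 2.
    Returns (label, confidence); label is None when every class rejects.
    """
    p = {name: gof_p_value(features, *model) for name, model in class_models.items()}
    best = max(p, key=p.get)
    if p[best] < alpha:
        return None, 0.0  # no class fits: possibly an unknown terrain
    confidence = p[best] / sum(p.values())
    return best, confidence
```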
Multi-Class Motor Imagery EEG Decoding for Brain-Computer Interfaces

Science.gov (United States)

Wang, Deng; Miao, Duoqian; Blohm, Gunnar

2012-01-01

Recent studies show that scalp electroencephalography (EEG), as a non-invasive interface, has great potential for brain-computer interfaces (BCIs). However, practical applications of EEG-based BCIs have so far been limited by the difficulty of decoding brain signals reliably and efficiently. This paper proposes a new robust processing framework for decoding multi-class motor imagery (MI), based on five main processing steps:
(i) Raw EEG segmentation without the need for visual artifact inspection.
(ii) Automatic artifact correction that combines regression analysis with independent component analysis to recover the original source signals. This step is included because EEG recordings are often contaminated not only by electrooculography (EOG) but also by other types of artifacts.
(iii) Selection of non-contiguous discriminating rhythms, using the significant differences between frequency components based on event-related (de-)synchronization and sample entropy. After spectral filtering with these rhythms, a channel selection algorithm retains only the relevant channels.
(iv) Extraction of feature vectors based on the inter-class diversity and the time-varying dynamic characteristics of the signals.
(v) Four-class classification with a support vector machine.
We tested the proposed algorithm on experimental data from dataset 2a of BCI competition IV (2008). The overall four-class kappa values (between 0.41 and 0.80) were comparable to those of other models, without requiring removal of any artifact-contaminated trials. The results show that multi-class MI tasks can be reliably discriminated using artifact-contaminated EEG recordings from a few channels. This may be a promising avenue for online, robust EEG-based BCI applications. PMID:23087607
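Step (iii) relies on sample entropy. Below is a minimal implementation of the standard SampEn(m, r) definition, -ln(A/B), using Chebyshev distance and excluding self-matches. The defaults m = 2 and an absolute tolerance r = 0.2 are illustrative assumptions. In practice r is often set relative to the signal's standard deviation. The paper's exact parameters are not given in the abstract.

```python
import math


def sample_entropy(signal, m=2, r=0.2):
    """SampEn(m, r) = -ln(A / B), Chebyshev distance, self-matches excluded.

    B counts pairs of length-m templates within tolerance r, and A counts
    pairs of length-(m+1) templates; both use the same N - m start points.
    """
    n = len(signal)

    def count_matches(length):
        templates = [signal[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # undefined at this tolerance: no matching templates
    return -math.log(a / b)
```

A perfectly regular signal scores 0. Higher values indicate that patterns repeating for m samples are less likely to continue for m + 1 samples, which is the irregularity measure used to compare frequency bands.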
Development of an Analytical Procedure for the Determination of Multiclass Compounds for Forensic Veterinary Toxicology.

Science.gov (United States)

Sell, Bartosz; Sniegocki, Tomasz; Zmudzki, Jan; Posyniak, Andrzej

2018-04-01

Reported here is a new analytical multiclass method based on the QuEChERS technique, which has proven effective in diagnosing fatal poisoning cases in animals. The method was developed for determining analytes in liver samples, including rodenticides, carbamate and organophosphorus pesticides, coccidiostats, and mycotoxins. The procedure begins by adding acetonitrile and sodium acetate to 2 g of homogenized liver sample. The mixture is shaken intensively and centrifuged for phase separation. The organic phase is then transferred into a tube containing sorbents (PSA and C18) and magnesium sulfate and centrifuged again. Finally, the supernatant is filtered and analyzed by liquid chromatography tandem mass spectrometry. The procedure was validated. Repeatability variation coefficients forensic toxicology cases.
Comparison of two microextraction methods based on solidification of floating organic droplet for the determination of multiclass analytes in river water samples by liquid chromatography tandem mass spectrometry using Central Composite Design.

Science.gov (United States)

Asati, Ankita; Satyanarayana, G N V; Patel, Devendra K

2017-09-01

Two liquid-liquid microextraction methods based on low-density organic solvents were compared for the determination of multiclass analytes (pesticides, plasticizers, pharmaceuticals and personal care products) in river water samples by liquid chromatography tandem mass spectrometry (LC-MS/MS). Both methods rely on solidification of a floating organic droplet: vortex-assisted liquid-liquid microextraction (VALLME-SFO) and dispersive liquid-liquid microextraction (DLLME-SFO). The effects of various experimental parameters on the efficiency of the two methods, and their optimum values, were studied with Central Composite Design (CCD) and Response Surface Methodology (RSM). Under optimal conditions, VALLME-SFO was validated with the following values: limit of detection 0.011-0.219 ng mL⁻¹, limit of quantification 0.035-0.723 ng mL⁻¹, dynamic linearity range 0.050-0.500 ng mL⁻¹, coefficient of determination R² = 0.992-0.999, enrichment factor 40-56, and extraction recovery 80-106%. For DLLME-SFO under optimal conditions, the corresponding values were: limit of detection 0.025-0.377 ng mL⁻¹, limit of quantification 0.083-1.256 ng mL⁻¹, dynamic linearity range 0.100-1.000 ng mL⁻¹, R² = 0.990-0.999, enrichment factor 35-49, and extraction recovery 69-98%. Interday and intraday precisions, calculated as percent relative standard deviation (%RSD), were ≤15% for both VALLME-SFO and DLLME-SFO. Both methods were successfully applied to determine multiclass analytes in river water samples. Copyright © 2017 Elsevier B.V. All rights reserved.
Diagnosis of oral lichen planus from analysis of saliva samples using terahertz time-domain spectroscopy and chemometrics

Science.gov (United States)

Kistenev, Yury V.; Borisov, Alexey V.; Titarenko, Maria A.; Baydik, Olga D.; Shapovalov, Alexander V.

2018-04-01

The ability to diagnose oral lichen planus (OLP) from saliva analysis using THz time-domain spectroscopy and chemometrics is discussed. The study involved 30 patients (2 male and 28 female) with OLP. This group consisted of two subgroups: one with the erosive form of OLP (n = 15) and one with the reticular and papular forms of OLP (n = 15). The control group consisted of six healthy volunteers (one male and five female) without inflammation of the oral mucosa and without periodontitis. Principal component analysis was used to reveal informative features in the experimental data. A one-versus-one multiclass classifier built from binary support vector machine classifiers was used. The two-stage classification approach, using several absorption spectrum scans per individual saliva sample, achieved 100% accuracy in differentiating the OLP subgroups and the control group.
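A one-versus-one scheme trains one binary classifier per pair of classes and predicts by majority vote. With the three groups above, that is three pairwise classifiers. The sketch shows the voting logic. A nearest-centroid rule is a hypothetical stand-in for the binary SVMs so that the example stays dependency-free. The class names and 2-D features in the usage example are invented.

```python
from collections import Counter
from itertools import combinations


def train_ovo(X, y, train_binary):
    """Train one binary classifier per unordered pair of classes."""
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        pair_X = [x for x, label in zip(X, y) if label in (a, b)]
        pair_y = [label for label in y if label in (a, b)]
        models[(a, b)] = train_binary(pair_X, pair_y)
    return models


def predict_ovo(models, x):
    """Each pairwise classifier votes for one of its two classes; majority wins."""
    votes = Counter(model(x) for model in models.values())
    return votes.most_common(1)[0][0]


def nearest_centroid(X, y):
    """Hypothetical stand-in for a binary SVM: predict the closer class mean."""
    centroids = {}
    for label in set(y):
        pts = [x for x, l in zip(X, y) if l == label]
        centroids[label] = [sum(c) / len(pts) for c in zip(*pts)]

    def predict(x):
        return min(centroids,
                   key=lambda l: sum((a - b) ** 2 for a, b in zip(x, centroids[l])))

    return predict
```

Usage: `models = train_ovo(X, y, nearest_centroid)` followed by `predict_ovo(models, sample)`. Any function that returns a `predict(x)` callable can replace `nearest_centroid`.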
Classifier-guided sampling for discrete variable, discontinuous design space exploration: Convergence and computational performance

Energy Technology Data Exchange (ETDEWEB)

Backlund, Peter B. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Shahan, David W. [HRL Labs., LLC, Malibu, CA (United States); Seepersad, Carolyn Conner [Univ. of Texas, Austin, TX (United States)

2014-04-22

A classifier-guided sampling (CGS) method is introduced for solving engineering design optimization problems with discrete and/or continuous variables and continuous and/or discontinuous responses. The method merges concepts from metamodel-guided sampling and population-based optimization algorithms. The CGS method uses a Bayesian network classifier for predicting the performance of new designs based on a set of known observations or training points. Unlike most metamodeling techniques, however, the classifier assigns a categorical class label to a new design, rather than predicting the resulting response in continuous space, and thereby accommodates nondifferentiable and discontinuous functions of discrete or categorical variables. The CGS method uses these classifiers to guide a population-based sampling process towards combinations of discrete and/or continuous variable values with a high probability of yielding preferred performance. Accordingly, the CGS method is appropriate for discrete/discontinuous design problems that are ill-suited for conventional metamodeling techniques and too computationally expensive to be solved by population-based algorithms alone. In addition, the rates of convergence and computational properties of the CGS method are investigated when applied to a set of discrete variable optimization problems. Results show that the CGS method significantly improves the rate of convergence towards known global optima, on average, when compared to genetic algorithms.
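The CGS loop can be sketched on a toy minimization problem over discrete variables. Several choices here are hypothetical:
- a naive Bayes model, the simplest Bayesian network classifier, stands in for the paper's Bayesian network;
- evaluated designs are labeled "preferred" if they fall in the best `preferred_fraction` by objective value;
- candidates come from uniform crossover plus mutation.

Only candidates whose posterior probability of being preferred reaches `threshold` are evaluated, which is where CGS saves expensive objective calls.

```python
import random


def naive_bayes_posterior(train, labels, candidate, n_values):
    """P(preferred | candidate) under naive Bayes with Laplace smoothing."""
    scores = {}
    for cls in (True, False):
        rows = [x for x, l in zip(train, labels) if l == cls]
        score = (len(rows) + 1) / (len(train) + 2)  # smoothed class prior
        for j, v in enumerate(candidate):
            match = sum(1 for x in rows if x[j] == v)
            score *= (match + 1) / (len(rows) + n_values)
        scores[cls] = score
    return scores[True] / (scores[True] + scores[False])


def cgs(objective, n_vars, n_values, generations=20, pop=10,
        preferred_fraction=0.3, threshold=0.5, seed=0):
    """Minimize `objective` over tuples of values in range(n_values).

    Requires n_values ** n_vars >= pop so the initial population can be filled.
    """
    rng = random.Random(seed)
    evaluated = {}  # design -> objective value (the expensive calls)
    while len(evaluated) < pop:
        x = tuple(rng.randrange(n_values) for _ in range(n_vars))
        evaluated[x] = objective(x)
    for _ in range(generations):
        # Categorical labels instead of continuous responses.
        ranked = sorted(evaluated, key=evaluated.get)
        preferred = set(ranked[:max(1, int(preferred_fraction * len(ranked)))])
        train = list(evaluated)
        labels = [x in preferred for x in train]
        parents = ranked[:pop]
        for _ in range(pop):
            a, b = rng.choice(parents), rng.choice(parents)
            child = tuple(rng.choice(pair) for pair in zip(a, b))  # uniform crossover
            child = tuple(rng.randrange(n_values) if rng.random() < 1 / n_vars else v
                          for v in child)  # mutation
            if child in evaluated:
                continue
            # Classifier filter: skip candidates unlikely to be preferred.
            if naive_bayes_posterior(train, labels, child, n_values) < threshold:
                continue
            evaluated[child] = objective(child)
    best = min(evaluated, key=evaluated.get)
    return best, evaluated[best], len(evaluated)
```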
Multiclass AdaBoost ELM and Its Application in LBP Based Face Recognition

Directory of Open Access Journals (Sweden)

Yunliang Jiang

2015-01-01

Extreme learning machine (ELM) is a competitive machine learning technique that is simple in theory and fast to implement. It can identify faults quickly and precisely compared with traditional identification techniques such as support vector machines (SVM). As verified by simulation results, ELM tends to scale better than traditional SVM and can achieve much better generalization performance and much faster learning. In this paper, we introduce a multiclass AdaBoost-based ELM ensemble method. In our approach, ELM is selected as the base predictor of the ensemble because of its speed and good performance. Unlike the existing boosting ELM algorithm, our algorithm can be applied directly to multiclass classification problems. We also carried out comparative experiments on face recognition datasets. The experimental results show that the proposed algorithm not only makes the prediction results more stable but also achieves better generalization performance.
AN APPLICATION OF FUNCTIONAL MULTIVARIATE REGRESSION MODEL TO MULTICLASS CLASSIFICATION

OpenAIRE

Krzyśko, Mirosław; Smaga, Łukasz

2017-01-01

In this paper, the scale response functional multivariate regression model is considered. By using the basis functions representation of functional predictors and regression coefficients, this model is rewritten as a multivariate regression model. This representation of the functional multivariate regression model is used for multiclass classification for multivariate functional data. Computational experiments performed on real labelled data sets demonstrate the effectiveness of the proposed ...
Multi-class parkinsonian disorders classification with quantitative MR markers and graph-based features using support vector machines.

Science.gov (United States)

Morisi, Rita; Manners, David Neil; Gnecco, Giorgio; Lanconelli, Nico; Testa, Claudia; Evangelisti, Stefania; Talozzi, Lia; Gramegna, Laura Ludovica; Bianchini, Claudio; Calandra-Buonaura, Giovanna; Sambati, Luisa; Giannini, Giulia; Cortelli, Pietro; Tonon, Caterina; Lodi, Raffaele

2018-02-01

In this study we attempt to automatically classify individual patients with different parkinsonian disorders, making use of pattern recognition techniques to distinguish among several forms of parkinsonisms (multi-class classification), based on a set of binary classifiers that discriminate each disorder from all others. We combine diffusion tensor imaging, proton spectroscopy and morphometric-volumetric data to obtain MR quantitative markers, which are provided to support vector machines with the aim of recognizing the different parkinsonian disorders. Feature selection is used to find the most important features for classification. We also exploit a graph-based technique on the set of quantitative markers to extract additional features from the dataset, and increase classification accuracy. When graph-based features are not used, the MR markers that are most frequently automatically extracted by the feature selection procedure reflect alterations in brain regions that are also usually considered to discriminate parkinsonisms in routine clinical practice. Graph-derived features typically increase the diagnostic accuracy, and reduce the number of features required. The results obtained in the work demonstrate that support vector machines applied to multimodal brain MR imaging and using graph-based features represent a novel and highly accurate approach to discriminate parkinsonisms, and a useful tool to assist the diagnosis. Copyright © 2017 Elsevier Ltd. All rights reserved.
Multiclass Data Segmentation using Diffuse Interface Methods on Graphs

Science.gov (United States)

2014-01-01

the Claremont Graduate University and San Diego State University joint program, working under the supervision of Prof. Allon Percus and Dr. Arjuna...identifying some borders of the red cow as part of the black cow. Multiclass GL also has problems identifying parts of the grass. C. MNIST Data The MNIST data...from the Université Paris-Sud, Orsay in 1997. He was a member of the scientific staff at Los Alamos National Laboratory in the Division of Computer
Combining multiple decisions: applications to bioinformatics

International Nuclear Information System (INIS)

Yukinawa, N; Ishii, S; Takenouchi, T; Oba, S

2008-01-01

Multi-class classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies based on gene expression profiling. This article reviews two recent approaches to multi-class classification that combine multiple binary classifiers, both formulated within the unified framework of error-correcting output coding (ECOC). The first approach constructs a multi-class classifier in which each aggregated binary classifier has a weight that is optimally tuned from the observed data. In the second approach, drawing an analogy with information transmission theory, each binary classifier's misclassification is modeled probabilistically as a bit inversion error. Experimental studies on various real-world datasets, including cancer classification problems, reveal that both new methods are superior or comparable to other multi-class classification methods.
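ECOC decoding can be sketched as follows. Each class has a codeword over the binary classifiers. A sample is assigned to the class whose codeword is closest to the vector of binary outputs. The optional weights illustrate the first approach's idea of per-classifier weights. In the reviewed method those weights are tuned from data; here they are fixed, hypothetical values. With all weights equal to 1 this reduces to plain Hamming decoding. The probabilistic bit-inversion model of the second approach is not reproduced.

```python
def ecoc_decode(code_matrix, binary_outputs, weights=None):
    """Return the class whose codeword best matches the binary outputs.

    code_matrix: {class: tuple of +1/-1}, one entry per binary classifier.
    binary_outputs: +1/-1 predictions of those binary classifiers.
    weights: optional per-classifier weights; a disagreement on column i
             costs weights[i]. Ties go to the first class in code_matrix.
    """
    if weights is None:
        weights = [1.0] * len(binary_outputs)

    def weighted_distance(codeword):
        return sum(w for w, c, o in zip(weights, codeword, binary_outputs) if c != o)

    return min(code_matrix, key=lambda cls: weighted_distance(code_matrix[cls]))
```

In the usage below, one classifier's bit is flipped. With equal weights the result is a tie. Down-weighting a less reliable column resolves the tie toward the class that column disagrees with.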
Network Intrusion Detection System (NIDS) in Cloud Environment based on Hidden Naïve Bayes Multiclass Classifier

Directory of Open Access Journals (Sweden)

Hafza A. Mahmood

2018-04-01

The cloud environment is a next-generation, Internet-based computing system that supplies customizable services to end users for working with or accessing various cloud applications. To secure information systems, networks and computer systems and limit damage to them, an intrusion detection system (IDS) is important. Cloud environments are now threatened by network intrusions. One of the most prevalent and offensive of these is the Denial of Service (DoS) attack, which has a dangerous impact on cloud computing systems. This paper proposes a Hidden Naïve Bayes (HNB) classifier to handle DoS attacks. HNB is a data mining (DM) model that relaxes the conditional independence assumption of the Naïve Bayes (NB) classifier. The proposed system supports the HNB classifier with discretization and feature selection. Selecting the best features enhances system performance and reduces processing time. The KDD Cup 99 and NSL-KDD datasets were used to evaluate the proposed system. The experimental results show that the HNB classifier improves NIDS performance in terms of accuracy and DoS detection. On three KDD Cup 99 test sets, DoS detection accuracy is 100% using only 12 features selected by gain ratio. On three NSL-KDD experiments, DoS detection accuracy is 90% using only 10 selected features.
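The feature-selection step ranks discretized features by gain ratio: information gain divided by the feature's split information. The sketch below computes gain ratio and keeps the top k features. The HNB classifier itself is not reproduced. The column and label values in the usage example are invented.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of a discretized feature divided by its split information."""
    n = len(labels)
    conditional = 0.0
    for v, count in Counter(feature).items():
        subset = [l for f, l in zip(feature, labels) if f == v]
        conditional += count / n * entropy(subset)
    gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return gain / split_info if split_info > 0 else 0.0


def select_features(columns, labels, k):
    """Return the indices of the k feature columns with the highest gain ratio."""
    scores = [(gain_ratio(col, labels), i) for i, col in enumerate(columns)]
    return [i for _, i in sorted(scores, reverse=True)[:k]]
```

Dividing by split information penalizes features with many distinct values. Plain information gain would otherwise favor such features, which matters after discretization.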
Human Activity Recognition from Smart-Phone Sensor Data using a Multi-Class Ensemble Learning in Home Monitoring.

Science.gov (United States)

Ghose, Soumya; Mitra, Jhimli; Karunanithi, Mohan; Dowling, Jason

2015-01-01

Home monitoring of chronically ill or elderly patients can reduce frequent hospitalisations and hence provide improved quality of care at a reduced cost to the community, thereby reducing the burden on the healthcare system. Recognizing the physical activities of such patients is therefore central to such a system. In this work, a system for automatic human physical activity recognition from smart-phone inertial sensor data is proposed. An ensemble-of-decision-trees framework is adopted to train the multi-class activity classifier and predict activities. A comparison of the proposed method with a traditional multi-class support vector machine shows a significant improvement in activity recognition accuracy.
Mass Spectrometry Parameters Optimization for the 46 Multiclass Pesticides Determination in Strawberries with Gas Chromatography Ion-Trap Tandem Mass Spectrometry

Science.gov (United States)

Fernandes, Virgínia C.; Vera, Jose L.; Domingues, Valentina F.; Silva, Luís M. S.; Mateus, Nuno; Delerue-Matos, Cristina

2012-12-01

A multiclass analysis method was optimized to analyze pesticide traces by gas chromatography with ion-trap tandem mass spectrometry (GC-MS/MS). The influence of several analytical parameters on the pesticide signal response was explored. Five ion trap mass spectrometry (IT-MS) operating parameters were numerically tested to maximize the instrument's analytical signal response: isolation time (IT), excitation voltage (EV), excitation time (ET), maximum excitation energy or "q" value (q), and isolation mass window (IMW). Multiple linear regression was used to evaluate the influence of the five parameters on the analytical response of the ion trap mass spectrometer and to predict that response. The assessment of the five parameters based on the regression equations substantially increased the sensitivity of IT-MS/MS in the MS/MS mode. The results show that, for most of the pesticides, these parameters have a strong influence on both signal response and detection limit. Using the optimized method, a multiclass pesticide analysis was performed for 46 pesticides in a strawberry matrix. Levels higher than the limit established for strawberries by the European Union were found in some samples.
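The abstract uses multiple linear regression to relate the analytical response to the operating parameters. The sketch below fits a first-order ordinary least-squares model by solving the normal equations. The paper regresses on five parameters (IT, EV, ET, q, IMW). The two predictors and the data in the usage test are synthetic placeholders, and the exact form of the paper's regression equations (for example, interaction terms) is not given in the abstract.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares; returns [intercept, b1, ..., bp]."""
    rows = [[1.0, *xi] for xi in X]  # prepend a column of ones for the intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef: Sequence[float], x: Sequence[float]) -> float:
    """Predicted response for one parameter setting x."""
    return coef[0] + sum(c * xi for c, xi in zip(coef[1:], x))
```

Once fitted, `predict` can be evaluated over a grid of candidate settings to locate the one with the largest predicted signal response.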
Eco-friendly LC-MS/MS method for analysis of multi-class micropollutants in tap, fountain, and well water from northern Portugal.

Science.gov (United States)

Barbosa, Marta O; Ribeiro, Ana R; Pereira, Manuel F R; Silva, Adrián M T

2016-11-01

Organic micropollutants present in drinking water (DW) may cause adverse effects for public health, and so reliable analytical methods are required to detect these pollutants at trace levels in DW. This work describes the first green analytical methodology for multi-class determination of 21 pollutants in DW: seven pesticides, an industrial compound, 12 pharmaceuticals, and a metabolite (some included in Directive 2013/39/EU or Decision 2015/495/EU). A solid-phase extraction procedure followed by ultra-high-performance liquid chromatography coupled to tandem mass spectrometry (offline SPE-UHPLC-MS/MS) method was optimized using eco-friendly solvents, achieving detection limits below 0.20 ng L⁻¹. The validated analytical method was successfully applied to DW samples from different sources (tap, fountain, and well waters) from different locations in the north of Portugal, as well as before and after bench-scale UV and ozonation experiments in spiked tap water samples. Thirteen compounds were detected, many of them not regulated yet, in the following order of frequency: diclofenac > norfluoxetine > atrazine > simazine > warfarin > metoprolol > alachlor > chlorfenvinphos > trimethoprim > clarithromycin ≈ carbamazepine ≈ PFOS > citalopram. Hazard quotients were also estimated for the quantified substances and suggested no adverse effects to humans. Graphical Abstract Occurrence and removal of multi-class micropollutants in drinking water, analyzed by an eco-friendly LC-MS/MS method.
Simultaneous determination of 41 multiclass organic pollutants in environmental waters by means of polyethersulfone microextraction followed by liquid chromatography-tandem mass spectrometry.

Science.gov (United States)

Mijangos, Leire; Ziarrusta, Haizea; Olivares, Maitane; Zuloaga, Olatz; Möder, Monika; Etxebarria, Nestor; Prieto, Ailette

2018-01-01

A new procedure using polyethersulfone (PES) microextraction followed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis was developed in this work for the simultaneous determination of 41 multiclass priority and emerging organic pollutants, including herbicides, hormones, personal care products, and pharmaceuticals, among others, in seawater, wastewater treatment plant (WWTP) effluents, and estuary samples. The optimization of the analysis included two different chromatographic columns and different variables of the mass spectrometer (polarity, fragmentor voltage, collision energy, and collision cell accelerator). For the PES extraction, the ionic strength of the water, pH, addition of EDTA, and the amount of polymeric material were thoroughly investigated. The developed procedure was compared with a previously validated one based on a standard solid-phase extraction (SPE). In contrast to the SPE protocol, the PES method allowed a cost-efficient extraction of complex aqueous samples with lower matrix effects from 120 mL of water sample. Satisfactory and comparable apparent recovery values (80-119 and 70-131%) and method quantification limits (MQLs, 0.4-26 and 0.2-23 ng/L) were obtained for the PES and SPE procedures, respectively, regardless of the matrix. Repeatability values lower than 27% were obtained. Finally, the developed methods were applied to the analysis of real samples from the Basque Country; irbesartan, valsartan, acesulfame, and sucralose were the analytes most often detected at the highest concentrations (51-1096 ng/L). Graphical abstract Forty-one multiclass pollutant determination in environmental waters by means of PES/SPE-LC-MS/MS.
On first-come first-served versus random service discipline in multiclass closed queueing networks

NARCIS (Netherlands)

Buitenhek, R.; van Houtum, Geert-Jan; van Ommeren, Jan C.W.

1997-01-01

We consider multiclass closed queueing networks. For these networks, a lot of work has been devoted to characterizing and weakening the conditions under which a product-form solution is obtained for the steady-state distribution. From this work, it is known that, under certain conditions, all
Multiclass determination and confirmation of antibiotic residues in honey using LC-MS/MS.

Science.gov (United States)

Lopez, Mayda I; Pettis, Jeffery S; Smith, I Barton; Chu, Pak-Sin

2008-03-12

A multiclass method has been developed for the determination and confirmation in honey of tetracyclines (chlortetracycline, doxycycline, oxytetracycline, and tetracycline), fluoroquinolones (ciprofloxacin, danofloxacin, difloxacin, enrofloxacin, and sarafloxacin), macrolides (tylosin), lincosamides (lincomycin), aminoglycosides (streptomycin), sulfonamides (sulfathiazole), phenicols (chloramphenicol), and fumagillin residues using liquid chromatography tandem mass spectrometry (LC-MS/MS). Erythromycin (a macrolide) and monensin (an ionophore) can be detected and confirmed but not quantitated. Honey samples (approximately 2 g) are dissolved in 10 mL of water and centrifuged. An aliquot of the supernatant is used to determine streptomycin. The remaining supernatant is filtered through a fine-mesh nylon fabric and cleaned up by solid phase extraction. After solvent evaporation and sample reconstitution, 15 antibiotics are assayed by LC-MS/MS using electrospray ionization (ESI) in positive ion mode. Afterward, chloramphenicol is assayed using ESI in negative ion mode. The method has been validated at the low part per billion levels for most of the drugs with accuracies between 65 and 104% and coefficients of variation less than 17%. The evaluation of matrix effects caused by honey of different floral origin is presented.

Using multiclass classification to automate the identification of patient safety incident reports by type and severity.

Science.gov (United States)

Wang, Ying; Coiera, Enrico; Runciman, William; Magrabi, Farah

2017-06-12

Approximately 10% of admissions to acute-care hospitals are associated with an adverse event. Analysis of incident reports helps to understand how and why incidents occur and can inform policy and practice for safer care. Unfortunately, our capacity to monitor and respond to incident reports in a timely manner is limited by the sheer volume of data collected. In this study, we aim to evaluate the feasibility of using multiclass classification to automate the identification of patient safety incidents in hospitals. Text-based classifiers were applied to identify 10 incident types and 4 severity levels. Using the one-versus-one (OvsO) and one-versus-all (OvsA) ensemble strategies, we evaluated regularized logistic regression, a linear support vector machine (SVM) and an SVM with a radial-basis function (RBF) kernel. Classifiers were trained and tested with "balanced" datasets (n_Type = 2860, n_SeverityLevel = 1160) from a state-wide incident reporting system. Testing was also undertaken with imbalanced "stratified" datasets (n_Type = 6000, n_SeverityLevel = 5950) from the state-wide system and an independent hospital reporting system. Classifier performance was evaluated using a confusion matrix, as well as F-score, precision and recall. The most effective combination was an OvsO ensemble of binary SVM RBF classifiers with binary count feature extraction. For incident type, classifiers performed well on balanced and stratified datasets (F-score: 78.3% and 73.9%), but were worse on independent datasets (68.5%). Reports about falls, medications, pressure injury, aggression and blood products were identified with high recall and precision. "Documentation" was the hardest type to identify. For severity level, the F-score was 87.3% for severity assessment code (SAC) 1 (extreme risk) and 64% for SAC4 (low risk) on balanced data. With stratified data, high recall was achieved for SAC1 (82.8-84%) but precision was poor (6.8-11.2%). High risk incidents (SAC2) were confused
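The evaluation above derives precision, recall and F-score from a confusion matrix. The sketch below computes these per class. Rows are true classes and columns are predicted classes. Precision divides the diagonal count by the column total, recall divides it by the row total, and the F-score is their harmonic mean. The three incident-type labels reuse names from the abstract, but the predictions in the usage test are invented for illustration.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> List[List[int]]:
    """Counts with rows = true class and columns = predicted class."""
    idx = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m


def per_class_scores(m: List[List[int]], labels: Sequence[Hashable]
                     ) -> Dict[Hashable, Tuple[float, float, float]]:
    """(precision, recall, F-score) for each class; 0.0 where undefined."""
    scores = {}
    for i, lab in enumerate(labels):
        tp = m[i][i]
        predicted = sum(row[i] for row in m)  # column total
        actual = sum(m[i])                    # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[lab] = (precision, recall, f)
    return scores
```

On imbalanced data, a class can have high recall but poor precision, as the abstract reports for SAC1. This happens when many reports from other classes land in that class's column.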
Deformation of log-likelihood loss function for multiclass boosting.

Science.gov (United States)

Kanamori, Takafumi

2010-09-01

The purpose of this paper is to study loss functions in multiclass classification. In classification problems, the decision function is estimated by minimizing an empirical loss function, and then, the output label is predicted by using the estimated decision function. We propose a class of loss functions which is obtained by a deformation of the log-likelihood loss function. There are four main reasons why we focus on the deformed log-likelihood loss function: (1) this is a class of loss functions which has not been deeply investigated so far, (2) in terms of computation, a boosting algorithm with a pseudo-loss is available to minimize the proposed loss function, (3) the proposed loss functions provide a clear correspondence between the decision functions and conditional probabilities of output labels, (4) the proposed loss functions satisfy the statistical consistency of the classification error rate which is a desirable property in classification problems. Based on (3), we show that the deformed log-likelihood loss provides a model of mislabeling which is useful as a statistical model of medical diagnostics. We also propose a robust loss function against outliers in multiclass classification based on our approach. The robust loss function is a natural extension of the existing robust loss function for binary classification. A model of mislabeling and a robust loss function are useful to cope with noisy data. Some numerical studies are presented to show the robustness of the proposed loss function. A mathematical characterization of the deformed log-likelihood loss function is also presented. Copyright 2010 Elsevier Ltd. All rights reserved.
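For reference, the undeformed multiclass log-likelihood loss that the paper starts from can be written for decision values f_1..f_K as -f_y + log(sum_k exp f_k). The corresponding conditional probabilities are given by the softmax of the decision values, which is the correspondence that reason (3) refers to. The sketch below covers only this baseline. The paper's specific deformation is not given in the abstract and is not reproduced.

```python
import math
from typing import List, Sequence


def class_probabilities(f: Sequence[float]) -> List[float]:
    """Softmax of the decision values: estimated P(label = k | x)."""
    m = max(f)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in f]
    s = sum(e)
    return [v / s for v in e]


def log_likelihood_loss(f: Sequence[float], y: int) -> float:
    """Undeformed multiclass log-likelihood loss -f_y + log sum_k exp(f_k)."""
    m = max(f)
    log_z = m + math.log(sum(math.exp(v - m) for v in f))
    return log_z - f[y]
```

The loss equals -log P(y | x) under the softmax model. Raising the decision value of the true label therefore lowers the loss.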
Supervised remote sensing image classification: An example of a ...

African Journals Online (AJOL)

These conventional multi-class classifiers/algorithms are usually written in programming languages such as C, C++, and Python. The objective of this research is to experiment with the use of a binary classifier/algorithm for a multi-class remote sensing task, implemented in MATLAB. MATLAB is a programming language just like C ...
Multi-class geospatial object detection based on a position-sensitive balancing framework for high spatial resolution remote sensing imagery

Science.gov (United States)

Zhong, Yanfei; Han, Xiaobing; Zhang, Liangpei

2018-04-01

Multi-class geospatial object detection from high spatial resolution (HSR) remote sensing imagery is attracting increasing attention in a wide range of object-related civil and engineering applications. However, the distribution of objects in HSR remote sensing imagery is location-variable and complicated, and how to accurately detect the objects in HSR remote sensing imagery is a critical problem. Due to the powerful feature extraction and representation capability of deep learning, the deep learning based region proposal generation and object detection integrated framework has greatly promoted the performance of multi-class geospatial object detection for HSR remote sensing imagery. However, due to the translation caused by the convolution operation in the convolutional neural network (CNN), although the performance of the classification stage is seldom influenced, the localization accuracies of the predicted bounding boxes in the detection stage are easily influenced. The dilemma between translation-invariance in the classification stage and translation-variance in the object detection stage has not been addressed for HSR remote sensing imagery, and causes position accuracy problems for multi-class geospatial object detection with region proposal generation and object detection. In order to further improve the performance of the region proposal generation and object detection integrated framework for HSR remote sensing imagery object detection, a position-sensitive balancing (PSB) framework is proposed in this paper for multi-class geospatial object detection from HSR remote sensing imagery. The proposed PSB framework takes full advantage of the fully convolutional network (FCN), on the basis of a residual network, and adopts the PSB framework to solve the dilemma between translation-invariance in the classification stage and translation-variance in the object detection stage. In addition, a pre-training mechanism is utilized to accelerate the training procedure
An exact solution for the state probabilities of the multi-class, multi-server queue with preemptive priorities

NARCIS (Netherlands)

Sleptchenko, Andrei; van Harten, Aart; van der Heijden, Matthijs C.

2005-01-01

We consider a multi-class, multi-server queueing system with preemptive priorities. We distinguish two groups of priority classes that consist of multiple customer types, each having their own arrival and service rate. We assume Poisson arrival processes and exponentially distributed service times.
Efficacy of hidden markov model over support vector machine on multiclass classification of healthy and cancerous cervical tissues

Science.gov (United States)

Mukhopadhyay, Sabyasachi; Kurmi, Indrajit; Pratiher, Sawon; Mukherjee, Sukanya; Barman, Ritwik; Ghosh, Nirmalya; Panigrahi, Prasanta K.

2018-02-01

In this paper, a comparative study between SVM and HMM has been carried out for the multiclass classification of healthy and cancerous cervical tissues. In our study, the HMM methodology proved more promising, producing higher classification accuracy.
Combined principal component preprocessing and n-tuple neural networks for improved classification

DEFF Research Database (Denmark)

Høskuldsson, Agnar; Linneberg, Christian

2000-01-01

We present a combined principal component analysis/neural network scheme for classification. The data used to illustrate the method consist of spectral fluorescence recordings from seven different production facilities, and the task is to relate an unknown sample to one of these seven factories....... The data are first preprocessed by performing an individual principal component analysis on each of the seven groups of data. The components found are then used for classifying the data, but instead of making a single multiclass classifier, we follow the ideas of turning a multiclass problem into a number...... of two-class problems. For each possible pair of classes we further apply a transformation to the calculated principal components in order to increase the separation between the classes. Finally we apply the so-called n-tuple neural network to the transformed data in order to give the classification...
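With seven factories, the pairwise scheme yields 7·6/2 = 21 two-class problems. The sketch below shows a minimal n-tuple (RAM-based) classifier that could serve as the final stage of each one. The classifier samples fixed random groups of n input bits. Each group's bit pattern serves as an address into a per-class memory, and a class's score is the number of groups whose current address was seen for that class during training. The sketch assumes binarized inputs. How the paper binarizes its transformed principal components, and its choice of n and number of tuples, are not stated in this excerpt, so the parameters below are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Sequence, Set, Tuple


class NTupleClassifier:
    """Minimal n-tuple (RAM-based) classifier for binary input vectors."""

    def __init__(self, n_bits: int, n: int, n_tuples: int, seed: int = 0):
        rng = random.Random(seed)
        # Each tuple is a fixed, randomly chosen group of n input bit positions.
        self.tuples: List[List[int]] = [rng.sample(range(n_bits), n)
                                        for _ in range(n_tuples)]
        # memory[label][k] holds every address seen by tuple k for that class.
        self.memory: Dict[Hashable, List[Set[Tuple[int, ...]]]] = {}

    def _addresses(self, x: Sequence[int]) -> List[Tuple[int, ...]]:
        return [tuple(x[i] for i in t) for t in self.tuples]

    def fit(self, X: Sequence[Sequence[int]], y: Sequence[Hashable]) -> "NTupleClassifier":
        for x, label in zip(X, y):
            mem = self.memory.setdefault(label, [set() for _ in self.tuples])
            for k, addr in enumerate(self._addresses(x)):
                mem[k].add(addr)  # "write a 1" at this address
        return self

    def score(self, x: Sequence[int]) -> Dict[Hashable, int]:
        addrs = self._addresses(x)
        return {label: sum(addr in mem[k] for k, addr in enumerate(addrs))
                for label, mem in self.memory.items()}

    def predict(self, x: Sequence[int]) -> Hashable:
        s = self.score(x)
        return max(s, key=s.get)
```

Training involves only set insertions, which is why n-tuple networks train in a single pass over the data.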
Throughput performance analysis of multirate, multiclass S-ALOHA OFFH-CDMA packet networks

DEFF Research Database (Denmark)

Raddo, Thiago R.; Sanches, Anderson L.; Borges, Ben Hur V

2015-01-01

In this paper, we propose a new throughput expression for multirate, multiclass slotted-ALOHA optical fast frequency hopping code-division multiple-access (OFFH-CDMA) packet networks considering a Poisson distribution for packet composite arrivals. We analyze the packet throughput performance...... of a three-class OFFH-CDMA network, where multirate transmissions are achieved via manipulation of the user's code parameters. It is shown that users transmitting at low rates interfere considerably with the performance of high-rate users. Finally, we perform a validation procedure to demonstrate...
Ingenious Snake: An Adaptive Multi-Class Contours Extraction

Science.gov (United States)

Li, Baolin; Zhou, Shoujun

2018-04-01

Active contour models (ACMs) play an important role in computer vision and medical imaging applications. Traditional ACMs were used to extract the contours of a single class of objects. The simultaneous extraction of multiple classes of contours of interest (i.e., various closed- or open-ended contours) has not been solved so far. Therefore, a novel ACM named “Ingenious Snake” is proposed to adaptively extract these contours of interest. First, ridge points are extracted based on the local phase measurement of the gradient vector flow field, so that the subsequent ridgeline initialization is automated and fast. Second, the contours' deformation and evolution are implemented with the Ingenious Snake. In the experiments, the results of initialization, deformation and evolution are compared with existing methods. The quantitative evaluation of the structure extraction is satisfactory with respect to effectiveness and accuracy.
RNA-Seq of Tumor-Educated Platelets Enables Blood-Based Pan-Cancer, Multiclass, and Molecular Pathway Cancer Diagnostics.

Science.gov (United States)

Best, Myron G; Sol, Nik; Kooi, Irsan; Tannous, Jihane; Westerman, Bart A; Rustenburg, François; Schellen, Pepijn; Verschueren, Heleen; Post, Edward; Koster, Jan; Ylstra, Bauke; Ameziane, Najim; Dorsman, Josephine; Smit, Egbert F; Verheul, Henk M; Noske, David P; Reijneveld, Jaap C; Nilsson, R Jonas A; Tannous, Bakhos A; Wesseling, Pieter; Wurdinger, Thomas

2015-11-09

Tumor-educated blood platelets (TEPs) are implicated as central players in the systemic and local responses to tumor growth, thereby altering their RNA profile. We determined the diagnostic potential of TEPs by mRNA sequencing of 283 platelet samples. We distinguished 228 patients with localized and metastasized tumors from 55 healthy individuals with 96% accuracy. Across six different tumor types, the location of the primary tumor was correctly identified with 71% accuracy. Also, MET or HER2-positive, and mutant KRAS, EGFR, or PIK3CA tumors were accurately distinguished using surrogate TEP mRNA profiles. Our results indicate that blood platelets provide a valuable platform for pan-cancer, multiclass cancer, and companion diagnostics, possibly enabling clinical advances in blood-based "liquid biopsies". Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Enhanced HMAX model with feedforward feature learning for multiclass categorization

Directory of Open Access Journals (Sweden)

Yinlin Li

2015-10-01

Full Text Available In recent years, the interdisciplinary research between neuroscience and computer vision has promoted the development in both fields. Many biologically inspired visual models are proposed, and among them, the Hierarchical Max-pooling model (HMAX) is a feedforward model mimicking the structures and functions of V1 to posterior inferotemporal (PIT) layer of the primate visual cortex, which could generate a series of position- and scale- invariant features. However, it could be improved with attention modulation and memory processing, which are two important properties of the primate visual cortex. Thus, in this paper, based on recent biological research on the primate visual cortex, we still mimic the first 100-150 milliseconds of visual cognition to enhance the HMAX model, which mainly focuses on the unsupervised feedforward feature learning process. The main modifications are as follows: (1) To mimic the attention modulation mechanism of V1 layer, a bottom-up saliency map is computed in the S1 layer of the HMAX model, which can support the initial feature extraction for memory processing; (2) To mimic the learning, clustering and short-term memory to long-term memory conversion abilities of V2 and IT, an unsupervised iterative clustering method is used to learn clusters with multiscale middle level patches, which are taken as long-term memory; (3) Inspired by the multiple feature encoding mode of the primate visual cortex, information including color, orientation, and spatial position are encoded in different layers of the HMAX model progressively. By adding a softmax layer at the top of the model, multiclass categorization experiments can be conducted, and the results on Caltech101 show that the enhanced model with a smaller memory size exhibits higher accuracy than the original HMAX model, and could also achieve better accuracy than other unsupervised feature learning methods in multiclass categorization task.
Enhanced HMAX model with feedforward feature learning for multiclass categorization.

Science.gov (United States)

Li, Yinlin; Wu, Wei; Zhang, Bo; Li, Fengfu

2015-01-01

In recent years, the interdisciplinary research between neuroscience and computer vision has promoted the development in both fields. Many biologically inspired visual models are proposed, and among them, the Hierarchical Max-pooling model (HMAX) is a feedforward model mimicking the structures and functions of V1 to posterior inferotemporal (PIT) layer of the primate visual cortex, which could generate a series of position- and scale- invariant features. However, it could be improved with attention modulation and memory processing, which are two important properties of the primate visual cortex. Thus, in this paper, based on recent biological research on the primate visual cortex, we still mimic the first 100-150 ms of visual cognition to enhance the HMAX model, which mainly focuses on the unsupervised feedforward feature learning process. The main modifications are as follows: (1) To mimic the attention modulation mechanism of V1 layer, a bottom-up saliency map is computed in the S1 layer of the HMAX model, which can support the initial feature extraction for memory processing; (2) To mimic the learning, clustering and short-term memory to long-term memory conversion abilities of V2 and IT, an unsupervised iterative clustering method is used to learn clusters with multiscale middle level patches, which are taken as long-term memory; (3) Inspired by the multiple feature encoding mode of the primate visual cortex, information including color, orientation, and spatial position are encoded in different layers of the HMAX model progressively. By adding a softmax layer at the top of the model, multiclass categorization experiments can be conducted, and the results on Caltech101 show that the enhanced model with a smaller memory size exhibits higher accuracy than the original HMAX model, and could also achieve better accuracy than other unsupervised feature learning methods in multiclass categorization task.
Convolutional Neural Networks with Batch Normalization for Classifying Hi-hat, Snare, and Bass Percussion Sound Samples

DEFF Research Database (Denmark)

Gajhede, Nicolai; Beck, Oliver; Purwins, Hendrik

2016-01-01

After having revolutionized image and speech processing, convolutional neural networks (CNN) are now starting to become more and more successful in music information retrieval as well. We compare four CNN types for classifying a dataset of more than 3000 acoustic and synthesized samples...
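The title refers to batch normalization. The sketch below shows its training-time forward pass for a mini-batch of feature vectors. Each feature is standardized with the batch mean and biased batch variance, then scaled by a learned `gamma` and shifted by a learned `beta`. This is an illustration only. The running averages used at inference time are omitted, and the paper's CNN architectures and parameter values are not given in this excerpt.

```python
import math
from typing import List, Sequence


def batch_norm(batch: Sequence[Sequence[float]], gamma: Sequence[float],
               beta: Sequence[float], eps: float = 1e-5) -> List[List[float]]:
    """Training-mode batch normalization over the batch axis (per feature)."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n  # biased batch variance
        for i in range(n):
            # Normalize, then apply the learned scale and shift.
            out[i][j] = gamma[j] * (batch[i][j] - mean) / math.sqrt(var + eps) + beta[j]
    return out
```

The small `eps` keeps the division finite when a feature is constant across the batch. In that case the output reduces to `beta`.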
Direct immersion single drop micro-extraction method for multi-class pesticides analysis in mango using GC-MS.

Science.gov (United States)

Pano-Farias, Norma S; Ceballos-Magaña, Silvia G; Muñiz-Valencia, Roberto; Jurado, Jose M; Alcázar, Ángela; Aguayo-Villarreal, Ismael A

2017-12-15

Due to the negative effects of pesticides on the environment and human health, more efficient and environmentally friendly methods are needed. To this end, a direct-immersion single drop micro-extraction (SDME) method combined with GC-MS was developed for the determination of multi-class pesticides in mango samples. The method is simple, fast, economical and free from memory effects. Sample pre-treatment using ultrasound-assisted solvent extraction and the factors affecting the SDME procedure (extractant solvent, drop volume, stirring rate, ionic strength, time, pH and temperature) were optimized using a factorial experimental design. The method showed high sensitivity (LOD: 0.14-169.20 μg kg⁻¹), acceptable precision (RSD: 0.7-19.1%), satisfactory recovery (69-119%) and high enrichment factors (20-722). Several of the obtained LOQs are below the MRLs established by the European Commission; therefore, the method could be applied to pesticide determination in routine analysis and custom laboratories. Moreover, the method proved suitable for the determination of some of the studied pesticides in lime, melon, papaya, banana, tomato, and lettuce. Copyright © 2017 Elsevier Ltd. All rights reserved.
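The figures of merit quoted above can be illustrated with their common textbook definitions. The enrichment factor is the analyte concentration in the extract (here, the drop) divided by its concentration in the original sample. Recovery is the measured amount as a percentage of the spiked amount. RSD is the sample standard deviation of replicates as a percentage of their mean. The paper may compute these differently (for example, an enrichment factor from calibration slopes), so treat these as assumed definitions. All input values are hypothetical.

```python
import statistics
from typing import Sequence


def enrichment_factor(c_extract: float, c_sample: float) -> float:
    """Analyte concentration in the extract divided by that in the sample."""
    return c_extract / c_sample


def recovery_percent(measured: float, spiked: float) -> float:
    """Measured concentration as a percentage of the spiked (nominal) level."""
    return 100.0 * measured / spiked


def rsd_percent(replicates: Sequence[float]) -> float:
    """Relative standard deviation (sample SD / mean) in percent."""
    return 100.0 * statistics.stdev(replicates) / statistics.mean(replicates)
```

For example, a spiked sample measured at 45 against a nominal 50 gives 90% recovery, inside the 69-119% range reported above.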
Multiclass classification for skin cancer profiling based on the integration of heterogeneous gene expression series.

Science.gov (United States)

Gálvez, Juan Manuel; Castillo, Daniel; Herrera, Luis Javier; San Román, Belén; Valenzuela, Olga; Ortuño, Francisco Manuel; Rojas, Ignacio

2018-01-01

Most research studies that apply microarray technology to characterize different pathological states of a disease may fail to reach statistically significant results. This is largely due to the small repertoire of analysed samples and to the limited number of states or pathologies usually addressed. Moreover, the influence of potential deviations on gene expression quantification is usually disregarded. In spite of the continuous changes in the omic sciences, reflected for instance in the emergence of new Next-Generation Sequencing-related technologies, the vast number of existing gene expression microarray datasets should be properly exploited. Therefore, this work proposes a novel methodological approach involving the integration of several heterogeneous skin cancer series, followed by the design of a multiclass classifier. This approach is thus a way to provide clinicians with an intelligent diagnosis support tool based on a robust set of selected biomarkers, which simultaneously distinguishes among different cancer-related skin states. To achieve this, a multi-platform combination of microarray datasets from Affymetrix and Illumina was carried out. This integration is expected to strengthen the statistical robustness of the study as well as the finding of highly reliable skin cancer biomarkers. Specifically, the designed pipeline has allowed the identification of a small subset of 17 differentially expressed genes (DEGs) with which to distinguish among the 7 skin states involved. These genes were obtained after assessing a number of potential batch effects on the gene expression data. The biological interpretation of these genes was inspected in the specific literature to understand their underlying information in relation to skin cancer. Finally, in order to assess their possible effectiveness in cancer diagnosis, a cross-validation Support Vector Machines (SVM
Reliable Fault Classification of Induction Motors Using Texture Feature Extraction and a Multiclass Support Vector Machine

Directory of Open Access Journals (Sweden)

Jia Uddin

2014-01-01

Full Text Available This paper proposes a method for the reliable fault detection and classification of induction motors using two-dimensional (2D) texture features and a multiclass support vector machine (MCSVM). The proposed model first converts time-domain vibration signals to 2D gray images, resulting in texture patterns (or repetitive patterns), and extracts these texture features by generating the dominant neighborhood structure (DNS) map. Principal component analysis (PCA) is then used to reduce the dimensionality of the feature vector containing the extracted texture features, because a high-dimensional feature vector can degrade classification performance. In this way, the paper configures an effective feature vector of discriminative fault features for diagnosis. Finally, the proposed approach uses one-against-all (OAA) multiclass support vector machines (MCSVMs) to identify induction motor failures. In this study, the Gaussian radial basis function kernel is used with the OAA MCSVMs to deal with nonlinear fault features. Experimental results demonstrate that the proposed approach outperforms three state-of-the-art fault diagnosis algorithms in terms of fault classification accuracy, yielding an average classification accuracy of 100% even in noisy environments.
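The sketch below shows how a one-against-all decision with a Gaussian RBF kernel works once the binary SVMs are trained. Each class k has a decision function f_k(x) = sum_i (alpha_i y_i) K(sv_i, x) + b_k, and the predicted class is the one with the largest f_k. The support vectors, coefficients, biases, class labels and `gamma` below are hypothetical placeholders, not trained values from the paper. Training the SVMs themselves (for example, with SMO) is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
# For each class: list of (alpha_i * y_i, support_vector) pairs and a bias b.
Model = Tuple[List[Tuple[float, Tuple[float, float]]], float]


def rbf(a: Vector, b: Vector, gamma: float = 1.0) -> float:
    """Gaussian RBF kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


# Hypothetical pre-trained one-against-all models for three fault classes.
MODELS: Dict[str, Model] = {
    "A": ([(+1.0, (0.0, 0.0)), (-1.0, (5.0, 5.0))], 0.0),
    "B": ([(+1.0, (5.0, 5.0)), (-1.0, (0.0, 0.0))], 0.0),
    "C": ([(+1.0, (0.0, 5.0)), (-1.0, (0.0, 0.0))], 0.0),
}


def decision(x: Vector, svs: List[Tuple[float, Tuple[float, float]]],
             bias: float, gamma: float = 1.0) -> float:
    """Kernel SVM decision value for one binary (class-vs-rest) model."""
    return sum(coef * rbf(sv, x, gamma) for coef, sv in svs) + bias


def predict_oaa(x: Vector, models: Dict[str, Model] = MODELS,
                gamma: float = 1.0) -> str:
    """One-against-all rule: pick the class with the largest decision value."""
    return max(models, key=lambda label: decision(x, *models[label], gamma=gamma))
```

Taking the largest raw decision value, rather than counting positive votes, avoids ties when several (or no) binary models claim the sample.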
Predicting Assignment Submissions in a Multiclass Classification Problem

Directory of Open Access Journals (Sweden)

Bogdan Drăgulescu

2015-08-01

Full Text Available Predicting student failure is an important task that can empower educators to counteract the factors that affect student performance. In this paper, a part of the bigger problem of predicting student failure is addressed: predicting the students that do not complete their assignment tasks. For solving this problem, real data collected by our university’s educational platform was used. Because the problem consisted of predicting one of three possible classes (multi-class classification), the appropriate algorithms and methods were selected. Several experiments were carried out to find the best approach for this prediction problem and the used data set. An approach of time segmentation is proposed in order to facilitate the prediction from early on. Methods that address the problems of high dimensionality and imbalanced data were also evaluated. The outcome of each approach is shown and compared in order to select the best performing classification algorithm for the problem at hand.
EEG Classification for Hybrid Brain-Computer Interface Using a Tensor Based Multiclass Multimodal Analysis Scheme.

Science.gov (United States)

Ji, Hongfei; Li, Jie; Lu, Rongrong; Gu, Rong; Cao, Lei; Gong, Xiaoliang

2016-01-01

Electroencephalogram- (EEG-) based brain-computer interface (BCI) systems usually utilize one type of change in the dynamics of brain oscillations for control, such as event-related desynchronization/synchronization (ERD/ERS), steady state visual evoked potential (SSVEP), and P300 evoked potentials. There is a recent trend to detect more than one of these signals in one system to create a hybrid BCI. However, in this case, EEG data were always divided into groups and analyzed by separate processing procedures. As a result, the interactive effects were ignored when different types of BCI tasks were executed simultaneously. In this work, we propose an improved tensor-based multiclass multimodal scheme especially for hybrid BCI. In this scheme, EEG signals are denoted as multiway tensors, and a nonredundant rank-one tensor decomposition model is proposed to obtain nonredundant tensor components. A weighted Fisher criterion is designed to select multimodal discriminative patterns without ignoring the interactive effects, and the support vector machine (SVM) is extended to multiclass classification. Experimental results suggest that the proposed scheme can not only identify the different changes in the dynamics of brain oscillations induced by different types of tasks but also properly capture the interactive effects of simultaneous tasks. Therefore, it has great potential for hybrid BCI.
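For orientation, the sketch below computes the standard, unweighted multiclass Fisher score for a single scalar feature. The score is the between-class scatter sum_k n_k (mu_k - mu)^2 divided by the within-class scatter sum_k n_k sigma_k^2, so higher values indicate more discriminative features. The abstract does not specify the weighting used in the paper's weighted Fisher criterion on tensor components, so the weighting is not reproduced. The feature values are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Hashable, List, Sequence


def fisher_score(values: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Unweighted multiclass Fisher score of one scalar feature."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    mu = statistics.fmean(values)
    # Between-class scatter: size-weighted squared distance of class means to the overall mean.
    between = sum(len(g) * (statistics.fmean(g) - mu) ** 2 for g in groups.values())
    # Within-class scatter: size-weighted (population) variance inside each class.
    within = sum(len(g) * statistics.pvariance(g) for g in groups.values())
    return between / within if within > 0 else float("inf")


def rank_features(columns: Sequence[Sequence[float]],
                  labels: Sequence[Hashable]) -> List[int]:
    """Feature indices sorted from most to least discriminative."""
    return sorted(range(len(columns)),
                  key=lambda i: fisher_score(columns[i], labels), reverse=True)
```

Selecting the top-ranked components by this score is the unweighted analogue of the pattern selection step described above.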
Evaluation of Multiclass Model Observers in PET LROC Studies

Science.gov (United States)

Gifford, H. C.; Kinahan, P. E.; Lartizien, C.; King, M. A.

2007-02-01

A localization ROC (LROC) study was conducted to evaluate nonprewhitening matched-filter (NPW) and channelized NPW (CNPW) versions of a multiclass model observer as predictors of human tumor-detection performance with PET images. Target localization is explicitly performed by these model observers. Tumors were placed in the liver, lungs, and background soft tissue of a mathematical phantom, and the data simulation modeled a full-3D acquisition mode. Reconstructions were performed with the FORE+AWOSEM algorithm. The LROC study measured observer performance with 2D images consisting of either coronal, sagittal, or transverse views of the same set of cases. Versions of the CNPW observer based on two previously published difference-of-Gaussian channel models demonstrated good quantitative agreement with human observers. One interpretation of these results treats the CNPW observer as a channelized Hotelling observer with implicit internal noise.
Hyperspectral image classification using Support Vector Machine

International Nuclear Information System (INIS)

Moughal, T A

2013-01-01

Classification of land-cover hyperspectral images is a very challenging task due to the unfavourable ratio between the number of spectral bands and the number of training samples. The focus in many applications is to find a classifier that is effective in terms of accuracy. Conventional multiclass classifiers can map the class of interest, but considerable effort and large training sets are required to describe the classes fully in spectral terms. This paper proposes a Support Vector Machine (SVM) to deal with the multiclass problem of hyperspectral imagery. The appeal of this method is that it locates the optimal hyperplane separating the class of interest from the remaining classes in a new high-dimensional feature space, taking into account only the training samples that lie on the edge of the class distributions, known as support vectors; the use of kernel functions makes the classifier more flexible and robust against outliers. A comparative study was undertaken to find an effective classifier by comparing SVM with two other well-known classifiers, Maximum Likelihood (ML) and Spectral Angle Mapper (SAM). First, the Minimum Noise Fraction (MNF) transform was applied to extract the best possible features from the hyperspectral imagery, and the resulting feature subset was then passed to the classifiers. Experimental results illustrate that integrating MNF and SVM significantly reduced classification complexity and improved classification accuracy.

Facial Expression Recognition using Multiclass Ensemble Least-Square Support Vector Machine

Science.gov (United States)

Lawi, Armin; Sya'Rani Machrizzandi, M.

2018-03-01

Facial expression is one of the behavioral characteristics of human beings. A biometric system that uses facial expression characteristics makes it possible to recognize a person’s mood or emotion. The basic components of a facial expression analysis system are face detection, face image extraction, facial classification, and facial expression recognition. This paper uses the Principal Component Analysis (PCA) algorithm to extract facial features with expression parameters, i.e., happy, sad, neutral, angry, fear, and disgusted. A Multiclass Ensemble Least-Squares Support Vector Machine (MELS-SVM) is then used to classify the facial expressions. The MELS-SVM model, evaluated on our 185 different expression images of 10 persons, achieved an accuracy of 99.998% using an RBF kernel.
Multiclass classification of obstructive sleep apnea/hypopnea based on a convolutional neural network from a single-lead electrocardiogram.

Science.gov (United States)

Urtnasan, Erdenebayar; Park, Jong-Uk; Lee, Kyoung-Joung

2018-05-24

In this paper, we propose a convolutional neural network (CNN)-based deep learning architecture for multiclass classification of obstructive sleep apnea and hypopnea (OSAH) using single-lead electrocardiogram (ECG) recordings. OSAH is the most common sleep-related breathing disorder. Many subjects who suffer from OSAH remain undiagnosed; thus, early detection of OSAH is important. In this study, automatic classification of three classes (normal, hypopnea, and apnea) is performed with a CNN. An optimal six-layer CNN model is trained on a training dataset (45,096 events) and evaluated on a test dataset (11,274 events). The training set (69 subjects) and test set (17 subjects) were collected from 86 subjects, whose recordings, approximately 6 h long, were segmented into 10 s durations. The proposed CNN model reaches a mean F1-score of 93.0 on the training dataset and 87.0 on the test dataset. Thus, the proposed deep learning architecture achieved high performance for multiclass classification of OSAH using single-lead ECG recordings. The proposed method can be employed in screening patients suspected of having OSAH. © 2018 Institute of Physics and Engineering in Medicine.
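As an illustration of the segmentation step only, the following Python sketch splits a single-lead recording into non-overlapping 10 s windows. The 100 Hz sampling rate and the choice to drop a trailing partial window are assumptions for illustration; the abstract specifies neither, and this is not the authors' preprocessing code.

```python
def segment_recording(samples, fs_hz, window_s=10):
    """Split a 1-D signal into non-overlapping windows of window_s seconds.

    Samples that do not fill a final complete window are dropped
    (an assumption; the abstract does not say how remainders are handled).
    """
    win = int(fs_hz * window_s)
    n_windows = len(samples) // win
    return [samples[i * win:(i + 1) * win] for i in range(n_windows)]


# A ~6 h recording at a hypothetical 100 Hz sampling rate.
fs = 100
recording = [0.0] * (6 * 3600 * fs)
segments = segment_recording(recording, fs)
print(len(segments))     # 2160 ten-second segments
print(len(segments[0]))  # 1000 samples per segment
```

Each 10 s window would then be one input example for the classifier.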
DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier

KAUST Repository

Kulmanov, Maxat

2017-09-27

Motivation: A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often done rigorously only for a few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. Results: We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as from a cross-species protein–protein interaction network. Our approach specifically outputs information in the structure of the GO and uses the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations.
HPSLPred: An Ensemble Multi-Label Classifier for Human Protein Subcellular Location Prediction with Imbalanced Source.

Science.gov (United States)

Wan, Shixiang; Duan, Yucong; Zou, Quan

2017-09-01

Predicting the subcellular localization of proteins is an important and challenging problem. Traditional experimental approaches are often expensive and time-consuming. Consequently, a growing number of research efforts employ machine learning approaches to predict the subcellular location of proteins. State-of-the-art prediction methods face two main challenges. First, most existing techniques are designed for multi-class rather than multi-label classification, which ignores connections between multiple labels. In reality, the multiple locations of particular proteins carry vital and unique biological significance that deserves special focus and cannot be ignored. Second, techniques for handling imbalanced data in multi-label classification problems are necessary but have not been employed. To address these two issues, we have developed an ensemble multi-label classifier called HPSLPred, which can be applied to multi-label classification with an imbalanced protein source. For convenience, a user-friendly webserver has been established at http://server.malab.cn/HPSLPred. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
LCC: Light Curves Classifier

Science.gov (United States)

Vo, Martin

2017-08-01

Light Curves Classifier uses data mining and machine learning to obtain and classify desired objects. This can be done using attributes of light curves or any time series, including shapes, histograms, or variograms, or using other available information about the inspected objects, such as color indices, temperatures, and abundances. After the features that describe the objects to be searched are specified, the software trains on a given training sample and can then be used for unsupervised clustering to visualize the natural separation of the sample. The package can also be used to tune the parameters of the methods used automatically (for example, the number of hidden neurons or the binning ratio). Trained classifiers can be used to filter outputs from astronomical databases or data stored locally. The Light Curve Classifier can also be used simply to download light curves and all available information for queried stars. It can natively connect to OgleII, OgleIII, ASAS, CoRoT, Kepler, Catalina and MACHO, and new connectors or descriptors can be implemented. In addition to direct use of the package and its command-line UI, the program can be used through a web interface. Users can create jobs for “training” methods on given objects, querying databases, and filtering outputs with trained filters. Preimplemented descriptors, classifiers, and connectors can be picked with simple clicks, and their parameters can be tuned by giving ranges of values. All combinations are then calculated, and the best one is used to create the filter. Natural separation of the data can be visualized by unsupervised clustering.
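Trying every combination of the given parameter ranges and keeping the best one amounts to an exhaustive grid search. A minimal Python sketch, assuming a user-supplied scoring function where higher is better. The function `grid_search`, the parameter names, and the toy score are hypothetical and are not part of the LCC package API.

```python
from itertools import product


def grid_search(param_ranges, score_fn):
    """Evaluate every combination of parameter values and return the best one.

    param_ranges: dict mapping a parameter name to an iterable of candidate values.
    score_fn: callable taking a dict of parameters and returning a score (higher is better).
    """
    names = list(param_ranges)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_ranges[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical score that peaks at hidden_neurons=8, binning_ratio=0.5.
    return -abs(params["hidden_neurons"] - 8) - abs(params["binning_ratio"] - 0.5)


ranges = {"hidden_neurons": range(2, 17, 2), "binning_ratio": [0.25, 0.5, 0.75]}
best, score = grid_search(ranges, toy_score)
print(best)  # {'hidden_neurons': 8, 'binning_ratio': 0.5}
```

The number of evaluations is the product of the range sizes (8 × 3 = 24 here), which is why narrow ranges matter when each evaluation means training a classifier.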
High dimensional classifiers in the imbalanced case

DEFF Research Database (Denmark)

Bak, Britta Anker; Jensen, Jens Ledet

We consider the binary classification problem in the imbalanced case, where the number of samples from the two groups differs. The classification problem is considered in the high-dimensional case, where the number of variables is much larger than the number of samples, and where the imbalance leads ... to a bias in the classification. A theoretical analysis of the independence classifier reveals the origin of the bias, and based on this we suggest two new classifiers that can handle any imbalance ratio. The analytical results are supplemented by a simulation study, where the suggested classifiers in some...
Retrieving clinically relevant diabetic retinopathy images using a multi-class multiple-instance framework

Science.gov (United States)

Chandakkar, Parag S.; Venkatesan, Ragav; Li, Baoxin

2013-02-01

Diabetic retinopathy (DR) is a vision-threatening complication of diabetes mellitus, a medical condition whose prevalence is rising globally. Unfortunately, many patients are unaware of this complication because of the absence of symptoms. Regular screening for DR is necessary to detect the condition for timely treatment. Content-based image retrieval using archived and diagnosed fundus (retinal) camera DR images can improve the efficiency of DR screening. This content-based image retrieval study focuses on two DR clinical findings, microaneurysm and neovascularization, which are clinical signs of non-proliferative and proliferative diabetic retinopathy, respectively. The authors propose a multi-class multiple-instance image retrieval framework that deploys a modified color correlogram and statistics of steerable Gaussian filter responses to retrieve clinically relevant images from a database of DR fundus images.
DisoMCS: Accurately Predicting Protein Intrinsically Disordered Regions Using a Multi-Class Conservative Score Approach.

Directory of Open Access Journals (Sweden)

Zhiheng Wang

The precise prediction of protein intrinsically disordered regions, which play a crucial role in biological processes, is a necessary prerequisite for furthering our understanding of the principles and mechanisms of protein function. Here, we propose DisoMCS, a novel and more accurate predictor of protein intrinsically disordered regions. DisoMCS is based on an original multi-class conservative score (MCS) obtained by sequence-order/disorder alignment. First, near-disorder regions are defined as the fragments located at either terminus of an ordered region that connects to a disordered region. Then the multi-class conservative score is generated by sequence alignment against a known structure database and represented as order, near-disorder, and disorder conservative scores. The MCS of each amino acid thus has three elements: order, near-disorder, and disorder profiles. Finally, the MCS is used as a feature to identify disordered regions in sequences. DisoMCS uses a non-redundant data set as the training set, the MCS and predicted secondary structure as features, and a conditional random field as the classification algorithm. In predicted near-disorder regions, a residue is classified as ordered or disordered according to an optimized decision threshold. DisoMCS was evaluated by cross-validation, large-scale prediction, independent tests, and CASP (Critical Assessment of Techniques for Protein Structure Prediction) tests. All results confirmed that DisoMCS is very competitive in prediction accuracy compared with well-established, publicly available disordered region predictors. The results also indicated that our approach is more accurate when a query has higher homology with the knowledge database. DisoMCS is available at http://cal.tongji.edu.cn/disorder/.
Classification of Automated Search Traffic

Science.gov (United States)

Buehrer, Greg; Stokes, Jack W.; Chellapilla, Kumar; Platt, John C.

As web search providers seek to improve both relevance and response times, they are challenged by the ever-increasing tax of automated search query traffic. Third-party systems interact with search engines for a variety of reasons, such as monitoring a web site’s rank, augmenting online games, or possibly maliciously altering click-through rates. In this paper, we investigate automated traffic (sometimes referred to as bot traffic) in the query stream of a large search engine provider. We define automated traffic as any search query not generated by a human in real time. We first provide examples of different categories of query logs generated by automated means. We then develop many different features that distinguish between queries generated by people searching for information and those generated by automated processes. We categorize these features into two classes: interpretations of the physical model of human interactions, and behavioral patterns of automated interactions. Using these detection features, we next classify the query stream using multiple binary classifiers. In addition, a multiclass classifier is developed to identify subclasses of both normal and automated traffic. An active learning algorithm is used to suggest which user sessions to label in order to improve the accuracy of the multiclass classifier, while also seeking to discover new classes of automated traffic. A performance analysis is then provided. Finally, the multiclass classifier is used to predict the subclass distribution for the search query stream.
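The abstract does not say which active learning criterion is used to pick sessions for labeling. A common choice is uncertainty (margin) sampling: request labels for the sessions where the classifier's two most probable subclasses are closest. The following is a generic Python sketch under that assumption, with hypothetical probabilities; it is not the authors' algorithm.

```python
def margin_sampling(probabilities, k):
    """Return indices of the k items whose two highest class probabilities are closest.

    probabilities: list of per-item probability lists (one entry per class, at least two).
    A small margin means the classifier is unsure, so a label for that item
    is expected to be informative.
    """
    margins = []
    for idx, probs in enumerate(probabilities):
        first, second = sorted(probs, reverse=True)[:2]
        margins.append((first - second, idx))
    margins.sort()
    return [idx for _, idx in margins[:k]]


# Hypothetical predicted subclass probabilities for four unlabeled sessions.
session_probs = [
    [0.90, 0.05, 0.05],  # confident
    [0.40, 0.35, 0.25],  # margin 0.05
    [0.50, 0.10, 0.40],  # margin 0.10
    [0.70, 0.20, 0.10],  # margin 0.50
]
print(margin_sampling(session_probs, 2))  # [1, 2]
```

Margin sampling by itself does not target the goal of discovering new classes of automated traffic. That part would need an additional criterion, for example one favoring sessions far from any labeled example.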
Consensus of sample-balanced classifiers for identifying ligand-binding residue by co-evolutionary physicochemical characteristics of amino acids

KAUST Repository

Chen, Peng

2013-01-01

Protein-ligand binding is an important mechanism by which some proteins perform their functions, and binding sites are the protein residues that physically bind to ligands. So far, state-of-the-art methods search for similar, known structures of the query and predict the binding sites based on the solved structures. However, such structural information is not commonly available. In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. Because ligand-binding sites are highly outnumbered by non-ligand-binding sites, we constructed several balanced data sets and trained a random forest (RF)-based classifier on each of them. The ensemble of these RF classifiers formed a sequence-based protein-ligand binding site predictor. Experimental results on CASP9 targets demonstrated that our method compared favorably with the state of the art. © Springer-Verlag Berlin Heidelberg 2013.
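One way to build several balanced data sets is to split the majority (non-binding) class into chunks roughly the size of the minority class, train one model per chunk, and combine the models by majority vote. The Python sketch below assumes that scheme, which the abstract does not spell out. So that it runs with the standard library alone, the random forests are replaced by a toy one-dimensional nearest-centroid rule, and the data are hypothetical.

```python
import random


def balanced_subsets(minority, majority, seed=0):
    """Pair all minority samples with successive majority chunks of the same size."""
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    size = len(minority)
    return [(list(minority), shuffled[i:i + size]) for i in range(0, len(shuffled), size)]


def train_nearest_centroid(pos, neg):
    """Toy stand-in for a random forest: predict 1 if x is closer to the positive mean."""
    mu_pos = sum(pos) / len(pos)
    mu_neg = sum(neg) / len(neg)
    return lambda x: 1 if abs(x - mu_pos) < abs(x - mu_neg) else 0


def consensus_predict(models, x):
    """Majority vote of the ensemble (ties go to the negative class)."""
    votes = sum(model(x) for model in models)
    return 1 if 2 * votes > len(models) else 0


binding = [5.0, 6.0]                           # minority class, hypothetical 1-D scores
non_binding = [0.0, 1.0, 2.0, 3.0, 1.5, 0.5]   # majority class
models = [train_nearest_centroid(pos, neg)
          for pos, neg in balanced_subsets(binding, non_binding)]
print(len(models))                     # 3 balanced training sets
print(consensus_predict(models, 5.2))  # 1
print(consensus_predict(models, 0.2))  # 0
```

Every majority sample is used exactly once across the subsets, so no data are discarded, unlike single-shot undersampling.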
Three data partitioning strategies for building local classifiers (Chapter 14)

NARCIS (Netherlands)

Zliobaite, I.; Okun, O.; Valentini, G.; Re, M.

2011-01-01

The divide-and-conquer approach has been recognized in multiple classifier systems as a way to exploit the local expertise of individual classifiers. In this study we experimentally investigate three strategies for building local classifiers that are based on different routines for sampling the training data.
Performance improvement of multi-class detection using greedy algorithm for Viola-Jones cascade selection

Science.gov (United States)

Tereshin, Alexander A.; Usilin, Sergey A.; Arlazarov, Vladimir V.

2018-04-01

This paper studies the problem of multi-class object detection in a video stream with Viola-Jones cascades. An adaptive algorithm for selecting Viola-Jones cascades, based on a greedy choice strategy for solving the N-armed bandit problem, is proposed. The efficiency of the algorithm is demonstrated on the problem of detecting and recognizing bank card logos in a video stream. The proposed algorithm can be used effectively for document localization and identification, recognition of road scene elements, localization and tracking of elongated objects, and other problems of rigid object detection in heterogeneous data flows. The computational efficiency of the algorithm makes it possible to use it both on personal computers and on mobile devices with low-power processors.
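The abstract names a greedy strategy for the N-armed bandit problem but not its exact form. The Python sketch below assumes an epsilon-greedy rule. Each arm is one candidate cascade, and the reward is 1 when the chosen cascade detects a logo in a frame. The class name and the detection probabilities are hypothetical.

```python
import random


class EpsilonGreedySelector:
    """Choose one of n arms (here: candidate cascades), usually the best so far."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.counts = [0] * n_arms      # times each arm was chosen
        self.values = [0.0] * n_arms    # running mean reward per arm
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))  # explore a random arm
        return max(range(len(self.counts)), key=lambda a: self.values[a])  # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental mean: avoids storing every past reward.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# Simulation with hypothetical per-frame detection probabilities for three cascades.
detect_prob = [0.2, 0.5, 0.8]
env = random.Random(1)
selector = EpsilonGreedySelector(len(detect_prob))
for _ in range(2000):
    arm = selector.select()
    reward = 1.0 if env.random() < detect_prob[arm] else 0.0
    selector.update(arm, reward)
print(selector.counts)  # the third cascade should dominate
```

Running only the currently best cascade on most frames is what keeps the per-frame cost low. The small exploration rate lets the selector adapt if another cascade becomes better for the current scene.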
Consistency Analysis of Nearest Subspace Classifier

OpenAIRE

Wang, Yi

2015-01-01

The Nearest subspace classifier (NSS) finds an estimation of the underlying subspace within each class and assigns data points to the class that corresponds to its nearest subspace. This paper mainly studies how well NSS can be generalized to new samples. It is proved that NSS is strongly consistent under certain assumptions. For completeness, NSS is evaluated through experiments on various simulated and real data sets, in comparison with some other linear model based classifiers. It is also ...
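A minimal NumPy sketch of the nearest subspace rule follows. It assumes each class subspace is the span of the top `dim` right singular vectors of that class's uncentered training matrix. The paper's exact subspace estimator may differ, and the function names and data are hypothetical.

```python
import numpy as np


def fit_subspaces(X, y, dim):
    """Estimate a dim-dimensional linear subspace (through the origin) per class.

    Returns a dict mapping class label -> orthonormal basis of shape (n_features, dim).
    """
    bases = {}
    for label in np.unique(y):
        Xc = X[y == label]
        # Leading right singular vectors span the dominant directions of the class.
        _, _, vt = np.linalg.svd(Xc, full_matrices=False)
        bases[label] = vt[:dim].T
    return bases


def predict(bases, X):
    """Assign each row of X to the class whose subspace leaves the smallest residual."""
    labels = list(bases)
    residuals = np.stack(
        [np.linalg.norm(X - X @ bases[c] @ bases[c].T, axis=1) for c in labels], axis=1
    )
    return np.array(labels)[residuals.argmin(axis=1)]


# Hypothetical 3-D data: class 0 lies near the x-axis, class 1 near the y-axis.
X_train = np.array([[1, 0, 0], [2, 0.1, 0], [3, 0, 0.1],
                    [0, 1, 0], [0.1, 2, 0], [0, 3, 0.1]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
bases = fit_subspaces(X_train, y_train, dim=1)
print(predict(bases, np.array([[5.0, 0.2, 0.0], [0.3, 4.0, 0.0]])))  # [0 1]
```

The residual `X - X B Bᵀ` is the component of each point orthogonal to the class subspace, so its norm is the distance to that subspace.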
How large a training set is needed to develop a classifier for microarray data?

Science.gov (United States)

Dobbin, Kevin K; Zhao, Yingdong; Simon, Richard M

2008-01-01

A common goal of gene expression microarray studies is the development of a classifier that can be used to divide patients into groups with different prognoses, or with different expected responses to a therapy. These types of classifiers are developed on a training set, which is the set of samples used to train a classifier. The question of how many samples are needed in the training set to produce a good classifier from high-dimensional microarray data is challenging. We present a model-based approach to determining the sample size required to adequately train a classifier. It is shown that sample size can be determined from three quantities: standardized fold change, class prevalence, and number of genes or features on the arrays. Numerous examples and important experimental design issues are discussed. The method is adapted to address ex post facto determination of whether the size of a training set used to develop a classifier was adequate. An interactive web site for performing the sample size calculations is provided. We showed that sample size calculations for classifier development from high-dimensional microarray data are feasible, discussed numerous important considerations, and presented examples.
Learning machines and sleeping brains: Automatic sleep stage classification using decision-tree multi-class support vector machines.

Science.gov (United States)

Lajnef, Tarek; Chaibi, Sahbi; Ruby, Perrine; Aguera, Pierre-Emmanuel; Eichenlaub, Jean-Baptiste; Samet, Mounir; Kachouri, Abdennaceur; Jerbi, Karim

2015-07-30

Sleep staging is a critical step in a range of electrophysiological signal processing pipelines used in clinical routine as well as in sleep research. Although the results currently achievable with automatic sleep staging methods are promising, there is a need for improvement, especially given the time-consuming and tedious nature of visual sleep scoring. Here we propose a sleep staging framework that consists of a multi-class support vector machine (SVM) classification based on a decision tree approach. The performance of the method was evaluated using polysomnographic data from 15 subjects (electroencephalogram (EEG), electrooculogram (EOG) and electromyogram (EMG) recordings). The decision tree, or dendrogram, was obtained using a hierarchical clustering technique, and a wide range of time- and frequency-domain features were extracted. Feature selection was carried out using forward sequential selection, and classification was evaluated using k-fold cross-validation. The dendrogram-based SVM (DSVM) achieved mean specificity, sensitivity and overall accuracy of 0.92, 0.74 and 0.88, respectively, compared to expert visual scoring. Restricting DSVM classification to data where both experts' scoring was consistent (76.73% of the data) led to a mean specificity, sensitivity and overall accuracy of 0.94, 0.82 and 0.92, respectively. The DSVM framework outperforms classification with the more standard multi-class "one-against-all" SVM and linear discriminant analysis. The promising results of the proposed methodology suggest that it may be a valuable alternative to existing automatic methods and that it could accelerate visual scoring by providing a robust starting hypnogram that can be further fine-tuned by expert inspection. Copyright © 2015 Elsevier B.V. All rights reserved.
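In a multi-class problem such as sleep staging, sensitivity and specificity are usually computed per class in a one-versus-rest fashion and then averaged. The abstract does not state its averaging scheme. The Python sketch below assumes macro-averaging over classes and uses a hypothetical 3-stage confusion matrix.

```python
def macro_metrics(confusion):
    """Return (mean sensitivity, mean specificity, overall accuracy).

    confusion[i][j] = number of epochs whose true stage is i and predicted stage is j.
    Each class is scored one-versus-rest, then sensitivity and specificity are averaged.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    sens, spec = [], []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                          # stage c missed
        fp = sum(confusion[r][c] for r in range(n)) - tp     # other stages labeled c
        tn = total - tp - fn - fp
        sens.append(tp / (tp + fn))
        spec.append(tn / (tn + fp))
    return sum(sens) / n, sum(spec) / n, correct / total


# Hypothetical confusion matrix for three sleep stages.
cm = [[8, 2, 0],
      [1, 7, 2],
      [0, 1, 9]]
print(macro_metrics(cm))  # approximately (0.8, 0.9, 0.8)
```

In this toy matrix the averaged specificity exceeds the averaged sensitivity because each class's negatives include every other stage. Those negatives make true negatives plentiful.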
Automatic classification and detection of clinically relevant images for diabetic retinopathy

Science.gov (United States)

Xu, Xinyu; Li, Baoxin

2008-03-01

We propose a novel approach to automatic classification of Diabetic Retinopathy (DR) images and retrieval of clinically relevant DR images from a database. Given a query image, our approach first classifies the image into one of three categories, microaneurysm (MA), neovascularization (NV), and normal, and then retrieves DR images that are clinically relevant to the query image from an archival image database. In the classification stage, the query DR images are classified by the Multi-class Multiple-Instance Learning (McMIL) approach, in which images are viewed as bags, each containing a number of instances corresponding to non-overlapping blocks, and each block is characterized by low-level features including color, texture, histogram of edge directions, and shape. McMIL first learns, for each class, a collection of instance prototypes that maximizes the Diverse Density function using the Expectation-Maximization algorithm. A nonlinear mapping is then defined using the instance prototypes, which maps every bag to a point in a new multi-class bag feature space. Finally, a multi-class Support Vector Machine is trained in the multi-class bag feature space. In the retrieval stage, we retrieve images from the archival database that bear the same label as the query image and that are among the top K nearest neighbors of the query image in terms of similarity in the multi-class bag feature space. The classification approach achieves high classification accuracy, and the retrieval of clinically relevant images not only facilitates use of the vast amount of hidden diagnostic knowledge in the database but also improves the efficiency and accuracy of DR lesion diagnosis and assessment.
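The retrieval stage can be sketched as a label filter followed by a K-nearest-neighbor ranking. This Python sketch assumes Euclidean distance in the bag feature space and takes the predicted labels and feature vectors as already computed. The McMIL mapping and the SVM are not reproduced, and the image ids and vectors are hypothetical.

```python
import math


def retrieve(query_vec, query_label, database, k):
    """Return ids of the k images nearest to the query among those sharing its label.

    database: list of (image_id, feature_vector, label) tuples.
    """
    same_label = [(img_id, vec) for img_id, vec, label in database if label == query_label]
    same_label.sort(key=lambda item: math.dist(query_vec, item[1]))
    return [img_id for img_id, _ in same_label[:k]]


# Hypothetical 2-D bag features and labels.
db = [
    ("img_a", [0.0, 0.0], "MA"),
    ("img_b", [1.0, 1.0], "MA"),
    ("img_c", [0.1, 0.0], "NV"),
    ("img_d", [5.0, 5.0], "MA"),
]
print(retrieve([0.2, 0.1], "MA", db, 2))  # ['img_a', 'img_b']
```

`img_c` is the closest vector to the query but is excluded because its label differs. That is the effect of filtering by label before ranking.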
Classifying brain metastases by their primary site of origin using a radiomics approach based on texture analysis: a feasibility study.

Science.gov (United States)

Ortiz-Ramón, Rafael; Larroza, Andrés; Ruiz-España, Silvia; Arana, Estanislao; Moratal, David

2018-05-14

To examine the capability of MRI texture analysis to differentiate the primary site of origin of brain metastases following a radiomics approach. Sixty-seven untreated brain metastases (BM) were found in 3D T1-weighted MRI of 38 patients with cancer: 27 from lung cancer, 23 from melanoma and 17 from breast cancer. These lesions were segmented in 2D and 3D to compare the discriminative power of 2D and 3D texture features. The images were quantized using different numbers of gray levels to test the influence of quantization. Forty-three rotation-invariant texture features were examined. Feature selection and random forest classification were implemented within a nested cross-validation structure. Classification was evaluated with the area under the receiver operating characteristic curve (AUC) considering two strategies: multiclass and one-versus-one. In the multiclass approach, 3D texture features were more discriminative than 2D features. The best results were achieved for images quantized with 32 gray levels (AUC = 0.873 ± 0.064) using the top four features provided by the feature selection method based on the p-value. In the one-versus-one approach, high accuracy was obtained when differentiating lung cancer BM from breast cancer BM (four features, AUC = 0.963 ± 0.054) and melanoma BM (eight features, AUC = 0.936 ± 0.070) using the optimal dataset (3D features, 32 gray levels). Classification of breast cancer and melanoma BM was unsatisfactory (AUC = 0.607 ± 0.180). Volumetric MRI texture features can be useful to differentiate brain metastases from different primary cancers after quantizing the images with the proper number of gray levels.
• Texture analysis is a promising source of biomarkers for classifying brain neoplasms.
• MRI texture features of brain metastases could help identify the primary cancer.
• Volumetric texture features are more discriminative than traditional 2D texture features.
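In the one-versus-one strategy, each pair of primary sites (for example, lung versus breast) is treated as a separate binary problem with its own AUC. The AUC of a binary score can be computed directly as the Mann-Whitney statistic: the probability that a randomly chosen positive lesion outscores a randomly chosen negative one. The Python sketch below uses hypothetical classifier scores, not data from the study.

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical "lung" scores for lung-origin (positive) and breast-origin (negative) lesions.
lung = [0.9, 0.8, 0.4]
breast = [0.3, 0.5, 0.1]
print(auc_mann_whitney(lung, breast))  # 0.888... (8 of 9 pairs ranked correctly)
```

This pairwise form makes clear why an AUC near 0.5, as reported for breast versus melanoma, means the classifier ranks the two groups little better than chance.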
Frog sound identification using extended k-nearest neighbor classifier

Science.gov (United States)

Mukahar, Nordiana; Affendi Rosdi, Bakhtiar; Athiar Ramli, Dzati; Jaafar, Haryati

2017-09-01

Frog sound identification based on vocalization has become important for biological research and environmental monitoring. As a result, different types of feature extraction methods and classifiers have been employed to evaluate the accuracy of frog sound identification. This paper presents frog sound identification with an Extended k-Nearest Neighbor (EKNN) classifier. The EKNN classifier integrates the concepts of nearest neighbors and mutual sharing of neighborhoods, with the aim of improving classification performance. It makes a prediction based on which training samples are the nearest neighbors of the test sample and which training samples consider the test sample one of their nearest neighbors. To evaluate its classification performance in frog sound identification, the EKNN classifier is compared with competing classifiers, k-Nearest Neighbor (KNN), Fuzzy k-Nearest Neighbor (FKNN), k-General Nearest Neighbor (KGNN), and Mutual k-Nearest Neighbor (MKNN), on recorded sounds of 15 frog species obtained in a Malaysian forest. The recorded sounds were segmented using Short Time Energy and Short Time Average Zero Crossing Rate (STE+STAZCR), sinusoidal modeling (SM), manual segmentation, and a combination of Energy (E) and Zero Crossing Rate (ZCR) (E+ZCR), while the features were extracted with Mel Frequency Cepstrum Coefficients (MFCC). The experimental results show that the EKNN classifier achieves the best accuracy compared with the competing classifiers, KNN, FKNN, KGNN, and MKNN, in all cases.
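The neighbor-sharing idea can be sketched as a vote over two sets of training samples. The first set is the k nearest neighbors of the test sample. The second is the samples that would count the test sample among their own k nearest neighbors. This Python sketch is a simplified illustration under those assumptions, not the authors' exact EKNN scoring rule, and the 2-D points are hypothetical stand-ins for MFCC feature vectors.

```python
import math
from collections import Counter


def extended_knn_predict(train_X, train_y, x, k):
    """Vote over (a) the k nearest neighbors of x and (b) training samples that
    would include x among their own k nearest neighbors (simplified sketch)."""
    d_to_x = [math.dist(x, xi) for xi in train_X]
    nearest = sorted(range(len(train_X)), key=lambda i: d_to_x[i])[:k]
    reverse = []
    for i, xi in enumerate(train_X):
        # Distances from xi to every other training sample, ascending.
        others = sorted(math.dist(xi, xj) for j, xj in enumerate(train_X) if j != i)
        # x would enter xi's k-neighborhood if it is closer than xi's current k-th neighbor.
        if len(others) < k or d_to_x[i] < others[k - 1]:
            reverse.append(i)
    votes = Counter(train_y[i] for i in set(nearest) | set(reverse))
    return votes.most_common(1)[0][0]


train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["A", "A", "A", "B", "B", "B"]
print(extended_knn_predict(train_X, train_y, [0.5, 0.5], k=2))  # A
print(extended_knn_predict(train_X, train_y, [5.2, 5.2], k=2))  # B
```

The reverse-neighbor check costs O(n² log n) per query as written. A practical version would precompute each training sample's k-th neighbor distance once.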
Modeling self-consistent multi-class dynamic traffic flow

Science.gov (United States)

Cho, Hsun-Jung; Lo, Shih-Ching

2002-09-01

In this study, we present a systematic, self-consistent multiclass multilane traffic model derived from the vehicular Boltzmann equation and the traffic dispersion model. The multilane domain is treated as a two-dimensional space, and the interaction among vehicles in the domain is described by a dispersion model. We treat a multilane domain as a two-dimensional space because the driving behavior of road users, especially motorcyclists, may not be restricted by lanes. The dispersion model, a nonlinear Poisson equation, is derived from car-following theory and the equilibrium assumption. Under the concept that all kinds of users share the finite section, the density is distributed on a road by the dispersion model. In addition, the dynamic evolution of the traffic flow is determined by the systematic gas-kinetic model derived from the Boltzmann equation. Multiplying the Boltzmann equation by the zeroth-, first-, and second-order moment functions, integrating both sides of the equation, and using the chain rule, we can derive the continuity, motion, and variance equations, respectively. However, the second-order moment function employed in previous research, the square of the individual velocity, does not have a physical meaning in traffic flow. Although the second-order expansion results in the velocity variance equation, additional terms may be generated. The velocity variance equation we propose is derived by multiplying the Boltzmann equation by the individual velocity variance. It modifies the previous model and presents a new gas-kinetic traffic flow model. By coupling the gas-kinetic model and the dispersion model, a self-consistent system is presented.
Behavior of Multiclass Pesticide Residue Concentrations during the Transformation from Rose Petals to Rose Absolute.

Science.gov (United States)

Tascone, Oriane; Fillâtre, Yoann; Roy, Céline; Meierhenrich, Uwe J

2015-05-27

This study investigates the concentrations of 54 multiclass pesticides during the transformation from rose petals to concrete and absolute, using roses spiked with pesticides as a model. The concentrations of the pesticides were followed while the spiked rose flowers from an organic field were transformed into concrete and then into absolute. The rose flowers, the concrete, and the absolute, as well as their transformation intermediates, were analyzed for pesticide content using gas chromatography/tandem mass spectrometry. We observed that all the pesticides were extracted and concentrated in the absolute, with the exception of three molecules: fenthion, fenamiphos, and phorate. Typical pesticides were concentrated by a factor of 100-300 from the rose flowers to the rose absolute. The observed pesticide enrichment was also studied in roses and their extracts from four fields under conventional phytosanitary treatment. Seventeen pesticides were detected in at least one of the extracts. As with the spiked samples in our model, the pesticides present in the rose flowers from Turkey were concentrated in the absolute. Two pesticides, methidathion and chlorpyrifos, were quantified in the rose flowers at approximately 0.01 and 0.01-0.05 mg kg⁻¹, respectively, depending on the treated field. The concentrations determined for the corresponding rose absolutes were 4.7 mg kg⁻¹ for methidathion and 0.65-27.25 mg kg⁻¹ for chlorpyrifos.

Combining binary classifiers to improve tree species discrimination at leaf level

CSIR Research Space (South Africa)

Dastile, X

2012-11-01

... for training. The neural networks toolbox (version 5.0.2 (R2007a)) of MATLAB was used. The training parameter "goal" was set to 0.03. Training is stopped if the error function falls below this value. The training data was presented to the neural networks... and Systems, 2, 303–314. Diamond, P. & Kloeden, P. (1994). Metric Spaces of Fuzzy Sets: Theory and Applications. World Scientific Publishing Co. Pty. Ltd. Dietterich, T. G. & Bakiri, G. (1995). Solving multiclass learning problems via error-correcting output...
COMPARISON OF PERFORMANCES OF DIFFERENT SVM IMPLEMENTATIONS WHEN USED FOR AUTOMATED EVALUATION OF DESCRIPTIVE ANSWERS

Directory of Open Access Journals (Sweden)

C. Sunil Kumar

2015-04-01

Full Text Available In this paper, we studied the performance of models built using various SVM implementations for the multiclass classification task of automated evaluation of descriptive answers. Performance was evaluated on five datasets of 900 samples each, each processed with a symmetric uncertainty feature selection filter. We quantitatively identified the best SVM implementation technique from among 17 different SVM implementation combinations derived from various SVM classifier libraries, SVM types and kernel methods. Accuracy, F-score, Kappa and area under the ROC curve were used as evaluation metrics to assess the models and rank them by performance. Based on the results, we conclude that the SMO classifier with a polynomial kernel is the best-performing classifier overall for the automated evaluation of descriptive answers.
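Of the metrics listed, Cohen's kappa is perhaps the least self-explanatory: it corrects the observed agreement p_o for the agreement p_e expected by chance from the row and column totals, kappa = (p_o - p_e) / (1 - p_e). A minimal sketch computed from a confusion matrix; the counts are hypothetical, not from the study.

```python
from typing import List


def cohens_kappa(confusion: List[List[int]]) -> float:
    """Cohen's kappa from a square confusion matrix (rows = true, cols = predicted)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(confusion[i]) for i in range(k)]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: product of marginal proportions, summed over classes
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical 2-class confusion matrix: p_o = 0.7, p_e = 0.5
print(round(cohens_kappa([[20, 5], [10, 15]]), 3))
```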
An auditory multiclass brain-computer interface with natural stimuli: Usability evaluation with healthy participants and a motor impaired end user.

Science.gov (United States)

Simon, Nadine; Käthner, Ivo; Ruf, Carolin A; Pasqualotto, Emanuele; Kübler, Andrea; Halder, Sebastian

2014-01-01

Brain-computer interfaces (BCIs) can serve as muscle-independent communication aids. Persons who are unable to control their eye muscles (e.g., in the completely locked-in state) or who have severe visual impairments for other reasons need BCI systems that do not rely on the visual modality. For this reason, BCIs that employ auditory stimuli have been suggested. In this study, a multiclass BCI spelling system was implemented that uses animal voices with directional cues to code the rows and columns of a letter matrix. To reveal possible training effects with the system, 11 healthy participants performed spelling tasks on 2 consecutive days. In a second step, the system was tested by a participant with amyotrophic lateral sclerosis (ALS) in two sessions. In the first session, healthy participants spelled with an average accuracy of 76% (3.29 bits/min), which increased to 90% (4.23 bits/min) on the second day. Spelling accuracy for the participant with ALS was 20% in the first session and 47% in the second. The results indicate a strong training effect for both the healthy participants and the participant with ALS. While healthy participants reached high accuracies in both sessions, accuracies for the participant with ALS were not sufficient for satisfactory communication in either session. More training sessions might be needed to improve spelling accuracy. The study demonstrated the feasibility of the auditory BCI with healthy users and stresses the importance of training with auditory multiclass BCIs, especially for potential end users of BCI with disease.
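The bits/min figures are information transfer rates. The abstract does not state how they were computed; a common choice in BCI work is the Wolpaw formula, B = log2 N + P log2 P + (1 - P) log2((1 - P)/(N - 1)) bits per selection for N classes and accuracy P, multiplied by the number of selections per minute. A minimal sketch assuming that formula; the number of classes and the selection rate needed to reproduce the reported values are not given here, so none are assumed.

```python
import math


def wolpaw_bits_per_selection(n_classes: int, accuracy: float) -> float:
    """Bits conveyed per selection by an N-class BCI with accuracy P (Wolpaw formula)."""
    if n_classes < 2 or not 0.0 <= accuracy <= 1.0:
        raise ValueError("need n_classes >= 2 and 0 <= accuracy <= 1")
    bits = math.log2(n_classes)
    if accuracy > 0.0:  # P log2 P tends to 0 as P tends to 0
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:  # the error term vanishes at perfect accuracy
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_classes - 1))
    return bits


def bits_per_minute(n_classes: int, accuracy: float, selections_per_minute: float) -> float:
    """Information transfer rate in bits/min."""
    return wolpaw_bits_per_selection(n_classes, accuracy) * selections_per_minute
```

A perfect 2-class selection carries exactly 1 bit, and a 2-class selection at chance (P = 0.5) carries none.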
An auditory multiclass brain-computer interface with natural stimuli: usability evaluation with healthy participants and a motor impaired end user

Directory of Open Access Journals (Sweden)

Nadine Simon

2015-01-01

Full Text Available Brain-computer interfaces (BCIs) can serve as muscle-independent communication aids. Persons who are unable to control their eye muscles (e.g., in the completely locked-in state) or who have severe visual impairments for other reasons need BCI systems that do not rely on the visual modality. For this reason, BCIs that employ auditory stimuli have been suggested. In this study, a multiclass BCI spelling system was implemented that uses animal voices with directional cues to code the rows and columns of a letter matrix. To reveal possible training effects with the system, 11 healthy participants performed spelling tasks on two consecutive days. In a second step, the system was tested by a participant with amyotrophic lateral sclerosis (ALS) in two sessions. In the first session, healthy participants spelled with an average accuracy of 76% (3.29 bits/min), which increased to 90% (4.23 bits/min) on the second day. Spelling accuracy for the participant with ALS was 20% in the first session and 47% in the second. The results indicate a strong training effect for both the healthy participants and the participant with ALS. While healthy participants reached high accuracies in both sessions, accuracies for the participant with ALS were not sufficient for satisfactory communication in either session. More training sessions might be needed to improve spelling accuracy. The study demonstrated the feasibility of the auditory BCI with healthy users and stresses the importance of training with auditory multiclass BCIs, especially for potential end users of BCI with disease.
DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier.

Science.gov (United States)

Kulmanov, Maxat; Khan, Mohammed Asif; Hoehndorf, Robert; Wren, Jonathan

2018-02-15

A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for a few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem. We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein-protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations. Web server: http://deepgo.bio2vec.net, Source code: https://github.com/bio-ontology-research-group/deepgo. robert.hoehndorf@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
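Because GO is a hierarchy in which annotation with a term implies annotation with all of its ancestors (the true-path rule), a consistent predictor should never score a parent term below any of its descendants. The sketch below enforces that by propagating maximum scores upward through a DAG. It is a generic post-processing illustration under that assumption, not DeepGO's own model, and the term names are made up.

```python
from typing import Dict, List


def propagate_max(scores: Dict[str, float], children: Dict[str, List[str]]) -> Dict[str, float]:
    """Return scores in which every term is at least the max over its descendants.

    scores: term -> raw predicted probability (missing terms count as 0.0).
    children: term -> list of child terms. Assumes a DAG (no cycles), as in GO.
    """
    consistent: Dict[str, float] = {}

    def visit(term: str) -> float:
        if term not in consistent:
            best = scores.get(term, 0.0)
            for child in children.get(term, ()):
                best = max(best, visit(child))
            consistent[term] = best
        return consistent[term]

    for term in set(scores) | set(children):
        visit(term)
    return consistent


# Hypothetical terms: a confident child prediction lifts its ancestors
raw = {"root": 0.2, "binding": 0.3, "protein_binding": 0.9, "catalysis": 0.1}
tree = {"root": ["binding", "catalysis"], "binding": ["protein_binding"]}
print(propagate_max(raw, tree))
```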
Ab initio and template-based prediction of multi-class distance maps by two-dimensional recursive neural networks

Directory of Open Access Journals (Sweden)

Martin Alberto JM

2009-01-01

Full Text Available Abstract Background Prediction of protein structures from their sequences is still one of the open grand challenges of computational biology. Some approaches to protein structure prediction, especially ab initio ones, rely to some extent on the prediction of residue contact maps. Residue contact map predictions have been assessed at the CASP competition for several years now. Although it has been shown that exact contact maps generally yield correct three-dimensional structures, this is true only at a relatively low resolution (3–4 Å from the native structure). Another known weakness of contact maps is that they are generally predicted ab initio, that is, without exploiting information about potential homologues of known structure. Results We introduce a new class of distance restraints for protein structures: multi-class distance maps. We show that Cα trace reconstructions based on 4-class native maps are significantly better than those from residue contact maps. We then build two predictors of 4-class maps based on recursive neural networks: one ab initio, relying on the sequence and on evolutionary information; and one template-based, in which homology information to known structures is provided as a further input. We show that virtually any level of sequence similarity to structural templates (down to less than 10%) yields more accurate 4-class maps than the ab initio predictor. We show that template-based predictions by recursive neural networks are consistently better than the best template and than a number of combinations of the best available templates. We also extract binary residue contact maps at an 8 Å threshold (as per CASP assessment) from the 4-class predictors and show that the template-based version is also more accurate than the best template and consistently better than the ab initio one, down to very low levels of sequence identity to structural templates. Furthermore, we test both ab-initio and template-based 8 Å ...
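A minimal sketch of how a multi-class distance map and a binary contact map can both be derived from Cα coordinates. The 8 Å contact cutoff follows the abstract; the 8/13/19 Å boundaries for the four classes are an assumption for illustration, and the coordinates are toy values.

```python
import numpy as np


def distance_matrix(ca_coords):
    """Pairwise Euclidean distances (Å) between C-alpha atoms, shape (L, L)."""
    ca = np.asarray(ca_coords, dtype=float)
    diff = ca[:, None, :] - ca[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def multiclass_map(dist, boundaries=(8.0, 13.0, 19.0)):
    """Class 0: d < 8, 1: 8 <= d < 13, 2: 13 <= d < 19, 3: d >= 19 (assumed bins)."""
    return np.digitize(dist, boundaries)


def contact_map(dist, cutoff=8.0):
    """Binary contact map: 1 where the distance is below the 8 Å threshold."""
    return (dist < cutoff).astype(np.int8)


# Toy chain of three residues along the x axis
coords = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [15.0, 0.0, 0.0]]
d = distance_matrix(coords)
classes = multiclass_map(d)
contacts = contact_map(d)
print(classes)
```

With these assumed boundaries, class 0 coincides with the binary contact map, so an 8 Å contact map can be read directly off a 4-class prediction.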
The decision tree classifier - Design and potential. [for Landsat-1 data]

Science.gov (United States)

Hauska, H.; Swain, P. H.

1975-01-01

A new classifier has been developed for the computerized analysis of remote sensor data. The decision tree classifier is essentially a maximum likelihood classifier using multistage decision logic. It is characterized by the fact that an unknown sample can be classified into a class using one or several decision functions in a successive manner. The classifier is applied to the analysis of data sensed by Landsat-1 over Kenosha Pass, Colorado. The classifier is illustrated by a tree diagram which for processing purposes is encoded as a string of symbols such that there is a unique one-to-one relationship between string and decision tree.
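The one-to-one encoding of a tree as a string of symbols can be illustrated with a preorder serialization: because every internal node has exactly two children, the token sequence decodes back into a unique tree. A minimal sketch, assuming simple threshold tests at internal nodes (the original classifier uses maximum-likelihood decision functions at each stage) and hypothetical class labels that contain no ';' or ',' characters.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    feature: int
    threshold: float
    left: "Tree"   # followed when x[feature] <= threshold
    right: "Tree"  # followed otherwise


Tree = Union[Leaf, Node]


def encode(tree: Tree) -> str:
    """Preorder serialization: 'N<feature>,<threshold>;' for nodes, 'L<label>;' for leaves."""
    if isinstance(tree, Leaf):
        return f"L{tree.label};"
    return f"N{tree.feature},{tree.threshold!r};" + encode(tree.left) + encode(tree.right)


def decode(s: str) -> Tree:
    """Inverse of encode: rebuild the tree by consuming tokens in preorder."""
    tokens = iter(s.split(";")[:-1])

    def build() -> Tree:
        tok = next(tokens)
        if tok[0] == "L":
            return Leaf(tok[1:])
        feature, threshold = tok[1:].split(",")
        left = build()
        right = build()
        return Node(int(feature), float(threshold), left, right)

    return build()


def classify(tree: Tree, x: Sequence[float]) -> str:
    """Apply the decision functions in succession until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] <= tree.threshold else tree.right
    return tree.label


# Hypothetical two-stage tree over two spectral features
tree = Node(0, 0.3, Leaf("water"), Node(1, 0.5, Leaf("grass"), Leaf("forest")))
code = encode(tree)
print(code)
```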
A Multiagent-based Intrusion Detection System with the Support of Multi-Class Supervised Classification

Science.gov (United States)

Shyu, Mei-Ling; Sainani, Varsha

The increasing number of network-security-related incidents has made it necessary for organizations to actively protect their sensitive data with network intrusion detection systems (IDSs). IDSs are expected to analyze a large volume of data without placing a significant added load on the monitoring systems and networks. This requires good data mining strategies that take less time and give accurate results. In this study, a novel data mining assisted multiagent-based intrusion detection system (DMAS-IDS) is proposed, particularly with the support of multiclass supervised classification. These agents can detect and take predefined actions against malicious activities, and data mining techniques can help detect them. Our proposed DMAS-IDS shows superior performance compared to central sniffing IDS techniques, and saves network resources compared to other distributed IDSs whose mobile agents activate too many sniffers, causing bottlenecks in the network. This is one of the major motivations for using a distributed model based on a multiagent platform along with a supervised classification technique.
The multi-class binomial failure rate model for the treatment of common-cause failures

International Nuclear Information System (INIS)

Hauptmanns, U.

1995-01-01

The impact of common cause failures (CCF) on PSA results for NPPs is in sharp contrast with the limited quality that can be achieved in their assessment. This is due to the dearth of observations and cannot be remedied in the short run. Therefore, the methods employed for calculating failure rates should be devised so as to make the best use of the few available observations on CCF. The Multi-Class Binomial Failure Rate (MCBFR) Model achieves this by assigning observed failures to different classes according to their technical characteristics and applying the BFR formalism to each of these. The results are hence determined by a superposition of BFR-type expressions for each class, each with its own coupling factor. The model thus obtained flexibly reproduces the dependence of CCF rates on failure multiplicity suggested by the observed failure multiplicities. This is demonstrated by evaluating CCFs observed for combined impulse pilot valves in German NPPs. (orig.) [de]
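In the basic binomial failure rate (BFR) formalism, non-lethal shocks arrive at rate mu and each of the m components fails independently with coupling probability p, so the rate of shocks that fail exactly k components is mu · C(m, k) · p^k · (1 - p)^(m - k). A minimal sketch of the multi-class superposition the abstract describes, summing one such term per failure class with its own shock rate and coupling factor. Independent failures and lethal shocks are omitted for brevity, and the parameter values are hypothetical.

```python
from math import comb
from typing import Iterable, Tuple


def bfr_rate(m: int, k: int, shock_rate: float, p: float) -> float:
    """Rate of non-lethal shocks that fail exactly k of m components (single-class BFR)."""
    return shock_rate * comb(m, k) * p ** k * (1 - p) ** (m - k)


def mcbfr_rate(m: int, k: int, classes: Iterable[Tuple[float, float]]) -> float:
    """Superpose one BFR term per failure class; classes = [(shock_rate, p), ...]."""
    return sum(bfr_rate(m, k, mu, p) for mu, p in classes)


# Hypothetical: 4 redundant valves, two failure classes with different coupling factors
failure_classes = [(1e-5, 0.1), (2e-6, 0.6)]
rates = {k: mcbfr_rate(4, k, failure_classes) for k in range(1, 5)}
print(rates)
```

Because each class keeps its own coupling factor, a class with strong coupling can dominate the high-multiplicity rates while a weakly coupled class dominates the low ones.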
Classifying Radio Galaxies with the Convolutional Neural Network

Energy Technology Data Exchange (ETDEWEB)

Aniyan, A. K.; Thorat, K. [Department of Physics and Electronics, Rhodes University, Grahamstown (South Africa)]

2017-06-01

We present the application of a deep machine learning technique to classify radio images of extended sources on a morphological basis using convolutional neural networks (CNN). In this study, we have taken the case of the Fanaroff–Riley (FR) class of radio galaxies as well as radio galaxies with bent-tailed morphology. We have used archival data from the Very Large Array (VLA)—Faint Images of the Radio Sky at Twenty Centimeters survey and existing visually classified samples available in the literature to train a neural network for morphological classification of these categories of radio sources. Our training sample size for each of these categories is ∼200 sources, which has been augmented by rotated versions of the same. Our study shows that CNNs can classify images of FRI, FRII and bent-tailed radio galaxies with high accuracy (maximum precision at 95%) using well-defined samples and a “fusion classifier,” which combines the results of binary classifications, while allowing for a mechanism to find sources with unusual morphologies. The individual precision is highest for bent-tailed radio galaxies at 95% and is 91% and 75% for the FRI and FRII classes, respectively, whereas the recall is highest for FRI and FRIIs at 91% each, while the bent-tailed class has a recall of 79%. These results are comparable to those of manual classification, while the method is much faster. Finally, we discuss the computational and data-related challenges associated with the morphological classification of radio galaxies with CNNs.
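Per-class precision and recall like those quoted above follow directly from a confusion matrix. A minimal sketch, with rows as true classes and columns as predicted classes; the counts are hypothetical and are not the paper's results.

```python
from typing import Dict, List, Sequence, Tuple


def per_class_precision_recall(
    confusion: List[List[int]], labels: Sequence[str]
) -> Dict[str, Tuple[float, float]]:
    """confusion[i][j] = number of sources of true class i predicted as class j."""
    k = len(labels)
    result: Dict[str, Tuple[float, float]] = {}
    for j, label in enumerate(labels):
        tp = confusion[j][j]
        predicted = sum(confusion[i][j] for i in range(k))  # column total
        actual = sum(confusion[j])                          # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        result[label] = (precision, recall)
    return result


# Hypothetical counts for three morphological classes
cm = [[9, 1, 0],
      [1, 8, 1],
      [0, 2, 8]]
print(per_class_precision_recall(cm, ["FRI", "FRII", "Bent"]))
```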
Classifying Radio Galaxies with the Convolutional Neural Network

Science.gov (United States)

Aniyan, A. K.; Thorat, K.

2017-06-01

We present the application of a deep machine learning technique to classify radio images of extended sources on a morphological basis using convolutional neural networks (CNN). In this study, we have taken the case of the Fanaroff-Riley (FR) class of radio galaxies as well as radio galaxies with bent-tailed morphology. We have used archival data from the Very Large Array (VLA)—Faint Images of the Radio Sky at Twenty Centimeters survey and existing visually classified samples available in the literature to train a neural network for morphological classification of these categories of radio sources. Our training sample size for each of these categories is ∼200 sources, which has been augmented by rotated versions of the same. Our study shows that CNNs can classify images of FRI, FRII and bent-tailed radio galaxies with high accuracy (maximum precision at 95%) using well-defined samples and a “fusion classifier,” which combines the results of binary classifications, while allowing for a mechanism to find sources with unusual morphologies. The individual precision is highest for bent-tailed radio galaxies at 95% and is 91% and 75% for the FRI and FRII classes, respectively, whereas the recall is highest for FRI and FRIIs at 91% each, while the bent-tailed class has a recall of 79%. These results are comparable to those of manual classification, while the method is much faster. Finally, we discuss the computational and data-related challenges associated with the morphological classification of radio galaxies with CNNs.
Classifying Radio Galaxies with the Convolutional Neural Network

International Nuclear Information System (INIS)

Aniyan, A. K.; Thorat, K.

2017-01-01

We present the application of a deep machine learning technique to classify radio images of extended sources on a morphological basis using convolutional neural networks (CNN). In this study, we have taken the case of the Fanaroff–Riley (FR) class of radio galaxies as well as radio galaxies with bent-tailed morphology. We have used archival data from the Very Large Array (VLA)—Faint Images of the Radio Sky at Twenty Centimeters survey and existing visually classified samples available in the literature to train a neural network for morphological classification of these categories of radio sources. Our training sample size for each of these categories is ∼200 sources, which has been augmented by rotated versions of the same. Our study shows that CNNs can classify images of FRI, FRII and bent-tailed radio galaxies with high accuracy (maximum precision at 95%) using well-defined samples and a “fusion classifier,” which combines the results of binary classifications, while allowing for a mechanism to find sources with unusual morphologies. The individual precision is highest for bent-tailed radio galaxies at 95% and is 91% and 75% for the FRI and FRII classes, respectively, whereas the recall is highest for FRI and FRIIs at 91% each, while the bent-tailed class has a recall of 79%. These results are comparable to those of manual classification, while the method is much faster. Finally, we discuss the computational and data-related challenges associated with the morphological classification of radio galaxies with CNNs.
A multiclass vehicular dynamic traffic flow model for main roads and dedicated lanes/roads of multimodal transport network

Energy Technology Data Exchange (ETDEWEB)

Sossoe, K.S., E-mail: kwami.sossoe@irt-systemx.fr [TECHNOLOGICAL RESEARCH INSTITUTE SYSTEMX (France)]; Lebacque, J-P., E-mail: jean-patrick.lebacque@ifsttar.fr [UPE/IFSTTAR-COSYS-GRETTIA (France)]

2015-03-10

We present in this paper a model of vehicular traffic flow for a multimodal transportation road network. We introduce the notion of vehicle class to refer to vehicles of different transport modes. Our model describes the traffic on highways (which may contain several lanes) and on transit networks for public transportation. The model is formulated in Eulerian and Lagrangian coordinates and uses a Logit model to describe the traffic assignment of our multiclass vehicular flow on shared roads. The paper also discusses traffic streams on lanes dedicated to specific vehicle classes, governed by event-based traffic laws. An Euler-Lagrangian-remap scheme is introduced to numerically approximate the model’s flow equations.
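A multinomial logit model assigns flow to alternatives in proportion to exp(-theta · cost). A minimal sketch, assuming a single dispersion parameter theta and hypothetical travel costs; the paper's actual utility specification is not given in the abstract.

```python
import math
from typing import List, Sequence


def logit_shares(costs: Sequence[float], theta: float) -> List[float]:
    """Multinomial logit split of demand: share_i proportional to exp(-theta * cost_i)."""
    best = min(costs)  # shift by the minimum cost for numerical stability
    weights = [math.exp(-theta * (c - best)) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical travel times (min) for one vehicle class on two alternatives
shares = logit_shares([10.0, 12.0], theta=0.5)
print([round(s, 3) for s in shares])
```

Larger theta concentrates flow on the cheapest alternative; theta = 0 splits it evenly.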
Improved binary dragonfly optimization algorithm and wavelet packet based non-linear features for infant cry classification.

Science.gov (United States)

Hariharan, M; Sindhu, R; Vijean, Vikneswaran; Yazid, Haniza; Nadarajaw, Thiyagar; Yaacob, Sazali; Polat, Kemal

2018-03-01

Infant cry signals carry several levels of information about the reason for crying (hunger, pain, sleepiness and discomfort) or the pathological status (asphyxia, deafness, jaundice, premature condition, autism, etc.) of an infant and are therefore suited for early diagnosis. In this work, a combination of wavelet-packet-based features and an Improved Binary Dragonfly Optimization based feature selection method was proposed to classify the different types of infant cry signals. Cry signals from two different databases were used. The first database contains 507 cry samples of normal (N), 340 of asphyxia (A), 879 of deaf (D), 350 of hungry (H) and 192 of pain (P) infants. The second database contains 513 cry samples of jaundice (J), 531 of premature (Prem) and 45 of normal (N) infants. Wavelet packet transform based energy and non-linear entropies (496 features), Linear Predictive Coding (LPC) based cepstral features (56 features) and Mel-frequency Cepstral Coefficients (MFCCs, 16 features) were extracted, giving a combined feature set of 568 features. To overcome the curse of dimensionality, the improved binary dragonfly optimization algorithm (IBDFO) was proposed to select the most salient features. Finally, a kernel Extreme Learning Machine (ELM) classifier was used to classify the different types of infant cry signals, using all the features as well as only the highly informative ones. Several two-class and multi-class classification experiments were conducted. In the binary (two-class) experiments, maximum accuracies of 90.18% for H vs P, 100% for A vs N, 100% for D vs N and 97.61% for J vs Prem were achieved using the features selected by IBDFO (only 204 of the 568). For the multi-class problem, the selected features could differentiate between three classes (N, A & D) with an accuracy of 100% and seven classes with an accuracy of 97.62%. The experimental
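For reference, a kernel ELM computes its output weights in closed form, beta = (I/C + Omega)^-1 T, where Omega is the kernel matrix of the training samples and T holds the ±1 one-hot targets, and it predicts with K(x, X) beta. A minimal NumPy sketch with an RBF kernel. The regularization C, the kernel width gamma and the toy 2-D data are illustrative assumptions, not the authors' settings or cry features.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """RBF kernel matrix exp(-gamma * ||a_i - b_j||^2), shape (len(a), len(b))."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)


class KernelELM:
    """Kernel Extreme Learning Machine for multi-class classification."""

    def __init__(self, c=1.0, gamma=1.0):
        self.c = c          # regularization coefficient C
        self.gamma = gamma  # RBF kernel width

    def fit(self, x, y):
        self.x_train = np.asarray(x, dtype=float)
        y = np.asarray(y)
        self.classes = np.unique(y)
        # One-hot targets coded as +1 for the true class, -1 otherwise
        t = np.where(y[:, None] == self.classes[None, :], 1.0, -1.0)
        omega = rbf_kernel(self.x_train, self.x_train, self.gamma)
        n = len(self.x_train)
        # beta = (I / C + Omega)^-1 T, solved without forming the inverse
        self.beta = np.linalg.solve(np.eye(n) / self.c + omega, t)
        return self

    def predict(self, x):
        k = rbf_kernel(np.asarray(x, dtype=float), self.x_train, self.gamma)
        return self.classes[np.argmax(k @ self.beta, axis=1)]


# Toy data: three well-separated clusters
X = np.array([[0, 0], [0.1, 0], [5, 5], [5.1, 5], [0, 5], [0.1, 5]])
y = np.array([0, 0, 1, 1, 2, 2])
model = KernelELM(c=10.0, gamma=1.0).fit(X, y)
print(model.predict([[0.05, 0.02], [5.05, 4.9]]))
```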
Classifying Microorganisms

DEFF Research Database (Denmark)

Sommerlund, Julie

2006-01-01

This paper describes the coexistence of two systems for classifying organisms and species: a dominant genetic system and an older naturalist system. The former classifies species and traces their evolution on the basis of genetic characteristics, while the latter employs physiological characteristics. The coexistence of the classification systems does not lead to a conflict between them. Rather, the systems seem to co-exist in different configurations, through which they are complementary, contradictory and inclusive in different situations, sometimes simultaneously. The systems come...
Automatic SLEEP staging: From young adults to elderly patients using multi-class support vector machine

DEFF Research Database (Denmark)

Kempfner, Jacob; Jennum, Poul; Sorensen, Helge B. D.

2013-01-01

Aging is a process that is inevitable and makes our body vulnerable to age-related diseases. Age is the most consistent factor affecting sleep structure. Therefore, new automatic sleep staging methods, to be used in both young and elderly patients, are needed. This study proposes an automatic sleep stage detector, which can separate wakefulness, rapid-eye-movement (REM) sleep and non-REM (NREM) sleep using only EEG and EOG. Most sleep events, which define the sleep stages, are reduced with age. This is addressed by focusing on the amplitude of the clinical EEG bands, and not on the affected sleep events. The age-related influences are then reduced by robust subject-specific scaling. The classification of the three sleep stages is achieved by a multi-class support vector machine using the one-versus-rest scheme. It was possible to obtain a high classification accuracy of 0...
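The abstract does not specify the robust subject-specific scaling. One common choice is to center each subject's features on their median and divide by the interquartile range (IQR), so that a few extreme epochs have less influence than with mean/standard-deviation scaling. A minimal sketch under that assumption; the function names, subject IDs and values are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def robust_scale(values: Sequence[float]) -> List[float]:
    """Center on the median and divide by the interquartile range (IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; robust scaling is undefined")
    med = statistics.median(values)
    return [(v - med) / iqr for v in values]


def scale_per_subject(features_by_subject: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Scale each subject's feature values independently of the other subjects."""
    return {subject: robust_scale(vals) for subject, vals in features_by_subject.items()}


# Hypothetical EEG band amplitudes per epoch for a young and an elderly subject
scaled = scale_per_subject({
    "young_01": [42.0, 55.0, 61.0, 38.0, 70.0, 49.0],
    "elderly_07": [18.0, 22.0, 25.0, 15.0, 30.0, 20.0],
})
```

Scaling per subject puts both subjects' amplitudes on a comparable footing even though the elderly subject's raw amplitudes are lower.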
Effective Sequential Classifier Training for SVM-Based Multitemporal Remote Sensing Image Classification

Science.gov (United States)

Guo, Yiqing; Jia, Xiuping; Paull, David

2018-06-01

The explosive growth in the availability of remote sensing images has challenged supervised classification algorithms such as Support Vector Machines (SVM), as training samples tend to be highly limited due to the expensive and laborious task of ground truthing. The temporal correlation and spectral similarity between multitemporal images have opened up an opportunity to alleviate this problem. In this study, an SVM-based Sequential Classifier Training (SCT-SVM) approach is proposed for multitemporal remote sensing image classification. The approach leverages the classifiers of previous images to reduce the number of training samples required to train the classifier for an incoming image. For each incoming image, a rough classifier is first predicted from the temporal trend of a set of previous classifiers. The predicted classifier is then fine-tuned into a more accurate position with current training samples. This approach can be applied progressively to sequential image data, with only a small number of training samples required from each image. Experiments were conducted with Sentinel-2A multitemporal data over an agricultural area in Australia. Results showed that the proposed SCT-SVM achieved better classification accuracies than two state-of-the-art model transfer algorithms. When training data were insufficient, the proposed SCT-SVM improved the overall classification accuracy of the incoming image from 76.18% (obtained without assistance from previous images) to 94.02%. These results demonstrate that a priori information from previous images can provide advantageous assistance for later images in multitemporal image classification.
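For a linear SVM, the step of predicting a rough classifier from the temporal trend of previous classifiers could look like extrapolating each weight (and the bias) along a fitted straight line. A minimal sketch assuming a per-component linear trend; the paper's actual trend model and the subsequent fine-tuning with current training samples are not shown.

```python
import numpy as np


def predict_next_classifier(history):
    """Extrapolate the next linear classifier from previous ones.

    history: shape (t, d), one row per previous image, e.g. [w_1, ..., w_n, b].
    Fits an independent straight line to each component (requires t >= 2).
    """
    history = np.asarray(history, dtype=float)
    if history.ndim != 2 or len(history) < 2:
        raise ValueError("need at least two previous classifiers as a 2-D array")
    steps = np.arange(len(history))
    # polyfit accepts a 2-D y and fits each column separately
    slope, intercept = np.polyfit(steps, history, deg=1)
    return slope * len(history) + intercept


# Hypothetical weight vectors [w1, w2] from three previous acquisition dates
w_next = predict_next_classifier([[1.0, 0.0], [2.0, 0.5], [3.0, 1.0]])
print(w_next)
```

The extrapolated vector would then serve as the starting point that current training samples refine.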
Trace analysis of multi-class pesticide residues in Chinese medicinal health wines using gas chromatography with electron capture detection

Science.gov (United States)

Kong, Wei-Jun; Liu, Qiu-Tao; Kong, Dan-Dan; Liu, Qian-Zhen; Ma, Xin-Ping; Yang, Mei-Hua

2016-02-01

A method is described for multi-residue, high-throughput determination of trace levels of 22 organochlorine pesticides (OCPs) and 5 pyrethroid pesticides (PYPs) in Chinese medicinal (CM) health wines using a QuEChERS (quick, easy, cheap, effective, rugged, and safe) based extraction method and gas chromatography-electron capture detection (GC-ECD). Several parameters were optimized to shorten preparation and separation time while maintaining high sensitivity. Validation tests of spiked samples showed good linearity for 27 pesticides (R = 0.9909-0.9996) over wide concentration ranges. Limits of detection (LODs) and quantification (LOQs) were at ng/L levels: 0.06-2 ng/L and 0.2-6 ng/L for OCPs and 0.02-3 ng/L and 0.06-7 ng/L for PYPs, respectively. Inter- and intra-day precision tests showed variations of 0.65-9.89% for OCPs and 0.98-13.99% for PYPs, respectively. Average recoveries were in the range of 47.74-120.31%, with relative standard deviations below 20%. The developed method was then applied to analyze 80 CM wine samples. Beta-BHC (benzene hexachloride) was the most frequently detected pesticide, at concentration levels of 5.67-31.55 mg/L, followed by delta-BHC, trans-chlordane, gamma-BHC, and alpha-BHC. The validated method is simple and economical, with adequate sensitivity for trace levels of multi-class pesticides. It could be adopted by laboratories for analyzing this and other types of complex matrices.
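Recovery and relative standard deviation (RSD), the validation figures quoted above, are simple ratios. A minimal sketch using hypothetical replicate results for a sample spiked at 10 ng/L.

```python
import statistics
from typing import Sequence


def recovery_percent(measured: float, spiked: float) -> float:
    """Measured concentration as a percentage of the spiked (added) concentration."""
    return 100.0 * measured / spiked


def rsd_percent(values: Sequence[float]) -> float:
    """Relative standard deviation: sample standard deviation over the mean, in %."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical triplicate measurements (ng/L) of a sample spiked at 10 ng/L
recoveries = [recovery_percent(m, 10.0) for m in (9.5, 10.0, 10.5)]
print(recoveries, rsd_percent(recoveries))
```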
Stack filter classifiers

Energy Technology Data Exchange (ETDEWEB)

Porter, Reid B [Los Alamos National Laboratory; Hush, Don [Los Alamos National Laboratory

2009-01-01

Just as linear models generalize the sample mean and weighted average, weighted order statistic models generalize the sample median and weighted median. This analogy can be continued informally to generalized additive models in the case of the mean, and Stack Filters in the case of the median. Both of these model classes have been extensively studied for signal and image processing, but it is surprising to find that for pattern classification their treatment has been significantly one-sided. Generalized additive models are now a major tool in pattern classification, and many different learning algorithms have been developed to fit model parameters to finite data. However, Stack Filters remain largely confined to signal and image processing, and learning algorithms for classification have yet to appear. This paper is a step towards Stack Filter Classifiers, and it shows that the approach is interesting from both a theoretical and a practical perspective.
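The link between the median and Stack Filters can be made concrete through threshold decomposition. An integer signal is split into binary signals x >= t for each level t, a positive (monotone) Boolean function is applied in a sliding window at each level, and the binary outputs are summed. With majority voting as the Boolean function, this reproduces the running median. A minimal sketch, assuming an odd window, a non-negative integer signal, and replicated edge samples.

```python
from statistics import median
from typing import Callable, List, Sequence


def majority(bits: Sequence[int]) -> int:
    """Positive Boolean function whose stack filter is the median."""
    return int(sum(bits) * 2 > len(bits))


def _pad(signal: Sequence[int], half: int) -> List[int]:
    """Replicate the edge samples so every window is full."""
    return [signal[0]] * half + list(signal) + [signal[-1]] * half


def stack_filter(signal: Sequence[int], window: int = 3,
                 boolean_fn: Callable[[Sequence[int]], int] = majority) -> List[int]:
    """Stack filter via threshold decomposition (odd window, non-negative ints)."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    padded = _pad(signal, half)
    out = [0] * len(signal)
    for t in range(1, max(signal) + 1):
        binary = [int(v >= t) for v in padded]  # threshold at level t
        for i in range(len(signal)):
            out[i] += boolean_fn(binary[i:i + window])  # stack the binary outputs
    return out


def median_filter(signal: Sequence[int], window: int = 3) -> List[int]:
    """Direct running median with the same edge handling, for comparison."""
    half = window // 2
    padded = _pad(signal, half)
    return [median(padded[i:i + window]) for i in range(len(signal))]


sig = [3, 1, 4, 1, 5, 9, 2, 6]
print(stack_filter(sig), median_filter(sig))
```

Swapping `majority` for another positive Boolean function gives other members of the stack filter class; weighted-majority functions, for example, yield weighted medians.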
Nonparametric, Coupled, Bayesian, Dictionary, and Classifier Learning for Hyperspectral Classification.

Science.gov (United States)

Akhtar, Naveed; Mian, Ajmal

2017-10-03

We present a principled approach to learning a discriminative dictionary along with a linear classifier for hyperspectral classification. Our approach places Gaussian Process priors over the dictionary to account for the relative smoothness of natural spectra, whereas the classifier parameters are sampled from multivariate Gaussians. We employ two Beta-Bernoulli processes to jointly infer the dictionary and the classifier. These processes are coupled under the same sets of Bernoulli distributions. In our approach, these distributions signify the frequency of dictionary atom usage in representing class-specific training spectra, which also makes the dictionary discriminative. Due to the coupling between the dictionary and the classifier, the popularity of the atoms for representing different classes gets encoded into the classifier. This helps in predicting the class labels of test spectra, which are first represented over the dictionary by solving a simultaneous sparse optimization problem. The labels of the spectra are predicted by feeding the resulting representations to the classifier. Our approach exploits the nonparametric Bayesian framework to automatically infer the dictionary size, which is the key parameter in discriminative dictionary learning. Moreover, it also has the desirable property of adaptively learning the association between the dictionary atoms and the class labels by itself. We use Gibbs sampling to infer the posterior probability distributions over the dictionary and the classifier under the proposed model, for which we derive analytical expressions. To establish the effectiveness of our approach, we test it on benchmark hyperspectral images. The classification performance is compared with state-of-the-art dictionary learning-based classification methods.
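The coupling idea can be illustrated with the standard finite approximation to a Beta-Bernoulli process. Each atom k gets a usage probability pi_k ~ Beta(a/K, b(K-1)/K), and binary indicators for both the dictionary side and the classifier side are drawn from the same pi_k. This is a toy sketch of shared Bernoulli probabilities only, with hypothetical hyperparameters a and b. It does not implement the authors' Gaussian Process priors or Gibbs sampler.

```python
import numpy as np


def sample_coupled_indicators(n_atoms, n_signals, a=1.0, b=1.0, seed=0):
    """Draw dictionary-side and classifier-side atom-usage indicators that share
    the same Bernoulli probabilities pi (finite Beta-Bernoulli approximation)."""
    if n_atoms < 2:
        raise ValueError("n_atoms must be at least 2")
    rng = np.random.default_rng(seed)
    # Finite approximation: pi_k ~ Beta(a/K, b(K-1)/K); most atoms end up rarely used
    pi = rng.beta(a / n_atoms, b * (n_atoms - 1) / n_atoms, size=n_atoms)
    # Both indicator sets are Bernoulli draws with the SAME pi: this is the coupling
    z_dictionary = (rng.random((n_signals, n_atoms)) < pi).astype(int)
    z_classifier = (rng.random((n_signals, n_atoms)) < pi).astype(int)
    return pi, z_dictionary, z_classifier


pi, z_dict, z_clf = sample_coupled_indicators(n_atoms=50, n_signals=20)
print(z_dict.sum(axis=0)[:10], z_clf.sum(axis=0)[:10])
```

Atoms with a large pi_k tend to be switched on in both indicator matrices, which is how atom popularity can be shared between the representation and the classifier.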

Classifying a smoker scale in adult daily and nondaily smokers.

Science.gov (United States)

Pulvers, Kim; Scheuermann, Taneisha S; Romero, Devan R; Basora, Brittany; Luo, Xianghua; Ahluwalia, Jasjit S

2014-05-01

Smoker identity, or the strength of beliefs about oneself as a smoker, is a robust marker of smoking behavior. However, many nondaily smokers do not identify as smokers, underestimating their risk for tobacco-related disease and resulting in missed intervention opportunities. Assessing underlying beliefs about characteristics used to classify smokers may help explain the discrepancy between smoking behavior and smoker identity. This study examines the factor structure, reliability, and validity of the Classifying a Smoker scale among a racially diverse sample of adult smokers. A cross-sectional survey was administered through an online panel survey service to 2,376 current smokers who were at least 25 years of age. The sample was stratified to obtain equal numbers of 3 racial/ethnic groups (African American, Latino, and White) across smoking level (nondaily and daily smoking). The Classifying a Smoker scale displayed a single factor structure and excellent internal consistency (α = .91). Classifying a Smoker scores significantly increased at each level of smoking, F(3,2375) = 23.68, p smoker identity, stronger dependence on cigarettes, greater health risk perceptions, more smoking friends, and were more likely to carry cigarettes. Classifying a Smoker scores explained unique variance in smoking variables above and beyond that explained by smoker identity. The present study supports the use of the Classifying a Smoker scale among diverse, experienced smokers. Stronger endorsement of characteristics used to classify a smoker (i.e., stricter criteria) was positively associated with heavier smoking and related characteristics. Prospective studies are needed to inform prevention and treatment efforts.
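The internal-consistency figure (α = .91) is presumably Cronbach's alpha, alpha = k/(k - 1) · (1 - sum of item variances / variance of total scores) for k items. A minimal sketch; the item scores in the example are hypothetical.

```python
import statistics
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """items: one list of scores per item, each over the same respondents."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance = sum(statistics.variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / statistics.variance(totals))


# Hypothetical 3-item scale answered by 5 respondents
alpha = cronbach_alpha([[4, 5, 3, 5, 2], [4, 4, 3, 5, 1], [5, 5, 2, 4, 2]])
print(round(alpha, 2))
```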
The Pattern Recognition in Cattle Brand using Bag of Visual Words and Support Vector Machines Multi-Class

Directory of Open Access Journals (Sweden)

Carlos Silva, Mr

2018-03-01

Full Text Available Automatic recognition of cattle brand images is a necessity for the government agencies responsible for this activity. To help this process, this work presents a method that uses Bag of Visual Words to extract features from cattle brand images and a multi-class Support Vector Machine for classification. The method consists of six stages: (a) select an image database; (b) extract points of interest (SURF); (c) create a vocabulary (K-means); (d) create a vector of image features (visual words); (e) train and classify images (SVM); (f) evaluate the classification results. The method was tested on a database from a municipal city hall, where it achieved satisfactory results: 86.02% accuracy with a processing time of 56.705 seconds.
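Stages (c) and (d) can be sketched as follows. K-means clusters local descriptors into a visual vocabulary, and each image becomes a normalized histogram of its descriptors' nearest visual words; these histograms would then be fed to the multi-class SVM. A minimal NumPy sketch in which random 64-dimensional vectors stand in for SURF descriptors; the vocabulary size is an illustrative assumption.

```python
import numpy as np


def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm); returns k cluster centers as visual words."""
    data = np.asarray(descriptors, dtype=float)
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each descriptor to its nearest center
        labels = np.argmin(((data[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def bovw_histogram(descriptors, vocabulary):
    """Normalized histogram of nearest visual words for one image."""
    d = np.asarray(descriptors, dtype=float)
    words = np.argmin(((d[:, None, :] - vocabulary[None]) ** 2).sum(-1), axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()


rng = np.random.default_rng(1)
training_descriptors = rng.normal(size=(500, 64))  # stand-in for pooled SURF descriptors
vocabulary = build_vocabulary(training_descriptors, k=16)
image_vector = bovw_histogram(rng.normal(size=(40, 64)), vocabulary)
print(image_vector.shape)
```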
Analysis of Veterinary Drug and Pesticide Residues Using the Ethyl Acetate Multiclass/Multiresidue Method in Milk by Liquid Chromatography-Tandem Mass Spectrometry

Directory of Open Access Journals (Sweden)

Husniye Imamoglu

2016-01-01

Full Text Available A rapid and simple multiclass, ethyl acetate (EtOAc) multiresidue method based on liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) detection was developed for the determination and quantification of 26 veterinary drugs and 187 total pesticide residues in milk. Sample preparation was a simple procedure based on liquid–liquid extraction with ethyl acetate containing 0.1% acetic acid, followed by centrifugation and evaporation of the supernatant. The residue was dissolved in ethyl acetate with 0.1% acetic acid and centrifuged prior to LC-MS/MS analysis. Chromatographic separation of analytes was performed on an Inertsil X-Terra C18 column using a gradient of acetic acid in methanol and water. The repeatability and reproducibility were in the range of 2 to 13% and 6 to 16%, respectively. The average recoveries ranged from 75 to 120% with the RSD (n = 18). The developed method was validated according to the criteria set in Commission Decision 2002/657/EC and SANTE/11945/2015. The validated methodology represents a fast and cheap alternative for the simultaneous analysis of veterinary drug and pesticide residues, which can be easily extended to other compounds and matrices.
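The recovery and precision figures above are conventionally computed as the measured concentration divided by the spiked concentration, and as the relative standard deviation (RSD) of replicate results. This is a minimal sketch of that arithmetic. The replicate values are invented for illustration, and the acceptance criteria of 2002/657/EC or SANTE/11945/2015 are not encoded here.

```python
from statistics import mean, stdev

def recovery_percent(measured, spiked):
    """Apparent recovery of a spiked analyte, in percent."""
    return 100.0 * measured / spiked

def rsd_percent(values):
    """Relative standard deviation (coefficient of variation), in percent."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical replicate results (ug/kg) for one analyte spiked at 10 ug/kg.
replicates = [9.1, 9.8, 10.4, 9.5, 10.2, 9.6]
recoveries = [recovery_percent(m, 10.0) for m in replicates]
print(f"mean recovery {mean(recoveries):.1f}%, RSD {rsd_percent(recoveries):.1f}%")
```

Repeatability and reproducibility are both RSDs. They differ only in which replicates go into `values`: within-day or same-conditions runs for repeatability, and runs across days or conditions for reproducibility.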
Multi-class multi-residue analysis of veterinary drugs in meat using enhanced matrix removal lipid cleanup and liquid chromatography-tandem mass spectrometry.

Science.gov (United States)

Zhao, Limian; Lucas, Derick; Long, David; Richter, Bruce; Stevens, Joan

2018-05-11

This study presents the development and validation of a quantitation method for multi-class, multi-residue veterinary drugs in different meat matrices, using enhanced matrix removal lipid (EMR-Lipid) cleanup cartridges and liquid chromatography tandem mass spectrometry detection. Meat samples were extracted using a two-step solid-liquid extraction followed by pass-through sample cleanup. The method was optimized with respect to buffer and solvent composition, solvent additives, and EMR-Lipid cartridge cleanup. The developed method was then validated in five meat matrices (porcine muscle, bovine muscle, bovine liver, bovine kidney and chicken liver) to evaluate method performance characteristics such as absolute recoveries and precision at three spiking levels, calibration curve linearity, limit of quantitation (LOQ) and matrix effect. The results showed that >90% of veterinary drug analytes achieved satisfactory recoveries of 60-120%. Over 97% of analytes achieved excellent reproducibility results (relative standard deviation (RSD) meat matrices. The matrix co-extractive removal efficiency by weight provided by EMR-Lipid cartridge cleanup was 42-58% in samples. The post-column infusion study showed that matrix ion suppression was reduced for samples with EMR-Lipid cartridge cleanup. The reduced matrix ion suppression effect was also confirmed with 30%) for all tested veterinary drugs in all meat matrices. The results showed that the two-step solid-liquid extraction provides efficient extraction for the entire spectrum of veterinary drugs, including difficult classes such as tetracyclines and beta-lactams. EMR-Lipid cartridge cleanup after extraction provided efficient sample cleanup with an easy, streamlined protocol and minimal impact on analyte recovery, improving method reliability and consistency. Copyright © 2018 Elsevier B.V. All rights reserved.
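The abstract reports matrix effects but does not give the formula used. One common convention compares the slope of a matrix-matched calibration curve with that of a solvent calibration curve. The sketch below assumes that convention; other studies instead compare peak areas of post-extraction spikes. The concentrations and responses are invented.

```python
def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def matrix_effect_percent(conc, solvent_resp, matrix_resp):
    """Slope-ratio matrix effect: negative = ion suppression, positive = enhancement."""
    return 100.0 * (slope(conc, matrix_resp) / slope(conc, solvent_resp) - 1.0)

# Hypothetical calibration data: the matrix signal is 70% of the solvent signal.
conc = [1, 2, 4, 8]
print(matrix_effect_percent(conc, [100, 200, 400, 800], [70, 140, 280, 560]))
```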
Implications of physical symmetries in adaptive image classifiers

DEFF Research Database (Denmark)

Sams, Thomas; Hansen, Jonas Lundbek

2000-01-01

It is demonstrated that rotational invariance and reflection symmetry of image classifiers lead to a reduction in the number of free parameters in the classifier. When used in adaptive detectors, e.g. neural networks, this may be used to decrease the number of training samples necessary to learn … a given classification task, or to improve generalization of the neural network. Notably, the symmetrization of the detector does not compromise the ability to distinguish objects that break the symmetry. (C) 2000 Elsevier Science Ltd. All rights reserved.
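One way to see the parameter reduction is through an invariance argument. A linear filter whose response must not change under a set of pixel permutations (reflections, quarter-turn rotations) has to share one weight per orbit of pixel positions. The number of free weights therefore equals the number of orbits. This minimal sketch counts those orbits for an n × n filter. It illustrates the counting argument only and is not the authors' network construction; `mirror` and `rotate90` are hypothetical helper names.

```python
def mirror(p, n):
    """Left-right reflection of pixel position (row, col) in an n x n grid."""
    i, j = p
    return (i, n - 1 - j)

def rotate90(p, n):
    """Quarter-turn rotation of pixel position (row, col) in an n x n grid."""
    i, j = p
    return (j, n - 1 - i)

def orbit_count(n, transforms):
    """Number of orbits of n x n pixel positions under the group generated by transforms."""
    seen, orbits = set(), 0
    for i in range(n):
        for j in range(n):
            if (i, j) in seen:
                continue
            orbits += 1
            stack = [(i, j)]  # flood the orbit by applying generators repeatedly
            while stack:
                p = stack.pop()
                if p in seen:
                    continue
                seen.add(p)
                stack.extend(t(p, n) for t in transforms)
    return orbits

# Free weights of a 5x5 filter: no symmetry, mirror only, full rotation+reflection.
print(orbit_count(5, []), orbit_count(5, [mirror]), orbit_count(5, [mirror, rotate90]))
```

For a 5 × 5 filter this gives 25, 15 and 6 free weights, respectively.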
An ordinal classification approach for CTG categorization.

Science.gov (United States)

Georgoulas, George; Karvelis, Petros; Gavrilis, Dimitris; Stylios, Chrysostomos D; Nikolakopoulos, George

2017-07-01

Evaluation of the cardiotocogram (CTG) is a standard approach employed during pregnancy and delivery. However, its interpretation requires a high level of expertise to decide whether the recording is Normal, Suspicious or Pathological. Therefore, a number of attempts have been made over the past three decades to develop sophisticated automated systems. These systems are usually (multiclass) classification systems that assign a category to the respective CTG. However, most of these systems do not take into consideration the natural ordering of the categories associated with CTG recordings. In this work, an algorithm based on a binary decomposition method that explicitly takes the ordering of CTG categories into consideration is investigated. Results achieved with the C4.5 decision tree as the base classifier show that the ordinal classification approach is marginally better than the traditional multiclass classification approach, which utilizes the standard C4.5 algorithm, for several performance criteria.
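The abstract names only a "binary decomposition method". This sketch assumes a Frank-and-Hall-style decomposition, in which K ordered classes become K−1 binary problems "is y above class k?". Each binary model (C4.5 in the paper) estimates P(y > k), and the estimates are recombined into class probabilities. The base learners are omitted, and the function names are hypothetical.

```python
def binary_targets(labels, k):
    """Relabel ordinal targets (0..K-1) for the k-th binary subproblem: 1 if y > k."""
    return [int(y > k) for y in labels]

def ordinal_class_probs(p_greater):
    """Combine K-1 binary estimates P(y > k) into K class probabilities."""
    probs = [1.0 - p_greater[0]]
    for k in range(1, len(p_greater)):
        probs.append(p_greater[k - 1] - p_greater[k])
    probs.append(p_greater[-1])
    # Independently trained binary models can violate monotonicity; clip negatives.
    return [max(p, 0.0) for p in probs]

CTG = ["Normal", "Suspicious", "Pathological"]
# Hypothetical binary-model outputs: P(y > Normal) = 0.7, P(y > Suspicious) = 0.2.
probs = ordinal_class_probs([0.7, 0.2])
print(CTG[max(range(len(probs)), key=probs.__getitem__)])  # prints "Suspicious"
```

Here the recombined probabilities are roughly 0.3, 0.5 and 0.2, so the ordering information steers the decision to the middle class.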
Mapping online transportation service quality and multiclass classification problem solving priorities

Science.gov (United States)

Alamsyah, Andry; Rachmadiansyah, Imam

2018-03-01

Online transportation services are known for their accessibility, transparency, and affordable tariffs. These points give online transportation advantages over the existing conventional transportation services. Online transportation is an example of a disruptive technology that changes the relationship between customers and companies. In Indonesia, there is high competition among online transportation providers, hence the companies must maintain and monitor their service level. To understand their position, we apply both sentiment analysis and multiclass classification to customer opinions. From negative sentiments, we can identify problems and establish problem-solving priorities. As a case study, we use the most popular online transportation providers in Indonesia: Gojek and Grab. Since many customers actively compliment and complain about the companies' service levels on Twitter, we collect 61,721 tweets in Bahasa Indonesia over one month of observation. We apply Naive Bayes and Support Vector Machine methods to see which model performs best on our data. The results reveal that Gojek has better service quality, with 19.76% positive and 80.23% negative sentiment, than Grab, with 9.2% positive and 90.8% negative. Gojek's highest problem-solving priority concerns application problems, while Grab's concerns unusable promos. The overall results show that the general problems of both providers relate to the accessibility dimension, which indicates a lack of capability to provide good digital access to end users.
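The Naive Bayes side of the comparison can be sketched as a multinomial Naive Bayes text classifier with add-alpha (Laplace) smoothing. This is a minimal, self-contained version, not the authors' pipeline. Tokenization, Indonesian-specific preprocessing, and the SVM are out of scope, and the tiny training set is invented.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha smoothing; docs are token lists."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, y in zip(docs, labels):
        word_counts[y].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    model = {}
    for y, n_docs in class_docs.items():
        denom = sum(word_counts[y].values()) + alpha * len(vocab)
        model[y] = (
            math.log(n_docs / len(labels)),  # log prior
            {w: math.log((word_counts[y][w] + alpha) / denom) for w in vocab},
        )
    return model

def predict_nb(model, tokens):
    """Most probable class; words never seen in training are ignored."""
    def score(y):
        prior, loglik = model[y]
        return prior + sum(loglik[w] for w in tokens if w in loglik)
    return max(model, key=score)

# Hypothetical toy data (English tokens for readability).
docs = [["driver", "friendly"], ["app", "error"], ["promo", "unusable"], ["driver", "fast"]]
labels = ["pos", "neg", "neg", "pos"]
model = train_nb(docs, labels)
print(predict_nb(model, ["driver", "fast", "friendly"]))
```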
An ensemble of dissimilarity based classifiers for Mackerel gender determination

Science.gov (United States)

Blanco, A.; Rodriguez, R.; Martinez-Maranon, I.

2014-03-01

Mackerel is an undervalued fish captured by European fishing vessels. One way to add value to this species is to classify it by sex. Colour measurements were performed on gonads extracted from female and male mackerel (fresh and thawed) to find differences between the sexes. Several linear and non-linear classifiers, such as Support Vector Machines (SVM), k Nearest Neighbors (k-NN) or Diagonal Linear Discriminant Analysis (DLDA), can be applied to this problem. However, they are usually based on Euclidean distances that fail to reflect the sample proximities accurately. Classifiers based on non-Euclidean dissimilarities misclassify a different set of patterns. We combine different kinds of dissimilarity-based classifiers. The diversity is induced by considering a set of complementary dissimilarities for each model. The experimental results suggest that our algorithm helps to improve on classifiers based on a single dissimilarity.
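The idea of inducing diversity through complementary dissimilarities can be sketched with 1-nearest-neighbour classifiers that each use a different dissimilarity, combined by majority vote. The choice of dissimilarities and the voting rule are assumptions for illustration; the abstract does not state how the authors combine their models. The three-component colour values are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.dist(a, b)

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_dissimilarity(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

def nn_predict(train, labels, query, dissim):
    """1-nearest-neighbour label under a given dissimilarity."""
    best = min(range(len(train)), key=lambda i: dissim(query, train[i]))
    return labels[best]

def ensemble_predict(train, labels, query, dissims):
    """Majority vote over 1-NN classifiers, one per dissimilarity."""
    votes = Counter(nn_predict(train, labels, query, d) for d in dissims)
    return votes.most_common(1)[0][0]

# Hypothetical colour measurements of gonads, labelled F (female) / M (male).
train = [[50, 20, 10], [52, 22, 12], [30, 40, 35], [28, 42, 33]]
labels = ["F", "F", "M", "M"]
print(ensemble_predict(train, labels, [49, 21, 11],
                       [euclidean, manhattan, cosine_dissimilarity]))
```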
An ensemble of dissimilarity based classifiers for Mackerel gender determination

International Nuclear Information System (INIS)

Blanco, A; Rodriguez, R; Martinez-Maranon, I

2014-01-01

Mackerel is an undervalued fish captured by European fishing vessels. One way to add value to this species is to classify it by sex. Colour measurements were performed on gonads extracted from female and male mackerel (fresh and thawed) to find differences between the sexes. Several linear and non-linear classifiers, such as Support Vector Machines (SVM), k Nearest Neighbors (k-NN) or Diagonal Linear Discriminant Analysis (DLDA), can be applied to this problem. However, they are usually based on Euclidean distances that fail to reflect the sample proximities accurately. Classifiers based on non-Euclidean dissimilarities misclassify a different set of patterns. We combine different kinds of dissimilarity-based classifiers. The diversity is induced by considering a set of complementary dissimilarities for each model. The experimental results suggest that our algorithm helps to improve on classifiers based on a single dissimilarity.
A novel algorithm of super-resolution image reconstruction based on multi-class dictionaries for natural scene

Science.gov (United States)

Wu, Wei; Zhao, Dewei; Zhang, Huan

2015-12-01

Super-resolution image reconstruction is an effective method for improving image quality, and it has important research significance in the field of image processing. However, the choice of dictionary directly affects the efficiency of image reconstruction. Sparse representation theory is introduced into the nearest-neighbor selection problem. Building on sparse-representation-based super-resolution reconstruction, an algorithm based on multi-class dictionaries is analyzed. This method avoids the redundancy of training only a single overcomplete dictionary, makes each sub-dictionary more representative, and replaces the traditional Euclidean distance computation to improve the quality of the whole reconstructed image. In addition, non-local self-similarity regularization is introduced to address the ill-posed problem. Experimental results show that the algorithm achieves much better results than state-of-the-art algorithms in terms of both PSNR and visual perception.
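PSNR, the objective metric cited above, is 10 · log10(MAX² / MSE) in decibels, where MAX is the largest possible pixel value (255 for 8-bit images). This minimal sketch assumes equal-size grayscale images given as lists of rows. It covers only the evaluation metric, not the reconstruction algorithm.

```python
import math

def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-size grayscale images."""
    diffs = [
        (r - s) ** 2
        for row_r, row_s in zip(reference, reconstructed)
        for r, s in zip(row_r, row_s)
    ]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Every pixel off by one grey level: MSE = 1, so PSNR = 20*log10(255), about 48.13 dB.
print(psnr([[0, 0], [0, 0]], [[1, 1], [1, 1]]))
```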
An Improvement To The k-Nearest Neighbor Classifier For ECG Database

Science.gov (United States)

Jaafar, Haryati; Hidayah Ramli, Nur; Nasir, Aimi Salihah Abdul

2018-03-01

The k nearest neighbor (kNN) is a non-parametric classifier that has been widely used for pattern classification. In practice, however, kNN often performs poorly because it lacks information on how the samples are distributed. Moreover, kNN is no longer optimal when the training samples are limited. Another problem concerns how the neighbors are weighted when assigning the class label. To address these limitations, a new classifier called Mahalanobis fuzzy k-nearest centroid neighbor (MFkNCN) is proposed in this study. Here, the Mahalanobis distance is applied to counter imbalance in the sample distribution. Then, a surrounding rule is employed to obtain the nearest centroid neighbors based on the distribution of the training samples and their distance to the query point. Finally, a fuzzy membership function assigns the query point to the class label most frequently represented among the nearest centroid neighbors. The classifier is evaluated experimentally on electrocardiogram (ECG) signals. Classification performance is evaluated in two experimental steps, i.e. different values of k and different feature dimensions. Subsequently, a comparative study of the kNN, kNCN, FkNN and MFkNCN classifiers is conducted to evaluate the performance of the proposed classifier. The results show that MFkNCN consistently outperforms kNN, kNCN and FkNN, with a best classification rate of 96.5%.
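The three ingredients named above can be combined in a minimal sketch. The ingredients are the Mahalanobis distance, the surrounding (nearest centroid neighbour) rule, and a fuzzy membership vote. The authors' exact weighting and membership initialization are not given, so this sketch assumes Keller-style inverse-distance weights 1/d^(2/(m−1)) with crisp training labels. It also assumes the inverse covariance matrix `inv_cov` has been precomputed from the training data. All function names are hypothetical.

```python
import math
from collections import defaultdict

def mahalanobis(a, b, inv_cov):
    """Mahalanobis distance given a precomputed inverse covariance matrix."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    return math.sqrt(sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n)))

def k_nearest_centroid_neighbors(train, query, k, dist):
    """Surrounding rule: each new neighbour keeps the selected set's centroid closest to the query."""
    chosen, remaining = [], list(range(len(train)))
    for _ in range(min(k, len(train))):
        def centroid_dist(i):
            pts = [train[j] for j in chosen] + [train[i]]
            centroid = [sum(c) / len(pts) for c in zip(*pts)]
            return dist(centroid, query)
        best = min(remaining, key=centroid_dist)
        chosen.append(best)
        remaining.remove(best)
    return chosen

def fuzzy_ncn_predict(train, labels, query, k, inv_cov, m=2.0):
    """Fuzzy vote over the kNCN set with weights 1 / d^(2/(m-1))."""
    def dist(a, b):
        return mahalanobis(a, b, inv_cov)
    membership = defaultdict(float)
    for i in k_nearest_centroid_neighbors(train, query, k, dist):
        d = dist(train[i], query)
        if d == 0:
            return labels[i]  # query coincides with a training sample
        membership[labels[i]] += 1.0 / d ** (2.0 / (m - 1.0))
    return max(membership, key=membership.get)

# Hypothetical 2-D features for two beat classes; identity inv_cov = Euclidean.
train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
labels = ["N", "N", "N", "A", "A", "A"]
print(fuzzy_ncn_predict(train, labels, [0.4, 0.3], 3, [[1, 0], [0, 1]]))
```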
Comprehensive benchmarking and ensemble approaches for metagenomic classifiers.

Science.gov (United States)

McIntyre, Alexa B R; Ounit, Rachid; Afshinnekoo, Ebrahim; Prill, Robert J; Hénaff, Elizabeth; Alexander, Noah; Minot, Samuel S; Danko, David; Foox, Jonathan; Ahsanuddin, Sofia; Tighe, Scott; Hasan, Nur A; Subramanian, Poorani; Moffat, Kelly; Levy, Shawn; Lonardi, Stefano; Greenfield, Nick; Colwell, Rita R; Rosen, Gail L; Mason, Christopher E

2017-09-21

One of the main challenges in metagenomics is the identification of microorganisms in clinical and environmental samples. While an extensive and heterogeneous set of computational tools is available to classify microorganisms using whole-genome shotgun sequencing data, comprehensive comparisons of these methods are limited. In this study, we use the largest-to-date set of laboratory-generated and simulated controls across 846 species to evaluate the performance of 11 metagenomic classifiers. Tools were characterized on the basis of their ability to identify taxa at the genus, species, and strain levels, quantify relative abundances of taxa, and classify individual reads to the species level. Strikingly, the number of species identified by the 11 tools can differ by over three orders of magnitude on the same datasets. Various strategies can ameliorate taxonomic misclassification, including abundance filtering, ensemble approaches, and tool intersection. Nevertheless, these strategies were often insufficient to completely eliminate false positives from environmental samples, which are especially important where they concern medically relevant species. Overall, pairing tools with different classification strategies (k-mer, alignment, marker) can combine their respective advantages. This study provides positive and negative controls, titrated standards, and a guide for selecting tools for metagenomic analyses by comparing ranges of precision, accuracy, and recall. We show that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.
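Two of the mitigation strategies mentioned above, abundance filtering and tool intersection, can be sketched over per-tool abundance profiles. This is a minimal illustration under stated assumptions: each tool's output is a dict of taxon to relative abundance, and the threshold and `min_tools` values are arbitrary. The paper's own thresholds are not reproduced here, and the taxon names are placeholders.

```python
def filter_by_abundance(profile, min_abundance):
    """Taxa whose relative abundance reaches the threshold."""
    return {taxon for taxon, ab in profile.items() if ab >= min_abundance}

def consensus_calls(profiles, min_abundance=0.001, min_tools=2):
    """Keep taxa that pass the abundance filter in at least min_tools tools."""
    counts = {}
    for profile in profiles:
        for taxon in filter_by_abundance(profile, min_abundance):
            counts[taxon] = counts.get(taxon, 0) + 1
    return {t for t, n in counts.items() if n >= min_tools}

def precision_recall(called, truth):
    """Precision and recall of a set of taxon calls against a known truth set."""
    tp = len(called & truth)
    precision = tp / len(called) if called else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical outputs of three classifiers on a mock community.
tool_a = {"taxonA": 0.5, "taxonB": 0.3, "taxonX": 0.0005}
tool_b = {"taxonA": 0.6, "taxonB": 0.2, "taxonY": 0.2}
tool_c = {"taxonA": 0.4, "taxonZ": 0.1}
called = consensus_calls([tool_a, tool_b, tool_c])
print(sorted(called), precision_recall(called, {"taxonA", "taxonB", "taxonC"}))
```

Raising `min_tools` trades recall for precision. That trade-off is why such strategies reduce, but do not always eliminate, false positives.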
Feature genes in metastatic breast cancer identified by MetaDE and SVM classifier methods.

Science.gov (United States)

Tuo, Youlin; An, Ning; Zhang, Ming

2018-03-01

The aim of the present study was to investigate the feature genes in metastatic breast cancer samples. A total of 5 expression profiles of metastatic breast cancer samples were downloaded from the Gene Expression Omnibus database, which were then analyzed using the MetaQC and MetaDE packages in R. The feature genes between metastasis and non‑metastasis samples were screened under the threshold of PSVM) classifier training and verification. The accuracy of the SVM classifier was then evaluated using another independent dataset from The Cancer Genome Atlas database. Finally, function and pathway enrichment analyses for genes in the SVM classifier were performed. A total of 541 feature genes were identified between metastatic and non‑metastatic samples. The top 10 genes with the highest betweenness centrality values in the PPI network of feature genes were Nuclear RNA Export Factor 1, cyclin‑dependent kinase 2 (CDK2), myelocytomatosis proto‑oncogene protein (MYC), Cullin 5, SHC Adaptor Protein 1, Clathrin heavy chain, Nucleolin, WD repeat domain 1, proteasome 26S subunit non‑ATPase 2 and telomeric repeat binding factor 2. The cyclin‑dependent kinase inhibitor 1A (CDKN1A), E2F transcription factor 1 (E2F1), and MYC interacted with CDK2. The SVM classifier constructed from the top 30 feature genes was able to distinguish metastatic samples from non‑metastatic samples [correct rate, specificity, positive predictive value and negative predictive value >0.89; sensitivity >0.84; area under the receiver operating characteristic curve (AUROC) >0.96]. The verification of the SVM classifier in an independent dataset (35 metastatic samples and 143 non‑metastatic samples) revealed an accuracy of 94.38% and AUROC of 0.958. Cell cycle associated functions and pathways were the most significant terms of the 30 feature genes. An SVM classifier was constructed to assess the possibility of breast cancer metastasis, which presented high accuracy in several
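The performance measures quoted above (correct rate, sensitivity, specificity, PPV, NPV, AUROC) all follow from a confusion matrix or from ranked classifier scores. This is a minimal sketch of those definitions with metastatic as the positive class. AUROC is computed as the Mann-Whitney probability that a random positive outscores a random negative, with ties counted as one half. The counts and scores are hypothetical, and each ratio assumes its denominator is non-zero.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),  # "correct rate"
        "sensitivity": tp / (tp + fn),  # recall on the positive (metastatic) class
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def auroc(pos_scores, neg_scores):
    """AUROC as the probability that a positive outscores a negative (ties = 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical counts and SVM decision scores, for illustration only.
print(confusion_metrics(tp=8, fp=2, tn=18, fn=2))
print(auroc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))
```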
An Investigation to Improve Classifier Accuracy for Myo Collected Data

Science.gov (United States)

2017-02-01

Bad Samples Effect on Classification Accuracy; 5.1 Naïve Bayes (NB) Classifier Accuracy; 5.2 Logistic Model Tree (LMT); 5.3 K-Nearest Neighbor ... gesture, pitch feature, user 06. All samples exhibit reversed movement ... Fig. A-2 Come gesture, pitch feature, user 14. All samples exhibit reversed movement
A Simple and Fast Extraction Method for the Determination of Multiclass Antibiotics in Eggs Using LC-MS/MS.

Science.gov (United States)

Wang, Kun; Lin, Kunde; Huang, Xinwen; Chen, Meng

2017-06-21

The purpose of this study was to develop and validate a simple, fast, and specific extraction method for the analysis of 64 antibiotics from nine classes (including sulfonamides, quinolones, tetracyclines, macrolides, lincosamides, nitrofurans, β-lactams, nitroimidazoles, and chloramphenicols) in chicken eggs. Briefly, egg samples were extracted with a mixture of acetonitrile-water (90:10, v/v) and 0.1 mol·L⁻¹ Na₂EDTA solution with ultrasonic assistance. The extract was centrifuged, concentrated, and analyzed directly by liquid chromatography coupled to tandem mass spectrometry. Compared with conventional cleanup methods (passing through solid-phase extraction cartridges), the established method demonstrated comparable efficiency in eliminating matrix effects and higher or equivalent recoveries for most of the target compounds. Typical validation parameters, including specificity, linearity, matrix effect, limits of detection (LODs) and quantification (LOQs), decision limit, detection capability, trueness, and precision, were evaluated. The recoveries of target compounds ranged from 70.8% to 116.1% at three spiking levels (5, 20, and 50 μg·kg⁻¹), with relative standard deviations of less than 14%. LODs and LOQs were in the ranges of 0.005-2.00 μg·kg⁻¹ and 0.015-6.00 μg·kg⁻¹, respectively, for all of the antibiotics. A total of five antibiotics were successfully detected in 22 commercial eggs from local markets. This work suggests that the method is suitable for the analysis of multiclass antibiotics in eggs.
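The abstract does not state how its LODs and LOQs were derived. One common calibration-based convention (ICH Q2 style) uses LOD = 3.3·s/S and LOQ = 10·s/S, where S is the calibration slope and s is the residual standard deviation of the regression. This sketch assumes that convention, and the calibration data are invented.

```python
import math

def calibration_lod_loq(conc, response):
    """LOD = 3.3*s/S and LOQ = 10*s/S from an ordinary least-squares calibration line."""
    n = len(conc)
    mx, my = sum(conc) / n, sum(response) / n
    sxx = sum((x - mx) ** 2 for x in conc)
    slope = sum((x - mx) * (y - my) for x, y in zip(conc, response)) / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(conc, response)]
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))  # residual standard deviation
    return 3.3 * s / slope, 10.0 * s / slope

# Hypothetical calibration (ug/kg vs. peak area ratio), for illustration only.
lod, loq = calibration_lod_loq([0, 1, 2, 3, 4], [0.1, 1.9, 4.1, 5.9, 8.1])
print(f"LOD {lod:.3f} ug/kg, LOQ {loq:.3f} ug/kg")
```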
The decision tree approach to classification

Science.gov (United States)

Wu, C.; Landgrebe, D. A.; Swain, P. H.

1975-01-01

A class of multistage decision tree classifiers is proposed and studied relative to the classification of multispectral remotely sensed data. The decision tree classifiers are shown to have the potential for improving both the classification accuracy and the computation efficiency. Dimensionality in pattern recognition is discussed and two theorems on the lower bound of logic computation for multiclass classification are derived. The automatic or optimization approach is emphasized. Experimental results on real data are reported, which clearly demonstrate the usefulness of decision tree classifiers.
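The multistage idea can be illustrated with a decision tree in which each stage tests a single spectral band. The computational saving comes from the fact that no stage evaluates the full measurement vector. This is a minimal sketch with hypothetical band indices, thresholds and class names. It is not the 1975 design procedure, which the abstract describes as an optimization approach.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    label: str

@dataclass
class Node:
    band: int        # index of the spectral band tested at this stage
    threshold: float
    low: "Tree"      # subtree taken when pixel[band] <= threshold
    high: "Tree"     # subtree taken otherwise

Tree = Union[Leaf, Node]

def classify(tree, pixel):
    """Walk the tree; each stage looks at one band rather than the whole vector."""
    while isinstance(tree, Node):
        tree = tree.low if pixel[tree.band] <= tree.threshold else tree.high
    return tree.label

# Hypothetical two-stage tree for 4-band pixels.
tree = Node(band=3, threshold=40.0, low=Leaf("water"),
            high=Node(band=2, threshold=30.0, low=Leaf("vegetation"), high=Leaf("soil")))
print(classify(tree, [0, 0, 20, 50]))
```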
Just-in-time adaptive classifiers-part II: designing the classifier.

Science.gov (United States)

Alippi, Cesare; Roveri, Manuel

2008-12-01

Aging effects, environmental changes, thermal drifts, and soft and hard faults affect physical systems by changing their nature and behavior over time. To cope with an evolving process, adaptive solutions must be envisaged to track its dynamics; in this direction, adaptive classifiers are generally designed by assuming that the process generating the data is stationary, with very few results addressing nonstationary environments. This paper proposes a methodology based on k-nearest neighbor (NN) classifiers for designing adaptive classification systems able to react to changing conditions just-in-time (JIT), i.e., exactly when it is needed. k-NN classifiers have been selected because their training phase requires no computation, their model complexity k can be easily estimated, and the computational complexity of the classifier can be kept under control through suitable data reduction mechanisms. A JIT classifier requires the temporal detection of a (possible) process deviation (an aspect tackled in a companion paper), followed by adaptive management of the knowledge base (KB) of the classifier to cope with the process change. The novelty of the proposed approach resides in the general framework supporting the real-time update of the KB of the classification system in response to novel information coming from the process, both in stationary conditions (accuracy improvement) and in nonstationary ones (process tracking), and in providing a suitable estimate of k. It is shown that the classification system guarantees consistency once the change brings the process generating the data to a new stationary state, as is the case in many real applications.
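The knowledge-base management described above can be sketched as a k-NN classifier whose KB grows while the process is stationary and is pruned when a change is signalled. k is re-estimated by leave-one-out accuracy on the current KB. This is a minimal sketch under assumptions. Change detection (the companion paper's topic) is left to the caller, and the pruning policy (keep only post-change samples), the `candidate_ks` values and the class name `JITKNN` are hypothetical.

```python
import math
from collections import Counter

class JITKNN:
    """k-NN whose knowledge base (KB) grows while stationary and is pruned on change."""

    def __init__(self, candidate_ks=(1, 3, 5, 7)):
        self.kb = []  # list of (features, label, timestamp)
        self.candidate_ks = candidate_ks
        self.k = candidate_ks[0]

    def add(self, x, y, t):
        """Stationary phase: every new labelled sample enriches the KB."""
        self.kb.append((x, y, t))

    def on_change(self, t_change):
        """Called by an external change detector: drop samples from the old state."""
        self.kb = [s for s in self.kb if s[2] >= t_change]

    def _vote(self, x, k, kb):
        neighbours = sorted(kb, key=lambda s: math.dist(x, s[0]))[:k]
        return Counter(s[1] for s in neighbours).most_common(1)[0][0]

    def predict(self, x):
        return self._vote(x, self.k, self.kb)

    def reestimate_k(self):
        """Pick k by leave-one-out accuracy on the current KB."""
        def loo_accuracy(k):
            hits = 0
            for i, (x, y, _) in enumerate(self.kb):
                hits += self._vote(x, k, self.kb[:i] + self.kb[i + 1:]) == y
            return hits / len(self.kb)
        usable = [k for k in self.candidate_ks if k < len(self.kb)]
        if usable:
            self.k = max(usable, key=loo_accuracy)

# Hypothetical 1-D stream whose class mapping flips at t = 4.
clf = JITKNN()
for x, y, t in [([0.0], "a", 0), ([0.1], "a", 1), ([1.0], "b", 2), ([1.1], "b", 3),
                ([0.0], "b", 4), ([0.1], "b", 5), ([1.0], "a", 6), ([1.1], "a", 7)]:
    clf.add(x, y, t)
clf.on_change(4)
clf.reestimate_k()
print(clf.predict([0.05]), clf.k)
```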
Multi-class machine classification of suicide-related communication on Twitter.

Science.gov (United States)

Burnap, Pete; Colombo, Gualtiero; Amery, Rosie; Hodorog, Andrei; Scourfield, Jonathan

2017-08-01

The World Wide Web, and online social networks in particular, have increased connectivity between people such that information can spread to millions of people in a matter of minutes. This form of online collective contagion has provided many benefits to society, such as providing reassurance and emergency management in the immediate aftermath of natural disasters. However, it also poses a potential risk to vulnerable Web users who receive this information and could subsequently come to harm. One example of this would be the spread of suicidal ideation in online social networks, about which concerns have been raised. In this paper we report the results of a number of machine classifiers built with the aim of classifying text relating to suicide on Twitter. The classifier distinguishes between the more worrying content, such as suicidal ideation, and other suicide-related topics such as reporting of a suicide, memorial, campaigning and support. It also aims to identify flippant references to suicide. We built a set of baseline classifiers using lexical, structural, emotive and psychological features extracted from Twitter posts. We then improved on the baseline classifiers by building an ensemble classifier using the Rotation Forest algorithm and a Maximum Probability voting classification decision method, based on the outcome of base classifiers. This achieved an F-measure of 0.728 overall (for 7 classes, including suicidal ideation) and 0.69 for the suicidal ideation class. We summarise the results by reflecting on the most significant predictive principal components of the suicidal ideation class to provide insight into the language used on Twitter to express suicidal ideation. Finally, we perform a 12-month case study of suicide-related posts where we further evaluate the classification approach, showing a sustained classification performance and providing anonymous insights into the trends and demographic profile of Twitter users posting content of this type.
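The sketch below assumes that "Maximum Probability" voting means the common combination rule: pick the class that receives the single highest posterior from any base classifier, rather than averaging posteriors. It is a minimal illustration, and the probability vectors and class names are hypothetical.

```python
def max_probability_vote(prob_vectors, classes):
    """Class receiving the single highest probability from any base classifier."""
    best_prob, best_class = -1.0, None
    for probs in prob_vectors:  # one probability vector per base classifier
        for c, p in zip(classes, probs):
            if p > best_prob:
                best_prob, best_class = p, c
    return best_class

CLASSES = ["ideation", "reporting", "flippant"]
# Hypothetical base-classifier outputs: averaging would favour "reporting" here,
# but the single most confident estimate (0.6) is for "ideation".
outputs = [[0.6, 0.4, 0.0], [0.0, 0.55, 0.45], [0.1, 0.5, 0.4]]
print(max_probability_vote(outputs, CLASSES))  # prints "ideation"
```

The rule rewards the most confident base classifier. This can help a minority class that only one model detects well, at the cost of sensitivity to overconfident models.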
Vision based nutrient deficiency classification in maize plants using multi class support vector machines

Science.gov (United States)

Leena, N.; Saju, K. K.

2018-04-01

Nutritional deficiencies in plants are a major concern for farmers, as they affect productivity and thus profit. The work aims to classify nutritional deficiencies in maize plants in a non-destructive manner using image processing and machine learning techniques. Color images of the leaves are analyzed and classified with a multi-class support vector machine (SVM) method. Several images of maize leaves with known deficiencies, such as nitrogen, phosphorus and potassium (NPK), are used to train the SVM classifier prior to the classification of test images. The results show that the method was able to classify and identify nutritional deficiencies.
A method of neighbor classes based SVM classification for optical printed Chinese character recognition.

Science.gov (United States)

Zhang, Jie; Wu, Xiaohong; Yu, Yanmei; Luo, Daisheng

2013-01-01

In optical printed Chinese character recognition (OPCCR), many classifiers have been proposed. Among them, the support vector machine (SVM) might be the best classifier. However, SVM is inherently a two-class classifier, and when it is used for the many classes in OPCCR, its computation is time-consuming. Thus, we propose a neighbor classes based SVM (NC-SVM) to reduce the computational cost of SVM. NC-SVM classification experiments have been conducted for OPCCR. The results show that the proposed NC-SVM can effectively reduce the computation time in OPCCR.
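Why multi-class SVM is expensive, and how restricting it to neighbor classes helps, can be shown with one-vs-one voting. A full one-vs-one decomposition over K classes needs K(K−1)/2 pairwise SVMs. Voting only among m candidate classes evaluates just m(m−1)/2 of them per query. The abstract does not say how neighbor classes are found, so this sketch assumes some cheaper pre-classifier supplies the candidates. The `binary_decide` callable is a hypothetical stand-in for trained pairwise SVMs.

```python
from collections import Counter
from itertools import combinations

def ovo_pair_count(num_classes):
    """Pairwise SVMs needed by a full one-vs-one decomposition: K(K-1)/2."""
    return num_classes * (num_classes - 1) // 2

def neighbor_class_vote(x, candidate_classes, binary_decide):
    """One-vs-one voting restricted to a small set of candidate (neighbor) classes.

    binary_decide(x, a, b) stands in for a trained pairwise SVM and returns a or b.
    """
    votes = Counter(binary_decide(x, a, b)
                    for a, b in combinations(candidate_classes, 2))
    return votes.most_common(1)[0][0]

# Hypothetical usage: 1-D class prototypes play the role of pairwise SVMs.
PROTOTYPES = {"A": 0.0, "B": 5.0, "C": 10.0}

def nearest_prototype(x, a, b):
    return a if abs(x - PROTOTYPES[a]) <= abs(x - PROTOTYPES[b]) else b

print(neighbor_class_vote(4.0, ["A", "B", "C"], nearest_prototype))  # prints "B"
```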

A novel statistical method for classifying habitat generalists and specialists

DEFF Research Database (Denmark)

Chazdon, Robin L; Chao, Anne; Colwell, Robert K

2011-01-01

…: (1) generalist; (2) habitat A specialist; (3) habitat B specialist; and (4) too rare to classify with confidence. We illustrate our multinomial classification method using two contrasting data sets: (1) bird abundance in woodland and heath habitats in southeastern Australia and (2) tree abundance in second-growth (SG) and old-growth (OG) rain forests in the Caribbean lowlands of northeastern Costa Rica. We evaluate the multinomial model in detail for the tree data set. Our results for birds were highly concordant with a previous nonstatistical classification, but our method classified a higher fraction (57.7%) of bird species with statistical confidence. Based on a conservative specialization threshold and adjustment for multiple comparisons, 64.4% of tree species in the full sample were too rare to classify with confidence. Among the species classified, OG specialists constituted the largest…
Hybrid classifiers methods of data, knowledge, and classifier combination

CERN Document Server

Wozniak, Michal

2014-01-01

This book delivers definitive, compact knowledge on how hybridization can help improve the quality of computer classification systems. To give readers a clear understanding of hybridization, the book primarily focuses on introducing the different levels of hybridization and illuminating the problems faced when dealing with such projects. It first addresses the data and knowledge incorporated in hybridization, and then considers a still-growing area of classifier systems known as combined classifiers. The book covers these state-of-the-art topics and the latest research results of the author and his team from the Department of Systems and Computer Networks, Wroclaw University of Technology, including classifiers based on feature space splitting, one-class classification, imbalanced data, and data stream classification.
Current Directional Protection of Series Compensated Line Using Intelligent Classifier

Directory of Open Access Journals (Sweden)

M. Mollanezhad Heydarabadi

2016-12-01

Full Text Available The current inversion condition leads to incorrect operation of current-based directional relays in power systems with series compensation devices. This paper suggests applying an intelligent system to fault direction classification. A new current directional protection scheme based on an intelligent classifier is proposed for the series compensated line. The proposed classifier is fed only half a cycle of pre-fault and post-fault current samples at the relay location. Numerous forward and backward fault simulations under different system conditions on a transmission line with a fixed series capacitor are carried out using PSCAD/EMTDC software. The applicability of decision tree (DT), probabilistic neural network (PNN) and support vector machine (SVM) classifiers is investigated using simulated data under different system conditions. The performance comparison of the classifiers indicates that the SVM is the most suitable classifier for discriminating fault direction. Backward faults can be accurately distinguished from forward faults even under current inversion, without requiring detection of the current inversion condition.
Obscenity detection using haar-like features and Gentle Adaboost classifier.

Science.gov (United States)

Mustafa, Rashed; Min, Yang; Zhu, Dingju

2014-01-01

A large exposed skin area in an image is considered obscene. Relying on this fact alone may flag many false images that contain skin-like objects, and may miss images that have only partially exposed skin but expose erotogenic human body parts. This paper presents a novel method for detecting nipples in pornographic image content. The nipple is considered an erotogenic organ for identifying pornographic content in images. In this research, a Gentle Adaboost (GAB) haar-cascade classifier and haar-like features are used to ensure detection accuracy. Applying a skin filter prior to detection made the system more robust. The experiment showed that the haar-cascade classifier performs well in terms of accuracy, but the train-cascade classifier is more suitable where detection time matters. To validate the results, we used 1198 positive samples containing nipple objects and 1995 negative images. The detection rates for the haar-cascade and train-cascade classifiers are 0.9875 and 0.8429, respectively. The detection times for the haar-cascade and train-cascade classifiers are 0.162 seconds and 0.127 seconds, respectively.
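Haar-like features are rectangle-sum differences. They become cheap to evaluate, at any position and scale, once an integral image has been computed. This sketch assumes the standard Viola-Jones formulation, because the abstract does not list the exact feature set. The Gentle Adaboost selection and cascade training (done with dedicated tools in practice) are not shown, and the function names are illustrative.

```python
def integral_image(img):
    """ii[y][x] = sum of img[0..y-1][0..x-1], with one row/column of zero padding."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect_horizontal(ii, x, y, w, h):
    """Left half minus right half: a basic two-rectangle (edge) Haar-like feature."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

# Hypothetical 2x4 patch with a bright left half and dark right half.
ii = integral_image([[9, 9, 1, 1], [9, 9, 1, 1]])
print(haar_two_rect_horizontal(ii, 0, 0, 4, 2))  # prints 32
```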
Obscenity Detection Using Haar-Like Features and Gentle Adaboost Classifier

Directory of Open Access Journals (Sweden)

Rashed Mustafa

2014-01-01

Full Text Available A large exposed skin area in an image is considered obscene. Relying on this fact alone may flag many false images that contain skin-like objects, and may miss images that have only partially exposed skin but expose erotogenic human body parts. This paper presents a novel method for detecting nipples in pornographic image content. The nipple is considered an erotogenic organ for identifying pornographic content in images. In this research, a Gentle Adaboost (GAB) haar-cascade classifier and haar-like features are used to ensure detection accuracy. Applying a skin filter prior to detection made the system more robust. The experiment showed that the haar-cascade classifier performs well in terms of accuracy, but the train-cascade classifier is more suitable where detection time matters. To validate the results, we used 1198 positive samples containing nipple objects and 1995 negative images. The detection rates for the haar-cascade and train-cascade classifiers are 0.9875 and 0.8429, respectively. The detection times for the haar-cascade and train-cascade classifiers are 0.162 seconds and 0.127 seconds, respectively.
Glycosyltransferase Gene Expression Profiles Classify Cancer Types and Propose Prognostic Subtypes

Science.gov (United States)

Ashkani, Jahanshah; Naidoo, Kevin J.

2016-05-01

Aberrant glycosylation in tumours stems from altered glycosyltransferase (GT) gene expression, but can the expression profiles of these signature genes be used to classify cancer types and lead to cancer subtype discovery? The differential structural changes to cellular glycan structures are predominantly regulated by the expression patterns of GT genes and are a hallmark of neoplastic cell metamorphoses. We found that the expression of 210 GT genes taken from 1893 cancer patient samples in The Cancer Genome Atlas (TCGA) microarray data is able to classify six cancers: breast, ovarian, glioblastoma, kidney, colon and lung. The GT gene expression profiles are used to develop cancer classifiers and propose subtypes. The subclassification of breast cancer solid tumour samples illustrates the discovery of subgroups from GT genes that match well against basal-like and HER2-enriched subtypes and correlate with clinical, mutation and survival data. This cancer type glycosyltransferase gene signature finding provides foundational evidence for the centrality of glycosylation in cancer.
Generalized Partial Least Squares Approach for Nominal Multinomial Logit Regression Models with a Functional Covariate

Science.gov (United States)

Albaqshi, Amani Mohammed H.

2017-01-01

Functional Data Analysis (FDA) has attracted substantial attention over the last two decades. Within FDA, classifying curves into two or more categories is consistently of interest to scientists, but multi-class prediction within FDA is challenging because most classification tools have been limited to binary response applications. The functional…
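The model structure named in the title can be sketched as follows. This is a nominal (baseline-category) multinomial logit whose linear predictor contains a functional covariate. The sketch assumes the curve x(t) and each coefficient function β_k(t) are sampled on a common grid with spacing `dt`, so the integral becomes a Riemann sum. It shows only how class probabilities follow from fitted coefficients. It does not show the generalized partial least squares estimation, and the function names are hypothetical.

```python
import math


def linear_predictor(alpha, beta_curve, x_curve, dt):
    """eta = alpha + integral of x(t) * beta(t) dt, approximated by a Riemann sum."""
    return alpha + sum(b * x for b, x in zip(beta_curve, x_curve)) * dt


def multinomial_logit_probs(etas):
    """Baseline-category logit: class 0 is the reference (its eta is fixed at 0);
    etas holds the linear predictors of classes 1..K-1."""
    scores = [0.0] + list(etas)
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Prediction then assigns a curve to the class with the largest probability.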
Support Vector Machines for Pattern Classification

CERN Document Server

Abe, Shigeo

2010-01-01

A guide to the use of SVMs in pattern classification, including a rigorous performance comparison of classifiers and regressors. The book presents architectures for multiclass classification and function approximation problems, as well as evaluation criteria for classifiers and regressors. Features: Clarifies the characteristics of two-class SVMs; Discusses kernel methods for improving the generalization ability of neural networks and fuzzy systems; Contains ample illustrations and examples; Includes performance evaluation using publicly available data sets; Examines Mahalanobis kernels, empir
Automatic feed phase identification in multivariate bioprocess profiles by sequential binary classification.

Science.gov (United States)

Nikzad-Langerodi, Ramin; Lughofer, Edwin; Saminger-Platz, Susanne; Zahel, Thomas; Sagmeister, Patrick; Herwig, Christoph

2017-08-22

In this paper, we propose a new strategy for retrospective identification of feed phases from online sensor-data enriched feed profiles of an Escherichia coli (E. coli) fed-batch fermentation process. In contrast to conventional (static), data-driven multi-class machine learning (ML), we exploit process knowledge to constrain our classification system, yielding more parsimonious models than static ML approaches. In particular, we enforce unidirectionality on a set of binary, multivariate classifiers trained to discriminate between adjacent feed phases by linking the classifiers through a one-way switch. The switch is activated when the output of the current classifier changes; as a consequence, the next binary classifier in the chain is used to discriminate the next pair of feed phases, and so on. We allow activation of the switch only after a predefined number of consecutive predictions of a transition event, in order to prevent premature activation, and undertake a sensitivity analysis regarding the optimal choice of this (time) lag parameter. From a complexity/parsimony perspective, the benefit of our approach is three-fold: i) the multi-class learning task is broken down into binary subproblems, which usually have simpler decision surfaces and tend to be less susceptible to the class-imbalance problem; ii) we exploit the fact that the process follows a rigid feed cycle structure (i.e. batch-feed-batch-feed), which allows us to focus on the subproblems involving phase transitions as they occur during the process while discarding off-transition classifiers; and iii) only one binary classifier is active at a time, which keeps effective model complexity low. We further use a combination of logistic regression and Lasso (i.e. regularized logistic regression, RLR) as a wrapper to extract the most relevant features for individual subproblems from the whole set of high-dimensional sensor data. We train different soft computing classifiers
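The chained, one-way switching logic described above can be sketched in a few lines. Here each trained binary classifier is assumed to be a callable that returns `True` when a sample looks like the next phase rather than the current one. The names `identify_phases` and `transition_detectors` are hypothetical, and the threshold detectors in the usage example are placeholders for the RLR-based soft computing classifiers.

```python
def identify_phases(samples, transition_detectors, lag):
    """Label each sample with a phase index using a chain of binary classifiers.

    transition_detectors[i](x) -> True if x looks like phase i + 1 rather than i.
    Only the detector for the current phase is active. The one-way switch to the
    next detector fires after `lag` consecutive transition predictions.
    """
    phase = 0
    run = 0  # consecutive transition predictions seen so far
    labels = []
    for x in samples:
        active = transition_detectors[phase] if phase < len(transition_detectors) else None
        if active is not None and active(x):
            run += 1
            if run >= lag:
                phase += 1  # one-way switch: phases never go back
                run = 0
        else:
            run = 0  # an isolated spike does not trigger the switch
        labels.append(phase)
    return labels
```

For example, with `detectors = [lambda x: x > 5, lambda x: x < 0]` and `lag=2`, the sequence `[1, 2, 6, 7, 8, 9, 3, -1, -2, -3]` is labelled `[0, 0, 0, 1, 1, 1, 1, 1, 2, 2]`. The first sample of each transition keeps the old label, which is the delay the lag parameter trades against robustness.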
Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets.

Science.gov (United States)

Sankari, E Siva; Manimegalai, D

2017-12-21

Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types. Given the explosive growth of uncharacterized protein sequences in databases, these traditional methods are very time-consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable and efficient method to predict membrane protein types. Decision tree classifiers often handle imbalanced and large datasets well. Because the datasets used here are imbalanced, the performance of various decision tree classifiers, namely Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree and REP (Reduced Error Pruning) tree, and of ensemble methods such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest, is analysed. Among the decision tree classifiers, Random forest performs well, reaching an accuracy of 96.35% in less time. A further finding is that the RUS boost decision tree classifier is able to classify one or two samples of classes with very few samples, whereas other classifiers such as DT, Adaboost, Rotation forest and Random forest are not sensitive to the classes with fewer samples. The performance of the decision tree classifiers is also compared with SVM (Support Vector Machine) and Naive Bayes classifiers. Copyright © 2017 Elsevier Ltd. All rights reserved.
STATISTICAL TOOLS FOR CLASSIFYING GALAXY GROUP DYNAMICS

International Nuclear Information System (INIS)

Hou, Annie; Parker, Laura C.; Harris, William E.; Wilman, David J.

2009-01-01

The dynamical state of galaxy groups at intermediate redshifts can provide information about the growth of structure in the universe. We examine three goodness-of-fit tests, the Anderson-Darling (A-D), Kolmogorov, and χ² tests, in order to determine which statistical tool is best able to distinguish between groups that are relaxed and those that are dynamically complex. We perform Monte Carlo simulations of these three tests and show that the χ² test is profoundly unreliable for groups with fewer than 30 members. Power studies of the Kolmogorov and A-D tests are conducted to test their robustness for various sample sizes. We then apply these tests to a sample of the second Canadian Network for Observational Cosmology Redshift Survey (CNOC2) galaxy groups and find that the A-D test is far more reliable and powerful at detecting real departures from an underlying Gaussian distribution than the more commonly used χ² and Kolmogorov tests. We use this statistic to classify a sample of the CNOC2 groups and find that 34 of 106 groups are inconsistent with an underlying Gaussian velocity distribution, and thus do not appear relaxed. In addition, we compute velocity dispersion profiles (VDPs) for all groups with more than 20 members and compare the overall features of the Gaussian and non-Gaussian groups, finding that the VDPs of the non-Gaussian groups are distinct from those classified as Gaussian.
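The A-D statistic for departure from a Gaussian can be computed from the sorted sample with the standard formula A² = -n - (1/n) Σ (2i-1)[ln F(x₍ᵢ₎) + ln(1 - F(x₍ₙ₊₁₋ᵢ₎))]. The following is a minimal sketch that assumes the member velocities are the input and that the Gaussian mean and standard deviation are estimated from the same sample. Because the parameters are estimated, significance thresholds must come from published tables or Monte Carlo simulation, as the authors did; they are not computed here. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def anderson_darling_normal(values):
    """A^2 statistic against a Gaussian with mean and standard deviation
    estimated from the sample (e.g. line-of-sight velocities of group members).
    Larger values indicate stronger departure from a Gaussian."""
    xs = sorted(values)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    cdf = [normal_cdf(x, mu, sigma) for x in xs]
    s = sum((2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1.0 - cdf[n - i]))
            for i in range(1, n + 1))
    return -n - s / n
```

Because the tails enter through the logarithms, A-D weights disagreements in the wings more heavily than the Kolmogorov statistic does. Extreme outliers whose CDF rounds to 0 or 1 would need clipping before taking the logarithm.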
Classification of THz pulse signals using two-dimensional cross-correlation feature extraction and non-linear classifiers.

Science.gov (United States)

Siuly; Yin, Xiaoxia; Hadjiloucas, Sillas; Zhang, Yanchun

2016-04-01

This work provides a performance comparison of four different machine learning classifiers, namely multinomial logistic regression with ridge estimators (MLR), k-nearest neighbours (KNN), support vector machine (SVM) and naïve Bayes (NB), as applied to terahertz (THz) transient time-domain sequences associated with pixelated images of different powder samples. Although the six substances considered have similar optical properties, their complex insertion loss in the THz part of the spectrum is significantly different because of differences both in their frequency-dependent THz extinction coefficient and in their refractive index and scattering properties. As scattering may not be quantifiable in many spectroscopic experiments, classification based solely on differences in complex insertion loss can be inconclusive. The problem is addressed using two-dimensional (2-D) cross-correlations between background and sample interferograms, which ensure good noise suppression of the datasets and provide a range of statistical features that are subsequently used as inputs to the above classifiers. A cross-validation procedure is adopted to assess the performance of the classifiers. First, the measurements of samples with a thickness of 2 mm were classified, then samples of 4 mm, and after that 3 mm, and the success rate and consistency of each classifier were recorded. In addition, mixtures with thicknesses of 2 and 4 mm, as well as mixtures of 2, 3 and 4 mm, were presented simultaneously to all classifiers. This approach provided further cross-validation of the classification consistency of each algorithm. The results confirm the superior classification accuracy and robustness of the MLR (lowest accuracy 88.24%) and KNN (lowest accuracy 90.19%) algorithms, which consistently outperformed the SVM (lowest accuracy 74.51%) and NB (lowest accuracy 56.86%) classifiers for the same number of feature vectors across all studies
Classifying the Progression of Ductal Carcinoma from Single-Cell Sampled Data via Integer Linear Programming: A Case Study.

Science.gov (United States)

Catanzaro, Daniele; Shackney, Stanley E; Schaffer, Alejandro A; Schwartz, Russell

2016-01-01

Ductal Carcinoma In Situ (DCIS) is a precursor lesion of Invasive Ductal Carcinoma (IDC) of the breast. Investigating its temporal progression could provide fundamental new insights for the development of better diagnostic tools to predict which cases of DCIS will progress to IDC. We investigate the problem of reconstructing a plausible progression from single-cell sampled data of an individual with synchronous DCIS and IDC. Specifically, by using a number of assumptions derived from the observation of cellular atypia occurring in IDC, we design a possible predictive model using integer linear programming (ILP). Computational experiments carried out on a preexisting data set of 13 patients with simultaneous DCIS and IDC show that the corresponding predicted progression models are classifiable into categories having specific evolutionary characteristics. The approach provides new insights into mechanisms of clonal progression in breast cancers and helps illustrate the power of the ILP approach for similar problems in reconstructing tumor evolution scenarios under complex sets of constraints.
MAMMOGRAMS ANALYSIS USING SVM CLASSIFIER IN COMBINED TRANSFORMS DOMAIN

Directory of Open Access Journals (Sweden)

B.N. Prathibha

2011-02-01

Full Text Available Breast cancer is a primary cause of mortality and morbidity in women. Reports reveal that the earlier abnormalities are detected, the greater the improvement in survival. Digital mammograms are one of the most effective means of detecting possible breast anomalies at early stages. Digital mammograms supported by Computer Aided Diagnostic (CAD) systems help radiologists make reliable decisions. The proposed CAD system extracts wavelet features and spectral features for better classification of mammograms. A Support Vector Machine classifier is used to analyze 206 mammogram images from the MIAS database according to the severity of abnormality, i.e., benign and malignant. The proposed system gives 93.14% accuracy for discrimination between normal and malignant samples, 87.25% accuracy for normal and benign samples, and 89.22% accuracy for benign and malignant samples. The study reveals that features extracted in the hybrid transform domain, combined with an SVM classifier, are a promising tool for the analysis of mammograms.
A neural-fuzzy approach to classify the ecological status in surface waters

International Nuclear Information System (INIS)

Ocampo-Duque, William; Schuhmacher, Marta; Domingo, Jose L.

2007-01-01

A methodology based on a hybrid approach that combines fuzzy inference systems and artificial neural networks has been used to classify ecological status in surface waters. This methodology has been proposed to deal efficiently with the non-linearity and highly subjective nature of the variables involved in this serious problem. Ecological status has been assessed with biological, hydro-morphological, and physicochemical indicators. A data set collected from 378 sampling sites in the Ebro river basin has been used to train and validate the hybrid model. Up to 97.6% of sampling sites have been correctly classified with neural-fuzzy models. This performance was very competitive compared with other classification algorithms: with non-parametric classification-regression trees and probabilistic neural networks, the predictive capacities were 90.7% and 97.0%, respectively. The proposed methodology can support decision-makers in the evaluation and classification of ecological status, as required by the EU Water Framework Directive. - Fuzzy inference systems can be used as environmental classifiers
Use of Multi-class Empirical Orthogonal Function for Identification of Hydrogeological Parameters and Spatiotemporal Pattern of Multiple Recharges in Groundwater Modeling

Science.gov (United States)

Huang, C. L.; Hsu, N. S.; Yeh, W. W. G.; Hsieh, I. H.

2017-12-01

This study develops an innovative calibration method for regional groundwater modeling using multi-class empirical orthogonal functions (EOFs). The developed method is iterative. Before the iterative procedure is carried out, the groundwater storage hydrographs associated with the observation wells are calculated. The combined multi-class EOF amplitudes and EOF expansion coefficients of the storage hydrographs are then used to compute the initial guess of the temporal and spatial pattern of multiple recharges. The initial guesses of the hydrogeological parameters are assigned according to in-situ pumping experiments. The recharges include net rainfall recharge and boundary recharge, and the hydrogeological parameters are riverbed leakage conductivity, horizontal hydraulic conductivity, vertical hydraulic conductivity, storage coefficient, and specific yield. The first step of the iterative algorithm is to run the numerical model (i.e. MODFLOW) with the initial guesses or adjusted values of the recharges and parameters. Second, to determine the best EOF combination of the error storage hydrographs for computing the correction vectors, the objective function is defined as minimizing the root mean square error (RMSE) of the simulated storage hydrographs. The error storage hydrographs are the differences between the storage hydrographs computed from observed and simulated groundwater level fluctuations. Third, the values of the recharges and parameters are adjusted, and the iterative procedure is repeated until the stopping criterion is reached. The established methodology was applied to the groundwater system of the Ming-Chu Basin, Taiwan. The study period is from January 1st to December 2ed in 2012. Results showed that the optimal EOF combination for the multiple recharges and hydrogeological parameters can decrease the RMSE of the simulated storage hydrographs dramatically within three calibration iterations. It represents that the iterative approach that
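EOFs of a set of hydrographs are commonly obtained from a singular value decomposition of the mean-removed data matrix. The following is a minimal sketch that assumes the storage hydrographs are stored as an array of shape (time steps, wells). It covers only the decomposition, truncated reconstruction and the RMSE objective. It does not cover the multi-class grouping, the MODFLOW runs or the correction-vector search, and the function names are hypothetical.

```python
import numpy as np


def eof_decompose(hydrographs):
    """hydrographs: array (n_times, n_wells). Returns well means, spatial EOF
    patterns (one per row), time-varying expansion coefficients (one column
    per EOF) and the fraction of variance explained by each EOF."""
    X = np.asarray(hydrographs, dtype=float)
    means = X.mean(axis=0)
    anomalies = X - means
    U, s, Vt = np.linalg.svd(anomalies, full_matrices=False)
    coeffs = U * s  # scale each left singular vector by its singular value
    explained = s**2 / np.sum(s**2)
    return means, Vt, coeffs, explained


def reconstruct(means, eofs, coeffs, k):
    """Rebuild the hydrographs from the leading k EOF modes."""
    return means + coeffs[:, :k] @ eofs[:k, :]


def rmse(a, b):
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))
```

Applying the same decomposition to the error storage hydrographs and keeping only a few leading modes gives a compact, low-dimensional description of the misfit, which is what the calibration loop searches over.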
SpectraClassifier 1.0: a user friendly, automated MRS-based classifier-development system

Directory of Open Access Journals (Sweden)

Julià-Sapé Margarida

2010-02-01

Full Text Available Abstract Background SpectraClassifier (SC) is a Java solution for designing and implementing Magnetic Resonance Spectroscopy (MRS)-based classifiers. The main goal of SC is to allow users with minimum background knowledge of multivariate statistics to perform a fully automated pattern recognition analysis. SC incorporates feature selection (greedy stepwise approach, either forward or backward) and feature extraction (PCA). Fisher Linear Discriminant Analysis is the method of choice for classification. Classifier evaluation is performed through various methods: display of the confusion matrix of the training and testing datasets; K-fold cross-validation, leave-one-out and bootstrapping; and Receiver Operating Characteristic (ROC) curves. Results SC is composed of the following modules: Classifier design, Data exploration, Data visualisation, Classifier evaluation, Reports, and Classifier history. It is able to read low-resolution in-vivo MRS (single-voxel and multi-voxel) and high-resolution tissue MRS (HRMAS), processed with existing tools (jMRUI, INTERPRET, 3DiCSI or TopSpin). In addition, to facilitate exchanging data between applications, a standard format capable of storing all the information needed for a dataset was developed. Each functionality of SC has been specifically validated with real data for the purposes of bug-testing and method validation. Data from the INTERPRET project were used. Conclusions SC is user-friendly software designed to fulfil the needs of potential users in the MRS community. It accepts all kinds of pre-processed MRS data types and classifies them semi-automatically, allowing spectroscopists to concentrate on the interpretation of results with the use of its visualisation tools.
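The two-class case of Fisher Linear Discriminant Analysis reduces to a projection direction w = S_w⁻¹(m₁ - m₀) and a threshold on the projected values. The following is a minimal sketch that assumes feature vectors already produced by feature selection or PCA. The midpoint threshold between projected class means is one simple choice, and the function names are hypothetical. This is not SpectraClassifier's Java implementation.

```python
import numpy as np


def fisher_lda_fit(X0, X1):
    """Two-class Fisher linear discriminant.
    Returns (w, threshold): project with x @ w and assign class 1 above the threshold."""
    X0, X1 = np.asarray(X0, float), np.asarray(X1, float)
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # pooled within-class scatter matrix
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)
    threshold = 0.5 * (m0 + m1) @ w  # midpoint between projected class means
    return w, threshold


def fisher_lda_predict(X, w, threshold):
    return (np.asarray(X, float) @ w > threshold).astype(int)
```

With raw spectra, which have many more points than cases, S_w is singular. This is one reason dimensionality reduction such as the PCA or feature-selection step above, or a small ridge term added to S_w, is needed before this step.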
Optimization of short amino acid sequences classifier

Science.gov (United States)

Barcz, Aleksy; Szymański, Zbigniew

This article describes processing methods used for the classification of short amino acid sequences. The data processed are 9-symbol string representations of amino acid sequences, divided into 49 data sets, each containing samples labeled as reacting or not reacting with a given enzyme. The goal of the classification is to determine, for a single enzyme, whether an amino acid sequence would react with it or not. Each data set is processed separately. Feature selection is performed to reduce the number of dimensions for each data set. The feature selection method consists of two phases. During the first phase, significant positions are selected using Classification and Regression Trees. Afterwards, symbols appearing at the selected positions are substituted with numeric values of amino acid properties taken from the AAindex database. In the second phase, the new set of features is reduced using a correlation-based ranking formula and Gram-Schmidt orthogonalization. Finally, the preprocessed data are used to train LS-SVM classifiers. SPDE, an evolutionary algorithm, is used to obtain optimal hyperparameters for the LS-SVM classifier, such as the error penalty parameter C and kernel-specific hyperparameters. A simple score penalty is used to adapt the SPDE algorithm to the task of selecting the classifiers with the best performance measure values.
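An LS-SVM classifier replaces the SVM quadratic program with a single linear system in the bias b and the multipliers α. The following is a minimal sketch assuming an RBF kernel and labels in {-1, +1}. Here `gamma` plays the role of the error penalty (C in the text above) and `sigma` is the kernel-specific hyperparameter. Both are the quantities an optimizer such as SPDE would tune; the SPDE search itself is not shown. The function names are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    A, B = np.atleast_2d(A).astype(float), np.atleast_2d(B).astype(float)
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))


def lssvm_fit(X, y, gamma, sigma):
    """Solve [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1] with
    Omega_ij = y_i y_j K(x_i, x_j). Labels y must be -1 or +1."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    n = len(y)
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # b, alpha


def lssvm_predict(X_train, y_train, b, alpha, sigma, X_new):
    K = rbf_kernel(X_new, X_train, sigma)
    return np.sign(K @ (alpha * np.asarray(y_train, float)) + b)
```

Because training is a single linear solve, each candidate (gamma, sigma) pair is cheap to evaluate. That matters when an evolutionary algorithm must score many candidates per data set.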
Classifier utility modeling and analysis of hypersonic inlet start/unstart considering training data costs

Science.gov (United States)

Chang, Juntao; Hu, Qinghua; Yu, Daren; Bao, Wen

2011-11-01

Start/unstart detection is one of the most important issues for hypersonic inlets and is also the foundation of scramjet protection control. Inlet start/unstart detection can be treated as a standard pattern classification problem, and the cost of training samples has to be considered in classifier modeling, since both CFD numerical simulations and wind tunnel experiments of hypersonic inlets cost time and money. To address this problem, the CFD simulation of the inlet is studied first, and the simulation results provide the training data for pattern classification of hypersonic inlet start/unstart. Then classifier modeling technology and maximum classifier utility theories are introduced to analyze the effect of training data cost on classifier utility. In conclusion, it is useful to introduce support vector machine algorithms to obtain the classifier model of hypersonic inlet start/unstart, and the minimum total cost of the hypersonic inlet start/unstart classifier can be obtained with the maximum classifier utility theories.
Quantum ensembles of quantum classifiers.

Science.gov (United States)

Schuld, Maria; Petruccione, Francesco

2018-02-09

Quantum machine learning has seen an increasing number of quantum algorithms for data-driven decision making, a problem with potential applications ranging from automated image recognition to medical diagnosis. Many of those algorithms are implementations of quantum classifiers, or models for the classification of data inputs with a quantum computer. Following the success of collective decision making with ensembles in classical machine learning, this paper introduces the concept of quantum ensembles of quantum classifiers. Creating the ensemble corresponds to a state preparation routine, after which the quantum classifiers are evaluated in parallel and their combined decision is accessed by a single-qubit measurement. This framework naturally allows for exponentially large ensembles in which, similar to Bayesian learning, the individual classifiers do not have to be trained. As an example, we analyse an exponentially large quantum ensemble in which each classifier is weighted according to its performance in classifying the training data, leading to new results for quantum as well as classical machine learning.

Fuzziness-based active learning framework to enhance hyperspectral image classification performance for discriminative and generative classifiers.

Directory of Open Access Journals (Sweden)

Muhammad Ahmad

Full Text Available Hyperspectral image classification with a limited number of training samples and without loss of accuracy is desirable, as collecting such data is often expensive and time-consuming. However, classifiers trained with limited samples usually end up with a large generalization error. To overcome this problem, we propose a fuzziness-based active learning framework (FALF), in which we implement the idea of selecting optimal training samples to enhance generalization performance for two different kinds of classifiers, discriminative and generative (e.g. SVM and KNN). The optimal samples are selected by first estimating the boundary of each class and then calculating the fuzziness-based distance between each sample and the estimated class boundaries. Samples that are at smaller distances from the boundaries and have higher fuzziness are chosen as target candidates for the training set. Through detailed experimentation on three publicly available datasets, we showed that when trained with the proposed sample selection framework, both classifiers achieved higher classification accuracy and lower processing time with a small amount of training data than when the training samples were selected randomly. Our experiments demonstrate the effectiveness of the proposed method, which compares favorably with state-of-the-art methods.
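The fuzziness part of the sample ranking can be illustrated with a common fuzziness measure for a membership vector μ over c classes: F(μ) = -(1/c) Σⱼ [μⱼ ln μⱼ + (1 - μⱼ) ln(1 - μⱼ)]. That this is the exact measure used in FALF is an assumption. The sketch also omits the class-boundary distance term. It only ranks unlabeled samples by the fuzziness of their class memberships, and the function names are hypothetical.

```python
import math


def fuzziness(memberships):
    """Fuzziness of one sample's class-membership vector (values in [0, 1]).
    It is 0 for crisp memberships and maximal (ln 2) when every membership is 0.5."""
    total = 0.0
    for m in memberships:
        for p in (m, 1.0 - m):
            if p > 0.0:  # 0 * ln 0 is taken as 0
                total -= p * math.log(p)
    return total / len(memberships)


def select_candidates(membership_rows, k):
    """Indices of the k samples with the highest fuzziness (most ambiguous first)."""
    scores = [fuzziness(row) for row in membership_rows]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

The membership rows could come from normalized SVM decision values or KNN neighbour votes. How memberships are derived for each classifier type is left open here.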
Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations.

Science.gov (United States)

Zhang, Yi; Ren, Jinchang; Jiang, Jianmin

2015-01-01

Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating the parameters of a probabilistic model, whilst SVM is an optimization-based nonparametric method in this context. Recently, it has been found that SVM is in some cases equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitates soft decision making. In total, four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized as Gaussian/non-Gaussian distributed and balanced/unbalanced, and are then used for performance assessment comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported that indicate how the combined classifier may work under various conditions.
Combining MLC and SVM Classifiers for Learning Based Decision Making: Analysis and Evaluations

Directory of Open Access Journals (Sweden)

Yi Zhang

2015-01-01

Full Text Available Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC is based on Bayesian theory in estimating the parameters of a probabilistic model, whilst SVM is an optimization-based nonparametric method in this context. Recently, it has been found that SVM is in some cases equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitates soft decision making. In total, four groups of data are used for evaluations, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized as Gaussian/non-Gaussian distributed and balanced/unbalanced, and are then used for performance assessment comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported that indicate how the combined classifier may work under various conditions.
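The MLC side of such a combination fits one Gaussian per class and assigns a sample to the class with the highest likelihood; normalizing the likelihoods yields the soft, probabilistic output mentioned above. The following is a minimal sketch assuming equal class priors. For brevity it also assumes diagonal covariances, whereas a standard MLC uses full covariance matrices. It does not reproduce how the paper couples MLC with SVM, and the function names are hypothetical.

```python
import math


def fit_mlc(samples_by_class):
    """samples_by_class: {label: list of feature vectors}. Fits one Gaussian per
    class with a diagonal covariance (a simplification of the full-covariance MLC)."""
    model = {}
    for label, rows in samples_by_class.items():
        n, d = len(rows), len(rows[0])
        means = [sum(r[j] for r in rows) / n for j in range(d)]
        variances = [max(sum((r[j] - means[j]) ** 2 for r in rows) / n, 1e-9)
                     for j in range(d)]
        model[label] = (means, variances)
    return model


def log_likelihood(x, means, variances):
    return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
               for xi, m, v in zip(x, means, variances))


def predict_mlc(model, x):
    """Hard decision: the class whose Gaussian gives x the highest likelihood."""
    return max(model, key=lambda label: log_likelihood(x, *model[label]))


def posteriors_mlc(model, x):
    """Soft decision: class posteriors under equal priors (softmax of log-likelihoods)."""
    logs = {label: log_likelihood(x, *params) for label, params in model.items()}
    top = max(logs.values())
    exps = {label: math.exp(v - top) for label, v in logs.items()}
    z = sum(exps.values())
    return {label: e / z for label, e in exps.items()}
```

The Gaussian assumption is also why the paper's Gaussian versus non-Gaussian characterization of the data matters when comparing MLC-based and SVM-based decisions.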
Derivation of LDA log likelihood ratio one-to-one classifier

NARCIS (Netherlands)

Spreeuwers, Lieuwe Jan

2014-01-01

The common expression for the Likelihood Ratio classifier using LDA assumes that the reference class mean is available. In biometrics, this is often not the case and only a single sample of the reference class is available. In this paper expressions are derived for biometric comparison between
Quantum Ensemble Classification: A Sampling-Based Learning Control Approach.

Science.gov (United States)

Chen, Chunlin; Dong, Daoyi; Qi, Bo; Petersen, Ian R; Rabitz, Herschel

2017-06-01

Quantum ensemble classification (QEC) has significant applications in discrimination of atoms (or molecules), separation of isotopes, and quantum information extraction. However, quantum mechanics forbids deterministic discrimination among nonorthogonal states. The classification of inhomogeneous quantum ensembles is very challenging, since there exist variations in the parameters characterizing the members within different classes. In this paper, we recast QEC as a supervised quantum learning problem. A systematic classification methodology is presented by using a sampling-based learning control (SLC) approach for quantum discrimination. The classification task is accomplished via simultaneously steering members belonging to different classes to their corresponding target states (e.g., mutually orthogonal states). First, a new discrimination method is proposed for two similar quantum systems. Then, an SLC method is presented for QEC. Numerical results demonstrate the effectiveness of the proposed approach for the binary classification of two-level quantum ensembles and the multiclass classification of multilevel quantum ensembles.
Chemometrics and chromatographic fingerprints to classify plant food supplements according to the content of regulated plants.

Science.gov (United States)

Deconinck, E; Sokeng Djiogo, C A; Courselle, P

2017-09-05

Plant food supplements are gaining popularity, resulting in a broader spectrum of available products and increased consumption. Besides the problem of adulteration of these products with synthetic drugs, the presence of regulated or toxic plants is an important issue, especially when the products are purchased from irregular sources. This paper focuses on this problem by using specific chromatographic fingerprints for five targeted plants and chemometric classification techniques to extract the important information from the fingerprints and determine the presence of the targeted plants in plant food supplements in an objective way. Two approaches were followed: (1) a multiclass model, and (2) a 2-class model for each of the targeted plants separately. For both approaches good classification models were obtained, especially when using SIMCA and PLS-DA. For each model, at most one sample of the external test set was misclassified. The models were applied to five real samples, resulting in the identification of the correct plants, confirmed by mass spectrometry. Chromatographic fingerprinting combined with chemometric modelling can therefore be considered an interesting way to make a more objective decision on whether a regulated plant is present in a plant food supplement, especially when no mass spectrometry equipment is available. The results also suggest that using a battery of 2-class models to screen for several plants is the preferred approach. Copyright © 2017 Elsevier B.V. All rights reserved.
Pattern recognition based on time-frequency analysis and convolutional neural networks for vibrational events in φ-OTDR

Science.gov (United States)

Xu, Chengjin; Guan, Junjun; Bao, Ming; Lu, Jiangang; Ye, Wei

2018-01-01

Based on vibration signals detected by a phase-sensitive optical time-domain reflectometer (φ-OTDR) distributed optical fiber sensing system, this paper presents an implementation of time-frequency analysis and a convolutional neural network (CNN) to classify different types of vibrational events. First, spectral subtraction and the short-time Fourier transform are used to enhance the time-frequency features of the vibration signals and to transform different types of vibration signals into spectrograms, which are input to the CNN for automatic feature extraction and classification. Finally, the performance of the classifier is enhanced by replacing the softmax layer in the CNN with a multiclass support vector machine. Experiments on 4000 vibration signal samples generated by four different vibration events, namely digging, walking, vehicles passing, and damaging, show recognition rates of over 90% for the vibration events. The experimental results show that this method can automatically perform effective feature selection and greatly improve the classification accuracy of vibrational events in distributed optical fiber sensing systems.
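The spectrogram front end described above can be sketched with NumPy. The example computes a Hann-windowed magnitude STFT and then applies magnitude spectral subtraction with an average noise spectrum. The frame length, hop size, window and zero floor are assumptions, as is the idea of estimating noise from an event-free recording. The CNN and SVM stages are not shown, and the function names are hypothetical.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude STFT with a Hann window: rows are frames, columns are frequency bins."""
    x = np.asarray(signal, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def spectral_subtraction(mag, noise_mag, floor=0.0):
    """Subtract the average noise magnitude spectrum from every frame,
    clipping negative results at `floor`."""
    noise_profile = noise_mag.mean(axis=0)
    return np.maximum(mag - noise_profile, floor)
```

For example, a 1 s, 64 Hz sine sampled at 1024 Hz gives a 7 × 129 spectrogram whose peak sits in bin 16 (4 Hz per bin). Images of this kind are what would be fed to the CNN.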
Online Feature Transformation Learning for Cross-Domain Object Category Recognition.

Science.gov (United States)

Zhang, Xuesong; Zhuang, Yan; Wang, Wei; Pedrycz, Witold

2017-06-09

In this paper, we introduce a new research problem termed online feature transformation learning in the context of multiclass object category recognition. The learning of a feature transformation is viewed as learning a global similarity metric function in an online manner. We first consider the problem of learning, online, a feature transformation matrix expressed in the original feature space and propose an online passive-aggressive feature transformation algorithm. These original features are then mapped to a kernel space, and an online single kernel feature transformation (OSKFT) algorithm is developed to learn a nonlinear feature transformation. Based on OSKFT and the existing Hedge algorithm, a novel online multiple kernel feature transformation algorithm is also proposed, which can further improve the performance of online feature transformation learning in large-scale applications. The classifier is trained with the k-nearest neighbor algorithm together with the learned similarity metric function. Finally, we experimentally examine the effect of different parameter values in the proposed algorithms and evaluate the model performance on several multiclass object recognition data sets. The experimental results demonstrate the validity and good performance of our methods in cross-domain and multiclass object recognition applications.
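The Hedge algorithm mentioned above maintains one weight per expert and multiplies each weight by β raised to that expert's loss in every round. In a multiple kernel setting, each expert could be one single-kernel transformation. The following is a minimal sketch assuming losses in [0, 1] and a fixed β. How the paper defines per-kernel losses is not shown, and the function name is hypothetical.

```python
def hedge(loss_rounds, beta=0.9):
    """Hedge over a fixed set of experts (e.g. one per kernel).
    loss_rounds: iterable of per-round loss lists, each loss in [0, 1].
    Returns the normalized weight vector after each round."""
    weights = None
    history = []
    for losses in loss_rounds:
        if weights is None:
            weights = [1.0] * len(losses)  # start with uniform weights
        weights = [w * beta ** loss for w, loss in zip(weights, losses)]
        total = sum(weights)
        weights = [w / total for w in weights]  # renormalize to avoid underflow
        history.append(weights)
    return history
```

For example, with β = 0.5 and an expert that always has loss 0 against one that always has loss 1, the first expert's weight is 2/3 after one round and 8/9 after three. Experts that keep making mistakes are down-weighted geometrically.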
Bayes classifiers for imbalanced traffic accidents datasets.

Science.gov (United States)

Mujalli, Randa Oqab; López, Griselda; Garach, Laura

2016-03-01

Traffic accident data sets are usually imbalanced: the number of instances classified under the killed or severe injuries class (minority) is much lower than the number classified under the slight injuries class (majority). This poses a challenging problem for classification algorithms and may result in a model that covers the slight injuries instances well while frequently misclassifying the killed or severe injuries instances. Based on traffic accident data collected on urban and suburban roads in Jordan over three years (2009-2011), three different data balancing techniques were used: under-sampling, which removes some instances of the majority class; oversampling, which creates new instances of the minority class; and a mixed technique that combines both. In addition, different Bayes classifiers were compared on the different imbalanced and balanced data sets, namely Averaged One-Dependence Estimators, Weightily Averaged One-Dependence Estimators, and Bayesian networks, in order to identify factors that affect the severity of an accident. The results indicated that using the balanced data sets, especially those created with oversampling techniques, together with Bayesian networks improved the classification of traffic accidents according to their severity and reduced the misclassification of killed and severe injuries instances. In addition, the following variables were found to contribute to the occurrence of a killed casualty or a severe injury in a traffic accident: number of vehicles involved, accident pattern, number of directions, accident type, lighting, surface condition, and speed limit. To the authors' knowledge, this work is the first to analyze historical data records for traffic accidents occurring in Jordan and the first to apply balancing techniques to analyze the injury severity of traffic accidents. Copyright © 2015 Elsevier Ltd. All rights reserved.
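The three balancing strategies can be sketched for the two-class case. In this sketch, oversampling duplicates minority instances at random. The paper's oversampling may instead synthesize new instances (e.g. SMOTE-style), so duplication is an assumption made for simplicity. The mixed strategy meets at a chosen common size, and the function names are hypothetical.

```python
import random


def random_undersample(majority, minority, rng):
    """Drop majority instances at random until both classes have the same size."""
    return rng.sample(majority, len(minority)), list(minority)


def random_oversample(majority, minority, rng):
    """Duplicate minority instances at random (with replacement) up to the majority size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority), list(minority) + extra


def mixed_resample(majority, minority, target, rng):
    """Undersample the majority and oversample the minority to a common target size."""
    if not len(minority) <= target <= len(majority):
        raise ValueError("target must lie between the minority and majority sizes")
    maj = rng.sample(majority, target)
    extra = [rng.choice(minority) for _ in range(target - len(minority))]
    return maj, list(minority) + extra
```

Passing an explicit `random.Random(seed)` keeps each resampled data set reproducible, which helps when comparing several Bayes classifiers on identical balanced sets.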
Classifying galaxy spectra at 0.5 < z < 1 with self-organizing maps

Science.gov (United States)

Rahmani, S.; Teimoorinia, H.; Barmby, P.

2018-05-01

The spectrum of a galaxy contains information about its physical properties. Classifying spectra using templates helps elucidate the nature of a galaxy's energy sources. In this paper, we investigate the use of self-organizing maps in classifying galaxy spectra against templates. We trained semi-supervised self-organizing map networks using a set of templates covering the wavelength range from far ultraviolet to near infrared. The trained networks were used to classify the spectra of a sample of 142 galaxies with 0.5 < z < 1, and the results were compared with those of K-means clustering, a supervised neural network, and chi-squared minimization. Spectra corresponding to quiescent galaxies were more likely to be classified similarly by all methods, while starburst spectra showed more variability. Compared to classification using chi-squared minimization or the supervised neural network, the galaxies classed together by the self-organizing map had more similar spectra. The class ordering provided by the one-dimensional self-organizing maps corresponds to an ordering in physical properties, a potentially important feature for the exploration of large datasets.
IAEA safeguards and classified materials

International Nuclear Information System (INIS)

Pilat, J.F.; Eccleston, G.W.; Fearey, B.L.; Nicholas, N.J.; Tape, J.W.; Kratzer, M.

1997-01-01

The international community in the post-Cold War period has suggested that the International Atomic Energy Agency (IAEA) utilize its expertise in support of the arms control and disarmament process in unprecedented ways. The pledges of the US and Russian presidents to place excess defense materials, some of which are classified, under some type of international inspection raise the prospect of using IAEA safeguards approaches for monitoring classified materials. A traditional safeguards approach, based on nuclear material accountancy, would seem unavoidably to reveal classified information. However, further analysis of the IAEA's safeguards approaches is warranted in order to understand fully the scope and nature of any problems. The issues are complex and difficult, and it is expected that common technical understandings will be essential for their resolution. Accordingly, this paper examines and compares traditional safeguards item accounting of fuel at a nuclear power station (especially spent fuel) with the challenges presented by inspections of classified materials. This analysis is intended to delineate more clearly the problems as well as reveal possible approaches, techniques, and technologies that could allow the adaptation of safeguards to the unprecedented task of inspecting classified materials. It is also hoped that a discussion of these issues can advance ongoing political-technical debates on international inspections of excess classified materials.
Multi-Class Simultaneous Adaptive Segmentation and Quality Control of Point Cloud Data

Directory of Open Access Journals (Sweden)

Ayman Habib

2016-01-01

Full Text Available 3D modeling of a given site is an important activity for a wide range of applications including urban planning, as-built mapping of industrial sites, heritage documentation, military simulation, and outdoor/indoor analysis of airflow. Point clouds, which can be derived from either passive or active imaging systems, are an important source for 3D modeling. Such point clouds need to undergo a sequence of data processing steps to derive the necessary information for the 3D modeling process. Segmentation is usually the first step in the data processing chain. This paper presents a region-growing multi-class simultaneous segmentation procedure, where planar, pole-like, and rough regions are identified while considering the internal characteristics (i.e., local point density/spacing and noise level) of the point cloud in question. The segmentation starts with organizing the point cloud into a kd-tree data structure and a characterization process that estimates the local point density/spacing. Then, proceeding from randomly distributed seed points, a set of seed regions is derived through distance-based region growing, which is followed by modeling of such seed regions into planar and pole-like features. Starting from optimally selected seed regions, planar and pole-like features are then segmented. The paper also introduces a list of hypothesized artifacts/problems that might occur during the region-growing process. Finally, a quality control process is devised to detect, quantify, and mitigate instances of partially/fully misclassified planar and pole-like features. Experimental results from airborne and terrestrial laser scanning as well as image-based point clouds are presented to illustrate the performance of the proposed segmentation and quality control framework.
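Local point spacing is commonly estimated as the mean distance from each point to its k nearest neighbours. The sketch below assumes that definition; the paper's exact estimator is not given in the abstract. It uses a brute-force O(n²) search for clarity, where a real pipeline would query a kd-tree (for example `scipy.spatial.cKDTree` and its `query` method). The function name is illustrative.

```python
import math


def local_point_spacing(points, k=3):
    """Mean distance from each point to its k nearest neighbours.

    Brute-force O(n^2) search, standing in for the kd-tree lookup that a
    real pipeline would use on large clouds.
    """
    if len(points) <= k:
        raise ValueError("need more than k points")
    spacings = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        spacings.append(sum(dists[:k]) / k)
    return spacings


line = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
print(local_point_spacing(line, k=1))  # prints [0.5, 0.5, 0.5, 0.5]
```

A per-point spacing like this can then scale distance thresholds during region growing, so that sparse and dense parts of the same cloud are treated consistently.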
Predicting Classifier Performance with Limited Training Data: Applications to Computer-Aided Diagnosis in Breast and Prostate Cancer

Science.gov (United States)

Basavanhally, Ajay; Viswanath, Satish; Madabhushi, Anant

2015-01-01

Clinical trials increasingly employ medical imaging data in conjunction with supervised classifiers, where the latter require large amounts of training data to accurately model the system. Yet, a classifier selected at the start of the trial based on smaller and more accessible datasets may yield inaccurate and unstable classification performance. In this paper, we aim to address two common concerns in classifier selection for clinical trials: (1) predicting expected classifier performance for large datasets based on error rates calculated from smaller datasets and (2) the selection of appropriate classifiers based on expected performance for larger datasets. We present a framework for comparative evaluation of classifiers using only limited amounts of training data by using random repeated sampling (RRS) in conjunction with a cross-validation sampling strategy. Extrapolated error rates are subsequently validated via comparison with leave-one-out cross-validation performed on a larger dataset. The ability to predict error rates as dataset size increases is demonstrated on both synthetic data as well as three different computational imaging tasks: detecting cancerous image regions in prostate histopathology, differentiating high and low grade cancer in breast histopathology, and detecting cancerous metavoxels in prostate magnetic resonance spectroscopy. For each task, the relationships between 3 distinct classifiers (k-nearest neighbor, naive Bayes, Support Vector Machine) are explored. Further quantitative evaluation in terms of interquartile range (IQR) suggests that our approach consistently yields error rates with lower variability (mean IQRs of 0.0070, 0.0127, and 0.0140) than a traditional RRS approach (mean IQRs of 0.0297, 0.0779, and 0.305) that does not employ cross-validation sampling for all three datasets. PMID:25993029
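Extrapolating error rates from small to large datasets requires a learning-curve model. The abstract does not name the one used, so the sketch below assumes the common inverse power law error(n) = a·n^(-b). It fits that law by least squares on the log-log line to mean error rates measured at small training sizes (e.g. from repeated sampling with cross-validation) and predicts the error at a larger size. The function names and the error values are hypothetical.

```python
import math


def fit_inverse_power_law(sizes, errors):
    """Fit error(n) = a * n**(-b) by least squares on the log-log line
    log(error) = log(a) - b * log(n). Returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_error(a, b, n):
    """Extrapolated error rate at training-set size n."""
    return a * n ** (-b)


# Hypothetical mean error rates measured on small training subsets.
sizes = [25, 50, 100, 200]
errors = [0.30, 0.24, 0.19, 0.15]
a, b = fit_inverse_power_law(sizes, errors)
print(round(predict_error(a, b, 2000), 3))
```

Fitting in log space keeps the fit linear. The trade-off is that this two-parameter form assumes the error keeps falling toward zero, whereas many learning curves level off at a non-zero floor.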
Classifying features in CT imagery: accuracy for some single- and multiple-species classifiers

Science.gov (United States)

Daniel L. Schmoldt; Jing He; A. Lynn Abbott

1998-01-01

Our current approach to automatically label features in CT images of hardwood logs classifies each pixel of an image individually. These feature classifiers use a back-propagation artificial neural network (ANN) and feature vectors that include a small, local neighborhood of pixels and the distance of the target pixel to the center of the log. Initially, this type of...
Cross-Layer Resource Allocation for Variable Bit Rate Multiclass Services in a Multirate Multicarrier DS-CDMA Network

Directory of Open Access Journals (Sweden)

Kee-Chaing Chua

2005-02-01

Full Text Available An approximate analytical formulation of the resource allocation problem for handling variable bit rate multiclass services in a cellular round-robin carrier-hopping multirate multicarrier direct-sequence code-division multiple-access (MC-DS-CDMA) system is presented. In this paper, all grade-of-service (GoS) or quality-of-service (QoS) requirements at the connection level, packet level, and link layer are satisfied simultaneously in the system, instead of being satisfied at the connection level or at the link layer only. The analytical formulation shows how the GoS/QoS in the different layers are intertwined across the layers. A novelty of this paper is that the outages in the subcarriers are minimized by spreading the subcarriers' signal-to-interference ratio evenly among all the subcarriers by using a dynamic round-robin carrier-hopping allocation scheme. A complete sharing (CS) scheme with guard capacity is used for the resource sharing policy at the connection level based on the mean rates of the connections. Numerical results illustrate that significant gain in the system utilization is achieved through the joint coupling of connection/packet levels and link layer.
15 CFR 4.8 - Classified Information.

Science.gov (United States)

2010-01-01

... 15 Commerce and Foreign Trade 1 2010-01-01 2010-01-01 false Classified Information. 4.8 Section 4... INFORMATION Freedom of Information Act § 4.8 Classified Information. In processing a request for information..., the information shall be reviewed to determine whether it should remain classified. Ordinarily the...
Hybrid Radar Emitter Recognition Based on Rough k-Means Classifier and Relevance Vector Machine

Science.gov (United States)

Yang, Zhutian; Wu, Zhilu; Yin, Zhendong; Quan, Taifan; Sun, Hongjian

2013-01-01

Due to the increasing complexity of electromagnetic signals, recognizing radar emitter signals poses a significant challenge. In this paper, a hybrid recognition approach is presented that classifies radar emitter signals by exploiting the different separability of samples. The proposed approach comprises two steps, namely primary signal recognition and advanced signal recognition. In the former step, a novel rough k-means classifier, which comprises three regions, i.e., a certain area, a rough area and an uncertain area, is proposed to cluster the samples of radar emitter signals. In the latter step, the samples within the rough boundary are used to train the relevance vector machine (RVM). The RVM is then used to recognize the samples in the uncertain area; therefore, the classification accuracy is improved. Simulation results show that, for recognizing radar emitter signals, the proposed hybrid recognition approach is more accurate and has lower computational complexity than traditional approaches. PMID:23344380
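In rough k-means as it is commonly formulated, a sample is assigned either to the lower approximation of one cluster (a "certain" sample) or to the boundary of several clusters when two or more centroids are almost equally close. The sketch below implements only that certain/rough split, using a distance-ratio threshold. The paper's definition of its third, uncertain area is not given in the abstract and is not modelled. The function name and the `ratio` parameter are assumptions.

```python
import math


def rough_assign(x, centroids, ratio=1.2):
    """Classify one sample as 'certain' or 'rough' relative to the centroids.

    Returns ('certain', [i]) if the nearest centroid i is clearly closest;
    otherwise ('rough', [i, j, ...]) listing every centroid whose distance
    is within `ratio` times the nearest distance.
    """
    dists = [math.dist(x, c) for c in centroids]
    i = min(range(len(dists)), key=dists.__getitem__)
    d_min = dists[i]
    close = [j for j, d in enumerate(dists) if j != i and d <= ratio * d_min]
    if not close:
        return "certain", [i]
    return "rough", [i] + close


centroids = [(0.0, 0.0), (10.0, 0.0)]
print(rough_assign((1.0, 0.0), centroids))  # prints ('certain', [0])
print(rough_assign((5.0, 0.0), centroids))  # prints ('rough', [0, 1])
```

Samples that come back as "rough" are the ambiguous ones near cluster borders. In the two-step scheme described above, such boundary samples would form the training set for the second-stage classifier.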
Fingerprint prediction using classifier ensembles

CSIR Research Space (South Africa)

Molale, P

2011-11-01

Full Text Available ); logistic discrimination (LgD), k-nearest neighbour (k-NN), artificial neural network (ANN), association rules (AR) decision tree (DT), naive Bayes classifier (NBC) and the support vector machine (SVM). The performance of several multiple classifier systems...
Defining and Classifying Interest Groups

DEFF Research Database (Denmark)

Baroni, Laura; Carroll, Brendan; Chalmers, Adam

2014-01-01

The interest group concept is defined in many different ways in the existing literature and a range of different classification schemes are employed. This complicates comparisons between different studies and their findings. One of the important tasks faced by interest group scholars engaged...... in large-N studies is therefore to define the concept of an interest group and to determine which classification scheme to use for different group types. After reviewing the existing literature, this article sets out to compare different approaches to defining and classifying interest groups with a sample...... in the organizational attributes of specific interest group types. As expected, our comparison of coding schemes reveals a closer link between group attributes and group type in narrower classification schemes based on group organizational characteristics than those based on a behavioral definition of lobbying....
Classifier Directed Data Hybridization for Geographic Sample Supervised Segment Generation

Directory of Open Access Journals (Sweden)

Christoff Fourie

2014-11-01

Full Text Available Quality segment generation is a well-known challenge and research objective within Geographic Object-based Image Analysis (GEOBIA). Although methodological avenues within GEOBIA are diverse, segmentation commonly plays a central role in most approaches, influencing and being influenced by surrounding processes. A general approach using supervised quality measures, specifically user-provided reference segments, suggests casting the parameters of a given segmentation algorithm as a multidimensional search problem. In such a sample supervised segment generation approach, spatial metrics observing the user-provided reference segments may drive the search process. The search is commonly performed by metaheuristics. A novel sample supervised segment generation approach is presented in this work, where the spectral content of provided reference segments is queried. A one-class classification process using spectral information from inside the provided reference segments is used to generate a probability image, which in turn is employed to direct a hybridization of the original input imagery. Segmentation is performed on such a hybrid image. These processes are adjustable, interdependent and form a part of the search problem. Results are presented detailing the performances of four method variants compared to the generic sample supervised segment generation approach, under various conditions in terms of resultant segment quality, required computing time and search process characteristics. Multiple metrics, metaheuristics and segmentation algorithms are tested with this approach. Using the spectral data contained within user-provided reference segments to tailor the output generally improves the results in the investigated problem contexts, but at the expense of additional required computing time.

Evaluation of Diagnostic Tests Using Information Theory for Multi-Class Diagnostic Problems and its Application for the Detection of Occlusal Caries Lesions

Directory of Open Access Journals (Sweden)

Umut Arslan

2014-09-01

Full Text Available Background: Several methods are available to evaluate the performance of tests when the purpose of the diagnostic test is to discriminate between two possible disease states. However, multi-class diagnostic problems frequently appear in many areas of medical science. Hence, there is a need for methods which will enable us to characterize the accuracy of diagnostic tests when there are more than two possible disease states. Aims: To show that two information theory measures, information content (IC) and proportional reduction in diagnostic uncertainty (PRDU), can be used for the evaluation of the performance of diagnostic tests for multi-class diagnostic problems that may appear in different areas of medical science. Study Design: Diagnostic accuracy study. Methods: Sixty freshly extracted permanent human molar and premolar teeth suspected to have occlusal caries lesions were selected for the study and were assessed by two experienced examiners. Each examiner performed two evaluations. Histological examination was used as the gold standard. The scores of the histological examination were defined as sound (n=11), enamel caries (n=22), and dentin caries (n=27). Diagnostic performance of (i) visual inspection, (ii) radiography, (iii) laser fluorescence (LF), and (iv) micro-computed tomography (M-CT) caries detection methods was evaluated by calculating IC and PRDU. Results: Micro-computed tomography examination was the best method among the diagnostic techniques for the diagnosis of occlusal caries in terms of both IC and PRDU. M-CT examination supplied the maximum diagnostic information about the diagnosis of occlusal caries in the first (IC: 1.056; p<0.05; PRDU: 70.5%) and second evaluation (IC: 1.105; p<0.05; PRDU: 73.8%) for the first examiner. M-CT examination was the best method among the diagnostic techniques for the second examiner in both the first (IC: 1.105; p<0.05; PRDU: 73.8%) and second evaluation (IC: 1.061; p<0.05; PRDU: 70.8%). IC and PRDU were
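IC can be read as the mutual information I(D;T), in bits, between the true disease state D and the test result T. PRDU can be read as that information divided by the prior uncertainty H(D). The sketch below assumes these definitions and computes both from a contingency table of test result versus histological state. The function names are illustrative.

```python
import math


def entropy_bits(counts):
    """Shannon entropy (bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_content(table):
    """Mutual information I(D;T) in bits.

    table[t][d] is the number of teeth with test result t and true
    (histological) state d.
    """
    total = sum(sum(row) for row in table)
    prior = [sum(row[d] for row in table) for d in range(len(table[0]))]
    h_prior = entropy_bits(prior)
    # Expected remaining uncertainty about D once the test result is known.
    h_posterior = sum(
        (sum(row) / total) * entropy_bits(row) for row in table if sum(row) > 0
    )
    return h_prior - h_posterior


def prdu(table):
    """Proportional reduction in diagnostic uncertainty: IC / H(D)."""
    prior = [sum(row[d] for row in table) for d in range(len(table[0]))]
    return information_content(table) / entropy_bits(prior)


# Histological distribution in the study: sound, enamel caries, dentin caries.
print(round(entropy_bits([11, 22, 27]), 3))
```

The prior entropy of the study's 11/22/27 split is about 1.498 bits. Dividing the reported IC of 1.056 by it gives about 0.705, which matches the reported PRDU of 70.5%; this agreement is why the sketch takes PRDU = IC / H(D). A perfect test reaches PRDU = 1, and a test whose results are independent of the disease state gives IC = 0.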
adabag: An R Package for Classification with Boosting and Bagging

Directory of Open Access Journals (Sweden)

Esteban Alfaro

2013-09-01

Full Text Available Boosting and bagging are two widely used ensemble methods for classification. Their common goal is to improve the accuracy of a classifier by combining single classifiers which are slightly better than random guessing. Among the family of boosting algorithms, AdaBoost (adaptive boosting) is the best known, although it is suitable only for dichotomous tasks. AdaBoost.M1 and SAMME (stagewise additive modeling using a multi-class exponential loss function) are two easy and natural extensions to the general case of two or more classes. In this paper, the adabag R package is introduced. This version implements AdaBoost.M1, SAMME and bagging algorithms with classification trees as base classifiers. Once the ensembles have been trained, they can be used to predict the class of new samples. The accuracy of these classifiers can be estimated on a separate data set or through cross-validation. Moreover, the evolution of the error as the ensemble grows can be analysed and the ensemble can be pruned. In addition, the margin in the class prediction and the probability of each class for the observations can be calculated. Finally, several classic examples in the classification literature are shown to illustrate the use of this package.
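SAMME differs from two-class AdaBoost mainly in the weight it gives each weak learner: α = ln((1 − err)/err) + ln(K − 1) for K classes. That extra term keeps α positive as long as the learner beats random guessing, i.e. err < 1 − 1/K. The sketch below shows one reweighting round and the final weighted vote in Python rather than R. It is not adabag's code, and the function names are hypothetical.

```python
import math


def samme_update(weights, y_true, y_pred, n_classes):
    """One SAMME round: return (alpha, re-normalised sample weights)."""
    total = sum(weights)
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    if not 0 < err < 1 - 1 / n_classes:
        # err == 0 means a perfect learner (stop boosting); err too large
        # means the learner is no better than random guessing.
        raise ValueError("weak learner error outside (0, 1 - 1/K)")
    alpha = math.log((1 - err) / err) + math.log(n_classes - 1)
    # Misclassified samples are up-weighted by exp(alpha).
    new = [w * math.exp(alpha) if t != p else w
           for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(new)
    return alpha, [w / norm for w in new]


def samme_predict(alphas, predictions, classes):
    """Weighted vote: predictions[m] is the label from weak learner m."""
    scores = {c: 0.0 for c in classes}
    for alpha, label in zip(alphas, predictions):
        scores[label] += alpha
    return max(classes, key=scores.__getitem__)


alpha, w = samme_update([0.25] * 4, [0, 1, 2, 0], [0, 1, 2, 1], n_classes=3)
print(round(alpha, 4), [round(x, 4) for x in w])
```

For K = 2 the ln(K − 1) term vanishes and the update reduces to the discrete AdaBoost form ln((1 − err)/err).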
Interface Prostheses With Classifier-Feedback-Based User Training.

Science.gov (United States)

Fang, Yinfeng; Zhou, Dalin; Li, Kairu; Liu, Honghai

2017-11-01

It is evident that user training significantly affects the performance of pattern-recognition-based myoelectric prosthetic device control. Despite plausible classification accuracy on offline datasets, online accuracy usually suffers from changes in physiological conditions and electrode displacement. The user's ability to generate consistent electromyographic (EMG) patterns can be enhanced via proper user training strategies in order to improve online performance. This study proposes a clustering-feedback strategy that provides real-time feedback to users by means of a visualized online EMG signal input as well as the centroids of the training samples, whose dimensionality is reduced to a minimal number by dimension reduction. Clustering feedback provides a criterion that guides users to adjust motion gestures and muscle contraction forces intentionally. The experimental results demonstrate that hand motion recognition accuracy increases steadily over the course of the clustering-feedback-based user training, while conventional classifier-feedback methods, i.e., label feedback, hardly achieve any improvement. The results indicate that the use of proper classifier feedback can accelerate the process of user training, and imply a promising future for amputees with limited or no experience in pattern-recognition-based prosthetic device manipulation.
Classifying Sluice Occurrences in Dialogue

DEFF Research Database (Denmark)

Baird, Austin; Hamza, Anissa; Hardt, Daniel

2018-01-01

perform manual annotation with acceptable inter-coder agreement. We build classifier models with Decision Trees and Naive Bayes, with accuracy of 67%. We deploy a classifier to automatically classify sluice occurrences in OpenSubtitles, resulting in a corpus with 1.7 million occurrences. This will support....... Despite this, the corpus can be of great use in research on sluicing and development of systems, and we are making the corpus freely available on request. Furthermore, we are in the process of improving the accuracy of sluice identification and annotation for the purpose of creating a subsequent version...
Classifiers based on optimal decision rules

KAUST Repository

Amin, Talha

2013-11-25

Based on a dynamic programming approach, we design algorithms for sequential optimization of exact and approximate decision rules relative to length and coverage [3, 4]. In this paper, we use optimal rules to construct classifiers and study two questions: (i) which rules are better from the point of view of classification, exact or approximate; and (ii) which order of optimization gives better classifier results: length, length+coverage, coverage, or coverage+length. Experimental results show that, on average, classifiers based on exact rules are better than classifiers based on approximate rules, and sequential optimization (length+coverage or coverage+length) is better than ordinary optimization (length or coverage).
Classifiers based on optimal decision rules

KAUST Repository

Amin, Talha M.; Chikalov, Igor; Moshkov, Mikhail; Zielosko, Beata

2013-01-01

Based on a dynamic programming approach, we design algorithms for sequential optimization of exact and approximate decision rules relative to length and coverage [3, 4]. In this paper, we use optimal rules to construct classifiers and study two questions: (i) which rules are better from the point of view of classification, exact or approximate; and (ii) which order of optimization gives better classifier results: length, length+coverage, coverage, or coverage+length. Experimental results show that, on average, classifiers based on exact rules are better than classifiers based on approximate rules, and sequential optimization (length+coverage or coverage+length) is better than ordinary optimization (length or coverage).
Discriminative motif discovery via simulated evolution and random under-sampling.

Directory of Open Access Journals (Sweden)

Tao Song

Full Text Available Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted more and more attention. However, little attention has been devoted to the data imbalance problem, which is one of the main factors affecting the performance of discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the data preprocessing stage, and at the Hidden Markov Model (HMM) training stage, a random under-sampling method is introduced to address the imbalance between the positive and negative datasets. It is shown that, in the task of discovering targeting motifs of nine subcellular compartments, the motifs found by our method are more conserved than those found by methods that do not consider the data imbalance problem, and they recover most of the known targeting motifs from Minimotif Miner and InterPro. Meanwhile, we use the found motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes.
Discriminative motif discovery via simulated evolution and random under-sampling.

Science.gov (United States)

Song, Tao; Gu, Hong

2014-01-01

Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted more and more attention. However, little attention has been devoted to the data imbalance problem, which is one of the main factors affecting the performance of discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the data preprocessing stage, and at the Hidden Markov Model (HMM) training stage, a random under-sampling method is introduced to address the imbalance between the positive and negative datasets. It is shown that, in the task of discovering targeting motifs of nine subcellular compartments, the motifs found by our method are more conserved than those found by methods that do not consider the data imbalance problem, and they recover most of the known targeting motifs from Minimotif Miner and InterPro. Meanwhile, we use the found motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes.
"When 'Bad' is 'Good'": Identifying Personal Communication and Sentiment in Drug-Related Tweets.

Science.gov (United States)

Daniulaityte, Raminta; Chen, Lu; Lamy, Francois R; Carlson, Robert G; Thirunarayan, Krishnaprasad; Sheth, Amit

2016-10-24

To harness the full potential of social media for epidemiological surveillance of drug abuse trends, the field needs a greater level of automation in processing and analyzing social media content. The objective of the study is to describe the development of supervised machine-learning techniques for the eDrugTrends platform to automatically classify tweets by type/source of communication (personal, official/media, retail) and sentiment (positive, negative, neutral) expressed in cannabis- and synthetic cannabinoid-related tweets. Tweets were collected using the Twitter streaming Application Programming Interface and filtered through the eDrugTrends platform using keywords related to cannabis, marijuana edibles, marijuana concentrates, and synthetic cannabinoids. After creating coding rules and assessing intercoder reliability, a manually labeled data set (N=4000) was developed by coding several batches of randomly selected subsets of tweets extracted from the pool of 15,623,869 collected by eDrugTrends (May-November 2015). Out of 4000 tweets, 25% (1000/4000) were used to build source classifiers and 75% (3000/4000) were used for sentiment classifiers. Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machines (SVM) were used to train the classifiers. Source classification (n=1000) tested Approach 1, which used short URLs, and Approach 2, in which URLs were expanded and included in the bag-of-words analysis. For sentiment classification, Approach 1 used all tweets, regardless of their source/type (n=3000), while Approach 2 applied sentiment classification to personal communication tweets only (2633/3000, 88%). Multiclass and binary classification tasks were examined, and machine-learning sentiment classifier performance was compared with Valence Aware Dictionary for sEntiment Reasoning (VADER), a lexicon and rule-based method. The performance of each classifier was assessed using 5-fold cross-validation that calculated average F-scores. One-tailed t test was
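For a three-class task such as positive/negative/neutral sentiment, an "average F-score" is often the macro average, i.e. the unweighted mean of per-class F1 scores. The abstract does not say which averaging was used, so macro-averaging is an assumption here. The sketch below computes it and generates 5-fold train/test splits. The function names are illustrative, and the folds are contiguous, so the data must be shuffled first.

```python
def kfold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n items.

    Assumes the items were shuffled beforehand.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (F1 = 2TP / (2TP + FP + FN))."""
    labels = set(y_true) | set(y_pred)
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "negative", "positive", "positive"]
print(macro_f1(y_true, y_pred))
```

In a full evaluation, a classifier would be trained on each `train` split, `macro_f1` computed on the matching `test` split, and the five scores averaged.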
A support vector machine (SVM) based voltage stability classifier

Energy Technology Data Exchange (ETDEWEB)

Dosano, R.D.; Song, H. [Kunsan National Univ., Kunsan, Jeonbuk (Korea, Republic of)]; Lee, B. [Korea Univ., Seoul (Korea, Republic of)]

2007-07-01

Power system stability has become even more complex and critical with the advent of deregulated energy markets and the growing desire to fully utilize existing transmission infrastructure. The economic pressure on electricity markets forces power systems and components to operate at the limits of their capacity and performance. System conditions can be more exposed to instability due to greater uncertainty in day-to-day system operations and an increase in the number of potential components for system disturbances, potentially resulting in voltage instability. This paper proposed a support vector machine (SVM) based power system voltage stability classifier using local measurements of voltage and active power of load. It described the procedure for fast classification of long-term voltage stability using the SVM algorithm. The application of the SVM based voltage stability classifier was presented with reference to the choice of input parameters; input data preconditioning; moving window for feature vector; determination of learning samples; and other considerations in SVM applications. The paper presented a case study with numerical examples of an 11-bus test system. The test results for the feasibility study demonstrated that the classifier could offer excellent classification performance with time-series measurements in terms of long-term voltage stability. 9 refs., 14 figs.
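A moving window turns the time series of local measurements into fixed-length feature vectors that an SVM can consume. The paper's window length and feature layout are not given in the abstract. The sketch below simply assumes that each vector interleaves the last `window` (voltage, active power) pairs, oldest first. The function name and sample values are hypothetical.

```python
def moving_window_features(voltage, power, window):
    """Build one feature vector per time step from the last `window` samples.

    Each vector interleaves (V, P) pairs, oldest first.
    """
    if len(voltage) != len(power):
        raise ValueError("voltage and power series must have equal length")
    features = []
    for t in range(window - 1, len(voltage)):
        vec = []
        for s in range(t - window + 1, t + 1):
            vec.extend((voltage[s], power[s]))
        features.append(vec)
    return features


v = [1.00, 0.99, 0.97, 0.94]  # hypothetical per-unit load-bus voltages
p = [0.80, 0.85, 0.90, 0.95]  # hypothetical per-unit active power
for row in moving_window_features(v, p, window=2):
    print(row)
```

Each resulting row would typically be normalized (the "input data preconditioning" step) before being labeled stable or unstable and passed to the SVM.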
Robust Framework to Combine Diverse Classifiers Assigning Distributed Confidence to Individual Classifiers at Class Level

Directory of Open Access Journals (Sweden)

Shehzad Khalid

2014-01-01

Full Text Available We have presented a classification framework that combines multiple heterogeneous classifiers in the presence of class label noise. An extension of m-Mediods based modeling is presented that generates models of the various classes while identifying and filtering noisy training data. This noise-free data is then used to learn models for other classifiers such as GMM and SVM. A weight learning method is then introduced to learn per-class weights for the different classifiers to construct an ensemble. For this purpose, we applied a genetic algorithm to search for an optimal weight vector for which the classifier ensemble is expected to give the best accuracy. The proposed approach is evaluated on a variety of real-life datasets. It is also compared with existing standard ensemble techniques such as Adaboost, Bagging, and Random Subspace Methods. Experimental results show the superiority of the proposed ensemble method over its competitors, especially in the presence of class label noise and imbalanced classes.
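Class-level weighting means a classifier's vote counts more for the classes it is reliable on and less for the others. The sketch below shows how such per-classifier, per-class weights could combine hard predictions. The weights are simply given here, whereas the paper searches for them with a genetic algorithm. The function name and the weight values are hypothetical.

```python
def class_weighted_vote(predictions, weights, classes):
    """Combine hard predictions using per-classifier, per-class weights.

    predictions[k] is the label output by classifier k; weights[k][c] is the
    confidence assigned to classifier k when it votes for class c.
    """
    scores = {c: 0.0 for c in classes}
    for k, label in enumerate(predictions):
        scores[label] += weights[k][label]
    return max(classes, key=scores.__getitem__)


classes = ["a", "b"]
weights = [
    {"a": 0.9, "b": 0.1},  # classifier 0: trusted on class a only
    {"a": 0.5, "b": 0.3},
    {"a": 0.5, "b": 0.4},
]
print(class_weighted_vote(["a", "b", "b"], weights, classes))  # prints a
```

Here a plain majority vote would choose "b", but classifier 0's high weight on class "a" outweighs the two lower-weighted votes for "b".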
Discovery and validation of a colorectal cancer classifier in a new blood test with improved performance for high-risk subjects

DEFF Research Database (Denmark)

Croner, Lisa J; Dillon, Roslyn; Kao, Athit

2017-01-01

BACKGROUND: The aim was to improve upon an existing blood-based colorectal cancer (CRC) test directed to high-risk symptomatic patients, by developing a new CRC classifier to be used with a new test embodiment. The new test uses a robust assay format-electrochemiluminescence immunoassays......, the indeterminate rate of the new panel was 23.2%, sensitivity/specificity was 0.80/0.83, PPV was 36.5%, and NPV was 97.1%. CONCLUSIONS: The validated classifier serves as the basis of a new blood-based CRC test for symptomatic patients. The improved performance, resulting from robust concentration measures across......-to quantify protein concentrations. The aim was achieved by building and validating a CRC classifier using concentration measures from a large sample set representing a true intent-to-test (ITT) symptomatic population. METHODS: 4435 patient samples were drawn from the Endoscopy II sample set. Samples were...
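Positive and negative predictive values depend on the prevalence of CRC in the tested population as well as on sensitivity and specificity. The sketch below shows that Bayes'-rule relationship. The function name is illustrative, the prevalence of 10.9% is an assumption chosen for illustration (the excerpt does not report it), and indeterminate results are ignored.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from test accuracy and disease prevalence."""
    tp = sensitivity * prevalence              # true positives per subject
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    return tp / (tp + fp), tn / (tn + fn)


ppv, npv = predictive_values(0.80, 0.83, prevalence=0.109)
print(round(ppv, 3), round(npv, 3))
```

With sensitivity 0.80, specificity 0.83 and the assumed prevalence, the formulas give a PPV of about 0.365 and an NPV of about 0.971, close to the reported values. At such a low prevalence, even a fairly specific test has a modest PPV but a high NPV.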
76 FR 34761 - Classified National Security Information

Science.gov (United States)

2011-06-14

... MARINE MAMMAL COMMISSION Classified National Security Information [Directive 11-01] AGENCY: Marine... Commission's (MMC) policy on classified information, as directed by Information Security Oversight Office... of Executive Order 13526, ``Classified National Security Information,'' and 32 CFR part 2001...
Error minimizing algorithms for nearest neighbor classifiers

Energy Technology Data Exchange (ETDEWEB)

Porter, Reid B [Los Alamos National Laboratory]; Hush, Don [Los Alamos National Laboratory]; Zimmer, G. Beate [Texas A&M]

2011-01-03

Stack filters define a large class of discrete nonlinear filters first introduced in image and signal processing for noise removal. In recent years we have suggested their application to classification problems, and investigated their relationship to other types of discrete classifiers such as Decision Trees. In this paper we focus on a continuous domain version of Stack Filter Classifiers which we call Ordered Hypothesis Machines (OHM), and investigate their relationship to Nearest Neighbor classifiers. We show that OHM classifiers provide a novel framework in which to train Nearest Neighbor type classifiers by minimizing empirical error based loss functions. We use the framework to investigate a new cost sensitive loss function that allows us to train a Nearest Neighbor type classifier for low false alarm rate applications. We report results on both synthetic data and real-world image data.
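The abstract does not define stack filters. In the standard construction, an integer signal is split by threshold decomposition into binary signals, the same positive Boolean function is applied to each binary signal over a sliding window, and the binary outputs are summed. Choosing the majority function over an odd window yields the running median. The sketch below shows that construction only, not the paper's OHM classifier. It assumes a non-negative integer signal and computes only full ("valid") windows. The function names are illustrative.

```python
def threshold_decompose(signal, max_level):
    """Binary signals b_l[n] = 1 if signal[n] >= l, for l = 1..max_level."""
    return [[1 if x >= level else 0 for x in signal]
            for level in range(1, max_level + 1)]


def majority(bits):
    """Positive Boolean function: 1 if more than half the bits are 1."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def stack_filter(signal, window, boolean_fn):
    """Apply boolean_fn to every binary slice and sum the results (valid windows only)."""
    n_out = len(signal) - window + 1
    out = [0] * n_out
    for binary in threshold_decompose(signal, max(signal)):
        for i in range(n_out):
            out[i] += boolean_fn(binary[i:i + window])
    return out


signal = [0, 3, 1, 4, 1, 5, 9, 2, 6]
print(stack_filter(signal, 3, majority))  # prints [1, 3, 1, 4, 5, 5, 6]
```

The summed output equals the median because, at each level l, the majority of the window's binary values is 1 exactly when the window's median is at least l. Replacing `majority` with another positive Boolean function gives a different member of the stack-filter class.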
Aggregation Operator Based Fuzzy Pattern Classifier Design

DEFF Research Database (Denmark)

Mönks, Uwe; Larsen, Henrik Legind; Lohweg, Volker

2009-01-01

This paper presents a novel modular fuzzy pattern classifier design framework for intelligent automation systems. The framework is developed on the basis of the established Modified Fuzzy Pattern Classifier (MFPC) and allows the design of novel classifier models that can be implemented efficiently in hardware....... The performance of novel classifiers using substitutes for MFPC's geometric mean aggregator is benchmarked against the MFPC in an image processing application, to reveal the potential for higher classification rates....
Clean-up progress at the SNL/NM Classified Waste Landfill

International Nuclear Information System (INIS)

Slavin, P.J.; Galloway, R.B.

1999-01-01

The Sandia National Laboratories/New Mexico (SNL/NM) Environmental Restoration Project is currently excavating the Classified Waste Landfill in Technical Area II, which served as a disposal area for weapon components for approximately 40 years until it closed in 1987. Many different types of classified parts were disposed of in unlined trenches and pits throughout the landfill's history. Some of the parts contain explosives and/or radioactive components or contamination. The excavation has progressed backward chronologically, from the last trenches filled to the earlier pits. Excavation commenced in March 1998, and approximately 75 percent of the site (as defined by geophysical anomalies) has been completed as of November 1999. The excavated material consists primarily of classified weapon assemblies and related components, so disposition must include demilitarization and sanitization. This has resulted in substantial waste minimization and cost avoidance for the project, as upwards of 90 percent of the classified materials are being demilitarized and recycled. The project is using field screening and laboratory analysis in conjunction with preliminary and in-process risk assessments to characterize soil and make waste determinations in as timely a fashion as possible. Challenges in waste management have prompted the adoption of innovative solutions. The hand-picked crew (both management and field staff) and the ability to adapt quickly to changing conditions have ensured the success of the project. The current schedule is to complete excavation in July 2000, with verification sampling, demilitarization, and waste management activities to follow.
Combined Kernel-Based BDT-SMO Classification of Hyperspectral Fused Images

Directory of Open Access Journals (Sweden)

Fenghua Huang

2014-01-01

Full Text Available To solve the poor generalization and flexibility problems that single-kernel SVM classifiers have when classifying combined spectral and spatial features, this paper proposes a solution to improve the classification accuracy and efficiency of hyperspectral fused images: (1) different radial basis kernel functions (RBFs) are employed for spectral and textural features, and a new combined radial basis kernel function (CRBF) is proposed by combining them in a weighted manner; (2) the binary decision tree-based multiclass SMO (BDT-SMO) is used to classify hyperspectral fused images; (3) experiments are carried out in which the single radial basis function (SRBF) based BDT-SMO classifier and the CRBF-based BDT-SMO classifier are used, respectively, to classify the land usages of hyperspectral fused images, and genetic algorithms (GA) are used to optimize the kernel parameters of the classifiers. The results show that, compared with SRBF, CRBF-based BDT-SMO classifiers display greater classification accuracy and efficiency.
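The abstract does not give the exact form of the CRBF. One straightforward reading of "combining them in a weighted manner" is a convex combination of one RBF over the spectral features and another over the textural features. A weighted sum of valid kernels with non-negative weights is itself a valid kernel. The sketch below assumes that form. The feature layout (spectral first, texture second), the weight `w`, and the two `gamma` values are hypothetical. The paper tunes its kernel parameters with a genetic algorithm, which is not shown here.

```python
import math
from typing import Sequence

def rbf(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def combined_rbf(x: Sequence[float], y: Sequence[float], n_spectral: int,
                 w: float, gamma_spec: float, gamma_tex: float) -> float:
    """Weighted sum of a spectral RBF and a textural RBF (assumed CRBF form).

    Assumes each feature vector is [spectral features..., texture features...]
    and 0 <= w <= 1, so the result stays a valid (positive semi-definite) kernel.
    """
    k_spec = rbf(x[:n_spectral], y[:n_spectral], gamma_spec)
    k_tex = rbf(x[n_spectral:], y[n_spectral:], gamma_tex)
    return w * k_spec + (1.0 - w) * k_tex

# Identical spectral parts, texture differing by 1 in one feature.
k = combined_rbf([0.2, 0.4, 1.0], [0.2, 0.4, 0.0],
                 n_spectral=2, w=0.7, gamma_spec=1.0, gamma_tex=0.5)
```

Separate `gamma` values let each feature group have its own notion of closeness. A single RBF over the concatenated vector cannot do that.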
Composite Classifiers for Automatic Target Recognition

National Research Council Canada - National Science Library

Wang, Lin-Cheng

1998-01-01

...) using forward-looking infrared (FLIR) imagery. Two existing classifiers, one based on learning vector quantization and the other on modular neural networks, are used as the building blocks for our composite classifiers...
TU-FG-209-12: Treatment Site and View Recognition in X-Ray Images with Hierarchical Multiclass Recognition Models

Energy Technology Data Exchange (ETDEWEB)

Chang, X; Mazur, T; Yang, D [Washington University in St Louis, St Louis, MO (United States)]

2016-06-15

Purpose: To investigate an approach for automatically recognizing anatomical sites and imaging views (the orientation of the image acquisition) in 2D X-ray images. Methods: A hierarchical (binary tree) multiclass recognition model was developed to recognize the treatment sites and views in X-ray images. From the top to the bottom of the tree, the treatment sites are grouped hierarchically from more general to more specific. Each node in the hierarchical model was designed to assign images to one of two categories of anatomical sites. The binary image classification function of each node is implemented using a PCA transformation and a support vector machine (SVM) model. The optimal PCA transformation matrices and SVM models are obtained by learning from a set of sample images. Variants of the hierarchical model were developed to support three site-recognition scenarios that may arise in radiotherapy clinics, involving one or two X-ray images with or without view information. The performance of the approach was tested with images of 120 patients from six treatment sites (brain, head-neck, breast, lung, abdomen and pelvis), with 20 patients per site and two views (AP and RT) per patient. Results: Given two images in known orthogonal views (AP and RT), the hierarchical model achieved a 99% average F1 score in recognizing the six sites. Site-specific view recognition models have 100 percent accuracy. The computation time to process a new patient case (preprocessing, site and view recognition) is 0.02 seconds. Conclusion: The proposed hierarchical model of site and view recognition is effective and computationally efficient. It could be useful for automatically and independently confirming the treatment sites and views in daily setup 2D X-ray images. It could also be applied to guide subsequent image processing tasks, e.g. site- and view-dependent contrast enhancement and image registration. The senior author received research grants from View
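The hierarchy described above can be sketched as a binary tree in which each internal node separates two groups of classes and a sample is routed from the root to a leaf. To keep the sketch dependency-free, a nearest-centroid rule stands in for the paper's PCA + SVM node classifier. The class names, tree shape, and toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

def centroid(points: Sequence[Sequence[float]]) -> List[float]:
    return [sum(col) / len(points) for col in zip(*points)]

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

@dataclass
class Node:
    label: Optional[str] = None          # set on leaves only
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    left_center: Optional[List[float]] = None
    right_center: Optional[List[float]] = None

def leaves(tree) -> List[str]:
    """All class labels under a subtree given as nested 2-tuples of labels."""
    return [tree] if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])

def build(tree, data: Dict[str, List[List[float]]]) -> Node:
    """Train one binary node per internal tree node (general -> specific)."""
    if isinstance(tree, str):
        return Node(label=tree)
    left_t, right_t = tree
    # Stand-in binary classifier: one centroid per side of the split.
    left_pts = [p for c in leaves(left_t) for p in data[c]]
    right_pts = [p for c in leaves(right_t) for p in data[c]]
    return Node(left=build(left_t, data), right=build(right_t, data),
                left_center=centroid(left_pts), right_center=centroid(right_pts))

def predict(node: Node, x: Sequence[float]) -> str:
    """Route a sample down the tree until a leaf label is reached."""
    while node.label is None:
        go_left = sq_dist(x, node.left_center) <= sq_dist(x, node.right_center)
        node = node.left if go_left else node.right
    return node.label

data = {"a": [[0, 0], [0, 1]], "b": [[1, 0], [1, 1]],
        "c": [[10, 10], [10, 11]], "d": [[11, 10], [11, 11]]}
root = build((("a", "b"), ("c", "d")), data)
```

A tree over K classes needs only about log2(K) binary decisions per sample. This is one reason such hierarchies are fast at prediction time.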
TU-FG-209-12: Treatment Site and View Recognition in X-Ray Images with Hierarchical Multiclass Recognition Models

International Nuclear Information System (INIS)

Chang, X; Mazur, T; Yang, D

2016-01-01

Purpose: To investigate an approach for automatically recognizing anatomical sites and imaging views (the orientation of the image acquisition) in 2D X-ray images. Methods: A hierarchical (binary tree) multiclass recognition model was developed to recognize the treatment sites and views in X-ray images. From the top to the bottom of the tree, the treatment sites are grouped hierarchically from more general to more specific. Each node in the hierarchical model was designed to assign images to one of two categories of anatomical sites. The binary image classification function of each node is implemented using a PCA transformation and a support vector machine (SVM) model. The optimal PCA transformation matrices and SVM models are obtained by learning from a set of sample images. Variants of the hierarchical model were developed to support three site-recognition scenarios that may arise in radiotherapy clinics, involving one or two X-ray images with or without view information. The performance of the approach was tested with images of 120 patients from six treatment sites (brain, head-neck, breast, lung, abdomen and pelvis), with 20 patients per site and two views (AP and RT) per patient. Results: Given two images in known orthogonal views (AP and RT), the hierarchical model achieved a 99% average F1 score in recognizing the six sites. Site-specific view recognition models have 100 percent accuracy. The computation time to process a new patient case (preprocessing, site and view recognition) is 0.02 seconds. Conclusion: The proposed hierarchical model of site and view recognition is effective and computationally efficient. It could be useful for automatically and independently confirming the treatment sites and views in daily setup 2D X-ray images. It could also be applied to guide subsequent image processing tasks, e.g. site- and view-dependent contrast enhancement and image registration. The senior author received research grants from View

Localization and Recognition of Dynamic Hand Gestures Based on Hierarchy of Manifold Classifiers

Science.gov (United States)

Favorskaya, M.; Nosov, A.; Popov, A.

2015-05-01

Generally, dynamic hand gestures are captured in continuous video sequences, and a gesture recognition system ought to extract robust features automatically. This task involves highly challenging spatio-temporal variations of dynamic hand gestures. The proposed method is based on two-level manifold classifiers: trajectory classifiers at every time instant and posture classifiers of sub-gestures at selected time instants. The trajectory classifiers comprise a skin detector, a normalized skeleton representation of one or two hands, and a motion history represented by motion vectors normalized to predetermined directions (8 and 16 in our case). Each dynamic gesture is separated into a set of sub-gestures in order to predict a trajectory and remove those gesture samples that do not match the current trajectory. The posture classifiers involve the normalized skeleton representation of the palm and fingers and relative finger positions using fingertips. The min-max criterion is used for trajectory recognition, and the decision tree technique is applied for posture recognition of sub-gestures. For experiments, the dataset "Multi-modal Gesture Recognition Challenge 2013: Dataset and Results", including 393 dynamic hand gestures, was chosen. The proposed method yielded 84-91% recognition accuracy, on average, for a restricted set of dynamic gestures.
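The abstract says motion vectors are normalized to 8 or 16 predetermined directions. One straightforward reading, which is our assumption, is angular quantization: each frame-to-frame motion vector is assigned to the nearest of N equal angular sectors, and a trajectory becomes a histogram (or sequence) of sector indices. The function names and the sector convention (bin 0 centred on the +x axis, counter-clockwise) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def quantize_direction(dx: float, dy: float, n_bins: int = 8) -> int:
    """Map a motion vector to one of n_bins equal angular sectors.

    Bin 0 is centred on the +x axis; indices increase counter-clockwise.
    """
    angle = math.atan2(dy, dx) % (2 * math.pi)   # angle in [0, 2*pi)
    width = 2 * math.pi / n_bins
    return int((angle + width / 2) // width) % n_bins

def direction_histogram(points: Sequence[Tuple[float, float]],
                        n_bins: int = 8) -> List[int]:
    """Count quantized directions of consecutive hand positions."""
    hist = [0] * n_bins
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # no motion between these two frames
        hist[quantize_direction(dx, dy, n_bins)] += 1
    return hist

# Right, then up, then left (y axis pointing up).
hist = direction_histogram([(0, 0), (1, 0), (1, 1), (0, 1)])
```

Using 16 sectors instead of 8 halves the angular width of each sector. The representation becomes finer but more sensitive to tracking noise.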
LOCALIZATION AND RECOGNITION OF DYNAMIC HAND GESTURES BASED ON HIERARCHY OF MANIFOLD CLASSIFIERS

Directory of Open Access Journals (Sweden)

M. Favorskaya

2015-05-01

Full Text Available Generally, dynamic hand gestures are captured in continuous video sequences, and a gesture recognition system ought to extract robust features automatically. This task involves highly challenging spatio-temporal variations of dynamic hand gestures. The proposed method is based on two-level manifold classifiers: trajectory classifiers at every time instant and posture classifiers of sub-gestures at selected time instants. The trajectory classifiers comprise a skin detector, a normalized skeleton representation of one or two hands, and a motion history represented by motion vectors normalized to predetermined directions (8 and 16 in our case). Each dynamic gesture is separated into a set of sub-gestures in order to predict a trajectory and remove those gesture samples that do not match the current trajectory. The posture classifiers involve the normalized skeleton representation of the palm and fingers and relative finger positions using fingertips. The min-max criterion is used for trajectory recognition, and the decision tree technique is applied for posture recognition of sub-gestures. For experiments, the dataset “Multi-modal Gesture Recognition Challenge 2013: Dataset and Results”, including 393 dynamic hand gestures, was chosen. The proposed method yielded 84–91% recognition accuracy, on average, for a restricted set of dynamic gestures.
Super resolution reconstruction of infrared images based on classified dictionary learning

Science.gov (United States)

Liu, Fei; Han, Pingli; Wang, Yi; Li, Xuan; Bai, Lu; Shao, Xiaopeng

2018-05-01

Infrared images always suffer from low resolution caused by the limitations of imaging devices. An economical approach to this problem is to reconstruct high-resolution images with suitable methods instead of upgrading the devices. Inspired by compressed sensing theory, this study presents and demonstrates a Classified Dictionary Learning method to reconstruct high-resolution infrared images. It classifies the features of the samples into several clusters and trains a dictionary pair for each cluster. The optimal dictionary pair is chosen for each image reconstruction, so more satisfactory results are achieved without increasing computational complexity or time cost. Experimental results demonstrate that it is a viable method for infrared image reconstruction, since it improves image resolution and recovers detailed information about targets.
RRHGE: A Novel Approach to Classify the Estrogen Receptor Based Breast Cancer Subtypes

Directory of Open Access Journals (Sweden)

Ashish Saini

2014-01-01

Full Text Available Background. Breast cancer is the most common type of cancer among females and has a high mortality rate. It is essential to classify estrogen receptor based breast cancer subtypes into the correct subclasses so that the right treatments can be applied to lower the mortality rate. Using gene signatures derived from gene interaction networks to classify breast cancers has proven to be more reproducible and can achieve higher classification performance. However, gene interaction networks usually contain many false-positive interactions that have no biological meaning. It is therefore a challenge to incorporate a reliability assessment of interactions when deriving gene signatures from gene interaction networks. How to effectively extract gene signatures from available resources is critical to the success of cancer classification. Methods. We propose a novel method to measure and extract the reliable (biologically true or valid) interactions from gene interaction networks and incorporate the extracted reliable gene interactions into our proposed RRHGE algorithm to identify significant gene signatures from microarray gene expression data for classifying ER+ and ER− breast cancer samples. Results. The evaluation on real breast cancer samples showed that our RRHGE algorithm achieved higher classification accuracy than the existing approaches.
Distance and Density Similarity Based Enhanced k-NN Classifier for Improving Fault Diagnosis Performance of Bearings

Directory of Open Access Journals (Sweden)

Sharif Uddin

2016-01-01

Full Text Available An enhanced k-nearest neighbor (k-NN) classification algorithm is presented, which uses a density-based similarity measure in addition to a distance-based similarity measure to improve diagnostic performance in bearing fault diagnosis. Because it relies on a distance-based similarity measure alone, the classification accuracy of traditional k-NN deteriorates in the presence of overlapping samples and outliers, and it is highly sensitive to the neighborhood size, k. This study addresses these limitations by proposing the use of both distance-based and density-based measures of similarity between training and test samples. The proposed k-NN classifier is used to enhance the diagnostic performance of a bearing fault diagnosis scheme, which classifies different fault conditions based on hybrid feature vectors extracted from acoustic emission (AE) signals. Experimental results demonstrate that the proposed scheme, which uses the enhanced k-NN classifier, yields better diagnostic performance and is more robust to variations in the neighborhood size, k.
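The abstract does not give the paper's similarity formulas. The sketch below is a hypothetical illustration of the general idea only. Each of the k nearest neighbours votes with a distance similarity `1 / (1 + d)` multiplied by a density similarity. The density term is the neighbour's local density (inverse mean distance to its own k nearest training neighbours), normalized to (0, 1]. As a result, isolated training points such as outliers carry less weight than points inside dense clusters. All names and formulas here are our assumptions.

```python
import math
from collections import defaultdict
from typing import Sequence

def local_density(i: int, X: Sequence[Sequence[float]], k: int) -> float:
    """Inverse mean distance from training sample i to its k nearest neighbours."""
    d = sorted(math.dist(X[i], X[j]) for j in range(len(X)) if j != i)[:k]
    return 1.0 / (sum(d) / len(d) + 1e-12)

def knn_distance_density(x: Sequence[float], X: Sequence[Sequence[float]],
                         y: Sequence[str], k: int = 3) -> str:
    """k-NN vote weighted by distance similarity times density similarity."""
    dens = [local_density(i, X, k) for i in range(len(X))]
    max_dens = max(dens)
    neighbours = sorted(range(len(X)), key=lambda i: math.dist(x, X[i]))[:k]
    scores = defaultdict(float)
    for i in neighbours:
        dist_sim = 1.0 / (1.0 + math.dist(x, X[i]))  # closer -> larger
        dens_sim = dens[i] / max_dens                  # denser -> larger
        scores[y[i]] += dist_sim * dens_sim
    return max(scores, key=scores.get)

X = [(0, 0), (0, 0.1), (0.1, 0), (1, 1), (1.1, 1), (1, 1.1), (0.2, 0.2)]
y = ["A", "A", "A", "B", "B", "B", "B"]  # last point: an isolated 'B' outlier
```

In a real system the training densities would be computed once, not on every query. They are recomputed here only to keep the function self-contained.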
Use of Unlabeled Samples for Mitigating the Hughes Phenomenon

Science.gov (United States)

Landgrebe, David A.; Shahshahani, Behzad M.

1993-01-01

The use of unlabeled samples in improving the performance of classifiers is studied. When the number of training samples is fixed and small, additional feature measurements may reduce the performance of a statistical classifier. It is shown that by using unlabeled samples, estimates of the parameters can be improved and therefore this phenomenon may be mitigated. Various methods for using unlabeled samples are reviewed and experimental results are provided.
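A common way to let unlabeled samples improve parameter estimates is expectation-maximization (EM) on a mixture model. Labeled samples keep their known class. Unlabeled samples contribute fractionally, according to their current posterior class probabilities. The sketch below does this for a two-class, one-dimensional Gaussian model. It is a generic illustration of the idea, not necessarily one of the specific methods reviewed in the paper, and the toy data are made up.

```python
import math
from typing import List, Sequence, Tuple

def normal_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def semi_supervised_em(labeled: Sequence[Tuple[float, int]],
                       unlabeled: Sequence[float],
                       n_iter: int = 50) -> List[List[float]]:
    """Fit [weight, mean, variance] for classes 0 and 1 from both sample kinds.

    Labeled samples keep a fixed responsibility of 1 for their own class;
    unlabeled samples receive soft responsibilities from the E-step.
    """
    n_total = len(labeled) + len(unlabeled)
    params = []
    for c in (0, 1):  # initialise from the labeled samples only
        xs = [x for x, y in labeled if y == c]
        mu = sum(xs) / len(xs)
        var = max(sum((x - mu) ** 2 for x in xs) / len(xs), 1e-3)
        params.append([len(xs) / len(labeled), mu, var])
    for _ in range(n_iter):
        # E-step: posterior class probabilities of each unlabeled sample.
        resp = []
        for x in unlabeled:
            p = [w * normal_pdf(x, m, v) for w, m, v in params]
            s = sum(p)
            resp.append([pc / s for pc in p] if s > 0 else [0.5, 0.5])
        # M-step: weighted maximum-likelihood update of each class.
        new_params = []
        for c in (0, 1):
            pts = [(x, 1.0) for x, y in labeled if y == c]
            pts += [(x, r[c]) for x, r in zip(unlabeled, resp)]
            total = sum(w for _, w in pts)
            mu = sum(w * x for x, w in pts) / total
            var = max(sum(w * (x - mu) ** 2 for x, w in pts) / total, 1e-3)
            new_params.append([total / n_total, mu, var])
        params = new_params
    return params

params = semi_supervised_em([(0.0, 0), (0.5, 0), (4.0, 1), (4.5, 1)],
                            [-0.5, 0.2, 0.8, 3.5, 4.2, 5.0])
```

With only two labeled samples per class, the initial variance estimates are very unreliable. The six unlabeled samples roughly double the effective data per class, which is the mechanism the abstract describes.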
Hybrid Neuro-Fuzzy Classifier Based On Nefclass Model

Directory of Open Access Journals (Sweden)

Bogdan Gliwa

2011-01-01

Full Text Available The paper presents a hybrid neuro-fuzzy classifier based on a modified NEFCLASS model. The presented classifier was compared with popular classifiers: neural networks and k-nearest neighbours. The efficiency of the modifications was compared with the methods used in the original NEFCLASS model (learning methods). Classifier accuracy was tested using 3 datasets from the UCI Machine Learning Repository: iris, wine and breast cancer wisconsin. Moreover, the influence of ensemble classification methods on classification accuracy was presented.
Verification of Representative Sampling in RI waste

International Nuclear Information System (INIS)

Ahn, Hong Joo; Song, Byung Cheul; Sohn, Se Cheul; Song, Kyu Seok; Jee, Kwang Yong; Choi, Kwang Seop

2009-01-01

For evaluating the radionuclide inventories of RI wastes, representative sampling is one of the most important parts of the radiochemical assay process. Sampling to characterize RI waste conditions has typically been based on judgment or convenience sampling of individual drums or groups. However, it is difficult to obtain a representative sample from among the numerous drums. In addition, RI waste drums might be classified as heterogeneous wastes because they contain cotton, glass, vinyl, gloves, etc. To obtain representative samples, the sample to be analyzed must be collected from every selected drum. Considering the expense and time of analysis, however, the number of samples has to be minimized. In this study, RI waste drums were classified by various conditions such as half-life, surface dose, acceptance date, waste form, and generator. A sample for radiochemical assay was obtained by mixing samples from each drum. The sample has to be prepared for radiochemical assay, and although the sample should be reasonably uniform, a completely homogeneous material is rarely received. Every sample is shredded into pieces of 1 ∼ 2 cm², and a representative aliquot is taken for the required analysis. For verification of representative sampling, every classified group is tested to evaluate 'selection of a representative drum in a group' and 'representative sampling in a drum'.
SVM Classifier - a comprehensive java interface for support vector machine classification of microarray data.

Science.gov (United States)

Pirooznia, Mehdi; Deng, Youping

2006-12-12

Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the Java interface using the standard Swing libraries. We used sample data from a breast cancer study to test classification accuracy. We achieved 100% classification accuracy among the BRCA1-BRCA2 samples with the RBF kernel of SVM. We have developed a Java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance. The SVM Classifier is available at http://mfgn.usm.edu/ebl/svm/.
Discriminative Motif Discovery via Simulated Evolution and Random Under-Sampling

OpenAIRE

Song, Tao; Gu, Hong

2014-01-01

Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted more and more attention. However, little attention has been devoted to the data imbalance problem, which is one of the main reasons affecting the performance of the discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the stage of data preprocessing, and at the sta...
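The abstract is cut off before it explains how under-sampling is combined with simulated evolution. For reference, plain random under-sampling for a multi-class problem keeps a random subset of each class equal in size to the smallest class. The sketch below shows only this generic step, not the paper's full pipeline. The function name is hypothetical.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Optional, Sequence, Tuple

def random_undersample(X: Sequence, y: Sequence[Hashable],
                       seed: Optional[int] = None) -> Tuple[List, List]:
    """Randomly drop samples so every class keeps as many as the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    n_min = min(len(items) for items in by_class.values())
    X_out, y_out = [], []
    for label, items in by_class.items():
        for xi in rng.sample(items, n_min):  # sampling without replacement
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out

X = list(range(10))
y = ["a"] * 6 + ["b"] * 2 + ["c"] * 2
Xs, ys = random_undersample(X, y, seed=0)
```

Under-sampling is cheap and removes the bias towards majority classes. The cost is that it discards data, which is one reason the sampling is often repeated or combined with other strategies.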
36 CFR 1256.46 - National security-classified information.

Science.gov (United States)

2010-07-01

... 36 Parks, Forests, and Public Property 3 2010-07-01 2010-07-01 false National security-classified... Restrictions § 1256.46 National security-classified information. In accordance with 5 U.S.C. 552(b)(1), NARA... properly classified under the provisions of the pertinent Executive Order on Classified National Security...
Class-specific Error Bounds for Ensemble Classifiers

Energy Technology Data Exchange (ETDEWEB)

Prenger, R; Lemmond, T; Varshney, K; Chen, B; Hanley, W

2009-10-06

The generalization error, or probability of misclassification, of ensemble classifiers has been shown to be bounded above by a function of the mean correlation between the constituent (i.e., base) classifiers and their average strength. This bound suggests that increasing the strength and/or decreasing the correlation of an ensemble's base classifiers may yield improved performance under the assumption of equal error costs. However, this and other existing bounds do not directly address application spaces in which error costs are inherently unequal. For applications involving binary classification, Receiver Operating Characteristic (ROC) curves, performance curves that explicitly trade off false alarms and missed detections, are often utilized to support decision making. To address performance optimization in this context, we have developed a lower bound for the entire ROC curve that can be expressed in terms of the class-specific strength and correlation of the base classifiers. We present empirical analyses demonstrating the efficacy of these bounds in predicting relative classifier performance. In addition, we specify performance regions of the ROC curve that are naturally delineated by the class-specific strengths of the base classifiers and show that each of these regions can be associated with a unique set of guidelines for performance optimization of binary classifiers within unequal error cost regimes.
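The equal-cost bound described in the first sentence appears to be Breiman's (2001) bound for random forests and similar ensembles. Writing $PE^{*}$ for the generalization error, $s > 0$ for the strength (expected margin) of the base classifiers, and $\bar{\rho}$ for their mean correlation, it reads:

```latex
PE^{*} \;\le\; \frac{\bar{\rho}\,\bigl(1 - s^{2}\bigr)}{s^{2}}
```

The right-hand side shrinks as $s$ grows or $\bar{\rho}$ falls, matching the guidance in the abstract. The class-specific ROC bound developed in the paper is not reproduced here.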
Deconvolution When Classifying Noisy Data Involving Transformations

KAUST Repository

Carroll, Raymond

2012-09-01

In the present study, we consider the problem of classifying spatial data distorted by a linear transformation or convolution and contaminated by additive random noise. In this setting, we show that classifier performance can be improved if we carefully invert the data before the classifier is applied. However, the inverse transformation is not constructed so as to recover the original signal, and in fact, we show that taking the latter approach is generally inadvisable. We introduce a fully data-driven procedure based on cross-validation, and use several classifiers to illustrate numerical properties of our approach. Theoretical arguments are given in support of our claims. Our procedure is applied to data generated by light detection and ranging (Lidar) technology, where we improve on earlier approaches to classifying aerosols. This article has supplementary materials online.
Deconvolution When Classifying Noisy Data Involving Transformations.

Science.gov (United States)

Carroll, Raymond; Delaigle, Aurore; Hall, Peter

2012-09-01

In the present study, we consider the problem of classifying spatial data distorted by a linear transformation or convolution and contaminated by additive random noise. In this setting, we show that classifier performance can be improved if we carefully invert the data before the classifier is applied. However, the inverse transformation is not constructed so as to recover the original signal, and in fact, we show that taking the latter approach is generally inadvisable. We introduce a fully data-driven procedure based on cross-validation, and use several classifiers to illustrate numerical properties of our approach. Theoretical arguments are given in support of our claims. Our procedure is applied to data generated by light detection and ranging (Lidar) technology, where we improve on earlier approaches to classifying aerosols. This article has supplementary materials online.
Just-in-time classifiers for recurrent concepts.

Science.gov (United States)

Alippi, Cesare; Boracchi, Giacomo; Roveri, Manuel

2013-04-01

Just-in-time (JIT) classifiers operate in evolving environments by classifying instances and reacting to concept drift. In stationary conditions, a JIT classifier improves its accuracy over time by exploiting additional supervised information coming from the field. In nonstationary conditions, however, the classifier reacts as soon as concept drift is detected; the current classification setup is discarded and a suitable one activated to keep the accuracy high. We present a novel generation of JIT classifiers able to deal with recurrent concept drift by means of a practical formalization of the concept representation and the definition of a set of operators working on such representations. The concept-drift detection activity, which is crucial in promptly reacting to changes exactly when needed, is advanced by considering change-detection tests monitoring both input and class distributions.
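The change-detection tests used by the paper are not specified in the abstract. As a generic stand-in, a one-sided CUSUM test monitors a stream of statistics and raises an alarm when the accumulated upward deviation from a reference mean exceeds a threshold. The monitored statistic could be a feature mean or a per-class frequency. The `drift` and `threshold` values below are illustrative assumptions.

```python
from typing import Iterable, Optional

def cusum_detect(stream: Iterable[float], target_mean: float,
                 drift: float = 0.5, threshold: float = 5.0) -> Optional[int]:
    """One-sided (upward) CUSUM test.

    Accumulates deviations above target_mean + drift, resetting at zero,
    and returns the index of the first sample at which the cumulative sum
    exceeds threshold, or None if no change is detected.
    """
    g = 0.0
    for i, x in enumerate(stream):
        g = max(0.0, g + (x - target_mean - drift))
        if g > threshold:
            return i
    return None

# Mean shifts from 0 to 3 at index 20.
alarm = cusum_detect([0.0] * 20 + [3.0] * 10, target_mean=0.0)
```

The `drift` term sets how large a shift is worth detecting. The `threshold` trades detection delay against false alarms. Here the alarm fires two samples after the shift.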
Deconvolution When Classifying Noisy Data Involving Transformations

KAUST Repository

Carroll, Raymond; Delaigle, Aurore; Hall, Peter

2012-01-01

In the present study, we consider the problem of classifying spatial data distorted by a linear transformation or convolution and contaminated by additive random noise. In this setting, we show that classifier performance can be improved if we carefully invert the data before the classifier is applied. However, the inverse transformation is not constructed so as to recover the original signal, and in fact, we show that taking the latter approach is generally inadvisable. We introduce a fully data-driven procedure based on cross-validation, and use several classifiers to illustrate numerical properties of our approach. Theoretical arguments are given in support of our claims. Our procedure is applied to data generated by light detection and ranging (Lidar) technology, where we improve on earlier approaches to classifying aerosols. This article has supplementary materials online.
Comparing classifiers for pronunciation error detection

NARCIS (Netherlands)

Strik, H.; Truong, K.; Wet, F. de; Cucchiarini, C.

2007-01-01

Providing feedback on pronunciation errors in computer assisted language learning systems requires that pronunciation errors be detected automatically. In the present study we compare four types of classifiers that can be used for this purpose: two acoustic-phonetic classifiers (one of which employs
Classifier Fusion With Contextual Reliability Evaluation.

Science.gov (United States)

Liu, Zhunga; Pan, Quan; Dezert, Jean; Han, Jun-Wei; He, You

2018-05-01

Classifier fusion is an efficient strategy for improving classification performance in complex pattern recognition problems. In practice, the multiple classifiers to be combined can have different reliabilities, and proper reliability evaluation plays an important role in the fusion process for obtaining the best classification performance. We propose a new method for classifier fusion with contextual reliability evaluation (CF-CRE) based on inner reliability and relative reliability concepts. The inner reliability, represented by a matrix, characterizes the probability of the object belonging to one class when it is classified to another class. The elements of this matrix are estimated from the K-nearest neighbors of the object. A cautious discounting rule is developed under the belief functions framework to revise the classification result according to the inner reliability. The relative reliability is evaluated based on a new incompatibility measure, which allows the level of conflict between the classifiers to be reduced by applying the classical evidence discounting rule to each classifier before their combination. The inner reliability and relative reliability capture different aspects of classification reliability. The discounted classification results are combined with Dempster-Shafer's rule to support the final class decision. The performance of CF-CRE has been evaluated and compared with that of the main classical fusion methods using real data sets. The experimental results show that CF-CRE can produce substantially higher accuracy than other fusion methods in general. Moreover, CF-CRE is robust to changes in the number of nearest neighbors chosen for estimating the reliability matrix, which is appealing for applications.
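The two standard belief-function operations named in the abstract can be sketched directly. The first is classical (Shafer) discounting: each focal mass is scaled by a reliability factor α, and the remaining 1 − α goes to the whole frame. The second is Dempster's rule: conjunctive combination normalized by the conflict. Mass functions are represented as dicts from frozensets of class labels to masses. The paper's cautious discounting rule for inner reliability is different and is not shown. The class names and reliability value are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]

def discount(m: Mass, alpha: float, frame: FrozenSet[str]) -> Mass:
    """Classical discounting with reliability alpha in [0, 1]."""
    out = {A: alpha * v for A, v in m.items() if A != frame}
    out[frame] = alpha * m.get(frame, 0.0) + (1.0 - alpha)
    return out

def dempster(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: conjunctive combination normalized by 1 - K."""
    combined: Mass = {}
    conflict = 0.0  # K: total mass assigned to empty intersections
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        C = A & B
        if C:
            combined[C] = combined.get(C, 0.0) + a * b
        else:
            conflict += a * b
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    return {C: v / (1.0 - conflict) for C, v in combined.items()}

frame = frozenset({"w1", "w2", "w3"})
m1 = {frozenset({"w1"}): 0.8, frame: 0.2}   # classifier 1 output
m2 = {frozenset({"w2"}): 0.6, frame: 0.4}   # classifier 2 output
fused = dempster(m1, discount(m2, 0.5, frame))
```

Discounting the less reliable source before combining lowers the conflict K (0.24 here instead of 0.48 without discounting). This is the effect the abstract attributes to applying the discounting rule before combination.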
A Kernel for Protein Secondary Structure Prediction

OpenAIRE

Guermeur , Yann; Lifchitz , Alain; Vert , Régis

2004-01-01

http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=10338&mode=toc; Multi-class support vector machines have already proved efficient in protein secondary structure prediction as ensemble methods, to combine the outputs of sets of classifiers based on different principles. In this chapter, their implementation as basic prediction methods, processing the primary structure or the profile of multiple alignments, is investigated. A kernel devoted to the task is in...
Hand Gesture Recognition with Leap Motion

OpenAIRE

Du, Youchen; Liu, Shenglan; Feng, Lin; Chen, Menghui; Wu, Jie

2017-01-01

The recent introduction of depth cameras like the Leap Motion Controller allows researchers to exploit depth information to recognize hand gestures more robustly. This paper proposes a novel hand gesture recognition system with the Leap Motion Controller. A series of features are extracted from Leap Motion tracking data; we feed these features, along with HOG features extracted from sensor images, into a multi-class SVM classifier to recognize the performed gesture, dimension reduction and feature weight...

Hierarchical mixtures of naive Bayes classifiers

NARCIS (Netherlands)

Wiering, M.A.

2002-01-01

Naive Bayes classifiers tend to perform very well on a large number of problem domains, although their representation power is quite limited compared to more sophisticated machine learning algorithms. In this paper we study combining multiple naive Bayes classifiers by using the hierarchical
Classification Identification of Acoustic Emission Signals from Underground Metal Mine Rock by ICIMF Classifier

Directory of Open Access Journals (Sweden)

Hongyan Zuo

2014-01-01

Full Text Available To overcome the drawback that fuzzy classifiers are sensitive to noise and outliers, a Mamdani fuzzy classifier based on an improved chaos immune algorithm was developed. In it, the parameters of bilateral Gaussian membership functions were set as constraint conditions, and the fuzzy classification effectiveness index and the number of correctly classified samples served as the subgoals of the fitness function. The Iris database was used for simulation experiments, and the classifier was applied to the classification and recognition of acoustic emission signals and interference signals from the stope wall rock of underground metal mines. The results showed that the Mamdani fuzzy classifier based on the improved chaos immune algorithm could effectively improve prediction accuracy on data sets with noise and outliers, and that the classification accuracy for acoustic emission and interference signals from the stope wall rock of underground metal mines was 90.00%. The improved chaos immune Mamdani fuzzy (ICIMF) classifier is therefore useful for accurate diagnosis of acoustic emission and interference signals from the stope wall rock of underground metal mines.
Steganalysis using logistic regression

Science.gov (United States)

Lubenko, Ivans; Ker, Andrew D.

2011-02-01

We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods: it estimates class probabilities as well as providing a simple classification, and it can be adapted more easily and efficiently to multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and it shows classification accuracy comparable to SVM methods. This work is a case study comparing the accuracy and speed of SVM and LR classifiers in detecting LSB Matching and other related spatial-domain image steganography, using the state-of-the-art 686-dimensional SPAM feature set on three image sets.
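To illustrate the point that LR returns class probabilities and extends naturally to several classes, here is a minimal sketch of multinomial (softmax) logistic regression trained by batch gradient descent. It is linear, not kernelised, and the learning rate and epoch count are assumptions chosen for small, roughly unit-scaled features.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fit_softmax_regression(X, y, n_classes, lr=0.5, epochs=3000):
    """Multinomial logistic regression; W[k] holds class-k weights, last entry is the bias."""
    n_features = len(X[0])
    W = [[0.0] * (n_features + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (n_features + 1) for _ in range(n_classes)]
        for x, label in zip(X, y):
            xb = list(x) + [1.0]
            probs = softmax([sum(w * v for w, v in zip(Wk, xb)) for Wk in W])
            for k in range(n_classes):
                # Cross-entropy gradient: (predicted probability - one-hot target) * input
                err = probs[k] - (1.0 if k == label else 0.0)
                for j, v in enumerate(xb):
                    grad[k][j] += err * v
        for k in range(n_classes):
            for j in range(n_features + 1):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W

def predict_proba(W, x):
    """Class probabilities for one sample; they sum to 1 by construction."""
    xb = list(x) + [1.0]
    return softmax([sum(w * v for w, v in zip(Wk, xb)) for Wk in W])
```

Unlike a one-vs-one or one-vs-rest SVM ensemble, a single softmax model covers all classes at once, and its outputs can be thresholded or ranked directly.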
Object recognition based on Google's reverse image search and image similarity

Science.gov (United States)

Horváth, András.

2015-12-01

Image classification is one of the most challenging tasks in computer vision, and a general multiclass classifier could solve many different tasks in image processing. Classification is usually done by shallow learning for predefined objects, which is a difficult task and very different from human vision, which is based on continuous learning of object classes; a person needs years to learn a large taxonomy of objects that are neither disjoint nor independent. In this paper I present a system based on Google's image similarity algorithm and Google's image database, which can classify a large set of different objects in a human-like manner, identifying related classes and taxonomies.
Logarithmic learning for generalized classifier neural network.

Science.gov (United States)

Ozyildirim, Buse Melis; Avci, Mutlu

2014-12-01

The generalized classifier neural network has been introduced as an efficient classifier. Unless the initial smoothing parameter value is close to the optimal one, the generalized classifier neural network suffers from convergence problems and requires quite a long time to converge. In this work, to overcome this problem, a logarithmic learning approach is proposed. The proposed method uses a logarithmic cost function instead of squared error. Minimizing this cost function reduces the number of iterations needed to reach the minimum. The proposed method is tested on 15 different data sets, and the performance of the logarithmic learning generalized classifier neural network is compared with that of the standard one. Owing to the operating range of the radial basis function included in the generalized classifier neural network, the proposed logarithmic approach and its derivative take continuous values. This makes it possible for the proposed learning method to exploit the fast convergence of the logarithmic cost. Due to this fast convergence, training time is reduced by up to 99.2%. In addition to the decrease in training time, classification performance may also be improved by up to 60%. According to the test results, while the proposed method solves the time requirement problem of the generalized classifier neural network, it may also improve classification accuracy. The proposed method can be considered an efficient way of reducing the time requirement of the generalized classifier neural network. Copyright © 2014 Elsevier Ltd. All rights reserved.
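A generic illustration, not the GCNN formulation itself, of why a logarithmic cost can converge faster than squared error: for a sigmoid output, the squared-error gradient carries an extra factor y(1 - y) that vanishes when the unit saturates, while the log-cost gradient does not. The sigmoid output unit is an assumption made for this sketch.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_squared_error(z, target):
    """d/dz of 0.5 * (sigmoid(z) - target)**2; note the y * (1 - y) factor."""
    y = sigmoid(z)
    return (y - target) * y * (1.0 - y)

def grad_log_cost(z, target):
    """d/dz of -[t*log(y) + (1 - t)*log(1 - y)] with y = sigmoid(z)."""
    return sigmoid(z) - target
```

At z = 8 with target 0, the output is badly wrong, yet the squared-error gradient is about 3e-4 while the log-cost gradient is close to 1, so gradient steps under the log cost move the parameters far more per iteration.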
Training set optimization and classifier performance in a top-down diabetic retinopathy screening system

Science.gov (United States)

Wigdahl, J.; Agurto, C.; Murray, V.; Barriga, S.; Soliz, P.

2013-03-01

Diabetic retinopathy (DR) affects more than 4.4 million Americans aged 40 and over. Automatic screening for DR has been shown to be an efficient and cost-effective way to lower the burden on the healthcare system, by triaging diabetic patients and ensuring timely care for those presenting with DR. Several supervised algorithms have been developed to detect pathologies related to DR, but little work has been done in determining the size of the training set that optimizes an algorithm's performance. In this paper we analyze the effect of the training sample size on the performance of a top-down DR screening algorithm for different types of statistical classifiers. Results are based on partial least squares (PLS), support vector machines (SVM), k-nearest neighbor (kNN), and Naïve Bayes classifiers. Our dataset consisted of digital retinal images collected from a total of 745 cases (595 controls, 150 with DR). We varied the number of normal controls in the training set, while keeping the number of DR samples constant, and repeated the procedure 10 times using randomized training sets to avoid bias. Results show increasing performance in terms of area under the ROC curve (AUC) when the number of DR subjects in the training set increased, with similar trends for each of the classifiers. Of these, PLS and kNN had the highest average AUC. A lower standard deviation and a flattening of the AUC curve give evidence that there is a limit to the learning ability of the classifiers and an optimal number of cases to train on.
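The comparison above is made in terms of AUC. A minimal sketch of computing it from classifier scores through its rank interpretation (the probability that a randomly chosen DR case scores higher than a randomly chosen control, with ties counted as half); the label convention 1 = DR, 0 = control is an assumption.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative), ties = 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))
```

This pairwise version is O(P·N) and meant only for clarity; repeating it over the 10 randomized training sets gives the mean and standard deviation of AUC that the study reports trends for.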
SVM Classifier – a comprehensive java interface for support vector machine classification of microarray data

Science.gov (United States)

Pirooznia, Mehdi; Deng, Youping

2006-01-01

Motivation: Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction. Results: The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of the Support Vector Machine. We implemented the Java interface using the standard Swing libraries. We used sample data from a breast cancer study to test classification accuracy, and achieved 100% accuracy in classifying the BRCA1–BRCA2 samples with the RBF kernel of the SVM. Conclusion: We have developed a Java GUI application that allows SVM users to perform SVM training, classification and prediction. We have demonstrated that support vector machines can accurately classify genes into functional categories based upon expression data from DNA microarray hybridization experiments. Among the different kernel functions that we examined, the SVM that uses a radial basis kernel function provides the best performance. The SVM Classifier is available at . PMID:17217518
DECISION TREE CLASSIFIERS FOR STAR/GALAXY SEPARATION

International Nuclear Information System (INIS)

Vasconcellos, E. C.; Ruiz, R. S. R.; De Carvalho, R. R.; Capelato, H. V.; Gal, R. R.; LaBarbera, F. L.; Frago Campos Velho, H.; Trevisan, M.

2011-01-01

We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS-DR7). Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. We extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. The efficiency of star-galaxy separation is measured using the completeness function. We find that the Functional Tree algorithm (FT) yields the best results as measured by the mean completeness in two magnitude intervals: 14 ≤ r ≤ 21 (85.2%) and r ≥ 19 (82.1%). We compare the performance of the tree generated with the optimal FT configuration to the classifications provided by the SDSS parametric classifier, 2DPHOT, and Ball et al. We find that our FT classifier is comparable to or better in completeness over the full magnitude range 15 ≤ r ≤ 21, with much lower contamination than all but the Ball et al. classifier. At the faintest magnitudes (r > 19), our classifier is the only one that maintains high completeness (>80%) while simultaneously achieving low contamination (∼2.5%). We also examine the SDSS parametric classifier (psfMag - modelMag) to see if the dividing line between stars and galaxies can be adjusted to improve the classifier. We find that currently stars in close pairs are often misclassified as galaxies, and suggest a new cut to improve the classifier. Finally, we apply our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS photometric objects in the magnitude range 14 ≤ r ≤ 21.
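A minimal sketch of the two figures of merit used above, assuming the usual definitions: completeness is the fraction of true galaxies classified as galaxies, and contamination is the fraction of objects classified as galaxies that are actually stars. Binning by r magnitude turns completeness into the "completeness function"; the bin edges are illustrative.

```python
def completeness_contamination(true_labels, predicted, target="galaxy"):
    """Completeness = TP / (TP + FN); contamination = FP / (TP + FP) (assumed definitions)."""
    pairs = list(zip(true_labels, predicted))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    completeness = tp / (tp + fn) if tp + fn else 0.0
    contamination = fp / (tp + fp) if tp + fp else 0.0
    return completeness, contamination

def completeness_by_magnitude(mags, true_labels, predicted, edges, target="galaxy"):
    """Completeness in each magnitude bin [edges[i], edges[i + 1])."""
    result = []
    for lo, hi in zip(edges, edges[1:]):
        idx = [i for i, m in enumerate(mags) if lo <= m < hi]
        c, _ = completeness_contamination(
            [true_labels[i] for i in idx], [predicted[i] for i in idx], target)
        result.append(c)
    return result
```

The two quantities pull against each other: labelling more ambiguous faint objects as galaxies raises completeness but also contamination, which is why the study reports both.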
32 CFR 2400.28 - Dissemination of classified information.

Science.gov (United States)

2010-07-01

... 32 National Defense 6 2010-07-01 2010-07-01 false Dissemination of classified information. 2400.28... SECURITY PROGRAM Safeguarding § 2400.28 Dissemination of classified information. Heads of OSTP offices... originating official may prescribe specific restrictions on dissemination of classified information when...
Ensemble support vector machine classification of dementia using structural MRI and mini-mental state examination.

Science.gov (United States)

Sørensen, Lauge; Nielsen, Mads

2018-05-15

The International Challenge for Automated Prediction of MCI from MRI data offered independent, standardized comparison of machine learning algorithms for multi-class classification of normal control (NC), mild cognitive impairment (MCI), converting MCI (cMCI), and Alzheimer's disease (AD) using brain imaging and general cognition. We proposed to use an ensemble of support vector machines (SVMs) that combined bagging without replacement and feature selection. SVM is the most commonly used algorithm in multivariate classification of dementia, and it was therefore valuable to evaluate the potential benefit of ensembling this type of classifier. The ensemble SVM, using either a linear or a radial basis function (RBF) kernel, achieved multi-class classification accuracies of 55.6% and 55.0% in the challenge test set (60 NC, 60 MCI, 60 cMCI, 60 AD), resulting in a third place in the challenge. Similar feature subset sizes were obtained for both kernels, and the most frequently selected MRI features were the volumes of the two hippocampal subregions left presubiculum and right subiculum. Post-challenge analysis revealed that enforcing a minimum number of selected features and increasing the number of ensemble classifiers improved classification accuracy up to 59.1%. The ensemble SVM outperformed single SVM classifications consistently in the challenge test set. Ensemble methods using bagging and feature selection can improve the performance of the commonly applied SVM classifier in dementia classification. This resulted in competitive classification accuracies in the International Challenge for Automated Prediction of MCI from MRI data. Copyright © 2018 Elsevier B.V. All rights reserved.
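A minimal sketch of bagging without replacement (subsampling) combined by majority vote. To stay dependency-free it swaps in a nearest-centroid base learner for the SVM and omits feature selection; the number of models, subsample fraction and seed are assumptions.

```python
import random
from collections import Counter

def fit_centroids(X, y):
    """Nearest-centroid base learner (a stand-in for the SVM used in the paper)."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict_centroid(centroids, x):
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))

def fit_subbagging(X, y, n_models=25, fraction=0.5, seed=0):
    rng = random.Random(seed)
    size = max(1, int(fraction * len(X)))
    models = []
    for _ in range(n_models):
        idx = rng.sample(range(len(X)), size)  # sampling WITHOUT replacement
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_vote(models, x):
    """Majority vote over the ensemble members."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

Each member sees a different half of the data with no duplicated subjects, so the vote averages out some of the variance of any single model; the post-challenge finding that more ensemble members helped is consistent with this.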
Differential diagnosis of neurodegenerative diseases using structural MRI data

DEFF Research Database (Denmark)

Koikkalainen, Juha; Rhodius-Meester, Hanneke; Tolonen, Antti

2016-01-01

individuals was used for evaluation. The cross-validated classification accuracy was 70.6% and the balanced accuracy was 69.1% for the five disease groups using only automatically determined MRI features. Vascular dementia patients could be detected with high sensitivity (96%) using features from FLAIR images. [...] Controls (sensitivity 82%) and Alzheimer's disease patients (sensitivity 74%) could be accurately classified using T1-based features, whereas the most difficult group was dementia with Lewy bodies (sensitivity 32%). These results were notably better than the classification accuracies obtained [...] characteristics from T1 images, and vascular characteristics from FLAIR images. Classification was performed using a multi-class classifier based on Disease State Index methodology. The classifier provided continuous probability indices for each disease to support clinical decision making. A dataset of 504...
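The entry reports both accuracy and balanced accuracy. A minimal sketch, assuming the standard definition of balanced accuracy as the mean of per-class sensitivities, which shows why the two differ when group sizes are unequal.

```python
from collections import defaultdict

def per_class_sensitivity(true_labels, predicted):
    """Fraction of each true class that was predicted correctly (recall per class)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(true_labels, predicted):
        totals[t] += 1
        hits[t] += (t == p)
    return {c: hits[c] / totals[c] for c in totals}

def balanced_accuracy(true_labels, predicted):
    """Mean of per-class sensitivities, so small groups weigh as much as large ones."""
    sens = per_class_sensitivity(true_labels, predicted)
    return sum(sens.values()) / len(sens)
```

With 4 AD cases (3 correct) and 2 VaD cases (1 correct), plain accuracy is 4/6 ≈ 0.667 but balanced accuracy is (0.75 + 0.5) / 2 = 0.625, because the poorly recognised minority group counts equally.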
Multi-class, multi-residue analysis of pesticides, polychlorinated biphenyls, polycyclic aromatic hydrocarbons, polybrominated diphenyl ethers and novel flame retardants in fish using fast, low-pressure gas chromatography–tandem mass spectrometry

International Nuclear Information System (INIS)

Sapozhnikova, Yelena; Lehotay, Steven J.

2013-01-01

Highlights: ► A method for analysis of POPs and novel flame retardants in catfish was developed. ► The method is based on a QuEChERS extraction, d-SPE clean-up and low pressure GC/MS–MS. ► The method validation demonstrated good recoveries and low detection limits. ► The method was successfully applied for analysis of catfish samples from the market. Abstract: A multi-class, multi-residue method for the analysis of 13 novel flame retardants, 18 representative pesticides, 14 polychlorinated biphenyl (PCB) congeners, 16 polycyclic aromatic hydrocarbons (PAHs), and 7 polybrominated diphenyl ether (PBDE) congeners in catfish muscle was developed and evaluated using fast low pressure gas chromatography triple quadrupole tandem mass spectrometry (LP-GC/MS–MS). The method was based on a QuEChERS (quick, easy, cheap, effective, rugged, safe) extraction with acetonitrile and dispersive solid-phase extraction (d-SPE) clean-up with zirconium-based sorbent prior to LP-GC/MS–MS analysis. The developed method was evaluated at 4 spiking levels and further validated by analysis of NIST Standard Reference Materials (SRMs) 1974B and 1947. Sample preparation for a batch of 10 homogenized samples took about 1 h/analyst, and LP-GC/MS–MS analysis provided fast separation of multiple analytes within 9 min, achieving high throughput. With the use of isotopically labeled internal standards, recoveries of all but one analyte were between 70 and 120% with relative standard deviations less than 20% (n = 5). The measured values for both SRMs agreed with certified/reference values (72–119% accuracy) for the majority of analytes. The detection limits were 0.1–0.5 ng g⁻¹ for PCBs, 0.5–10 ng g⁻¹ for PBDEs, 0.5–5 ng g⁻¹ for select pesticides and PAHs and 1–10 ng g⁻¹ for flame retardants. The developed method was successfully applied for analysis of catfish samples from the market.
Neural Network Classifiers for Local Wind Prediction.

Science.gov (United States)

Kretzschmar, Ralf; Eckert, Pierre; Cattani, Daniel; Eggimann, Fritz

2004-05-01

This paper evaluates the quality of neural network classifiers for wind speed and wind gust prediction with prediction lead times between +1 and +24 h. The predictions were realized based on local time series and model data. The selection of appropriate input features was initiated by time series analysis and completed by empirical comparison of neural network classifiers trained on several choices of input features. The selected input features involved day time, yearday, features from a single wind observation device at the site of interest, and features derived from model data. The quality of the resulting classifiers was benchmarked against persistence for two different sites in Switzerland. The neural network classifiers exhibited superior quality when compared with persistence judged on a specific performance measure, hit and false-alarm rates.
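A minimal sketch of scoring a binary wind forecast by hit rate and false-alarm rate against a persistence baseline. It assumes the forecast has been reduced to a yes/no event (for example, gust above a warning threshold), that "false-alarm rate" means false alarms / (false alarms + correct negatives) rather than the false-alarm ratio, and that persistence means "the state `lead` steps ahead equals the current state".

```python
def contingency(forecast, observed):
    """2x2 table counts for boolean event forecasts."""
    pairs = list(zip(forecast, observed))
    hits = sum(1 for f, o in pairs if f and o)
    misses = sum(1 for f, o in pairs if not f and o)
    false_alarms = sum(1 for f, o in pairs if f and not o)
    correct_neg = sum(1 for f, o in pairs if not f and not o)
    return hits, misses, false_alarms, correct_neg

def hit_and_false_alarm_rate(forecast, observed):
    h, m, fa, cn = contingency(forecast, observed)
    hit_rate = h / (h + m) if h + m else 0.0
    false_alarm_rate = fa / (fa + cn) if fa + cn else 0.0  # assumed definition (POFD)
    return hit_rate, false_alarm_rate

def persistence_forecast(observed, lead):
    """Persistence baseline: forecast for t + lead is the state at t; aligns with observed[lead:]."""
    if lead < 1:
        raise ValueError("lead must be at least one time step")
    return observed[:-lead]
```

A classifier only adds value if it improves on persistence at the same lead time, for example a higher hit rate without a higher false-alarm rate.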
3D Bayesian contextual classifiers

DEFF Research Database (Denmark)

Larsen, Rasmus

2000-01-01

We extend a series of multivariate Bayesian 2-D contextual classifiers to 3-D by specifying a simultaneous Gaussian distribution for the feature vectors as well as a prior distribution of the class variables of a pixel and its 6 nearest 3-D neighbours.
A bench-top hyperspectral imaging system to classify beef from Nellore cattle based on tenderness

Science.gov (United States)

Nubiato, Keni Eduardo Zanoni; Mazon, Madeline Rezende; Antonelo, Daniel Silva; Calkins, Chris R.; Naganathan, Govindarajan Konda; Subbiah, Jeyamkondan; da Luz e Silva, Saulo

2018-03-01

The aim of this study was to evaluate the accuracy of classifying Nellore beef aged for 0, 7, 14, or 21 days, and of classification based on tenderness and aging period, using a bench-top hyperspectral imaging system. A hyperspectral imaging system (λ = 928-2524 nm) was used to collect hyperspectral images of the Longissimus thoracis et lumborum (aging n = 376 and tenderness n = 345) of Nellore cattle. The image processing steps included selection of the region of interest, extraction of spectra, and identification and evaluation of selected wavelengths for classification. Six linear discriminant models were developed to classify samples based on tenderness and aging period. The model using the first derivative of partial absorbance spectra (give wavelength range spectra) was able to classify steaks based on tenderness with an overall accuracy of 89.8%. The model using the first derivative of full absorbance spectra was able to classify steaks based on aging period with an overall accuracy of 84.8%. The results demonstrate that the HIS may be a viable technology for classifying beef based on tenderness and aging period.
A CLASSIFIER SYSTEM USING SMOOTH GRAPH COLORING

Directory of Open Access Journals (Sweden)

JORGE FLORES CRUZ

2017-01-01

Full Text Available Unsupervised classifiers allow clustering with little or no human intervention. It is therefore desirable to group a set of items with less data processing. This paper proposes an unsupervised classifier system based on the model of soft graph coloring. The method was tested on several classic instances from the literature, and the results were compared with classifications made with human intervention, yielding results as good as or better than those of supervised classifiers and sometimes providing alternative classifications that consider additional information that humans did not consider.
Improved Collaborative Representation Classifier Based on l2-Regularized for Human Action Recognition

Directory of Open Access Journals (Sweden)

Shirui Huo

2017-01-01

Full Text Available Human action recognition is an important and challenging task. Projecting depth images onto three depth motion maps (DMMs) and extracting deep convolutional neural network (DCNN) features yields discriminative descriptors that characterize the spatiotemporal information of a specific action from a sequence of depth images. In this paper, a unified improved collaborative representation framework is proposed in which the probability that a test sample belongs to the collaborative subspace of all classes can be well defined and calculated. The improved collaborative representation classifier (ICRC), based on l2 regularization, is presented for human action recognition to maximize the likelihood that a test sample belongs to each class; theoretical investigation shows that ICRC obtains the final classification by computing the likelihood for each class. Coupled with the DMM and DCNN features, experiments on depth image-based action recognition, including the MSRAction3D and MSRGesture3D datasets, demonstrate that the proposed approach, using a distance-based representation classifier, achieves superior performance over state-of-the-art methods, including SRC, CRC, and SVM.
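For reference, a minimal sketch of the baseline l2-regularized collaborative representation classifier (CRC) that ICRC builds on, not the probabilistic ICRC itself. The test vector is coded over all training samples at once by ridge regression, then assigned to the class with the smallest residual normalised by that class's coefficient norm. The regularization weight `lam` and the toy vectors are assumptions; DMM/DCNN feature extraction is not shown.

```python
import math

def unit(v):
    """Scale a vector to unit L2 norm (CRC commonly normalises its atoms)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def crc_classify(train, labels, y, lam=0.01):
    """Code y over ALL training samples with an l2 penalty, then pick the class
    with the smallest regularised residual ||y - X_c a_c|| / ||a_c||."""
    atoms = [unit(x) for x in train]
    y = unit(y)
    n = len(atoms)
    # Normal equations of min ||y - X a||^2 + lam ||a||^2: (X^T X + lam I) a = X^T y
    gram = [[sum(p * q for p, q in zip(atoms[i], atoms[j])) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [sum(p * q for p, q in zip(atoms[i], y)) for i in range(n)]
    alpha = solve(gram, rhs)
    best, best_score = None, math.inf
    for c in sorted(set(labels)):
        idx = [i for i in range(n) if labels[i] == c]
        recon = [sum(alpha[i] * atoms[i][d] for i in idx) for d in range(len(y))]
        residual = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, recon)))
        coef_norm = math.sqrt(sum(alpha[i] ** 2 for i in idx))
        score = residual / coef_norm if coef_norm > 0 else math.inf
        if score < best_score:
            best, best_score = c, score
    return best
```

The "collaborative" part is that all classes compete to explain the test sample in one closed-form solve, which is much cheaper than the l1 sparse coding used by SRC.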
An adaptive optimal ensemble classifier via bagging and rank aggregation with applications to high dimensional data

Directory of Open Access Journals (Sweden)

Datta Susmita

2010-08-01

Full Text Available Abstract Background Generally speaking, different classifiers tend to work well for certain types of data and, conversely, it is usually not known a priori which algorithm will be optimal in any given classification application. In addition, for most classification problems, selecting the best-performing classification algorithm among a number of competing algorithms is difficult for various reasons. For example, the order of performance may depend on the performance measure employed for the comparison. In this work, we present a novel adaptive ensemble classifier constructed by combining bagging and rank aggregation that is capable of adaptively changing its performance depending on the type of data being classified. The attractive feature of the proposed classifier is its multi-objective nature, where the classification results can be simultaneously optimized with respect to several performance measures, for example, accuracy, sensitivity and specificity. We also show that our somewhat complex strategy has better predictive performance, as judged on test samples, than a more naive approach that attempts to directly identify the optimal classifier based on the training data performances of the individual classifiers. Results We illustrate the proposed method with two simulated and two real-data examples. In all cases, the ensemble classifier performs at the level of the best individual classifier comprising the ensemble or better. Conclusions For complex high-dimensional datasets resulting from present-day high-throughput experiments, it may be wise to consider a number of classification algorithms combined with dimension reduction techniques rather than a fixed standard algorithm set a priori.
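A minimal sketch of the rank-aggregation step: each performance measure ranks the candidate classifiers, and the rankings are merged into one consensus order. It swaps in a plain Borda count, which may differ from the aggregation method the authors use, and leaves out the bagging loop; classifier names and scores would come from the user.

```python
def ranking_from_scores(scores):
    """Order candidates best-first by a score where higher is better."""
    return sorted(scores, key=lambda c: (-scores[c], c))

def borda_aggregate(rankings):
    """Combine best-first rankings of the same candidates by Borda count."""
    candidates = sorted(rankings[0])
    n = len(candidates)
    points = {c: 0 for c in candidates}
    for ranking in rankings:
        if sorted(ranking) != candidates:
            raise ValueError("every ranking must order the same candidates")
        for position, c in enumerate(ranking):
            points[c] += n - 1 - position  # best gets n - 1 points, worst gets 0
    return sorted(candidates, key=lambda c: (-points[c], c))  # ties broken alphabetically
```

Aggregating ranks rather than raw scores sidesteps the fact that accuracy, sensitivity and specificity are not on comparable scales, which is what allows several measures to be optimized at once.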
Exploring Land Use and Land Cover of Geotagged Social-Sensing Images Using Naive Bayes Classifier

Directory of Open Access Journals (Sweden)

Asamaporn Sitthi

2016-09-01

Full Text Available Online social media crowdsourced photos contain a vast amount of visual information about the physical properties and characteristics of the earth’s surface. Flickr is an important online social media platform for users seeking this information. Each day, users generate crowdsourced geotagged digital imagery containing an immense amount of information. In this paper, geotagged Flickr images are used for automatic extraction of low-level land use/land cover (LULC features. The proposed method uses a naive Bayes classifier with color, shape, and color index descriptors. The classified images are mapped using a majority filtering approach. The classifier performance in overall accuracy, kappa coefficient, precision, recall, and f-measure was 87.94%, 82.89%, 88.20%, 87.90%, and 88%, respectively. Labeled-crowdsourced images were filtered into a spatial tile of a 30 m × 30 m resolution using the majority voting method to reduce geolocation uncertainty from the crowdsourced data. These tile datasets were used as training and validation samples to classify Landsat TM5 images. The supervised maximum likelihood method was used for the LULC classification. The results show that the geotagged Flickr images can classify LULC types with reasonable accuracy and that the proposed approach improves LULC classification efficiency if a sufficient spatial distribution of crowdsourced data exists.
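A minimal sketch of the majority-voting step that assigns each 30 m × 30 m tile the most frequent label among the geotagged photos falling inside it. It assumes photo coordinates are already projected to a metric CRS (for example UTM) in metres; Flickr access, feature extraction and the naive Bayes step are not shown.

```python
import math
from collections import Counter, defaultdict

def tile_majority(points, tile_size=30.0):
    """points: iterable of (x_m, y_m, label) in a projected metric CRS (assumed).

    Returns {(tile_col, tile_row): majority label}. Equal counts resolve to the
    label seen first in that tile, because Counter.most_common is stable.
    """
    tiles = defaultdict(Counter)
    for x, y, label in points:
        key = (math.floor(x / tile_size), math.floor(y / tile_size))
        tiles[key][label] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in tiles.items()}
```

Voting within a tile dampens individual mislocated or misclassified photos, which is the geolocation-uncertainty reduction the abstract describes; tiles with a single photo get no such protection.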
QuEChERS, a sample preparation technique that is “catching on”: an up-to-date interview with its inventors

Science.gov (United States)

The technique of QuEChERS (Quick, Easy, Cheap, Effective, Rugged and Safe) is only 7 years old, yet it is revolutionizing the manner in which multiresidue, multiclass pesticide analysis (and perhaps beyond) is performed. Columnist Ron Majors sits down with inventors Steve Lehotay and Michelangelo An...

Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech

Science.gov (United States)

Cao, Houwei; Verma, Ragini; Nenkova, Ani

2015-01-01

We introduce a ranking approach for emotion recognition which naturally incorporates information about the general expressivity of speakers. We demonstrate that our approach leads to substantial gains in accuracy compared to conventional approaches. We train ranking SVMs for individual emotions, treating the data from each speaker as a separate query, and combine the predictions from all rankers to perform multi-class prediction. The ranking method provides two natural benefits. It captures speaker-specific information even in speaker-independent training/testing conditions. It also incorporates the intuition that each utterance can express a mix of possible emotions and that considering the degree to which each emotion is expressed can be productively exploited to identify the dominant emotion. We compare the performance of the rankers and their combination to standard SVM classification approaches on two publicly available datasets of acted emotional speech, Berlin and LDC, as well as on spontaneous emotional data from the FAU Aibo dataset. On acted data, ranking approaches exhibit significantly better performance compared to SVM classification both in distinguishing a specific emotion from all others and in multi-class prediction. On the spontaneous data, which contains mostly neutral utterances with a relatively small portion of less intense emotional utterances, ranking-based classifiers again achieve much higher precision in identifying emotional utterances than conventional SVM classifiers. In addition, we discuss the complementarity of conventional SVM and ranking-based classifiers. On all three datasets we find dramatically higher accuracy for the test items on whose prediction the two methods agree compared to the accuracy of individual methods. Furthermore, on the spontaneous data the ranking and standard classification are complementary, and we obtain marked improvement when we combine the two classifiers by late-stage fusion.
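A minimal sketch of what "treating the data from each speaker as a separate query" means for pairwise ranking: preference pairs (target-emotion utterance preferred over another utterance) are formed only within the same speaker, so speaker-wide differences in expressivity cancel in the difference vectors. Training the linear ranker on these vectors (for example with a hinge loss) is not shown, and the sample layout and names are hypothetical.

```python
from itertools import product

def pairwise_examples(samples, emotion):
    """samples: list of (speaker, features, label).

    Returns difference vectors x_pos - x_neg for pairs of (target-emotion utterance,
    other utterance) from the SAME speaker; a ranker w should give w . d > 0.
    """
    by_speaker = {}
    for speaker, feats, label in samples:
        by_speaker.setdefault(speaker, []).append((feats, label))
    diffs = []
    for items in by_speaker.values():
        pos = [f for f, l in items if l == emotion]
        neg = [f for f, l in items if l != emotion]
        for p, n in product(pos, neg):
            diffs.append([a - b for a, b in zip(p, n)])
    return diffs

def predict_emotion(weights, feats):
    """weights: {emotion: w}; multi-class prediction picks the highest-scoring ranker."""
    return max(weights, key=lambda e: sum(w * x for w, x in zip(weights[e], feats)))
```

In this toy usage, speaker s2 is louder overall, but only within-speaker contrasts enter training, which is how speaker-specific information is captured even when test speakers are unseen.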
Comparison of Different Features and Classifiers for Driver Fatigue Detection Based on a Single EEG Channel

Directory of Open Access Journals (Sweden)

Jianfeng Hu

2017-01-01

Full Text Available Driver fatigue has become an important factor in traffic accidents worldwide, and effective detection of driver fatigue is of major significance for public health. The proposed method employs entropy measures for feature extraction from a single electroencephalogram (EEG) channel. Four types of entropy measures, sample entropy (SE), fuzzy entropy (FE), approximate entropy (AE), and spectral entropy (PE), were applied to the original EEG signal and compared using ten state-of-the-art classifiers. Results indicate that optimal single-channel performance is achieved using a combination of channel CP4, feature FE, and a Random Forest (RF) classifier. The highest accuracy reached 96.6%, which can meet the needs of real applications. The best combination of channel, features and classifier is subject-specific. In this work, the accuracy with FE as the feature is far greater than the accuracy with the other features. The accuracy of the RF classifier is the best, while that of the SVM with a linear kernel is the worst. Channel selection has a larger impact on accuracy, and performance varies considerably across channels.
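A minimal sketch of sample entropy (SE), one of the four features compared; fuzzy entropy follows the same template-matching scheme but replaces the hard tolerance test with a smooth exponential similarity. The defaults m = 2 and r = 0.2 × standard deviation are common conventions assumed here, not values taken from the paper.

```python
import math
import random
import statistics

def sample_entropy(signal, m=2, r=None):
    """SampEn(m, r) = -ln(A / B): B counts template pairs of length m within
    Chebyshev distance r, A the same for length m + 1; self-matches excluded."""
    if r is None:
        r = 0.2 * statistics.pstdev(signal)  # common convention (assumed)
    n = len(signal)

    def matching_pairs(length):
        # Same n - m starting points for both lengths so A and B are comparable
        templates = [signal[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # j > i excludes self-matches
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = matching_pairs(m)
    a = matching_pairs(m + 1)
    if a == 0 or b == 0:
        return math.inf  # undefined: no matching templates at this scale
    return -math.log(a / b)

if __name__ == "__main__":
    rng = random.Random(1)
    noise = [rng.gauss(0, 1) for _ in range(200)]
    print(sample_entropy(noise))       # irregular signal: clearly positive
    print(sample_entropy([0.0, 1.0] * 10))  # perfectly periodic: 0
```

Lower values mean more self-similar, predictable EEG; the pairwise loop is O(N²), which is acceptable for short single-channel windows.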
The edge-preservation multi-classifier relearning framework for the classification of high-resolution remotely sensed imagery

Science.gov (United States)

Han, Xiaopeng; Huang, Xin; Li, Jiayi; Li, Yansheng; Yang, Michael Ying; Gong, Jianya

2018-04-01

In recent years, the availability of high-resolution imagery has enabled more detailed observation of the Earth. However, it is imperative to simultaneously achieve accurate interpretation and preserve the spatial details for the classification of such high-resolution data. To this aim, we propose the edge-preservation multi-classifier relearning framework (EMRF). This multi-classifier framework is made up of support vector machine (SVM), random forest (RF), and sparse multinomial logistic regression via variable splitting and augmented Lagrangian (LORSAL) classifiers, considering their complementary characteristics. To better characterize complex scenes of remote sensing images, relearning based on landscape metrics is proposed, which iteratively quantizes both the landscape composition and spatial configuration by the use of the initial classification results. In addition, a novel tri-training strategy is proposed to solve the over-smoothing effect of relearning by means of automatic selection of training samples with low classification certainties, which always distribute in or near the edge areas. Finally, EMRF flexibly combines the strengths of relearning and tri-training via the classification certainties calculated by the probabilistic output of the respective classifiers. It should be noted that, in order to achieve an unbiased evaluation, we assessed the classification accuracy of the proposed framework using both edge and non-edge test samples. The experimental results obtained with four multispectral high-resolution images confirm the efficacy of the proposed framework, in terms of both edge and non-edge accuracy.
Carbon classified?

DEFF Research Database (Denmark)

Lippert, Ingmar

2012-01-01

[...] Using an actor-network theory (ANT) framework, the aim is to investigate the actors who bring together the elements needed to classify their carbon emission sources and to unpack the heterogeneous relations drawn on. Based on an ethnographic study of corporate agents of ecological modernisation over [...] a period of 13 months, this paper provides an exploration of three cases of enacting classification. Drawing on ANT, we problematise the silencing of a range of possible modalities of consumption facts and point to the ontological ethics involved in such performances. In a context of global warming...
Feature extraction using convolutional neural network for classifying breast density in mammographic images

Science.gov (United States)

Thomaz, Ricardo L.; Carneiro, Pedro C.; Patrocinio, Ana C.

2017-03-01

Breast cancer is the leading cause of death for women in most countries. The high levels of mortality relate mostly to late diagnosis and to the directly proportional relationship between breast density and breast cancer development. Therefore, the correct assessment of breast density is important to provide better screening for higher-risk patients. However, in modern digital mammography the discrimination among breast densities is highly complex due to increased contrast and visual information for all densities. Thus, a computational system for classifying breast density might be a useful tool for aiding medical staff. Several machine-learning algorithms are already capable of classifying a small number of classes with good accuracy. However, the main constraint of machine-learning algorithms relates to the set of features extracted and used for classification. Although well-known feature extraction techniques might provide a good set of features, selecting an initial set during the design of a classifier is a complex task. Thus, we propose feature extraction using a Convolutional Neural Network (CNN) for classifying breast density with a conventional machine-learning classifier. We used 307 mammographic images downsampled to 260x200 pixels to train a CNN and extract features from a deep layer. After training, the activations of 8 neurons from a deep fully connected layer are extracted and used as features. These features are then fed forward to a single-hidden-layer neural network that is cross-validated using 10 folds to classify among four classes of breast density. The global accuracy of this method is 98.4%, with only 1.6% misclassification. However, the small set of samples and memory constraints required the reuse of data in both the CNN and the MLP-NN; therefore, overfitting might have influenced the results even though we cross-validated the network. Thus, although we presented a promising method for extracting features and classifying breast density, a greater database is
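The abstract itself cautions that data were reused in both the CNN and the MLP-NN, so cross-validating only the final network may still be optimistic. A minimal sketch of 10-fold cross-validation in which the caller's `fit` callable must learn everything, including any feature extractor, from the training fold alone; `fit`, `predict` and the shuffling seed are hypothetical placeholders.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) for k roughly equal folds after shuffling (needs n_samples >= k)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def cross_validate(samples, labels, fit, predict, k=10, seed=0):
    """Mean accuracy over k folds. To avoid leakage, `fit` must train the feature
    extractor AND the classifier on the training fold only."""
    accs = []
    for train, test in k_fold_indices(len(samples), k, seed):
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test)
        accs.append(correct / len(test))
    return sum(accs) / len(accs)
```

For the 307 images here, the folds hold 30 or 31 images each; if the CNN is trained once on all 307 images before the folds are drawn, every test fold has already influenced the features, which is the overfitting risk the authors acknowledge.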
Local-global classifier fusion for screening chest radiographs

Science.gov (United States)

Ding, Meng; Antani, Sameer; Jaeger, Stefan; Xue, Zhiyun; Candemir, Sema; Kohli, Marc; Thoma, George

2017-03-01

Tuberculosis (TB) is a severe comorbidity of HIV and chest x-ray (CXR) analysis is a necessary step in screening for the infective disease. Automatic analysis of digital CXR images for detecting pulmonary abnormalities is critical for population screening, especially in medical resource constrained developing regions. In this article, we describe steps that improve previously reported performance of NLM's CXR screening algorithms and help advance the state of the art in the field. We propose a local-global classifier fusion method where two complementary classification systems are combined. The local classifier focuses on subtle and partial presentation of the disease leveraging information in radiology reports that roughly indicates locations of the abnormalities. In addition, the global classifier models the dominant spatial structure in the gestalt image using GIST descriptor for the semantic differentiation. Finally, the two complementary classifiers are combined using linear fusion, where the weight of each decision is calculated by the confidence probabilities from the two classifiers. We evaluated our method on three datasets in terms of the area under the Receiver Operating Characteristic (ROC) curve, sensitivity, specificity and accuracy. The evaluation demonstrates the superiority of our proposed local-global fusion method over any single classifier.
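A minimal sketch of confidence-weighted linear fusion of two classifier outputs. It assumes each classifier returns a probability of "abnormal" and that a decision's confidence is its distance from the 0.5 boundary; the abstract does not give the exact weighting formula, so this rule and the names are illustrative only.

```python
def confidence(p):
    """Assumed confidence of a binary probability: scaled distance from 0.5, in [0, 1]."""
    return abs(p - 0.5) * 2.0

def linear_fusion(p_local, p_global, eps=1e-12):
    """Weighted average of the two probabilities; the more confident classifier weighs more.

    eps keeps the weights defined (0.5 each) when both classifiers sit exactly at 0.5.
    """
    c_l, c_g = confidence(p_local), confidence(p_global)
    w_local = (c_l + eps) / (c_l + c_g + 2 * eps)
    return w_local * p_local + (1.0 - w_local) * p_global
```

With this rule, an undecided classifier (p = 0.5) contributes almost nothing, so a confident local detection of a subtle lesion is not diluted by an uninformative global score.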
Balanced sensitivity functions for tuning multi-dimensional Bayesian network classifiers

NARCIS (Netherlands)

Bolt, J.H.; van der Gaag, L.C.

Multi-dimensional Bayesian network classifiers are Bayesian networks of restricted topological structure, which are tailored to classifying data instances into multiple dimensions. Like more traditional classifiers, multi-dimensional classifiers are typically learned from data and may include
Assessment of fractures classified as non-mineralised in the Sicada database

Energy Technology Data Exchange (ETDEWEB)

Claesson Liljedahl, Lillemor; Munier, Raymond (Swedish Nuclear Fuel and Waste Management Co., Stockholm (Sweden)); Sandstroem, Bjoern (WSP Sverige AB, Goeteborg (Sweden)); Drake, Henrik (Isochron GeoConsulting, Varberg (Sweden)); Tullborg, Eva-Lena (Terralogica AB, Graabo (Sweden))

2011-03-15

The general objective of this report was to describe the results of the investigation of fractures classified as non-mineralised in Sicada. Such fractures exist at Forsmark and at Laxemar. The main aims of the investigation of these fractures were to: - Quantify the number of non-mineralised fractures (i.e. fractures lacking mineral coating) in Sicada (table: p_fract_core_eshi). - Closely examine a selection of fractures recorded as non-mineralised in Sicada. - Outline possible reasons for the existence of non-mineralised fractures. The work has involved extraction of fracture data from Sicada and subsequent statistical analysis. Since several thousand fractures are classified as non-mineralised in Sicada, it was not practically possible to include all of them in this study; instead, we examined one fracture subset from each site. We investigated a sample of 204 of these fractures in detail (see Sections 1.1 and 2.4). Rock mechanical differences between Forsmark and Laxemar and kinematic analysis of fracture surfaces are not discussed in this report.
Recognition of pornographic web pages by classifying texts and images.

Science.gov (United States)

Hu, Weiming; Wu, Ou; Chen, Zhouyao; Fu, Zhouyu; Maybank, Steve

2007-06-01

With the rapid development of the World Wide Web, people benefit more and more from the sharing of information. However, Web pages with obscene, harmful, or illegal content can be easily accessed. It is important to recognize such unsuitable, offensive, or pornographic Web pages. In this paper, a novel framework for recognizing pornographic Web pages is described. A C4.5 decision tree is used to divide Web pages, according to content representations, into continuous text pages, discrete text pages, and image pages. These three categories of Web pages are handled, respectively, by a continuous text classifier, a discrete text classifier, and an algorithm that fuses the results from the image classifier and the discrete text classifier. In the continuous text classifier, statistical and semantic features are used to recognize pornographic texts. In the discrete text classifier, the naive Bayes rule is used to calculate the probability that a discrete text is pornographic. In the image classifier, the object's contour-based features are extracted to recognize pornographic images. In the text and image fusion algorithm, Bayes' theorem is used to combine the recognition results from images and texts. Experimental results demonstrate that the continuous text classifier outperforms the traditional keyword-statistics-based classifier, the contour-based image classifier outperforms the traditional skin-region-based image classifier, the results obtained by our fusion algorithm outperform those of either individual classifier, and our framework can be adapted to different categories of Web pages.
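The abstract does not give the exact fusion formula. One standard way to combine two posteriors with Bayes' theorem, assuming the text and image evidence are conditionally independent given the class, is sketched below; `bayes_fuse` and its arguments are illustrative, and the class prior would have to be estimated from data.

```python
def bayes_fuse(p_text, p_image, prior):
    """Fuse P(c | text) and P(c | image) for a binary class c.

    Assumes text and image evidence are conditionally independent given c, so
    P(c | t, i) is proportional to P(c | t) * P(c | i) / P(c).
    """
    positive = p_text * p_image / prior
    negative = (1.0 - p_text) * (1.0 - p_image) / (1.0 - prior)
    return positive / (positive + negative)


fused = bayes_fuse(0.8, 0.7, prior=0.5)  # two moderately confident votes reinforce each other
```

If the image classifier is uninformative (its output equals the prior), the fused value reduces to the text posterior, which is a quick sanity check on the formula.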
Zooniverse: Combining Human and Machine Classifiers for the Big Survey Era

Science.gov (United States)

Fortson, Lucy; Wright, Darryl; Beck, Melanie; Lintott, Chris; Scarlata, Claudia; Dickinson, Hugh; Trouille, Laura; Willi, Marco; Laraia, Michael; Boyer, Amy; Veldhuis, Marten; Zooniverse

2018-01-01

Many analyses of astronomical data sets, ranging from morphological classification of galaxies to identification of supernova candidates, have relied on humans to classify data into distinct categories. Crowdsourced galaxy classifications via the Galaxy Zoo project provided a solution that scaled visual classification for extant surveys by harnessing the combined power of thousands of volunteers. However, the much larger data sets anticipated from upcoming surveys will require a different approach. Automated classifiers using supervised machine learning have improved considerably over the past decade, but their increasing sophistication comes at the expense of needing ever more training data. Crowdsourced classification by human volunteers is a critical technique for obtaining these training data. But several improvements can be made on this zeroth-order solution. Efficiency gains can be achieved by implementing a “cascade filtering” approach whereby the task structure is reduced to a set of binary questions that are more suited to simpler machines while demanding lower cognitive loads for humans. Intelligent subject retirement based on quantitative metrics of volunteer skill and subject label reliability also leads to dramatic improvements in efficiency. We note that human and machine classifiers may retire subjects differently, leading to trade-offs in performance space. Drawing on work with several Zooniverse projects including Galaxy Zoo and Supernova Hunter, we will present recent findings from experiments that combine cohorts of human and machine classifiers. We show that the most efficient system results when appropriate subsets of the data are intelligently assigned to each group according to their particular capabilities. With sufficient online training, simple machines can quickly classify “easy” subjects, leaving more difficult (and discovery-oriented) tasks for volunteers. We also find humans achieve higher classification purity while samples
Using Neural Networks to Classify Digitized Images of Galaxies

Science.gov (United States)

Goderya, S. N.; McGuire, P. C.

2000-12-01

Automated classification of galaxies into Hubble types is of paramount importance for studying the large-scale structure of the Universe, particularly as survey projects like the Sloan Digital Sky Survey complete their data acquisition of one million galaxies. At present, robust and efficient galaxy classifiers based on artificial intelligence are not available. In this study we summarize progress made in the development of automated galaxy classifiers using neural networks as machine learning tools. We explore the Bayesian linear algorithm, the higher-order probabilistic network, the multilayer perceptron neural network, and the Support Vector Machine classifier. The performance of any machine classifier depends on the quality of the parameters that characterize the different groups of galaxies. Our effort is to develop geometric and invariant-moment-based parameters as input to the machine classifiers instead of the raw pixel data. Such an approach considerably reduces the dimensionality of the classifier input, removes the effects of scaling and rotation, and makes it easier to solve for the unknown parameters in the galaxy classifier. To judge the quality of training and classification, we develop the concept of Matthews coefficients for the galaxy classification community. Matthews coefficients are single numbers that quantify classifier performance even when the classes have unequal prior probabilities.
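For reference, the binary form of the Matthews correlation coefficient can be computed from confusion-matrix counts as below. Multiclass generalizations exist, and the abstract does not say which variant the study uses; returning 0 when a marginal total is zero is a common convention assumed here.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Binary Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report 0 when any row or column of the confusion matrix is empty.
    return numerator / denominator if denominator else 0.0


# Skewed example: 95 of 100 samples correct, yet the MCC is only about 0.69.
skewed = matthews_corrcoef(tp=90, tn=5, fp=5, fn=0)
```

The skewed example shows why such a coefficient is useful when class priors differ: accuracy alone (0.95 here) hides the fact that half of the minority class is misclassified.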
Classifier fusion for VoIP attacks classification

Science.gov (United States)

Safarik, Jakub; Rezac, Filip

2017-05-01

SIP is one of the most successful protocols in the field of IP telephony. It establishes and manages VoIP calls. As the number of SIP implementations rises, we can expect a growing number of attacks on the communication system in the near future. This work aims at classifying malicious SIP traffic. Various machine learning algorithms have been developed for attack classification. The paper compares current research and applies a classifier fusion method that can potentially decrease the classification error rate. Combining classifiers yields a more robust solution that avoids difficulties that may affect single algorithms. Different voting schemes, combination rules, and classifiers are discussed to improve overall performance. All classifiers were trained on real malicious traffic. The traffic monitoring concept relies on a network of honeypot nodes. These honeypots run in several networks spread across different locations. Separating the honeypots allows us to gain independent and trustworthy attack information.
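Majority voting is the simplest of the combination rules mentioned above. A minimal sketch follows; the attack labels are hypothetical, and breaking ties in favour of the label that appears first is an arbitrary choice made here, not one taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by most classifiers; ties go to the label listed first."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Three hypothetical base classifiers voting on one SIP message.
decision = majority_vote(["scan", "flood", "scan"])
```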
Fast Most Similar Neighbor (MSN) classifiers for Mixed Data

OpenAIRE

Hernández Rodríguez, Selene

2010-01-01

The k nearest neighbor (k-NN) classifier has been extensively used in Pattern Recognition because of its simplicity and its good performance. However, in applications with large datasets, the exhaustive k-NN classifier becomes impractical. Therefore, many fast k-NN classifiers have been developed; most of them rely on metric properties (usually the triangle inequality) to reduce the number of prototype comparisons. Hence, the existing fast k-NN classifiers are applicable only when the comparison f...
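To illustrate the metric-based speed-up that the abstract says existing fast k-NN methods rely on, here is a minimal 1-NN search that prunes prototypes using the triangle inequality around a single pivot. It assumes Euclidean distance on numeric vectors, which is exactly the kind of metric assumption that does not hold for the mixed data this study targets; the class and method names are hypothetical.

```python
import math


class PivotNN:
    """1-NN search that skips prototypes using the triangle inequality around one pivot."""

    def __init__(self, prototypes, labels, pivot_index=0):
        self.prototypes = prototypes
        self.labels = labels
        self.pivot = prototypes[pivot_index]
        # Distances from every prototype to the pivot are computed once, offline.
        self.pivot_dist = [math.dist(p, self.pivot) for p in prototypes]

    def query(self, q):
        """Return (label, distance, number of prototype comparisons made)."""
        d_q = math.dist(q, self.pivot)
        # |d(q, pivot) - d(p, pivot)| <= d(q, p), so each bound never overestimates d(q, p).
        bounds = [abs(d_q - d_p) for d_p in self.pivot_dist]
        order = sorted(range(len(self.prototypes)), key=lambda i: bounds[i])
        best_i, best_d, comparisons = None, math.inf, 0
        for i in order:
            if bounds[i] >= best_d:
                break  # remaining bounds are at least this large, so none can be closer
            d = math.dist(q, self.prototypes[i])
            comparisons += 1
            if d < best_d:
                best_i, best_d = i, d
        return self.labels[best_i], best_d, comparisons
```

Visiting prototypes in order of their lower bound lets the search stop as soon as the bound exceeds the best distance found so far, which is where the saving in prototype comparisons comes from.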
Feature extraction for dynamic integration of classifiers

NARCIS (Netherlands)

Pechenizkiy, M.; Tsymbal, A.; Puuronen, S.; Patterson, D.W.

2007-01-01

Recent research has shown the integration of multiple classifiers to be one of the most important directions in machine learning and data mining. In this paper, we present an algorithm for the dynamic integration of classifiers in the space of extracted features (FEDIC). It is based on the technique
Classifying Returns as Extreme

DEFF Research Database (Denmark)

Christiansen, Charlotte

2014-01-01

I consider extreme returns for the stock and bond markets of 14 EU countries using two classification schemes: One, the univariate classification scheme from the previous literature that classifies extreme returns for each market separately, and two, a novel multivariate classification scheme tha...
The Protection of Classified Information: The Legal Framework

National Research Council Canada - National Science Library

Elsea, Jennifer K

2006-01-01

Recent incidents involving leaks of classified information have heightened interest in the legal framework that governs security classification, access to classified information, and penalties for improper disclosure...
Energy-Efficient Neuromorphic Classifiers.

Science.gov (United States)

Martí, Daniel; Rigotti, Mattia; Seok, Mingoo; Fusi, Stefano

2016-10-01

Neuromorphic engineering combines the architectural and computational principles of systems neuroscience with semiconductor electronics, with the aim of building efficient and compact devices that mimic the synaptic and neural machinery of the brain. The energy consumption promised by neuromorphic engineering is extremely low, comparable to that of the nervous system. Until now, however, the neuromorphic approach has been restricted to relatively simple circuits and specialized functions, thereby obscuring a direct comparison of its energy consumption with that of conventional von Neumann digital machines solving real-world tasks. Here we show that a recent technology developed by IBM can be leveraged to realize neuromorphic circuits that operate as classifiers of complex real-world stimuli. Specifically, we provide a set of general prescriptions to enable the practical implementation of neural architectures that compete with state-of-the-art classifiers. We also show that the energy consumption of these architectures, realized on the IBM chip, is typically two or more orders of magnitude lower than that of conventional digital machines implementing classifiers with comparable performance. Moreover, the spike-based dynamics display a trade-off between integration time and accuracy, which naturally translates into algorithms that can be flexibly deployed for either fast and approximate classifications, or more accurate classifications at the mere expense of longer running times and higher energy costs. This work finally proves that the neuromorphic approach can be efficiently used in real-world applications and has significant advantages over conventional digital devices when energy consumption is considered.
Automated Detection of Driver Fatigue Based on AdaBoost Classifier with EEG Signals

Directory of Open Access Journals (Sweden)

Jianfeng Hu

2017-08-01

Purpose: Driving fatigue has become one of the important causes of road accidents, and many studies have analyzed driver fatigue. EEG is becoming increasingly useful for measuring the fatigue state. Manual interpretation of EEG signals is impossible, so an effective method for the automatic detection of fatigue from EEG signals is crucially needed. Method: To evaluate the complex, unstable, and non-linear characteristics of EEG signals, four feature sets were computed from the EEG signals, namely fuzzy entropy (FE), sample entropy (SE), approximate entropy (AE), and spectral entropy (PE), together with combined entropies (FE + SE + AE + PE). All these feature sets were used as input vectors to an AdaBoost classifier, a fast and highly accurate boosting method. To assess the method, several experiments, including parameter setting and classifier comparison, were conducted on 28 subjects. For comparison, Decision Tree (DT), Support Vector Machine (SVM), and Naive Bayes (NB) classifiers were used. Results: The proposed method (combination of FE and AdaBoost) performs better than the other schemes. Using the FE feature extractor, AdaBoost achieves an area under the receiver operating characteristic curve (AUC) of 0.994, an error rate (ERR) of 0.024, a precision of 0.969, a recall of 0.984, an F1 score of 0.976, and a Matthews correlation coefficient (MCC) of 0.952, compared to SVM (ERR of 0.035, precision of 0.957, recall of 0.974, F1 score of 0.966, and MCC of 0.930, with AUC of 0.990), DT (ERR of 0.142, precision of 0.857, recall of 0.859, F1 score of 0.966, and MCC of 0.716, with AUC of 0.916), and NB (ERR of 0.405, precision of 0.646, recall of 0.434, F1 score of 0.519, and MCC of 0.203, with AUC of 0.606). This shows that the FE feature set and the combined feature set outperform the other feature sets. AdaBoost appears to be more robust to changes in the ratio of test samples to all samples and in the number of subjects, which might therefore aid in the real-time detection of driver
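Sample entropy, one of the feature sets above, measures how often patterns of length m that match within a tolerance r still match when extended to length m + 1. A plain-Python sketch follows; m = 2 and r = 0.2 are common defaults assumed here rather than the study's settings, and r is treated as an absolute tolerance.

```python
import math


def sample_entropy(series, m=2, r=0.2):
    """SampEn(m, r): -ln(A / B) with Chebyshev distance and self-matches excluded.

    B counts pairs of length-m templates within tolerance r, A counts pairs of
    length-(m + 1) templates; r is an absolute tolerance in signal units.
    """
    n = len(series)

    def count_matches(length):
        # Both lengths use the same n - m starting points so A and B are comparable.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # j > i excludes self-matches
                if max(abs(x - y) for x, y in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # no (m + 1)-length matches: treat SampEn as infinite
    return -math.log(a / b)
```

Regular signals such as a constant or strictly periodic series score 0, while less predictable segments score higher; in practice r is often scaled by the segment's standard deviation.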
Assessment of a 44 gene classifier for the evaluation of chronic fatigue syndrome from peripheral blood mononuclear cell gene expression.

Directory of Open Access Journals (Sweden)

Daniel Frampton

Chronic fatigue syndrome (CFS) is a clinically defined illness estimated to affect millions of people worldwide, causing significant morbidity and an annual cost of billions of dollars. Currently there are no laboratory-based diagnostic methods for CFS. However, differences in gene expression profiles between CFS patients and healthy persons have been reported in the literature. Using mRNA relative quantities for 44 previously identified reporter genes taken from a large dataset comprising both CFS patients and healthy volunteers, we derived a gene profile scoring metric to accurately classify CFS and healthy samples. This metric outperformed any of the reporter genes used individually as a classifier of CFS. To determine whether the reporter genes were robust across populations, we applied this metric to classify a separate blind dataset of mRNA relative quantities from a new population of CFS patients and healthy persons, with limited success. Although the metric was able to correctly classify roughly two-thirds of both CFS and healthy samples, the level of misclassification was high. We conclude that many of the previously identified reporter genes are study-specific and thus cannot be used as a broad CFS diagnostic.
Fault diagnosis in spur gears based on genetic algorithm and random forest

Science.gov (United States)

Cerrada, Mariela; Zurita, Grover; Cabrera, Diego; Sánchez, René-Vinicio; Artés, Mariano; Li, Chuan

2016-03-01

There are growing demands for condition-based monitoring of gearboxes, and therefore new methods to improve the reliability, effectiveness, and accuracy of gear fault detection ought to be evaluated. Feature selection is still an important aspect of machine learning-based diagnosis for reaching good performance of the diagnostic models. Moreover, random forest classifiers are suitable models in industrial environments, where large data samples are not usually available for training such diagnostic models. The main aim of this research is to build a robust system for multi-class fault diagnosis in spur gears by selecting the best set of condition parameters in the time, frequency, and time-frequency domains, extracted from vibration signals. The diagnostic system is built using genetic algorithms and a random forest classifier in a supervised setting. Using genetic algorithms, the original set of condition parameters is reduced by around 66% relative to its initial size while still achieving an acceptable classification precision above 97%. The approach is tested on real vibration signals covering several fault classes, one of them an incipient fault, under different running conditions of load and velocity.
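The genetic-algorithm step can be pictured as a search over binary feature masks. The sketch below uses truncation selection, one-point crossover, and bit-flip mutation; these operators, all parameter values, and `toy_fitness` are assumptions for illustration. In the study, the fitness would come from the random forest classifier's performance on the selected condition parameters, which is not reproduced here.

```python
import random


def ga_select(n_features, fitness, pop_size=20, generations=30, mutation=0.05, seed=0):
    """Evolve boolean feature masks; `fitness(mask)` returns a score (higher is better)."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]  # truncation selection keeps the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < mutation else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


# Hypothetical stand-in for "random-forest performance on the selected parameters":
# rewards three informative features and penalises every selected feature slightly.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask):
    return sum(1 for i in INFORMATIVE if mask[i]) - 0.1 * sum(mask)


best_mask = ga_select(12, toy_fitness)
selected = [i for i, keep in enumerate(best_mask) if keep]
```

Because the fitter half survives unchanged into each new generation, the best score never decreases from one generation to the next; the size penalty in the toy fitness plays the role of the drive toward a smaller parameter set.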

Discriminative illumination: per-pixel classification of raw materials based on optimal projections of spectral BRDF.

Science.gov (United States)

Liu, Chao; Gu, Jinwei

2014-01-01

Classifying raw, unpainted materials (metal, plastic, ceramic, fabric, and so on) is an important yet challenging task for computer vision. Previous works measure subsets of surface spectral reflectance as features for classification. However, acquiring the full spectral reflectance is time-consuming and error-prone. In this paper, we propose to use coded illumination to directly measure discriminative features for material classification. Optimal illumination patterns, which we call "discriminative illumination", are learned from training samples such that, after projection onto them, the spectral reflectances of different materials are maximally separated. This projection is realized automatically by the integration of incident light during surface reflection. While a single discriminative illumination is capable of linear, two-class classification, we show that multiple discriminative illuminations can be used for nonlinear and multiclass classification. We also show theoretically that the proposed method has a higher signal-to-noise ratio than previous methods due to light multiplexing. Finally, we construct an LED-based multispectral dome and use the discriminative illumination method to classify a variety of raw materials, including metal (aluminum, alloy, steel, stainless steel, brass, and copper), plastic, ceramic, fabric, and wood. Experimental results demonstrate its effectiveness.
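In the abstract, the learned projection is carried out optically: the sensor integrates the light reflected under a coded illumination, which amounts to a weighted sum over spectral bands. The sketch below assumes a discretized spectrum and shows how a signed projection (weights of either sign) could be obtained from two non-negative illuminations and then thresholded for a two-class decision. The weights, threshold, and labels are hypothetical, and the paper's procedure for learning the patterns is not reproduced.

```python
def measure(illumination, reflectance):
    """Sensor reading under a coded illumination: the reflected light integrated
    over spectral bands, i.e. sum of illumination[k] * reflectance[k]."""
    return sum(l * r for l, r in zip(illumination, reflectance))


def signed_projection(weights, reflectance):
    """Realize a signed projection with two non-negative illuminations, one for the
    positive part of the weights and one for the negative part."""
    positive = [max(w, 0.0) for w in weights]
    negative = [max(-w, 0.0) for w in weights]
    return measure(positive, reflectance) - measure(negative, reflectance)


def classify(weights, threshold, reflectance):
    """Linear two-class decision on the optically computed projection."""
    return "class_a" if signed_projection(weights, reflectance) > threshold else "class_b"
```

The two captures are needed only because a light source cannot emit negative intensity; mathematically, their difference equals the ordinary dot product of the weights and the reflectance spectrum.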
Effect of sample size on multi-parametric prediction of tissue outcome in acute ischemic stroke using a random forest classifier

Science.gov (United States)

Forkert, Nils Daniel; Fiehler, Jens

2015-03-01

Tissue outcome prediction in acute ischemic stroke patients is highly relevant for clinical and research purposes. It has been shown that the combined analysis of diffusion and perfusion MRI datasets using high-level machine learning techniques leads to an improved prediction of final infarction compared to single perfusion parameter thresholding. However, most high-level classifiers require prior training, and until now it has been unclear how many subjects are required for this, which is the focus of this work. Twenty-three MRI datasets of acute stroke patients with known tissue outcome were used in this work. Relative values of diffusion and perfusion parameters as well as the binary tissue outcome were extracted on a voxel-by-voxel level for all patients and used to train a random forest classifier. The number of patients used to define the training set was iteratively and randomly reduced from all 22 other patients to only one other patient. Thus, 22 tissue outcome predictions were generated for each patient using the trained random forest classifiers and compared to the known tissue outcome using the Dice coefficient. Overall, a logarithmic relation between the number of patients used to define the training set and tissue outcome prediction accuracy was found. Quantitatively, a mean Dice coefficient of 0.45 was found for predictions using a training set consisting of the voxel information from only one other patient, which increases to 0.53 when all other patients (n=22) are used. Based on extrapolation, 50-100 patients appear to be a reasonable trade-off between tissue outcome prediction accuracy and the effort required for data acquisition and preparation.
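The Dice coefficient used to compare predicted and known tissue outcome is 2|P ∩ T| / (|P| + |T|) for binary masks P and T. A minimal version on flattened voxel masks follows; returning 1.0 when both masks are empty is a convention assumed here, not taken from the study.

```python
def dice(pred, truth):
    """Dice coefficient between two binary voxel masks given as equal-length 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks agree perfectly.
    return 2.0 * intersection / total if total else 1.0


overlap = dice([1, 1, 0, 0], [1, 0, 1, 0])  # one shared voxel out of two + two
```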
MSEBAG: a dynamic classifier ensemble generation based on `minimum-sufficient ensemble' and bagging

Science.gov (United States)

Chen, Lei; Kamel, Mohamed S.

2016-01-01

In this paper, we propose a dynamic classifier system, MSEBAG, which is characterised by searching for the 'minimum-sufficient ensemble' and bagging at the ensemble level. It adopts an 'over-generation and selection' strategy and aims to achieve a good bias-variance trade-off. In the training phase, MSEBAG first searches for the 'minimum-sufficient ensemble', which maximises the in-sample fitness with the minimal number of base classifiers. Then, starting from the 'minimum-sufficient ensemble', a backward stepwise algorithm is employed to generate a collection of ensembles. The objective is to create a collection of ensembles with a descending fitness on the data, as well as a descending complexity in the structure. MSEBAG dynamically selects the ensembles from the collection for the decision aggregation. The extended adaptive aggregation (EAA) approach, a bagging-style algorithm performed at the ensemble level, is employed for this task. EAA searches for the competent ensembles using a score function, which takes into consideration both the in-sample fitness and the confidence of the statistical inference, and averages the decisions of the selected ensembles to label the test pattern. The experimental results show that the proposed MSEBAG outperforms the benchmarks on average.
Highly informative multiclass profiling of lipids by ultra-high performance liquid chromatography - Low resolution (quadrupole) mass spectrometry by using electrospray ionization and atmospheric pressure chemical ionization interfaces.

Science.gov (United States)

Beccaria, Marco; Inferrera, Veronica; Rigano, Francesca; Gorynski, Krzysztof; Purcaro, Giorgia; Pawliszyn, Janusz; Dugo, Paola; Mondello, Luigi

2017-08-04

A simple, fast, and versatile method, using an ultra-high performance liquid chromatography system coupled with a low-resolution (single quadrupole) mass spectrometer, was optimized to perform multiclass lipid profiling of human plasma. Particular attention was paid to developing a method suitable for both electrospray ionization and atmospheric pressure chemical ionization interfaces (sequentially in positive- and negative-ion mode), without any modification of the chromatographic conditions (mobile phase, flow rate, gradient, etc.). Emphasis was given to extracting structural information from the fragmentation patterns obtained using the atmospheric pressure chemical ionization interface under each ionization condition, highlighting the complementary information obtained using the electrospray ionization interface, which supports identification of the related molecular ions. Furthermore, mass spectra of phosphatidylserine and phosphatidylinositol obtained using the atmospheric pressure chemical ionization interface are reported and discussed for the first time. Copyright © 2017 Elsevier B.V. All rights reserved.
Wearable Sensor Data Classification for Human Activity Recognition Based on an Iterative Learning Framework

Directory of Open Access Journals (Sweden)

Juan Carlos Davila

2017-06-01

The design of multiple human activity recognition applications in areas such as healthcare, sports, and safety relies on wearable sensor technologies. However, when decisions are made based on the data acquired by such sensors in practical situations, several factors, such as sensor data alignment, data losses, and noise, among other experimental constraints, deteriorate data quality and model accuracy. To tackle these issues, this paper presents a data-driven iterative learning framework to classify human locomotion activities such as walk, stand, lie, and sit, extracted from the Opportunity dataset. Data acquired by twelve 3-axial acceleration sensors and seven inertial measurement units are initially de-noised using a two-stage consecutive filtering approach combining a band-pass Finite Impulse Response (FIR) filter and a wavelet filter. A series of statistical parameters are extracted from the kinematic features, including the principal components and singular value decomposition of roll, pitch, yaw, and the norm of the axial components. The novel iterative learning procedure is then applied to minimize the number of samples required to classify human locomotion activities. Only the samples most distant from the centroids of data clusters, according to a measure presented in the paper, are selected as candidates for the training dataset. The newly built dataset is then used to train an SVM multi-class classifier, which produces the lowest prediction error. The proposed learning framework ensures a high level of robustness to variations in the quality of input data while using far fewer training samples and therefore a much shorter training time, which is an important consideration given the large size of the dataset.
Wearable Sensor Data Classification for Human Activity Recognition Based on an Iterative Learning Framework.

Science.gov (United States)

Davila, Juan Carlos; Cretu, Ana-Maria; Zaremba, Marek

2017-06-07

The design of multiple human activity recognition applications in areas such as healthcare, sports, and safety relies on wearable sensor technologies. However, when decisions are made based on the data acquired by such sensors in practical situations, several factors, such as sensor data alignment, data losses, and noise, among other experimental constraints, deteriorate data quality and model accuracy. To tackle these issues, this paper presents a data-driven iterative learning framework to classify human locomotion activities such as walk, stand, lie, and sit, extracted from the Opportunity dataset. Data acquired by twelve 3-axial acceleration sensors and seven inertial measurement units are initially de-noised using a two-stage consecutive filtering approach combining a band-pass Finite Impulse Response (FIR) filter and a wavelet filter. A series of statistical parameters are extracted from the kinematic features, including the principal components and singular value decomposition of roll, pitch, yaw, and the norm of the axial components. The novel iterative learning procedure is then applied to minimize the number of samples required to classify human locomotion activities. Only the samples most distant from the centroids of data clusters, according to a measure presented in the paper, are selected as candidates for the training dataset. The newly built dataset is then used to train an SVM multi-class classifier, which produces the lowest prediction error. The proposed learning framework ensures a high level of robustness to variations in the quality of input data while using far fewer training samples and therefore a much shorter training time, which is an important consideration given the large size of the dataset.
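The sample-selection step keeps only the samples farthest from their cluster centroids, using a distance measure defined in the paper. As an assumption-laden sketch, the version below uses Euclidean distance and takes cluster memberships as given (for example, from k-means); `select_far_samples` and `per_cluster` are hypothetical names.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def select_far_samples(clusters, per_cluster):
    """From each cluster, keep the `per_cluster` samples farthest from its centroid."""
    selected = []
    for points in clusters:
        c = centroid(points)
        ranked = sorted(points, key=lambda p: math.dist(p, c), reverse=True)
        selected.extend(ranked[:per_cluster])
    return selected


clusters = [[(0, 0), (1, 0), (0, 1), (5, 5)], [(10, 10), (11, 10)]]
training_candidates = select_far_samples(clusters, per_cluster=1)
```

Samples far from a centroid tend to lie near cluster boundaries, which is where an SVM's support vectors come from; that is one plausible reason such a reduced set can train a multi-class SVM with little loss in accuracy.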
Effective Heart Disease Detection Based on Quantitative Computerized Traditional Chinese Medicine Using Representation Based Classifiers

Directory of Open Access Journals (Sweden)

Ting Shu

2017-01-01

At present, heart disease is the number one cause of death worldwide. Traditionally, heart disease is detected using blood tests, electrocardiograms, cardiac computerized tomography scans, cardiac magnetic resonance imaging, and so on. However, these traditional diagnostic methods are time-consuming and/or invasive. In this paper, we propose an effective noninvasive computerized method based on facial images to quantitatively detect heart disease. Specifically, facial key block color features are extracted from facial images and analyzed using the Probabilistic Collaborative Representation Based Classifier. The idea of facial key block color analysis is founded in Traditional Chinese Medicine. A new dataset consisting of 581 heart disease samples and 581 healthy samples was used to evaluate the proposed method. To optimize the Probabilistic Collaborative Representation Based Classifier, its parameters were analyzed. According to the experimental results, the proposed method obtains the highest accuracy compared with other classifiers and is shown to be effective at heart disease detection.
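The abstract names the Probabilistic Collaborative Representation Based Classifier (ProCRC) without giving details. As a simpler relative, the sketch below implements plain collaborative representation classification (CRC) in pure Python. It is not ProCRC; the regularization weight `lam` is an arbitrary assumption, and the facial color features would first have to be extracted into numeric vectors.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def crc_classify(train, labels, query, lam=0.01):
    """Plain collaborative-representation classification (not ProCRC).

    The query is coded over all training vectors with ridge regression,
    alpha = (X^T X + lam I)^-1 X^T y, and assigned to the class whose
    coefficients reconstruct it with the smallest residual.
    """
    n = len(train)
    gram = [[dot(train[i], train[j]) + (lam if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    alpha = solve(gram, [dot(t, query) for t in train])
    best_label, best_residual = None, math.inf
    for label in sorted(set(labels)):
        members = [i for i in range(n) if labels[i] == label]
        recon = [sum(alpha[i] * train[i][d] for i in members) for d in range(len(query))]
        residual = math.dist(query, recon)
        if residual < best_residual:
            best_label, best_residual = label, residual
    return best_label
```

Coding the query over all classes at once, rather than fitting each class separately, is what makes the representation "collaborative"; ProCRC builds a probabilistic interpretation on top of this idea.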
Verification of classified fissile material using unclassified attributes

International Nuclear Information System (INIS)

Nicholas, N.J.; Fearey, B.L.; Puckett, J.M.; Tape, J.W.

1998-01-01

This paper reports on the most recent efforts of US technical experts to explore verification by the IAEA of unclassified attributes of classified excess fissile material. Two propositions are discussed: (1) that multiple unclassified attributes could be declared by the host nation and then verified (and reverified) by the IAEA in order to provide confidence in that declaration of a classified (or unclassified) inventory while protecting classified or sensitive information; and (2) that attributes could be measured, remeasured, or monitored to provide continuity of knowledge in a nonintrusive and unclassified manner. The authors believe attributes should relate to characteristics of excess weapons materials and should be verifiable and authenticatable with methods usable by IAEA inspectors. Further, attributes (along with the methods to measure them) must not reveal any classified information. The approach that the authors have taken is as follows: (1) assume certain attributes of classified excess material, (2) identify passive signatures, (3) determine the range of applicable measurement physics, (4) develop a set of criteria to assess and select measurement technologies, (5) select existing instrumentation for proof-of-principle measurements and demonstration, and (6) develop and design information barriers to protect classified information. While the attribute verification concepts and measurements discussed in this paper appear promising, neither the attribute verification approach nor the measurement technologies have been fully developed, tested, and evaluated.
Reinforcement Learning Based Artificial Immune Classifier

Directory of Open Access Journals (Sweden)

Mehmet Karakose

2013-01-01

Artificial immune systems are one of the widely used methods for classification, which is a decision-making process. Artificial immune systems based on the natural immune system can be successfully applied to classification, optimization, recognition, and learning in real-world problems. In this study, a reinforcement learning based artificial immune classifier is proposed as a new approach. This approach uses reinforcement learning to find better antibodies with immune operators. Compared with other methods in the literature, the proposed approach offers several advantages, such as effectiveness, fewer memory cells, high accuracy, speed, and data adaptability. The performance of the proposed approach is demonstrated by simulation and experimental results using real data in Matlab and on an FPGA. Some benchmark data and remote image data are used for the experiments. Comparative results with supervised/unsupervised artificial immune systems, a negative selection classifier, and a resource-limited artificial immune classifier are given to demonstrate the effectiveness of the proposed method.
Development of a Multi-class Steroid Hormone Screening Method using Liquid Chromatography/Tandem Mass Spectrometry (LC-MS/MS)

Science.gov (United States)

Boggs, Ashley S. P.; Bowden, John A.; Galligan, Thomas M.; Guillette, Louis J.; Kucklick, John R.

2016-01-01

Monitoring complex endocrine pathways is often limited by indirect measurement or by measurement of a single hormone class per analysis. There is a burgeoning need for specific direct-detection methods capable of simultaneously measuring biologically relevant concentrations of multiple classes of hormones (estrogens, androgens, progestogens, and corticosteroids). The objectives of this study were to develop a liquid chromatography-tandem mass spectrometry (LC-MS/MS) method for multi-class steroid hormone detection at biologically relevant concentrations, and then to test limits of detection (LOD) in a high-background matrix by spiking charcoal-stripped fetal bovine serum (FBS) extract. Accuracy was tested with National Institute of Standards and Technology Standard Reference Materials (SRMs) with certified concentrations of cortisol, testosterone, and progesterone. 11-Deoxycorticosterone, 11-deoxycortisol, 17-hydroxypregnenolone, 17-hydroxyprogesterone, adrenosterone, androstenedione, cortisol, corticosterone, dehydroepiandrosterone, dihydrotestosterone, estradiol, estriol, estrone, equilin, pregnenolone, progesterone, and testosterone were also measured using isotopic dilution. Dansyl chloride (DC) derivatization was investigated, within the same method, to improve and expedite estrogen analysis. Biologically relevant LODs were determined for 15 hormones. DC derivatization improved estrogen response two- to eight-fold and improved chromatographic separation. All measurements were within 14 % of certified values (not accounting for uncertainty), with relative standard deviations ≤ 14 %. This method chromatographically separated and quantified biologically relevant concentrations of four hormone classes using highly specific fragmentation patterns, and measured certified values of hormones that previously required three separate chromatographic methods. PMID:27039201
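The two figures of merit quoted above, percent difference from a certified value (ignoring uncertainty) and relative standard deviation, can be computed as in the sketch below. It assumes the usual definitions (difference divided by the certified value; sample standard deviation divided by the mean) and uses made-up replicate values, not data from the study.

```python
from statistics import mean, stdev


def percent_difference(measured: float, certified: float) -> float:
    """Absolute difference from the certified value, as a percentage of it."""
    return abs(measured - certified) / certified * 100.0


def relative_std_dev(replicates: list[float]) -> float:
    """Sample standard deviation of the replicates divided by their mean, in percent."""
    return stdev(replicates) / mean(replicates) * 100.0


# Hypothetical replicate results for one hormone in one SRM (not data from the study)
replicates = [10.2, 11.0, 10.6]
certified = 10.0
m = mean(replicates)
diff = percent_difference(m, certified)
rsd = relative_std_dev(replicates)
print(f"difference = {diff:.1f} %, RSD = {rsd:.1f} %, passes: {diff <= 14 and rsd <= 14}")
```

Both checks are applied per hormone and per reference material; the 14 % threshold in the last line simply mirrors the bound reported in the abstract.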
Intelligent Garbage Classifier

Directory of Open Access Journals (Sweden)

Ignacio Rodríguez Novelle

2008-12-01

Full Text Available IGC (Intelligent Garbage Classifier) is a system for visual classification and separation of solid waste products. Currently, an important part of the separation effort is based on manual work, from household separation to industrial waste management. Taking advantage of currently available technologies, a system has been built that can analyze images from a camera and control a robot arm and conveyor belt to automatically separate different kinds of waste.
Correlation Dimension-Based Classifier

Czech Academy of Sciences Publication Activity Database

Jiřina, Marcel; Jiřina jr., M.

2014-01-01

Vol. 44, No. 12 (2014), pp. 2253-2263 ISSN 2168-2267 R&D Projects: GA MŠk(CZ) LG12020 Institutional support: RVO:67985807 Keywords: classifier * multidimensional data * correlation dimension * scaling exponent * polynomial expansion Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 3.469, year: 2014
An ensemble classifier to predict track geometry degradation

International Nuclear Information System (INIS)

Cárdenas-Gallo, Iván; Sarmiento, Carlos A.; Morales, Gilberto A.; Bolivar, Manuel A.; Akhavan-Tabatabaei, Raha

2017-01-01

Railway operations are inherently complex and a source of many problems. In particular, track geometry defects are one of the leading causes of train accidents in the United States. This paper presents a solution approach that constructs an ensemble classifier to forecast the degradation of track geometry. The classifier is built by approaching the problem from three perspectives: deterioration, regression, and classification. We considered a different model for each perspective, and our results show that using an ensemble method improves predictive performance. - Highlights: • We present an ensemble classifier to forecast the degradation of track geometry. • Our classifier considers three perspectives: deterioration, regression and classification. • We construct and test three models and our results show that using an ensemble method improves the predictive performance.
Methods of generalizing and classifying layer structures of a special form

Energy Technology Data Exchange (ETDEWEB)

Viktorova, N P

1981-09-01

An examination is made of the problem of classifying structures represented by weighted multilayer graphs of a special form, with connections between the vertices of each layer. Classification of such structures is based on constructing resolving sets of graphs by generalizing the elements of each class's training sample, and then testing whether an input object is isomorphic (with allowance for the weights) to the structures of the resolving set. 4 references.
Data characteristics that determine classifier performance

CSIR Research Space (South Africa)

Van der Walt, Christiaan M

2006-11-01

Full Text Available available at [11]. The kNN uses a LinearNN nearest neighbour search algorithm with a Euclidean distance metric [8]. The optimal k value is determined by performing 10-fold cross-validation. An optimal k value between 1 and 10 is used for Experiments 1... classifiers. 10-fold cross-validation is used to evaluate and compare the performance of the classifiers on the different data sets. 3.1. Artificial data generation Multivariate Gaussian distributions are used to generate artificial data sets. We use d...
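The kNN set-up described in this snippet (exhaustive "linear" neighbour search, Euclidean distance, k chosen from 1 to 10 by 10-fold cross-validation) can be sketched in plain Python. This is a minimal illustration, not the toolkit used in the paper; the function names, shuffling seed, and two-Gaussian toy data are assumptions.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, x, k):
    # Linear (brute-force) search: rank every training point by distance to x
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cv_accuracy(X, y, k, folds=10, seed=0):
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test = idx[f::folds]  # every point lands in exactly one test fold
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        correct += sum(knn_predict(tX, ty, X[i], k) == y[i] for i in test)
    return correct / len(X)


def select_k(X, y, k_values=range(1, 11), folds=10):
    # Highest cross-validated accuracy wins; ties go to the smaller k
    return max(k_values, key=lambda k: (cv_accuracy(X, y, k, folds), -k))


# Toy data drawn from two multivariate Gaussians (hypothetical parameters)
rng = random.Random(1)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)] + \
    [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(50)]
y = [0] * 50 + [1] * 50
best_k = select_k(X, y)
print(best_k, cv_accuracy(X, y, best_k))
```

Sorting all distances is O(n log n) per query, which is acceptable for small data sets; a k-d tree or ball tree would be the usual replacement at scale.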
Pixel Classification of SAR ice images using ANFIS-PSO Classifier

Directory of Open Access Journals (Sweden)

G. Vasumathi

2016-12-01

Full Text Available Synthetic Aperture Radar (SAR) plays a vital role in acquiring extremely high-resolution radar images. It is widely used to monitor ice-covered ocean regions. Sea monitoring is important for various purposes, including global climate systems and ship navigation. Classifying ice-infested areas yields important features that are useful for further monitoring around the ice regions. The main objective of this paper is to classify SAR ice images to help identify the regions around ice-infested areas. Three stages are considered in classifying SAR ice images. The first is preprocessing, in which the speckled SAR ice images are denoised using various speckle removal filters; these filters are compared to find the best one for speckle removal. The second stage is segmentation, in which different regions are segmented using the K-means and watershed segmentation algorithms; the two algorithms are compared to find the better one for segmenting SAR ice images. The last stage is pixel-based classification, which identifies and classifies the segmented regions using various supervised learning classifiers. These include back propagation neural networks (BPN), a fuzzy classifier, an adaptive neuro-fuzzy inference system (ANFIS) classifier, and the proposed ANFIS with particle swarm optimization (PSO) classifier; all of these classifiers are compared to determine which is best suited for classifying SAR ice images. Various evaluation metrics are computed separately at each of the three stages.
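For the segmentation stage, the K-means option can be illustrated by clustering pixel grey levels. The sketch below is a simplified, assumed version: it clusters intensities only (no spatial information, no speckle filtering), uses evenly spaced initial centroids, and runs on a tiny made-up image rather than real SAR data.

```python
def kmeans_1d(values, k, iters=50):
    """Cluster scalar intensities into k groups; returns (labels, centroids)."""
    lo, hi = min(values), max(values)
    # Start with centroids spread evenly across the intensity range
    centroids = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest centroid for each pixel
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: each centroid moves to the mean of its members
        new = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new.append(sum(members) / len(members) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


# Hypothetical 3x4 image: dark open water (~20) and bright ice (~200)
image = [[18, 22, 205, 198],
         [25, 19, 210, 201],
         [21, 200, 195, 207]]
flat = [p for row in image for p in row]
labels, centroids = kmeans_1d(flat, k=2)
width = len(image[0])
segmented = [labels[r * width:(r + 1) * width] for r in range(len(image))]
print(segmented, centroids)
```

On real SAR imagery the speckle would normally be filtered first (the paper's first stage), since raw speckle spreads pixel intensities and blurs the cluster boundaries.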
Disassembly and Sanitization of Classified Matter

International Nuclear Information System (INIS)

Stockham, Dwight J.; Saad, Max P.

2008-01-01

The Disassembly Sanitization Operation (DSO) process was implemented to support weapon disassembly and disposition through recycling and waste minimization measures. The process was initiated by treaty agreements and reconfigurations within both the DOD and DOE Complexes. The DOE is faced with disassembling and disposing of a huge inventory of retired weapons, components, training equipment, spare parts, weapon maintenance equipment, and associated material. In addition, regulations have dramatically increased the information required to support the handling and disposition of these parts and materials. In the past, huge inventories of classified weapon components required long-term storage at Sandia and at many other locations throughout the DOE Complex. These materials are placed in onsite storage units because of classification issues, and they may also contain radiological and/or hazardous components. Since no disposal options existed for this material, the only choice was long-term storage. Long-term storage is costly and somewhat problematic: it requires a secured storage area, monitoring, and auditing, and it presents the potential for loss or theft of the material. The DSO process has enabled 70 to 80% of the components sent through it to be recycled. These components are made of high-quality materials, and once they have been sanitized, demand for the component metals for recycling is very high. The DSO process for NGPF, classified components established the credibility of this technique for addressing the long-term storage requirements of the classified weapons component inventory. The success of this application has generated interest from other Sandia organizations and other locations throughout the complex. Other organizations are requesting the help of the DSO team, and the DSO is responding to these requests by expanding its scope to include Work-for-Others projects. For example
Wavelet Packet Entropy in Speaker-Independent Emotional State Detection from Speech Signal

Directory of Open Access Journals (Sweden)

Mina Kadkhodaei Elyaderani

2015-01-01

Full Text Available In this paper, wavelet packet entropy is proposed for speaker-independent emotion detection from speech. After pre-processing, a level-4 wavelet packet decomposition using the db3 wavelet is computed, and the Shannon entropy of its nodes is calculated for use as a feature. In addition to MFCC features, prosodic features such as the first four formants, jitter (pitch deviation amplitude), and shimmer (energy variation amplitude) are used to complete the feature vector. A Support Vector Machine (SVM) is then used to classify the vectors in a multi-class (all emotions) or two-class (each emotion versus the normal state) format. Forty-six different utterances of a single sentence are selected from the Berlin Emotional Speech Dataset. These are uttered by 10 speakers in sadness, happiness, fear, boredom, anger, and normal emotional states. Experimental results show that the proposed features can improve emotional state detection accuracy in the multi-class setting. Furthermore, adding wavelet entropy coefficients to the other features increases the accuracy of two-class detection for anger, fear, and happiness.
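A level-4 wavelet packet decomposition yields 2^4 = 16 terminal nodes, and one entropy value per node gives a 16-element feature vector. The sketch below substitutes the Haar wavelet for db3 so that it stays dependency-free (PyWavelets offers a `WaveletPacket` class that supports db3), and it assumes one common per-node entropy definition based on normalized coefficient energies; the paper's exact normalization is not stated in the abstract.

```python
import math


def haar_split(x):
    """One Haar analysis step: (approximation, detail), each half the length of x."""
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wavelet_packet(x, level):
    # Unlike the plain DWT, both approximation and detail are split at every level
    nodes = [list(x)]
    for _ in range(level):
        nodes = [part for node in nodes for part in haar_split(node)]
    return nodes  # 2**level terminal nodes


def shannon_entropy(coeffs):
    """Entropy (bits) of the node's normalized coefficient energies."""
    energy = sum(c * c for c in coeffs)
    if energy == 0:
        return 0.0
    probs = [c * c / energy for c in coeffs if c != 0]
    return -sum(p * math.log2(p) for p in probs)


def wpe_features(signal, level=4):
    return [shannon_entropy(node) for node in wavelet_packet(signal, level)]


# Hypothetical pre-processed speech frame: a 5-cycle sine over 256 samples
frame = [math.sin(2 * math.pi * 5 * t / 256) for t in range(256)]
features = wpe_features(frame)
print(len(features), [round(f, 2) for f in features[:4]])
```

In the described system these 16 values would be concatenated with the formant, jitter, shimmer, and MFCC features before SVM classification.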
Detecting and classifying method based on similarity matching of Android malware behavior with profile.

Science.gov (United States)

Jang, Jae-Wook; Yun, Jaesung; Mohaisen, Aziz; Woo, Jiyoung; Kim, Huy Kang

2016-01-01

Mass-market mobile security threats have increased recently due to the growth of mobile technologies and the popularity of mobile devices. Accordingly, techniques have been introduced for identifying, classifying, and defending against mobile threats using static, dynamic, on-device, and off-device techniques. Static techniques are easy to evade, while dynamic techniques are expensive. On-device techniques are susceptible to evasion, while off-device techniques must always be online. To address some of these shortcomings, we introduce Andro-profiler, a hybrid behavior-based analysis and classification system for mobile malware. Andro-profiler's main goals are efficiency, scalability, and accuracy. To that end, Andro-profiler classifies malware by exploiting behavior profiles extracted from integrated system logs, including system calls. Andro-profiler executes a malicious application on an emulator to generate the integrated system logs, and creates human-readable behavior profiles by analyzing those logs. By comparing the behavior profile of a malicious application with the representative behavior profile of each malware family using a weighted similarity matching technique, Andro-profiler detects the application and classifies it into a malware family. The experimental results demonstrate that Andro-profiler is scalable, detects and classifies malware with accuracy greater than 98 %, outperforms the existing state-of-the-art work, and is capable of identifying 0-day mobile malware samples.
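Weighted similarity matching against per-family representative profiles can be sketched as follows. The abstract does not give Andro-profiler's similarity measure, so this sketch assumes per-category Jaccard similarity combined with fixed category weights, plus a hypothetical threshold below which a sample is left unassigned. The category names, weights, and family profiles are all invented for illustration.

```python
def jaccard(a: set, b: set) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def weighted_similarity(profile, representative, weights):
    """Weighted mean of per-category Jaccard similarities (weights need not sum to 1)."""
    total = sum(weights.values())
    return sum(w * jaccard(profile.get(c, set()), representative.get(c, set()))
               for c, w in weights.items()) / total


def classify_profile(profile, family_profiles, weights, threshold=0.5):
    scores = {fam: weighted_similarity(profile, rep, weights)
              for fam, rep in family_profiles.items()}
    best = max(scores, key=scores.get)
    # Below the (hypothetical) threshold the sample matches no known family
    return (best if scores[best] >= threshold else None), scores


# Hypothetical behavior categories, weights, and family representatives
weights = {"syscall": 0.5, "network": 0.3, "sms": 0.2}
families = {
    "FamilyA": {"syscall": {"open", "write", "connect"}, "network": {"10.0.0.1"}, "sms": {"premium"}},
    "FamilyB": {"syscall": {"fork", "execve"}, "network": set(), "sms": set()},
}
sample = {"syscall": {"open", "write", "connect", "read"}, "network": {"10.0.0.1"}, "sms": {"premium"}}
label, scores = classify_profile(sample, families, weights)
print(label, scores)
```

Returning `None` for low-scoring samples is one simple way to surface candidates for new (for example 0-day) families instead of forcing them into an existing one.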
Comparison of Various Approaches to Multi-Channel Information Fusion in C-OTDR Systems for Remote Monitoring of Extended Objects

Directory of Open Access Journals (Sweden)

A. V. Timofeev

2015-01-01

Full Text Available The paper presents new results concerning the selection of an optimal information fusion formula for ensembles of C-OTDR channels, where C-OTDR denotes a coherent optical time-domain reflectometer. Each of these channels provides data for a corresponding automatic classifier designed to classify elastic vibration sources in the multiclass case. These classifiers form a so-called classifier ensemble. Ensembles of Lipschitz classifiers were considered. In this case, the goal of information fusion is to create an integral classifier designed for effective classification of seismoacoustic target events. The Matching Pursuit Optimization Ensemble Classifiers (MPOEC), Linear Programming Boosting (LP-Boost, in its LP-β and LP-B variants), Multiple Kernel Learning (MKL), and Weighing of Inversely as Lipschitz Constants (WILC) approaches were compared. WILC is a brand-new approach to optimal fusion of Lipschitz classifier ensembles. The basics of these methods are briefly described, along with their intrinsic features. All of these methods reduce the task of choosing convex hull parameters to the solution of an optimization problem. All of the approaches mentioned can be used successfully for C-OTDR system data processing. Results of practical usage are presented.
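The abstract names WILC but does not define it. Reading the name literally, the sketch below assumes each channel classifier's weight is inversely proportional to its Lipschitz constant, normalized so the weights form a convex combination (consistent with the "convex hull parameters" mentioned above). This is an interpretation for illustration, not the paper's formula; the constants and scores are hypothetical.

```python
def wilc_weights(lipschitz_constants):
    """Assumed WILC rule: weight_i proportional to 1 / L_i, normalized to sum to 1."""
    inv = [1.0 / L for L in lipschitz_constants]
    total = sum(inv)
    return [v / total for v in inv]  # non-negative and summing to 1: a convex combination


def fuse(scores_per_classifier, lipschitz_constants):
    """Convex combination of per-channel class scores; returns (fused scores, class index)."""
    w = wilc_weights(lipschitz_constants)
    n_classes = len(scores_per_classifier[0])
    fused = [sum(w[i] * s[c] for i, s in enumerate(scores_per_classifier))
             for c in range(n_classes)]
    return fused, max(range(n_classes), key=fused.__getitem__)


# Hypothetical scores from three C-OTDR channel classifiers over three event classes
scores = [[0.7, 0.2, 0.1],
          [0.1, 0.8, 0.1],
          [0.1, 0.8, 0.1]]
L = [1.0, 2.0, 4.0]  # smaller constant = smoother classifier = larger weight
fused, decision = fuse(scores, L)
print(wilc_weights(L), fused, decision)
```

Here the smoothest classifier receives weight 4/7, yet the two other channels together still tip the fused decision to class 1.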

Assessment of fractures classified as non-mineralised in the Sicada database

International Nuclear Information System (INIS)

Claesson Liljedahl, Lillemor; Munier, Raymond; Sandstroem, Bjoern; Drake, Henrik; Tullborg, Eva-Lena

2011-03-01

The general objective of this report was to describe the results of the investigation of fractures classified as non-mineralised in Sicada. Such fractures exist at Forsmark and at Laxemar. The main aims of the investigation of these fractures were to: - Quantify the number of non-mineralised fractures (i.e. fractures lacking mineral coating) in Sicada (table: p_fract_core_eshi). - Closely examine a selection of fractures recorded as non-mineralised in Sicada. - Outline possible reasons for the existence of non-mineralised fractures. The work has involved extraction of fracture data from Sicada and subsequent statistical analysis. Because several thousand fractures are classified as non-mineralised in Sicada, it was not practical to include all of them in this study; instead, we examined one fracture subset from each site. We investigated a sample of 204 of these fractures in detail (see Sections 1.1 and 2.4). Rock mechanical differences between Forsmark and Laxemar and kinematic analysis of fracture surfaces are not discussed in this report.
Classifying seismic waveforms from scratch: a case study in the alpine environment

Science.gov (United States)

Hammer, C.; Ohrnberger, M.; Fäh, D.

2013-01-01

Nowadays, an increasing amount of seismic data is collected by daily observatory routines. The basic step for successfully analyzing these data is the correct detection of various event types. However, visual scanning is a time-consuming task. Applying standard detection techniques such as the STA/LTA trigger still requires manual control for classification. Here, we present a useful alternative. The incoming data stream is scanned automatically for events of interest. A stochastic classifier, called a hidden Markov model, is learned for each class of interest, enabling the recognition of highly variable waveforms. In contrast to other automatic techniques such as neural networks or support vector machines, the algorithm allows classification to start from scratch as soon as interesting events are identified. Neither the tedious process of collecting training samples nor a time-consuming configuration of the classifier is required. An approach originally introduced for the volcanic task force action allows classifier properties to be learned from a single waveform example and a few hours of background recording. Besides reducing the required workload, this also enables the detection of very rare events. The latter feature in particular is a milestone for the use of seismic devices in alpine warning systems. Furthermore, the system offers the opportunity to flag new signal classes that have not been defined before. We demonstrate the application of the classification system using a data set from the Swiss Seismological Survey, achieving very high recognition rates. We document all refinements of the classifier in detail, providing a step-by-step guide for the fast setup of a well-working classification system.
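With one hidden Markov model per class, a new waveform is assigned to the class whose model gives it the highest likelihood. The sketch below shows only that recognition step, using the scaled forward algorithm; it assumes a discrete observation alphabet (for example, quantized amplitude levels) and hand-set model parameters, and omits training (such as Baum-Welch). The two models and their parameters are hypothetical.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-output HMM."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                     for s in range(n)]
        scale = sum(alpha)  # rescaling each step avoids numerical underflow
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_p


def classify_sequence(obs, models):
    # Pick the class whose HMM explains the observation sequence best
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Hypothetical 2-state models over symbols 0 = quiet, 1 = strong amplitude.
# Each model is (start probabilities, transition matrix, emission matrix).
models = {
    "event": ([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [[0.1, 0.9], [0.8, 0.2]]),
    "background": ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.9, 0.1], [0.9, 0.1]]),
}
print(classify_sequence([1, 1, 1, 0, 0, 0], models))
print(classify_sequence([0, 0, 0, 0, 0, 0], models))
```

The "event" model encodes a burst state that can only decay into a quiet state, a crude stand-in for an onset followed by a coda; a rejection threshold on the best log-likelihood would be one way to flag waveforms that fit no defined class.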
Automatic classification of endogenous seismic sources within a landslide body using random forest algorithm

Science.gov (United States)

Provost, Floriane; Hibert, Clément; Malet, Jean-Philippe; Stumpf, André; Doubre, Cécile

2016-04-01

Different studies have shown the presence of microseismic activity in soft-rock landslides. The seismic signals exhibit significantly different features in the time and frequency domains, which allow their classification and interpretation. Most of the classes could be associated with different mechanisms of deformation occurring within and at the surface (e.g. rockfall, slide-quake, fissure opening, fluid circulation). However, some signals are still not fully understood, and some classes contain too few examples to allow any interpretation. To move toward a more complete interpretation of the links between the dynamics of soft-rock landslides and the physical processes controlling their behaviour, a complete catalog of the endogenous seismicity is needed. We propose a multi-class detection method based on the random forests algorithm to automatically classify the source of seismic signals. Random forests is a supervised machine learning technique based on the computation of a large number of decision trees. The multiple decision trees are constructed from training sets that include each of the target classes, with each example described by a set of attributes. In the case of seismic signals, these attributes may encompass spectral features but also waveform characteristics, multi-station observations, and other relevant information. The Random Forest classifier is used because it provides state-of-the-art performance compared with other machine learning techniques (e.g. SVM, Neural Networks) and requires no fine tuning. Furthermore, it is relatively fast, robust, easy to parallelize, and inherently suitable for multi-class problems. In this work, we present the first results of the classification method applied to the seismicity recorded at the Super-Sauze landslide between 2013 and 2015. We selected a dozen seismic signal features that precisely characterize its spectral content (e.g. central frequency, spectrum width, energy in several frequency bands, spectrogram shape, spectrum local and global maxima
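Spectral attributes such as central frequency, spectrum width, and band energies are computed per signal and then form the feature rows fed to the random forest (for example scikit-learn's `RandomForestClassifier`, not shown here). The sketch assumes common definitions: central frequency as the power-weighted mean frequency, width as the power-weighted standard deviation, and band energy as the fraction of total power in each band. The band edges, sampling rate, and test signal are hypothetical.

```python
import cmath
import math


def power_spectrum(x, fs):
    """One-sided power spectrum via a direct DFT (O(n^2); fine for short windows)."""
    n = len(x)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        X = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        power.append(abs(X) ** 2)
    return freqs, power


def spectral_features(x, fs, bands):
    f, p = power_spectrum(x, fs)
    total = sum(p)
    centroid = sum(fk * pk for fk, pk in zip(f, p)) / total
    width = math.sqrt(sum((fk - centroid) ** 2 * pk for fk, pk in zip(f, p)) / total)
    band_energy = [sum(pk for fk, pk in zip(f, p) if lo <= fk < hi) / total
                   for lo, hi in bands]
    return {"central_frequency": centroid, "spectrum_width": width,
            "band_energy": band_energy}


# Hypothetical 1 s window sampled at 100 Hz containing a pure 10 Hz oscillation
fs = 100
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
feats = spectral_features(signal, fs, bands=[(0, 5), (5, 15), (15, 51)])
print(feats)
```

For long records an FFT (for example `numpy.fft.rfft`) would replace the direct DFT; the feature definitions stay the same.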
Classifying the Optical Morphology of Shocked POststarburst Galaxies

Science.gov (United States)

Stewart, Tess; SPOGs Team

2018-01-01

The Shocked POststarburst Galaxy Survey (SPOGS) is a sample of galaxies in transition from blue, star forming spirals to red, inactive ellipticals. These galaxies are earlier in the transition than classical poststarburst samples. We have classified the physical characteristics of the full sample of 1067 SPOGs in 7 categories, covering (1) their shape; (2) the relative prominence of their nuclei; (3) the uniformity of their optical color; (4) whether the outskirts of the galaxy were indicative of on-going star formation; (5) whether they are engaged in interactions with other galaxies, and if so, (6) the kinds of galaxies with which they are interacting; and (7) the presence of asymmetrical features, possibly indicative of recent interactions. We determined that a plurality of SPOGs are in elliptical galaxies, indicating morphological transformations may tend to conclude before other indicators of transitions have faded. Further, early-type SPOGs also tend to have the brightest optical nuclei. Most galaxies do not show signs of current or recent interactions. We used these classifications to search for correlations between qualitative and quantitative characteristics of SPOGs using Sloan Digital Sky Survey and Wide-field Infrared Survey Explorer magnitudes. We find that relative optical nuclear brightness is not a good indicator of the presence of an active galactic nucleus and that galaxies with visible indications of active star formation also cluster in optical color and diagnostic line ratios.
Classifying low-grade and high-grade bladder cancer using label-free serum surface-enhanced Raman spectroscopy and support vector machine

Science.gov (United States)

Zhang, Yanjiao; Lai, Xiaoping; Zeng, Qiuyao; Li, Linfang; Lin, Lin; Li, Shaoxin; Liu, Zhiming; Su, Chengkang; Qi, Minni; Guo, Zhouyi

2018-03-01

This study aims to classify low-grade and high-grade bladder cancer (BC) patients using serum surface-enhanced Raman scattering (SERS) spectra and support vector machine (SVM) algorithms. Serum SERS spectra are acquired from 88 serum samples with silver nanoparticles as the SERS-active substrate. Diagnostic accuracies of 96.4% and 95.4% are obtained when differentiating the serum SERS spectra of all BC patients versus normal subjects and low-grade versus high-grade BC patients, respectively, with optimal SVM classifier models. This study demonstrates that the serum SERS technique combined with SVM has great potential to noninvasively detect and classify high-grade and low-grade BC patients.
32 CFR 2400.30 - Reproduction of classified information.

Science.gov (United States)

2010-07-01

... 32 National Defense 6 2010-07-01 2010-07-01 false Reproduction of classified information. 2400.30... SECURITY PROGRAM Safeguarding § 2400.30 Reproduction of classified information. Documents or portions of... the originator or higher authority. Any stated prohibition against reproduction shall be strictly...
A Novel Wearable Device for Food Intake and Physical Activity Recognition

Directory of Open Access Journals (Sweden)

Muhammad Farooq

2016-07-01

Full Text Available The presence of speech and motion artifacts has been shown to affect the performance of wearable sensor systems used for automatic detection of food intake. This work presents a novel wearable device which can detect food intake even when the user is physically active and/or talking. The device consists of a piezoelectric strain sensor placed on the temporalis muscle, an accelerometer, and a data acquisition module connected to the temple of eyeglasses. Data from 10 participants were collected while they performed activities including quiet sitting, talking, eating while sitting, eating while walking, and walking. Piezoelectric strain sensor and accelerometer signals were divided into non-overlapping epochs of 3 s; four features were computed for each signal. To differentiate between eating and not eating, as well as between sedentary postures and physical activity, two multiclass classification approaches are presented. The first approach used a single classifier with sensor fusion and the second approach used two-stage classification. The best results were achieved when two separate linear support vector machine (SVM) classifiers were trained for food intake and activity detection, and their results were combined using a decision tree (two-stage classification) to determine the final class. This approach resulted in an average F1-score of 99.85% and area under the curve (AUC) of 0.99 for multiclass classification. With its ability to differentiate between food intake and activity level, this device may potentially be used for tracking both energy intake and energy expenditure.
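The two-stage scheme can be sketched as epoching followed by two independent binary decisions merged by a small decision tree. In this sketch the sampling rate, feature names, final class labels, and the threshold functions standing in for the trained linear SVMs are all hypothetical; only the 3 s non-overlapping epochs and the two-stage structure come from the abstract.

```python
def epochs(signal, fs, seconds=3.0):
    """Split a sampled signal into non-overlapping epochs; an incomplete tail is dropped."""
    size = int(fs * seconds)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def combine(eating: bool, active: bool) -> str:
    """Second stage: a tiny decision tree over the two binary classifier outputs."""
    if eating:
        return "eating while active" if active else "eating while sedentary"
    return "active, not eating" if active else "sedentary, not eating"


def classify_epoch(features, eating_clf, activity_clf):
    # Each first-stage classifier (a trained linear SVM in the paper) returns True/False
    return combine(eating_clf(features), activity_clf(features))


# Hypothetical stand-ins for the trained classifiers, using made-up feature names
def eating_clf(f):
    return f["chew_energy"] > 0.5


def activity_clf(f):
    return f["accel_variance"] > 0.2


fs = 100  # assumed sampling rate in Hz
strain = [0.0] * 1000  # 10 s of placeholder samples
print(len(epochs(strain, fs)))  # 3 full epochs; the last 1 s is dropped
print(classify_epoch({"chew_energy": 0.8, "accel_variance": 0.05}, eating_clf, activity_clf))
```

Keeping the two detectors separate lets each be trained on the sensor that carries its signal (strain for chewing, accelerometer for activity), which is one plausible reason the two-stage approach performed best.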
Robust Combining of Disparate Classifiers Through Order Statistics

Science.gov (United States)

Tumer, Kagan; Ghosh, Joydeep

2001-01-01

Integrating the outputs of multiple classifiers via combiners or meta-learners has led to substantial improvements in several difficult pattern recognition problems. In this article we investigate a family of combiners based on order statistics, for robust handling of situations where there are large discrepancies in performance of individual classifiers. Based on a mathematical modeling of how the decision boundaries are affected by order statistic combiners, we derive expressions for the reductions in error expected when simple output combination methods based on the median, the maximum and, in general, the i-th order statistic, are used. Furthermore, we analyze the trim and spread combiners, both based on linear combinations of the ordered classifier outputs, and show that in the presence of uneven classifier performance, they often provide substantial gains over both linear and simple order statistics combiners. Experimental results on both real world data and standard public domain data sets corroborate these findings.
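The simple order statistic combiners act per class on the sorted classifier outputs. The sketch below implements the i-th order statistic, maximum, and median combiners, and reads the trim combiner as a trimmed mean (one linear combination of ordered outputs); the spread combiner is omitted because the abstract does not define it. The five classifier outputs, including one badly performing classifier, are invented.

```python
from statistics import median


def order_statistic_combiner(outputs, i):
    """outputs[m][c] = classifier m's estimate for class c; keep the i-th smallest (1-based)."""
    n_classes = len(outputs[0])
    return [sorted(o[c] for o in outputs)[i - 1] for c in range(n_classes)]


def max_combiner(outputs):
    return order_statistic_combiner(outputs, len(outputs))


def median_combiner(outputs):
    return [median(o[c] for o in outputs) for c in range(len(outputs[0]))]


def trim_combiner(outputs, trim=1):
    """Trimmed mean: drop the `trim` smallest and largest outputs per class, then average."""
    result = []
    for c in range(len(outputs[0])):
        vals = sorted(o[c] for o in outputs)[trim:len(outputs) - trim]
        result.append(sum(vals) / len(vals))
    return result


def decide(combined):
    return max(range(len(combined)), key=combined.__getitem__)


# Five classifiers on a two-class problem; the last one performs very poorly
outputs = [[0.8, 0.2], [0.7, 0.3], [0.75, 0.25], [0.6, 0.4], [0.0, 1.0]]
for name, comb in [("max", max_combiner(outputs)), ("median", median_combiner(outputs)),
                   ("trim", trim_combiner(outputs))]:
    print(name, comb, decide(comb))
```

In this toy case the single outlier flips the max combiner to class 1, while the median and trimmed-mean combiners still choose class 0, which is the kind of robustness to uneven classifier performance the article analyzes.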
Knowledge Uncertainty and Composed Classifier

Czech Academy of Sciences Publication Activity Database

Klimešová, Dana; Ocelíková, E.

2007-01-01

Vol. 1, No. 2 (2007), pp. 101-105 ISSN 1998-0140 Institutional research plan: CEZ:AV0Z10750506 Keywords: Boosting architecture * contextual modelling * composed classifier * knowledge management * knowledge * uncertainty Subject RIV: IN - Informatics, Computer Science
Representative Vector Machines: A Unified Framework for Classical Classifiers.

Science.gov (United States)

Gui, Jie; Liu, Tongliang; Tao, Dacheng; Sun, Zhenan; Tan, Tieniu

2016-08-01

Classifier design is a fundamental problem in pattern recognition. A variety of pattern classification methods such as the nearest neighbor (NN) classifier, support vector machine (SVM), and sparse representation-based classification (SRC) have been proposed in the literature. These typical and widely used classifiers were originally developed from different theoretical or application motivations, and they are conventionally treated as independent and specific solutions for pattern classification. This paper proposes a novel pattern classification framework, namely, representative vector machines (or RVMs for short). The basic idea of RVMs is to assign the class label of a test example according to its nearest representative vector. The contributions of RVMs are twofold. On one hand, the proposed RVMs establish a unified framework of classical classifiers because NN, SVM, and SRC can be interpreted as special cases of RVMs with different definitions of representative vectors. Thus, the underlying relationship among a number of classical classifiers is revealed for better understanding of pattern classification. On the other hand, the RVM framework inspires novel and advanced classifiers. For example, a robust pattern classification method called discriminant vector machine (DVM) is motivated by RVMs. Given a test example, DVM first finds its k-NNs and then performs classification based on the robust M-estimator and manifold regularization. Extensive experimental evaluations on a variety of visual recognition tasks such as face recognition (Yale and face recognition grand challenge databases), object categorization (Caltech-101 dataset), and action recognition (Action Similarity LAbeliNg) demonstrate the advantages of DVM over other classifiers.
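The core RVM rule, labeling a test example by its nearest representative vector, can be sketched with interchangeable representative sets. Using every training sample as a representative recovers the NN classifier, as the abstract states. The class-mean variant is only our own illustration of a different representative definition (it is not one of the paper's SVM or SRC cases). The toy points are hypothetical.

```python
import math


def nearest_representative(x, representatives):
    """representatives: list of (vector, label); return the label of the closest vector."""
    _, label = min(representatives, key=lambda r: math.dist(x, r[0]))
    return label


def nn_representatives(X, y):
    # NN classifier: every training sample is its own representative vector
    return list(zip(X, y))


def class_mean_representatives(X, y):
    # Illustrative alternative: one representative (the centroid) per class
    reps = []
    for label in sorted(set(y)):
        members = [x for x, lab in zip(X, y) if lab == label]
        centroid = tuple(sum(col) / len(members) for col in zip(*members))
        reps.append((centroid, label))
    return reps


# Hypothetical training data where the two definitions disagree on the query
X = [(0, 0), (10, 0), (6, 0)]
y = ["a", "a", "b"]
query = (9, 0)
print(nearest_representative(query, nn_representatives(X, y)))          # nearest sample (10, 0)
print(nearest_representative(query, class_mean_representatives(X, y)))  # nearest centroid (6, 0)
```

Only the representative set changes between the two calls; the decision rule is identical, which is the sense in which RVMs unify these classifiers.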
41 CFR 105-62.102 - Authority to originally classify.

Science.gov (United States)

2010-07-01

... originally classify. (a) Top secret, secret, and confidential. The authority to originally classify information as Top Secret, Secret, or Confidential may be exercised only by the Administrator and is delegable...
Automatic discrimination between safe and unsafe swallowing using a reputation-based classifier

Directory of Open Access Journals (Sweden)

Nikjoo Mohammad S

2011-11-01

Full Text Available Abstract Background Swallowing accelerometry has been suggested as a potential non-invasive tool for bedside dysphagia screening. Various vibratory signal features and complementary measurement modalities have been put forth in the literature for the potential discrimination between safe and unsafe swallowing. To date, automatic classification of swallowing accelerometry has exclusively involved a single axis of vibration, although a second axis is known to contain additional information about the nature of the swallow. Furthermore, the only published attempt at automatic classification in adult patients has been based on a small sample of swallowing vibrations. Methods In this paper, a large corpus of dual-axis accelerometric signals was collected from 30 older adults (aged 65.47 ± 13.4 years, 15 male) referred to videofluoroscopic examination on the suspicion of dysphagia. We invoked a reputation-based classifier combination to automatically categorize the dual-axis accelerometric signals into safe and unsafe swallows, as labeled via videofluoroscopic review. From these participants, a total of 224 swallowing samples were obtained, 164 of which were labeled as unsafe swallows (swallows where the bolus entered the airway) and 60 as safe swallows. Three separate support vector machine (SVM) classifiers and eight different features were selected for classification. Results With selected time, frequency and information theoretic features, the reputation-based algorithm distinguished between safe and unsafe swallowing with promising accuracy (80.48 ± 5.0%), high sensitivity (97.1 ± 2%) and modest specificity (64 ± 8.8%). Interpretation of the most discriminatory features revealed that, in general, unsafe swallows had lower mean vibration amplitude and faster autocorrelation decay, suggestive of decreased hyoid excursion and compromised coordination, respectively. Further, owing to its performance-based weighting of component classifiers, the static
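A reputation-based combination weights each component classifier's vote by how well it has performed. The abstract does not give the reputation formula, so this sketch assumes reputation equals accuracy on a labelled validation set. The three threshold rules below are hypothetical stand-ins for the paper's SVMs, loosely echoing its finding that unsafe swallows show lower amplitude and faster autocorrelation decay.

```python
def reputations(classifiers, X_val, y_val):
    """Assumed reputation: each component's accuracy on held-out labelled swallows."""
    return [sum(clf(x) == y for x, y in zip(X_val, y_val)) / len(y_val)
            for clf in classifiers]


def reputation_vote(classifiers, weights, x):
    # Sum the reputation of every component voting for each label
    score = {}
    for clf, w in zip(classifiers, weights):
        label = clf(x)
        score[label] = score.get(label, 0.0) + w
    return max(score, key=score.get)


# Hypothetical components over features (mean amplitude, autocorrelation decay rate)
def by_amplitude(x):
    return "unsafe" if x[0] < 0.5 else "safe"


def by_decay(x):
    return "unsafe" if x[1] > 0.5 else "safe"


def always_safe(x):
    return "safe"


components = [by_amplitude, by_decay, always_safe]
X_val = [(0.2, 0.8), (0.3, 0.4), (0.1, 0.3), (0.9, 0.1)]
y_val = ["unsafe", "unsafe", "unsafe", "safe"]
weights = reputations(components, X_val, y_val)
print(weights, reputation_vote(components, weights, (0.3, 0.2)))
```

For the query swallow, a plain majority vote would say "safe" (two votes to one), but the reputation-weighted vote says "unsafe" because the single dissenting component has the best track record.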
Hierarchy-associated semantic-rule inference framework for classifying indoor scenes

Science.gov (United States)

Yu, Dan; Liu, Peng; Ye, Zhipeng; Tang, Xianglong; Zhao, Wei

2016-03-01

Typically, the initial task of classifying indoor scenes is challenging, because the spatial layout and decoration of a scene can vary considerably. Recent efforts at classifying object relationships commonly depend on the results of scene annotation and predefined rules, making classification inflexible. Furthermore, annotation results are easily affected by external factors. Inspired by human cognition, a scene-classification framework was proposed using the empirically based annotation (EBA) and a match-over rule-based (MRB) inference system. The semantic hierarchy of images is exploited by EBA to construct rules empirically for MRB classification. The problem of scene classification is divided into low-level annotation and high-level inference from a macro perspective. Low-level annotation involves detecting the semantic hierarchy and annotating the scene with a deformable-parts model and a bag-of-visual-words model. In high-level inference, hierarchical rules are extracted to train the decision tree for classification. The categories of testing samples are generated from the parts to the whole. Compared with traditional classification strategies, the proposed semantic hierarchy and corresponding rules reduce the effect of a variable background and improve the classification performance. The proposed framework was evaluated on a popular indoor scene dataset, and the experimental results demonstrate its effectiveness.
A cardiorespiratory classifier of voluntary and involuntary electrodermal activity

Directory of Open Access Journals (Sweden)

Sejdic Ervin

2010-02-01

Full Text Available Abstract Background Electrodermal reactions (EDRs) can be attributed to many origins, including spontaneous fluctuations of electrodermal activity (EDA) and stimuli such as deep inspirations, voluntary mental activity and startling events. In fields that use EDA as a measure of psychophysiological state, the fact that EDRs may be elicited by many different stimuli is often ignored. This study attempts to classify observed EDRs as voluntary (i.e., generated from intentional respiratory or mental activity) or involuntary (i.e., generated from startling events or spontaneous electrodermal fluctuations). Methods Eight able-bodied participants were subjected to conditions that would cause a change in EDA: music imagery, startling noises, and deep inspirations. A user-centered cardiorespiratory classifier consisting of (1) an EDR detector, (2) a respiratory filter and (3) a cardiorespiratory filter was developed to automatically detect a participant's EDRs and to classify the origin of their stimulation as voluntary or involuntary. Results Detected EDRs were classified with a positive predictive value of 78%, a negative predictive value of 81% and an overall accuracy of 78%. Without the classifier, EDRs could only be correctly attributed as voluntary or involuntary with an accuracy of 50%. Conclusions The proposed classifier may enable investigators to form more accurate interpretations of electrodermal activity as a measure of an individual's psychophysiological state.
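The three reported figures all derive from a binary confusion matrix. A short sketch of the standard definitions, assuming "voluntary" is treated as the positive class; the counts below are made up for illustration and are not the study's data:

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV, accuracy) from binary confusion-matrix counts."""
    ppv = tp / (tp + fp)  # fraction of predicted positives that are truly positive
    npv = tn / (tn + fn)  # fraction of predicted negatives that are truly negative
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return ppv, npv, accuracy

# Hypothetical counts: 8 true voluntary hits, 2 false alarms, 6 correct rejections, 4 misses.
ppv, npv, acc = predictive_values(tp=8, fp=2, tn=6, fn=4)
print(f"PPV={ppv:.0%} NPV={npv:.0%} accuracy={acc:.0%}")
```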
Learning to classify wakes from local sensory information

Science.gov (United States)

Alsalman, Mohamad; Colvert, Brendan; Kanso, Eva; Kanso Team

2017-11-01

Aquatic organisms exhibit remarkable abilities to sense local flow signals contained in their fluid environment and to surmise the origins of these flows. For example, fish can discern the information contained in various flow structures and utilize this information for obstacle avoidance and prey tracking. Flow structures created by flapping and swimming bodies are well characterized in the fluid dynamics literature; however, such characterization relies on classical methods that use an external observer to reconstruct global flow fields. The reconstructed flows, or wakes, are then classified according to the unsteady vortex patterns. Here, we propose a new approach for wake identification: we classify the wakes resulting from a flapping airfoil by applying machine learning algorithms to local flow information. In particular, we simulate the wakes of an oscillating airfoil in an incoming flow, extract the downstream vorticity information, and train a classifier to learn the different flow structures and classify new ones. This data-driven approach provides a promising framework for underwater navigation and detection in application to autonomous bio-inspired vehicles.
Bayesian Classifier for Medical Data from Doppler Unit

Directory of Open Access Journals (Sweden)

J. Málek

2006-01-01

Full Text Available Nowadays, hand-held ultrasonic Doppler units (probes) are often used for noninvasive screening of atherosclerosis in the arteries of the lower limbs. The mean velocity of blood flow over time and blood pressures are measured at several positions on each lower limb. By listening to the acoustic signal generated by the device or by reading the signal displayed on screen, a specialist can detect peripheral arterial disease (PAD). This project aims to design software that can analyze data from such a device and classify it into several diagnostic classes. At the Department of Functional Diagnostics at the Regional Hospital in Liberec, a database of several hundred signals was collected. In cooperation with the specialist, the signals were manually classified into four classes. For each class, selected signal features were extracted and then used to train a Bayesian classifier. Another set of signals was used for evaluating and optimizing the parameters of the classifier. A recognition rate of slightly above 84% for the diagnostic states was recently achieved on the test data.
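The abstract does not say which form of Bayesian classifier was used. As an illustration only, a minimal sketch assuming Gaussian class-conditional likelihoods with independent features (a Gaussian naive Bayes), trained on toy data for four hypothetical diagnostic classes:

```python
import numpy as np

class GaussianNaiveBayes:
    """Bayesian classifier with per-class, per-feature Gaussian likelihoods (naive assumption)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small variance floor avoids division by zero for constant features.
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        return self

    def predict(self, X):
        # Log-posterior up to a constant: log prior + sum of per-feature Gaussian log-likelihoods.
        log_lik = -0.5 * (np.log(2 * np.pi * self.vars_)[None, :, :]
                          + (X[:, None, :] - self.means_[None, :, :]) ** 2
                          / self.vars_[None, :, :])
        log_post = np.log(self.priors_)[None, :] + log_lik.sum(axis=2)
        return self.classes_[log_post.argmax(axis=1)]

# Toy signal features: four hypothetical classes centred at (0,0), (10,10), (20,20), (30,30).
offsets = np.array([[-1, 0], [1, 0], [0, -1], [0, 1]], dtype=float)
X = np.vstack([offsets + 10 * c for c in range(4)])
y = np.repeat(np.arange(4), len(offsets))
model = GaussianNaiveBayes().fit(X, y)
print(model.predict(np.array([[0.5, 0.0], [29.0, 31.0]])))
```

A separate validation set, as in the project described above, would be the place to tune choices such as the variance floor or the feature subset.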
An ensemble self-training protein interaction article classifier.

Science.gov (United States)

Chen, Yifei; Hou, Ping; Manderick, Bernard

2014-01-01

Protein-protein interaction (PPI) is essential to understanding the fundamental processes governing cell biology. The mining and curation of PPI knowledge are critical for analyzing proteomics data. Hence, it is desirable to automatically classify articles as PPI-related or not. Building interaction article classification systems requires an annotated corpus. However, usually only a small number of labeled articles can be obtained manually, while a large number of unlabeled articles are available. By combining ensemble learning and semi-supervised self-training, an ensemble self-training interaction classifier called EST_IACer is designed to classify PPI-related articles based on a small number of labeled articles and a large number of unlabeled articles. A feature weighting strategy based on biological background is extended using the category information from both labeled and unlabeled data. Moreover, a heuristic constraint is put forward to select optimal instances from unlabeled data to improve performance further. Experimental results show that EST_IACer can classify PPI-related articles effectively and efficiently.
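Self-training, one of the two ingredients of EST_IACer, repeatedly pseudo-labels confidently classified unlabeled samples and refits. The sketch below shows only plain self-training, not the ensemble or the paper's heuristic instance-selection constraint; it assumes a toy nearest-centroid base classifier with softmax confidences and a hypothetical threshold of 0.9:

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    """Class means used as a simple base classifier."""
    return np.array([X[y == c].mean(axis=0) for c in range(n_classes)])

def centroid_probs(X, centroids):
    """Class probabilities from a softmax over negative squared distances to class centroids."""
    logits = -((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def self_train(X_lab, y_lab, X_unlab, n_classes, threshold=0.9, max_rounds=10):
    """Plain self-training: pseudo-label confident unlabeled samples, add them, refit."""
    X_lab, y_lab, X_unlab = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(max_rounds):
        if len(X_unlab) == 0:
            break
        probs = centroid_probs(X_unlab, fit_centroids(X_lab, y_lab, n_classes))
        confident = probs.max(axis=1) >= threshold
        if not confident.any():
            break  # nothing left that the current model is sure about
        X_lab = np.vstack([X_lab, X_unlab[confident]])
        y_lab = np.concatenate([y_lab, probs[confident].argmax(axis=1)])
        X_unlab = X_unlab[~confident]
    return fit_centroids(X_lab, y_lab, n_classes), X_lab, y_lab, X_unlab

# Two labeled toy "articles" and three unlabeled ones (2-D features for illustration).
X_lab = np.array([[0.0, 0.0], [4.0, 4.0]])
y_lab = np.array([0, 1])
X_unlab = np.array([[0.2, 0.1], [3.9, 4.2], [2.0, 2.0]])
centroids, X_all, y_all, leftover = self_train(X_lab, y_lab, X_unlab, n_classes=2)
print(y_all, len(leftover))
```

The ambiguous midpoint sample is never pseudo-labeled, which is the behaviour a confidence threshold is meant to produce.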
Classifying Linear Canonical Relations

OpenAIRE

Lorand, Jonathan

2015-01-01

In this Master's thesis, we consider the problem of classifying, up to conjugation by linear symplectomorphisms, linear canonical relations (Lagrangian correspondences) from a finite-dimensional symplectic vector space to itself. We give an elementary introduction to the theory of linear canonical relations and present partial results toward the classification problem. This exposition should be accessible to undergraduate students with a basic familiarity with linear algebra.
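For orientation, the objects being classified can be written down in one common convention; this is a standard formulation offered as a sketch, and the thesis may order the factors or sign the forms differently:

```latex
Let $(V,\omega)$ be a finite-dimensional symplectic vector space and write
$\overline{V} = (V, -\omega)$. A linear canonical relation from $V$ to itself is a subspace
\[
  L \subseteq \overline{V} \times V
  \quad\text{with}\quad
  \Omega|_{L} = 0, \qquad \dim L = \dim V,
\]
where $\Omega\big((u,v),(u',v')\big) = -\omega(u,u') + \omega(v,v')$.
A linear symplectomorphism $T$ acts by conjugation:
\[
  L \;\longmapsto\; (T \times T)(L) = \{ (Tu, Tv) : (u,v) \in L \}.
\]
For example, the graph $\Gamma_A = \{(v, Av)\}$ of a linear symplectomorphism $A$ is a
linear canonical relation, and $(T \times T)(\Gamma_A) = \Gamma_{TAT^{-1}}$, so on graphs
the problem reduces to classifying linear symplectomorphisms up to conjugacy.
```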
Classified facilities for environmental protection

International Nuclear Information System (INIS)

Anon.

1993-02-01

The legislation on classified facilities governs most dangerous or polluting industries and fixed activities. It rests on the law of 19 July 1976 concerning facilities classified for environmental protection and its implementing decree of 21 September 1977. This legislation, whose general texts appear in this volume 1, aims to prevent all risks and harmful effects arising from an installation (air, water, or soil pollution, waste, and even aesthetic damage). Polluting or dangerous activities are defined in a list called the nomenclature, which subjects facilities to either a declaration or an authorization procedure. Authorization is granted by the prefect at the end of an open, adversarial procedure following a public inquiry. In addition, facilities can be subjected to technical regulations set by the Environment Minister (volume 2) or by the prefect for facilities subject to declaration (volume 3). (A.B.)
Defending Malicious Script Attacks Using Machine Learning Classifiers

Directory of Open Access Journals (Sweden)

Nayeem Khan

2017-01-01

Full Text Available Web applications have become a primary target for cybercriminals, who inject malware, especially JavaScript, to perform malicious activities such as impersonation. It is therefore imperative to detect such malicious code in real time, before any malicious activity is performed. This study proposes an efficient method for detecting previously unknown malicious JavaScript code using a client-side interceptor that classifies the key features of the code. A feature subset was obtained using a wrapper method for dimensionality reduction. Supervised machine learning classifiers were then applied to the dataset to achieve high accuracy. Experimental results show that our method can efficiently distinguish malicious code from benign code, with promising results.
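A wrapper method selects features by repeatedly training and scoring the classifier itself on candidate subsets. The abstract does not say which search strategy was used; a minimal sketch assuming greedy forward selection, with a hypothetical `score` callable standing in for cross-validated accuracy and made-up feature names:

```python
def forward_selection(features, score, max_features=None):
    """Greedy wrapper-style forward selection.

    `score(subset)` is assumed to train and evaluate a classifier on the given
    feature subset (e.g. cross-validated accuracy) and return a value to maximize.
    """
    selected, best_score = [], float("-inf")
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        candidate_scores = {f: score(selected + [f]) for f in remaining}
        best_f = max(candidate_scores, key=candidate_scores.get)
        if candidate_scores[best_f] <= best_score:
            break  # no remaining feature improves the wrapped classifier
        selected.append(best_f)
        best_score = candidate_scores[best_f]
        remaining.remove(best_f)
    return selected, best_score

# Toy stand-in for classifier accuracy: two informative features, small cost per feature.
useful = {"eval_count": 0.3, "string_entropy": 0.2}

def toy_score(subset):
    return 0.5 + sum(useful.get(f, 0.0) for f in subset) - 0.01 * len(subset)

features = ["eval_count", "string_entropy", "iframe_count", "line_length"]
print(forward_selection(features, toy_score))
```

Because every candidate subset requires a full train-and-evaluate cycle, wrappers are more expensive than filter methods but tailor the subset to the chosen classifier.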

Spatio-Spectral Method for Estimating Classified Regions with High Confidence using MODIS Data

International Nuclear Information System (INIS)

Katiyal, Anuj; Rajan, K S

2014-01-01

In studies such as change analysis, obtaining very high resolution (VHR) or high resolution (HR) imagery for a particular period and region is a challenge due to sensor revisit times and the high cost of acquisition. Therefore, most studies prefer lower resolution (LR) sensor imagery with frequent revisit times, which also offers cost and computational advantages. Further, classification techniques provide only a global estimate of class accuracy, which limits their utility when that accuracy is low. In this work, we focus on the sub-classification problem of LR images and estimate regions, within each classified region, whose confidence is higher than the global classification accuracy. The spectrally classified data were mined into spatially clustered regions, which were further refined and processed using statistical measures to arrive at local high confidence regions (LHCRs) for every class. Rabi season MODIS data from January 2006 and 2007 were used for this study, and the LHCRs were evaluated using the APLULC 2005 classified data. For Jan-2007, the global class accuracies for water bodies (WB), forested regions (FR), and Kharif crops and barren lands (KB) were 89%, 71.7% and 71.23% respectively, while the respective LHCRs had accuracies of 96.67%, 89.4% and 80.9%, covering 46%, 29% and 14.5% of the initially classified areas. Although the areas are reduced, LHCRs with higher accuracies help extract more representative class regions. Identifying such regions can help improve classification time and processing for HR images when combined with the more frequently acquired LR imagery, isolate pure from mixed/impure pixels, and provide training-sample locations for HR imagery.
Reducing variability in the output of pattern classifiers using histogram shaping

International Nuclear Information System (INIS)

Gupta, Shalini; Kan, Chih-Wen; Markey, Mia K.

2010-01-01

Purpose: The authors present a novel technique based on histogram shaping to reduce the variability in the output and (sensitivity, specificity) pairs of pattern classifiers with identical ROC curves, but differently distributed outputs. Methods: The authors identify different sources of variability in the output of linear pattern classifiers with identical ROC curves, which also result in classifiers with differently distributed outputs. They theoretically develop a novel technique based on matching the histograms of these differently distributed pattern classifier outputs, to reduce the variability in their (sensitivity, specificity) pairs at fixed decision thresholds and to reduce the variability in their actual output values. They empirically demonstrate the efficacy of the proposed technique by means of analyses on simulated data and real-world mammography data. Results: For the simulated data, with three different known sources of variability, and for the real-world mammography data, with unknown sources of variability, the proposed classifier output calibration technique significantly reduced the variability in the classifiers' (sensitivity, specificity) pairs at fixed decision thresholds. Furthermore, for classifiers with monotonically or approximately monotonically related output variables, the histogram shaping technique also significantly reduced the variability in their actual output values. Conclusions: Classifier output calibration based on histogram shaping can be successfully employed to reduce the variability in the output values and (sensitivity, specificity) pairs of pattern classifiers with identical ROC curves, but differently distributed outputs.
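The exact shaping procedure is not given in the abstract. A minimal sketch of standard empirical-CDF quantile mapping, which reshapes one classifier's outputs onto a reference distribution; because the mapping is monotone non-decreasing it preserves the ROC curve, while a fixed threshold then cuts both classifiers at comparable quantiles:

```python
import numpy as np

def match_histogram(source, reference):
    """Map `source` outputs onto the empirical distribution of `reference` by quantile matching."""
    source = np.asarray(source, dtype=float)
    reference = np.sort(np.asarray(reference, dtype=float))
    # Empirical CDF rank of each source value, in (0, 1].
    ranks = np.searchsorted(np.sort(source), source, side="right") / len(source)
    # Reference quantiles at those ranks (linear interpolation between order statistics).
    ref_cdf = np.arange(1, len(reference) + 1) / len(reference)
    return np.interp(ranks, ref_cdf, reference)

# Hypothetical outputs of two classifiers with the same ranking but different scales.
classifier_a = [0.1, 0.4, 0.2, 0.3]
classifier_b = [40.0, 10.0, 30.0, 20.0]
print(match_histogram(classifier_a, classifier_b))
```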
Use of information barriers to protect classified information

International Nuclear Information System (INIS)

MacArthur, D.; Johnson, M.W.; Nicholas, N.J.; Whiteson, R.

1998-01-01

This paper discusses the detailed requirements for an information barrier (IB) for use with verification systems that employ intrusive measurement technologies. The IB would protect classified information in a bilateral or multilateral inspection of classified fissile material. Such a barrier must strike a balance between providing the inspecting party the confidence necessary to accept the measurement and protecting the inspected party's classified information. The authors discuss the structure required of an IB as well as the implications of the IB for detector system maintenance. A defense-in-depth approach is proposed which would provide assurance to the inspected party that all sensitive information is protected and to the inspecting party that the measurements are being performed as expected. The barrier could include elements of physical protection (such as locks, surveillance systems, and tamper indicators), hardening of key hardware components, assurance of capabilities and limitations of hardware and software systems, administrative controls, validation and verification of the systems, and error detection and resolution. Finally, an unclassified interface could be used to display and, possibly, record measurement results. The introduction of an IB into an analysis system may result in many otherwise innocuous components (detectors, analyzers, etc.) becoming classified and unavailable for routine maintenance by uncleared personnel. System maintenance and updating will be significantly simplified if the classification status of as many components as possible can be made reversible (i.e., the component can become unclassified following the removal of classified objects).
Classifying sows' activity types from acceleration patterns

DEFF Research Database (Denmark)

Cornou, Cecile; Lundbye-Christensen, Søren

2008-01-01

An automated method of classifying sow activity using acceleration measurements would allow the individual sow's behavior to be monitored throughout the reproductive cycle; applications for detecting behaviors characteristic of estrus and farrowing, or for monitoring illness and welfare, can be foreseen... This article suggests a method of classifying five types of activity exhibited by group-housed sows. The method involves the measurement of acceleration in three dimensions. The five activities are: feeding, walking, rooting, lying laterally and lying sternally. Four time series of acceleration (the three...
Naive Bayesian classifiers for multinomial features: a theoretical analysis

CSIR Research Space (South Africa)

Van Dyk, E

2007-11-01

Full Text Available The authors investigate the use of naive Bayesian classifiers for multinomial feature spaces and derive error estimates for these classifiers. The error analysis is done by developing a mathematical model to estimate the probability density...
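For concreteness, the classifier analysed above can be sketched as follows: class-conditional multinomial feature probabilities are estimated from counts, and a sample is assigned to the class with the highest log prior plus count-weighted log probabilities. Laplace smoothing (alpha = 1) and the toy counts are assumptions for illustration, not part of the paper's error analysis:

```python
import numpy as np

def fit_multinomial_nb(counts, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log feature probabilities per class."""
    classes = np.unique(labels)
    log_prior = np.log(np.array([np.mean(labels == c) for c in classes]))
    feature_counts = np.array([counts[labels == c].sum(axis=0) for c in classes]) + alpha
    log_theta = np.log(feature_counts / feature_counts.sum(axis=1, keepdims=True))
    return classes, log_prior, log_theta

def predict_multinomial_nb(model, counts):
    classes, log_prior, log_theta = model
    # Multinomial log-likelihood, dropping the multinomial coefficient (same for all classes).
    return classes[(counts @ log_theta.T + log_prior).argmax(axis=1)]

# Toy count vectors over three feature values, two classes.
counts = np.array([[3, 0, 1], [4, 1, 0], [0, 3, 2], [1, 4, 3]])
labels = np.array([0, 0, 1, 1])
model = fit_multinomial_nb(counts, labels)
print(predict_multinomial_nb(model, np.array([[5, 0, 0], [0, 2, 2]])))
```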
Detection of herbaceous-plant pararetrovirus in lichen herbarium samples.

Science.gov (United States)

Petrzik, K; Koloniuk, I; Sarkisová, T; Číhal, L

2016-06-01

Cauliflower mosaic virus (CaMV), a plant pararetrovirus that naturally causes diseases in Brassicaceae and Solanaceae plant hosts worldwide, has been detected by PCR for the first time in herbarium samples of Usnea sp. lichens. The virus's presence in these lichens did not result in any micro- or macromorphological changes, and the herbarium records were classified as representative of the distinct species. Sequence analyses classified all the detected viruses into one lineage of CaMV isolates. We have shown here that herbarium samples could be a good source for virus study, especially where a longer time span is involved.
Parents' Experiences and Perceptions when Classifying their Children with Cerebral Palsy: Recommendations for Service Providers.

Science.gov (United States)

Scime, Natalie V; Bartlett, Doreen J; Brunton, Laura K; Palisano, Robert J

2017-08-01

This study investigated the experiences and perceptions of parents of children with cerebral palsy (CP) when classifying their children using the Gross Motor Function Classification System (GMFCS), the Manual Ability Classification System (MACS), and the Communication Function Classification System (CFCS). The second aim was to collate parents' recommendations for service providers on how to interact and communicate with families. A purposive sample of seven parents participating in the On Track study was recruited. Semi-structured interviews were conducted orally and were audiotaped, transcribed, and coded openly. A descriptive interpretive approach within a pragmatic perspective was used during analysis. Seven themes encompassing parents' experiences and perspectives reflect a process of increased understanding when classifying their children, with perceptions of utility evident throughout this process. Six recommendations for service providers emerged, including making the child a priority and being a dependable resource. Knowledge of parents' experiences when using the GMFCS, MACS, and CFCS can provide useful insight for service providers collaborating with parents to classify function in children with CP. Using the recommendations from these parents can facilitate family-provider collaboration for goal setting and intervention planning.
Exploring diversity in ensemble classification: Applications in large area land cover mapping

Science.gov (United States)

Mellor, Andrew; Boukir, Samia

2017-07-01

Ensemble classifiers, such as random forests, are now commonly applied in the field of remote sensing, and have been shown to perform better than single classifier systems, resulting in reduced generalisation error. Diversity across the members of ensemble classifiers is known to have a strong influence on classification performance, whereby classifier errors are uncorrelated and more uniformly distributed across ensemble members. The relationship between ensemble diversity and classification performance has not yet been fully explored in the fields of information science and machine learning and has never been examined in the field of remote sensing. This study is a novel exploration of ensemble diversity and its link to classification performance, applied to a multi-class canopy cover classification problem using random forests and multisource remote sensing and ancillary GIS data, across seven million hectares of diverse dry-sclerophyll dominated public forests in Victoria, Australia. Particular emphasis is placed on analysing the relationship between ensemble diversity and ensemble margin, two key concepts in ensemble learning. The main novelty of our work lies in boosting diversity by emphasizing the contribution of lower margin instances used in the learning process. Exploring the influence of tree pruning on diversity is also a new empirical analysis that contributes to a better understanding of ensemble performance. Results reveal insights into the trade-off between ensemble classification accuracy and diversity and, through the ensemble margin, demonstrate how inducing diversity by targeting lower margin training samples is a means of achieving better classifier performance for more difficult or rarer classes and reducing information redundancy in classification problems. Our findings inform strategies for collecting training data and designing and parameterising ensemble classifiers, such as random forests. This is particularly important in large area
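A common supervised definition of the ensemble margin is the fraction of member votes for the true class minus the largest fraction for any other class, giving values in [-1, 1]. A minimal sketch, assuming each member's crisp prediction is available as a class index; it computes margins only and does not reproduce the study's diversity-boosting procedure:

```python
import numpy as np

def ensemble_margin(votes, y_true, n_classes):
    """Supervised ensemble margin per sample, in [-1, 1].

    votes: (n_samples, n_members) array of class indices predicted by each member.
    """
    n_samples, n_members = votes.shape
    counts = np.zeros((n_samples, n_classes))
    for c in range(n_classes):
        counts[:, c] = (votes == c).sum(axis=1)
    rows = np.arange(n_samples)
    true_votes = counts[rows, y_true].copy()
    counts[rows, y_true] = -1  # exclude the true class when finding the strongest rival
    return (true_votes - counts.max(axis=1)) / n_members

# Five hypothetical trees voting on three samples from a three-class problem.
votes = np.array([[0, 0, 0, 0, 0],
                  [0, 1, 1, 0, 0],
                  [2, 2, 1, 2, 0]])
y_true = np.array([0, 0, 1])
margins = ensemble_margin(votes, y_true, n_classes=3)
print(margins, np.argsort(margins)[:1])  # lowest-margin sample first
```

Sorting by margin, as in the last line, is one simple way to identify the low-margin training instances that the study emphasizes.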
Species classifier choice is a key consideration when analysing low-complexity food microbiome data.

Science.gov (United States)

Walsh, Aaron M; Crispie, Fiona; O'Sullivan, Orla; Finnegan, Laura; Claesson, Marcus J; Cotter, Paul D

2018-03-20

The use of shotgun metagenomics to analyse low-complexity microbial communities in foods has the potential to be of considerable fundamental and applied value. However, there is currently no consensus with respect to the choice of species classification tool, platform, or sequencing depth. Here, we benchmarked the performance of three high-throughput short-read sequencing platforms, the Illumina MiSeq, NextSeq 500, and Ion Proton, for shotgun metagenomics of food microbiota. Briefly, we sequenced six kefir DNA samples and a mock community DNA sample, the latter constructed by evenly mixing genomic DNA from 13 food-related bacterial species. A variety of bioinformatic tools were used to analyse the data generated, and the effects of sequencing depth on these analyses were tested by randomly subsampling reads. Compositional analysis results were consistent between the platforms at divergent sequencing depths. However, we observed pronounced differences in the predictions from species classification tools. Indeed, PERMANOVA indicated that there were no significant differences between the compositional results generated by the different sequencers (p = 0.693, R² = 0.011), but there was a significant difference between the results predicted by the species classifiers (p = 0.01, R² = 0.127). The relative abundances predicted by the classifiers, apart from MetaPhlAn2, were apparently biased by reference genome sizes. Additionally, we observed varying false-positive rates among the classifiers: MetaPhlAn2 had the lowest false-positive rate, whereas SLIMM had the highest. Strain-level analysis results were also similar across platforms. Each platform correctly identified the strains present in the mock community, but accuracy improved slightly with greater sequencing depth. Notably, PanPhlAn detected the dominant strains in each kefir sample above 500,000 reads per sample. Again, the outputs from functional profiling analysis using
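The depth experiments above rely on random subsampling of reads. The tool used is not stated; a generic sketch, assuming the reads fit in memory as a list and using a fixed seed so that each depth is reproducible (read names are hypothetical):

```python
import random

def subsample_reads(reads, depth, seed=0):
    """Randomly draw `depth` reads without replacement (a simple rarefaction step)."""
    if depth > len(reads):
        raise ValueError("requested depth exceeds the number of available reads")
    rng = random.Random(seed)  # independent, seeded generator for reproducibility
    return rng.sample(reads, depth)

reads = [f"read_{i}" for i in range(10_000)]
for depth in (500, 5_000):
    subset = subsample_reads(reads, depth)
    print(depth, len(subset), subset[:2])
```

For real FASTQ files, a streaming tool is usually preferable to loading all reads into memory.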
A Directed Acyclic Graph-Large Margin Distribution Machine Model for Music Symbol Classification.

Science.gov (United States)

Wen, Cuihong; Zhang, Jing; Rebelo, Ana; Cheng, Fanyong

2016-01-01

Optical Music Recognition (OMR) has received increasing attention in recent years. In this paper, we propose a classifier based on a new method named Directed Acyclic Graph-Large Margin Distribution Machine (DAG-LDM). The DAG-LDM is an improvement of the Large Margin Distribution Machine (LDM), a binary classifier that optimizes the margin distribution by simultaneously maximizing the margin mean and minimizing the margin variance. We extend the LDM to the DAG-LDM to solve the multi-class music symbol classification problem. Tests are conducted on more than 10,000 music symbol images, obtained from handwritten and printed images of music scores. The proposed method provides superior classification capability and achieves much higher classification accuracy than state-of-the-art algorithms such as Support Vector Machines (SVMs) and Neural Networks (NNs).
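The DAG construction turns pairwise binary classifiers into a multi-class decision: starting from the full list of classes, the first and last candidates are compared and the loser is eliminated, so K classes need K - 1 evaluations. A sketch of that evaluation scheme, assuming a hypothetical `binary_decide` callable in place of trained pairwise LDMs and toy one-dimensional symbol prototypes:

```python
def dag_predict(x, classes, binary_decide):
    """Directed-acyclic-graph evaluation of one-vs-one binary classifiers.

    binary_decide(i, j, x) is assumed to return whichever of classes i or j the
    pairwise classifier (e.g. a trained LDM or SVM) prefers for sample x.
    """
    candidates = list(classes)
    while len(candidates) > 1:
        first, last = candidates[0], candidates[-1]
        if binary_decide(first, last, x) == first:
            candidates.pop()   # eliminate the last class
        else:
            candidates.pop(0)  # eliminate the first class
    return candidates[0]

# Hypothetical stand-in for pairwise classifiers: prefer the class with the closer prototype.
prototypes = {"quarter": 1.0, "half": 2.0, "whole": 4.0, "rest": 8.0}

def closer(i, j, x):
    return i if abs(x - prototypes[i]) <= abs(x - prototypes[j]) else j

print(dag_predict(2.2, list(prototypes), closer))
```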
Robust online tracking via adaptive samples selection with saliency detection

Science.gov (United States)

Yan, Jia; Chen, Xi; Zhu, QiuPing

2013-12-01

Online tracking has proven successful for tracking previously unknown objects. However, two important factors lead to drift in online tracking: first, how to select exactly labeled samples even when the target locations are inaccurate; and second, how to handle confusors that have features similar to those of the target. In this article, we propose a robust online tracking algorithm with adaptive sample selection based on saliency detection to overcome the drift problem. To avoid degrading the classifiers with misaligned samples, we introduce saliency detection into our tracking problem. Saliency maps and the strong classifiers are combined to extract the most correct positive samples. Our approach employs a simple yet effective saliency detection algorithm based on image spectral residual analysis. Furthermore, instead of using random patches as negative samples, we propose a reasonable selection criterion that considers both saliency confidence and similarity, with the benefit that confusors in the surrounding background are incorporated into the classifier update process before drift occurs. The tracking task is formulated as binary classification within an online boosting framework. Experimental results on several challenging video sequences demonstrate the accuracy and stability of our tracker.
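Spectral residual saliency, in its standard formulation, subtracts a locally averaged log-amplitude spectrum from the log-amplitude spectrum and transforms the residual back with the original phase. A minimal NumPy sketch under that reading; the 3x3 box filters (the usual formulation smooths the final map with a Gaussian) and the test image are simplifying assumptions, not the tracker's exact settings:

```python
import numpy as np

def box_filter(a, k=3):
    """k x k mean filter with edge padding (k odd)."""
    pad = k // 2
    p = np.pad(a, pad, mode="edge")
    out = np.zeros_like(a, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (k * k)

def spectral_residual_saliency(image):
    """Saliency map from the spectral residual of the log-amplitude spectrum."""
    spectrum = np.fft.fft2(image.astype(float))
    log_amp = np.log(np.abs(spectrum) + 1e-12)
    phase = np.angle(spectrum)
    residual = log_amp - box_filter(log_amp, 3)  # remove the "expected" smooth spectrum
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    saliency = box_filter(saliency, 3)  # light smoothing of the map
    return saliency / saliency.max()

# Toy frame: a small bright patch on a dark background.
frame = np.zeros((64, 64))
frame[40:46, 10:16] = 1.0
sal = spectral_residual_saliency(frame)
print(sal.shape, float(sal.max()))
```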
Textual and shape-based feature extraction and neuro-fuzzy classifier for nuclear track recognition

Science.gov (United States)

Khayat, Omid; Afarideh, Hossein

2013-04-01

Track counting algorithms, one of the fundamental principles of nuclear science, have been emphasized in recent years. Accurate measurement of nuclear tracks on solid-state nuclear track detectors is the aim of track counting systems. Track counting systems commonly comprise hardware for imaging and software for analysing the track images. In this paper, a track recognition algorithm based on 12 defined textual and shape-based features and a neuro-fuzzy classifier is proposed. The features are defined so as to discern tracks from the background and small objects. Then, according to the defined features, tracks are detected using a trained neuro-fuzzy system. The features and the classifier are finally validated using 100 alpha track images and 40 training samples. It is shown that principal textual and shape-based features concomitantly yield a high rate of track detection compared with single-feature based methods.
Neural network classifier of attacks in IP telephony

Science.gov (United States)

Safarik, Jakub; Voznak, Miroslav; Mehic, Miralem; Partila, Pavol; Mikulec, Martin

2014-05-01

Various types of monitoring mechanisms allow us to detect and monitor the behavior of attackers in VoIP networks. Analysis of detected malicious traffic is crucial for further investigation and for hardening the network. This analysis is typically based on statistical methods; this article presents a solution based on a neural network. The proposed algorithm is used as a classifier of attacks in a distributed monitoring network of independent honeypot probes. Information about attacks on these honeypots is collected on a centralized server and then classified. This classification is based on different mechanisms, one of which is a multilayer perceptron neural network. The article describes the inner structure of the neural network used and its implementation. The learning set for this neural network is based on real attack data collected from the IP telephony honeypot Dionaea. We prepare the learning set after collecting, cleaning and aggregating this attack data. After training, the neural network is capable of classifying the six most commonly used types of VoIP attacks. Using a neural network classifier yields more accurate attack classification in a distributed system of honeypots. With this approach, it is possible to detect malicious behavior in different parts of networks that are logically or geographically divided, and to use the information from one network to harden security in other networks. The centralized server for the distributed set of nodes serves not only as a collector and classifier of attack data, but also as a mechanism for generating precautionary steps against attacks.
A Machine Learned Classifier That Uses Gene Expression Data to Accurately Predict Estrogen Receptor Status

Science.gov (United States)

Bastani, Meysam; Vos, Larissa; Asgarian, Nasimeh; Deschenes, Jean; Graham, Kathryn; Mackey, John; Greiner, Russell

2013-01-01

Background Selecting the appropriate treatment for breast cancer requires accurately determining the estrogen receptor (ER) status of the tumor. However, the standard for determining this status, immunohistochemical analysis of formalin-fixed, paraffin-embedded samples, suffers from numerous technical and reproducibility issues. Assessment of ER status based on RNA expression can provide more objective, quantitative and reproducible test results. Methods To learn a parsimonious RNA-based classifier of hormone receptor status, we applied a machine learning tool to a training dataset of gene expression microarray data obtained from 176 frozen breast tumors, whose ER status was determined by applying ASCO-CAP guidelines to standardized immunohistochemical testing of formalin-fixed tumor. Results This produced a three-gene classifier that can predict the ER status of a novel tumor with a cross-validation accuracy of 93.17 ± 2.44%. When applied to an independent validation set and to four other public databases, some on different platforms, this classifier obtained over 90% accuracy in each. In addition, we found that this prediction rule separated the patients' recurrence-free survival curves with a hazard ratio lower than the one based on the IHC analysis of ER status. Conclusions Our efficient and parsimonious classifier lends itself to high-throughput, highly accurate and low-cost RNA-based assessments of ER status, suitable for routine high-throughput clinical use. This analytic method provides a proof of principle that may be applicable to developing effective RNA-based tests for other biomarkers and conditions. PMID:24312637
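A cross-validation accuracy reported as mean ± spread can be obtained by shuffling the samples into k folds, training on k - 1 folds and scoring on the held-out fold. The abstract does not state whether "±" is the standard deviation across folds or across repeated runs; this sketch assumes the former and uses a hypothetical nearest-centroid classifier on synthetic data, not the study's three-gene model:

```python
import numpy as np

def kfold_accuracy(X, y, fit_predict, k=10, seed=0):
    """Mean and standard deviation of accuracy over k shuffled folds.

    fit_predict(X_train, y_train, X_test) is assumed to train a classifier and
    return predicted labels for X_test.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        preds = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        accs.append(np.mean(preds == y[test_idx]))
    return float(np.mean(accs)), float(np.std(accs))

def nearest_centroid(X_train, y_train, X_test):
    """Toy binary classifier: assign each test sample to the closer class mean."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = np.linalg.norm(X_test - c0, axis=1)
    d1 = np.linalg.norm(X_test - c1, axis=1)
    return (d1 < d0).astype(int)

rng = np.random.default_rng(42)
# Synthetic expression values for three hypothetical genes; not data from the study.
X = np.vstack([rng.normal(0.0, 0.1, (20, 3)), rng.normal(5.0, 0.1, (20, 3))])
y = np.array([0] * 20 + [1] * 20)
mean_acc, std_acc = kfold_accuracy(X, y, nearest_centroid, k=5)
print(f"{100 * mean_acc:.2f} ± {100 * std_acc:.2f}%")
```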
Development of headspace solid-phase microextraction method for ...

African Journals Online (AJOL)

A headspace solid-phase microextraction (HS-SPME) method was developed, as a preliminary investigation using a univariate approach, for the analysis of 14 multiclass pesticide residues in fruit and vegetable samples. The gas chromatography mass spectrometry parameters (desorption temperature and time, column flow ...
SVM classifier on chip for melanoma detection.

Science.gov (United States)

Afifi, Shereen; GholamHosseini, Hamid; Sinha, Roopak

2017-07-01

Support Vector Machine (SVM) is a common classifier that offers efficient, high-accuracy classification. SVM shows high accuracy in classifying clinical images of melanoma (skin cancer) within computer-aided diagnosis systems used by skin cancer specialists to detect melanoma early and save lives. We aim to develop a low-cost handheld medical device that runs a real-time embedded SVM-based diagnosis system for use in primary care for early detection of melanoma. In this paper, an optimized SVM classifier is implemented on a recent FPGA platform using the latest design methodology, to be embedded into the proposed device for realizing efficient online melanoma detection on a single system on chip/device. The hardware implementation results demonstrate a high classification accuracy of 97.9% and a significant acceleration factor of 26 over an equivalent software implementation on an embedded processor, with 34% resource utilization and 2 W power consumption. Consequently, the implemented system meets crucial embedded-system constraints of high performance, low cost, low resource utilization and low power consumption, while achieving high classification accuracy.
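What an embedded SVM evaluates at run time is the kernel decision function f(x) = Σ αᵢyᵢK(sᵢ, x) + b over the stored support vectors. The abstract does not give the kernel or number format; this floating-point sketch assumes an RBF kernel and uses made-up support vectors and coefficients. A hardware version would typically pipeline the same multiply-accumulate loop, often in fixed point:

```python
import numpy as np

def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Kernel SVM decision value f(x) = sum_i alpha_i y_i K(s_i, x) + b with an RBF kernel.

    dual_coefs holds the products alpha_i * y_i; the predicted class is sign(f(x)).
    """
    sq_dist = ((support_vectors - x) ** 2).sum(axis=1)
    kernel = np.exp(-gamma * sq_dist)  # RBF kernel against every support vector
    return float(dual_coefs @ kernel + bias)

# Hypothetical trained model: one support vector per class.
sv = np.array([[0.0, 0.0], [2.0, 2.0]])
coefs = np.array([1.0, -1.0])
for x in (np.array([0.1, 0.0]), np.array([2.0, 2.0])):
    f = svm_decision(x, sv, coefs, bias=0.0, gamma=0.5)
    print(x, "positive" if f > 0 else "negative", round(f, 3))
```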
A radial basis classifier for the automatic detection of aspiration in children with dysphagia

Directory of Open Access Journals (Sweden)

Blain Stefanie

2006-07-01

The proposed aspiration classification algorithm provides promising accuracy for aspiration detection in children. The classifier is conducive to hardware implementation as a non-invasive, portable "aspirometer". Future research should focus on further enhancement of accuracy rates by considering other signal features, classifier methods, or an augmented variety of training samples. The present study is an important first step towards the eventual development of wearable intelligent intervention systems for the diagnosis and management of aspiration.
Performance of classification confidence measures in dynamic classifier systems

Czech Academy of Sciences Publication Activity Database

Štefka, D.; Holeňa, Martin

2013-01-01

Vol. 23, No. 4 (2013), pp. 299-319 ISSN 1210-0552 R&D Projects: GA ČR GA13-17187S Institutional support: RVO:67985807 Keywords: classifier combining; dynamic classifier systems; classification confidence Subject RIV: IN - Informatics, Computer Science Impact factor: 0.412, year: 2013
SAR Target Recognition Based on Multi-feature Multiple Representation Classifier Fusion

Directory of Open Access Journals (Sweden)

Zhang Xinzheng

2017-10-01

In this paper, we present a Synthetic Aperture Radar (SAR) image target recognition algorithm based on multi-feature multiple representation learning classifier fusion. First, three features are extracted from the SAR images: principal component analysis, wavelet transform, and Two-Dimensional Slice Zernike Moments (2DSZM) features. Second, we apply the sparse representation classifier and the cooperative representation classifier to each of these features, obtaining six predictive labels. Finally, we adopt classifier fusion to obtain the final recognition decision. We investigated three different classifier fusion algorithms in our experiments, and the results demonstrate that Bayesian decision fusion gives the best recognition performance. The method integrates the discriminative power of multiple features and combines sparse and cooperative representation classification to gain complementary advantages and improve recognition accuracy. The experiments are based on the Moving and Stationary Target Acquisition and Recognition (MSTAR) database, and they demonstrate the effectiveness of the proposed approach.
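The abstract does not say which Bayesian decision fusion rule was used to combine the six predicted labels. A minimal sketch, assuming the common confusion-matrix formulation in which each classifier's validation confusion matrix estimates P(predicted label | true class) and the classifiers are treated as conditionally independent; the matrices below are hypothetical.

```python
from math import log


def bayes_fuse(labels, confusions, priors):
    """Fuse hard labels from several classifiers into one decision.

    labels[k]     -- class index predicted by classifier k for this sample
    confusions[k] -- validation confusion matrix of classifier k, where
                     confusions[k][true][pred] is a count
    priors[c]     -- prior probability of class c

    Assumes conditional independence given the true class, so the posterior
    is proportional to prior * product_k P(pred_k | true class).
    """
    n_classes = len(priors)
    best_class, best_score = None, float("-inf")
    for c in range(n_classes):
        score = log(priors[c])
        for pred, cm in zip(labels, confusions):
            row = cm[c]
            # Laplace smoothing avoids log(0) for (true, pred) pairs never seen in validation
            score += log((row[pred] + 1) / (sum(row) + n_classes))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical confusion matrices: one strong classifier and two weak ones
strong = [[95, 5], [5, 95]]
weak = [[60, 40], [40, 60]]
print(bayes_fuse([0, 1, 1], [strong, weak, weak], [0.5, 0.5]))  # 0
```

A plain majority vote would return class 1 here; the Bayesian rule instead weights each vote by how reliable that classifier was on validation data.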
Addressing the Challenge of Defining Valid Proteomic Biomarkers and Classifiers

LENUS (Irish Health Repository)

Dakna, Mohammed

2010-12-10

Background: The purpose of this manuscript is to provide, based on an extensive analysis of a proteomic data set, suggestions for proper statistical analysis for the discovery of sets of clinically relevant biomarkers. As a tractable example, we define the measurable proteomic differences between apparently healthy adult males and females. We chose urine as the body fluid of interest and CE-MS, a thoroughly validated platform technology that allows routine analysis of a large number of samples. The second morning urine was collected from apparently healthy male and female volunteers (aged 21-40) during the routine medical check-up before recruitment at the Hannover Medical School. Results: We found that the Wilcoxon test is best suited for the definition of potential biomarkers. Adjustment for multiple testing is necessary. Sample size can be estimated from a small number of observations via resampling from pilot data. Machine learning algorithms appear ideally suited to generate classifiers. Assessment of any results in an independent test set is essential. Conclusions: Valid proteomic biomarkers for diagnosis and prognosis can be defined only by applying proper statistical data mining procedures. In particular, a justification of the sample size should be part of the study design.
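The abstract states that adjustment for multiple testing is necessary but does not name the procedure. A minimal sketch, assuming the Benjamini-Hochberg false discovery rate procedure as one common choice, applied to hypothetical p-values such as those produced by per-peptide Wilcoxon tests computed elsewhere.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return the sorted indices of hypotheses rejected at false discovery rate q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value is under rank/m * q
        if pvalues[i] <= rank / m * q:
            k_max = rank
    return sorted(order[:k_max])


# Hypothetical p-values, e.g. from Wilcoxon rank-sum tests on ten peptides
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals))  # [0, 1]
```

Without adjustment, five of these p-values fall below 0.05; after controlling the false discovery rate only two candidate biomarkers remain.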

Detection of Driver Drowsiness Using Wavelet Analysis of Heart Rate Variability and a Support Vector Machine Classifier

Directory of Open Access Journals (Sweden)

Gang Li

2013-12-01

Driving while fatigued is just as dangerous as drunk driving and may result in car accidents. Heart rate variability (HRV) analysis has recently been studied for the detection of driver drowsiness. However, the detection reliability has been lower than anticipated because the HRV signals of drivers were always treated as stationary signals. The wavelet transform is a method for analyzing non-stationary signals. The aim of this study is to classify alert and drowsy driving events using the wavelet transform of HRV signals over short time periods and to compare the classification performance of this method with the conventional method that uses fast Fourier transform (FFT)-based features. Based on the standard shortest duration for FFT-based short-term HRV evaluation, the wavelet decomposition is performed on 2-min HRV samples, as well as on 1-min and 3-min samples for reference. Receiver operating characteristic (ROC) analysis and a support vector machine (SVM) classifier are used for feature selection and classification, respectively. The ROC analysis shows that the wavelet-based method performs better than the FFT-based method regardless of the duration of the HRV sample. Finally, based on the real-time requirements for driver drowsiness detection, the SVM classifier is trained using eighty FFT- and wavelet-based features extracted from 1-min HRV signals from four subjects. The averaged leave-one-out (LOO) classification performance using wavelet-based features is 95% accuracy, 95% sensitivity, and 95% specificity, better than the FFT-based results of 68.8% accuracy, 62.5% sensitivity, and 75% specificity. In addition, the proposed hardware platform is inexpensive and easy to use.
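The abstract does not specify the wavelet family or which coefficients become SVM features. A minimal sketch, assuming an orthonormal Haar decomposition of an RR-interval series and per-band energies as features; the series below is hypothetical.

```python
import math


def haar_dwt(signal, levels):
    """Multi-level orthonormal Haar DWT.

    Returns (approximation, details), where details[j] holds the detail
    coefficients of level j + 1. A trailing odd sample is dropped at each level.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        n = len(approx) // 2 * 2
        pairs = [(approx[i], approx[i + 1]) for i in range(0, n, 2)]
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, details


def wavelet_energy_features(rr_intervals, levels=3):
    """Energy of each detail band followed by the energy of the final approximation."""
    approx, details = haar_dwt(rr_intervals, levels)
    return [sum(x * x for x in d) for d in details] + [sum(x * x for x in approx)]


# Hypothetical RR-interval series in seconds (a real 1-min window holds far more beats)
rr = [0.82, 0.80, 0.85, 0.79, 0.81, 0.90, 0.78, 0.84]
print(wavelet_energy_features(rr))
```

Because the transform is orthonormal, the band energies sum to the signal energy when the length is a power of two, so the features describe how beat-to-beat variability is distributed across time scales.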
Classifying spaces with virtually cyclic stabilizers for linear groups

DEFF Research Database (Denmark)

Degrijse, Dieter Dries; Köhl, Ralf; Petrosyan, Nansen

2015-01-01

We show that every discrete subgroup of GL(n, ℝ) admits a finite-dimensional classifying space with virtually cyclic stabilizers. Applying our methods to SL(3, ℤ), we obtain a four-dimensional classifying space with virtually cyclic stabilizers and a decomposition of the algebraic K-theory of its...
Intuitive Action Set Formation in Learning Classifier Systems with Memory Registers

NARCIS (Netherlands)

Simões, L.F.; Schut, M.C.; Haasdijk, E.W.

2008-01-01

An important design goal in Learning Classifier Systems (LCS) is to equally reinforce those classifiers which cause the level of reward supplied by the environment. In this paper, we propose a new method for action set formation in LCS. When applied to a Zeroth Level Classifier System with Memory
Data Stream Classification Based on the Gamma Classifier

Directory of Open Access Journals (Sweden)

Abril Valeria Uriarte-Arcia

2015-01-01

The ever-increasing volume of generated data confronts us with the problem of handling massive amounts of information online. One of the biggest challenges is how to extract valuable information from these massive continuous data streams in a single scan. In a data stream context, data arrive continuously at high speed; therefore, algorithms developed for this context must be efficient in memory and time and capable of detecting changes over time in the underlying distribution that generated the data. This work describes a novel method for pattern classification over a continuous data stream based on an associative model. The proposed method is based on the Gamma classifier, which is inspired by the Alpha-Beta associative memories; both are supervised pattern recognition models. The proposed method can handle the space and time constraints inherent to data stream scenarios. The Data Streaming Gamma classifier (DS-Gamma classifier) implements a sliding window approach to provide concept drift detection and a forgetting mechanism. To test the classifier, several experiments were performed using different data stream scenarios with real and synthetic data streams. The experimental results show that the method exhibits competitive performance compared with other state-of-the-art algorithms.
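The sliding-window forgetting idea can be illustrated independently of the Gamma classifier, whose associative-memory operators are not described in the abstract. The sketch below is a hypothetical nearest-centroid learner, not DS-Gamma: it keeps only the most recent samples, so after the class regions move it adapts once the old samples have been pushed out of the window.

```python
from collections import deque


class SlidingWindowCentroid:
    """Nearest-centroid classifier that only remembers the most recent samples."""

    def __init__(self, window_size):
        # A bounded deque discards the oldest sample automatically: the forgetting mechanism
        self.window = deque(maxlen=window_size)

    def learn(self, x, label):
        self.window.append((tuple(x), label))

    def predict(self, x):
        sums, counts = {}, {}
        for features, label in self.window:
            acc = sums.setdefault(label, [0.0] * len(features))
            for i, v in enumerate(features):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1

        def sq_dist(label):
            centroid = [s / counts[label] for s in sums[label]]
            return sum((a - b) ** 2 for a, b in zip(x, centroid))

        # Raises ValueError if nothing has been learned yet
        return min(sums, key=sq_dist)


clf = SlidingWindowCentroid(window_size=4)
for x, y in [([0.0], "a"), ([10.0], "b")] * 2:
    clf.learn(x, y)
print(clf.predict([1.0]))  # a
for x, y in [([10.0], "a"), ([0.0], "b")] * 2:  # concept drift: the classes swap places
    clf.learn(x, y)
print(clf.predict([1.0]))  # b
```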
Design of Robust Neural Network Classifiers

DEFF Research Database (Denmark)

Larsen, Jan; Andersen, Lars Nonboe; Hintz-Madsen, Mads

1998-01-01

This paper addresses a new framework for designing robust neural network classifiers. The network is optimized using the maximum a posteriori technique, i.e., the cost function is the sum of the log-likelihood and a regularization term (prior). In order to perform robust classification, we present ... a modified likelihood function which incorporates the potential risk of outliers in the data. This leads to the introduction of a new parameter, the outlier probability. Designing the neural classifier involves optimization of the network weights as well as the outlier probability and regularization parameters. We ... suggest adapting the outlier probability and regularization parameters by minimizing the error on a validation set, and a simple gradient descent scheme is derived. In addition, the framework allows for constructing a simple outlier detector. Experiments with artificial data demonstrate the potential...
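The snippet omits the exact form of the modified likelihood. A minimal sketch, assuming each label comes from the network with probability 1 - ε and from a uniform "outlier" distribution over the classes with probability ε, plus a Gaussian (weight-decay) prior for the regularization term; the function names are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def robust_cost(logits_batch, targets, outlier_prob, weights, alpha):
    """MAP-style cost: robust negative log-likelihood plus a weight-decay prior.

    Assumed likelihood per sample: (1 - outlier_prob) * p_net(y | x)
    + outlier_prob / n_classes.
    """
    nll = 0.0
    for logits, y in zip(logits_batch, targets):
        p = softmax(logits)
        p_robust = (1 - outlier_prob) * p[y] + outlier_prob / len(p)
        nll -= math.log(p_robust)
    prior = 0.5 * alpha * sum(w * w for w in weights)  # Gaussian prior on the weights
    return nll + prior


# A confidently "wrong" label: the network strongly predicts class 0, the target says 1
print(robust_cost([[10.0, -10.0]], [1], 0.0, [], 0.0))  # about 20.0
print(robust_cost([[10.0, -10.0]], [1], 0.1, [], 0.0))  # about 3.0
```

With ε > 0 the penalty for a single mislabeled point is bounded by -log(ε / C), so outliers can no longer dominate the gradient; a large per-sample cost can also serve as a simple outlier flag.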
Comparison of several chemometric methods of libraries and classifiers for the analysis of expired drugs based on Raman spectra.

Science.gov (United States)

Gao, Qun; Liu, Yan; Li, Hao; Chen, Hui; Chai, Yifeng; Lu, Feng

2014-06-01

Some expired drugs are difficult to detect by conventional means. If they are repackaged and sold back into the market, they will constitute a new public health challenge. To detect repackaged expired drugs that are still within specification, paracetamol tablets from one manufacturer were used as a model drug in this study to compare Raman spectra-based library verification and classification methods. Raman spectra of different batches of paracetamol tablets were collected, and a library including standard spectra of unexpired batches of tablets was established. The Raman spectrum of each sample was identified by its cosine and correlation with the standard spectrum, and the average HQI (hit quality index) between the suspicious samples and the standard spectrum was calculated. The optimum threshold values were 0.997 and 0.998, respectively, as determined by ROC analysis and four evaluation measures, giving an accuracy of up to 97%. Three supervised classifiers, PLS-DA, SVM and k-NN, were chosen to establish two-class classification models and compared. They were used to classify expired batches versus an unexpired batch and to predict the suspect samples, with average accuracies of 90.12%, 96.80% and 89.37%, respectively. Of the pre-processing techniques tried, the first derivative was optimal for the library methods and max-min normalization was optimal for the classifiers. These results indicate that both library and classifier methods can detect expired drugs effectively and should be used complementarily in fast screening. Copyright © 2014 Elsevier B.V. All rights reserved.
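The library verification step can be sketched as follows, assuming cosine similarity and Pearson correlation as the two hit quality indices, the thresholds 0.997 and 0.998 quoted above, and a plain finite-difference first derivative (the paper's exact derivative filter is not stated). The spectra are hypothetical.

```python
import math


def first_derivative(spectrum):
    """Simple finite-difference first derivative; removes constant baseline offsets."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def cosine_hqi(sample, reference):
    dot = sum(a * b for a, b in zip(sample, reference))
    norm = math.sqrt(sum(a * a for a in sample)) * math.sqrt(sum(b * b for b in reference))
    return dot / norm  # ZeroDivisionError for a completely flat spectrum


def correlation_hqi(sample, reference):
    """Pearson correlation: cosine similarity of the mean-centred spectra."""
    ms = sum(sample) / len(sample)
    mr = sum(reference) / len(reference)
    return cosine_hqi([a - ms for a in sample], [b - mr for b in reference])


def verify(sample, reference, cos_threshold=0.997, corr_threshold=0.998):
    """Return (passes_cosine, passes_correlation) after first-derivative preprocessing."""
    ds, dr = first_derivative(sample), first_derivative(reference)
    return cosine_hqi(ds, dr) >= cos_threshold, correlation_hqi(ds, dr) >= corr_threshold


reference = [1, 2, 5, 2, 1]
print(verify([v + 10 for v in reference], reference))  # (True, True)
print(verify([1, 5, 2, 2, 1], reference))  # (False, False)
```

The baseline-shifted copy passes both tests because differentiation removes the offset, while a spectrum with a different peak position fails both.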
CASAnova: a multiclass support vector machine model for the classification of human sperm motility patterns.

Science.gov (United States)

Goodson, Summer G; White, Sarah; Stevans, Alicia M; Bhat, Sanjana; Kao, Chia-Yu; Jaworski, Scott; Marlowe, Tamara R; Kohlmeier, Martin; McMillan, Leonard; Zeisel, Steven H; O'Brien, Deborah A

2017-11-01

The ability to accurately monitor alterations in sperm motility is paramount to understanding multiple genetic and biochemical perturbations impacting normal fertilization. Computer-aided sperm analysis (CASA) of human sperm typically reports motile percentage and kinematic parameters at the population level, and uses kinematic gating methods to identify subpopulations such as progressive or hyperactivated sperm. The goal of this study was to develop an automated method that classifies all patterns of human sperm motility during in vitro capacitation following the removal of seminal plasma. We visually classified CASA tracks of 2817 sperm from 18 individuals and used a support vector machine-based decision tree to compute four hyperplanes that separate five classes based on their kinematic parameters. We then developed a web-based program, CASAnova, which applies these equations sequentially to assign a single classification to each motile sperm. Vigorous sperm are classified as progressive, intermediate, or hyperactivated, and nonvigorous sperm as slow or weakly motile. This program correctly classifies sperm motility into one of five classes with an overall accuracy of 89.9%. Application of CASAnova to capacitating sperm populations showed a shift from predominantly linear patterns of motility at initial time points to more vigorous patterns, including hyperactivated motility, as capacitation proceeds. Both intermediate and hyperactivated motility patterns were largely eliminated when sperm were incubated in noncapacitating medium, demonstrating the sensitivity of this method. The five CASAnova classifications are distinctive and reflect kinetic parameters of washed human sperm, providing an accurate, quantitative, and high-throughput method for monitoring alterations in motility. © The Authors 2017. Published by Oxford University Press on behalf of Society for the Study of Reproduction. All rights reserved. 
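The sequential use of four hyperplanes to assign one of five classes can be sketched as a small decision tree. The tree order (vigorous versus nonvigorous first, then hyperactivated, then progressive versus intermediate) and every weight, threshold and feature choice below are hypothetical illustrations, not the published CASAnova equations.

```python
# Hypothetical hyperplanes (w, b) over the feature vector [VCL, LIN, ALH]
# (curvilinear velocity in µm/s, linearity in %, lateral head displacement in µm).
# Illustrative numbers only; NOT the published CASAnova equations.
HYPERPLANES = {
    "vigorous": ([1.0, 0.0, 0.0], -100.0),
    "hyperactivated": ([0.0, 0.0, 1.0], -7.0),
    "progressive": ([0.0, 1.0, 0.0], -60.0),
    "slow": ([1.0, 0.0, 0.0], -50.0),
}


def positive_side(name, x):
    """True if x lies on the positive side of the named hyperplane w.x + b = 0."""
    w, b = HYPERPLANES[name]
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0


def classify_track(x):
    """Apply four binary decisions in sequence to assign one of five classes."""
    if positive_side("vigorous", x):
        if positive_side("hyperactivated", x):
            return "hyperactivated"
        return "progressive" if positive_side("progressive", x) else "intermediate"
    return "slow" if positive_side("slow", x) else "weakly motile"


print(classify_track([200.0, 30.0, 9.0]))  # hyperactivated
```

In the real system each hyperplane would come from a trained linear SVM on the visually labeled tracks; the tree structure is what lets four binary separators cover five classes.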
Combining multiple classifiers for age classification

CSIR Research Space (South Africa)

Van Heerden, C

2009-11-01

The authors compare several different classifier combination methods on a single task, namely speaker age classification. This task is well suited to combination strategies, since significantly different feature classes are employed. Support vector...
Maximum margin classifier working in a set of strings.

Science.gov (United States)

Koyano, Hitoshi; Hayashida, Morihiro; Akutsu, Tatsuya

2016-03-01

Numbers and numerical vectors account for a large portion of data. However, recently, the amount of string data generated has increased dramatically. Consequently, classifying string data is a common problem in many fields. The most widely used approach to this problem is to convert strings into numerical vectors using string kernels and subsequently apply a support vector machine that works in a numerical vector space. However, this non-one-to-one conversion involves a loss of information and makes it impossible to evaluate, using probability theory, the generalization error of a learning machine, considering that the given data to train and test the machine are strings generated according to probability laws. In this study, we approach this classification problem by constructing a classifier that works in a set of strings. To evaluate the generalization error of such a classifier theoretically, probability theory for strings is required. Therefore, we first extend a limit theorem for a consensus sequence of strings demonstrated by one of the authors and co-workers in a previous study. Using the obtained result, we then demonstrate that our learning machine classifies strings in an asymptotically optimal manner. Furthermore, we demonstrate the usefulness of our machine in practical data analysis by applying it to predicting protein-protein interactions using amino acid sequences and classifying RNAs by the secondary structure using nucleotide sequences.
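The conventional approach the abstract criticizes, mapping strings to numerical vectors with a string kernel, can be illustrated with the k-spectrum kernel (k-mer counts). This is a standard example chosen here for illustration, not the authors' classifier; it also shows concretely why the mapping is not one-to-one.

```python
from collections import Counter


def spectrum_features(s, k):
    """Map a string to its k-mer count vector (stored sparsely as a Counter)."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(s, t, k):
    """Inner product of the two k-mer count vectors."""
    fs, ft = spectrum_features(s, k), spectrum_features(t, k)
    return sum(count * ft[kmer] for kmer, count in fs.items())


print(spectrum_kernel("ABAB", "ABAB", 2))  # 5
# Two different strings with identical 2-mer profiles: the mapping is not one-to-one
print(spectrum_features("AABA", 2) == spectrum_features("ABAA", 2))  # True
```

Any SVM trained on these vectors cannot distinguish "AABA" from "ABAA", which is the information loss that motivates working directly in the set of strings.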
A Survey of Blue-Noise Sampling and Its Applications

KAUST Repository

Yan, Dongming; Guo, Jian-Wei; Wang, Bin; Zhang, Xiao-Peng; Wonka, Peter

2015-01-01

In this paper, we survey recent approaches to blue-noise sampling and discuss their beneficial applications. We discuss the sampling algorithms that use points as sampling primitives and classify the sampling algorithms based on various aspects, e.g., the sampling domain and the type of algorithm. We demonstrate several well-known applications that can be improved by recent blue-noise sampling techniques, as well as some new applications such as dynamic sampling and blue-noise remeshing.
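The classic baseline for point-based blue-noise sampling is dart throwing (Poisson-disk sampling): random candidates are accepted only if they keep a minimum distance from all accepted points. A minimal sketch in the unit square, chosen here as the textbook method rather than any specific algorithm from the survey.

```python
import math
import random


def dart_throwing(radius, attempts=2000, seed=0):
    """Naive Poisson-disk (blue-noise) sampling in the unit square.

    A uniformly random candidate is kept only if it lies at least `radius`
    from every point accepted so far. Cost grows as O(attempts * points).
    """
    rng = random.Random(seed)
    points = []
    for _ in range(attempts):
        candidate = (rng.random(), rng.random())
        if all(math.dist(candidate, p) >= radius for p in points):
            points.append(candidate)
    return points


pts = dart_throwing(0.1)
print(len(pts))
```

The minimum-distance constraint suppresses low-frequency clumping while keeping the point set random, which is the defining property of blue noise; faster methods in the literature mainly accelerate the rejection test.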
Bias correction for selecting the minimal-error classifier from many machine learning models.

Science.gov (United States)

Ding, Ying; Tang, Shaowu; Liao, Serena G; Jia, Jia; Oesterreich, Steffi; Lin, Yan; Tseng, George C

2014-11-15

Supervised machine learning is commonly applied in genomic research to construct a classifier from training data that generalizes to independent testing data. When test datasets are not available, cross-validation is commonly used to estimate the error rate. Many machine learning methods are available, and it is well known that no universally best method exists. It is common practice to apply many machine learning methods and report the method that produces the smallest cross-validation error rate. Theoretically, such a procedure produces a selection bias. Consequently, many clinical studies with moderate sample sizes (e.g. n = 30-60) risk reporting a falsely small cross-validation error rate that cannot later be validated in independent cohorts. In this article, we illustrated the probabilistic framework of the problem and explored its statistical and asymptotic properties. We proposed a new bias correction method based on learning curve fitting by inverse power law (IPL) and compared it with three existing methods: nested cross-validation, weighted mean correction and the Tibshirani-Tibshirani procedure. All methods were compared on simulation datasets, five moderate-size real datasets and two large breast cancer datasets. The results showed that IPL outperforms the other methods in bias correction, with smaller variance, and it has the additional advantage of extrapolating error estimates to larger sample sizes, a practical feature for deciding whether more samples should be recruited to improve the classifier and its accuracy. An R package 'MLbias' and all source files are publicly available at tsenglab.biostat.pitt.edu/software.htm. Contact: ctseng@pitt.edu. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved.
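A learning curve of the inverse power law form e(n) = a·n^(-b) + c can be fitted and extrapolated as sketched below. This is a generic illustration, not the MLbias implementation: it assumes a grid search over the asymptote c combined with a log-linear least-squares fit for a and b, and it is checked on synthetic error rates generated from a known curve, not on data from the paper.

```python
import math


def fit_inverse_power_law(ns, errors, c_step=1e-4):
    """Fit e(n) = a * n**(-b) + c to (sample size, error rate) pairs.

    For each candidate asymptote c on a grid in [0, min(errors)), a and b come
    from a least-squares line through (log n, log(e - c)); the c with the
    smallest squared error on the original scale is kept. Needs at least two
    distinct sample sizes.
    """
    xs = [math.log(n) for n in ns]
    mx = sum(xs) / len(xs)
    sxx = sum((x - mx) ** 2 for x in xs)
    e_min = min(errors)
    best = None
    i = 0
    while i * c_step < e_min:
        c = i * c_step
        ys = [math.log(e - c) for e in errors]
        my = sum(ys) / len(ys)
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        a, b = math.exp(my - slope * mx), -slope
        sse = sum((a * n ** (-b) + c - e) ** 2 for n, e in zip(ns, errors))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
        i += 1
    return best[1:]


def predict_error(params, n):
    a, b, c = params
    return a * n ** (-b) + c


# Synthetic check: error rates generated exactly from a = 0.5, b = 0.5, c = 0.1
ns = [20, 40, 80, 160]
errors = [0.5 * n ** -0.5 + 0.1 for n in ns]
params = fit_inverse_power_law(ns, errors)
print(round(predict_error(params, 1000), 3))  # 0.116
```

Extrapolating the fitted curve beyond the observed sample sizes is what makes it possible to ask whether recruiting more samples would meaningfully lower the error.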
A systems biology-based classifier for hepatocellular carcinoma diagnosis.

Directory of Open Access Journals (Sweden)

Yanqiong Zhang

AIM: Diagnosis of hepatocellular carcinoma (HCC) at an early stage is crucial for the application of curative treatments, which are the only hope for increasing the life expectancy of patients. Recently, several large-scale studies have shed light on this problem through analysis of gene expression profiles to identify markers correlated with HCC progression. However, those marker sets shared few genes and were poorly validated using independent data. Therefore, we developed a systems biology-based classifier by combining differential gene expression with topological features of human protein interaction networks to improve HCC diagnosis. METHODS AND RESULTS: In the Oncomine platform, genes differentially expressed in HCC tissues relative to their corresponding normal tissues were filtered by a corrected Q value cut-off and Concept filters. The identified genes common to different microarray datasets were chosen as candidate markers. Their networks were then analyzed with GeneGO Meta-Core software, and the hub genes were chosen. After that, an HCC diagnostic classifier was constructed by Partial Least Squares modeling based on the microarray gene expression data of the hub genes. Validation of diagnostic performance showed that this classifier had high predictive accuracy (85.88–92.71%) and an area under the ROC curve approaching 1.0, and that the network topological features integrated into this classifier contribute greatly to improving predictive performance. Furthermore, this modeling strategy was shown to be applicable not only to HCC but also to other cancers. CONCLUSION: Our analysis suggests that the systems biology-based classifier, which combines differential gene expression and topological features of the human protein interaction network, may enhance the diagnostic performance of HCC classifiers.
Application of a semiautomatic classifier for modic and disk hernia changes in magnetic resonance

Directory of Open Access Journals (Sweden)

Eduardo López Arce Vivas

2015-03-01

OBJECTIVE: To detect degenerative changes in the lumbar intervertebral disc early by applying a semiautomatic classifier to magnetic resonance images, for the prevention of degenerative disease. METHOD: MRIs with a diagnosis of degenerative disc disease or back pain from January to May 2014 were selected, giving a sample of 23 patients and a total of 170 discs evaluated on sagittal T2 MRI images. The images were first evaluated by a specialist physician in training and then entered into the software, and the results were compared. RESULTS: One hundred and fifteen discs were evaluated by a programmed semiautomatic classifier to identify MODIC changes and hernia, producing the results "normal or MODIC" and "normal or abnormal", respectively. Of a total of 230 readings, 141 were correct, 84 were reading errors and 10 readings were undiagnosed. The semiautomatic classifier is a useful tool for diagnosing early or established disease and is easy to apply because of its speed and ease of use; however, at this early stage of development, the software is inferior to clinical observation, with certainty of around 65% to 60% for MODIC rating and 61% to 58% for disc herniation compared with clinical evaluations. CONCLUSION: Comparison between the two doctors yielded 94 consistent results and only 21 errors, which represents 81% certainty.
A machine learned classifier that uses gene expression data to accurately predict estrogen receptor status.

Directory of Open Access Journals (Sweden)

Meysam Bastani

BACKGROUND: Selecting the appropriate treatment for breast cancer requires accurately determining the estrogen receptor (ER) status of the tumor. However, the standard method for determining this status, immunohistochemical analysis of formalin-fixed paraffin-embedded samples, suffers from numerous technical and reproducibility issues. Assessment of ER status based on RNA expression can provide more objective, quantitative and reproducible test results. METHODS: To learn a parsimonious RNA-based classifier of hormone receptor status, we applied a machine learning tool to a training dataset of gene expression microarray data obtained from 176 frozen breast tumors, whose ER status was determined by applying ASCO-CAP guidelines to standardized immunohistochemical testing of formalin-fixed tumor. RESULTS: This produced a three-gene classifier that can predict the ER status of a novel tumor with a cross-validation accuracy of 93.17±2.44%. When applied to an independent validation set and to four other public databases, some on different platforms, this classifier obtained over 90% accuracy in each. In addition, we found that this prediction rule separated the patients' recurrence-free survival curves with a hazard ratio lower than the one based on the IHC analysis of ER status. CONCLUSIONS: Our efficient and parsimonious classifier lends itself to highly accurate and low-cost RNA-based assessment of ER status, suitable for routine high-throughput clinical use. This analytic method provides a proof-of-principle that may be applicable to developing effective RNA-based tests for other biomarkers and conditions.
A Customizable Text Classifier for Text Mining

Directory of Open Access Journals (Sweden)

Yun-liang Zhang

2007-12-01

Text mining deals with complex and unstructured texts. Usually, a particular collection of texts specific to one or more domains is required. We have developed a customizable text classifier that lets users mine such a collection automatically. It derives from the sentence category of the HNC theory and corresponding techniques. It can start with a few texts, and it can adjust automatically or be adjusted by the user. The user can also control the number of domains chosen and decide the criteria for choosing texts based on demand and the abundance of materials. The performance of the classifier varies with the user's choices.
A survey of decision tree classifier methodology

Science.gov (United States)

Safavian, S. R.; Landgrebe, David

1991-01-01

Decision tree classifiers (DTCs) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps the most important feature of DTCs is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods for DTC design and the various existing issues is presented. After considering potential advantages of DTCs over single-stage classifiers, the subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.
Recognition of Arabic Sign Language Alphabet Using Polynomial Classifiers

Directory of Open Access Journals (Sweden)

M. Al-Rousan

2005-08-01

Building an accurate automatic sign language recognition system is of great importance in facilitating efficient communication with deaf people. In this paper, we propose the use of polynomial classifiers as a classification engine for the recognition of the Arabic sign language (ArSL) alphabet. Polynomial classifiers have several advantages over other classifiers: they do not require iterative training, and they are highly computationally scalable with the number of classes. Based on polynomial classifiers, we have built an ArSL system and measured its performance using real ArSL data collected from deaf people. We show that the proposed system provides superior recognition results compared with previously published results using ANFIS-based classification on the same dataset and feature extraction methodology. The comparison is made in terms of the number of misclassified test patterns. The reduction in the rate of misclassified patterns was very significant. In particular, we achieved a 36% reduction in misclassifications on the training data and 57% on the test data.
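The two advantages named above, no iterative training and scalability with the number of classes, follow from how a polynomial classifier is usually trained: features are expanded into monomials and all class models are obtained from a single least-squares solve. A minimal generic sketch using NumPy on toy XOR data; it does not reproduce the authors' ArSL features or expansion degree.

```python
import itertools

import numpy as np


def poly_expand(X, degree=2):
    """Expand each row of X into all monomials of total degree <= `degree` (including 1)."""
    X = np.asarray(X, dtype=float)
    cols = [np.ones(X.shape[0])]
    for deg in range(1, degree + 1):
        for combo in itertools.combinations_with_replacement(range(X.shape[1]), deg):
            cols.append(np.prod(X[:, list(combo)], axis=1))
    return np.column_stack(cols)


def train_polynomial_classifier(X, y, n_classes, degree=2):
    """One non-iterative least-squares solve; each class is one column of W."""
    P = poly_expand(X, degree)
    T = np.eye(n_classes)[np.asarray(y)]  # one-hot target matrix
    W, *_ = np.linalg.lstsq(P, T, rcond=None)
    return W


def predict(W, X, degree=2):
    return np.argmax(poly_expand(X, degree) @ W, axis=1)


# Toy XOR data, which no linear classifier separates but a degree-2 model does
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]
W = train_polynomial_classifier(X, y, n_classes=2)
print(predict(W, X))  # [0 1 1 0]
```

Adding a class only adds one more column to the target matrix in the same least-squares problem, which is why training cost grows gently with the number of classes.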
Comparison of Classifier Architectures for Online Neural Spike Sorting.

Science.gov (United States)

Saeed, Maryam; Khan, Amir Ali; Kamboh, Awais Mehmood

2017-04-01

High-density, intracranial recordings from micro-electrode arrays need to undergo spike sorting in order to associate the recorded neuronal spikes with particular neurons. This involves spike detection, feature extraction, and classification. To reduce data transmission and power requirements, on-chip real-time processing is becoming very popular. However, classifiers in on-chip spike sorters require high computational resources, making scalability a great challenge. In this review paper, we analyze several popular classifiers and propose five new hardware architectures using an off-chip training with on-chip classification approach. These include support vector classification, fuzzy C-means classification, self-organizing maps classification, moving-centroid K-means classification, and cosine distance classification. The performance of these architectures is analyzed in terms of accuracy and resource requirements. We establish that the neural-network-based Self-Organizing Maps classifier offers the most viable solution. A spike sorter based on the Self-Organizing Maps classifier requires only 7.83% of the computational resources of the best-reported spike sorter, hierarchical adaptive means, while offering 3% better accuracy at 7 dB SNR.
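The off-chip training with on-chip classification split can be illustrated with the cosine distance classifier. A minimal sketch, assuming that off-chip training simply averages the labeled spike waveforms of each unit into a template and that the on-chip step picks the template with the highest cosine similarity; the waveforms and unit labels are hypothetical.

```python
import math


def train_templates(spikes, labels):
    """Off-chip step: the template of each unit is its mean spike waveform."""
    sums, counts = {}, {}
    for waveform, label in zip(spikes, labels):
        acc = sums.setdefault(label, [0.0] * len(waveform))
        for i, v in enumerate(waveform):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def classify_spike(waveform, templates):
    """On-chip step: only dot products and norms against the stored templates."""
    return max(templates, key=lambda label: cosine_similarity(waveform, templates[label]))


templates = train_templates(
    [[0, 1, 0, -1], [0, 1.2, 0, -0.8], [1, 0, -1, 0]],  # hypothetical 4-sample waveforms
    ["A", "A", "B"],
)
print(classify_spike([0, 2, 0, -2], templates))  # A
```

Only the small template table has to be stored on the chip, and classification needs no training logic there, which is the resource saving the off-chip/on-chip split targets.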
CAD system for quantifying emphysema severity based on multi-class classifier using CT image and spirometry information

International Nuclear Information System (INIS)

Nimura, Yukitaka; Mori, Kensaku; Kitasaka, Takayuki; Honma, Hirotoshi; Takabatake, Hirotsugu; Mori, Masaki; Natori, Hiroshi

2010-01-01

Many diagnosis methods based on CT image processing have been proposed for quantifying emphysema. Most of these methods extract lesions as Low-Attenuation Areas (LAA) by simple threshold processing and evaluate their severity by calculating the percentage of LAA in the lung (LAA%). However, pulmonary emphysema is diagnosed not only from the LAA but also from changes in the pulmonary blood vessels and from spirometric measurements. This paper proposes a novel computer-aided detection (CAD) system for quantifying emphysema by combining spirometric measurements and the results of CT image processing. The experimental results revealed that the accuracy rate of the proposed method was 78.3%, a 13.1% improvement over the method based only on LAA%. (author)
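The LAA% measure used by the threshold-based methods is a simple voxel ratio. A minimal sketch, assuming lung voxels have already been segmented and that the threshold is -950 HU, a commonly used value; the abstract does not state the threshold.

```python
def laa_percent(lung_hu, threshold_hu=-950.0):
    """LAA% = 100 * (number of lung voxels below threshold) / (number of lung voxels)."""
    if not lung_hu:
        raise ValueError("no lung voxels supplied")
    low = sum(1 for v in lung_hu if v < threshold_hu)
    return 100.0 * low / len(lung_hu)


# Hypothetical HU values of voxels inside a segmented lung mask
print(laa_percent([-980, -960, -900, -850]))  # 50.0
```

Because LAA% reduces the whole lung to one number, it cannot capture vascular changes or lung function, which is the gap the proposed CAD system addresses by adding spirometry.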

32 CFR 2004.21 - Protection of Classified Information [201(e)].

Science.gov (United States)

2010-07-01

... 32 National Defense 6 2010-07-01 2010-07-01 false Protection of Classified Information [201(e... PROGRAM DIRECTIVE NO. 1 Operations § 2004.21 Protection of Classified Information [201(e)]. Procedures for... coordination process. ...
Classifying MCI Subtypes in Community-Dwelling Elderly Using Cross-Sectional and Longitudinal MRI-Based Biomarkers

Directory of Open Access Journals (Sweden)

Hao Guan

2017-09-01

Amnestic MCI (aMCI) and non-amnestic MCI (naMCI) are considered to differ in etiology and outcome. Accurately classifying MCI into meaningful subtypes would enable early intervention with targeted treatment. In this study, we employed structural magnetic resonance imaging (MRI) for MCI subtype classification in a sample of 184 community-dwelling individuals (aged 73–85 years). Cortical surface-based measurements were computed from longitudinal and cross-sectional scans. Using a feature selection algorithm, we identified a set of discriminative features and further investigated their temporal patterns. A voting classifier was trained and evaluated via 10 iterations of cross-validation. The best classification accuracies achieved were 77% (naMCI vs. aMCI), 81% (aMCI vs. cognitively normal [CN]) and 70% (naMCI vs. CN). The best results for differentiating aMCI from naMCI were achieved with baseline features. The hippocampus, amygdala and frontal pole were found to be the most discriminative regions for classifying MCI subtypes. Additionally, we observed the dynamics of classification of several MRI biomarkers. Learning the dynamics of atrophy may aid in the development of better biomarkers, as it may track the progression of cognitive impairment.
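The abstract does not describe how the voting classifier combines its members. A minimal sketch, assuming hard majority voting over the labels produced by several base classifiers; the predictions below are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Hard majority vote.

    predictions[k][i] is the label that base classifier k assigns to sample i.
    Ties are broken in favour of the tied label encountered first, which follows
    from Counter.most_common ordering.
    """
    n_samples = len(predictions[0])
    return [Counter(p[i] for p in predictions).most_common(1)[0][0] for i in range(n_samples)]


# Hypothetical outputs of three base classifiers on three subjects
preds = [
    ["aMCI", "CN", "naMCI"],
    ["aMCI", "aMCI", "CN"],
    ["naMCI", "CN", "CN"],
]
print(majority_vote(preds))  # ['aMCI', 'CN', 'CN']
```

In a repeated cross-validation setting, this fusion step would be applied to the held-out fold in each of the 10 iterations and the resulting accuracies averaged.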
Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy

Science.gov (United States)

2017-01-01

Background: Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor’s activity for the purposes of quality assurance, safety, and continuing professional development. Objective: The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors’ professional performance in the United Kingdom. Methods: We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians’ colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. Results: Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for codes relating to “popular” (recall=.97), “innovator” (recall=.98), and “respected” (recall=.87) codes and was lower for the “interpersonal” (recall=.80) and “professional” (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined together into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as “respected,” “professional,” and “interpersonal” related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P.05). Conclusions: Machine learning algorithms can classify open-text feedback
Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy.

Science.gov (United States)

Gibbons, Chris; Richards, Suzanne; Valderas, Jose Maria; Campbell, John

2017-03-15

Machine learning techniques may be an effective and efficient way to classify open-text reports on doctors' activity for the purposes of quality assurance, safety, and continuing professional development. The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors' professional performance in the United Kingdom. We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians' colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for codes relating to "popular" (recall=.97), "innovator" (recall=.98), and "respected" (recall=.87) codes and was lower for the "interpersonal" (recall=.80) and "professional" (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as "respected," "professional," and "interpersonal" related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P.05). Machine learning algorithms can classify open-text feedback of doctor performance into multiple themes derived by human raters with high
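The abstract reports per-theme recall and F scores for the algorithms measured against the human coder. As an illustration only, the sketch below shows how per-theme precision, recall, and F1 can be computed when each comment carries a set of themes. The function name and the toy data are hypothetical, and the abstract does not state how the authors aggregated these metrics.

```python
def per_theme_scores(human, machine, themes):
    """Precision, recall and F1 per theme, comparing machine codes with a human coder.

    `human` and `machine` are parallel lists of sets: the themes assigned to each comment.
    """
    scores = {}
    for t in themes:
        pairs = list(zip(human, machine))
        tp = sum(t in h and t in m for h, m in pairs)      # both coders assigned t
        fp = sum(t not in h and t in m for h, m in pairs)  # machine only
        fn = sum(t in h and t not in m for h, m in pairs)  # human only
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[t] = (precision, recall, f1)
    return scores
```

For example, a machine coder that tags one of two human-coded "popular" comments, and nothing else as "popular", gets precision 1.0 and recall 0.5 for that theme.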
Tolerance to missing data using a likelihood ratio based classifier for computer-aided classification of breast cancer

International Nuclear Information System (INIS)

Bilska-Wolak, Anna O; Floyd, Carey E Jr

2004-01-01

While mammography is a highly sensitive method for detecting breast tumours, its ability to differentiate between malignant and benign lesions is low, which may result in as many as 70% of biopsies being unnecessary. The purpose of this study was to develop a highly specific computer-aided diagnosis algorithm to improve classification of mammographic masses. A classifier based on the likelihood ratio was developed to accommodate cases with missing data. Data for development included 671 biopsy cases (245 malignant) with biopsy-proven outcomes. Sixteen features based on the BI-RADS™ lexicon and patient history had been recorded for the cases, with 1.3 ± 1.1 missing feature values per case. Classifier evaluation methods included receiver operating characteristic (ROC) analysis and leave-one-out bootstrap sampling. The classifier achieved 32% specificity at 100% sensitivity on the 671 cases with 16 features that had missing values. Using only the seven features present for all cases decreased performance, giving an average specificity of 19% at 100% sensitivity. No cases and no feature data were omitted during classifier development, showing that it is more beneficial to use cases with missing values than to discard incomplete cases, which many algorithms cannot handle. Classification of mammographic masses was commendable at high sensitivity levels, indicating that benign cases could potentially be spared from biopsy.
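The abstract does not spell out the likelihood-ratio model, so the following is only a minimal sketch of one generic way to make such a classifier tolerant of missing values. It assumes categorical features and a naive (feature-independence) likelihood model, which is not necessarily the authors' method: a missing feature (`None`) simply contributes nothing to the log-likelihood ratio, so no case has to be discarded. Function names and the smoothing parameter `alpha` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def fit_naive_lr(cases, labels, alpha=1.0):
    """Count per-feature value frequencies for malignant (1) and benign (0) cases.

    cases: list of equal-length tuples of categorical values; None marks a missing value.
    """
    counts = {0: defaultdict(Counter), 1: defaultdict(Counter)}
    values = defaultdict(set)
    for case, y in zip(cases, labels):
        for j, v in enumerate(case):
            if v is None:
                continue  # missing values are simply not counted
            counts[y][j][v] += 1
            values[j].add(v)
    return counts, values, alpha

def log_likelihood_ratio(model, case):
    """Sum of log P(v | malignant) / P(v | benign) over the features present in `case`."""
    counts, values, alpha = model
    llr = 0.0
    for j, v in enumerate(case):
        if v is None:
            continue  # a missing feature contributes nothing, so the case is still scored
        k = len(values[j]) or 1  # number of distinct values seen (for Laplace smoothing)
        p1 = (counts[1][j][v] + alpha) / (sum(counts[1][j].values()) + alpha * k)
        p0 = (counts[0][j][v] + alpha) / (sum(counts[0][j].values()) + alpha * k)
        llr += math.log(p1 / p0)
    return llr
```

A case would be called malignant when its score exceeds a threshold; to reproduce an operating point such as 100% sensitivity, that threshold would be set at or below the lowest score among malignant training cases.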
A Study of Assimilation Bias in Name-Based Sampling of Migrants

Directory of Open Access Journals (Sweden)

Schnell Rainer

2014-06-01

Full Text Available The use of personal names for screening is an increasingly popular sampling technique for migrant populations. Although this is often an effective sampling procedure, very little is known about the properties of this method. Based on a large German survey, this article compares characteristics of respondents whose names have been correctly classified as belonging to a migrant population with respondents who are migrants but whose names have not been classified as belonging to a migrant population. Although significant differences were found for some variables, some with large effect sizes, the overall bias introduced by name-based sampling (NBS) is small as long as procedures with small false-negative rates are employed.
Ensemble of classifiers based network intrusion detection system performance bound

CSIR Research Space (South Africa)

Mkuzangwe, Nenekazi NP

2017-11-01

Full Text Available This paper provides a performance bound of a network intrusion detection system (NIDS) that uses an ensemble of classifiers. Currently researchers rely on implementing the ensemble of classifiers based NIDS before they can determine the performance...
3 CFR - Implementation of the Executive Order, “Classified National Security Information”

Science.gov (United States)

2010-01-01

... 29, 2009 Implementation of the Executive Order, “Classified National Security Information” Memorandum..., “Classified National Security Information” (the “order”), which substantially advances my goals for reforming... or handles classified information shall provide the Director of the Information Security Oversight...
Sex Bias in Classifying Borderline and Narcissistic Personality Disorder.

Science.gov (United States)

Braamhorst, Wouter; Lobbestael, Jill; Emons, Wilco H M; Arntz, Arnoud; Witteman, Cilia L M; Bekker, Marrie H J

2015-10-01

This study investigated sex bias in the classification of borderline and narcissistic personality disorders. A sample of psychologists in training for a post-master's degree (N = 180) read brief case histories (male or female version) and made DSM classifications. To distinguish sex bias due to sex stereotyping from sex bias due to base-rate variation, we used, respectively, (1) non-ambiguous case histories with enough criteria of either borderline or narcissistic personality disorder to meet the threshold for classification, and (2) an ambiguous case with subthreshold features of both borderline and narcissistic personality disorder. Results showed significant differences due to the sex of the patient in the ambiguous condition. Thus, when the diagnosis is not straightforward, as in the case of mixed subthreshold features, sex bias is present and is influenced by base-rate variation. These findings emphasize the need for caution in classifying personality disorders, especially borderline or narcissistic traits.
36 CFR 1256.70 - What controls access to national security-classified information?

Science.gov (United States)

2010-07-01

... national security-classified information? 1256.70 Section 1256.70 Parks, Forests, and Public Property... HISTORICAL MATERIALS Access to Materials Containing National Security-Classified Information § 1256.70 What controls access to national security-classified information? (a) The declassification of and public access...
A Directed Acyclic Graph-Large Margin Distribution Machine Model for Music Symbol Classification.

Directory of Open Access Journals (Sweden)

Cuihong Wen

Full Text Available Optical Music Recognition (OMR) has received increasing attention in recent years. In this paper, we propose a classifier based on a new method named Directed Acyclic Graph-Large margin Distribution Machine (DAG-LDM). The DAG-LDM is an improvement of the Large margin Distribution Machine (LDM), which is a binary classifier that optimizes the margin distribution by maximizing the margin mean and minimizing the margin variance simultaneously. We extend the LDM to the DAG-LDM to solve the multi-class music symbol classification problem. Tests are conducted on more than 10000 music symbol images, obtained from handwritten and printed images of music scores. The proposed method provides superior classification capability and achieves much higher classification accuracy than state-of-the-art algorithms such as Support Vector Machines (SVMs) and Neural Networks (NNs).
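A DAG-based multi-class scheme combines one binary classifier per pair of classes. The sketch below shows only the generic decision-DAG evaluation step: keep a list of candidate classes, let the pairwise classifier for the first and last candidates eliminate one of them, and repeat, so K classes need exactly K − 1 binary decisions. The pairwise decision function is a hypothetical stand-in for trained LDMs; this is not the authors' implementation.

```python
from typing import Callable, Hashable, Sequence

# A pairwise decision function returns whichever of the two classes it prefers for x.
PairwiseDecision = Callable[[Hashable, Hashable, object], Hashable]

def dag_predict(classes: Sequence[Hashable], decide: PairwiseDecision, x) -> Hashable:
    """Evaluate a decision DAG: K candidate classes need exactly K - 1 binary decisions."""
    candidates = list(classes)
    while len(candidates) > 1:
        first, last = candidates[0], candidates[-1]
        winner = decide(first, last, x)
        if winner == first:
            candidates.pop()      # `last` is ruled out
        else:
            candidates.pop(0)     # `first` is ruled out
    return candidates[0]
```

For example, with a toy nearest-prototype rule on a single hypothetical feature (prototypes 1.0, 2.0, and 4.0 for three note-value classes), an input of 3.9 is routed to the class whose prototype is 4.0 after two pairwise comparisons.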
Facial Expression Recognition from Video Sequences Based on Spatial-Temporal Motion Local Binary Pattern and Gabor Multiorientation Fusion Histogram

Directory of Open Access Journals (Sweden)

Lei Zhao

2017-01-01

Full Text Available This paper proposes a novel framework for facial expression analysis using dynamic and static information in video sequences. First, based on an incremental formulation, a discriminative deformable face alignment method is adapted to locate facial points, which are used to correct in-plane head rotation and to separate the facial region from the background. Then, a spatial-temporal motion local binary pattern (LBP) feature is extracted and integrated with a Gabor multiorientation fusion histogram to give descriptors that reflect the static and dynamic texture information of facial expressions. Finally, a multiclass support vector machine (SVM) classifier based on a one-versus-one strategy is applied to classify facial expressions. Experiments on the Cohn-Kanade (CK+) facial expression dataset illustrate that the integrated framework outperforms methods using single descriptors. Compared with other state-of-the-art methods on the CK+, MMI, and Oulu-CASIA VIS datasets, our proposed framework performs better.
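The paper's spatial-temporal motion LBP extends the basic static LBP operator. As a reference point, the sketch below implements only the basic 3 × 3 LBP and its 256-bin histogram on a plain 2-D list of intensities; it does not cover the temporal extension or the Gabor fusion. The clockwise neighbour ordering starting at the top-left pixel is an assumption, since implementations differ.

```python
from collections import Counter

# Neighbour offsets (row, col), clockwise from the top-left pixel; bit i <- offset i.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_codes(image):
    """Basic 8-neighbour LBP code for every interior pixel of a 2-D list of intensities."""
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                if image[r + dr][c + dc] >= center:  # neighbour at least as bright -> 1
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes

def lbp_histogram(image):
    """256-bin histogram of LBP codes, the texture descriptor that is fed to a classifier."""
    counts = Counter(code for row in lbp_codes(image) for code in row)
    return [counts.get(i, 0) for i in range(256)]
```

A flat patch yields code 255 (every neighbour equals the centre), while the histogram over a face region summarizes how often each local texture pattern occurs.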
Recovery of thermophilic Campylobacter by three sampling methods from classified river sites in Northeast Georgia, USA

Science.gov (United States)

It is not clear how best to sample streams for the detection of Campylobacter, which may be introduced from agricultural or community land use. Fifteen sites in the watershed of the South Fork of the Broad River (SFBR) in Northeastern Georgia, USA, were sampled in three seasons. Seven sites were cl...
E-Tour Guide Application with an Image Recognition Feature Using the Haar Classifier Method

Directory of Open Access Journals (Sweden)

Derwin Suhartono

2013-12-01

Full Text Available Smartphones have become an important instrument in modern society, used not only for communication but also for entertainment and information searching. Given this, there is a need to develop applications that extend smartphone functionality. The objective of this research is to create an application named E-Tour Guide, a tool for planning and managing tourism activities, equipped with an image recognition feature. The image recognition method used is the Haar Classifier method, and the feature is used to recognize historical objects. In testing on a sample of 20 images, the image recognition feature achieved 85% accuracy.
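A Haar classifier evaluates many Haar-like rectangle features, and these are usually computed with an integral image so that any rectangle sum costs four lookups. The sketch below shows only that feature computation for a horizontal two-rectangle feature. It does not include the boosted cascade or the application's trained model, and the function names are illustrative. In practice, libraries such as OpenCV ship trained cascade detectors.

```python
def integral_image(img):
    """ii[r][c] = sum of img[0..r-1][0..c-1]; padded with a leading zero row and column."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        running = 0  # running sum of the current row
        for c in range(cols):
            running += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + running
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of any axis-aligned rectangle in O(1) using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def two_rect_feature(ii, top, left, height, width):
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = width // 2
    return rect_sum(ii, top, left, height, half) - rect_sum(ii, top, left + half, height, half)
```

Because each feature costs a constant number of lookups regardless of its size, a detector can scan many windows and feature scales per image.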
Cascaded discrimination of normal, abnormal, and confounder classes in histopathology: Gleason grading of prostate cancer

Directory of Open Access Journals (Sweden)

Doyle Scott

2012-10-01

Full Text Available Abstract Background Automated classification of histopathology involves identification of multiple classes, including benign, cancerous, and confounder categories. The confounder tissue classes can often mimic and share attributes with both the diseased and normal tissue classes, and can be particularly difficult to identify, both manually and by automated classifiers. In the case of prostate cancer, there may be several confounding tissue types present in a biopsy sample, which are major sources of diagnostic error for pathologists. Two common multi-class approaches are one-shot classification (OSC), where all classes are identified simultaneously, and one-versus-all (OVA), where a “target” class is distinguished from all “non-target” classes. OSC is typically unable to handle discrimination of classes of varying similarity (e.g., images of prostate atrophy and high-grade cancer), while OVA forces several heterogeneous classes into a single “non-target” class. In this work, we present a cascaded (CAS) approach to classifying prostate biopsy tissue samples, where images from different classes are grouped to maximize intra-group homogeneity while maximizing inter-group heterogeneity. Results We apply the CAS approach to categorize 2000 tissue samples taken from 214 patient studies into seven classes: epithelium, stroma, atrophy, prostatic intraepithelial neoplasia (PIN), and prostate cancer Gleason grades 3, 4, and 5. A series of increasingly granular binary classifiers is used to split the different tissue classes until the images have been categorized into a single unique class. Our automatically extracted image feature set includes architectural features based on the location of the nuclei within the tissue sample as well as texture features extracted on a per-pixel level. The CAS strategy yields a positive predictive value (PPV) of 0.86 in classifying the 2000 tissue images into one of 7 classes, compared with the OVA (0.77 PPV) and OSC
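The cascaded strategy routes each sample through increasingly granular binary splits until one class remains. A minimal sketch of that routing is shown below, assuming a tree whose leaves are class labels and whose internal nodes are `(binary_classifier, left, right)` tuples. The example tree, its feature names (`nuclei`, `lumen`), and its thresholds are hypothetical and cover only part of the seven classes. They stand in for the trained classifiers and the class grouping in the paper.

```python
from typing import Union

# A node is either a final class label or (binary_classifier, left_subtree, right_subtree).
Tree = Union[str, tuple]

def cascade_predict(node: Tree, x) -> str:
    """Route x through increasingly granular binary splits until a single class remains."""
    while not isinstance(node, str):
        classifier, left, right = node
        node = left if classifier(x) else right  # True -> left branch
    return node

# Hypothetical partial cascade: stroma vs. rest, then epithelium vs. cancer, then grade.
example_tree = (
    lambda x: x["nuclei"] < 0.2, "stroma",
    (lambda x: x["lumen"] > 0.5, "epithelium",
     (lambda x: x["nuclei"] < 0.6, "Gleason 3", "Gleason 4")),
)
```

Unlike one-versus-all, each node only has to separate two groups of classes that were chosen to be internally similar.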
A support vector machine classifier reduces interscanner variation in the HRCT classification of regional disease pattern in diffuse lung disease: Comparison to a Bayesian classifier

Energy Technology Data Exchange (ETDEWEB)

Chang, Yongjun; Lim, Jonghyuck; Kim, Namkug; Seo, Joon Beom [Department of Radiology, University of Ulsan College of Medicine, 388-1 Pungnap2-dong, Songpa-gu, Seoul 138-736 (Korea, Republic of); Lynch, David A. [Department of Radiology, National Jewish Medical and Research Center, Denver, Colorado 80206 (United States)

2013-05-15

Purpose: To investigate the effect of using different computed tomography (CT) scanners on the accuracy of high-resolution CT (HRCT) images in classifying regional disease patterns in patients with diffuse lung disease, support vector machine (SVM) and Bayesian classifiers were applied to multicenter data. Methods: Two experienced radiologists marked sets of 600 rectangular 20 × 20 pixel regions of interest (ROIs) on HRCT images obtained from two scanners (GE and Siemens), including 100 ROIs for each of six local lung patterns: normal lung and five regional pulmonary disease patterns (ground-glass opacity, reticular opacity, honeycombing, emphysema, and consolidation). Each ROI was assessed using 22 quantitative features belonging to one of the following descriptors: histogram, gradient, run-length, gray level co-occurrence matrix, low-attenuation area cluster, and top-hat transform. For automatic classification, a Bayesian classifier and an SVM classifier were compared under three different conditions. First, classification accuracies were estimated using data from each scanner. Next, data from the GE and Siemens scanners were used for training and testing, respectively, and vice versa. Finally, all ROI data were integrated regardless of the scanner type and were then trained and tested together. All experiments were performed based on forward feature selection and fivefold cross-validation with 20 repetitions. Results: For each scanner, better classification accuracies were achieved with the SVM classifier than the Bayesian classifier (92% and 82%, respectively, for the GE scanner; and 92% and 86%, respectively, for the Siemens scanner). The classification accuracies were 82%/72% for training with GE data and testing with Siemens data, and 79%/72% for the reverse. The use of training and test data obtained from the HRCT images of different scanners lowered the classification accuracy compared to the use of HRCT images from the same scanner. For
A support vector machine classifier reduces interscanner variation in the HRCT classification of regional disease pattern in diffuse lung disease: Comparison to a Bayesian classifier

International Nuclear Information System (INIS)

Chang, Yongjun; Lim, Jonghyuck; Kim, Namkug; Seo, Joon Beom; Lynch, David A.

2013-01-01

Purpose: To investigate the effect of using different computed tomography (CT) scanners on the accuracy of high-resolution CT (HRCT) images in classifying regional disease patterns in patients with diffuse lung disease, support vector machine (SVM) and Bayesian classifiers were applied to multicenter data. Methods: Two experienced radiologists marked sets of 600 rectangular 20 × 20 pixel regions of interest (ROIs) on HRCT images obtained from two scanners (GE and Siemens), including 100 ROIs for each of six local lung patterns: normal lung and five regional pulmonary disease patterns (ground-glass opacity, reticular opacity, honeycombing, emphysema, and consolidation). Each ROI was assessed using 22 quantitative features belonging to one of the following descriptors: histogram, gradient, run-length, gray level co-occurrence matrix, low-attenuation area cluster, and top-hat transform. For automatic classification, a Bayesian classifier and an SVM classifier were compared under three different conditions. First, classification accuracies were estimated using data from each scanner. Next, data from the GE and Siemens scanners were used for training and testing, respectively, and vice versa. Finally, all ROI data were integrated regardless of the scanner type and were then trained and tested together. All experiments were performed based on forward feature selection and fivefold cross-validation with 20 repetitions. Results: For each scanner, better classification accuracies were achieved with the SVM classifier than the Bayesian classifier (92% and 82%, respectively, for the GE scanner; and 92% and 86%, respectively, for the Siemens scanner). The classification accuracies were 82%/72% for training with GE data and testing with Siemens data, and 79%/72% for the reverse. The use of training and test data obtained from the HRCT images of different scanners lowered the classification accuracy compared to the use of HRCT images from the same scanner. For integrated ROI
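The experiments above combine forward feature selection with fivefold cross-validation. The sketch below is a minimal, stdlib-only illustration of that combination. A nearest-centroid classifier stands in for the SVM and Bayesian classifiers. Folds are formed by taking every k-th sample without shuffling, and the 20 repetitions are omitted, so this is a simplification rather than the authors' protocol. All function names are hypothetical.

```python
import math
from statistics import mean

def nearest_centroid_accuracy(train, test, feats):
    """Fit class centroids on `train` using only `feats`; return accuracy on `test`."""
    by_class = {}
    for x, y in train:
        by_class.setdefault(y, []).append([x[f] for f in feats])
    centroids = {y: [mean(col) for col in zip(*rows)] for y, rows in by_class.items()}
    correct = 0
    for x, y in test:
        v = [x[f] for f in feats]
        pred = min(centroids, key=lambda c: math.dist(v, centroids[c]))
        correct += pred == y
    return correct / len(test)

def cv_accuracy(data, feats, k=5):
    """Mean accuracy over k folds (fold i = every k-th sample starting at index i)."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        scores.append(nearest_centroid_accuracy(train, folds[i], feats))
    return mean(scores)

def forward_select(data, n_features, k=5):
    """Greedily add the feature that most improves CV accuracy; stop when none helps."""
    selected, best = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        score, f = max((cv_accuracy(data, selected + [f], k), f) for f in remaining)
        if score <= best:
            break  # no remaining feature improves cross-validated accuracy
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected, best
```

Here `data` is a list of `(feature_tuple, label)` pairs with at least `k` samples. Applying the selection inside each training split, rather than once on all data, would avoid optimistic bias, at a higher computational cost.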
Stepwise classification of cancer samples using clinical and molecular data

Directory of Open Access Journals (Sweden)

Obulkasim Askar

2011-10-01

Full Text Available Abstract Background Combining clinical and molecular data types may potentially improve the prediction accuracy of a classifier. However, there is currently a shortage of effective and efficient statistical and bioinformatic tools for truly integrative data analysis. Existing integrative classifiers have two main disadvantages: First, coarse combination may cause subtle contributions of one data type to be overshadowed by more obvious contributions of the other. Second, the need to measure both data types for all patients may be both impractical and cost-inefficient. Results We introduce a novel classification method, a stepwise classifier, which takes advantage of the distinct classification power of clinical data and high-dimensional molecular data. We apply classification algorithms to the two data types independently, starting with the traditional clinical risk factors. We only turn to the relatively expensive molecular data when the uncertainty of the prediction from clinical data exceeds a predefined limit. Experimental results show that our approach is adaptive: the proportion of samples that need to be re-classified using molecular data depends on how much we expect the predictive accuracy to increase when re-classifying those samples. Conclusions Our method yields a more cost-efficient classifier that is at least as good as, and sometimes better than, one based on clinical or molecular data alone. Hence our approach is not just a classifier that minimizes a particular loss function. Instead, it aims to be cost-efficient by avoiding molecular tests for a potentially large subgroup of individuals; moreover, for these individuals a test result would be quickly available, which may lead to reduced waiting times (for diagnosis) and hence lower patient distress. Stepwise classification is implemented in the R package stepwiseCM and is available at the Bioconductor website.
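The core decision rule of the stepwise idea can be sketched as follows: classify from clinical data first, and request the molecular data only when the clinical prediction is too uncertain. This is a Python illustration, not the stepwiseCM R package or its API. The uncertainty measure (1 − 2|p − 0.5|), the threshold `max_uncertainty`, and the two probability functions are assumptions made for the example.

```python
from typing import Callable, Tuple

def stepwise_predict(
    clinical_prob: Callable[[dict], float],
    molecular_prob: Callable[[dict], float],
    patient: dict,
    max_uncertainty: float = 0.3,
) -> Tuple[int, str]:
    """Classify from clinical data; fall back to molecular data only if too uncertain.

    Uncertainty is measured here as 1 - 2*|p - 0.5|: 0 for a confident p of 0 or 1,
    1 for p = 0.5. This measure and the threshold are illustrative assumptions.
    """
    p = clinical_prob(patient)
    uncertainty = 1.0 - 2.0 * abs(p - 0.5)
    if uncertainty <= max_uncertainty:
        return int(p >= 0.5), "clinical"
    # Only now is the (expensive) molecular test needed for this patient.
    p = molecular_prob(patient)
    return int(p >= 0.5), "molecular"
```

Raising `max_uncertainty` sends fewer patients to molecular testing, which trades accuracy for cost. This mirrors the adaptivity described in the abstract.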
Silicon nanowire arrays as learning chemical vapour classifiers

International Nuclear Information System (INIS)

Niskanen, A O; Colli, A; White, R; Li, H W; Spigone, E; Kivioja, J M

2011-01-01

Nanowire field-effect transistors are a promising class of devices for various sensing applications. Apart from detecting individual chemical or biological analytes, it is especially interesting to use multiple selective sensors to look at their collective response in order to perform classification into predetermined categories. We show that non-functionalised silicon nanowire arrays can be used to robustly classify different chemical vapours using simple statistical machine learning methods. We were able to distinguish between acetone, ethanol and water with 100% accuracy while methanol, ethanol and 2-propanol were classified with 96% accuracy in ambient conditions.
48 CFR 8.608 - Protection of classified and sensitive information.

Science.gov (United States)

2010-10-01

... Prison Industries, Inc. 8.608 Protection of classified and sensitive information. Agencies shall not enter into any contract with FPI that allows an inmate worker access to any— (a) Classified data; (b) Geographic data regarding the location of— (1) Surface and subsurface infrastructure providing communications...

Classified Component Disposal at the Nevada National Security Site (NNSS) - 13454

Energy Technology Data Exchange (ETDEWEB)

Poling, Jeanne; Arnold, Pat [National Security Technologies, LLC (NSTec), P.O. Box 98521, Las Vegas, NV 89193-8521 (United States); Saad, Max [Sandia National Laboratories, P.O. Box 5800, Albuquerque, NM 87185 (United States); DiSanza, Frank [E. Frank DiSanza Consulting, 2250 Alanhurst Drive, Henderson, NV 89052 (United States); Cabble, Kevin [U.S. Department of Energy, National Nuclear Security Administration Nevada Site Office, P.O. Box 98518, Las Vegas, NV 89193-8518 (United States)

2013-07-01

The Nevada National Security Site (NNSS) has added the capability needed for the safe, secure disposal of non-nuclear classified components that have been declared excess to national security requirements. The NNSS has worked with U.S. Department of Energy, National Nuclear Security Administration senior leadership to gain formal approval for permanent burial of classified matter at the NNSS in the Area 5 Radioactive Waste Management Complex owned by the U.S. Department of Energy. Additionally, by working with state regulators, the NNSS added the capability to dispose of non-radioactive hazardous and non-hazardous classified components. The NNSS successfully piloted the new disposal pathway with the receipt of classified materials from the Kansas City Plant in March 2012. (authors)
Classified Component Disposal at the Nevada National Security Site (NNSS) - 13454

International Nuclear Information System (INIS)

Poling, Jeanne; Arnold, Pat; Saad, Max; DiSanza, Frank; Cabble, Kevin

2013-01-01

The Nevada National Security Site (NNSS) has added the capability needed for the safe, secure disposal of non-nuclear classified components that have been declared excess to national security requirements. The NNSS has worked with U.S. Department of Energy, National Nuclear Security Administration senior leadership to gain formal approval for permanent burial of classified matter at the NNSS in the Area 5 Radioactive Waste Management Complex owned by the U.S. Department of Energy. Additionally, by working with state regulators, the NNSS added the capability to dispose of non-radioactive hazardous and non-hazardous classified components. The NNSS successfully piloted the new disposal pathway with the receipt of classified materials from the Kansas City Plant in March 2012. (authors)
Using passive cavitation images to classify high-intensity focused ultrasound lesions.

Science.gov (United States)

Haworth, Kevin J; Salgaonkar, Vasant A; Corregan, Nicholas M; Holland, Christy K; Mast, T Douglas

2015-09-01

Passive cavitation imaging provides spatially resolved monitoring of cavitation emissions. However, the diffraction limit of a linear imaging array results in relatively poor range resolution. Poor range resolution has limited prior analyses of the spatial specificity and sensitivity of passive cavitation imaging in predicting thermal lesion formation. In this study, this limitation is overcome by orienting a linear array orthogonal to the high-intensity focused ultrasound propagation direction and performing passive imaging. Fourteen lesions were formed in ex vivo bovine liver samples as a result of 1.1-MHz continuous-wave ultrasound exposure. The lesions were classified as focal, "tadpole" or pre-focal based on their shape and location. Passive cavitation images were beamformed from emissions at the fundamental, harmonic, ultraharmonic and inharmonic frequencies with an established algorithm. Using the area under a receiver operating characteristic curve (AUROC), fundamental, harmonic and ultraharmonic emissions were found to be significant predictors of lesion formation for all lesion types. For both harmonic and ultraharmonic emissions, pre-focal lesions were classified most successfully (AUROC values of 0.87 and 0.88, respectively), followed by tadpole lesions (AUROC values of 0.77 and 0.64, respectively) and focal lesions (AUROC values of 0.65 and 0.60, respectively). Copyright © 2015 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.
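The lesion-prediction results above are reported as the area under the ROC curve (AUROC). A minimal sketch of that metric follows, using its Mann-Whitney interpretation: the probability that a randomly chosen positive (for example, a location where a lesion formed) receives a higher score than a randomly chosen negative, with ties counting one half. Treating cavitation-emission energy as the score is an assumption made for illustration.

```python
def auroc(scores_pos, scores_neg):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney U formulation). O(n*m), fine for small sets.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

An AUROC of 0.5 means the score carries no information about lesion formation, and 1.0 means perfect separation. On this scale, the reported 0.87 to 0.88 for pre-focal lesions is a strong predictor and 0.60 is a weak one.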
An Active Learning Classifier for Further Reducing Diabetic Retinopathy Screening System Cost

Directory of Open Access Journals (Sweden)

Yinan Zhang

2016-01-01

Full Text Available Diabetic retinopathy (DR) screening systems impose a financial burden. To further reduce DR screening costs, this paper proposes an active learning classifier. Our approach identifies retinal images based on features extracted by anatomical part recognition and lesion detection algorithms. The kernel extreme learning machine (KELM) is a rapid classifier for solving classification problems in high-dimensional spaces. Both active learning and ensemble techniques improve the performance of KELM when a small training dataset is used. To save costs, the committee refers to the doctor only the manual labeling work that is necessary. On the publicly available Messidor database, our classifier is trained with 20%–35% of the retinal images labeled, whereas the comparative classifiers are trained with 80% of the retinal images labeled. Results show that our classifier can achieve better classification accuracy than Classification and Regression Tree, radial basis function SVM, Multilayer Perceptron SVM, Linear SVM, and K Nearest Neighbor classifiers. Empirical experiments suggest that our active learning classifier is efficient for further reducing DR screening cost.
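One common way for a committee to decide which images a doctor should label is query-by-committee with vote entropy: send only the samples on which the committee members disagree most. The sketch below shows that generic criterion. The abstract does not state the exact selection rule used with the KELM ensemble, and the committee members here are hypothetical callables.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Disagreement of a committee on one sample: entropy of its vote distribution."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def select_for_labeling(committee, unlabeled, budget):
    """Return indices of the `budget` unlabeled samples the committee disagrees on most.

    `committee` is a list of callables mapping a sample to a predicted label.
    Only these samples would be sent to a doctor for manual grading.
    """
    scored = [(vote_entropy([member(x) for member in committee]), i)
              for i, x in enumerate(unlabeled)]
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest disagreement first, ties by index
    return [i for _, i in scored[:budget]]
```

Samples on which all members agree have zero entropy and are never queried. This is how such a scheme keeps manual grading, and therefore cost, low.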
Generalization in the XCSF classifier system: analysis, improvement, and extension.

Science.gov (United States)

Lanzi, Pier Luca; Loiacono, Daniele; Wilson, Stewart W; Goldberg, David E

2007-01-01

We analyze generalization in XCSF and introduce three improvements. We begin by showing that the types of generalizations evolved by XCSF can be influenced by the input range. To explain these results we present a theoretical analysis of the convergence of classifier weights in XCSF which highlights a broader issue. In XCSF, because of the mathematical properties of the Widrow-Hoff update, the convergence of classifier weights in a given subspace can be slow when the spread of the eigenvalues of the autocorrelation matrix associated with each classifier is large. As a major consequence, the system's accuracy pressure may act before classifier weights are adequately updated, so that XCSF may evolve piecewise constant approximations, instead of the intended, and more efficient, piecewise linear ones. We propose three different ways to update classifier weights in XCSF so as to increase the generalization capabilities of XCSF: one based on a condition-based normalization of the inputs, one based on linear least squares, and one based on the recursive version of linear least squares. Through a series of experiments we show that while all three approaches significantly improve XCSF, least squares approaches appear to be best performing and most robust. Finally we show how XCSF can be extended to include polynomial approximations.
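The convergence issue discussed above concerns the Widrow-Hoff (LMS, delta-rule) update of each classifier's linear weights. A minimal sketch of that update for a single linear predictor w · [1, x] follows. The normalized variant divides the step by |x|², which is the modified delta rule usually described for XCSF. Everything else here (function name, defaults, the single-predictor setting without XCSF's rule population) is an assumption made for illustration.

```python
def widrow_hoff_fit(samples, eta=0.2, epochs=200, normalized=True):
    """Fit a linear prediction w . [1, x] with the Widrow-Hoff (LMS, delta) rule.

    samples: list of (input_tuple, target). With normalized=True the step is divided
    by |x|^2 (normalized LMS); x0 = 1 is the constant bias input.
    """
    n = len(samples[0][0]) + 1
    w = [0.0] * n
    for _ in range(epochs):
        for x, target in samples:
            xa = [1.0] + list(x)                      # augment with constant input x0 = 1
            error = target - sum(wi * xi for wi, xi in zip(w, xa))
            step = eta / sum(xi * xi for xi in xa) if normalized else eta
            w = [wi + step * error * xi for wi, xi in zip(w, xa)]
    return w
```

With `normalized=False`, a usable `eta` depends on the input range, and convergence slows when the input autocorrelation matrix has a large eigenvalue spread. That is the effect the authors analyze, and it motivates their least-squares alternatives.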
Statistical and Machine-Learning Classifier Framework to Improve Pulse Shape Discrimination System Design

Energy Technology Data Exchange (ETDEWEB)

Wurtz, R. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Kaplan, A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

2015-10-28

Pulse shape discrimination (PSD) is a type of statistical classifier. Fully realized statistical classifiers rely on a comprehensive set of tools for design, construction, and implementation. Advances in PSD depend on improvements to the implemented algorithm, and such improvements can come from conventional statistical classifier or machine learning methods. This paper provides the reader with a glossary of classifier-building elements and their functions in a fully designed and operational classifier framework that can be used to discover opportunities for improving PSD classifier projects. This paper recommends reporting the PSD classifier’s receiver operating characteristic (ROC) curve and its behavior at a gamma rejection rate (GRR) relevant for realistic applications.
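Reporting behaviour at a specific GRR amounts to reading one operating point off the ROC curve. The sketch below assumes GRR is expressed as the fraction of gamma events allowed to pass as neutrons; definitions vary, so check the paper's convention. It also assumes that a higher PSD score means "more neutron-like". Under those assumptions, it picks the threshold and reports the neutron efficiency. Function and parameter names are hypothetical.

```python
import math

def operating_point(gamma_scores, neutron_scores, gamma_pass_rate):
    """Pick the PSD threshold at which at most `gamma_pass_rate` of gammas pass as neutrons.

    Events with score > threshold are called neutrons. Returns (threshold,
    achieved gamma pass fraction, neutron efficiency): one point on the ROC curve.
    """
    g = sorted(gamma_scores, reverse=True)
    allowed = math.floor(gamma_pass_rate * len(g))   # number of gammas we may let through
    threshold = g[allowed] if allowed < len(g) else -math.inf
    passed_gamma = sum(s > threshold for s in g) / len(g)
    efficiency = sum(s > threshold for s in neutron_scores) / len(neutron_scores)
    return threshold, passed_gamma, efficiency
```

Realistic applications often demand very small gamma pass fractions, so the gamma sample must be large enough for the chosen fraction to correspond to at least a few events.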
Classifier-ensemble incremental-learning procedure for nuclear transient identification at different operational conditions

Energy Technology Data Exchange (ETDEWEB)

Baraldi, Piero, E-mail: piero.baraldi@polimi.i [Dipartimento di Energia - Sezione Ingegneria Nucleare, Politecnico di Milano, via Ponzio 34/3, 20133 Milano (Italy); Razavi-Far, Roozbeh [Dipartimento di Energia - Sezione Ingegneria Nucleare, Politecnico di Milano, via Ponzio 34/3, 20133 Milano (Italy); Zio, Enrico [Dipartimento di Energia - Sezione Ingegneria Nucleare, Politecnico di Milano, via Ponzio 34/3, 20133 Milano (Italy); Ecole Centrale Paris-Supelec, Paris (France)

2011-04-15

An important requirement for the practical implementation of empirical diagnostic systems is the capability of classifying transients in all plant operational conditions. The present paper proposes an approach based on an ensemble of classifiers for incrementally learning transients under different operational conditions. New classifiers are added to the ensemble whenever transients occurring in new operational conditions are not satisfactorily classified. The construction of the ensemble is made by bagging; the base classifier is a supervised Fuzzy C Means (FCM) classifier whose outcomes are combined by majority voting. The incremental learning procedure is applied to the identification of simulated transients in the feedwater system of a Boiling Water Reactor (BWR) under different reactor power levels.
Classifier-ensemble incremental-learning procedure for nuclear transient identification at different operational conditions

International Nuclear Information System (INIS)

Baraldi, Piero; Razavi-Far, Roozbeh; Zio, Enrico

2011-01-01

An important requirement for the practical implementation of empirical diagnostic systems is the capability of classifying transients in all plant operational conditions. The present paper proposes an approach based on an ensemble of classifiers for incrementally learning transients under different operational conditions. New classifiers are added to the ensemble whenever transients occurring in new operational conditions are not satisfactorily classified. The construction of the ensemble is made by bagging; the base classifier is a supervised Fuzzy C Means (FCM) classifier whose outcomes are combined by majority voting. The incremental learning procedure is applied to the identification of simulated transients in the feedwater system of a Boiling Water Reactor (BWR) under different reactor power levels.
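The ensemble construction described above, bagging plus majority voting, can be sketched as follows. A simple nearest-centroid learner stands in for the supervised Fuzzy C Means base classifier, so this is not the authors' model. Class labels and function names are hypothetical.

```python
import random
from collections import Counter
from statistics import mean

def fit_centroids(samples):
    """Stand-in base learner: per-class feature means (not the paper's supervised FCM)."""
    by_class = {}
    for x, y in samples:
        by_class.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*xs)] for y, xs in by_class.items()}

def predict_centroids(centroids, x):
    """Label of the nearest class centroid (squared Euclidean distance)."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

def bagging(samples, n_models=11, seed=0):
    """Train each base classifier on a bootstrap resample drawn with replacement."""
    rng = random.Random(seed)
    return [fit_centroids(rng.choices(samples, k=len(samples))) for _ in range(n_models)]

def majority_vote(ensemble, x):
    """Combine the base classifiers' outputs by simple majority voting."""
    votes = Counter(predict_centroids(model, x) for model in ensemble)
    return votes.most_common(1)[0][0]
```

In the spirit of the incremental procedure, models trained on data from a new operating condition could be appended with something like `ensemble.extend(bagging(new_samples, seed=1))`. How the original method weights or gates old versus new members is not described in the abstract.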
THE QuEChERS ANALYTICAL METHOD COMBINED WITH LOW ...

African Journals Online (AJOL)

The method has also been applied to different cereal samples and satisfactory average recoveries ... Analysis of multiclass pesticide residues in foods is a challenging task because of the ... compounds set by regulatory bodies. ..... analytes were used to evaluate the influences of the selected factors on performance of the.
Ultrasonic Sensor Signals and Optimum Path Forest Classifier for the Microstructural Characterization of Thermally-Aged Inconel 625 Alloy

Directory of Open Access Journals (Sweden)

Victor Hugo C. de Albuquerque

2015-05-01

Secondary phases, such as Laves phases and carbides, are formed during the final solidification stages of nickel-based superalloy coatings deposited by the gas tungsten arc welding cold wire process. However, when the alloy is aged at high temperatures, other phases, such as the γ'' and δ phases, can precipitate in the microstructure. This work evaluates the optimum path forest (OPF) classifier, configured with six distance functions, for classifying background echo and backscattered ultrasonic signals from samples of the Inconel 625 superalloy thermally aged at 650 and 950 °C for 10, 100 and 200 h. The background echo and backscattered ultrasonic signals were acquired using transducers with frequencies of 4 and 5 MHz. The results confirmed the potential of ultrasonic sensor signals combined with the OPF classifier to characterize the microstructures of Inconel 625 in the thermally aged and as-welded conditions. The experimental results revealed that the OPF classifier is sufficiently fast (total classification time of 0.316 ms) and accurate (accuracy of 88.75% and harmonic mean of 89.52) for the proposed application.
Ultrasonic sensor signals and optimum path forest classifier for the microstructural characterization of thermally-aged inconel 625 alloy.

Science.gov (United States)

de Albuquerque, Victor Hugo C; Barbosa, Cleisson V; Silva, Cleiton C; Moura, Elineudo P; Filho, Pedro P Rebouças; Papa, João P; Tavares, João Manuel R S

2015-05-27

Secondary phases, such as Laves phases and carbides, are formed during the final solidification stages of nickel-based superalloy coatings deposited by the gas tungsten arc welding cold wire process. However, when the alloy is aged at high temperatures, other phases, such as the γ'' and δ phases, can precipitate in the microstructure. This work evaluates the optimum path forest (OPF) classifier, configured with six distance functions, for classifying background echo and backscattered ultrasonic signals from samples of the Inconel 625 superalloy thermally aged at 650 and 950 °C for 10, 100 and 200 h. The background echo and backscattered ultrasonic signals were acquired using transducers with frequencies of 4 and 5 MHz. The results confirmed the potential of ultrasonic sensor signals combined with the OPF classifier to characterize the microstructures of Inconel 625 in the thermally aged and as-welded conditions. The experimental results revealed that the OPF classifier is sufficiently fast (total classification time of 0.316 ms) and accurate (accuracy of 88.75% and harmonic mean of 89.52) for the proposed application.
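The supervised OPF classifier can be sketched in pure Python following the training and classification rules commonly described in the OPF literature: prototypes are the endpoints of minimum-spanning-tree edges that join different classes, and each training sample is conquered by the prototype offering the lowest f_max path cost (the largest edge weight along the path). The `dist` argument is where one of the six distance functions would plug in. The function names and the toy 1-D data are illustrative assumptions, not the authors' code or ultrasonic features.

```python
import heapq
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def opf_fit(X, y, dist=euclidean):
    n = len(X)
    # 1. Prim's MST on the complete graph; MST edges joining different
    #    classes mark both endpoints as prototypes.
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0.0
    prototypes = set()
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0 and y[u] != y[parent[u]]:
            prototypes.update((u, parent[u]))
        for v in range(n):
            if not in_tree[v]:
                d = dist(X[u], X[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    # 2. Optimum-path propagation from the prototypes with f_max path cost.
    cost, label, done = [math.inf] * n, list(y), [False] * n
    for p in prototypes:
        cost[p] = 0.0
    heap = [(0.0, p) for p in prototypes]
    heapq.heapify(heap)
    while heap:
        c, s = heapq.heappop(heap)
        if done[s]:
            continue  # stale heap entry
        done[s] = True
        for t in range(n):
            if not done[t]:
                new_cost = max(c, dist(X[s], X[t]))
                if new_cost < cost[t]:
                    cost[t], label[t] = new_cost, label[s]
                    heapq.heappush(heap, (new_cost, t))
    return {"X": X, "cost": cost, "label": label, "dist": dist}


def opf_predict(model, x):
    """Label of the training node s minimizing max(cost[s], dist(s, x))."""
    X, cost, label, dist = model["X"], model["cost"], model["label"], model["dist"]
    s = min(range(len(X)), key=lambda i: max(cost[i], dist(X[i], x)))
    return label[s]
```

Training costs O(n²) distance evaluations, which is acceptable for small training sets; comparing distance functions amounts to passing a different `dist`.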
Iceberg Semantics For Count Nouns And Mass Nouns: Classifiers, measures and portions

Directory of Open Access Journals (Sweden)

Fred Landman

2016-12-01

It is the analysis of complex NPs and their mass-count properties that is the focus of the second part of this paper. There I develop an analysis of English and Dutch pseudo-partitives, in particular, measure phrases like three liters of wine and classifier phrases like three glasses of wine. We will study measure interpretations and classifier interpretations of measures and classifiers, and different types of classifier interpretations: container interpretations, contents interpretations, and, indeed, portion interpretations. Rothstein 2011 argues that classifier interpretations of pseudo-partitives (including portion interpretations) pattern with count nouns, but that measure interpretations pattern with mass nouns. I will show that this distinction follows from the very basic architecture of Iceberg semantics.
18 CFR 3a.12 - Authority to classify official information.

Science.gov (United States)

2010-04-01

... efficient administration. (b) The authority to classify information or material originally as Top Secret is... classify information or material originally as Secret is exercised only by: (1) Officials who have Top... information or material originally as Confidential is exercised by officials who have Top Secret or Secret...
Human Activity Recognition by Combining a Small Number of Classifiers.

Science.gov (United States)

Nazabal, Alfredo; Garcia-Moreno, Pablo; Artes-Rodriguez, Antonio; Ghahramani, Zoubin

2016-09-01

We consider the problem of daily human activity recognition (HAR) using multiple wireless inertial sensors, and specifically, HAR systems with a very low number of sensors, each one providing an estimation of the performed activities. We propose new Bayesian models to combine the outputs of the sensors. The models are based on a soft-output combination of individual classifiers to deal with the small number of sensors. We also incorporate the dynamic nature of human activities as a first-order homogeneous Markov chain. We develop both inductive and transductive inference methods for each model, to be employed in supervised and semisupervised situations, respectively. Using different real HAR databases, we compare our classifier-combination models against a single classifier that employs all the signals from the sensors. Our models consistently reduce the error rate and increase robustness against sensor failures. They also outperform other classifier-combination models that do not consider soft outputs and a Markovian structure of human activities.
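One common way to combine soft classifier outputs with a first-order Markov chain is to convert each sensor's posterior into a scaled likelihood (posterior divided by class prior), multiply these across sensors under a conditional-independence assumption, and run a forward (filtering) recursion over time. The sketch below illustrates only that generic idea; it is not the Bayesian models proposed in the paper, and the function name, the independence assumption and the inputs are hypothetical.

```python
def combine_and_filter(soft_outputs, prior, transition):
    """Fuse per-sensor soft outputs and filter them through a Markov chain.

    soft_outputs[t][k][c]: sensor k's posterior for activity c at time t.
    prior[c]: class prior, used to turn posteriors into scaled likelihoods.
    transition[i][j]: P(activity j at t | activity i at t-1).
    Returns the filtered posteriors P(y_t | outputs up to t) for every t.
    """
    n_classes = len(prior)
    belief = None
    filtered = []
    for t, sensor_posteriors in enumerate(soft_outputs):
        # Scaled likelihood: product over sensors of p_k(c | x_k) / p(c),
        # assuming sensors are conditionally independent given the activity.
        like = [1.0] * n_classes
        for post in sensor_posteriors:
            for c in range(n_classes):
                like[c] *= post[c] / prior[c]
        # Markov prediction step: sum_i belief[i] * transition[i][j].
        if t == 0:
            predicted = list(prior)
        else:
            predicted = [sum(belief[i] * transition[i][j] for i in range(n_classes))
                         for j in range(n_classes)]
        unnorm = [like[c] * predicted[c] for c in range(n_classes)]
        z = sum(unnorm)  # assumed nonzero in this sketch
        belief = [u / z for u in unnorm]
        filtered.append(belief)
    return filtered
```

When the sensors become uninformative (all posteriors equal to the prior), the filter simply carries the previous belief forward through the transition matrix, which is how temporal structure can compensate for a small number of sensors.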
Classifying objects in LWIR imagery via CNNs

Science.gov (United States)

Rodger, Iain; Connor, Barry; Robertson, Neil M.

2016-10-01

The aim of the presented work is to demonstrate enhanced target recognition and improved false alarm rates for a mid- to long-range detection system utilising a Long Wave Infrared (LWIR) sensor. By exploiting high-quality thermal image data and recent techniques in machine learning, the system can provide automatic target recognition capabilities. A Convolutional Neural Network (CNN) is trained, and the classifier achieves an overall accuracy of over 95% for 6 object classes related to land defence. Although the CNN struggles to recognise target classes at long range because of low signal quality, robust target discrimination is achieved for challenging candidates. The overall performance of the presented methodology is assessed using human ground-truth information, generating classifier evaluation metrics for thermal image sequences.
New neural network classifier of fall-risk based on the Mahalanobis distance and kinematic parameters assessed by a wearable device

International Nuclear Information System (INIS)

Giansanti, Daniele; Macellari, Velio; Maccioni, Giovanni

2008-01-01

Fall prevention lacks easy, quantitative and wearable methods for the classification of fall-risk (FR). Efforts must thus be devoted to the choice of an ad hoc classifier, both to reduce the size of the sample used to train the classifier and to improve performance. A new methodology that uses a neural network (NN) and a wearable device is hereby proposed for this purpose. The NN uses kinematic parameters assessed by a wearable device with accelerometers and rate gyroscopes during a posturography protocol. The training of the NN was based on the Mahalanobis distance and was carried out on two groups of 30 elderly subjects with varying fall-risk Tinetti scores. The validation was done on two groups of 100 subjects with different fall-risk Tinetti scores and showed that, in terms of both specificity and sensitivity, the NN performed better than other classifiers (naive Bayes, Bayes net, multilayer perceptron, support vector machines, statistical classifiers). In particular, (i) the proposed NN methodology improved the specificity and sensitivity by a mean of 3% when compared to the statistical classifier based on the Mahalanobis distance (SCMD) described in Giansanti (2006 Physiol. Meas. 27 1081–90); (ii) the assessed specificity was 97%, the assessed sensitivity was 98%, and the area under the receiver operating characteristic curve was 0.965. (note)
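For reference, a minimum-Mahalanobis-distance classifier, the kind of statistical baseline the abstract refers to as SCMD, assigns a sample to the class whose mean is closest under that class's covariance: d^2(x) = (x - mu)^T Sigma^{-1} (x - mu). The pure-Python sketch below assumes per-class covariances, hypothetical class labels, and enough non-collinear samples per class (more than the number of features) for each covariance to be invertible; it does not reproduce the paper's neural network.

```python
def mean(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows, mu):
    """Unbiased sample covariance (needs at least two rows)."""
    n, d = len(rows), len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def mahalanobis_sq(x, mu, cov):
    """(x - mu)^T cov^{-1} (x - mu), computed via a linear solve instead of an inverse."""
    diff = [a - b for a, b in zip(x, mu)]
    return sum(d * s for d, s in zip(diff, solve(cov, diff)))


def fit(groups):
    """groups: {label: list of feature vectors} -> {label: (mean, covariance)}."""
    model = {}
    for label, rows in groups.items():
        mu = mean(rows)
        model[label] = (mu, covariance(rows, mu))
    return model


def classify(model, x):
    return min(model, key=lambda lbl: mahalanobis_sq(x, *model[lbl]))
```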
Sample size determination for disease prevalence studies with partially validated data.

Science.gov (United States)

Qiu, Shi-Fang; Poon, Wai-Yin; Tang, Man-Lai

2016-02-01

Disease prevalence is an important topic in medical research, and its study is based on data obtained by classifying subjects according to whether a disease has been contracted. Classification can be conducted with high-cost gold-standard tests or low-cost screening tests, but the latter are subject to misclassification of subjects. As a compromise between the two, many research studies use partially validated datasets in which all data points are classified by fallible tests, and some of the data points are validated in the sense that they are also classified by the completely accurate gold-standard test. In this article, we investigate the determination of sample sizes for disease prevalence studies with partially validated data. We use two approaches. The first is to find sample sizes that can achieve a pre-specified power of a statistical test at a chosen significance level, and the second is to find sample sizes that can control the width of a confidence interval with a pre-specified confidence level. Empirical studies have been conducted to demonstrate the performance of various testing procedures with the proposed sample sizes. The applicability of the proposed methods is illustrated by a real-data example. © The Author(s) 2012.
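As a point of comparison only, the textbook sample size for estimating a prevalence with a fully validated (gold-standard) design, so that a Wald confidence interval has half-width E, is n = z^2 p(1 - p) / E^2. The partially validated designs studied in the paper require different variance formulas, so the sketch below is a simplified baseline; `p_guess` is an assumed planning value, not a result from the article.

```python
import math
from statistics import NormalDist


def n_for_ci_width(p_guess, half_width, confidence=0.95):
    """Smallest n giving a Wald CI of the requested half-width (gold standard on everyone)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return math.ceil(z ** 2 * p_guess * (1 - p_guess) / half_width ** 2)
```

For example, a planning prevalence of 0.5 and a half-width of 0.05 at 95% confidence give the familiar 385 subjects; fallible screening tests on part of the sample would inflate the variance and hence the required n.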
Classifying web pages with visual features

NARCIS (Netherlands)

de Boer, V.; van Someren, M.; Lupascu, T.; Filipe, J.; Cordeiro, J.

2010-01-01

To automatically classify and process web pages, current systems use the textual content of those pages, including both the displayed content and the underlying (HTML) code. However, a very important feature of a web page is its visual appearance. In this paper, we show that using generic visual
Dynamic integration of classifiers in the space of principal components

NARCIS (Netherlands)

Tsymbal, A.; Pechenizkiy, M.; Puuronen, S.; Patterson, D.W.; Kalinichenko, L.A.; Manthey, R.; Thalheim, B.; Wloka, U.

2003-01-01

Recent research has shown the integration of multiple classifiers to be one of the most important directions in machine learning and data mining. It was shown that, for an ensemble to be successful, it should consist of accurate and diverse base classifiers. However, it is also important that the
Bias and Stability of Single Variable Classifiers for Feature Ranking and Selection.

Science.gov (United States)

Fakhraei, Shobeir; Soltanian-Zadeh, Hamid; Fotouhi, Farshad

2014-11-01

Feature rankings are often used for supervised dimension reduction, especially when the discriminating power of each feature is of interest, the dimensionality of the dataset is extremely high, or the available computational power is too limited for more complicated methods. In practice, it is recommended to start dimension reduction with simple methods such as feature rankings before applying more complex approaches. Single Variable Classifier (SVC) ranking is a feature ranking based on the predictive performance of a classifier built using only a single feature. While benefiting from the capabilities of classifiers, this ranking method is not as computationally intensive as wrappers. In this paper, we report the results of an extensive study on the bias and stability of this feature ranking method. We study whether the classifiers influence the SVC rankings or whether the discriminative power of the features themselves has the dominant impact on the final rankings. We show that the common intuition of using the same classifier for feature ranking and final classification does not always result in the best prediction performance. We then study whether heterogeneous classifier-ensemble approaches provide more unbiased rankings and whether they improve final classification performance. Furthermore, we calculate the empirical loss in prediction performance, relative to the optimal choices, incurred by using the same classifier for SVC feature ranking and final classification.
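The SVC ranking idea, scoring each feature by the predictive performance of a classifier that sees only that feature, can be sketched as follows. The single-feature classifier here is a hypothetical 1-D nearest-class-mean rule scored by leave-one-out accuracy; the paper studies several classifiers, and swapping in a different one is exactly the source of bias the authors investigate.

```python
def loo_accuracy_single_feature(values, labels):
    """Leave-one-out accuracy of a 1-D nearest-class-mean classifier on one feature."""
    n = len(values)
    correct = 0
    for i in range(n):
        sums, counts = {}, {}
        for j in range(n):
            if j != i:
                sums[labels[j]] = sums.get(labels[j], 0.0) + values[j]
                counts[labels[j]] = counts.get(labels[j], 0) + 1
        means = {c: sums[c] / counts[c] for c in sums}
        pred = min(means, key=lambda c: abs(values[i] - means[c]))
        correct += pred == labels[i]
    return correct / n


def svc_rank(X, y):
    """Return (feature indices from most to least predictive, per-feature scores)."""
    scores = [loo_accuracy_single_feature([row[f] for row in X], y)
              for f in range(len(X[0]))]
    order = sorted(range(len(scores)), key=lambda f: -scores[f])
    return order, scores
```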

An expert computer program for classifying stars on the MK spectral classification system

International Nuclear Information System (INIS)

Gray, R. O.; Corbally, C. J.

2014-01-01

This paper describes an expert computer program (MKCLASS) designed to classify stellar spectra on the MK Spectral Classification system in a way similar to humans—by direct comparison with the MK classification standards. Like an expert human classifier, the program first comes up with a rough spectral type, and then refines that spectral type by direct comparison with MK standards drawn from a standards library. A number of spectral peculiarities, including barium stars, Ap and Am stars, λ Bootis stars, carbon-rich giants, etc., can be detected and classified by the program. The program also evaluates the quality of the delivered spectral type. The program currently is capable of classifying spectra in the violet-green region in either the rectified or flux-calibrated format, although the accuracy of the flux calibration is not important. We report on tests of MKCLASS on spectra classified by human classifiers; those tests suggest that over the entire HR diagram, MKCLASS will classify in the temperature dimension with a precision of 0.6 spectral subclass, and in the luminosity dimension with a precision of about one half of a luminosity class. These results compare well with human classifiers.
An expert computer program for classifying stars on the MK spectral classification system

Energy Technology Data Exchange (ETDEWEB)

Gray, R. O. [Department of Physics and Astronomy, Appalachian State University, Boone, NC 26808 (United States); Corbally, C. J. [Vatican Observatory Research Group, Tucson, AZ 85721-0065 (United States)

2014-04-01

Ship localization in Santa Barbara Channel using machine learning classifiers.

Science.gov (United States)

Niu, Haiqiang; Ozanich, Emma; Gerstoft, Peter

2017-11-01

Machine learning classifiers are shown to outperform conventional matched field processing for a deep water (600 m depth) ocean acoustic-based ship range estimation problem in the Santa Barbara Channel Experiment when limited environmental information is known. Recordings of three different ships of opportunity on a vertical array were used as training and test data for the feed-forward neural network and support vector machine classifiers, demonstrating the feasibility of machine learning methods to locate unseen sources. The classifiers perform well up to 10 km range whereas the conventional matched field processing fails at about 4 km range without accurate environmental information.
Influence of Sampling Practices on the Appearance of DNA Image Histograms of Prostate Cells in FNAB Samples

Directory of Open Access Journals (Sweden)

Abdelbaset Buhmeida

1999-01-01

Twenty-one fine needle aspiration biopsies (FNAB) of the prostate, diagnostically classified as definitely malignant, were studied. The Papanicolaou or H&E stained samples were destained and then stained for DNA with the Feulgen reaction. DNA cytometry was applied after different sampling rules. The histograms varied according to the sampling rule applied. Because free cells between cell groups were easier to measure than cells in the cell groups, two sampling rules were tested in all samples: (i) cells in the cell groups were measured, and (ii) free cells between cell groups were measured. Abnormal histograms were more common after the sampling rule based on free cells, suggesting that abnormal patterns are best revealed through the free cells in these samples. The conclusions were independent of the applied histogram interpretation method.
A Consistent System for Coding Laboratory Samples

Science.gov (United States)

Sih, John C.

1996-07-01

A formal laboratory coding system is presented to keep track of laboratory samples. Preliminary useful information regarding the sample (origin and history) is gained without consulting a research notebook. Since this system uses and retains the same research notebook page number for each new experiment (reaction), finding and distinguishing products (samples) of the same or different reactions becomes an easy task. Using this system multiple products generated from a single reaction can be identified and classified in a uniform fashion. Samples can be stored and filed according to stage and degree of purification, e.g. crude reaction mixtures, recrystallized samples, chromatographed or distilled products.
Deconstructing Cross-Entropy for Probabilistic Binary Classifiers

Directory of Open Access Journals (Sweden)

Daniel Ramos

2018-03-01

In this work, we analyze the cross-entropy function, widely used in classifiers both as a performance measure and as an optimization objective. We contextualize cross-entropy in the light of Bayesian decision theory, the formal probabilistic framework for making decisions, and we thoroughly analyze its motivation, meaning and interpretation from an information-theoretical point of view. In this sense, this article presents several contributions: First, we explicitly analyze the contribution to cross-entropy of (i) prior knowledge; and (ii) the value of the features in the form of a likelihood ratio. Second, we introduce a decomposition of cross-entropy into two components: discrimination and calibration. This decomposition enables the measurement of different performance aspects of a classifier in a more precise way; and justifies previously reported strategies to obtain reliable probabilities by means of the calibration of the output of a discriminating classifier. Third, we give different information-theoretical interpretations of cross-entropy, which can be useful in different application scenarios, and which are related to the concept of reference probabilities. Fourth, we present an analysis tool, the Empirical Cross-Entropy (ECE) plot, a compact representation of cross-entropy and its aforementioned decomposition. We show the power of ECE plots, as compared to other classical performance representations, in two diverse experimental examples: a speaker verification system, and a forensic case where some glass findings are present.
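In the Bayesian decision framework the abstract describes, the posterior log-odds equal the log likelihood ratio plus the prior log-odds, which is how prior knowledge and the value of the features enter cross-entropy separately. A minimal sketch of empirical cross-entropy at a given prior follows. It assumes the standard ECE definition used in forensic evaluation (stated here as an assumption, not quoted from the paper); the function name and inputs are hypothetical.

```python
import math


def empirical_cross_entropy(lr_target, lr_nontarget, prior_target):
    """Empirical cross-entropy in bits, assuming the usual ECE definition:

    ECE = -(P / Np) * sum_i log2 P(Hp | E_i) - ((1 - P) / Nd) * sum_j log2 P(Hd | E_j)

    with posterior odds = likelihood ratio * prior odds.
    """
    odds = prior_target / (1 - prior_target)
    # P(Hp | E) = LR * O / (1 + LR * O) for samples where Hp is true.
    term_p = sum(math.log2(lr * odds / (1 + lr * odds)) for lr in lr_target) / len(lr_target)
    # P(Hd | E) = 1 / (1 + LR * O) for samples where Hd is true.
    term_d = sum(math.log2(1 / (1 + lr * odds)) for lr in lr_nontarget) / len(lr_nontarget)
    return -prior_target * term_p - (1 - prior_target) * term_d
```

An ECE plot evaluates this function over a sweep of prior log-odds. With every likelihood ratio equal to 1 (a system that carries no information), ECE reduces to the entropy of the prior, which serves as the neutral reference curve.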
A Novel Semi-Supervised Electronic Nose Learning Technique: M-Training

Directory of Open Access Journals (Sweden)

Pengfei Jia

2016-03-01

When an electronic nose (E-nose) is used to distinguish different kinds of gases, the label information of the target gas could be lost due to operator error or some other reason, although this is not expected. Moreover, labeled samples are usually more costly to obtain than unlabeled ones. In most cases, the classification accuracy of an E-nose trained using labeled samples is higher than that of an E-nose trained with unlabeled ones, so gases without label information would conventionally not be used to train an E-nose; however, discarding them wastes resources and can even delay the progress of research. In this work, a novel multi-class semi-supervised learning technique called M-training is proposed to train E-noses with both labeled and unlabeled samples. We employ M-training to train an E-nose used to distinguish three indoor pollutant gases (benzene, toluene and formaldehyde). Data processing results show that the classification accuracy of an E-nose trained by semi-supervised techniques (tri-training and M-training) is higher than that of an E-nose trained only with labeled samples, and that M-training performs better than tri-training because it can employ more base classifiers.
LOCALIZATION AND RECOGNITION OF DYNAMIC HAND GESTURES BASED ON HIERARCHY OF MANIFOLD CLASSIFIERS

OpenAIRE

M. Favorskaya; A. Nosov; A. Popov

2015-01-01

Generally, the dynamic hand gestures are captured in continuous video sequences, and a gesture recognition system ought to extract the robust features automatically. This task involves the highly challenging spatio-temporal variations of dynamic hand gestures. The proposed method is based on two-level manifold classifiers including the trajectory classifiers in any time instants and the posture classifiers of sub-gestures in selected time instants. The trajectory classifiers contain skin dete...
COMPARISON OF SVM AND FUZZY CLASSIFIER FOR AN INDIAN SCRIPT

Directory of Open Access Journals (Sweden)

M. J. Baheti

2012-01-01

With the advent of the technological era, the conversion of scanned documents (handwritten or printed) into a machine-editable format has attracted many researchers. This paper deals with the recognition of Gujarati handwritten numerals. Gujarati numeral recognition requires some specific preprocessing steps: digitization, segmentation, normalization and thinning, performed under the assumption that the images have almost no noise. An affine-invariant moments-based model is then used for feature extraction, and finally Support Vector Machine (SVM) and fuzzy classifiers are used for numeral classification. A comparison of the two shows that the SVM achieved better results than the fuzzy classifier.
A naïve Bayes classifier for planning transfusion requirements in heart surgery.

Science.gov (United States)

Cevenini, Gabriele; Barbini, Emanuela; Massai, Maria R; Barbini, Paolo

2013-02-01

Transfusion of allogeneic blood products is a key issue in cardiac surgery. Although blood conservation and standard transfusion guidelines have been published by different medical groups, actual transfusion practices after cardiac surgery vary widely among institutions. Models can be a useful support for decision making and may reduce the total cost of care. The objective of this study was to propose and evaluate a procedure to develop a simple, locally customized decision-support system. We analysed 3182 consecutive patients undergoing cardiac surgery at the University Hospital of Siena, Italy. Univariate statistical tests were performed to identify a set of preoperative and intraoperative variables as likely independent features for planning transfusion quantities. These features were used to design a naïve Bayes classifier. Model performance was evaluated using the leave-one-out cross-validation approach. All computations were done using SPSS and MATLAB code. The overall percentage of correct classifications was not particularly high when several classes of patients had to be identified. Model performance improved appreciably when the patient sample was divided into two classes (transfused and non-transfused patients). In this case the naïve Bayes model correctly classified about three quarters of patients, with 71.2% sensitivity and 78.4% specificity, thus providing useful information for recognizing patients with transfusion requirements in the specific scenario considered. Although the classifier is customized to a particular setting and cannot be generalized to other scenarios, the simplicity of its development and the results obtained make it a promising approach for designing simple models for other heart surgery centres that need a customized decision-support system for planning transfusion requirements in the intensive care unit. © 2011 Blackwell Publishing Ltd.
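A categorical naïve Bayes classifier evaluated by leave-one-out cross-validation, reporting sensitivity and specificity for a two-class split such as transfused versus non-transfused, can be sketched as follows. Laplace smoothing, the categorical feature encoding and all names are assumptions for illustration; the study itself used SPSS and MATLAB, and its variables are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def nb_fit(X, y, alpha=1.0):
    """Count class frequencies and per-class value frequencies for each feature."""
    classes = Counter(y)
    counts = defaultdict(Counter)  # (feature index, class) -> Counter of values
    values = [set(col) for col in zip(*X)]
    for row, c in zip(X, y):
        for f, v in enumerate(row):
            counts[(f, c)][v] += 1
    return classes, counts, values, alpha, len(y)


def nb_predict(model, row):
    classes, counts, values, alpha, n = model

    def log_posterior(c):
        lp = math.log(classes[c] / n)
        for f, v in enumerate(row):
            # Laplace-smoothed P(value | class).
            lp += math.log((counts[(f, c)][v] + alpha) / (classes[c] + alpha * len(values[f])))
        return lp

    return max(classes, key=log_posterior)


def loo_sensitivity_specificity(X, y, positive):
    """Leave-one-out sensitivity and specificity for a binary problem."""
    tp = fn = tn = fp = 0
    for i in range(len(X)):
        model = nb_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        pred = nb_predict(model, X[i])
        if y[i] == positive:
            tp += pred == positive
            fn += pred != positive
        else:
            tn += pred != positive
            fp += pred == positive
    return tp / (tp + fn), tn / (tn + fp)
```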
Classifying smoking urges via machine learning.

Science.gov (United States)

Dumortier, Antoine; Beckjord, Ellen; Shiffman, Saul; Sejdić, Ervin

2016-12-01

Smoking is the largest preventable cause of death and disease in the developed world, and advances in modern electronics and machine learning can help us deliver real-time interventions to smokers in novel ways. In this paper, we examine different machine learning approaches that use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. To test our machine learning approaches, specifically Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches are evaluated in terms of sensitivity, specificity, accuracy and precision. The analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with a classification accuracy of up to 86%. These numbers suggest that machine learning may be a suitable approach for smoking cessation and for predicting smoking urges, outlining a potential use for mobile health applications. In conclusion, machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms' performance. In addition, this study supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time, by developing novel expert systems capable of delivering real-time interventions. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Learning Bayesian network classifiers for credit scoring using Markov Chain Monte Carlo search

NARCIS (Netherlands)

Baesens, B.; Egmont-Petersen, M.; Castelo, R.; Vanthienen, J.

2001-01-01

In this paper, we will evaluate the power and usefulness of Bayesian network classifiers for credit scoring. Various types of Bayesian network classifiers will be evaluated and contrasted including unrestricted Bayesian network classifiers learnt using Markov Chain Monte Carlo (MCMC) search.
An IoT System for Remote Monitoring of Patients at Home

Directory of Open Access Journals (Sweden)

KeeHyun Park

2017-03-01

The application areas of IoT can be broadened to healthcare and remote monitoring. In this paper, a remote monitoring system for patients at home in IoT environments is proposed, constructed, and evaluated through several experiments. To make it operable in IoT environments, a protocol conversion scheme between the ISO/IEEE 11073 protocol and the oneM2M protocol, and a Multiclass Q-learning scheduling algorithm based on the urgency of biomedical data delivery to medical staff, are proposed. In addition, for the sake of patients’ privacy, two security schemes are proposed: the separate storage scheme of data in parts and the Buddy-ACK authorization scheme. The experiment on the constructed system showed that the system worked well and that the Multiclass Q-learning scheduling algorithm performs better than the Multiclass Based Dynamic Priority scheduling algorithm. We also found that the throughput of the Multiclass Q-learning scheduling algorithm increases almost linearly as the measurement time increases, whereas the throughput of the Multiclass Based Dynamic Priority algorithm increases at a diminishing rate.
Snoring classified: The Munich-Passau Snore Sound Corpus.

Science.gov (United States)

Janott, Christoph; Schmitt, Maximilian; Zhang, Yue; Qian, Kun; Pandit, Vedhas; Zhang, Zixing; Heiser, Clemens; Hohenhorst, Winfried; Herzog, Michael; Hemmert, Werner; Schuller, Björn

2018-03-01

Snoring can be excited in different locations within the upper airways during sleep. It was hypothesised that the excitation locations are correlated with distinct acoustic characteristics of the snoring noise. To verify this hypothesis, a database of snore sounds labelled with the location of sound excitation was developed. Video and audio recordings taken during drug-induced sleep endoscopy (DISE) examinations from three medical centres were semi-automatically screened for snore events, which were subsequently classified by ENT experts into four classes based on the VOTE classification. The resulting dataset, containing 828 snore events from 219 subjects, was split into Train, Development, and Test sets. An SVM classifier was trained using low-level descriptors (LLDs) related to energy, spectral features, mel frequency cepstral coefficients (MFCC), formants, voicing, harmonic-to-noise ratio (HNR), spectral harmonicity, pitch, and microprosodic features. An unweighted average recall (UAR) of 55.8% was achieved using the full set of LLDs including formants. The best-performing subset is the MFCC-related set of LLDs. A strong difference in performance was observed between the permutations of the train, development, and test partitions, which may be caused by the relatively low number of subjects included in the smaller classes of the strongly unbalanced data set. A database of snoring sounds is presented, classified according to their sound excitation location based on objective criteria and verifiable video material. With the database, it could be demonstrated that machine classifiers can distinguish different excitation locations of snoring sounds in the upper airway based on acoustic parameters. Copyright © 2018 Elsevier Ltd. All rights reserved.
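Unweighted average recall (UAR), the metric reported above, is the mean of the per-class recalls, so each class counts equally regardless of how many events it contains; this is why it suits strongly unbalanced data. A minimal sketch, with hypothetical labels V, O, T and E standing in for the VOTE classes:

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean over classes present in y_true of (correct predictions / class size)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += t == p
    return sum(hits[c] / totals[c] for c in totals) / len(totals)
```

For example, with ten events split 6/2/1/1 across the four classes, a classifier that always predicts the majority class reaches 60% accuracy but a UAR of only 25%.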
General and Local: Averaged k-Dependence Bayesian Classifiers

Directory of Open Access Journals (Sweden)

Limin Wang

2015-06-01

The inference of a general Bayesian network has been shown to be an NP-hard problem, even for approximate solutions. Although the k-dependence Bayesian (KDB) classifier can be constructed at arbitrary points (values of k) along the attribute dependence spectrum, it cannot identify changes in the interdependencies when attributes take different values. Local KDB, which learns in the framework of KDB, is proposed in this study to describe the local dependencies implicated in each test instance. Based on the analysis of functional dependencies, substitution-elimination resolution, a new type of semi-naive Bayesian operation, is proposed to substitute or eliminate generalization, achieving accurate estimation of the conditional probability distribution while reducing computational complexity. The final classifier, the averaged k-dependence Bayesian (AKDB) classifier, averages the outputs of KDB and local KDB. Experimental results on the repository of machine learning databases from the University of California Irvine (UCI) showed that AKDB has significant advantages in zero-one loss and bias relative to naive Bayes (NB), tree-augmented naive Bayes (TAN), averaged one-dependence estimators (AODE), and KDB. Moreover, KDB and local KDB show mutually complementary characteristics with respect to variance.
Entropy based classifier for cross-domain opinion mining

Directory of Open Access Journals (Sweden)

Jyoti S. Deshmukh

2018-01-01

Full Text Available In recent years, the growth of social networks has increased people's interest in analyzing reviews and opinions of products before they buy them. Consequently, domain adaptation has become a prominent area of research in sentiment analysis. A classifier trained on one domain often gives poor results on data from another domain, because sentiment is expressed differently in every domain. Labeling each domain separately is costly and time-consuming. Therefore, this study proposes an approach that extracts and classifies opinion words from one domain, called the source domain, and predicts opinion words of another domain, called the target domain, using a semi-supervised approach that combines modified maximum entropy and bipartite graph clustering. A comparison of opinion classification on reviews from four different product domains is presented. The results demonstrate that the proposed method performs relatively well in comparison to the other methods. A comparison against SentiWordNet reveals that, on average, 72.6% of domain-specific words and 88.4% of domain-independent words are correctly classified.
A Gene Expression Classifier of Node-Positive Colorectal Cancer

Directory of Open Access Journals (Sweden)

Paul F. Meeh

2009-10-01

Full Text Available We used digital long serial analysis of gene expression to discover gene expression differences between node-negative and node-positive colorectal tumors and developed a multigene classifier able to discriminate between these two tumor types. We prepared long serial analysis of gene expression libraries from one node-negative and one node-positive colorectal tumor, sequenced them to a depth of 26,060 unique tags, and identified 262 tags significantly differentially expressed between these two tumors (P < 2 × 10^-6). We confirmed the tag-to-gene assignments and differential expression of 31 genes by quantitative real-time polymerase chain reaction, 12 of which were elevated in the node-positive tumor. We analyzed the expression levels of these 12 upregulated genes in a validation panel of 23 additional tumors and developed an optimized seven-gene logistic regression classifier. The classifier discriminated between node-negative and node-positive tumors with 86% sensitivity and 80% specificity. Receiver operating characteristic analysis of the classifier revealed an area under the curve of 0.86. Experimental manipulation of the function of one classification gene, Fibronectin (FN1), caused profound effects on invasion and migration of colorectal cancer cells in vitro. These results suggest that the development of node-positive colorectal cancer occurs in part through elevated epithelial FN1 expression and suggest novel strategies for the diagnosis and treatment of advanced disease.
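Sensitivity, specificity, and the area under the ROC curve are the figures used to summarize the seven-gene classifier above. The sketch below shows how these quantities are commonly computed from classifier scores; the scores, labels, and 0.5 cut-off are hypothetical, and AUC is computed through its Mann-Whitney interpretation (the probability that a randomly chosen positive outscores a randomly chosen negative).

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for 0/1 labels, where 1 = node-positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc_mann_whitney(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs (e.g. logistic-regression probabilities).
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(sensitivity_specificity(y_true, y_pred))  # sensitivity 2/3, specificity 1.0
print(auc_mann_whitney(y_true, scores))         # 5/6
```

Sensitivity and specificity depend on the chosen cut-off, whereas the AUC summarizes the ranking over all cut-offs.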
Building an automated SOAP classifier for emergency department reports.

Science.gov (United States)

Mowery, Danielle; Wiebe, Janyce; Visweswaran, Shyam; Harkema, Henk; Chapman, Wendy W

2012-02-01

Information extraction applications that extract structured event and entity information from unstructured text can leverage knowledge of clinical report structure to improve performance. The Subjective, Objective, Assessment, Plan (SOAP) framework, used to structure progress notes to facilitate problem-specific, clinical decision making by physicians, is one example of a well-known, canonical structure in the medical domain. Although its applicability to structuring data is understood, its contribution to information extraction tasks has not yet been determined. The first step to evaluating the SOAP framework's usefulness for clinical information extraction is to apply the model to clinical narratives and develop an automated SOAP classifier that classifies sentences from clinical reports. In this quantitative study, we applied the SOAP framework to sentences from emergency department reports, and trained and evaluated SOAP classifiers built with various linguistic features. We found the SOAP framework can be applied manually to emergency department reports with high agreement (Cohen's kappa coefficients over 0.70). Using a variety of features, we found classifiers for each SOAP class can be created with moderate to outstanding performance with F1 scores of 93.9 (subjective), 94.5 (objective), 75.7 (assessment), and 77.0 (plan). We look forward to expanding the framework and applying the SOAP classification to clinical information extraction tasks. Copyright © 2011. Published by Elsevier Inc.
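The per-class F1 scores above are reported on a 0 to 100 scale. A minimal one-vs-rest sketch of how such a score is typically computed for each SOAP class follows; the sentence labels are hypothetical, and the study may have computed its scores differently (for example, with separate binary classifiers per class).

```python
def f1_for_class(y_true, y_pred, label):
    """One-vs-rest F1 for a single class label (0.0 when there are no true positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)  # harmonic mean


# Hypothetical sentence labels: S, O, A, P = subjective, objective, assessment, plan.
y_true = ["S", "S", "O", "A", "P", "O"]
y_pred = ["S", "O", "O", "A", "A", "O"]
for label in "SOAP":
    print(label, round(100 * f1_for_class(y_true, y_pred, label), 1))
```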
Localizing genes to cerebellar layers by classifying ISH images.

Directory of Open Access Journals (Sweden)

Lior Kirsch

Full Text Available Gene expression controls how the brain develops and functions. Understanding control processes in the brain is particularly hard since they involve numerous types of neurons and glia, and very little is known about which genes are expressed in which cells and brain layers. Here we describe an approach to detect genes whose expression is primarily localized to a specific brain layer and apply it to the mouse cerebellum. We learn typical spatial patterns of expression from a few markers that are known to be localized to specific layers, and use these patterns to predict localization for new genes. We analyze images of in-situ hybridization (ISH) experiments, which we represent using histograms of local binary patterns (LBP), and train image classifiers and gene classifiers for four layers of the cerebellum: the Purkinje, granular, molecular and white matter layers. On held-out data, the layer classifiers achieve accuracy above 94% (AUC) by representing each image at multiple scales and by combining multiple image scores into a single gene-level decision. When applied to the full mouse genome, the classifiers predict specific layer localization for hundreds of new genes in the Purkinje and granular layers. Many genes localized to the Purkinje layer are likely to be expressed in astrocytes, and many others are involved in lipid metabolism, possibly due to the unusual size of Purkinje cells.
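The ISH images are represented as histograms of local binary patterns. The abstract does not specify the LBP variant (radius, number of neighbours, uniform patterns, or how the multiple scales are built), so this sketch assumes the basic 3 x 3, 8-neighbour operator on a grayscale image stored as nested lists: each interior pixel gets an 8-bit code recording which neighbours are at least as bright as it, and the image is summarized by the normalized histogram of codes.

```python
# Clockwise neighbour offsets starting at the top-left pixel (an assumed ordering).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """8-bit LBP code: bit i is set when neighbour i is >= the centre pixel."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes over all interior pixels."""
    rows, cols = len(img), len(img[0])
    hist = [0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


gradient = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(lbp_histogram(gradient).index(1.0))  # every interior pixel shares one code
```

The resulting histogram vector would then serve as the feature input to an image classifier; the classifier itself is not shown.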
Arabic Handwriting Recognition Using Neural Network Classifier

African Journals Online (AJOL)

pc

2018-03-05

... an OCR using Neural Network classifier preceded by a set of preprocessing ... Artificial Neural Networks (ANNs), which we adopt in this research, consist of ... advantages and disadvantages of each technique. In [9], Khemiri ...

Lung Nodule Image Classification Based on Local Difference Pattern and Combined Classifier.

Science.gov (United States)

Mao, Keming; Deng, Zhuofu

2016-01-01

This paper proposes a novel lung nodule classification method for low-dose CT images. The method includes two stages. First, Local Difference Pattern (LDP) is proposed to encode the feature representation, which is extracted by comparing intensity differences along circular regions centered at the lung nodule. Then, a single-center classifier is trained based on LDP. Because feature distributions differ across classes, the training images are further clustered into multiple cores and a multicenter classifier is constructed. The two classifiers are combined to make the final decision. Experimental results on a public dataset show the superior performance of LDP and the combined classifier.
Lung Nodule Image Classification Based on Local Difference Pattern and Combined Classifier

Directory of Open Access Journals (Sweden)

Keming Mao

2016-01-01

Full Text Available This paper proposes a novel lung nodule classification method for low-dose CT images. The method includes two stages. First, Local Difference Pattern (LDP) is proposed to encode the feature representation, which is extracted by comparing intensity differences along circular regions centered at the lung nodule. Then, a single-center classifier is trained based on LDP. Because feature distributions differ across classes, the training images are further clustered into multiple cores and a multicenter classifier is constructed. The two classifiers are combined to make the final decision. Experimental results on a public dataset show the superior performance of LDP and the combined classifier.
Detection of microaneurysms in retinal images using an ensemble classifier

Directory of Open Access Journals (Sweden)

M.M. Habib

2017-01-01

Full Text Available This paper introduces, and reports on the performance of, a novel combination of algorithms for automated microaneurysm (MA) detection in retinal images. The presence of MAs in retinal images is a pathognomonic sign of Diabetic Retinopathy (DR), which is one of the leading causes of blindness amongst the working-age population. An extensive survey of the literature is presented and current techniques in the field are summarised. The proposed technique first detects an initial set of candidates using a Gaussian Matched Filter and then classifies this set to reduce the number of false positives. A Tree Ensemble classifier is used with a set of 70 features (the most common features in the literature). A new set of 32 MA ground-truth images (with a total of 256 labelled MAs) based on images from the MESSIDOR dataset is introduced as a public dataset for benchmarking MA detection algorithms. We evaluate our algorithm on this dataset as well as another public dataset (DIARETDB1 v2.1) and compare it against the best available alternative. Results show that the proposed classifier is superior in terms of eliminating false positive MA detections from the initial set of candidates. The proposed method achieves an ROC score of 0.415 compared to 0.2636 achieved by the best available technique. Furthermore, results show that the classifier model maintains consistent performance across datasets, illustrating the generalisability of the classifier and that overfitting does not occur.
Dynamic cluster generation for a fuzzy classifier with ellipsoidal regions.

Science.gov (United States)

Abe, S

1998-01-01

In this paper, we discuss a fuzzy classifier with ellipsoidal regions that dynamically generates clusters. First, for the data belonging to a class, we define a fuzzy rule with an ellipsoidal region. Namely, using the training data for each class, we calculate the center and the covariance matrix of the ellipsoidal region for the class. Then we tune the fuzzy rules, i.e., the slopes of the membership functions, successively until there is no improvement in the recognition rate of the training data. Then, if the number of data samples belonging to a class that are misclassified into another class exceeds a prescribed number, we define a new cluster to which those data belong, together with the associated fuzzy rule. Then we tune the newly defined fuzzy rules in a similar way, keeping the already obtained fuzzy rules fixed. We iterate the generation of clusters and the tuning of the newly generated fuzzy rules until the number of data samples belonging to a class that are misclassified into another class does not exceed the prescribed number. We evaluate our method using thyroid data, Japanese Hiragana data of vehicle license plates, and blood cell data. Dynamic cluster generation improves the generalization ability of the classifier, and, when there are no discrete input variables, the recognition rate of the fuzzy classifier on the test data is the best among the neural network classifiers and other fuzzy classifiers.
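The first step described above (one ellipsoidal region per class, defined by a center and a covariance matrix) can be sketched in plain Python. The membership function is an assumption for illustration: a Gaussian-shaped function of the Mahalanobis distance, exp(-d^2 / alpha), where a per-class alpha stands in for the tunable slope. The slope-tuning loop and the dynamic generation of extra clusters are not shown, and all data are hypothetical.

```python
import math
from collections import defaultdict


def mean_and_cov(points):
    """Class center and (unbiased) covariance matrix from a list of feature vectors."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    cov = [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points) / (n - 1)
            for b in range(d)] for a in range(d)]
    return mean, cov


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    d = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(d):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, d + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][d] / m[i][i] for i in range(d)]


def membership(x, center, cov, alpha=1.0):
    """Assumed membership: exp(-d^2 / alpha), d = Mahalanobis distance to the center."""
    diff = [xi - ci for xi, ci in zip(x, center)]
    d2 = sum(di * yi for di, yi in zip(diff, solve(cov, diff)))
    return math.exp(-d2 / alpha)


def fit(X, y):
    """One ellipsoidal region (center, covariance) per class."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    return {label: mean_and_cov(pts) for label, pts in groups.items()}


def classify(x, model, alphas=None):
    """Pick the class whose fuzzy rule fires most strongly; alphas act as hypothetical slopes."""
    alphas = alphas or {}
    return max(model, key=lambda k: membership(x, *model[k], alphas.get(k, 1.0)))


X = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (5, 5), (6, 5), (5, 6), (6, 6), (5.5, 5.5)]
y = ["A"] * 5 + ["B"] * 5
model = fit(X, y)
print(classify((0.6, 0.4), model), classify((5.4, 5.7), model))  # A B
```

Each class needs enough non-collinear samples for its covariance matrix to be invertible; the sketch does not guard against a singular matrix.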
A Bayesian Classifier for X-Ray Pulsars Recognition

Directory of Open Access Journals (Sweden)

Hao Liang

2016-01-01

Full Text Available Recognition of X-ray pulsars is important for spacecraft attitude determination by X-ray Pulsar Navigation (XPNAV). Using the nonhomogeneous Poisson model of the received photons and the minimum recognition error criterion, a classifier based on Bayes' theorem is proposed. For X-ray pulsar recognition with unknown Doppler frequency and initial phase, the features of every X-ray pulsar are extracted and the unknown parameters are estimated using the Maximum Likelihood (ML) method. In addition, a method to recognize unknown X-ray pulsars or X-ray disturbances is proposed. Simulation results confirm the validity of the proposed Bayesian classifier.
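The Bayesian decision rule above can be illustrated under simplifying assumptions that are ours, not the paper's: photon arrivals are folded into phase bins, each candidate pulsar has a known expected-count profile, the Poisson counts in different bins are independent, and the unknown initial phase is handled by a crude search over cyclic shifts of the profile rather than by a proper ML estimate. Doppler frequency estimation is omitted. The profiles are hypothetical.

```python
import math


def poisson_loglik(counts, rates):
    """Log-likelihood of binned photon counts under independent Poisson rates per bin."""
    return sum(n * math.log(lam) - lam - math.lgamma(n + 1) for n, lam in zip(counts, rates))


def classify(counts, profiles, priors=None):
    """MAP choice of pulsar; cyclic shifts of each profile stand in for an unknown initial phase."""
    best = (None, 0, -math.inf)  # (label, shift, log-posterior up to a constant)
    for label, rates in profiles.items():
        log_prior = math.log(priors[label] if priors else 1.0 / len(profiles))
        for shift in range(len(rates)):
            shifted = rates[shift:] + rates[:shift]
            score = poisson_loglik(counts, shifted) + log_prior
            if score > best[2]:
                best = (label, shift, score)
    return best[0], best[1]


# Hypothetical expected counts per phase bin for two known pulsars.
profiles = {"A": [10.0, 1.0, 1.0, 1.0], "B": [5.0, 5.0, 1.0, 1.0]}
print(classify([1, 10, 1, 1], profiles))  # ('A', 3)
```

With equal priors the rule reduces to maximum likelihood over pulsars and phases, which matches the minimum-error Bayes criterion only when the priors really are equal.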
Effect of Subliminal Lexical Priming on the Subjective Perception of Images: A Machine Learning Approach.

Directory of Open Access Journals (Sweden)

Dhanya Menoth Mohan

Full Text Available The purpose of the study is to examine the effect of subliminal priming on the perception of images influenced by words with positive, negative, and neutral emotional content, through electroencephalograms (EEGs). Participants were instructed to rate how much they liked the stimulus images on a 7-point Likert scale after being subliminally exposed to masked lexical prime words with positive, negative, or neutral connotations with respect to the images. The EEGs were recorded simultaneously. Statistical tests such as repeated-measures ANOVAs and two-tailed paired-samples t-tests were performed to measure significant differences in the likability ratings among the three prime affect types; the results showed a strong shift in the likability judgment for the images in the positively primed condition compared to the other two. The acquired EEGs were examined to assess the difference in brain activity associated with the three conditions. The consistent results confirmed the overall priming effect on participants' explicit ratings. In addition, machine learning algorithms such as support vector machines (SVMs) and AdaBoost classifiers were applied to infer the prime affect type from the event-related potentials (ERPs). The highest classification rates, 95.0% for the average-trial binary classifier and 70.0% for the average-trial multi-class classifier, further emphasize that the ERPs encode information about the different kinds of primes.
Effect of Subliminal Lexical Priming on the Subjective Perception of Images: A Machine Learning Approach.

Science.gov (United States)

Mohan, Dhanya Menoth; Kumar, Parmod; Mahmood, Faisal; Wong, Kian Foong; Agrawal, Abhishek; Elgendi, Mohamed; Shukla, Rohit; Ang, Natania; Ching, April; Dauwels, Justin; Chan, Alice H D

2016-01-01

The purpose of the study is to examine the effect of subliminal priming on the perception of images influenced by words with positive, negative, and neutral emotional content, through electroencephalograms (EEGs). Participants were instructed to rate how much they liked the stimulus images on a 7-point Likert scale after being subliminally exposed to masked lexical prime words with positive, negative, or neutral connotations with respect to the images. The EEGs were recorded simultaneously. Statistical tests such as repeated-measures ANOVAs and two-tailed paired-samples t-tests were performed to measure significant differences in the likability ratings among the three prime affect types; the results showed a strong shift in the likability judgment for the images in the positively primed condition compared to the other two. The acquired EEGs were examined to assess the difference in brain activity associated with the three conditions. The consistent results confirmed the overall priming effect on participants' explicit ratings. In addition, machine learning algorithms such as support vector machines (SVMs) and AdaBoost classifiers were applied to infer the prime affect type from the event-related potentials (ERPs). The highest classification rates, 95.0% for the average-trial binary classifier and 70.0% for the average-trial multi-class classifier, further emphasize that the ERPs encode information about the different kinds of primes.
Adaptation in P300 brain-computer interfaces: A two-classifier cotraining approach

DEFF Research Database (Denmark)

Panicker, Rajesh C.; Sun, Ying; Puthusserypady, Sadasivan

2010-01-01

A cotraining-based approach is introduced for constructing high-performance classifiers for P300-based brain-computer interfaces (BCIs), which were trained from very little data. It uses two classifiers: Fisher's linear discriminant analysis and Bayesian linear discriminant analysis progressively...
Health condition identification of multi-stage planetary gearboxes using a mRVM-based method

Science.gov (United States)

Lei, Yaguo; Liu, Zongyao; Wu, Xionghui; Li, Naipeng; Chen, Wu; Lin, Jing

2015-08-01

Multi-stage planetary gearboxes are widely applied in the aerospace, automotive and heavy industries. Their key components, such as gears and bearings, can easily be damaged by harsh working environments. Health condition identification of planetary gearboxes aims to prevent accidents and save costs. This paper proposes a method based on the multiclass relevance vector machine (mRVM) to identify the health condition of multi-stage planetary gearboxes. In this method, an mRVM algorithm is adopted as the classifier, and two features, i.e. accumulative amplitudes of carrier orders (AACO) and energy ratio based on difference spectra (ERDS), are used as the input of the classifier to classify different health conditions of multi-stage planetary gearboxes. To test the proposed method, seven health conditions of a two-stage planetary gearbox are considered, and vibration data are acquired from the planetary gearbox under different motor speeds and loading conditions. The results of three tests based on different data show that the proposed method obtains improved identification performance and robustness compared with the existing method.
A kernel-based multivariate feature selection method for microarray data classification.

Directory of Open Access Journals (Sweden)

Shiquan Sun

Full Text Available High dimensionality and small sample sizes, and their inherent risk of overfitting, pose great challenges for constructing efficient classifiers in microarray data classification. Therefore, a feature selection technique should be applied prior to data classification to enhance prediction performance. In general, filter methods can be considered as a principal or auxiliary selection mechanism because of their simplicity, scalability, and low computational complexity. However, a series of trivial examples show that filter methods result in less accurate performance because they ignore the dependencies of features. Although a few publications have attempted to reveal the relationships among features with multivariate methods, these methods describe relationships among features only linearly, and such simple linear combination relationships restrict the improvement in performance. In this paper, we used a kernel method to discover inherent nonlinear correlations among features as well as between features and the target. Moreover, the number of orthogonal components was determined by kernel Fisher's linear discriminant analysis (FLDA) in a self-adaptive manner rather than by manual parameter settings. To demonstrate the effectiveness of our method, we performed several experiments and compared the results of our method with those of other competitive multivariate feature selectors. In our comparison, we used two classifiers (support vector machine, k-nearest neighbor) on two groups of datasets, namely two-class and multi-class datasets. Experimental results demonstrate that our method performs better than the others, especially on three hard-to-classify datasets, namely Wang's Breast Cancer, Gordon's Lung Adenocarcinoma and Pomeroy's Medulloblastoma.
Binary naive Bayesian classifiers for correlated Gaussian features: a theoretical analysis

CSIR Research Space (South Africa)

Van Dyk, E

2008-11-01

Full Text Available classifier with Gaussian features while using any quadratic decision boundary. Therefore, the analysis is not restricted to Naive Bayesian classifiers alone and can, for instance, be used to calculate the Bayes error performance. We compare the analytical...
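The snippet concerns binary naive Bayesian classifiers whose Gaussian features are in fact correlated. For reference, a minimal Gaussian naive Bayes sketch makes the independence assumption under analysis explicit: the class-conditional density is taken as a product of one-dimensional Gaussians, so any covariance between features is ignored. The toy data and the small variance floor are hypothetical choices.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Per-class log prior, feature means and variances; features treated as independent."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    model = {}
    for label, rows in groups.items():
        d = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
        variances = [sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + var_floor
                     for j in range(d)]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def predict(model, x):
    """Class maximizing log prior + sum of per-feature Gaussian log densities."""
    def log_joint(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(-0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
                               for xj, m, v in zip(x, means, variances))
    return max(model, key=log_joint)


X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict(model, [0.2, 0.3]), predict(model, [5.5, 5.2]))  # 0 1
```

Because the two Gaussians per class are multiplied, the resulting decision boundary is quadratic in general, which is the family of boundaries the snippet says the analysis covers.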
Automating the construction of scene classifiers for content-based video retrieval

NARCIS (Netherlands)

Khan, L.; Israël, Menno; Petrushin, V.A.; van den Broek, Egon; van der Putten, Peter

2004-01-01

This paper introduces a real time automatic scene classifier within content-based video retrieval. In our envisioned approach end users like documentalists, not image processing experts, build classifiers interactively, by simply indicating positive examples of a scene. Classification consists of a
A unified classifier for robust face recognition based on combining multiple subspace algorithms

Science.gov (United States)

Ijaz Bajwa, Usama; Ahmad Taj, Imtiaz; Waqas Anwar, Muhammad

2012-10-01

Face recognition, being the fastest-growing biometric technology, has expanded manifold in the last few years. Various new algorithms and commercial systems have been proposed and developed. However, none of the proposed or developed algorithms is a complete solution, because an algorithm may work very well on one set of images with, say, illumination changes but may not work properly on another set of image variations, such as expression variations. This study is motivated by the fact that no single classifier can claim generally better performance against all facial image variations. To overcome this shortcoming and achieve generality, combining several classifiers using various strategies has been studied extensively, also incorporating the question of the suitability of any classifier for this task. The study is based on the outcome of a comprehensive comparative analysis conducted on combinations of six subspace extraction algorithms and four distance metrics on three facial databases. The analysis leads to the selection of the most suitable classifiers, each of which performs better on one task or another. These classifiers are then combined into an ensemble classifier by two different strategies: weighted sum and re-ranking. The results of the ensemble classifier show that these strategies can be effectively used to construct a single classifier that can successfully handle varying facial image conditions of illumination, aging and facial expressions.
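The weighted-sum strategy can be sketched as follows, assuming each base classifier (a subspace method paired with a distance metric) returns a distance from the probe image to every gallery identity. The min-max normalization, the conversion of distances into similarities, the identities, and the weights are all illustrative assumptions; the abstract does not give them, and the re-ranking strategy is not shown.

```python
def min_max_similarity(distances):
    """Map one classifier's distances to similarities in [0, 1] (closest identity -> 1)."""
    lo, hi = min(distances.values()), max(distances.values())
    if hi == lo:
        return {k: 1.0 for k in distances}
    return {k: (hi - d) / (hi - lo) for k, d in distances.items()}


def weighted_sum_fusion(per_classifier_distances, weights):
    """Combine normalized similarities with per-classifier weights; return best identity and scores."""
    fused = {}
    for dists, w in zip(per_classifier_distances, weights):
        for identity, s in min_max_similarity(dists).items():
            fused[identity] = fused.get(identity, 0.0) + w * s
    return max(fused, key=fused.get), fused


# Hypothetical distances from one probe to three gallery identities, from two base classifiers.
c1 = {"alice": 0.2, "bob": 0.9, "carol": 0.5}
c2 = {"alice": 3.0, "bob": 1.0, "carol": 2.0}
best, fused = weighted_sum_fusion([c1, c2], weights=[0.7, 0.3])
print(best)  # alice
```

Normalizing before summing matters because different distance metrics live on different scales; without it the classifier with the largest raw distances would dominate the fused score.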
Optimal threshold estimation for binary classifiers using game theory.

Science.gov (United States)

Sanchez, Ignacio Enrique

2016-01-01

Many bioinformatics algorithms can be understood as binary classifiers. They are usually compared using the area under the receiver operating characteristic (ROC) curve. On the other hand, choosing the best threshold for practical use is a complex task, due to uncertain and context-dependent skews in the abundance of positives in nature and in the yields/costs for correct/incorrect classification. We argue that considering a classifier as a player in a zero-sum game allows us to use the minimax principle from game theory to determine the optimal operating point. The proposed classifier threshold corresponds to the intersection between the ROC curve and the descending diagonal in ROC space and yields a minimax accuracy of 1-FPR. Our proposal can be readily implemented in practice, and reveals that the empirical condition for threshold estimation of "specificity equals sensitivity" maximizes robustness against uncertainties in the abundance of positives in nature and classification costs.
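The proposed operating point is where the ROC curve meets the descending diagonal, i.e. where sensitivity (TPR) equals specificity (1 - FPR). On a finite set of scores an exact crossing may not exist, so this sketch simply picks the candidate threshold that minimizes |sensitivity - specificity|; that rule, the "score >= threshold means positive" convention, and the toy data are assumptions.

```python
def minimax_threshold(scores, labels):
    """Threshold closest to the ROC / descending-diagonal crossing (TPR = 1 - FPR).
    Predicts positive when score >= threshold; labels are 0/1."""
    pos = sum(labels)
    neg = len(labels) - pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        sensitivity, specificity = tp / pos, 1 - fp / neg
        gap = abs(sensitivity - specificity)
        if best is None or gap < best[0]:
            best = (gap, t, sensitivity, specificity)
    return best[1:]  # (threshold, sensitivity, specificity)


scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]
print(minimax_threshold(scores, labels))  # (0.6, 0.75, 0.75)
```

At this point accuracy equals 0.75 whatever the fraction of positives, since both classes are recognized at the same rate; that prevalence-independence is the robustness property the abstract describes.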
Comparison of artificial intelligence classifiers for SIP attack data

Science.gov (United States)

Safarik, Jakub; Slachta, Jiri

2016-05-01

A honeypot application is a source of valuable data about attacks on the network. We run several SIP honeypots in various computer networks, which are separated geographically and logically. Each honeypot runs on a public IP address and uses standard SIP PBX ports. All information gathered via the honeypots is periodically sent to a centralized server. This server classifies all attack data with a neural network algorithm. The paper describes optimizations of a neural network classifier that lower the classification error. The article compares two neural network algorithms used for the classification of validation data. The first is the original implementation of the neural network described in recent work; the second uses further optimizations such as input normalization and a cross-entropy cost function. We also use other implementations of neural networks and machine learning classification algorithms. The comparison tests their capabilities on validation data to find the optimal classifier. The results show promise for further development of an accurate SIP attack classification engine.
Gene-expression Classifier in Papillary Thyroid Carcinoma

DEFF Research Database (Denmark)

Londero, Stefano Christian; Jespersen, Marie Louise; Krogdahl, Annelise

2016-01-01

BACKGROUND: No reliable biomarker for metastatic potential in the risk stratification of papillary thyroid carcinoma exists. We aimed to develop a gene-expression classifier for metastatic potential. MATERIALS AND METHODS: Genome-wide expression analyses were used. Development cohort: freshly...
A History of Classified Activities at Oak Ridge National Laboratory

Energy Technology Data Exchange (ETDEWEB)

Quist, A.S.

2001-01-30

The facilities that became Oak Ridge National Laboratory (ORNL) were created in 1943 during the United States' super-secret World War II project to construct an atomic bomb (the Manhattan Project). During World War II and for several years thereafter, essentially all ORNL activities were classified. Now, in 2000, essentially all ORNL activities are unclassified. The major purpose of this report is to provide a brief history of ORNL's major classified activities from 1943 until the present (September 2000). This report is expected to be useful to the ORNL Classification Officer and to ORNL's Authorized Derivative Classifiers and Authorized Derivative Declassifiers in their classification review of ORNL documents, especially those documents that date from the 1940s and 1950s.
Oblique decision trees using embedded support vector machines in classifier ensembles

NARCIS (Netherlands)

Menkovski, V.; Christou, I.; Efremidis, S.

2008-01-01

Classifier ensembles have emerged in recent years as a promising research area for boosting pattern recognition systems' performance. We present a new base classifier that utilizes oblique decision tree technology based on support vector machines for the construction of oblique (non-axis parallel)
29 CFR 1926.407 - Hazardous (classified) locations.

Science.gov (United States)

2010-07-01

...) locations, unless modified by provisions of this section. (b) Electrical installations. Equipment, wiring..., DEPARTMENT OF LABOR (CONTINUED) SAFETY AND HEALTH REGULATIONS FOR CONSTRUCTION Electrical Installation Safety... electric equipment and wiring in locations which are classified depending on the properties of the...
Ensembles of novelty detection classifiers for structural health monitoring using guided waves

Science.gov (United States)

Dib, Gerges; Karpenko, Oleksii; Koricho, Ermias; Khomenko, Anton; Haq, Mahmoodul; Udpa, Lalita

2018-01-01

Guided wave structural health monitoring uses sparse sensor networks embedded in sophisticated structures for defect detection and characterization. The biggest challenge for those sensor networks is developing robust techniques for reliable damage detection under changing environmental and operating conditions (EOC). To address this challenge, we develop a novelty classifier for damage detection based on one-class support vector machines. We identify appropriate features for damage detection and introduce a feature aggregation method which quadratically increases the number of available training observations. We adopt a two-level voting scheme by using an ensemble of classifiers and predictions. Each classifier is trained on a different segment of the guided wave signal, and each classifier makes an ensemble of predictions based on a single observation. Using this approach, the classifier can be trained using a small number of baseline signals. We study the performance using Monte-Carlo simulations of an analytical model and data from impact damage experiments on a glass fiber composite plate. We also demonstrate the classifier performance using two types of baseline signals: fixed and rolling baseline training sets. The former requires prior knowledge of baseline signals from all EOC, while the latter does not and leverages the fact that EOC vary slowly over time and can be modeled as a Gaussian process.
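The two-level voting scheme can be sketched as follows. It assumes each segment classifier has already produced several boolean novelty predictions (True = damage) for one observation, that both levels use a simple strict majority, and that a tie counts as "no damage"; these combination rules and the example outputs are hypothetical, since the abstract does not state them.

```python
def majority(votes):
    """Strict majority of boolean votes; a tie counts as 'no damage' (an assumption)."""
    return 2 * sum(votes) > len(votes)


def two_level_vote(predictions_per_classifier):
    """Level 1: each classifier votes over its own ensemble of predictions.
    Level 2: the per-classifier decisions are combined by majority again."""
    first_level = [majority(preds) for preds in predictions_per_classifier]
    return majority(first_level), first_level


# Hypothetical outputs: 3 classifiers (one per signal segment), 3 predictions each.
decision, per_classifier = two_level_vote([[True, True, False],
                                           [False, False, True],
                                           [True, True, True]])
print(decision, per_classifier)  # True [True, False, True]
```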

A predictive toxicogenomics signature to classify genotoxic versus non-genotoxic chemicals in human TK6 cells

Directory of Open Access Journals (Sweden)

Andrew Williams

2015-12-01

Full Text Available Genotoxicity testing is a critical component of chemical assessment. The use of integrated approaches in genetic toxicology, including the incorporation of gene expression data to determine the DNA damage response pathways involved in response, is becoming more common. In companion papers previously published in Environmental and Molecular Mutagenesis, Li et al. (2015) [6] developed a dose optimization protocol that was based on evaluating expression changes in several well-characterized stress-response genes using quantitative real-time PCR in human lymphoblastoid TK6 cells in culture. This optimization approach was applied to the analysis of TK6 cells exposed to one of 14 genotoxic or 14 non-genotoxic agents, with sampling 4 h post-exposure. Microarray-based transcriptomic analyses were then used to develop a classifier for genotoxicity using the nearest shrunken centroids method. A panel of 65 genes was identified that could accurately classify toxicants as genotoxic or non-genotoxic. In Buick et al. (2015) [1], the utility of the biomarker for chemicals that require metabolic activation was evaluated. In this study, TK6 cells were exposed to increasing doses of four chemicals (two genotoxic chemicals that require metabolic activation and two non-genotoxic chemicals) in the presence of rat liver S9 to demonstrate that S9 does not impair the ability to classify genotoxicity using this genomic biomarker in TK6 cells.
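The companion work built its classifier with the nearest shrunken centroids method. The sketch below follows the commonly described recipe as we understand it: standardize each class centroid's deviation from the overall centroid by the pooled within-class standard deviation plus an offset s0 (here the median of those deviations), soft-threshold the result by a shrinkage amount delta, and classify to the nearest shrunken centroid with a log-prior correction. Genes whose deviations shrink to zero for every class drop out, which is how a small gene panel emerges. The factor m_k = sqrt(1/n_k - 1/n), the toy data, and delta are assumptions; this is not the published 65-gene model.

```python
import math
import statistics
from collections import defaultdict


def fit_nsc(X, y, delta):
    """Nearest shrunken centroids with shrinkage threshold delta (rows = samples, cols = genes)."""
    n, p = len(X), len(X[0])
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    overall = [sum(r[i] for r in X) / n for i in range(p)]
    centroids = {k: [sum(r[i] for r in rows) / len(rows) for i in range(p)]
                 for k, rows in groups.items()}
    # Pooled within-class standard deviation of each gene.
    s = [math.sqrt(sum((r[i] - centroids[k][i]) ** 2
                       for k, rows in groups.items() for r in rows) / (n - len(groups)))
         for i in range(p)]
    s0 = statistics.median(s)  # offset guarding against tiny variances
    scales = [si + s0 for si in s]
    model, selected = {}, set()
    for k, rows in groups.items():
        mk = math.sqrt(1 / len(rows) - 1 / n)
        shrunk = []
        for i in range(p):
            d = (centroids[k][i] - overall[i]) / (mk * scales[i])
            d_shrunk = math.copysign(max(abs(d) - delta, 0.0), d)  # soft threshold
            if d_shrunk != 0.0:
                selected.add(i)  # gene survives for at least one class
            shrunk.append(overall[i] + mk * scales[i] * d_shrunk)
        model[k] = (shrunk, math.log(len(rows) / n))
    return model, scales, sorted(selected)


def predict_nsc(model, scales, x):
    """Class with the smallest standardized distance to its shrunken centroid."""
    def score(k):
        shrunk, log_prior = model[k]
        return sum(((xi - ci) / si) ** 2 for xi, ci, si in zip(x, shrunk, scales)) - 2 * log_prior
    return min(model, key=score)


# Hypothetical expression of 3 genes: only gene 0 separates genotoxic (g) from non-genotoxic (n).
X = [[5.0, 1.0, 2.0], [6.0, 2.0, 1.0], [5.5, 1.5, 1.5],
     [0.0, 1.0, 2.0], [1.0, 2.0, 1.0], [0.5, 1.5, 1.5]]
y = ["g", "g", "g", "n", "n", "n"]
model, scales, genes = fit_nsc(X, y, delta=1.0)
print(genes, predict_nsc(model, scales, [5.2, 1.4, 1.6]))  # [0] g
```

Raising delta shrinks more genes away; in practice delta is usually chosen by cross-validation.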
Classifier for gravitational-wave inspiral signals in nonideal single-detector data

Science.gov (United States)

Kapadia, S. J.; Dent, T.; Dal Canton, T.

2017-11-01

We describe a multivariate classifier for candidate events in a templated search for gravitational-wave (GW) inspiral signals from neutron-star-black-hole (NS-BH) binaries, in data from ground-based detectors where sensitivity is limited by non-Gaussian noise transients. The standard signal-to-noise ratio (SNR) and chi-squared test for inspiral searches use only properties of a single matched filter at the time of an event; instead, we propose a classifier using features derived from a bank of inspiral templates around the time of each event, and also from a search using approximate sine-Gaussian templates. The classifier thus extracts additional information from strain data to discriminate inspiral signals from noise transients. We evaluate a random forest classifier on a set of single-detector events obtained from realistic simulated advanced LIGO data, using simulated NS-BH signals added to the data. The new classifier detects a factor of 1.5-2 more signals at low false positive rates as compared to the standard "reweighted SNR" statistic, and does not require the chi-squared test to be computed. Conversely, if only the SNR and chi-squared values of single-detector events are available, random forest classification performs nearly identically to the reweighted SNR.
Scoring and Classifying Examinees Using Measurement Decision Theory

Directory of Open Access Journals (Sweden)

Lawrence M. Rudner

2009-04-01

Full Text Available This paper describes and evaluates the use of measurement decision theory (MDT) to classify examinees based on their item response patterns. The model has a simple framework that starts with the conditional probabilities of examinees in each category or mastery state responding correctly to each item. The presented evaluation investigates: (1) the classification accuracy of tests scored using decision theory; (2) the effectiveness of different sequential testing procedures; and (3) the number of items needed to make a classification. A large percentage of examinees can be classified accurately with very few items using decision theory. A Java Applet for self-instruction and software for generating, calibrating and scoring MDT data are provided.
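The MDT framework starts from the probability that an examinee in each mastery state answers each item correctly. Combining those probabilities with prior state proportions through Bayes' rule gives a posterior over states, and the examinee is assigned to the most probable one. This minimal sketch assumes responses are conditionally independent given the state; the item probabilities and priors are hypothetical, and the sequential testing procedures are not shown.

```python
def mdt_posterior(responses, p_correct, priors):
    """Posterior over mastery states given 0/1 item responses (items assumed independent)."""
    unnorm = {}
    for state, prior in priors.items():
        like = prior
        for j, r in enumerate(responses):
            p = p_correct[state][j]  # P(correct on item j | state)
            like *= p if r == 1 else 1 - p
        unnorm[state] = like
    total = sum(unnorm.values())
    return {state: v / total for state, v in unnorm.items()}


# Hypothetical three-item test with two mastery states.
p_correct = {"master": [0.9, 0.8, 0.7], "non-master": [0.4, 0.3, 0.2]}
priors = {"master": 0.5, "non-master": 0.5}
post = mdt_posterior([1, 1, 0], p_correct, priors)
print(max(post, key=post.get), round(post["master"], 3))  # master 0.692
```

A sequential procedure could stop administering items as soon as the largest posterior exceeds a chosen confidence level, which is one way a classification can be reached with few items.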
Feature selection for Bayesian network classifiers using the MDL-FS score

NARCIS (Netherlands)

Drugan, Madalina M.; Wiering, Marco A.

When constructing a Bayesian network classifier from data, the more or less redundant features included in a dataset may bias the classifier and as a consequence may result in a relatively poor classification accuracy. In this paper, we study the problem of selecting appropriate subsets of features
Comparing cosmic web classifiers using information theory

Energy Technology Data Exchange (ETDEWEB)

Leclercq, Florent [Institute of Cosmology and Gravitation (ICG), University of Portsmouth, Dennis Sciama Building, Burnaby Road, Portsmouth PO1 3FX (United Kingdom)]; Lavaux, Guilhem; Wandelt, Benjamin [Institut d'Astrophysique de Paris (IAP), UMR 7095, CNRS – UPMC Université Paris 6, Sorbonne Universités, 98bis boulevard Arago, F-75014 Paris (France)]; Jasche, Jens, E-mail: florent.leclercq@polytechnique.org, E-mail: lavaux@iap.fr, E-mail: j.jasche@tum.de, E-mail: wandelt@iap.fr [Excellence Cluster Universe, Technische Universität München, Boltzmannstrasse 2, D-85748 Garching (Germany)]

2016-08-01

We introduce a decision scheme for optimally choosing a classifier, which segments the cosmic web into different structure types (voids, sheets, filaments, and clusters). Our framework, based on information theory, accounts for the design aims of different classes of possible applications: (i) parameter inference, (ii) model selection, and (iii) prediction of new observations. As an illustration, we use cosmographic maps of web-types in the Sloan Digital Sky Survey to assess the relative performance of the classifiers T-WEB, DIVA and ORIGAMI for: (i) analyzing the morphology of the cosmic web, (ii) discriminating dark energy models, and (iii) predicting galaxy colors. Our study substantiates a data-supported connection between cosmic web analysis and information theory, and paves the path towards principled design of analysis procedures for the next generation of galaxy surveys. We have made the cosmic web maps, galaxy catalog, and analysis scripts used in this work publicly available.
Comparing cosmic web classifiers using information theory

International Nuclear Information System (INIS)

Leclercq, Florent; Lavaux, Guilhem; Wandelt, Benjamin; Jasche, Jens

2016-01-01

We introduce a decision scheme for optimally choosing a classifier, which segments the cosmic web into different structure types (voids, sheets, filaments, and clusters). Our framework, based on information theory, accounts for the design aims of different classes of possible applications: (i) parameter inference, (ii) model selection, and (iii) prediction of new observations. As an illustration, we use cosmographic maps of web-types in the Sloan Digital Sky Survey to assess the relative performance of the classifiers T-WEB, DIVA and ORIGAMI for: (i) analyzing the morphology of the cosmic web, (ii) discriminating dark energy models, and (iii) predicting galaxy colors. Our study substantiates a data-supported connection between cosmic web analysis and information theory, and paves the path towards principled design of analysis procedures for the next generation of galaxy surveys. We have made the cosmic web maps, galaxy catalog, and analysis scripts used in this work publicly available.
Detection of Fundus Lesions Using Classifier Selection

Science.gov (United States)

Nagayoshi, Hiroto; Hiramatsu, Yoshitaka; Sako, Hiroshi; Himaga, Mitsutoshi; Kato, Satoshi

A system for detecting fundus lesions caused by diabetic retinopathy in fundus images is being developed. The system can screen the images in advance in order to reduce the inspection workload on doctors. One of the difficulties that must be addressed in completing this system is how to remove false positives (which tend to arise near blood vessels) without decreasing the detection rate of lesions in other areas. To overcome this difficulty, we developed classifier selection according to the position of a candidate lesion, and we introduced new features that can distinguish true lesions from false positives. A system incorporating classifier selection and these new features was tested in experiments using 55 fundus images with lesions and 223 images without lesions. The results confirm the effectiveness of the proposed system, which achieved a sensitivity of 98% and a specificity of 81%.
An evaluation of sampling and full enumeration strategies for Fisher Jenks classification in big data settings

Science.gov (United States)

Rey, Sergio J.; Stephens, Philip A.; Laura, Jason R.

2017-01-01

Large data contexts present a number of challenges to optimal choropleth map classifiers. One proposed solution is to apply optimal classifiers to a sample of the attribute space. The properties of alternative sampling-based classification methods are examined through a series of Monte Carlo simulations. Spatial autocorrelation, the number of desired classes, and the form of sampling are shown to have significant impacts on the accuracy of map classifications. Tradeoffs between the improved speed of the sampling approaches and the loss of accuracy are also considered. The results suggest that the choice of classification scheme could be guided by the properties of large data sets.
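Fisher-Jenks breaks are the class boundaries that minimize the total within-class sum of squared deviations. They can be found exactly by dynamic programming in O(k n^2) time, which is why full enumeration becomes expensive on big data. The Python sketch below computes exact breaks, fits them on a simple random sample, and then assigns any value to a class. Simple random sampling and all function names here are assumptions for illustration. The paper compares several forms of sampling, and this is not the authors' implementation.

```python
import random
from bisect import bisect_left


def fisher_jenks(values, k):
    """Exact k-class breaks (class upper bounds) for 1-D data, minimizing the
    total within-class sum of squared deviations by dynamic programming."""
    x = sorted(values)
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of values")
    # Prefix sums give the SSD of any run x[i:j] in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: minimal SSD when the first j sorted values form c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # last class is x[i:j]
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    back[c][j] = i
    # Walk the back pointers to recover each class's upper bound.
    breaks, j = [], n
    for c in range(k, 0, -1):
        breaks.append(x[j - 1])
        j = back[c][j]
    return sorted(breaks)


def sampled_breaks(values, k, sample_size, seed=0):
    """Fit Fisher-Jenks breaks on a simple random sample (assumed scheme)."""
    rng = random.Random(seed)
    sample = rng.sample(values, min(sample_size, len(values)))
    return fisher_jenks(sample, k)


def assign_class(value, breaks):
    """Class index = first break >= value; values above the top break go to
    the last class (this can happen when breaks come from a sample)."""
    return min(bisect_left(breaks, value), len(breaks) - 1)
```

The speed/accuracy tradeoff discussed above shows up directly here. The cost is quadratic in the sample size, while the breaks fitted on a sample may differ from those fitted on the full data.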
Empirical study of classification process for two-stage turbo air classifier in series

Science.gov (United States)

Yu, Yuan; Liu, Jiaxiang; Li, Gang

2013-05-01

Suitable process parameters for a two-stage turbo air classifier are important for obtaining ultrafine powder with a narrow particle-size distribution; however, little has been published internationally on the classification process for two-stage turbo air classifiers in series. The influence of the process parameters of a two-stage turbo air classifier in series on classification performance is studied empirically, using aluminum oxide powders as the experimental material. The experimental results show the following: 1) When the rotor cage rotary speed of the first-stage classifier is increased from 2 300 r/min to 2 500 r/min, with the rotor cage rotary speed of the second-stage classifier held constant, classification precision increases from 0.64 to 0.67. However, the final ultrafine powder yield decreases from 79% to 74%, which means that the classification precision and the final ultrafine powder yield can be regulated by adjusting the rotor cage rotary speed of the first-stage classifier. 2) When the rotor cage rotary speed of the second-stage classifier is increased from 2 500 r/min to 3 100 r/min, with the rotor cage rotary speed of the first-stage classifier held constant, the cut size decreases from 13.16 μm to 8.76 μm, which means that the cut size of the ultrafine powder can be regulated by adjusting the rotor cage rotary speed of the second-stage classifier. 3) When the feeding speed is increased from 35 kg/h to 50 kg/h, the "fish-hook" effect is strengthened, which decreases the ultrafine powder yield. 4) To weaken the "fish-hook" effect, either equal wind speeds in the two stages or a high first-stage wind speed combined with a low second-stage wind speed should be selected. This empirical study provides criteria for configuring the process parameters of two-stage or multi-stage classifiers in series, offering a theoretical basis for practical production.
EXTENDED SPEECH EMOTION RECOGNITION AND PREDICTION

Directory of Open Access Journals (Sweden)

Theodoros Anagnostopoulos

2014-11-01

Humans are considered to reason and act rationally, and this is believed to be their fundamental difference from other living entities. Furthermore, modern approaches in psychology underline that humans, as thinking creatures, are also sentimental and emotional organisms. There are fifteen universal extended emotions plus a neutral emotion: hot anger, cold anger, panic, fear, anxiety, despair, sadness, elation, happiness, interest, boredom, shame, pride, disgust, contempt, and the neutral position. The scope of the current research is to understand the emotional state of a human being by capturing the speech utterances that one uses during a common conversation. It is shown that, given enough acoustic evidence, the emotional state of a person can be classified by a set of majority-voting classifiers. The proposed set of classifiers is based on three main classifiers: kNN, C4.5, and SVM with an RBF kernel. This set achieves better performance than each basic classifier taken separately. It is compared with two other sets of classifiers: a one-against-all (OAA) multiclass SVM with hybrid kernels, and a set consisting of two basic classifiers, C5.0 and a neural network. The proposed variant achieves better performance than the other two sets of classifiers. The paper deals with emotion classification by a set of majority-voting classifiers that combines three types of basic classifiers with low computational complexity. The basic classifiers stem from different theoretical backgrounds in order to avoid bias and redundancy, which gives the proposed set of classifiers the ability to generalize in the emotion domain space.
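Combining kNN, C4.5, and SVM outputs by majority voting comes down to a plurality vote over one predicted label per base classifier. Below is a minimal Python sketch. The abstract does not say how ties among three disagreeing classifiers are resolved, so falling back to the first classifier's label (or to a caller-supplied default) is an assumption.

```python
from collections import Counter


def majority_vote(predictions, tie_break=None):
    """Plurality vote over one label per base classifier (e.g. kNN, C4.5, SVM).

    If no label wins outright, return tie_break when given; otherwise fall
    back to the first classifier's prediction (an illustrative assumption).
    """
    ranked = Counter(predictions).most_common()
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return tie_break if tie_break is not None else predictions[0]
    return top_label
```

For example, `majority_vote(["hot anger", "panic", "hot anger"])` returns `"hot anger"`, because two of the three base classifiers agree.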
Case base classification on digital mammograms: improving the performance of case base classifier

Science.gov (United States)

Raman, Valliappan; Then, H. H.; Sumari, Putra; Venkatesa Mohan, N.

2011-10-01

Breast cancer continues to be a significant public health problem in the world. Early detection is the key to improving breast cancer prognosis. The aim of the research presented here is twofold. The first stage involves machine learning techniques that segment masses in digital mammograms and extract their features. The second stage is a problem-solving approach that classifies the masses with a performance-based case-base classifier. In this paper we build a case-based classifier to diagnose mammographic images, and we explain the methods and behaviors that have been added to the classifier to improve its performance. An initial performance-based classifier with bagging is proposed and implemented, and it shows an improvement in specificity and sensitivity.
Novelty Detection Classifiers in Weed Mapping: Silybum marianum Detection on UAV Multispectral Images.

Science.gov (United States)

Alexandridis, Thomas K; Tamouridou, Afroditi Alexandra; Pantazi, Xanthoula Eirini; Lagopodi, Anastasia L; Kashefi, Javid; Ovakoglou, Georgios; Polychronos, Vassilios; Moshou, Dimitrios

2017-09-01

In the present study, the detection and mapping of the weed Silybum marianum (L.) Gaertn. using novelty detection classifiers is reported. A multispectral camera (green-red-NIR) on board a fixed-wing unmanned aerial vehicle (UAV) was employed to obtain high-resolution images. Four novelty detection classifiers were used to distinguish S. marianum from other vegetation in a field: One Class Support Vector Machine (OC-SVM), One Class Self-Organizing Maps (OC-SOM), Autoencoders, and One Class Principal Component Analysis (OC-PCA). The three spectral bands and texture were used as input features to the novelty detection classifiers. Using OC-SVM, S. marianum identification reached an overall accuracy of 96%. The results show the feasibility of effective S. marianum mapping by means of novelty detection classifiers acting on multispectral UAV imagery.
Extending cluster Lot Quality Assurance Sampling designs for surveillance programs

OpenAIRE

Hund, Lauren; Pagano, Marcello

2014-01-01

Lot quality assurance sampling (LQAS) has a long history of applications in industrial quality control. LQAS is frequently used for rapid surveillance in global health settings, with areas classified as poor or acceptable performance based on the binary classification of an indicator. Historically, LQAS surveys have relied on simple random samples from the population; however, implementing two-stage cluster designs for surveillance sampling is often more cost-effective than ...
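A classical single-stage LQAS design samples n individuals at random and classifies an area as acceptable when at least d of them meet the indicator. The two misclassification risks then follow from the binomial distribution. The Python sketch below assumes simple random sampling, which is the baseline that the paper extends to cluster designs. The thresholds `p_high` and `p_low` and the n = 19, d = 13 example are illustrative choices, not values from the paper.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); returns 0 for k < 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(0, k + 1))


def lqas_risks(n, d, p_high, p_low):
    """Risks of the rule 'acceptable if successes >= d' with n sampled units.

    alpha: P(area labelled poor | true coverage p_high)
    beta:  P(area labelled acceptable | true coverage p_low)
    """
    alpha = binom_cdf(d - 1, n, p_high)
    beta = 1.0 - binom_cdf(d - 1, n, p_low)
    return alpha, beta


def lqas_classify(successes, d):
    """Binary LQAS classification of one area."""
    return "acceptable" if successes >= d else "poor"
```

With n = 19, d = 13, p_high = 0.8, and p_low = 0.5, both risks fall below 10%. In a cluster design, the binomial model would no longer hold exactly because of intra-cluster correlation.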
Variants of the Borda count method for combining ranked classifier hypotheses

NARCIS (Netherlands)

van Erp, Merijn; Schomaker, Lambert; Vuurpijl, Louis

2000-01-01

The Borda count is a simple yet effective method of combining rankings. In pattern recognition, classifiers are often able to return a ranked set of results. Several experiments have been conducted to test the ability of the Borda count and two variant methods to combine these ranked classifier
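The basic Borda count gives each label a number of points that depends on its position in every ranked list, then sorts labels by their total. Below is a minimal Python sketch of the plain method, not of the paper's two variants. Two details are assumptions: labels missing from a ranking get no points from it, and ties are broken alphabetically.

```python
from collections import defaultdict


def borda_combine(rankings):
    """Combine best-first ranked lists of labels with the plain Borda count.

    A label at position r in a ranking of length m gets m - 1 - r points.
    Labels absent from a ranking get nothing from it (assumed convention).
    """
    scores = defaultdict(int)
    for ranking in rankings:
        m = len(ranking)
        for r, label in enumerate(ranking):
            scores[label] += m - 1 - r
    # Highest total first; alphabetical order breaks ties deterministically.
    return sorted(scores, key=lambda label: (-scores[label], label))
```

For three classifiers ranking `a, b, c`, then `b, a, c`, then `a, c, b`, the totals are 5, 3, and 1, so the combined ranking is `a, b, c`.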
Should OCD be classified as an anxiety disorder in DSM-V?

NARCIS (Netherlands)

Stein, Dan J.; Fineberg, Naomi A.; Bienvenu, O. Joseph; Denys, Damiaan; Lochner, Christine; Nestadt, Gerald; Leckman, James F.; Rauch, Scott L.; Phillips, Katharine A.

2010-01-01

In DSM-III, DSM-III-R, and DSM-IV, obsessive-compulsive disorder (OCD) was classified as an anxiety disorder. In ICD-10, OCD is classified separately from the anxiety disorders, although within the same larger category as anxiety disorders (as one of the "neurotic, stress-related, and somatoform
29 CFR 1910.307 - Hazardous (classified) locations.

Science.gov (United States)

2010-07-01

... equipment at the location. (c) Electrical installations. Equipment, wiring methods, and installations of... covers the requirements for electric equipment and wiring in locations that are classified depending on... provisions of this section. (4) Division and zone classification. In Class I locations, an installation must...
Parameterization of a fuzzy classifier for the diagnosis of an industrial process

International Nuclear Information System (INIS)

Toscano, R.; Lyonnet, P.

2002-01-01

The aim of this paper is to present a classifier based on a fuzzy inference system. For this classifier, we propose a parameterization method that does not necessarily rely on iterative training. This approach can be seen as a pre-parameterization that determines the rule base and the parameters of the membership functions. We also present a continuous and differentiable version of the classifier and suggest an iterative learning algorithm based on a gradient method. An example using the Iris learning dataset, a benchmark for classification problems, is presented to show the performance of this classifier. Finally, the classifier is applied to the diagnosis of a DC motor, showing the utility of the method. However, the complete knowledge needed to synthesize the fuzzy diagnosis system (FDS) is often not directly available; it must be extracted from an often considerable mass of information. For this reason, a general methodology for the design of an FDS is presented and illustrated on a non-linear plant.
An SVM classifier to separate false signals from microcalcifications in digital mammograms

Energy Technology Data Exchange (ETDEWEB)

Bazzani, Armando; Bollini, Dante; Brancaccio, Rosa; Campanini, Renato; Riccardi, Alessandro; Romani, Davide [Department of Physics, University of Bologna (Italy); INFN, Bologna (Italy)]; Lanconelli, Nico [Department of Physics, University of Bologna, and INFN, Bologna (Italy). E-mail: nico.lanconelli@bo.infn.it]; Bevilacqua, Alessandro [Department of Electronics, Computer Science and Systems, University of Bologna, and INFN, Bologna (Italy)]

2001-06-01

In this paper we investigate the feasibility of using an SVM (support vector machine) classifier in our automatic system for the detection of clustered microcalcifications in digital mammograms. SVM is a pattern recognition technique that relies on statistical learning theory. It minimizes a function of two terms: the number of misclassified vectors in the training set and a term related to the generalization capability of the classifier. We compare the SVM classifier with an MLP (multi-layer perceptron) in the false-positive reduction phase of our detection scheme: a detected signal is considered either a microcalcification or a false signal, according to the values of a set of its features. The SVM classifier gets slightly better results than the MLP (Az value of 0.963 against 0.958) when many training data are available; the improvement becomes much more evident (Az value of 0.952 against 0.918) with training sets of reduced size. Finally, the SVM classifier is much easier to set up than the MLP. (author)
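The two-term objective described above corresponds to the textbook soft-margin SVM. The formulation below is the standard one and is not copied from the paper. The first term controls generalization by maximizing the margin. The slack variables $\xi_i$ account for training errors and are weighted by a user-chosen constant $C$. The map $\phi$ is an optional feature map (kernel), and $y_i \in \{-1, +1\}$ labels each training vector $\mathbf{x}_i$ as microcalcification or false signal.

$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \ \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{N} \xi_i \quad \text{subject to} \quad y_i\left(\mathbf{w}^\top \phi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0.$$

Every misclassified training vector has $\xi_i \ge 1$, so $\sum_i \xi_i$ is an upper bound on the number of training errors. This is why the second term acts as a tractable stand-in for "the number of misclassified vectors".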
A review of novel strategies of sample preparation for the determination of antibacterial residues in foodstuffs using liquid chromatography-based analytical methods

Energy Technology Data Exchange (ETDEWEB)

Marazuela, M.D., E-mail: marazuela@quim.ucm.es [Department of Analytical Chemistry, Faculty of Chemistry, Universidad Complutense de Madrid, E-28040 Madrid (Spain)]; Bogialli, S [Department of Chemistry, University of Rome 'La Sapienza', Piazza Aldo Moro, 5 00185 Rome (Italy)]

2009-07-10

The determination of trace residues and contaminants in food has been of growing concern over the past few years. Residual antibacterials in food constitute a risk to human health, especially because they can contribute to the transmission of antibiotic-resistant pathogenic bacteria through the food chain. Therefore, to ensure food safety, EU and USA regulatory agencies have established lists of forbidden or banned substances and tolerance levels for authorized veterinary drugs (e.g. antibacterials). In addition, EU Commission Decision 2002/657/EC has set requirements for the performance of analytical methods for the determination of veterinary drug residues in food and feedstuffs. In recent years, the use of powerful mass spectrometric detectors in combination with innovative chromatographic technologies has solved many problems related to the sensitivity and selectivity of this type of analysis. However, sample preparation remains the bottleneck step, mainly in terms of analysis time and sources of error. This review, covering research published between 2004 and 2008, provides an updated overview of recent trends in sample preparation for the determination of antibacterial residues in foods, with special emphasis on on-line, high-throughput, multi-class methods, and describes several applications in detail.
Learning for VMM + WTA Embedded Classifiers

Science.gov (United States)

2016-03-31

Learning for VMM + WTA Embedded Classifiers. Jennifer Hasler and Sahil Shah, Electrical and Computer Engineering, Georgia Institute of Technology ... enabling correct classification of each novel acoustic signal (generator, idle car, and idle truck). The classification structure requires, after ... measured on our SoC FPAA IC. The test input is composed of signals from an urban environment for 3 objects (generator, idle car, and idle truck

Predicting protein subnuclear location with optimized evidence-theoretic K-nearest classifier and pseudo amino acid composition

International Nuclear Information System (INIS)

Shen Hongbin; Chou Kuochen

2005-01-01

The nucleus is the brain of eukaryotic cells, guiding the life processes of the cell by issuing key instructions. For an in-depth understanding of the biochemical processes of the nucleus, knowledge of the localization of nuclear proteins is very important. With the avalanche of protein sequences generated in the post-genomic era, an automated method is highly desirable for rapidly annotating the subnuclear locations of numerous newly found nuclear protein sequences, so that they can be used promptly for basic research and drug discovery. In view of this, a novel approach is developed for predicting protein subnuclear location. It introduces a powerful classifier, the optimized evidence-theoretic K-nearest neighbor (OET-KNN) classifier, and uses the pseudo amino acid composition [K.C. Chou, PROTEINS: Structure, Function, and Genetics, 43 (2001) 246], which can incorporate a considerable amount of sequence-order effects, to represent protein samples. As a demonstration, identifications were performed for 370 nuclear proteins among the following 9 subnuclear locations: (1) Cajal body, (2) chromatin, (3) heterochromatin, (4) nuclear diffuse, (5) nuclear pore, (6) nuclear speckle, (7) nucleolus, (8) PcG body, and (9) PML body. The overall success rates obtained by both the re-substitution test and the jackknife cross-validation test are significantly higher than those of existing classifiers on the same working dataset. It is anticipated that the approach may also become a useful high-throughput vehicle for bridging the huge gap, in the post-genomic era, between the number of gene sequences in databases and the number of gene products that have been functionally characterized. The OET-KNN classifier will be available at www.pami.sjtu.edu.cn/people/hbshen
75 FR 707 - Classified National Security Information

Science.gov (United States)

2010-01-05

... classified at one of the following three levels: (1) ``Top Secret'' shall be applied to information, the... exercise this authority. (2) ``Top Secret'' original classification authority may be delegated only by the... official has been delegated ``Top Secret'' original classification authority by the agency head. (4) Each...
A distributed approach for optimizing cascaded classifier topologies in real-time stream mining systems.

Science.gov (United States)

Foo, Brian; van der Schaar, Mihaela

2010-11-01

In this paper, we discuss distributed optimization techniques for configuring classifiers in a real-time, informationally-distributed stream mining system. Due to the large volume of streaming data, stream mining systems must often cope with overload, which can lead to poor performance and intolerable processing delay for real-time applications. Furthermore, optimizing over an entire system of classifiers is a difficult task since changing the filtering process at one classifier can impact both the feature values of data arriving at classifiers further downstream and thus, the classification performance achieved by an ensemble of classifiers, as well as the end-to-end processing delay. To address this problem, this paper makes three main contributions: 1) Based on classification and queuing theoretic models, we propose a utility metric that captures both the performance and the delay of a binary filtering classifier system. 2) We introduce a low-complexity framework for estimating the system utility by observing, estimating, and/or exchanging parameters between the inter-related classifiers deployed across the system. 3) We provide distributed algorithms to reconfigure the system, and analyze the algorithms based on their convergence properties, optimality, information exchange overhead, and rate of adaptation to non-stationary data sources. We provide results using different video classifier systems.
The three-dimensional origin of the classifying algebra

International Nuclear Information System (INIS)

Fuchs, Juergen; Schweigert, Christoph; Stigner, Carl

2010-01-01

It is known that reflection coefficients for bulk fields of a rational conformal field theory in the presence of an elementary boundary condition can be obtained as representation matrices of irreducible representations of the classifying algebra, a semisimple commutative associative complex algebra. We show how this algebra arises naturally from the three-dimensional geometry of factorization of correlators of bulk fields on the disk. This allows us to derive explicit expressions for the structure constants of the classifying algebra as invariants of ribbon graphs in the three-manifold S² × S¹. Our result unravels a precise relation between intertwiners of the action of the mapping class group on spaces of conformal blocks and boundary conditions in rational conformal field theories.
A deep learning method for classifying mammographic breast density categories.

Science.gov (United States)

Mohamed, Aly A; Berg, Wendie A; Peng, Hong; Luo, Yahong; Jankowitz, Rachel C; Wu, Shandong

2018-01-01

Mammographic breast density is an established risk marker for breast cancer and is visually assessed by radiologists in routine mammogram reading, using four qualitative Breast Imaging Reporting and Data System (BI-RADS) breast density categories. It is particularly difficult for radiologists to consistently distinguish the two most common and most variably assigned BI-RADS categories, i.e., "scattered density" and "heterogeneously dense". The aim of this work was to investigate a deep learning-based breast density classifier that consistently distinguishes these two categories, as a potential computerized tool to assist radiologists in assigning a BI-RADS category in the current clinical workflow. In this study, we constructed a convolutional neural network (CNN)-based model and used a large digital mammogram dataset (22,000 images) to evaluate classification performance between the two aforementioned breast density categories. All images were collected from a cohort of 1,427 women who underwent standard digital mammography screening from 2005 to 2016 at our institution. Ground-truth density categories were based on standard clinical assessments made by board-certified breast imaging radiologists. The effects of training from scratch solely on digital mammogram images and of transfer learning from a model pretrained on a large nonmedical imaging dataset were evaluated for the specific task of breast density classification. To measure classification performance, the CNN classifier was also tested on a refined version of the mammogram dataset from which some potentially inaccurately labeled images had been removed. Receiver operating characteristic (ROC) curves and the area under the curve (AUC) were used to measure the accuracy of the classifier. The AUC was 0.9421 when the CNN model was trained from scratch on our own mammogram images, and the accuracy increased gradually with an increasing number of training samples
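The AUC reported above has a simple probabilistic reading: it is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half (the Mann-Whitney form). Below is a minimal Python sketch of that definition. Treating "heterogeneously dense" as the positive class is an illustrative assumption. The O(n_pos × n_neg) pairwise loop keeps the sketch obvious rather than fast.

```python
def auc_mann_whitney(scores_pos, scores_neg):
    """ROC AUC as P(positive score > negative score), ties counted as 1/2.

    scores_pos: classifier scores for positive cases (e.g. an assumed
    "heterogeneously dense" class); scores_neg: scores for negative cases.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

For positive scores `[0.9, 0.8, 0.4]` and negative scores `[0.7, 0.3]`, five of the six pairs are ordered correctly, so the AUC is 5/6.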
Generic Learning-Based Ensemble Framework for Small Sample Size Face Recognition in Multi-Camera Networks.

Science.gov (United States)

Zhang, Cuicui; Liang, Xuefeng; Matsuyama, Takashi

2014-12-08

Multi-camera networks have gained great interest in video-based surveillance systems for security monitoring, access control, etc. Person re-identification, which aims to determine whether a given individual has already appeared in the camera network, is an essential and challenging task in multi-camera networks. Individual recognition often uses faces as a trial and requires a large number of samples during the training phase. This is difficult to fulfill because of the limitations of camera hardware and unconstrained image capturing conditions. Conventional face recognition algorithms often encounter the "small sample size" (SSS) problem, which arises from the small number of training samples compared with the high dimensionality of the sample space. To overcome this problem, interest in combining multiple base classifiers has sparked research efforts in ensemble methods. However, existing ensemble methods still leave two open questions: (1) how to define diverse base classifiers from small data; and (2) how to avoid the diversity/accuracy dilemma that occurs during ensemble construction. To address these problems, this paper proposes a novel generic learning-based ensemble framework, which augments the small data by generating new samples based on a generic distribution and introduces a tailored 0-1 knapsack algorithm to alleviate the diversity/accuracy dilemma. More diverse base classifiers can be generated from the expanded face space, and more appropriate base classifiers are selected for the ensemble. Extensive experimental results on four benchmarks demonstrate that our system copes better with the SSS problem than state-of-the-art systems.
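The abstract does not describe the paper's tailored knapsack formulation. As a hedged illustration only, the standard 0-1 knapsack below selects a subset of base classifiers that maximizes a total score under an integer budget. Treating a classifier's value as, say, its validation accuracy and its weight as an integer cost is a hypothetical mapping, not the authors' method.

```python
def knapsack_select(values, weights, capacity):
    """Standard 0-1 knapsack by dynamic programming.

    values[i]: score of candidate base classifier i (hypothetical mapping)
    weights[i]: non-negative integer cost of including classifier i
    Returns (best_total_value, sorted indices of the chosen classifiers).
    """
    n = len(values)
    # best[i][c]: best value using the first i candidates within budget c.
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # skip candidate i - 1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v  # take candidate i - 1
    # Backtrack: a change in the table means candidate i - 1 was taken.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return best[n][capacity], sorted(chosen)
```

The dynamic program runs in O(n × capacity) time. A real ensemble selector would also need some way to reward diversity, which this plain additive objective cannot express.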
Generic Learning-Based Ensemble Framework for Small Sample Size Face Recognition in Multi-Camera Networks

Directory of Open Access Journals (Sweden)

Cuicui Zhang

2014-12-01

Multi-camera networks have gained great interest in video-based surveillance systems for security monitoring, access control, etc. Person re-identification, which aims to determine whether a given individual has already appeared in the camera network, is an essential and challenging task in multi-camera networks. Individual recognition often uses faces as a trial and requires a large number of samples during the training phase. This is difficult to fulfill because of the limitations of camera hardware and unconstrained image capturing conditions. Conventional face recognition algorithms often encounter the “small sample size” (SSS) problem, which arises from the small number of training samples compared with the high dimensionality of the sample space. To overcome this problem, interest in combining multiple base classifiers has sparked research efforts in ensemble methods. However, existing ensemble methods still leave two open questions: (1) how to define diverse base classifiers from small data; and (2) how to avoid the diversity/accuracy dilemma that occurs during ensemble construction. To address these problems, this paper proposes a novel generic learning-based ensemble framework, which augments the small data by generating new samples based on a generic distribution and introduces a tailored 0–1 knapsack algorithm to alleviate the diversity/accuracy dilemma. More diverse base classifiers can be generated from the expanded face space, and more appropriate base classifiers are selected for the ensemble. Extensive experimental results on four benchmarks demonstrate that our system copes better with the SSS problem than state-of-the-art systems.
A support vector machine and a random forest classifier indicates a 15-miRNA set related to osteosarcoma recurrence

Directory of Open Access Journals (Sweden)

He Y

2018-01-01

Yunfei He,1,2,* Jun Ma,1,* An Wang,1,3,* Weiheng Wang,1 Shengchang Luo,1 Yaoming Liu,2 Xiaojian Ye1. 1Department of Orthopaedics, Changzheng Hospital Affiliated with Second Military Medical University, Shanghai; 2Department of Orthopaedics, Lanzhou General Hospital of Lanzhou Military Command Region, Lanzhou; 3Department of Orthopaedics, Shanghai Armed Police Force Hospital, Shanghai, People’s Republic of China. *These authors contributed equally to this work. Background: Osteosarcoma, which originates in mesenchymal tissue, is the most prevalent primary solid malignancy of bone. It is of great importance to explore the mechanisms of metastasis and recurrence, the two primary reasons for the high death rate in osteosarcoma. Data and methods: Three miRNA expression profiles related to osteosarcoma were downloaded from GEO DataSets. Differentially expressed miRNAs (DEmiRs) were screened using MetaDE.ES of the MetaDE package. A support vector machine (SVM) classifier was constructed using optimal miRNAs, and its efficiency in predicting recurrence was tested on independent datasets. Finally, a co-expression network was constructed based on the DEmiRs and their target genes. Results: In total, 78 significant DEmiRs were screened. The SVM classifier constructed from 15 miRNAs accurately classified 58 of 65 samples (89.2%) in the GSE39040 database, which was validated in two other databases, GSE39052 (84.62%, 22/26) and GSE79181 (91.3%, 21/23). Cox regression showed that four miRNAs, hsa-miR-10b, hsa-miR-1227, hsa-miR-146b-3p, and hsa-miR-873, significantly correlated with tumor recurrence time. These four miRNAs had 137, 147, 145, and 77 target genes, respectively, which were assigned to 17 functionally annotated gene ontology terms and 14 Kyoto Encyclopedia of Genes and Genomes pathways. Among them, the “Osteoclast differentiation” pathway contained a total of seven target genes and was
75 FR 37253 - Classified National Security Information

Science.gov (United States)

2010-06-28

... ``Secret.'' (3) Each interior page of a classified document shall be marked at the top and bottom either... ``(TS)'' for Top Secret, ``(S)'' for Secret, and ``(C)'' for Confidential will be used. (2) Portions... from the informational text. (1) Conspicuously place the overall classification at the top and bottom...
Classifying cognitive profiles using machine learning with privileged information in Mild Cognitive Impairment

Directory of Open Access Journals (Sweden)

Hanin Hamdan Alahmadi

2016-11-01

Full Text Available Early diagnosis of dementia is critical for assessing disease progression and potential treatment. State-of-the-art machine learning techniques have been increasingly employed for this diagnostic task. In this study, we employed Generalised Matrix Learning Vector Quantization (GMLVQ) classifiers to discriminate patients with Mild Cognitive Impairment (MCI) from healthy controls based on their cognitive skills. Further, we adopted a "Learning with privileged information" approach to combine cognitive and fMRI data for the classification task. The resulting classifier operates solely on the cognitive data while it incorporates the fMRI data as privileged information (PI) during training. This novel classifier is of practical use because brain imaging data cannot always be collected from patients and older participants. MCI patients and healthy age-matched controls were trained to extract structure from temporal sequences. We ask whether machine learning classifiers can be used to discriminate patients from controls based on the learning performance and whether differences between these groups relate to individual cognitive profiles. To this end, we tested participants in four cognitive tasks: working memory, cognitive inhibition, divided attention, and selective attention. We also collected fMRI data before and after training on the learning task and extracted fMRI responses and connectivity as features for machine learning classifiers. Our results show that the PI-guided GMLVQ classifiers outperform the baseline classifier that only used the cognitive data. In addition, we found that for the baseline classifier, divided attention is the only relevant cognitive feature. When PI was incorporated, divided attention remained the most relevant feature while cognitive inhibition also became relevant for the task. Interestingly, this analysis for the fMRI GMLVQ classifier suggests that (1) when overall fMRI signal for structured stimuli is
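GMLVQ, as commonly formulated, assigns a sample to its nearest prototype under an adaptive distance d(x, w) = (x - w)^T Ω^T Ω (x - w), where the learned matrix Ω^T Ω weights individual features and their combinations. The sketch below shows only that distance and the nearest-prototype decision; prototype and Ω training, as well as the privileged-information step described in the abstract, are not shown, and the prototypes, labels and Ω values are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def gmlvq_distance(x: Vector, w: Vector, omega: Matrix) -> float:
    """Adaptive squared distance ||Omega (x - w)||^2 = (x - w)^T Omega^T Omega (x - w)."""
    diff = [xi - wi for xi, wi in zip(x, w)]
    projected = [sum(o * d for o, d in zip(row, diff)) for row in omega]
    return sum(p * p for p in projected)


def classify(x: Vector, prototypes: Sequence[Vector], labels: Sequence[str], omega: Matrix) -> str:
    """Nearest-prototype decision under the adaptive metric."""
    distances = [gmlvq_distance(x, w, omega) for w in prototypes]
    best = min(range(len(prototypes)), key=distances.__getitem__)
    return labels[best]


prototypes = [[0.0, 0.0], [1.0, 1.0]]  # hypothetical class prototypes
labels = ["control", "MCI"]
identity = [[1.0, 0.0], [0.0, 1.0]]    # plain squared Euclidean distance
only_first = [[1.0, 0.0], [0.0, 0.0]]  # all relevance on feature 0
x = [0.2, 0.9]
print(classify(x, prototypes, labels, identity))    # MCI
print(classify(x, prototypes, labels, only_first))  # control
```

The same sample changes class when the relevance matrix concentrates on one feature, which is how GMLVQ relevance profiles (such as "divided attention is the only relevant feature") are read.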
A metabolic fingerprinting approach based on selected ion flow tube mass spectrometry (SIFT-MS) and chemometrics: A reliable tool for Mediterranean origin-labeled olive oils authentication.

Science.gov (United States)

Bajoub, Aadil; Medina-Rodríguez, Santiago; Ajal, El Amine; Cuadros-Rodríguez, Luis; Monasterio, Romina Paula; Vercammen, Joeri; Fernández-Gutiérrez, Alberto; Carrasco-Pancorbo, Alegría

2018-04-01

Selected ion flow tube mass spectrometry (SIFT-MS) in combination with chemometrics was used to authenticate the geographical origin of Mediterranean virgin olive oils (VOOs) produced under geographical origin labels. In particular, 130 oil samples from six different Mediterranean regions (Kalamata (Greece); Toscana (Italy); Meknès and Tyout (Morocco); and Priego de Córdoba and Baena (Spain)) were considered. The headspace volatile fingerprints were measured by SIFT-MS in full scan with H3O+, NO+ and O2+ as precursor ions, and the results were subjected to chemometric treatment. Principal Component Analysis (PCA) was used for preliminary multivariate data analysis, and Partial Least Squares-Discriminant Analysis (PLS-DA) was applied to build different models (considering the three reagent ions) to classify samples according to country of origin and region (within the same country). The multi-class PLS-DA models showed very good performance in terms of fitting accuracy (98.90-100%) and prediction accuracy (96.70-100% for cross validation and 97.30-100% for external validation on a test set). Among the two-class PLS-DA models, the one for the Spanish samples showed 100% sensitivity, specificity and accuracy in calibration, cross validation and external validation; the model for Moroccan oils also showed very satisfactory results, with perfect scores for almost every parameter in all cases. Copyright © 2017 Elsevier Ltd. All rights reserved.
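The sensitivity, specificity and accuracy reported for the two-class PLS-DA models follow the usual confusion-matrix definitions. A minimal sketch, assuming one origin (here "Spain") is treated as the positive class and all other origins as negative; the labels and predictions are invented for illustration and are not the study's data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy for a one-vs-rest two-class model."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),  # true negative rate
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical example: "Spain" versus every other origin.
truth = ["Spain", "Spain", "Morocco", "Italy", "Greece"]
pred = ["Spain", "Morocco", "Morocco", "Italy", "Spain"]
print(binary_metrics(truth, pred, positive="Spain"))
```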
A Novel Cascade Classifier for Automatic Microcalcification Detection.

Directory of Open Access Journals (Sweden)

Seung Yeon Shin

Full Text Available In this paper, we present a novel cascaded classification framework for the automatic detection of individual microcalcifications (μCs) and μC clusters. Our framework comprises three classification stages: (i) a random forest (RF) classifier for simple features capturing the second-order local structure of individual μCs, where non-μC pixels in the target mammogram are efficiently eliminated; (ii) a more complex discriminative restricted Boltzmann machine (DRBM) classifier for the μC candidates determined in the RF stage, which automatically learns the detailed morphology of μC appearances for improved discriminative power; and (iii) a detector that finds clusters of μCs from the individual μC detection results, using two different criteria. With the two-stage RF-DRBM classifier, we are able to distinguish μCs using explicitly computed features, as well as learn implicit features that further discriminate between confusing cases. Experimental evaluation is conducted on the original Mammographic Image Analysis Society (MIAS) and mini-MIAS databases, as well as our own Seoul National University Bundang Hospital digital mammographic database. The proposed method is shown to outperform comparable methods in terms of receiver operating characteristic (ROC) and precision-recall curves for detection of individual μCs, and the free-response receiver operating characteristic (FROC) curve for detection of clustered μCs.
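The cascade idea, in which a cheap first stage discards most candidates so that a costlier second stage only scores the survivors, can be sketched generically. This is not the paper's RF-DRBM implementation: the two score functions are hypothetical stand-ins (lookups on made-up "contrast" and "shape" values) for the RF and DRBM stages, and the thresholds are arbitrary.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Stage = Tuple[Callable[[Any], float], float]  # (score function, acceptance threshold)


def run_cascade(candidates: Sequence[Any], stages: Sequence[Stage]) -> Tuple[List[Any], List[int]]:
    """Pass candidates through the stages in order.

    Returns the surviving candidates and how many candidates were scored at each stage.
    """
    survivors = list(candidates)
    scored_per_stage = []
    for score, threshold in stages:
        scored_per_stage.append(len(survivors))
        survivors = [c for c in survivors if score(c) >= threshold]
    return survivors, scored_per_stage


# Hypothetical candidate pixels with precomputed feature values.
pixels: List[Dict[str, Any]] = [
    {"id": 0, "contrast": 0.9, "shape": 0.8},
    {"id": 1, "contrast": 0.2, "shape": 0.95},
    {"id": 2, "contrast": 0.7, "shape": 0.3},
    {"id": 3, "contrast": 0.8, "shape": 0.6},
]
stages = [
    (lambda c: c["contrast"], 0.5),  # stand-in for the cheap first stage
    (lambda c: c["shape"], 0.5),     # stand-in for the costlier second stage
]
kept, scored = run_cascade(pixels, stages)
print([c["id"] for c in kept], scored)  # [0, 3] [4, 3]
```

The second counter shows the point of the design: the expensive stage runs on fewer candidates than the first.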
Learning to classify organic and conventional wheat - a machine-learning driven approach using the MeltDB 2.0 metabolomics analysis platform

Directory of Open Access Journals (Sweden)

Nikolas Kessler

2015-03-01

Full Text Available We present results of our machine learning approach to the problem of classifying GC-MS data originating from wheat grains of different farming systems. The aim is to investigate the potential of learning algorithms to classify GC-MS data as coming from either conventionally or organically grown samples, considering different cultivars. Our work is motivated by today's increased demand for organic food in post-industrialized societies and the resulting need to prove organic food authenticity. Our data set comprises up to eleven wheat cultivars that were grown in both farming systems, organic and conventional, over three years. More than 300 GC-MS measurements were recorded and subsequently processed and analyzed in the MeltDB 2.0 metabolomics analysis platform, which is briefly outlined in this paper. We further describe how unsupervised (t-SNE, PCA) and supervised (RF, SVM) methods can be applied for sample visualization and classification. Our results clearly show that the year has the greatest, and the wheat cultivar the second-greatest, influence on the metabolic composition of a sample. We also show that, for a given year and cultivar, organic and conventional cultivation can be distinguished by machine learning algorithms.
Feature weighting using particle swarm optimization for learning vector quantization classifier

Science.gov (United States)

Dongoran, A.; Rahmadani, S.; Zarlis, M.; Zakarias

2018-03-01

This paper discusses and proposes a method of feature weighting for classification with LVQ, a competitive-learning artificial neural network. The feature-weighting method uses particle swarm optimization (PSO) to search for attribute weights that influence the resulting output. The method is then applied to the LVQ classifier and tested on three datasets obtained from the UCI Machine Learning Repository. Accuracy is then compared between two approaches: the first, using LVQ1, is referred to as LVQ-Classifier; the second, the proposed model, is referred to as PSOFW-LVQ. The results show that the PSO algorithm is capable of finding attribute weights that increase LVQ-classifier accuracy.
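A rough sketch of how attribute weights can enter an LVQ1 classifier: the weights scale each feature's contribution to the squared Euclidean distance used to pick the winning prototype, and LVQ1 moves the winner toward a correctly labelled sample or away from a wrongly labelled one. The PSO search itself is not shown; the weight vector is fixed by hand as a placeholder for what a PSO run might return, and the function names, data and learning-rate schedule are illustrative assumptions rather than the paper's PSOFW-LVQ.

```python
import random
from typing import List, Sequence


def weighted_sq_dist(x: Sequence[float], w: Sequence[float], feature_weights: Sequence[float]) -> float:
    """Squared Euclidean distance with one non-negative weight per feature."""
    return sum(f * (xi - wi) ** 2 for f, xi, wi in zip(feature_weights, x, w))


def nearest(x, prototypes, feature_weights) -> int:
    return min(range(len(prototypes)), key=lambda i: weighted_sq_dist(x, prototypes[i], feature_weights))


def train_lvq1(data, labels, prototypes, proto_labels, feature_weights,
               lr: float = 0.1, epochs: int = 20, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    protos = [list(p) for p in prototypes]
    order = list(range(len(data)))
    for epoch in range(epochs):
        rng.shuffle(order)
        alpha = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        for i in order:
            c = nearest(data[i], protos, feature_weights)
            sign = 1.0 if proto_labels[c] == labels[i] else -1.0  # attract if correct, repel if not
            protos[c] = [wk + sign * alpha * (xk - wk) for wk, xk in zip(protos[c], data[i])]
    return protos


def predict(x, protos, proto_labels, feature_weights) -> str:
    return proto_labels[nearest(x, protos, feature_weights)]


# Toy data: feature 0 separates the classes, feature 1 is noise.
data = [[0.0, 0.9], [0.1, 0.1], [0.2, 0.5], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
labels = ["A", "A", "A", "B", "B", "B"]
weights = [1.0, 0.0]  # hypothetical stand-in for PSO-optimised attribute weights
protos = train_lvq1(data, labels, [[0.3, 0.5], [0.7, 0.5]], ["A", "B"], weights)
print(predict([0.05, 0.9], protos, ["A", "B"], weights))  # A
```

In a PSO wrapper, each particle would be a candidate weight vector and its fitness the accuracy of an LVQ1 model trained with those weights.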
Accuracy Evaluation of C4.5 and Naive Bayes Classifiers Using Attribute Ranking Method

Directory of Open Access Journals (Sweden)

S. Sivakumari

2009-03-01

Full Text Available This paper classifies the Ljubljana Breast Cancer dataset using C4.5 Decision Tree and Naïve Bayes classifiers. In this work, classification is carried out using two methods. In the first method, the dataset is analysed using all of its attributes. In the second method, attributes are ranked using the information gain ranking technique, and only the highly ranked attributes are used to build the classification model. We evaluate the C4.5 Decision Tree and Naïve Bayes classifiers in terms of classifier accuracy for various numbers of cross-validation folds. Our results show that both classifiers achieve good accuracy on the dataset.
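Information gain ranking scores each attribute by how much it reduces the entropy of the class label: IG(A) = H(C) - Σ_v P(A = v) H(C | A = v). A minimal sketch for categorical attributes; the attribute names and toy rows are invented and are not taken from the Ljubljana dataset.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column: Sequence[str], labels: Sequence[str]) -> float:
    """H(C) minus the weighted entropy of C within each attribute value."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for value, label in zip(column, labels):
        groups[value].append(label)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_attributes(rows, labels, names) -> List[Tuple[str, float]]:
    """Return (attribute, information gain) pairs, highest gain first."""
    scores = {name: information_gain([r[j] for r in rows], labels) for j, name in enumerate(names)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy rows: attr_a perfectly separates the classes, attr_b carries no information.
rows = [("yes", "left"), ("yes", "right"), ("no", "left"), ("no", "right")]
labels = ["recurrence", "recurrence", "no-recurrence", "no-recurrence"]
ranking = rank_attributes(rows, labels, ["attr_a", "attr_b"])
print(ranking)
```

Keeping only the top-ranked attributes from such a list is the feature-selection step; the classifiers are then trained on that subset.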
A Constrained Multi-Objective Learning Algorithm for Feed-Forward Neural Network Classifiers

Directory of Open Access Journals (Sweden)

M. Njah

2017-06-01

Full Text Available This paper proposes a new approach to the optimal design of a Feed-forward Neural Network (FNN)-based classifier. The originality of the proposed methodology, called CMOA, lies in a new constraint-handling technique based on a self-adaptive penalty procedure, which directs the entire search effort toward finding only acceptable Pareto-optimal solutions. Neurons and connections of the FNN classifier are built dynamically during the learning process. The approach uses differential evolution to create new individuals and then keeps only the non-dominated ones as the basis for the next generation. The designed FNN classifier is applied to six binary classification benchmark problems from the UCI repository, and the results indicate the advantages of the proposed approach over other recently reported multi-objective evolutionary neural network classifiers.
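Keeping "only the non-dominated" individuals means retaining the candidates that no other candidate beats in every objective. A minimal sketch for minimisation objectives; the two objectives used here (classification error, number of connections) and their values are hypothetical, and the self-adaptive penalty and differential-evolution steps of CMOA are not reproduced.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b (minimisation): no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points: Sequence[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the points that no other point dominates (the current Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate networks: (classification error, number of connections).
population = [(0.10, 50), (0.12, 30), (0.15, 30), (0.08, 80), (0.10, 60)]
print(non_dominated(population))  # [(0.1, 50), (0.12, 30), (0.08, 80)]
```

A point never dominates itself because at least one strict improvement is required, so no self-exclusion check is needed.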
40 CFR 260.32 - Variances to be classified as a boiler.

Science.gov (United States)

2010-07-01

... 40 Protection of Environment 25 2010-07-01 2010-07-01 false Variances to be classified as a boiler... be classified as a boiler. In accordance with the standards and criteria in § 260.10 (definition of “boiler”), and the procedures in § 260.33, the Administrator may determine on a case-by-case basis that...
A Machine Learning Ensemble Classifier for Early Prediction of Diabetic Retinopathy.

Science.gov (United States)

S K, Somasundaram; P, Alli

2017-11-09

Diabetic retinopathy (DR), a retinal vascular disease, is the main complication of diabetes and can lead to blindness. Regular screening for early DR detection is considered a labor- and resource-intensive task. Automatic, computational detection of DR is therefore an attractive solution. Automatic methods can determine the presence of an abnormality in fundus images (FI) more reliably, but their classification performance has been poor. Recently, a few studies have analyzed the texture-discrimination capacity of FI to distinguish healthy images. However, the feature extraction (FE) process performed poorly because of the high dimensionality. To identify retinal features for DR diagnosis and early detection, a machine learning and ensemble classification method called the Machine Learning Bagging Ensemble Classifier (ML-BEC) is therefore designed. The ML-BEC method comprises two stages. The first stage extracts candidate objects from retinal images (RI). The candidate objects, or features, for DR diagnosis include blood vessels, optic nerve, neural tissue, neuroretinal rim, optic disc size, thickness and variance. These features are initially extracted by applying a machine learning technique called t-distributed Stochastic Neighbor Embedding (t-SNE). t-SNE constructs a probability distribution over pairs of high-dimensional images such that similar pairs receive high probability and dissimilar pairs low probability. It then defines a similar probability distribution over the points in a low-dimensional map and minimizes the Kullback-Leibler divergence between the two distributions with respect to the locations of the points in the map. The second stage applies ensemble classifiers to the extracted features to provide accurate analysis of digital FI using machine learning. In this stage, an automatic detection
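Bagging, the ensemble strategy named in ML-BEC, trains each base classifier on a bootstrap resample of the training set and combines the predictions by majority vote. A minimal sketch, assuming a nearest-centroid base learner chosen purely for brevity (the abstract does not specify the base classifier) and made-up two-dimensional feature vectors in place of the extracted retinal features.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence

Model = Dict[str, List[float]]  # class label -> centroid


def fit_centroids(X: Sequence[Sequence[float]], y: Sequence[str]) -> Model:
    """Nearest-centroid base learner: one mean vector per class present in the sample."""
    sums: Dict[str, List[float]] = {}
    counts = Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroids(model: Model, x: Sequence[float]) -> str:
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(model[label], x)))


def fit_bagging(X, y, n_estimators: int = 25, seed: int = 0) -> List[Model]:
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models: List[Model], x: Sequence[float]) -> str:
    votes = Counter(predict_centroids(m, x) for m in models)
    return votes.most_common(1)[0][0]  # majority vote


# Hypothetical, well-separated feature vectors.
X = [[0.1, 0.2], [0.0, 0.3], [0.3, 0.1], [0.2, 0.0],
     [5.0, 5.2], [4.8, 5.1], [5.3, 4.9], [5.1, 5.0]]
y = ["healthy"] * 4 + ["DR"] * 4
ensemble = fit_bagging(X, y)
print(predict_bagging(ensemble, [0.2, 0.1]), predict_bagging(ensemble, [5.1, 4.8]))
```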
Local curvature analysis for classifying breast tumors: Preliminary analysis in dedicated breast CT

International Nuclear Information System (INIS)

Lee, Juhun; Nishikawa, Robert M.; Reiser, Ingrid; Boone, John M.; Lindfors, Karen K.

2015-01-01

Purpose: The purpose of this study is to measure the effectiveness of local curvature measures as novel image features for classifying breast tumors. Methods: A total of 119 breast lesions from 104 noncontrast dedicated breast computed tomography images of women were used in this study. Volumetric segmentation was done using a seed-based segmentation algorithm, and a triangulated surface was then extracted from the resulting segmentation. Total, mean, and Gaussian curvatures were then computed. Normalized curvatures were used as classification features. In addition, traditional image features were also extracted, and a forward feature selection scheme was used to select the optimal feature set. Logistic regression was used as the classifier, and leave-one-out cross-validation was used to evaluate the classification performance of the features. The area under the receiver operating characteristic curve (AUC) was used as the figure of merit. Results: Among curvature measures, the normalized total curvature (C_T) showed the best classification performance (AUC of 0.74), while the others showed no classification power individually. Five traditional image features (two shape, two margin, and one texture descriptor) were selected via the feature selection scheme, and the resulting classifier achieved an AUC of 0.83. Among those five features, the radial gradient index (RGI), which is a margin descriptor, showed the best classification performance (AUC of 0.73). A classifier combining RGI and C_T yielded an AUC of 0.81, which showed similar performance (i.e., no statistically significant difference) to the classifier with the above five traditional image features. Additional comparisons in AUC values between classifiers using different combinations of traditional image features and C_T were conducted. The results showed that C_T was able to replace the other four image features for the classification task. Conclusions: The normalized curvature measure
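The AUC used as the figure of merit equals the probability that a randomly chosen positive case (for example, a malignant lesion) receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. A minimal sketch of that pairwise (Mann-Whitney) computation; the scores are invented and are not the study's logistic-regression outputs.

```python
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Mann-Whitney estimate of the area under the ROC curve."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


malignant = [0.9, 0.8, 0.4]     # hypothetical scores for positive cases
benign = [0.7, 0.3, 0.2, 0.4]   # hypothetical scores for negative cases
print(auc(malignant, benign))   # 0.875
```

The double loop is quadratic, which is unproblematic for a data set of roughly a hundred lesions; a rank-based formula gives the same value faster for large samples.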
A simple, fast and cheap non-SPE screening method for antibacterial residue analysis in milk and liver using liquid chromatography-tandem mass spectrometry.

Science.gov (United States)

Martins, Magda Targa; Melo, Jéssica; Barreto, Fabiano; Hoff, Rodrigo Barcellos; Jank, Louise; Bittencourt, Michele Soares; Arsand, Juliana Bazzan; Schapoval, Elfrides Eva Scherman

2014-11-01

In routine laboratory work, screening methods for multiclass analysis can process a large number of samples in a short time. The main challenge is to develop a methodology that detects as many different classes of residues as possible while remaining fast and inexpensive. An efficient technique for the analysis of multiclass antibacterial residues (fluoroquinolones, tetracyclines, sulfonamides and trimethoprim) was developed based on a simple, environment-friendly extraction for bovine milk and for cattle and poultry liver. Acidified ethanol was used as the extracting solvent for milk samples. Liver samples were treated with EDTA-washed sand for cell disruption, with methanol:water and acidified acetonitrile as extracting solvents. A total of 24 antibacterial residues were detected and confirmed using liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) at levels of 10, 25 and 50% of the maximum residue limit (MRL). For liver samples, a metabolite (sulfaquinoxaline-OH) was also monitored. A validation procedure for screening purposes was conducted in accordance with European Union requirements (2002/657/EC). At the detection capability (CCβ), the false compliant rate was less than 5% at the lowest level for each residue. Specificity and ruggedness were also discussed. Incurred and routine samples were analyzed and the method was successfully applied. The results show that this method can be an important tool in routine analysis, since it is very fast and reliable. Copyright © 2014. Published by Elsevier B.V.

75 FR 733 - Implementation of the Executive Order, ``Classified National Security Information''

Science.gov (United States)

2010-01-05

... of the Executive Order, ``Classified National Security Information'' Memorandum for the Heads of... Security Information'' (the ``order''), which substantially advances my goals for reforming the security... classified information shall provide the Director of the Information Security Oversight Office (ISOO) a copy...
Proposed hybrid-classifier ensemble algorithm to map snow cover area

Science.gov (United States)

Nijhawan, Rahul; Raman, Balasubramanian; Das, Josodhir

2018-01-01

Metaclassification ensemble approaches are known to improve the prediction performance of snow-covered area mapping. The methodology adopted here is based on a neural network combined with four state-of-the-art machine learning algorithms (support vector machine, artificial neural networks, spectral angle mapper and K-means clustering) and a snow index, the normalized difference snow index. An AdaBoost ensemble algorithm related to decision trees is also proposed for snow-cover mapping. According to the available literature, these methods have rarely been used for snow-cover mapping. Employing these techniques, a study was conducted for the Raktavarn and Chaturangi Bamak glaciers, Uttarakhand, Himalaya, using a multispectral Landsat 7 ETM+ (enhanced thematic mapper) image. The study also compares the results with those obtained from statistical combination methods (majority rule and belief functions) and with the accuracies of the individual classifiers. Accuracy assessment is performed by computing the quantity and allocation disagreement, analyzing statistical measures (accuracy, precision, specificity, AUC, and sensitivity) and receiver operating characteristic curves. A total of 225 combinations of parameters for the individual classifiers were trained and tested on the dataset, and the results were compared with the proposed approach. The proposed methodology produced the highest classification accuracy (95.21%), close to the 94.01% produced by the proposed AdaBoost ensemble algorithm. From these observations, it was concluded that the ensemble of classifiers produced better results than the individual classifiers.
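Quantity and allocation disagreement split the total disagreement of a classified map into a part caused by wrong class proportions and a part caused by wrong spatial placement. A minimal sketch following the definitions commonly attributed to Pontius and Millones (2011), with rows as the classified (map) categories and columns as the reference categories; the snow / non-snow pixel counts are hypothetical.

```python
from typing import Sequence, Tuple


def quantity_allocation(counts: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Quantity and allocation disagreement from a square confusion matrix.

    Rows = classified map categories, columns = reference categories.
    """
    total = sum(sum(row) for row in counts)
    p = [[c / total for c in row] for row in counts]  # convert counts to proportions
    k = len(p)
    map_tot = [sum(p[g]) for g in range(k)]
    ref_tot = [sum(p[i][g] for i in range(k)) for g in range(k)]
    quantity = sum(abs(map_tot[g] - ref_tot[g]) for g in range(k)) / 2
    allocation = sum(2 * min(map_tot[g] - p[g][g], ref_tot[g] - p[g][g]) for g in range(k)) / 2
    return quantity, allocation


# Hypothetical snow / non-snow confusion matrix (pixel counts).
cm = [[40, 10],
      [5, 45]]
q, a = quantity_allocation(cm)
agreement = sum(cm[g][g] for g in range(2)) / sum(map(sum, cm))
print(round(q, 4), round(a, 4), round(1 - agreement, 4))  # 0.05 0.1 0.15
```

The two components always add up to the total disagreement, 1 minus the overall accuracy, which is why they are reported in place of a single kappa value.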
Classifier models and architectures for EEG-based neonatal seizure detection

International Nuclear Information System (INIS)

Greene, B R; Marnane, W P; Lightbody, G; Reilly, R B; Boylan, G B

2008-01-01

Neonatal seizures are the most common neurological emergency in the neonatal period and are associated with poor long-term outcome. Early detection and treatment may improve prognosis. This paper aims to develop an optimal set of parameters and a comprehensive scheme for patient-independent multi-channel EEG-based neonatal seizure detection. We employed a dataset containing 411 neonatal seizures. The dataset consists of multi-channel EEG recordings with a mean duration of 14.8 h from 17 neonatal patients. Early-integration and late-integration classifier architectures were considered for combining information across EEG channels. Three classifier models based on linear discriminants, quadratic discriminants and regularized discriminants were employed. Furthermore, the effect of electrode montage was considered. The best-performing seizure detection system was found to be an early-integration configuration employing a regularized discriminant classifier model. A referential EEG montage was found to outperform the more standard bipolar electrode montage for automated neonatal seizure detection. A cross-fold validation estimate of the classifier performance for the best-performing system yielded 81.03% of seizures correctly detected with a false detection rate of 3.82%. With post-processing, the false detection rate was reduced to 1.30% with 59.49% of seizures correctly detected. These results comprehensively illustrate that robust, reliable, patient-independent neonatal seizure detection is possible using multi-channel EEG
Classifying hot water chemistry: Application of MULTIVARIATE STATISTICS

OpenAIRE

Sumintadireja, Prihadi; Irawan, Dasapta Erwin; Rezky, Yuanno; Gio, Prana Ugiana; Agustin, Anggita

2016-01-01

This file is the dataset for the following paper: "Classifying hot water chemistry: Application of MULTIVARIATE STATISTICS". Authors: Prihadi Sumintadireja¹, Dasapta Erwin Irawan¹, Yuano Rezky², Prana Ugiana Gio³, Anggita Agustin¹
The Effect of Reading Comprehension and Problem Solving Strategies on Classifying Elementary 4th Grade Students with High and Low Problem Solving Success

Science.gov (United States)

Ulu, Mustafa

2017-01-01

In this study, the effect of fluent reading (speed, reading accuracy percentage, prosodic reading), comprehension (literal comprehension, inferential comprehension) and problem solving strategies on classifying students with high and low problem solving success was researched. The study sample comprised 279 students at elementary…
Discrimination-Aware Classifiers for Student Performance Prediction

Science.gov (United States)

Luo, Ling; Koprinska, Irena; Liu, Wei

2015-01-01

In this paper we consider discrimination-aware classification of educational data. Mining and using rules that distinguish groups of students based on sensitive attributes such as gender and nationality may lead to discrimination. It is desirable to keep the sensitive attributes during the training of a classifier to avoid information loss but…
Two-categorical bundles and their classifying spaces

DEFF Research Database (Denmark)

Baas, Nils A.; Bökstedt, M.; Kro, T.A.

2012-01-01

-category is a classifying space for the associated principal 2-bundles. In the process of proving this we develop a lot of powerful machinery which may be useful in further studies of 2-categorical topology. As a corollary we get a new proof of the classification of principal bundles. A calculation based...
Evaluation of immunization coverage by lot quality assurance sampling compared with 30-cluster sampling in a primary health centre in India.

OpenAIRE

Singh, J.; Jain, D. C.; Sharma, R. S.; Verghese, T.

1996-01-01

The immunization coverage of infants, children and women residing in a primary health centre (PHC) area in Rajasthan was evaluated both by lot quality assurance sampling (LQAS) and by the 30-cluster sampling method recommended by WHO's Expanded Programme on Immunization (EPI). The LQAS survey was used to classify 27 mutually exclusive subunits of the population, defined as residents in health subcentre areas, on the basis of acceptable or unacceptable levels of immunization coverage among inf...
A Bayesian classifier for symbol recognition

OpenAIRE

Barrat, Sabine; Tabbone, Salvatore; Nourrissier, Patrick

2007-01-01

URL: http://www.buyans.com/POL/UploadedFile/134_9977.pdf. We present in this paper an original adaptation of Bayesian networks to the symbol recognition problem. More precisely, we present a descriptor combination method that significantly improves the recognition rate compared with the recognition rates obtained by each descriptor alone. For this purpose, we use a simple Bayesian classifier, called naive Bayes. In fact, probabilistic graphical models, more spec...
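Naive Bayes assumes that descriptor values are conditionally independent given the symbol class, so the class posterior factorises into a prior times one likelihood per descriptor. A minimal sketch for discretised descriptors with Laplace smoothing; the symbol classes and descriptor values are invented, and the paper's descriptor combination method is not reproduced.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Set


def fit_naive_bayes(X: Sequence[Sequence[Hashable]], y: Sequence[str], alpha: float = 1.0) -> dict:
    class_counts = Counter(y)
    # counts[c][j][v]: number of class-c training samples whose descriptor j equals v
    counts: Dict[str, Dict[int, Counter]] = defaultdict(lambda: defaultdict(Counter))
    seen_values: Dict[int, Set[Hashable]] = defaultdict(set)
    for x, c in zip(X, y):
        for j, v in enumerate(x):
            counts[c][j][v] += 1
            seen_values[j].add(v)
    return {"class_counts": class_counts, "counts": counts,
            "values": seen_values, "alpha": alpha, "n": len(y)}


def predict_naive_bayes(model: dict, x: Sequence[Hashable]) -> str:
    cc, counts, values, alpha, n = (model[k] for k in ("class_counts", "counts", "values", "alpha", "n"))

    def log_posterior(c: str) -> float:
        score = math.log(cc[c] / n)  # log prior
        for j, v in enumerate(x):
            # Laplace-smoothed log likelihood of descriptor j taking value v in class c
            score += math.log((counts[c][j][v] + alpha) / (cc[c] + alpha * len(values[j])))
        return score

    return max(cc, key=log_posterior)


# Hypothetical discretised descriptors for two symbol classes.
X = [("long", "thin"), ("long", "thin"), ("short", "thick"), ("short", "thin")]
y = ["resistor", "resistor", "capacitor", "capacitor"]
model = fit_naive_bayes(X, y)
print(predict_naive_bayes(model, ("long", "thin")))    # resistor
print(predict_naive_bayes(model, ("short", "thick")))  # capacitor
```

Working in log space avoids numerical underflow when many descriptors are multiplied together.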
Air classifier technology (ACT) in dry powder inhalation. Part 1 : Introduction of a novel force distribution concept (FDC) explaining the performance of a basic air classifier on adhesive mixtures

NARCIS (Netherlands)

de Boer, A H; Hagedoorn, P; Gjaltema, D; Goede, J; Frijlink, H W

2003-01-01

Air classifier technology (ACT) is introduced as part of formulation integrated dry powder inhaler development (FIDPI) to optimise the de-agglomeration of inhalation powders. Carrier retention and de-agglomeration results obtained with a basic classifier concept are discussed. The theoretical
Application of Chemometric Techniques to Colorimetric Data in Classifying Automobile Paint

International Nuclear Information System (INIS)

Nur Awatif Rosli; Rozita Osman; Norashikin Saim; Mohd Zuli Jaafar

2015-01-01

The analysis of paint chips is of great interest to forensic investigators, particularly in the examination of hit-and-run cases. This study proposes a direct and rapid method for classifying automobile paint samples based on colorimetric data sets: absorption value, reflectance value, luminosity value (L), degree of redness (a) and degree of yellowness (b), obtained with the video spectral comparator (VSC) technique. A total of 42 automobile paint samples from 7 manufacturers were analysed. The colorimetric datasets obtained from VSC analysis were subjected to chemometric techniques, namely cluster analysis (CA) and principal component analysis (PCA). CA generated 5 clusters: cluster 1 consisted of silver, cluster 2 of white, cluster 3 of blue and black, cluster 4 of red and cluster 5 of light blue paints. PCA yielded two latent factors explaining 95.58% of the total variance, which grouped the 42 automobile paints into five groups. Chemometric analysis of colorimetric datasets provides a meaningful classification of automobile paints based on their tone colour (L, a, b) and light intensity. These approaches have the potential to ease the interpretation of complex spectral data involving a large number of comparisons. (author)
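The share of total variance "explained" by principal components is each covariance eigenvalue divided by the sum of all eigenvalues, which equals the trace of the covariance matrix. A dependency-free sketch using power iteration with deflation; the three-feature toy data are invented, and a real analysis would normally use a linear algebra library instead.

```python
import math
from typing import List, Sequence, Tuple


def covariance(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance matrix (divides by n - 1)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    C = [[0.0] * d for _ in range(d)]
    for row in X:
        c = [row[j] - means[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                C[i][j] += c[i] * c[j] / (n - 1)
    return C


def top_eigen(C: List[List[float]], iters: int = 500) -> Tuple[float, List[float]]:
    """Largest eigenpair of a symmetric PSD matrix by power iteration.

    Assumes the uniform start vector is not orthogonal to the leading eigenvector.
    """
    d = len(C)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-15:  # matrix (numerically) zero: no variance left
            return 0.0, v
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))  # Rayleigh quotient
    return eigval, v


def explained_variance_ratio(X: Sequence[Sequence[float]], k: int) -> List[float]:
    C = covariance(X)
    d = len(C)
    total = sum(C[i][i] for i in range(d))  # trace = sum of all eigenvalues
    ratios = []
    for _ in range(k):
        lam, v = top_eigen(C)
        ratios.append(lam / total)
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]  # deflation
    return ratios


X = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
print([round(r, 3) for r in explained_variance_ratio(X, 2)])  # [0.8, 0.2]
```

A statement such as "two latent factors explaining 95.58% of the total variance" corresponds to the sum of the first two ratios.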
A new approach to enhance the performance of decision tree for classifying gene expression data.

Science.gov (United States)

Hassan, Md; Kotagiri, Ramamohanarao

2013-12-20

Gene expression data classification is a challenging task due to the large dimensionality and very small number of samples. The decision tree is one of the popular machine learning approaches to such classification problems. However, existing decision tree algorithms use a single gene feature at each node to split the data into its child nodes and hence may perform poorly, especially when classifying gene expression datasets. We enhance the classification performance of traditional decision tree classifiers with a new decision tree algorithm in which each node of the tree consists of more than one gene. Our method selects suitable genes that are combined using a linear function to form a derived composite feature. To determine the structure of the tree, we use the area under the Receiver Operating Characteristic curve (AUC). Experimental analysis demonstrates higher classification accuracy for the new decision tree than for other existing decision trees in the literature. We experimentally compare the effect of our scheme against other well-known decision tree techniques. Experiments show that our algorithm can substantially boost the classification performance of the decision tree.
Deep learning in jet reconstruction at CMS

CERN Document Server

Stoye, Markus

2017-01-01

Deep learning has led to several breakthroughs outside the field of high energy physics, yet it has not so far been used in jet reconstruction for the CMS experiment at the CERN LHC. This report shows results of applying deep learning strategies to jet reconstruction at the stage of identifying the original parton association of the jet (jet tagging), which is crucial for physics analyses at the LHC experiments. We introduce a custom deep neural network architecture for jet tagging. We compare the performance of this novel method with other established approaches at CMS and show that the proposed strategy provides a significant improvement. The strategy provides the first multi-class classifier, in place of the few binary classifiers used previously, and thus yields more information in a more convenient form. The performance results obtained with simulation imply a significant improvement for a large number of important physics analyses at the CMS experiment.
Use of Lot Quality Assurance Sampling to Ascertain Levels of Drug Resistant Tuberculosis in Western Kenya.

Directory of Open Access Journals (Sweden)

Julia Jezmir

Full Text Available To classify the prevalence of multi-drug resistant tuberculosis (MDR-TB) in two different geographic settings in western Kenya using the Lot Quality Assurance Sampling (LQAS) methodology. The prevalence of drug resistance was classified among treatment-naïve smear positive TB patients in two settings, one rural and one urban. These regions were classified as having high or low prevalence of MDR-TB according to a static, two-way LQAS sampling plan selected to classify high resistance regions at greater than 5% resistance and low resistance regions at less than 1% resistance. This study classified both the urban and rural settings as having low levels of TB drug resistance. Out of the 105 patients screened in each setting, two patients were diagnosed with MDR-TB in the urban setting and one patient was diagnosed with MDR-TB in the rural setting. An additional 27 patients were diagnosed with a variety of mono- and poly- resistant strains. Further drug resistance surveillance using LQAS may help identify the levels and geographical distribution of drug resistance in Kenya and may have applications in other countries in the African Region facing similar resource constraints.
Use of Lot Quality Assurance Sampling to Ascertain Levels of Drug Resistant Tuberculosis in Western Kenya.

Science.gov (United States)

Jezmir, Julia; Cohen, Ted; Zignol, Matteo; Nyakan, Edwin; Hedt-Gauthier, Bethany L; Gardner, Adrian; Kamle, Lydia; Injera, Wilfred; Carter, E Jane

2016-01-01

To classify the prevalence of multi-drug resistant tuberculosis (MDR-TB) in two different geographic settings in western Kenya using the Lot Quality Assurance Sampling (LQAS) methodology. The prevalence of drug resistance was classified among treatment-naïve smear positive TB patients in two settings, one rural and one urban. These regions were classified as having high or low prevalence of MDR-TB according to a static, two-way LQAS sampling plan selected to classify high resistance regions at greater than 5% resistance and low resistance regions at less than 1% resistance. This study classified both the urban and rural settings as having low levels of TB drug resistance. Out of the 105 patients screened in each setting, two patients were diagnosed with MDR-TB in the urban setting and one patient was diagnosed with MDR-TB in the rural setting. An additional 27 patients were diagnosed with a variety of mono- and poly- resistant strains. Further drug resistance surveillance using LQAS may help identify the levels and geographical distribution of drug resistance in Kenya and may have applications in other countries in the African Region facing similar resource constraints.
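In a two-way LQAS plan, a setting is classified as high-prevalence when the number of resistant cases among the n sampled patients exceeds a decision rule d; the binomial distribution then gives the misclassification risks at the upper (5%) and lower (1%) thresholds. A sketch using the abstract's n = 105 and thresholds; the decision rules tried below are hypothetical, since the abstract does not state the rule the study used, and naming conventions for the two risks vary between LQAS texts.

```python
from math import comb
from typing import Tuple


def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def lqas_risks(n: int, d: int, p_high: float, p_low: float) -> Tuple[float, float]:
    """Risks for the rule 'classify as high if more than d positives among n'.

    alpha: probability a setting truly at p_high is classified low (X <= d).
    beta: probability a setting truly at p_low is classified high (X > d).
    """
    alpha = sum(binom_pmf(k, n, p_high) for k in range(d + 1))
    beta = 1.0 - sum(binom_pmf(k, n, p_low) for k in range(d + 1))
    return alpha, beta


n = 105  # patients screened per setting, as reported in the abstract
for d in (1, 2, 3):  # hypothetical decision rules
    alpha, beta = lqas_risks(n, d, p_high=0.05, p_low=0.01)
    print(f"d={d}: alpha={alpha:.3f}, beta={beta:.3f}")
```

Raising d lowers the risk of flagging a truly low-resistance setting as high but raises the risk of missing a truly high one, so the rule is chosen to balance the two.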
Analysis and minimization of overtraining effect in rule-based classifiers for computer-aided diagnosis

International Nuclear Information System (INIS)

Li Qiang; Doi Kunio

2006-01-01

Computer-aided diagnostic (CAD) schemes have been developed to assist radiologists in detecting various lesions in medical images. In CAD schemes, classifiers play a key role in achieving a high lesion detection rate and a low false-positive rate. Although many popular classifiers, such as linear discriminant analysis and artificial neural networks, have been employed in CAD schemes to reduce false positives, the rule-based classifier has probably been the simplest and most frequently used one since the early days of CAD development. However, existing rule-based classifiers have major disadvantages that significantly reduce their practicality and credibility: manual design, poor reproducibility, poor evaluation methods such as resubstitution, and a large overtraining effect. An automated rule-based classifier with a minimized overtraining effect can overcome, or significantly reduce, these disadvantages. In this study, we developed an 'optimal' method for selecting cutoff thresholds and a fully automated rule-based classifier. Experiments with Monte Carlo simulation and a real lung nodule CT data set demonstrated that the automated threshold selection method can completely eliminate the overtraining effect in the cutoff threshold selection procedure, and thus can minimize the overall overtraining effect in the constructed rule-based classifier. We believe that this threshold selection method is very useful for constructing automated rule-based classifiers with a minimized overtraining effect.
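Why resubstitution hides overtraining in a rule-based classifier can be illustrated with a toy example. This is a minimal sketch, not Li and Doi's 'optimal' threshold-selection method: the min/max cutoff rule, the synthetic Gaussian candidate features, and all function names are assumptions made for illustration.

```python
import random

# A sample is (feature vector, is_true_lesion). All data here is synthetic.
Sample = tuple[list[float], bool]


def fit_cutoffs(train: list[Sample]) -> list[tuple[float, float]]:
    """Hypothetical cutoff rule: per feature, the [min, max] range of the
    training lesions, so no training lesion is ever rejected."""
    lesions = [x for x, is_lesion in train if is_lesion]
    if not lesions:
        raise ValueError("training set needs at least one lesion")
    return [(min(col), max(col)) for col in zip(*lesions)]


def keep(cutoffs: list[tuple[float, float]], x: list[float]) -> bool:
    """Rule-based decision: keep a candidate only if every feature passes its cutoffs."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(x, cutoffs))


def evaluate(cutoffs, data: list[Sample]) -> tuple[float, float]:
    """Return (sensitivity, fraction of non-lesions kept) on `data`."""
    lesions = [x for x, y in data if y]
    normals = [x for x, y in data if not y]
    sens = sum(keep(cutoffs, x) for x in lesions) / len(lesions)
    fp_fraction = sum(keep(cutoffs, x) for x in normals) / len(normals)
    return sens, fp_fraction


def make_candidates(rng: random.Random, n_lesions: int, n_normals: int,
                    n_feat: int = 3) -> list[Sample]:
    """Toy candidates: lesions and normal tissue drawn from different Gaussians."""
    lesions = [([rng.gauss(0.0, 1.0) for _ in range(n_feat)], True) for _ in range(n_lesions)]
    normals = [([rng.gauss(1.5, 1.5) for _ in range(n_feat)], False) for _ in range(n_normals)]
    return lesions + normals


if __name__ == "__main__":
    rng = random.Random(42)
    train = make_candidates(rng, 30, 300)
    test = make_candidates(rng, 300, 300)
    cutoffs = fit_cutoffs(train)
    print("resubstitution (sensitivity, FP fraction):", evaluate(cutoffs, train))
    print("held-out       (sensitivity, FP fraction):", evaluate(cutoffs, test))
```

Because these cutoffs are chosen to enclose every training lesion, resubstitution reports perfect sensitivity by construction; only evaluation on independent cases shows how much of that performance is overtraining.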
A three-parameter model for classifying anurans into four genera based on advertisement calls.

Science.gov (United States)

Gingras, Bruno; Fitch, William Tecumseh

2013-01-01

The vocalizations of anurans are innate in structure and may therefore contain indicators of phylogenetic history. Thus, advertisement calls of species that are more closely related phylogenetically are predicted to be more similar than those of distant species. This hypothesis was evaluated by comparing several widely used machine-learning algorithms. Recordings of advertisement calls from 142 species belonging to four genera were analyzed. A logistic regression model, using mean values for dominant frequency, coefficient of variation of root-mean-square energy, and spectral flux, correctly classified advertisement calls by genus with an accuracy above 70%. Similar accuracy rates were obtained using these parameters with a support vector machine model, a K-nearest neighbor algorithm, and a multivariate Gaussian distribution classifier, whereas a Gaussian mixture model performed slightly worse. In contrast, models based on mel-frequency cepstral coefficients did not fare as well. Comparable accuracy levels were obtained on out-of-sample recordings from 52 of the 142 original species. The results suggest that a combination of low-level acoustic attributes is sufficient to discriminate efficiently between the vocalizations of these four genera, thus supporting the initial premise and validating the use of high-throughput algorithms on animal vocalizations to evaluate phylogenetic hypotheses.
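A multinomial logistic regression over the three acoustic features named in the abstract can be sketched in plain Python. This is an illustration only: the per-genus feature values are invented toy numbers, and the z-scoring, learning rate, and epoch count are assumptions, since the abstract does not describe how the study's model was fitted.

```python
import math
import random

# Hypothetical per-genus feature means: (dominant frequency in Hz,
# CV of RMS energy, spectral flux). Toy values, not data from the study.
GENUS_MEANS = [(1500, 0.2, 0.3), (3000, 0.2, 0.7), (1500, 0.6, 0.7), (3000, 0.6, 0.3)]
GENUS_SDS = (200, 0.05, 0.05)


def make_calls(rng, per_genus):
    """Draw synthetic feature vectors for each of the four genera."""
    X, y = [], []
    for label, means in enumerate(GENUS_MEANS):
        for _ in range(per_genus):
            X.append([rng.gauss(m, s) for m, s in zip(means, GENUS_SDS)])
            y.append(label)
    return X, y


def fit_scaler(X):
    """Z-score each feature so Hz and unitless ratios share one scale
    (a zero standard deviation falls back to 1.0)."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return lambda row: [(v - m) / s for v, m, s in zip(row, means, sds)]


def softmax(z):
    top = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]


def scores(W, x):
    xb = x + [1.0]  # append a constant input for the bias weight
    return [sum(w * v for w, v in zip(row, xb)) for row in W]


def fit(X, y, n_classes, lr=0.5, epochs=100):
    """Multinomial logistic regression by full-batch gradient descent on cross-entropy."""
    n, d = len(X), len(X[0]) + 1
    W = [[0.0] * d for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * d for _ in range(n_classes)]
        for x, label in zip(X, y):
            p = softmax(scores(W, x))
            xb = x + [1.0]
            for k in range(n_classes):
                err = p[k] - (1.0 if k == label else 0.0)  # dLoss/dscore_k
                for j in range(d):
                    grad[k][j] += err * xb[j]
        for k in range(n_classes):
            for j in range(d):
                W[k][j] -= lr * grad[k][j] / n
    return W


def predict(W, x):
    s = scores(W, x)
    return max(range(len(s)), key=s.__getitem__)


if __name__ == "__main__":
    X, y = make_calls(random.Random(0), 40)
    scale = fit_scaler(X)
    W = fit([scale(x) for x in X], y, n_classes=4)
    Xt, yt = make_calls(random.Random(1), 20)  # held-out toy "recordings"
    acc = sum(predict(W, scale(x)) == t for x, t in zip(Xt, yt)) / len(yt)
    print(f"held-out accuracy: {acc:.2f}")
```

The scaler is fitted on training data only and reused on the held-out set, mirroring the out-of-sample check the abstract describes; real recordings would of course be far less separable than these toy clusters.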
Statistical text classifier to detect specific type of medical incidents.

Science.gov (United States)

Wong, Zoie Shui-Yee; Akiyama, Masanori

2013-01-01

WHO Patient Safety has focused on increasing the coherence and expressiveness of patient safety classification with the establishment of the International Classification for Patient Safety (ICPS). Text classification and statistical approaches have been shown to be successful in identifying safety problems in the aviation industry using incident text information. It has been challenging to comprehend the taxonomy of medical incidents in a structured manner. Independent reporting mechanisms for patient safety incidents have been established in the UK, Canada, Australia, Japan, Hong Kong, and elsewhere. This research demonstrates the potential to construct statistical text classifiers that detect specific types of medical incidents using incident text data. This poster presentation shows an illustrative example of classifying look-alike sound-alike (LASA) medication incidents using structured text from 227 advisories related to medication errors from Global Patient Safety Alerts (GPSA). The classifier was built using a logistic regression model. The ROC curve and the AUC value indicated that the model performs satisfactorily.
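The abstract judges its logistic regression model by the ROC curve and AUC. As a hedged illustration of what that number measures, the AUC equals the probability that a randomly chosen positive report (here, a LASA incident) receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. The function name and the scores in the usage line are hypothetical, not data from the study.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """AUC as the Mann-Whitney probability P(score_pos > score_neg); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:          # compare every positive with every negative
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy usage: 5 of the 6 positive/negative pairs are ranked correctly.
print(roc_auc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 1, 0, 1, 0]))  # 5/6, about 0.833
```

The pairwise loop is quadratic, which is fine for a few hundred advisories; for larger data sets the same value can be obtained from a rank sort in O(n log n).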
A Naive-Bayes classifier for damage detection in engineering materials

Energy Technology Data Exchange (ETDEWEB)

Addin, O. [Laboratory of Intelligent Systems, Institute of Advanced Technology, Universiti Putra Malaysia, 43400 Serdang, Selangor (Malaysia); Sapuan, S.M. [Department of Mechanical and Manufacturing Engineering, Universiti Putra Malaysia, 43400 Serdang, Selangor (Malaysia)]. E-mail: sapuan@eng.upm.edu.my; Mahdi, E. [Department of Aerospace Engineering, Universiti Putra Malaysia, 43400 Serdang, Selangor (Malaysia); Othman, M. [Department of Communication Technology and Networks, Universiti Putra Malaysia, 43400 Serdang, Selangor (Malaysia)

2007-07-01

This paper is intended to introduce the Bayesian network in general, and the Naive-Bayes classifier in particular, as one of the most successful classification systems for simulating damage detection in engineering materials. A method for feature subset selection is also introduced. The method is based on the mean and maximum values of the wave amplitudes after dividing the waves into folds and then grouping them with a clustering algorithm (e.g. the k-means algorithm). The Naive-Bayes classifier and the feature subset selection method were analyzed and tested on two data sets. The data sets were obtained from artificial damage created in quasi-isotropic laminated composites of the AS4/3501-6 graphite/epoxy system and in a type 6204 ball bearing with a steel cage. The Naive-Bayes classifier and the proposed feature subset selection algorithm were shown to be efficient techniques for damage detection in engineering materials.
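The two ingredients named in the abstract, fold-wise mean/maximum amplitude features and a Naive-Bayes classifier, can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes an independent Gaussian likelihood per feature, omits the clustering (e.g. k-means) step used for subset selection, and replaces the composite and bearing measurements with hypothetical synthetic waveforms.

```python
import math
import random
from collections import defaultdict


def fold_features(signal, n_folds):
    """Split |amplitude| into n_folds equal contiguous folds (trailing samples
    dropped) and return [mean, max] for each fold."""
    amps = [abs(v) for v in signal]
    size = len(amps) // n_folds
    if size == 0:
        raise ValueError("signal shorter than n_folds")
    feats = []
    for i in range(n_folds):
        fold = amps[i * size:(i + 1) * size]
        feats += [sum(fold) / size, max(fold)]
    return feats


class GaussianNaiveBayes:
    """Naive Bayes with an (assumed) independent Gaussian likelihood per feature."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.stats = {}
        for label, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # a small constant keeps every variance strictly positive
            variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                         for c, m in zip(cols, means)]
            self.stats[label] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def log_posterior(self, x, label):
        """Unnormalized log P(label | x): log prior + sum of per-feature log likelihoods."""
        log_prior, means, variances = self.stats[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))

    def predict(self, x):
        return max(self.stats, key=lambda label: self.log_posterior(x, label))


def synthetic_signal(rng, damaged, n=200):
    """Toy waveform (hypothetical): damage adds a high-amplitude burst at t = 120..139."""
    sig = [math.sin(0.3 * t) + rng.gauss(0.0, 0.1) for t in range(n)]
    if damaged:
        for t in range(120, 140):
            sig[t] += 3.0 * math.sin(1.1 * t)
    return sig


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [False, True] * 20
    X = [fold_features(synthetic_signal(rng, d), 5) for d in labels]
    model = GaussianNaiveBayes().fit(X, labels)
    test_labels = [False, True] * 10
    hits = sum(model.predict(fold_features(synthetic_signal(rng, d), 5)) == d
               for d in test_labels)
    print(f"held-out accuracy: {hits / len(test_labels):.2f}")
```

Only the fold containing the burst separates the two classes here; a subset-selection step such as the one the abstract describes would aim to keep features like that one and discard the rest.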
Discovering mammography-based machine learning classifiers for breast cancer diagnosis.

Science.gov (United States)

Ramos-Pollán, Raúl; Guevara-López, Miguel Angel; Suárez-Ortega, Cesar; Díaz-Herrero, Guillermo; Franco-Valiente, Jose Miguel; Rubio-Del-Solar, Manuel; González-de-Posada, Naimy; Vaz, Mario Augusto Pires; Loureiro, Joana; Ramos, Isabel

2012-08-01

This work explores the design of mammography-based machine learning classifiers (MLC) and proposes a new method to build MLC for breast cancer diagnosis. We evaluated a large number of MLC configurations for classifying feature vectors extracted from segmented regions (pathological lesion or normal tissue) on craniocaudal (CC) and/or mediolateral oblique (MLO) mammography image views, providing a BI-RADS diagnosis. Beforehand, appropriate combinations of image processing and normalization techniques were applied to reduce image artifacts and enhance mammogram detail. The method can be used under different data acquisition circumstances and exploits computer clusters to select well-performing MLC configurations. We evaluated 286 cases extracted from the repository owned by HSJ-FMUP, where specialized radiologists segmented regions on CC and/or MLO images (biopsies provided the gold standard). Around 20,000 MLC configurations were evaluated, yielding classifiers that achieved an area under the ROC curve of 0.996 when combining feature vectors extracted from the CC and MLO views of the same case.
