WorldWideScience

Sample records for supervised classification algorithms

  1. QUEST: Eliminating Online Supervised Learning for Efficient Classification Algorithms.

    Science.gov (United States)

    Zwartjes, Ardjan; Havinga, Paul J M; Smit, Gerard J M; Hurink, Johann L

    2016-10-01

    In this work, we introduce QUEST (QUantile Estimation after Supervised Training), an adaptive classification algorithm for Wireless Sensor Networks (WSNs) that eliminates the necessity for online supervised learning. Online processing is important for many sensor network applications. Transmitting raw sensor data puts high demands on the battery, reducing network life time. By merely transmitting partial results or classifications based on the sampled data, the amount of traffic on the network can be significantly reduced. Such classifications can be made by learning based algorithms using sampled data. An important issue, however, is the training phase of these learning based algorithms. Training a deployed sensor network requires a lot of communication and an impractical amount of human involvement. QUEST is a hybrid algorithm that combines supervised learning in a controlled environment with unsupervised learning on the location of deployment. Using the SITEX02 dataset, we demonstrate that the presented solution works with a performance penalty of less than 10% in 90% of the tests. Under some circumstances, it even outperforms a network of classifiers completely trained with supervised learning. As a result, the need for on-site supervised learning and communication for training is completely eliminated by our solution.

  2. QUEST: Eliminating Online Supervised Learning for Efficient Classification Algorithms

    Directory of Open Access Journals (Sweden)

    Ardjan Zwartjes

    2016-10-01

    Full Text Available In this work, we introduce QUEST (QUantile Estimation after Supervised Training, an adaptive classification algorithm for Wireless Sensor Networks (WSNs that eliminates the necessity for online supervised learning. Online processing is important for many sensor network applications. Transmitting raw sensor data puts high demands on the battery, reducing network life time. By merely transmitting partial results or classifications based on the sampled data, the amount of traffic on the network can be significantly reduced. Such classifications can be made by learning based algorithms using sampled data. An important issue, however, is the training phase of these learning based algorithms. Training a deployed sensor network requires a lot of communication and an impractical amount of human involvement. QUEST is a hybrid algorithm that combines supervised learning in a controlled environment with unsupervised learning on the location of deployment. Using the SITEX02 dataset, we demonstrate that the presented solution works with a performance penalty of less than 10% in 90% of the tests. Under some circumstances, it even outperforms a network of classifiers completely trained with supervised learning. As a result, the need for on-site supervised learning and communication for training is completely eliminated by our solution.

  3. A semi-supervised classification algorithm using the TAD-derived background as training data

    Science.gov (United States)

    Fan, Lei; Ambeau, Brittany; Messinger, David W.

    2013-05-01

    In general, spectral image classification algorithms fall into one of two categories: supervised and unsupervised. In unsupervised approaches, the algorithm automatically identifies clusters in the data without a priori information about those clusters (except perhaps the expected number of them). Supervised approaches require an analyst to identify training data to learn the characteristics of the clusters such that they can then classify all other pixels into one of the pre-defined groups. The classification algorithm presented here is a semi-supervised approach based on the Topological Anomaly Detection (TAD) algorithm. The TAD algorithm defines background components based on a mutual k-Nearest Neighbor graph model of the data, along with a spectral connected components analysis. Here, the largest components produced by TAD are used as regions of interest (ROI's),or training data for a supervised classification scheme. By combining those ROI's with a Gaussian Maximum Likelihood (GML) or a Minimum Distance to the Mean (MDM) algorithm, we are able to achieve a semi supervised classification method. We test this classification algorithm against data collected by the HyMAP sensor over the Cooke City, MT area and University of Pavia scene.

  4. QUEST : Eliminating online supervised learning for efficient classification algorithms

    NARCIS (Netherlands)

    Zwartjes, Ardjan; Havinga, Paul J.M.; Smit, Gerard J.M.; Hurink, Johann L.

    2016-01-01

    In this work, we introduce QUEST (QUantile Estimation after Supervised Training), an adaptive classification algorithm for Wireless Sensor Networks (WSNs) that eliminates the necessity for online supervised learning. Online processing is important for many sensor network applications. Transmitting

  5. Robust Semi-Supervised Manifold Learning Algorithm for Classification

    Directory of Open Access Journals (Sweden)

    Mingxia Chen

    2018-01-01

    Full Text Available In the recent years, manifold learning methods have been widely used in data classification to tackle the curse of dimensionality problem, since they can discover the potential intrinsic low-dimensional structures of the high-dimensional data. Given partially labeled data, the semi-supervised manifold learning algorithms are proposed to predict the labels of the unlabeled points, taking into account label information. However, these semi-supervised manifold learning algorithms are not robust against noisy points, especially when the labeled data contain noise. In this paper, we propose a framework for robust semi-supervised manifold learning (RSSML to address this problem. The noisy levels of the labeled points are firstly predicted, and then a regularization term is constructed to reduce the impact of labeled points containing noise. A new robust semi-supervised optimization model is proposed by adding the regularization term to the traditional semi-supervised optimization model. Numerical experiments are given to show the improvement and efficiency of RSSML on noisy data sets.

  6. Weakly supervised classification in high energy physics

    Energy Technology Data Exchange (ETDEWEB)

    Dery, Lucio Mwinmaarong [Physics Department, Stanford University,Stanford, CA, 94305 (United States); Nachman, Benjamin [Physics Division, Lawrence Berkeley National Laboratory,1 Cyclotron Rd, Berkeley, CA, 94720 (United States); Rubbo, Francesco; Schwartzman, Ariel [SLAC National Accelerator Laboratory, Stanford University,2575 Sand Hill Rd, Menlo Park, CA, 94025 (United States)

    2017-05-29

    As machine learning algorithms become increasingly sophisticated to exploit subtle features of the data, they often become more dependent on simulations. This paper presents a new approach called weakly supervised classification in which class proportions are the only input into the machine learning algorithm. Using one of the most challenging binary classification tasks in high energy physics — quark versus gluon tagging — we show that weakly supervised classification can match the performance of fully supervised algorithms. Furthermore, by design, the new algorithm is insensitive to any mis-modeling of discriminating features in the data by the simulation. Weakly supervised classification is a general procedure that can be applied to a wide variety of learning problems to boost performance and robustness when detailed simulations are not reliable or not available.

  7. Weakly supervised classification in high energy physics

    International Nuclear Information System (INIS)

    Dery, Lucio Mwinmaarong; Nachman, Benjamin; Rubbo, Francesco; Schwartzman, Ariel

    2017-01-01

    As machine learning algorithms become increasingly sophisticated to exploit subtle features of the data, they often become more dependent on simulations. This paper presents a new approach called weakly supervised classification in which class proportions are the only input into the machine learning algorithm. Using one of the most challenging binary classification tasks in high energy physics — quark versus gluon tagging — we show that weakly supervised classification can match the performance of fully supervised algorithms. Furthermore, by design, the new algorithm is insensitive to any mis-modeling of discriminating features in the data by the simulation. Weakly supervised classification is a general procedure that can be applied to a wide variety of learning problems to boost performance and robustness when detailed simulations are not reliable or not available.

  8. A New Method for Solving Supervised Data Classification Problems

    Directory of Open Access Journals (Sweden)

    Parvaneh Shabanzadeh

    2014-01-01

    Full Text Available Supervised data classification is one of the techniques used to extract nontrivial information from data. Classification is a widely used technique in various fields, including data mining, industry, medicine, science, and law. This paper considers a new algorithm for supervised data classification problems associated with the cluster analysis. The mathematical formulations for this algorithm are based on nonsmooth, nonconvex optimization. A new algorithm for solving this optimization problem is utilized. The new algorithm uses a derivative-free technique, with robustness and efficiency. To improve classification performance and efficiency in generating classification model, a new feature selection algorithm based on techniques of convex programming is suggested. Proposed methods are tested on real-world datasets. Results of numerical experiments have been presented which demonstrate the effectiveness of the proposed algorithms.

  9. Supervised Learning for Visual Pattern Classification

    Science.gov (United States)

    Zheng, Nanning; Xue, Jianru

    This chapter presents an overview of the topics and major ideas of supervised learning for visual pattern classification. Two prevalent algorithms, i.e., the support vector machine (SVM) and the boosting algorithm, are briefly introduced. SVMs and boosting algorithms are two hot topics of recent research in supervised learning. SVMs improve the generalization of the learning machine by implementing the rule of structural risk minimization (SRM). It exhibits good generalization even when little training data are available for machine training. The boosting algorithm can boost a weak classifier to a strong classifier by means of the so-called classifier combination. This algorithm provides a general way for producing a classifier with high generalization capability from a great number of weak classifiers.

  10. Coupled dimensionality reduction and classification for supervised and semi-supervised multilabel learning.

    Science.gov (United States)

    Gönen, Mehmet

    2014-03-01

    Coupled training of dimensionality reduction and classification is proposed previously to improve the prediction performance for single-label problems. Following this line of research, in this paper, we first introduce a novel Bayesian method that combines linear dimensionality reduction with linear binary classification for supervised multilabel learning and present a deterministic variational approximation algorithm to learn the proposed probabilistic model. We then extend the proposed method to find intrinsic dimensionality of the projected subspace using automatic relevance determination and to handle semi-supervised learning using a low-density assumption. We perform supervised learning experiments on four benchmark multilabel learning data sets by comparing our method with baseline linear dimensionality reduction algorithms. These experiments show that the proposed approach achieves good performance values in terms of hamming loss, average AUC, macro F 1 , and micro F 1 on held-out test data. The low-dimensional embeddings obtained by our method are also very useful for exploratory data analysis. We also show the effectiveness of our approach in finding intrinsic subspace dimensionality and semi-supervised learning tasks.

  11. A Supervised Classification Algorithm for Note Onset Detection

    Directory of Open Access Journals (Sweden)

    Douglas Eck

    2007-01-01

    Full Text Available This paper presents a novel approach to detecting onsets in music audio files. We use a supervised learning algorithm to classify spectrogram frames extracted from digital audio as being onsets or nononsets. Frames classified as onsets are then treated with a simple peak-picking algorithm based on a moving average. We present two versions of this approach. The first version uses a single neural network classifier. The second version combines the predictions of several networks trained using different hyperparameters. We describe the details of the algorithm and summarize the performance of both variants on several datasets. We also examine our choice of hyperparameters by describing results of cross-validation experiments done on a custom dataset. We conclude that a supervised learning approach to note onset detection performs well and warrants further investigation.

  12. A Novel Classification Algorithm Based on Incremental Semi-Supervised Support Vector Machine.

    Directory of Open Access Journals (Sweden)

    Fei Gao

    Full Text Available For current computational intelligence techniques, a major challenge is how to learn new concepts in changing environment. Traditional learning schemes could not adequately address this problem due to a lack of dynamic data selection mechanism. In this paper, inspired by human learning process, a novel classification algorithm based on incremental semi-supervised support vector machine (SVM is proposed. Through the analysis of prediction confidence of samples and data distribution in a changing environment, a "soft-start" approach, a data selection mechanism and a data cleaning mechanism are designed, which complete the construction of our incremental semi-supervised learning system. Noticeably, with the ingenious design procedure of our proposed algorithm, the computation complexity is reduced effectively. In addition, for the possible appearance of some new labeled samples in the learning process, a detailed analysis is also carried out. The results show that our algorithm does not rely on the model of sample distribution, has an extremely low rate of introducing wrong semi-labeled samples and can effectively make use of the unlabeled samples to enrich the knowledge system of classifier and improve the accuracy rate. Moreover, our method also has outstanding generalization performance and the ability to overcome the concept drift in a changing environment.

  13. Assessment of various supervised learning algorithms using different performance metrics

    Science.gov (United States)

    Susheel Kumar, S. M.; Laxkar, Deepak; Adhikari, Sourav; Vijayarajan, V.

    2017-11-01

    Our work brings out comparison based on the performance of supervised machine learning algorithms on a binary classification task. The supervised machine learning algorithms which are taken into consideration in the following work are namely Support Vector Machine(SVM), Decision Tree(DT), K Nearest Neighbour (KNN), Naïve Bayes(NB) and Random Forest(RF). This paper mostly focuses on comparing the performance of above mentioned algorithms on one binary classification task by analysing the Metrics such as Accuracy, F-Measure, G-Measure, Precision, Misclassification Rate, False Positive Rate, True Positive Rate, Specificity, Prevalence.

  14. Out-of-Sample Generalizations for Supervised Manifold Learning for Classification.

    Science.gov (United States)

    Vural, Elif; Guillemot, Christine

    2016-03-01

    Supervised manifold learning methods for data classification map high-dimensional data samples to a lower dimensional domain in a structure-preserving way while increasing the separation between different classes. Most manifold learning methods compute the embedding only of the initially available data; however, the generalization of the embedding to novel points, i.e., the out-of-sample extension problem, becomes especially important in classification applications. In this paper, we propose a semi-supervised method for building an interpolation function that provides an out-of-sample extension for general supervised manifold learning algorithms studied in the context of classification. The proposed algorithm computes a radial basis function interpolator that minimizes an objective function consisting of the total embedding error of unlabeled test samples, defined as their distance to the embeddings of the manifolds of their own class, as well as a regularization term that controls the smoothness of the interpolation function in a direction-dependent way. The class labels of test data and the interpolation function parameters are estimated jointly with an iterative process. Experimental results on face and object images demonstrate the potential of the proposed out-of-sample extension algorithm for the classification of manifold-modeled data sets.

  15. A supervised learning rule for classification of spatiotemporal spike patterns.

    Science.gov (United States)

    Lilin Guo; Zhenzhong Wang; Adjouadi, Malek

    2016-08-01

    This study introduces a novel supervised algorithm for spiking neurons that take into consideration synapse delays and axonal delays associated with weights. It can be utilized for both classification and association and uses several biologically influenced properties, such as axonal and synaptic delays. This algorithm also takes into consideration spike-timing-dependent plasticity as in Remote Supervised Method (ReSuMe). This paper focuses on the classification aspect alone. Spiked neurons trained according to this proposed learning rule are capable of classifying different categories by the associated sequences of precisely timed spikes. Simulation results have shown that the proposed learning method greatly improves classification accuracy when compared to the Spike Pattern Association Neuron (SPAN) and the Tempotron learning rule.

  16. Automatic Classification Using Supervised Learning in a Medical Document Filtering Application.

    Science.gov (United States)

    Mostafa, J.; Lam, W.

    2000-01-01

    Presents a multilevel model of the information filtering process that permits document classification. Evaluates a document classification approach based on a supervised learning algorithm, measures the accuracy of the algorithm in a neural network that was trained to classify medical documents on cell biology, and discusses filtering…

  17. A supervised contextual classifier based on a region-growth algorithm

    DEFF Research Database (Denmark)

    Lira, Jorge; Maletti, Gabriela Mariel

    2002-01-01

    A supervised classification scheme to segment optical multi-spectral images has been developed. In this classifier, an automated region-growth algorithm delineates the training sets. This algorithm handles three parameters: an initial pixel seed, a window size and a threshold for each class. A su...

  18. Musical Instrument Classification Based on Nonlinear Recurrence Analysis and Supervised Learning

    Directory of Open Access Journals (Sweden)

    R.Rui

    2013-04-01

    Full Text Available In this paper, the phase space reconstruction of time series produced by different instruments is discussed based on the nonlinear dynamic theory. The dense ratio, a novel quantitative recurrence parameter, is proposed to describe the difference of wind instruments, stringed instruments and keyboard instruments in the phase space by analyzing the recursive property of every instrument. Furthermore, a novel supervised learning algorithm for automatic classification of individual musical instrument signals is addressed deriving from the idea of supervised non-negative matrix factorization (NMF algorithm. In our approach, the orthogonal basis matrix could be obtained without updating the matrix iteratively, which NMF is unable to do. The experimental results indicate that the accuracy of the proposed method is improved by 3% comparing with the conventional features in the individual instrument classification.

  19. GMDH-Based Semi-Supervised Feature Selection for Electricity Load Classification Forecasting

    Directory of Open Access Journals (Sweden)

    Lintao Yang

    2018-01-01

    Full Text Available With the development of smart power grids, communication network technology and sensor technology, there has been an exponential growth in complex electricity load data. Irregular electricity load fluctuations caused by the weather and holiday factors disrupt the daily operation of the power companies. To deal with these challenges, this paper investigates a day-ahead electricity peak load interval forecasting problem. It transforms the conventional continuous forecasting problem into a novel interval forecasting problem, and then further converts the interval forecasting problem into the classification forecasting problem. In addition, an indicator system influencing the electricity load is established from three dimensions, namely the load series, calendar data, and weather data. A semi-supervised feature selection algorithm is proposed to address an electricity load classification forecasting issue based on the group method of data handling (GMDH technology. The proposed algorithm consists of three main stages: (1 training the basic classifier; (2 selectively marking the most suitable samples from the unclassified label data, and adding them to an initial training set; and (3 training the classification models on the final training set and classifying the test samples. An empirical analysis of electricity load dataset from four Chinese cities is conducted. Results show that the proposed model can address the electricity load classification forecasting problem more efficiently and effectively than the FW-Semi FS (forward semi-supervised feature selection and GMDH-U (GMDH-based semi-supervised feature selection for customer classification models.

  20. Benchmarking protein classification algorithms via supervised cross-validation

    NARCIS (Netherlands)

    Kertész-Farkas, A.; Dhir, S.; Sonego, P.; Pacurar, M.; Netoteia, S.; Nijveen, H.; Kuzniar, A.; Leunissen, J.A.M.; Kocsor, A.; Pongor, S.

    2008-01-01

    Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold,

  1. Classification of perovskites with supervised self-organizing maps

    International Nuclear Information System (INIS)

    Kuzmanovski, Igor; Dimitrovska-Lazova, Sandra; Aleksovska, Slobotka

    2007-01-01

    In this work supervised self-organizing maps were used for structural classification of perovskites. For this purpose, structural data for total number of 286 perovskites, belonging to ABO 3 and/or A 2 BB'O 6 types, were collected from literature: 130 of these are cubic, 85 orthorhombic and 71 monoclinic. For classification purposes, the effective ionic radii of the cations, electronegativities of the cations in B-position, as well as, the oxidation states of these cations, were used as input variables. The parameters of the developed models, as well as, the most suitable variables for classification purposes were selected using genetic algorithms. Two-third of all the compounds were used in the training phase. During the optimization process the performances of the models were checked using cross-validation leave-1/10-out. The performances of obtained solutions were checked using the test set composed of the remaining one-third of the compounds. The obtained models for classification of these three classes of perovskite compounds show very good results. Namely, the classification of the compounds in the test set resulted in small number of discrepancies (4.2-6.4%) between the actual crystallographic class and the one predicted by the models. All these results are strong arguments for the validity of supervised self-organizing maps for performing such types of classification. Therefore, the proposed procedure could be successfully used for crystallographic classification of perovskites in one of these three classes

  2. Conduction Delay Learning Model for Unsupervised and Supervised Classification of Spatio-Temporal Spike Patterns.

    Science.gov (United States)

    Matsubara, Takashi

    2017-01-01

    Precise spike timing is considered to play a fundamental role in communications and signal processing in biological neural networks. Understanding the mechanism of spike timing adjustment would deepen our understanding of biological systems and enable advanced engineering applications such as efficient computational architectures. However, the biological mechanisms that adjust and maintain spike timing remain unclear. Existing algorithms adopt a supervised approach, which adjusts the axonal conduction delay and synaptic efficacy until the spike timings approximate the desired timings. This study proposes a spike timing-dependent learning model that adjusts the axonal conduction delay and synaptic efficacy in both unsupervised and supervised manners. The proposed learning algorithm approximates the Expectation-Maximization algorithm, and classifies the input data encoded into spatio-temporal spike patterns. Even in the supervised classification, the algorithm requires no external spikes indicating the desired spike timings unlike existing algorithms. Furthermore, because the algorithm is consistent with biological models and hypotheses found in existing biological studies, it could capture the mechanism underlying biological delay learning.

  3. Projected estimators for robust semi-supervised classification

    DEFF Research Database (Denmark)

    Krijthe, Jesse H.; Loog, Marco

    2017-01-01

    For semi-supervised techniques to be applied safely in practice we at least want methods to outperform their supervised counterparts. We study this question for classification using the well-known quadratic surrogate loss function. Unlike other approaches to semi-supervised learning, the procedure...... specifically, we prove that, measured on the labeled and unlabeled training data, this semi-supervised procedure never gives a lower quadratic loss than the supervised alternative. To our knowledge this is the first approach that offers such strong, albeit conservative, guarantees for improvement over...... the supervised solution. The characteristics of our approach are explicated using benchmark datasets to further understand the similarities and differences between the quadratic loss criterion used in the theoretical results and the classification accuracy typically considered in practice....

  4. An iterated Laplacian based semi-supervised dimensionality reduction for classification of breast cancer on ultrasound images.

    Science.gov (United States)

    Liu, Xiao; Shi, Jun; Zhou, Shichong; Lu, Minhua

    2014-01-01

    The dimensionality reduction is an important step in ultrasound image based computer-aided diagnosis (CAD) for breast cancer. A newly proposed l2,1 regularized correntropy algorithm for robust feature selection (CRFS) has achieved good performance for noise corrupted data. Therefore, it has the potential to reduce the dimensions of ultrasound image features. However, in clinical practice, the collection of labeled instances is usually expensive and time costing, while it is relatively easy to acquire the unlabeled or undetermined instances. Therefore, the semi-supervised learning is very suitable for clinical CAD. The iterated Laplacian regularization (Iter-LR) is a new regularization method, which has been proved to outperform the traditional graph Laplacian regularization in semi-supervised classification and ranking. In this study, to augment the classification accuracy of the breast ultrasound CAD based on texture feature, we propose an Iter-LR-based semi-supervised CRFS (Iter-LR-CRFS) algorithm, and then apply it to reduce the feature dimensions of ultrasound images for breast CAD. We compared the Iter-LR-CRFS with LR-CRFS, original supervised CRFS, and principal component analysis. The experimental results indicate that the proposed Iter-LR-CRFS significantly outperforms all other algorithms.

  5. Supervised classification of distributed data streams for smart grids

    Energy Technology Data Exchange (ETDEWEB)

    Guarracino, Mario R. [High Performance Computing and Networking - National Research Council of Italy, Naples (Italy); Irpino, Antonio; Verde, Rosanna [Seconda Universita degli Studi di Napoli, Dipartimento di Studi Europei e Mediterranei, Caserta (Italy); Radziukyniene, Neringa [Lithuanian Energy Institute, Laboratory of Systems Control and Automation, Kaunas (Lithuania)

    2012-03-15

    The electricity system inherited from the 19th and 20th centuries has been a reliable but centralized system. With the spreading of local, distributed and intermittent renewable energy resources, top-down central control of the grid no longer meets modern requirements. For these reasons, the power grid has been equipped with smart meters integrating bi-directional communications, advanced power measurement and management capabilities. Smart meters make it possible to remotely turn power on or off to a customer, read usage information, detect a service outage and the unauthorized use of electricity. To fully exploit their capabilities, we foresee the usage of distributed supervised classification algorithms. By gathering data available from meters and other sensors, such algorithms can create local classification models for attack detection, online monitoring, privacy preservation, workload balancing, prediction of energy demand and incoming faults. In this paper we present a decentralized distributed classification algorithm based on proximal support vector machines. The method uses partial knowledge, in form of data streams, to build its local model on each meter. We demonstrate the performance of the proposed scheme on synthetic datasets. (orig.)

  6. Supervised Cross-Modal Factor Analysis for Multiple Modal Data Classification

    KAUST Repository

    Wang, Jingbin

    2015-10-09

    In this paper we study the problem of learning from multiple modal data for purpose of document classification. In this problem, each document is composed two different modals of data, i.e., An image and a text. Cross-modal factor analysis (CFA) has been proposed to project the two different modals of data to a shared data space, so that the classification of a image or a text can be performed directly in this space. A disadvantage of CFA is that it has ignored the supervision information. In this paper, we improve CFA by incorporating the supervision information to represent and classify both image and text modals of documents. We project both image and text data to a shared data space by factor analysis, and then train a class label predictor in the shared space to use the class label information. The factor analysis parameter and the predictor parameter are learned jointly by solving one single objective function. With this objective function, we minimize the distance between the projections of image and text of the same document, and the classification error of the projection measured by hinge loss function. The objective function is optimized by an alternate optimization strategy in an iterative algorithm. Experiments in two different multiple modal document data sets show the advantage of the proposed algorithm over other CFA methods.

  7. Dynamic classification system in large-scale supervision of energy efficiency in buildings

    International Nuclear Information System (INIS)

    Kiluk, S.

    2014-01-01

    Highlights: • Rough set approximation of classification improves energy efficiency prediction. • Dynamic features of diagnostic classification allow for its precise prediction. • Indiscernibility in large population enhances identification of process features. • Diagnostic information can be refined by dynamic references to local neighbourhood. • We introduce data exploration validation based on system dynamics and uncertainty. - Abstract: Data mining and knowledge discovery applied to the billing data provide the diagnostic instruments for the evaluation of energy use in buildings connected to a district heating network. To ensure the validity of an algorithm-based classification system, the dynamic properties of a sequence of partitions for consecutive detected events were investigated. The information regarding the dynamic properties of the classification system refers to the similarities between the supervised objects and migrations that originate from the changes in the building energy use and loss similarity to their neighbourhood and thus represents the refinement of knowledge. In this study, we demonstrate that algorithm-based diagnostic knowledge has dynamic properties that can be exploited with a rough set predictor to evaluate whether the implementation of classification for supervision of energy use aligns with the dynamics of changes of district heating-supplied building properties. Moreover, we demonstrate the refinement of the current knowledge with the previous findings and we present the creation of predictive diagnostic systems based on knowledge dynamics with a satisfactory level of classification errors, even for non-stationary data

  8. Projected estimators for robust semi-supervised classification

    NARCIS (Netherlands)

    Krijthe, J.H.; Loog, M.

    2017-01-01

    For semi-supervised techniques to be applied safely in practice we at least want methods to outperform their supervised counterparts. We study this question for classification using the well-known quadratic surrogate loss function. Unlike other approaches to semi-supervised learning, the

  9. Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds

    Science.gov (United States)

    Suffredini, Tony; Wessells, Stephen M.; Bloom, Peter H.; Lanzone, Michael; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd

    2017-01-01

    Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classifications, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%) with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its accuracy of classifying basic behavior classification accuracy of basic behaviors at sampling frequencies as low as 10Hz, the KNN at sampling frequencies as low as 20Hz. Classification of accelerometer data collected from free ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequence of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data. PMID:28403159

  10. Multispectral and Panchromatic used Enhancement Resolution and Study Effective Enhancement on Supervised and Unsupervised Classification Land – Cover

    Science.gov (United States)

    Salman, S. S.; Abbas, W. A.

    2018-05-01

    The goal of the study is to support analysis Enhancement of Resolution and study effect on classification methods on bands spectral information of specific and quantitative approaches. In this study introduce a method to enhancement resolution Landsat 8 of combining the bands spectral of 30 meters resolution with panchromatic band 8 of 15 meters resolution, because of importance multispectral imagery to extracting land - cover. Classification methods used in this study to classify several lands -covers recorded from OLI- 8 imagery. Two methods of Data mining can be classified as either supervised or unsupervised. In supervised methods, there is a particular predefined target, that means the algorithm learn which values of the target are associated with which values of the predictor sample. K-nearest neighbors and maximum likelihood algorithms examine in this work as supervised methods. In other hand, no sample identified as target in unsupervised methods, the algorithm of data extraction searches for structure and patterns between all the variables, represented by Fuzzy C-mean clustering method as one of the unsupervised methods, NDVI vegetation index used to compare the results of classification method, the percent of dense vegetation in maximum likelihood method give a best results.

  11. Evaluation of Semi-supervised Learning for Classification of Protein Crystallization Imagery.

    Science.gov (United States)

    Sigdel, Madhav; Dinç, İmren; Dinç, Semih; Sigdel, Madhu S; Pusey, Marc L; Aygün, Ramazan S

    2014-03-01

    In this paper, we investigate the performance of two wrapper methods for semi-supervised learning algorithms for classification of protein crystallization images with limited labeled images. Firstly, we evaluate the performance of semi-supervised approach using self-training with naïve Bayesian (NB) and sequential minimum optimization (SMO) as the base classifiers. The confidence values returned by these classifiers are used to select high confident predictions to be used for self-training. Secondly, we analyze the performance of Yet Another Two Stage Idea (YATSI) semi-supervised learning using NB, SMO, multilayer perceptron (MLP), J48 and random forest (RF) classifiers. These results are compared with the basic supervised learning using the same training sets. We perform our experiments on a dataset consisting of 2250 protein crystallization images for different proportions of training and test data. Our results indicate that NB and SMO using both self-training and YATSI semi-supervised approaches improve accuracies with respect to supervised learning. On the other hand, MLP, J48 and RF perform better using basic supervised learning. Overall, random forest classifier yields the best accuracy with supervised learning for our dataset.

  12. Classification and Diagnostic Output Prediction of Cancer Using Gene Expression Profiling and Supervised Machine Learning Algorithms

    DEFF Research Database (Denmark)

    Yoo, C.; Gernaey, Krist

    2008-01-01

    importance in the projection (VIP) information of the DPLS method. The power of the gene selection method and the proposed supervised hierarchical clustering method is illustrated on a three microarray data sets of leukemia, breast, and colon cancer. Supervised machine learning algorithms thus enable...

  13. A Semisupervised Cascade Classification Algorithm

    Directory of Open Access Journals (Sweden)

    Stamatis Karlos

    2016-01-01

    Full Text Available Classification is one of the most important tasks of data mining techniques, which have been adopted by several modern applications. The shortage of enough labeled data in the majority of these applications has shifted the interest towards using semisupervised methods. Under such schemes, the use of collected unlabeled data combined with a clearly smaller set of labeled examples leads to similar or even better classification accuracy against supervised algorithms, which use labeled examples exclusively during the training phase. A novel approach for increasing semisupervised classification using Cascade Classifier technique is presented in this paper. The main characteristic of Cascade Classifier strategy is the use of a base classifier for increasing the feature space by adding either the predicted class or the probability class distribution of the initial data. The classifier of the second level is supplied with the new dataset and extracts the decision for each instance. In this work, a self-trained NB∇C4.5 classifier algorithm is presented, which combines the characteristics of Naive Bayes as a base classifier and the speed of C4.5 for final classification. We performed an in-depth comparison with other well-known semisupervised classification methods on standard benchmark datasets and we finally reached to the point that the presented technique has better accuracy in most cases.

  14. A Comparison of Supervised Machine Learning Algorithms and Feature Vectors for MS Lesion Segmentation Using Multimodal Structural MRI

    Science.gov (United States)

    Sweeney, Elizabeth M.; Vogelstein, Joshua T.; Cuzzocreo, Jennifer L.; Calabresi, Peter A.; Reich, Daniel S.; Crainiceanu, Ciprian M.; Shinohara, Russell T.

    2014-01-01

    Machine learning is a popular method for mining and analyzing large collections of medical data. We focus on a particular problem from medical research, supervised multiple sclerosis (MS) lesion segmentation in structural magnetic resonance imaging (MRI). We examine the extent to which the choice of machine learning or classification algorithm and feature extraction function impacts the performance of lesion segmentation methods. As quantitative measures derived from structural MRI are important clinical tools for research into the pathophysiology and natural history of MS, the development of automated lesion segmentation methods is an active research field. Yet, little is known about what drives performance of these methods. We evaluate the performance of automated MS lesion segmentation methods, which consist of a supervised classification algorithm composed with a feature extraction function. These feature extraction functions act on the observed T1-weighted (T1-w), T2-weighted (T2-w) and fluid-attenuated inversion recovery (FLAIR) MRI voxel intensities. Each MRI study has a manual lesion segmentation that we use to train and validate the supervised classification algorithms. Our main finding is that the differences in predictive performance are due more to differences in the feature vectors, rather than the machine learning or classification algorithms. Features that incorporate information from neighboring voxels in the brain were found to increase performance substantially. For lesion segmentation, we conclude that it is better to use simple, interpretable, and fast algorithms, such as logistic regression, linear discriminant analysis, and quadratic discriminant analysis, and to develop the features to improve performance. PMID:24781953

  15. Feature Genes Selection Using Supervised Locally Linear Embedding and Correlation Coefficient for Microarray Classification.

    Science.gov (United States)

    Xu, Jiucheng; Mu, Huiyu; Wang, Yun; Huang, Fangzhou

    2018-01-01

    The selection of feature genes with high recognition ability from the gene expression profiles has gained great significance in biology. However, most of the existing methods have a high time complexity and poor classification performance. Motivated by this, an effective feature selection method, called supervised locally linear embedding and Spearman's rank correlation coefficient (SLLE-SC 2 ), is proposed which is based on the concept of locally linear embedding and correlation coefficient algorithms. Supervised locally linear embedding takes into account class label information and improves the classification performance. Furthermore, Spearman's rank correlation coefficient is used to remove the coexpression genes. The experiment results obtained on four public tumor microarray datasets illustrate that our method is valid and feasible.

  16. SemiBoost: boosting for semi-supervised learning.

    Science.gov (United States)

    Mallapragada, Pavan Kumar; Jin, Rong; Jain, Anil K; Liu, Yi

    2009-11-01

    Semi-supervised learning has attracted a significant amount of attention in pattern recognition and machine learning. Most previous studies have focused on designing special algorithms to effectively exploit the unlabeled data in conjunction with labeled data. Our goal is to improve the classification accuracy of any given supervised learning algorithm by using the available unlabeled examples. We call this as the Semi-supervised improvement problem, to distinguish the proposed approach from the existing approaches. We design a metasemi-supervised learning algorithm that wraps around the underlying supervised algorithm and improves its performance using unlabeled data. This problem is particularly important when we need to train a supervised learning algorithm with a limited number of labeled examples and a multitude of unlabeled examples. We present a boosting framework for semi-supervised learning, termed as SemiBoost. The key advantages of the proposed semi-supervised learning approach are: 1) performance improvement of any supervised learning algorithm with a multitude of unlabeled data, 2) efficient computation by the iterative boosting algorithm, and 3) exploiting both manifold and cluster assumption in training classification models. An empirical study on 16 different data sets and text categorization demonstrates that the proposed framework improves the performance of several commonly used supervised learning algorithms, given a large number of unlabeled examples. We also show that the performance of the proposed algorithm, SemiBoost, is comparable to the state-of-the-art semi-supervised learning algorithms.

  17. Classification of gene expression data: A hubness-aware semi-supervised approach.

    Science.gov (United States)

    Buza, Krisztian

    2016-04-01

    Classification of gene expression data is the common denominator of various biomedical recognition tasks. However, obtaining class labels for large training samples may be difficult or even impossible in many cases. Therefore, semi-supervised classification techniques are required as semi-supervised classifiers take advantage of unlabeled data. Gene expression data is high-dimensional which gives rise to the phenomena known under the umbrella of the curse of dimensionality, one of its recently explored aspects being the presence of hubs or hubness for short. Therefore, hubness-aware classifiers have been developed recently, such as Naive Hubness-Bayesian k-Nearest Neighbor (NHBNN). In this paper, we propose a semi-supervised extension of NHBNN which follows the self-training schema. As one of the core components of self-training is the certainty score, we propose a new hubness-aware certainty score. We performed experiments on publicly available gene expression data. These experiments show that the proposed classifier outperforms its competitors. We investigated the impact of each of the components (classification algorithm, semi-supervised technique, hubness-aware certainty score) separately and showed that each of these components are relevant to the performance of the proposed approach. Our results imply that our approach may increase classification accuracy and reduce computational costs (i.e., runtime). Based on the promising results presented in the paper, we envision that hubness-aware techniques will be used in various other biomedical machine learning tasks. In order to accelerate this process, we made an implementation of hubness-aware machine learning techniques publicly available in the PyHubs software package (http://www.biointelligence.hu/pyhubs) implemented in Python, one of the most popular programming languages of data science. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  18. A SURVEY OF SEMI-SUPERVISED LEARNING

    OpenAIRE

    Amrita Sadarangani *, Dr. Anjali Jivani

    2016-01-01

    Semi Supervised Learning involves using both labeled and unlabeled data to train a classifier or for clustering. Semi supervised learning finds usage in many applications, since labeled data can be hard to find in many cases. Currently, a lot of research is being conducted in this area. This paper discusses the different algorithms of semi supervised learning and then their advantages and limitations are compared. The differences between supervised classification and semi-supervised classific...

  19. Semi-Supervised Classification for Fault Diagnosis in Nuclear Power Plants

    International Nuclear Information System (INIS)

    Ma, Jian Ping; Jiang, Jin

    2014-01-01

    Pattern classification methods have become important tools for fault diagnosis in industrial systems. However, it is normally difficult to obtain reliable labeled data to train a supervised pattern classification model for applications in a nuclear power plant (NPP). However, unlabeled data easily become available through increased deployment of supervisory, control, and data acquisition (SCADA) systems. In this paper, a fault diagnosis scheme based on semi-supervised classification (SSC) method is developed with specific applications for NPP. In this scheme, newly measured plant data are treated as unlabeled data. They are integrated with selected labeled data to train a SSC model which is then used to estimate labels of the new data. Compared to exclusive supervised approaches, the proposed scheme requires significantly less number of labeled data to train a classifier. Furthermore, it is shown that higher degree of uncertainties in the labeled data can be tolerated. The developed scheme has been validated using the data generated from a desktop NPP simulator and also from a physical NPP simulator using a graph-based SSC algorithm. Two case studies have been used in the validation process. In the first case study, three faults have been simulated on the desktop simulator. These faults have all been classified successfully with only four labeled data points per fault case. In the second case, six types of fault are simulated on the physical NPP simulator. All faults have been successfully diagnosed. The results have demonstrated that SSC is a promising tool for fault diagnosis

  20. Toward optimal feature selection using ranking methods and classification algorithms

    Directory of Open Access Journals (Sweden)

    Novaković Jasmina

    2011-01-01

    Full Text Available We presented a comparison between several feature ranking methods used on two real datasets. We considered six ranking methods that can be divided into two broad categories: statistical and entropy-based. Four supervised learning algorithms are adopted to build models, namely, IB1, Naive Bayes, C4.5 decision tree and the RBF network. We showed that the selection of ranking methods could be important for classification accuracy. In our experiments, ranking methods with different supervised learning algorithms give quite different results for balanced accuracy. Our cases confirm that, in order to be sure that a subset of features giving the highest accuracy has been selected, the use of many different indices is recommended.

  1. Semi-supervised learning via regularized boosting working on multiple semi-supervised assumptions.

    Science.gov (United States)

    Chen, Ke; Wang, Shihai

    2011-01-01

    Semi-supervised learning concerns the problem of learning in the presence of labeled and unlabeled data. Several boosting algorithms have been extended to semi-supervised learning with various strategies. To our knowledge, however, none of them takes all three semi-supervised assumptions, i.e., smoothness, cluster, and manifold assumptions, together into account during boosting learning. In this paper, we propose a novel cost functional consisting of the margin cost on labeled data and the regularization penalty on unlabeled data based on three fundamental semi-supervised assumptions. Thus, minimizing our proposed cost functional with a greedy yet stagewise functional optimization procedure leads to a generic boosting framework for semi-supervised learning. Extensive experiments demonstrate that our algorithm yields favorite results for benchmark and real-world classification tasks in comparison to state-of-the-art semi-supervised learning algorithms, including newly developed boosting algorithms. Finally, we discuss relevant issues and relate our algorithm to the previous work.

  2. Learning Supervised Topic Models for Classification and Regression from Crowds

    DEFF Research Database (Denmark)

    Rodrigues, Filipe; Lourenco, Mariana; Ribeiro, Bernardete

    2017-01-01

    problems, which account for the heterogeneity and biases among different annotators that are encountered in practice when learning from crowds. We develop an efficient stochastic variational inference algorithm that is able to scale to very large datasets, and we empirically demonstrate the advantages...... annotation tasks, prone to ambiguity and noise, often with high volumes of documents, deem learning under a single-annotator assumption unrealistic or unpractical for most real-world applications. In this article, we propose two supervised topic models, one for classification and another for regression...

  3. An Incremental Classification Algorithm for Mining Data with Feature Space Heterogeneity

    Directory of Open Access Journals (Sweden)

    Yu Wang

    2014-01-01

    Full Text Available Feature space heterogeneity often exists in many real world data sets so that some features are of different importance for classification over different subsets. Moreover, the pattern of feature space heterogeneity might dynamically change over time as more and more data are accumulated. In this paper, we develop an incremental classification algorithm, Supervised Clustering for Classification with Feature Space Heterogeneity (SCCFSH, to address this problem. In our approach, supervised clustering is implemented to obtain a number of clusters such that samples in each cluster are from the same class. After the removal of outliers, relevance of features in each cluster is calculated based on their variations in this cluster. The feature relevance is incorporated into distance calculation for classification. The main advantage of SCCFSH lies in the fact that it is capable of solving a classification problem with feature space heterogeneity in an incremental way, which is favorable for online classification tasks with continuously changing data. Experimental results on a series of data sets and application to a database marketing problem show the efficiency and effectiveness of the proposed approach.

  4. Supervised learning for the automated transcription of spacer classification from spoligotype films

    Directory of Open Access Journals (Sweden)

    Abernethy Neil

    2009-08-01

    Full Text Available Abstract Background Molecular genotyping of bacteria has revolutionized the study of tuberculosis epidemiology, yet these established laboratory techniques typically require subjective and laborious interpretation by trained professionals. In the context of a Tuberculosis Case Contact study in The Gambia we used a reverse hybridization laboratory assay called spoligotype analysis. To facilitate processing of spoligotype images we have developed tools and algorithms to automate the classification and transcription of these data directly to a database while allowing for manual editing. Results Features extracted from each of the 1849 spots on a spoligo film were classified using two supervised learning algorithms. A graphical user interface allows manual editing of the classification, before export to a database. The application was tested on ten films of differing quality and the results of the best classifier were compared to expert manual classification, giving a median correct classification rate of 98.1% (inter quartile range: 97.1% to 99.2%, with an automated processing time of less than 1 minute per film. Conclusion The software implementation offers considerable time savings over manual processing whilst allowing expert editing of the automated classification. The automatic upload of the classification to a database reduces the chances of transcription errors.

  5. Learning Supervised Topic Models for Classification and Regression from Crowds.

    Science.gov (United States)

    Rodrigues, Filipe; Lourenco, Mariana; Ribeiro, Bernardete; Pereira, Francisco C

    2017-12-01

    The growing need to analyze large collections of documents has led to great developments in topic modeling. Since documents are frequently associated with other related variables, such as labels or ratings, much interest has been placed on supervised topic models. However, the nature of most annotation tasks, prone to ambiguity and noise, often with high volumes of documents, deem learning under a single-annotator assumption unrealistic or unpractical for most real-world applications. In this article, we propose two supervised topic models, one for classification and another for regression problems, which account for the heterogeneity and biases among different annotators that are encountered in practice when learning from crowds. We develop an efficient stochastic variational inference algorithm that is able to scale to very large datasets, and we empirically demonstrate the advantages of the proposed model over state-of-the-art approaches.

  6. An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species

    Directory of Open Access Journals (Sweden)

    Deborah Galpert

    2015-01-01

    Full Text Available Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles are combined in a supervised pairwise ortholog detection approach to improve effectiveness considering low ortholog ratios in relation to the possible pairwise comparison between two genomes. In this scenario, big data supervised classifiers managing imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with RBH, RSD, and OMA algorithms by using the following yeast genome pairs: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe as benchmark datasets. Because of the large amount of imbalanced data, the building and testing of the supervised model were only possible by using big data supervised classifiers managing imbalance. Evaluation metrics taking low ortholog ratios into account were applied. From the effectiveness perspective, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because of the consideration of gene pair features beyond alignment similarities combined with the advances in big data supervised classification.

  7. A review of supervised object-based land-cover image classification

    Science.gov (United States)

    Ma, Lei; Li, Manchun; Ma, Xiaoxue; Cheng, Liang; Du, Peijun; Liu, Yongxue

    2017-08-01

    Object-based image classification for land-cover mapping purposes using remote-sensing imagery has attracted significant attention in recent years. Numerous studies conducted over the past decade have investigated a broad array of sensors, feature selection, classifiers, and other factors of interest. However, these research results have not yet been synthesized to provide coherent guidance on the effect of different supervised object-based land-cover classification processes. In this study, we first construct a database with 28 fields using qualitative and quantitative information extracted from 254 experimental cases described in 173 scientific papers. Second, the results of the meta-analysis are reported, including general characteristics of the studies (e.g., the geographic range of relevant institutes, preferred journals) and the relationships between factors of interest (e.g., spatial resolution and study area or optimal segmentation scale, accuracy and number of targeted classes), especially with respect to the classification accuracy of different sensors, segmentation scale, training set size, supervised classifiers, and land-cover types. Third, useful data on supervised object-based image classification are determined from the meta-analysis. For example, we find that supervised object-based classification is currently experiencing rapid advances, while development of the fuzzy technique is limited in the object-based framework. Furthermore, spatial resolution correlates with the optimal segmentation scale and study area, and Random Forest (RF) shows the best performance in object-based classification. The area-based accuracy assessment method can obtain stable classification performance, and indicates a strong correlation between accuracy and training set size, while the accuracy of the point-based method is likely to be unstable due to mixed objects. In addition, the overall accuracy benefits from higher spatial resolution images (e.g., unmanned aerial

  8. Seizure Classification From EEG Signals Using Transfer Learning, Semi-Supervised Learning and TSK Fuzzy System.

    Science.gov (United States)

    Jiang, Yizhang; Wu, Dongrui; Deng, Zhaohong; Qian, Pengjiang; Wang, Jun; Wang, Guanjin; Chung, Fu-Lai; Choi, Kup-Sze; Wang, Shitong

    2017-12-01

    Recognition of epileptic seizures from offline EEG signals is very important in clinical diagnosis of epilepsy. Compared with manual labeling of EEG signals by doctors, machine learning approaches can be faster and more consistent. However, the classification accuracy is usually not satisfactory for two main reasons: the distributions of the data used for training and testing may be different, and the amount of training data may not be enough. In addition, most machine learning approaches generate black-box models that are difficult to interpret. In this paper, we integrate transductive transfer learning, semi-supervised learning and TSK fuzzy system to tackle these three problems. More specifically, we use transfer learning to reduce the discrepancy in data distribution between the training and testing data, employ semi-supervised learning to use the unlabeled testing data to remedy the shortage of training data, and adopt TSK fuzzy system to increase model interpretability. Two learning algorithms are proposed to train the system. Our experimental results show that the proposed approaches can achieve better performance than many state-of-the-art seizure classification algorithms.

  9. Experiments on Supervised Learning Algorithms for Text Categorization

    Science.gov (United States)

    Namburu, Setu Madhavi; Tu, Haiying; Luo, Jianhui; Pattipati, Krishna R.

    2005-01-01

    Modern information society is facing the challenge of handling massive volume of online documents, news, intelligence reports, and so on. How to use the information accurately and in a timely manner becomes a major concern in many areas. While the general information may also include images and voice, we focus on the categorization of text data in this paper. We provide a brief overview of the information processing flow for text categorization, and discuss two supervised learning algorithms, viz., support vector machines (SVM) and partial least squares (PLS), which have been successfully applied in other domains, e.g., fault diagnosis [9]. While SVM has been well explored for binary classification and was reported as an efficient algorithm for text categorization, PLS has not yet been applied to text categorization. Our experiments are conducted on three data sets: Reuter's- 21578 dataset about corporate mergers and data acquisitions (ACQ), WebKB and the 20-Newsgroups. Results show that the performance of PLS is comparable to SVM in text categorization. A major drawback of SVM for multi-class categorization is that it requires a voting scheme based on the results of pair-wise classification. PLS does not have this drawback and could be a better candidate for multi-class text categorization.

  10. Graph-Based Semi-Supervised Hyperspectral Image Classification Using Spatial Information

    Science.gov (United States)

    Jamshidpour, N.; Homayouni, S.; Safari, A.

    2017-09-01

    Hyperspectral image classification has been one of the most popular research areas in the remote sensing community in the past decades. However, there are still some problems that need specific attentions. For example, the lack of enough labeled samples and the high dimensionality problem are two most important issues which degrade the performance of supervised classification dramatically. The main idea of semi-supervised learning is to overcome these issues by the contribution of unlabeled samples, which are available in an enormous amount. In this paper, we propose a graph-based semi-supervised classification method, which uses both spectral and spatial information for hyperspectral image classification. More specifically, two graphs were designed and constructed in order to exploit the relationship among pixels in spectral and spatial spaces respectively. Then, the Laplacians of both graphs were merged to form a weighted joint graph. The experiments were carried out on two different benchmark hyperspectral data sets. The proposed method performed significantly better than the well-known supervised classification methods, such as SVM. The assessments consisted of both accuracy and homogeneity analyses of the produced classification maps. The proposed spectral-spatial SSL method considerably increased the classification accuracy when the labeled training data set is too scarce.When there were only five labeled samples for each class, the performance improved 5.92% and 10.76% compared to spatial graph-based SSL, for AVIRIS Indian Pine and Pavia University data sets respectively.

  11. GRAPH-BASED SEMI-SUPERVISED HYPERSPECTRAL IMAGE CLASSIFICATION USING SPATIAL INFORMATION

    Directory of Open Access Journals (Sweden)

    N. Jamshidpour

    2017-09-01

    Full Text Available Hyperspectral image classification has been one of the most popular research areas in the remote sensing community in the past decades. However, there are still some problems that need specific attentions. For example, the lack of enough labeled samples and the high dimensionality problem are two most important issues which degrade the performance of supervised classification dramatically. The main idea of semi-supervised learning is to overcome these issues by the contribution of unlabeled samples, which are available in an enormous amount. In this paper, we propose a graph-based semi-supervised classification method, which uses both spectral and spatial information for hyperspectral image classification. More specifically, two graphs were designed and constructed in order to exploit the relationship among pixels in spectral and spatial spaces respectively. Then, the Laplacians of both graphs were merged to form a weighted joint graph. The experiments were carried out on two different benchmark hyperspectral data sets. The proposed method performed significantly better than the well-known supervised classification methods, such as SVM. The assessments consisted of both accuracy and homogeneity analyses of the produced classification maps. The proposed spectral-spatial SSL method considerably increased the classification accuracy when the labeled training data set is too scarce.When there were only five labeled samples for each class, the performance improved 5.92% and 10.76% compared to spatial graph-based SSL, for AVIRIS Indian Pine and Pavia University data sets respectively.

  12. Observation versus classification in supervised category learning.

    Science.gov (United States)

    Levering, Kimery R; Kurtz, Kenneth J

    2015-02-01

    The traditional supervised classification paradigm encourages learners to acquire only the knowledge needed to predict category membership (a discriminative approach). An alternative that aligns with important aspects of real-world concept formation is learning with a broader focus to acquire knowledge of the internal structure of each category (a generative approach). Our work addresses the impact of a particular component of the traditional classification task: the guess-and-correct cycle. We compare classification learning to a supervised observational learning task in which learners are shown labeled examples but make no classification response. The goals of this work sit at two levels: (1) testing for differences in the nature of the category representations that arise from two basic learning modes; and (2) evaluating the generative/discriminative continuum as a theoretical tool for understand learning modes and their outcomes. Specifically, we view the guess-and-correct cycle as consistent with a more discriminative approach and therefore expected it to lead to narrower category knowledge. Across two experiments, the observational mode led to greater sensitivity to distributional properties of features and correlations between features. We conclude that a relatively subtle procedural difference in supervised category learning substantially impacts what learners come to know about the categories. The results demonstrate the value of the generative/discriminative continuum as a tool for advancing the psychology of category learning and also provide a valuable constraint for formal models and associated theories.

  13. A new supervised learning algorithm for spiking neurons.

    Science.gov (United States)

    Xu, Yan; Zeng, Xiaoqin; Zhong, Shuiming

    2013-06-01

    The purpose of supervised learning with temporal encoding for spiking neurons is to make the neurons emit a specific spike train encoded by the precise firing times of spikes. If only running time is considered, the supervised learning for a spiking neuron is equivalent to distinguishing the times of desired output spikes and the other time during the running process of the neuron through adjusting synaptic weights, which can be regarded as a classification problem. Based on this idea, this letter proposes a new supervised learning method for spiking neurons with temporal encoding; it first transforms the supervised learning into a classification problem and then solves the problem by using the perceptron learning rule. The experiment results show that the proposed method has higher learning accuracy and efficiency over the existing learning methods, so it is more powerful for solving complex and real-time problems.

  14. Ship-Iceberg Discrimination in Sentinel-2 Multispectral Imagery by Supervised Classification

    Directory of Open Access Journals (Sweden)

    Peder Heiselberg

    2017-11-01

    Full Text Available The European Space Agency Sentinel-2 satellites provide multispectral images with pixel sizes down to 10 m. This high resolution allows for fast and frequent detection, classification and discrimination of various objects in the sea, which is relevant in general and specifically for the vast Arctic environment. We analyze several sets of multispectral image data from Denmark and Greenland fall and winter, and describe a supervised search and classification algorithm based on physical parameters that successfully finds and classifies all objects in the sea with reflectance above a threshold. It discriminates between objects like ships, islands, wakes, and icebergs, ice floes, and clouds with accuracy better than 90%. Pan-sharpening the infrared bands leads to classification and discrimination of ice floes and clouds better than 95%. For complex images with abundant ice floes or clouds, however, the false alarm rate dominates for small non-sailing boats.

  15. Semi-supervised and unsupervised extreme learning machines.

    Science.gov (United States)

    Huang, Gao; Song, Shiji; Gupta, Jatinder N D; Wu, Cheng

    2014-12-01

    Extreme learning machines (ELMs) have proven to be efficient and effective learning mechanisms for pattern classification and regression. However, ELMs are primarily applied to supervised learning problems. Only a few existing research papers have used ELMs to explore unlabeled data. In this paper, we extend ELMs for both semi-supervised and unsupervised tasks based on the manifold regularization, thus greatly expanding the applicability of ELMs. The key advantages of the proposed algorithms are as follows: 1) both the semi-supervised ELM (SS-ELM) and the unsupervised ELM (US-ELM) exhibit learning capability and computational efficiency of ELMs; 2) both algorithms naturally handle multiclass classification or multicluster clustering; and 3) both algorithms are inductive and can handle unseen data at test time directly. Moreover, it is shown in this paper that all the supervised, semi-supervised, and unsupervised ELMs can actually be put into a unified framework. This provides new perspectives for understanding the mechanism of random feature mapping, which is the key concept in ELM theory. Empirical study on a wide range of data sets demonstrates that the proposed algorithms are competitive with the state-of-the-art semi-supervised or unsupervised learning algorithms in terms of accuracy and efficiency.

  16. Enhanced manifold regularization for semi-supervised classification.

    Science.gov (United States)

    Gan, Haitao; Luo, Zhizeng; Fan, Yingle; Sang, Nong

    2016-06-01

    Manifold regularization (MR) has become one of the most widely used approaches in the semi-supervised learning field. It has shown superiority by exploiting the local manifold structure of both labeled and unlabeled data. The manifold structure is modeled by constructing a Laplacian graph and then incorporated in learning through a smoothness regularization term. Hence the labels of labeled and unlabeled data vary smoothly along the geodesics on the manifold. However, MR has ignored the discriminative ability of the labeled and unlabeled data. To address the problem, we propose an enhanced MR framework for semi-supervised classification in which the local discriminative information of the labeled and unlabeled data is explicitly exploited. To make full use of labeled data, we firstly employ a semi-supervised clustering method to discover the underlying data space structure of the whole dataset. Then we construct a local discrimination graph to model the discriminative information of labeled and unlabeled data according to the discovered intrinsic structure. Therefore, the data points that may be from different clusters, though similar on the manifold, are enforced far away from each other. Finally, the discrimination graph is incorporated into the MR framework. In particular, we utilize semi-supervised fuzzy c-means and Laplacian regularized Kernel minimum squared error for semi-supervised clustering and classification, respectively. Experimental results on several benchmark datasets and face recognition demonstrate the effectiveness of our proposed method.

  17. Artificial neural network classification using a minimal training set - Comparison to conventional supervised classification

    Science.gov (United States)

    Hepner, George F.; Logan, Thomas; Ritter, Niles; Bryant, Nevin

    1990-01-01

    Recent research has shown an artificial neural network (ANN) to be capable of pattern recognition and the classification of image data. This paper examines the potential for the application of neural network computing to satellite image processing. A second objective is to provide a preliminary comparison and ANN classification. An artificial neural network can be trained to do land-cover classification of satellite imagery using selected sites representative of each class in a manner similar to conventional supervised classification. One of the major problems associated with recognition and classifications of pattern from remotely sensed data is the time and cost of developing a set of training sites. This reseach compares the use of an ANN back propagation classification procedure with a conventional supervised maximum likelihood classification procedure using a minimal training set. When using a minimal training set, the neural network is able to provide a land-cover classification superior to the classification derived from the conventional classification procedure. This research is the foundation for developing application parameters for further prototyping of software and hardware implementations for artificial neural networks in satellite image and geographic information processing.

  18. Supervised Learning Applied to Air Traffic Trajectory Classification

    Science.gov (United States)

    Bosson, Christabelle; Nikoleris, Tasos

    2018-01-01

    Given the recent increase of interest in introducing new vehicle types and missions into the National Airspace System, a transition towards a more autonomous air traffic control system is required in order to enable and handle increased density and complexity. This paper presents an exploratory effort of the needed autonomous capabilities by exploring supervised learning techniques in the context of aircraft trajectories. In particular, it focuses on the application of machine learning algorithms and neural network models to a runway recognition trajectory-classification study. It investigates the applicability and effectiveness of various classifiers using datasets containing trajectory records for a month of air traffic. A feature importance and sensitivity analysis are conducted to challenge the chosen time-based datasets and the ten selected features. The study demonstrates that classification accuracy levels of 90% and above can be reached in less than 40 seconds of training for most machine learning classifiers when one track data point, described by the ten selected features at a particular time step, per trajectory is used as input. It also shows that neural network models can achieve similar accuracy levels but at higher training time costs.

  19. Generative Adversarial Networks-Based Semi-Supervised Learning for Hyperspectral Image Classification

    Directory of Open Access Journals (Sweden)

    Zhi He

    2017-10-01

    Full Text Available Classification of hyperspectral image (HSI is an important research topic in the remote sensing community. Significant efforts (e.g., deep learning have been concentrated on this task. However, it is still an open issue to classify the high-dimensional HSI with a limited number of training samples. In this paper, we propose a semi-supervised HSI classification method inspired by the generative adversarial networks (GANs. Unlike the supervised methods, the proposed HSI classification method is semi-supervised, which can make full use of the limited labeled samples as well as the sufficient unlabeled samples. Core ideas of the proposed method are twofold. First, the three-dimensional bilateral filter (3DBF is adopted to extract the spectral-spatial features by naturally treating the HSI as a volumetric dataset. The spatial information is integrated into the extracted features by 3DBF, which is propitious to the subsequent classification step. Second, GANs are trained on the spectral-spatial features for semi-supervised learning. A GAN contains two neural networks (i.e., generator and discriminator trained in opposition to one another. The semi-supervised learning is achieved by adding samples from the generator to the features and increasing the dimension of the classifier output. Experimental results obtained on three benchmark HSI datasets have confirmed the effectiveness of the proposed method , especially with a limited number of labeled samples.

  20. Semi-supervised Probabilistic Distance Clustering and the Uncertainty of Classification

    Science.gov (United States)

    Iyigun, Cem; Ben-Israel, Adi

    Semi-supervised clustering is an attempt to reconcile clustering (unsupervised learning) and classification (supervised learning, using prior information on the data). These two modes of data analysis are combined in a parameterized model, the parameter θ ∈ [0, 1] is the weight attributed to the prior information, θ = 0 corresponding to clustering, and θ = 1 to classification. The results (cluster centers, classification rule) depend on the parameter θ, an insensitivity to θ indicates that the prior information is in agreement with the intrinsic cluster structure, and is otherwise redundant. This explains why some data sets (such as the Wisconsin breast cancer data, Merz and Murphy, UCI repository of machine learning databases, University of California, Irvine, CA) give good results for all reasonable classification methods. The uncertainty of classification is represented here by the geometric mean of the membership probabilities, shown to be an entropic distance related to the Kullback-Leibler divergence.

  1. A new tool for supervised classification of satellite images available on web servers: Google Maps as a case study

    Science.gov (United States)

    García-Flores, Agustín.; Paz-Gallardo, Abel; Plaza, Antonio; Li, Jun

    2016-10-01

    This paper describes a new web platform dedicated to the classification of satellite images called Hypergim. The current implementation of this platform enables users to perform classification of satellite images from any part of the world thanks to the worldwide maps provided by Google Maps. To perform this classification, Hypergim uses unsupervised algorithms like Isodata and K-means. Here, we present an extension of the original platform in which we adapt Hypergim in order to use supervised algorithms to improve the classification results. This involves a significant modification of the user interface, providing the user with a way to obtain samples of classes present in the images to use in the training phase of the classification process. Another main goal of this development is to improve the runtime of the image classification process. To achieve this goal, we use a parallel implementation of the Random Forest classification algorithm. This implementation is a modification of the well-known CURFIL software package. The use of this type of algorithms to perform image classification is widespread today thanks to its precision and ease of training. The actual implementation of Random Forest was developed using CUDA platform, which enables us to exploit the potential of several models of NVIDIA graphics processing units using them to execute general purpose computing tasks as image classification algorithms. As well as CUDA, we use other parallel libraries as Intel Boost, taking advantage of the multithreading capabilities of modern CPUs. To ensure the best possible results, the platform is deployed in a cluster of commodity graphics processing units (GPUs), so that multiple users can use the tool in a concurrent way. The experimental results indicate that this new algorithm widely outperform the previous unsupervised algorithms implemented in Hypergim, both in runtime as well as precision of the actual classification of the images.

  2. Time series classification using k-Nearest neighbours, Multilayer Perceptron and Learning Vector Quantization algorithms

    Directory of Open Access Journals (Sweden)

    Jiří Fejfar

    2012-01-01

    Full Text Available We are presenting results comparison of three artificial intelligence algorithms in a classification of time series derived from musical excerpts in this paper. Algorithms were chosen to represent different principles of classification – statistic approach, neural networks and competitive learning. The first algorithm is a classical k-Nearest neighbours algorithm, the second algorithm is Multilayer Perceptron (MPL, an example of artificial neural network and the third one is a Learning Vector Quantization (LVQ algorithm representing supervised counterpart to unsupervised Self Organizing Map (SOM.After our own former experiments with unlabelled data we moved forward to the data labels utilization, which generally led to a better accuracy of classification results. As we need huge data set of labelled time series (a priori knowledge of correct class which each time series instance belongs to, we used, with a good experience in former studies, musical excerpts as a source of real-world time series. We are using standard deviation of the sound signal as a descriptor of a musical excerpts volume level.We are describing principle of each algorithm as well as its implementation briefly, giving links for further research. Classification results of each algorithm are presented in a confusion matrix showing numbers of misclassifications and allowing to evaluate overall accuracy of the algorithm. Results are compared and particular misclassifications are discussed for each algorithm. Finally the best solution is chosen and further research goals are given.

  3. A Cluster-then-label Semi-supervised Learning Approach for Pathology Image Classification.

    Science.gov (United States)

    Peikari, Mohammad; Salama, Sherine; Nofech-Mozes, Sharon; Martel, Anne L

    2018-05-08

    Completely labeled pathology datasets are often challenging and time-consuming to obtain. Semi-supervised learning (SSL) methods are able to learn from fewer labeled data points with the help of a large number of unlabeled data points. In this paper, we investigated the possibility of using clustering analysis to identify the underlying structure of the data space for SSL. A cluster-then-label method was proposed to identify high-density regions in the data space which were then used to help a supervised SVM in finding the decision boundary. We have compared our method with other supervised and semi-supervised state-of-the-art techniques using two different classification tasks applied to breast pathology datasets. We found that compared with other state-of-the-art supervised and semi-supervised methods, our SSL method is able to improve classification performance when a limited number of labeled data instances are made available. We also showed that it is important to examine the underlying distribution of the data space before applying SSL techniques to ensure semi-supervised learning assumptions are not violated by the data.

  4. Semi-supervised morphosyntactic classification of Old Icelandic.

    Science.gov (United States)

    Urban, Kryztof; Tangherlini, Timothy R; Vijūnas, Aurelijus; Broadwell, Peter M

    2014-01-01

    We present IceMorph, a semi-supervised morphosyntactic analyzer of Old Icelandic. In addition to machine-read corpora and dictionaries, it applies a small set of declension prototypes to map corpus words to dictionary entries. A web-based GUI allows expert users to modify and augment data through an online process. A machine learning module incorporates prototype data, edit-distance metrics, and expert feedback to continuously update part-of-speech and morphosyntactic classification. An advantage of the analyzer is its ability to achieve competitive classification accuracy with minimum training data.

  5. Semi-Supervised Active Learning for Sound Classification in Hybrid Learning Environments

    Science.gov (United States)

    Han, Wenjing; Coutinho, Eduardo; Li, Haifeng; Schuller, Björn; Yu, Xiaojie; Zhu, Xuan

    2016-01-01

    Coping with scarcity of labeled data is a common problem in sound classification tasks. Approaches for classifying sounds are commonly based on supervised learning algorithms, which require labeled data which is often scarce and leads to models that do not generalize well. In this paper, we make an efficient combination of confidence-based Active Learning and Self-Training with the aim of minimizing the need for human annotation for sound classification model training. The proposed method pre-processes the instances that are ready for labeling by calculating their classifier confidence scores, and then delivers the candidates with lower scores to human annotators, and those with high scores are automatically labeled by the machine. We demonstrate the feasibility and efficacy of this method in two practical scenarios: pool-based and stream-based processing. Extensive experimental results indicate that our approach requires significantly less labeled instances to reach the same performance in both scenarios compared to Passive Learning, Active Learning and Self-Training. A reduction of 52.2% in human labeled instances is achieved in both of the pool-based and stream-based scenarios on a sound classification task considering 16,930 sound instances. PMID:27627768

  6. Semi-Supervised Active Learning for Sound Classification in Hybrid Learning Environments.

    Science.gov (United States)

    Han, Wenjing; Coutinho, Eduardo; Ruan, Huabin; Li, Haifeng; Schuller, Björn; Yu, Xiaojie; Zhu, Xuan

    2016-01-01

    Coping with scarcity of labeled data is a common problem in sound classification tasks. Approaches for classifying sounds are commonly based on supervised learning algorithms, which require labeled data which is often scarce and leads to models that do not generalize well. In this paper, we make an efficient combination of confidence-based Active Learning and Self-Training with the aim of minimizing the need for human annotation for sound classification model training. The proposed method pre-processes the instances that are ready for labeling by calculating their classifier confidence scores, and then delivers the candidates with lower scores to human annotators, and those with high scores are automatically labeled by the machine. We demonstrate the feasibility and efficacy of this method in two practical scenarios: pool-based and stream-based processing. Extensive experimental results indicate that our approach requires significantly less labeled instances to reach the same performance in both scenarios compared to Passive Learning, Active Learning and Self-Training. A reduction of 52.2% in human labeled instances is achieved in both of the pool-based and stream-based scenarios on a sound classification task considering 16,930 sound instances.

  7. SAW Classification Algorithm for Chinese Text Classification

    OpenAIRE

    Xiaoli Guo; Huiyu Sun; Tiehua Zhou; Ling Wang; Zhaoyang Qu; Jiannan Zang

    2015-01-01

    Considering the explosive growth of data, the increased amount of text data’s effect on the performance of text categorization forward the need for higher requirements, such that the existing classification method cannot be satisfied. Based on the study of existing text classification technology and semantics, this paper puts forward a kind of Chinese text classification oriented SAW (Structural Auxiliary Word) algorithm. The algorithm uses the special space effect of Chinese text where words...

  8. Results of Evolution Supervised by Genetic Algorithms

    Directory of Open Access Journals (Sweden)

    Lorentz JÄNTSCHI

    2010-09-01

    Full Text Available The efficiency of a genetic algorithm is frequently assessed using a series of operators of evolution like crossover operators, mutation operators or other dynamic parameters. The present paper aimed to review the main results of evolution supervised by genetic algorithms used to identify solutions to agricultural and horticultural hard problems and to discuss the results of using a genetic algorithms on structure-activity relationships in terms of behavior of evolution supervised by genetic algorithms. A genetic algorithm had been developed and implemented in order to identify the optimal solution in term of estimation power of a multiple linear regression approach for structure-activity relationships. Three survival and three selection strategies (proportional, deterministic and tournament were investigated in order to identify the best survival-selection strategy able to lead to the model with higher estimation power. The Molecular Descriptors Family for structure characterization of a sample of 206 polychlorinated biphenyls with measured octanol-water partition coefficients was used as case study. Evolution using different selection and survival strategies proved to create populations of genotypes living in the evolution space with different diversity and variability. Under a series of criteria of comparisons these populations proved to be grouped and the groups were showed to be statistically different one to each other. The conclusions about genetic algorithm evolution according to a number of criteria were also highlighted.

  9. Supervised Classification High-Resolution Remote-Sensing Image Based on Interval Type-2 Fuzzy Membership Function

    Directory of Open Access Journals (Sweden)

    Chunyan Wang

    2018-05-01

    Full Text Available Because of the degradation of classification accuracy that is caused by the uncertainty of pixel class and classification decisions of high-resolution remote-sensing images, we proposed a supervised classification method that is based on an interval type-2 fuzzy membership function for high-resolution remote-sensing images. We analyze the data features of a high-resolution remote-sensing image and construct a type-1 membership function model in a homogenous region by supervised sampling in order to characterize the uncertainty of the pixel class. On the basis of the fuzzy membership function model in the homogeneous region and in accordance with the 3σ criterion of normal distribution, we proposed a method for modeling three types of interval type-2 membership functions and analyze the different types of functions to improve the uncertainty of pixel class expressed by the type-1 fuzzy membership function and to enhance the accuracy of classification decision. According to the principle that importance will increase with a decrease in the distance between the original, upper, and lower fuzzy membership of the training data and the corresponding frequency value in the histogram, we use the weighted average sum of three types of fuzzy membership as the new fuzzy membership of the pixel to be classified and then integrated into the neighborhood pixel relations, constructing a classification decision model. We use the proposed method to classify real high-resolution remote-sensing images and synthetic images. Additionally, we qualitatively and quantitatively evaluate the test results. The results show that a higher classification accuracy can be achieved with the proposed algorithm.

  10. MULTI-LABEL ASRS DATASET CLASSIFICATION USING SEMI-SUPERVISED SUBSPACE CLUSTERING

    Data.gov (United States)

    National Aeronautics and Space Administration — MULTI-LABEL ASRS DATASET CLASSIFICATION USING SEMI-SUPERVISED SUBSPACE CLUSTERING MOHAMMAD SALIM AHMED, LATIFUR KHAN, NIKUNJ OZA, AND MANDAVA RAJESWARI Abstract....

  11. Unsupervised Classification Using Immune Algorithm

    OpenAIRE

    Al-Muallim, M. T.; El-Kouatly, R.

    2012-01-01

    Unsupervised classification algorithm based on clonal selection principle named Unsupervised Clonal Selection Classification (UCSC) is proposed in this paper. The new proposed algorithm is data driven and self-adaptive, it adjusts its parameters to the data to make the classification operation as fast as possible. The performance of UCSC is evaluated by comparing it with the well known K-means algorithm using several artificial and real-life data sets. The experiments show that the proposed U...

  12. An evaluation of unsupervised and supervised learning algorithms for clustering landscape types in the United States

    Science.gov (United States)

    Wendel, Jochen; Buttenfield, Barbara P.; Stanislawski, Larry V.

    2016-01-01

    Knowledge of landscape type can inform cartographic generalization of hydrographic features, because landscape characteristics provide an important geographic context that affects variation in channel geometry, flow pattern, and network configuration. Landscape types are characterized by expansive spatial gradients, lacking abrupt changes between adjacent classes; and as having a limited number of outliers that might confound classification. The US Geological Survey (USGS) is exploring methods to automate generalization of features in the National Hydrography Data set (NHD), to associate specific sequences of processing operations and parameters with specific landscape characteristics, thus obviating manual selection of a unique processing strategy for every NHD watershed unit. A chronology of methods to delineate physiographic regions for the United States is described, including a recent maximum likelihood classification based on seven input variables. This research compares unsupervised and supervised algorithms applied to these seven input variables, to evaluate and possibly refine the recent classification. Evaluation metrics for unsupervised methods include the Davies–Bouldin index, the Silhouette index, and the Dunn index as well as quantization and topographic error metrics. Cross validation and misclassification rate analysis are used to evaluate supervised classification methods. The paper reports the comparative analysis and its impact on the selection of landscape regions. The compared solutions show problems in areas of high landscape diversity. There is some indication that additional input variables, additional classes, or more sophisticated methods can refine the existing classification.

  13. Empirical Studies On Machine Learning Based Text Classification Algorithms

    OpenAIRE

    Shweta C. Dharmadhikari; Maya Ingle; Parag Kulkarni

    2011-01-01

    Automatic classification of text documents has become an important research issue now days. Properclassification of text documents requires information retrieval, machine learning and Natural languageprocessing (NLP) techniques. Our aim is to focus on important approaches to automatic textclassification based on machine learning techniques viz. supervised, unsupervised and semi supervised.In this paper we present a review of various text classification approaches under machine learningparadig...

  14. Recursive automatic classification algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Bauman, E V; Dorofeyuk, A A

    1982-03-01

    A variational statement of the automatic classification problem is given. The dependence of the form of the optimal partition surface on the form of the classification objective functional is investigated. A recursive algorithm is proposed for maximising a functional of reasonably general form. The convergence problem is analysed in connection with the proposed algorithm. 8 references.

  15. Active semi-supervised learning method with hybrid deep belief networks.

    Science.gov (United States)

    Zhou, Shusen; Chen, Qingcai; Wang, Xiaolong

    2014-01-01

    In this paper, we develop a novel semi-supervised learning algorithm called active hybrid deep belief networks (AHD), to address the semi-supervised sentiment classification problem with deep learning. First, we construct the previous several hidden layers using restricted Boltzmann machines (RBM), which can reduce the dimension and abstract the information of the reviews quickly. Second, we construct the following hidden layers using convolutional restricted Boltzmann machines (CRBM), which can abstract the information of reviews effectively. Third, the constructed deep architecture is fine-tuned by gradient-descent based supervised learning with an exponential loss function. Finally, active learning method is combined based on the proposed deep architecture. We did several experiments on five sentiment classification datasets, and show that AHD is competitive with previous semi-supervised learning algorithm. Experiments are also conducted to verify the effectiveness of our proposed method with different number of labeled reviews and unlabeled reviews respectively.

  16. Classification of remotely sensed images

    CSIR Research Space (South Africa)

    Dudeni, N

    2008-10-01

    Full Text Available For this research, the researchers examine various existing image classification algorithms with the aim of demonstrating how these algorithms can be applied to remote sensing images. These algorithms are broadly divided into supervised...

  17. Machine learning algorithms for mode-of-action classification in toxicity assessment.

    Science.gov (United States)

    Zhang, Yile; Wong, Yau Shu; Deng, Jian; Anton, Cristina; Gabos, Stephan; Zhang, Weiping; Huang, Dorothy Yu; Jin, Can

    2016-01-01

    Real Time Cell Analysis (RTCA) technology is used to monitor cellular changes continuously over the entire exposure period. Combining with different testing concentrations, the profiles have potential in probing the mode of action (MOA) of the testing substances. In this paper, we present machine learning approaches for MOA assessment. Computational tools based on artificial neural network (ANN) and support vector machine (SVM) are developed to analyze the time-concentration response curves (TCRCs) of human cell lines responding to tested chemicals. The techniques are capable of learning data from given TCRCs with known MOA information and then making MOA classification for the unknown toxicity. A novel data processing step based on wavelet transform is introduced to extract important features from the original TCRC data. From the dose response curves, time interval leading to higher classification success rate can be selected as input to enhance the performance of the machine learning algorithm. This is particularly helpful when handling cases with limited and imbalanced data. The validation of the proposed method is demonstrated by the supervised learning algorithm applied to the exposure data of HepG2 cell line to 63 chemicals with 11 concentrations in each test case. Classification success rate in the range of 85 to 95 % are obtained using SVM for MOA classification with two clusters to cases up to four clusters. Wavelet transform is capable of capturing important features of TCRCs for MOA classification. The proposed SVM scheme incorporated with wavelet transform has a great potential for large scale MOA classification and high-through output chemical screening.

  18. Phenotype classification of zebrafish embryos by supervised learning.

    Directory of Open Access Journals (Sweden)

    Nathalie Jeanray

    Full Text Available Zebrafish is increasingly used to assess biological properties of chemical substances and thus is becoming a specific tool for toxicological and pharmacological studies. The effects of chemical substances on embryo survival and development are generally evaluated manually through microscopic observation by an expert and documented by several typical photographs. Here, we present a methodology to automatically classify brightfield images of wildtype zebrafish embryos according to their defects by using an image analysis approach based on supervised machine learning. We show that, compared to manual classification, automatic classification results in 90 to 100% agreement with consensus voting of biological experts in nine out of eleven considered defects in 3 days old zebrafish larvae. Automation of the analysis and classification of zebrafish embryo pictures reduces the workload and time required for the biological expert and increases the reproducibility and objectivity of this classification.

  19. Graph-based semi-supervised learning with genomic data integration using condition-responsive genes applied to phenotype classification.

    Science.gov (United States)

    Doostparast Torshizi, Abolfazl; Petzold, Linda R

    2018-01-01

    Data integration methods that combine data from different molecular levels such as genome, epigenome, transcriptome, etc., have received a great deal of interest in the past few years. It has been demonstrated that the synergistic effects of different biological data types can boost learning capabilities and lead to a better understanding of the underlying interactions among molecular levels. In this paper we present a graph-based semi-supervised classification algorithm that incorporates latent biological knowledge in the form of biological pathways with gene expression and DNA methylation data. The process of graph construction from biological pathways is based on detecting condition-responsive genes, where 3 sets of genes are finally extracted: all condition responsive genes, high-frequency condition-responsive genes, and P-value-filtered genes. The proposed approach is applied to ovarian cancer data downloaded from the Human Genome Atlas. Extensive numerical experiments demonstrate superior performance of the proposed approach compared to other state-of-the-art algorithms, including the latest graph-based classification techniques. Simulation results demonstrate that integrating various data types enhances classification performance and leads to a better understanding of interrelations between diverse omics data types. The proposed approach outperforms many of the state-of-the-art data integration algorithms. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  20. Global Optimization Ensemble Model for Classification Methods

    Science.gov (United States)

    Anwar, Hina; Qamar, Usman; Muzaffar Qureshi, Abdul Wahab

    2014-01-01

    Supervised learning is the process of data mining for deducing rules from training datasets. A broad array of supervised learning algorithms exists, every one of them with its own advantages and drawbacks. There are some basic issues that affect the accuracy of classifier while solving a supervised learning problem, like bias-variance tradeoff, dimensionality of input space, and noise in the input data space. All these problems affect the accuracy of classifier and are the reason that there is no global optimal method for classification. There is not any generalized improvement method that can increase the accuracy of any classifier while addressing all the problems stated above. This paper proposes a global optimization ensemble model for classification methods (GMC) that can improve the overall accuracy for supervised learning problems. The experimental results on various public datasets showed that the proposed model improved the accuracy of the classification models from 1% to 30% depending upon the algorithm complexity. PMID:24883382

  1. Global Optimization Ensemble Model for Classification Methods

    Directory of Open Access Journals (Sweden)

    Hina Anwar

    2014-01-01

    Full Text Available Supervised learning is the process of data mining for deducing rules from training datasets. A broad array of supervised learning algorithms exists, every one of them with its own advantages and drawbacks. There are some basic issues that affect the accuracy of classifier while solving a supervised learning problem, like bias-variance tradeoff, dimensionality of input space, and noise in the input data space. All these problems affect the accuracy of classifier and are the reason that there is no global optimal method for classification. There is not any generalized improvement method that can increase the accuracy of any classifier while addressing all the problems stated above. This paper proposes a global optimization ensemble model for classification methods (GMC that can improve the overall accuracy for supervised learning problems. The experimental results on various public datasets showed that the proposed model improved the accuracy of the classification models from 1% to 30% depending upon the algorithm complexity.

  2. CLASSIFICATION ALGORITHMS FOR BIG DATA ANALYSIS, A MAP REDUCE APPROACH

    Directory of Open Access Journals (Sweden)

    V. A. Ayma

    2015-03-01

    Full Text Available Since many years ago, the scientific community is concerned about how to increase the accuracy of different classification methods, and major achievements have been made so far. Besides this issue, the increasing amount of data that is being generated every day by remote sensors raises more challenges to be overcome. In this work, a tool within the scope of InterIMAGE Cloud Platform (ICP, which is an open-source, distributed framework for automatic image interpretation, is presented. The tool, named ICP: Data Mining Package, is able to perform supervised classification procedures on huge amounts of data, usually referred as big data, on a distributed infrastructure using Hadoop MapReduce. The tool has four classification algorithms implemented, taken from WEKA’s machine learning library, namely: Decision Trees, Naïve Bayes, Random Forest and Support Vector Machines (SVM. The results of an experimental analysis using a SVM classifier on data sets of different sizes for different cluster configurations demonstrates the potential of the tool, as well as aspects that affect its performance.

  3. Supervised Classification Performance of Multispectral Images

    OpenAIRE

    Perumal, K.; Bhaskaran, R.

    2010-01-01

    Nowadays government and private agencies use remote sensing imagery for a wide range of applications from military applications to farm development. The images may be a panchromatic, multispectral, hyperspectral or even ultraspectral of terra bytes. Remote sensing image classification is one amongst the most significant application worlds for remote sensing. A few number of image classification algorithms have proved good precision in classifying remote sensing data. But, of late, due to the ...

  4. Classification of autism spectrum disorder using supervised learning of brain connectivity measures extracted from synchrostates

    Science.gov (United States)

    Jamal, Wasifa; Das, Saptarshi; Oprescu, Ioana-Anastasia; Maharatna, Koushik; Apicella, Fabio; Sicca, Federico

    2014-08-01

    Objective. The paper investigates the presence of autism using the functional brain connectivity measures derived from electro-encephalogram (EEG) of children during face perception tasks. Approach. Phase synchronized patterns from 128-channel EEG signals are obtained for typical children and children with autism spectrum disorder (ASD). The phase synchronized states or synchrostates temporally switch amongst themselves as an underlying process for the completion of a particular cognitive task. We used 12 subjects in each group (ASD and typical) for analyzing their EEG while processing fearful, happy and neutral faces. The minimal and maximally occurring synchrostates for each subject are chosen for extraction of brain connectivity features, which are used for classification between these two groups of subjects. Among different supervised learning techniques, we here explored the discriminant analysis and support vector machine both with polynomial kernels for the classification task. Main results. The leave one out cross-validation of the classification algorithm gives 94.7% accuracy as the best performance with corresponding sensitivity and specificity values as 85.7% and 100% respectively. Significance. The proposed method gives high classification accuracies and outperforms other contemporary research results. The effectiveness of the proposed method for classification of autistic and typical children suggests the possibility of using it on a larger population to validate it for clinical practice.

  5. Exploiting unsupervised and supervised classification for segmentation of the pathological lung in CT

    International Nuclear Information System (INIS)

    Korfiatis, P; Costaridou, L; Kalogeropoulou, C; Petsas, T; Daoussis, D; Adonopoulos, A

    2009-01-01

    Delineation of lung fields in presence of diffuse lung diseases (DLPDs), such as interstitial pneumonias (IP), challenges segmentation algorithms. To deal with IP patterns affecting the lung border an automated image texture classification scheme is proposed. The proposed segmentation scheme is based on supervised texture classification between lung tissue (normal and abnormal) and surrounding tissue (pleura and thoracic wall) in the lung border region. This region is coarsely defined around an initial estimate of lung border, provided by means of Markov Radom Field modeling and morphological operations. Subsequently, a support vector machine classifier was trained to distinguish between the above two classes of tissue, using textural feature of gray scale and wavelet domains. 17 patients diagnosed with IP, secondary to connective tissue diseases were examined. Segmentation performance in terms of overlap was 0.924±0.021, and for shape differentiation mean, rms and maximum distance were 1.663±0.816, 2.334±1.574 and 8.0515±6.549 mm, respectively. An accurate, automated scheme is proposed for segmenting abnormal lung fields in HRC affected by IP

  6. Exploiting unsupervised and supervised classification for segmentation of the pathological lung in CT

    Science.gov (United States)

    Korfiatis, P.; Kalogeropoulou, C.; Daoussis, D.; Petsas, T.; Adonopoulos, A.; Costaridou, L.

    2009-07-01

    Delineation of lung fields in presence of diffuse lung diseases (DLPDs), such as interstitial pneumonias (IP), challenges segmentation algorithms. To deal with IP patterns affecting the lung border an automated image texture classification scheme is proposed. The proposed segmentation scheme is based on supervised texture classification between lung tissue (normal and abnormal) and surrounding tissue (pleura and thoracic wall) in the lung border region. This region is coarsely defined around an initial estimate of lung border, provided by means of Markov Radom Field modeling and morphological operations. Subsequently, a support vector machine classifier was trained to distinguish between the above two classes of tissue, using textural feature of gray scale and wavelet domains. 17 patients diagnosed with IP, secondary to connective tissue diseases were examined. Segmentation performance in terms of overlap was 0.924±0.021, and for shape differentiation mean, rms and maximum distance were 1.663±0.816, 2.334±1.574 and 8.0515±6.549 mm, respectively. An accurate, automated scheme is proposed for segmenting abnormal lung fields in HRC affected by IP

  7. Multi-Modal Curriculum Learning for Semi-Supervised Image Classification.

    Science.gov (United States)

    Gong, Chen; Tao, Dacheng; Maybank, Stephen J; Liu, Wei; Kang, Guoliang; Yang, Jie

    2016-07-01

    Semi-supervised image classification aims to classify a large quantity of unlabeled images by typically harnessing scarce labeled images. Existing semi-supervised methods often suffer from inadequate classification accuracy when encountering difficult yet critical images, such as outliers, because they treat all unlabeled images equally and conduct classifications in an imperfectly ordered sequence. In this paper, we employ the curriculum learning methodology by investigating the difficulty of classifying every unlabeled image. The reliability and the discriminability of these unlabeled images are particularly investigated for evaluating their difficulty. As a result, an optimized image sequence is generated during the iterative propagations, and the unlabeled images are logically classified from simple to difficult. Furthermore, since images are usually characterized by multiple visual feature descriptors, we associate each kind of features with a teacher, and design a multi-modal curriculum learning (MMCL) strategy to integrate the information from different feature modalities. In each propagation, each teacher analyzes the difficulties of the currently unlabeled images from its own modality viewpoint. A consensus is subsequently reached among all the teachers, determining the currently simplest images (i.e., a curriculum), which are to be reliably classified by the multi-modal learner. This well-organized propagation process leveraging multiple teachers and one learner enables our MMCL to outperform five state-of-the-art methods on eight popular image data sets.

  8. The Costs of Supervised Classification: The Effect of Learning Task on Conceptual Flexibility

    Science.gov (United States)

    Hoffman, Aaron B.; Rehder, Bob

    2010-01-01

    Research has shown that learning a concept via standard supervised classification leads to a focus on diagnostic features, whereas learning by inferring missing features promotes the acquisition of within-category information. Accordingly, we predicted that classification learning would produce a deficit in people's ability to draw "novel…

  9. Online co-regularized algorithms

    NARCIS (Netherlands)

    Ruijter, T. de; Tsivtsivadze, E.; Heskes, T.

    2012-01-01

    We propose an online co-regularized learning algorithm for classification and regression tasks. We demonstrate that by sequentially co-regularizing prediction functions on unlabeled data points, our algorithm provides improved performance in comparison to supervised methods on several UCI benchmarks

  10. Multi-Label Classification by Semi-Supervised Singular Value Decomposition.

    Science.gov (United States)

    Jing, Liping; Shen, Chenyang; Yang, Liu; Yu, Jian; Ng, Michael K

    2017-10-01

    Multi-label problems arise in various domains, including automatic multimedia data categorization, and have generated significant interest in computer vision and machine learning community. However, existing methods do not adequately address two key challenges: exploiting correlations between labels and making up for the lack of labelled data or even missing labelled data. In this paper, we proposed to use a semi-supervised singular value decomposition (SVD) to handle these two challenges. The proposed model takes advantage of the nuclear norm regularization on the SVD to effectively capture the label correlations. Meanwhile, it introduces manifold regularization on mapping to capture the intrinsic structure among data, which provides a good way to reduce the required labelled data with improving the classification performance. Furthermore, we designed an efficient algorithm to solve the proposed model based on the alternating direction method of multipliers, and thus, it can efficiently deal with large-scale data sets. Experimental results for synthetic and real-world multimedia data sets demonstrate that the proposed method can exploit the label correlations and obtain promising and better label prediction results than the state-of-the-art methods.

  11. Automated segmentation of geographic atrophy in fundus autofluorescence images using supervised pixel classification.

    Science.gov (United States)

    Hu, Zhihong; Medioni, Gerard G; Hernandez, Matthias; Sadda, Srinivas R

    2015-01-01

    Geographic atrophy (GA) is a manifestation of the advanced or late stage of age-related macular degeneration (AMD). AMD is the leading cause of blindness in people over the age of 65 in the western world. The purpose of this study is to develop a fully automated supervised pixel classification approach for segmenting GA, including uni- and multifocal patches in fundus autofluorescene (FAF) images. The image features include region-wise intensity measures, gray-level co-occurrence matrix measures, and Gaussian filter banks. A [Formula: see text]-nearest-neighbor pixel classifier is applied to obtain a GA probability map, representing the likelihood that the image pixel belongs to GA. Sixteen randomly chosen FAF images were obtained from 16 subjects with GA. The algorithm-defined GA regions are compared with manual delineation performed by a certified image reading center grader. Eight-fold cross-validation is applied to evaluate the algorithm performance. The mean overlap ratio (OR), area correlation (Pearson's [Formula: see text]), accuracy (ACC), true positive rate (TPR), specificity (SPC), positive predictive value (PPV), and false discovery rate (FDR) between the algorithm- and manually defined GA regions are [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text], respectively.

  12. A Novel Approach to Developing a Supervised Spatial Decision Support System for Image Classification: A Study of Paddy Rice Investigation

    Directory of Open Access Journals (Sweden)

    Shih-Hsun Chang

    2014-01-01

    Full Text Available Paddy rice area estimation via remote sensing techniques has been well established in recent years. Texture information and vegetation indicators are widely used to improve the classification accuracy of satellite images. Accordingly, this study employs texture information and vegetation indicators as ancillary information for classifying paddy rice through remote sensing images. In the first stage, the images are attained using a remote sensing technique and ancillary information is employed to increase the accuracy of classification. In the second stage, we decide to construct an efficient supervised classifier, which is used to evaluate the ancillary information. In the third stage, linear discriminant analysis (LDA is introduced. LDA is a well-known method for classifying images to various categories. Also, the particle swarm optimization (PSO algorithm is employed to optimize the LDA classification outcomes and increase classification performance. In the fourth stage, we discuss the strategy of selecting different window sizes and analyze particle numbers and iteration numbers with corresponding accuracy. Accordingly, a rational strategy for the combination of ancillary information is introduced. Afterwards, the PSO algorithm improves the accuracy rate from 82.26% to 89.31%. The improved accuracy results in a much lower salt-and-pepper effect in the thematic map.

  13. Supervised and Unsupervised Classification for Pattern Recognition Purposes

    Directory of Open Access Journals (Sweden)

    Catalina COCIANU

    2006-01-01

    Full Text Available A cluster analysis task has to identify the grouping trends of data, to decide on the sound clusters as well as to validate somehow the resulted structure. The identification of the grouping tendency existing in a data collection assumes the selection of a framework stated in terms of a mathematical model allowing to express the similarity degree between couples of particular objects, quasi-metrics expressing the similarity between an object an a cluster and between clusters, respectively. In supervised classification, we are provided with a collection of preclassified patterns, and the problem is to label a newly encountered pattern. Typically, the given training patterns are used to learn the descriptions of classes which in turn are used to label a new pattern. The final section of the paper presents a new methodology for supervised learning based on PCA. The classes are represented in the measurement/feature space by a continuous repartitions

  14. Supervised machine learning and active learning in classification of radiology reports.

    Science.gov (United States)

    Nguyen, Dung H M; Patrick, Jon D

    2014-01-01

    This paper presents an automated system for classifying the results of imaging examinations (CT, MRI, positron emission tomography) into reportable and non-reportable cancer cases. This system is part of an industrial-strength processing pipeline built to extract content from radiology reports for use in the Victorian Cancer Registry. In addition to traditional supervised learning methods such as conditional random fields and support vector machines, active learning (AL) approaches were investigated to optimize training production and further improve classification performance. The project involved two pilot sites in Victoria, Australia (Lake Imaging (Ballarat) and Peter MacCallum Cancer Centre (Melbourne)) and, in collaboration with the NSW Central Registry, one pilot site at Westmead Hospital (Sydney). The reportability classifier performance achieved 98.25% sensitivity and 96.14% specificity on the cancer registry's held-out test set. Up to 92% of training data needed for supervised machine learning can be saved by AL. AL is a promising method for optimizing the supervised training production used in classification of radiology reports. When an AL strategy is applied during the data selection process, the cost of manual classification can be reduced significantly. The most important practical application of the reportability classifier is that it can dramatically reduce human effort in identifying relevant reports from the large imaging pool for further investigation of cancer. The classifier is built on a large real-world dataset and can achieve high performance in filtering relevant reports to support cancer registries. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  15. Semi-Supervised Learning for Classification of Protein Sequence Data

    Directory of Open Access Journals (Sweden)

    Brian R. King

    2008-01-01

    Full Text Available Protein sequence data continue to become available at an exponential rate. Annotation of functional and structural attributes of these data lags far behind, with only a small fraction of the data understood and labeled by experimental methods. Classification methods that are based on semi-supervised learning can increase the overall accuracy of classifying partly labeled data in many domains, but very few methods exist that have shown their effect on protein sequence classification. We show how proven methods from text classification can be applied to protein sequence data, as we consider both existing and novel extensions to the basic methods, and demonstrate restrictions and differences that must be considered. We demonstrate comparative results against the transductive support vector machine, and show superior results on the most difficult classification problems. Our results show that large repositories of unlabeled protein sequence data can indeed be used to improve predictive performance, particularly in situations where there are fewer labeled protein sequences available, and/or the data are highly unbalanced in nature.

  16. Sow-activity classification from acceleration patterns

    DEFF Research Database (Denmark)

    Escalante, Hugo Jair; Rodriguez, Sara V.; Cordero, Jorge

    2013-01-01

    sow-activity classification can be approached with standard machine learning methods for pattern classification. Individual predictions for elements of times series of arbitrary length are combined to classify it as a whole. An extensive comparison of representative learning algorithms, including......This paper describes a supervised learning approach to sow-activity classification from accelerometer measurements. In the proposed methodology, pairs of accelerometer measurements and activity types are considered as labeled instances of a usual supervised classification task. Under this scenario...... neural networks, support vector machines, and ensemble methods, is presented. Experimental results are reported using a data set for sow-activity classification collected in a real production herd. The data set, which has been widely used in related works, includes measurements from active (Feeding...

  17. Feasibility study of stain-free classification of cell apoptosis based on diffraction imaging flow cytometry and supervised machine learning techniques.

    Science.gov (United States)

    Feng, Jingwen; Feng, Tong; Yang, Chengwen; Wang, Wei; Sa, Yu; Feng, Yuanming

    2018-06-01

    This study was to explore the feasibility of prediction and classification of cells in different stages of apoptosis with a stain-free method based on diffraction images and supervised machine learning. Apoptosis was induced in human chronic myelogenous leukemia K562 cells by cis-platinum (DDP). A newly developed technique of polarization diffraction imaging flow cytometry (p-DIFC) was performed to acquire diffraction images of the cells in three different statuses (viable, early apoptotic and late apoptotic/necrotic) after cell separation through fluorescence activated cell sorting with Annexin V-PE and SYTOX® Green double staining. The texture features of the diffraction images were extracted with in-house software based on the Gray-level co-occurrence matrix algorithm to generate datasets for cell classification with supervised machine learning method. Therefore, this new method has been verified in hydrogen peroxide induced apoptosis model of HL-60. Results show that accuracy of higher than 90% was achieved respectively in independent test datasets from each cell type based on logistic regression with ridge estimators, which indicated that p-DIFC system has a great potential in predicting and classifying cells in different stages of apoptosis.

  18. Supervised Gaussian mixture model based remote sensing image ...

    African Journals Online (AJOL)

    Using the supervised classification technique, both simulated and empirical satellite remote sensing data are used to train and test the Gaussian mixture model algorithm. For the purpose of validating the experiment, the resulting classified satellite image is compared with the ground truth data. For the simulated modelling, ...

  19. Non supervised classification of vegetable covers on digital images of remote sensors: Landsat - ETM+

    International Nuclear Information System (INIS)

    Arango Gutierrez, Mauricio; Branch Bedoya, John William; Botero Fernandez, Veronica

    2005-01-01

    The plant species diversity in Colombia and the lack of inventory of them suggests the need for a process that facilitates the work of investigators in these disciplines. Remote satellite sensors such as landsat ETM+ and non-supervised artificial intelligence techniques, such as self-organizing maps - SOM, could provide viable alternatives for advancing in the rapid obtaining of information related to zones with different vegetative covers in the national geography. The zone proposed for the study case was classified in a supervised form by the method of maximum likelihood by another investigation in forest sciences and eight types of vegetative covers were discriminated. This information served as a base line to evaluate the performance of the non-supervised sort keys isodata and SOM. However, the information that the images provided had to first be purified according to the criteria of use and data quality, so that adequate information for these non-supervised methods were used. For this, several concepts were used; such as, image statistics, spectral behavior of the vegetative communities, sensor characteristics and the average divergence that allowed to define the best bands and their combinations. Principal component analysis was applied to these to reduce to the number of data while conserving a large percentage of the information. The non-supervised techniques were applied to these purified data, modifying some parameters that could yield a better convergence of the methods. The results obtained were compared with the supervised classification via confusion matrices and it was concluded that there was not a good convergence of non-supervised classification methods with this process for the case of vegetative covers

  20. [Quantitative classification in catering trade and countermeasures of supervision and management in Hunan Province].

    Science.gov (United States)

    Liu, Xiulan; Chen, Lizhang; He, Xiang

    2012-02-01

    To analyze the status quo of quantitative classification in Hunan Province catering industry, and to discuss the countermeasures in-depth. According to relevant laws and regulations, and after referring to Daily supervision and quantitative scoring sheet and consulting experts, a checklist of key supervision indicators was made. The implementation of quantitative classification in 10 cities in Hunan Province was studied, and the status quo was analyzed. All the 390 catering units implemented quantitative classified management. The larger the catering enterprise, the higher level of quantitative classification. In addition to cafeterias, the smaller the catering units, the higher point of deduction, and snack bars and beverage stores were the highest. For those quantified and classified as C and D, the point of deduction was higher in the procurement and storage of raw materials, operation processing and other aspects. The quantitative classification of Hunan Province has relatively wide coverage. There are hidden risks in food security in small catering units, snack bars, and beverage stores. The food hygienic condition of Hunan Province needs to be improved.

  1. Active relearning for robust supervised classification of pulmonary emphysema

    Science.gov (United States)

    Raghunath, Sushravya; Rajagopalan, Srinivasan; Karwoski, Ronald A.; Bartholmai, Brian J.; Robb, Richard A.

    2012-03-01

    Radiologists are adept at recognizing the appearance of lung parenchymal abnormalities in CT scans. However, the inconsistent differential diagnosis, due to subjective aggregation, mandates supervised classification. Towards optimizing Emphysema classification, we introduce a physician-in-the-loop feedback approach in order to minimize uncertainty in the selected training samples. Using multi-view inductive learning with the training samples, an ensemble of Support Vector Machine (SVM) models, each based on a specific pair-wise dissimilarity metric, was constructed in less than six seconds. In the active relearning phase, the ensemble-expert label conflicts were resolved by an expert. This just-in-time feedback with unoptimized SVMs yielded 15% increase in classification accuracy and 25% reduction in the number of support vectors. The generality of relearning was assessed in the optimized parameter space of six different classifiers across seven dissimilarity metrics. The resultant average accuracy improved to 21%. The co-operative feedback method proposed here could enhance both diagnostic and staging throughput efficiency in chest radiology practice.

  2. Diagnostic information system dynamics in the evaluation of machine learning algorithms for the supervision of energy efficiency of district heating-supplied buildings

    International Nuclear Information System (INIS)

    Kiluk, Sebastian

    2017-01-01

    Highlights: • Energy efficiency classification sustainability benefits from knowledge prediction. • Diagnostic classification can be validated with its dynamics and current data. • Diagnostic classification dynamics provides novelty extraction for knowledge update. • Data mining comparison can be performed with knowledge dynamics and uncertainty. • Diagnostic information refinement benefits form comparing classifiers dynamics. - Abstract: Modern ways of exploring the diagnostic knowledge provided by data mining and machine learning raise some concern about the ways of evaluating the quality of output knowledge, usually represented by information systems. Especially in district heating, the stationarity of efficiency models, and thus the relevance of diagnostic classification system, cannot be ensured due to the impact of social, economic or technological changes, which are hard to identify or predict. Therefore, data mining and machine learning have become an attractive strategy for automatically and continuously absorbing such dynamics. This paper presents a new method of evaluation and comparison of diagnostic information systems gathered algorithmically in district heating efficiency supervision based on exploring the evolution of information system and analyzing its dynamic features. The process of data mining and knowledge discovery was applied to the data acquired from district heating substations’ energy meters to provide the automated discovery of diagnostic knowledge base necessary for the efficiency supervision of district heating-supplied buildings. The implemented algorithm consists of several steps of processing the billing data, including preparation, segmentation, aggregation and knowledge discovery stage, where classes of abstract models representing energy efficiency constitute an information system representing diagnostic knowledge about the energy efficiency of buildings favorably operating under similar climate conditions and

  3. Fault Diagnosis of Supervision and Homogenization Distance Based on Local Linear Embedding Algorithm

    Directory of Open Access Journals (Sweden)

    Guangbin Wang

    2015-01-01

    Full Text Available In view of the problems of uneven distribution of reality fault samples and dimension reduction effect of locally linear embedding (LLE algorithm which is easily affected by neighboring points, an improved local linear embedding algorithm of homogenization distance (HLLE is developed. The method makes the overall distribution of sample points tend to be homogenization and reduces the influence of neighboring points using homogenization distance instead of the traditional Euclidean distance. It is helpful to choose effective neighboring points to construct weight matrix for dimension reduction. Because the fault recognition performance improvement of HLLE is limited and unstable, the paper further proposes a new local linear embedding algorithm of supervision and homogenization distance (SHLLE by adding the supervised learning mechanism. On the basis of homogenization distance, supervised learning increases the category information of sample points so that the same category of sample points will be gathered and the heterogeneous category of sample points will be scattered. It effectively improves the performance of fault diagnosis and maintains stability at the same time. A comparison of the methods mentioned above was made by simulation experiment with rotor system fault diagnosis, and the results show that SHLLE algorithm has superior fault recognition performance.

  4. Supervised learning classification models for prediction of plant virus encoded RNA silencing suppressors.

    Directory of Open Access Journals (Sweden)

    Zeenia Jagga

    Full Text Available Viral encoded RNA silencing suppressor proteins interfere with the host RNA silencing machinery, facilitating viral infection by evading host immunity. In plant hosts, the viral proteins have several basic science implications and biotechnology applications. However in silico identification of these proteins is limited by their high sequence diversity. In this study we developed supervised learning based classification models for plant viral RNA silencing suppressor proteins in plant viruses. We developed four classifiers based on supervised learning algorithms: J48, Random Forest, LibSVM and Naïve Bayes algorithms, with enriched model learning by correlation based feature selection. Structural and physicochemical features calculated for experimentally verified primary protein sequences were used to train the classifiers. The training features include amino acid composition; auto correlation coefficients; composition, transition, and distribution of various physicochemical properties; and pseudo amino acid composition. Performance analysis of predictive models based on 10 fold cross-validation and independent data testing revealed that the Random Forest based model was the best and achieved 86.11% overall accuracy and 86.22% balanced accuracy with a remarkably high area under the Receivers Operating Characteristic curve of 0.95 to predict viral RNA silencing suppressor proteins. The prediction models for plant viral RNA silencing suppressors can potentially aid identification of novel viral RNA silencing suppressors, which will provide valuable insights into the mechanism of RNA silencing and could be further explored as potential targets for designing novel antiviral therapeutics. Also, the key subset of identified optimal features may help in determining compositional patterns in the viral proteins which are important determinants for RNA silencing suppressor activities. The best prediction model developed in the study is available as a

  5. Genetic classification of populations using supervised learning.

    Directory of Open Access Journals (Sweden)

    Michael Bridges

    2011-05-01

    Full Text Available There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case-control studies and quality control (when participants in a study have been genotyped at different laboratories. This latter application is of particular importance in the era of large scale genome wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations. are termed unsupervised. Supervised methods, on the other hand are able to utilise this prior knowledge when it is available.In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods, (neural networks and support vector machines to the classification of three populations (two from Scotland and one from Bulgaria. The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results that a supervised learning approach should be the method of choice when classifying individuals into pre-defined populations, particularly in quality control for large scale genome wide association studies.

  6. Genetic classification of populations using supervised learning.

    LENUS (Irish Health Repository)

    Bridges, Michael

    2011-01-01

    There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case-control studies and quality control (when participants in a study have been genotyped at different laboratories). This latter application is of particular importance in the era of large scale genome wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations. are termed unsupervised. Supervised methods, on the other hand are able to utilise this prior knowledge when it is available.In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods, (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results that a supervised learning approach should be the method of choice when classifying individuals into pre-defined populations, particularly in quality control for large scale genome wide association studies.

  7. Semi-supervised spectral algorithms for community detection in complex networks based on equivalence of clustering methods

    Science.gov (United States)

    Ma, Xiaoke; Wang, Bingbo; Yu, Liang

    2018-01-01

    Community detection is fundamental for revealing the structure-functionality relationship in complex networks, which involves two issues-the quantitative function for community as well as algorithms to discover communities. Despite significant research on either of them, few attempt has been made to establish the connection between the two issues. To attack this problem, a generalized quantification function is proposed for community in weighted networks, which provides a framework that unifies several well-known measures. Then, we prove that the trace optimization of the proposed measure is equivalent with the objective functions of algorithms such as nonnegative matrix factorization, kernel K-means as well as spectral clustering. It serves as the theoretical foundation for designing algorithms for community detection. On the second issue, a semi-supervised spectral clustering algorithm is developed by exploring the equivalence relation via combining the nonnegative matrix factorization and spectral clustering. Different from the traditional semi-supervised algorithms, the partial supervision is integrated into the objective of the spectral algorithm. Finally, through extensive experiments on both artificial and real world networks, we demonstrate that the proposed method improves the accuracy of the traditional spectral algorithms in community detection.

  8. Semi-supervised prediction of gene regulatory networks using machine learning algorithms.

    Science.gov (United States)

    Patel, Nihir; Wang, Jason T L

    2015-10-01

    Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we propose semi-supervised methods for GRN prediction by utilizing two machine learning algorithms, namely, support vector machines (SVM) and random forests (RF). The semi-supervised methods make use of unlabelled data for training. We investigated inductive and transductive learning approaches, both of which adopt an iterative procedure to obtain reliable negative training data from the unlabelled data. We then applied our semi-supervised methods to gene expression data of Escherichia coli and Saccharomyces cerevisiae, and evaluated the performance of our methods using the expression data. Our analysis indicated that the transductive learning approach outperformed the inductive learning approach for both organisms. However, there was no conclusive difference identified in the performance of SVM and RF. Experimental results also showed that the proposed semi-supervised methods performed better than existing supervised methods for both organisms.

  9. Supervised Filter Learning for Representation Based Face Recognition.

    Directory of Open Access Journals (Sweden)

    Chao Bi

    Full Text Available Representation based classification methods, such as Sparse Representation Classification (SRC and Linear Regression Classification (LRC have been developed for face recognition problem successfully. However, most of these methods use the original face images without any preprocessing for recognition. Thus, their performances may be affected by some problematic factors (such as illumination and expression variances in the face images. In order to overcome this limitation, a novel supervised filter learning algorithm is proposed for representation based face recognition in this paper. The underlying idea of our algorithm is to learn a filter so that the within-class representation residuals of the faces' Local Binary Pattern (LBP features are minimized and the between-class representation residuals of the faces' LBP features are maximized. Therefore, the LBP features of filtered face images are more discriminative for representation based classifiers. Furthermore, we also extend our algorithm for heterogeneous face recognition problem. Extensive experiments are carried out on five databases and the experimental results verify the efficacy of the proposed algorithm.

  10. An Automated Algorithm to Screen Massive Training Samples for a Global Impervious Surface Classification

    Science.gov (United States)

    Tan, Bin; Brown de Colstoun, Eric; Wolfe, Robert E.; Tilton, James C.; Huang, Chengquan; Smith, Sarah E.

    2012-01-01

    An algorithm is developed to automatically screen the outliers from massive training samples for Global Land Survey - Imperviousness Mapping Project (GLS-IMP). GLS-IMP is to produce a global 30 m spatial resolution impervious cover data set for years 2000 and 2010 based on the Landsat Global Land Survey (GLS) data set. This unprecedented high resolution impervious cover data set is not only significant to the urbanization studies but also desired by the global carbon, hydrology, and energy balance researches. A supervised classification method, regression tree, is applied in this project. A set of accurate training samples is the key to the supervised classifications. Here we developed the global scale training samples from 1 m or so resolution fine resolution satellite data (Quickbird and Worldview2), and then aggregate the fine resolution impervious cover map to 30 m resolution. In order to improve the classification accuracy, the training samples should be screened before used to train the regression tree. It is impossible to manually screen 30 m resolution training samples collected globally. For example, in Europe only, there are 174 training sites. The size of the sites ranges from 4.5 km by 4.5 km to 8.1 km by 3.6 km. The amount training samples are over six millions. Therefore, we develop this automated statistic based algorithm to screen the training samples in two levels: site and scene level. At the site level, all the training samples are divided to 10 groups according to the percentage of the impervious surface within a sample pixel. The samples following in each 10% forms one group. For each group, both univariate and multivariate outliers are detected and removed. Then the screen process escalates to the scene level. A similar screen process but with a looser threshold is applied on the scene level considering the possible variance due to the site difference. We do not perform the screen process across the scenes because the scenes might vary due to

  11. An immune-inspired semi-supervised algorithm for breast cancer diagnosis.

    Science.gov (United States)

    Peng, Lingxi; Chen, Wenbin; Zhou, Wubai; Li, Fufang; Yang, Jin; Zhang, Jiandong

    2016-10-01

    Breast cancer is the most frequently and world widely diagnosed life-threatening cancer, which is the leading cause of cancer death among women. Early accurate diagnosis can be a big plus in treating breast cancer. Researchers have approached this problem using various data mining and machine learning techniques such as support vector machine, artificial neural network, etc. The computer immunology is also an intelligent method inspired by biological immune system, which has been successfully applied in pattern recognition, combination optimization, machine learning, etc. However, most of these diagnosis methods belong to a supervised diagnosis method. It is very expensive to obtain labeled data in biology and medicine. In this paper, we seamlessly integrate the state-of-the-art research on life science with artificial intelligence, and propose a semi-supervised learning algorithm to reduce the need for labeled data. We use two well-known benchmark breast cancer datasets in our study, which are acquired from the UCI machine learning repository. Extensive experiments are conducted and evaluated on those two datasets. Our experimental results demonstrate the effectiveness and efficiency of our proposed algorithm, which proves that our algorithm is a promising automatic diagnosis method for breast cancer. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  12. Toward Determination of Venous Thrombosis Ages by Using Fuzzy Logic and Supervised Bayes Classification

    National Research Council Canada - National Science Library

    Lim, P

    2001-01-01

    .... Thus, the proposed learning base is constructed in a 3-tuple: observation, label, membership value in term of fuzzy logic for each class and not a 2-tuple as in the usual supervised Bayes classification application...

  13. Supervised Self-Organizing Classification of Superresolution ISAR Images: An Anechoic Chamber Experiment

    Directory of Open Access Journals (Sweden)

    Radoi Emanuel

    2006-01-01

    Full Text Available The problem of the automatic classification of superresolution ISAR images is addressed in the paper. We describe an anechoic chamber experiment involving ten-scale-reduced aircraft models. The radar images of these targets are reconstructed using MUSIC-2D (multiple signal classification method coupled with two additional processing steps: phase unwrapping and symmetry enhancement. A feature vector is then proposed including Fourier descriptors and moment invariants, which are calculated from the target shape and the scattering center distribution extracted from each reconstructed image. The classification is finally performed by a new self-organizing neural network called SART (supervised ART, which is compared to two standard classifiers, MLP (multilayer perceptron and fuzzy KNN ( nearest neighbors. While the classification accuracy is similar, SART is shown to outperform the two other classifiers in terms of training speed and classification speed, especially for large databases. It is also easier to use since it does not require any input parameter related to its structure.

  14. Manifold regularized multitask learning for semi-supervised multilabel image classification.

    Science.gov (United States)

    Luo, Yong; Tao, Dacheng; Geng, Bo; Xu, Chao; Maybank, Stephen J

    2013-02-01

    It is a significant challenge to classify images with multiple labels by using only a small number of labeled samples. One option is to learn a binary classifier for each label and use manifold regularization to improve the classification performance by exploring the underlying geometric structure of the data distribution. However, such an approach does not perform well in practice when images from multiple concepts are represented by high-dimensional visual features. Thus, manifold regularization is insufficient to control the model complexity. In this paper, we propose a manifold regularized multitask learning (MRMTL) algorithm. MRMTL learns a discriminative subspace shared by multiple classification tasks by exploiting the common structure of these tasks. It effectively controls the model complexity because different tasks limit one another's search volume, and the manifold regularization ensures that the functions in the shared hypothesis space are smooth along the data manifold. We conduct extensive experiments, on the PASCAL VOC'07 dataset with 20 classes and the MIR dataset with 38 classes, by comparing MRMTL with popular image classification algorithms. The results suggest that MRMTL is effective for image classification.

  15. SSC-EKE: Semi-Supervised Classification with Extensive Knowledge Exploitation.

    Science.gov (United States)

    Qian, Pengjiang; Xi, Chen; Xu, Min; Jiang, Yizhang; Su, Kuan-Hao; Wang, Shitong; Muzic, Raymond F

    2018-01-01

    We introduce a new, semi-supervised classification method that extensively exploits knowledge. The method has three steps. First, the manifold regularization mechanism, adapted from the Laplacian support vector machine (LapSVM), is adopted to mine the manifold structure embedded in all training data, especially in numerous label-unknown data. Meanwhile, by converting the labels into pairwise constraints, the pairwise constraint regularization formula (PCRF) is designed to compensate for the few but valuable labelled data. Second, by further combining the PCRF with the manifold regularization, the precise manifold and pairwise constraint jointly regularized formula (MPCJRF) is achieved. Third, by incorporating the MPCJRF into the framework of the conventional SVM, our approach, referred to as semi-supervised classification with extensive knowledge exploitation (SSC-EKE), is developed. The significance of our research is fourfold: 1) The MPCJRF is an underlying adjustment, with respect to the pairwise constraints, to the graph Laplacian enlisted for approximating the potential data manifold. This type of adjustment plays the correction role, as an unbiased estimation of the data manifold is difficult to obtain, whereas the pairwise constraints, converted from the given labels, have an overall high confidence level. 2) By transforming the values of the two terms in the MPCJRF such that they have the same range, with a trade-off factor varying within the invariant interval [0, 1), the appropriate impact of the pairwise constraints to the graph Laplacian can be self-adaptively determined. 3) The implication regarding extensive knowledge exploitation is embodied in SSC-EKE. That is, the labelled examples are used not only to control the empirical risk but also to constitute the MPCJRF. Moreover, all data, both labelled and unlabelled, are recruited for the model smoothness and manifold regularization. 4) The complete framework of SSC-EKE organically incorporates multiple

  16. Gradient Evolution-based Support Vector Machine Algorithm for Classification

    Science.gov (United States)

    Zulvia, Ferani E.; Kuo, R. J.

    2018-03-01

    This paper proposes a classification algorithm based on a support vector machine (SVM) and gradient evolution (GE) algorithms. SVM algorithm has been widely used in classification. However, its result is significantly influenced by the parameters. Therefore, this paper aims to propose an improvement of SVM algorithm which can find the best SVMs’ parameters automatically. The proposed algorithm employs a GE algorithm to automatically determine the SVMs’ parameters. The GE algorithm takes a role as a global optimizer in finding the best parameter which will be used by SVM algorithm. The proposed GE-SVM algorithm is verified using some benchmark datasets and compared with other metaheuristic-based SVM algorithms. The experimental results show that the proposed GE-SVM algorithm obtains better results than other algorithms tested in this paper.

  17. Optimistic semi-supervised least squares classification

    DEFF Research Database (Denmark)

    Krijthe, Jesse H.; Loog, Marco

    2017-01-01

    The goal of semi-supervised learning is to improve supervised classifiers by using additional unlabeled training examples. In this work we study a simple self-learning approach to semi-supervised learning applied to the least squares classifier. We show that a soft-label and a hard-label variant ...

  18. Semi-supervised vibration-based classification and condition monitoring of compressors

    Science.gov (United States)

    Potočnik, Primož; Govekar, Edvard

    2017-09-01

    Semi-supervised vibration-based classification and condition monitoring of the reciprocating compressors installed in refrigeration appliances is proposed in this paper. The method addresses the problem of industrial condition monitoring where prior class definitions are often not available or difficult to obtain from local experts. The proposed method combines feature extraction, principal component analysis, and statistical analysis for the extraction of initial class representatives, and compares the capability of various classification methods, including discriminant analysis (DA), neural networks (NN), support vector machines (SVM), and extreme learning machines (ELM). The use of the method is demonstrated on a case study which was based on industrially acquired vibration measurements of reciprocating compressors during the production of refrigeration appliances. The paper presents a comparative qualitative analysis of the applied classifiers, confirming the good performance of several nonlinear classifiers. If the model parameters are properly selected, then very good classification performance can be obtained from NN trained by Bayesian regularization, SVM and ELM classifiers. The method can be effectively applied for the industrial condition monitoring of compressors.

  19. Hyper-parameter tuning of a decision tree induction algorithm

    NARCIS (Netherlands)

    Mantovani, R.G.; Horváth, T.; Cerri, R.; Vanschoren, J.; de Carvalho, A.C.P.L.F.

    2017-01-01

    Supervised classification is the most studied task in Machine Learning. Among the many algorithms used in such task, Decision Tree algorithms are a popular choice, since they are robust and efficient to construct. Moreover, they have the advantage of producing comprehensible models and satisfactory

  20. Semi-Supervised Bayesian Classification of Materials with Impact-Echo Signals

    Directory of Open Access Journals (Sweden)

    Jorge Igual

    2015-05-01

    Full Text Available The detection and identification of internal defects in a material require the use of some technology that translates the hidden interior damages into observable signals with different signature-defect correspondences. We apply impact-echo techniques for this purpose. The materials are classified according to their defective status (homogeneous, one defect or multiple defects and kind of defect (hole or crack, passing through or not. Every specimen is impacted by a hammer, and the spectrum of the propagated wave is recorded. This spectrum is the input data to a Bayesian classifier that is based on the modeling of the conditional probabilities with a mixture of Gaussians. The parameters of the Gaussian mixtures and the class probabilities are estimated using an extended expectation-maximization algorithm. The advantage of our proposal is that it is flexible, since it obtains good results for a wide range of models even under little supervision; e.g., it obtains a harmonic average of precision and recall value of 92.38% given only a 10% supervision ratio. We test the method with real specimens made of aluminum alloy. The results show that the algorithm works very well. This technique could be applied in many industrial problems, such as the optimization of the marble cutting process.

  1. Slow Learner Prediction Using Multi-Variate Naïve Bayes Classification Algorithm

    Directory of Open Access Journals (Sweden)

    Shiwani Rana

    2017-01-01

    Full Text Available Machine Learning is a field of computer science that learns from data by studying algorithms and their constructions. In machine learning, for specific inputs, algorithms help to make predictions. Classification is a supervised learning approach, which maps a data item into predefined classes. For predicting slow learners in an institute, a modified Naïve Bayes algorithm implemented. The implementation is carried sing Python.  It takes into account a combination of likewise multi-valued attributes. A dataset of the 60 students of BE (Information Technology Third Semester for the subject of Digital Electronics of University Institute of Engineering and Technology (UIET, Panjab University (PU, Chandigarh, India is taken to carry out the simulations. The analysis is done by choosing most significant forty-eight attributes. The experimental results have shown that the modified Naïve Bayes model has outperformed the Naïve Bayes Classifier in accuracy but requires significant improvement in the terms of elapsed time. By using Modified Naïve Bayes approach, the accuracy is found out to be 71.66% whereas it is calculated 66.66% using existing Naïve Bayes model. Further, a comparison is drawn by using WEKA tool. Here, an accuracy of Naïve Bayes is obtained as 58.33 %.

  2. Accuracy Analysis Comparison of Supervised Classification Methods for Anomaly Detection on Levees Using SAR Imagery

    Directory of Open Access Journals (Sweden)

    Ramakalavathi Marapareddy

    2017-10-01

    Full Text Available This paper analyzes the use of a synthetic aperture radar (SAR imagery to support levee condition assessment by detecting potential slide areas in an efficient and cost-effective manner. Levees are prone to a failure in the form of internal erosion within the earthen structure and landslides (also called slough or slump slides. If not repaired, slough slides may lead to levee failures. In this paper, we compare the accuracy of the supervised classification methods minimum distance (MD using Euclidean and Mahalanobis distance, support vector machine (SVM, and maximum likelihood (ML, using SAR technology to detect slough slides on earthen levees. In this work, the effectiveness of the algorithms was demonstrated using quad-polarimetric L-band SAR imagery from the NASA Jet Propulsion Laboratory’s (JPL’s uninhabited aerial vehicle synthetic aperture radar (UAVSAR. The study area is a section of the lower Mississippi River valley in the Southern USA, where earthen flood control levees are maintained by the US Army Corps of Engineers.

  3. A Multiagent-based Intrusion Detection System with the Support of Multi-Class Supervised Classification

    Science.gov (United States)

    Shyu, Mei-Ling; Sainani, Varsha

    The increasing number of network security related incidents have made it necessary for the organizations to actively protect their sensitive data with network intrusion detection systems (IDSs). IDSs are expected to analyze a large volume of data while not placing a significantly added load on the monitoring systems and networks. This requires good data mining strategies which take less time and give accurate results. In this study, a novel data mining assisted multiagent-based intrusion detection system (DMAS-IDS) is proposed, particularly with the support of multiclass supervised classification. These agents can detect and take predefined actions against malicious activities, and data mining techniques can help detect them. Our proposed DMAS-IDS shows superior performance compared to central sniffing IDS techniques, and saves network resources compared to other distributed IDS with mobile agents that activate too many sniffers causing bottlenecks in the network. This is one of the major motivations to use a distributed model based on multiagent platform along with a supervised classification technique.

  4. An Improved Brain-Inspired Emotional Learning Algorithm for Fast Classification

    Directory of Open Access Journals (Sweden)

    Ying Mei

    2017-06-01

    Full Text Available Classification is an important task of machine intelligence in the field of information. The artificial neural network (ANN is widely used for classification. However, the traditional ANN shows slow training speed, and it is hard to meet the real-time requirement for large-scale applications. In this paper, an improved brain-inspired emotional learning (BEL algorithm is proposed for fast classification. The BEL algorithm was put forward to mimic the high speed of the emotional learning mechanism in mammalian brain, which has the superior features of fast learning and low computational complexity. To improve the accuracy of BEL in classification, the genetic algorithm (GA is adopted for optimally tuning the weights and biases of amygdala and orbitofrontal cortex in the BEL neural network. The combinational algorithm named as GA-BEL has been tested on eight University of California at Irvine (UCI datasets and two well-known databases (Japanese Female Facial Expression, Cohn–Kanade. The comparisons of experiments indicate that the proposed GA-BEL is more accurate than the original BEL algorithm, and it is much faster than the traditional algorithm.

  5. EEG source space analysis of the supervised factor analytic approach for the classification of multi-directional arm movement

    Science.gov (United States)

    Shenoy Handiru, Vikram; Vinod, A. P.; Guan, Cuntai

    2017-08-01

    Objective. In electroencephalography (EEG)-based brain-computer interface (BCI) systems for motor control tasks the conventional practice is to decode motor intentions by using scalp EEG. However, scalp EEG only reveals certain limited information about the complex tasks of movement with a higher degree of freedom. Therefore, our objective is to investigate the effectiveness of source-space EEG in extracting relevant features that discriminate arm movement in multiple directions. Approach. We have proposed a novel feature extraction algorithm based on supervised factor analysis that models the data from source-space EEG. To this end, we computed the features from the source dipoles confined to Brodmann areas of interest (BA4a, BA4p and BA6). Further, we embedded class-wise labels of multi-direction (multi-class) source-space EEG to an unsupervised factor analysis to make it into a supervised learning method. Main Results. Our approach provided an average decoding accuracy of 71% for the classification of hand movement in four orthogonal directions, that is significantly higher (>10%) than the classification accuracy obtained using state-of-the-art spatial pattern features in sensor space. Also, the group analysis on the spectral characteristics of source-space EEG indicates that the slow cortical potentials from a set of cortical source dipoles reveal discriminative information regarding the movement parameter, direction. Significance. This study presents evidence that low-frequency components in the source space play an important role in movement kinematics, and thus it may lead to new strategies for BCI-based neurorehabilitation.

  6. Android Malware Classification Using K-Means Clustering Algorithm

    Science.gov (United States)

    Hamid, Isredza Rahmi A.; Syafiqah Khalid, Nur; Azma Abdullah, Nurul; Rahman, Nurul Hidayah Ab; Chai Wen, Chuah

    2017-08-01

    Malware was designed to gain access or damage a computer system without user notice. Besides, attacker exploits malware to commit crime or fraud. This paper proposed Android malware classification approach based on K-Means clustering algorithm. We evaluate the proposed model in terms of accuracy using machine learning algorithms. Two datasets were selected to demonstrate the practicing of K-Means clustering algorithms that are Virus Total and Malgenome dataset. We classify the Android malware into three clusters which are ransomware, scareware and goodware. Nine features were considered for each types of dataset such as Lock Detected, Text Detected, Text Score, Encryption Detected, Threat, Porn, Law, Copyright and Moneypak. We used IBM SPSS Statistic software for data classification and WEKA tools to evaluate the built cluster. The proposed K-Means clustering algorithm shows promising result with high accuracy when tested using Random Forest algorithm.

  7. Parallelizing Gene Expression Programming Algorithm in Enabling Large-Scale Classification

    Directory of Open Access Journals (Sweden)

    Lixiong Xu

    2017-01-01

    Full Text Available As one of the most effective function mining algorithms, Gene Expression Programming (GEP algorithm has been widely used in classification, pattern recognition, prediction, and other research fields. Based on the self-evolution, GEP is able to mine an optimal function for dealing with further complicated tasks. However, in big data researches, GEP encounters low efficiency issue due to its long time mining processes. To improve the efficiency of GEP in big data researches especially for processing large-scale classification tasks, this paper presents a parallelized GEP algorithm using MapReduce computing model. The experimental results show that the presented algorithm is scalable and efficient for processing large-scale classification tasks.

  8. Optimizing area under the ROC curve using semi-supervised learning.

    Science.gov (United States)

    Wang, Shijun; Li, Diana; Petrick, Nicholas; Sahiner, Berkman; Linguraru, Marius George; Summers, Ronald M

    2015-01-01

    Receiver operating characteristic (ROC) analysis is a standard methodology to evaluate the performance of a binary classification system. The area under the ROC curve (AUC) is a performance metric that summarizes how well a classifier separates two classes. Traditional AUC optimization techniques are supervised learning methods that utilize only labeled data (i.e., the true class is known for all data) to train the classifiers. In this work, inspired by semi-supervised and transductive learning, we propose two new AUC optimization algorithms hereby referred to as semi-supervised learning receiver operating characteristic (SSLROC) algorithms, which utilize unlabeled test samples in classifier training to maximize AUC. Unlabeled samples are incorporated into the AUC optimization process, and their ranking relationships to labeled positive and negative training samples are considered as optimization constraints. The introduced test samples will cause the learned decision boundary in a multidimensional feature space to adapt not only to the distribution of labeled training data, but also to the distribution of unlabeled test data. We formulate the semi-supervised AUC optimization problem as a semi-definite programming problem based on the margin maximization theory. The proposed methods SSLROC1 (1-norm) and SSLROC2 (2-norm) were evaluated using 34 (determined by power analysis) randomly selected datasets from the University of California, Irvine machine learning repository. Wilcoxon signed rank tests showed that the proposed methods achieved significant improvement compared with state-of-the-art methods. The proposed methods were also applied to a CT colonography dataset for colonic polyp classification and showed promising results.

  9. Supervised classification of continental shelf sediment off western Donegal, Ireland

    Science.gov (United States)

    Monteys, X.; Craven, K.; McCarron, S. G.

    2017-12-01

    Managing human impacts on marine ecosystems requires natural regions to be identified and mapped over a range of hierarchically nested scales. In recent years (2000-present) the Irish National Seabed Survey (INSS) and Integrated Mapping for the Sustainable Development of Ireland's Marine Resources programme (INFOMAR) (Geological Survey Ireland and Marine Institute collaborations) has provided unprecedented quantities of high quality data on Ireland's offshore territories. The increasing availability of large, detailed digital representations of these environments requires the application of objective and quantitative analyses. This study presents results of a new approach for sea floor sediment mapping based on an integrated analysis of INFOMAR multibeam bathymetric data (including the derivatives of slope and relative position), backscatter data (including derivatives of angular response analysis) and sediment groundtruthing over the continental shelf, west of Donegal. It applies a Geographic-Object-Based Image Analysis software package to provide a supervised classification of the surface sediment. This approach can provide a statistically robust, high resolution classification of the seafloor. Initial results display a differentiation of sediment classes and a reduction in artefacts from previously applied methodologies. These results indicate a methodology that could be used during physical habitat mapping and classification of marine environments.

  10. Fall detection using supervised machine learning algorithms: A comparative study

    KAUST Repository

    Zerrouki, Nabil; Harrou, Fouzi; Houacine, Amrane; Sun, Ying

    2017-01-01

    Fall incidents are considered as the leading cause of disability and even mortality among older adults. To address this problem, fall detection and prevention fields receive a lot of intention over the past years and attracted many researcher efforts. We present in the current study an overall performance comparison between fall detection systems using the most popular machine learning approaches which are: Naïve Bayes, K nearest neighbor, neural network, and support vector machine. The analysis of the classification power associated to these most widely utilized algorithms is conducted on two fall detection databases namely FDD and URFD. Since the performance of the classification algorithm is inherently dependent on the features, we extracted and used the same features for all classifiers. The classification evaluation is conducted using different state of the art statistical measures such as the overall accuracy, the F-measure coefficient, and the area under ROC curve (AUC) value.

  11. Fall detection using supervised machine learning algorithms: A comparative study

    KAUST Repository

    Zerrouki, Nabil

    2017-01-05

    Fall incidents are considered as the leading cause of disability and even mortality among older adults. To address this problem, fall detection and prevention fields receive a lot of intention over the past years and attracted many researcher efforts. We present in the current study an overall performance comparison between fall detection systems using the most popular machine learning approaches which are: Naïve Bayes, K nearest neighbor, neural network, and support vector machine. The analysis of the classification power associated to these most widely utilized algorithms is conducted on two fall detection databases namely FDD and URFD. Since the performance of the classification algorithm is inherently dependent on the features, we extracted and used the same features for all classifiers. The classification evaluation is conducted using different state of the art statistical measures such as the overall accuracy, the F-measure coefficient, and the area under ROC curve (AUC) value.

  12. A Comparative Analysis of Classification Algorithms on Diverse Datasets

    Directory of Open Access Journals (Sweden)

    M. Alghobiri

    2018-04-01

    Full Text Available Data mining involves the computational process to find patterns from large data sets. Classification, one of the main domains of data mining, involves known structure generalizing to apply to a new dataset and predict its class. There are various classification algorithms being used to classify various data sets. They are based on different methods such as probability, decision tree, neural network, nearest neighbor, boolean and fuzzy logic, kernel-based etc. In this paper, we apply three diverse classification algorithms on ten datasets. The datasets have been selected based on their size and/or number and nature of attributes. Results have been discussed using some performance evaluation measures like precision, accuracy, F-measure, Kappa statistics, mean absolute error, relative absolute error, ROC Area etc. Comparative analysis has been carried out using the performance evaluation measures of accuracy, precision, and F-measure. We specify features and limitations of the classification algorithms for the diverse nature datasets.

  13. Exploring high dimensional data with Butterfly: a novel classification algorithm based on discrete dynamical systems.

    Science.gov (United States)

    Geraci, Joseph; Dharsee, Moyez; Nuin, Paulo; Haslehurst, Alexandria; Koti, Madhuri; Feilotter, Harriet E; Evans, Ken

    2014-03-01

    We introduce a novel method for visualizing high dimensional data via a discrete dynamical system. This method provides a 2D representation of the relationship between subjects according to a set of variables without geometric projections, transformed axes or principal components. The algorithm exploits a memory-type mechanism inherent in a certain class of discrete dynamical systems collectively referred to as the chaos game that are closely related to iterative function systems. The goal of the algorithm was to create a human readable representation of high dimensional patient data that was capable of detecting unrevealed subclusters of patients from within anticipated classifications. This provides a mechanism to further pursue a more personalized exploration of pathology when used with medical data. For clustering and classification protocols, the dynamical system portion of the algorithm is designed to come after some feature selection filter and before some model evaluation (e.g. clustering accuracy) protocol. In the version given here, a univariate features selection step is performed (in practice more complex feature selection methods are used), a discrete dynamical system is driven by this reduced set of variables (which results in a set of 2D cluster models), these models are evaluated for their accuracy (according to a user-defined binary classification) and finally a visual representation of the top classification models are returned. Thus, in addition to the visualization component, this methodology can be used for both supervised and unsupervised machine learning as the top performing models are returned in the protocol we describe here. Butterfly, the algorithm we introduce and provide working code for, uses a discrete dynamical system to classify high dimensional data and provide a 2D representation of the relationship between subjects. We report results on three datasets (two in the article; one in the appendix) including a public lung cancer

  14. An Extended Spectral-Spatial Classification Approach for Hyperspectral Data

    Science.gov (United States)

    Akbari, D.

    2017-11-01

    In this paper an extended classification approach for hyperspectral imagery based on both spectral and spatial information is proposed. The spatial information is obtained by an enhanced marker-based minimum spanning forest (MSF) algorithm. Three different methods of dimension reduction are first used to obtain the subspace of hyperspectral data: (1) unsupervised feature extraction methods including principal component analysis (PCA), independent component analysis (ICA), and minimum noise fraction (MNF); (2) supervised feature extraction including decision boundary feature extraction (DBFE), discriminate analysis feature extraction (DAFE), and nonparametric weighted feature extraction (NWFE); (3) genetic algorithm (GA). The spectral features obtained are then fed into the enhanced marker-based MSF classification algorithm. In the enhanced MSF algorithm, the markers are extracted from the classification maps obtained by both SVM and watershed segmentation algorithm. To evaluate the proposed approach, the Pavia University hyperspectral data is tested. Experimental results show that the proposed approach using GA achieves an approximately 8 % overall accuracy higher than the original MSF-based algorithm.

  15. Effects of supervised Self Organising Maps parameters on classification performance.

    Science.gov (United States)

    Ballabio, Davide; Vasighi, Mahdi; Filzmoser, Peter

    2013-02-26

    Self Organising Maps (SOMs) are one of the most powerful learning strategies among neural networks algorithms. SOMs have several adaptable parameters and the selection of appropriate network architectures is required in order to make accurate predictions. The major disadvantage of SOMs is probably due to the network optimisation, since this procedure can be often time-expensive. Effects of network size, training epochs and learning rate on the classification performance of SOMs are known, whereas the effect of other parameters (type of SOMs, weights initialisation, training algorithm, topology and boundary conditions) are not so obvious. This study was addressed to analyse the effect of SOMs parameters on the network classification performance, as well as on their computational times, taking into consideration a significant number of real datasets, in order to achieve a comprehensive statistical comparison. Parameters were contemporaneously evaluated by means of an approach based on the design of experiments, which enabled the investigation of their interaction effects. Results highlighted the most important parameters which influence the classification performance and enabled the identification of the optimal settings, as well as the optimal architectures to reduce the computational time of SOMs. Copyright © 2012 Elsevier B.V. All rights reserved.

  16. Voxel-Based Neighborhood for Spatial Shape Pattern Classification of Lidar Point Clouds with Supervised Learning

    Directory of Open Access Journals (Sweden)

    Victoria Plaza-Leiva

    2017-03-01

    Full Text Available Improving the effectiveness of spatial shape features classification from 3D lidar data is very relevant because it is largely used as a fundamental step towards higher level scene understanding challenges of autonomous vehicles and terrestrial robots. In this sense, computing neighborhood for points in dense scans becomes a costly process for both training and classification. This paper proposes a new general framework for implementing and comparing different supervised learning classifiers with a simple voxel-based neighborhood computation where points in each non-overlapping voxel in a regular grid are assigned to the same class by considering features within a support region defined by the voxel itself. The contribution provides offline training and online classification procedures as well as five alternative feature vector definitions based on principal component analysis for scatter, tubular and planar shapes. Moreover, the feasibility of this approach is evaluated by implementing a neural network (NN method previously proposed by the authors as well as three other supervised learning classifiers found in scene processing methods: support vector machines (SVM, Gaussian processes (GP, and Gaussian mixture models (GMM. A comparative performance analysis is presented using real point clouds from both natural and urban environments and two different 3D rangefinders (a tilting Hokuyo UTM-30LX and a Riegl. Classification performance metrics and processing time measurements confirm the benefits of the NN classifier and the feasibility of voxel-based neighborhood.

  17. Voxel-Based Neighborhood for Spatial Shape Pattern Classification of Lidar Point Clouds with Supervised Learning.

    Science.gov (United States)

    Plaza-Leiva, Victoria; Gomez-Ruiz, Jose Antonio; Mandow, Anthony; García-Cerezo, Alfonso

    2017-03-15

    Improving the effectiveness of spatial shape features classification from 3D lidar data is very relevant because it is largely used as a fundamental step towards higher level scene understanding challenges of autonomous vehicles and terrestrial robots. In this sense, computing neighborhood for points in dense scans becomes a costly process for both training and classification. This paper proposes a new general framework for implementing and comparing different supervised learning classifiers with a simple voxel-based neighborhood computation where points in each non-overlapping voxel in a regular grid are assigned to the same class by considering features within a support region defined by the voxel itself. The contribution provides offline training and online classification procedures as well as five alternative feature vector definitions based on principal component analysis for scatter, tubular and planar shapes. Moreover, the feasibility of this approach is evaluated by implementing a neural network (NN) method previously proposed by the authors as well as three other supervised learning classifiers found in scene processing methods: support vector machines (SVM), Gaussian processes (GP), and Gaussian mixture models (GMM). A comparative performance analysis is presented using real point clouds from both natural and urban environments and two different 3D rangefinders (a tilting Hokuyo UTM-30LX and a Riegl). Classification performance metrics and processing time measurements confirm the benefits of the NN classifier and the feasibility of voxel-based neighborhood.

  18. Evaluation of forest cover estimates for Haiti using supervised classification of Landsat data

    Science.gov (United States)

    Churches, Christopher E.; Wampler, Peter J.; Sun, Wanxiao; Smith, Andrew J.

    2014-08-01

    This study uses 2010-2011 Landsat Thematic Mapper (TM) imagery to estimate total forested area in Haiti. The thematic map was generated using radiometric normalization of digital numbers by a modified normalization method utilizing pseudo-invariant polygons (PIPs), followed by supervised classification of the mosaicked image using the Food and Agriculture Organization (FAO) of the United Nations Land Cover Classification System. Classification results were compared to other sources of land-cover data produced for similar years, with an emphasis on the statistics presented by the FAO. Three global land cover datasets (GLC2000, Globcover, 2009, and MODIS MCD12Q1), and a national-scale dataset (a land cover analysis by Haitian National Centre for Geospatial Information (CNIGS)) were reclassified and compared. According to our classification, approximately 32.3% of Haiti's total land area was tree covered in 2010-2011. This result was confirmed using an error-adjusted area estimator, which predicted a tree covered area of 32.4%. Standardization to the FAO's forest cover class definition reduces the amount of tree cover of our supervised classification to 29.4%. This result was greater than the reported FAO value of 4% and the value for the recoded GLC2000 dataset of 7.0%, but is comparable to values for three other recoded datasets: MCD12Q1 (21.1%), Globcover (2009) (26.9%), and CNIGS (19.5%). We propose that at coarse resolutions, the segmented and patchy nature of Haiti's forests resulted in a systematic underestimation of the extent of forest cover. It appears the best explanation for the significant difference between our results, FAO statistics, and compared datasets is the accuracy of the data sources and the resolution of the imagery used for land cover analyses. Analysis of recoded global datasets and results from this study suggest a strong linear relationship (R2 = 0.996 for tree cover) between spatial resolution and land cover estimates.

  19. Adaptive phase k-means algorithm for waveform classification

    Science.gov (United States)

    Song, Chengyun; Liu, Zhining; Wang, Yaojun; Xu, Feng; Li, Xingming; Hu, Guangmin

    2018-01-01

    Waveform classification is a powerful technique for seismic facies analysis that describes the heterogeneity and compartments within a reservoir. Horizon interpretation is a critical step in waveform classification. However, the horizon often produces inconsistent waveform phase, and thus results in an unsatisfied classification. To alleviate this problem, an adaptive phase waveform classification method called the adaptive phase k-means is introduced in this paper. Our method improves the traditional k-means algorithm using an adaptive phase distance for waveform similarity measure. The proposed distance is a measure with variable phases as it moves from sample to sample along the traces. Model traces are also updated with the best phase interference in the iterative process. Therefore, our method is robust to phase variations caused by the interpretation horizon. We tested the effectiveness of our algorithm by applying it to synthetic and real data. The satisfactory results reveal that the proposed method tolerates certain waveform phase variation and is a good tool for seismic facies analysis.

  20. Multispectral Image classification using the theories of neural networks

    International Nuclear Information System (INIS)

    Ardisasmita, M.S.; Subki, M.I.R.

    1997-01-01

    Image classification is the one of the important part of digital image analysis. the objective of image classification is to identify and regroup the features occurring in an image into one or several classes in terms of the object. basic to the understanding of multispectral classification is the concept of the spectral response of an object as a function of the electromagnetic radiation and the wavelength of the spectrum. new approaches to classification has been developed to improve the result of analysis, these state-of-the-art classifiers are based upon the theories of neural networks. Neural network classifiers are algorithmes which mimic the computational abilities of the human brain. Artificial neurons are simple emulation's of biological neurons; they take in information from sensors or other artificial neurons, perform very simple operations on this data, and pass the result to other recognize the spectral signature of each image pixel. Neural network image classification has been divided into supervised and unsupervised training procedures. In the supervised approach, examples of each cover type can be located and the computer can compute spectral signatures to categorize all pixels in a digital image into several land cover classes. In supervised classification, spectral signatures are generated by mathematically grouping and it does not require analyst-specified training data. Thus, in the supervised approach we define useful information categories and then examine their spectral reparability; in the unsupervised approach the computer determines spectrally sapable classes and then we define thei information value

  1. Better and Faster: Knowledge Transfer from Multiple Self-supervised Learning Tasks via Graph Distillation for Video Classification

    OpenAIRE

    Zhang, Chenrui; Peng, Yuxin

    2018-01-01

    Video representation learning is a vital problem for classification task. Recently, a promising unsupervised paradigm termed self-supervised learning has emerged, which explores inherent supervisory signals implied in massive data for feature learning via solving auxiliary tasks. However, existing methods in this regard suffer from two limitations when extended to video classification. First, they focus only on a single task, whereas ignoring complementarity among different task-specific feat...

  2. Evolutionary Algorithms For Neural Networks Binary And Real Data Classification

    Directory of Open Access Journals (Sweden)

    Dr. Hanan A.R. Akkar

    2015-08-01

    Full Text Available Artificial neural networks are complex networks emulating the way human rational neurons process data. They have been widely used generally in prediction clustering classification and association. The training algorithms that used to determine the network weights are almost the most important factor that influence the neural networks performance. Recently many meta-heuristic and Evolutionary algorithms are employed to optimize neural networks weights to achieve better neural performance. This paper aims to use recently proposed algorithms for optimizing neural networks weights comparing these algorithms performance with other classical meta-heuristic algorithms used for the same purpose. However to evaluate the performance of such algorithms for training neural networks we examine such algorithms to classify four opposite binary XOR clusters and classification of continuous real data sets such as Iris and Ecoli.

  3. Automatic segmentation of MR brain images of preterm infants using supervised classification.

    Science.gov (United States)

    Moeskops, Pim; Benders, Manon J N L; Chiţ, Sabina M; Kersbergen, Karina J; Groenendaal, Floris; de Vries, Linda S; Viergever, Max A; Išgum, Ivana

    2015-09-01

    Preterm birth is often associated with impaired brain development. The state and expected progression of preterm brain development can be evaluated using quantitative assessment of MR images. Such measurements require accurate segmentation of different tissue types in those images. This paper presents an algorithm for the automatic segmentation of unmyelinated white matter (WM), cortical grey matter (GM), and cerebrospinal fluid in the extracerebral space (CSF). The algorithm uses supervised voxel classification in three subsequent stages. In the first stage, voxels that can easily be assigned to one of the three tissue types are labelled. In the second stage, dedicated analysis of the remaining voxels is performed. The first and the second stages both use two-class classification for each tissue type separately. Possible inconsistencies that could result from these tissue-specific segmentation stages are resolved in the third stage, which performs multi-class classification. A set of T1- and T2-weighted images was analysed, but the optimised system performs automatic segmentation using a T2-weighted image only. We have investigated the performance of the algorithm when using training data randomly selected from completely annotated images as well as when using training data from only partially annotated images. The method was evaluated on images of preterm infants acquired at 30 and 40weeks postmenstrual age (PMA). When the method was trained using random selection from the completely annotated images, the average Dice coefficients were 0.95 for WM, 0.81 for GM, and 0.89 for CSF on an independent set of images acquired at 30weeks PMA. When the method was trained using only the partially annotated images, the average Dice coefficients were 0.95 for WM, 0.78 for GM and 0.87 for CSF for the images acquired at 30weeks PMA, and 0.92 for WM, 0.80 for GM and 0.85 for CSF for the images acquired at 40weeks PMA. Even though the segmentations obtained using training data

  4. A Supervised Machine Learning Study of Online Discussion Forums about Type-2 Diabetes

    DEFF Research Database (Denmark)

    Reichert, Jonathan-Raphael; Kristensen, Klaus Langholz; Mukkamala, Raghava Rao

    2017-01-01

    supervised machine learning techniques to analyze the online conversations. In order to analyse these online textual conversations, we have chosen four domain specific models (Emotions, Sentiment, Personality Traits and Patient Journey). As part of text classification, we employed the ensemble learning...... method by using 5 different supervised machine learning algorithms to build a set of text classifiers by using the voting method to predict most probable label for a given textual conversation from the online discussion forums. Our findings show that there is a high amount of trust expressed by a subset...

  5. Packet Classification by Multilevel Cutting of the Classification Space: An Algorithmic-Architectural Solution for IP Packet Classification in Next Generation Networks

    Directory of Open Access Journals (Sweden)

    Motasem Aldiab

    2008-01-01

    Full Text Available Traditionally, the Internet provides only a “best-effort” service, treating all packets going to the same destination equally. However, providing differentiated services for different users based on their quality requirements is increasingly becoming a demanding issue. For this, routers need to have the capability to distinguish and isolate traffic belonging to different flows. This ability to determine the flow each packet belongs to is called packet classification. Technology vendors are reluctant to support algorithmic solutions for classification due to their nondeterministic performance. Although content addressable memories (CAMs are favoured by technology vendors due to their deterministic high-lookup rates, they suffer from the problems of high-power consumption and high-silicon cost. This paper provides a new algorithmic-architectural solution for packet classification that mixes CAMs with algorithms based on multilevel cutting of the classification space into smaller spaces. The provided solution utilizes the geometrical distribution of rules in the classification space. It provides the deterministic performance of CAMs, support for dynamic updates, and added flexibility for system designers.

  6. Prediction of customer behaviour analysis using classification algorithms

    Science.gov (United States)

    Raju, Siva Subramanian; Dhandayudam, Prabha

    2018-04-01

    Customer Relationship management plays a crucial role in analyzing of customer behavior patterns and their values with an enterprise. Analyzing of customer data can be efficient performed using various data mining techniques, with the goal of developing business strategies and to enhance the business. In this paper, three classification models (NB, J48, and MLPNN) are studied and evaluated for our experimental purpose. The performance measures of the three classifications are compared using three different parameters (accuracy, sensitivity, specificity) and experimental results expose J48 algorithm has better accuracy with compare to NB and MLPNN algorithm.

  7. Comparison of Computational Algorithms for the Classification of Liver Cancer using SELDI Mass Spectrometry: A Case Study

    Directory of Open Access Journals (Sweden)

    Robert J Hickey

    2007-01-01

    Full Text Available Introduction: As an alternative to DNA microarrays, mass spectrometry based analysis of proteomic patterns has shown great potential in cancer diagnosis. The ultimate application of this technique in clinical settings relies on the advancement of the technology itself and the maturity of the computational tools used to analyze the data. A number of computational algorithms constructed on different principles are available for the classification of disease status based on proteomic patterns. Nevertheless, few studies have addressed the difference in the performance of these approaches. In this report, we describe a comparative case study on the classification accuracy of hepatocellular carcinoma based on the serum proteomic pattern generated from a Surface Enhanced Laser Desorption/Ionization (SELDI mass spectrometer.Methods: Nine supervised classifi cation algorithms are implemented in R software and compared for the classification accuracy.Results: We found that the support vector machine with radial function is preferable as a tool for classification of hepatocellular carcinoma using features in SELDI mass spectra. Among the rest of the methods, random forest and prediction analysis of microarrays have better performance. A permutation-based technique reveals that the support vector machine with a radial function seems intrinsically superior in learning from the training data since it has a lower prediction error than others when there is essentially no differential signal. On the other hand, the performance of the random forest and prediction analysis of microarrays rely on their capability of capturing the signals with substantial differentiation between groups.Conclusions: Our finding is similar to a previous study, where classification methods based on the Matrix Assisted Laser Desorption/Ionization (MALDI mass spectrometry are compared for the prediction accuracy of ovarian cancer. The support vector machine, random forest and prediction

  8. Combination of supervised and semi-supervised regression models for improved unbiased estimation

    DEFF Research Database (Denmark)

    Arenas-Garía, Jeronimo; Moriana-Varo, Carlos; Larsen, Jan

    2010-01-01

    In this paper we investigate the steady-state performance of semisupervised regression models adjusted using a modified RLS-like algorithm, identifying the situations where the new algorithm is expected to outperform standard RLS. By using an adaptive combination of the supervised and semisupervi......In this paper we investigate the steady-state performance of semisupervised regression models adjusted using a modified RLS-like algorithm, identifying the situations where the new algorithm is expected to outperform standard RLS. By using an adaptive combination of the supervised...

  9. Prototype-based Models for the Supervised Learning of Classification Schemes

    Science.gov (United States)

    Biehl, Michael; Hammer, Barbara; Villmann, Thomas

    2017-06-01

    An introduction is given to the use of prototype-based models in supervised machine learning. The main concept of the framework is to represent previously observed data in terms of so-called prototypes, which reflect typical properties of the data. Together with a suitable, discriminative distance or dissimilarity measure, prototypes can be used for the classification of complex, possibly high-dimensional data. We illustrate the framework in terms of the popular Learning Vector Quantization (LVQ). Most frequently, standard Euclidean distance is employed as a distance measure. We discuss how LVQ can be equipped with more general dissimilarites. Moreover, we introduce relevance learning as a tool for the data-driven optimization of parameterized distances.

  10. Implementation of several mathematical algorithms to breast tissue density classification

    International Nuclear Information System (INIS)

    Quintana, C.; Redondo, M.; Tirao, G.

    2014-01-01

    The accuracy of mammographic abnormality detection methods is strongly dependent on breast tissue characteristics, where a dense breast tissue can hide lesions causing cancer to be detected at later stages. In addition, breast tissue density is widely accepted to be an important risk indicator for the development of breast cancer. This paper presents the implementation and the performance of different mathematical algorithms designed to standardize the categorization of mammographic images, according to the American College of Radiology classifications. These mathematical techniques are based on intrinsic properties calculations and on comparison with an ideal homogeneous image (joint entropy, mutual information, normalized cross correlation and index Q) as categorization parameters. The algorithms evaluation was performed on 100 cases of the mammographic data sets provided by the Ministerio de Salud de la Provincia de Córdoba, Argentina—Programa de Prevención del Cáncer de Mama (Department of Public Health, Córdoba, Argentina, Breast Cancer Prevention Program). The obtained breast classifications were compared with the expert medical diagnostics, showing a good performance. The implemented algorithms revealed a high potentiality to classify breasts into tissue density categories. - Highlights: • Breast density classification can be obtained by suitable mathematical algorithms. • Mathematical processing help radiologists to obtain the BI-RADS classification. • The entropy and joint entropy show high performance for density classification

  11. Supervised chaos genetic algorithm based state of charge determination for LiFePO4 batteries in electric vehicles

    Science.gov (United States)

    Shen, Yanqing

    2018-04-01

    LiFePO4 battery is developed rapidly in electric vehicle, whose safety and functional capabilities are influenced greatly by the evaluation of available cell capacity. Added with adaptive switch mechanism, this paper advances a supervised chaos genetic algorithm based state of charge determination method, where a combined state space model is employed to simulate battery dynamics. The method is validated by the experiment data collected from battery test system. Results indicate that the supervised chaos genetic algorithm based state of charge determination method shows great performance with less computation complexity and is little influenced by the unknown initial cell state.

  12. AN EXTENDED SPECTRAL–SPATIAL CLASSIFICATION APPROACH FOR HYPERSPECTRAL DATA

    Directory of Open Access Journals (Sweden)

    D. Akbari

    2017-11-01

    Full Text Available In this paper an extended classification approach for hyperspectral imagery based on both spectral and spatial information is proposed. The spatial information is obtained by an enhanced marker-based minimum spanning forest (MSF algorithm. Three different methods of dimension reduction are first used to obtain the subspace of hyperspectral data: (1 unsupervised feature extraction methods including principal component analysis (PCA, independent component analysis (ICA, and minimum noise fraction (MNF; (2 supervised feature extraction including decision boundary feature extraction (DBFE, discriminate analysis feature extraction (DAFE, and nonparametric weighted feature extraction (NWFE; (3 genetic algorithm (GA. The spectral features obtained are then fed into the enhanced marker-based MSF classification algorithm. In the enhanced MSF algorithm, the markers are extracted from the classification maps obtained by both SVM and watershed segmentation algorithm. To evaluate the proposed approach, the Pavia University hyperspectral data is tested. Experimental results show that the proposed approach using GA achieves an approximately 8 % overall accuracy higher than the original MSF-based algorithm.

  13. Semi-Supervised Half-Quadratic Nonnegative Matrix Factorization for Face Recognition

    KAUST Repository

    Alghamdi, Masheal M.

    2014-05-01

    Face recognition is a challenging problem in computer vision. Difficulties such as slight differences between similar faces of different people, changes in facial expressions, light and illumination condition, and pose variations add extra complications to the face recognition research. Many algorithms are devoted to solving the face recognition problem, among which the family of nonnegative matrix factorization (NMF) algorithms has been widely used as a compact data representation method. Different versions of NMF have been proposed. Wang et al. proposed the graph-based semi-supervised nonnegative learning (S2N2L) algorithm that uses labeled data in constructing intrinsic and penalty graph to enforce separability of labeled data, which leads to a greater discriminating power. Moreover the geometrical structure of labeled and unlabeled data is preserved through using the smoothness assumption by creating a similarity graph that conserves the neighboring information for all labeled and unlabeled data. However, S2N2L is sensitive to light changes, illumination, and partial occlusion. In this thesis, we propose a Semi-Supervised Half-Quadratic NMF (SSHQNMF) algorithm that combines the benefits of S2N2L and the robust NMF by the half- quadratic minimization (HQNMF) algorithm.Our algorithm improves upon the S2N2L algorithm by replacing the Frobenius norm with a robust M-Estimator loss function. A multiplicative update solution for our SSHQNMF algorithmis driven using the half- 4 quadratic (HQ) theory. Extensive experiments on ORL, Yale-A and a subset of the PIE data sets for nine M-estimator loss functions for both SSHQNMF and HQNMF algorithms are investigated, and compared with several state-of-the-art supervised and unsupervised algorithms, along with the original S2N2L algorithm in the context of classification, clustering, and robustness against partial occlusion. The proposed algorithm outperformed the other algorithms. Furthermore, SSHQNMF with Maximum Correntropy

  14. Improved wavelet packet classification algorithm for vibrational intrusions in distributed fiber-optic monitoring systems

    Science.gov (United States)

    Wang, Bingjie; Pi, Shaohua; Sun, Qi; Jia, Bo

    2015-05-01

    An improved classification algorithm that considers multiscale wavelet packet Shannon entropy is proposed. Decomposition coefficients at all levels are obtained to build the initial Shannon entropy feature vector. After subtracting the Shannon entropy map of the background signal, components of the strongest discriminating power in the initial feature vector are picked out to rebuild the Shannon entropy feature vector, which is transferred to radial basis function (RBF) neural network for classification. Four types of man-made vibrational intrusion signals are recorded based on a modified Sagnac interferometer. The performance of the improved classification algorithm has been evaluated by the classification experiments via RBF neural network under different diffusion coefficients. An 85% classification accuracy rate is achieved, which is higher than the other common algorithms. The classification results show that this improved classification algorithm can be used to classify vibrational intrusion signals in an automatic real-time monitoring system.

  15. Multilabel user classification using the community structure of online networks.

    Science.gov (United States)

    Rizos, Georgios; Papadopoulos, Symeon; Kompatsiaris, Yiannis

    2017-01-01

    We study the problem of semi-supervised, multi-label user classification of networked data in the online social platform setting. We propose a framework that combines unsupervised community extraction and supervised, community-based feature weighting before training a classifier. We introduce Approximate Regularized Commute-Time Embedding (ARCTE), an algorithm that projects the users of a social graph onto a latent space, but instead of packing the global structure into a matrix of predefined rank, as many spectral and neural representation learning methods do, it extracts local communities for all users in the graph in order to learn a sparse embedding. To this end, we employ an improvement of personalized PageRank algorithms for searching locally in each user's graph structure. Then, we perform supervised community feature weighting in order to boost the importance of highly predictive communities. We assess our method performance on the problem of user classification by performing an extensive comparative study among various recent methods based on graph embeddings. The comparison shows that ARCTE significantly outperforms the competition in almost all cases, achieving up to 35% relative improvement compared to the second best competing method in terms of F1-score.

  16. Multilabel user classification using the community structure of online networks.

    Directory of Open Access Journals (Sweden)

    Georgios Rizos

    Full Text Available We study the problem of semi-supervised, multi-label user classification of networked data in the online social platform setting. We propose a framework that combines unsupervised community extraction and supervised, community-based feature weighting before training a classifier. We introduce Approximate Regularized Commute-Time Embedding (ARCTE, an algorithm that projects the users of a social graph onto a latent space, but instead of packing the global structure into a matrix of predefined rank, as many spectral and neural representation learning methods do, it extracts local communities for all users in the graph in order to learn a sparse embedding. To this end, we employ an improvement of personalized PageRank algorithms for searching locally in each user's graph structure. Then, we perform supervised community feature weighting in order to boost the importance of highly predictive communities. We assess our method performance on the problem of user classification by performing an extensive comparative study among various recent methods based on graph embeddings. The comparison shows that ARCTE significantly outperforms the competition in almost all cases, achieving up to 35% relative improvement compared to the second best competing method in terms of F1-score.

  17. Surveying alignment-free features for Ortholog detection in related yeast proteomes by using supervised big data classifiers.

    Science.gov (United States)

    Galpert, Deborah; Fernández, Alberto; Herrera, Francisco; Antunes, Agostinho; Molina-Ruiz, Reinaldo; Agüero-Chapin, Guillermin

    2018-05-03

    The development of new ortholog detection algorithms and the improvement of existing ones are of major importance in functional genomics. We have previously introduced a successful supervised pairwise ortholog classification approach implemented in a big data platform that considered several pairwise protein features and the low ortholog pair ratios found between two annotated proteomes (Galpert, D et al., BioMed Research International, 2015). The supervised models were built and tested using a Saccharomycete yeast benchmark dataset proposed by Salichos and Rokas (2011). Despite several pairwise protein features being combined in a supervised big data approach; they all, to some extent were alignment-based features and the proposed algorithms were evaluated on a unique test set. Here, we aim to evaluate the impact of alignment-free features on the performance of supervised models implemented in the Spark big data platform for pairwise ortholog detection in several related yeast proteomes. The Spark Random Forest and Decision Trees with oversampling and undersampling techniques, and built with only alignment-based similarity measures or combined with several alignment-free pairwise protein features showed the highest classification performance for ortholog detection in three yeast proteome pairs. Although such supervised approaches outperformed traditional methods, there were no significant differences between the exclusive use of alignment-based similarity measures and their combination with alignment-free features, even within the twilight zone of the studied proteomes. Just when alignment-based and alignment-free features were combined in Spark Decision Trees with imbalance management, a higher success rate (98.71%) within the twilight zone could be achieved for a yeast proteome pair that underwent a whole genome duplication. The feature selection study showed that alignment-based features were top-ranked for the best classifiers while the runners-up were

  18. TEXT CLASSIFICATION USING NAIVE BAYES UPDATEABLE ALGORITHM IN SBMPTN TEST QUESTIONS

    Directory of Open Access Journals (Sweden)

    Ristu Saptono

    2017-01-01

    Full Text Available Document classification is a growing interest in the research of text mining. Classification can be done based on the topics, languages, and so on. This study was conducted to determine how Naive Bayes Updateable performs in classifying the SBMPTN exam questions based on its theme. Increment model of one classification algorithm often used in text classification Naive Bayes classifier has the ability to learn from new data introduces with the system even after the classifier has been produced with the existing data. Naive Bayes Classifier classifies the exam questions based on the theme of the field of study by analyzing keywords that appear on the exam questions. One of feature selection method DF-Thresholding is implemented for improving the classification performance. Evaluation of the classification with Naive Bayes classifier algorithm produces 84,61% accuracy.

  19. Classification Formula and Generation Algorithm of Cycle Decomposition Expression for Dihedral Groups

    Directory of Open Access Journals (Sweden)

    Dakun Zhang

    2013-01-01

    Full Text Available The necessary of classification research on common formula of group (dihedral group cycle decomposition expression is illustrated. It includes the reflection and rotation conversion, which derived six common formulae on cycle decomposition expressions of group; it designed the generation algorithm on the cycle decomposition expressions of group, which is based on the method of replacement conversion and the classification formula; algorithm analysis and the results of the process show that the generation algorithm which is based on the classification formula is outperformed by the general algorithm which is based on replacement conversion; it has great significance to solve the enumeration of the necklace combinational scheme, especially the structural problems of combinational scheme, by using group theory and computer.

  20. Walking pattern classification and walking distance estimation algorithms using gait phase information.

    Science.gov (United States)

    Wang, Jeen-Shing; Lin, Che-Wei; Yang, Ya-Ting C; Ho, Yu-Jen

    2012-10-01

    This paper presents a walking pattern classification and a walking distance estimation algorithm using gait phase information. A gait phase information retrieval algorithm was developed to analyze the duration of the phases in a gait cycle (i.e., stance, push-off, swing, and heel-strike phases). Based on the gait phase information, a decision tree based on the relations between gait phases was constructed for classifying three different walking patterns (level walking, walking upstairs, and walking downstairs). Gait phase information was also used for developing a walking distance estimation algorithm. The walking distance estimation algorithm consists of the processes of step count and step length estimation. The proposed walking pattern classification and walking distance estimation algorithm have been validated by a series of experiments. The accuracy of the proposed walking pattern classification was 98.87%, 95.45%, and 95.00% for level walking, walking upstairs, and walking downstairs, respectively. The accuracy of the proposed walking distance estimation algorithm was 96.42% over a walking distance.

  1. A fingerprint classification algorithm based on combination of local and global information

    Science.gov (United States)

    Liu, Chongjin; Fu, Xiang; Bian, Junjie; Feng, Jufu

    2011-12-01

    Fingerprint recognition is one of the most important technologies in biometric identification and has been wildly applied in commercial and forensic areas. Fingerprint classification, as the fundamental procedure in fingerprint recognition, can sharply decrease the quantity for fingerprint matching and improve the efficiency of fingerprint recognition. Most fingerprint classification algorithms are based on the number and position of singular points. Because the singular points detecting method only considers the local information commonly, the classification algorithms are sensitive to noise. In this paper, we propose a novel fingerprint classification algorithm combining the local and global information of fingerprint. Firstly we use local information to detect singular points and measure their quality considering orientation structure and image texture in adjacent areas. Furthermore the global orientation model is adopted to measure the reliability of singular points group. Finally the local quality and global reliability is weighted to classify fingerprint. Experiments demonstrate the accuracy and effectivity of our algorithm especially for the poor quality fingerprint images.

  2. Optimal Subset Selection of Time-Series MODIS Images and Sample Data Transfer with Random Forests for Supervised Classification Modelling.

    Science.gov (United States)

    Zhou, Fuqun; Zhang, Aining

    2016-10-25

    Nowadays, various time-series Earth Observation data with multiple bands are freely available, such as Moderate Resolution Imaging Spectroradiometer (MODIS) datasets including 8-day composites from NASA, and 10-day composites from the Canada Centre for Remote Sensing (CCRS). It is challenging to efficiently use these time-series MODIS datasets for long-term environmental monitoring due to their vast volume and information redundancy. This challenge will be greater when Sentinel 2-3 data become available. Another challenge that researchers face is the lack of in-situ data for supervised modelling, especially for time-series data analysis. In this study, we attempt to tackle the two important issues with a case study of land cover mapping using CCRS 10-day MODIS composites with the help of Random Forests' features: variable importance, outlier identification. The variable importance feature is used to analyze and select optimal subsets of time-series MODIS imagery for efficient land cover mapping, and the outlier identification feature is utilized for transferring sample data available from one year to an adjacent year for supervised classification modelling. The results of the case study of agricultural land cover classification at a regional scale show that using only about a half of the variables we can achieve land cover classification accuracy close to that generated using the full dataset. The proposed simple but effective solution of sample transferring could make supervised modelling possible for applications lacking sample data.

  3. A numeric comparison of variable selection algorithms for supervised learning

    International Nuclear Information System (INIS)

    Palombo, G.; Narsky, I.

    2009-01-01

    Datasets in modern High Energy Physics (HEP) experiments are often described by dozens or even hundreds of input variables. Reducing a full variable set to a subset that most completely represents information about data is therefore an important task in analysis of HEP data. We compare various variable selection algorithms for supervised learning using several datasets such as, for instance, imaging gamma-ray Cherenkov telescope (MAGIC) data found at the UCI repository. We use classifiers and variable selection methods implemented in the statistical package StatPatternRecognition (SPR), a free open-source C++ package developed in the HEP community ( (http://sourceforge.net/projects/statpatrec/)). For each dataset, we select a powerful classifier and estimate its learning accuracy on variable subsets obtained by various selection algorithms. When possible, we also estimate the CPU time needed for the variable subset selection. The results of this analysis are compared with those published previously for these datasets using other statistical packages such as R and Weka. We show that the most accurate, yet slowest, method is a wrapper algorithm known as generalized sequential forward selection ('Add N Remove R') implemented in SPR.

  4. An algorithm for the arithmetic classification of multilattices.

    Science.gov (United States)

    Indelicato, Giuliana

    2013-01-01

    A procedure for the construction and the classification of monoatomic multilattices in arbitrary dimension is developed. The algorithm allows one to determine the location of the points of all monoatomic multilattices with a given symmetry, or to determine whether two assigned multilattices are arithmetically equivalent. This approach is based on ideas from integral matrix theory, in particular the reduction to the Smith normal form, and can be coded to provide a classification software package.

  5. Discriminative semi-supervised feature selection via manifold regularization.

    Science.gov (United States)

    Xu, Zenglin; King, Irwin; Lyu, Michael Rung-Tsong; Jin, Rong

    2010-07-01

    Feature selection has attracted a huge amount of interest in both research and application communities of data mining. We consider the problem of semi-supervised feature selection, where we are given a small amount of labeled examples and a large amount of unlabeled examples. Since a small number of labeled samples are usually insufficient for identifying the relevant features, the critical problem arising from semi-supervised feature selection is how to take advantage of the information underneath the unlabeled data. To address this problem, we propose a novel discriminative semi-supervised feature selection method based on the idea of manifold regularization. The proposed approach selects features through maximizing the classification margin between different classes and simultaneously exploiting the geometry of the probability distribution that generates both labeled and unlabeled data. In comparison with previous semi-supervised feature selection algorithms, our proposed semi-supervised feature selection method is an embedded feature selection method and is able to find more discriminative features. We formulate the proposed feature selection method into a convex-concave optimization problem, where the saddle point corresponds to the optimal solution. To find the optimal solution, the level method, a fairly recent optimization method, is employed. We also present a theoretic proof of the convergence rate for the application of the level method to our problem. Empirical evaluation on several benchmark data sets demonstrates the effectiveness of the proposed semi-supervised feature selection method.

  6. A Support Vector Machine Hydrometeor Classification Algorithm for Dual-Polarization Radar

    Directory of Open Access Journals (Sweden)

    Nicoletta Roberto

    2017-07-01

    Full Text Available An algorithm based on a support vector machine (SVM is proposed for hydrometeor classification. The training phase is driven by the output of a fuzzy logic hydrometeor classification algorithm, i.e., the most popular approach for hydrometer classification algorithms used for ground-based weather radar. The performance of SVM is evaluated by resorting to a weather scenario, generated by a weather model; the corresponding radar measurements are obtained by simulation and by comparing results of SVM classification with those obtained by a fuzzy logic classifier. Results based on the weather model and simulations show a higher accuracy of the SVM classification. Objective comparison of the two classifiers applied to real radar data shows that SVM classification maps are spatially more homogenous (textural indices, energy, and homogeneity increases by 21% and 12% respectively and do not present non-classified data. The improvements found by SVM classifier, even though it is applied pixel-by-pixel, can be attributed to its ability to learn from the entire hyperspace of radar measurements and to the accurate training. The reliability of results and higher computing performance make SVM attractive for some challenging tasks such as its implementation in Decision Support Systems for helping pilots to make optimal decisions about changes inthe flight route caused by unexpected adverse weather.

  7. Semi-supervised learning of hyperspectral image segmentation applied to vine tomatoes and table grapes

    Directory of Open Access Journals (Sweden)

    Jeroen van Roy

    2018-03-01

    Full Text Available Nowadays, quality inspection of fruit and vegetables is typically accomplished through visual inspection. Automation of this inspection is desirable to make it more objective. For this, hyperspectral imaging has been identified as a promising technique. When the field of view includes multiple objects, hypercubes should be segmented to assign individual pixels to different objects. Unsupervised and supervised methods have been proposed. While the latter are labour intensive as they require masking of the training images, the former are too computationally intensive for in-line use and may provide different results for different hypercubes. Therefore, a semi-supervised method is proposed to train a computationally efficient segmentation algorithm with minimal human interaction. As a first step, an unsupervised classification model is used to cluster spectra in similar groups. In the second step, a pixel selection algorithm applied to the output of the unsupervised classification is used to build a supervised model which is fast enough for in-line use. To evaluate this approach, it is applied to hypercubes of vine tomatoes and table grapes. After first derivative spectral preprocessing to remove intensity variation due to curvature and gloss effects, the unsupervised models segmented 86.11% of the vine tomato images correctly. Considering overall accuracy, sensitivity, specificity and time needed to segment one hypercube, partial least squares discriminant analysis (PLS-DA was found to be the best choice for in-line use, when using one training image. By adding a second image, the segmentation results improved considerably, yielding an overall accuracy of 96.95% for segmentation of vine tomatoes and 98.52% for segmentation of table grapes, demonstrating the added value of the learning phase in the algorithm.

  8. Classification algorithms using adaptive partitioning

    KAUST Repository

    Binev, Peter; Cohen, Albert; Dahmen, Wolfgang; DeVore, Ronald

    2014-01-01

    © 2014 Institute of Mathematical Statistics. Algorithms for binary classification based on adaptive tree partitioning are formulated and analyzed for both their risk performance and their friendliness to numerical implementation. The algorithms can be viewed as generating a set approximation to the Bayes set and thus fall into the general category of set estimators. In contrast with the most studied tree-based algorithms, which utilize piecewise constant approximation on the generated partition [IEEE Trans. Inform. Theory 52 (2006) 1335.1353; Mach. Learn. 66 (2007) 209.242], we consider decorated trees, which allow us to derive higher order methods. Convergence rates for these methods are derived in terms the parameter - of margin conditions and a rate s of best approximation of the Bayes set by decorated adaptive partitions. They can also be expressed in terms of the Besov smoothness β of the regression function that governs its approximability by piecewise polynomials on adaptive partition. The execution of the algorithms does not require knowledge of the smoothness or margin conditions. Besov smoothness conditions are weaker than the commonly used Holder conditions, which govern approximation by nonadaptive partitions, and therefore for a given regression function can result in a higher rate of convergence. This in turn mitigates the compatibility conflict between smoothness and margin parameters.

  9. Classification algorithms using adaptive partitioning

    KAUST Repository

    Binev, Peter

    2014-12-01

    © 2014 Institute of Mathematical Statistics. Algorithms for binary classification based on adaptive tree partitioning are formulated and analyzed for both their risk performance and their friendliness to numerical implementation. The algorithms can be viewed as generating a set approximation to the Bayes set and thus fall into the general category of set estimators. In contrast with the most studied tree-based algorithms, which utilize piecewise constant approximation on the generated partition [IEEE Trans. Inform. Theory 52 (2006) 1335.1353; Mach. Learn. 66 (2007) 209.242], we consider decorated trees, which allow us to derive higher order methods. Convergence rates for these methods are derived in terms the parameter - of margin conditions and a rate s of best approximation of the Bayes set by decorated adaptive partitions. They can also be expressed in terms of the Besov smoothness β of the regression function that governs its approximability by piecewise polynomials on adaptive partition. The execution of the algorithms does not require knowledge of the smoothness or margin conditions. Besov smoothness conditions are weaker than the commonly used Holder conditions, which govern approximation by nonadaptive partitions, and therefore for a given regression function can result in a higher rate of convergence. This in turn mitigates the compatibility conflict between smoothness and margin parameters.

  10. Assessing Electronic Cigarette-Related Tweets for Sentiment and Content Using Supervised Machine Learning

    OpenAIRE

    Cole-Lewis, Heather; Varghese, Arun; Sanders, Amy; Schwarz, Mary; Pugatch, Jillian; Augustson, Erik

    2015-01-01

    Background Electronic cigarettes (e-cigarettes) continue to be a growing topic among social media users, especially on Twitter. The ability to analyze conversations about e-cigarettes in real-time can provide important insight into trends in the public?s knowledge, attitudes, and beliefs surrounding e-cigarettes, and subsequently guide public health interventions. Objective Our aim was to establish a supervised machine learning algorithm to build predictive classification models that assess T...

  11. Genetic Bee Colony (GBC) algorithm: A new gene selection method for microarray cancer classification.

    Science.gov (United States)

    Alshamlan, Hala M; Badr, Ghada H; Alohali, Yousef A

    2015-06-01

    Naturally inspired evolutionary algorithms prove effectiveness when used for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, namely Genetic Bee Colony (GBC) algorithm. The proposed algorithm combines the used of a Genetic Algorithm (GA) along with Artificial Bee Colony (ABC) algorithm. The goal is to integrate the advantages of both algorithms. The proposed algorithm is applied to a microarray gene expression profile in order to select the most predictive and informative genes for cancer classification. In order to test the accuracy performance of the proposed algorithm, extensive experiments were conducted. Three binary microarray datasets are use, which include: colon, leukemia, and lung. In addition, another three multi-class microarray datasets are used, which are: SRBCT, lymphoma, and leukemia. Results of the GBC algorithm are compared with our recently proposed technique: mRMR when combined with the Artificial Bee Colony algorithm (mRMR-ABC). We also compared the combination of mRMR with GA (mRMR-GA) and Particle Swarm Optimization (mRMR-PSO) algorithms. In addition, we compared the GBC algorithm with other related algorithms that have been recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance as it achieved the highest classification accuracy along with the lowest average number of selected genes. This proves that the GBC algorithm is a promising approach for solving the gene selection problem in both binary and multi-class cancer classification. Copyright © 2015 Elsevier Ltd. All rights reserved.

  12. Objectness Supervised Merging Algorithm for Color Image Segmentation

    Directory of Open Access Journals (Sweden)

    Haifeng Sima

    2016-01-01

    Full Text Available Ideal color image segmentation needs both low-level cues and high-level semantic features. This paper proposes a two-hierarchy segmentation model based on merging homogeneous superpixels. First, a region growing strategy is designed for producing homogenous and compact superpixels in different partitions. Total variation smoothing features are adopted in the growing procedure for locating real boundaries. Before merging, we define a combined color-texture histogram feature for superpixels description and, meanwhile, a novel objectness feature is proposed to supervise the region merging procedure for reliable segmentation. Both color-texture histograms and objectness are computed to measure regional similarities between region pairs, and the mixed standard deviation of the union features is exploited to make stop criteria for merging process. Experimental results on the popular benchmark dataset demonstrate the better segmentation performance of the proposed model compared to other well-known segmentation algorithms.

  13. Conditional High-Order Boltzmann Machines for Supervised Relation Learning.

    Science.gov (United States)

    Huang, Yan; Wang, Wei; Wang, Liang; Tan, Tieniu

    2017-09-01

    Relation learning is a fundamental problem in many vision tasks. Recently, high-order Boltzmann machine and its variants have shown their great potentials in learning various types of data relation in a range of tasks. But most of these models are learned in an unsupervised way, i.e., without using relation class labels, which are not very discriminative for some challenging tasks, e.g., face verification. In this paper, with the goal to perform supervised relation learning, we introduce relation class labels into conventional high-order multiplicative interactions with pairwise input samples, and propose a conditional high-order Boltzmann Machine (CHBM), which can learn to classify the data relation in a binary classification way. To be able to deal with more complex data relation, we develop two improved variants of CHBM: 1) latent CHBM, which jointly performs relation feature learning and classification, by using a set of latent variables to block the pathway from pairwise input samples to output relation labels and 2) gated CHBM, which untangles factors of variation in data relation, by exploiting a set of latent variables to multiplicatively gate the classification of CHBM. To reduce the large number of model parameters generated by the multiplicative interactions, we approximately factorize high-order parameter tensors into multiple matrices. Then, we develop efficient supervised learning algorithms, by first pretraining the models using joint likelihood to provide good parameter initialization, and then finetuning them using conditional likelihood to enhance the discriminant ability. We apply the proposed models to a series of tasks including invariant recognition, face verification, and action similarity labeling. Experimental results demonstrate that by exploiting supervised relation labels, our models can greatly improve the performance.

  14. Optimization of Neuro-Fuzzy System Using Genetic Algorithm for Chromosome Classification

    Directory of Open Access Journals (Sweden)

    M. Sarosa

    2013-09-01

    Full Text Available Neuro-fuzzy system has been shown to provide a good performance on chromosome classification but does not offer a simple method to obtain the accurate parameter values required to yield the best recognition rate. This paper presents a neuro-fuzzy system where its parameters can be automatically adjusted using genetic algorithms. The approach combines the advantages of fuzzy logic theory, neural networks, and genetic algorithms. The structure consists of a four layer feed-forward neural network that uses a GBell membership function as the output function. The proposed methodology has been applied and tested on banded chromosome classification from the Copenhagen Chromosome Database. Simulation result showed that the proposed neuro-fuzzy system optimized by genetic algorithms offers advantages in setting the parameter values, improves the recognition rate significantly and decreases the training/testing time which makes genetic neuro-fuzzy system suitable for chromosome classification.

  15. Comparison analysis for classification algorithm in data mining and the study of model use

    Science.gov (United States)

    Chen, Junde; Zhang, Defu

    2018-04-01

    As a key technique in data mining, classification algorithm was received extensive attention. Through an experiment of classification algorithm in UCI data set, we gave a comparison analysis method for the different algorithms and the statistical test was used here. Than that, an adaptive diagnosis model for preventive electricity stealing and leakage was given as a specific case in the paper.

  16. Detecting Hijacked Journals by Using Classification Algorithms.

    Science.gov (United States)

    Andoohgin Shahri, Mona; Jazi, Mohammad Davarpanah; Borchardt, Glenn; Dadkhah, Mehdi

    2018-04-01

    Invalid journals are recent challenges in the academic world and many researchers are unacquainted with the phenomenon. The number of victims appears to be accelerating. Researchers might be suspicious of predatory journals because they have unfamiliar names, but hijacked journals are imitations of well-known, reputable journals whose websites have been hijacked. Hijacked journals issue calls for papers via generally laudatory emails that delude researchers into paying exorbitant page charges for publication in a nonexistent journal. This paper presents a method for detecting hijacked journals by using a classification algorithm. The number of published articles exposing hijacked journals is limited and most of them use simple techniques that are limited to specific journals. Hence we needed to amass Internet addresses and pertinent data for analyzing this type of attack. We inspected the websites of 104 scientific journals by using a classification algorithm that used criteria common to reputable journals. We then prepared a decision tree that we used to test five journals we knew were authentic and five we knew were hijacked.

  17. Automated labelling of cancer textures in colorectal histopathology slides using quasi-supervised learning.

    Science.gov (United States)

    Onder, Devrim; Sarioglu, Sulen; Karacali, Bilge

    2013-04-01

    Quasi-supervised learning is a statistical learning algorithm that contrasts two datasets by computing estimate for the posterior probability of each sample in either dataset. This method has not been applied to histopathological images before. The purpose of this study is to evaluate the performance of the method to identify colorectal tissues with or without adenocarcinoma. Light microscopic digital images from histopathological sections were obtained from 30 colorectal radical surgery materials including adenocarcinoma and non-neoplastic regions. The texture features were extracted by using local histograms and co-occurrence matrices. The quasi-supervised learning algorithm operates on two datasets, one containing samples of normal tissues labelled only indirectly, and the other containing an unlabeled collection of samples of both normal and cancer tissues. As such, the algorithm eliminates the need for manually labelled samples of normal and cancer tissues for conventional supervised learning and significantly reduces the expert intervention. Several texture feature vector datasets corresponding to different extraction parameters were tested within the proposed framework. The Independent Component Analysis dimensionality reduction approach was also identified as the one improving the labelling performance evaluated in this series. In this series, the proposed method was applied to the dataset of 22,080 vectors with reduced dimensionality 119 from 132. Regions containing cancer tissue could be identified accurately having false and true positive rates up to 19% and 88% respectively without using manually labelled ground-truth datasets in a quasi-supervised strategy. The resulting labelling performances were compared to that of a conventional powerful supervised classifier using manually labelled ground-truth data. The supervised classifier results were calculated as 3.5% and 95% for the same case. The results in this series in comparison with the benchmark

  18. Multi-sparse dictionary colorization algorithm based on the feature classification and detail enhancement

    Science.gov (United States)

    Yan, Dan; Bai, Lianfa; Zhang, Yi; Han, Jing

    2018-02-01

    For the problems of missing details and performance of the colorization based on sparse representation, we propose a conceptual model framework for colorizing gray-scale images, and then a multi-sparse dictionary colorization algorithm based on the feature classification and detail enhancement (CEMDC) is proposed based on this framework. The algorithm can achieve a natural colorized effect for a gray-scale image, and it is consistent with the human vision. First, the algorithm establishes a multi-sparse dictionary classification colorization model. Then, to improve the accuracy rate of the classification, the corresponding local constraint algorithm is proposed. Finally, we propose a detail enhancement based on Laplacian Pyramid, which is effective in solving the problem of missing details and improving the speed of image colorization. In addition, the algorithm not only realizes the colorization of the visual gray-scale image, but also can be applied to the other areas, such as color transfer between color images, colorizing gray fusion images, and infrared images.

  19. An evaluation of classification algorithms for intrusion detection ...

    African Journals Online (AJOL)

    An evaluation of classification algorithms for intrusion detection. ... Log in or Register to get access to full text downloads. ... Most of the available IDSs use all the 41 features in the network to evaluate and search for intrusive pattern in which ...

  20. Using Hierarchical Time Series Clustering Algorithm and Wavelet Classifier for Biometric Voice Classification

    Directory of Open Access Journals (Sweden)

    Simon Fong

    2012-01-01

    Full Text Available Voice biometrics has a long history in biosecurity applications such as verification and identification based on characteristics of the human voice. The other application called voice classification which has its important role in grouping unlabelled voice samples, however, has not been widely studied in research. Lately voice classification is found useful in phone monitoring, classifying speakers’ gender, ethnicity and emotion states, and so forth. In this paper, a collection of computational algorithms are proposed to support voice classification; the algorithms are a combination of hierarchical clustering, dynamic time wrap transform, discrete wavelet transform, and decision tree. The proposed algorithms are relatively more transparent and interpretable than the existing ones, though many techniques such as Artificial Neural Networks, Support Vector Machine, and Hidden Markov Model (which inherently function like a black box have been applied for voice verification and voice identification. Two datasets, one that is generated synthetically and the other one empirically collected from past voice recognition experiment, are used to verify and demonstrate the effectiveness of our proposed voice classification algorithm.

  1. Quasi-supervised scoring of human sleep in polysomnograms using augmented input variables.

    Science.gov (United States)

    Yaghouby, Farid; Sunderam, Sridhar

    2015-04-01

    The limitations of manual sleep scoring make computerized methods highly desirable. Scoring errors can arise from human rater uncertainty or inter-rater variability. Sleep scoring algorithms either come as supervised classifiers that need scored samples of each state to be trained, or as unsupervised classifiers that use heuristics or structural clues in unscored data to define states. We propose a quasi-supervised classifier that models observations in an unsupervised manner but mimics a human rater wherever training scores are available. EEG, EMG, and EOG features were extracted in 30s epochs from human-scored polysomnograms recorded from 42 healthy human subjects (18-79 years) and archived in an anonymized, publicly accessible database. Hypnograms were modified so that: 1. Some states are scored but not others; 2. Samples of all states are scored but not for transitional epochs; and 3. Two raters with 67% agreement are simulated. A framework for quasi-supervised classification was devised in which unsupervised statistical models-specifically Gaussian mixtures and hidden Markov models--are estimated from unlabeled training data, but the training samples are augmented with variables whose values depend on available scores. Classifiers were fitted to signal features incorporating partial scores, and used to predict scores for complete recordings. Performance was assessed using Cohen's Κ statistic. The quasi-supervised classifier performed significantly better than an unsupervised model and sometimes as well as a completely supervised model despite receiving only partial scores. The quasi-supervised algorithm addresses the need for classifiers that mimic scoring patterns of human raters while compensating for their limitations. Copyright © 2015 Elsevier Ltd. All rights reserved.

  2. Identification of Village Building via Google Earth Images and Supervised Machine Learning Methods

    Directory of Open Access Journals (Sweden)

    Zhiling Guo

    2016-03-01

    Full Text Available In this study, a method based on supervised machine learning is proposed to identify village buildings from open high-resolution remote sensing images. We select Google Earth (GE RGB images to perform the classification in order to examine its suitability for village mapping, and investigate the feasibility of using machine learning methods to provide automatic classification in such fields. By analyzing the characteristics of GE images, we design different features on the basis of two kinds of supervised machine learning methods for classification: adaptive boosting (AdaBoost and convolutional neural networks (CNN. To recognize village buildings via their color and texture information, the RGB color features and a large number of Haar-like features in a local window are utilized in the AdaBoost method; with multilayer trained networks based on gradient descent algorithms and back propagation, CNN perform the identification by mining deeper information from buildings and their neighborhood. Experimental results from the testing area at Savannakhet province in Laos show that our proposed AdaBoost method achieves an overall accuracy of 96.22% and the CNN method is also competitive with an overall accuracy of 96.30%.

  3. Structure-Based Algorithms for Microvessel Classification

    KAUST Repository

    Smith, Amy F.

    2015-02-01

    © 2014 The Authors. Microcirculation published by John Wiley & Sons Ltd. Objective: Recent developments in high-resolution imaging techniques have enabled digital reconstruction of three-dimensional sections of microvascular networks down to the capillary scale. To better interpret these large data sets, our goal is to distinguish branching trees of arterioles and venules from capillaries. Methods: Two novel algorithms are presented for classifying vessels in microvascular anatomical data sets without requiring flow information. The algorithms are compared with a classification based on observed flow directions (considered the gold standard), and with an existing resistance-based method that relies only on structural data. Results: The first algorithm, developed for networks with one arteriolar and one venular tree, performs well in identifying arterioles and venules and is robust to parameter changes, but incorrectly labels a significant number of capillaries as arterioles or venules. The second algorithm, developed for networks with multiple inlets and outlets, correctly identifies more arterioles and venules, but is more sensitive to parameter changes. Conclusions: The algorithms presented here can be used to classify microvessels in large microvascular data sets lacking flow information. This provides a basis for analyzing the distinct geometrical properties and modelling the functional behavior of arterioles, capillaries, and venules.

  4. Polarimetric SAR Image Classification Using Multiple-feature Fusion and Ensemble Learning

    Directory of Open Access Journals (Sweden)

    Sun Xun

    2016-12-01

    Full Text Available In this paper, we propose a supervised classification algorithm for Polarimetric Synthetic Aperture Radar (PolSAR images using multiple-feature fusion and ensemble learning. First, we extract different polarimetric features, including extended polarimetric feature space, Hoekman, Huynen, H/alpha/A, and fourcomponent scattering features of PolSAR images. Next, we randomly select two types of features each time from all feature sets to guarantee the reliability and diversity of later ensembles and use a support vector machine as the basic classifier for predicting classification results. Finally, we concatenate all prediction probabilities of basic classifiers as the final feature representation and employ the random forest method to obtain final classification results. Experimental results at the pixel and region levels show the effectiveness of the proposed algorithm.

  5. Analysis and Evaluation of IKONOS Image Fusion Algorithm Based on Land Cover Classification

    Institute of Scientific and Technical Information of China (English)

    Xia; JING; Yan; BAO

    2015-01-01

    Different fusion algorithm has its own advantages and limitations,so it is very difficult to simply evaluate the good points and bad points of the fusion algorithm. Whether an algorithm was selected to fuse object images was also depended upon the sensor types and special research purposes. Firstly,five fusion methods,i. e. IHS,Brovey,PCA,SFIM and Gram-Schmidt,were briefly described in the paper. And then visual judgment and quantitative statistical parameters were used to assess the five algorithms. Finally,in order to determine which one is the best suitable fusion method for land cover classification of IKONOS image,the maximum likelihood classification( MLC) was applied using the above five fusion images. The results showed that the fusion effect of SFIM transform and Gram-Schmidt transform were better than the other three image fusion methods in spatial details improvement and spectral information fidelity,and Gram-Schmidt technique was superior to SFIM transform in the aspect of expressing image details. The classification accuracy of the fused image using Gram-Schmidt and SFIM algorithms was higher than that of the other three image fusion methods,and the overall accuracy was greater than 98%. The IHS-fused image classification accuracy was the lowest,the overall accuracy and kappa coefficient were 83. 14% and 0. 76,respectively. Thus the IKONOS fusion images obtained by the Gram-Schmidt and SFIM were better for improving the land cover classification accuracy.

  6. A Novel Algorithm for Imbalance Data Classification Based on Neighborhood Hypergraph

    Directory of Open Access Journals (Sweden)

    Feng Hu

    2014-01-01

    Full Text Available The classification problem for imbalance data is paid more attention to. So far, many significant methods are proposed and applied to many fields. But more efficient methods are needed still. Hypergraph may not be powerful enough to deal with the data in boundary region, although it is an efficient tool to knowledge discovery. In this paper, the neighborhood hypergraph is presented, combining rough set theory and hypergraph. After that, a novel classification algorithm for imbalance data based on neighborhood hypergraph is developed, which is composed of three steps: initialization of hyperedge, classification of training data set, and substitution of hyperedge. After conducting an experiment of 10-fold cross validation on 18 data sets, the proposed algorithm has higher average accuracy than others.

  7. Classifier Directed Data Hybridization for Geographic Sample Supervised Segment Generation

    Directory of Open Access Journals (Sweden)

    Christoff Fourie

    2014-11-01

    Full Text Available Quality segment generation is a well-known challenge and research objective within Geographic Object-based Image Analysis (GEOBIA. Although methodological avenues within GEOBIA are diverse, segmentation commonly plays a central role in most approaches, influencing and being influenced by surrounding processes. A general approach using supervised quality measures, specifically user provided reference segments, suggest casting the parameters of a given segmentation algorithm as a multidimensional search problem. In such a sample supervised segment generation approach, spatial metrics observing the user provided reference segments may drive the search process. The search is commonly performed by metaheuristics. A novel sample supervised segment generation approach is presented in this work, where the spectral content of provided reference segments is queried. A one-class classification process using spectral information from inside the provided reference segments is used to generate a probability image, which in turn is employed to direct a hybridization of the original input imagery. Segmentation is performed on such a hybrid image. These processes are adjustable, interdependent and form a part of the search problem. Results are presented detailing the performances of four method variants compared to the generic sample supervised segment generation approach, under various conditions in terms of resultant segment quality, required computing time and search process characteristics. Multiple metrics, metaheuristics and segmentation algorithms are tested with this approach. Using the spectral data contained within user provided reference segments to tailor the output generally improves the results in the investigated problem contexts, but at the expense of additional required computing time.

  8. A novel Neuro-fuzzy classification technique for data mining

    Directory of Open Access Journals (Sweden)

    Soumadip Ghosh

    2014-11-01

    Full Text Available In our study, we proposed a novel Neuro-fuzzy classification technique for data mining. The inputs to the Neuro-fuzzy classification system were fuzzified by applying generalized bell-shaped membership function. The proposed method utilized a fuzzification matrix in which the input patterns were associated with a degree of membership to different classes. Based on the value of degree of membership a pattern would be attributed to a specific category or class. We applied our method to ten benchmark data sets from the UCI machine learning repository for classification. Our objective was to analyze the proposed method and, therefore compare its performance with two powerful supervised classification algorithms Radial Basis Function Neural Network (RBFNN and Adaptive Neuro-fuzzy Inference System (ANFIS. We assessed the performance of these classification methods in terms of different performance measures such as accuracy, root-mean-square error, kappa statistic, true positive rate, false positive rate, precision, recall, and f-measure. In every aspect the proposed method proved to be superior to RBFNN and ANFIS algorithms.

  9. An Accurate CT Saturation Classification Using a Deep Learning Approach Based on Unsupervised Feature Extraction and Supervised Fine-Tuning Strategy

    Directory of Open Access Journals (Sweden)

    Muhammad Ali

    2017-11-01

    Full Text Available Current transformer (CT saturation is one of the significant problems for protection engineers. If CT saturation is not tackled properly, it can cause a disastrous effect on the stability of the power system, and may even create a complete blackout. To cope with CT saturation properly, an accurate detection or classification should be preceded. Recently, deep learning (DL methods have brought a subversive revolution in the field of artificial intelligence (AI. This paper presents a new DL classification method based on unsupervised feature extraction and supervised fine-tuning strategy to classify the saturated and unsaturated regions in case of CT saturation. In other words, if protection system is subjected to a CT saturation, proposed method will correctly classify the different levels of saturation with a high accuracy. Traditional AI methods are mostly based on supervised learning and rely heavily on human crafted features. This paper contributes to an unsupervised feature extraction, using autoencoders and deep neural networks (DNNs to extract features automatically without prior knowledge of optimal features. To validate the effectiveness of proposed method, a variety of simulation tests are conducted, and classification results are analyzed using standard classification metrics. Simulation results confirm that proposed method classifies the different levels of CT saturation with a remarkable accuracy and has unique feature extraction capabilities. Lastly, we provided a potential future research direction to conclude this paper.

  10. Establishing a Supervised Classification of Global Blue Carbon Mangrove Ecosystems

    Science.gov (United States)

    Baltezar, P.

    2016-12-01

    Understanding change in mangroves over time will aid forest management systems working to protect them from over exploitation. Mangroves are one of the most carbon dense terrestrial ecosystems on the planet and are therefore a high priority for sustainable forest management. Although they represent 1% of terrestrial cover, they could account for about 10% of global carbon emissions. The foundation of this analysis uses remote sensing to establish a supervised classification of mangrove forests for discrete regions in the Zambezi Delta of Mozambique and the Rufiji Delta of Tanzania. Open-source mapping platforms provided a dynamic space for analyzing satellite imagery in the Google Earth Engine (GEE) coding environment. C-Band Synthetic Aperture Radar data from Sentinel 1 was used in the model as a mask by optimizing SAR parameters. Exclusion metrics identified within Global Land Surface Temperature data from MODIS and the Shuttle Radar Topography Mission were used to accentuate mangrove features. Variance was accounted for in exclusion metrics by statistically calculating thresholds for radar, thermal, and elevation data. Optical imagery from the Landsat 8 archive aided a quality mosaic in extracting the highest spectral index values most appropriate for vegetative mapping. The enhanced radar, thermal, and digital elevation imagery were then incorporated into the quality mosaic. Training sites were selected from Google Earth imagery and used in the classification with a resulting output of four mangrove cover map models for each site. The model was assessed for accuracy by observing the differences between the mangrove classification models to the reference maps. Although the model was over predicting mangroves in non-mangrove regions, it was more accurately classifying mangrove regions established by the references. Future refinements will expand the model with an objective degree of accuracy.

  11. A Decision-Tree-Based Algorithm for Speech/Music Classification and Segmentation

    Directory of Open Access Journals (Sweden)

    Lavner Yizhar

    2009-01-01

    Full Text Available We present an efficient algorithm for segmentation of audio signals into speech or music. The central motivation to our study is consumer audio applications, where various real-time enhancements are often applied. The algorithm consists of a learning phase and a classification phase. In the learning phase, predefined training data is used for computing various time-domain and frequency-domain features, for speech and music signals separately, and estimating the optimal speech/music thresholds, based on the probability density functions of the features. An automatic procedure is employed to select the best features for separation. In the test phase, initial classification is performed for each segment of the audio signal, using a three-stage sieve-like approach, applying both Bayesian and rule-based methods. To avoid erroneous rapid alternations in the classification, a smoothing technique is applied, averaging the decision on each segment with past segment decisions. Extensive evaluation of the algorithm, on a database of more than 12 hours of speech and more than 22 hours of music showed correct identification rates of 99.4% and 97.8%, respectively, and quick adjustment to alternating speech/music sections. In addition to its accuracy and robustness, the algorithm can be easily adapted to different audio types, and is suitable for real-time operation.

  12. A method for classification of network traffic based on C5.0 Machine Learning Algorithm

    DEFF Research Database (Denmark)

    Bujlow, Tomasz; Riaz, M. Tahir; Pedersen, Jens Myrup

    2012-01-01

    current network traffic. To overcome the drawbacks of existing methods for traffic classification, usage of C5.0 Machine Learning Algorithm (MLA) was proposed. On the basis of statistical traffic information received from volunteers and C5.0 algorithm we constructed a boosted classifier, which was shown...... and classification, an algorithm for recognizing flow direction and the C5.0 itself. Classified applications include Skype, FTP, torrent, web browser traffic, web radio, interactive gaming and SSH. We performed subsequent tries using different sets of parameters and both training and classification options...

  13. A Comprehensive Study of Features and Algorithms for URL-Based Topic Classification

    CERN Document Server

    Weber, I; Henzinger, M; Baykan, E

    2011-01-01

    Given only the URL of a Web page, can we identify its topic? We study this problem in detail by exploring a large number of different feature sets and algorithms on several datasets. We also show that the inherent overlap between topics and the sparsity of the information in URLs makes this a very challenging problem. Web page classification without a page's content is desirable when the content is not available at all, when a classification is needed before obtaining the content, or when classification speed is of utmost importance. For our experiments we used five different corpora comprising a total of about 3 million (URL, classification) pairs. We evaluated several techniques for feature generation and classification algorithms. The individual binary classifiers were then combined via boosting into metabinary classifiers. We achieve typical F-measure values between 80 and 85, and a typical precision of around 86. The precision can be pushed further over 90 while maintaining a typical level of recall betw...

  14. Evaluation of machine learning algorithms for classification of primary biological aerosol using a new UV-LIF spectrometer

    Science.gov (United States)

    Ruske, Simon; Topping, David O.; Foot, Virginia E.; Kaye, Paul H.; Stanley, Warren R.; Crawford, Ian; Morse, Andrew P.; Gallagher, Martin W.

    2017-03-01

    Characterisation of bioaerosols has important implications within environment and public health sectors. Recent developments in ultraviolet light-induced fluorescence (UV-LIF) detectors such as the Wideband Integrated Bioaerosol Spectrometer (WIBS) and the newly introduced Multiparameter Bioaerosol Spectrometer (MBS) have allowed for the real-time collection of fluorescence, size and morphology measurements for the purpose of discriminating between bacteria, fungal spores and pollen.This new generation of instruments has enabled ever larger data sets to be compiled with the aim of studying more complex environments. In real world data sets, particularly those from an urban environment, the population may be dominated by non-biological fluorescent interferents, bringing into question the accuracy of measurements of quantities such as concentrations. It is therefore imperative that we validate the performance of different algorithms which can be used for the task of classification.For unsupervised learning we tested hierarchical agglomerative clustering with various different linkages. For supervised learning, 11 methods were tested, including decision trees, ensemble methods (random forests, gradient boosting and AdaBoost), two implementations for support vector machines (libsvm and liblinear) and Gaussian methods (Gaussian naïve Bayesian, quadratic and linear discriminant analysis, the k-nearest neighbours algorithm and artificial neural networks).The methods were applied to two different data sets produced using the new MBS, which provides multichannel UV-LIF fluorescence signatures for single airborne biological particles. The first data set contained mixed PSLs and the second contained a variety of laboratory-generated aerosol.Clustering in general performs slightly worse than the supervised learning methods, correctly classifying, at best, only 67. 6 and 91. 1 % for the two data sets respectively. For supervised learning the gradient boosting algorithm was

  15. Novel Approaches for Diagnosing Melanoma Skin Lesions Through Supervised and Deep Learning Algorithms.

    Science.gov (United States)

    Premaladha, J; Ravichandran, K S

    2016-04-01

    Dermoscopy is a technique used to capture the images of skin, and these images are useful to analyze the different types of skin diseases. Malignant melanoma is a kind of skin cancer whose severity even leads to death. Earlier detection of melanoma prevents death and the clinicians can treat the patients to increase the chances of survival. Only few machine learning algorithms are developed to detect the melanoma using its features. This paper proposes a Computer Aided Diagnosis (CAD) system which equips efficient algorithms to classify and predict the melanoma. Enhancement of the images are done using Contrast Limited Adaptive Histogram Equalization technique (CLAHE) and median filter. A new segmentation algorithm called Normalized Otsu's Segmentation (NOS) is implemented to segment the affected skin lesion from the normal skin, which overcomes the problem of variable illumination. Fifteen features are derived and extracted from the segmented images are fed into the proposed classification techniques like Deep Learning based Neural Networks and Hybrid Adaboost-Support Vector Machine (SVM) algorithms. The proposed system is tested and validated with nearly 992 images (malignant & benign lesions) and it provides a high classification accuracy of 93 %. The proposed CAD system can assist the dermatologists to confirm the decision of the diagnosis and to avoid excisional biopsies.

  16. Supervised Classification Processes for the Characterization of Heritage Elements, Case Study: Cuenca-Ecuador

    Science.gov (United States)

    Briones, J. C.; Heras, V.; Abril, C.; Sinchi, E.

    2017-08-01

    The proper control of built heritage entails many challenges related to the complexity of heritage elements and the extent of the area to be managed, for which the available resources must be efficiently used. In this scenario, the preventive conservation approach, based on the concept that prevent is better than cure, emerges as a strategy to avoid the progressive and imminent loss of monuments and heritage sites. Regular monitoring appears as a key tool to identify timely changes in heritage assets. This research demonstrates that the supervised learning model (Support Vector Machines - SVM) is an ideal tool that supports the monitoring process detecting visible elements in aerial images such as roofs structures, vegetation and pavements. The linear, gaussian and polynomial kernel functions were tested; the lineal function provided better results over the other functions. It is important to mention that due to the high level of segmentation generated by the classification procedure, it was necessary to apply a generalization process through opening a mathematical morphological operation, which simplified the over classification for the monitored elements.

  17. Can Automatic Classification Help to Increase Accuracy in Data Collection?

    Directory of Open Access Journals (Sweden)

    Frederique Lang

    2016-09-01

    Full Text Available Purpose: The authors aim at testing the performance of a set of machine learning algorithms that could improve the process of data cleaning when building datasets. Design/methodology/approach: The paper is centered on cleaning datasets gathered from publishers and online resources by the use of specific keywords. In this case, we analyzed data from the Web of Science. The accuracy of various forms of automatic classification was tested here in comparison with manual coding in order to determine their usefulness for data collection and cleaning. We assessed the performance of seven supervised classification algorithms (Support Vector Machine (SVM, Scaled Linear Discriminant Analysis, Lasso and elastic-net regularized generalized linear models, Maximum Entropy, Regression Tree, Boosting, and Random Forest and analyzed two properties: accuracy and recall. We assessed not only each algorithm individually, but also their combinations through a voting scheme. We also tested the performance of these algorithms with different sizes of training data. When assessing the performance of different combinations, we used an indicator of coverage to account for the agreement and disagreement on classification between algorithms. Findings: We found that the performance of the algorithms used vary with the size of the sample for training. However, for the classification exercise in this paper the best performing algorithms were SVM and Boosting. The combination of these two algorithms achieved a high agreement on coverage and was highly accurate. This combination performs well with a small training dataset (10%, which may reduce the manual work needed for classification tasks. Research limitations: The dataset gathered has significantly more records related to the topic of interest compared to unrelated topics. This may affect the performance of some algorithms, especially in their identification of unrelated papers. Practical implications: Although the

  18. Supervised versus unsupervised categorization: two sides of the same coin?

    Science.gov (United States)

    Pothos, Emmanuel M; Edwards, Darren J; Perlman, Amotz

    2011-09-01

    Supervised and unsupervised categorization have been studied in separate research traditions. A handful of studies have attempted to explore a possible convergence between the two. The present research builds on these studies, by comparing the unsupervised categorization results of Pothos et al. ( 2011 ; Pothos et al., 2008 ) with the results from two procedures of supervised categorization. In two experiments, we tested 375 participants with nine different stimulus sets and examined the relation between ease of learning of a classification, memory for a classification, and spontaneous preference for a classification. After taking into account the role of the number of category labels (clusters) in supervised learning, we found the three variables to be closely associated with each other. Our results provide encouragement for researchers seeking unified theoretical explanations for supervised and unsupervised categorization, but raise a range of challenging theoretical questions.

  19. Quantum Algorithm for K-Nearest Neighbors Classification Based on the Metric of Hamming Distance

    Science.gov (United States)

    Ruan, Yue; Xue, Xiling; Liu, Heng; Tan, Jianing; Li, Xi

    2017-11-01

    K-nearest neighbors (KNN) algorithm is a common algorithm used for classification, and also a sub-routine in various complicated machine learning tasks. In this paper, we presented a quantum algorithm (QKNN) for implementing this algorithm based on the metric of Hamming distance. We put forward a quantum circuit for computing Hamming distance between testing sample and each feature vector in the training set. Taking advantage of this method, we realized a good analog for classical KNN algorithm by setting a distance threshold value t to select k - n e a r e s t neighbors. As a result, QKNN achieves O( n 3) performance which is only relevant to the dimension of feature vectors and high classification accuracy, outperforms Llyod's algorithm (Lloyd et al. 2013) and Wiebe's algorithm (Wiebe et al. 2014).

  20. Examining applying high performance genetic data feature selection and classification algorithms for colon cancer diagnosis.

    Science.gov (United States)

    Al-Rajab, Murad; Lu, Joan; Xu, Qiang

    2017-07-01

    This paper examines the accuracy and efficiency (time complexity) of high performance genetic data feature selection and classification algorithms for colon cancer diagnosis. The need for this research derives from the urgent and increasing need for accurate and efficient algorithms. Colon cancer is a leading cause of death worldwide, hence it is vitally important for the cancer tissues to be expertly identified and classified in a rapid and timely manner, to assure both a fast detection of the disease and to expedite the drug discovery process. In this research, a three-phase approach was proposed and implemented: Phases One and Two examined the feature selection algorithms and classification algorithms employed separately, and Phase Three examined the performance of the combination of these. It was found from Phase One that the Particle Swarm Optimization (PSO) algorithm performed best with the colon dataset as a feature selection (29 genes selected) and from Phase Two that the Support Vector Machine (SVM) algorithm outperformed other classifications, with an accuracy of almost 86%. It was also found from Phase Three that the combined use of PSO and SVM surpassed other algorithms in accuracy and performance, and was faster in terms of time analysis (94%). It is concluded that applying feature selection algorithms prior to classification algorithms results in better accuracy than when the latter are applied alone. This conclusion is important and significant to industry and society. Copyright © 2017 Elsevier B.V. All rights reserved.

  1. PCIU: Hardware Implementations of an Efficient Packet Classification Algorithm with an Incremental Update Capability

    Directory of Open Access Journals (Sweden)

    O. Ahmed

    2011-01-01

    Full Text Available Packet classification plays a crucial role for a number of network services such as policy-based routing, firewalls, and traffic billing, to name a few. However, classification can be a bottleneck in the above-mentioned applications if not implemented properly and efficiently. In this paper, we propose PCIU, a novel classification algorithm, which improves upon previously published work. PCIU provides lower preprocessing time, lower memory consumption, ease of incremental rule update, and reasonable classification time compared to state-of-the-art algorithms. The proposed algorithm was evaluated and compared to RFC and HiCut using several benchmarks. Results obtained indicate that PCIU outperforms these algorithms in terms of speed, memory usage, incremental update capability, and preprocessing time. The algorithm, furthermore, was improved and made more accessible for a variety of applications through implementation in hardware. Two such implementations are detailed and discussed in this paper. The results indicate that a hardware/software codesign approach results in a slower, but easier to optimize and improve within time constraints, PCIU solution. A hardware accelerator based on an ESL approach using Handel-C, on the other hand, resulted in a 31x speed-up over a pure software implementation running on a state of the art Xeon processor.

  2. Automatic classification of time-variable X-ray sources

    Energy Technology Data Exchange (ETDEWEB)

    Lo, Kitty K.; Farrell, Sean; Murphy, Tara; Gaensler, B. M. [Sydney Institute for Astronomy, School of Physics, The University of Sydney, Sydney, NSW 2006 (Australia)

    2014-05-01

    To maximize the discovery potential of future synoptic surveys, especially in the field of transient science, it will be necessary to use automatic classification to identify some of the astronomical sources. The data mining technique of supervised classification is suitable for this problem. Here, we present a supervised learning method to automatically classify variable X-ray sources in the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR2). Random Forest is our classifier of choice since it is one of the most accurate learning algorithms available. Our training set consists of 873 variable sources and their features are derived from time series, spectra, and other multi-wavelength contextual information. The 10 fold cross validation accuracy of the training data is ∼97% on a 7 class data set. We applied the trained classification model to 411 unknown variable 2XMM sources to produce a probabilistically classified catalog. Using the classification margin and the Random Forest derived outlier measure, we identified 12 anomalous sources, of which 2XMM J180658.7–500250 appears to be the most unusual source in the sample. Its X-ray spectra is suggestive of a ultraluminous X-ray source but its variability makes it highly unusual. Machine-learned classification and anomaly detection will facilitate scientific discoveries in the era of all-sky surveys.

  3. Automatic classification of time-variable X-ray sources

    International Nuclear Information System (INIS)

    Lo, Kitty K.; Farrell, Sean; Murphy, Tara; Gaensler, B. M.

    2014-01-01

    To maximize the discovery potential of future synoptic surveys, especially in the field of transient science, it will be necessary to use automatic classification to identify some of the astronomical sources. The data mining technique of supervised classification is suitable for this problem. Here, we present a supervised learning method to automatically classify variable X-ray sources in the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR2). Random Forest is our classifier of choice since it is one of the most accurate learning algorithms available. Our training set consists of 873 variable sources and their features are derived from time series, spectra, and other multi-wavelength contextual information. The 10 fold cross validation accuracy of the training data is ∼97% on a 7 class data set. We applied the trained classification model to 411 unknown variable 2XMM sources to produce a probabilistically classified catalog. Using the classification margin and the Random Forest derived outlier measure, we identified 12 anomalous sources, of which 2XMM J180658.7–500250 appears to be the most unusual source in the sample. Its X-ray spectra is suggestive of a ultraluminous X-ray source but its variability makes it highly unusual. Machine-learned classification and anomaly detection will facilitate scientific discoveries in the era of all-sky surveys.

  4. Semi-supervised learning for ordinal Kernel Discriminant Analysis.

    Science.gov (United States)

    Pérez-Ortiz, M; Gutiérrez, P A; Carbonero-Ruz, M; Hervás-Martínez, C

    2016-12-01

    Ordinal classification considers those classification problems where the labels of the variable to predict follow a given order. Naturally, labelled data is scarce or difficult to obtain in this type of problems because, in many cases, ordinal labels are given by a user or expert (e.g. in recommendation systems). Firstly, this paper develops a new strategy for ordinal classification where both labelled and unlabelled data are used in the model construction step (a scheme which is referred to as semi-supervised learning). More specifically, the ordinal version of kernel discriminant learning is extended for this setting considering the neighbourhood information of unlabelled data, which is proposed to be computed in the feature space induced by the kernel function. Secondly, a new method for semi-supervised kernel learning is devised in the context of ordinal classification, which is combined with our developed classification strategy to optimise the kernel parameters. The experiments conducted compare 6 different approaches for semi-supervised learning in the context of ordinal classification in a battery of 30 datasets, showing (1) the good synergy of the ordinal version of discriminant analysis and the use of unlabelled data and (2) the advantage of computing distances in the feature space induced by the kernel function. Copyright © 2016 Elsevier Ltd. All rights reserved.

  5. Semi-supervised weighted kernel clustering based on gravitational search for fault diagnosis.

    Science.gov (United States)

    Li, Chaoshun; Zhou, Jianzhong

    2014-09-01

    Supervised learning method, like support vector machine (SVM), has been widely applied in diagnosing known faults, however this kind of method fails to work correctly when new or unknown fault occurs. Traditional unsupervised kernel clustering can be used for unknown fault diagnosis, but it could not make use of the historical classification information to improve diagnosis accuracy. In this paper, a semi-supervised kernel clustering model is designed to diagnose known and unknown faults. At first, a novel semi-supervised weighted kernel clustering algorithm based on gravitational search (SWKC-GS) is proposed for clustering of dataset composed of labeled and unlabeled fault samples. The clustering model of SWKC-GS is defined based on wrong classification rate of labeled samples and fuzzy clustering index on the whole dataset. Gravitational search algorithm (GSA) is used to solve the clustering model, while centers of clusters, feature weights and parameter of kernel function are selected as optimization variables. And then, new fault samples are identified and diagnosed by calculating the weighted kernel distance between them and the fault cluster centers. If the fault samples are unknown, they will be added in historical dataset and the SWKC-GS is used to partition the mixed dataset and update the clustering results for diagnosing new fault. In experiments, the proposed method has been applied in fault diagnosis for rotatory bearing, while SWKC-GS has been compared not only with traditional clustering methods, but also with SVM and neural network, for known fault diagnosis. In addition, the proposed method has also been applied in unknown fault diagnosis. The results have shown effectiveness of the proposed method in achieving expected diagnosis accuracy for both known and unknown faults of rotatory bearing. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.

  6. Patent Keyword Extraction Algorithm Based on Distributed Representation for Patent Classification

    Directory of Open Access Journals (Sweden)

    Jie Hu

    2018-02-01

    Full Text Available Many text mining tasks such as text retrieval, text summarization, and text comparisons depend on the extraction of representative keywords from the main text. Most existing keyword extraction algorithms are based on discrete bag-of-words type of word representation of the text. In this paper, we propose a patent keyword extraction algorithm (PKEA based on the distributed Skip-gram model for patent classification. We also develop a set of quantitative performance measures for keyword extraction evaluation based on information gain and cross-validation, based on Support Vector Machine (SVM classification, which are valuable when human-annotated keywords are not available. We used a standard benchmark dataset and a homemade patent dataset to evaluate the performance of PKEA. Our patent dataset includes 2500 patents from five distinct technological fields related to autonomous cars (GPS systems, lidar systems, object recognition systems, radar systems, and vehicle control systems. We compared our method with Frequency, Term Frequency-Inverse Document Frequency (TF-IDF, TextRank and Rapid Automatic Keyword Extraction (RAKE. The experimental results show that our proposed algorithm provides a promising way to extract keywords from patent texts for patent classification.

  7. Automated Glioblastoma Segmentation Based on a Multiparametric Structured Unsupervised Classification

    Science.gov (United States)

    Juan-Albarracín, Javier; Fuster-Garcia, Elies; Manjón, José V.; Robles, Montserrat; Aparici, F.; Martí-Bonmatí, L.; García-Gómez, Juan M.

    2015-01-01

    Automatic brain tumour segmentation has become a key component for the future of brain tumour treatment. Currently, most of brain tumour segmentation approaches arise from the supervised learning standpoint, which requires a labelled training dataset from which to infer the models of the classes. The performance of these models is directly determined by the size and quality of the training corpus, whose retrieval becomes a tedious and time-consuming task. On the other hand, unsupervised approaches avoid these limitations but often do not reach comparable results than the supervised methods. In this sense, we propose an automated unsupervised method for brain tumour segmentation based on anatomical Magnetic Resonance (MR) images. Four unsupervised classification algorithms, grouped by their structured or non-structured condition, were evaluated within our pipeline. Considering the non-structured algorithms, we evaluated K-means, Fuzzy K-means and Gaussian Mixture Model (GMM), whereas as structured classification algorithms we evaluated Gaussian Hidden Markov Random Field (GHMRF). An automated postprocess based on a statistical approach supported by tissue probability maps is proposed to automatically identify the tumour classes after the segmentations. We evaluated our brain tumour segmentation method with the public BRAin Tumor Segmentation (BRATS) 2013 Test and Leaderboard datasets. Our approach based on the GMM model improves the results obtained by most of the supervised methods evaluated with the Leaderboard set and reaches the second position in the ranking. Our variant based on the GHMRF achieves the first position in the Test ranking of the unsupervised approaches and the seventh position in the general Test ranking, which confirms the method as a viable alternative for brain tumour segmentation. PMID:25978453

  8. Classification of EEG Signals using adaptive weighted distance nearest neighbor algorithm

    Directory of Open Access Journals (Sweden)

    E. Parvinnia

    2014-01-01

    Full Text Available Electroencephalogram (EEG signals are often used to diagnose diseases such as seizure, alzheimer, and schizophrenia. One main problem with the recorded EEG samples is that they are not equally reliable due to the artifacts at the time of recording. EEG signal classification algorithms should have a mechanism to handle this issue. It seems that using adaptive classifiers can be useful for the biological signals such as EEG. In this paper, a general adaptive method named weighted distance nearest neighbor (WDNN is applied for EEG signal classification to tackle this problem. This classification algorithm assigns a weight to each training sample to control its influence in classifying test samples. The weights of training samples are used to find the nearest neighbor of an input query pattern. To assess the performance of this scheme, EEG signals of thirteen schizophrenic patients and eighteen normal subjects are analyzed for the classification of these two groups. Several features including, fractal dimension, band power and autoregressive (AR model are extracted from EEG signals. The classification results are evaluated using Leave one (subject out cross validation for reliable estimation. The results indicate that combination of WDNN and selected features can significantly outperform the basic nearest-neighbor and the other methods proposed in the past for the classification of these two groups. Therefore, this method can be a complementary tool for specialists to distinguish schizophrenia disorder.

  9. HClass: Automatic classification tool for health pathologies using artificial intelligence techniques.

    Science.gov (United States)

    Garcia-Chimeno, Yolanda; Garcia-Zapirain, Begonya

    2015-01-01

    The classification of subjects' pathologies enables a rigorousness to be applied to the treatment of certain pathologies, as doctors on occasions play with so many variables that they can end up confusing some illnesses with others. Thanks to Machine Learning techniques applied to a health-record database, it is possible to make using our algorithm. hClass contains a non-linear classification of either a supervised, non-supervised or semi-supervised type. The machine is configured using other techniques such as validation of the set to be classified (cross-validation), reduction in features (PCA) and committees for assessing the various classifiers. The tool is easy to use, and the sample matrix and features that one wishes to classify, the number of iterations and the subjects who are going to be used to train the machine all need to be introduced as inputs. As a result, the success rate is shown either via a classifier or via a committee if one has been formed. A 90% success rate is obtained in the ADABoost classifier and 89.7% in the case of a committee (comprising three classifiers) when PCA is applied. This tool can be expanded to allow the user to totally characterise the classifiers by adjusting them to each classification use.

  10. Classification algorithm of Web document in ionization radiation

    International Nuclear Information System (INIS)

    Geng Zengmin; Liu Wanchun

    2005-01-01

    Resources in the Internet is numerous. It is one of research directions of Web mining (WM) how to mine the resource of some calling or trade more efficiently. The paper studies the classification of Web document in ionization radiation (IR) based on the algorithm of Bayes, Rocchio, Widrow-Hoff, and analyses the result of trial effect. (authors)

  11. Hyperspectral Image Classification With Markov Random Fields and a Convolutional Neural Network

    Science.gov (United States)

    Cao, Xiangyong; Zhou, Feng; Xu, Lin; Meng, Deyu; Xu, Zongben; Paisley, John

    2018-05-01

    This paper presents a new supervised classification algorithm for remotely sensed hyperspectral image (HSI) which integrates spectral and spatial information in a unified Bayesian framework. First, we formulate the HSI classification problem from a Bayesian perspective. Then, we adopt a convolutional neural network (CNN) to learn the posterior class distributions using a patch-wise training strategy to better use the spatial information. Next, spatial information is further considered by placing a spatial smoothness prior on the labels. Finally, we iteratively update the CNN parameters using stochastic gradient decent (SGD) and update the class labels of all pixel vectors using an alpha-expansion min-cut-based algorithm. Compared with other state-of-the-art methods, the proposed classification method achieves better performance on one synthetic dataset and two benchmark HSI datasets in a number of experimental settings.

  12. Gaia eclipsing binary and multiple systems. Supervised classification and self-organizing maps

    Science.gov (United States)

    Süveges, M.; Barblan, F.; Lecoeur-Taïbi, I.; Prša, A.; Holl, B.; Eyer, L.; Kochoska, A.; Mowlavi, N.; Rimoldini, L.

    2017-07-01

    Context. Large surveys producing tera- and petabyte-scale databases require machine-learning and knowledge discovery methods to deal with the overwhelming quantity of data and the difficulties of extracting concise, meaningful information with reliable assessment of its uncertainty. This study investigates the potential of a few machine-learning methods for the automated analysis of eclipsing binaries in the data of such surveys. Aims: We aim to aid the extraction of samples of eclipsing binaries from such databases and to provide basic information about the objects. We intend to estimate class labels according to two different, well-known classification systems, one based on the light curve morphology (EA/EB/EW classes) and the other based on the physical characteristics of the binary system (system morphology classes; detached through overcontact systems). Furthermore, we explore low-dimensional surfaces along which the light curves of eclipsing binaries are concentrated, and consider their use in the characterization of the binary systems and in the exploration of biases of the full unknown Gaia data with respect to the training sets. Methods: We have explored the performance of principal component analysis (PCA), linear discriminant analysis (LDA), Random Forest classification and self-organizing maps (SOM) for the above aims. We pre-processed the photometric time series by combining a double Gaussian profile fit and a constrained smoothing spline, in order to de-noise and interpolate the observed light curves. We achieved further denoising, and selected the most important variability elements from the light curves using PCA. Supervised classification was performed using Random Forest and LDA based on the PC decomposition, while SOM gives a continuous 2-dimensional manifold of the light curves arranged by a few important features. We estimated the uncertainty of the supervised methods due to the specific finite training set using ensembles of models constructed

  13. Parameter tuning in the support vector machine and random forest and their performances in cross- and same-year crop classification using TerraSAR-X

    OpenAIRE

    Sonobe, Rei; Tani, Hiroshi; Wang, Xiufeng; Kobayashi, Nobuyuki; Shimamura, Hideki

    2014-01-01

    This article describes the comparison of three different classification algorithms for mapping crops in Hokkaido, Japan, using TerraSAR-X data. In the study area, beans, beets, grasslands, maize, potatoes, and winter wheat were cultivated. Although classification maps are required for both management and estimation of agricultural disaster compensation, those techniques have yet to be established. Some supervised learning models may allow accurate classification. Therefore, comparisons among ...

  14. Land-cover classification with an expert classification algorithm using digital aerial photographs

    Directory of Open Access Journals (Sweden)

    José L. de la Cruz

    2010-05-01

    Full Text Available The purpose of this study was to evaluate the usefulness of the spectral information of digital aerial sensors in determining land-cover classification using new digital techniques. The land covers that have been evaluated are the following, (1 bare soil, (2 cereals, including maize (Zea mays L., oats (Avena sativa L., rye (Secale cereale L., wheat (Triticum aestivum L. and barley (Hordeun vulgare L., (3 high protein crops, such as peas (Pisum sativum L. and beans (Vicia faba L., (4 alfalfa (Medicago sativa L., (5 woodlands and scrublands, including holly oak (Quercus ilex L. and common retama (Retama sphaerocarpa L., (6 urban soil, (7 olive groves (Olea europaea L. and (8 burnt crop stubble. The best result was obtained using an expert classification algorithm, achieving a reliability rate of 95%. This result showed that the images of digital airborne sensors hold considerable promise for the future in the field of digital classifications because these images contain valuable information that takes advantage of the geometric viewpoint. Moreover, new classification techniques reduce problems encountered using high-resolution images; while reliabilities are achieved that are better than those achieved with traditional methods.

  15. Classification of Polarimetric SAR Data Using Dictionary Learning

    DEFF Research Database (Denmark)

    Vestergaard, Jacob Schack; Nielsen, Allan Aasbjerg; Dahl, Anders Lindbjerg

    2012-01-01

    This contribution deals with classification of multilook fully polarimetric synthetic aperture radar (SAR) data by learning a dictionary of crop types present in the Foulum test site. The Foulum test site contains a large number of agricultural fields, as well as lakes, forests, natural vegetation......, grasslands and urban areas, which make it ideally suited for evaluation of classification algorithms. Dictionary learning centers around building a collection of image patches typical for the classification problem at hand. This requires initial manual labeling of the classes present in the data and is thus...... a method for supervised classification. Sparse coding of these image patches aims to maintain a proficient number of typical patches and associated labels. Data is consecutively classified by a nearest neighbor search of the dictionary elements and labeled with probabilities of each class. Each dictionary...

  16. Classification of underground pipe scanned images using feature extraction and neuro-fuzzy algorithm.

    Science.gov (United States)

    Sinha, S K; Karray, F

    2002-01-01

    Pipeline surface defects such as holes and cracks cause major problems for utility managers, particularly when the pipeline is buried under the ground. Manual inspection for surface defects in the pipeline has a number of drawbacks, including subjectivity, varying standards, and high costs. Automatic inspection system using image processing and artificial intelligence techniques can overcome many of these disadvantages and offer utility managers an opportunity to significantly improve quality and reduce costs. A recognition and classification of pipe cracks using images analysis and neuro-fuzzy algorithm is proposed. In the preprocessing step the scanned images of pipe are analyzed and crack features are extracted. In the classification step the neuro-fuzzy algorithm is developed that employs a fuzzy membership function and error backpropagation algorithm. The idea behind the proposed approach is that the fuzzy membership function will absorb variation of feature values and the backpropagation network, with its learning ability, will show good classification efficiency.

  17. An arrhythmia classification algorithm using a dedicated wavelet adapted to different subjects.

    Science.gov (United States)

    Kim, Jinkwon; Min, Se Dong; Lee, Myoungho

    2011-06-27

    Numerous studies have been conducted regarding a heartbeat classification algorithm over the past several decades. However, many algorithms have also been studied to acquire robust performance, as biosignals have a large amount of variation among individuals. Various methods have been proposed to reduce the differences coming from personal characteristics, but these expand the differences caused by arrhythmia. In this paper, an arrhythmia classification algorithm using a dedicated wavelet adapted to individual subjects is proposed. We reduced the performance variation using dedicated wavelets, as in the ECG morphologies of the subjects. The proposed algorithm utilizes morphological filtering and a continuous wavelet transform with a dedicated wavelet. A principal component analysis and linear discriminant analysis were utilized to compress the morphological data transformed by the dedicated wavelets. An extreme learning machine was used as a classifier in the proposed algorithm. A performance evaluation was conducted with the MIT-BIH arrhythmia database. The results showed a high sensitivity of 97.51%, specificity of 85.07%, accuracy of 97.94%, and a positive predictive value of 97.26%. The proposed algorithm achieves better accuracy than other state-of-the-art algorithms with no intrasubject between the training and evaluation datasets. And it significantly reduces the amount of intervention needed by physicians.

  18. An arrhythmia classification algorithm using a dedicated wavelet adapted to different subjects

    Directory of Open Access Journals (Sweden)

    Min Se Dong

    2011-06-01

    Full Text Available Abstract Background Numerous studies have been conducted regarding a heartbeat classification algorithm over the past several decades. However, many algorithms have also been studied to acquire robust performance, as biosignals have a large amount of variation among individuals. Various methods have been proposed to reduce the differences coming from personal characteristics, but these expand the differences caused by arrhythmia. Methods In this paper, an arrhythmia classification algorithm using a dedicated wavelet adapted to individual subjects is proposed. We reduced the performance variation using dedicated wavelets, as in the ECG morphologies of the subjects. The proposed algorithm utilizes morphological filtering and a continuous wavelet transform with a dedicated wavelet. A principal component analysis and linear discriminant analysis were utilized to compress the morphological data transformed by the dedicated wavelets. An extreme learning machine was used as a classifier in the proposed algorithm. Results A performance evaluation was conducted with the MIT-BIH arrhythmia database. The results showed a high sensitivity of 97.51%, specificity of 85.07%, accuracy of 97.94%, and a positive predictive value of 97.26%. Conclusions The proposed algorithm achieves better accuracy than other state-of-the-art algorithms with no intrasubject between the training and evaluation datasets. And it significantly reduces the amount of intervention needed by physicians.

  19. Optimization of an NLEO-based algorithm for automated detection of spontaneous activity transients in early preterm EEG

    International Nuclear Information System (INIS)

    Palmu, Kirsi; Vanhatalo, Sampsa; Stevenson, Nathan; Wikström, Sverre; Hellström-Westas, Lena; Palva, J Matias

    2010-01-01

    We propose here a simple algorithm for automated detection of spontaneous activity transients (SATs) in early preterm electroencephalography (EEG). The parameters of the algorithm were optimized by supervised learning using a gold standard created from visual classification data obtained from three human raters. The generalization performance of the algorithm was estimated by leave-one-out cross-validation. The mean sensitivity of the optimized algorithm was 97% (range 91–100%) and specificity 95% (76–100%). The optimized algorithm makes it possible to systematically study brain state fluctuations of preterm infants. (note)

  20. Automatic modulation classification principles, algorithms and applications

    CERN Document Server

    Zhu, Zhechen

    2014-01-01

    Automatic Modulation Classification (AMC) has been a key technology in many military, security, and civilian telecommunication applications for decades. In military and security applications, modulation often serves as another level of encryption; in modern civilian applications, multiple modulation types can be employed by a signal transmitter to control the data rate and link reliability. This book offers comprehensive documentation of AMC models, algorithms and implementations for successful modulation recognition. It provides an invaluable theoretical and numerical comparison of AMC algo

  1. A probablistic neural network classification system for signal and image processing

    Energy Technology Data Exchange (ETDEWEB)

    Bowman, B. [Lawrence Livermore National Lab., CA (United States)

    1994-11-15

    The Acoustical Heart Valve Analysis Package is a system for signal and image processing and classification. It is being developed in both Matlab and C, to provide an interactive, interpreted environment, and has been optimized for large scale matrix operations. It has been used successfully to classify acoustic signals from implanted prosthetic heart valves in human patients, and will be integrated into a commercial Heart Valve Screening Center. The system uses several standard signal processing algorithms, as well as supervised learning techniques using the probabilistic neural network (PNN). Although currently used for the acoustic heart valve application, the algorithms and modular design allow it to be used for other applications, as well. We will describe the signal classification system, and show results from a set of test valves.

  2. Algorithms for the Automatic Classification and Sorting of Conifers in the Garden Nursery Industry

    DEFF Research Database (Denmark)

    Petri, Stig

    with the classification and sorting of plants using machine vision have been discussed as an introduction to the work reported here. The use of Nordmann firs as a basis for evaluating the developed algorithms naturally introduces a bias towards this species in the algorithms, but steps have been taken throughout...... was used as the basis for evaluating the constructed feature extraction algorithms. Through an analysis of the construction of a machine vision system suitable for classifying and sorting plants, the needs with regard to physical frame, lighting system, camera and software algorithms have been uncovered......The ultimate purpose of this work is the development of general feature extraction algorithms useful for the classification and sorting of plants in the garden nursery industry. Narrowing the area of focus to bare-root plants, more specifically Nordmann firs, the scientific literature dealing...

  3. Combined use of two supervised learning algorithms to model sea turtle behaviours from tri-axial acceleration data.

    Science.gov (United States)

    Jeantet, L; Dell'Amico, F; Forin-Wiart, M-A; Coutant, M; Bonola, M; Etienne, D; Gresser, J; Regis, S; Lecerf, N; Lefebvre, F; de Thoisy, B; Le Maho, Y; Brucker, M; Châtelain, N; Laesser, R; Crenner, F; Handrich, Y; Wilson, R; Chevallier, D

    2018-05-23

    Accelerometers are becoming ever more important sensors in animal-attached technology, providing data that allow determination of body posture and movement and thereby helping to elucidate behaviour in animals that are difficult to observe. We sought to validate the identification of sea turtle behaviours from accelerometer signals by deploying tags on the carapace of a juvenile loggerhead ( Caretta caretta ), an adult hawksbill ( Eretmochelys imbricata ) and an adult green turtle ( Chelonia mydas ) at Aquarium La Rochelle, France. We recorded tri-axial acceleration at 50 Hz for each species for a full day while two fixed cameras recorded their behaviours. We identified behaviours from the acceleration data using two different supervised learning algorithms, Random Forest and Classification And Regression Tree (CART), treating the data from the adult animals as separate from the juvenile data. We achieved a global accuracy of 81.30% for the adult hawksbill and green turtle CART model and 71.63% for the juvenile loggerhead, identifying 10 and 12 different behaviours, respectively. Equivalent figures were 86.96% for the adult hawksbill and green turtle Random Forest model and 79.49% for the juvenile loggerhead, for the same behaviours. The use of Random Forest combined with CART algorithms allowed us to understand the decision rules implicated in behaviour discrimination, and thus remove or group together some 'confused' or under--represented behaviours in order to get the most accurate models. This study is the first to validate accelerometer data to identify turtle behaviours and the approach can now be tested on other captive sea turtle species. © 2018. Published by The Company of Biologists Ltd.

  4. Implementation of several mathematical algorithms to breast tissue density classification

    Science.gov (United States)

    Quintana, C.; Redondo, M.; Tirao, G.

    2014-02-01

    The accuracy of mammographic abnormality detection methods is strongly dependent on breast tissue characteristics, where a dense breast tissue can hide lesions causing cancer to be detected at later stages. In addition, breast tissue density is widely accepted to be an important risk indicator for the development of breast cancer. This paper presents the implementation and the performance of different mathematical algorithms designed to standardize the categorization of mammographic images, according to the American College of Radiology classifications. These mathematical techniques are based on intrinsic properties calculations and on comparison with an ideal homogeneous image (joint entropy, mutual information, normalized cross correlation and index Q) as categorization parameters. The algorithms evaluation was performed on 100 cases of the mammographic data sets provided by the Ministerio de Salud de la Provincia de Córdoba, Argentina—Programa de Prevención del Cáncer de Mama (Department of Public Health, Córdoba, Argentina, Breast Cancer Prevention Program). The obtained breast classifications were compared with the expert medical diagnostics, showing a good performance. The implemented algorithms revealed a high potentiality to classify breasts into tissue density categories.

  5. A Chinese text classification system based on Naive Bayes algorithm

    Directory of Open Access Journals (Sweden)

    Cui Wei

    2016-01-01

    Full Text Available In this paper, aiming at the characteristics of Chinese text classification, using the ICTCLAS(Chinese lexical analysis system of Chinese academy of sciences for document segmentation, and for data cleaning and filtering the Stop words, using the information gain and document frequency feature selection algorithm to document feature selection. Based on this, based on the Naive Bayesian algorithm implemented text classifier , and use Chinese corpus of Fudan University has carried on the experiment and analysis on the system.

  6. A review of classification algorithms for EEG-based brain-computer interfaces: a 10 year update.

    Science.gov (United States)

    Lotte, F; Bougrain, L; Cichocki, A; Clerc, M; Congedo, M; Rakotomamonjy, A; Yger, F

    2018-06-01

    Most current electroencephalography (EEG)-based brain-computer interfaces (BCIs) are based on machine learning algorithms. There is a large diversity of classifier types that are used in this field, as described in our 2007 review paper. Now, approximately ten years after this review publication, many new algorithms have been developed and tested to classify EEG signals in BCIs. The time is therefore ripe for an updated review of EEG classification algorithms for BCIs. We surveyed the BCI and machine learning literature from 2007 to 2017 to identify the new classification approaches that have been investigated to design BCIs. We synthesize these studies in order to present such algorithms, to report how they were used for BCIs, what were the outcomes, and to identify their pros and cons. We found that the recently designed classification algorithms for EEG-based BCIs can be divided into four main categories: adaptive classifiers, matrix and tensor classifiers, transfer learning and deep learning, plus a few other miscellaneous classifiers. Among these, adaptive classifiers were demonstrated to be generally superior to static ones, even with unsupervised adaptation. Transfer learning can also prove useful although the benefits of transfer learning remain unpredictable. Riemannian geometry-based methods have reached state-of-the-art performances on multiple BCI problems and deserve to be explored more thoroughly, along with tensor-based methods. Shrinkage linear discriminant analysis and random forests also appear particularly useful for small training samples settings. On the other hand, deep learning methods have not yet shown convincing improvement over state-of-the-art BCI methods. This paper provides a comprehensive overview of the modern classification algorithms used in EEG-based BCIs, presents the principles of these methods and guidelines on when and how to use them. It also identifies a number of challenges to further advance EEG classification in BCI.

  7. A review of classification algorithms for EEG-based brain–computer interfaces: a 10 year update

    Science.gov (United States)

    Lotte, F.; Bougrain, L.; Cichocki, A.; Clerc, M.; Congedo, M.; Rakotomamonjy, A.; Yger, F.

    2018-06-01

    Objective. Most current electroencephalography (EEG)-based brain–computer interfaces (BCIs) are based on machine learning algorithms. There is a large diversity of classifier types that are used in this field, as described in our 2007 review paper. Now, approximately ten years after this review publication, many new algorithms have been developed and tested to classify EEG signals in BCIs. The time is therefore ripe for an updated review of EEG classification algorithms for BCIs. Approach. We surveyed the BCI and machine learning literature from 2007 to 2017 to identify the new classification approaches that have been investigated to design BCIs. We synthesize these studies in order to present such algorithms, to report how they were used for BCIs, what were the outcomes, and to identify their pros and cons. Main results. We found that the recently designed classification algorithms for EEG-based BCIs can be divided into four main categories: adaptive classifiers, matrix and tensor classifiers, transfer learning and deep learning, plus a few other miscellaneous classifiers. Among these, adaptive classifiers were demonstrated to be generally superior to static ones, even with unsupervised adaptation. Transfer learning can also prove useful although the benefits of transfer learning remain unpredictable. Riemannian geometry-based methods have reached state-of-the-art performances on multiple BCI problems and deserve to be explored more thoroughly, along with tensor-based methods. Shrinkage linear discriminant analysis and random forests also appear particularly useful for small training samples settings. On the other hand, deep learning methods have not yet shown convincing improvement over state-of-the-art BCI methods. Significance. This paper provides a comprehensive overview of the modern classification algorithms used in EEG-based BCIs, presents the principles of these methods and guidelines on when and how to use them. It also identifies a number of challenges

  8. Constrained Deep Weak Supervision for Histopathology Image Segmentation.

    Science.gov (United States)

    Jia, Zhipeng; Huang, Xingyi; Chang, Eric I-Chao; Xu, Yan

    2017-11-01

    In this paper, we develop a new weakly supervised learning algorithm to learn to segment cancerous regions in histopathology images. This paper is under a multiple instance learning (MIL) framework with a new formulation, deep weak supervision (DWS); we also propose an effective way to introduce constraints to our neural networks to assist the learning process. The contributions of our algorithm are threefold: 1) we build an end-to-end learning system that segments cancerous regions with fully convolutional networks (FCNs) in which image-to-image weakly-supervised learning is performed; 2) we develop a DWS formulation to exploit multi-scale learning under weak supervision within FCNs; and 3) constraints about positive instances are introduced in our approach to effectively explore additional weakly supervised information that is easy to obtain and enjoy a significant boost to the learning process. The proposed algorithm, abbreviated as DWS-MIL, is easy to implement and can be trained efficiently. Our system demonstrates the state-of-the-art results on large-scale histopathology image data sets and can be applied to various applications in medical imaging beyond histopathology images, such as MRI, CT, and ultrasound images.

  9. Supervised Machine Learning for Regionalization of Environmental Data: Distribution of Uranium in Groundwater in Ukraine

    Science.gov (United States)

    Govorov, Michael; Gienko, Gennady; Putrenko, Viktor

    2018-05-01

    In this paper, several supervised machine learning algorithms were explored to define homogeneous regions of con-centration of uranium in surface waters in Ukraine using multiple environmental parameters. The previous study was focused on finding the primary environmental parameters related to uranium in ground waters using several methods of spatial statistics and unsupervised classification. At this step, we refined the regionalization using Artifi-cial Neural Networks (ANN) techniques including Multilayer Perceptron (MLP), Radial Basis Function (RBF), and Convolutional Neural Network (CNN). The study is focused on building local ANN models which may significantly improve the prediction results of machine learning algorithms by taking into considerations non-stationarity and autocorrelation in spatial data.

  10. Comparison of some classification algorithms based on deterministic and nondeterministic decision rules

    KAUST Repository

    Delimata, Paweł

    2010-01-01

    We discuss two, in a sense extreme, kinds of nondeterministic rules in decision tables. The first kind of rules, called as inhibitory rules, are blocking only one decision value (i.e., they have all but one decisions from all possible decisions on their right hand sides). Contrary to this, any rule of the second kind, called as a bounded nondeterministic rule, can have on the right hand side only a few decisions. We show that both kinds of rules can be used for improving the quality of classification. In the paper, two lazy classification algorithms of polynomial time complexity are considered. These algorithms are based on deterministic and inhibitory decision rules, but the direct generation of rules is not required. Instead of this, for any new object the considered algorithms extract from a given decision table efficiently some information about the set of rules. Next, this information is used by a decision-making procedure. The reported results of experiments show that the algorithms based on inhibitory decision rules are often better than those based on deterministic decision rules. We also present an application of bounded nondeterministic rules in construction of rule based classifiers. We include the results of experiments showing that by combining rule based classifiers based on minimal decision rules with bounded nondeterministic rules having confidence close to 1 and sufficiently large support, it is possible to improve the classification quality. © 2010 Springer-Verlag.

  11. Optimum supervision intervals and order of supervision in nuclear reactor protective systems

    International Nuclear Information System (INIS)

    Kontoleon, J.M.

    1978-01-01

    The optimum inspection strategy of an m-out-of-n:G nuclear reactor protective system with nonidentical units is analyzed. A 2-out-of-4:G system is used to formulate a multi-variable optimization problem to determine (a) the optimum order of supervision of the units and (b) the optimum supervision intervals between units. The case of systems with identical units is a special case of the above. Numerical results are derived using a computer algorithm

  12. New Dandelion Algorithm Optimizes Extreme Learning Machine for Biomedical Classification Problems

    Directory of Open Access Journals (Sweden)

    Xiguang Li

    2017-01-01

    Full Text Available Inspired by the behavior of dandelion sowing, a new novel swarm intelligence algorithm, namely, dandelion algorithm (DA, is proposed for global optimization of complex functions in this paper. In DA, the dandelion population will be divided into two subpopulations, and different subpopulations will undergo different sowing behaviors. Moreover, another sowing method is designed to jump out of local optimum. In order to demonstrate the validation of DA, we compare the proposed algorithm with other existing algorithms, including bat algorithm, particle swarm optimization, and enhanced fireworks algorithm. Simulations show that the proposed algorithm seems much superior to other algorithms. At the same time, the proposed algorithm can be applied to optimize extreme learning machine (ELM for biomedical classification problems, and the effect is considerable. At last, we use different fusion methods to form different fusion classifiers, and the fusion classifiers can achieve higher accuracy and better stability to some extent.

  13. SOMOTE_EASY: AN ALGORITHM TO TREAT THE CLASSIFICATION ISSUE IN REAL DATABASES

    Directory of Open Access Journals (Sweden)

    Hugo Leonardo Pereira Rufino

    2016-04-01

    Full Text Available Most classification tools assume that data distribution be balanced or with similar costs, when not properly classified. Nevertheless, in practical terms, the existence of database where unbalanced classes occur is commonplace, such as in the diagnosis of diseases, in which the confirmed cases are usually rare when compared with a healthy population. Other examples are the detection of fraudulent calls and the detection of system intruders. In these cases, the improper classification of a minority class (for instance, to diagnose a person with cancer as healthy may result in more serious consequences that incorrectly classify a majority class. Therefore, it is important to treat the database where unbalanced classes occur. This paper presents the SMOTE_Easy algorithm, which can classify data, even if there is a high level of unbalancing between different classes. In order to prove its efficiency, a comparison with the main algorithms to treat classification issues was made, where unbalanced data exist. This process was successful in nearly all tested databases

  14. Change classification in SAR time series: a functional approach

    Science.gov (United States)

    Boldt, Markus; Thiele, Antje; Schulz, Karsten; Hinz, Stefan

    2017-10-01

    Change detection represents a broad field of research in SAR remote sensing, consisting of many different approaches. Besides the simple recognition of change areas, the analysis of type, category or class of the change areas is at least as important for creating a comprehensive result. Conventional strategies for change classification are based on supervised or unsupervised landuse / landcover classifications. The main drawback of such approaches is that the quality of the classification result directly depends on the selection of training and reference data. Additionally, supervised processing methods require an experienced operator who capably selects the training samples. This training step is not necessary when using unsupervised strategies, but nevertheless meaningful reference data must be available for identifying the resulting classes. Consequently, an experienced operator is indispensable. In this study, an innovative concept for the classification of changes in SAR time series data is proposed. Regarding the drawbacks of traditional strategies given above, it copes without using any training data. Moreover, the method can be applied by an operator, who does not have detailed knowledge about the available scenery yet. This knowledge is provided by the algorithm. The final step of the procedure, which main aspect is given by the iterative optimization of an initial class scheme with respect to the categorized change objects, is represented by the classification of these objects to the finally resulting classes. This assignment step is subject of this paper.

  15. Fully Decentralized Semi-supervised Learning via Privacy-preserving Matrix Completion.

    Science.gov (United States)

    Fierimonte, Roberto; Scardapane, Simone; Uncini, Aurelio; Panella, Massimo

    2016-08-26

    Distributed learning refers to the problem of inferring a function when the training data are distributed among different nodes. While significant work has been done in the contexts of supervised and unsupervised learning, the intermediate case of Semi-supervised learning in the distributed setting has received less attention. In this paper, we propose an algorithm for this class of problems, by extending the framework of manifold regularization. The main component of the proposed algorithm consists of a fully distributed computation of the adjacency matrix of the training patterns. To this end, we propose a novel algorithm for low-rank distributed matrix completion, based on the framework of diffusion adaptation. Overall, the distributed Semi-supervised algorithm is efficient and scalable, and it can preserve privacy by the inclusion of flexible privacy-preserving mechanisms for similarity computation. The experimental results and comparison on a wide range of standard Semi-supervised benchmarks validate our proposal.

  16. Support Vector Machines for Hyperspectral Remote Sensing Classification

    Science.gov (United States)

    Gualtieri, J. Anthony; Cromp, R. F.

    1998-01-01

    The Support Vector Machine provides a new way to design classification algorithms which learn from examples (supervised learning) and generalize when applied to new data. We demonstrate its success on a difficult classification problem from hyperspectral remote sensing, where we obtain performances of 96%, and 87% correct for a 4 class problem, and a 16 class problem respectively. These results are somewhat better than other recent results on the same data. A key feature of this classifier is its ability to use high-dimensional data without the usual recourse to a feature selection step to reduce the dimensionality of the data. For this application, this is important, as hyperspectral data consists of several hundred contiguous spectral channels for each exemplar. We provide an introduction to this new approach, and demonstrate its application to classification of an agriculture scene.

  17. Energy-efficient algorithm for classification of states of wireless sensor network using machine learning methods

    Science.gov (United States)

    Yuldashev, M. N.; Vlasov, A. I.; Novikov, A. N.

    2018-05-01

    This paper focuses on the development of an energy-efficient algorithm for classification of states of a wireless sensor network using machine learning methods. The proposed algorithm reduces energy consumption by: 1) elimination of monitoring of parameters that do not affect the state of the sensor network, 2) reduction of communication sessions over the network (the data are transmitted only if their values can affect the state of the sensor network). The studies of the proposed algorithm have shown that at classification accuracy close to 100%, the number of communication sessions can be reduced by 80%.

  18. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling

    Directory of Open Access Journals (Sweden)

    Hala Alshamlan

    2015-01-01

    Full Text Available An artificial bee colony (ABC is a relatively recent swarm intelligence optimization approach. In this paper, we propose the first attempt at applying ABC algorithm in analyzing a microarray gene expression profile. In addition, we propose an innovative feature selection algorithm, minimum redundancy maximum relevance (mRMR, and combine it with an ABC algorithm, mRMR-ABC, to select informative genes from microarray profile. The new approach is based on a support vector machine (SVM algorithm to measure the classification accuracy for selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques. We reimplemented two of these techniques for the sake of a fair comparison using the same parameters. These two techniques are mRMR when combined with a genetic algorithm (mRMR-GA and mRMR when combined with a particle swarm optimization algorithm (mRMR-PSO. The experimental results prove that the proposed mRMR-ABC algorithm achieves accurate classification performance using small number of predictive genes when tested using both datasets and compared to previously suggested methods. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems.

  19. mRMR-ABC: A Hybrid Gene Selection Algorithm for Cancer Classification Using Microarray Gene Expression Profiling.

    Science.gov (United States)

    Alshamlan, Hala; Badr, Ghada; Alohali, Yousef

    2015-01-01

    An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we propose the first attempt at applying ABC algorithm in analyzing a microarray gene expression profile. In addition, we propose an innovative feature selection algorithm, minimum redundancy maximum relevance (mRMR), and combine it with an ABC algorithm, mRMR-ABC, to select informative genes from microarray profile. The new approach is based on a support vector machine (SVM) algorithm to measure the classification accuracy for selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques. We reimplemented two of these techniques for the sake of a fair comparison using the same parameters. These two techniques are mRMR when combined with a genetic algorithm (mRMR-GA) and mRMR when combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results prove that the proposed mRMR-ABC algorithm achieves accurate classification performance using small number of predictive genes when tested using both datasets and compared to previously suggested methods. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems.

  20. Weakly supervised semantic segmentation using fore-background priors

    Science.gov (United States)

    Han, Zheng; Xiao, Zhitao; Yu, Mingjun

    2017-07-01

    Weakly-supervised semantic segmentation is a challenge in the field of computer vision. Most previous works utilize the labels of the whole training set and thereby need the construction of a relationship graph about image labels, thus result in expensive computation. In this study, we tackle this problem from a different perspective. We proposed a novel semantic segmentation algorithm based on background priors, which avoids the construction of a huge graph in whole training dataset. Specifically, a random forest classifier is obtained using weakly supervised training data .Then semantic texton forest (STF) feature is extracted from image superpixels. Finally, a CRF based optimization algorithm is proposed. The unary potential of CRF derived from the outputting probability of random forest classifier and the robust saliency map as background prior. Experiments on the MSRC21 dataset show that the new algorithm outperforms some previous influential weakly-supervised segmentation algorithms. Furthermore, the use of efficient decision forests classifier and parallel computing of saliency map significantly accelerates the implementation.

  1. Multi-objective evolutionary algorithms for fuzzy classification in survival prediction.

    Science.gov (United States)

    Jiménez, Fernando; Sánchez, Gracia; Juárez, José M

    2014-03-01

    This paper presents a novel rule-based fuzzy classification methodology for survival/mortality prediction in severe burnt patients. Due to the ethical aspects involved in this medical scenario, physicians tend not to accept a computer-based evaluation unless they understand why and how such a recommendation is given. Therefore, any fuzzy classifier model must be both accurate and interpretable. The proposed methodology is a three-step process: (1) multi-objective constrained optimization of a patient's data set, using Pareto-based elitist multi-objective evolutionary algorithms to maximize accuracy and minimize the complexity (number of rules) of classifiers, subject to interpretability constraints; this step produces a set of alternative (Pareto) classifiers; (2) linguistic labeling, which assigns a linguistic label to each fuzzy set of the classifiers; this step is essential to the interpretability of the classifiers; (3) decision making, whereby a classifier is chosen, if it is satisfactory, according to the preferences of the decision maker. If no classifier is satisfactory for the decision maker, the process starts again in step (1) with a different input parameter set. The performance of three multi-objective evolutionary algorithms, niched pre-selection multi-objective algorithm, elitist Pareto-based multi-objective evolutionary algorithm for diversity reinforcement (ENORA) and the non-dominated sorting genetic algorithm (NSGA-II), was tested using a patient's data set from an intensive care burn unit and a standard machine learning data set from an standard machine learning repository. The results are compared using the hypervolume multi-objective metric. Besides, the results have been compared with other non-evolutionary techniques and validated with a multi-objective cross-validation technique. Our proposal improves the classification rate obtained by other non-evolutionary techniques (decision trees, artificial neural networks, Naive Bayes, and case

  2. [Automatic Sleep Stage Classification Based on an Improved K-means Clustering Algorithm].

    Science.gov (United States)

    Xiao, Shuyuan; Wang, Bei; Zhang, Jian; Zhang, Qunfeng; Zou, Junzhong

    2016-10-01

    Sleep stage scoring is a hotspot in the field of medicine and neuroscience.Visual inspection of sleep is laborious and the results may be subjective to different clinicians.Automatic sleep stage classification algorithm can be used to reduce the manual workload.However,there are still limitations when it encounters complicated and changeable clinical cases.The purpose of this paper is to develop an automatic sleep staging algorithm based on the characteristics of actual sleep data.In the proposed improved K-means clustering algorithm,points were selected as the initial centers by using a concept of density to avoid the randomness of the original K-means algorithm.Meanwhile,the cluster centers were updated according to the‘Three-Sigma Rule’during the iteration to abate the influence of the outliers.The proposed method was tested and analyzed on the overnight sleep data of the healthy persons and patients with sleep disorders after continuous positive airway pressure(CPAP)treatment.The automatic sleep stage classification results were compared with the visual inspection by qualified clinicians and the averaged accuracy reached 76%.With the analysis of morphological diversity of sleep data,it was proved that the proposed improved K-means algorithm was feasible and valid for clinical practice.

  3. Optimized Audio Classification and Segmentation Algorithm by Using Ensemble Methods

    Directory of Open Access Journals (Sweden)

    Saadia Zahid

    2015-01-01

    Full Text Available Audio segmentation is a basis for multimedia content analysis which is the most important and widely used application nowadays. An optimized audio classification and segmentation algorithm is presented in this paper that segments a superimposed audio stream on the basis of its content into four main audio types: pure-speech, music, environment sound, and silence. An algorithm is proposed that preserves important audio content and reduces the misclassification rate without using large amount of training data, which handles noise and is suitable for use for real-time applications. Noise in an audio stream is segmented out as environment sound. A hybrid classification approach is used, bagged support vector machines (SVMs with artificial neural networks (ANNs. Audio stream is classified, firstly, into speech and nonspeech segment by using bagged support vector machines; nonspeech segment is further classified into music and environment sound by using artificial neural networks and lastly, speech segment is classified into silence and pure-speech segments on the basis of rule-based classifier. Minimum data is used for training classifier; ensemble methods are used for minimizing misclassification rate and approximately 98% accurate segments are obtained. A fast and efficient algorithm is designed that can be used with real-time multimedia applications.

  4. Machine learning algorithms for meteorological event classification in the coastal area using in-situ data

    Science.gov (United States)

    Sokolov, Anton; Gengembre, Cyril; Dmitriev, Egor; Delbarre, Hervé

    2017-04-01

    The problem is considered of classification of local atmospheric meteorological events in the coastal area such as sea breezes, fogs and storms. The in-situ meteorological data as wind speed and direction, temperature, humidity and turbulence are used as predictors. Local atmospheric events of 2013-2014 were analysed manually to train classification algorithms in the coastal area of English Channel in Dunkirk (France). Then, ultrasonic anemometer data and LIDAR wind profiler data were used as predictors. A few algorithms were applied to determine meteorological events by local data such as a decision tree, the nearest neighbour classifier, a support vector machine. The comparison of classification algorithms was carried out, the most important predictors for each event type were determined. It was shown that in more than 80 percent of the cases machine learning algorithms detect the meteorological class correctly. We expect that this methodology could be applied also to classify events by climatological in-situ data or by modelling data. It allows estimating frequencies of each event in perspective of climate change.

  5. A comparison of supervised classification methods for the prediction of substrate type using multibeam acoustic and legacy grain-size data.

    Directory of Open Access Journals (Sweden)

    David Stephens

    Full Text Available Detailed seabed substrate maps are increasingly in demand for effective planning and management of marine ecosystems and resources. It has become common to use remotely sensed multibeam echosounder data in the form of bathymetry and acoustic backscatter in conjunction with ground-truth sampling data to inform the mapping of seabed substrates. Whilst, until recently, such data sets have typically been classified by expert interpretation, it is now obvious that more objective, faster and repeatable methods of seabed classification are required. This study compares the performances of a range of supervised classification techniques for predicting substrate type from multibeam echosounder data. The study area is located in the North Sea, off the north-east coast of England. A total of 258 ground-truth samples were classified into four substrate classes. Multibeam bathymetry and backscatter data, and a range of secondary features derived from these datasets were used in this study. Six supervised classification techniques were tested: Classification Trees, Support Vector Machines, k-Nearest Neighbour, Neural Networks, Random Forest and Naive Bayes. Each classifier was trained multiple times using different input features, including i the two primary features of bathymetry and backscatter, ii a subset of the features chosen by a feature selection process and iii all of the input features. The predictive performances of the models were validated using a separate test set of ground-truth samples. The statistical significance of model performances relative to a simple baseline model (Nearest Neighbour predictions on bathymetry and backscatter were tested to assess the benefits of using more sophisticated approaches. The best performing models were tree based methods and Naive Bayes which achieved accuracies of around 0.8 and kappa coefficients of up to 0.5 on the test set. The models that used all input features didn't generally perform well

  6. Building an asynchronous web-based tool for machine learning classification.

    Science.gov (United States)

    Weber, Griffin; Vinterbo, Staal; Ohno-Machado, Lucila

    2002-01-01

    Various unsupervised and supervised learning methods including support vector machines, classification trees, linear discriminant analysis and nearest neighbor classifiers have been used to classify high-throughput gene expression data. Simpler and more widely accepted statistical tools have not yet been used for this purpose, hence proper comparisons between classification methods have not been conducted. We developed free software that implements logistic regression with stepwise variable selection as a quick and simple method for initial exploration of important genetic markers in disease classification. To implement the algorithm and allow our collaborators in remote locations to evaluate and compare its results against those of other methods, we developed a user-friendly asynchronous web-based application with a minimal amount of programming using free, downloadable software tools. With this program, we show that classification using logistic regression can perform as well as other more sophisticated algorithms, and it has the advantages of being easy to interpret and reproduce. By making the tool freely and easily available, we hope to promote the comparison of classification methods. In addition, we believe our web application can be used as a model for other bioinformatics laboratories that need to develop web-based analysis tools in a short amount of time and on a limited budget.

  7. Automatic classification of endogenous seismic sources within a landslide body using random forest algorithm

    Science.gov (United States)

    Provost, Floriane; Hibert, Clément; Malet, Jean-Philippe; Stumpf, André; Doubre, Cécile

    2016-04-01

    Different studies have shown the presence of microseismic activity in soft-rock landslides. The seismic signals exhibit significantly different features in the time and frequency domains which allow their classification and interpretation. Most of the classes could be associated with different mechanisms of deformation occurring within and at the surface (e.g. rockfall, slide-quake, fissure opening, fluid circulation). However, some signals remain not fully understood and some classes contain few examples that prevent any interpretation. To move toward a more complete interpretation of the links between the dynamics of soft-rock landslides and the physical processes controlling their behaviour, a complete catalog of the endogeneous seismicity is needed. We propose a multi-class detection method based on the random forests algorithm to automatically classify the source of seismic signals. Random forests is a supervised machine learning technique that is based on the computation of a large number of decision trees. The multiple decision trees are constructed from training sets including each of the target classes. In the case of seismic signals, these attributes may encompass spectral features but also waveform characteristics, multi-stations observations and other relevant information. The Random Forest classifier is used because it provides state-of-the-art performance when compared with other machine learning techniques (e.g. SVM, Neural Networks) and requires no fine tuning. Furthermore it is relatively fast, robust, easy to parallelize, and inherently suitable for multi-class problems. In this work, we present the first results of the classification method applied to the seismicity recorded at the Super-Sauze landslide between 2013 and 2015. We selected a dozen of seismic signal features that characterize precisely its spectral content (e.g. central frequency, spectrum width, energy in several frequency bands, spectrogram shape, spectrum local and global maxima

  8. Neighborhood Hypergraph Based Classification Algorithm for Incomplete Information System

    Directory of Open Access Journals (Sweden)

    Feng Hu

    2015-01-01

    Full Text Available The problem of classification in incomplete information system is a hot issue in intelligent information processing. Hypergraph is a new intelligent method for machine learning. However, it is hard to process the incomplete information system by the traditional hypergraph, which is due to two reasons: (1 the hyperedges are generated randomly in traditional hypergraph model; (2 the existing methods are unsuitable to deal with incomplete information system, for the sake of missing values in incomplete information system. In this paper, we propose a novel classification algorithm for incomplete information system based on hypergraph model and rough set theory. Firstly, we initialize the hypergraph. Second, we classify the training set by neighborhood hypergraph. Third, under the guidance of rough set, we replace the poor hyperedges. After that, we can obtain a good classifier. The proposed approach is tested on 15 data sets from UCI machine learning repository. Furthermore, it is compared with some existing methods, such as C4.5, SVM, NavieBayes, and KNN. The experimental results show that the proposed algorithm has better performance via Precision, Recall, AUC, and F-measure.

  9. Transfer learning improves supervised image segmentation across imaging protocols.

    Science.gov (United States)

    van Opbroek, Annegreet; Ikram, M Arfan; Vernooij, Meike W; de Bruijne, Marleen

    2015-05-01

    The variation between images obtained with different scanners or different imaging protocols presents a major challenge in automatic segmentation of biomedical images. This variation especially hampers the application of otherwise successful supervised-learning techniques which, in order to perform well, often require a large amount of labeled training data that is exactly representative of the target data. We therefore propose to use transfer learning for image segmentation. Transfer-learning techniques can cope with differences in distributions between training and target data, and therefore may improve performance over supervised learning for segmentation across scanners and scan protocols. We present four transfer classifiers that can train a classification scheme with only a small amount of representative training data, in addition to a larger amount of other training data with slightly different characteristics. The performance of the four transfer classifiers was compared to that of standard supervised classification on two magnetic resonance imaging brain-segmentation tasks with multi-site data: white matter, gray matter, and cerebrospinal fluid segmentation; and white-matter-/MS-lesion segmentation. The experiments showed that when there is only a small amount of representative training data available, transfer learning can greatly outperform common supervised-learning approaches, minimizing classification errors by up to 60%.

  10. Unsupervised classification of multivariate geostatistical data: Two algorithms

    Science.gov (United States)

    Romary, Thomas; Ors, Fabien; Rivoirard, Jacques; Deraisme, Jacques

    2015-12-01

    With the increasing development of remote sensing platforms and the evolution of sampling facilities in mining and oil industry, spatial datasets are becoming increasingly large, inform a growing number of variables and cover wider and wider areas. Therefore, it is often necessary to split the domain of study to account for radically different behaviors of the natural phenomenon over the domain and to simplify the subsequent modeling step. The definition of these areas can be seen as a problem of unsupervised classification, or clustering, where we try to divide the domain into homogeneous domains with respect to the values taken by the variables in hand. The application of classical clustering methods, designed for independent observations, does not ensure the spatial coherence of the resulting classes. Image segmentation methods, based on e.g. Markov random fields, are not adapted to irregularly sampled data. Other existing approaches, based on mixtures of Gaussian random functions estimated via the expectation-maximization algorithm, are limited to reasonable sample sizes and a small number of variables. In this work, we propose two algorithms based on adaptations of classical algorithms to multivariate geostatistical data. Both algorithms are model free and can handle large volumes of multivariate, irregularly spaced data. The first one proceeds by agglomerative hierarchical clustering. The spatial coherence is ensured by a proximity condition imposed for two clusters to merge. This proximity condition relies on a graph organizing the data in the coordinates space. The hierarchical algorithm can then be seen as a graph-partitioning algorithm. Following this interpretation, a spatial version of the spectral clustering algorithm is also proposed. The performances of both algorithms are assessed on toy examples and a mining dataset.

  11. Classification of Aerosol Retrievals from Spaceborne Polarimetry Using a Multiparameter Algorithm

    Science.gov (United States)

    Russell, Philip B.; Kacenelenbogen, Meloe; Livingston, John M.; Hasekamp, Otto P.; Burton, Sharon P.; Schuster, Gregory L.; Johnson, Matthew S.; Knobelspiesse, Kirk D.; Redemann, Jens; Ramachandran, S.; hide

    2013-01-01

    In this presentation, we demonstrate application of a new aerosol classification algorithm to retrievals from the POLDER-3 polarimter on the PARASOL spacecraft. Motivation and method: Since the development of global aerosol measurements by satellites and AERONET, classification of observed aerosols into several types (e.g., urban-industrial, biomass burning, mineral dust, maritime, and various subtypes or mixtures of these) has proven useful to: understanding aerosol sources, transformations, effects, and feedback mechanisms; improving accuracy of satellite retrievals and quantifying assessments of aerosol radiative impacts on climate.

  12. Weakly Supervised Dictionary Learning

    Science.gov (United States)

    You, Zeyu; Raich, Raviv; Fern, Xiaoli Z.; Kim, Jinsub

    2018-05-01

    We present a probabilistic modeling and inference framework for discriminative analysis dictionary learning under a weak supervision setting. Dictionary learning approaches have been widely used for tasks such as low-level signal denoising and restoration as well as high-level classification tasks, which can be applied to audio and image analysis. Synthesis dictionary learning aims at jointly learning a dictionary and corresponding sparse coefficients to provide accurate data representation. This approach is useful for denoising and signal restoration, but may lead to sub-optimal classification performance. By contrast, analysis dictionary learning provides a transform that maps data to a sparse discriminative representation suitable for classification. We consider the problem of analysis dictionary learning for time-series data under a weak supervision setting in which signals are assigned with a global label instead of an instantaneous label signal. We propose a discriminative probabilistic model that incorporates both label information and sparsity constraints on the underlying latent instantaneous label signal using cardinality control. We present the expectation maximization (EM) procedure for maximum likelihood estimation (MLE) of the proposed model. To facilitate a computationally efficient E-step, we propose both a chain and a novel tree graph reformulation of the graphical model. The performance of the proposed model is demonstrated on both synthetic and real-world data.

  13. Supervised Transfer Sparse Coding

    KAUST Repository

    Al-Shedivat, Maruan

    2014-07-27

    A combination of the sparse coding and transfer learn- ing techniques was shown to be accurate and robust in classification tasks where training and testing objects have a shared feature space but are sampled from differ- ent underlying distributions, i.e., belong to different do- mains. The key assumption in such case is that in spite of the domain disparity, samples from different domains share some common hidden factors. Previous methods often assumed that all the objects in the target domain are unlabeled, and thus the training set solely comprised objects from the source domain. However, in real world applications, the target domain often has some labeled objects, or one can always manually label a small num- ber of them. In this paper, we explore such possibil- ity and show how a small number of labeled data in the target domain can significantly leverage classifica- tion accuracy of the state-of-the-art transfer sparse cod- ing methods. We further propose a unified framework named supervised transfer sparse coding (STSC) which simultaneously optimizes sparse representation, domain transfer and classification. Experimental results on three applications demonstrate that a little manual labeling and then learning the model in a supervised fashion can significantly improve classification accuracy.

  14. Comparison between Possibilistic c-Means (PCM and Artificial Neural Network (ANN Classification Algorithms in Land use/ Land cover Classification

    Directory of Open Access Journals (Sweden)

    Ganchimeg Ganbold

    2017-03-01

    Full Text Available There are several statistical classification algorithms available for landuse/land cover classification. However, each has a certain bias orcompromise. Some methods like the parallel piped approach in supervisedclassification, cannot classify continuous regions within a feature. Onthe other hand, while unsupervised classification method takes maximumadvantage of spectral variability in an image, the maximally separableclusters in spectral space may not do much for our perception of importantclasses in a given study area. In this research, the output of an ANNalgorithm was compared with the Possibilistic c-Means an improvementof the fuzzy c-Means on both moderate resolutions Landsat8 and a highresolution Formosat 2 images. The Formosat 2 image comes with an8m spectral resolution on the multispectral data. This multispectral imagedata was resampled to 10m in order to maintain a uniform ratio of1:3 against Landsat 8 image. Six classes were chosen for analysis including:Dense forest, eucalyptus, water, grassland, wheat and riverine sand. Using a standard false color composite (FCC, the six features reflecteddifferently in the infrared region with wheat producing the brightestpixel values. Signature collection per class was therefore easily obtainedfor all classifications. The output of both ANN and FCM, were analyzedseparately for accuracy and an error matrix generated to assess the qualityand accuracy of the classification algorithms. When you compare theresults of the two methods on a per-class-basis, ANN had a crisperoutput compared to PCM which yielded clusters with pixels especiallyon the moderate resolution Landsat 8 imagery.

  15. Sampling algorithms for validation of supervised learning models for Ising-like systems

    Science.gov (United States)

    Portman, Nataliya; Tamblyn, Isaac

    2017-12-01

    In this paper, we build and explore supervised learning models of ferromagnetic system behavior, using Monte-Carlo sampling of the spin configuration space generated by the 2D Ising model. Given the enormous size of the space of all possible Ising model realizations, the question arises as to how to choose a reasonable number of samples that will form physically meaningful and non-intersecting training and testing datasets. Here, we propose a sampling technique called ;ID-MH; that uses the Metropolis-Hastings algorithm creating Markov process across energy levels within the predefined configuration subspace. We show that application of this method retains phase transitions in both training and testing datasets and serves the purpose of validation of a machine learning algorithm. For larger lattice dimensions, ID-MH is not feasible as it requires knowledge of the complete configuration space. As such, we develop a new ;block-ID; sampling strategy: it decomposes the given structure into square blocks with lattice dimension N ≤ 5 and uses ID-MH sampling of candidate blocks. Further comparison of the performance of commonly used machine learning methods such as random forests, decision trees, k nearest neighbors and artificial neural networks shows that the PCA-based Decision Tree regressor is the most accurate predictor of magnetizations of the Ising model. For energies, however, the accuracy of prediction is not satisfactory, highlighting the need to consider more algorithmically complex methods (e.g., deep learning).

  16. A novel evaluation of two related and two independent algorithms for eye movement classification during reading.

    Science.gov (United States)

    Friedman, Lee; Rigas, Ioannis; Abdulin, Evgeny; Komogortsev, Oleg V

    2018-05-15

    Nystrӧm and Holmqvist have published a method for the classification of eye movements during reading (ONH) (Nyström & Holmqvist, 2010). When we applied this algorithm to our data, the results were not satisfactory, so we modified the algorithm (now the MNH) to better classify our data. The changes included: (1) reducing the amount of signal filtering, (2) excluding a new type of noise, (3) removing several adaptive thresholds and replacing them with fixed thresholds, (4) changing the way that the start and end of each saccade was determined, (5) employing a new algorithm for detecting PSOs, and (6) allowing a fixation period to either begin or end with noise. A new method for the evaluation of classification algorithms is presented. It was designed to provide comprehensive feedback to an algorithm developer, in a time-efficient manner, about the types and numbers of classification errors that an algorithm produces. This evaluation was conducted by three expert raters independently, across 20 randomly chosen recordings, each classified by both algorithms. The MNH made many fewer errors in determining when saccades start and end, and it also detected some fixations and saccades that the ONH did not. The MNH fails to detect very small saccades. We also evaluated two additional algorithms: the EyeLink Parser and a more current, machine-learning-based algorithm. The EyeLink Parser tended to find more saccades that ended too early than did the other methods, and we found numerous problems with the output of the machine-learning-based algorithm.

  17. Study and development of equipment supervision technique system and its management software for nuclear electricity production

    International Nuclear Information System (INIS)

    Zhang Liying; Zou Pingguo; Zhu Chenghu; Lu Haoliang; Wu Jie

    2008-01-01

    The equipment supervision technique system, which standardized the behavior of supervision organizations in planning and implementing of equipment supervision, is built up based on equipment supervision technique documents, such as Quality Supervision Classifications, Special Supervision Plans and Supervision Guides. Furthermore, based on the research, the equipment supervision management information system is developed by Object Oriented Programming, which consists of supervision information, supervision technique, supervision implementation, quality statistics and analysis module. (authors)

  18. A Classification Framework Applied to Cancer Gene Expression Profiles

    Directory of Open Access Journals (Sweden)

    Hussein Hijazi

    2013-01-01

    Full Text Available Classification of cancer based on gene expression has provided insight into possible treatment strategies. Thus, developing machine learning methods that can successfully distinguish among cancer subtypes or normal versus cancer samples is important. This work discusses supervised learning techniques that have been employed to classify cancers. Furthermore, a two-step feature selection method based on an attribute estimation method (e.g., ReliefF and a genetic algorithm was employed to find a set of genes that can best differentiate between cancer subtypes or normal versus cancer samples. The application of different classification methods (e.g., decision tree, k-nearest neighbor, support vector machine (SVM, bagging, and random forest on 5 cancer datasets shows that no classification method universally outperforms all the others. However, k-nearest neighbor and linear SVM generally improve the classification performance over other classifiers. Finally, incorporating diverse types of genomic data (e.g., protein-protein interaction data and gene expression increase the prediction accuracy as compared to using gene expression alone.

  19. Predicting protein complexes using a supervised learning method combined with local structural information.

    Science.gov (United States)

    Dong, Yadong; Sun, Yongqi; Qin, Chao

    2018-01-01

    The existing protein complex detection methods can be broadly divided into two categories: unsupervised and supervised learning methods. Most of the unsupervised learning methods assume that protein complexes are in dense regions of protein-protein interaction (PPI) networks even though many true complexes are not dense subgraphs. Supervised learning methods utilize the informative properties of known complexes; they often extract features from existing complexes and then use the features to train a classification model. The trained model is used to guide the search process for new complexes. However, insufficient extracted features, noise in the PPI data and the incompleteness of complex data make the classification model imprecise. Consequently, the classification model is not sufficient for guiding the detection of complexes. Therefore, we propose a new robust score function that combines the classification model with local structural information. Based on the score function, we provide a search method that works both forwards and backwards. The results from experiments on six benchmark PPI datasets and three protein complex datasets show that our approach can achieve better performance compared with the state-of-the-art supervised, semi-supervised and unsupervised methods for protein complex detection, occasionally significantly outperforming such methods.

  20. Experiments in Discourse Analysis Impact on Information Classification and Retrieval Algorithms.

    Science.gov (United States)

    Morato, Jorge; Llorens, J.; Genova, G.; Moreiro, J. A.

    2003-01-01

    Discusses the inclusion of contextual information in indexing and retrieval systems to improve results and the ability to carry out text analysis by means of linguistic knowledge. Presents research that investigated whether discourse variables have an impact on information and retrieval and classification algorithms. (Author/LRW)

  1. Does a Diagnostic Classification Algorithm Help to Predict the Course of Low Back Pain?

    DEFF Research Database (Denmark)

    Hartvigsen, Lisbeth; Kongsted, Alice; Vach, Werner

    2018-01-01

    ). Objectives To investigate if a diagnostic classification algorithm is associated with activity limitation and LBP intensity at 2-week and 3-month follow up, and 1-year trajectories of LBP intensity, and if it improves prediction of outcome when added to a set of known predictors. Methods 934 consecutive......Study Design A prospective observational study. Background A diagnostic classification algorithm was developed by Petersen et al., consisting of 12 categories based on a standardized examination protocol with the primary purpose of identifying clinically homogeneous subgroups of low back pain (LBP...... adult patients, with new episodes of LBP, who were visiting chiropractic practices in primary care were categorized according to the Petersen classification. Outcomes were disability and pain intensity measured at 2 weeks and 3 months, and 1-year trajectories of LBP based on weekly responses to text...

  2. An up-to-date comparison of state-of-the-art classification algorithms

    KAUST Repository

    Zhang, Chongsheng

    2017-04-05

    Current benchmark reports of classification algorithms generally concern common classifiers and their variants but do not include many algorithms that have been introduced in recent years. Moreover, important properties such as the dependency on number of classes and features and CPU running time are typically not examined. In this paper, we carry out a comparative empirical study on both established classifiers and more recently proposed ones on 71 data sets originating from different domains, publicly available at UCI and KEEL repositories. The list of 11 algorithms studied includes Extreme Learning Machine (ELM), Sparse Representation based Classification (SRC), and Deep Learning (DL), which have not been thoroughly investigated in existing comparative studies. It is found that Stochastic Gradient Boosting Trees (GBDT) matches or exceeds the prediction performance of Support Vector Machines (SVM) and Random Forests (RF), while being the fastest algorithm in terms of prediction efficiency. ELM also yields good accuracy results, ranking in the top-5, alongside GBDT, RF, SVM, and C4.5 but this performance varies widely across all data sets. Unsurprisingly, top accuracy performers have average or slow training time efficiency. DL is the worst performer in terms of accuracy but second fastest in prediction efficiency. SRC shows good accuracy performance but it is the slowest classifier in both training and testing.

  3. An up-to-date comparison of state-of-the-art classification algorithms

    KAUST Repository

    Zhang, Chongsheng; Liu, Changchang; Zhang, Xiangliang; Almpanidis, George

    2017-01-01

    Current benchmark reports of classification algorithms generally concern common classifiers and their variants but do not include many algorithms that have been introduced in recent years. Moreover, important properties such as the dependency on number of classes and features and CPU running time are typically not examined. In this paper, we carry out a comparative empirical study on both established classifiers and more recently proposed ones on 71 data sets originating from different domains, publicly available at UCI and KEEL repositories. The list of 11 algorithms studied includes Extreme Learning Machine (ELM), Sparse Representation based Classification (SRC), and Deep Learning (DL), which have not been thoroughly investigated in existing comparative studies. It is found that Stochastic Gradient Boosting Trees (GBDT) matches or exceeds the prediction performance of Support Vector Machines (SVM) and Random Forests (RF), while being the fastest algorithm in terms of prediction efficiency. ELM also yields good accuracy results, ranking in the top-5, alongside GBDT, RF, SVM, and C4.5 but this performance varies widely across all data sets. Unsurprisingly, top accuracy performers have average or slow training time efficiency. DL is the worst performer in terms of accuracy but second fastest in prediction efficiency. SRC shows good accuracy performance but it is the slowest classifier in both training and testing.

  4. Accurate Detection of Dysmorphic Nuclei Using Dynamic Programming and Supervised Classification.

    Science.gov (United States)

    Verschuuren, Marlies; De Vylder, Jonas; Catrysse, Hannes; Robijns, Joke; Philips, Wilfried; De Vos, Winnok H

    2017-01-01

    A vast array of pathologies is typified by the presence of nuclei with an abnormal morphology. Dysmorphic nuclear phenotypes feature dramatic size changes or foldings, but also entail much subtler deviations such as nuclear protrusions called blebs. Due to their unpredictable size, shape and intensity, dysmorphic nuclei are often not accurately detected in standard image analysis routines. To enable accurate detection of dysmorphic nuclei in confocal and widefield fluorescence microscopy images, we have developed an automated segmentation algorithm, called Blebbed Nuclei Detector (BleND), which relies on two-pass thresholding for initial nuclear contour detection, and an optimal path finding algorithm, based on dynamic programming, for refining these contours. Using a robust error metric, we show that our method matches manual segmentation in terms of precision and outperforms state-of-the-art nuclear segmentation methods. Its high performance allowed for building and integrating a robust classifier that recognizes dysmorphic nuclei with an accuracy above 95%. The combined segmentation-classification routine is bound to facilitate nucleus-based diagnostics and enable real-time recognition of dysmorphic nuclei in intelligent microscopy workflows.

  5. Accurate Detection of Dysmorphic Nuclei Using Dynamic Programming and Supervised Classification.

    Directory of Open Access Journals (Sweden)

    Marlies Verschuuren

    Full Text Available A vast array of pathologies is typified by the presence of nuclei with an abnormal morphology. Dysmorphic nuclear phenotypes feature dramatic size changes or foldings, but also entail much subtler deviations such as nuclear protrusions called blebs. Due to their unpredictable size, shape and intensity, dysmorphic nuclei are often not accurately detected in standard image analysis routines. To enable accurate detection of dysmorphic nuclei in confocal and widefield fluorescence microscopy images, we have developed an automated segmentation algorithm, called Blebbed Nuclei Detector (BleND, which relies on two-pass thresholding for initial nuclear contour detection, and an optimal path finding algorithm, based on dynamic programming, for refining these contours. Using a robust error metric, we show that our method matches manual segmentation in terms of precision and outperforms state-of-the-art nuclear segmentation methods. Its high performance allowed for building and integrating a robust classifier that recognizes dysmorphic nuclei with an accuracy above 95%. The combined segmentation-classification routine is bound to facilitate nucleus-based diagnostics and enable real-time recognition of dysmorphic nuclei in intelligent microscopy workflows.

  6. A Region-Based GeneSIS Segmentation Algorithm for the Classification of Remotely Sensed Images

    Directory of Open Access Journals (Sweden)

    Stelios K. Mylonas

    2015-03-01

    Full Text Available This paper proposes an object-based segmentation/classification scheme for remotely sensed images, based on a novel variant of the recently proposed Genetic Sequential Image Segmentation (GeneSIS algorithm. GeneSIS segments the image in an iterative manner, whereby at each iteration a single object is extracted via a genetic-based object extraction algorithm. Contrary to the previous pixel-based GeneSIS where the candidate objects to be extracted were evaluated through the fuzzy content of their included pixels, in the newly developed region-based GeneSIS algorithm, a watershed-driven fine segmentation map is initially obtained from the original image, which serves as the basis for the forthcoming GeneSIS segmentation. Furthermore, in order to enhance the spatial search capabilities, we introduce a more descriptive encoding scheme in the object extraction algorithm, where the structural search modules are represented by polygonal shapes. Our objectives in the new framework are posed as follows: enhance the flexibility of the algorithm in extracting more flexible object shapes, assure high level classification accuracies, and reduce the execution time of the segmentation, while at the same time preserving all the inherent attributes of the GeneSIS approach. Finally, exploiting the inherent attribute of GeneSIS to produce multiple segmentations, we also propose two segmentation fusion schemes that operate on the ensemble of segmentations generated by GeneSIS. Our approaches are tested on an urban and two agricultural images. The results show that region-based GeneSIS has considerably lower computational demands compared to the pixel-based one. Furthermore, the suggested methods achieve higher classification accuracies and good segmentation maps compared to a series of existing algorithms.

  7. The surgical algorithm for the AOSpine thoracolumbar spine injury classification system

    NARCIS (Netherlands)

    Vaccaro, Alexander R.; Schroeder, Gregory D.; Kepler, Christopher K.; Cumhur Oner, F.; Vialle, Luiz R.; Kandziora, Frank; Koerner, John D.; Kurd, Mark F.; Reinhold, Max; Schnake, Klaus J.; Chapman, Jens; Aarabi, Bizhan; Fehlings, Michael G.; Dvorak, Marcel F.

    2016-01-01

    Purpose: The goal of the current study is to establish a surgical algorithm to accompany the AOSpine thoracolumbar spine injury classification system. Methods: A survey was sent to AOSpine members from the six AO regions of the world, and surgeons were asked if a patient should undergo an initial

  8. Training ANFIS structure using genetic algorithm for liver cancer classification based on microarray gene expression data

    Directory of Open Access Journals (Sweden)

    Bülent Haznedar

    2017-02-01

    Full Text Available Classification is an important data mining technique, which is used in many fields mostly exemplified as medicine, genetics and biomedical engineering. The number of studies about classification of the datum on DNA microarray gene expression is specifically increased in recent years. However, because of the reasons as the abundance of gene numbers in the datum as microarray gene expressions and the nonlinear relations mostly across those datum, the success of conventional classification algorithms can be limited. Because of these reasons, the interest on classification methods which are based on artificial intelligence to solve the problem on classification has been gradually increased in recent times. In this study, a hybrid approach which is based on Adaptive Neuro-Fuzzy Inference System (ANFIS and Genetic Algorithm (GA are suggested in order to classify liver microarray cancer data set. Simulation results are compared with the results of other methods. According to the results obtained, it is seen that the recommended method is better than the other methods.

  9. Classification and authentication of unknown water samples using machine learning algorithms.

    Science.gov (United States)

    Kundu, Palash K; Panchariya, P C; Kundu, Madhusree

    2011-07-01

    This paper proposes the development of water sample classification and authentication, in real life which is based on machine learning algorithms. The proposed techniques used experimental measurements from a pulse voltametry method which is based on an electronic tongue (E-tongue) instrumentation system with silver and platinum electrodes. E-tongue include arrays of solid state ion sensors, transducers even of different types, data collectors and data analysis tools, all oriented to the classification of liquid samples and authentication of unknown liquid samples. The time series signal and the corresponding raw data represent the measurement from a multi-sensor system. The E-tongue system, implemented in a laboratory environment for 6 numbers of different ISI (Bureau of Indian standard) certified water samples (Aquafina, Bisleri, Kingfisher, Oasis, Dolphin, and McDowell) was the data source for developing two types of machine learning algorithms like classification and regression. A water data set consisting of 6 numbers of sample classes containing 4402 numbers of features were considered. A PCA (principal component analysis) based classification and authentication tool was developed in this study as the machine learning component of the E-tongue system. A proposed partial least squares (PLS) based classifier, which was dedicated as well; to authenticate a specific category of water sample evolved out as an integral part of the E-tongue instrumentation system. The developed PCA and PLS based E-tongue system emancipated an overall encouraging authentication percentage accuracy with their excellent performances for the aforesaid categories of water samples. Copyright © 2011 ISA. Published by Elsevier Ltd. All rights reserved.

  10. Semi-supervised Learning for Phenotyping Tasks.

    Science.gov (United States)

    Dligach, Dmitriy; Miller, Timothy; Savova, Guergana K

    2015-01-01

    Supervised learning is the dominant approach to automatic electronic health records-based phenotyping, but it is expensive due to the cost of manual chart review. Semi-supervised learning takes advantage of both scarce labeled and plentiful unlabeled data. In this work, we study a family of semi-supervised learning algorithms based on Expectation Maximization (EM) in the context of several phenotyping tasks. We first experiment with the basic EM algorithm. When the modeling assumptions are violated, basic EM leads to inaccurate parameter estimation. Augmented EM attenuates this shortcoming by introducing a weighting factor that downweights the unlabeled data. Cross-validation does not always lead to the best setting of the weighting factor and other heuristic methods may be preferred. We show that accurate phenotyping models can be trained with only a few hundred labeled (and a large number of unlabeled) examples, potentially providing substantial savings in the amount of the required manual chart review.

  11. Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis

    NARCIS (Netherlands)

    Cheplygina, Veronika; de Bruijne, Marleen; Pluim, Josien P. W.

    2018-01-01

    Machine learning (ML) algorithms have made a tremendous impact in the field of medical imaging. While medical imaging datasets have been growing in size, a challenge for supervised ML algorithms that is frequently mentioned is the lack of annotated data. As a result, various methods which can learn

  12. Hybrid Model Based on Genetic Algorithms and SVM Applied to Variable Selection within Fruit Juice Classification

    Directory of Open Access Journals (Sweden)

    C. Fernandez-Lozano

    2013-01-01

    Full Text Available Given the background of the use of Neural Networks in problems of apple juice classification, this paper aim at implementing a newly developed method in the field of machine learning: the Support Vector Machines (SVM. Therefore, a hybrid model that combines genetic algorithms and support vector machines is suggested in such a way that, when using SVM as a fitness function of the Genetic Algorithm (GA, the most representative variables for a specific classification problem can be selected.

  13. Brake fault diagnosis using Clonal Selection Classification Algorithm (CSCA – A statistical learning approach

    Directory of Open Access Journals (Sweden)

    R. Jegadeeshwaran

    2015-03-01

    Full Text Available In automobile, brake system is an essential part responsible for control of the vehicle. Any failure in the brake system impacts the vehicle's motion. It will generate frequent catastrophic effects on the vehicle cum passenger's safety. Thus the brake system plays a vital role in an automobile and hence condition monitoring of the brake system is essential. Vibration based condition monitoring using machine learning techniques are gaining momentum. This study is one such attempt to perform the condition monitoring of a hydraulic brake system through vibration analysis. In this research, the performance of a Clonal Selection Classification Algorithm (CSCA for brake fault diagnosis has been reported. A hydraulic brake system test rig was fabricated. Under good and faulty conditions of a brake system, the vibration signals were acquired using a piezoelectric transducer. The statistical parameters were extracted from the vibration signal. The best feature set was identified for classification using attribute evaluator. The selected features were then classified using CSCA. The classification accuracy of such artificial intelligence technique has been compared with other machine learning approaches and discussed. The Clonal Selection Classification Algorithm performs better and gives the maximum classification accuracy (96% for the fault diagnosis of a hydraulic brake system.

  14. Semi-supervised learning and domain adaptation in natural language processing

    CERN Document Server

    Søgaard, Anders

    2013-01-01

    This book introduces basic supervised learning algorithms applicable to natural language processing (NLP) and shows how the performance of these algorithms can often be improved by exploiting the marginal distribution of large amounts of unlabeled data. One reason for that is data sparsity, i.e., the limited amounts of data we have available in NLP. However, in most real-world NLP applications our labeled data is also heavily biased. This book introduces extensions of supervised learning algorithms to cope with data sparsity and different kinds of sampling bias.This book is intended to be both

  15. Hardware Accelerators Targeting a Novel Group Based Packet Classification Algorithm

    Directory of Open Access Journals (Sweden)

    O. Ahmed

    2013-01-01

    Full Text Available Packet classification is a ubiquitous and key building block for many critical network devices. However, it remains as one of the main bottlenecks faced when designing fast network devices. In this paper, we propose a novel Group Based Search packet classification Algorithm (GBSA that is scalable, fast, and efficient. GBSA consumes an average of 0.4 megabytes of memory for a 10 k rule set. The worst-case classification time per packet is 2 microseconds, and the preprocessing speed is 3 M rules/second based on an Xeon processor operating at 3.4 GHz. When compared with other state-of-the-art classification techniques, the results showed that GBSA outperforms the competition with respect to speed, memory usage, and processing time. Moreover, GBSA is amenable to implementation in hardware. Three different hardware implementations are also presented in this paper including an Application Specific Instruction Set Processor (ASIP implementation and two pure Register-Transfer Level (RTL implementations based on Impulse-C and Handel-C flows, respectively. Speedups achieved with these hardware accelerators ranged from 9x to 18x compared with a pure software implementation running on an Xeon processor.

  16. From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification

    Science.gov (United States)

    2010-01-01

    Background Machine learning techniques have shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Secondly, the hierarchical classification structure allows to easily evaluate and visualize the resolution of FAME data for

  17. From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification

    Directory of Open Access Journals (Sweden)

    Dawyndt Peter

    2010-01-01

    Full Text Available Abstract Background Machine learning techniques have shown to improve bacterial species classification based on fatty acid methyl ester (FAME data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Secondly, the hierarchical classification structure allows to easily evaluate and visualize the

  18. From learning taxonomies to phylogenetic learning: integration of 16S rRNA gene data into FAME-based bacterial classification.

    Science.gov (United States)

    Slabbinck, Bram; Waegeman, Willem; Dawyndt, Peter; De Vos, Paul; De Baets, Bernard

    2010-01-30

    Machine learning techniques have shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Secondly, the hierarchical classification structure allows to easily evaluate and visualize the resolution of FAME data for the discrimination of bacterial

  19. Kernel Clustering with a Differential Harmony Search Algorithm for Scheme Classification

    Directory of Open Access Journals (Sweden)

    Yu Feng

    2017-01-01

    Full Text Available This paper presents a kernel fuzzy clustering with a novel differential harmony search algorithm to coordinate with the diversion scheduling scheme classification. First, we employed a self-adaptive solution generation strategy and differential evolution-based population update strategy to improve the classical harmony search. Second, we applied the differential harmony search algorithm to the kernel fuzzy clustering to help the clustering method obtain better solutions. Finally, the combination of the kernel fuzzy clustering and the differential harmony search is applied for water diversion scheduling in East Lake. A comparison of the proposed method with other methods has been carried out. The results show that the kernel clustering with the differential harmony search algorithm has good performance to cooperate with the water diversion scheduling problems.

  20. Feature Selection for Motor Imagery EEG Classification Based on Firefly Algorithm and Learning Automata.

    Science.gov (United States)

    Liu, Aiming; Chen, Kun; Liu, Quan; Ai, Qingsong; Xie, Yi; Chen, Anqi

    2017-11-08

    Motor Imagery (MI) electroencephalography (EEG) is widely studied for its non-invasiveness, easy availability, portability, and high temporal resolution. As for MI EEG signal processing, the high dimensions of features represent a research challenge. It is necessary to eliminate redundant features, which not only create an additional overhead of managing the space complexity, but also might include outliers, thereby reducing classification accuracy. The firefly algorithm (FA) can adaptively select the best subset of features, and improve classification accuracy. However, the FA is easily entrapped in a local optimum. To solve this problem, this paper proposes a method of combining the firefly algorithm and learning automata (LA) to optimize feature selection for motor imagery EEG. We employed a method of combining common spatial pattern (CSP) and local characteristic-scale decomposition (LCD) algorithms to obtain a high dimensional feature set, and classified it by using the spectral regression discriminant analysis (SRDA) classifier. Both the fourth brain-computer interface competition data and real-time data acquired in our designed experiments were used to verify the validation of the proposed method. Compared with genetic and adaptive weight particle swarm optimization algorithms, the experimental results show that our proposed method effectively eliminates redundant features, and improves the classification accuracy of MI EEG signals. In addition, a real-time brain-computer interface system was implemented to verify the feasibility of our proposed methods being applied in practical brain-computer interface systems.

  1. Feature Selection for Motor Imagery EEG Classification Based on Firefly Algorithm and Learning Automata

    Directory of Open Access Journals (Sweden)

    Aiming Liu

    2017-11-01

    Full Text Available Motor Imagery (MI electroencephalography (EEG is widely studied for its non-invasiveness, easy availability, portability, and high temporal resolution. As for MI EEG signal processing, the high dimensions of features represent a research challenge. It is necessary to eliminate redundant features, which not only create an additional overhead of managing the space complexity, but also might include outliers, thereby reducing classification accuracy. The firefly algorithm (FA can adaptively select the best subset of features, and improve classification accuracy. However, the FA is easily entrapped in a local optimum. To solve this problem, this paper proposes a method of combining the firefly algorithm and learning automata (LA to optimize feature selection for motor imagery EEG. We employed a method of combining common spatial pattern (CSP and local characteristic-scale decomposition (LCD algorithms to obtain a high dimensional feature set, and classified it by using the spectral regression discriminant analysis (SRDA classifier. Both the fourth brain–computer interface competition data and real-time data acquired in our designed experiments were used to verify the validation of the proposed method. Compared with genetic and adaptive weight particle swarm optimization algorithms, the experimental results show that our proposed method effectively eliminates redundant features, and improves the classification accuracy of MI EEG signals. In addition, a real-time brain–computer interface system was implemented to verify the feasibility of our proposed methods being applied in practical brain–computer interface systems.

  2. Improved RMR Rock Mass Classification Using Artificial Intelligence Algorithms

    Science.gov (United States)

    Gholami, Raoof; Rasouli, Vamegh; Alimoradi, Andisheh

    2013-09-01

    Rock mass classification systems such as rock mass rating (RMR) are very reliable means to provide information about the quality of rocks surrounding a structure as well as to propose suitable support systems for unstable regions. Many correlations have been proposed to relate measured quantities such as wave velocity to rock mass classification systems to limit the associated time and cost of conducting the sampling and mechanical tests conventionally used to calculate RMR values. However, these empirical correlations have been found to be unreliable, as they usually overestimate or underestimate the RMR value. The aim of this paper is to compare the results of RMR classification obtained from the use of empirical correlations versus machine-learning methodologies based on artificial intelligence algorithms. The proposed methods were verified based on two case studies located in northern Iran. Relevance vector regression (RVR) and support vector regression (SVR), as two robust machine-learning methodologies, were used to predict the RMR for tunnel host rocks. RMR values already obtained by sampling and site investigation at one tunnel were taken into account as the output of the artificial networks during training and testing phases. The results reveal that use of empirical correlations overestimates the predicted RMR values. RVR and SVR, however, showed more reliable results, and are therefore suggested for use in RMR classification for design purposes of rock structures.

  3. Fatigue Level Estimation of Bill Based on Acoustic Signal Feature by Supervised SOM

    Science.gov (United States)

    Teranishi, Masaru; Omatu, Sigeru; Kosaka, Toshihisa

    Fatigued bills have harmful influence on daily operation of Automated Teller Machine(ATM). To make the fatigued bills classification more efficient, development of an automatic fatigued bill classification method is desired. We propose a new method to estimate bending rigidity of bill from acoustic signal feature of banking machines. The estimated bending rigidities are used as continuous fatigue level for classification of fatigued bill. By using the supervised Self-Organizing Map(supervised SOM), we estimate the bending rigidity from only the acoustic energy pattern effectively. The experimental result with real bill samples shows the effectiveness of the proposed method.

  4. Determining effects of non-synonymous SNPs on protein-protein interactions using supervised and semi-supervised learning.

    Directory of Open Access Journals (Sweden)

    Nan Zhao

    2014-05-01

    Full Text Available Single nucleotide polymorphisms (SNPs are among the most common types of genetic variation in complex genetic disorders. A growing number of studies link the functional role of SNPs with the networks and pathways mediated by the disease-associated genes. For example, many non-synonymous missense SNPs (nsSNPs have been found near or inside the protein-protein interaction (PPI interfaces. Determining whether such nsSNP will disrupt or preserve a PPI is a challenging task to address, both experimentally and computationally. Here, we present this task as three related classification problems, and develop a new computational method, called the SNP-IN tool (non-synonymous SNP INteraction effect predictor. Our method predicts the effects of nsSNPs on PPIs, given the interaction's structure. It leverages supervised and semi-supervised feature-based classifiers, including our new Random Forest self-learning protocol. The classifiers are trained based on a dataset of comprehensive mutagenesis studies for 151 PPI complexes, with experimentally determined binding affinities of the mutant and wild-type interactions. Three classification problems were considered: (1 a 2-class problem (strengthening/weakening PPI mutations, (2 another 2-class problem (mutations that disrupt/preserve a PPI, and (3 a 3-class classification (detrimental/neutral/beneficial mutation effects. In total, 11 different supervised and semi-supervised classifiers were trained and assessed resulting in a promising performance, with the weighted f-measure ranging from 0.87 for Problem 1 to 0.70 for the most challenging Problem 3. By integrating prediction results of the 2-class classifiers into the 3-class classifier, we further improved its performance for Problem 3. To demonstrate the utility of SNP-IN tool, it was applied to study the nsSNP-induced rewiring of two disease-centered networks. The accurate and balanced performance of SNP-IN tool makes it readily available to study the

  5. Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning

    Science.gov (United States)

    Zhao, Nan; Han, Jing Ginger; Shyu, Chi-Ren; Korkin, Dmitry

    2014-01-01

    Single nucleotide polymorphisms (SNPs) are among the most common types of genetic variation in complex genetic disorders. A growing number of studies link the functional role of SNPs with the networks and pathways mediated by the disease-associated genes. For example, many non-synonymous missense SNPs (nsSNPs) have been found near or inside the protein-protein interaction (PPI) interfaces. Determining whether such nsSNP will disrupt or preserve a PPI is a challenging task to address, both experimentally and computationally. Here, we present this task as three related classification problems, and develop a new computational method, called the SNP-IN tool (non-synonymous SNP INteraction effect predictor). Our method predicts the effects of nsSNPs on PPIs, given the interaction's structure. It leverages supervised and semi-supervised feature-based classifiers, including our new Random Forest self-learning protocol. The classifiers are trained based on a dataset of comprehensive mutagenesis studies for 151 PPI complexes, with experimentally determined binding affinities of the mutant and wild-type interactions. Three classification problems were considered: (1) a 2-class problem (strengthening/weakening PPI mutations), (2) another 2-class problem (mutations that disrupt/preserve a PPI), and (3) a 3-class classification (detrimental/neutral/beneficial mutation effects). In total, 11 different supervised and semi-supervised classifiers were trained and assessed resulting in a promising performance, with the weighted f-measure ranging from 0.87 for Problem 1 to 0.70 for the most challenging Problem 3. By integrating prediction results of the 2-class classifiers into the 3-class classifier, we further improved its performance for Problem 3. To demonstrate the utility of SNP-IN tool, it was applied to study the nsSNP-induced rewiring of two disease-centered networks. The accurate and balanced performance of SNP-IN tool makes it readily available to study the rewiring of

  6. Improved algorithms for the classification of rough rice using a bionic electronic nose based on PCA and the Wilks distribution.

    Science.gov (United States)

    Xu, Sai; Zhou, Zhiyan; Lu, Huazhong; Luo, Xiwen; Lan, Yubin

    2014-03-19

    Principal Component Analysis (PCA) is one of the main methods used for electronic nose pattern recognition. However, poor classification performance is common in classification and recognition when using regular PCA. This paper aims to improve the classification performance of regular PCA based on the existing Wilks Λ-statistic (i.e., combined PCA with the Wilks distribution). The improved algorithms, which combine regular PCA with the Wilks Λ-statistic, were developed after analysing the functionality and defects of PCA. Verification tests were conducted using a PEN3 electronic nose. The collected samples consisted of the volatiles of six varieties of rough rice (Zhongxiang1, Xiangwan13, Yaopingxiang, WufengyouT025, Pin 36, and Youyou122), grown in same area and season. The first two principal components used as analysis vectors cannot perform the rough rice varieties classification task based on a regular PCA. Using the improved algorithms, which combine the regular PCA with the Wilks Λ-statistic, many different principal components were selected as analysis vectors. The set of data points of the Mahalanobis distance between each of the varieties of rough rice was selected to estimate the performance of the classification. The result illustrates that the rough rice varieties classification task is achieved well using the improved algorithm. A Probabilistic Neural Networks (PNN) was also established to test the effectiveness of the improved algorithms. The first two principal components (namely PC1 and PC2) and the first and fifth principal component (namely PC1 and PC5) were selected as the inputs of PNN for the classification of the six rough rice varieties. The results indicate that the classification accuracy based on the improved algorithm was improved by 6.67% compared to the results of the regular method. These results prove the effectiveness of using the Wilks Λ-statistic to improve the classification accuracy of the regular PCA approach. The results

  7. Comparison of Unsupervised Vegetation Classification Methods from Vhr Images after Shadows Removal by Innovative Algorithms

    Science.gov (United States)

    Movia, A.; Beinat, A.; Crosilla, F.

    2015-04-01

    The recognition of vegetation by the analysis of very high resolution (VHR) aerial images provides meaningful information about environmental features; nevertheless, VHR images frequently contain shadows that generate significant problems for the classification of the image components and for the extraction of the needed information. The aim of this research is to classify, from VHR aerial images, vegetation involved in the balance process of the environmental biochemical cycle, and to discriminate it with respect to urban and agricultural features. Three classification algorithms have been experimented in order to better recognize vegetation, and compared to NDVI index; unfortunately all these methods are conditioned by the presence of shadows on the images. Literature presents several algorithms to detect and remove shadows in the scene: most of them are based on the RGB to HSI transformations. In this work some of them have been implemented and compared with one based on RGB bands. Successively, in order to remove shadows and restore brightness on the images, some innovative algorithms, based on Procrustes theory, have been implemented and applied. Among these, we evaluate the capability of the so called "not-centered oblique Procrustes" and "anisotropic Procrustes" methods to efficiently restore brightness with respect to a linear correlation correction based on the Cholesky decomposition. Some experimental results obtained by different classification methods after shadows removal carried out with the innovative algorithms are presented and discussed.

  8. Remote Sensing Image Classification Based on Stacked Denoising Autoencoder

    Directory of Open Access Journals (Sweden)

    Peng Liang

    2017-12-01

    Full Text Available Focused on the issue that conventional remote sensing image classification methods have run into the bottlenecks in accuracy, a new remote sensing image classification method inspired by deep learning is proposed, which is based on Stacked Denoising Autoencoder. First, the deep network model is built through the stacked layers of Denoising Autoencoder. Then, with noised input, the unsupervised Greedy layer-wise training algorithm is used to train each layer in turn for more robust expressing, characteristics are obtained in supervised learning by Back Propagation (BP neural network, and the whole network is optimized by error back propagation. Finally, Gaofen-1 satellite (GF-1 remote sensing data are used for evaluation, and the total accuracy and kappa accuracy reach 95.7% and 0.955, respectively, which are higher than that of the Support Vector Machine and Back Propagation neural network. The experiment results show that the proposed method can effectively improve the accuracy of remote sensing image classification.

  9. Semi-supervised clustering methods.

    Science.gov (United States)

    Bair, Eric

    2013-01-01

    Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided.

  10. Classification of Internet banking customers using data mining algorithms

    Directory of Open Access Journals (Sweden)

    Reza Radfar

    2014-03-01

    Full Text Available Classifying customers using data mining algorithms, enables banks to keep old customers loyality while attracting new ones. Using decision tree as a data mining technique, we can optimize customer classification provided that the appropriate decision tree is selected. In this article we have presented an appropriate model to classify customers who use internet banking service. The model is developed based on CRISP-DM standard and we have used real data of Sina bank’s Internet bank. In compare to other decision trees, ours is based on both optimization and accuracy factors that recognizes new potential internet banking customers using a three level classification, which is low/medium and high. This is a practical, documentary-based research. Mining customer rules enables managers to make policies based on found out patterns in order to have a better perception of what customers really desire.

  11. Automated detection and classification of cryptographic algorithms in binary programs through machine learning

    OpenAIRE

    Hosfelt, Diane Duros

    2015-01-01

    Threats from the internet, particularly malicious software (i.e., malware) often use cryptographic algorithms to disguise their actions and even to take control of a victim's system (as in the case of ransomware). Malware and other threats proliferate too quickly for the time-consuming traditional methods of binary analysis to be effective. By automating detection and classification of cryptographic algorithms, we can speed program analysis and more efficiently combat malware. This thesis wil...

  12. Semi-Supervised Transductive Hot Spot Predictor Working on Multiple Assumptions

    KAUST Repository

    Wang, Jim Jing-Yan; Almasri, Islam; Shi, Yuexiang; Gao, Xin

    2014-01-01

    of the transductive semi-supervised algorithms takes all the three semisupervised assumptions, i.e., smoothness, cluster and manifold assumptions, together into account during learning. In this paper, we propose a novel semi-supervised method for hot spot residue

  13. Aneurysmal subarachnoid hemorrhage prognostic decision-making algorithm using classification and regression tree analysis.

    Science.gov (United States)

    Lo, Benjamin W Y; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A H

    2016-01-01

    Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56-2.45, P tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH.

  14. FROM2D to 3d Supervised Segmentation and Classification for Cultural Heritage Applications

    Science.gov (United States)

    Grilli, E.; Dininno, D.; Petrucci, G.; Remondino, F.

    2018-05-01

    The digital management of architectural heritage information is still a complex problem, as a heritage object requires an integrated representation of various types of information in order to develop appropriate restoration or conservation strategies. Currently, there is extensive research focused on automatic procedures of segmentation and classification of 3D point clouds or meshes, which can accelerate the study of a monument and integrate it with heterogeneous information and attributes, useful to characterize and describe the surveyed object. The aim of this study is to propose an optimal, repeatable and reliable procedure to manage various types of 3D surveying data and associate them with heterogeneous information and attributes to characterize and describe the surveyed object. In particular, this paper presents an approach for classifying 3D heritage models, starting from the segmentation of their textures based on supervised machine learning methods. Experimental results run on three different case studies demonstrate that the proposed approach is effective and with many further potentials.

  15. Linear Subpixel Learning Algorithm for Land Cover Classification from WELD using High Performance Computing

    Science.gov (United States)

    Ganguly, S.; Kumar, U.; Nemani, R. R.; Kalia, S.; Michaelis, A.

    2017-12-01

    In this work, we use a Fully Constrained Least Squares Subpixel Learning Algorithm to unmix global WELD (Web Enabled Landsat Data) to obtain fractions or abundances of substrate (S), vegetation (V) and dark objects (D) classes. Because of the sheer nature of data and compute needs, we leveraged the NASA Earth Exchange (NEX) high performance computing architecture to optimize and scale our algorithm for large-scale processing. Subsequently, the S-V-D abundance maps were characterized into 4 classes namely, forest, farmland, water and urban areas (with NPP-VIIRS - national polar orbiting partnership visible infrared imaging radiometer suite nighttime lights data) over California, USA using Random Forest classifier. Validation of these land cover maps with NLCD (National Land Cover Database) 2011 products and NAFD (North American Forest Dynamics) static forest cover maps showed that an overall classification accuracy of over 91% was achieved, which is a 6% improvement in unmixing based classification relative to per-pixel based classification. As such, abundance maps continue to offer an useful alternative to high-spatial resolution data derived classification maps for forest inventory analysis, multi-class mapping for eco-climatic models and applications, fast multi-temporal trend analysis and for societal and policy-relevant applications needed at the watershed scale.

  16. A Cross-Correlated Delay Shift Supervised Learning Method for Spiking Neurons with Application to Interictal Spike Detection in Epilepsy.

    Science.gov (United States)

    Guo, Lilin; Wang, Zhenzhong; Cabrerizo, Mercedes; Adjouadi, Malek

    2017-05-01

    This study introduces a novel learning algorithm for spiking neurons, called CCDS, which is able to learn and reproduce arbitrary spike patterns in a supervised fashion allowing the processing of spatiotemporal information encoded in the precise timing of spikes. Unlike the Remote Supervised Method (ReSuMe), synapse delays and axonal delays in CCDS are variants which are modulated together with weights during learning. The CCDS rule is both biologically plausible and computationally efficient. The properties of this learning rule are investigated extensively through experimental evaluations in terms of reliability, adaptive learning performance, generality to different neuron models, learning in the presence of noise, effects of its learning parameters and classification performance. Results presented show that the CCDS learning method achieves learning accuracy and learning speed comparable with ReSuMe, but improves classification accuracy when compared to both the Spike Pattern Association Neuron (SPAN) learning rule and the Tempotron learning rule. The merit of CCDS rule is further validated on a practical example involving the automated detection of interictal spikes in EEG records of patients with epilepsy. Results again show that with proper encoding, the CCDS rule achieves good recognition performance.

  17. Interval-value Based Particle Swarm Optimization algorithm for cancer-type specific gene selection and sample classification

    Directory of Open Access Journals (Sweden)

    D. Ramyachitra

    2015-09-01

    Full Text Available Microarray technology allows simultaneous measurement of the expression levels of thousands of genes within a biological tissue sample. The fundamental power of microarrays lies within the ability to conduct parallel surveys of gene expression using microarray data. The classification of tissue samples based on gene expression data is an important problem in medical diagnosis of diseases such as cancer. In gene expression data, the number of genes is usually very high compared to the number of data samples. Thus the difficulty that lies with data are of high dimensionality and the sample size is small. This research work addresses the problem by classifying resultant dataset using the existing algorithms such as Support Vector Machine (SVM, K-nearest neighbor (KNN, Interval Valued Classification (IVC and the improvised Interval Value based Particle Swarm Optimization (IVPSO algorithm. Thus the results show that the IVPSO algorithm outperformed compared with other algorithms under several performance evaluation functions.

  18. Interval-value Based Particle Swarm Optimization algorithm for cancer-type specific gene selection and sample classification.

    Science.gov (United States)

    Ramyachitra, D; Sofia, M; Manikandan, P

    2015-09-01

    Microarray technology allows simultaneous measurement of the expression levels of thousands of genes within a biological tissue sample. The fundamental power of microarrays lies within the ability to conduct parallel surveys of gene expression using microarray data. The classification of tissue samples based on gene expression data is an important problem in medical diagnosis of diseases such as cancer. In gene expression data, the number of genes is usually very high compared to the number of data samples. Thus the difficulty that lies with data are of high dimensionality and the sample size is small. This research work addresses the problem by classifying resultant dataset using the existing algorithms such as Support Vector Machine (SVM), K-nearest neighbor (KNN), Interval Valued Classification (IVC) and the improvised Interval Value based Particle Swarm Optimization (IVPSO) algorithm. Thus the results show that the IVPSO algorithm outperformed compared with other algorithms under several performance evaluation functions.

  19. Pervasive Sound Sensing: A Weakly Supervised Training Approach.

    Science.gov (United States)

    Kelly, Daniel; Caulfield, Brian

    2016-01-01

    Modern smartphones present an ideal device for pervasive sensing of human behavior. Microphones have the potential to reveal key information about a person's behavior. However, they have been utilized to a significantly lesser extent than other smartphone sensors in the context of human behavior sensing. We postulate that, in order for microphones to be useful in behavior sensing applications, the analysis techniques must be flexible and allow easy modification of the types of sounds to be sensed. A simplification of the training data collection process could allow a more flexible sound classification framework. We hypothesize that detailed training, a prerequisite for the majority of sound sensing techniques, is not necessary and that a significantly less detailed and time consuming data collection process can be carried out, allowing even a nonexpert to conduct the collection, labeling, and training process. To test this hypothesis, we implement a diverse density-based multiple instance learning framework, to identify a target sound, and a bag trimming algorithm, which, using the target sound, automatically segments weakly labeled sound clips to construct an accurate training set. Experiments reveal that our hypothesis is a valid one and results show that classifiers, trained using the automatically segmented training sets, were able to accurately classify unseen sound samples with accuracies comparable to supervised classifiers, achieving an average F -measure of 0.969 and 0.87 for two weakly supervised datasets.

  20. Laser-induced breakdown spectroscopy-based investigation and classification of pharmaceutical tablets using multivariate chemometric analysis

    Science.gov (United States)

    Myakalwar, Ashwin Kumar; Sreedhar, S.; Barman, Ishan; Dingari, Narahara Chari; Rao, S. Venugopal; Kiran, P. Prem; Tewari, Surya P.; Kumar, G. Manoj

    2012-01-01

    We report the effectiveness of laser-induced breakdown spectroscopy (LIBS) in probing the content of pharmaceutical tablets and also investigate its feasibility for routine classification. This method is particularly beneficial in applications where its exquisite chemical specificity and suitability for remote and on site characterization significantly improves the speed and accuracy of quality control and assurance process. Our experiments reveal that in addition to the presence of carbon, hydrogen, nitrogen and oxygen, which can be primarily attributed to the active pharmaceutical ingredients, specific inorganic atoms were also present in all the tablets. Initial attempts at classification by a ratiometric approach using oxygen to nitrogen compositional values yielded an optimal value (at 746.83 nm) with the least relative standard deviation but nevertheless failed to provide an acceptable classification. To overcome this bottleneck in the detection process, two chemometric algorithms, i.e. principal component analysis (PCA) and soft independent modeling of class analogy (SIMCA), were implemented to exploit the multivariate nature of the LIBS data demonstrating that LIBS has the potential to differentiate and discriminate among pharmaceutical tablets. We report excellent prospective classification accuracy using supervised classification via the SIMCA algorithm, demonstrating its potential for future applications in process analytical technology, especially for fast on-line process control monitoring applications in the pharmaceutical industry. PMID:22099648

  1. Application of semi-supervised deep learning to lung sound analysis.

    Science.gov (United States)

    Chamberlain, Daniel; Kodgule, Rahul; Ganelin, Daniela; Miglani, Vivek; Fletcher, Richard Ribon

    2016-08-01

    The analysis of lung sounds, collected through auscultation, is a fundamental component of pulmonary disease diagnostics for primary care and general patient monitoring for telemedicine. Despite advances in computation and algorithms, the goal of automated lung sound identification and classification has remained elusive. Over the past 40 years, published work in this field has demonstrated only limited success in identifying lung sounds, with most published studies using only a small numbers of patients (typically Ndeep learning algorithm for automatically classify lung sounds from a relatively large number of patients (N=284). Focusing on the two most common lung sounds, wheeze and crackle, we present results from 11,627 sound files recorded from 11 different auscultation locations on these 284 patients with pulmonary disease. 890 of these sound files were labeled to evaluate the model, which is significantly larger than previously published studies. Data was collected with a custom mobile phone application and a low-cost (US$30) electronic stethoscope. On this data set, our algorithm achieves ROC curves with AUCs of 0.86 for wheeze and 0.74 for crackle. Most importantly, this study demonstrates how semi-supervised deep learning can be used with larger data sets without requiring extensive labeling of data.

  2. Pixel Classification of SAR ice images using ANFIS-PSO Classifier

    Directory of Open Access Journals (Sweden)

    G. Vasumathi

    2016-12-01

    Full Text Available Synthetic Aperture Radar (SAR is playing a vital role in taking extremely high resolution radar images. It is greatly used to monitor the ice covered ocean regions. Sea monitoring is important for various purposes which includes global climate systems and ship navigation. Classification on the ice infested area gives important features which will be further useful for various monitoring process around the ice regions. Main objective of this paper is to classify the SAR ice image that helps in identifying the regions around the ice infested areas. In this paper three stages are considered in classification of SAR ice images. It starts with preprocessing in which the speckled SAR ice images are denoised using various speckle removal filters; comparison is made on all these filters to find the best filter in speckle removal. Second stage includes segmentation in which different regions are segmented using K-means and watershed segmentation algorithms; comparison is made between these two algorithms to find the best in segmenting SAR ice images. The last stage includes pixel based classification which identifies and classifies the segmented regions using various supervised learning classifiers. The algorithms includes Back propagation neural networks (BPN, Fuzzy Classifier, Adaptive Neuro Fuzzy Inference Classifier (ANFIS classifier and proposed ANFIS with Particle Swarm Optimization (PSO classifier; comparison is made on all these classifiers to propose which classifier is best suitable for classifying the SAR ice image. Various evaluation metrics are performed separately at all these three stages.

  3. Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy

    Science.gov (United States)

    2017-01-01

    Background Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor’s activity for the purposes of quality assurance, safety, and continuing professional development. Objective The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors’ professional performance in the United Kingdom. Methods We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians’ colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. Results Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for codes relating to “popular” (recall=.97), “innovator” (recall=.98), and “respected” (recall=.87) codes and was lower for the “interpersonal” (recall=.80) and “professional” (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined together into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as “respected,” “professional,” and “interpersonal” related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P.05). Conclusions Machine learning algorithms can classify open-text feedback

  4. Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy.

    Science.gov (United States)

    Gibbons, Chris; Richards, Suzanne; Valderas, Jose Maria; Campbell, John

    2017-03-15

    Machine learning techniques may be an effective and efficient way to classify open-text reports on doctor's activity for the purposes of quality assurance, safety, and continuing professional development. The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors' professional performance in the United Kingdom. We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians' colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for codes relating to "popular" (recall=.97), "innovator" (recall=.98), and "respected" (recall=.87) codes and was lower for the "interpersonal" (recall=.80) and "professional" (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined together into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments that were classified as "respected," "professional," and "interpersonal" related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P.05). Machine learning algorithms can classify open-text feedback of doctor performance into multiple themes derived by human raters with high

  5. Algorithmic acquisition of diagnostic patterns in district heating billing system

    International Nuclear Information System (INIS)

    Kiluk, Sebastian

    2012-01-01

    An application of algorithmic exploration of billing data is examined for fault detection, diagnosis (FDD) based on evaluation of present state and detection of unexpected changes in energy efficiency of buildings. Large data sets from district heating (DH) billing systems are used for construction of feature space, diagnostic rules and classification of the buildings according to their energy efficiency properties. The algorithmic approach automates discovering knowledge about common, thus accepted changes in buildings’ properties, in equipment and in habitants’ behavior reflecting progress in technology and life style. In this article implementation of Data Mining and Knowledge Discovery (DMKD) method in supervision system with exemplary results based on real data is presented. Crucial steps of data processing influencing diagnostic results are described in details.

  6. Latent class models for classification

    NARCIS (Netherlands)

    Vermunt, J.K.; Magidson, J.

    2003-01-01

    An overview is provided of recent developments in the use of latent class (LC) and other types of finite mixture models for classification purposes. Several extensions of existing models are presented. Two basic types of LC models for classification are defined: supervised and unsupervised

  7. Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification.

    Science.gov (United States)

    Wang, Yin; Li, Rudong; Zhou, Yuhua; Ling, Zongxin; Guo, Xiaokui; Xie, Lu; Liu, Lei

    2016-01-01

    Text data of 16S rRNA are informative for classifications of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. We extend the phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states for pneumonia and dental caries. The results have shown that PMF may enhance the efficiency and reliability in analyzing high-dimension text data.

  8. Automated classification of cell morphology by coherence-controlled holographic microscopy

    Science.gov (United States)

    Strbkova, Lenka; Zicha, Daniel; Vesely, Pavel; Chmelik, Radim

    2017-08-01

    In the last few years, classification of cells by machine learning has become frequently used in biology. However, most of the approaches are based on morphometric (MO) features, which are not quantitative in terms of cell mass. This may result in poor classification accuracy. Here, we study the potential contribution of coherence-controlled holographic microscopy enabling quantitative phase imaging for the classification of cell morphologies. We compare our approach with the commonly used method based on MO features. We tested both classification approaches in an experiment with nutritionally deprived cancer tissue cells, while employing several supervised machine learning algorithms. Most of the classifiers provided higher performance when quantitative phase features were employed. Based on the results, it can be concluded that the quantitative phase features played an important role in improving the performance of the classification. The methodology could be valuable help in refining the monitoring of live cells in an automated fashion. We believe that coherence-controlled holographic microscopy, as a tool for quantitative phase imaging, offers all preconditions for the accurate automated analysis of live cell behavior while enabling noninvasive label-free imaging with sufficient contrast and high-spatiotemporal phase sensitivity.

  9. Sequential Classification of Palm Gestures Based on A* Algorithm and MLP Neural Network for Quadrocopter Control

    Directory of Open Access Journals (Sweden)

    Wodziński Marek

    2017-06-01

    Full Text Available This paper presents an alternative approach to the sequential data classification, based on traditional machine learning algorithms (neural networks, principal component analysis, multivariate Gaussian anomaly detector and finding the shortest path in a directed acyclic graph, using A* algorithm with a regression-based heuristic. Palm gestures were used as an example of the sequential data and a quadrocopter was the controlled object. The study includes creation of a conceptual model and practical construction of a system using the GPU to ensure the realtime operation. The results present the classification accuracy of chosen gestures and comparison of the computation time between the CPU- and GPU-based solutions.

  10. A Constructive Data Classification Version of the Particle Swarm Optimization Algorithm

    Directory of Open Access Journals (Sweden)

    Alexandre Szabo

    2013-01-01

    Full Text Available The particle swarm optimization algorithm was originally introduced to solve continuous parameter optimization problems. It was soon modified to solve other types of optimization tasks and also to be applied to data analysis. In the latter case, however, there are few works in the literature that deal with the problem of dynamically building the architecture of the system. This paper introduces new particle swarm algorithms specifically designed to solve classification problems. The first proposal, named Particle Swarm Classifier (PSClass, is a derivation of a particle swarm clustering algorithm and its architecture, as in most classifiers, is pre-defined. The second proposal, named Constructive Particle Swarm Classifier (cPSClass, uses ideas from the immune system to automatically build the swarm. A sensitivity analysis of the growing procedure of cPSClass and an investigation into a proposed pruning procedure for this algorithm are performed. The proposals were applied to a wide range of databases from the literature and the results show that they are competitive in relation to other approaches, with the advantage of having a dynamically constructed architecture.

  11. Semi-supervised clustering methods

    Science.gov (United States)

    Bair, Eric

    2013-01-01

    Cluster analysis methods seek to partition a data set into homogeneous subgroups. It is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as “semi-supervised clustering” methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided. PMID:24729830

  12. Arrangement and Applying of Movement Patterns in the Cerebellum Based on Semi-supervised Learning.

    Science.gov (United States)

    Solouki, Saeed; Pooyan, Mohammad

    2016-06-01

    Biological control systems have long been studied as a possible inspiration for the construction of robotic controllers. The cerebellum is known to be involved in the production and learning of smooth, coordinated movements. Therefore, highly regular structure of the cerebellum has been in the core of attention in theoretical and computational modeling. However, most of these models reflect some special features of the cerebellum without regarding the whole motor command computational process. In this paper, we try to make a logical relation between the most significant models of the cerebellum and introduce a new learning strategy to arrange the movement patterns: cerebellar modular arrangement and applying of movement patterns based on semi-supervised learning (CMAPS). We assume here the cerebellum like a big archive of patterns that has an efficient organization to classify and recall them. The main idea is to achieve an optimal use of memory locations by more than just a supervised learning and classification algorithm. Surely, more experimental and physiological researches are needed to confirm our hypothesis.

  13. Semi-supervised sparse coding

    KAUST Repository

    Wang, Jim Jing-Yan; Gao, Xin

    2014-01-01

    Sparse coding approximates the data sample as a sparse linear combination of some basic codewords and uses the sparse codes as new presentations. In this paper, we investigate learning discriminative sparse codes by sparse coding in a semi-supervised manner, where only a few training samples are labeled. By using the manifold structure spanned by the data set of both labeled and unlabeled samples and the constraints provided by the labels of the labeled samples, we learn the variable class labels for all the samples. Furthermore, to improve the discriminative ability of the learned sparse codes, we assume that the class labels could be predicted from the sparse codes directly using a linear classifier. By solving the codebook, sparse codes, class labels and classifier parameters simultaneously in a unified objective function, we develop a semi-supervised sparse coding algorithm. Experiments on two real-world pattern recognition problems demonstrate the advantage of the proposed methods over supervised sparse coding methods on partially labeled data sets.

  14. Semi-supervised sparse coding

    KAUST Repository

    Wang, Jim Jing-Yan

    2014-07-06

    Sparse coding approximates the data sample as a sparse linear combination of some basic codewords and uses the sparse codes as new presentations. In this paper, we investigate learning discriminative sparse codes by sparse coding in a semi-supervised manner, where only a few training samples are labeled. By using the manifold structure spanned by the data set of both labeled and unlabeled samples and the constraints provided by the labels of the labeled samples, we learn the variable class labels for all the samples. Furthermore, to improve the discriminative ability of the learned sparse codes, we assume that the class labels could be predicted from the sparse codes directly using a linear classifier. By solving the codebook, sparse codes, class labels and classifier parameters simultaneously in a unified objective function, we develop a semi-supervised sparse coding algorithm. Experiments on two real-world pattern recognition problems demonstrate the advantage of the proposed methods over supervised sparse coding methods on partially labeled data sets.

  15. Development of an algorithm for heartbeats detection and classification in Holter records based on temporal and morphological features

    International Nuclear Information System (INIS)

    García, A; Romano, H; Laciar, E; Correa, R

    2011-01-01

    In this work a detection and classification algorithm for heartbeats analysis in Holter records was developed. First, a QRS complexes detector was implemented and their temporal and morphological characteristics were extracted. A vector was built with these features; this vector is the input of the classification module, based on discriminant analysis. The beats were classified in three groups: Premature Ventricular Contraction beat (PVC), Atrial Premature Contraction beat (APC) and Normal Beat (NB). These beat categories represent the most important groups of commercial Holter systems. The developed algorithms were evaluated in 76 ECG records of two validated open-access databases 'arrhythmias MIT BIH database' and M IT BIH supraventricular arrhythmias database . A total of 166343 beats were detected and analyzed, where the QRS detection algorithm provides a sensitivity of 99.69 % and a positive predictive value of 99.84 %. The classification stage gives sensitivities of 97.17% for NB, 97.67% for PCV and 92.78% for APC.

  16. Preliminary hard and soft bottom seafloor substrate map derived from an supervised classification of bathymetry derived from multispectral World View-2 satellite imagery of Ni'ihau Island, Territory of Main Hawaiian Islands, USA

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — Preliminary hard and soft seafloor substrate map derived from a supervised classification from multispectral World View-2 satellite imagery of Ni'ihau Island,...

  17. Feature-space transformation improves supervised segmentation across scanners

    DEFF Research Database (Denmark)

    van Opbroek, Annegreet; Achterberg, Hakim C.; de Bruijne, Marleen

    2015-01-01

    Image-segmentation techniques based on supervised classification generally perform well on the condition that training and test samples have the same feature distribution. However, if training and test images are acquired with different scanners or scanning parameters, their feature distributions...

  18. Classification and data acquisition with incomplete data

    Science.gov (United States)

    Williams, David P.

    closely related problem of active data acquisition, which develops a strategy to acquire missing features and labels that will most benefit the classification task. We first address the general problem of classification with incomplete data, maintaining the view that all data (i.e., information) is valuable. We employ a logistic regression framework within which we formulate a supervised classification algorithm for incomplete data. This principled, yet flexible, framework permits several interesting extensions that allow all available data to be utilized. One extension incorporates labeling error, which permits the usage of potentially imperfectly labeled data in learning a classifier. A second major extension converts the proposed algorithm to a semi-supervised approach by utilizing unlabeled data via graph-based regularization. Finally, the classification algorithm is extended to the case in which (image) data---from which features are extracted---are available from multiple resolutions. Taken together, this family of incomplete-data classification algorithms exploits all available data in a principled manner by avoiding explicit imputation. Instead, missing data is integrated out analytically with the aid of an estimated conditional density function (conditioned on the observed features). This feat is accomplished by invoking only mild assumptions. We also address the problem of active data acquisition by determining which missing data should be acquired to most improve performance. Specifically, we examine this data acquisition task when the data to be acquired can be either labels or features. The proposed approach is based on a criterion that accounts for the expected benefit of the acquisition. This approach, which is applicable for any general missing data problem, exploits the incomplete-data classification framework introduced in the first part of this dissertation. This data acquisition approach allows for the acquisition of both labels and features. Moreover

  19. Discriminative Nonlinear Analysis Operator Learning: When Cosparse Model Meets Image Classification.

    Science.gov (United States)

    Wen, Zaidao; Hou, Biao; Jiao, Licheng

    2017-05-03

    Linear synthesis model based dictionary learning framework has achieved remarkable performances in image classification in the last decade. Behaved as a generative feature model, it however suffers from some intrinsic deficiencies. In this paper, we propose a novel parametric nonlinear analysis cosparse model (NACM) with which a unique feature vector will be much more efficiently extracted. Additionally, we derive a deep insight to demonstrate that NACM is capable of simultaneously learning the task adapted feature transformation and regularization to encode our preferences, domain prior knowledge and task oriented supervised information into the features. The proposed NACM is devoted to the classification task as a discriminative feature model and yield a novel discriminative nonlinear analysis operator learning framework (DNAOL). The theoretical analysis and experimental performances clearly demonstrate that DNAOL will not only achieve the better or at least competitive classification accuracies than the state-of-the-art algorithms but it can also dramatically reduce the time complexities in both training and testing phases.

  20. Spectral Learning for Supervised Topic Models.

    Science.gov (United States)

    Ren, Yong; Wang, Yining; Zhu, Jun

    2018-03-01

    Supervised topic models simultaneously model the latent topic structure of large collections of documents and a response variable associated with each document. Existing inference methods are based on variational approximation or Monte Carlo sampling, which often suffers from the local minimum defect. Spectral methods have been applied to learn unsupervised topic models, such as latent Dirichlet allocation (LDA), with provable guarantees. This paper investigates the possibility of applying spectral methods to recover the parameters of supervised LDA (sLDA). We first present a two-stage spectral method, which recovers the parameters of LDA followed by a power update method to recover the regression model parameters. Then, we further present a single-phase spectral algorithm to jointly recover the topic distribution matrix as well as the regression weights. Our spectral algorithms are provably correct and computationally efficient. We prove a sample complexity bound for each algorithm and subsequently derive a sufficient condition for the identifiability of sLDA. Thorough experiments on synthetic and real-world datasets verify the theory and demonstrate the practical effectiveness of the spectral algorithms. In fact, our results on a large-scale review rating dataset demonstrate that our single-phase spectral algorithm alone gets comparable or even better performance than state-of-the-art methods, while previous work on spectral methods has rarely reported such promising performance.

  1. Improving the potential of pixel-based supervised classification in ...

    African Journals Online (AJOL)

    The goal of this paper was to describe the impact of various parameters when applying a supervised Maximum Likelihood Classifier (MLC) to SPOT 5 image analysis in a remote savanna biome. Pair separation indicators and probability thresholds were used to analyse the effect of training area size and heterogeneity as ...

  2. A new avenue for classification and prediction of olive cultivars using supervised and unsupervised algorithms.

    Directory of Open Access Journals (Sweden)

    Amir H Beiki

    Full Text Available Various methods have been used to identify cultivares of olive trees; herein we used different bioinformatics algorithms to propose new tools to classify 10 cultivares of olive based on RAPD and ISSR genetic markers datasets generated from PCR reactions. Five RAPD markers (OPA0a21, OPD16a, OP01a1, OPD16a1 and OPA0a8 and five ISSR markers (UBC841a4, UBC868a7, UBC841a14, U12BC807a and UBC810a13 selected as the most important markers by all attribute weighting models. K-Medoids unsupervised clustering run on SVM dataset was fully able to cluster each olive cultivar to the right classes. All trees (176 induced by decision tree models generated meaningful trees and UBC841a4 attribute clearly distinguished between foreign and domestic olive cultivars with 100% accuracy. Predictive machine learning algorithms (SVM and Naïve Bayes were also able to predict the right class of olive cultivares with 100% accuracy. For the first time, our results showed data mining techniques can be effectively used to distinguish between plant cultivares and proposed machine learning based systems in this study can predict new olive cultivars with the best possible accuracy.

  3. Multispectral imaging burn wound tissue classification system: a comparison of test accuracies between several common machine learning algorithms

    Science.gov (United States)

    Squiers, John J.; Li, Weizhi; King, Darlene R.; Mo, Weirong; Zhang, Xu; Lu, Yang; Sellke, Eric W.; Fan, Wensheng; DiMaio, J. Michael; Thatcher, Jeffrey E.

    2016-03-01

    The clinical judgment of expert burn surgeons is currently the standard on which diagnostic and therapeutic decisionmaking regarding burn injuries is based. Multispectral imaging (MSI) has the potential to increase the accuracy of burn depth assessment and the intraoperative identification of viable wound bed during surgical debridement of burn injuries. A highly accurate classification model must be developed using machine-learning techniques in order to translate MSI data into clinically-relevant information. An animal burn model was developed to build an MSI training database and to study the burn tissue classification ability of several models trained via common machine-learning algorithms. The algorithms tested, from least to most complex, were: K-nearest neighbors (KNN), decision tree (DT), linear discriminant analysis (LDA), weighted linear discriminant analysis (W-LDA), quadratic discriminant analysis (QDA), ensemble linear discriminant analysis (EN-LDA), ensemble K-nearest neighbors (EN-KNN), and ensemble decision tree (EN-DT). After the ground-truth database of six tissue types (healthy skin, wound bed, blood, hyperemia, partial injury, full injury) was generated by histopathological analysis, we used 10-fold cross validation to compare the algorithms' performances based on their accuracies in classifying data against the ground truth, and each algorithm was tested 100 times. The mean test accuracy of the algorithms were KNN 68.3%, DT 61.5%, LDA 70.5%, W-LDA 68.1%, QDA 68.9%, EN-LDA 56.8%, EN-KNN 49.7%, and EN-DT 36.5%. LDA had the highest test accuracy, reflecting the bias-variance tradeoff over the range of complexities inherent to the algorithms tested. Several algorithms were able to match the current standard in burn tissue classification, the clinical judgment of expert burn surgeons. These results will guide further development of an MSI burn tissue classification system. Given that there are few surgeons and facilities specializing in burn care

  4. Classification and learning using genetic algorithms applications in Bioinformatics and Web Intelligence

    CERN Document Server

    Bandyopadhyay, Sanghamitra

    2007-01-01

    This book provides a unified framework that describes how genetic learning can be used to design pattern recognition and learning systems. It examines how a search technique, the genetic algorithm, can be used for pattern classification mainly through approximating decision boundaries. Coverage also demonstrates the effectiveness of the genetic classifiers vis-à-vis several widely used classifiers, including neural networks.

  5. Photometric classification of type Ia supernovae in the SuperNova Legacy Survey with supervised learning

    Energy Technology Data Exchange (ETDEWEB)

    Möller, A. [Research School of Astronomy and Astrophysics, Australian National University, Canberra, ACT 2611 (Australia); Ruhlmann-Kleider, V.; Leloup, C.; Neveu, J.; Palanque-Delabrouille, N.; Rich, J. [Irfu, SPP, CEA Saclay, F-91191 Gif sur Yvette Cedex (France); Carlberg, R. [Department of Astronomy and Astrophysics, University of Toronto, 50 St. George Street, Toronto, ON M5S 3H8 (Canada); Lidman, C. [Australian Astronomical Observatory, North Ryde, NSW 2113 (Australia); Pritchet, C., E-mail: anais.moller@anu.edu.au, E-mail: vanina.ruhlmann-kleider@cea.fr, E-mail: clement.leloup@cea.fr, E-mail: jneveu@lal.in2p3.fr, E-mail: nathalie.palanque-delabrouille@cea.fr, E-mail: james.rich@cea.fr, E-mail: raymond.carlberg@utoronto.ca, E-mail: chris.lidman@aao.gov.au, E-mail: pritchet@uvic.ca [Department of Physics and Astronomy, University of Victoria, P.O. Box 3055, Victoria, BC V8W 3P6 (Canada)

    2016-12-01

    In the era of large astronomical surveys, photometric classification of supernovae (SNe) has become an important research field due to limited spectroscopic resources for candidate follow-up and classification. In this work, we present a method to photometrically classify type Ia supernovae based on machine learning with redshifts that are derived from the SN light-curves. This method is implemented on real data from the SNLS deferred pipeline, a purely photometric pipeline that identifies SNe Ia at high-redshifts (0.2 < z < 1.1). Our method consists of two stages: feature extraction (obtaining the SN redshift from photometry and estimating light-curve shape parameters) and machine learning classification. We study the performance of different algorithms such as Random Forest and Boosted Decision Trees. We evaluate the performance using SN simulations and real data from the first 3 years of the Supernova Legacy Survey (SNLS), which contains large spectroscopically and photometrically classified type Ia samples. Using the Area Under the Curve (AUC) metric, where perfect classification is given by 1, we find that our best-performing classifier (Extreme Gradient Boosting Decision Tree) has an AUC of 0.98.We show that it is possible to obtain a large photometrically selected type Ia SN sample with an estimated contamination of less than 5%. When applied to data from the first three years of SNLS, we obtain 529 events. We investigate the differences between classifying simulated SNe, and real SN survey data. In particular, we find that applying a thorough set of selection cuts to the SN sample is essential for good classification. This work demonstrates for the first time the feasibility of machine learning classification in a high- z SN survey with application to real SN data.

  6. Photometric classification of type Ia supernovae in the SuperNova Legacy Survey with supervised learning

    International Nuclear Information System (INIS)

    Möller, A.; Ruhlmann-Kleider, V.; Leloup, C.; Neveu, J.; Palanque-Delabrouille, N.; Rich, J.; Carlberg, R.; Lidman, C.; Pritchet, C.

    2016-01-01

    In the era of large astronomical surveys, photometric classification of supernovae (SNe) has become an important research field due to limited spectroscopic resources for candidate follow-up and classification. In this work, we present a method to photometrically classify type Ia supernovae based on machine learning with redshifts that are derived from the SN light-curves. This method is implemented on real data from the SNLS deferred pipeline, a purely photometric pipeline that identifies SNe Ia at high-redshifts (0.2 < z < 1.1). Our method consists of two stages: feature extraction (obtaining the SN redshift from photometry and estimating light-curve shape parameters) and machine learning classification. We study the performance of different algorithms such as Random Forest and Boosted Decision Trees. We evaluate the performance using SN simulations and real data from the first 3 years of the Supernova Legacy Survey (SNLS), which contains large spectroscopically and photometrically classified type Ia samples. Using the Area Under the Curve (AUC) metric, where perfect classification is given by 1, we find that our best-performing classifier (Extreme Gradient Boosting Decision Tree) has an AUC of 0.98.We show that it is possible to obtain a large photometrically selected type Ia SN sample with an estimated contamination of less than 5%. When applied to data from the first three years of SNLS, we obtain 529 events. We investigate the differences between classifying simulated SNe, and real SN survey data. In particular, we find that applying a thorough set of selection cuts to the SN sample is essential for good classification. This work demonstrates for the first time the feasibility of machine learning classification in a high- z SN survey with application to real SN data.

  7. Data classification using metaheuristic Cuckoo Search technique for Levenberg Marquardt back propagation (CSLM) algorithm

    Science.gov (United States)

    Nawi, Nazri Mohd.; Khan, Abdullah; Rehman, M. Z.

    2015-05-01

    A nature inspired behavior metaheuristic techniques which provide derivative-free solutions to solve complex problems. One of the latest additions to the group of nature inspired optimization procedure is Cuckoo Search (CS) algorithm. Artificial Neural Network (ANN) training is an optimization task since it is desired to find optimal weight set of a neural network in training process. Traditional training algorithms have some limitation such as getting trapped in local minima and slow convergence rate. This study proposed a new technique CSLM by combining the best features of two known algorithms back-propagation (BP) and Levenberg Marquardt algorithm (LM) for improving the convergence speed of ANN training and avoiding local minima problem by training this network. Some selected benchmark classification datasets are used for simulation. The experiment result show that the proposed cuckoo search with Levenberg Marquardt algorithm has better performance than other algorithm used in this study.

  8. Unraveling cognitive traits using the Morris water maze unbiased strategy classification (MUST-C) algorithm.

    Science.gov (United States)

    Illouz, Tomer; Madar, Ravit; Louzon, Yoram; Griffioen, Kathleen J; Okun, Eitan

    2016-02-01

    The assessment of spatial cognitive learning in rodents is a central approach in neuroscience, as it enables one to assess and quantify the effects of treatments and genetic manipulations from a broad perspective. Although the Morris water maze (MWM) is a well-validated paradigm for testing spatial learning abilities, manual categorization of performance in the MWM into behavioral strategies is subject to individual interpretation, and thus to biases. Here we offer a support vector machine (SVM) - based, automated, MWM unbiased strategy classification (MUST-C) algorithm, as well as a cognitive score scale. This model was examined and validated by analyzing data obtained from five MWM experiments with changing platform sizes, revealing a limitation in the spatial capacity of the hippocampus. We have further employed this algorithm to extract novel mechanistic insights on the impact of members of the Toll-like receptor pathway on cognitive spatial learning and memory. The MUST-C algorithm can greatly benefit MWM users as it provides a standardized method of strategy classification as well as a cognitive scoring scale, which cannot be derived from typical analysis of MWM data. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  9. A Hybrid Multiobjective Differential Evolution Algorithm and Its Application to the Optimization of Grinding and Classification

    Directory of Open Access Journals (Sweden)

    Yalin Wang

    2013-01-01

    Full Text Available The grinding-classification is the prerequisite process for full recovery of the nonrenewable minerals with both production quality and quantity objectives concerned. Its natural formulation is a constrained multiobjective optimization problem of complex expression since the process is composed of one grinding machine and two classification machines. In this paper, a hybrid differential evolution (DE algorithm with multi-population is proposed. Some infeasible solutions with better performance are allowed to be saved, and they participate randomly in the evolution. In order to exploit the meaningful infeasible solutions, a functionally partitioned multi-population mechanism is designed to find an optimal solution from all possible directions. Meanwhile, a simplex method for local search is inserted into the evolution process to enhance the searching strategy in the optimization process. Simulation results from the test of some benchmark problems indicate that the proposed algorithm tends to converge quickly and effectively to the Pareto frontier with better distribution. Finally, the proposed algorithm is applied to solve a multiobjective optimization model of a grinding and classification process. Based on the technique for order performance by similarity to ideal solution (TOPSIS, the satisfactory solution is obtained by using a decision-making method for multiple attributes.

  10. Subsampled Hessian Newton Methods for Supervised Learning.

    Science.gov (United States)

    Wang, Chien-Chih; Huang, Chun-Heng; Lin, Chih-Jen

    2015-08-01

    Newton methods can be applied in many supervised learning approaches. However, for large-scale data, the use of the whole Hessian matrix can be time-consuming. Recently, subsampled Newton methods have been proposed to reduce the computational time by using only a subset of data for calculating an approximation of the Hessian matrix. Unfortunately, we find that in some situations, the running speed is worse than the standard Newton method because cheaper but less accurate search directions are used. In this work, we propose some novel techniques to improve the existing subsampled Hessian Newton method. The main idea is to solve a two-dimensional subproblem per iteration to adjust the search direction to better minimize the second-order approximation of the function value. We prove the theoretical convergence of the proposed method. Experiments on logistic regression, linear SVM, maximum entropy, and deep networks indicate that our techniques significantly reduce the running time of the subsampled Hessian Newton method. The resulting algorithm becomes a compelling alternative to the standard Newton method for large-scale data classification.

  11. Prediction of lung cancer patient survival via supervised machine learning classification techniques.

    Science.gov (United States)

    Lynch, Chip M; Abdollahi, Behnaz; Fuqua, Joshua D; de Carlo, Alexandra R; Bartholomai, James A; Balgemann, Rayeanne N; van Berkel, Victor H; Frieboes, Hermann B

    2017-12-01

    Outcomes for cancer patients have been previously estimated by applying various machine learning techniques to large datasets such as the Surveillance, Epidemiology, and End Results (SEER) program database. In particular for lung cancer, it is not well understood which types of techniques would yield more predictive information, and which data attributes should be used in order to determine this information. In this study, a number of supervised learning techniques is applied to the SEER database to classify lung cancer patients in terms of survival, including linear regression, Decision Trees, Gradient Boosting Machines (GBM), Support Vector Machines (SVM), and a custom ensemble. Key data attributes in applying these methods include tumor grade, tumor size, gender, age, stage, and number of primaries, with the goal to enable comparison of predictive power between the various methods The prediction is treated like a continuous target, rather than a classification into categories, as a first step towards improving survival prediction. The results show that the predicted values agree with actual values for low to moderate survival times, which constitute the majority of the data. The best performing technique was the custom ensemble with a Root Mean Square Error (RMSE) value of 15.05. The most influential model within the custom ensemble was GBM, while Decision Trees may be inapplicable as it had too few discrete outputs. The results further show that among the five individual models generated, the most accurate was GBM with an RMSE value of 15.32. Although SVM underperformed with an RMSE value of 15.82, statistical analysis singles the SVM as the only model that generated a distinctive output. The results of the models are consistent with a classical Cox proportional hazards model used as a reference technique. We conclude that application of these supervised learning techniques to lung cancer data in the SEER database may be of use to estimate patient survival time

  12. Performance of fusion algorithms for computer-aided detection and classification of mines in very shallow water obtained from testing in navy Fleet Battle Exercise-Hotel 2000

    Science.gov (United States)

    Ciany, Charles M.; Zurawski, William; Kerfoot, Ian

    2001-10-01

    The performance of Computer Aided Detection/Computer Aided Classification (CAD/CAC) Fusion algorithms on side-scan sonar images was evaluated using data taken at the Navy's's Fleet Battle Exercise-Hotel held in Panama City, Florida, in August 2000. A 2-of-3 binary fusion algorithm is shown to provide robust performance. The algorithm accepts the classification decisions and associated contact locations form three different CAD/CAC algorithms, clusters the contacts based on Euclidian distance, and then declares a valid target when a clustered contact is declared by at least 2 of the 3 individual algorithms. This simple binary fusion provided a 96 percent probability of correct classification at a false alarm rate of 0.14 false alarms per image per side. The performance represented a 3.8:1 reduction in false alarms over the best performing single CAD/CAC algorithm, with no loss in probability of correct classification.

  13. 7 CFR 27.16 - Inspection; weighing; samples; supervision.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Inspection; weighing; samples; supervision. 27.16 Section 27.16 Agriculture Regulations of the Department of Agriculture AGRICULTURAL MARKETING SERVICE (Standards, Inspections, Marketing Practices), DEPARTMENT OF AGRICULTURE COMMODITY STANDARDS AND STANDARD CONTAINER REGULATIONS COTTON CLASSIFICATION...

  14. MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes

    Directory of Open Access Journals (Sweden)

    Yang Yi-Fan

    2007-03-01

    Full Text Available Abstract Background Despite a remarkable success in the computational prediction of genes in Bacteria and Archaea, a lack of comprehensive understanding of prokaryotic gene structures prevents from further elucidation of differences among genomes. It continues to be interesting to develop new ab initio algorithms which not only accurately predict genes, but also facilitate comparative studies of prokaryotic genomes. Results This paper describes a new prokaryotic genefinding algorithm based on a comprehensive statistical model of protein coding Open Reading Frames (ORFs and Translation Initiation Sites (TISs. The former is based on a linguistic "Entropy Density Profile" (EDP model of coding DNA sequence and the latter comprises several relevant features related to the translation initiation. They are combined to form a so-called Multivariate Entropy Distance (MED algorithm, MED 2.0, that incorporates several strategies in the iterative program. The iterations enable us to develop a non-supervised learning process and to obtain a set of genome-specific parameters for the gene structure, before making the prediction of genes. Conclusion Results of extensive tests show that MED 2.0 achieves a competitive high performance in the gene prediction for both 5' and 3' end matches, compared to the current best prokaryotic gene finders. The advantage of the MED 2.0 is particularly evident for GC-rich genomes and archaeal genomes. Furthermore, the genome-specific parameters given by MED 2.0 match with the current understanding of prokaryotic genomes and may serve as tools for comparative genomic studies. In particular, MED 2.0 is shown to reveal divergent translation initiation mechanisms in archaeal genomes while making a more accurate prediction of TISs compared to the existing gene finders and the current GenBank annotation.

  15. Fuzzy Expert System based on a Novel Hybrid Stem Cell (HSC) Algorithm for Classification of Micro Array Data.

    Science.gov (United States)

    Vijay, S Arul Antran; GaneshKumar, P

    2018-02-21

    In the growing scenario, microarray data is extensively used since it provides a more comprehensive understanding of genetic variants among diseases. As the gene expression samples have high dimensionality it becomes tedious to analyze the samples manually. Hence an automated system is needed to analyze these samples. The fuzzy expert system offers a clear classification when compared to the machine learning and statistical methodologies. In fuzzy classification, knowledge acquisition would be a major concern. Despite several existing approaches for knowledge acquisition much effort is necessary to enhance the learning process. This paper proposes an innovative Hybrid Stem Cell (HSC) algorithm that utilizes Ant Colony optimization and Stem Cell algorithm for designing fuzzy classification system to extract the informative rules to form the membership functions from the microarray dataset. The HSC algorithm uses a novel Adaptive Stem Cell Optimization (ASCO) to improve the points of membership function and Ant Colony Optimization to produce the near optimum rule set. In order to extract the most informative genes from the large microarray dataset a method called Mutual Information is used. The performance results of the proposed technique evaluated using the five microarray datasets are simulated. These results prove that the proposed Hybrid Stem Cell (HSC) algorithm produces a precise fuzzy system than the existing methodologies.

  16. Geologic mapping using LANDSAT data

    Science.gov (United States)

    Siegal, B. S.; Abrams, M. J.

    1976-01-01

    The feasibility of automated classification for lithologic mapping with LANDSAT digital data was evaluated using three classification algorithms. The two supervised algorithms analyzed, a linear discriminant analysis algorithm and a hybrid algorithm which incorporated the Parallelepiped algorithm and the Bayesian maximum likelihood function, were comparable in terms of accuracy; however, classification was only 50 per cent accurate. The linear discriminant analysis algorithm was three times as efficient as the hybrid approach. The unsupervised classification technique, which incorporated the CLUS algorithm, delineated the major lithologic boundaries and, in general, correctly classified the most prominent geologic units. The unsupervised algorithm was not as efficient nor as accurate as the supervised algorithms. Analysis of spectral data for the lithologic units in the 0.4 to 2.5 microns region indicated that a greater separability of the spectral signatures could be obtained using wavelength bands outside the region sensed by LANDSAT.

  17. Applying Active Learning to Assertion Classification of Concepts in Clinical Text

    Science.gov (United States)

    Chen, Yukun; Mani, Subramani; Xu, Hua

    2012-01-01

    Supervised machine learning methods for clinical natural language processing (NLP) research require a large number of annotated samples, which are very expensive to build because of the involvement of physicians. Active learning, an approach that actively samples from a large pool, provides an alternative solution. Its major goal in classification is to reduce the annotation effort while maintaining the quality of the predictive model. However, few studies have investigated its uses in clinical NLP. This paper reports an application of active learning to a clinical text classification task: to determine the assertion status of clinical concepts. The annotated corpus for the assertion classification task in the 2010 i2b2/VA Clinical NLP Challenge was used in this study. We implemented several existing and newly developed active learning algorithms and assessed their uses. The outcome is reported in the global ALC score, based on the Area under the average Learning Curve of the AUC (Area Under the Curve) score. Results showed that when the same number of annotated samples was used, active learning strategies could generate better classification models (best ALC – 0.7715) than the passive learning method (random sampling) (ALC – 0.7411). Moreover, to achieve the same classification performance, active learning strategies required fewer samples than the random sampling method. For example, to achieve an AUC of 0.79, the random sampling method used 32 samples, while our best active learning algorithm required only 12 samples, a reduction of 62.5% in manual annotation effort. PMID:22127105

  18. SPAM CLASSIFICATION BASED ON SUPERVISED LEARNING USING MACHINE LEARNING TECHNIQUES

    Directory of Open Access Journals (Sweden)

    T. Hamsapriya

    2011-12-01

    Full Text Available E-mail is one of the most popular and frequently used ways of communication due to its worldwide accessibility, relatively fast message transfer, and low sending cost. The flaws in the e-mail protocols and the increasing amount of electronic business and financial transactions directly contribute to the increase in e-mail-based threats. Email spam is one of the major problems of the today’s Internet, bringing financial damage to companies and annoying individual users. Spam emails are invading users without their consent and filling their mail boxes. They consume more network capacity as well as time in checking and deleting spam mails. The vast majority of Internet users are outspoken in their disdain for spam, although enough of them respond to commercial offers that spam remains a viable source of income to spammers. While most of the users want to do right think to avoid and get rid of spam, they need clear and simple guidelines on how to behave. In spite of all the measures taken to eliminate spam, they are not yet eradicated. Also when the counter measures are over sensitive, even legitimate emails will be eliminated. Among the approaches developed to stop spam, filtering is the one of the most important technique. Many researches in spam filtering have been centered on the more sophisticated classifier-related issues. In recent days, Machine learning for spam classification is an important research issue. The effectiveness of the proposed work is explores and identifies the use of different learning algorithms for classifying spam messages from e-mail. A comparative analysis among the algorithms has also been presented.

  19. Information-theoretic semi-supervised metric learning via entropy regularization.

    Science.gov (United States)

    Niu, Gang; Dai, Bo; Yamada, Makoto; Sugiyama, Masashi

    2014-08-01

    We propose a general information-theoretic approach to semi-supervised metric learning called SERAPH (SEmi-supervised metRic leArning Paradigm with Hypersparsity) that does not rely on the manifold assumption. Given the probability parameterized by a Mahalanobis distance, we maximize its entropy on labeled data and minimize its entropy on unlabeled data following entropy regularization. For metric learning, entropy regularization improves manifold regularization by considering the dissimilarity information of unlabeled data in the unsupervised part, and hence it allows the supervised and unsupervised parts to be integrated in a natural and meaningful way. Moreover, we regularize SERAPH by trace-norm regularization to encourage low-dimensional projections associated with the distance metric. The nonconvex optimization problem of SERAPH could be solved efficiently and stably by either a gradient projection algorithm or an EM-like iterative algorithm whose M-step is convex. Experiments demonstrate that SERAPH compares favorably with many well-known metric learning methods, and the learned Mahalanobis distance possesses high discriminability even under noisy environments.

  20. Predicting sample size required for classification performance

    Directory of Open Access Journals (Sweden)

    Figueroa Rosa L

    2012-02-01

    Full Text Available Abstract Background Supervised learning methods need annotated data in order to generate efficient models. Annotated data, however, is a relatively scarce resource and can be expensive to obtain. For both passive and active learning methods, there is a need to estimate the size of the annotated sample required to reach a performance target. Methods We designed and implemented a method that fits an inverse power law model to points of a given learning curve created using a small annotated training set. Fitting is carried out using nonlinear weighted least squares optimization. The fitted model is then used to predict the classifier's performance and confidence interval for larger sample sizes. For evaluation, the nonlinear weighted curve fitting method was applied to a set of learning curves generated using clinical text and waveform classification tasks with active and passive sampling methods, and predictions were validated using standard goodness of fit measures. As control we used an un-weighted fitting method. Results A total of 568 models were fitted and the model predictions were compared with the observed performances. Depending on the data set and sampling method, it took between 80 to 560 annotated samples to achieve mean average and root mean squared error below 0.01. Results also show that our weighted fitting method outperformed the baseline un-weighted method (p Conclusions This paper describes a simple and effective sample size prediction algorithm that conducts weighted fitting of learning curves. The algorithm outperformed an un-weighted algorithm described in previous literature. It can help researchers determine annotation sample size for supervised machine learning.

  1. An evaluation of scanpath-comparison and machine-learning classification algorithms used to study the dynamics of analogy making.

    Science.gov (United States)

    French, Robert M; Glady, Yannick; Thibaut, Jean-Pierre

    2017-08-01

    In recent years, eyetracking has begun to be used to study the dynamics of analogy making. Numerous scanpath-comparison algorithms and machine-learning techniques are available that can be applied to the raw eyetracking data. We show how scanpath-comparison algorithms, combined with multidimensional scaling and a classification algorithm, can be used to resolve an outstanding question in analogy making-namely, whether or not children's and adults' strategies in solving analogy problems are different. (They are.) We show which of these scanpath-comparison algorithms is best suited to the kinds of analogy problems that have formed the basis of much analogy-making research over the years. Furthermore, we use machine-learning classification algorithms to examine the item-to-item saccade vectors making up these scanpaths. We show which of these algorithms best predicts, from very early on in a trial, on the basis of the frequency of various item-to-item saccades, whether a child or an adult is doing the problem. This type of analysis can also be used to predict, on the basis of the item-to-item saccade dynamics in the first third of a trial, whether or not a problem will be solved correctly.

  2. A Comparative Study of Classification and Regression Algorithms for Modelling Students' Academic Performance

    Science.gov (United States)

    Strecht, Pedro; Cruz, Luís; Soares, Carlos; Mendes-Moreira, João; Abreu, Rui

    2015-01-01

    Predicting the success or failure of a student in a course or program is a problem that has recently been addressed using data mining techniques. In this paper we evaluate some of the most popular classification and regression algorithms on this problem. We address two problems: prediction of approval/failure and prediction of grade. The former is…

  3. Columbia Classification Algorithm of Suicide Assessment (C-CASA): classification of suicidal events in the FDA's pediatric suicidal risk analysis of antidepressants.

    Science.gov (United States)

    Posner, Kelly; Oquendo, Maria A; Gould, Madelyn; Stanley, Barbara; Davies, Mark

    2007-07-01

    To evaluate the link between antidepressants and suicidal behavior and ideation (suicidality) in youth, adverse events from pediatric clinical trials were classified in order to identify suicidal events. The authors describe the Columbia Classification Algorithm for Suicide Assessment (C-CASA), a standardized suicidal rating system that provided data for the pediatric suicidal risk analysis of antidepressants conducted by the Food and Drug Administration (FDA). Adverse events (N=427) from 25 pediatric antidepressant clinical trials were systematically identified by pharmaceutical companies. Randomly assigned adverse events were evaluated by three of nine independent expert suicidologists using the Columbia classification algorithm. Reliability of the C-CASA ratings and agreement with pharmaceutical company classification were estimated. Twenty-six new, possibly suicidal events (behavior and ideation) that were not originally identified by pharmaceutical companies were identified in the C-CASA, and 12 events originally labeled as suicidal by pharmaceutical companies were eliminated, which resulted in a total of 38 discrepant ratings. For the specific label of "suicide attempt," a relatively low level of agreement was observed between the C-CASA and pharmaceutical company ratings, with the C-CASA reporting a 50% reduction in ratings. Thus, although the C-CASA resulted in the identification of more suicidal events overall, fewer events were classified as suicide attempts. Additionally, the C-CASA ratings were highly reliable (intraclass correlation coefficient [ICC]=0.89). Utilizing a methodical, anchored approach to categorizing suicidality provides an accurate and comprehensive identification of suicidal events. The FDA's audit of the C-CASA demonstrated excellent transportability of this approach. The Columbia algorithm was used to classify suicidal adverse events in the recent FDA adult antidepressant safety analyses and has also been mandated to be applied to all

  4. pySPACE-a signal processing and classification environment in Python.

    Science.gov (United States)

    Krell, Mario M; Straube, Sirko; Seeland, Anett; Wöhrle, Hendrik; Teiwes, Johannes; Metzen, Jan H; Kirchner, Elsa A; Kirchner, Frank

    2013-01-01

    In neuroscience large amounts of data are recorded to provide insights into cerebral information processing and function. The successful extraction of the relevant signals becomes more and more challenging due to increasing complexities in acquisition techniques and questions addressed. Here, automated signal processing and machine learning tools can help to process the data, e.g., to separate signal and noise. With the presented software pySPACE (http://pyspace.github.io/pyspace), signal processing algorithms can be compared and applied automatically on time series data, either with the aim of finding a suitable preprocessing, or of training supervised algorithms to classify the data. pySPACE originally has been built to process multi-sensor windowed time series data, like event-related potentials from the electroencephalogram (EEG). The software provides automated data handling, distributed processing, modular build-up of signal processing chains and tools for visualization and performance evaluation. Included in the software are various algorithms like temporal and spatial filters, feature generation and selection, classification algorithms, and evaluation schemes. Further, interfaces to other signal processing tools are provided and, since pySPACE is a modular framework, it can be extended with new algorithms according to individual needs. In the presented work, the structural hierarchies are described. It is illustrated how users and developers can interface the software and execute offline and online modes. Configuration of pySPACE is realized with the YAML format, so that programming skills are not mandatory for usage. The concept of pySPACE is to have one comprehensive tool that can be used to perform complete signal processing and classification tasks. It further allows to define own algorithms, or to integrate and use already existing libraries.

  5. Arabic Supervised Learning Method Using N-Gram

    Science.gov (United States)

    Sanan, Majed; Rammal, Mahmoud; Zreik, Khaldoun

    2008-01-01

    Purpose: Recently, classification of Arabic documents is a real problem for juridical centers. In this case, some of the Lebanese official journal documents are classified, and the center has to classify new documents based on these documents. This paper aims to study and explain the useful application of supervised learning method on Arabic texts…

  6. Automated classification of cell morphology by coherence-controlled holographic microscopy.

    Science.gov (United States)

    Strbkova, Lenka; Zicha, Daniel; Vesely, Pavel; Chmelik, Radim

    2017-08-01

    In the last few years, classification of cells by machine learning has become frequently used in biology. However, most of the approaches are based on morphometric (MO) features, which are not quantitative in terms of cell mass. This may result in poor classification accuracy. Here, we study the potential contribution of coherence-controlled holographic microscopy enabling quantitative phase imaging for the classification of cell morphologies. We compare our approach with the commonly used method based on MO features. We tested both classification approaches in an experiment with nutritionally deprived cancer tissue cells, while employing several supervised machine learning algorithms. Most of the classifiers provided higher performance when quantitative phase features were employed. Based on the results, it can be concluded that the quantitative phase features played an important role in improving the performance of the classification. The methodology could be valuable help in refining the monitoring of live cells in an automated fashion. We believe that coherence-controlled holographic microscopy, as a tool for quantitative phase imaging, offers all preconditions for the accurate automated analysis of live cell behavior while enabling noninvasive label-free imaging with sufficient contrast and high-spatiotemporal phase sensitivity. (2017) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE).

  7. Better physical activity classification using smartphone acceleration sensor.

    Science.gov (United States)

    Arif, Muhammad; Bilal, Mohsin; Kattan, Ahmed; Ahamed, S Iqbal

    2014-09-01

    Obesity is becoming one of the serious problems for the health of worldwide population. Social interactions on mobile phones and computers via internet through social e-networks are one of the major causes of lack of physical activities. For the health specialist, it is important to track the record of physical activities of the obese or overweight patients to supervise weight loss control. In this study, acceleration sensor present in the smartphone is used to monitor the physical activity of the user. Physical activities including Walking, Jogging, Sitting, Standing, Walking upstairs and Walking downstairs are classified. Time domain features are extracted from the acceleration data recorded by smartphone during different physical activities. Time and space complexity of the whole framework is done by optimal feature subset selection and pruning of instances. Classification results of six physical activities are reported in this paper. Using simple time domain features, 99 % classification accuracy is achieved. Furthermore, attributes subset selection is used to remove the redundant features and to minimize the time complexity of the algorithm. A subset of 30 features produced more than 98 % classification accuracy for the six physical activities.

  8. Seismic waveform classification using deep learning

    Science.gov (United States)

    Kong, Q.; Allen, R. M.

    2017-12-01

    MyShake is a global smartphone seismic network that harnesses the power of crowdsourcing. It has an Artificial Neural Network (ANN) algorithm running on the phone to distinguish earthquake motion from human activities recorded by the accelerometer on board. Once the ANN detects earthquake-like motion, it sends a 5-min chunk of acceleration data back to the server for further analysis. The time-series data collected contains both earthquake data and human activity data that the ANN confused. In this presentation, we will show the Convolutional Neural Network (CNN) we built under the umbrella of supervised learning to find out the earthquake waveform. The waveforms of the recorded motion could treat easily as images, and by taking the advantage of the power of CNN processing the images, we achieved very high successful rate to select the earthquake waveforms out. Since there are many non-earthquake waveforms than the earthquake waveforms, we also built an anomaly detection algorithm using the CNN. Both these two methods can be easily extended to other waveform classification problems.

  9. Classification of EEG-P300 Signals Extracted from Brain Activities in BCI Systems Using ν-SVM and BLDA Algorithms

    Directory of Open Access Journals (Sweden)

    Ali MOMENNEZHAD

    2014-06-01

    Full Text Available In this paper, a linear predictive coding (LPC model is used to improve classification accuracy, convergent speed to maximum accuracy, and maximum bitrates in brain computer interface (BCI system based on extracting EEG-P300 signals. First, EEG signal is filtered in order to eliminate high frequency noise. Then, the parameters of filtered EEG signal are extracted using LPC model. Finally, the samples are reconstructed by LPC coefficients and two classifiers, a Bayesian Linear discriminant analysis (BLDA, and b the υ-support vector machine (υ-SVM are applied in order to classify. The proposed algorithm performance is compared with fisher linear discriminant analysis (FLDA. Results show that the efficiency of our algorithm in improving classification accuracy and convergent speed to maximum accuracy are much better. As example at the proposed algorithms, respectively BLDA with LPC model and υ-SVM with LPC model with8 electrode configuration for subject S1 the total classification accuracy is improved as 9.4% and 1.7%. And also, subject 7 at BLDA and υ-SVM with LPC model algorithms (LPC+BLDA and LPC+ υ-SVM after block 11th converged to maximum accuracy but Fisher Linear Discriminant Analysis (FLDA algorithm did not converge to maximum accuracy (with the same configuration. So, it can be used as a promising tool in designing BCI systems.

  10. Active Metric Learning for Supervised Classification

    OpenAIRE

    Kumaran, Krishnan; Papageorgiou, Dimitri; Chang, Yutong; Li, Minhan; Takáč, Martin

    2018-01-01

    Clustering and classification critically rely on distance metrics that provide meaningful comparisons between data points. We present mixed-integer optimization approaches to find optimal distance metrics that generalize the Mahalanobis metric extensively studied in the literature. Additionally, we generalize and improve upon leading methods by removing reliance on pre-designated "target neighbors," "triplets," and "similarity pairs." Another salient feature of our method is its ability to en...

  11. Classification of multiple sclerosis lesions using adaptive dictionary learning.

    Science.gov (United States)

    Deshpande, Hrishikesh; Maurel, Pierre; Barillot, Christian

    2015-12-01

    This paper presents a sparse representation and an adaptive dictionary learning based method for automated classification of multiple sclerosis (MS) lesions in magnetic resonance (MR) images. Manual delineation of MS lesions is a time-consuming task, requiring neuroradiology experts to analyze huge volume of MR data. This, in addition to the high intra- and inter-observer variability necessitates the requirement of automated MS lesion classification methods. Among many image representation models and classification methods that can be used for such purpose, we investigate the use of sparse modeling. In the recent years, sparse representation has evolved as a tool in modeling data using a few basis elements of an over-complete dictionary and has found applications in many image processing tasks including classification. We propose a supervised classification approach by learning dictionaries specific to the lesions and individual healthy brain tissues, which include white matter (WM), gray matter (GM) and cerebrospinal fluid (CSF). The size of the dictionaries learned for each class plays a major role in data representation but it is an even more crucial element in the case of competitive classification. Our approach adapts the size of the dictionary for each class, depending on the complexity of the underlying data. The algorithm is validated using 52 multi-sequence MR images acquired from 13 MS patients. The results demonstrate the effectiveness of our approach in MS lesion classification. Copyright © 2015 Elsevier Ltd. All rights reserved.

  12. Ixtetreco: recent extensions for the supervision of a power network; Ixtetreco: extensions recentes pour la supervision d`un reseau electrique

    Energy Technology Data Exchange (ETDEWEB)

    Despouys, O.

    1998-04-01

    A chronicle model describes a class of evolutions or possible behaviours for a given dynamical system. Ixtetreco uses a reified logics and time constraints to describe these chronicles and algorithms which allow to recognize all instances of these chronicles in a given stream of observations in entry. Ixtetreco is used in several applications for the supervision of complex processes. This paper presents the recent extensions of Ixtetreco which allow to use it in the supervision of power networks. (J.S.) 14 refs.

  13. Predicting disease risk using bootstrap ranking and classification algorithms.

    Directory of Open Access Journals (Sweden)

    Ohad Manor

    Full Text Available Genome-wide association studies (GWAS are widely used to search for genetic loci that underlie human disease. Another goal is to predict disease risk for different individuals given their genetic sequence. Such predictions could either be used as a "black box" in order to promote changes in life-style and screening for early diagnosis, or as a model that can be studied to better understand the mechanism of the disease. Current methods for risk prediction typically rank single nucleotide polymorphisms (SNPs by the p-value of their association with the disease, and use the top-associated SNPs as input to a classification algorithm. However, the predictive power of such methods is relatively poor. To improve the predictive power, we devised BootRank, which uses bootstrapping in order to obtain a robust prioritization of SNPs for use in predictive models. We show that BootRank improves the ability to predict disease risk of unseen individuals in the Wellcome Trust Case Control Consortium (WTCCC data and results in a more robust set of SNPs and a larger number of enriched pathways being associated with the different diseases. Finally, we show that combining BootRank with seven different classification algorithms improves performance compared to previous studies that used the WTCCC data. Notably, diseases for which BootRank results in the largest improvements were recently shown to have more heritability than previously thought, likely due to contributions from variants with low minimum allele frequency (MAF, suggesting that BootRank can be beneficial in cases where SNPs affecting the disease are poorly tagged or have low MAF. Overall, our results show that improving disease risk prediction from genotypic information may be a tangible goal, with potential implications for personalized disease screening and treatment.

  14. Statistical classification techniques in high energy physics (SDDT algorithm)

    International Nuclear Information System (INIS)

    Bouř, Petr; Kůs, Václav; Franc, Jiří

    2016-01-01

    We present our proposal of the supervised binary divergence decision tree with nested separation method based on the generalized linear models. A key insight we provide is the clustering driven only by a few selected physical variables. The proper selection consists of the variables achieving the maximal divergence measure between two different classes. Further, we apply our method to Monte Carlo simulations of physics processes corresponding to a data sample of top quark-antiquark pair candidate events in the lepton+jets decay channel. The data sample is produced in pp̅ collisions at √S = 1.96 TeV. It corresponds to an integrated luminosity of 9.7 fb"-"1 recorded with the D0 detector during Run II of the Fermilab Tevatron Collider. The efficiency of our algorithm achieves 90% AUC in separating signal from background. We also briefly deal with the modification of statistical tests applicable to weighted data sets in order to test homogeneity of the Monte Carlo simulations and measured data. The justification of these modified tests is proposed through the divergence tests. (paper)

  15. Supervised Cross-Modal Factor Analysis for Multiple Modal Data Classification

    KAUST Repository

    Wang, Jingbin; Zhou, Yihua; Duan, Kanghong; Wang, Jim Jing-Yan; Bensmail, Halima

    2015-01-01

    . In this paper, we improve CFA by incorporating the supervision information to represent and classify both image and text modals of documents. We project both image and text data to a shared data space by factor analysis, and then train a class label predictor

  16. Improving the Interpretability of Classification Rules Discovered by an Ant Colony Algorithm: Extended Results

    OpenAIRE

    Otero, Fernando E.B.; Freitas, Alex A.

    2016-01-01

    The vast majority of Ant Colony Optimization (ACO) algorithms for inducing classification rules use an ACO-based procedure to create a rule in an one-at-a-time fashion. An improved search strategy has been proposed in the cAnt-MinerPB algorithm, where an ACO-based procedure is used to create a complete list of rules (ordered rules)-i.e., the ACO search is guided by the quality of a list of rules, instead of an individual rule. In this paper we propose an extension of the cAnt-MinerPB algorith...

  17. Algorithm for Optimizing Bipolar Interconnection Weights with Applications in Associative Memories and Multitarget Classification

    Science.gov (United States)

    Chang, Shengjiang; Wong, Kwok-Wo; Zhang, Wenwei; Zhang, Yanxin

    1999-08-01

    An algorithm for optimizing a bipolar interconnection weight matrix with the Hopfield network is proposed. The effectiveness of this algorithm is demonstrated by computer simulation and optical implementation. In the optical implementation of the neural network the interconnection weights are biased to yield a nonnegative weight matrix. Moreover, a threshold subchannel is added so that the system can realize, in real time, the bipolar weighted summation in a single channel. Preliminary experimental results obtained from the applications in associative memories and multitarget classification with rotation invariance are shown.

  18. Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification

    Directory of Open Access Journals (Sweden)

    Yin Wang

    2016-01-01

    Full Text Available Background. Text data of 16S rRNA are informative for classifications of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty. Results. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. Conclusions. We extend the phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states for pneumonia and dental caries. The results have shown that PMF may enhance the efficiency and reliability in analyzing high-dimension text data.

  19. Towards automatic lithological classification from remote sensing data using support vector machines

    Science.gov (United States)

    Yu, Le; Porwal, Alok; Holden, Eun-Jung; Dentith, Michael

    2010-05-01

    Remote sensing data can be effectively used as a mean to build geological knowledge for poorly mapped terrains. Spectral remote sensing data from space- and air-borne sensors have been widely used to geological mapping, especially in areas of high outcrop density in arid regions. However, spectral remote sensing information by itself cannot be efficiently used for a comprehensive lithological classification of an area due to (1) diagnostic spectral response of a rock within an image pixel is conditioned by several factors including the atmospheric effects, spectral and spatial resolution of the image, sub-pixel level heterogeneity in chemical and mineralogical composition of the rock, presence of soil and vegetation cover; (2) only surface information and is therefore highly sensitive to the noise due to weathering, soil cover, and vegetation. Consequently, for efficient lithological classification, spectral remote sensing data needs to be supplemented with other remote sensing datasets that provide geomorphological and subsurface geological information, such as digital topographic model (DEM) and aeromagnetic data. Each of the datasets contain significant information about geology that, in conjunction, can potentially be used for automated lithological classification using supervised machine learning algorithms. In this study, support vector machine (SVM), which is a kernel-based supervised learning method, was applied to automated lithological classification of a study area in northwestern India using remote sensing data, namely, ASTER, DEM and aeromagnetic data. Several digital image processing techniques were used to produce derivative datasets that contained enhanced information relevant to lithological discrimination. A series of SVMs (trained using k-folder cross-validation with grid search) were tested using various combinations of input datasets selected from among 50 datasets including the original 14 ASTER bands and 36 derivative datasets (including 14

  20. spa: Semi-Supervised Semi-Parametric Graph-Based Estimation in R

    Directory of Open Access Journals (Sweden)

    Mark Culp

    2011-04-01

    Full Text Available In this paper, we present an R package that combines feature-based (X data and graph-based (G data for prediction of the response Y . In this particular case, Y is observed for a subset of the observations (labeled and missing for the remainder (unlabeled. We examine an approach for fitting Y = Xβ + f(G where β is a coefficient vector and f is a function over the vertices of the graph. The procedure is semi-supervised in nature (trained on the labeled and unlabeled sets, requiring iterative algorithms for fitting this estimate. The package provides several key functions for fitting and evaluating an estimator of this type. The package is illustrated on a text analysis data set, where the observations are text documents (papers, the response is the category of paper (either applied or theoretical statistics, the X information is the name of the journal in which the paper resides, and the graph is a co-citation network, with each vertex an observation and each edge the number of times that the two papers cite a common paper. An application involving classification of protein location using a protein interaction graph and an application involving classification on a manifold with part of the feature data converted to a graph are also presented.

  1. Semi-Supervised Transductive Hot Spot Predictor Working on Multiple Assumptions

    KAUST Repository

    Wang, Jim Jing-Yan

    2014-05-23

    Protein-protein interactions are critically dependent on just a few residues (“hot spots”) at the interfaces. Hot spots make a dominant contribution to the binding free energy and if mutated they can disrupt the interaction. As mutagenesis studies require significant experimental efforts, there exists a need for accurate and reliable computational hot spot prediction methods. Compared to the supervised hot spot prediction algorithms, the semi-supervised prediction methods can take into consideration both the labeled and unlabeled residues in the dataset during the prediction procedure. The transductive support vector machine has been utilized for this task and demonstrated a better prediction performance. To the best of our knowledge, however, none of the transductive semi-supervised algorithms takes all the three semisupervised assumptions, i.e., smoothness, cluster and manifold assumptions, together into account during learning. In this paper, we propose a novel semi-supervised method for hot spot residue prediction, by considering all the three semisupervised assumptions using nonlinear models. Our algorithm, IterPropMCS, works in an iterative manner. In each iteration, the algorithm first propagates the labels of the labeled residues to the unlabeled ones, along the shortest path between them on a graph, assuming that they lie on a nonlinear manifold. Then it selects the most confident residues as the labeled ones for the next iteration, according to the cluster and smoothness criteria, which is implemented by a nonlinear density estimator. Experiments on a benchmark dataset, using protein structure-based features, demonstrate that our approach is effective in predicting hot spots and compares favorably to other available methods. The results also show that our method outperforms the state-of-the-art transductive learning methods.

  2. Efficient Fingercode Classification

    Science.gov (United States)

    Sun, Hong-Wei; Law, Kwok-Yan; Gollmann, Dieter; Chung, Siu-Leung; Li, Jian-Bin; Sun, Jia-Guang

    In this paper, we present an efficient fingerprint classification algorithm which is an essential component in many critical security application systems e. g. systems in the e-government and e-finance domains. Fingerprint identification is one of the most important security requirements in homeland security systems such as personnel screening and anti-money laundering. The problem of fingerprint identification involves searching (matching) the fingerprint of a person against each of the fingerprints of all registered persons. To enhance performance and reliability, a common approach is to reduce the search space by firstly classifying the fingerprints and then performing the search in the respective class. Jain et al. proposed a fingerprint classification algorithm based on a two-stage classifier, which uses a K-nearest neighbor classifier in its first stage. The fingerprint classification algorithm is based on the fingercode representation which is an encoding of fingerprints that has been demonstrated to be an effective fingerprint biometric scheme because of its ability to capture both local and global details in a fingerprint image. We enhance this approach by improving the efficiency of the K-nearest neighbor classifier for fingercode-based fingerprint classification. Our research firstly investigates the various fast search algorithms in vector quantization (VQ) and the potential application in fingerprint classification, and then proposes two efficient algorithms based on the pyramid-based search algorithms in VQ. Experimental results on DB1 of FVC 2004 demonstrate that our algorithms can outperform the full search algorithm and the original pyramid-based search algorithms in terms of computational efficiency without sacrificing accuracy.

  3. A Fast Algorithm of Convex Hull Vertices Selection for Online Classification.

    Science.gov (United States)

    Ding, Shuguang; Nie, Xiangli; Qiao, Hong; Zhang, Bo

    2018-04-01

    Reducing samples through convex hull vertices selection (CHVS) within each class is an important and effective method for online classification problems, since the classifier can be trained rapidly with the selected samples. However, the process of CHVS is NP-hard. In this paper, we propose a fast algorithm to select the convex hull vertices, based on the convex hull decomposition and the property of projection. In the proposed algorithm, the quadratic minimization problem of computing the distance between a point and a convex hull is converted into a linear equation problem with a low computational complexity. When the data dimension is high, an approximate, instead of exact, convex hull is allowed to be selected by setting an appropriate termination condition in order to delete more nonimportant samples. In addition, the impact of outliers is also considered, and the proposed algorithm is improved by deleting the outliers in the initial procedure. Furthermore, a dimension convention technique via the kernel trick is used to deal with nonlinearly separable problems. An upper bound is theoretically proved for the difference between the support vector machines based on the approximate convex hull vertices selected and all the training samples. Experimental results on both synthetic and real data sets show the effectiveness and validity of the proposed algorithm.

  4. Effective Sequential Classifier Training for SVM-Based Multitemporal Remote Sensing Image Classification

    Science.gov (United States)

    Guo, Yiqing; Jia, Xiuping; Paull, David

    2018-06-01

    The explosive availability of remote sensing images has challenged supervised classification algorithms such as Support Vector Machines (SVM), as training samples tend to be highly limited due to the expensive and laborious task of ground truthing. The temporal correlation and spectral similarity between multitemporal images have opened up an opportunity to alleviate this problem. In this study, a SVM-based Sequential Classifier Training (SCT-SVM) approach is proposed for multitemporal remote sensing image classification. The approach leverages the classifiers of previous images to reduce the required number of training samples for the classifier training of an incoming image. For each incoming image, a rough classifier is firstly predicted based on the temporal trend of a set of previous classifiers. The predicted classifier is then fine-tuned into a more accurate position with current training samples. This approach can be applied progressively to sequential image data, with only a small number of training samples being required from each image. Experiments were conducted with Sentinel-2A multitemporal data over an agricultural area in Australia. Results showed that the proposed SCT-SVM achieved better classification accuracies compared with two state-of-the-art model transfer algorithms. When training data are insufficient, the overall classification accuracy of the incoming image was improved from 76.18% to 94.02% with the proposed SCT-SVM, compared with those obtained without the assistance from previous images. These results demonstrate that the leverage of a priori information from previous images can provide advantageous assistance for later images in multitemporal image classification.

  5. Rice-planted area extraction by time series analysis of ENVISAT ASAR WS data using a phenology-based classification approach: A case study for Red River Delta, Vietnam

    Science.gov (United States)

    Nguyen, D.; Wagner, W.; Naeimi, V.; Cao, S.

    2015-04-01

    Recent studies have shown the potential of Synthetic Aperture Radars (SAR) for mapping of rice fields and some other vegetation types. For rice field classification, conventional classification techniques have been mostly used including manual threshold-based and supervised classification approaches. The challenge of the threshold-based approach is to find acceptable thresholds to be used for each individual SAR scene. Furthermore, the influence of local incidence angle on backscatter hinders using a single threshold for the entire scene. Similarly, the supervised classification approach requires different training samples for different output classes. In case of rice crop, supervised classification using temporal data requires different training datasets to perform classification procedure which might lead to inconsistent mapping results. In this study we present an automatic method to identify rice crop areas by extracting phonological parameters after performing an empirical regression-based normalization of the backscatter to a reference incidence angle. The method is evaluated in the Red River Delta (RRD), Vietnam using the time series of ENVISAT Advanced SAR (ASAR) Wide Swath (WS) mode data. The results of rice mapping algorithm compared to the reference data indicate the Completeness (User accuracy), Correctness (Producer accuracy) and Quality (Overall accuracies) of 88.8%, 92.5 % and 83.9 % respectively. The total area of the classified rice fields corresponds to the total rice cultivation areas given by the official statistics in Vietnam (R2  0.96). The results indicates that applying a phenology-based classification approach using backscatter time series in optimal incidence angle normalization can achieve high classification accuracies. In addition, the method is not only useful for large scale early mapping of rice fields in the Red River Delta using the current and future C-band Sentinal-1A&B backscatter data but also might be applied for other rice

  6. A COMPARISON OF HAZE REMOVAL ALGORITHMS AND THEIR IMPACTS ON CLASSIFICATION ACCURACY FOR LANDSAT IMAGERY

    Directory of Open Access Journals (Sweden)

    Yang Xiao

    Full Text Available The quality of Landsat images in humid areas is considerably degraded by haze in terms of their spectral response pattern, which limits the possibility of their application in using visible and near-infrared bands. A variety of haze removal algorithms have been proposed to correct these unsatisfactory illumination effects caused by the haze contamination. The purpose of this study was to illustrate the difference of two major algorithms (the improved homomorphic filtering (HF and the virtual cloud point (VCP for their effectiveness in solving spatially varying haze contamination, and to evaluate the impacts of haze removal on land cover classification. A case study with exploiting large quantities of Landsat TM images and climates (clear and haze in the most humid areas in China proved that these haze removal algorithms both perform well in processing Landsat images contaminated by haze. The outcome of the application of VCP appears to be more similar to the reference images compared to HF. Moreover, the Landsat image with VCP haze removal can improve the classification accuracy effectively in comparison to that without haze removal, especially in the cloudy contaminated area

  7. Impact of data transformation and preprocessing in supervised ...

    African Journals Online (AJOL)

    Impact of data transformation and preprocessing in supervised learning ... Nowadays, the ideas of integrating machine learning techniques in power system has ... The proposed algorithm used Python-based split train and k-fold model ...

  8. An Improved Cloud Classification Algorithm for China's FY-2C Multi-Channel Images Using Artificial Neural Network.

    Science.gov (United States)

    Liu, Yu; Xia, Jun; Shi, Chun-Xiang; Hong, Yang

    2009-01-01

    The crowning objective of this research was to identify a better cloud classification method to upgrade the current window-based clustering algorithm used operationally for China's first operational geostationary meteorological satellite FengYun-2C (FY-2C) data. First, the capabilities of six widely-used Artificial Neural Network (ANN) methods are analyzed, together with the comparison of two other methods: Principal Component Analysis (PCA) and a Support Vector Machine (SVM), using 2864 cloud samples manually collected by meteorologists in June, July, and August in 2007 from three FY-2C channel (IR1, 10.3-11.3 μm; IR2, 11.5-12.5 μm and WV 6.3-7.6 μm) imagery. The result shows that: (1) ANN approaches, in general, outperformed the PCA and the SVM given sufficient training samples and (2) among the six ANN networks, higher cloud classification accuracy was obtained with the Self-Organizing Map (SOM) and Probabilistic Neural Network (PNN). Second, to compare the ANN methods to the present FY-2C operational algorithm, this study implemented SOM, one of the best ANN network identified from this study, as an automated cloud classification system for the FY-2C multi-channel data. It shows that SOM method has improved the results greatly not only in pixel-level accuracy but also in cloud patch-level classification by more accurately identifying cloud types such as cumulonimbus, cirrus and clouds in high latitude. Findings of this study suggest that the ANN-based classifiers, in particular the SOM, can be potentially used as an improved Automated Cloud Classification Algorithm to upgrade the current window-based clustering method for the FY-2C operational products.

  9. Semi-supervised manifold learning with affinity regularization for Alzheimer's disease identification using positron emission tomography imaging.

    Science.gov (United States)

    Lu, Shen; Xia, Yong; Cai, Tom Weidong; Feng, David Dagan

    2015-01-01

    Dementia, Alzheimer's disease (AD) in particular is a global problem and big threat to the aging population. An image based computer-aided dementia diagnosis method is needed to providing doctors help during medical image examination. Many machine learning based dementia classification methods using medical imaging have been proposed and most of them achieve accurate results. However, most of these methods make use of supervised learning requiring fully labeled image dataset, which usually is not practical in real clinical environment. Using large amount of unlabeled images can improve the dementia classification performance. In this study we propose a new semi-supervised dementia classification method based on random manifold learning with affinity regularization. Three groups of spatial features are extracted from positron emission tomography (PET) images to construct an unsupervised random forest which is then used to regularize the manifold learning objective function. The proposed method, stat-of-the-art Laplacian support vector machine (LapSVM) and supervised SVM are applied to classify AD and normal controls (NC). The experiment results show that learning with unlabeled images indeed improves the classification performance. And our method outperforms LapSVM on the same dataset.

  10. Three-dimensional textural features of conventional MRI improve diagnostic classification of childhood brain tumours.

    Science.gov (United States)

    Fetit, Ahmed E; Novak, Jan; Peet, Andrew C; Arvanitits, Theodoros N

    2015-09-01

    The aim of this study was to assess the efficacy of three-dimensional texture analysis (3D TA) of conventional MR images for the classification of childhood brain tumours in a quantitative manner. The dataset comprised pre-contrast T1 - and T2-weighted MRI series obtained from 48 children diagnosed with brain tumours (medulloblastoma, pilocytic astrocytoma and ependymoma). 3D and 2D TA were carried out on the images using first-, second- and higher order statistical methods. Six supervised classification algorithms were trained with the most influential 3D and 2D textural features, and their performances in the classification of tumour types, using the two feature sets, were compared. Model validation was carried out using the leave-one-out cross-validation (LOOCV) approach, as well as stratified 10-fold cross-validation, in order to provide additional reassurance. McNemar's test was used to test the statistical significance of any improvements demonstrated by 3D-trained classifiers. Supervised learning models trained with 3D textural features showed improved classification performances to those trained with conventional 2D features. For instance, a neural network classifier showed 12% improvement in area under the receiver operator characteristics curve (AUC) and 19% in overall classification accuracy. These improvements were statistically significant for four of the tested classifiers, as per McNemar's tests. This study shows that 3D textural features extracted from conventional T1 - and T2-weighted images can improve the diagnostic classification of childhood brain tumours. Long-term benefits of accurate, yet non-invasive, diagnostic aids include a reduction in surgical procedures, improvement in surgical and therapy planning, and support of discussions with patients' families. It remains necessary, however, to extend the analysis to a multicentre cohort in order to assess the scalability of the techniques used. Copyright © 2015 John Wiley & Sons, Ltd.

  11. Safe semi-supervised learning based on weighted likelihood.

    Science.gov (United States)

    Kawakita, Masanori; Takeuchi, Jun'ichi

    2014-05-01

    We are interested in developing a safe semi-supervised learning that works in any situation. Semi-supervised learning postulates that n(') unlabeled data are available in addition to n labeled data. However, almost all of the previous semi-supervised methods require additional assumptions (not only unlabeled data) to make improvements on supervised learning. If such assumptions are not met, then the methods possibly perform worse than supervised learning. Sokolovska, Cappé, and Yvon (2008) proposed a semi-supervised method based on a weighted likelihood approach. They proved that this method asymptotically never performs worse than supervised learning (i.e., it is safe) without any assumption. Their method is attractive because it is easy to implement and is potentially general. Moreover, it is deeply related to a certain statistical paradox. However, the method of Sokolovska et al. (2008) assumes a very limited situation, i.e., classification, discrete covariates, n(')→∞ and a maximum likelihood estimator. In this paper, we extend their method by modifying the weight. We prove that our proposal is safe in a significantly wide range of situations as long as n≤n('). Further, we give a geometrical interpretation of the proof of safety through the relationship with the above-mentioned statistical paradox. Finally, we show that the above proposal is asymptotically safe even when n(')

  12. MAXIMUM LIKELIHOOD CLASSIFICATION OF HIGH-RESOLUTION SAR IMAGES IN URBAN AREA

    Directory of Open Access Journals (Sweden)

    M. Soheili Majd

    2012-09-01

    Full Text Available In this work, we propose a state-of-the-art on statistical analysis of polarimetric synthetic aperture radar (SAR data, through the modeling of several indices. We concentrate on eight ground classes which have been carried out from amplitudes, co-polarisation ratio, depolarization ratios, and other polarimetric descriptors. To study their different statistical behaviours, we consider Gauss, log- normal, Beta I, Weibull, Gamma, and Fisher statistical models and estimate their parameters using three methods: method of moments (MoM, maximum-likelihood (ML methodology, and log-cumulants method (MoML. Then, we study the opportunity of introducing this information in an adapted supervised classification scheme based on Maximum–Likelihood and Fisher pdf. Our work relies on an image of a suburban area, acquired by the airborne RAMSES SAR sensor of ONERA. The results prove the potential of such data to discriminate urban surfaces and show the usefulness of adapting any classical classification algorithm however classification maps present a persistant class confusion between flat gravelled or concrete roofs and trees.

  13. Modified CC-LR algorithm with three diverse feature sets for motor imagery tasks classification in EEG based brain-computer interface.

    Science.gov (United States)

    Siuly; Li, Yan; Paul Wen, Peng

    2014-03-01

    Motor imagery (MI) tasks classification provides an important basis for designing brain-computer interface (BCI) systems. If the MI tasks are reliably distinguished through identifying typical patterns in electroencephalography (EEG) data, a motor disabled people could communicate with a device by composing sequences of these mental states. In our earlier study, we developed a cross-correlation based logistic regression (CC-LR) algorithm for the classification of MI tasks for BCI applications, but its performance was not satisfactory. This study develops a modified version of the CC-LR algorithm exploring a suitable feature set that can improve the performance. The modified CC-LR algorithm uses the C3 electrode channel (in the international 10-20 system) as a reference channel for the cross-correlation (CC) technique and applies three diverse feature sets separately, as the input to the logistic regression (LR) classifier. The present algorithm investigates which feature set is the best to characterize the distribution of MI tasks based EEG data. This study also provides an insight into how to select a reference channel for the CC technique with EEG signals considering the anatomical structure of the human brain. The proposed algorithm is compared with eight of the most recently reported well-known methods including the BCI III Winner algorithm. The findings of this study indicate that the modified CC-LR algorithm has potential to improve the identification performance of MI tasks in BCI systems. The results demonstrate that the proposed technique provides a classification improvement over the existing methods tested. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  14. Discriminative Localization in CNNs for Weakly-Supervised Segmentation of Pulmonary Nodules.

    Science.gov (United States)

    Feng, Xinyang; Yang, Jie; Laine, Andrew F; Angelini, Elsa D

    2017-09-01

    Automated detection and segmentation of pulmonary nodules on lung computed tomography (CT) scans can facilitate early lung cancer diagnosis. Existing supervised approaches for automated nodule segmentation on CT scans require voxel-based annotations for training, which are labor- and time-consuming to obtain. In this work, we propose a weakly-supervised method that generates accurate voxel-level nodule segmentation trained with image-level labels only. By adapting a convolutional neural network (CNN) trained for image classification, our proposed method learns discriminative regions from the activation maps of convolution units at different scales, and identifies the true nodule location with a novel candidate-screening framework. Experimental results on the public LIDC-IDRI dataset demonstrate that, our weakly-supervised nodule segmentation framework achieves competitive performance compared to a fully-supervised CNN-based segmentation method.

  15. Graph-based semi-supervised learning

    CERN Document Server

    Subramanya, Amarnag

    2014-01-01

    While labeled data is expensive to prepare, ever increasing amounts of unlabeled data is becoming widely available. In order to adapt to this phenomenon, several semi-supervised learning (SSL) algorithms, which learn from labeled as well as unlabeled data, have been developed. In a separate line of work, researchers have started to realize that graphs provide a natural way to represent data in a variety of domains. Graph-based SSL algorithms, which bring together these two lines of work, have been shown to outperform the state-of-the-art in many applications in speech processing, computer visi

  16. Evaluation of a treatment-based classification algorithm for low back pain: a cross-sectional study.

    Science.gov (United States)

    Stanton, Tasha R; Fritz, Julie M; Hancock, Mark J; Latimer, Jane; Maher, Christopher G; Wand, Benedict M; Parent, Eric C

    2011-04-01

    Several studies have investigated criteria for classifying patients with low back pain (LBP) into treatment-based subgroups. A comprehensive algorithm was created to translate these criteria into a clinical decision-making guide. This study investigated the translation of the individual subgroup criteria into a comprehensive algorithm by studying the prevalence of patients meeting the criteria for each treatment subgroup and the reliability of the classification. This was a cross-sectional, observational study. Two hundred fifty patients with acute or subacute LBP were recruited from the United States and Australia to participate in the study. Trained physical therapists performed standardized assessments on all participants. The researchers used these findings to classify participants into subgroups. Thirty-one participants were reassessed to determine interrater reliability of the algorithm decision. Based on individual subgroup criteria, 25.2% (95% confidence interval [CI]=19.8%-30.6%) of the participants did not meet the criteria for any subgroup, 49.6% (95% CI=43.4%-55.8%) of the participants met the criteria for only one subgroup, and 25.2% (95% CI=19.8%-30.6%) of the participants met the criteria for more than one subgroup. The most common combination of subgroups was manipulation + specific exercise (68.4% of the participants who met the criteria for 2 subgroups). Reliability of the algorithm decision was moderate (kappa=0.52, 95% CI=0.27-0.77, percentage of agreement=67%). Due to a relatively small patient sample, reliability estimates are somewhat imprecise. These findings provide important clinical data to guide future research and revisions to the algorithm. The finding that 25% of the participants met the criteria for more than one subgroup has important implications for the sequencing of treatments in the algorithm. Likewise, the finding that 25% of the participants did not meet the criteria for any subgroup provides important information regarding

  17. 7 CFR 27.73 - Supervision of transfers of cotton.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Supervision of transfers of cotton. 27.73 Section 27.73 Agriculture Regulations of the Department of Agriculture AGRICULTURAL MARKETING SERVICE (Standards, Inspections, Marketing Practices), DEPARTMENT OF AGRICULTURE COMMODITY STANDARDS AND STANDARD CONTAINER REGULATIONS COTTON CLASSIFICATION UNDER...

  18. Active link selection for efficient semi-supervised community detection

    Science.gov (United States)

    Yang, Liang; Jin, Di; Wang, Xiao; Cao, Xiaochun

    2015-01-01

    Several semi-supervised community detection algorithms have been proposed recently to improve the performance of traditional topology-based methods. However, most of them focus on how to integrate supervised information with topology information; few of them pay attention to which information is critical for performance improvement. This leads to large amounts of demand for supervised information, which is expensive or difficult to obtain in most fields. For this problem we propose an active link selection framework, that is we actively select the most uncertain and informative links for human labeling for the efficient utilization of the supervised information. We also disconnect the most likely inter-community edges to further improve the efficiency. Our main idea is that, by connecting uncertain nodes to their community hubs and disconnecting the inter-community edges, one can sharpen the block structure of adjacency matrix more efficiently than randomly labeling links as the existing methods did. Experiments on both synthetic and real networks demonstrate that our new approach significantly outperforms the existing methods in terms of the efficiency of using supervised information. It needs ~13% of the supervised information to achieve a performance similar to that of the original semi-supervised approaches. PMID:25761385

  19. Evaluating the statistical performance of less applied algorithms in classification of worldview-3 imagery data in an urbanized landscape

    Science.gov (United States)

    Ranaie, Mehrdad; Soffianian, Alireza; Pourmanafi, Saeid; Mirghaffari, Noorollah; Tarkesh, Mostafa

    2018-03-01

    In recent decade, analyzing the remotely sensed imagery is considered as one of the most common and widely used procedures in the environmental studies. In this case, supervised image classification techniques play a central role. Hence, taking a high resolution Worldview-3 over a mixed urbanized landscape in Iran, three less applied image classification methods including Bagged CART, Stochastic gradient boosting model and Neural network with feature extraction were tested and compared with two prevalent methods: random forest and support vector machine with linear kernel. To do so, each method was run ten time and three validation techniques was used to estimate the accuracy statistics consist of cross validation, independent validation and validation with total of train data. Moreover, using ANOVA and Tukey test, statistical difference significance between the classification methods was significantly surveyed. In general, the results showed that random forest with marginal difference compared to Bagged CART and stochastic gradient boosting model is the best performing method whilst based on independent validation there was no significant difference between the performances of classification methods. It should be finally noted that neural network with feature extraction and linear support vector machine had better processing speed than other.

  20. Neural networks applied to the classification of remotely sensed data

    NARCIS (Netherlands)

    Mulder, Nanno; Spreeuwers, Lieuwe Jan

    1991-01-01

    A neural network with topology 2-8-8 is evaluated against the standard of supervised non-parametric maximum likelihood classification. The purpose of the evaluation is to compare the performance in terms of training speed and quality of classification. Classification is done on multispectral data

  1. Spectral Classification of Similar Materials using the Tetracorder Algorithm: The Calcite-Epidote-Chlorite Problem

    Science.gov (United States)

    Dalton, J. Brad; Bove, Dana; Mladinich, Carol; Clark, Roger; Rockwell, Barnaby; Swayze, Gregg; King, Trude; Church, Stanley

    2001-01-01

    Recent work on automated spectral classification algorithms has sought to distinguish ever-more similar materials. From modest beginnings separating shade, soil, rock and vegetation to ambitious attempts to discriminate mineral types and specific plant species, the trend seems to be toward using increasingly subtle spectral differences to perform the classification. Rule-based expert systems exploiting the underlying physics of spectroscopy such as the US Geological Society Tetracorder system are now taking advantage of the high spectral resolution and dimensionality of current imaging spectrometer designs to discriminate spectrally similar materials. The current paper details recent efforts to discriminate three minerals having absorptions centered at the same wavelength, with encouraging results.

  2. pySPACE - A Signal Processing and Classification Environment in Python

    Directory of Open Access Journals (Sweden)

    Mario Michael Krell

    2013-12-01

    Full Text Available In neuroscience large amounts of data are recorded to provide insights into cerebral information processing and function. The successful extraction of the relevant signals becomes more and more challenging due to increasing complexities in acquisition techniques and questions addressed. Here, automated signal processing and machine learning tools can help to process the data, e.g., to separate signal and noise. With the presented software pySPACE (http://pyspace.github.io/pyspace, signal processing algorithms can be compared and applied automatically on time series data, either with the aim of finding a suitable preprocessing, or of training supervised algorithms to classify the data. pySPACE originally has been built to process multi-sensor windowed time series data, like event-related potentials from the electroencephalogram (EEG. The software provides automated data handling, distributed processing, modular build-up of signal processing chains and tools for visualization and performance evaluation. Included in the software are various algorithms like temporal and spatial filters, feature generation and selection, classification algorithms and evaluation schemes. Further, interfaces to other signal processing tools are provided and, since pySPACE is a modular framework, it can be extended with new algorithms according to individual needs. In the presented work, the structural hierarchies are described. It is illustrated how users and developers can interface the software and execute offline and online modes. Configuration of pySPACE is realized with the YAML format, so that programming skills are not mandatory for usage. The concept of pySPACE is to have one comprehensive tool that can be used to perform complete signal processing and classification tasks. It further allows to define own algorithms, or to integrate and use already existing libraries.

  3. A Semi-Supervised Learning Algorithm for Predicting Four Types MiRNA-Disease Associations by Mutual Information in a Heterogeneous Network.

    Science.gov (United States)

    Zhang, Xiaotian; Yin, Jian; Zhang, Xu

    2018-03-02

    Increasing evidence suggests that dysregulation of microRNAs (miRNAs) may lead to a variety of diseases. Therefore, identifying disease-related miRNAs is a crucial problem. Currently, many computational approaches have been proposed to predict binary miRNA-disease associations. In this study, in order to predict underlying miRNA-disease association types, a semi-supervised model called the network-based label propagation algorithm is proposed to infer multiple types of miRNA-disease associations (NLPMMDA) by mutual information derived from the heterogeneous network. The NLPMMDA method integrates disease semantic similarity, miRNA functional similarity, and Gaussian interaction profile kernel similarity information of miRNAs and diseases to construct a heterogeneous network. NLPMMDA is a semi-supervised model which does not require verified negative samples. Leave-one-out cross validation (LOOCV) was implemented for four known types of miRNA-disease associations and demonstrated the reliable performance of our method. Moreover, case studies of lung cancer and breast cancer confirmed effective performance of NLPMMDA to predict novel miRNA-disease associations and their association types.

  4. Maximizing the Diversity of Ensemble Random Forests for Tree Genera Classification Using High Density LiDAR Data

    Directory of Open Access Journals (Sweden)

    Connie Ko

    2016-08-01

    Full Text Available Recent research into improving the effectiveness of forest inventory management using airborne LiDAR data has focused on developing advanced theories in data analytics. Furthermore, supervised learning as a predictive model for classifying tree genera (and species, where possible has been gaining popularity in order to minimize this labor-intensive task. However, bottlenecks remain that hinder the immediate adoption of supervised learning methods. With supervised classification, training samples are required for learning the parameters that govern the performance of a classifier, yet the selection of training data is often subjective and the quality of such samples is critically important. For LiDAR scanning in forest environments, the quantification of data quality is somewhat abstract, normally referring to some metric related to the completeness of individual tree crowns; however, this is not an issue that has received much attention in the literature. Intuitively the choice of training samples having varying quality will affect classification accuracy. In this paper a Diversity Index (DI is proposed that characterizes the diversity of data quality (Qi among selected training samples required for constructing a classification model of tree genera. The training sample is diversified in terms of data quality as opposed to the number of samples per class. The diversified training sample allows the classifier to better learn the positive and negative instances and; therefore; has a higher classification accuracy in discriminating the “unknown” class samples from the “known” samples. Our algorithm is implemented within the Random Forests base classifiers with six derived geometric features from LiDAR data. The training sample contains three tree genera (pine; poplar; and maple and the validation samples contains four labels (pine; poplar; maple; and “unknown”. Classification accuracy improved from 72.8%; when training samples were

  5. Recursive Cluster Elimination (RCE for classification and feature selection from gene expression data

    Directory of Open Access Journals (Sweden)

    Showe Louise C

    2007-05-01

    Full Text Available Abstract Background Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. The selection of those genes that are important for distinguishing the different sample classes being compared, poses a challenging problem in high dimensional data analysis. We describe a new procedure for selecting significant genes as recursive cluster elimination (RCE rather than recursive feature elimination (RFE. We have tested this algorithm on six datasets and compared its performance with that of two related classification procedures with RFE. Results We have developed a novel method for selecting significant genes in comparative gene expression studies. This method, which we refer to as SVM-RCE, combines K-means, a clustering method, to identify correlated gene clusters, and Support Vector Machines (SVMs, a supervised machine learning classification method, to identify and score (rank those gene clusters for the purpose of classification. K-means is used initially to group genes into clusters. Recursive cluster elimination (RCE is then applied to iteratively remove those clusters of genes that contribute the least to the classification performance. SVM-RCE identifies the clusters of correlated genes that are most significantly differentially expressed between the sample classes. Utilization of gene clusters, rather than individual genes, enhances the supervised classification accuracy of the same data as compared to the accuracy when either SVM or Penalized Discriminant Analysis (PDA with recursive feature elimination (SVM-RFE and PDA-RFE are used to remove genes based on their individual discriminant weights. Conclusion SVM-RCE provides improved classification accuracy with complex microarray data sets when it is compared to the classification accuracy of the same datasets using either SVM-RFE or PDA-RFE. SVM-RCE identifies clusters of correlated genes that when considered together

  6. Integration of spectral, spatial and morphometric data into lithological mapping: A comparison of different Machine Learning Algorithms in the Kurdistan Region, NE Iraq

    Science.gov (United States)

    Othman, Arsalan A.; Gloaguen, Richard

    2017-09-01

    Lithological mapping in mountainous regions is often impeded by limited accessibility due to relief. This study aims to evaluate (1) the performance of different supervised classification approaches using remote sensing data and (2) the use of additional information such as geomorphology. We exemplify the methodology in the Bardi-Zard area in NE Iraq, a part of the Zagros Fold - Thrust Belt, known for its chromite deposits. We highlighted the improvement of remote sensing geological classification by integrating geomorphic features and spatial information in the classification scheme. We performed a Maximum Likelihood (ML) classification method besides two Machine Learning Algorithms (MLA): Support Vector Machine (SVM) and Random Forest (RF) to allow the joint use of geomorphic features, Band Ratio (BR), Principal Component Analysis (PCA), spatial information (spatial coordinates) and multispectral data of the Advanced Space-borne Thermal Emission and Reflection radiometer (ASTER) satellite. The RF algorithm showed reliable results and discriminated serpentinite, talus and terrace deposits, red argillites with conglomerates and limestone, limy conglomerates and limestone conglomerates, tuffites interbedded with basic lavas, limestone and Metamorphosed limestone and reddish green shales. The best overall accuracy (∼80%) was achieved by Random Forest (RF) algorithms in the majority of the sixteen tested combination datasets.

  7. Comparison of Different Machine Learning Algorithms for Lithological Mapping Using Remote Sensing Data and Morphological Features: A Case Study in Kurdistan Region, NE Iraq

    Science.gov (United States)

    Othman, Arsalan; Gloaguen, Richard

    2015-04-01

    Topographic effects and complex vegetation cover hinder lithology classification in mountain regions based not only in field, but also in reflectance remote sensing data. The area of interest "Bardi-Zard" is located in the NE of Iraq. It is part of the Zagros orogenic belt, where seven lithological units outcrop and is known for its chromite deposit. The aim of this study is to compare three machine learning algorithms (MLAs): Maximum Likelihood (ML), Support Vector Machines (SVM), and Random Forest (RF) in the context of a supervised lithology classification task using Advanced Space-borne Thermal Emission and Reflection radiometer (ASTER) satellite, its derived, spatial information (spatial coordinates) and geomorphic data. We emphasize the enhancement in remote sensing lithological mapping accuracy that arises from the integration of geomorphic features and spatial information (spatial coordinates) in classifications. This study identifies that RF is better than ML and SVM algorithms in almost the sixteen combination datasets, which were tested. The overall accuracy of the best dataset combination with the RF map for the all seven classes reach ~80% and the producer and user's accuracies are ~73.91% and 76.09% respectively while the kappa coefficient is ~0.76. TPI is more effective with SVM algorithm than an RF algorithm. This paper demonstrates that adding geomorphic indices such as TPI and spatial information in the dataset increases the lithological classification accuracy.

  8. On using the Multiple Signal Classification algorithm to study microbaroms

    Science.gov (United States)

    Marcillo, O. E.; Blom, P. S.; Euler, G. G.

    2016-12-01

    Multiple Signal Classification (MUSIC) (Schmidt, 1986) is a well-known high-resolution algorithm used in array processing for parameter estimation. We report on the application of MUSIC to infrasonic array data in a study of the structure of microbaroms. Microbaroms can be globally observed and display energy centered around 0.2 Hz. Microbaroms are an infrasonic signal generated by the non-linear interaction of ocean surface waves that radiate into the ocean and atmosphere as well as the solid earth in the form of microseisms. Microbaroms sources are dynamic and, in many cases, distributed in space and moving in time. We assume that the microbarom energy detected by an infrasonic array is the result of multiple sources (with different back-azimuths) in the same bandwidth and apply the MUSIC algorithm accordingly to recover the back-azimuth and trace velocity of the individual components. Preliminary results show that the multiple component assumption in MUSIC allows one to resolve the fine structure in the microbarom band that can be related to multiple ocean surface phenomena.

  9. A semi-supervised segmentation algorithm as applied to k-means ...

    African Journals Online (AJOL)

    Segmentation (or partitioning) of data for the purpose of enhancing predictive modelling is a well-established practice in the banking industry. Unsupervised and supervised approaches are the two main streams of segmentation and examples exist where the application of these techniques improved the performance of ...

  10. Response monitoring using quantitative ultrasound methods and supervised dictionary learning in locally advanced breast cancer

    Science.gov (United States)

    Gangeh, Mehrdad J.; Fung, Brandon; Tadayyon, Hadi; Tran, William T.; Czarnota, Gregory J.

    2016-03-01

    A non-invasive computer-aided-theragnosis (CAT) system was developed for the early assessment of responses to neoadjuvant chemotherapy in patients with locally advanced breast cancer. The CAT system was based on quantitative ultrasound spectroscopy methods comprising several modules including feature extraction, a metric to measure the dissimilarity between "pre-" and "mid-treatment" scans, and a supervised learning algorithm for the classification of patients to responders/non-responders. One major requirement for the successful design of a high-performance CAT system is to accurately measure the changes in parametric maps before treatment onset and during the course of treatment. To this end, a unified framework based on Hilbert-Schmidt independence criterion (HSIC) was used for the design of feature extraction from parametric maps and the dissimilarity measure between the "pre-" and "mid-treatment" scans. For the feature extraction, HSIC was used to design a supervised dictionary learning (SDL) method by maximizing the dependency between the scans taken from "pre-" and "mid-treatment" with "dummy labels" given to the scans. For the dissimilarity measure, an HSIC-based metric was employed to effectively measure the changes in parametric maps as an indication of treatment effectiveness. The HSIC-based feature extraction and dissimilarity measure used a kernel function to nonlinearly transform input vectors into a higher dimensional feature space and computed the population means in the new space, where enhanced group separability was ideally obtained. The results of the classification using the developed CAT system indicated an improvement of performance compared to a CAT system with basic features using histogram of intensity.

  11. An Improved Cloud Classification Algorithm for China’s FY-2C Multi-Channel Images Using Artificial Neural Network

    Directory of Open Access Journals (Sweden)

    Chun-Xiang Shi

    2009-07-01

    Full Text Available The crowning objective of this research was to identify a better cloud classification method to upgrade the current window-based clustering algorithm used operationally for China’s first operational geostationary meteorological satellite FengYun-2C (FY-2C data. First, the capabilities of six widely-used Artificial Neural Network (ANN methods are analyzed, together with the comparison of two other methods: Principal Component Analysis (PCA and a Support Vector Machine (SVM, using 2864 cloud samples manually collected by meteorologists in June, July, and August in 2007 from three FY-2C channel (IR1, 10.3-11.3 μm; IR2, 11.5-12.5 μm and WV 6.3-7.6 μm imagery. The result shows that: (1 ANN approaches, in general, outperformed the PCA and the SVM given sufficient training samples and (2 among the six ANN networks, higher cloud classification accuracy was obtained with the Self-Organizing Map (SOM and Probabilistic Neural Network (PNN. Second, to compare the ANN methods to the present FY-2C operational algorithm, this study implemented SOM, one of the best ANN network identified from this study, as an automated cloud classification system for the FY-2C multi-channel data. It shows that SOM method has improved the results greatly not only in pixel-level accuracy but also in cloud patch-level classification by more accurately identifying cloud types such as cumulonimbus, cirrus and clouds in high latitude. Findings of this study suggest that the ANN-based classifiers, in particular the SOM, can be potentially used as an improved Automated Cloud Classification Algorithm to upgrade the current window-based clustering method for the FY-2C operational products.

  12. Supervised learning of probability distributions by neural networks

    Science.gov (United States)

    Baum, Eric B.; Wilczek, Frank

    1988-01-01

    Supervised learning algorithms for feedforward neural networks are investigated analytically. The back-propagation algorithm described by Werbos (1974), Parker (1985), and Rumelhart et al. (1986) is generalized by redefining the values of the input and output neurons as probabilities. The synaptic weights are then varied to follow gradients in the logarithm of likelihood rather than in the error. This modification is shown to provide a more rigorous theoretical basis for the algorithm and to permit more accurate predictions. A typical application involving a medical-diagnosis expert system is discussed.

  13. Exploiting machine learning algorithms for tree species classification in a semiarid woodland using RapidEye image

    CSIR Research Space (South Africa)

    Adelabu, S

    2013-11-01

    Full Text Available in semiarid environments. In this study, we examined the suitability of 5-band RapidEye satellite data for the classification of five tree species in mopane woodland of Botswana using machine leaning algorithms with limited training samples. We performed...

  14. Supervised detection of exoplanets in high-contrast imaging sequences

    Science.gov (United States)

    Gomez Gonzalez, C. A.; Absil, O.; Van Droogenbroeck, M.

    2018-06-01

    Context. Post-processing algorithms play a key role in pushing the detection limits of high-contrast imaging (HCI) instruments. State-of-the-art image processing approaches for HCI enable the production of science-ready images relying on unsupervised learning techniques, such as low-rank approximations, for generating a model point spread function (PSF) and subtracting the residual starlight and speckle noise. Aims: In order to maximize the detection rate of HCI instruments and survey campaigns, advanced algorithms with higher sensitivities to faint companions are needed, especially for the speckle-dominated innermost region of the images. Methods: We propose a reformulation of the exoplanet detection task (for ADI sequences) that builds on well-established machine learning techniques to take HCI post-processing from an unsupervised to a supervised learning context. In this new framework, we present algorithmic solutions using two different discriminative models: SODIRF (random forests) and SODINN (neural networks). We test these algorithms on real ADI datasets from VLT/NACO and VLT/SPHERE HCI instruments. We then assess their performances by injecting fake companions and using receiver operating characteristic analysis. This is done in comparison with state-of-the-art ADI algorithms, such as ADI principal component analysis (ADI-PCA). Results: This study shows the improved sensitivity versus specificity trade-off of the proposed supervised detection approach. At the diffraction limit, SODINN improves the true positive rate by a factor ranging from 2 to 10 (depending on the dataset and angular separation) with respect to ADI-PCA when working at the same false-positive level. Conclusions: The proposed supervised detection framework outperforms state-of-the-art techniques in the task of discriminating planet signal from speckles. In addition, it offers the possibility of re-processing existing HCI databases to maximize their scientific return and potentially improve

  15. Building an Arabic Sentiment Lexicon Using Semi-supervised Learning

    Directory of Open Access Journals (Sweden)

    Fawaz H.H. Mahyoub

    2014-12-01

    Full Text Available Sentiment analysis is the process of determining a predefined sentiment from text written in a natural language with respect to the entity to which it is referring. A number of lexical resources are available to facilitate this task in English. One such resource is the SentiWordNet, which assigns sentiment scores to words found in the English WordNet. In this paper, we present an Arabic sentiment lexicon that assigns sentiment scores to the words found in the Arabic WordNet. Starting from a small seed list of positive and negative words, we used semi-supervised learning to propagate the scores in the Arabic WordNet by exploiting the synset relations. Our algorithm assigned a positive sentiment score to more than 800, a negative score to more than 600 and a neutral score to more than 6000 words in the Arabic WordNet. The lexicon was evaluated by incorporating it into a machine learning-based classifier. The experiments were conducted on several Arabic sentiment corpora, and we were able to achieve a 96% classification accuracy.

  16. Experimental analysis of the performance of machine learning algorithms in the classification of navigation accident records

    Directory of Open Access Journals (Sweden)

    REIS, M V. S. de A.

    2017-06-01

    Full Text Available This paper aims to evaluate the use of machine learning techniques in a database of marine accidents. We analyzed and evaluated the main causes and types of marine accidents in the Northern Fluminense region. For this, machine learning techniques were used. The study showed that the modeling can be done in a satisfactory manner using different configurations of classification algorithms, varying the activation functions and training parameters. The SMO (Sequential Minimal Optimization algorithm showed the best performance result.

  17. Multi-step EMG Classification Algorithm for Human-Computer Interaction

    Science.gov (United States)

    Ren, Peng; Barreto, Armando; Adjouadi, Malek

    A three-electrode human-computer interaction system, based on digital processing of the Electromyogram (EMG) signal, is presented. This system can effectively help disabled individuals paralyzed from the neck down to interact with computers or communicate with people through computers using point-and-click graphic interfaces. The three electrodes are placed on the right frontalis, the left temporalis and the right temporalis muscles in the head, respectively. The signal processing algorithm used translates the EMG signals during five kinds of facial movements (left jaw clenching, right jaw clenching, eyebrows up, eyebrows down, simultaneous left & right jaw clenching) into five corresponding types of cursor movements (left, right, up, down and left-click), to provide basic mouse control. The classification strategy is based on three principles: the EMG energy of one channel is typically larger than the others during one specific muscle contraction; the spectral characteristics of the EMG signals produced by the frontalis and temporalis muscles during different movements are different; the EMG signals from adjacent channels typically have correlated energy profiles. The algorithm is evaluated on 20 pre-recorded EMG signal sets, using Matlab simulations. The results show that this method provides improvements and is more robust than other previous approaches.

  18. Embedded vision equipment of industrial robot for inline detection of product errors by clustering–classification algorithms

    Directory of Open Access Journals (Sweden)

    Kamil Zidek

    2016-10-01

    Full Text Available The article deals with the design of embedded vision equipment of industrial robots for inline diagnosis of product error during manipulation process. The vision equipment can be attached to the end effector of robots or manipulators, and it provides an image snapshot of part surface before grasp, searches for error during manipulation, and separates products with error from the next operation of manufacturing. The new approach is a methodology based on machine teaching for the automated identification, localization, and diagnosis of systematic errors in products of high-volume production. To achieve this, we used two main data mining algorithms: clustering for accumulation of similar errors and classification methods for the prediction of any new error to proposed class. The presented methodology consists of three separate processing levels: image acquisition for fail parameterization, data clustering for categorizing errors to separate classes, and new pattern prediction with a proposed class model. We choose main representatives of clustering algorithms, for example, K-mean from quantization of vectors, fast library for approximate nearest neighbor from hierarchical clustering, and density-based spatial clustering of applications with noise from algorithm based on the density of the data. For machine learning, we selected six major algorithms of classification: support vector machines, normal Bayesian classifier, K-nearest neighbor, gradient boosted trees, random trees, and neural networks. The selected algorithms were compared for speed and reliability and tested on two platforms: desktop-based computer system and embedded system based on System on Chip (SoC with vision equipment.

  19. Feature selection and nearest centroid classification for protein mass spectrometry

    Directory of Open Access Journals (Sweden)

    Levner Ilya

    2005-03-01

    Full Text Available Abstract Background The use of mass spectrometry as a proteomics tool is poised to revolutionize early disease diagnosis and biomarker identification. Unfortunately, before standard supervised classification algorithms can be employed, the "curse of dimensionality" needs to be solved. Due to the sheer amount of information contained within the mass spectra, most standard machine learning techniques cannot be directly applied. Instead, feature selection techniques are used to first reduce the dimensionality of the input space and thus enable the subsequent use of classification algorithms. This paper examines feature selection techniques for proteomic mass spectrometry. Results This study examines the performance of the nearest centroid classifier coupled with the following feature selection algorithms. Student-t test, Kolmogorov-Smirnov test, and the P-test are univariate statistics used for filter-based feature ranking. From the wrapper approaches we tested sequential forward selection and a modified version of sequential backward selection. Embedded approaches included shrunken nearest centroid and a novel version of boosting based feature selection we developed. In addition, we tested several dimensionality reduction approaches, namely principal component analysis and principal component analysis coupled with linear discriminant analysis. To fairly assess each algorithm, evaluation was done using stratified cross validation with an internal leave-one-out cross-validation loop for automated feature selection. Comprehensive experiments, conducted on five popular cancer data sets, revealed that the less advocated sequential forward selection and boosted feature selection algorithms produce the most consistent results across all data sets. In contrast, the state-of-the-art performance reported on isolated data sets for several of the studied algorithms, does not hold across all data sets. Conclusion This study tested a number of popular feature

  20. Weakly supervised visual dictionary learning by harnessing image attributes.

    Science.gov (United States)

    Gao, Yue; Ji, Rongrong; Liu, Wei; Dai, Qionghai; Hua, Gang

    2014-12-01

    Bag-of-features (BoFs) representation has been extensively applied to deal with various computer vision applications. To extract discriminative and descriptive BoF, one important step is to learn a good dictionary to minimize the quantization loss between local features and codewords. While most existing visual dictionary learning approaches are engaged with unsupervised feature quantization, the latest trend has turned to supervised learning by harnessing the semantic labels of images or regions. However, such labels are typically too expensive to acquire, which restricts the scalability of supervised dictionary learning approaches. In this paper, we propose to leverage image attributes to weakly supervise the dictionary learning procedure without requiring any actual labels. As a key contribution, our approach establishes a generative hidden Markov random field (HMRF), which models the quantized codewords as the observed states and the image attributes as the hidden states, respectively. Dictionary learning is then performed by supervised grouping the observed states, where the supervised information is stemmed from the hidden states of the HMRF. In such a way, the proposed dictionary learning approach incorporates the image attributes to learn a semantic-preserving BoF representation without any genuine supervision. Experiments in large-scale image retrieval and classification tasks corroborate that our approach significantly outperforms the state-of-the-art unsupervised dictionary learning approaches.

  1. Automatic structure classification of small proteins using random forest

    Directory of Open Access Journals (Sweden)

    Hirst Jonathan D

    2010-07-01

    Full Text Available Abstract Background Random forest, an ensemble based supervised machine learning algorithm, is used to predict the SCOP structural classification for a target structure, based on the similarity of its structural descriptors to those of a template structure with an equal number of secondary structure elements (SSEs. An initial assessment of random forest is carried out for domains consisting of three SSEs. The usability of random forest in classifying larger domains is demonstrated by applying it to domains consisting of four, five and six SSEs. Results Random forest, trained on SCOP version 1.69, achieves a predictive accuracy of up to 94% on an independent and non-overlapping test set derived from SCOP version 1.73. For classification to the SCOP Class, Fold, Super-family or Family levels, the predictive quality of the model in terms of Matthew's correlation coefficient (MCC ranged from 0.61 to 0.83. As the number of constituent SSEs increases the MCC for classification to different structural levels decreases. Conclusions The utility of random forest in classifying domains from the place-holder classes of SCOP to the true Class, Fold, Super-family or Family levels is demonstrated. Issues such as introduction of a new structural level in SCOP and the merger of singleton levels can also be addressed using random forest. A real-world scenario is mimicked by predicting the classification for those protein structures from the PDB, which are yet to be assigned to the SCOP classification hierarchy.

  2. Comparison of supervised machine learning algorithms for waterborne pathogen detection using mobile phone fluorescence microscopy

    Science.gov (United States)

    Ceylan Koydemir, Hatice; Feng, Steve; Liang, Kyle; Nadkarni, Rohan; Benien, Parul; Ozcan, Aydogan

    2017-06-01

    Giardia lamblia is a waterborne parasite that affects millions of people every year worldwide, causing a diarrheal illness known as giardiasis. Timely detection of the presence of the cysts of this parasite in drinking water is important to prevent the spread of the disease, especially in resource-limited settings. Here we provide extended experimental testing and evaluation of the performance and repeatability of a field-portable and cost-effective microscopy platform for automated detection and counting of Giardia cysts in water samples, including tap water, non-potable water, and pond water. This compact platform is based on our previous work, and is composed of a smartphone-based fluorescence microscope, a disposable sample processing cassette, and a custom-developed smartphone application. Our mobile phone microscope has a large field of view of 0.8 cm2 and weighs only 180 g, excluding the phone. A custom-developed smartphone application provides a user-friendly graphical interface, guiding the users to capture a fluorescence image of the sample filter membrane and analyze it automatically at our servers using an image processing algorithm and training data, consisting of >30,000 images of cysts and >100,000 images of other fluorescent particles that are captured, including, e.g. dust. The total time that it takes from sample preparation to automated cyst counting is less than an hour for each 10 ml of water sample that is tested. We compared the sensitivity and the specificity of our platform using multiple supervised classification models, including support vector machines and nearest neighbors, and demonstrated that a bootstrap aggregating (i.e. bagging) approach using raw image file format provides the best performance for automated detection of Giardia cysts. We evaluated the performance of this machine learning enabled pathogen detection device with water samples taken from different sources (e.g. tap water, non-potable water, pond water) and achieved a

  3. Comparison of supervised machine learning algorithms for waterborne pathogen detection using mobile phone fluorescence microscopy

    KAUST Repository

    Ceylan Koydemir, Hatice

    2017-06-14

    Giardia lamblia is a waterborne parasite that affects millions of people every year worldwide, causing a diarrheal illness known as giardiasis. Timely detection of the presence of the cysts of this parasite in drinking water is important to prevent the spread of the disease, especially in resource-limited settings. Here we provide extended experimental testing and evaluation of the performance and repeatability of a field-portable and cost-effective microscopy platform for automated detection and counting of Giardia cysts in water samples, including tap water, non-potable water, and pond water. This compact platform is based on our previous work, and is composed of a smartphone-based fluorescence microscope, a disposable sample processing cassette, and a custom-developed smartphone application. Our mobile phone microscope has a large field of view of ~0.8 cm2 and weighs only ~180 g, excluding the phone. A custom-developed smartphone application provides a user-friendly graphical interface, guiding the users to capture a fluorescence image of the sample filter membrane and analyze it automatically at our servers using an image processing algorithm and training data, consisting of >30,000 images of cysts and >100,000 images of other fluorescent particles that are captured, including, e.g. dust. The total time that it takes from sample preparation to automated cyst counting is less than an hour for each 10 ml of water sample that is tested. We compared the sensitivity and the specificity of our platform using multiple supervised classification models, including support vector machines and nearest neighbors, and demonstrated that a bootstrap aggregating (i.e. bagging) approach using raw image file format provides the best performance for automated detection of Giardia cysts. We evaluated the performance of this machine learning enabled pathogen detection device with water samples taken from different sources (e.g. tap water, non-potable water, pond water) and achieved

  4. Comparison of supervised machine learning algorithms for waterborne pathogen detection using mobile phone fluorescence microscopy

    KAUST Repository

    Ceylan Koydemir, Hatice; Feng, Steve; Liang, Kyle; Nadkarni, Rohan; Benien, Parul; Ozcan, Aydogan

    2017-01-01

    Giardia lamblia is a waterborne parasite that affects millions of people every year worldwide, causing a diarrheal illness known as giardiasis. Timely detection of the presence of the cysts of this parasite in drinking water is important to prevent the spread of the disease, especially in resource-limited settings. Here we provide extended experimental testing and evaluation of the performance and repeatability of a field-portable and cost-effective microscopy platform for automated detection and counting of Giardia cysts in water samples, including tap water, non-potable water, and pond water. This compact platform is based on our previous work, and is composed of a smartphone-based fluorescence microscope, a disposable sample processing cassette, and a custom-developed smartphone application. Our mobile phone microscope has a large field of view of ~0.8 cm2 and weighs only ~180 g, excluding the phone. A custom-developed smartphone application provides a user-friendly graphical interface, guiding the users to capture a fluorescence image of the sample filter membrane and analyze it automatically at our servers using an image processing algorithm and training data, consisting of >30,000 images of cysts and >100,000 images of other fluorescent particles that are captured, including, e.g. dust. The total time that it takes from sample preparation to automated cyst counting is less than an hour for each 10 ml of water sample that is tested. We compared the sensitivity and the specificity of our platform using multiple supervised classification models, including support vector machines and nearest neighbors, and demonstrated that a bootstrap aggregating (i.e. bagging) approach using raw image file format provides the best performance for automated detection of Giardia cysts. We evaluated the performance of this machine learning enabled pathogen detection device with water samples taken from different sources (e.g. tap water, non-potable water, pond water) and achieved

  5. Comparison of supervised machine learning algorithms for waterborne pathogen detection using mobile phone fluorescence microscopy

    Directory of Open Access Journals (Sweden)

    Ceylan Koydemir Hatice

    2017-06-01

    Full Text Available Giardia lamblia is a waterborne parasite that affects millions of people every year worldwide, causing a diarrheal illness known as giardiasis. Timely detection of the presence of the cysts of this parasite in drinking water is important to prevent the spread of the disease, especially in resource-limited settings. Here we provide extended experimental testing and evaluation of the performance and repeatability of a field-portable and cost-effective microscopy platform for automated detection and counting of Giardia cysts in water samples, including tap water, non-potable water, and pond water. This compact platform is based on our previous work, and is composed of a smartphone-based fluorescence microscope, a disposable sample processing cassette, and a custom-developed smartphone application. Our mobile phone microscope has a large field of view of ~0.8 cm2 and weighs only ~180 g, excluding the phone. A custom-developed smartphone application provides a user-friendly graphical interface, guiding the users to capture a fluorescence image of the sample filter membrane and analyze it automatically at our servers using an image processing algorithm and training data, consisting of >30,000 images of cysts and >100,000 images of other fluorescent particles that are captured, including, e.g. dust. The total time that it takes from sample preparation to automated cyst counting is less than an hour for each 10 ml of water sample that is tested. We compared the sensitivity and the specificity of our platform using multiple supervised classification models, including support vector machines and nearest neighbors, and demonstrated that a bootstrap aggregating (i.e. bagging approach using raw image file format provides the best performance for automated detection of Giardia cysts. We evaluated the performance of this machine learning enabled pathogen detection device with water samples taken from different sources (e.g. tap water, non-potable water, pond

  6. A semi-supervised approach using label propagation to support citation screening.

    Science.gov (United States)

    Kontonatsios, Georgios; Brockmeier, Austin J; Przybyła, Piotr; McNaught, John; Mu, Tingting; Goulermas, John Y; Ananiadou, Sophia

    2017-08-01

    Citation screening, an integral process within systematic reviews that identifies citations relevant to the underlying research question, is a time-consuming and resource-intensive task. During the screening task, analysts manually assign a label to each citation, to designate whether a citation is eligible for inclusion in the review. Recently, several studies have explored the use of active learning in text classification to reduce the human workload involved in the screening task. However, existing approaches require a significant amount of manually labelled citations for the text classification to achieve a robust performance. In this paper, we propose a semi-supervised method that identifies relevant citations as early as possible in the screening process by exploiting the pairwise similarities between labelled and unlabelled citations to improve the classification performance without additional manual labelling effort. Our approach is based on the hypothesis that similar citations share the same label (e.g., if one citation should be included, then other similar citations should be included also). To calculate the similarity between labelled and unlabelled citations we investigate two different feature spaces, namely a bag-of-words and a spectral embedding based on the bag-of-words. The semi-supervised method propagates the classification codes of manually labelled citations to neighbouring unlabelled citations in the feature space. The automatically labelled citations are combined with the manually labelled citations to form an augmented training set. For evaluation purposes, we apply our method to reviews from clinical and public health. The results show that our semi-supervised method with label propagation achieves statistically significant improvements over two state-of-the-art active learning approaches across both clinical and public health reviews. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  7. Integrative gene network construction to analyze cancer recurrence using semi-supervised learning.

    Science.gov (United States)

    Park, Chihyun; Ahn, Jaegyoon; Kim, Hyunjin; Park, Sanghyun

    2014-01-01

    The prognosis of cancer recurrence is an important research area in bioinformatics and is challenging due to the small sample sizes compared to the vast number of genes. There have been several attempts to predict cancer recurrence. Most studies employed a supervised approach, which uses only a few labeled samples. Semi-supervised learning can be a great alternative to solve this problem. There have been few attempts based on manifold assumptions to reveal the detailed roles of identified cancer genes in recurrence. In order to predict cancer recurrence, we proposed a novel semi-supervised learning algorithm based on a graph regularization approach. We transformed the gene expression data into a graph structure for semi-supervised learning and integrated protein interaction data with the gene expression data to select functionally-related gene pairs. Then, we predicted the recurrence of cancer by applying a regularization approach to the constructed graph containing both labeled and unlabeled nodes. The average improvement rate of accuracy for three different cancer datasets was 24.9% compared to existing supervised and semi-supervised methods. We performed functional enrichment on the gene networks used for learning. We identified that those gene networks are significantly associated with cancer-recurrence-related biological functions. Our algorithm was developed with standard C++ and is available in Linux and MS Windows formats in the STL library. The executable program is freely available at: http://embio.yonsei.ac.kr/~Park/ssl.php.

  8. Integrative gene network construction to analyze cancer recurrence using semi-supervised learning.

    Directory of Open Access Journals (Sweden)

    Chihyun Park

    Full Text Available BACKGROUND: The prognosis of cancer recurrence is an important research area in bioinformatics and is challenging due to the small sample sizes compared to the vast number of genes. There have been several attempts to predict cancer recurrence. Most studies employed a supervised approach, which uses only a few labeled samples. Semi-supervised learning can be a great alternative to solve this problem. There have been few attempts based on manifold assumptions to reveal the detailed roles of identified cancer genes in recurrence. RESULTS: In order to predict cancer recurrence, we proposed a novel semi-supervised learning algorithm based on a graph regularization approach. We transformed the gene expression data into a graph structure for semi-supervised learning and integrated protein interaction data with the gene expression data to select functionally-related gene pairs. Then, we predicted the recurrence of cancer by applying a regularization approach to the constructed graph containing both labeled and unlabeled nodes. CONCLUSIONS: The average improvement rate of accuracy for three different cancer datasets was 24.9% compared to existing supervised and semi-supervised methods. We performed functional enrichment on the gene networks used for learning. We identified that those gene networks are significantly associated with cancer-recurrence-related biological functions. Our algorithm was developed with standard C++ and is available in Linux and MS Windows formats in the STL library. The executable program is freely available at: http://embio.yonsei.ac.kr/~Park/ssl.php.

  9. Spiking neural networks for handwritten digit recognition-Supervised learning and network optimization.

    Science.gov (United States)

    Kulkarni, Shruti R; Rajendran, Bipin

    2018-07-01

    We demonstrate supervised learning in Spiking Neural Networks (SNNs) for the problem of handwritten digit recognition using the spike triggered Normalized Approximate Descent (NormAD) algorithm. Our network that employs neurons operating at sparse biological spike rates below 300Hz achieves a classification accuracy of 98.17% on the MNIST test database with four times fewer parameters compared to the state-of-the-art. We present several insights from extensive numerical experiments regarding optimization of learning parameters and network configuration to improve its accuracy. We also describe a number of strategies to optimize the SNN for implementation in memory and energy constrained hardware, including approximations in computing the neuronal dynamics and reduced precision in storing the synaptic weights. Experiments reveal that even with 3-bit synaptic weights, the classification accuracy of the designed SNN does not degrade beyond 1% as compared to the floating-point baseline. Further, the proposed SNN, which is trained based on the precise spike timing information outperforms an equivalent non-spiking artificial neural network (ANN) trained using back propagation, especially at low bit precision. Thus, our study shows the potential for realizing efficient neuromorphic systems that use spike based information encoding and learning for real-world applications. Copyright © 2018 Elsevier Ltd. All rights reserved.

  10. CLASSIFICATION OF TRAFFIC RELATED SHORT TEXTS TO ANALYSE ROAD PROBLEMS IN URBAN AREAS

    Directory of Open Access Journals (Sweden)

    A. M. M. Saldana-Perez

    2017-09-01

    Full Text Available The Volunteer Geographic Information (VGI can be used to understand the urban dynamics. In the classification of traffic related short texts to analyze road problems in urban areas, a VGI data analysis is done over a social media’s publications, in order to classify traffic events at big cities that modify the movement of vehicles and people through the roads, such as car accidents, traffic and closures. The classification of traffic events described in short texts is done by applying a supervised machine learning algorithm. In the approach users are considered as sensors which describe their surroundings and provide their geographic position at the social network. The posts are treated by a text mining process and classified into five groups. Finally, the classified events are grouped in a data corpus and geo-visualized in the study area, to detect the places with more vehicular problems.

  11. Classification of Traffic Related Short Texts to Analyse Road Problems in Urban Areas

    Science.gov (United States)

    Saldana-Perez, A. M. M.; Moreno-Ibarra, M.; Tores-Ruiz, M.

    2017-09-01

    The Volunteer Geographic Information (VGI) can be used to understand the urban dynamics. In the classification of traffic related short texts to analyze road problems in urban areas, a VGI data analysis is done over a social media's publications, in order to classify traffic events at big cities that modify the movement of vehicles and people through the roads, such as car accidents, traffic and closures. The classification of traffic events described in short texts is done by applying a supervised machine learning algorithm. In the approach users are considered as sensors which describe their surroundings and provide their geographic position at the social network. The posts are treated by a text mining process and classified into five groups. Finally, the classified events are grouped in a data corpus and geo-visualized in the study area, to detect the places with more vehicular problems.

  12. Seeing It All: Evaluating Supervised Machine Learning Methods for the Classification of Diverse Otariid Behaviours.

    Directory of Open Access Journals (Sweden)

    Monique A Ladds

    Full Text Available Constructing activity budgets for marine animals when they are at sea and cannot be directly observed is challenging, but recent advances in bio-logging technology offer solutions to this problem. Accelerometers can potentially identify a wide range of behaviours for animals based on unique patterns of acceleration. However, when analysing data derived from accelerometers, there are many statistical techniques available which when applied to different data sets produce different classification accuracies. We investigated a selection of supervised machine learning methods for interpreting behavioural data from captive otariids (fur seals and sea lions. We conducted controlled experiments with 12 seals, where their behaviours were filmed while they were wearing 3-axis accelerometers. From video we identified 26 behaviours that could be grouped into one of four categories (foraging, resting, travelling and grooming representing key behaviour states for wild seals. We used data from 10 seals to train four predictive classification models: stochastic gradient boosting (GBM, random forests, support vector machine using four different kernels and a baseline model: penalised logistic regression. We then took the best parameters from each model and cross-validated the results on the two seals unseen so far. We also investigated the influence of feature statistics (describing some characteristic of the seal, testing the models both with and without these. Cross-validation accuracies were lower than training accuracy, but the SVM with a polynomial kernel was still able to classify seal behaviour with high accuracy (>70%. Adding feature statistics improved accuracies across all models tested. Most categories of behaviour -resting, grooming and feeding-were all predicted with reasonable accuracy (52-81% by the SVM while travelling was poorly categorised (31-41%. These results show that model selection is important when classifying behaviour and that by using

  13. Robust head pose estimation via supervised manifold learning.

    Science.gov (United States)

    Wang, Chao; Song, Xubo

    2014-05-01

    Head poses can be automatically estimated using manifold learning algorithms, with the assumption that with the pose being the only variable, the face images should lie in a smooth and low-dimensional manifold. However, this estimation approach is challenging due to other appearance variations related to identity, head location in image, background clutter, facial expression, and illumination. To address the problem, we propose to incorporate supervised information (pose angles of training samples) into the process of manifold learning. The process has three stages: neighborhood construction, graph weight computation and projection learning. For the first two stages, we redefine inter-point distance for neighborhood construction as well as graph weight by constraining them with the pose angle information. For Stage 3, we present a supervised neighborhood-based linear feature transformation algorithm to keep the data points with similar pose angles close together but the data points with dissimilar pose angles far apart. The experimental results show that our method has higher estimation accuracy than the other state-of-art algorithms and is robust to identity and illumination variations. Copyright © 2014 Elsevier Ltd. All rights reserved.

  14. Medical X-ray Image Hierarchical Classification Using a Merging and Splitting Scheme in Feature Space.

    Science.gov (United States)

    Fesharaki, Nooshin Jafari; Pourghassem, Hossein

    2013-07-01

    Due to the daily mass production and the widespread variation of medical X-ray images, it is necessary to classify these for searching and retrieving proposes, especially for content-based medical image retrieval systems. In this paper, a medical X-ray image hierarchical classification structure based on a novel merging and splitting scheme and using shape and texture features is proposed. In the first level of the proposed structure, to improve the classification performance, similar classes with regard to shape contents are grouped based on merging measures and shape features into the general overlapped classes. In the next levels of this structure, the overlapped classes split in smaller classes based on the classification performance of combination of shape and texture features or texture features only. Ultimately, in the last levels, this procedure is also continued forming all the classes, separately. Moreover, to optimize the feature vector in the proposed structure, we use orthogonal forward selection algorithm according to Mahalanobis class separability measure as a feature selection and reduction algorithm. In other words, according to the complexity and inter-class distance of each class, a sub-space of the feature space is selected in each level and then a supervised merging and splitting scheme is applied to form the hierarchical classification. The proposed structure is evaluated on a database consisting of 2158 medical X-ray images of 18 classes (IMAGECLEF 2005 database) and accuracy rate of 93.6% in the last level of the hierarchical structure for an 18-class classification problem is obtained.

  15. Improving the Interpretability of Classification Rules Discovered by an Ant Colony Algorithm: Extended Results.

    Science.gov (United States)

    Otero, Fernando E B; Freitas, Alex A

    2016-01-01

    Most ant colony optimization (ACO) algorithms for inducing classification rules use a ACO-based procedure to create a rule in a one-at-a-time fashion. An improved search strategy has been proposed in the cAnt-Miner[Formula: see text] algorithm, where an ACO-based procedure is used to create a complete list of rules (ordered rules), i.e., the ACO search is guided by the quality of a list of rules instead of an individual rule. In this paper we propose an extension of the cAnt-Miner[Formula: see text] algorithm to discover a set of rules (unordered rules). The main motivations for this work are to improve the interpretation of individual rules by discovering a set of rules and to evaluate the impact on the predictive accuracy of the algorithm. We also propose a new measure to evaluate the interpretability of the discovered rules to mitigate the fact that the commonly used model size measure ignores how the rules are used to make a class prediction. Comparisons with state-of-the-art rule induction algorithms, support vector machines, and the cAnt-Miner[Formula: see text] producing ordered rules are also presented.

  16. A comparison of graph- and kernel-based -omics data integration algorithms for classifying complex traits.

    Science.gov (United States)

    Yan, Kang K; Zhao, Hongyu; Pang, Herbert

    2017-12-06

    High-throughput sequencing data are widely collected and analyzed in the study of complex diseases in quest of improving human health. Well-studied algorithms mostly deal with single data source, and cannot fully utilize the potential of these multi-omics data sources. In order to provide a holistic understanding of human health and diseases, it is necessary to integrate multiple data sources. Several algorithms have been proposed so far, however, a comprehensive comparison of data integration algorithms for classification of binary traits is currently lacking. In this paper, we focus on two common classes of integration algorithms, graph-based that depict relationships with subjects denoted by nodes and relationships denoted by edges, and kernel-based that can generate a classifier in feature space. Our paper provides a comprehensive comparison of their performance in terms of various measurements of classification accuracy and computation time. Seven different integration algorithms, including graph-based semi-supervised learning, graph sharpening integration, composite association network, Bayesian network, semi-definite programming-support vector machine (SDP-SVM), relevance vector machine (RVM) and Ada-boost relevance vector machine are compared and evaluated with hypertension and two cancer data sets in our study. In general, kernel-based algorithms create more complex models and require longer computation time, but they tend to perform better than graph-based algorithms. The performance of graph-based algorithms has the advantage of being faster computationally. The empirical results demonstrate that composite association network, relevance vector machine, and Ada-boost RVM are the better performers. We provide recommendations on how to choose an appropriate algorithm for integrating data from multiple sources.

  17. A Hybrid Supervised/Unsupervised Machine Learning Approach to Solar Flare Prediction

    Science.gov (United States)

    Benvenuto, Federico; Piana, Michele; Campi, Cristina; Massone, Anna Maria

    2018-01-01

    This paper introduces a novel method for flare forecasting, combining prediction accuracy with the ability to identify the most relevant predictive variables. This result is obtained by means of a two-step approach: first, a supervised regularization method for regression, namely, LASSO is applied, where a sparsity-enhancing penalty term allows the identification of the significance with which each data feature contributes to the prediction; then, an unsupervised fuzzy clustering technique for classification, namely, Fuzzy C-Means, is applied, where the regression outcome is partitioned through the minimization of a cost function and without focusing on the optimization of a specific skill score. This approach is therefore hybrid, since it combines supervised and unsupervised learning; realizes classification in an automatic, skill-score-independent way; and provides effective prediction performances even in the case of imbalanced data sets. Its prediction power is verified against NOAA Space Weather Prediction Center data, using as a test set, data in the range between 1996 August and 2010 December and as training set, data in the range between 1988 December and 1996 June. To validate the method, we computed several skill scores typically utilized in flare prediction and compared the values provided by the hybrid approach with the ones provided by several standard (non-hybrid) machine learning methods. The results showed that the hybrid approach performs classification better than all other supervised methods and with an effectiveness comparable to the one of clustering methods; but, in addition, it provides a reliable ranking of the weights with which the data properties contribute to the forecast.

  18. Classification as clustering: a Pareto cooperative-competitive GP approach.

    Science.gov (United States)

    McIntyre, Andrew R; Heywood, Malcolm I

    2011-01-01

    Intuitively population based algorithms such as genetic programming provide a natural environment for supporting solutions that learn to decompose the overall task between multiple individuals, or a team. This work presents a framework for evolving teams without recourse to prespecifying the number of cooperating individuals. To do so, each individual evolves a mapping to a distribution of outcomes that, following clustering, establishes the parameterization of a (Gaussian) local membership function. This gives individuals the opportunity to represent subsets of tasks, where the overall task is that of classification under the supervised learning domain. Thus, rather than each team member representing an entire class, individuals are free to identify unique subsets of the overall classification task. The framework is supported by techniques from evolutionary multiobjective optimization (EMO) and Pareto competitive coevolution. EMO establishes the basis for encouraging individuals to provide accurate yet nonoverlaping behaviors; whereas competitive coevolution provides the mechanism for scaling to potentially large unbalanced datasets. Benchmarking is performed against recent examples of nonlinear SVM classifiers over 12 UCI datasets with between 150 and 200,000 training instances. Solutions from the proposed coevolutionary multiobjective GP framework appear to provide a good balance between classification performance and model complexity, especially as the dataset instance count increases.

  19. Study of Image Analysis Algorithms for Segmentation, Feature Extraction and Classification of Cells

    Directory of Open Access Journals (Sweden)

    Margarita Gamarra

    2017-08-01

    Full Text Available Recent advances in microcopy and improvements in image processing algorithms have allowed the development of computer-assisted analytical approaches in cell identification. Several applications could be mentioned in this field: Cellular phenotype identification, disease detection and treatment, identifying virus entry in cells and virus classification; these applications could help to complement the opinion of medical experts. Although many surveys have been presented in medical image analysis, they focus mainly in tissues and organs and none of the surveys about image cells consider an analysis following the stages in the typical image processing: Segmentation, feature extraction and classification. The goal of this study is to provide comprehensive and critical analyses about the trends in each stage of cell image processing. In this paper, we present a literature survey about cell identification using different image processing techniques.

  20. A multimedia retrieval framework based on semi-supervised ranking and relevance feedback.

    Science.gov (United States)

    Yang, Yi; Nie, Feiping; Xu, Dong; Luo, Jiebo; Zhuang, Yueting; Pan, Yunhe

    2012-04-01

    We present a new framework for multimedia content analysis and retrieval which consists of two independent algorithms. First, we propose a new semi-supervised algorithm called ranking with Local Regression and Global Alignment (LRGA) to learn a robust Laplacian matrix for data ranking. In LRGA, for each data point, a local linear regression model is used to predict the ranking scores of its neighboring points. A unified objective function is then proposed to globally align the local models from all the data points so that an optimal ranking score can be assigned to each data point. Second, we propose a semi-supervised long-term Relevance Feedback (RF) algorithm to refine the multimedia data representation. The proposed long-term RF algorithm utilizes both the multimedia data distribution in multimedia feature space and the history RF information provided by users. A trace ratio optimization problem is then formulated and solved by an efficient algorithm. The algorithms have been applied to several content-based multimedia retrieval applications, including cross-media retrieval, image retrieval, and 3D motion/pose data retrieval. Comprehensive experiments on four data sets have demonstrated its advantages in precision, robustness, scalability, and computational efficiency.

  1. Comparison Of Semi-Automatic And Automatic Slick Detection Algorithms For Jiyeh Power Station Oil Spill, Lebanon

    Science.gov (United States)

    Osmanoglu, B.; Ozkan, C.; Sunar, F.

    2013-10-01

    After air strikes on July 14 and 15, 2006 the Jiyeh Power Station started leaking oil into the eastern Mediterranean Sea. The power station is located about 30 km south of Beirut and the slick covered about 170 km of coastline threatening the neighboring countries Turkey and Cyprus. Due to the ongoing conflict between Israel and Lebanon, cleaning efforts could not start immediately resulting in 12 000 to 15 000 tons of fuel oil leaking into the sea. In this paper we compare results from automatic and semi-automatic slick detection algorithms. The automatic detection method combines the probabilities calculated for each pixel from each image to obtain a joint probability, minimizing the adverse effects of atmosphere on oil spill detection. The method can readily utilize X-, C- and L-band data where available. Furthermore wind and wave speed observations can be used for a more accurate analysis. For this study, we utilize Envisat ASAR ScanSAR data. A probability map is generated based on the radar backscatter, effect of wind and dampening value. The semi-automatic algorithm is based on supervised classification. As a classifier, Artificial Neural Network Multilayer Perceptron (ANN MLP) classifier is used since it is more flexible and efficient than conventional maximum likelihood classifier for multisource and multi-temporal data. The learning algorithm for ANN MLP is chosen as the Levenberg-Marquardt (LM). Training and test data for supervised classification are composed from the textural information created from SAR images. This approach is semiautomatic because tuning the parameters of classifier and composing training data need a human interaction. We point out the similarities and differences between the two methods and their results as well as underlining their advantages and disadvantages. Due to the lack of ground truth data, we compare obtained results to each other, as well as other published oil slick area assessments.

  2. Classification of large-sized hyperspectral imagery using fast machine learning algorithms

    Science.gov (United States)

    Xia, Junshi; Yokoya, Naoto; Iwasaki, Akira

    2017-07-01

    We present a framework of fast machine learning algorithms in the context of large-sized hyperspectral images classification from the theoretical to a practical viewpoint. In particular, we assess the performance of random forest (RF), rotation forest (RoF), and extreme learning machine (ELM) and the ensembles of RF and ELM. These classifiers are applied to two large-sized hyperspectral images and compared to the support vector machines. To give the quantitative analysis, we pay attention to comparing these methods when working with high input dimensions and a limited/sufficient training set. Moreover, other important issues such as the computational cost and robustness against the noise are also discussed.

  3. Automated Quality Assessment of Structural Magnetic Resonance Brain Images Based on a Supervised Machine Learning Algorithm

    Directory of Open Access Journals (Sweden)

    Ricardo Andres Pizarro

    2016-12-01

    Full Text Available High-resolution three-dimensional magnetic resonance imaging (3D-MRI is being increasingly used to delineate morphological changes underlying neuropsychiatric disorders. Unfortunately, artifacts frequently compromise the utility of 3D-MRI yielding irreproducible results, from both type I and type II errors. It is therefore critical to screen 3D-MRIs for artifacts before use. Currently, quality assessment involves slice-wise visual inspection of 3D-MRI volumes, a procedure that is both subjective and time consuming. Automating the quality rating of 3D-MRI could improve the efficiency and reproducibility of the procedure. The present study is one of the first efforts to apply a support vector machine (SVM algorithm in the quality assessment of structural brain images, using global and region of interest (ROI automated image quality features developed in-house. SVM is a supervised machine-learning algorithm that can predict the category of test datasets based on the knowledge acquired from a learning dataset. The performance (accuracy of the automated SVM approach was assessed, by comparing the SVM-predicted quality labels to investigator-determined quality labels. The accuracy for classifying 1457 3D-MRI volumes from our database using the SVM approach is around 80%. These results are promising and illustrate the possibility of using SVM as an automated quality assessment tool for 3D-MRI.

  4. Automated Quality Assessment of Structural Magnetic Resonance Brain Images Based on a Supervised Machine Learning Algorithm.

    Science.gov (United States)

    Pizarro, Ricardo A; Cheng, Xi; Barnett, Alan; Lemaitre, Herve; Verchinski, Beth A; Goldman, Aaron L; Xiao, Ena; Luo, Qian; Berman, Karen F; Callicott, Joseph H; Weinberger, Daniel R; Mattay, Venkata S

    2016-01-01

    High-resolution three-dimensional magnetic resonance imaging (3D-MRI) is being increasingly used to delineate morphological changes underlying neuropsychiatric disorders. Unfortunately, artifacts frequently compromise the utility of 3D-MRI yielding irreproducible results, from both type I and type II errors. It is therefore critical to screen 3D-MRIs for artifacts before use. Currently, quality assessment involves slice-wise visual inspection of 3D-MRI volumes, a procedure that is both subjective and time consuming. Automating the quality rating of 3D-MRI could improve the efficiency and reproducibility of the procedure. The present study is one of the first efforts to apply a support vector machine (SVM) algorithm in the quality assessment of structural brain images, using global and region of interest (ROI) automated image quality features developed in-house. SVM is a supervised machine-learning algorithm that can predict the category of test datasets based on the knowledge acquired from a learning dataset. The performance (accuracy) of the automated SVM approach was assessed, by comparing the SVM-predicted quality labels to investigator-determined quality labels. The accuracy for classifying 1457 3D-MRI volumes from our database using the SVM approach is around 80%. These results are promising and illustrate the possibility of using SVM as an automated quality assessment tool for 3D-MRI.

  5. Examining the Capability of Supervised Machine Learning Classifiers in Extracting Flooded Areas from Landsat TM Imagery: A Case Study from a Mediterranean Flood

    Directory of Open Access Journals (Sweden)

    Gareth Ireland

    2015-03-01

    Full Text Available This study explored the capability of Support Vector Machines (SVMs and regularised kernel Fisher’s discriminant analysis (rkFDA machine learning supervised classifiers in extracting flooded area from optical Landsat TM imagery. The ability of both techniques was evaluated using a case study of a riverine flood event in 2010 in a heterogeneous Mediterranean region, for which TM imagery acquired shortly after the flood event was available. For the two classifiers, both linear and non-linear (kernel versions were utilised in their implementation. The ability of the different classifiers to map the flooded area extent was assessed on the basis of classification accuracy assessment metrics. Results showed that rkFDA outperformed SVMs in terms of accurate flooded pixels detection, also producing fewer missed detections of the flooded area. Yet, SVMs showed less false flooded area detections. Overall, the non-linear rkFDA classification method was the more accurate of the two techniques (OA = 96.23%, K = 0.877. Both methods outperformed the standard Normalized Difference Water Index (NDWI thresholding (OA = 94.63, K = 0.818 by roughly 0.06 K points. Although overall accuracy results for the rkFDA and SVMs classifications only showed a somewhat minor improvement on the overall accuracy exhibited by the NDWI thresholding, notably both classifiers considerably outperformed the thresholding algorithm in other specific accuracy measures (e.g. producer accuracy for the “not flooded” class was ~10.5% less accurate for the NDWI thresholding algorithm in comparison to the classifiers, and average per-class accuracy was ~5% less accurate than the machine learning models. This study provides evidence of the successful application of supervised machine learning for classifying flooded areas in Landsat imagery, where few studies so far exist in this direction. Considering that Landsat data is open access and has global coverage, the results of this study

  6. CLAss-Specific Subspace Kernel Representations and Adaptive Margin Slack Minimization for Large Scale Classification.

    Science.gov (United States)

    Yu, Yinan; Diamantaras, Konstantinos I; McKelvey, Tomas; Kung, Sun-Yuan

    2018-02-01

    In kernel-based classification models, given limited computational power and storage capacity, operations over the full kernel matrix becomes prohibitive. In this paper, we propose a new supervised learning framework using kernel models for sequential data processing. The framework is based on two components that both aim at enhancing the classification capability with a subset selection scheme. The first part is a subspace projection technique in the reproducing kernel Hilbert space using a CLAss-specific Subspace Kernel representation for kernel approximation. In the second part, we propose a novel structural risk minimization algorithm called the adaptive margin slack minimization to iteratively improve the classification accuracy by an adaptive data selection. We motivate each part separately, and then integrate them into learning frameworks for large scale data. We propose two such frameworks: the memory efficient sequential processing for sequential data processing and the parallelized sequential processing for distributed computing with sequential data acquisition. We test our methods on several benchmark data sets and compared with the state-of-the-art techniques to verify the validity of the proposed techniques.

  7. Correlation coefficient based supervised locally linear embedding for pulmonary nodule recognition.

    Science.gov (United States)

    Wu, Panpan; Xia, Kewen; Yu, Hengyong

    2016-11-01

    Dimensionality reduction techniques are developed to suppress the negative effects of high dimensional feature space of lung CT images on classification performance in computer aided detection (CAD) systems for pulmonary nodule detection. An improved supervised locally linear embedding (SLLE) algorithm is proposed based on the concept of correlation coefficient. The Spearman's rank correlation coefficient is introduced to adjust the distance metric in the SLLE algorithm to ensure that more suitable neighborhood points could be identified, and thus to enhance the discriminating power of embedded data. The proposed Spearman's rank correlation coefficient based SLLE (SC(2)SLLE) is implemented and validated in our pilot CAD system using a clinical dataset collected from the publicly available lung image database consortium and image database resource initiative (LICD-IDRI). Particularly, a representative CAD system for solitary pulmonary nodule detection is designed and implemented. After a sequential medical image processing steps, 64 nodules and 140 non-nodules are extracted, and 34 representative features are calculated. The SC(2)SLLE, as well as SLLE and LLE algorithm, are applied to reduce the dimensionality. Several quantitative measurements are also used to evaluate and compare the performances. Using a 5-fold cross-validation methodology, the proposed algorithm achieves 87.65% accuracy, 79.23% sensitivity, 91.43% specificity, and 8.57% false positive rate, on average. Experimental results indicate that the proposed algorithm outperforms the original locally linear embedding and SLLE coupled with the support vector machine (SVM) classifier. Based on the preliminary results from a limited number of nodules in our dataset, this study demonstrates the great potential to improve the performance of a CAD system for nodule detection using the proposed SC(2)SLLE. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  8. DuSK: A Dual Structure-preserving Kernel for Supervised Tensor Learning with Applications to Neuroimages.

    Science.gov (United States)

    He, Lifang; Kong, Xiangnan; Yu, Philip S; Ragin, Ann B; Hao, Zhifeng; Yang, Xiaowei

    With advances in data collection technologies, tensor data is assuming increasing prominence in many applications and the problem of supervised tensor learning has emerged as a topic of critical significance in the data mining and machine learning community. Conventional methods for supervised tensor learning mainly focus on learning kernels by flattening the tensor into vectors or matrices, however structural information within the tensors will be lost. In this paper, we introduce a new scheme to design structure-preserving kernels for supervised tensor learning. Specifically, we demonstrate how to leverage the naturally available structure within the tensorial representation to encode prior knowledge in the kernel. We proposed a tensor kernel that can preserve tensor structures based upon dual-tensorial mapping. The dual-tensorial mapping function can map each tensor instance in the input space to another tensor in the feature space while preserving the tensorial structure. Theoretically, our approach is an extension of the conventional kernels in the vector space to tensor space. We applied our novel kernel in conjunction with SVM to real-world tensor classification problems including brain fMRI classification for three different diseases ( i.e ., Alzheimer's disease, ADHD and brain damage by HIV). Extensive empirical studies demonstrate that our proposed approach can effectively boost tensor classification performances, particularly with small sample sizes.

  9. Transfer Kernel Common Spatial Patterns for Motor Imagery Brain-Computer Interface Classification

    Science.gov (United States)

    Dai, Mengxi; Liu, Shucong; Zhang, Pengju

    2018-01-01

    Motor-imagery-based brain-computer interfaces (BCIs) commonly use the common spatial pattern (CSP) as preprocessing step before classification. The CSP method is a supervised algorithm. Therefore a lot of time-consuming training data is needed to build the model. To address this issue, one promising approach is transfer learning, which generalizes a learning model can extract discriminative information from other subjects for target classification task. To this end, we propose a transfer kernel CSP (TKCSP) approach to learn a domain-invariant kernel by directly matching distributions of source subjects and target subjects. The dataset IVa of BCI Competition III is used to demonstrate the validity by our proposed methods. In the experiment, we compare the classification performance of the TKCSP against CSP, CSP for subject-to-subject transfer (CSP SJ-to-SJ), regularizing CSP (RCSP), stationary subspace CSP (ssCSP), multitask CSP (mtCSP), and the combined mtCSP and ssCSP (ss + mtCSP) method. The results indicate that the superior mean classification performance of TKCSP can achieve 81.14%, especially in case of source subjects with fewer number of training samples. Comprehensive experimental evidence on the dataset verifies the effectiveness and efficiency of the proposed TKCSP approach over several state-of-the-art methods. PMID:29743934

  10. Comparison of classification algorithms for various methods of preprocessing radar images of the MSTAR base

    Science.gov (United States)

    Borodinov, A. A.; Myasnikov, V. V.

    2018-04-01

    The present work is devoted to comparing the accuracy of the known qualification algorithms in the task of recognizing local objects on radar images for various image preprocessing methods. Preprocessing involves speckle noise filtering and normalization of the object orientation in the image by the method of image moments and by a method based on the Hough transform. In comparison, the following classification algorithms are used: Decision tree; Support vector machine, AdaBoost, Random forest. The principal component analysis is used to reduce the dimension. The research is carried out on the objects from the base of radar images MSTAR. The paper presents the results of the conducted studies.

  11. Classification of Ultrasonic NDE Signals Using the Expectation Maximization (EM) and Least Mean Square (LMS) Algorithms

    International Nuclear Information System (INIS)

    Kim, Dae Won

    2005-01-01

    Ultrasonic inspection methods are widely used for detecting flaws in materials. The signal analysis step plays a crucial part in the data interpretation process. A number of signal processing methods have been proposed to classify ultrasonic flaw signals. One of the more popular methods involves the extraction of an appropriate set of features followed by the use of a neural network for the classification of the signals in the feature spare. This paper describes an alternative approach which uses the least mean square (LMS) method and exportation maximization (EM) algorithm with the model based deconvolution which is employed for classifying nondestructive evaluation (NDE) signals from steam generator tubes in a nuclear power plant. The signals due to cracks and deposits are not significantly different. These signals must be discriminated to prevent from happening a huge disaster such as contamination of water or explosion. A model based deconvolution has been described to facilitate comparison of classification results. The method uses the space alternating generalized expectation maximiBation (SAGE) algorithm ill conjunction with the Newton-Raphson method which uses the Hessian parameter resulting in fast convergence to estimate the time of flight and the distance between the tube wall and the ultrasonic sensor. Results using these schemes for the classification of ultrasonic signals from cracks and deposits within steam generator tubes are presented and showed a reasonable performances

  12. Combination of laser-induced breakdown spectroscopy and Raman spectroscopy for multivariate classification of bacteria

    Science.gov (United States)

    Prochazka, D.; Mazura, M.; Samek, O.; Rebrošová, K.; Pořízka, P.; Klus, J.; Prochazková, P.; Novotný, J.; Novotný, K.; Kaiser, J.

    2018-01-01

    In this work, we investigate the impact of data provided by complementary laser-based spectroscopic methods on multivariate classification accuracy. Discrimination and classification of five Staphylococcus bacterial strains and one strain of Escherichia coli is presented. The technique that we used for measurements is a combination of Raman spectroscopy and Laser-Induced Breakdown Spectroscopy (LIBS). Obtained spectroscopic data were then processed using Multivariate Data Analysis algorithms. Principal Components Analysis (PCA) was selected as the most suitable technique for visualization of bacterial strains data. To classify the bacterial strains, we used Neural Networks, namely a supervised version of Kohonen's self-organizing maps (SOM). We were processing results in three different ways - separately from LIBS measurements, from Raman measurements, and we also merged data from both mentioned methods. The three types of results were then compared. By applying the PCA to Raman spectroscopy data, we observed that two bacterial strains were fully distinguished from the rest of the data set. In the case of LIBS data, three bacterial strains were fully discriminated. Using a combination of data from both methods, we achieved the complete discrimination of all bacterial strains. All the data were classified with a high success rate using SOM algorithm. The most accurate classification was obtained using a combination of data from both techniques. The classification accuracy varied, depending on specific samples and techniques. As for LIBS, the classification accuracy ranged from 45% to 100%, as for Raman Spectroscopy from 50% to 100% and in case of merged data, all samples were classified correctly. Based on the results of the experiments presented in this work, we can assume that the combination of Raman spectroscopy and LIBS significantly enhances discrimination and classification accuracy of bacterial species and strains. The reason is the complementarity in

  13. Voice based gender classification using machine learning

    Science.gov (United States)

    Raahul, A.; Sapthagiri, R.; Pankaj, K.; Vijayarajan, V.

    2017-11-01

    Gender identification is one of the major problem speech analysis today. Tracing the gender from acoustic data i.e., pitch, median, frequency etc. Machine learning gives promising results for classification problem in all the research domains. There are several performance metrics to evaluate algorithms of an area. Our Comparative model algorithm for evaluating 5 different machine learning algorithms based on eight different metrics in gender classification from acoustic data. Agenda is to identify gender, with five different algorithms: Linear Discriminant Analysis (LDA), K-Nearest Neighbour (KNN), Classification and Regression Trees (CART), Random Forest (RF), and Support Vector Machine (SVM) on basis of eight different metrics. The main parameter in evaluating any algorithms is its performance. Misclassification rate must be less in classification problems, which says that the accuracy rate must be high. Location and gender of the person have become very crucial in economic markets in the form of AdSense. Here with this comparative model algorithm, we are trying to assess the different ML algorithms and find the best fit for gender classification of acoustic data.

  14. Hybrid Optimization of Object-Based Classification in High-Resolution Images Using Continous ANT Colony Algorithm with Emphasis on Building Detection

    Science.gov (United States)

    Tamimi, E.; Ebadi, H.; Kiani, A.

    2017-09-01

    Automatic building detection from High Spatial Resolution (HSR) images is one of the most important issues in Remote Sensing (RS). Due to the limited number of spectral bands in HSR images, using other features will lead to improve accuracy. By adding these features, the presence probability of dependent features will be increased, which leads to accuracy reduction. In addition, some parameters should be determined in Support Vector Machine (SVM) classification. Therefore, it is necessary to simultaneously determine classification parameters and select independent features according to image type. Optimization algorithm is an efficient method to solve this problem. On the other hand, pixel-based classification faces several challenges such as producing salt-paper results and high computational time in high dimensional data. Hence, in this paper, a novel method is proposed to optimize object-based SVM classification by applying continuous Ant Colony Optimization (ACO) algorithm. The advantages of the proposed method are relatively high automation level, independency of image scene and type, post processing reduction for building edge reconstruction and accuracy improvement. The proposed method was evaluated by pixel-based SVM and Random Forest (RF) classification in terms of accuracy. In comparison with optimized pixel-based SVM classification, the results showed that the proposed method improved quality factor and overall accuracy by 17% and 10%, respectively. Also, in the proposed method, Kappa coefficient was improved by 6% rather than RF classification. Time processing of the proposed method was relatively low because of unit of image analysis (image object). These showed the superiority of the proposed method in terms of time and accuracy.

  15. Supervised machine learning algorithms to diagnose stress for vehicle drivers based on physiological sensor signals.

    Science.gov (United States)

    Barua, Shaibal; Begum, Shahina; Ahmed, Mobyen Uddin

    2015-01-01

    Machine learning algorithms play an important role in computer science research. Recent advancement in sensor data collection in clinical sciences lead to a complex, heterogeneous data processing, and analysis for patient diagnosis and prognosis. Diagnosis and treatment of patients based on manual analysis of these sensor data are difficult and time consuming. Therefore, development of Knowledge-based systems to support clinicians in decision-making is important. However, it is necessary to perform experimental work to compare performances of different machine learning methods to help to select appropriate method for a specific characteristic of data sets. This paper compares classification performance of three popular machine learning methods i.e., case-based reasoning, neutral networks and support vector machine to diagnose stress of vehicle drivers using finger temperature and heart rate variability. The experimental results show that case-based reasoning outperforms other two methods in terms of classification accuracy. Case-based reasoning has achieved 80% and 86% accuracy to classify stress using finger temperature and heart rate variability. On contrary, both neural network and support vector machine have achieved less than 80% accuracy by using both physiological signals.

  16. Generalized SMO algorithm for SVM-based multitask learning.

    Science.gov (United States)

    Cai, Feng; Cherkassky, Vladimir

    2012-06-01

    Exploiting additional information to improve traditional inductive learning is an active research area in machine learning. In many supervised-learning applications, training data can be naturally separated into several groups, and incorporating this group information into learning may improve generalization. Recently, Vapnik proposed a general approach to formalizing such problems, known as "learning with structured data" and its support vector machine (SVM) based optimization formulation called SVM+. Liang and Cherkassky showed the connection between SVM+ and multitask learning (MTL) approaches in machine learning, and proposed an SVM-based formulation for MTL called SVM+MTL for classification. Training the SVM+MTL classifier requires the solution of a large quadratic programming optimization problem which scales as O(n(3)) with sample size n. So there is a need to develop computationally efficient algorithms for implementing SVM+MTL. This brief generalizes Platt's sequential minimal optimization (SMO) algorithm to the SVM+MTL setting. Empirical results show that, for typical SVM+MTL problems, the proposed generalized SMO achieves over 100 times speed-up, in comparison with general-purpose optimization routines.

  17. Automatic Classification of Sub-Techniques in Classical Cross-Country Skiing Using a Machine Learning Algorithm on Micro-Sensor Data

    Directory of Open Access Journals (Sweden)

    Ole Marius Hoel Rindal

    2017-12-01

    Full Text Available The automatic classification of sub-techniques in classical cross-country skiing provides unique possibilities for analyzing the biomechanical aspects of outdoor skiing. This is currently possible due to the miniaturization and flexibility of wearable inertial measurement units (IMUs that allow researchers to bring the laboratory to the field. In this study, we aimed to optimize the accuracy of the automatic classification of classical cross-country skiing sub-techniques by using two IMUs attached to the skier’s arm and chest together with a machine learning algorithm. The novelty of our approach is the reliable detection of individual cycles using a gyroscope on the skier’s arm, while a neural network machine learning algorithm robustly classifies each cycle to a sub-technique using sensor data from an accelerometer on the chest. In this study, 24 datasets from 10 different participants were separated into the categories training-, validation- and test-data. Overall, we achieved a classification accuracy of 93.9% on the test-data. Furthermore, we illustrate how an accurate classification of sub-techniques can be combined with data from standard sports equipment including position, altitude, speed and heart rate measuring systems. Combining this information has the potential to provide novel insight into physiological and biomechanical aspects valuable to coaches, athletes and researchers.

  18. Classification and Weakly Supervised Pain Localization using Multiple Segment Representation.

    Science.gov (United States)

    Sikka, Karan; Dhall, Abhinav; Bartlett, Marian Stewart

    2014-10-01

    Automatic pain recognition from videos is a vital clinical application and, owing to its spontaneous nature, poses interesting challenges to automatic facial expression recognition (AFER) research. Previous pain vs no-pain systems have highlighted two major challenges: (1) ground truth is provided for the sequence, but the presence or absence of the target expression for a given frame is unknown, and (2) the time point and the duration of the pain expression event(s) in each video are unknown. To address these issues we propose a novel framework (referred to as MS-MIL) where each sequence is represented as a bag containing multiple segments, and multiple instance learning (MIL) is employed to handle this weakly labeled data in the form of sequence level ground-truth. These segments are generated via multiple clustering of a sequence or running a multi-scale temporal scanning window, and are represented using a state-of-the-art Bag of Words (BoW) representation. This work extends the idea of detecting facial expressions through 'concept frames' to 'concept segments' and argues through extensive experiments that algorithms such as MIL are needed to reap the benefits of such representation. The key advantages of our approach are: (1) joint detection and localization of painful frames using only sequence-level ground-truth, (2) incorporation of temporal dynamics by representing the data not as individual frames but as segments, and (3) extraction of multiple segments, which is well suited to signals with uncertain temporal location and duration in the video. Extensive experiments on UNBC-McMaster Shoulder Pain dataset highlight the effectiveness of the approach by achieving competitive results on both tasks of pain classification and localization in videos. We also empirically evaluate the contributions of different components of MS-MIL. The paper also includes the visualization of discriminative facial patches, important for pain detection, as discovered by our

  19. Application of multiple signal classification algorithm to frequency estimation in coherent dual-frequency lidar

    Science.gov (United States)

    Li, Ruixiao; Li, Kun; Zhao, Changming

    2018-01-01

    Coherent dual-frequency Lidar (CDFL) is a new development of Lidar which dramatically enhances the ability to decrease the influence of atmospheric interference by using dual-frequency laser to measure the range and velocity with high precision. Based on the nature of CDFL signals, we propose to apply the multiple signal classification (MUSIC) algorithm in place of the fast Fourier transform (FFT) to estimate the phase differences in dual-frequency Lidar. In the presence of Gaussian white noise, the simulation results show that the signal peaks are more evident when using MUSIC algorithm instead of FFT in condition of low signal-noise-ratio (SNR), which helps to improve the precision of detection on range and velocity, especially for the long distance measurement systems.

  20. Support Vector Machines Trained with Evolutionary Algorithms Employing Kernel Adatron for Large Scale Classification of Protein Structures.

    Science.gov (United States)

    Arana-Daniel, Nancy; Gallegos, Alberto A; López-Franco, Carlos; Alanís, Alma Y; Morales, Jacob; López-Franco, Adriana

    2016-01-01

    With the increasing power of computers, the amount of data that can be processed in small periods of time has grown exponentially, as has the importance of classifying large-scale data efficiently. Support vector machines have shown good results classifying large amounts of high-dimensional data, such as data generated by protein structure prediction, spam recognition, medical diagnosis, optical character recognition and text classification, etc. Most state of the art approaches for large-scale learning use traditional optimization methods, such as quadratic programming or gradient descent, which makes the use of evolutionary algorithms for training support vector machines an area to be explored. The present paper proposes an approach that is simple to implement based on evolutionary algorithms and Kernel-Adatron for solving large-scale classification problems, focusing on protein structure prediction. The functional properties of proteins depend upon their three-dimensional structures. Knowing the structures of proteins is crucial for biology and can lead to improvements in areas such as medicine, agriculture and biofuels.

  1. Classification Model for Forest Fire Hotspot Occurrences Prediction Using ANFIS Algorithm

    Science.gov (United States)

    Wijayanto, A. K.; Sani, O.; Kartika, N. D.; Herdiyeni, Y.

    2017-01-01

    This study proposed the application of data mining technique namely Adaptive Neuro-Fuzzy inference system (ANFIS) on forest fires hotspot data to develop classification models for hotspots occurrence in Central Kalimantan. Hotspot is a point that is indicated as the location of fires. In this study, hotspot distribution is categorized as true alarm and false alarm. ANFIS is a soft computing method in which a given inputoutput data set is expressed in a fuzzy inference system (FIS). The FIS implements a nonlinear mapping from its input space to the output space. The method of this study classified hotspots as target objects by correlating spatial attributes data using three folds in ANFIS algorithm to obtain the best model. The best result obtained from the 3rd fold provided low error for training (error = 0.0093676) and also low error testing result (error = 0.0093676). Attribute of distance to road is the most determining factor that influences the probability of true and false alarm where the level of human activities in this attribute is higher. This classification model can be used to develop early warning system of forest fire.

  2. Automated supervised classification of variable stars. I. Methodology

    NARCIS (Netherlands)

    Debosscher, J.; Sarro, L.M.; Aerts, C.C.; Cuypers, J.; Vandenbussche, B.; Garrido, R.; Solano, E.

    2007-01-01

    Context: The fast classification of new variable stars is an important step in making them available for further research. Selection of science targets from large databases is much more efficient if they have been classified first. Defining the classes in terms of physical parameters is also

  3. A Novel User Classification Method for Femtocell Network by Using Affinity Propagation Algorithm and Artificial Neural Network

    Directory of Open Access Journals (Sweden)

    Afaz Uddin Ahmed

    2014-01-01

    Full Text Available An artificial neural network (ANN and affinity propagation (AP algorithm based user categorization technique is presented. The proposed algorithm is designed for closed access femtocell network. ANN is used for user classification process and AP algorithm is used to optimize the ANN training process. AP selects the best possible training samples for faster ANN training cycle. The users are distinguished by using the difference of received signal strength in a multielement femtocell device. A previously developed directive microstrip antenna is used to configure the femtocell device. Simulation results show that, for a particular house pattern, the categorization technique without AP algorithm takes 5 indoor users and 10 outdoor users to attain an error-free operation. While integrating AP algorithm with ANN, the system takes 60% less training samples reducing the training time up to 50%. This procedure makes the femtocell more effective for closed access operation.

  4. A Novel User Classification Method for Femtocell Network by Using Affinity Propagation Algorithm and Artificial Neural Network

    Science.gov (United States)

    Ahmed, Afaz Uddin; Tariqul Islam, Mohammad; Ismail, Mahamod; Kibria, Salehin; Arshad, Haslina

    2014-01-01

    An artificial neural network (ANN) and affinity propagation (AP) algorithm based user categorization technique is presented. The proposed algorithm is designed for closed access femtocell network. ANN is used for user classification process and AP algorithm is used to optimize the ANN training process. AP selects the best possible training samples for faster ANN training cycle. The users are distinguished by using the difference of received signal strength in a multielement femtocell device. A previously developed directive microstrip antenna is used to configure the femtocell device. Simulation results show that, for a particular house pattern, the categorization technique without AP algorithm takes 5 indoor users and 10 outdoor users to attain an error-free operation. While integrating AP algorithm with ANN, the system takes 60% less training samples reducing the training time up to 50%. This procedure makes the femtocell more effective for closed access operation. PMID:25133214

  5. An automated land-use mapping comparison of the Bayesian maximum likelihood and linear discriminant analysis algorithms

    Science.gov (United States)

    Tom, C. H.; Miller, L. D.

    1984-01-01

    The Bayesian maximum likelihood parametric classifier has been tested against the data-based formulation designated 'linear discrimination analysis', using the 'GLIKE' decision and "CLASSIFY' classification algorithms in the Landsat Mapping System. Identical supervised training sets, USGS land use/land cover classes, and various combinations of Landsat image and ancilliary geodata variables, were used to compare the algorithms' thematic mapping accuracy on a single-date summer subscene, with a cellularized USGS land use map of the same time frame furnishing the ground truth reference. CLASSIFY, which accepts a priori class probabilities, is found to be more accurate than GLIKE, which assumes equal class occurrences, for all three mapping variable sets and both levels of detail. These results may be generalized to direct accuracy, time, cost, and flexibility advantages of linear discriminant analysis over Bayesian methods.

  6. Machine-learning methods in the classification of water bodies

    Directory of Open Access Journals (Sweden)

    Sołtysiak Marek

    2016-06-01

    Full Text Available Amphibian species have been considered as useful ecological indicators. They are used as indicators of environmental contamination, ecosystem health and habitat quality., Amphibian species are sensitive to changes in the aquatic environment and therefore, may form the basis for the classification of water bodies. Water bodies in which there are a large number of amphibian species are especially valuable even if they are located in urban areas. The automation of the classification process allows for a faster evaluation of the presence of amphibian species in the water bodies. Three machine-learning methods (artificial neural networks, decision trees and the k-nearest neighbours algorithm have been used to classify water bodies in Chorzów – one of 19 cities in the Upper Silesia Agglomeration. In this case, classification is a supervised data mining method consisting of several stages such as building the model, the testing phase and the prediction. Seven natural and anthropogenic features of water bodies (e.g. the type of water body, aquatic plants, the purpose of the water body (destination, position of the water body in relation to any possible buildings, condition of the water body, the degree of littering, the shore type and fishing activities have been taken into account in the classification. The data set used in this study involved information about 71 different water bodies and 9 amphibian species living in them. The results showed that the best average classification accuracy was obtained with the multilayer perceptron neural network.

  7. Application of Musical Information Retrieval (MIR Techniques to Seismic Facies Classification. Examples in Hydrocarbon Exploration

    Directory of Open Access Journals (Sweden)

    Paolo Dell’Aversana

    2016-12-01

    Full Text Available In this paper, we introduce a novel approach for automatic pattern recognition and classification of geophysical data based on digital music technology. We import and apply in the geophysical domain the same approaches commonly used for Musical Information Retrieval (MIR. After accurate conversion from geophysical formats (example: SEG-Y to musical formats (example: Musical Instrument Digital Interface, or briefly MIDI, we extract musical features from the converted data. These can be single-valued attributes, such as pitch and sound intensity, or multi-valued attributes, such as pitch histograms, melodic, harmonic and rhythmic paths. Using a real data set, we show that these musical features can be diagnostic for seismic facies classification in a complex exploration area. They can be complementary with respect to “conventional” seismic attributes. Using a supervised machine learning approach based on the k-Nearest Neighbors algorithm and on Automatic Neural Networks, we classify three gas-bearing channels. The good performance of our classification approach is confirmed by borehole data available in the same area.

  8. Restricted Boltzmann machines based oversampling and semi-supervised learning for false positive reduction in breast CAD.

    Science.gov (United States)

    Cao, Peng; Liu, Xiaoli; Bao, Hang; Yang, Jinzhu; Zhao, Dazhe

    2015-01-01

    The false-positive reduction (FPR) is a crucial step in the computer aided detection system for the breast. The issues of imbalanced data distribution and the limitation of labeled samples complicate the classification procedure. To overcome these challenges, we propose oversampling and semi-supervised learning methods based on the restricted Boltzmann machines (RBMs) to solve the classification of imbalanced data with a few labeled samples. To evaluate the proposed method, we conducted a comprehensive performance study and compared its results with the commonly used techniques. Experiments on benchmark dataset of DDSM demonstrate the effectiveness of the RBMs based oversampling and semi-supervised learning method in terms of geometric mean (G-mean) for false positive reduction in Breast CAD.

  9. Penalized feature selection and classification in bioinformatics

    OpenAIRE

    Ma, Shuangge; Huang, Jian

    2008-01-01

    In bioinformatics studies, supervised classification with high-dimensional input variables is frequently encountered. Examples routinely arise in genomic, epigenetic and proteomic studies. Feature selection can be employed along with classifier construction to avoid over-fitting, to generate more reliable classifier and to provide more insights into the underlying causal relationships. In this article, we provide a review of several recently developed penalized feature selection and classific...

  10. Data Transformation Functions for Expanded Search Spaces in Geographic Sample Supervised Segment Generation

    OpenAIRE

    Christoff Fourie; Elisabeth Schoepfer

    2014-01-01

    Sample supervised image analysis, in particular sample supervised segment generation, shows promise as a methodological avenue applicable within Geographic Object-Based Image Analysis (GEOBIA). Segmentation is acknowledged as a constituent component within typically expansive image analysis processes. A general extension to the basic formulation of an empirical discrepancy measure directed segmentation algorithm parameter tuning approach is proposed. An expanded search landscape is defined, c...

  11. Feature Selection Method Based on Artificial Bee Colony Algorithm and Support Vector Machines for Medical Datasets Classification

    Directory of Open Access Journals (Sweden)

    Mustafa Serter Uzer

    2013-01-01

    Full Text Available This paper offers a hybrid approach that uses the artificial bee colony (ABC algorithm for feature selection and support vector machines for classification. The purpose of this paper is to test the effect of elimination of the unimportant and obsolete features of the datasets on the success of the classification, using the SVM classifier. The developed approach conventionally used in liver diseases and diabetes diagnostics, which are commonly observed and reduce the quality of life, is developed. For the diagnosis of these diseases, hepatitis, liver disorders and diabetes datasets from the UCI database were used, and the proposed system reached a classification accuracies of 94.92%, 74.81%, and 79.29%, respectively. For these datasets, the classification accuracies were obtained by the help of the 10-fold cross-validation method. The results show that the performance of the method is highly successful compared to other results attained and seems very promising for pattern recognition applications.

  12. Reproducibility of measurements and variability of the classification algorithm of Stratus OCT in normal, hypertensive, and glaucomatous patients

    Directory of Open Access Journals (Sweden)

    Alfonso Antón

    2009-01-01

    Full Text Available Alfonso Antón1,2,3, Marta Castany1,2, Marta Pazos-Lopez1,2, Ruben Cuadrado3, Ana Flores3, Miguel Castilla11Hospital de la Esperanza-Hospital del Mar (IMAS, Barcelona, Spain; 2Institut Català de la Retina (ICR, Barcelona, Spain. Glaucoma Department; 3Instituto Universitario de Oftalmobiología Aplicada (IOBA, Universidad de Valladolid, Valladolid, EspañaPurpose: To assess the reproducibility of retinal nerve fiber layer (RNFL measurements and the variability of the probabilistic classification algorithm in normal, hypertensive and glaucomatous eyes using Stratus optical coherence tomography (OCT.Methods: Forty-nine eyes (13 normal, 17 ocular hypertensive [OHT] and 19 glaucomatous of 49 subjects were included in this study. RNFL was determined with Stratus OCT using the standard protocol RNFL thickness 3.4. Three different images of each eye were taken consecutively during the same session. To evaluate OCT reproducibility, coefficient of variation (COV and intraclass correlation coefficient (ICC were calculated for average thickness (AvgT, superior average thickness (Savg, and inferior average thickness (Iavg parameters. The variability of the results of the probabilistic classification algorithm, based on the OCT normative database, was also analyzed. The percentage of eyes with changes in the category assigned was calculated for each group.Results: The 50th percentile of COV was 2.96%, 4.00%, and 4.31% for AvgT, Savg, and Iavg, respectively. Glaucoma group presented the largest COV for all three parameters (3.87%, 5.55%, 7.82%. ICC were greater than 0.75 for almost all measures (except from the inferior thickness parameter in the normal group; ICC = 0.64, 95% CI 0.334–0.857. Regarding the probabilistic classification algorithm for the three parameters (AvgT, Savg, Iavg, the percentage of eyes without color-code category changes among the three images was as follows: normal group, 100%, 84.6% and 92%; OHT group, 89.5%, 52.7%, 79%; and

  13. HYBRID OPTIMIZATION OF OBJECT-BASED CLASSIFICATION IN HIGH-RESOLUTION IMAGES USING CONTINOUS ANT COLONY ALGORITHM WITH EMPHASIS ON BUILDING DETECTION

    Directory of Open Access Journals (Sweden)

    E. Tamimi

    2017-09-01

    Full Text Available Automatic building detection from High Spatial Resolution (HSR images is one of the most important issues in Remote Sensing (RS. Due to the limited number of spectral bands in HSR images, using other features will lead to improve accuracy. By adding these features, the presence probability of dependent features will be increased, which leads to accuracy reduction. In addition, some parameters should be determined in Support Vector Machine (SVM classification. Therefore, it is necessary to simultaneously determine classification parameters and select independent features according to image type. Optimization algorithm is an efficient method to solve this problem. On the other hand, pixel-based classification faces several challenges such as producing salt-paper results and high computational time in high dimensional data. Hence, in this paper, a novel method is proposed to optimize object-based SVM classification by applying continuous Ant Colony Optimization (ACO algorithm. The advantages of the proposed method are relatively high automation level, independency of image scene and type, post processing reduction for building edge reconstruction and accuracy improvement. The proposed method was evaluated by pixel-based SVM and Random Forest (RF classification in terms of accuracy. In comparison with optimized pixel-based SVM classification, the results showed that the proposed method improved quality factor and overall accuracy by 17% and 10%, respectively. Also, in the proposed method, Kappa coefficient was improved by 6% rather than RF classification. Time processing of the proposed method was relatively low because of unit of image analysis (image object. These showed the superiority of the proposed method in terms of time and accuracy.

  14. Asynchronous data-driven classification of weapon systems

    International Nuclear Information System (INIS)

    Jin, Xin; Mukherjee, Kushal; Gupta, Shalabh; Ray, Asok; Phoha, Shashi; Damarla, Thyagaraju

    2009-01-01

    This communication addresses real-time weapon classification by analysis of asynchronous acoustic data, collected from microphones on a sensor network. The weapon classification algorithm consists of two parts: (i) feature extraction from time-series data using symbolic dynamic filtering (SDF), and (ii) pattern classification based on the extracted features using the language measure (LM) and support vector machine (SVM). The proposed algorithm has been tested on field data, generated by firing of two types of rifles. The results of analysis demonstrate high accuracy and fast execution of the pattern classification algorithm with low memory requirements. Potential applications include simultaneous shooter localization and weapon classification with soldier-wearable networked sensors. (rapid communication)

  15. Detection of Erroneous Payments Utilizing Supervised And Unsupervised Data Mining Techniques

    National Research Council Canada - National Science Library

    Yanik, Todd

    2004-01-01

    ... (C&RT)) modeling algorithms. S-Plus software was used to construct a supervised model of vendor payment data using Logistic Regression, along with the Hosmer-Lemeshow Test, for testing the predictive ability of the model...

  16. Transfer learning improves supervised image segmentation across imaging protocols

    DEFF Research Database (Denmark)

    van Opbroek, Annegreet; Ikram, M. Arfan; Vernooij, Meike W.

    2015-01-01

    with slightly different characteristics. The performance of the four transfer classifiers was compared to that of standard supervised classification on two MRI brain-segmentation tasks with multi-site data: white matter, gray matter, and CSF segmentation; and white-matter- /MS-lesion segmentation......The variation between images obtained with different scanners or different imaging protocols presents a major challenge in automatic segmentation of biomedical images. This variation especially hampers the application of otherwise successful supervised-learning techniques which, in order to perform...... well, often require a large amount of labeled training data that is exactly representative of the target data. We therefore propose to use transfer learning for image segmentation. Transfer-learning techniques can cope with differences in distributions between training and target data, and therefore...

  17. A bifurcation identifier for IV-OCT using orthogonal least squares and supervised machine learning.

    Science.gov (United States)

    Macedo, Maysa M G; Guimarães, Welingson V N; Galon, Micheli Z; Takimura, Celso K; Lemos, Pedro A; Gutierrez, Marco Antonio

    2015-12-01

    Intravascular optical coherence tomography (IV-OCT) is an in-vivo imaging modality based on the intravascular introduction of a catheter which provides a view of the inner wall of blood vessels with a spatial resolution of 10-20 μm. Recent studies in IV-OCT have demonstrated the importance of the bifurcation regions. Therefore, the development of an automated tool to classify hundreds of coronary OCT frames as bifurcation or nonbifurcation can be an important step to improve automated methods for atherosclerotic plaques quantification, stent analysis and co-registration between different modalities. This paper describes a fully automated method to identify IV-OCT frames in bifurcation regions. The method is divided into lumen detection; feature extraction; and classification, providing a lumen area quantification, geometrical features of the cross-sectional lumen and labeled slices. This classification method is a combination of supervised machine learning algorithms and feature selection using orthogonal least squares methods. Training and tests were performed in sets with a maximum of 1460 human coronary OCT frames. The lumen segmentation achieved a mean difference of lumen area of 0.11 mm(2) compared with manual segmentation, and the AdaBoost classifier presented the best result reaching a F-measure score of 97.5% using 104 features. Copyright © 2015 Elsevier Ltd. All rights reserved.

  18. DuSK: A Dual Structure-preserving Kernel for Supervised Tensor Learning with Applications to Neuroimages

    Science.gov (United States)

    He, Lifang; Kong, Xiangnan; Yu, Philip S.; Ragin, Ann B.; Hao, Zhifeng; Yang, Xiaowei

    2015-01-01

    With advances in data collection technologies, tensor data is assuming increasing prominence in many applications and the problem of supervised tensor learning has emerged as a topic of critical significance in the data mining and machine learning community. Conventional methods for supervised tensor learning mainly focus on learning kernels by flattening the tensor into vectors or matrices, however structural information within the tensors will be lost. In this paper, we introduce a new scheme to design structure-preserving kernels for supervised tensor learning. Specifically, we demonstrate how to leverage the naturally available structure within the tensorial representation to encode prior knowledge in the kernel. We proposed a tensor kernel that can preserve tensor structures based upon dual-tensorial mapping. The dual-tensorial mapping function can map each tensor instance in the input space to another tensor in the feature space while preserving the tensorial structure. Theoretically, our approach is an extension of the conventional kernels in the vector space to tensor space. We applied our novel kernel in conjunction with SVM to real-world tensor classification problems including brain fMRI classification for three different diseases (i.e., Alzheimer's disease, ADHD and brain damage by HIV). Extensive empirical studies demonstrate that our proposed approach can effectively boost tensor classification performances, particularly with small sample sizes. PMID:25927014

  19. Supervised remote sensing image classification: An example of a ...

    African Journals Online (AJOL)

    These conventional multi-class classifiers/algorithms are usually written in programming languages such as C, C++, and python. The objective of this research is to experiment the use of a binary classifier/algorithm for multi-class remote sensing task, implemented in MATLAB. MATLAB is a programming language just like C ...

  20. Classification and its applications for drug-target interaction identification

    OpenAIRE

    Mei, Jian-Ping; Kwoh, Chee-Keong; Yang, Peng; Li, Xiao-Li

    2015-01-01

    Classification is one of the most popular and widely used supervised learning tasks, which categorizes objects into predefined classes based on known knowledge. Classification has been an important research topic in machine learning and data mining. Different classification methods have been proposed and applied to deal with various real-world problems. Unlike unsupervised learning such as clustering, a classifier is typically trained with labeled data before being used to make prediction, an...

  1. Regular graph construction for semi-supervised learning

    International Nuclear Information System (INIS)

    Vega-Oliveros, Didier A; Berton, Lilian; Eberle, Andre Mantini; Lopes, Alneu de Andrade; Zhao, Liang

    2014-01-01

    Semi-supervised learning (SSL) stands out for using a small amount of labeled points for data clustering and classification. In this scenario graph-based methods allow the analysis of local and global characteristics of the available data by identifying classes or groups regardless data distribution and representing submanifold in Euclidean space. Most of methods used in literature for SSL classification do not worry about graph construction. However, regular graphs can obtain better classification accuracy compared to traditional methods such as k-nearest neighbor (kNN), since kNN benefits the generation of hubs and it is not appropriate for high-dimensionality data. Nevertheless, methods commonly used for generating regular graphs have high computational cost. We tackle this problem introducing an alternative method for generation of regular graphs with better runtime performance compared to methods usually find in the area. Our technique is based on the preferential selection of vertices according some topological measures, like closeness, generating at the end of the process a regular graph. Experiments using the global and local consistency method for label propagation show that our method provides better or equal classification rate in comparison with kNN

  2. Cellular automata rule characterization and classification using texture descriptors

    Science.gov (United States)

    Machicao, Jeaneth; Ribas, Lucas C.; Scabini, Leonardo F. S.; Bruno, Odermir M.

    2018-05-01

    The cellular automata (CA) spatio-temporal patterns have attracted the attention from many researchers since it can provide emergent behavior resulting from the dynamics of each individual cell. In this manuscript, we propose an approach of texture image analysis to characterize and classify CA rules. The proposed method converts the CA spatio-temporal patterns into a gray-scale image. The gray-scale is obtained by creating a binary number based on the 8-connected neighborhood of each dot of the CA spatio-temporal pattern. We demonstrate that this technique enhances the CA rule characterization and allow to use different texture image analysis algorithms. Thus, various texture descriptors were evaluated in a supervised training approach aiming to characterize the CA's global evolution. Our results show the efficiency of the proposed method for the classification of the elementary CA (ECAs), reaching a maximum of 99.57% of accuracy rate according to the Li-Packard scheme (6 classes) and 94.36% for the classification of the 88 rules scheme. Moreover, within the image analysis context, we found a better performance of the method by means of a transformation of the binary states to a gray-scale.

  3. Gender classification system in uncontrolled environments

    Science.gov (United States)

    Zeng, Pingping; Zhang, Yu-Jin; Duan, Fei

    2011-01-01

    Most face analysis systems available today perform mainly on restricted databases of images in terms of size, age, illumination. In addition, it is frequently assumed that all images are frontal and unconcealed. Actually, in a non-guided real-time supervision, the face pictures taken may often be partially covered and with head rotation less or more. In this paper, a special system supposed to be used in real-time surveillance with un-calibrated camera and non-guided photography is described. It mainly consists of five parts: face detection, non-face filtering, best-angle face selection, texture normalization, and gender classification. Emphases are focused on non-face filtering and best-angle face selection parts as well as texture normalization. Best-angle faces are figured out by PCA reconstruction, which equals to an implicit face alignment and results in a huge increase of the accuracy for gender classification. Dynamic skin model and a masked PCA reconstruction algorithm are applied to filter out faces detected in error. In order to fully include facial-texture and shape-outline features, a hybrid feature that is a combination of Gabor wavelet and PHoG (pyramid histogram of gradients) was proposed to equitable inner texture and outer contour. Comparative study on the effects of different non-face filtering and texture masking methods in the context of gender classification by SVM is reported through experiments on a set of UT (a company name) face images, a large number of internet images and CAS (Chinese Academy of Sciences) face database. Some encouraging results are obtained.

  4. Consensus embedding: theory, algorithms and application to segmentation and classification of biomedical data

    Directory of Open Access Journals (Sweden)

    Viswanath Satish

    2012-02-01

    of high-dimensional biomedical data classification and segmentation problems. Our generalizable framework allows for improved representation and classification in the context of both imaging and non-imaging data. The algorithm offers a promising solution to problems that currently plague DR methods, and may allow for extension to other areas of biomedical data analysis.

  5. A study of several CAD methods for classification of clustered microcalcifications

    Science.gov (United States)

    Wei, Liyang; Yang, Yongyi; Nishikawa, Robert M.; Jiang, Yulei

    2005-04-01

    In this paper we investigate several state-of-the-art machine-learning methods for automated classification of clustered microcalcifications (MCs), aimed to assisting radiologists for more accurate diagnosis of breast cancer in a computer-aided diagnosis (CADx) scheme. The methods we consider include: support vector machine (SVM), kernel Fisher discriminant (KFD), and committee machines (ensemble averaging and AdaBoost), most of which have been developed recently in statistical learning theory. We formulate differentiation of malignant from benign MCs as a supervised learning problem, and apply these learning methods to develop the classification algorithms. As input, these methods use image features automatically extracted from clustered MCs. We test these methods using a database of 697 clinical mammograms from 386 cases, which include a wide spectrum of difficult-to-classify cases. We use receiver operating characteristic (ROC) analysis to evaluate and compare the classification performance by the different methods. In addition, we also investigate how to combine information from multiple-view mammograms of the same case so that the best decision can be made by a classifier. In our experiments, the kernel-based methods (i.e., SVM, KFD) yield the best performance, significantly outperforming a well-established CADx approach based on neural network learning.

  6. Prediction of survival to discharge following cardiopulmonary resuscitation using classification and regression trees.

    Science.gov (United States)

    Ebell, Mark H; Afonso, Anna M; Geocadin, Romergryko G

    2013-12-01

    To predict the likelihood that an inpatient who experiences cardiopulmonary arrest and undergoes cardiopulmonary resuscitation survives to discharge with good neurologic function or with mild deficits (Cerebral Performance Category score = 1). Classification and Regression Trees were used to develop branching algorithms that optimize the ability of a series of tests to correctly classify patients into two or more groups. Data from 2007 to 2008 (n = 38,092) were used to develop candidate Classification and Regression Trees models to predict the outcome of inpatient cardiopulmonary resuscitation episodes and data from 2009 (n = 14,435) to evaluate the accuracy of the models and judge the degree of over fitting. Both supervised and unsupervised approaches to model development were used. 366 hospitals participating in the Get With the Guidelines-Resuscitation registry. Adult inpatients experiencing an index episode of cardiopulmonary arrest and undergoing cardiopulmonary resuscitation in the hospital. The five candidate models had between 8 and 21 nodes and an area under the receiver operating characteristic curve from 0.718 to 0.766 in the derivation group and from 0.683 to 0.746 in the validation group. One of the supervised models had 14 nodes and classified 27.9% of patients as very unlikely to survive neurologically intact or with mild deficits (Tree models that predict survival to discharge with good neurologic function or with mild deficits following in-hospital cardiopulmonary arrest. Models like this can assist physicians and patients who are considering do-not-resuscitate orders.

  7. Classification of smooth Fano polytopes

    DEFF Research Database (Denmark)

    Øbro, Mikkel

    A simplicial lattice polytope containing the origin in the interior is called a smooth Fano polytope, if the vertices of every facet is a basis of the lattice. The study of smooth Fano polytopes is motivated by their connection to toric varieties. The thesis concerns the classification of smooth...... Fano polytopes up to isomorphism. A smooth Fano -polytope can have at most vertices. In case of vertices an explicit classification is known. The thesis contains the classification in case of vertices. Classifications of smooth Fano -polytopes for fixed exist only for . In the thesis an algorithm...... for the classification of smooth Fano -polytopes for any given is presented. The algorithm has been implemented and used to obtain the complete classification for ....

  8. Incorporating functional inter-relationships into protein function prediction algorithms

    Directory of Open Access Journals (Sweden)

    Kumar Vipin

    2009-05-01

    Full Text Available Abstract Background Functional classification schemes (e.g. the Gene Ontology that serve as the basis for annotation efforts in several organisms are often the source of gold standard information for computational efforts at supervised protein function prediction. While successful function prediction algorithms have been developed, few previous efforts have utilized more than the protein-to-functional class label information provided by such knowledge bases. For instance, the Gene Ontology not only captures protein annotations to a set of functional classes, but it also arranges these classes in a DAG-based hierarchy that captures rich inter-relationships between different classes. These inter-relationships present both opportunities, such as the potential for additional training examples for small classes from larger related classes, and challenges, such as a harder to learn distinction between similar GO terms, for standard classification-based approaches. Results We propose a method to enhance the performance of classification-based protein function prediction algorithms by addressing the issue of using these interrelationships between functional classes constituting functional classification schemes. Using a standard measure for evaluating the semantic similarity between nodes in an ontology, we quantify and incorporate these inter-relationships into the k-nearest neighbor classifier. We present experiments on several large genomic data sets, each of which is used for the modeling and prediction of over hundred classes from the GO Biological Process ontology. The results show that this incorporation produces more accurate predictions for a large number of the functional classes considered, and also that the classes benefitted most by this approach are those containing the fewest members. In addition, we show how our proposed framework can be used for integrating information from the entire GO hierarchy for improving the accuracy of

  9. Adaptive Matrices for Color Texture Classification

    NARCIS (Netherlands)

    Bunte, Kerstin; Giotis, Ioannis; Petkov, Nicolai; Biehl, Michael; Real, P; DiazPernil, D; MolinaAbril, H; Berciano, A; Kropatsch, W

    2011-01-01

    In this paper we introduce an integrative approach towards color texture classification learned by a supervised framework. Our approach is based on the Generalized Learning Vector Quantization (GLVQ), extended by an adaptive distance measure which is defined in the Fourier domain and 2D Gabor

  10. The efficiency of the RULES-4 classification learning algorithm in predicting the density of agents

    Directory of Open Access Journals (Sweden)

    Ziad Salem

    2014-12-01

    Full Text Available Learning is the act of obtaining new or modifying existing knowledge, behaviours, skills or preferences. The ability to learn is found in humans, other organisms and some machines. Learning is always based on some sort of observations or data such as examples, direct experience or instruction. This paper presents a classification algorithm to learn the density of agents in an arena based on the measurements of six proximity sensors of a combined actuator sensor units (CASUs. Rules are presented that were induced by the learning algorithm that was trained with data-sets based on the CASU’s sensor data streams collected during a number of experiments with “Bristlebots (agents in the arena (environment”. It was found that a set of rules generated by the learning algorithm is able to predict the number of bristlebots in the arena based on the CASU’s sensor readings with satisfying accuracy.

  11. Predicting incomplete gene microarray data with the use of supervised learning algorithms

    CSIR Research Space (South Africa)

    Twala, B

    2010-10-01

    Full Text Available that prediction using supervised learning can be improved in probabilistic terms given incomplete microarray data. This imputation approach is based on the a priori probability of each value determined from the instances at that node of a decision tree (PDT...

  12. Mapping US Urban Extents from MODIS Data Using One-Class Classification Method

    Directory of Open Access Journals (Sweden)

    Bo Wan

    2015-08-01

    Full Text Available Urban areas are one of the most important components of human society. Their extents have been continuously growing during the last few decades. Accurate and timely measurements of the extents of urban areas can help in analyzing population densities and urban sprawls and in studying environmental issues related to urbanization. Urban extents detected from remotely sensed data are usually a by-product of land use classification results, and their interpretation requires a full understanding of land cover types. In this study, for the first time, we mapped urban extents in the continental United States using a novel one-class classification method, i.e., positive and unlabeled learning (PUL, with multi-temporal Moderate Resolution Imaging Spectroradiometer (MODIS data for the year 2010. The Defense Meteorological Satellite Program Operational Linescan System (DMSP-OLS night stable light data were used to calibrate the urban extents obtained from the one-class classification scheme. Our results demonstrated the effectiveness of the use of the PUL algorithm in mapping large-scale urban areas from coarse remote-sensing images, for the first time. The total accuracy of mapped urban areas was 92.9% and the kappa coefficient was 0.85. The use of DMSP-OLS night stable light data can significantly reduce false detection rates from bare land and cropland far from cities. Compared with traditional supervised classification methods, the one-class classification scheme can greatly reduce the effort involved in collecting training datasets, without losing predictive accuracy.

  13. Strategies to Increase Accuracy in Text Classification

    NARCIS (Netherlands)

    D. Blommesteijn (Dennis)

    2014-01-01

    htmlabstractText classification via supervised learning involves various steps from processing raw data, features extraction to training and validating classifiers. Within these steps implementation decisions are critical to the resulting classifier accuracy. This paper contains a report of the

  14. Data Transformation Functions for Expanded Search Spaces in Geographic Sample Supervised Segment Generation

    Directory of Open Access Journals (Sweden)

    Christoff Fourie

    2014-04-01

    Full Text Available Sample supervised image analysis, in particular sample supervised segment generation, shows promise as a methodological avenue applicable within Geographic Object-Based Image Analysis (GEOBIA. Segmentation is acknowledged as a constituent component within typically expansive image analysis processes. A general extension to the basic formulation of an empirical discrepancy measure directed segmentation algorithm parameter tuning approach is proposed. An expanded search landscape is defined, consisting not only of the segmentation algorithm parameters, but also of low-level, parameterized image processing functions. Such higher dimensional search landscapes potentially allow for achieving better segmentation accuracies. The proposed method is tested with a range of low-level image transformation functions and two segmentation algorithms. The general effectiveness of such an approach is demonstrated compared to a variant only optimising segmentation algorithm parameters. Further, it is shown that the resultant search landscapes obtained from combining mid- and low-level image processing parameter domains, in our problem contexts, are sufficiently complex to warrant the use of population based stochastic search methods. Interdependencies of these two parameter domains are also demonstrated, necessitating simultaneous optimization.

  15. An Automated Cropland Classification Algorithm (ACCA) for Tajikistan by Combining Landsat, MODIS, and Secondary Data

    OpenAIRE

    Thenkabail, Prasad S.; Wu, Zhuoting

    2012-01-01

    The overarching goal of this research was to develop and demonstrate an automated Cropland Classification Algorithm (ACCA) that will rapidly, routinely, and accurately classify agricultural cropland extent, areas, and characteristics (e.g., irrigated vs. rainfed) over large areas such as a country or a region through combination of multi-sensor remote sensing and secondary data. In this research, a rule-based ACCA was conceptualized, developed, and demonstrated for the country of Tajikistan u...

  16. A review of supervised machine learning applied to ageing research.

    Science.gov (United States)

    Fabris, Fabio; Magalhães, João Pedro de; Freitas, Alex A

    2017-04-01

    Broadly speaking, supervised machine learning is the computational task of learning correlations between variables in annotated data (the training set), and using this information to create a predictive model capable of inferring annotations for new data, whose annotations are not known. Ageing is a complex process that affects nearly all animal species. This process can be studied at several levels of abstraction, in different organisms and with different objectives in mind. Not surprisingly, the diversity of the supervised machine learning algorithms applied to answer biological questions reflects the complexities of the underlying ageing processes being studied. Many works using supervised machine learning to study the ageing process have been recently published, so it is timely to review these works, to discuss their main findings and weaknesses. In summary, the main findings of the reviewed papers are: the link between specific types of DNA repair and ageing; ageing-related proteins tend to be highly connected and seem to play a central role in molecular pathways; ageing/longevity is linked with autophagy and apoptosis, nutrient receptor genes, and copper and iron ion transport. Additionally, several biomarkers of ageing were found by machine learning. Despite some interesting machine learning results, we also identified a weakness of current works on this topic: only one of the reviewed papers has corroborated the computational results of machine learning algorithms through wet-lab experiments. In conclusion, supervised machine learning has contributed to advance our knowledge and has provided novel insights on ageing, yet future work should have a greater emphasis in validating the predictions.

  17. Physical Human Activity Recognition Using Wearable Sensors

    Directory of Open Access Journals (Sweden)

    Ferhat Attal

    2015-12-01

    Full Text Available This paper presents a review of different classification techniques used to recognize human activities from wearable inertial sensor data. Three inertial sensor units were used in this study and were worn by healthy subjects at key points of upper/lower body limbs (chest, right thigh and left ankle. Three main steps describe the activity recognition process: sensors’ placement, data pre-processing and data classification. Four supervised classification techniques namely, k-Nearest Neighbor (k-NN, Support Vector Machines (SVM, Gaussian Mixture Models (GMM, and Random Forest (RF as well as three unsupervised classification techniques namely, k-Means, Gaussian mixture models (GMM and Hidden Markov Model (HMM, are compared in terms of correct classification rate, F-measure, recall, precision, and specificity. Raw data and extracted features are used separately as inputs of each classifier. The feature selection is performed using a wrapper approach based on the RF algorithm. Based on our experiments, the results obtained show that the k-NN classifier provides the best performance compared to other supervised classification algorithms, whereas the HMM classifier is the one that gives the best results among unsupervised classification algorithms. This comparison highlights which approach gives better performance in both supervised and unsupervised contexts. It should be noted that the obtained results are limited to the context of this study, which concerns the classification of the main daily living human activities using three wearable accelerometers placed at the chest, right shank and left ankle of the subject.

  18. Physical Human Activity Recognition Using Wearable Sensors.

    Science.gov (United States)

    Attal, Ferhat; Mohammed, Samer; Dedabrishvili, Mariam; Chamroukhi, Faicel; Oukhellou, Latifa; Amirat, Yacine

    2015-12-11

    This paper presents a review of different classification techniques used to recognize human activities from wearable inertial sensor data. Three inertial sensor units were used in this study and were worn by healthy subjects at key points of upper/lower body limbs (chest, right thigh and left ankle). Three main steps describe the activity recognition process: sensors' placement, data pre-processing and data classification. Four supervised classification techniques namely, k-Nearest Neighbor (k-NN), Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and Random Forest (RF) as well as three unsupervised classification techniques namely, k-Means, Gaussian mixture models (GMM) and Hidden Markov Model (HMM), are compared in terms of correct classification rate, F-measure, recall, precision, and specificity. Raw data and extracted features are used separately as inputs of each classifier. The feature selection is performed using a wrapper approach based on the RF algorithm. Based on our experiments, the results obtained show that the k-NN classifier provides the best performance compared to other supervised classification algorithms, whereas the HMM classifier is the one that gives the best results among unsupervised classification algorithms. This comparison highlights which approach gives better performance in both supervised and unsupervised contexts. It should be noted that the obtained results are limited to the context of this study, which concerns the classification of the main daily living human activities using three wearable accelerometers placed at the chest, right shank and left ankle of the subject.

  19. Real-time distributed fiber optic sensor for security systems: Performance, event classification and nuisance mitigation

    Science.gov (United States)

    Mahmoud, Seedahmed S.; Visagathilagar, Yuvaraja; Katsifolis, Jim

    2012-09-01

    The success of any perimeter intrusion detection system depends on three important performance parameters: the probability of detection (POD), the nuisance alarm rate (NAR), and the false alarm rate (FAR). The most fundamental parameter, POD, is normally related to a number of factors such as the event of interest, the sensitivity of the sensor, the installation quality of the system, and the reliability of the sensing equipment. The suppression of nuisance alarms without degrading sensitivity in fiber optic intrusion detection systems is key to maintaining acceptable performance. Signal processing algorithms that maintain the POD and eliminate nuisance alarms are crucial for achieving this. In this paper, a robust event classification system using supervised neural networks together with a level crossings (LCs) based feature extraction algorithm is presented for the detection and recognition of intrusion and non-intrusion events in a fence-based fiber-optic intrusion detection system. A level crossings algorithm is also used with a dynamic threshold to suppress torrential rain-induced nuisance alarms in a fence system. Results show that rain-induced nuisance alarms can be suppressed for rainfall rates in excess of 100 mm/hr with the simultaneous detection of intrusion events. The use of a level crossing based detection and novel classification algorithm is also presented for a buried pipeline fiber optic intrusion detection system for the suppression of nuisance events and discrimination of intrusion events. The sensor employed for both types of systems is a distributed bidirectional fiber-optic Mach-Zehnder (MZ) interferometer.

  20. 7 CFR 27.80 - Fees; classification, Micronaire, and supervision.

    Science.gov (United States)

    2010-01-01

    ....80 Section 27.80 Agriculture Regulations of the Department of Agriculture AGRICULTURAL MARKETING SERVICE (Standards, Inspections, Marketing Practices), DEPARTMENT OF AGRICULTURE COMMODITY STANDARDS AND STANDARD CONTAINER REGULATIONS COTTON CLASSIFICATION UNDER COTTON FUTURES LEGISLATION Regulations Costs of...

  1. An efficient flow-based botnet detection using supervised machine learning

    DEFF Research Database (Denmark)

    Stevanovic, Matija; Pedersen, Jens Myrup

    2014-01-01

    Botnet detection represents one of the most crucial prerequisites of successful botnet neutralization. This paper explores how accurate and timely detection can be achieved by using supervised machine learning as the tool of inferring about malicious botnet traffic. In order to do so, the paper...... introduces a novel flow-based detection system that relies on supervised machine learning for identifying botnet network traffic. For use in the system we consider eight highly regarded machine learning algorithms, indicating the best performing one. Furthermore, the paper evaluates how much traffic needs...... to accurately and timely detect botnet traffic using purely flow-based traffic analysis and supervised machine learning. Additionally, the results show that in order to achieve accurate detection traffic flows need to be monitored for only a limited time period and number of packets per flow. This indicates...

  2. Using Unlabeled Data to Improve Text Classification

    National Research Council Canada - National Science Library

    Nigam, Kamal P

    2001-01-01

    .... This dissertation demonstrates that supervised learning algorithms that use a small number of labeled examples and many inexpensive unlabeled examples can create high-accuracy text classifiers...

  3. PHOTOMETRIC SUPERNOVA CLASSIFICATION WITH MACHINE LEARNING

    Energy Technology Data Exchange (ETDEWEB)

    Lochner, Michelle; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K. [Department of Physics and Astronomy, University College London, Gower Street, London WC1E 6BT (United Kingdom); McEwen, Jason D., E-mail: dr.michelle.lochner@gmail.com [Mullard Space Science Laboratory, University College London, Surrey RH5 6NT (United Kingdom)

    2016-08-01

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k -nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.

  4. PHOTOMETRIC SUPERNOVA CLASSIFICATION WITH MACHINE LEARNING

    International Nuclear Information System (INIS)

    Lochner, Michelle; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K.; McEwen, Jason D.

    2016-01-01

    Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k -nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.

  5. Introduction of a methodology for visualization and graphical interpretation of Bayesian classification models.

    Science.gov (United States)

    Balfer, Jenny; Bajorath, Jürgen

    2014-09-22

    Supervised machine learning models are widely used in chemoinformatics, especially for the prediction of new active compounds or targets of known actives. Bayesian classification methods are among the most popular machine learning approaches for the prediction of activity from chemical structure. Much work has focused on predicting structure-activity relationships (SARs) on the basis of experimental training data. By contrast, only a few efforts have thus far been made to rationalize the performance of Bayesian or other supervised machine learning models and better understand why they might succeed or fail. In this study, we introduce an intuitive approach for the visualization and graphical interpretation of naïve Bayesian classification models. Parameters derived during supervised learning are visualized and interactively analyzed to gain insights into model performance and identify features that determine predictions. The methodology is introduced in detail and applied to assess Bayesian modeling efforts and predictions on compound data sets of varying structural complexity. Different classification models and features determining their performance are characterized in detail. A prototypic implementation of the approach is provided.

  6. Application of a kernel-based online learning algorithm to the classification of nodule candidates in computer-aided detection of CT lung nodules

    International Nuclear Information System (INIS)

    Matsumoto, S.; Ohno, Y.; Takenaka, D.; Sugimura, K.; Yamagata, H.

    2007-01-01

    Classification of the nodule candidates in computer-aided detection (CAD) of lung nodules in CT images was addressed by constructing a nonlinear discriminant function using a kernel-based learning algorithm called the kernel recursive least-squares (KRLS) algorithm. Using the nodule candidates derived from the processing by a CAD scheme of 100 CT datasets containing 253 non-calcified nodules or 3 mm or larger as determined by the consensus of two thoracic radiologists, the following trial were carried out 100 times: by randomly selecting 50 datasets for training, a nonlinear discriminant function was obtained using the nodule candidates in the training datasets and tested with the remaining candidates; for comparison, a rule-based classification was tested in a similar manner. At the number of false positives per case of about 5, the nonlinear classification method showed an improved sensitivity of 80% (mean over the 100 trials) compared with 74% of the rule-based method. (orig.)

  7. Image Classification Workflow Using Machine Learning Methods

    Science.gov (United States)

    Christoffersen, M. S.; Roser, M.; Valadez-Vergara, R.; Fernández-Vega, J. A.; Pierce, S. A.; Arora, R.

    2016-12-01

    Recent increases in the availability and quality of remote sensing datasets have fueled an increasing number of scientifically significant discoveries based on land use classification and land use change analysis. However, much of the software made to work with remote sensing data products, specifically multispectral images, is commercial and often prohibitively expensive. The free to use solutions that are currently available come bundled up as small parts of much larger programs that are very susceptible to bugs and difficult to install and configure. What is needed is a compact, easy to use set of tools to perform land use analysis on multispectral images. To address this need, we have developed software using the Python programming language with the sole function of land use classification and land use change analysis. We chose Python to develop our software because it is relatively readable, has a large body of relevant third party libraries such as GDAL and Spectral Python, and is free to install and use on Windows, Linux, and Macintosh operating systems. In order to test our classification software, we performed a K-means unsupervised classification, Gaussian Maximum Likelihood supervised classification, and a Mahalanobis Distance based supervised classification. The images used for testing were three Landsat rasters of Austin, Texas with a spatial resolution of 60 meters for the years of 1984 and 1999, and 30 meters for the year 2015. The testing dataset was easily downloaded using the Earth Explorer application produced by the USGS. The software should be able to perform classification based on any set of multispectral rasters with little to no modification. Our software makes the ease of land use classification using commercial software available without an expensive license.

  8. [Exploring Flow and Supervision of Medical Instruments by Standing on Frontier of the Reform of Free Trade Zone].

    Science.gov (United States)

    Shen, Jianhua; Han, Meixian; Lu, Fei

    2017-11-30

    Shanghai Waigaoqiao Free Trade Zone as one of the special customs supervision areas of China (Shanghai) free trade pilot area, gathered a large number of general agent enterprises related to medical apparatus and instruments. This article analyzes the characteristics of special environment and medical equipment business in Shanghai Waigaoqiao Free Trade Zone in order to further implement the national administrative examination and approval reform. According to the latest requirement in laws and regulations of medical instruments, and trend of development in the industry of medical instruments, as well as research on the basis of practices of market supervision in countries around the world, this article also proposes measures about precision supervision, coordination of supervision, classification supervision and dynamic supervision to establish a new order of fair and standardized competition in market, and create conditions for establishment of allocation and transport hub of international medicine.

  9. The Application of the Analytic Hierarchy Process and a New Correlation Algorithm to Urban Construction and Supervision Using Multi-Source Government Data in Tianjin

    Directory of Open Access Journals (Sweden)

    Shaoyi Wang

    2018-02-01

    Full Text Available As the era of big data approaches, big data has attracted increasing amounts of attention from researchers. Various types of studies have been conducted and these studies have focused particularly on the management, organization, and correlation of data and calculations using data. Most studies involving big data address applications in scientific, commercial, and ecological fields. However, the application of big data to government management is also needed. This paper examines the application of multi-source government data to urban construction and supervision in Tianjin, China. The analytic hierarchy process and a new approach called the correlation degree algorithm are introduced to calculate the degree of correlation between different approval items in one construction project and between different construction projects. The results show that more than 75% of the construction projects and their approval items are highly correlated. The results of this study suggest that most of the examined construction projects are well supervised, have relatively high probabilities of satisfying the relevant legal requirements, and observe their initial planning schemes.

  10. Text mining in the classification of digital documents

    Directory of Open Access Journals (Sweden)

    Marcial Contreras Barrera

    2016-11-01

    Full Text Available Objective: Develop an automated classifier for the classification of bibliographic material by means of the text mining. Methodology: The text mining is used for the development of the classifier, based on a method of type supervised, conformed by two phases; learning and recognition, in the learning phase, the classifier learns patterns across the analysis of bibliographical records, of the classification Z, belonging to library science, information sciences and information resources, recovered from the database LIBRUNAM, in this phase is obtained the classifier capable of recognizing different subclasses (LC. In the recognition phase the classifier is validated and evaluates across classification tests, for this end bibliographical records of the classification Z are taken randomly, classified by a cataloguer and processed by the automated classifier, in order to obtain the precision of the automated classifier. Results: The application of the text mining achieved the development of the automated classifier, through the method classifying documents supervised type. The precision of the classifier was calculated doing the comparison among the assigned topics manually and automated obtaining 75.70% of precision. Conclusions: The application of text mining facilitated the creation of automated classifier, allowing to obtain useful technology for the classification of bibliographical material with the aim of improving and speed up the process of organizing digital documents.

  11. Traumatic subarachnoid pleural fistula in children: case report, algorithm and classification proposal

    Directory of Open Access Journals (Sweden)

    Moscote-Salazar Luis Rafael

    2016-06-01

    Full Text Available Subarachnoid pleural fistulas are rare. They have been described as complications of thoracic surgery, penetrating injuries and spinal surgery, among others. We present the case of a 3-year-old female child, who suffer spinal cord trauma secondary to a car accident, developing a posterior subarachnoid pleural fistula. To our knowledge this is the first reported case of a pediatric patient with subarachnoid pleural fistula resulting from closed trauma, requiring intensive multimodal management. We also present a management algorithm and a proposed classification. The diagnosis of this pathology is difficult when not associated with neurological deficit. A high degree of suspicion, multidisciplinary management and timely surgical intervention allow optimal management.

  12. Support vector machines and evolutionary algorithms for classification single or together?

    CERN Document Server

    Stoean, Catalin

    2014-01-01

    When discussing classification, support vector machines are known to be a capable and efficient technique to learn and predict with high accuracy within a quick time frame. Yet, their black box means to do so make the practical users quite circumspect about relying on it, without much understanding of the how and why of its predictions. The question raised in this book is how can this ‘masked hero’ be made more comprehensible and friendly to the public: provide a surrogate model for its hidden optimization engine, replace the method completely or appoint a more friendly approach to tag along and offer the much desired explanations? Evolutionary algorithms can do all these and this book presents such possibilities of achieving high accuracy, comprehensibility, reasonable runtime as well as unconstrained performance.

  13. Distributed data collection and supervision based on web sensor

    Science.gov (United States)

    He, Pengju; Dai, Guanzhong; Fu, Lei; Li, Xiangjun

    2006-11-01

    As a node in Internet/Intranet, web sensor has been promoted in recent years and wildly applied in remote manufactory, workshop measurement and control field. However, the conventional scheme can only support HTTP protocol, and the remote users supervise and control the collected data published by web in the standard browser because of the limited resource of the microprocessor in the sensor; moreover, only one node of data acquirement can be supervised and controlled in one instant therefore the requirement of centralized remote supervision, control and data process can not be satisfied in some fields. In this paper, the centralized remote supervision, control and data process by the web sensor are proposed and implemented by the principle of device driver program. The useless information of the every collected web page embedded in the sensor is filtered and the useful data is transmitted to the real-time database in the workstation, and different filter algorithms are designed for different sensors possessing independent web pages. Every sensor node has its own filter program of web, called "web data collection driver program", the collecting details are shielded, and the supervision, control and configuration software can be implemented by the call of web data collection driver program just like the use of the I/O driver program. The proposed technology can be applied in the data acquirement where relative low real-time is required.

  14. Protein classification using modified n-grams and skip-grams.

    Science.gov (United States)

    Islam, S M Ashiqul; Heil, Benjamin J; Kearney, Christopher Michel; Baker, Erich J

    2018-05-01

    Classification by supervised machine learning greatly facilitates the annotation of protein characteristics from their primary sequence. However, the feature generation step in this process requires detailed knowledge of attributes used to classify the proteins. Lack of this knowledge risks the selection of irrelevant features, resulting in a faulty model. In this study, we introduce a supervised protein classification method with a novel means of automating the work-intensive feature generation step via a Natural Language Processing (NLP)-dependent model, using a modified combination of n-grams and skip-grams (m-NGSG). A meta-comparison of cross-validation accuracy with twelve training datasets from nine different published studies demonstrates a consistent increase in accuracy of m-NGSG when compared to contemporary classification and feature generation models. We expect this model to accelerate the classification of proteins from primary sequence data and increase the accessibility of protein characteristic prediction to a broader range of scientists. m-NGSG is freely available at Bitbucket: https://bitbucket.org/sm_islam/mngsg/src. A web server is available at watson.ecs.baylor.edu/ngsg. erich_baker@baylor.edu. Supplementary data are available at Bioinformatics online.

  15. Supervised learning of tools for content-based search of image databases

    Science.gov (United States)

    Delanoy, Richard L.

    1996-03-01

    A computer environment, called the Toolkit for Image Mining (TIM), is being developed with the goal of enabling users with diverse interests and varied computer skills to create search tools for content-based image retrieval and other pattern matching tasks. Search tools are generated using a simple paradigm of supervised learning that is based on the user pointing at mistakes of classification made by the current search tool. As mistakes are identified, a learning algorithm uses the identified mistakes to build up a model of the user's intentions, construct a new search tool, apply the search tool to a test image, display the match results as feedback to the user, and accept new inputs from the user. Search tools are constructed in the form of functional templates, which are generalized matched filters capable of knowledge- based image processing. The ability of this system to learn the user's intentions from experience contrasts with other existing approaches to content-based image retrieval that base searches on the characteristics of a single input example or on a predefined and semantically- constrained textual query. Currently, TIM is capable of learning spectral and textural patterns, but should be adaptable to the learning of shapes, as well. Possible applications of TIM include not only content-based image retrieval, but also quantitative image analysis, the generation of metadata for annotating images, data prioritization or data reduction in bandwidth-limited situations, and the construction of components for larger, more complex computer vision algorithms.

  16. Webly-Supervised Fine-Grained Visual Categorization via Deep Domain Adaptation.

    Science.gov (United States)

    Xu, Zhe; Huang, Shaoli; Zhang, Ya; Tao, Dacheng

    2018-05-01

    Learning visual representations from web data has recently attracted attention for object recognition. Previous studies have mainly focused on overcoming label noise and data bias and have shown promising results by learning directly from web data. However, we argue that it might be better to transfer knowledge from existing human labeling resources to improve performance at nearly no additional cost. In this paper, we propose a new semi-supervised method for learning via web data. Our method has the unique design of exploiting strong supervision, i.e., in addition to standard image-level labels, our method also utilizes detailed annotations including object bounding boxes and part landmarks. By transferring as much knowledge as possible from existing strongly supervised datasets to weakly supervised web images, our method can benefit from sophisticated object recognition algorithms and overcome several typical problems found in webly-supervised learning. We consider the problem of fine-grained visual categorization, in which existing training resources are scarce, as our main research objective. Comprehensive experimentation and extensive analysis demonstrate encouraging performance of the proposed approach, which, at the same time, delivers a new pipeline for fine-grained visual categorization that is likely to be highly effective for real-world applications.

  17. LDA boost classification: boosting by topics

    Science.gov (United States)

    Lei, La; Qiao, Guo; Qimin, Cao; Qitao, Li

    2012-12-01

    AdaBoost is an efficacious classification algorithm especially in text categorization (TC) tasks. The methodology of setting up a classifier committee and voting on the documents for classification can achieve high categorization precision. However, traditional Vector Space Model can easily lead to the curse of dimensionality and feature sparsity problems; so it affects classification performance seriously. This article proposed a novel classification algorithm called LDABoost based on boosting ideology which uses Latent Dirichlet Allocation (LDA) to modeling the feature space. Instead of using words or phrase, LDABoost use latent topics as the features. In this way, the feature dimension is significantly reduced. Improved Naïve Bayes (NB) is designed as the weaker classifier which keeps the efficiency advantage of classic NB algorithm and has higher precision. Moreover, a two-stage iterative weighted method called Cute Integration in this article is proposed for improving the accuracy by integrating weak classifiers into strong classifier in a more rational way. Mutual Information is used as metrics of weights allocation. The voting information and the categorization decision made by basis classifiers are fully utilized for generating the strong classifier. Experimental results reveals LDABoost making categorization in a low-dimensional space, it has higher accuracy than traditional AdaBoost algorithms and many other classic classification algorithms. Moreover, its runtime consumption is lower than different versions of AdaBoost, TC algorithms based on support vector machine and Neural Networks.

  18. Algorithms and data structures for automated change detection and classification of sidescan sonar imagery

    Science.gov (United States)

    Gendron, Marlin Lee

    During Mine Warfare (MIW) operations, MIW analysts perform change detection by visually comparing historical sidescan sonar imagery (SSI) collected by a sidescan sonar with recently collected SSI in an attempt to identify objects (which might be explosive mines) placed at sea since the last time the area was surveyed. This dissertation presents a data structure and three algorithms, developed by the author, that are part of an automated change detection and classification (ACDC) system. MIW analysts at the Naval Oceanographic Office, to reduce the amount of time to perform change detection, are currently using ACDC. The dissertation introductory chapter gives background information on change detection, ACDC, and describes how SSI is produced from raw sonar data. Chapter 2 presents the author's Geospatial Bitmap (GB) data structure, which is capable of storing information geographically and is utilized by the three algorithms. This chapter shows that a GB data structure used in a polygon-smoothing algorithm ran between 1.3--48.4x faster than a sparse matrix data structure. Chapter 3 describes the GB clustering algorithm, which is the author's repeatable, order-independent method for clustering. Results from tests performed in this chapter show that the time to cluster a set of points is not affected by the distribution or the order of the points. In Chapter 4, the author presents his real-time computer-aided detection (CAD) algorithm that automatically detects mine-like objects on the seafloor in SSI. The author ran his GB-based CAD algorithm on real SSI data, and results of these tests indicate that his real-time CAD algorithm performs comparably to or better than other non-real-time CAD algorithms. The author presents his computer-aided search (CAS) algorithm in Chapter 5. CAS helps MIW analysts locate mine-like features that are geospatially close to previously detected features. A comparison between the CAS and a great circle distance algorithm shows that the

  19. Automatic Hierarchical Color Image Classification

    Directory of Open Access Journals (Sweden)

    Jing Huang

    2003-02-01

    Full Text Available Organizing images into semantic categories can be extremely useful for content-based image retrieval and image annotation. Grouping images into semantic classes is a difficult problem, however. Image classification attempts to solve this hard problem by using low-level image features. In this paper, we propose a method for hierarchical classification of images via supervised learning. This scheme relies on using a good low-level feature and subsequently performing feature-space reconfiguration using singular value decomposition to reduce noise and dimensionality. We use the training data to obtain a hierarchical classification tree that can be used to categorize new images. Our experimental results suggest that this scheme not only performs better than standard nearest-neighbor techniques, but also has both storage and computational advantages.

  20. Lossless Compression of Classification-Map Data

    Science.gov (United States)

    Hua, Xie; Klimesh, Matthew

    2009-01-01

    A lossless image-data-compression algorithm intended specifically for application to classification-map data is based on prediction, context modeling, and entropy coding. The algorithm was formulated, in consideration of the differences between classification maps and ordinary images of natural scenes, so as to be capable of compressing classification- map data more effectively than do general-purpose image-data-compression algorithms. Classification maps are typically generated from remote-sensing images acquired by instruments aboard aircraft (see figure) and spacecraft. A classification map is a synthetic image that summarizes information derived from one or more original remote-sensing image(s) of a scene. The value assigned to each pixel in such a map is the index of a class that represents some type of content deduced from the original image data for example, a type of vegetation, a mineral, or a body of water at the corresponding location in the scene. When classification maps are generated onboard the aircraft or spacecraft, it is desirable to compress the classification-map data in order to reduce the volume of data that must be transmitted to a ground station.