WorldWideScience

Sample records for classification algorithm aimed

  1. Development and Validation of a Spike Detection and Classification Algorithm Aimed at Implementation on Hardware Devices

    Directory of Open Access Journals (Sweden)

    E. Biffi

    2010-01-01

    Full Text Available Neurons cultured in vitro on MicroElectrode Array (MEA) devices connect to each other, forming a network. To study electrophysiological activity and long-term plasticity effects, long-period recordings and spike-sorting methods are needed. Therefore, on-line and real-time analysis, optimization of memory use, and improvement of the data transmission rate become necessary. We developed an algorithm for amplitude-threshold spike detection, whose performance was verified with (a) statistical analysis on both simulated and real signals and (b) Big O notation. Moreover, we developed a PCA-hierarchical classifier, evaluated on simulated and real signals. Finally, we proposed a spike detection hardware design on FPGA, whose feasibility was verified in terms of CLB count, memory occupation and temporal requirements; once realized, it will be able to execute on-line detection and real-time waveform analysis, reducing data storage problems.
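The amplitude-threshold detection step this abstract describes can be sketched in a few lines. The threshold, refractory period, and synthetic signal below are illustrative assumptions, not the authors' parameters or data.

```python
import random

def detect_spikes(signal, threshold, refractory=16):
    """Return sample indices where |signal| crosses the threshold.

    A simple refractory period (in samples) suppresses duplicate
    detections of the same spike.
    """
    spikes = []
    last = -refractory
    for i, v in enumerate(signal):
        if abs(v) >= threshold and i - last >= refractory:
            spikes.append(i)
            last = i
    return spikes

# Noisy baseline with two injected spikes at known positions.
random.seed(0)
sig = [random.gauss(0.0, 0.1) for _ in range(200)]
sig[50] = 2.0    # injected spike 1
sig[120] = -2.5  # injected spike 2
print(detect_spikes(sig, threshold=1.0))  # → [50, 120]
```

A hardware version of the same logic would replace the Python loop with a comparator and a counter for the refractory window, which is what makes the design FPGA-friendly.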

  2. Classification of Global Illumination Algorithms

    OpenAIRE

    Lesev, Hristo

    2010-01-01

    This article describes and classifies various approaches for solving the global illumination problem. The classification aims to show the similarities between different types of algorithms. We introduce the concept of Light Manager, as a central element and mediator between illumination algorithms in a heterogeneous environment of a graphical system. We present results and analysis of the implementation of the described ideas.

  3. Classification algorithms using adaptive partitioning

    KAUST Repository

    Binev, Peter

    2014-12-01

    © 2014 Institute of Mathematical Statistics. Algorithms for binary classification based on adaptive tree partitioning are formulated and analyzed for both their risk performance and their friendliness to numerical implementation. The algorithms can be viewed as generating a set approximation to the Bayes set and thus fall into the general category of set estimators. In contrast with the most studied tree-based algorithms, which utilize piecewise constant approximation on the generated partition [IEEE Trans. Inform. Theory 52 (2006) 1335–1353; Mach. Learn. 66 (2007) 209–242], we consider decorated trees, which allow us to derive higher order methods. Convergence rates for these methods are derived in terms of the parameter α of the margin conditions and a rate s of best approximation of the Bayes set by decorated adaptive partitions. They can also be expressed in terms of the Besov smoothness β of the regression function, which governs its approximability by piecewise polynomials on adaptive partitions. The execution of the algorithms does not require knowledge of the smoothness or margin conditions. Besov smoothness conditions are weaker than the commonly used Hölder conditions, which govern approximation by nonadaptive partitions, and therefore, for a given regression function, can result in a higher rate of convergence. This in turn mitigates the compatibility conflict between smoothness and margin parameters.

  4. Image Classification through integrated K- Means Algorithm

    Directory of Open Access Journals (Sweden)

    Balasubramanian Subbiah

    2012-03-01

    Full Text Available Image classification has a significant role in the field of medical diagnosis as well as mining analysis, and has even been used for cancer diagnosis in recent years. Clustering analysis is a valuable and useful tool for image classification and object diagnosis. A variety of clustering algorithms are available, and this remains a topic of interest in the image processing field. However, these clustering algorithms are confronted with difficulties in meeting requirements of optimum quality, automation, and robustness. In this paper, we propose two clustering algorithm combinations, each integrating the K-Means algorithm, that can tackle some of these problems. A comparison study is made between these two novel combination algorithms. The experimental results demonstrate that the proposed algorithms are very effective in producing the desired clusters of the given data sets as well as in diagnosis. These algorithms are very useful for image classification as well as for extraction of objects.
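A minimal scalar K-Means, of the kind integrated into the combinations above, can be sketched as follows; the pixel intensities, cluster count, and seed are invented for illustration and are not the paper's data.

```python
import random

def kmeans(values, k, iters=20, seed=1):
    """Plain k-means on scalar values (e.g. pixel intensities)."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)
    for _ in range(iters):
        # Assign each value to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            j = min(range(k), key=lambda c: (v - centroids[c]) ** 2)
            clusters[j].append(v)
        # Recompute centroids; keep the old one if a cluster is empty.
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return sorted(centroids)

# Two well-separated intensity populations (dark ~10, bright ~200).
pixels = [10, 12, 9, 11, 200, 198, 202, 199]
print(kmeans(pixels, k=2))
```

On this toy data the two centroids settle on the dark and bright population means, which is the segmentation behaviour the paper builds on.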

  5. A modified decision tree algorithm based on genetic algorithm for mobile user classification problem.

    Science.gov (United States)

    Liu, Dong-sheng; Fan, Shu-jiang

    2014-01-01

    In order to offer mobile customers better service, we should first classify mobile users. To address the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduces a genetic algorithm to optimize the results of the decision tree algorithm. We also take context information as a classification attribute for the mobile user, and we divide context into public context and private context classes. Then we analyze the processes and operators of the algorithm. Finally, we run an experiment on mobile user data with the algorithm; we can classify mobile users into Basic service user, E-service user, Plus service user, and Total service user classes, and we can also derive some rules about the mobile users. Compared to the C4.5 decision tree algorithm and the SVM algorithm, the algorithm proposed in this paper has higher accuracy and greater simplicity.

  6. Web Classification Using DYN FP Algorithm

    Directory of Open Access Journals (Sweden)

    Bhanu Pratap Singh

    2014-01-01

    Full Text Available Web mining is the application of data mining techniques to extract knowledge from the Web. Web mining has been explored to a vast degree, and different techniques have been proposed for a variety of applications that include Web search, classification, and personalization. The primary goal of a web site is to provide relevant information to its users. Web mining techniques are used to categorize users and pages by analyzing user behavior, the content of pages, and the order of URLs accessed. This paper proposes an auto-classification algorithm for web pages using data mining techniques. It addresses the problem of discovering association rules between terms in a set of web pages belonging to a category in a search engine database, and presents an auto-classification algorithm for solving this problem that is fundamentally based on the FP-growth algorithm.
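The frequent-itemset step underlying FP-growth can be illustrated with a naive pair counter; real FP-growth builds a compressed prefix tree instead of enumerating pairs, so this is only a stand-in, and the pages and support threshold are invented.

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(pages, min_support):
    """Count co-occurring term pairs across pages: a brute-force
    stand-in for FP-growth's frequent-itemset mining step."""
    counts = Counter()
    for terms in pages:
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

pages = [
    {"python", "code", "tutorial"},
    {"python", "code", "snake"},
    {"python", "tutorial", "code"},
]
print(frequent_pairs(pages, min_support=3))  # → {('code', 'python'): 3}
```

Association rules for a category are then read off the surviving itemsets, e.g. "pages containing 'code' also contain 'python'".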

  7. Distribution Bottlenecks in Classification Algorithms

    NARCIS (Netherlands)

    Zwartjes, G.J.; Havinga, P.J.M.; Smit, G.J.M.; Hurink, J.L.

    2012-01-01

    The abundance of data available on Wireless Sensor Networks makes online processing necessary. In industrial applications, for example, the correct operation of equipment can be the point of interest while raw sampled data is of minor importance. Classification algorithms can be used to make state classifications.

  8. Protein fold classification with genetic algorithms and feature selection.

    Science.gov (United States)

    Chen, Peng; Liu, Chunmei; Burge, Legand; Mahmood, Mohammad; Southerland, William; Gloster, Clay

    2009-10-01

    Protein fold classification is a key step to predicting protein tertiary structures. This paper proposes a novel approach based on genetic algorithms and feature selection to classifying protein folds. Our dataset is divided into a training dataset and a test dataset. Each individual for the genetic algorithms represents a selection function of the feature vectors of the training dataset. A support vector machine is applied to each individual to evaluate the fitness value (fold classification rate) of each individual. The aim of the genetic algorithms is to search for the best individual that produces the highest fold classification rate. The best individual is then applied to the feature vectors of the test dataset and a support vector machine is built to classify protein folds based on selected features. Our experimental results on Ding and Dubchak's benchmark dataset of 27-class folds show that our approach achieves an accuracy of 71.28%, which outperforms current state-of-the-art protein fold predictors.
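The wrapper scheme the abstract describes (a GA whose individuals are feature masks, scored by a trained classifier) can be sketched as below. As labeled assumptions: leave-one-out nearest-centroid accuracy stands in for the paper's SVM fitness, and the toy data has one informative feature and one noise feature.

```python
import random

def centroid_accuracy(X, y, mask):
    """Fitness: leave-one-out nearest-centroid accuracy on the
    selected features (a stand-in for the SVM used in the paper)."""
    feats = [i for i, m in enumerate(mask) if m]
    if not feats:
        return 0.0
    correct = 0
    for k in range(len(X)):
        cents = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if y[j] == label and j != k]
            cents[label] = [sum(r[i] for r in rows) / len(rows) for i in feats]
        xk = [X[k][i] for i in feats]
        pred = min(cents, key=lambda l: sum((a - b) ** 2
                                            for a, b in zip(xk, cents[l])))
        correct += pred == y[k]
    return correct / len(X)

def ga_select(X, y, pop=12, gens=15, seed=3):
    """Evolve binary feature masks toward the highest fitness."""
    rng = random.Random(seed)
    n = len(X[0])
    popn = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=lambda m: centroid_accuracy(X, y, m), reverse=True)
        elite = popn[: pop // 2]
        children = []
        for _ in range(pop - len(elite)):
            a, b = rng.sample(elite, 2)        # crossover of two elites
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            child[rng.randrange(n)] ^= 1       # point mutation
            children.append(child)
        popn = elite + children
    return max(popn, key=lambda m: centroid_accuracy(X, y, m))

random.seed(3)
X = [[float(c), random.random()] for c in (0, 0, 0, 0, 1, 1, 1, 1)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
best = ga_select(X, y)
print(best)  # feature 0 (the informative one) should be selected
```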

  9. EVALUATION OF REGISTRATION, COMPRESSION AND CLASSIFICATION ALGORITHMS

    Science.gov (United States)

    Jayroe, R. R.

    1994-01-01

    Several types of algorithms are generally used to process digital imagery such as Landsat data. The most commonly used algorithms perform the tasks of registration, compression, and classification. Because there are different techniques available for performing registration, compression, and classification, imagery data users need a rationale for selecting a particular approach to meet their particular needs. This collection of registration, compression, and classification algorithms was developed so that different approaches could be evaluated and the best approach for a particular application determined. Routines are included for six registration algorithms, six compression algorithms, and two classification algorithms. The package also includes routines for evaluating the effects of processing on the image data. This collection of routines should be useful to anyone using or developing image processing software. Registration of image data involves the geometrical alteration of the imagery. Registration routines available in the evaluation package include image magnification, mapping functions, partitioning, map overlay, and data interpolation. The compression of image data involves reducing the volume of data needed for a given image. Compression routines available in the package include adaptive differential pulse code modulation, two-dimensional transforms, clustering, vector reduction, and picture segmentation. Classification of image data involves analyzing the uncompressed or compressed image data to produce inventories and maps of areas of similar spectral properties within a scene. The classification routines available include a sequential linear technique and a maximum likelihood technique. The choice of the appropriate evaluation criteria is quite important in evaluating the image processing functions. The user is therefore given a choice of evaluation criteria with which to investigate the available image processing functions.
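The maximum likelihood classification mentioned above assigns each pixel to the class whose fitted distribution makes it most probable. A one-band Gaussian sketch, with hypothetical class names and training samples (the package itself handles multispectral data):

```python
import math

def fit_gaussian(samples):
    """Estimate mean and (population) variance for one class."""
    mu = sum(samples) / len(samples)
    var = sum((s - mu) ** 2 for s in samples) / len(samples)
    return mu, var

def ml_classify(x, classes):
    """Assign x to the class whose fitted Gaussian gives the
    highest likelihood (log-likelihood, for numerical stability)."""
    def loglik(x, mu, var):
        return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
    return max(classes, key=lambda c: loglik(x, *classes[c]))

classes = {
    "water": fit_gaussian([10.0, 11.0, 9.0, 10.5]),
    "vegetation": fit_gaussian([30.0, 31.0, 29.5, 30.2]),
}
print(ml_classify(12.0, classes))  # → water
print(ml_classify(28.0, classes))  # → vegetation
```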

  10. An Experimental Comparative Study on Three Classification Algorithms

    Institute of Scientific and Technical Information of China (English)

    蔡巍; 王永成; 李伟; 尹中航

    2003-01-01

    The classification algorithm is one of the key techniques that affect a text automatic classification system's performance, and it plays an important role in the automatic classification research area. This paper comparatively analyzes k-NN, VSM and a hybrid classification algorithm presented by our research group. Some 2000 pieces of Internet news provided by ChinaInfoBank are used in the experiment. The results show that the performance of the hybrid algorithm presented by the group is superior to that of the other two algorithms.

  11. Automatic modulation classification principles, algorithms and applications

    CERN Document Server

    Zhu, Zhechen

    2014-01-01

    Automatic Modulation Classification (AMC) has been a key technology in many military, security, and civilian telecommunication applications for decades. In military and security applications, modulation often serves as another level of encryption; in modern civilian applications, multiple modulation types can be employed by a signal transmitter to control the data rate and link reliability. This book offers comprehensive documentation of AMC models, algorithms and implementations for successful modulation recognition. It provides an invaluable theoretical and numerical comparison of AMC algorithms.

  12. Gaussian maximum likelihood and contextual classification algorithms for multicrop classification

    Science.gov (United States)

    Di Zenzo, Silvano; Bernstein, Ralph; Kolsky, Harwood G.; Degloria, Stephen D.

    1987-01-01

    The paper reviews some of the ways in which context has been handled in the remote-sensing literature, and additional possibilities are introduced. The problem of computing exhaustive and normalized class-membership probabilities from the likelihoods provided by the Gaussian maximum likelihood classifier (to be used as initial probability estimates to start relaxation) is discussed. An efficient implementation of probabilistic relaxation is proposed, suiting the needs of actual remote-sensing applications. A modified fuzzy-relaxation algorithm using generalized operations between fuzzy sets is presented. Combined use of the two relaxation algorithms is proposed to exploit context in multispectral classification of remotely sensed data. Results on both an artificially created image and an MSS data set are reported.
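The normalization step described above (turning per-class likelihoods into exhaustive, normalized membership probabilities that seed the relaxation) amounts to a prior-weighted softmax over log-likelihoods. A sketch with hypothetical crop classes and values:

```python
import math

def membership_probabilities(logliks, priors):
    """Turn per-class log-likelihoods into normalized posterior
    probabilities: the initial estimates for probabilistic relaxation."""
    # Subtract the max log-likelihood before exponentiating to avoid underflow.
    m = max(logliks.values())
    unnorm = {c: priors[c] * math.exp(ll - m) for c, ll in logliks.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

probs = membership_probabilities(
    {"wheat": -3.0, "corn": -5.0, "soy": -9.0},
    {"wheat": 1 / 3, "corn": 1 / 3, "soy": 1 / 3},
)
print(probs)
print(round(sum(probs.values()), 10))  # → 1.0
```

The relaxation iterations then update these probabilities using neighbouring pixels' probabilities, which is where the spatial context enters.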

  13. Fast deterministic algorithm for EEE components classification

    Science.gov (United States)

    Kazakovtsev, L. A.; Antamoshkin, A. N.; Masich, I. S.

    2015-10-01

    The authors consider the problem of automatic classification of electronic, electrical and electromechanical (EEE) components based on the results of test control. Electronic components of the same type used in a high-quality unit must be produced as a single production batch from a single batch of raw materials. Data from the test control are used for splitting a shipped lot of components into several classes representing the production batches. Methods such as k-means++ clustering or evolutionary algorithms combine local search and random search heuristics. The proposed fast algorithm returns a unique result for each data set, and the result is comparatively precise. If the data processing is performed by the customer of the EEE components, this feature of the algorithm allows easy checking of the results by a producer or supplier.

  14. A new classification algorithm based on RGH-tree search

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    In this paper, we put forward a new classification algorithm based on RGH-tree search and perform a classification analysis and comparison study. This algorithm can save computing resources and increase classification efficiency. The experiments show that this algorithm achieves better results in dealing with three-dimensional multi-class data. We find that the algorithm has better generalization ability for small training sets and large testing sets.

  15. Structure-Based Algorithms for Microvessel Classification

    KAUST Repository

    Smith, Amy F.

    2015-02-01

    © 2014 The Authors. Microcirculation published by John Wiley & Sons Ltd. Objective: Recent developments in high-resolution imaging techniques have enabled digital reconstruction of three-dimensional sections of microvascular networks down to the capillary scale. To better interpret these large data sets, our goal is to distinguish branching trees of arterioles and venules from capillaries. Methods: Two novel algorithms are presented for classifying vessels in microvascular anatomical data sets without requiring flow information. The algorithms are compared with a classification based on observed flow directions (considered the gold standard), and with an existing resistance-based method that relies only on structural data. Results: The first algorithm, developed for networks with one arteriolar and one venular tree, performs well in identifying arterioles and venules and is robust to parameter changes, but incorrectly labels a significant number of capillaries as arterioles or venules. The second algorithm, developed for networks with multiple inlets and outlets, correctly identifies more arterioles and venules, but is more sensitive to parameter changes. Conclusions: The algorithms presented here can be used to classify microvessels in large microvascular data sets lacking flow information. This provides a basis for analyzing the distinct geometrical properties and modelling the functional behavior of arterioles, capillaries, and venules.

  16. Machine Learning Algorithms in Web Page Classification

    Directory of Open Access Journals (Sweden)

    W.A.AWAD

    2012-11-01

    Full Text Available In this paper we use machine learning algorithms like SVM, KNN and GIS to perform a behavior comparison on the web page classification problem. From the experiments we see that SVM with a small number of negative documents to build the centroids has the smallest storage requirement and the least on-line test computation cost. But almost all GIS variants with different numbers of nearest neighbors have an even higher storage requirement and on-line test computation cost than KNN. This suggests that some future work should be done to try to reduce the storage requirement and on-line test cost of GIS.

  17. Genomic-enabled prediction with classification algorithms.

    Science.gov (United States)

    Ornella, L; Pérez, P; Tapia, E; González-Camacho, J M; Burgueño, J; Zhang, X; Singh, S; Vicente, F S; Bonnett, D; Dreisigacker, S; Singh, R; Long, N; Crossa, J

    2014-06-01

    Pearson's correlation coefficient (ρ) is the most commonly reported metric of the success of prediction in genomic selection (GS). However, in real breeding ρ may not be very useful for assessing the quality of the regression in the tails of the distribution, where individuals are chosen for selection. This research used 14 maize and 16 wheat data sets with different trait-environment combinations. Six different models were evaluated by means of a cross-validation scheme (50 random partitions each, with 90% of the individuals in the training set and 10% in the testing set). The predictive accuracy of these algorithms for selecting individuals belonging to the best α=10, 15, 20, 25, 30, 35, 40% of the distribution was estimated using Cohen's kappa coefficient (κ) and an ad hoc measure, which we call relative efficiency (RE), which indicates the expected genetic gain due to selection when individuals are selected based on GS exclusively. We put special emphasis on the analysis for α=15%, because it is a percentile commonly used in plant breeding programmes (for example, at CIMMYT). We also used ρ as a criterion for overall success. The algorithms used were: Bayesian LASSO (BL), Ridge Regression (RR), Reproducing Kernel Hilbert Spaces (RKHS), Random Forest Regression (RFR), and Support Vector Regression (SVR) with linear (lin) and Gaussian kernels (rbf). The performance of regression methods for selecting the best individuals was compared with that of three supervised classification algorithms: Random Forest Classification (RFC) and Support Vector Classification (SVC) with linear (lin) and Gaussian (rbf) kernels. Classification methods were evaluated using the same cross-validation scheme but with the response vector of the original training sets dichotomised using a given threshold. For α=15%, SVC-lin presented the highest κ coefficients in 13 of the 14 maize data sets, with best values ranging from 0.131 to 0.722 (statistically significant in 9 data sets).

  18. Contextual classification of multispectral image data: Approximate algorithm

    Science.gov (United States)

    Tilton, J. C. (Principal Investigator)

    1980-01-01

    An approximation to a classification algorithm incorporating spatial context information in a general, statistical manner is presented; it is computationally less intensive than the exact algorithm yet produces classifications that are nearly as accurate.

  19. Greylevel Difference Classification Algorithm in Fractal Image Compression

    Institute of Scientific and Technical Information of China (English)

    陈毅松; 卢坚; 孙正兴; 张福炎

    2002-01-01

    This paper proposes the notion of a greylevel difference classification algorithm in fractal image compression. An example of the greylevel difference classification algorithm is then given as an improvement of the quadrant greylevel and variance classification in the quadtree-based encoding algorithm. The algorithm incorporates the frequency feature in spatial analysis using the notion of average quadrant greylevel difference, leading to an enhancement in terms of encoding time, PSNR value and compression ratio.

  20. Parallel Implementation of Classification Algorithms Based on Cloud Computing Environment

    Directory of Open Access Journals (Sweden)

    Wenbo Wang

    2012-09-01

    Full Text Available As an important task of data mining, classification has received considerable attention in many applications, such as information retrieval and web searching. The growing volumes of information produced by technological progress, together with the growing individual needs of data mining, make classifying very large data sets a challenging task. To deal with this problem, many researchers try to design efficient parallel classification algorithms. This paper briefly introduces classification algorithms and cloud computing, analyses the shortcomings of present parallel classification algorithms, and then proposes a new model of parallel classification algorithms. It mainly introduces a parallel Naïve Bayes classification algorithm based on MapReduce, which is a simple yet powerful parallel programming technique. The experimental results demonstrate that the proposed algorithm improves on the original algorithm's performance and can process large datasets efficiently on commodity hardware.
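Naïve Bayes training parallelizes naturally because it only needs per-class word counts, which are sums and therefore map/reduce-friendly. A single-process sketch of that decomposition (the documents are invented; a real deployment would run the map and reduce steps on a MapReduce cluster, and the classification step is omitted):

```python
from collections import Counter
from functools import reduce

def mapper(doc):
    """Map step: emit (label, word) counts for one document."""
    label, text = doc
    c = Counter()
    c[("__docs__", label)] += 1          # document count per class
    for w in text.split():
        c[(label, w)] += 1               # word count per class
    return c

def reducer(a, b):
    """Reduce step: merge partial counts (Counter addition)."""
    return a + b

docs = [
    ("spam", "buy cheap pills"),
    ("spam", "cheap pills online"),
    ("ham", "meeting at noon"),
]
counts = reduce(reducer, map(mapper, docs), Counter())
print(counts[("spam", "cheap")])    # → 2
print(counts[("__docs__", "ham")])  # → 1
```

Because `reducer` is associative and commutative, the partial Counters can be merged in any order across machines, which is exactly the property MapReduce exploits.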

  1. Multiscale modeling for classification of SAR imagery using hybrid EM algorithm and genetic algorithm

    Institute of Scientific and Technical Information of China (English)

    Xianbin Wen; Hua Zhang; Jianguang Zhang; Xu Jiao; Lei Wang

    2009-01-01

    A novel method that hybridizes a genetic algorithm (GA) and the expectation maximization (EM) algorithm for the classification of synthetic aperture radar (SAR) imagery is proposed, based on the finite Gaussian mixture model (GMM) and the multiscale autoregressive (MAR) model. This algorithm is capable of improving the global optimality and consistency of the classification performance. Experiments on SAR images show that the proposed algorithm outperforms the standard EM method significantly in classification accuracy.

  2. Machine Learning Algorithms for Automatic Classification of Marmoset Vocalizations

    Science.gov (United States)

    Ribeiro, Sidarta; Pereira, Danillo R.; Papa, João P.; de Albuquerque, Victor Hugo C.

    2016-01-01

    Automatic classification of vocalization type could potentially become a useful tool for the acoustic monitoring of captive colonies of highly vocal primates. However, for classification to be useful in practice, a reliable algorithm that can be successfully trained on small datasets is necessary. In this work, we consider seven different classification algorithms with the goal of finding a robust classifier that can be successfully trained on small datasets. We found good classification performance (accuracy > 0.83 and F1-score > 0.84) using the Optimum Path Forest classifier. The dataset and algorithms are made publicly available. PMID:27654941

  3. Support vector classification algorithm based on variable parameter linear programming

    Institute of Scientific and Technical Information of China (English)

    Xiao Jianhua; Lin Jian

    2007-01-01

    To solve the problems of SVM in dealing with large sample sizes and asymmetrically distributed samples, a support vector classification algorithm based on variable parameter linear programming is proposed. In the proposed algorithm, linear programming is employed to solve the optimization problem of classification, decreasing the computation time and reducing the complexity compared with the original model. The adjusted punishment parameter greatly reduces the classification error resulting from asymmetrically distributed samples, and the detailed procedure of the proposed algorithm is given. An experiment is conducted to verify that the proposed algorithm is suitable for asymmetrically distributed samples.

  4. Improved RMR Rock Mass Classification Using Artificial Intelligence Algorithms

    Science.gov (United States)

    Gholami, Raoof; Rasouli, Vamegh; Alimoradi, Andisheh

    2013-09-01

    Rock mass classification systems such as rock mass rating (RMR) are very reliable means to provide information about the quality of rocks surrounding a structure as well as to propose suitable support systems for unstable regions. Many correlations have been proposed to relate measured quantities such as wave velocity to rock mass classification systems to limit the associated time and cost of conducting the sampling and mechanical tests conventionally used to calculate RMR values. However, these empirical correlations have been found to be unreliable, as they usually overestimate or underestimate the RMR value. The aim of this paper is to compare the results of RMR classification obtained from the use of empirical correlations versus machine-learning methodologies based on artificial intelligence algorithms. The proposed methods were verified based on two case studies located in northern Iran. Relevance vector regression (RVR) and support vector regression (SVR), as two robust machine-learning methodologies, were used to predict the RMR for tunnel host rocks. RMR values already obtained by sampling and site investigation at one tunnel were taken into account as the output of the artificial networks during training and testing phases. The results reveal that use of empirical correlations overestimates the predicted RMR values. RVR and SVR, however, showed more reliable results, and are therefore suggested for use in RMR classification for design purposes of rock structures.

  5. Hybrid model based on Genetic Algorithms and SVM applied to variable selection within fruit juice classification.

    Science.gov (United States)

    Fernandez-Lozano, C; Canto, C; Gestal, M; Andrade-Garda, J M; Rabuñal, J R; Dorado, J; Pazos, A

    2013-01-01

    Given the background of the use of neural networks in problems of apple juice classification, this paper aims at applying a more recently developed machine learning method: Support Vector Machines (SVM). A hybrid model that combines genetic algorithms and support vector machines is therefore suggested in such a way that, when using the SVM as a fitness function of the Genetic Algorithm (GA), the most representative variables for a specific classification problem can be selected.

  6. Text Classification Retrieval Based on Complex Network and ICA Algorithm

    Directory of Open Access Journals (Sweden)

    Hongxia Li

    2013-08-01

    Full Text Available With the development of computer science and information technology, libraries are moving toward digitization and networking. The library digitization process converts books into digital information; high-quality preservation and management are achieved by computer technology together with text classification techniques, realizing knowledge appreciation. This paper introduces complex network theory into the text classification process and puts forward an ICA semantic clustering algorithm, realizing independent component analysis for complex network text classification. Through the ICA clustering algorithm, clustering extraction of characteristic words for text classification is achieved and the visualization of text retrieval is improved. Finally, we make a comparative analysis of a collocation algorithm and the ICA clustering algorithm through text classification and keyword search experiments, reporting the clustering degree and accuracy of each algorithm. Through simulation analysis, we find that the ICA clustering algorithm improves the clustering degree of text classification by 1.2% and accuracy by up to 11.1%, improving the efficiency and accuracy of text classification retrieval. It also provides a theoretical reference for text retrieval classification of eBooks.

  7. Intelligent Hybrid Cluster Based Classification Algorithm for Social Network Analysis

    Directory of Open Access Journals (Sweden)

    S. Muthurajkumar

    2014-05-01

    Full Text Available In this paper, we propose a hybrid clustering-based classification algorithm based on a mean approach to effectively classify and mine the ordered sequences (paths) from weblog data in order to perform social network analysis. In the system proposed in this work for social pattern analysis, the sequences of human activities are typically analyzed by switching behaviors, which are likely to produce overlapping clusters. In this proposed system, a robust modified boosting algorithm is proposed for hybrid clustering-based classification to cluster the data. This work is useful in providing a connection between the aggregated features from the network data and traditional indices used in social network analysis. Experimental results show that the proposed algorithm improves the decision results from data clustering when combined with the proposed classification algorithm, and it provides better classification accuracy when tested with a weblog dataset. In addition, this algorithm improves predictive performance, especially for multiclass datasets, for which it can increase accuracy.

  8. A comparative study on classification of sleep stage based on EEG signals using feature selection and classification algorithms.

    Science.gov (United States)

    Şen, Baha; Peker, Musa; Çavuşoğlu, Abdullah; Çelebi, Fatih V

    2014-03-01

    Sleep scoring is one of the most important diagnostic methods in psychiatry and neurology. Sleep staging is a time-consuming and difficult task undertaken by sleep experts. This study aims to identify a method which would classify sleep stages automatically and with a high degree of accuracy and, in this manner, assist sleep experts. This study consists of three stages: feature extraction, feature selection from EEG signals, and classification of these signals. In the feature extraction stage, 20 attribute algorithms in four categories are used, and 41 feature parameters are obtained from these algorithms. Feature selection is important for the elimination of irrelevant and redundant features; in this manner prediction accuracy is improved and computational overhead in classification is reduced. Effective feature selection algorithms such as minimum redundancy maximum relevance (mRMR), fast correlation-based feature selection (FCBF), ReliefF, t-test, and Fisher score are preferred at the feature selection stage for selecting a set of features which best represent EEG signals. The features obtained are used as input parameters for the classification algorithms. At the classification stage, five different classification algorithms (random forest (RF), feed-forward neural network (FFNN), decision tree (DT), support vector machine (SVM), and radial basis function neural network (RBF)) classify the problem. The results obtained from the different classification algorithms are provided so that a comparison can be made between computation times and accuracy rates. Finally, 97.03% classification accuracy is obtained using the proposed method. The results show that the proposed method indicates the ability to design a new intelligent assistance sleep scoring system.
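Of the filter methods the abstract lists, the Fisher score is the simplest to illustrate: it ranks a feature by between-class separation over within-class spread. The two toy features and stage labels below are invented, not the study's EEG attributes.

```python
def fisher_score(xs_a, xs_b):
    """Fisher score for one feature over two classes:
    (difference of class means)^2 / (sum of class variances).
    Higher means more discriminative."""
    def mean(v):
        return sum(v) / len(v)
    def var(v):
        m = mean(v)
        return sum((x - m) ** 2 for x in v) / len(v)
    return (mean(xs_a) - mean(xs_b)) ** 2 / (var(xs_a) + var(xs_b) + 1e-12)

# Feature 1 separates the two (toy) sleep stages; feature 2 does not.
awake_f1, sleep_f1 = [5.0, 5.2, 4.8], [1.0, 1.1, 0.9]
awake_f2, sleep_f2 = [3.0, 3.3, 2.7], [3.1, 2.9, 3.0]
print(fisher_score(awake_f1, sleep_f1) > fisher_score(awake_f2, sleep_f2))  # → True
```

Ranking all 41 features by such a score and keeping the top-scoring subset is the kind of filtering that precedes the classification stage.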

  9. Weighted K-Nearest Neighbor Classification Algorithm Based on Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Xuesong Yan

    2013-10-01

    Full Text Available K-Nearest Neighbor (KNN) is one of the most popular algorithms for data classification. Many researchers have found that the KNN algorithm achieves very good performance in their experiments on different datasets. The traditional KNN text classification algorithm has limitations: calculation complexity, performance solely dependent on the training set, and so on. To overcome these limitations, an improved version of KNN is proposed in this paper: we use a genetic algorithm combined with weighted KNN to improve its classification performance. The experimental results show that our proposed algorithm outperforms plain KNN in accuracy.
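The weighted-KNN half of this scheme can be sketched as below; in the paper the per-feature weights are the quantity the genetic algorithm tunes, while here they are fixed by hand, and the training points are invented.

```python
import math
from collections import Counter

def weighted_knn(train, x, k, weights):
    """k-nearest-neighbour majority vote under a per-feature
    weighted Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum(w * (ai - bi) ** 2
                             for w, ai, bi in zip(weights, a, b)))
    nearest = sorted(train, key=lambda t: dist(t[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Feature 0 separates the classes; feature 1 is noise, so it gets weight 0.
train = [([0.0, 5.0], "A"), ([0.1, -4.0], "A"),
         ([1.0, 4.8], "B"), ([0.9, -4.2], "B")]
print(weighted_knn(train, [0.05, 4.9], k=3, weights=[1.0, 0.0]))  # → A
```

With uniform weights the noisy second feature would dominate the distance; down-weighting it is exactly the kind of improvement the GA searches for.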

  10. Comparative Analysis of Serial Decision Tree Classification Algorithms

    Directory of Open Access Journals (Sweden)

    Matthew Nwokejizie Anyanwu

    2009-09-01

    Full Text Available Classification of data objects based on predefined knowledge of the objects is a data mining and knowledge management technique used to group similar data objects together. It can be defined as a supervised learning technique, since it assigns class labels to data objects based on the relationship between the data items and a predefined class label. Classification algorithms have a wide range of applications, such as churn prediction, fraud detection, artificial intelligence, and credit card rating. Many classification algorithms are available in the literature, but decision trees are the most commonly used because of their ease of implementation and because they are easier to understand than other classification algorithms. A decision tree classification algorithm can be implemented in a serial or parallel fashion depending on the volume of data, the memory available on the computing resource, and the scalability of the algorithm. In this paper we review serial implementations of decision tree algorithms and identify those that are commonly used. We also use experimental analysis based on sample data records (Statlog datasets) to evaluate the performance of the commonly used serial decision tree algorithms.
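
    The serial decision tree algorithms compared in such studies share one core step: choosing the split with the best purity gain. A minimal sketch of that step using information gain, as C4.5-style trees do (toy data and function names are ours, not from the paper):

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_split(X, y):
    """One induction step: scan every (feature, threshold) pair and keep the
    split with the highest information gain."""
    best_f, best_t, best_gain = None, None, -1.0
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f])[:-1]:
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            gain = entropy(y) - (len(left) * entropy(left)
                                 + len(right) * entropy(right)) / len(y)
            if gain > best_gain:
                best_f, best_t, best_gain = f, float(t), gain
    return best_f, best_t, best_gain

X = np.array([[1.0, 5.0], [2.0, 1.0], [3.0, 6.0], [4.0, 2.0]])
y = np.array([0, 0, 1, 1])
f, t, gain = best_split(X, y)
print(f, t)  # 0 2.0 -- splitting feature 0 at 2.0 separates the classes perfectly
```

    A full tree applies this step recursively to the left and right partitions; serial and parallel variants differ mainly in how this scan is distributed.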

  11. Comparison of Supervised and Unsupervised Learning Algorithms for Pattern Classification

    Directory of Open Access Journals (Sweden)

    R. Sathya

    2013-02-01

    Full Text Available This paper presents a comparative account of unsupervised and supervised learning models and their pattern classification evaluations as applied to a higher education scenario. Classification plays a vital role in machine learning algorithms, and in the present study we found that, although the error back-propagation algorithm provided by the supervised learning model is very efficient for a number of non-linear real-time problems, the Kohonen self-organizing map (KSOM) of the unsupervised learning model also offers an efficient solution and classification in the present study.

  12. Mass spectrometry cancer data classification using wavelets and genetic algorithm.

    Science.gov (United States)

    Nguyen, Thanh; Nahavandi, Saeid; Creighton, Douglas; Khosravi, Abbas

    2015-12-21

    This paper introduces a hybrid feature extraction method applied to mass spectrometry (MS) data for cancer classification. Haar wavelets are employed to transform MS data into orthogonal wavelet coefficients. The most prominent discriminant wavelets are then selected by a genetic algorithm (GA) to form feature sets. The combination of wavelets and the GA yields highly distinct feature sets that serve as inputs to classification algorithms. Experimental results show the robustness and significant advantage of the wavelet-GA approach over competing methods. The proposed method can therefore be applied in cancer classification models that are useful as real clinical decision support systems for medical practitioners.
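
    One level of the Haar transform used for this kind of feature extraction can be written directly in numpy; this is a generic sketch of the transform itself, not the authors' pipeline:

```python
import numpy as np

def haar_transform(signal):
    """One level of the Haar wavelet transform: pairwise averages (approximation)
    and pairwise differences (detail), each scaled by 1/sqrt(2)."""
    s = np.asarray(signal, dtype=float)
    approx = (s[0::2] + s[1::2]) / np.sqrt(2)
    detail = (s[0::2] - s[1::2]) / np.sqrt(2)
    return approx, detail

approx, detail = haar_transform([4.0, 4.0, 2.0, 0.0])
print(approx)  # [8/sqrt(2), 2/sqrt(2)]
print(detail)  # [0, 2/sqrt(2)]
```

    In the hybrid scheme, a GA would then search over which of these coefficients to keep as classifier inputs.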

  13. An Algorithm for Classification of 3-D Spherical Spatial Points

    Institute of Scientific and Technical Information of China (English)

    ZHU Qing-xin; Mudur SP; LIU Chang; PENG Bo; WU Jia

    2003-01-01

    This paper presents a highly efficient algorithm for the classification of 3D points sampled from many spheres, using neighboring relations of spatial points to construct a neighbor graph from the point cloud. The algorithm can be used in object recognition, computer vision, CAD model building, etc.

  14. A New Clustering Algorithm for Face Classification

    Directory of Open Access Journals (Sweden)

    Shaker K. Ali

    2016-06-01

    Full Text Available In this paper, we propose a new clustering algorithm based on ideas from other clustering algorithms. The proposed algorithm computes a distance matrix, excludes points that have already been clustered by saving their locations (row, column), assigns each point to the group (class) of its minimum distance, and keeps the remaining points that have not yet been clustered. The proposed algorithm is applied to an image database of human faces captured under different conditions (direction, angle, etc.). The data were collected from different sources (the ORL database and real images collected from a random sample of the Thi_Qar city population in Iraq). Our algorithm has been implemented with three distance measures for calculating the minimum distance between points (Euclidean, correlation, and Minkowski distance). The efficiency of the proposed algorithm varies with the database and the threshold; it exceeds 96%. Matlab (2014) was used in this work.
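
    The clustering idea described, assign each point to an existing group when its distance falls below a threshold and otherwise start a new group, can be sketched as follows. This is a hypothetical simplification of the paper's procedure, using Euclidean distance only:

```python
import numpy as np

def threshold_cluster(points, threshold):
    """Greedy clustering from a pairwise distance matrix: a point joins the
    first cluster whose seed point lies within `threshold`, otherwise it
    starts a new cluster."""
    pts = np.asarray(points, dtype=float)
    dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)  # distance matrix
    labels = -np.ones(len(pts), dtype=int)
    seeds = []  # index of the first point of each cluster
    for i in range(len(pts)):
        for label, s in enumerate(seeds):
            if dist[i, s] <= threshold:
                labels[i] = label
                break
        else:
            seeds.append(i)
            labels[i] = len(seeds) - 1
    return labels

pts = [[0, 0], [0.5, 0], [10, 10], [10.2, 9.9]]
labels = threshold_cluster(pts, threshold=2.0)
print(labels)  # [0 0 1 1]
```

    The correlation and Minkowski variants would only change how `dist` is computed.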

  15. Discovering Fuzzy Censored Classification Rules (FCCRs): A Genetic Algorithm Approach

    Directory of Open Access Journals (Sweden)

    Renu Bala

    2012-08-01

    Full Text Available Classification Rules (CRs) are often discovered in the form of 'If-Then' Production Rules (PRs). PRs, being high-level symbolic rules, are comprehensible and easy to implement. However, they are not capable of dealing with cognitive uncertainties like vagueness and ambiguity, which are imperative to real-world decision-making situations. Fuzzy Classification Rules (FCRs) based on fuzzy logic provide a framework for flexible, human-like reasoning involving linguistic variables. Moreover, a classification system consisting of simple 'If-Then' rules is not competent in handling exceptional circumstances. In this paper, we propose a Genetic Algorithm approach to discover Fuzzy Censored Classification Rules (FCCRs). An FCCR is a Fuzzy Classification Rule (FCR) augmented with censors. Here, censors are exceptional conditions in which the behaviour of a rule gets modified. The proposed algorithm works in two phases. In the first phase, the Genetic Algorithm discovers Fuzzy Classification Rules. Subsequently, these Fuzzy Classification Rules are mutated to produce FCCRs in the second phase. An appropriate encoding scheme, fitness function and genetic operators are designed for the discovery of FCCRs. The proposed approach for discovering FCCRs is then illustrated on a synthetic dataset.

  16. A Syntactic Classification based Web Page Ranking Algorithm

    CERN Document Server

    Mukhopadhyay, Debajyoti; Kim, Young-Chon

    2011-01-01

    Existing search engines sometimes give unsatisfactory results for lack of any categorization of the search results. If there were some means of knowing the user's preferences and ranking pages accordingly, the results would be more useful and accurate for the user. In the present paper a web page ranking algorithm is proposed based on syntactic classification of web pages; syntactic classification does not consider the meaning of a page's content. The proposed approach consists of three main steps: select some properties of web pages based on the user's demand, measure them, and give different weightage to each property during ranking for different types of pages. The existence of syntactic classes is supported by running the fuzzy c-means algorithm and neural network classification on a set of web pages. The change in ranking for different types of pages given the same query string is also demonstrated.

  17. Application of CART Algorithm in Blood Donors Classification

    Directory of Open Access Journals (Sweden)

    T. Santhanam

    2010-01-01

    Full Text Available Problem statement: This study used data mining modeling techniques to examine blood donor classification. The availability of blood in blood banks is a critical and important aspect of a healthcare system. Blood banks (in the developing-country context) typically rely on healthy people voluntarily donating blood, which is used for transfusions or made into medications. The ability to identify regular blood donors will enable blood banks and voluntary organizations to plan systematically for organizing blood donation camps in an effective manner. Approach: Identify blood donation behavior using the classification algorithms of data mining. The analysis was carried out using a standard blood transfusion dataset and the CART decision tree algorithm implemented in Weka. Results: Numerical experiments on the UCI ML blood transfusion data with the enhancements helped to identify donor classification. Conclusion: The CART-derived model, along with the extended definition for identifying regular voluntary donors, provided a model with good classification accuracy.

  18. An Improved Back Propagation Neural Network Algorithm on Classification Problems

    Science.gov (United States)

    Nawi, Nazri Mohd; Ransing, R. S.; Salleh, Mohd Najib Mohd; Ghazali, Rozaida; Hamid, Norhamreeza Abdul

    The back propagation algorithm is one of the most popular algorithms for training feed-forward neural networks. However, its convergence is slow, mainly because it relies on gradient descent. Previous research demonstrated that in feed-forward networks the slope of the activation function is directly influenced by a parameter referred to as 'gain'. This research proposes an algorithm for improving the performance of back propagation by introducing an adaptive gain for the activation function; the gain value changes adaptively for each node. The influence of the adaptive gain on the learning ability of a neural network is analysed, and multilayer feed-forward neural networks are assessed. A physical interpretation of the relationship between the gain value, the learning rate and the weight values is given. The efficiency of the proposed algorithm is compared with the conventional gradient descent method and verified by simulation on four classification problems. The simulation results demonstrate that the proposed method converged faster: on the Wisconsin breast cancer problem with an improvement ratio of nearly 2.8, on the diabetes problem by 1.76, 65% better on the thyroid datasets and 97% faster on the IRIS classification problem. The results clearly show that the proposed algorithm significantly improves the learning speed of the conventional back-propagation algorithm.
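
    The role of the gain parameter is easy to see in code: for a logistic activation, the slope of the function, and hence the gradient signal back propagation uses, scales directly with the gain. A small illustrative sketch (not the paper's implementation):

```python
import numpy as np

def sigmoid(x, gain=1.0):
    """Logistic activation with a gain parameter controlling the slope."""
    return 1.0 / (1.0 + np.exp(-gain * x))

def sigmoid_derivative(x, gain=1.0):
    """The derivative scales linearly with the gain, which is what an
    adaptive-gain scheme exploits to speed up gradient descent."""
    s = sigmoid(x, gain)
    return gain * s * (1.0 - s)

print(sigmoid_derivative(0.0, gain=1.0))  # 0.25
print(sigmoid_derivative(0.0, gain=2.0))  # 0.5
```

    In the adaptive scheme each node's gain is itself updated during training, steepening the activation where a larger gradient helps convergence.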

  19. AN ENHANCEMENT OF ASSOCIATION CLASSIFICATION ALGORITHM FOR IDENTIFYING PHISHING WEBSITES

    Directory of Open Access Journals (Sweden)

    G. Parthasarathy

    2016-08-01

    Full Text Available Phishing is a fraudulent activity in which an attacker creates a replica of an existing web page in order to obtain sensitive information, such as credit card details and passwords, from users. This paper presents an enhancement of an existing association classification algorithm to detect phishing websites. Accuracy can be enhanced to a great extent by applying association rules to classification. In addition, valuable information and rules can be obtained that cannot be captured by other classification approaches. However, the rule generation procedure is very time consuming on large datasets. The proposed algorithm uses the Apriori algorithm to identify frequent itemsets and then derives a decision tree based on the features of the URL.
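
    A minimal level-wise Apriori over boolean URL features can be sketched as follows; the feature names (`ip_url`, `no_https`, `long_url`) are invented for illustration and are not from the paper:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori: candidates of size k are built by joining
    frequent (k-1)-itemsets, then filtered by support count."""
    items = {frozenset([i]) for t in transactions for i in t}
    frequent, k_sets = {}, items
    while k_sets:
        counts = {s: sum(1 for t in transactions if s <= t) for s in k_sets}
        survivors = {s for s, c in counts.items() if c >= min_support}
        frequent.update({s: counts[s] for s in survivors})
        # join step: unions of survivors that grow the itemset by exactly one item
        k_sets = {a | b for a in survivors for b in survivors
                  if len(a | b) == len(a) + 1}
    return frequent

tx = [frozenset(t) for t in [{"ip_url", "no_https"},
                             {"ip_url", "no_https", "long_url"},
                             {"no_https"}]]
freq = frequent_itemsets(tx, min_support=2)
print(frozenset({"ip_url", "no_https"}) in freq)  # True
```

    Rules mined from these itemsets would then feed the classification stage described in the abstract.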

  20. Benchmarking protein classification algorithms via supervised cross-validation.

    Science.gov (United States)

    Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor

    2008-04-24

    Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. A combination of supervised and

  1. Evaluation of partial classification algorithms using ROC curves.

    Science.gov (United States)

    Tusch, G

    1995-01-01

    When using computer programs for decision support in clinical routine, an assessment or a comparison of the underlying classification algorithms is essential. In classical (forced) classification, the classification rule always selects exactly one alternative, and a number of proven discriminant measures are available, e.g., sensitivity and error rate. For probabilistic classification, a series of additional measures has been developed [1]. However, for many clinical applications there are models in which an observation is classified into several classes (partial classification), e.g., models from artificial intelligence, decision analysis, or fuzzy set theory. In partial classification, the discriminatory ability (Murphy) can, in most practical cases, be adjusted a priori to any level; here the usual measures do not apply. We investigate the preconditions for assessment and comparison based on medical decision theory, focusing on problems in the medical domain, and establish a methodological framework. When using partial classification procedures, a ROC analysis in the classical sense is no longer appropriate: in forced classification with two classes, the problem is to find one cutoff point on the ROC curve, while in partial classification two must be found, characterizing the elements classified as coming from both classes. This extends to several classes. We propose measures corresponding to the usual discriminant measures for forced classification (e.g., sensitivity and error rate) and demonstrate the effects using the ROC approach. For this purpose, we extend the existing method for forced classification in a mathematically sound manner; algorithms for the construction of thresholds can easily be adapted. Two specific measurement models, based on parametric and non-parametric approaches, are introduced. The basic methodology is suitable for all partial classification problems, whereas the extended ROC analysis assumes a rank order of the

  2. Comparison of Classification Algorithms and Training Sample Sizes in Urban Land Classification with Landsat Thematic Mapper Imagery

    OpenAIRE

    Congcong Li; Jie Wang; Lei Wang; Luanyun Hu; Peng Gong

    2014-01-01

    Although a large number of new image classification algorithms have been developed, they are rarely tested with the same classification task. In this research, with the same Landsat Thematic Mapper (TM) data set and the same classification scheme over Guangzhou City, China, we tested two unsupervised and 13 supervised classification algorithms, including a number of machine learning algorithms that became popular in remote sensing during the past 20 years. Our analysis focused primarily on ...

  3. Assessing the Accuracy of Prediction Algorithms for Classification

    DEFF Research Database (Denmark)

    Baldi, P.; Brunak, Søren; Chauvin, Y.;

    2000-01-01

    We provide a unified overview of methods currently in wide use to assess the accuracy of prediction algorithms, from raw percentages, quadratic error measures and other distances, and correlation coefficients, to information-theoretic measures such as relative entropy and mutual information. We briefly discuss the advantages and disadvantages of each approach. For classification tasks, we derive new learning algorithms for the design of prediction systems by directly optimising the correlation coefficient. We observe and prove several results relating sensitivity and specificity

  4. Optimized Audio Classification and Segmentation Algorithm by Using Ensemble Methods

    OpenAIRE

    Saadia Zahid; Fawad Hussain; Muhammad Rashid; Muhammad Haroon Yousaf; Hafiz Adnan Habib

    2015-01-01

    Audio segmentation is a basis for multimedia content analysis, which is one of the most important and widely used applications nowadays. An optimized audio classification and segmentation algorithm is presented in this paper that segments a superimposed audio stream on the basis of its content into four main audio types: pure speech, music, environment sound, and silence. An algorithm is proposed that preserves important audio content and reduces the misclassification rate without using a large amount o...

  5. Backpropagation Learning Algorithms for Email Classification.

    Directory of Open Access Journals (Sweden)

    David Ndumiyana and Tarirayi Mukabeta

    2016-07-01

    Full Text Available Today email has become one of the fastest and most effective forms of communication. The popularity of this mode of transmitting goods, information and services has motivated spammers to perfect their technical skills to fool spam filters. This development has worsened the problems faced by Internet users, who have to deal with email congestion, email overload and unprioritised email messages. The result has been an exponential increase in the number of email classification management tools over the past few decades. In this paper we propose a new spam classifier that uses the learning process of a multilayer neural network to implement the back propagation technique. Our contribution to the body of knowledge is the use of an improved empirical analysis to choose an optimal, novel collection of attributes of a user's email contents that allows quick detection of the most important words in emails. We also demonstrate the effectiveness of two equal sets of email training and testing data.

  6. Automatic Mining of Numerical Classification Rules with Parliamentary Optimization Algorithm

    Directory of Open Access Journals (Sweden)

    KIZILOLUK, S.

    2015-11-01

    Full Text Available In recent years, classification rule mining has been one of the most important data mining tasks. In this study, one of the newest social-based metaheuristic methods, the Parliamentary Optimization Algorithm (POA), is used for the first time to automatically mine comprehensible and accurate classification rules from datasets with numerical attributes. Four numerical datasets were selected from the UCI repository, and classification rules of high quality were obtained. Furthermore, the results obtained from the designed POA were compared with the results of four popular classification rule mining algorithms in WEKA. Although POA is very new and has not previously been applied to complex data mining problems, the results seem promising. The objective function used is very flexible, and many different objectives can easily be added. The intervals of the numerical attributes in the rules are found automatically, without the a priori discretization step required by other classification rule mining algorithms, which modifies the datasets.

  7. Algorithms for classification of astronomical object spectra

    Science.gov (United States)

    Wasiewicz, P.; Szuppe, J.; Hryniewicz, K.

    2015-09-01

    Obtaining interesting celestial objects from tens of thousands or even millions of recorded optical-ultraviolet spectra depends not only on the data quality but also on the accuracy of spectral decomposition. Additionally, rapidly growing data volumes demand higher computing power and/or more efficient algorithm implementations. In this paper we speed up the process of subtracting iron transitions and fitting Gaussian functions to emission peaks utilising C++ and OpenCL methods together with a NoSQL database. We also implemented typical astronomical methods of peak detection for comparison with our previous hybrid methods implemented with CUDA.
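
    Fitting a Gaussian to an emission peak can be approximated in closed form, since the log of a Gaussian is a parabola. This numpy sketch (ours, not the paper's C++/OpenCL code) fits a parabola through the log-intensity at the peak sample and its two neighbours:

```python
import numpy as np

def gaussian_peak_fit(x, y):
    """Estimate a Gaussian peak's centre and width from three samples around
    the maximum: log(Gaussian) is a parabola, and the parabola's vertex and
    curvature give the centre and sigma."""
    i = int(np.argmax(y))
    x0, x1, x2 = x[i - 1], x[i], x[i + 1]
    l0, l1, l2 = np.log(y[i - 1]), np.log(y[i]), np.log(y[i + 1])
    # Lagrange-form coefficients of the parabola a*x^2 + b*x + c
    denom = (x0 - x1) * (x0 - x2) * (x1 - x2)
    a = (x2 * (l1 - l0) + x1 * (l0 - l2) + x0 * (l2 - l1)) / denom
    b = (x2**2 * (l0 - l1) + x1**2 * (l2 - l0) + x0**2 * (l1 - l2)) / denom
    centre = -b / (2 * a)            # vertex of the parabola
    sigma = np.sqrt(-1.0 / (2 * a))  # a = -1/(2*sigma^2) for a Gaussian
    return centre, sigma

x = np.linspace(0, 10, 101)
y = np.exp(-((x - 4.3) ** 2) / (2 * 0.8**2))  # noiseless Gaussian, centre 4.3, sigma 0.8
c, s = gaussian_peak_fit(x, y)
print(round(c, 2), round(s, 2))  # 4.3 0.8
```

    On noisy spectra one would fit over more samples (least squares), but the three-point version already shows the structure of the computation being accelerated.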

  8. Optimal classification of standoff bioaerosol measurements using evolutionary algorithms

    Science.gov (United States)

    Nyhavn, Ragnhild; Moen, Hans J. F.; Farsund, Øystein; Rustad, Gunnar

    2011-05-01

    Early warning systems based on standoff detection of biological aerosols require real-time signal processing of a large quantity of high-dimensional data, challenging the system's efficiency in terms of both computational complexity and classification accuracy. Hence, optimal feature selection is essential in forming a stable and efficient classification system. This involves finding optimal signal processing parameters, characteristic spectral frequencies and other data transformations in a large-magnitude variable space, which calls for an efficient and smart search algorithm. Evolutionary algorithms are population-based optimization methods inspired by Darwinian evolutionary theory. These methods apply selection, mutation and recombination to a population of competing solutions, evolving the population at each generation. We have employed genetic algorithms in the search for optimal feature selection and signal processing parameters for classification of biological agents. The experimental data were acquired with a spectrally resolved lidar based on ultraviolet laser-induced fluorescence, and included several releases of five common simulants. The genetic algorithm outperforms benchmark analytic, sequential and random methods such as support vector machines, Fisher's linear discriminant and principal component analysis, with significantly improved classification accuracy compared to the best classical method.
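
    The evolutionary search over feature subsets can be sketched as a small genetic algorithm over binary masks, with tournament selection, one-point crossover and bit-flip mutation. The fitness function below is a toy stand-in for the classifier-accuracy fitness such a system would use; everything here is illustrative, not the paper's code:

```python
import numpy as np

def ga_feature_select(fitness, n_features, pop_size=20, generations=40, seed=0):
    """Tiny genetic algorithm over binary feature masks. Returns the best
    mask encountered during the run."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, (pop_size, n_features))
    best, best_score = pop[0].copy(), -np.inf
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        if scores.max() > best_score:
            best_score, best = scores.max(), pop[scores.argmax()].copy()
        children = []
        for _ in range(pop_size):
            i, j = rng.integers(0, pop_size, 2)
            a = pop[i] if scores[i] >= scores[j] else pop[j]  # tournament 1
            i, j = rng.integers(0, pop_size, 2)
            b = pop[i] if scores[i] >= scores[j] else pop[j]  # tournament 2
            cut = int(rng.integers(1, n_features))            # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_features) < 0.05              # bit-flip mutation
            children.append(np.where(flip, 1 - child, child))
        pop = np.array(children)
    return best

# toy fitness: negative Hamming distance to a known-good mask
target = np.array([1, 0, 1, 0, 0])
best = ga_feature_select(lambda m: -int(np.abs(m - target).sum()), n_features=5)
print(best)
```

    In the real system the fitness would be cross-validated classification accuracy on the lidar spectra, which is far more expensive to evaluate.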

  9. Benchmarking protein classification algorithms via supervised cross-validation

    NARCIS (Netherlands)

    Kertész-Farkas, A.; Dhir, S.; Sonego, P.; Pacurar, M.; Netoteia, S.; Nijveen, H.; Kuzniar, A.; Leunissen, J.A.M.; Kocsor, A.; Pongor, S.

    2008-01-01

    Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-o

  10. Evaluation of registration, compression and classification algorithms. Volume 1: Results

    Science.gov (United States)

    Jayroe, R.; Atkinson, R.; Callas, L.; Hodges, J.; Gaggini, B.; Peterson, J.

    1979-01-01

    The registration, compression, and classification algorithms were selected on the basis that such a group would include most of the different and commonly used approaches. The results of the investigation indicate clearcut, cost effective choices for registering, compressing, and classifying multispectral imagery.

  11. Optimization of deep learning algorithms for object classification

    Science.gov (United States)

    Horváth, András.

    2017-02-01

    Deep learning is currently the state-of-the-art approach to image classification. The complexity of these feedforward neural networks has passed a critical point, resulting in algorithmic breakthroughs in various fields. On the other hand, their complexity means they can only be executed where high-throughput computing power is available. The optimization of these networks, considering computational complexity and applicability on embedded systems, has not yet been studied in detail. In this paper I show examples of how these algorithms can be optimized and accelerated on embedded systems.

  12. GLAST Burst Monitor Trigger Classification Algorithm

    Science.gov (United States)

    Perrin, D. J.; Sidman, E. D.; Meegan, C. A.; Briggs, M. S.; Connaughton, V.

    2004-01-01

    The Gamma Ray Large Area Space Telescope (GLAST), currently set for launch in the first quarter of 2007, will consist of two instruments, the GLAST Burst Monitor (GBM) and the Large Area Telescope (LAT). One of the goals of the GBM is to identify and locate gamma-ray bursts using on-board software. The GLAST observatory can then be re-oriented to allow observations by the LAT. A Bayesian analysis will be used to distinguish gamma-ray bursts from other triggering events, such as solar flares, magnetospheric particle precipitation, soft gamma repeaters (SGRs), and Cygnus X-1 flaring. The trigger parameters used in the analysis are the burst celestial coordinates, angle from the Earth's horizon, spectral hardness, and the spacecraft geomagnetic latitude. The algorithm will be described and the results of testing will be presented.

  13. An ellipse detection algorithm based on edge classification

    Science.gov (United States)

    Yu, Liu; Chen, Feng; Huang, Jianming; Wei, Xiangquan

    2015-12-01

    In order to improve the speed and accuracy of ellipse detection, an ellipse detection algorithm based on edge classification is proposed. Superfluous edge points are removed by serializing edges into points and applying a distance constraint between edge points. Effective classification is achieved using the angle between edge points as a criterion, which greatly increases the probability that randomly selected edge points fall on the same ellipse. Ellipse fitting accuracy is significantly improved by optimization of the RED algorithm, using the Euclidean distance from an edge point to the elliptical boundary. Experimental results show that the method detects ellipses well when edges are subject to interference or block each other, and that it offers higher detection precision and lower time consumption than the RED algorithm.

  14. QUEST: Eliminating Online Supervised Learning for Efficient Classification Algorithms

    Directory of Open Access Journals (Sweden)

    Ardjan Zwartjes

    2016-10-01

    Full Text Available In this work, we introduce QUEST (QUantile Estimation after Supervised Training), an adaptive classification algorithm for Wireless Sensor Networks (WSNs) that eliminates the necessity for online supervised learning. Online processing is important for many sensor network applications. Transmitting raw sensor data puts high demands on the battery, reducing network lifetime. By merely transmitting partial results or classifications based on the sampled data, the amount of traffic on the network can be significantly reduced. Such classifications can be made by learning-based algorithms using sampled data. An important issue, however, is the training phase of these learning-based algorithms: training a deployed sensor network requires a lot of communication and an impractical amount of human involvement. QUEST is a hybrid algorithm that combines supervised learning in a controlled environment with unsupervised learning at the location of deployment. Using the SITEX02 dataset, we demonstrate that the presented solution works with a performance penalty of less than 10% in 90% of the tests. Under some circumstances, it even outperforms a network of classifiers completely trained with supervised learning. As a result, the need for on-site supervised learning and communication for training is completely eliminated by our solution.

  15. Classification of ETM+ Remote Sensing Image Based on Hybrid Algorithm of Genetic Algorithm and Back Propagation Neural Network

    Directory of Open Access Journals (Sweden)

    Haisheng Song

    2013-01-01

    Full Text Available The back propagation neural network (BPNN) algorithm can be used for supervised classification in remote sensing image processing, but its defects are obvious: it falls into local minima easily, its convergence is slow, and the number of hidden-layer nodes is difficult to determine. The genetic algorithm (GA) has the advantages of global optimization and resistance to local minima, but poor local search capability. This paper uses the GA to generate the initial structure of the BPNN; a stable, efficient and fast BP classification network is then obtained through fine adjustments with an improved BP algorithm. Finally, the hybrid algorithm is used to classify a remote sensing image and is compared with the improved BP algorithm and the traditional maximum likelihood classification (MLC) algorithm. Experimental results show that the hybrid algorithm outperforms both the improved BP algorithm and the MLC algorithm.

  16. An algorithm for the arithmetic classification of multilattices

    CERN Document Server

    Indelicato, Giuliana

    2009-01-01

    A procedure for the construction and classification of multilattices in arbitrary dimension is proposed. The algorithm allows one to determine explicitly the location of the points of a multilattice given its space group, and to decide whether two multilattices are arithmetically equivalent. The algorithm is based on ideas from integer matrix theory, in particular reduction to the Smith normal form. Among the applications of this procedure is a software package that allows the classification of complex crystalline structures and the determination of their space groups. It can also be used to determine the symmetry of regular systems of points in high dimension, with applications to the study of quasicrystals and of point sets with noncrystallographic symmetry in low dimension, such as viral capsid structures.

  17. New algorithm of target classification in polarimetric SAR

    Institute of Scientific and Technical Information of China (English)

    Wang Yang; Lu Jiaguo; Wu Xianliang

    2008-01-01

    The different approaches to target decomposition (TD) theory in radar polarimetry are reviewed and three main types of theorems are introduced: those based on the Mueller matrix, those using an eigenvector analysis of the coherency matrix, and those employing coherent decomposition of the scattering matrix. The support vector machine (SVM), a novel approach in pattern recognition, has demonstrated success in many fields. A new target classification algorithm combining target decomposition and the support vector machine is proposed. Experiments were conducted on polarimetric synthetic aperture radar (SAR) data. The results show that target classification by applying target decomposition to extract scattering mechanisms is feasible and efficient, and that the kernel function and its parameters have significant effects on classification efficiency.

  18. DTL: a language to assist cardiologists in improving classification algorithms.

    Science.gov (United States)

    Kors, J A; Kamp, D M; Henkemans, D P; van Bemmel, J H

    1991-06-01

    Heuristic classifiers, e.g., for diagnostic classification of the electrocardiogram, can be very complex. The development and refinement of such classifiers is cumbersome and time-consuming. Generally, it requires a computer expert to implement the cardiologist's diagnostic reasoning in a computer language. The average cardiologist, however, is not able to verify whether his intentions have been properly realized and whether the classifier performs as he hoped. Even for the initiated, it often remains obscure how a particular result was reached by a complex classification program. An environment is presented which solves these problems. The environment consists of a language, DTL (Decision Tree Language), that allows cardiologists to express their classification algorithms in a way that is familiar to them, and an interpreter and translator for that language. The considerations in the design of DTL are described and the structure and capabilities of the interpreter and translator are discussed.

  19. Low-Level Vision Algorithms for Localization, Classification, and Tracking

    OpenAIRE

    Kevin N. Gabayan

    2003-01-01

    Camera networks can provide images of detected objects that vary in perspective and level of obstruction. To improve the understanding of visual events, vision algorithms are implemented in a wireless sensor network. Methods were developed to fuse data from multiple cameras to improve object identification and location in the presence of obstructions. Training sets of images allow classification of objects into familiar categories. Feature-based object correspondence is used to track multiple...

  20. Protein sequence classification with improved extreme learning machine algorithms.

    Science.gov (United States)

    Cao, Jiuwen; Xiong, Lianglin

    2014-01-01

    Precisely classifying a protein sequence from a large database of biological protein sequences plays an important role in developing competitive pharmacological products. Conventional methods, which compare an unseen sequence with all identified protein sequences and return the category index of the protein with the highest similarity score, are usually time-consuming. Therefore, it is urgent and necessary to build an efficient protein sequence classification system. In this paper, we study the performance of protein sequence classification using single hidden layer feedforward networks (SLFNs). The recent efficient extreme learning machine (ELM) and its variants are utilized as the training algorithms. The optimal pruned ELM is first employed for protein sequence classification in this paper. To further enhance the performance, an ensemble-based SLFN structure is constructed in which multiple SLFNs with the same number of hidden nodes and the same activation function are used as ensemble members. For each ensemble member, the same training algorithm is adopted. The final category index is derived using the majority voting method. Two approaches, namely the basic ELM and the OP-ELM, are adopted for the ensemble-based SLFNs. The performance is analyzed and compared with several existing methods using datasets obtained from the Protein Information Resource center. The experimental results show the superiority of the proposed algorithms.
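
    The ensemble scheme described above can be sketched concretely; the following is our minimal illustration (not the authors' implementation) of a basic ELM, random fixed hidden layer with least-squares output weights, combined by majority voting. The toy data and all sizes are invented.

```python
import numpy as np

def train_elm(X, y, n_hidden, n_classes, rng):
    # Random input weights and biases stay fixed; only the output weights
    # are solved, by least squares against one-hot targets.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))          # sigmoid hidden layer
    T = np.eye(n_classes)[y]                        # one-hot targets
    beta, *_ = np.linalg.lstsq(H, T, rcond=None)
    return W, b, beta

def predict_elm(model, X):
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)

def ensemble_predict(models, X):
    # Majority vote across ensemble members, as in the paper's scheme.
    votes = np.stack([predict_elm(m, X) for m in models])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

rng = np.random.default_rng(0)
# Toy two-class data standing in for protein-sequence feature vectors.
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)
models = [train_elm(X, y, n_hidden=20, n_classes=2, rng=rng) for _ in range(5)]
accuracy = (ensemble_predict(models, X) == y).mean()
```

Because each member sees different random hidden weights, the vote smooths out individual ELMs' mistakes, which is the motivation the abstract gives for the ensemble.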

  1. The Optimization of Trained and Untrained Image Classification Algorithms for Use on Large Spatial Datasets

    Science.gov (United States)

    Kocurek, Michael J.

    2005-01-01

    The HARVIST project seeks to automatically provide an accurate, interactive interface to predict crop yield over the entire United States. In order to accomplish this goal, large images must be quickly and automatically classified by crop type. Current trained and untrained classification algorithms, while accurate, are highly inefficient when operating on large datasets. This project sought to develop new variants of two standard trained and untrained classification algorithms that are optimized to take advantage of the spatial nature of image data. The first algorithm, harvist-cluster, utilizes divide-and-conquer techniques to precluster an image in the hope of increasing overall clustering speed. The second algorithm, harvistSVM, utilizes support vector machines (SVMs), a type of trained classifier. It seeks to increase classification speed by applying a "meta-SVM" to a quick (but inaccurate) SVM to approximate a slower, yet more accurate, SVM. Speedups were achieved by tuning the algorithm to quickly identify when the quick SVM was incorrect, and then reclassifying low-confidence pixels as necessary. Comparing the classification speeds of both algorithms to known baselines showed a slight speedup for large values of k (the number of clusters) for harvist-cluster, and a significant speedup for harvistSVM. Future work aims to automate the parameter tuning process required for harvistSVM, and to further improve classification accuracy and speed.

  2. Feature extraction and classification algorithms for high dimensional data

    Science.gov (United States)

    Lee, Chulhee; Landgrebe, David

    1993-01-01

    Feature extraction and classification algorithms for high dimensional data are investigated. Developments with regard to sensors for Earth observation are moving in the direction of providing much higher dimensional multispectral imagery than is now possible. In analyzing such high dimensional data, processing time becomes an important factor. With large increases in dimensionality and the number of classes, processing time will increase significantly. To address this problem, a multistage classification scheme is proposed which reduces the processing time substantially by eliminating unlikely classes from further consideration at each stage. Several truncation criteria are developed and the relationship between thresholds and the error caused by the truncation is investigated. Next, an approach to feature extraction for classification is proposed based directly on the decision boundaries. It is shown that all the features needed for classification can be extracted from decision boundaries. A characteristic of the proposed method is that only a portion of the decision boundary is effective in discriminating between classes, and the concept of the effective decision boundary is therefore introduced. The proposed feature extraction algorithm has several desirable properties: it predicts the minimum number of features necessary to achieve the same classification accuracy as in the original space for a given pattern recognition problem, and it finds the necessary feature vectors. The proposed algorithm does not deteriorate under the circumstances of equal means or equal covariances as some previous algorithms do. In addition, the decision boundary feature extraction algorithm can be used for both parametric and non-parametric classifiers. Finally, some problems encountered in analyzing high dimensional data are studied and possible solutions are proposed. First, the increased importance of the second order statistics in analyzing high dimensional data is recognized.
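
    The multistage idea, pruning unlikely classes cheaply before applying the expensive discriminant, can be illustrated with a small sketch (ours, not the paper's scheme; the screening rule, the `keep` budget, and the toy data are all assumptions):

```python
import numpy as np

def multistage_classify(x, means, covs, keep=2):
    # Stage 1: cheap Euclidean screening keeps only `keep` candidate classes,
    # eliminating unlikely classes from further consideration.
    d = np.linalg.norm(means - x, axis=1)
    candidates = np.argsort(d)[:keep]
    # Stage 2: full quadratic (Gaussian) discriminant, evaluated only for
    # the surviving candidates, which is where the time saving comes from.
    best, best_score = None, -np.inf
    for c in candidates:
        diff = x - means[c]
        score = (-0.5 * diff @ np.linalg.inv(covs[c]) @ diff
                 - 0.5 * np.log(np.linalg.det(covs[c])))
        if score > best_score:
            best, best_score = int(c), score
    return best

# Four toy classes on a grid; the sample lies near class 1's mean.
means = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])
covs = np.stack([np.eye(2)] * 4)
label = multistage_classify(np.array([3.8, 0.2]), means, covs)
```

With many classes, stage 2 runs on only a handful of survivors, so cost grows with `keep` rather than with the total number of classes.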

  3. Performance comparison of SLFN training algorithms for DNA microarray classification.

    Science.gov (United States)

    Huynh, Hieu Trung; Kim, Jung-Ja; Won, Yonggwan

    2011-01-01

    The classification of biological samples measured by DNA microarrays has been a major topic of interest in the last decade, and several approaches to this topic have been investigated. However, until now, classifying the high-dimensional data of microarrays has still presented a challenge to researchers. In this chapter, we focus on evaluating the performance of training algorithms for single hidden layer feedforward neural networks (SLFNs) in classifying DNA microarrays. The training algorithms are backpropagation (BP), the extreme learning machine (ELM), regularized least squares ELM (RLS-ELM), and a recently proposed effective algorithm called neural-SVD. We also compare the performance of the neural network approaches with popular classifiers such as the support vector machine (SVM), principal component analysis (PCA) and Fisher discriminant analysis (FDA).

  4. Preliminary results from the ASF/GPS ice classification algorithm

    Science.gov (United States)

    Cunningham, G.; Kwok, R.; Holt, B.

    1992-01-01

    The European Space Agency's Remote Sensing Satellite (ERS-1) carried a C-band synthetic aperture radar (SAR) to study the Earth's polar regions. The radar returns from sea ice can be used to infer properties of the ice, including ice type. An algorithm has been developed for the Alaska SAR Facility (ASF)/Geophysical Processor System (GPS) to infer ice type from the SAR observations over sea ice and open water. The algorithm utilizes look-up tables containing expected backscatter values for various ice types. An analysis has been made of two overlapping strips with 14 SAR images. The backscatter values of specific ice regions were sampled to study the backscatter characteristics of the ice in time and space. Results show both stability of the backscatter values in time and a good separation of multiyear and first-year ice signals, verifying the approach used in the classification algorithm.
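
    The look-up-table idea can be made concrete with a minimal stand-in (not the ASF/GPS code): each pixel is assigned the ice type whose expected backscatter, in dB, is closest to the observed value. The table entries below are invented for illustration, not taken from the GPS tables.

```python
# Hypothetical expected backscatter (dB) per class; real tables would be
# derived from calibrated ERS-1 C-band measurements.
LOOKUP_DB = {"multiyear ice": -8.0, "first-year ice": -14.0, "open water": -20.0}

def classify_pixel(sigma0_db):
    # Nearest expected backscatter value wins.
    return min(LOOKUP_DB, key=lambda t: abs(LOOKUP_DB[t] - sigma0_db))

labels = [classify_pixel(v) for v in (-7.5, -13.0, -21.0)]
```

The abstract's finding that multiyear and first-year signals separate well in time and space is exactly what makes such a fixed table workable.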

  5. Unsupervised classification of multivariate geostatistical data: Two algorithms

    Science.gov (United States)

    Romary, Thomas; Ors, Fabien; Rivoirard, Jacques; Deraisme, Jacques

    2015-12-01

    With the increasing development of remote sensing platforms and the evolution of sampling facilities in the mining and oil industries, spatial datasets are becoming increasingly large, include a growing number of variables, and cover ever wider areas. Therefore, it is often necessary to split the domain of study to account for radically different behaviors of the natural phenomenon over the domain and to simplify the subsequent modeling step. The definition of these areas can be seen as a problem of unsupervised classification, or clustering, where we try to divide the domain into homogeneous domains with respect to the values taken by the variables at hand. The application of classical clustering methods, designed for independent observations, does not ensure the spatial coherence of the resulting classes. Image segmentation methods, based on, e.g., Markov random fields, are not adapted to irregularly sampled data. Other existing approaches, based on mixtures of Gaussian random functions estimated via the expectation-maximization algorithm, are limited to reasonable sample sizes and a small number of variables. In this work, we propose two algorithms based on adaptations of classical algorithms to multivariate geostatistical data. Both algorithms are model-free and can handle large volumes of multivariate, irregularly spaced data. The first proceeds by agglomerative hierarchical clustering. The spatial coherence is ensured by a proximity condition imposed for two clusters to merge. This proximity condition relies on a graph organizing the data in the coordinate space. The hierarchical algorithm can then be seen as a graph-partitioning algorithm. Following this interpretation, a spatial version of the spectral clustering algorithm is also proposed. The performance of both algorithms is assessed on toy examples and a mining dataset.
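
    The proximity condition on the first algorithm can be sketched in a toy version (ours, not the authors' implementation): two clusters may merge only if some pair of their samples are neighbours in a graph built over the coordinates, which is what keeps the resulting classes spatially coherent. The neighbourhood radius, merge cost, and data are all invented.

```python
import numpy as np

def spatial_agglomerative(coords, values, n_clusters, radius):
    n = len(coords)
    # Proximity graph in coordinate space: edges between nearby samples.
    adj = {i: {j for j in range(n) if j != i
               and np.linalg.norm(coords[i] - coords[j]) <= radius}
           for i in range(n)}
    clusters = [{i} for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Spatial coherence: require a graph edge between the clusters.
                if not any(j in adj[i] for i in clusters[a] for j in clusters[b]):
                    continue
                # Merge cost: difference of cluster means of the variable.
                cost = abs(np.mean([values[i] for i in clusters[a]])
                           - np.mean([values[j] for j in clusters[b]]))
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        if best is None:          # no admissible merge remains
            break
        _, a, b = best
        clusters[a] |= clusters[b]
        del clusters[b]
    return clusters

# Six samples on a line; the measured variable jumps between the two halves.
coords = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
values = np.array([5.0, 5.1, 4.9, 20.0, 19.8, 20.2])
parts = sorted(sorted(c) for c in
               spatial_agglomerative(coords, values, n_clusters=2, radius=1.5))
```

An unconstrained clustering of `values` alone would give the same split here, but with spatially interleaved data the graph condition is what forbids merging distant look-alike clusters.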

  6. Hardware Accelerators Targeting a Novel Group Based Packet Classification Algorithm

    Directory of Open Access Journals (Sweden)

    O. Ahmed

    2013-01-01

    Packet classification is a ubiquitous and key building block for many critical network devices. However, it remains one of the main bottlenecks faced when designing fast network devices. In this paper, we propose a novel Group Based Search packet classification Algorithm (GBSA) that is scalable, fast, and efficient. GBSA consumes an average of 0.4 megabytes of memory for a 10 k rule set. The worst-case classification time per packet is 2 microseconds, and the preprocessing speed is 3 M rules/second on a Xeon processor operating at 3.4 GHz. When compared with other state-of-the-art classification techniques, the results showed that GBSA outperforms the competition with respect to speed, memory usage, and processing time. Moreover, GBSA is amenable to implementation in hardware. Three different hardware implementations are also presented in this paper, including an Application Specific Instruction Set Processor (ASIP) implementation and two pure Register-Transfer Level (RTL) implementations based on Impulse-C and Handel-C flows, respectively. Speedups achieved with these hardware accelerators ranged from 9x to 18x compared with a pure software implementation running on a Xeon processor.

  7. Implementation of several mathematical algorithms to breast tissue density classification

    Science.gov (United States)

    Quintana, C.; Redondo, M.; Tirao, G.

    2014-02-01

    The accuracy of mammographic abnormality detection methods is strongly dependent on breast tissue characteristics, where dense breast tissue can hide lesions, causing cancer to be detected at later stages. In addition, breast tissue density is widely accepted to be an important risk indicator for the development of breast cancer. This paper presents the implementation and the performance of different mathematical algorithms designed to standardize the categorization of mammographic images according to the American College of Radiology classifications. These mathematical techniques are based on intrinsic property calculations and on comparison with an ideal homogeneous image (joint entropy, mutual information, normalized cross correlation and index Q) as categorization parameters. The evaluation of the algorithms was performed on 100 cases from the mammographic data sets provided by the Ministerio de Salud de la Provincia de Córdoba, Argentina, Programa de Prevención del Cáncer de Mama (Department of Public Health, Córdoba, Argentina, Breast Cancer Prevention Program). The breast classifications obtained were compared with expert medical diagnoses, showing good performance. The implemented algorithms showed high potential for classifying breasts into tissue density categories.
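
    Two of the categorization parameters named above, joint entropy and mutual information, can be computed from a joint grey-level histogram of two images; the following is our sketch of that computation (bin count and test images are invented, and this is not the paper's code).

```python
import numpy as np

def joint_entropy_and_mi(img_a, img_b, bins=8):
    # Joint grey-level histogram -> joint probability table.
    hist, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    p = hist / hist.sum()
    nz = p > 0
    h_joint = -np.sum(p[nz] * np.log2(p[nz]))        # joint entropy (bits)
    pa, pb = p.sum(axis=1), p.sum(axis=0)            # marginals
    h_a = -np.sum(pa[pa > 0] * np.log2(pa[pa > 0]))
    h_b = -np.sum(pb[pb > 0] * np.log2(pb[pb > 0]))
    return h_joint, h_a + h_b - h_joint              # (H(A,B), MI)

rng = np.random.default_rng(2)
img = rng.random((32, 32))
_, mi_self = joint_entropy_and_mi(img, img)              # identical images
_, mi_ind = joint_entropy_and_mi(img, rng.random((32, 32)))  # unrelated
```

Mutual information is maximal when the test image matches the reference and falls toward zero as they decorrelate, which is what makes it usable as a similarity score against an ideal homogeneous reference.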

  8. Optimized Audio Classification and Segmentation Algorithm by Using Ensemble Methods

    Directory of Open Access Journals (Sweden)

    Saadia Zahid

    2015-01-01

    Audio segmentation is a basis for multimedia content analysis, which is one of the most important and widely used applications nowadays. An optimized audio classification and segmentation algorithm is presented in this paper that segments a superimposed audio stream on the basis of its content into four main audio types: pure speech, music, environment sound, and silence. The proposed algorithm preserves important audio content, reduces the misclassification rate without requiring a large amount of training data, handles noise, and is suitable for real-time applications. Noise in an audio stream is segmented out as environment sound. A hybrid classification approach is used: bagged support vector machines (SVMs) with artificial neural networks (ANNs). The audio stream is classified, first, into speech and non-speech segments using bagged SVMs; the non-speech segment is further classified into music and environment sound using ANNs; and lastly, the speech segment is classified into silence and pure-speech segments by a rule-based classifier. Minimal data are used for training the classifiers, ensemble methods are used to minimize the misclassification rate, and approximately 98% accurate segments are obtained. The resulting algorithm is fast and efficient and can be used with real-time multimedia applications.
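
    The three-stage cascade described above can be outlined in miniature; in this sketch of ours the stage classifiers are stand-in threshold rules with invented feature names and thresholds, whereas in the paper the first two stages are bagged SVMs and ANNs.

```python
def classify_segment(features):
    # features: dict of hypothetical per-segment descriptors.
    if features["zero_crossing_rate"] < 0.1 and features["spectral_flux"] < 0.2:
        # Stage 1 said "speech-like"; stage 3 splits silence from pure speech
        # with a rule on energy, as in the paper's rule-based final stage.
        return "silence" if features["energy"] < 0.01 else "pure-speech"
    # Stage 2: among non-speech segments, separate music from environment sound.
    return "music" if features["harmonicity"] > 0.5 else "environment-sound"

labels = [
    classify_segment({"zero_crossing_rate": 0.05, "spectral_flux": 0.1,
                      "energy": 0.001, "harmonicity": 0.2}),
    classify_segment({"zero_crossing_rate": 0.05, "spectral_flux": 0.1,
                      "energy": 0.3, "harmonicity": 0.2}),
    classify_segment({"zero_crossing_rate": 0.4, "spectral_flux": 0.6,
                      "energy": 0.3, "harmonicity": 0.9}),
]
```

The cascade shape matters: each stage only sees the segments the previous stage passed down, so each classifier solves a simpler two-way problem.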

  9. Video Analytics Algorithm for Automatic Vehicle Classification (Intelligent Transport System

    Directory of Open Access Journals (Sweden)

    Arta Iftikhar

    2013-04-01

    Automated vehicle detection and classification is an important component of intelligent transport systems. Due to its significant importance in various fields such as traffic accident avoidance, toll collection, congestion avoidance, monitoring of terrorist activities, and security and surveillance systems, the intelligent transport system has become an important field of study. Various technologies have been used for detecting and classifying vehicles automatically. Automated vehicle detection is broadly divided into two types: hardware-based and software-based detection. Various algorithms have been implemented to classify different vehicles from videos. In this paper an efficient and economical solution for automatic vehicle detection and classification is proposed. The proposed system first isolates the object through background subtraction, followed by vehicle detection using ontology. Vehicle detection is based on low-level features such as shape, size, and spatial location. Finally, the system classifies each vehicle into one of the known vehicle classes based on size.

  10. Review of WiMAX Scheduling Algorithms and Their Classification

    Science.gov (United States)

    Yadav, A. L.; Vyavahare, P. D.; Bansod, P. P.

    2014-07-01

    Providing quality of service (QoS) in wireless communication networks has become an important consideration for supporting a variety of applications. IEEE 802.16 based WiMAX is the most promising technology for broadband wireless access, with the best QoS features for triple play (voice, video and data) service users. Unlike wired networks, QoS support is difficult in wireless networks due to the variable and unpredictable nature of wireless channels. In the transmission of voice and video, the main issue is the allocation of available resources among users to meet QoS criteria such as delay, jitter and throughput requirements, to maximize goodput, and to minimize power consumption while keeping the algorithm flexible and the system scalable. WiMAX assures guaranteed QoS by including several mechanisms at the MAC layer, such as admission control and scheduling. Packet scheduling is the process of resolving contention for bandwidth, which determines the allocation of bandwidth among users and their transmission order. Various approaches to the classification of scheduling algorithms in WiMAX have appeared in the literature, grouping them into homogeneous, hybrid and opportunistic scheduling algorithms. The paper consolidates the parameters and performance metrics that need to be considered in developing a scheduler. The paper surveys recently proposed scheduling algorithms and the shortcomings, assumptions, suitability and improvement issues associated with these uplink scheduling algorithms.

  11. A fast version of the k-means classification algorithm for astronomical applications

    CERN Document Server

    Ordovás-Pascual, I

    2014-01-01

    Context. K-means is a clustering algorithm that has been used to classify large datasets in astronomical databases. It is an unsupervised method, able to cope with very different types of problems. Aims. We check whether a variant of the algorithm, called single-pass k-means, can be used as a fast alternative to the traditional k-means. Methods. The execution times of the two algorithms are compared when classifying subsets drawn from the SDSS-DR7 catalog of galaxy spectra. Results. Single-pass k-means turns out to be between 20% and 40% faster than k-means and provides statistically equivalent classifications. This conclusion can be scaled up to other, larger databases because the execution time of both algorithms increases linearly with the number of objects. Conclusions. Single-pass k-means can be safely used as a fast alternative to k-means.
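
    The contrast between the two algorithms can be sketched as follows; this is our toy illustration (not the paper's code): ordinary Lloyd k-means iterates assignment and mean steps, while the single-pass variant sweeps the data once, updating running centroids. The data and initialization are invented.

```python
import numpy as np

def kmeans(X, centers, iters=20):
    # Standard Lloyd iterations: assign, then recompute means.
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == k].mean(0) for k in range(len(centers))])
    return centers

def single_pass_kmeans(X, centers):
    # One sweep over the data with an incremental running-mean update,
    # so each point is touched exactly once.
    counts = np.ones(len(centers))
    centers = centers.copy()
    for x in X:
        k = np.argmin(((centers - x) ** 2).sum(-1))
        counts[k] += 1
        centers[k] += (x - centers[k]) / counts[k]
    return centers

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.3, (200, 2)), rng.normal(5, 0.3, (200, 2))])
init = np.array([[0.5, 0.5], [4.5, 4.5]])
c_full = kmeans(X, init)
c_fast = single_pass_kmeans(X, init)
# How far apart the two sets of centroids end up (x-coordinates, sorted).
gap = float(np.abs(np.sort(c_full[:, 0]) - np.sort(c_fast[:, 0])).max())
```

On well-separated data the single sweep lands essentially on the same centroids, which is consistent with the abstract's "statistically equivalent classifications" at a fraction of the cost.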

  12. A Computational Algorithm for Metrical Classification of Verse

    Directory of Open Access Journals (Sweden)

    Rama N.

    2010-03-01

    The science of versification and the analysis of verse in Sanskrit are governed by rules of metre, or chandas. Such metre-wise classification of verses has numerous uses for scholars and researchers alike, such as in the study of poets and the style of their Sanskrit poetical works. This paper presents a comprehensive computational scheme and set of algorithms to identify the metre of verses given as Sanskrit E-text (Unicode) or English E-text (Latin Unicode). The paper also demonstrates the use of euphonic conjunction rules to correct verses in which these conjunctions, which are compulsory in verse, have erroneously not been implemented.

  13. A novel hybrid classification model of genetic algorithms, modified k-Nearest Neighbor and developed backpropagation neural network.

    Science.gov (United States)

    Salari, Nader; Shohaimi, Shamarina; Najafi, Farid; Nallappan, Meenakshii; Karishnarajah, Isthrinayagy

    2014-01-01

    Among numerous artificial intelligence approaches, k-Nearest Neighbor algorithms, genetic algorithms, and artificial neural networks are considered the most common and effective methods for classification problems in numerous studies. In the present study, the results of implementing a novel hybrid feature selection-classification model using the above-mentioned methods are presented. The purpose is to benefit from the synergies obtained by combining these technologies in the development of classification models. Such a combination creates an opportunity to invest in the strengths of each algorithm, and is an approach to compensating for their deficiencies. To develop the proposed model, with the aim of obtaining the best array of features, feature ranking techniques such as the Fisher's discriminant ratio and class separability criteria were first used to prioritize features. Second, the obtained results, which included arrays of the top-ranked features, were used as the initial population of a genetic algorithm to produce optimum arrays of features. Third, using a modified k-Nearest Neighbor method as well as an improved method of backpropagation neural networks, the classification process was carried out based on the optimum feature arrays selected by the genetic algorithm. The performance of the proposed model was compared with thirteen well-known classification models on seven datasets. Furthermore, statistical analysis was performed using the Friedman test followed by post-hoc tests. The experimental findings indicated that the novel proposed hybrid model resulted in significantly better classification performance than all 13 classification methods. Finally, the performance of the proposed model was benchmarked against the best results reported for state-of-the-art classifiers in terms of classification accuracy on the same data sets.
The substantial findings of the comprehensive comparative study revealed that performance of the
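
    The middle step of the hybrid scheme, a genetic algorithm searching binary feature masks scored by a nearest-neighbour classifier, can be sketched as follows. This is our minimal stand-in, not the authors' model: the GA settings, fitness (leave-one-out 1-NN accuracy) and toy data are all assumptions.

```python
import random

def knn_loo_accuracy(X, y, mask):
    # Leave-one-out 1-NN accuracy using only the features enabled in `mask`.
    feats = [i for i, m in enumerate(mask) if m]
    if not feats:
        return 0.0
    correct = 0
    for i in range(len(X)):
        dists = [(sum((X[i][f] - X[j][f]) ** 2 for f in feats), j)
                 for j in range(len(X)) if j != i]
        correct += y[min(dists)[1]] == y[i]
    return correct / len(X)

def ga_select(X, y, n_feats, pop=12, gens=15, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_feats)] for _ in range(pop)]
    for _ in range(gens):
        scored = sorted(population, key=lambda m: -knn_loo_accuracy(X, y, m))
        parents = scored[: pop // 2]                  # truncation selection
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_feats)           # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:                    # point mutation
                k = rng.randrange(n_feats)
                child[k] ^= 1
            children.append(child)
        population = parents + children
    return max(population, key=lambda m: knn_loo_accuracy(X, y, m))

# Feature 0 carries the class; features 1-3 are pure noise.
data_rng = random.Random(1)
X = [[i % 2 * 2.0 + data_rng.gauss(0, 0.1)]
     + [data_rng.gauss(0, 1) for _ in range(3)] for i in range(40)]
y = [i % 2 for i in range(40)]
best_mask = ga_select(X, y, n_feats=4)
```

The wrapper pattern shown here (classifier accuracy as GA fitness) is what lets the combination "invest in the strengths of each algorithm": the GA explores subsets, the classifier judges them.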

  14. Improved Algorithms for the Classification of Rough Rice Using a Bionic Electronic Nose Based on PCA and the Wilks Distribution

    Directory of Open Access Journals (Sweden)

    Sai Xu

    2014-03-01

    Principal Component Analysis (PCA) is one of the main methods used for electronic nose pattern recognition. However, poor classification performance is common when using regular PCA. This paper aims to improve the classification performance of regular PCA based on the existing Wilks Λ-statistic (i.e., by combining PCA with the Wilks distribution). The improved algorithms, which combine regular PCA with the Wilks Λ-statistic, were developed after analysing the functionality and defects of PCA. Verification tests were conducted using a PEN3 electronic nose. The collected samples consisted of the volatiles of six varieties of rough rice (Zhongxiang1, Xiangwan13, Yaopingxiang, WufengyouT025, Pin 36, and Youyou122), grown in the same area and season. With regular PCA, the first two principal components used as analysis vectors cannot accomplish the rough rice variety classification task. Using the improved algorithms, which combine regular PCA with the Wilks Λ-statistic, different principal components were selected as analysis vectors. The set of data points of the Mahalanobis distance between each of the varieties of rough rice was used to estimate the classification performance. The results illustrate that the rough rice variety classification task is achieved well using the improved algorithm. A Probabilistic Neural Network (PNN) was also established to test the effectiveness of the improved algorithms. The first two principal components (namely PC1 and PC2) and the first and fifth principal components (namely PC1 and PC5) were selected as the inputs of the PNN for the classification of the six rough rice varieties. The results indicate that the classification accuracy based on the improved algorithm was improved by 6.67% compared with the results of the regular method. These results prove the effectiveness of using the Wilks Λ-statistic to improve the classification accuracy of the regular PCA approach.
The
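
    The core idea above, ranking principal components by class separability rather than by variance, can be sketched with a one-dimensional Wilks statistic (within-group sum of squares over total sum of squares; small values mean good separation). This is our illustration, not the paper's algorithm, and the synthetic data are constructed so the discriminating direction has low variance.

```python
import numpy as np

def wilks_lambda_1d(scores, labels):
    # Within-group SS divided by total SS for one component's scores.
    grand = scores.mean()
    total = ((scores - grand) ** 2).sum()
    within = sum(((scores[labels == g] - scores[labels == g].mean()) ** 2).sum()
                 for g in np.unique(labels))
    return within / total

rng = np.random.default_rng(4)
n = 100
# Two "varieties": a high-variance class-blind direction plus a low-variance
# direction that actually separates the classes.
noise = rng.normal(0, 5, n)
signal = np.where(np.arange(n) < n // 2, -0.5, 0.5) + rng.normal(0, 0.1, n)
X = np.column_stack([noise, signal])
labels = np.array([0] * (n // 2) + [1] * (n // 2))

Xc = X - X.mean(0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                      # PCA scores, variance-ordered
lambdas = [wilks_lambda_1d(scores[:, j], labels) for j in range(2)]
best_pc = int(np.argmin(lambdas))       # the component Wilks' statistic prefers
```

Plain PCA would keep PC1 (the noisy direction); the Wilks criterion instead singles out the later component that discriminates the classes, mirroring the paper's choice of, e.g., PC1 and PC5 over PC1 and PC2.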

  15. Neighborhood Hypergraph Based Classification Algorithm for Incomplete Information System

    Directory of Open Access Journals (Sweden)

    Feng Hu

    2015-01-01

    The problem of classification in incomplete information systems is a hot issue in intelligent information processing. The hypergraph is a new intelligent method for machine learning. However, it is hard to process an incomplete information system with the traditional hypergraph, for two reasons: (1) the hyperedges are generated randomly in the traditional hypergraph model; (2) the existing methods are unsuitable for incomplete information systems because of their missing values. In this paper, we propose a novel classification algorithm for incomplete information systems based on the hypergraph model and rough set theory. First, we initialize the hypergraph. Second, we classify the training set by a neighborhood hypergraph. Third, under the guidance of rough set theory, we replace the poor hyperedges. After that, we obtain a good classifier. The proposed approach is tested on 15 data sets from the UCI machine learning repository. Furthermore, it is compared with existing methods such as C4.5, SVM, Naive Bayes, and KNN. The experimental results show that the proposed algorithm has better performance in terms of Precision, Recall, AUC, and F-measure.

  16. Multiple signal classification algorithm for super-resolution fluorescence microscopy

    Science.gov (United States)

    Agarwal, Krishna; Macháň, Radek

    2016-12-01

    Single-molecule localization techniques are restricted by long acquisition and computational times, or by the need for special fluorophores or biologically toxic photochemical environments. Here we propose a statistical super-resolution technique for wide-field fluorescence microscopy, which we call the multiple signal classification algorithm, that has several advantages. It provides resolution down to at least 50 nm, requires fewer frames and lower excitation power, and works even at high fluorophore concentrations. Further, it works with any fluorophore that exhibits blinking on the timescale of the recording. The multiple signal classification algorithm shows comparable or better performance than single-molecule localization techniques and four contemporary statistical super-resolution methods in experiments on in vitro actin filaments and other independently acquired experimental data sets. We also demonstrate super-resolution at timescales of 245 ms (using 49 frames acquired at 200 frames per second) in samples of live-cell microtubules and live-cell actin filaments imaged without imaging buffers.

  17. CLASSIFICATION ALGORITHMS FOR BIG DATA ANALYSIS, A MAP REDUCE APPROACH

    Directory of Open Access Journals (Sweden)

    V. A. Ayma

    2015-03-01

    For many years, the scientific community has been concerned with how to increase the accuracy of different classification methods, and major achievements have been made so far. Besides this issue, the increasing amount of data generated every day by remote sensors raises further challenges to be overcome. In this work, a tool within the scope of the InterIMAGE Cloud Platform (ICP), an open-source, distributed framework for automatic image interpretation, is presented. The tool, named ICP: Data Mining Package, is able to perform supervised classification procedures on huge amounts of data, usually referred to as big data, on a distributed infrastructure using Hadoop MapReduce. The tool has four classification algorithms implemented, taken from WEKA’s machine learning library, namely: Decision Trees, Naïve Bayes, Random Forest and Support Vector Machines (SVM). The results of an experimental analysis using an SVM classifier on data sets of different sizes for different cluster configurations demonstrate the potential of the tool, as well as the aspects that affect its performance.

  18. Classification Algorithms for Big Data Analysis, a Map Reduce Approach

    Science.gov (United States)

    Ayma, V. A.; Ferreira, R. S.; Happ, P.; Oliveira, D.; Feitosa, R.; Costa, G.; Plaza, A.; Gamba, P.

    2015-03-01

    For many years, the scientific community has been concerned with how to increase the accuracy of different classification methods, and major achievements have been made so far. Besides this issue, the increasing amount of data generated every day by remote sensors raises further challenges to be overcome. In this work, a tool within the scope of the InterIMAGE Cloud Platform (ICP), an open-source, distributed framework for automatic image interpretation, is presented. The tool, named ICP: Data Mining Package, is able to perform supervised classification procedures on huge amounts of data, usually referred to as big data, on a distributed infrastructure using Hadoop MapReduce. The tool has four classification algorithms implemented, taken from WEKA's machine learning library, namely: Decision Trees, Naïve Bayes, Random Forest and Support Vector Machines (SVM). The results of an experimental analysis using an SVM classifier on data sets of different sizes for different cluster configurations demonstrate the potential of the tool, as well as the aspects that affect its performance.
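
    Why supervised classification fits the MapReduce model can be shown with a stdlib-only sketch (ours, not part of ICP: Data Mining Package): mappers emit per-chunk sufficient statistics, here label/feature counts for a naive-Bayes-style model, and the reducer simply adds them, so training parallelises over data chunks.

```python
from collections import Counter
from functools import reduce

def map_chunk(chunk):
    # "Map": turn one chunk of (label, feature_value) pairs into partial counts.
    return Counter((label, value) for label, value in chunk)

def reduce_counts(acc, partial):
    # "Reduce": counts are additive, so merging partial results is a sum.
    acc.update(partial)
    return acc

# Two invented data chunks, as if they lived on two Hadoop nodes.
chunks = [
    [("water", 0), ("forest", 1), ("water", 0)],
    [("forest", 1), ("forest", 0), ("water", 0)],
]
totals = reduce(reduce_counts, (map_chunk(c) for c in chunks), Counter())
```

Only algorithms whose training statistics merge like this distribute cheaply; that is one of the aspects affecting performance that the experimental analysis in the paper probes.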

  19. Classification algorithms for predicting sleepiness and sleep apnea severity.

    Science.gov (United States)

    Eiseman, Nathaniel A; Westover, M Brandon; Mietus, Joseph E; Thomas, Robert J; Bianchi, Matt T

    2012-02-01

    Identifying predictors of subjective sleepiness and of sleep apnea severity is an important yet challenging goal in sleep medicine. Classification algorithms may provide insights, especially when large data sets are available. We analyzed polysomnography and clinical features available from the Sleep Heart Health Study. The Epworth Sleepiness Scale and the apnea-hypopnea index were the targets of three classifiers: k-nearest neighbor, naive Bayes and support vector machine algorithms. Classification was based on up to 26 features including demographics, polysomnogram, and electrocardiogram (spectrogram). Naive Bayes was best for predicting abnormal Epworth class (0-10 versus 11-24), although prediction was weak: polysomnogram features had 16.7% sensitivity and 88.8% specificity; spectrogram features had 5.3% sensitivity and 96.5% specificity. The support vector machine performed similarly to naive Bayes for predicting sleep apnea class (0-5 versus >5): 59.0% sensitivity and 74.5% specificity using clinical features and 43.4% sensitivity and 83.5% specificity using spectrographic features, compared with the naive Bayes classifier, which had 57.5% sensitivity and 73.7% specificity (clinical), and 39.0% sensitivity and 82.7% specificity (spectrogram). Mutual information analysis confirmed the minimal dependency of the Epworth score on any feature, while the apnea-hypopnea index showed modest dependency on body mass index, arousal index, oxygenation and spectrogram features. Apnea classification was modestly accurate, using either clinical or spectrogram features, and showed lower sensitivity and higher specificity than common sleep apnea screening tools. Thus, clinical prediction of sleep apnea may be feasible with easily obtained demographic and electrocardiographic analysis, but the utility of the Epworth is questioned by its minimal relation to clinical, electrocardiographic, or polysomnographic features.
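
    The sensitivity and specificity figures quoted above come from a binary confusion matrix; the small helper below (our sketch, with made-up predictions) shows how those two numbers are derived once Epworth scores are dichotomized at the 11-24 "abnormal" threshold used in the abstract.

```python
def sensitivity_specificity(y_true, y_pred):
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Abnormal Epworth class: score 11-24 coded as 1, 0-10 coded as 0.
epworth = [3, 12, 15, 8, 20, 5, 11, 9]
y_true = [1 if s >= 11 else 0 for s in epworth]
y_pred = [0, 1, 1, 0, 0, 0, 1, 1]       # invented classifier output
sens, spec = sensitivity_specificity(y_true, y_pred)
```

The trade-off visible in the paper's numbers (e.g., 5.3% sensitivity with 96.5% specificity) corresponds to a classifier that almost never raises the positive flag, which this decomposition makes easy to see.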

  20. Land-cover classification with an expert classification algorithm using digital aerial photographs

    Directory of Open Access Journals (Sweden)

    José L. de la Cruz

    2010-05-01

Full Text Available The purpose of this study was to evaluate the usefulness of the spectral information of digital aerial sensors in determining land-cover classification using new digital techniques. The land covers that have been evaluated are the following: (1) bare soil; (2) cereals, including maize (Zea mays L.), oats (Avena sativa L.), rye (Secale cereale L.), wheat (Triticum aestivum L.) and barley (Hordeum vulgare L.); (3) high-protein crops, such as peas (Pisum sativum L.) and beans (Vicia faba L.); (4) alfalfa (Medicago sativa L.); (5) woodlands and scrublands, including holly oak (Quercus ilex L.) and common retama (Retama sphaerocarpa L.); (6) urban soil; (7) olive groves (Olea europaea L.); and (8) burnt crop stubble. The best result was obtained using an expert classification algorithm, achieving a reliability rate of 95%. This result showed that the images of digital airborne sensors hold considerable promise for the future in the field of digital classifications because these images contain valuable information that takes advantage of the geometric viewpoint. Moreover, new classification techniques reduce the problems encountered when using high-resolution images, while reliabilities are achieved that are better than those achieved with traditional methods.

  1. A Comparison of RBF Neural Network Training Algorithms for Inertial Sensor Based Terrain Classification

    Directory of Open Access Journals (Sweden)

    Erkan Beşdok

    2009-08-01

Full Text Available This paper introduces a comparison of training algorithms for radial basis function (RBF) neural networks for classification purposes. RBF networks provide effective solutions in many science and engineering fields. They are especially popular in the pattern classification and signal processing areas. Several algorithms have been proposed for training RBF networks. The Artificial Bee Colony (ABC) algorithm is a new, very simple and robust population-based optimization algorithm that is inspired by the intelligent behavior of honey bee swarms. The training performance of the ABC algorithm is compared with the genetic algorithm, the Kalman filtering algorithm and the gradient descent algorithm. In the experiments, not only were well-known classification problems from the UCI repository, such as the Iris, Wine and Glass datasets, used, but an experimental setup was also designed in which inertial sensor based terrain classification for autonomous ground vehicles was achieved. Experimental results show that the use of the ABC algorithm results in better learning than the other algorithms.
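For contrast with the iterative trainers the paper compares (ABC, GA, Kalman filtering, gradient descent), an RBF network can also be sketched with fixed centres and a closed-form least-squares fit of the output weights. Everything below — the strided centre choice, the ridge term, the kernel width — is an illustrative assumption, not the paper's setup:

```python
import numpy as np

def rbf_train(X, y, n_centers=10, width=1.0, lam=1e-3):
    """RBF network: Gaussian basis functions at fixed centres, output weights
    fit by ridge-regularised least squares (instead of an iterative trainer)."""
    # centres: a deterministic strided subsample of the training points
    centers = X[:: max(1, len(X) // n_centers)][:n_centers]

    def design(Xq):
        d2 = np.sum((Xq[:, None, :] - centers[None]) ** 2, axis=2)
        return np.exp(-d2 / (2 * width ** 2))          # hidden-layer activations

    Phi = design(X)
    w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(len(centers)), Phi.T @ y)
    return centers, w, design

def rbf_predict(model, Xq):
    centers, w, design = model
    return np.sign(design(Xq) @ w)                     # +/-1 class decision
```

Swapping the closed-form solve for an ABC or GA search over `w` (and the centres) is exactly the design axis the paper's comparison explores.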

  2. FPGA Implementation of Generalized Hebbian Algorithm for Texture Classification

    Directory of Open Access Journals (Sweden)

    Wei-Hao Lee

    2012-05-01

Full Text Available This paper presents a novel hardware architecture for principal component analysis. The architecture is based on the Generalized Hebbian Algorithm (GHA) because of its simplicity and effectiveness. The architecture is separated into three portions: the weight vector updating unit, the principal computation unit and the memory unit. In the weight vector updating unit, the computation of different synaptic weight vectors shares the same circuit for reducing the area costs. To show the effectiveness of the circuit, a texture classification system based on the proposed architecture is physically implemented by Field Programmable Gate Array (FPGA). It is embedded in a System-On-Programmable-Chip (SOPC) platform for performance measurement. Experimental results show that the proposed architecture is an efficient design for attaining both high speed performance and low area costs.

  3. An Evolutionary Algorithm for Enhanced Magnetic Resonance Imaging Classification

    Directory of Open Access Journals (Sweden)

    T.S. Murunya

    2014-11-01

Full Text Available This study presents an image classification method for retrieval of images from a multi-varied MRI database. With the development of sophisticated medical imaging technology that helps doctors in diagnosis, medical image databases contain a huge amount of digital images. Magnetic Resonance Imaging (MRI) is a widely used imaging technique which picks up signals from the magnetic particles of a body spinning to a magnetic tune and, through a computer, converts the scanned data into pictures of internal organs. Image processing techniques are required to analyze medical images and retrieve them from the database. The proposed framework extracts features using Moment Invariants (MI) and Wavelet Packet Tree (WPT). Extracted features are reduced using Correlation based Feature Selection (CFS), and a CFS with cuckoo search algorithm is proposed. Naïve Bayes and K-Nearest Neighbor (KNN) classifiers classify the selected features. The National Biomedical Imaging Archive (NBIA) dataset, including colon, brain and chest images, is used to evaluate the framework.

  4. FPGA implementation of Generalized Hebbian Algorithm for texture classification.

    Science.gov (United States)

    Lin, Shiow-Jyu; Hwang, Wen-Jyi; Lee, Wei-Hao

    2012-01-01

    This paper presents a novel hardware architecture for principal component analysis. The architecture is based on the Generalized Hebbian Algorithm (GHA) because of its simplicity and effectiveness. The architecture is separated into three portions: the weight vector updating unit, the principal computation unit and the memory unit. In the weight vector updating unit, the computation of different synaptic weight vectors shares the same circuit for reducing the area costs. To show the effectiveness of the circuit, a texture classification system based on the proposed architecture is physically implemented by Field Programmable Gate Array (FPGA). It is embedded in a System-On-Programmable-Chip (SOPC) platform for performance measurement. Experimental results show that the proposed architecture is an efficient design for attaining both high speed performance and low area costs.
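The GHA itself is compact in software, which is part of why it maps well onto shared hardware circuits; a numpy sketch of Sanger's update rule (illustrative only — not the paper's fixed-point FPGA design) is:

```python
import numpy as np

def gha_train(X, n_components, lr=0.01, epochs=200, seed=0):
    """Sanger's Generalized Hebbian Algorithm: learns the leading principal
    directions of (roughly zero-mean) data X, one sample at a time."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=0.1, size=(n_components, d))   # rows -> components
    for _ in range(epochs):
        for x in X:
            y = W @ x
            # Sanger's rule: dW = lr * (y x^T - lower_triangular(y y^T) W);
            # the lower-triangular term deflates earlier components and
            # keeps the weight vectors near unit norm.
            W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W
```

Because each weight-vector update uses the same multiply-accumulate pattern, a single shared circuit can process the vectors in turn — the area-saving trick described in the abstract.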

  5. Predicting disease risk using bootstrap ranking and classification algorithms.

    Science.gov (United States)

    Manor, Ohad; Segal, Eran

    2013-01-01

Genome-wide association studies (GWAS) are widely used to search for genetic loci that underlie human disease. Another goal is to predict disease risk for different individuals given their genetic sequence. Such predictions could either be used as a "black box" in order to promote changes in life-style and screening for early diagnosis, or as a model that can be studied to better understand the mechanism of the disease. Current methods for risk prediction typically rank single nucleotide polymorphisms (SNPs) by the p-value of their association with the disease, and use the top-associated SNPs as input to a classification algorithm. However, the predictive power of such methods is relatively poor. To improve the predictive power, we devised BootRank, which uses bootstrapping in order to obtain a robust prioritization of SNPs for use in predictive models. We show that BootRank improves the ability to predict disease risk of unseen individuals in the Wellcome Trust Case Control Consortium (WTCCC) data and results in a more robust set of SNPs and a larger number of enriched pathways being associated with the different diseases. Finally, we show that combining BootRank with seven different classification algorithms improves performance compared to previous studies that used the WTCCC data. Notably, diseases for which BootRank results in the largest improvements were recently shown to have more heritability than previously thought, likely due to contributions from variants with low minor allele frequency (MAF), suggesting that BootRank can be beneficial in cases where SNPs affecting the disease are poorly tagged or have low MAF. Overall, our results show that improving disease risk prediction from genotypic information may be a tangible goal, with potential implications for personalized disease screening and treatment.
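The core idea — resample before ranking, then aggregate the per-sample rankings so the prioritisation is stable — can be sketched in a few lines. The difference-of-class-means score and the rank-sum aggregation below are deliberate simplifications of what BootRank actually does:

```python
import numpy as np

def bootstrap_rank(X, y, n_boot=50, seed=0):
    """Rank features by aggregating rankings over bootstrap resamples,
    in the spirit of bootstrap-based prioritisation such as BootRank.
    Score used here: absolute difference of class means (a stand-in for
    a proper association statistic)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    rank_sum = np.zeros(d)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                 # bootstrap resample
        Xb, yb = X[idx], y[idx]
        score = np.abs(Xb[yb == 1].mean(0) - Xb[yb == 0].mean(0))
        order = np.argsort(-score)                  # best feature first
        ranks = np.empty(d)
        ranks[order] = np.arange(d)
        rank_sum += ranks                           # small rank = consistently good
    return np.argsort(rank_sum)                     # most stably informative first
```

The returned ordering would then feed the top features into any downstream classifier, exactly where a p-value ranking would otherwise be used.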

  6. CLASSIFICATION OF DEFECTS IN SOFTWARE USING DECISION TREE ALGORITHM

    Directory of Open Access Journals (Sweden)

    M. SURENDRA NAIDU

    2013-06-01

Full Text Available Software defects due to coding errors continue to plague the industry with disastrous impact, especially in the enterprise application software category. Identifying how many of these defects are specifically due to coding errors is a challenging problem. Defect prevention is the most vivid but usually neglected aspect of software quality assurance in any project. If functional at all stages of software development, it can condense the time, overheads and wherewithal entailed to engineer a high quality product. In order to reduce time and cost, we focus on finding the total number of defects that have occurred in the software development process when test cases show that the software is not executing properly. The proposed system classifies various defects using a decision tree based defect classification technique, which is used to group the defects after identification. The classification can be done by employing algorithms such as ID3 or C4.5. After the classification, the defect patterns will be measured by employing a pattern mining technique. Finally, quality will be assured by using various quality metrics such as defect density. The proposed system will be implemented in Java.
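The ID3/C4.5 family mentioned above grows the tree by splitting on the attribute with the highest information gain. A small illustration of that criterion (toy attributes, not the paper's defect data):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """ID3's split criterion: the drop in label entropy when the data is
    partitioned by attribute `attr` (rows are dicts of attribute -> value)."""
    n = len(rows)
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(row[attr], []).append(lab)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

ID3 picks the attribute maximising this gain at every node; C4.5 refines it with the gain ratio and handles continuous attributes and missing values.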

  7. Novel speech signal processing algorithms for high-accuracy classification of Parkinson's disease.

    Science.gov (United States)

    Tsanas, Athanasios; Little, Max A; McSharry, Patrick E; Spielman, Jennifer; Ramig, Lorraine O

    2012-05-01

    There has been considerable recent research into the connection between Parkinson's disease (PD) and speech impairment. Recently, a wide range of speech signal processing algorithms (dysphonia measures) aiming to predict PD symptom severity using speech signals have been introduced. In this paper, we test how accurately these novel algorithms can be used to discriminate PD subjects from healthy controls. In total, we compute 132 dysphonia measures from sustained vowels. Then, we select four parsimonious subsets of these dysphonia measures using four feature selection algorithms, and map these feature subsets to a binary classification response using two statistical classifiers: random forests and support vector machines. We use an existing database consisting of 263 samples from 43 subjects, and demonstrate that these new dysphonia measures can outperform state-of-the-art results, reaching almost 99% overall classification accuracy using only ten dysphonia features. We find that some of the recently proposed dysphonia measures complement existing algorithms in maximizing the ability of the classifiers to discriminate healthy controls from PD subjects. We see these results as an important step toward noninvasive diagnostic decision support in PD.

  8. Multi-classification algorithm and its realization based on least square support vector machine algorithm

    Institute of Scientific and Technical Information of China (English)

    Fan Youping; Chen Yunping; Sun Wansheng; Li Yu

    2005-01-01

As a new type of learning machine developed on the basis of statistical learning theory, the support vector machine (SVM) plays an important role in knowledge discovery and knowledge updating by constructing a non-linear optimal classifier. However, realizing an SVM requires solving a quadratic program under inequality constraints, which becomes computationally difficult as the set of learning samples grows larger. Besides, the standard SVM is incapable of tackling multi-classification. To overcome these bottlenecks, with the training algorithm presented, the quadratic programming problem is converted into that of solving a linear system of equations composed of a group of equality constraints, by adopting the least squares SVM (LS-SVM) and introducing a modifying variable that changes the inequality constraints into equality constraints, which simplifies the calculation. With regard to multi-classification, an LS-SVM applicable to multi-classification is deduced. Finally, the efficiency of the algorithm is checked by using the universal circle-in-square and two-spirals benchmarks to measure the performance of the classifier.
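The key computational point — the SVM's quadratic program collapses into one linear system — can be sketched as follows. The RBF kernel and the γ, σ values are illustrative assumptions, and this is the common regression-style LS-SVM formulation (Suykens), not the authors' multi-class extension:

```python
import numpy as np

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Least-squares SVM with an RBF kernel: training is a single solve of
    [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    K = np.exp(-sq / (2 * sigma ** 2))                 # RBF Gram matrix
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma                  # ridge term from the equality constraints
    rhs = np.concatenate([[0.0], y.astype(float)])
    sol = np.linalg.solve(A, rhs)                      # one linear solve, no QP
    b, alpha = sol[0], sol[1:]
    return X, alpha, b, sigma

def lssvm_predict(model, Xq):
    X, alpha, b, sigma = model
    sq = np.sum((Xq[:, None, :] - X[None, :, :]) ** 2, axis=2)
    K = np.exp(-sq / (2 * sigma ** 2))
    return np.sign(K @ alpha + b)
```

The price of the simplification is that every training point gets a nonzero α, so LS-SVM solutions are not sparse the way standard SVM solutions are.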

  9. A Novel Training Algorithm of Genetic Neural Networks and Its Application to Classification

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

First of all, this paper discusses the drawbacks of the multilayer perceptron (MLP), which is trained by the traditional back propagation (BP) algorithm and used in a special classification problem. A new training algorithm for neural networks based on the genetic algorithm and the BP algorithm is developed. The difference between the new training algorithm and the BP algorithm in nonlinear approximation ability is expressed through an example, and the application prospects are illustrated by an example.

  10. Contributions to "k"-Means Clustering and Regression via Classification Algorithms

    Science.gov (United States)

    Salman, Raied

    2012-01-01

    The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using…

  11. Study and Implementation of Web Mining Classification Algorithm Based on Building Tree of Detection Class Threshold

    Institute of Scientific and Technical Information of China (English)

    CHEN Jun-jie; SONG Han-tao; LU Yu-chang

    2005-01-01

A new classification algorithm for web mining is proposed on the basis of general classification algorithms for data mining, in order to implement personalized information services. The building tree method of detecting class thresholds is used for construction of the decision tree according to the concept of user expectation, so as to find classification rules in different layers. Compared with the traditional C4.5 algorithm, the excessive adaptation (overfitting) problem of C4.5 has been improved, so that classification results not only have much higher accuracy but also statistical meaning.

  12. A New Approach Using Data Envelopment Analysis for Ranking Classification Algorithms

    Directory of Open Access Journals (Sweden)

    A. Bazleh

    2011-01-01

Full Text Available Problem statement: A variety of methods and algorithms for classification problems have been developed recently. The main question, however, is how to select an appropriate and effective classification algorithm; this has always been an important and difficult issue. Approach: Since the classification algorithm selection task needs to examine more than one criterion, such as accuracy and computational time, it can be modeled and ranked by the Data Envelopment Analysis (DEA) technique. Results: In this study, 44 standard datasets were modeled with 7 well-known classification algorithms, and the algorithms were examined by an accreditation method. Conclusion/Recommendation: The results indicate that Data Envelopment Analysis (DEA) is an appropriate tool for evaluating classification algorithms.

  13. Integrating genetic algorithm method with neural network for land use classification using SZ-3 CMODIS data

    Institute of Scientific and Technical Information of China (English)

    WANG Changyao; LUO Chengfeng; LIU Zhengjun

    2005-01-01

This paper presents a methodology for land use mapping using CMODIS (Chinese Moderate Resolution Imaging Spectroradiometer) data on-board the SZ-3 (Shenzhou 3) spacecraft. The integrated method is composed of a genetic algorithm (GA) for feature extraction and a neural network classifier for land use classification. In the data preprocessing, a moment matching method was adopted, and the band subset used for classification was obtained. To generate a land use map, a three-layer back propagation neural network classifier is used for training the samples and classification. Compared with the Maximum Likelihood classification algorithm, the results show that the accuracy of land use classification is obviously improved by using the proposed method, the number of bands selected in the classification process is reduced, and the computational performance for training and classification is improved. The results also show that CMODIS data can be effectively used for land use/land cover classification and change monitoring at regional and global scales.

  14. Feature Selection and Classification of Electroencephalographic Signals: An Artificial Neural Network and Genetic Algorithm Based Approach.

    Science.gov (United States)

    Erguzel, Turker Tekin; Ozekes, Serhat; Tan, Oguz; Gultekin, Selahattin

    2015-10-01

    Feature selection is an important step in many pattern recognition systems aiming to overcome the so-called curse of dimensionality. In this study, an optimized classification method was tested in 147 patients with major depressive disorder (MDD) treated with repetitive transcranial magnetic stimulation (rTMS). The performance of the combination of a genetic algorithm (GA) and a back-propagation (BP) neural network (BPNN) was evaluated using 6-channel pre-rTMS electroencephalographic (EEG) patterns of theta and delta frequency bands. The GA was first used to eliminate the redundant and less discriminant features to maximize classification performance. The BPNN was then applied to test the performance of the feature subset. Finally, classification performance using the subset was evaluated using 6-fold cross-validation. Although the slow bands of the frontal electrodes are widely used to collect EEG data for patients with MDD and provide quite satisfactory classification results, the outcomes of the proposed approach indicate noticeably increased overall accuracy of 89.12% and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.904 using the reduced feature set.
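The GA wrapper described above — evolve feature bitmasks, score each subset, keep the fittest — can be sketched generically. The Fisher-ratio fitness and GA settings below are illustrative stand-ins for the paper's BPNN-based fitness on EEG features:

```python
import numpy as np

def ga_select(X, y, pop=20, gens=30, seed=0):
    """Toy genetic algorithm for feature selection: individuals are boolean
    feature masks, fitness is a mean per-feature Fisher separation score."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]

    def fitness(mask):
        if not mask.any():
            return -np.inf
        Xs = X[:, mask]
        mu0, mu1 = Xs[y == 0].mean(0), Xs[y == 1].mean(0)
        s = Xs[y == 0].var(0) + Xs[y == 1].var(0) + 1e-9
        # dividing by the subset size penalises bloated masks
        return np.sum((mu0 - mu1) ** 2 / s) / mask.sum()

    popu = rng.random((pop, d)) < 0.5                  # random initial masks
    for _ in range(gens):
        fit = np.array([fitness(m) for m in popu])
        popu = popu[np.argsort(-fit)]                  # elitist sort
        children = []
        for _ in range(pop // 2):
            a, b = popu[rng.integers(0, pop // 2, 2)]  # parents from the top half
            cut = rng.integers(1, d)                   # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            children.append(child ^ (rng.random(d) < 0.05))  # bit-flip mutation
        popu = np.vstack([popu[: pop - len(children)], children])
    fit = np.array([fitness(m) for m in popu])
    return popu[np.argmax(fit)]
```

In the paper's pipeline, `fitness` would instead train a BP neural network on the masked features and return its classification performance.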

  15. A novel algorithm for ventricular arrhythmia classification using a fuzzy logic approach.

    Science.gov (United States)

    Weixin, Nong

    2016-12-01

    In the present study, it has been shown that an unnecessary implantable cardioverter-defibrillator (ICD) shock is often delivered to patients with an ambiguous ECG rhythm in the overlap zone between ventricular tachycardia (VT) and ventricular fibrillation (VF); these shocks significantly increase mortality. Therefore, accurate classification of the arrhythmia into VT, organized VF (OVF) or disorganized VF (DVF) is crucial to assist ICDs to deliver appropriate therapy. A classification algorithm using a fuzzy logic classifier was developed for accurately classifying the arrhythmias into VT, OVF or DVF. Compared with other studies, our method aims to combine ten ECG detectors that are calculated in the time domain and the frequency domain in addition to different levels of complexity for detecting subtle structure differences between VT, OVF and DVF. The classification in the overlap zone between VT and VF is refined by this study to avoid ambiguous identification. The present method was trained and tested using public ECG signal databases. A two-level classification was performed to first detect VT with an accuracy of 92.6 %, and then the discrimination between OVF and DVF was detected with an accuracy of 84.5 %. The validation results indicate that the proposed method has superior performance in identifying the organization level between the three types of arrhythmias (VT, OVF and DVF) and is promising for improving the appropriate therapy choice and decreasing the possibility of sudden cardiac death.

  16. Comparison of different classification algorithms for landmine detection using GPR

    Science.gov (United States)

    Karem, Andrew; Fadeev, Aleksey; Frigui, Hichem; Gader, Paul

    2010-04-01

    The Edge Histogram Detector (EHD) is a landmine detection algorithm that has been developed for ground penetrating radar (GPR) sensor data. It has been tested extensively and has demonstrated excellent performance. The EHD consists of two main components. The first one maps the raw data to a lower dimension using edge histogram based feature descriptors. The second component uses a possibilistic K-Nearest Neighbors (pK-NN) classifier to assign a confidence value. In this paper we show that performance of the baseline EHD could be improved by replacing the pK-NN classifier with model based classifiers. In particular, we investigate two such classifiers: Support Vector Regression (SVR), and Relevance Vector Machines (RVM). We investigate the adaptation of these classifiers to the landmine detection problem with GPR, and we compare their performance to the baseline EHD with a pK-NN classifier. As in the baseline EHD, we treat the problem as a two class classification problem: mine vs. clutter. Model parameters for the SVR and the RVM classifiers are estimated from training data using logarithmic grid search. For testing, soft labels are assigned to the test alarms. A confidence of zero indicates the maximum probability of being a false alarm. Similarly, a confidence of one represents the maximum probability of being a mine. Results on large and diverse GPR data collections show that the proposed modification to the classifier component can improve the overall performance of the EHD significantly.

  17. Detection of malicious attacks by Meta classification algorithms

    Directory of Open Access Journals (Sweden)

    G.Michael

    2015-03-01

Full Text Available We address the problem of malicious node detection in a network based on characteristics of the behavior of the network. This issue has brought out a challenging set of recent research papers, contributing a critical component to securing the network. This type of work evolves with many changes in the solution strategies. In this work, we propose learning models with careful selection of attributes, parameter thresholds and numbers of iterations. In this research, an appropriate approach is taken to evaluate the performance of a set of meta classifier algorithms (AdaBoost, Attribute Selected Classifier, Bagging, Classification via Regression, Filtered Classifier, LogitBoost, Multiclass Classifier). The ratio between training and testing data is chosen in such a way that the data patterns in both sets are compatible. Hence a set of supervised machine learning schemes with meta classifiers was applied on the selected dataset to predict the attack risk of the network environment. The trained models were then used for predicting the risk of attacks in a web server environment, by any network administrator or any security expert. The prediction accuracy of the classifiers was evaluated using 10-fold cross-validation, and the results have been compared to obtain the accuracy.

  18. Text Classification using Association Rule with a Hybrid Concept of Naive Bayes Classifier and Genetic Algorithm

    CERN Document Server

    Kamruzzaman, S M; Hasan, Ahmed Ryadh

    2010-01-01

Text classification is the automated assignment of natural language texts to predefined categories based on their content. Text classification is the primary requirement of text retrieval systems, which retrieve texts in response to a user query, and of text understanding systems, which transform text in some way, such as producing summaries, answering questions or extracting data. Nowadays the demand for text classification is increasing tremendously. Keeping this demand in consideration, new and updated techniques are being developed for the purpose of automated text classification. This paper presents a new algorithm for text classification. Instead of using words, word relations, i.e. association rules, are used to derive a feature set from pre-classified text documents. The concept of the Naive Bayes Classifier is then used on the derived features, and finally a concept from Genetic Algorithms has been added for the final classification. A system based on the proposed algorithm has been implemented and tested. The experimental ...
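The naive Bayes stage is the easiest part of the hybrid to make concrete. A minimal multinomial naive Bayes over raw words with Laplace smoothing (the paper derives its features from association rules instead, and the genetic-algorithm stage is omitted here) might look like:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Multinomial naive Bayes training: per-class word counts and priors."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    vocab = set()
    for doc, lab in zip(docs, labels):
        for w in doc.split():
            word_counts[lab][w] += 1
            vocab.add(w)
    return word_counts, class_counts, vocab

def predict_nb(model, doc):
    """Pick the class maximising log P(class) + sum of log P(word|class),
    with add-one (Laplace) smoothing."""
    word_counts, class_counts, vocab = model
    n = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for c in class_counts:
        lp = math.log(class_counts[c] / n)
        total = sum(word_counts[c].values())
        for w in doc.split():
            lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```

Replacing the bag-of-words features with mined association rules, as the paper proposes, changes only what `train_nb` counts, not the Bayes decision itself.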

  19. Cloud classification from satellite data using a fuzzy sets algorithm: A polar example

    Science.gov (United States)

    Key, J. R.; Maslanik, J. A.; Barry, R. G.

    1988-01-01

    Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.

  20. Cloud classification from satellite data using a fuzzy sets algorithm - A polar example

    Science.gov (United States)

    Key, J. R.; Maslanik, J. A.; Barry, R. G.

    1989-01-01

Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.
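Fuzzy c-means, as used in both records above, alternates a membership update with a centre update (Bezdek's formulation). A compact numpy sketch — the fuzzifier m = 2 and the data are illustrative, not AVHRR imagery:

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, iters=100, seed=0):
    """Fuzzy c-means: every point receives a membership in each cluster,
    a decreasing function of its distance to the cluster centre."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(1, keepdims=True)                       # memberships sum to 1 per point
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(0)[:, None]      # membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1))                    # standard FCM membership update
        U = inv / inv.sum(1, keepdims=True)
    return centers, U
```

Points near a diffuse boundary end up with memberships near 0.5 in two clusters — exactly the graded information the cloud study exploits to flag likely misclassification.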

  1. Improved algorithms for the classification of rough rice using a bionic electronic nose based on PCA and the Wilks distribution.

    Science.gov (United States)

    Xu, Sai; Zhou, Zhiyan; Lu, Huazhong; Luo, Xiwen; Lan, Yubin

    2014-03-19

Principal Component Analysis (PCA) is one of the main methods used for electronic nose pattern recognition. However, poor classification performance is common in classification and recognition when using regular PCA. This paper aims to improve the classification performance of regular PCA based on the existing Wilks Λ-statistic (i.e., combining PCA with the Wilks distribution). The improved algorithms, which combine regular PCA with the Wilks Λ-statistic, were developed after analysing the functionality and defects of PCA. Verification tests were conducted using a PEN3 electronic nose. The collected samples consisted of the volatiles of six varieties of rough rice (Zhongxiang1, Xiangwan13, Yaopingxiang, WufengyouT025, Pin 36, and Youyou122), grown in the same area and season. With regular PCA, the first two principal components used as analysis vectors cannot perform the rough rice variety classification task. Using the improved algorithms, which combine regular PCA with the Wilks Λ-statistic, many different principal components were selected as analysis vectors. The set of data points of the Mahalanobis distance between each of the varieties of rough rice was selected to estimate the performance of the classification. The results illustrate that the rough rice variety classification task is achieved well using the improved algorithm. A Probabilistic Neural Network (PNN) was also established to test the effectiveness of the improved algorithms. The first two principal components (namely PC1 and PC2) and the first and fifth principal components (namely PC1 and PC5) were selected as the inputs of the PNN for the classification of the six rough rice varieties. The results indicate that the classification accuracy based on the improved algorithm was improved by 6.67% compared to the results of the regular method. These results prove the effectiveness of using the Wilks Λ-statistic to improve the classification accuracy of the regular PCA approach.
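Wilks' Λ itself is simple to compute once the scatter matrices are set up, and it is what lets the improved algorithm score principal components by how well they separate the varieties. A sketch on generic grouped data (not the electronic-nose measurements):

```python
import numpy as np

def wilks_lambda(X, y):
    """Wilks' Lambda = det(W) / det(T): within-group scatter over total scatter.
    Values near 0 mean the groups are well separated on these variables;
    values near 1 mean the grouping explains almost nothing."""
    X = np.asarray(X, float)
    T = np.cov(X, rowvar=False) * (len(X) - 1)          # total scatter about the grand mean
    W = np.zeros_like(T)
    for g in np.unique(y):
        Xg = X[y == g]
        W += np.cov(Xg, rowvar=False) * (len(Xg) - 1)   # pooled within-group scatter
    return np.linalg.det(W) / np.linalg.det(T)
```

Evaluating this on candidate subsets of principal-component scores — and keeping the subset with the smallest Λ — is the kind of selection the improved algorithms perform instead of always taking PC1 and PC2.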

  2. Unsupervised classification algorithm based on EM method for polarimetric SAR images

    Science.gov (United States)

    Fernández-Michelli, J. I.; Hurtado, M.; Areta, J. A.; Muravchik, C. H.

    2016-07-01

In this work we develop an iterative classification algorithm using complex Gaussian mixture models for polarimetric complex SAR data. It is an unsupervised algorithm which does not require training data or an initial set of classes. Additionally, it determines the model order from the data, which allows representing the data structure with minimum complexity. The algorithm consists of four steps: initialization, model selection, refinement and smoothing. After a simple initialization stage, the EM algorithm is iteratively applied in the model selection step to compute the model order and an initial classification for the refinement step. The refinement step uses Classification EM (CEM) to reach the final classification, and the smoothing stage improves the results by means of non-linear filtering. The algorithm is applied to both simulated and real Single Look Complex data of the EMISAR mission and compared with the Wishart classification method. We use the confusion matrix and kappa statistic to make the comparison for simulated data whose ground truth is known. We apply the Davies-Bouldin index to compare both classifications for real data. The results obtained for both types of data validate our algorithm and show that its performance is comparable to Wishart's in terms of classification quality.
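The EM step at the heart of the algorithm is easiest to see on scalar data. A 1-D Gaussian-mixture EM sketch — the quantile initialisation and two components are illustrative simplifications, and the paper actually fits complex Gaussian mixtures to polarimetric data:

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=100):
    """EM for a 1-D Gaussian mixture: returns weights, means, variances and
    the per-point responsibilities, which act as a soft classification."""
    w = np.full(k, 1.0 / k)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)      # spread initial means over the data
    var = np.full(k, x.var())
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(1, keepdims=True)
        # M-step: re-estimate parameters from responsibility-weighted data
        nk = r.sum(0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(0) / nk
    return w, mu, var, r
```

Classification EM (CEM), used in the refinement step, replaces the soft responsibilities `r` with a hard argmax assignment before each M-step.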

  3. A Novel Algorithm of Network Trade Customer Classification Based on Fourier Basis Functions

    Directory of Open Access Journals (Sweden)

    Li Xinwu

    2013-11-01

Full Text Available Learning algorithms for neural networks have always been an important research topic in neural network theory and applications; learning algorithms for feed-forward neural networks in particular have no fully satisfactory solution because of their slow calculation speed. This paper presents a new Fourier basis function neural network algorithm and applies it to classify network trade customers. First, 21 customer classification indicators are designed based on an analysis of the characteristics and behaviors of network trade customers, including customer characteristic variables and customer behavior variables. Second, Fourier basis functions are used to improve the calculation flow and algorithm structure of the original BP neural network algorithm to speed up its convergence, and a new Fourier basis neural network model is constructed. Finally, the experimental results show that the problem of convergence speed can be solved, and the accuracy of customer classification is ensured, when the new algorithm is used in network trade customer classification in practice.

  4. Study on An Absolute Non-Collision Hash and Jumping Table IP Classification Algorithms

    Institute of Scientific and Technical Information of China (English)

    SHANG Feng-jun; PAN Ying-jun

    2004-01-01

In order to classify packets, we propose a novel IP classification algorithm based on the non-collision hash and jumping table Trie-tree (NHJTTT), which builds on the non-collision hash Trie-tree and on the 2-dimensional classification algorithm proposed by Lakshman and Stiliadis (the LS algorithm). The core of the algorithm consists of two parts: constructing the non-collision hash function, which is built mainly from the destination/source port and protocol type fields so that the hash function avoids the space explosion problem; and introducing a jumping table Trie-tree based on the LS algorithm in order to reduce time complexity. The test results show that the classification rate of the NHJTTT algorithm is up to 1 million packets per second and the maximum memory consumed is 9 MB for 10,000 rules.

  5. Competitive evaluation of data mining algorithms for use in classification of leukocyte subtypes with Raman microspectroscopy.

    Science.gov (United States)

    Maguire, A; Vega-Carrascal, I; Bryant, J; White, L; Howe, O; Lyng, F M; Meade, A D

    2015-04-07

    Raman microspectroscopy has been investigated for some time for use in label-free cell sorting devices. These approaches require coupling of the Raman spectrometer to complex data mining algorithms for identification of cellular subtypes such as the leukocyte subpopulations of lymphocytes and monocytes. In this study, three distinct multivariate classification approaches (PCA-LDA, SVMs, and Random Forests) are developed and tested on their ability to classify the cellular subtype in extracted peripheral blood mononuclear cells (T-cell lymphocytes from myeloid cells), and are evaluated in terms of their respective classification performance. A strategy for optimisation of each of the classification algorithms is presented, with emphasis on reduction of model complexity in each algorithm. The relative classification performance and performance characteristics are highlighted, overall suggesting the radial basis function SVM as a robust option for classification of leukocytes with Raman microspectroscopy.

  6. Classification of hyperspectral remote sensing images based on simulated annealing genetic algorithm and multiple instance learning

    Institute of Scientific and Technical Information of China (English)

    高红民; 周惠; 徐立中; 石爱业

    2014-01-01

    A hybrid feature selection and classification strategy was proposed based on the simulated annealing genetic algorithm and multiple instance learning (MIL). A band selection method based on subspace decomposition was proposed, which combines the simulated annealing algorithm with the genetic algorithm by choosing different crossover and mutation probabilities as well as mutation individuals. MIL was then combined with image segmentation, clustering, and support vector machine algorithms to classify hyperspectral images. The experimental results show that the proposed method achieves a high classification accuracy of 93.13% with small training samples and overcomes the weaknesses of conventional methods.
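
    The simulated-annealing half of such a hybrid band selector can be sketched as follows (the scoring function, iteration budget, and cooling schedule are illustrative assumptions, not the paper's):

    ```python
    import math
    import random

    def sa_select(n_bands, score, iters=500, t0=1.0, cooling=0.99, seed=0):
        """Simulated-annealing subset search: flip one band in/out per step and
        accept worse subsets with probability exp(delta / T) (toy sketch)."""
        rng = random.Random(seed)
        current = [rng.random() < 0.5 for _ in range(n_bands)]
        cur_score = score(current)
        best, best_score, t = current[:], cur_score, t0
        for _ in range(iters):
            cand = current[:]
            cand[rng.randrange(n_bands)] = not cand[rng.randrange(0, n_bands)] if False else not cand[rng.randrange(n_bands)]
            delta = score(cand) - cur_score
            if delta > 0 or rng.random() < math.exp(delta / t):
                current, cur_score = cand, cur_score + delta
                if cur_score > best_score:
                    best, best_score = current[:], cur_score
            t *= cooling                   # cool the temperature
        return best, best_score

    # Toy objective: reward selecting the first three bands, penalise subset size.
    target = {0, 1, 2}
    def score(mask):
        chosen = {i for i, m in enumerate(mask) if m}
        return 2 * len(chosen & target) - len(chosen)

    mask, s = sa_select(10, score)
    print(sorted(i for i, m in enumerate(mask) if m), s)
    ```

    A GA layer, as in the paper, would run a population of such subsets with crossover and mutation instead of a single walker.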

  7. A Non-Collision Hash Trie-Tree Based FastIP Classification Algorithm

    Institute of Scientific and Technical Information of China (English)

    徐恪; 吴建平; 喻中超; 徐明伟

    2002-01-01

    With the development of network applications, routers must support functions such as firewalls, QoS provisioning, and traffic billing. All of these functions require the classification of IP packets, which determines how each packet is subsequently processed. In this article, a novel IP classification algorithm is proposed based on the Grid of Tries algorithm. The new algorithm not only eliminates the original's limitations in the case of multiple fields but also shows better performance in regard to both time and space, with better overall performance than many other algorithms.

  8. A TCAM-based Two-dimensional Prefix Packet Classification Algorithm

    Institute of Scientific and Technical Information of China (English)

    王志恒; 刘刚; 白英彩

    2004-01-01

    Packet classification (PC) has become the main method of supporting quality of service and security in network applications, and two-dimensional prefix packet classification (PPC) is a popular variant. This paper analyzes the problem of rule conflicts and then presents a TCAM-based two-dimensional PPC algorithm. The algorithm exploits the parallelism of TCAM to look up the longest prefix in one instruction cycle, then uses a memory image and associated data structures to eliminate conflicts between rules, performing fast two-dimensional PPC. Compared with other algorithms, it has the lowest time complexity and lower space complexity.

  9. Data Mining Algorithms for Classification of Complex Biomedical Data

    Science.gov (United States)

    Lan, Liang

    2012-01-01

    In my dissertation, I will present my research, which contributes to solving the following three open problems in biomedical informatics: (1) multi-task approaches for microarray classification; (2) multi-label classification for gene and protein prediction from multi-source biological data; (3) spatial scan for movement data. In microarray…

  10. Applications of feature selection. [development of classification algorithms for LANDSAT data

    Science.gov (United States)

    Guseman, L. F., Jr.

    1976-01-01

    The use of satellite-acquired (LANDSAT) multispectral scanner (MSS) data to conduct an inventory of some crop of economic interest, such as wheat, over a large geographical area is considered in relation to the development of accurate and efficient algorithms for data classification. The dimension of the measurement space and the computational load for a classification algorithm are increased by the use of multitemporal measurements. Feature selection/combination techniques used to reduce the dimensionality of the problem are described.

  11. Development of a Fingerprint Gender Classification Algorithm Using Fingerprint Global Features

    OpenAIRE

    S. F. Abdullah; A.F.N.A. Rahman; Z.A.Abas; W.H.M Saad

    2016-01-01

    In the forensic world, the process of identifying and computing fingerprint features is complex and time-consuming when done manually with a fingerprint laboratory magnifying glass. This study aims to enhance the manual forensic method by proposing a new algorithm for fingerprint global feature extraction for gender classification. The results show that the new algorithm yields classification rates above 70%, higher than the manual method...

  12. Classification of Noisy Data: An Approach Based on Genetic Algorithms and Voronoi Tessellation

    DEFF Research Database (Denmark)

    Khan, Abdul Rauf; Schiøler, Henrik; Knudsen, Torben

    2016-01-01

    In this article we propose: (1) a different strategy, which is based on the partitioning of information space; and (2) use of the genetic algorithm to solve combinatorial problems for classification. In particular, we will implement our methodology to solve complex classification problems and compare the performance of our classifier with other well-known methods (SVM, KNN, and ANN...

  13. A method for classification of network traffic based on C5.0 Machine Learning Algorithm

    DEFF Research Database (Denmark)

    Bujlow, Tomasz; Riaz, M. Tahir; Pedersen, Jens Myrup

    2012-01-01

    and classification, an algorithm for recognizing flow direction, and the C5.0 algorithm itself. Classified applications include Skype, FTP, torrent, web browser traffic, web radio, interactive gaming, and SSH. We performed successive trials using different sets of parameters and both training and classification options...

  14. Based on Perceptron Object Classification Algorithms for Processing of Agricultural Field Images

    OpenAIRE

    Ganchenko, V.; Doudkin, A.; Pawlowski, T.; Petrovsky, A.; Sadykhov, R.

    2012-01-01

    Neural network algorithms for object classification are considered in this paper, as applied to recognizing diseased areas in agricultural field images. The images are represented as reduced normalized histograms. The classification is carried out in RGB and HSV space using a multilayer perceptron.

  15. Random forest algorithm for classification of multiwavelength data

    Institute of Scientific and Technical Information of China (English)

    Dan Gao; Yan-Xia Zhang; Yong-Heng Zhao

    2009-01-01

    We introduced a decision tree method called Random Forests for multiwavelength data classification. The data were adopted from different databases, including the Sloan Digital Sky Survey (SDSS) Data Release 5, USNO, FIRST, and ROSAT. We then studied the discrimination of quasars from stars and the classification of quasars, stars, and galaxies, using samples from the optical and radio bands and from the optical and X-ray bands. Moreover, feature selection and feature weighting based on Random Forests were investigated, and the performances based on different input patterns were compared. The experimental results show that the random forest method is an effective method for astronomical object classification and can be applied to other classification problems faced in astronomy. In addition, Random Forests shows its superiority due to its own merits, e.g. classification, feature selection, feature weighting, and outlier detection.
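
    The core Random Forest idea, bootstrap resampling plus majority voting over weak trees, can be sketched in a few lines (toy depth-1 "trees" and a made-up star/quasar sample; real survey pipelines would grow full decision trees over many features):

    ```python
    import random
    from collections import Counter

    def train_stump(data):
        """Fit a depth-1 'tree': brute-force the one-feature threshold that
        best separates the two classes (toy data only)."""
        best = None
        n_feat = len(data[0][0])
        for f in range(n_feat):
            for x, _ in data:
                thr = x[f]
                for left, right in (("star", "quasar"), ("quasar", "star")):
                    err = sum((left if x2[f] <= thr else right) != y2
                              for x2, y2 in data)
                    if best is None or err < best[0]:
                        best = (err, f, thr, left, right)
        _, f, thr, left, right = best
        return lambda x: left if x[f] <= thr else right

    def random_forest(data, n_trees=15, seed=1):
        rng = random.Random(seed)
        by_class = {}
        for x, y in data:
            by_class.setdefault(y, []).append((x, y))
        def bootstrap():
            # Stratified bootstrap so every tree sees both classes
            # (a simplification of plain bagging).
            return [rng.choice(grp) for grp in by_class.values() for _ in grp]
        trees = [train_stump(bootstrap()) for _ in range(n_trees)]
        return lambda x: Counter(t(x) for t in trees).most_common(1)[0][0]

    # Toy two-feature sample: quasars have a larger colour index (feature 0).
    data = [((0.2, 1.0), "star"), ((0.25, 0.8), "star"), ((0.3, 1.1), "star"),
            ((0.9, 0.9), "quasar"), ((1.1, 1.2), "quasar"), ((1.3, 0.7), "quasar")]
    forest = random_forest(data)
    print(forest((0.15, 1.0)), forest((1.2, 1.0)))  # -> star quasar
    ```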

  16. Genetic Bee Colony (GBC) algorithm: A new gene selection method for microarray cancer classification.

    Science.gov (United States)

    Alshamlan, Hala M; Badr, Ghada H; Alohali, Yousef A

    2015-06-01

    Naturally inspired evolutionary algorithms have proven effective for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, the Genetic Bee Colony (GBC) algorithm, which combines the use of a Genetic Algorithm (GA) with the Artificial Bee Colony (ABC) algorithm, aiming to integrate the advantages of both. The proposed algorithm is applied to microarray gene expression profiles in order to select the most predictive and informative genes for cancer classification. To test the accuracy of the proposed algorithm, extensive experiments were conducted. Three binary microarray datasets are used (colon, leukemia, and lung), along with three multi-class microarray datasets (SRBCT, lymphoma, and leukemia). Results of the GBC algorithm are compared with our recently proposed technique, mRMR combined with the Artificial Bee Colony algorithm (mRMR-ABC). We also compared the combination of mRMR with GA (mRMR-GA) and with Particle Swarm Optimization (mRMR-PSO). In addition, we compared the GBC algorithm with other related algorithms recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance, achieving the highest classification accuracy along with the lowest average number of selected genes. This proves that the GBC algorithm is a promising approach for solving the gene selection problem in both binary and multi-class cancer classification.

  17. ASTErIsM - Application of topometric clustering algorithms in automatic galaxy detection and classification

    CERN Document Server

    Tramacere, A; Dubath, P; Kneib, J -P; Courbin, F

    2016-01-01

    We present a study on galaxy detection and shape classification using topometric clustering algorithms. We first use the DBSCAN algorithm to extract, from CCD frames, groups of adjacent pixels with significant fluxes, and we then apply the DENCLUE algorithm to separate the contributions of overlapping sources. The DENCLUE separation is based on the localization of patterns of local maxima, through an iterative algorithm which associates each pixel to the closest local maximum. Our main classification goal is to separate elliptical from spiral galaxies. We introduce new sets of features derived from the computation of geometrical invariant moments of the pixel group shape and from the statistics of the spatial distribution of the DENCLUE local maxima patterns. Ellipticals are characterized by a single group of local maxima, related to the galaxy core, while spiral galaxies have additional ones related to segments of spiral arms. We use two different supervised ensemble classification algorithms, Random Forest,...
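
    DBSCAN, the first stage of the pipeline described above, can be sketched in pure Python (toy pixel coordinates; a real implementation would use a spatial index rather than the O(n²) neighbour search here):

    ```python
    def dbscan(points, eps, min_pts):
        """Bare-bones DBSCAN: grow clusters from 'core' points that have at
        least min_pts neighbours within eps (toy sketch)."""
        def neighbours(i):
            return [j for j, q in enumerate(points)
                    if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]
        labels = [None] * len(points)          # None = unvisited, -1 = noise
        cluster = 0
        for i in range(len(points)):
            if labels[i] is not None:
                continue
            seeds = neighbours(i)
            if len(seeds) < min_pts:
                labels[i] = -1                 # noise (may be claimed later)
                continue
            labels[i] = cluster
            queue = [j for j in seeds if j != i]
            while queue:
                j = queue.pop()
                if labels[j] == -1:
                    labels[j] = cluster        # border point: attach, don't expand
                if labels[j] is not None:
                    continue
                labels[j] = cluster
                more = neighbours(j)
                if len(more) >= min_pts:       # j is also a core point: expand
                    queue.extend(more)
            cluster += 1
        return labels

    # Two pixel blobs plus an isolated bright pixel.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))  # -> [0, 0, 0, 1, 1, 1, -1]
    ```

    In the detection setting, each cluster of significant-flux pixels becomes a source candidate handed on to the DENCLUE separation stage.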

  18. Fast weighted K-view-voting algorithm for image texture classification

    Science.gov (United States)

    Liu, Hong; Lan, Yihua; Wang, Qian; Jin, Renchao; Song, Enmin; Hung, Chih-Cheng

    2012-02-01

    We propose an innovative and efficient approach to improve K-view-template (K-view-T) and K-view-datagram (K-view-D) algorithms for image texture classification. The proposed approach, called the weighted K-view-voting algorithm (K-view-V), uses a novel voting method for texture classification and an accelerating method based on the efficient summed square image (SSI) scheme as well as fast Fourier transform (FFT) to enable overall faster processing. Decision making, which assigns a pixel to a texture class, occurs by using our weighted voting method among the ``promising'' members in the neighborhood of a classified pixel. In other words, this neighborhood consists of all the views, and each view has a classified pixel in its territory. Experimental results on benchmark images, which are randomly taken from Brodatz Gallery and natural and medical images, show that this new classification algorithm gives higher classification accuracy than existing K-view algorithms. In particular, it improves the accurate classification of pixels near the texture boundary. In addition, the proposed acceleration method improves the processing speed of K-view-V as it requires much less computation time than other K-view algorithms. Compared with the results of earlier developed K-view algorithms and the gray level co-occurrence matrix (GLCM), the proposed algorithm is more robust, faster, and more accurate.
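
    The summed square image (SSI) used for acceleration is an integral image over squared pixel values, which makes any window's energy an O(1) lookup; a minimal sketch (not the authors' code):

    ```python
    def summed_square_image(img):
        """Integral image of squared pixel values: ssi[y][x] holds the sum of
        v**2 over the rectangle from (0, 0) to (y-1, x-1) inclusive."""
        h, w = len(img), len(img[0])
        ssi = [[0] * (w + 1) for _ in range(h + 1)]
        for y in range(h):
            for x in range(w):
                ssi[y + 1][x + 1] = (img[y][x] ** 2 + ssi[y][x + 1]
                                     + ssi[y + 1][x] - ssi[y][x])
        return ssi

    def window_energy(ssi, y0, x0, y1, x1):
        """Sum of squared pixels over rows y0..y1-1, cols x0..x1-1 in O(1)."""
        return ssi[y1][x1] - ssi[y0][x1] - ssi[y1][x0] + ssi[y0][x0]

    img = [[1, 2, 3],
           [4, 5, 6]]
    ssi = summed_square_image(img)
    print(window_energy(ssi, 0, 0, 2, 2))  # 1+4+16+25 = 46
    print(window_energy(ssi, 1, 1, 2, 3))  # 25+36 = 61
    ```

    Precomputing the table once lets every K-view window comparison reuse it instead of re-summing pixels.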

  19. Comparison of Tree Species Classifications at the Individual Tree Level by Combining ALS Data and RGB Images Using Different Algorithms

    Directory of Open Access Journals (Sweden)

    Songqiu Deng

    2016-12-01

    Full Text Available Individual tree delineation using remotely sensed data plays a very important role in precision forestry because it can provide detailed forest information on a large scale, which is required by forest managers. This study aimed to evaluate the utility of airborne laser scanning (ALS) data for individual tree detection and species classification in Japanese coniferous forests with a high canopy density. Tree crowns in the study area were first delineated by the individual tree detection approach using a canopy height model (CHM) derived from the ALS data. Then, the detected tree crowns were classified into four classes—Pinus densiflora, Chamaecyparis obtusa, Larix kaempferi, and broadleaved trees—using a tree crown-based classification approach with different combinations of 23 features derived from the ALS data and true-color (red-green-blue, RGB) orthoimages. To determine the best combination of features for species classification, several loops were performed using a forward iteration method. Additionally, several classification algorithms were compared in the present study. The results indicate that combining the RGB images with laser intensity, convex hull area, convex hull point volume, shape index, crown area, and crown height features produced the highest classification accuracy of 90.8% with the quadratic support vector machine (QSVM) classifier. Compared to only using the spectral characteristics of the orthophotos, the overall accuracy was improved by 14.1%, 9.4%, and 8.8% with the best combination of features when using the QSVM, neural network (NN), and random forest (RF) approaches, respectively. In terms of different classification algorithms, the findings of our study recommend the QSVM approach rather than NNs and RFs to classify the tree species in the study area. However, these classification approaches should be further tested in other forests using different data. This study demonstrates

  20. Differential characteristic set algorithm for the complete symmetry classification of partial differential equations

    Institute of Scientific and Technical Information of China (English)

    Chaolu Temuer; Yu-shan BAI

    2009-01-01

    In this paper, we present a differential polynomial characteristic set algorithm for the complete symmetry classification of partial differential equations (PDEs) with some parameters. It makes the solution of the complete symmetry classification problem for PDEs direct and systematic. As an illustrative example, the complete potential symmetry classifications of nonlinear and linear wave equations with an arbitrary function parameter are presented. This is a new application of the differential form characteristic set algorithm, i.e., Wu's method, to differential equations.

  1. Performance evaluation of algorithms for the classification of metabolic 1H NMR fingerprints.

    Science.gov (United States)

    Hochrein, Jochen; Klein, Matthias S; Zacharias, Helena U; Li, Juan; Wijffels, Gene; Schirra, Horst Joachim; Spang, Rainer; Oefner, Peter J; Gronwald, Wolfram

    2012-12-07

    Nontargeted metabolite fingerprinting is increasingly applied to biomedical classification. The choice of classification algorithm may have a considerable impact on outcome. In this study, employing nested cross-validation for assessing predictive performance, six binary classification algorithms in combination with different strategies for data-driven feature selection were systematically compared on five data sets of urine, serum, plasma, and milk one-dimensional fingerprints obtained by proton nuclear magnetic resonance (NMR) spectroscopy. Support Vector Machines and Random Forests combined with t-score-based feature filtering performed well on most data sets, whereas the performance of the other tested methods varied between data sets.
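
    Nested cross-validation, as used above to keep model selection separate from performance estimation, can be sketched as follows (toy data and two hypothetical classifiers; the fold splitting is deliberately simplified):

    ```python
    def nested_cv(data, models, k_outer=5, k_inner=4):
        """Nested CV: the inner loop picks the best model on the training folds
        only; the outer loop scores that choice on held-out data (toy sketch)."""
        def folds(items, k):
            return [items[i::k] for i in range(k)]
        def accuracy(model, train, test):
            fit = model(train)
            return sum(fit(x) == y for x, y in test) / len(test)
        scores = []
        for test in folds(data, k_outer):
            train = [d for d in data if d not in test]
            # Inner loop: model selection without ever touching the outer test fold.
            best = max(models, key=lambda m: sum(
                accuracy(m, [d for d in train if d not in inner], inner)
                for inner in folds(train, k_inner)))
            scores.append(accuracy(best, train, test))
        return sum(scores) / len(scores)

    # Two hypothetical classifiers: a fixed threshold rule and a majority rule.
    def threshold_model(train):
        return lambda x: x > 0.5
    def majority_model(train):
        maj = sum(y for _, y in train) * 2 >= len(train)
        return lambda x: maj

    data = [(i / 10, i / 10 > 0.5) for i in range(20)]
    print(nested_cv(data, [threshold_model, majority_model]))  # -> 1.0
    ```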

  2. A Weighted Block Dictionary Learning Algorithm for Classification

    OpenAIRE

    Zhongrong Shi

    2016-01-01

    Discriminative dictionary learning, playing a critical role in sparse representation based classification, has led to state-of-the-art classification results. Among the existing discriminative dictionary learning methods, two different approaches, shared dictionary and class-specific dictionary, which associate each dictionary atom to all classes or a single class, have been studied. The shared dictionary is a compact method but with lack of discriminative information; the class-specific dict...

  3. IMPROVEMENT OF TCAM-BASED PACKET CLASSIFICATION ALGORITHM

    Institute of Scientific and Technical Information of China (English)

    Xu Zhen; Zhang Jun; Rui Liyang; Sun Jun

    2008-01-01

    The features of Ternary Content Addressable Memories (TCAMs) make them particularly attractive for IP address lookup and packet classification in a router system. However, the limitations of TCAMs impede their utilization. In this paper, solutions for decreasing power consumption and avoiding entry expansion in range matching are addressed. Experimental results demonstrate that the proposed techniques can significantly improve the performance of TCAMs in IP address lookup and packet classification.

  4. Analysis of Distributed and Adaptive Genetic Algorithm for Mining Interesting Classification Rules

    Institute of Scientific and Technical Information of China (English)

    YI Yunfei; LIN Fang; QIN Jun

    2008-01-01

    A distributed genetic algorithm can be combined with an adaptive genetic algorithm for mining interesting and comprehensible classification rules. The paper gives the encoding method for the rules and the fitness function, and designs the selection, crossover, mutation, and migration operators for the DAGA.

  5. Application and comparison of classification algorithms for recognition of Alzheimer's disease in electrical brain activity (EEG).

    Science.gov (United States)

    Lehmann, Christoph; Koenig, Thomas; Jelic, Vesna; Prichep, Leslie; John, Roy E; Wahlund, Lars-Olof; Dodge, Yadolah; Dierks, Thomas

    2007-04-15

    The early detection of subjects with probable Alzheimer's disease (AD) is crucial for the effective application of treatment strategies. Here we explored the ability of a multitude of linear and non-linear classification algorithms to discriminate between the electroencephalograms (EEGs) of patients with varying degrees of AD and their age-matched control subjects. Absolute and relative spectral power, distribution of spectral power, and measures of spatial synchronization were calculated from recordings of resting eyes-closed continuous EEGs of 45 healthy controls, 116 patients with mild AD, and 81 patients with moderate AD, recruited in two different centers (Stockholm, New York). The applied classification algorithms were: principal component linear discriminant analysis (PC LDA), partial least squares LDA (PLS LDA), principal component logistic regression (PC LR), partial least squares logistic regression (PLS LR), bagging, random forest, support vector machines (SVM), and feed-forward neural network. Based on 10-fold cross-validation runs, it could be demonstrated that even though modern computer-intensive classification algorithms such as random forests, SVM, and neural networks show a slight superiority, more classical classification algorithms performed nearly equally well. Using random forest classification, a considerable sensitivity of up to 85% and a specificity of 78% were reached for the test of even only mild AD patients, whereas for the comparison of moderate AD vs. controls, using SVM and neural networks, values of 89% and 88% for sensitivity and specificity were achieved. Such a remarkable performance proves the value of these classification algorithms for clinical diagnostics.
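
    The reported sensitivity and specificity figures follow from simple confusion-matrix counts; a minimal sketch with made-up labels:

    ```python
    def sensitivity_specificity(y_true, y_pred, positive="AD"):
        """Sensitivity = detected positives / all positives;
        specificity = detected negatives / all negatives."""
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == positive and p == positive for t, p in pairs)
        fn = sum(t == positive and p != positive for t, p in pairs)
        tn = sum(t != positive and p != positive for t, p in pairs)
        fp = sum(t != positive and p == positive for t, p in pairs)
        return tp / (tp + fn), tn / (tn + fp)

    # Hypothetical fold of 8 subjects (4 AD patients, 4 controls).
    truth = ["AD", "AD", "AD", "AD", "ctrl", "ctrl", "ctrl", "ctrl"]
    pred  = ["AD", "AD", "AD", "ctrl", "ctrl", "ctrl", "ctrl", "AD"]
    sens, spec = sensitivity_specificity(truth, pred)
    print(sens, spec)  # -> 0.75 0.75
    ```

    In the study these quantities are averaged over the 10 cross-validation folds rather than computed on a single split.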

  6. A NEW UNSUPERVISED CLASSIFICATION ALGORITHM FOR POLARIMETRIC SAR IMAGES BASED ON FUZZY SET THEORY

    Institute of Scientific and Technical Information of China (English)

    Fu Yusheng; Xie Yan; Pi Yiming; Hou Yinming

    2006-01-01

    In this letter, a new method is proposed for unsupervised classification of terrain types and man-made objects using POLarimetric Synthetic Aperture Radar (POLSAR) data. This technique combines the polarimetric information of SAR images with an unsupervised classification method based on fuzzy set theory. Image quantization and image enhancement are used to preprocess the POLSAR data. Then the polarimetric information and the Fuzzy C-Means (FCM) clustering algorithm are used to classify the preprocessed images. The advantages of this algorithm are automated classification, high classification accuracy, fast convergence, and high stability. Its effectiveness is demonstrated by experiments using SIR-C/X-SAR (Spaceborne Imaging Radar-C/X-band Synthetic Aperture Radar) data.
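
    Fuzzy C-Means, the clustering step named above, can be sketched for one-dimensional data (toy values and a naive initialisation; a real POLSAR pipeline would cluster multi-channel polarimetric features):

    ```python
    def fuzzy_c_means(xs, c=2, m=2.0, iters=50):
        """1-D Fuzzy C-Means: memberships u[i][k] lie in [0, 1] and sum to 1
        per point; centres are membership-weighted means (toy sketch)."""
        centers = list(xs[:c])                       # naive initialisation
        for _ in range(iters):
            u = []
            for x in xs:
                d = [abs(x - v) + 1e-12 for v in centers]
                u.append([1.0 / sum((d[k] / d[j]) ** (2 / (m - 1))
                                    for j in range(c))
                          for k in range(c)])
            centers = [sum(u[i][k] ** m * xs[i] for i in range(len(xs)))
                       / sum(u[i][k] ** m for i in range(len(xs)))
                       for k in range(c)]
        return centers, u

    # Two well-separated 'backscatter' groups.
    xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
    centers, u = fuzzy_c_means(xs)
    print(sorted(round(v, 1) for v in centers))  # -> approximately [0.1, 5.1]
    ```

    Unlike hard k-means, each pixel keeps a graded membership in every class, which is what makes the method robust on speckled SAR data.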

  7. Consensus embedding: theory, algorithms and application to segmentation and classification of biomedical data

    Directory of Open Access Journals (Sweden)

    Viswanath Satish

    2012-02-01

    Full Text Available Abstract Background Dimensionality reduction (DR) enables the construction of a lower dimensional space (embedding) from a higher dimensional feature space while preserving object-class discriminability. However, several popular DR approaches suffer from sensitivity to choice of parameters and/or presence of noise in the data. In this paper, we present a novel DR technique known as consensus embedding that aims to overcome these problems by generating and combining multiple low-dimensional embeddings, hence exploiting the variance among them in a manner similar to ensemble classifier schemes such as Bagging. We demonstrate theoretical properties of consensus embedding which show that it will result in a single stable embedding solution that preserves information more accurately as compared to any individual embedding (generated via DR schemes such as Principal Component Analysis, Graph Embedding, or Locally Linear Embedding). Intelligent sub-sampling (via mean-shift) and code parallelization are utilized to provide for an efficient implementation of the scheme. Results Applications of consensus embedding are shown in the context of classification and clustering as applied to: (1) image partitioning of white matter and gray matter on 10 different synthetic brain MRI images corrupted with 18 different combinations of noise and bias field inhomogeneity, (2) classification of 4 high-dimensional gene-expression datasets, (3) cancer detection (at a pixel level) on 16 image slices obtained from 2 different high-resolution prostate MRI datasets. In over 200 different experiments concerning classification and segmentation of biomedical data, consensus embedding was found to consistently outperform both linear and non-linear DR methods within all applications considered.
Conclusions We have presented a novel framework termed consensus embedding which leverages ensemble classification theory within dimensionality reduction, allowing for application to a wide range

  8. An Improved BP Algorithm and Its Application in Classification of Surface Defects of Steel Plate

    Institute of Scientific and Technical Information of China (English)

    ZHAO Xiang-yang; LAI Kang-sheng; DAI Dong-ming

    2007-01-01

    Artificial neural networks are a new approach to pattern recognition and classification. Here, a multilayer perceptron (MLP) trained with back-propagation (BP) is used. An improved fast BP algorithm is presented, which adopts singular value decomposition (SVD) and a generalized inverse matrix. It not only increases the speed of network learning but also achieves satisfying precision. Simulation and experimental results show the effect of the improved BP algorithm on the classification of surface defects of steel plate.

  9. Ice classification algorithm development and verification for the Alaska SAR Facility using aircraft imagery

    Science.gov (United States)

    Holt, Benjamin; Kwok, Ronald; Rignot, Eric

    1989-01-01

    The Alaska SAR Facility (ASF) at the University of Alaska, Fairbanks is a NASA program designed to receive, process, and archive SAR data from ERS-1 and to support investigations that will use this regional data. As part of ASF, specialized subsystems and algorithms to produce certain geophysical products from the SAR data are under development. Of particular interest are ice motion, ice classification, and ice concentration. This work focuses on the algorithm under development for ice classification, and the verification of the algorithm using C-band aircraft SAR imagery recently acquired over the Alaskan arctic.

  10. A Critical Study of Selected Classification Algorithms for Liver Disease Diagnosis

    Directory of Open Access Journals (Sweden)

    Bendi Venkata Ramana

    2011-03-01

    Full Text Available The number of patients with liver disease has been continuously increasing because of excessive consumption of alcohol, inhalation of harmful gases, and intake of contaminated food, pickles, and drugs. Automatic classification tools may reduce the burden on doctors. This paper evaluates selected classification algorithms for the classification of some liver patient datasets. The classification algorithms considered here are the Naïve Bayes classifier, C4.5, the back-propagation neural network algorithm, and Support Vector Machines. These algorithms are evaluated based on four criteria: accuracy, precision, sensitivity, and specificity.

  11. Packet Classification by Multilevel Cutting of the Classification Space: An Algorithmic-Architectural Solution for IP Packet Classification in Next Generation Networks

    Directory of Open Access Journals (Sweden)

    Motasem Aldiab

    2008-01-01

    Full Text Available Traditionally, the Internet provides only a “best-effort” service, treating all packets going to the same destination equally. However, providing differentiated services for different users based on their quality requirements is increasingly becoming a demanding issue. For this, routers need the capability to distinguish and isolate traffic belonging to different flows. This ability to determine the flow each packet belongs to is called packet classification. Technology vendors are reluctant to support algorithmic solutions for classification due to their nondeterministic performance. Although content addressable memories (CAMs) are favoured by technology vendors due to their deterministic high lookup rates, they suffer from high power consumption and high silicon cost. This paper provides a new algorithmic-architectural solution for packet classification that mixes CAMs with algorithms based on multilevel cutting of the classification space into smaller spaces. The provided solution utilizes the geometrical distribution of rules in the classification space. It provides the deterministic performance of CAMs, support for dynamic updates, and added flexibility for system designers.
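
    The multilevel cutting idea, recursively halving the classification space until each region holds only a few rules, can be sketched as follows (hypothetical two-field rules; real schemes such as HiCuts choose cut dimensions and positions adaptively):

    ```python
    def build_cuttings(rules, bounds, max_rules=2, depth=0, max_depth=8):
        """Recursively cut the 2-D classification space in half (alternating
        dimensions) until each region holds few enough rules (toy sketch)."""
        if len(rules) <= max_rules or depth == max_depth:
            return {"leaf": rules}
        dim = depth % 2
        lo, hi = bounds[dim]
        mid = (lo + hi) / 2
        def overlaps(rule, lo_, hi_):
            rlo, rhi = rule["range"][dim]
            return rlo < hi_ and rhi > lo_
        left_b = [bounds[0], bounds[1]]; left_b[dim] = (lo, mid)
        right_b = [bounds[0], bounds[1]]; right_b[dim] = (mid, hi)
        return {"dim": dim, "mid": mid,
                "left": build_cuttings([r for r in rules if overlaps(r, lo, mid)],
                                       left_b, max_rules, depth + 1, max_depth),
                "right": build_cuttings([r for r in rules if overlaps(r, mid, hi)],
                                        right_b, max_rules, depth + 1, max_depth)}

    def classify(tree, point):
        while "leaf" not in tree:
            tree = tree["left"] if point[tree["dim"]] < tree["mid"] else tree["right"]
        for rule in tree["leaf"]:          # linear scan of the small leaf list
            if all(lo <= p < hi for p, (lo, hi) in zip(point, rule["range"])):
                return rule["action"]
        return "default"

    rules = [{"range": ((0, 50), (0, 100)), "action": "drop"},
             {"range": ((50, 100), (0, 50)), "action": "allow"},
             {"range": ((50, 100), (50, 100)), "action": "log"}]
    tree = build_cuttings(rules, [(0, 100), (0, 100)])
    print(classify(tree, (75, 25)), classify(tree, (25, 25)))  # -> allow drop
    ```

    In the paper's hybrid design the small leaf lists would live in CAM, so the final match is a single deterministic lookup rather than a linear scan.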

  12. Application of ant colony algorithm in plant leaves classification based on infrared spectroscopy

    Science.gov (United States)

    Guo, Tiantai; Hong, Bo; Kong, Ming; Zhao, Jun

    2014-04-01

    This paper proposes using the ant colony algorithm to analyse spectral data of plant leaves, achieving the best classification of different plants within a short time. Intelligent classification is realized according to the different components of feature information contained in the near-infrared spectrum data of plants. The near-infrared diffuse emission spectrum curves of the leaves of Cinnamomum camphora and Acer saccharum Marsh are acquired, 75 leaves per species, divided into two groups. The acquired data are then processed with the ant colony algorithm, and leaves of the same kind are grouped into one class by ant colony clustering. Finally, the two groups of data are classified into two classes. Experimental results show that the algorithm can distinguish the species with 100% accuracy. The classification of plant leaves has important application value in agricultural development, research on species invasion, floriculture, etc.

  13. Research of Plant-Leaves Classification Algorithm Based on Supervised LLE

    Directory of Open Access Journals (Sweden)

    Yan Qing

    2013-06-01

    Full Text Available A new supervised LLE method based on the Fisher projection is proposed in this paper and combined with a new classification algorithm based on manifold learning to recognize plant leaves. First, the method uses the Fisher projection distance in place of the sample's geodesic distance, yielding a new supervised LLE algorithm. Then, a classification algorithm that uses the manifold reconstruction error to assign samples to classes directly is adopted. This algorithm makes better use of category information and improves the recognition rate effectively, while also allowing easy parameter estimation. Experimental results on real-world plant leaf databases show an average recognition accuracy of up to 95.17%.

  14. Two-step Classification Algorithm Based on Decision-Theoretic Rough Set Theory

    Directory of Open Access Journals (Sweden)

    Jun Wang

    2013-07-01

    Full Text Available This paper introduces rough set theory and decision-theoretic rough set (DTRS) theory. Based on the latter, a two-step classification algorithm is proposed. Compared with primitive DTRS algorithms, our method decreases the extent of the negative region and employs a two-step strategy in classification. When new or unknown samples are encountered, it is estimated whether they belong to the negative region, so fewer samples are wrongly classified into it, lowering the error rate and classification loss. Compared with traditional information filtering methods such as the Naive Bayes algorithm and the primitive DTRS algorithm, the proposed method achieves high accuracy and low loss.
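
    The decision-theoretic rough set core of such a two-step classifier is a three-way decision: two thresholds α and β, derived from misclassification losses, split samples into positive, negative, and boundary (deferred) regions. A minimal sketch with illustrative threshold values:

    ```python
    def three_way_decision(p_positive, alpha=0.7, beta=0.4):
        """Decision-theoretic rough sets classify by two loss-derived
        thresholds: accept, reject, or defer to a second-step examination
        (alpha and beta here are illustrative, not derived from real losses)."""
        if p_positive >= alpha:
            return "positive"       # confident accept
        if p_positive <= beta:
            return "negative"       # confident reject
        return "boundary"           # defer: the 'second step' re-examines these

    print([three_way_decision(p) for p in (0.9, 0.55, 0.2)])
    # -> ['positive', 'boundary', 'negative']
    ```

    Shrinking the negative region, as the paper proposes, corresponds to lowering β so fewer uncertain samples are rejected outright.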

  15. Application of a Genetic Algorithm to Nearest Neighbour Classification

    NARCIS (Netherlands)

    Simkin, S.; Verwaart, D.; Vrolijk, H.C.J.

    2005-01-01

    This paper describes the application of a genetic algorithm to nearest-neighbour based imputation of sample data into a census dataset. The genetic algorithm optimises the selection and weights of the variables used for measuring distance. The results show that the measure of fit can be improved by
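
    The distance measure being optimised by the GA can be sketched as a weighted nearest-neighbour imputation (hypothetical donor records and weights; the GA would search over the weight vector rather than fixing it by hand):

    ```python
    def weighted_nn_impute(record, donors, weights):
        """Impute a missing field by copying it from the nearest donor under a
        per-variable weighted squared distance (toy sketch)."""
        def dist(a, b):
            return sum(w * (x - y) ** 2
                       for w, x, y in zip(weights, a["known"], b["known"]))
        donor = min(donors, key=lambda d: dist(record, d))
        return {**record, "imputed": donor["value"]}

    # Hypothetical donor records with two known variables and one value to copy.
    donors = [{"known": (1.0, 10.0), "value": "small farm"},
              {"known": (8.0, 12.0), "value": "large farm"}]
    rec = {"known": (2.0, 12.0)}
    # Weighting variable 0 heavily makes the first donor the nearest.
    print(weighted_nn_impute(rec, donors, weights=(1.0, 0.1))["imputed"])
    # -> small farm
    ```

    The GA's fitness would score a candidate weight vector by how well the imputed values reproduce held-out census records.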

  16. Tomato classification based on laser metrology and computer algorithms

    Science.gov (United States)

    Igno Rosario, Otoniel; Muñoz Rodríguez, J. Apolinar; Martínez Hernández, Haydeé P.

    2011-08-01

An automatic technique for tomato classification based on size and color is presented. The size is determined by surface contouring via laser line scanning, where a Bezier network computes the tomato height from the line position. The color is determined in the CIELCH color space from the red and green components. Tomatoes are thus classified by size into large, medium, and small, and into six colors associated with maturity. The performance and accuracy of the classification system are evaluated against methods reported in recent years. The technique is tested and experimental results are presented.

  17. Classification of Noisy Data: An Approach Based on Genetic Algorithms and Voronoi Tessellation

    DEFF Research Database (Denmark)

    Khan, Abdul Rauf; Schiøler, Henrik; Knudsen, Torben;

    2016-01-01

Classification is one of the major constituents of the data-mining toolkit. The well-known methods for classification are built on either the principle of logic or statistical/mathematical reasoning. In this article we propose: (1) a different strategy, based on the partitioning of the information space; and (2) the use of the genetic algorithm to solve the resulting combinatorial problems. In particular, we implement our methodology on complex classification problems and compare the performance of our classifier with other well-known methods (SVM, KNN, and ANN). The results of this study suggest that our proposed methodology is specialized to deal with the classification problem of highly imbalanced classes with significant overlap.

  18. Study on Increasing the Accuracy of Classification Based on Ant Colony algorithm

    Science.gov (United States)

    Yu, M.; Chen, D.-W.; Dai, C.-Y.; Li, Z.-L.

    2013-05-01

The application of GIS advances the ability to analyze remote sensing imagery, and the classification and extraction of remote sensing images is the primary information source for GIS in LUCC applications. Increasing the accuracy of classification is therefore an important topic of remote sensing research, and adding features and developing new classification methods are two ways to achieve it. The ant colony algorithm, a form of nature-inspired swarm intelligence in which agents follow a uniform computational mode, is a new method for remote sensing image classification, so studying its applicability over many features and exploring its advantages and performance is of great significance. The study takes the outskirts of Fuzhou, an area of complicated land use in Fujian Province, as the study area. A multi-source database was built integrating spectral information (TM1-5, TM7, NDVI, NDBI), topographic characteristics (DEM, slope, aspect), and textural information (mean, variance, homogeneity, contrast, dissimilarity, entropy, second moment, correlation). Classification rules based on the different characteristics were discovered from the samples through the ant colony algorithm, and a classification test was performed on these rules. The accuracies were also checked against traditional maximum likelihood, C4.5, and rough set classifications. The study showed that the accuracy of classification based on the ant colony algorithm is higher than that of the other methods. In addition, near-term land use and cover changes in Fuzhou were studied and mapped using remote sensing technology based on the ant colony algorithm.

  19. Analysis and Evaluation of IKONOS Image Fusion Algorithm Based on Land Cover Classification

    Institute of Scientific and Technical Information of China (English)

    Xia; JING; Yan; BAO

    2015-01-01

Each fusion algorithm has its own advantages and limitations, so it is difficult to simply rank fusion algorithms as good or bad; which algorithm is selected to fuse a given image also depends on the sensor type and the specific research purpose. Firstly, five fusion methods, i.e. IHS, Brovey, PCA, SFIM, and Gram-Schmidt, are briefly described in the paper. Then visual judgment and quantitative statistical parameters are used to assess the five algorithms. Finally, to determine the most suitable fusion method for land cover classification of IKONOS imagery, maximum likelihood classification (MLC) was applied to the five fused images. The results showed that the SFIM and Gram-Schmidt transforms outperformed the other three image fusion methods in spatial detail improvement and spectral information fidelity, and that the Gram-Schmidt technique was superior to the SFIM transform in expressing image details. The classification accuracy of the images fused with the Gram-Schmidt and SFIM algorithms was higher than that of the other three methods, with overall accuracy greater than 98%. The IHS-fused image had the lowest classification accuracy, with an overall accuracy and kappa coefficient of 83.14% and 0.76, respectively. Thus the IKONOS fusion images obtained by Gram-Schmidt and SFIM were better for improving land cover classification accuracy.

  20. Spatiotemporal representations of rapid visual target detection: a single-trial EEG classification algorithm.

    Science.gov (United States)

    Fuhrmann Alpert, Galit; Manor, Ran; Spanier, Assaf B; Deouell, Leon Y; Geva, Amir B

    2014-08-01

Brain-computer interface applications, developed for both healthy and clinical populations, critically depend on decoding brain activity in single trials. The goal of the present study was to detect distinctive spatiotemporal brain patterns within a set of event-related responses. We introduce a novel classification algorithm, the spatially weighted FLD-PCA (SWFP), which is based on a two-step linear classification of event-related responses, using a Fisher linear discriminant (FLD) classifier and principal component analysis (PCA) for dimensionality reduction. As a benchmark algorithm, we consider the hierarchical discriminant component analysis (HDCA) introduced by Parra et al. (2007). We also consider a modified version of the HDCA, namely the hierarchical discriminant principal component analysis (HDPCA) algorithm. We compare single-trial classification accuracies of all three algorithms, each applied to detect target images within a rapid serial visual presentation (RSVP, 10 Hz) of images from five different object categories, based on single-trial brain responses. We find a systematic superiority of our classification algorithm in the tested paradigm. Additionally, HDPCA significantly increases classification accuracies compared to the HDCA. Finally, we show that presenting several repetitions of the same image exemplars improves accuracy, and thus may be important in cases where high accuracy is crucial.

  1. Improved Algorithm of Pattern Classification and Recognition Applied in a Coal Dust Sensor

    Institute of Scientific and Technical Information of China (English)

    MA Feng-ying; SONG Shu

    2007-01-01

To resolve the conflicting requirements of measurement precision and real-time speed, an improved algorithm for pattern classification and recognition was developed. The angular distribution of diffracted light varies with particle size; these patterns can be grouped with an innovative classification based upon reference dust samples. Once classified, patterns can be recognized easily and rapidly by minimizing the variance between the reference pattern and the dust sample eigenvectors. Simulation showed that the maximum recognition speed improves 20-fold, which enables a single-chip, real-time inversion algorithm. Increasing the number of reference patterns reduced the errors in total and respirable coal dust measurements. Experiments in a coal mine confirm that the sensor achieves 95% accuracy. The results indicate that the improved algorithm effectively enhances the precision and real-time capability of the coal dust sensor.

  2. A Supervised Classification Algorithm for Note Onset Detection

    Directory of Open Access Journals (Sweden)

    Douglas Eck

    2007-01-01

Full Text Available This paper presents a novel approach to detecting onsets in music audio files. We use a supervised learning algorithm to classify spectrogram frames extracted from digital audio as being onsets or non-onsets. Frames classified as onsets are then treated with a simple peak-picking algorithm based on a moving average. We present two versions of this approach. The first version uses a single neural network classifier. The second version combines the predictions of several networks trained using different hyperparameters. We describe the details of the algorithm and summarize the performance of both variants on several datasets. We also examine our choice of hyperparameters by describing the results of cross-validation experiments done on a custom dataset. We conclude that a supervised learning approach to note onset detection performs well and warrants further investigation.
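The peak-picking stage described above can be sketched as follows: a frame is kept only if its onset activation exceeds a moving-average threshold and is the local maximum of its window. The window size and bias term below are illustrative assumptions, not the paper's settings:

```python
# Moving-average peak picking over per-frame onset activations.
# `window` and `bias` are illustrative choices.

def pick_peaks(activation, window=3, bias=0.1):
    """Return indices that are local maxima above a moving-average threshold."""
    peaks = []
    n = len(activation)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        local = activation[lo:hi]
        threshold = sum(local) / len(local) + bias   # moving average + bias
        if activation[i] >= threshold and activation[i] == max(local):
            peaks.append(i)
    return peaks

act = [0.0, 0.1, 0.9, 0.2, 0.0, 0.05, 0.8, 0.1, 0.0]
print(pick_peaks(act))  # [2, 6]
```

In the paper the activations come from the neural network classifier; here they are hand-written values.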

  3. Analysis of data mining classification by comparison of C4.5 and ID3 algorithms

    Science.gov (United States)

    Sudrajat, R.; Irianingsih, I.; Krisnawan, D.

    2017-01-01

The rapid development of information technology has triggered its intensive use; data mining, for example, is widely used in investment. Among the many techniques that can assist in investment, the classification method used here is the decision tree. Decision trees have a variety of algorithms, such as C4.5 and ID3, which can generate different models, with different accuracies, for similar data sets. On discrete data, the C4.5 and ID3 algorithms achieve accuracies of 87.16% and 99.83%, respectively, while the C4.5 algorithm on numerical data achieves 89.69%. On discrete data, C4.5 and ID3 classify 520 and 598 customers, and C4.5 on numerical data classifies 546 customers. The analysis shows that both algorithms classify quite well, with error rates below 15%.
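Both ID3 and C4.5 choose splits with entropy-based measures: ID3 by information gain, C4.5 by the gain ratio (information gain normalized by split information). A compact sketch of the shared computation on toy labels:

```python
from collections import Counter
from math import log2

# Entropy and information gain, the split criteria underlying ID3/C4.5.

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Gain from splitting `labels` into the label lists in `groups`."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - remainder

parent = ["yes", "yes", "no", "no"]
split = [["yes", "yes"], ["no", "no"]]            # a perfect split
print(round(information_gain(parent, split), 3))  # 1.0
```

C4.5 additionally divides this gain by the entropy of the split proportions, which penalizes attributes with many distinct values.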

  4. Land cover classification using random forest with genetic algorithm-based parameter optimization

    Science.gov (United States)

    Ming, Dongping; Zhou, Tianning; Wang, Min; Tan, Tian

    2016-07-01

Land cover classification based on remote sensing imagery is an important means to monitor, evaluate, and manage land resources. However, it requires robust classification methods that allow accurate mapping of complex land cover categories. Random forest (RF) is a powerful machine-learning classifier that can be used in land remote sensing. However, two important parameters of RF classification, namely, the number of trees and the number of variables tried at each split, affect classification accuracy. Thus, optimal parameter selection is an unavoidable problem in RF-based image classification. This study uses the genetic algorithm (GA) to optimize the two parameters of RF to produce optimal land cover classification accuracy. HJ-1B CCD2 image data are used to classify six different land cover categories in Changping, Beijing, China. Experimental results show that GA-RF can avoid arbitrariness in the selection of parameters. The experiments also compare land cover classification results obtained with the GA-RF method, the traditional RF method (with default parameters), and the support vector machine method; compared with these two baselines, GA-RF improved classification accuracy by 1.02% and 6.64%, respectively. The comparison results show that GA-RF is a feasible solution for land cover classification without compromising accuracy or incurring excessive time.
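The GA-RF idea, evolving the two RF parameters against a fitness score, can be sketched with a toy genetic loop. The `fitness` function here is a stand-in for cross-validated RF accuracy (its peak at (200, 8) is invented for illustration), and the population size, rates, and bounds are arbitrary choices:

```python
import random

# Toy GA over the two RF parameters the paper tunes: number of trees
# and number of variables tried per split ("mtry").

random.seed(0)

def fitness(n_trees, mtry):
    # Mock accuracy surface with one optimum (assumption, not real data).
    return 1.0 - abs(n_trees - 200) / 1000 - abs(mtry - 8) / 100

def evolve(pop_size=20, generations=30):
    pop = [(random.randint(10, 500), random.randint(1, 20)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(*ind), reverse=True)
        survivors = pop[: pop_size // 2]            # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            child = (a[0], b[1])                    # one-point crossover
            if random.random() < 0.3:               # mutation
                child = (max(10, child[0] + random.randint(-20, 20)),
                         max(1, child[1] + random.randint(-2, 2)))
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda ind: fitness(*ind))

best = evolve()
print(best)   # parameter pair near the fitness peak
```

In the real method, evaluating `fitness` means training and cross-validating a random forest for each candidate pair, which is where most of the computation goes.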

  5. Pap-smear Classification Using Efficient Second Order Neural Network Training Algorithms

    DEFF Research Database (Denmark)

    Ampazis, Nikolaos; Dounias, George; Jantzen, Jan

    2004-01-01

In this paper we make use of two highly efficient second-order neural network training algorithms, namely LMAM (Levenberg-Marquardt with Adaptive Momentum) and OLMAM (Optimized Levenberg-Marquardt with Adaptive Momentum), for the construction of an efficient pap-smear test classifier. The classification results obtained from the application of the algorithms on a standard benchmark pap-smear data set reveal the power of the two methods to obtain excellent solutions in difficult classification problems, whereas other standard computational intelligence techniques achieve…

  6. A RBF classification method of remote sensing image based on genetic algorithm

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

Remote sensing image classification has stimulated considerable interest as an effective method for retrieving information from the rapidly increasing volume of complex, distributed, large-scale, and multi-temporal satellite imagery, driven by growth in image quantity and resolution. In this paper, genetic algorithms are employed to solve for the weights of radial basis function networks in order to improve the precision of remote sensing image classification. The classification is also applied to GIS spatial analysis and spatial online analytical processing (OLAP), and its effectiveness is demonstrated in an analysis of land utilization change in Daqing City.

  7. ASTErIsM: application of topometric clustering algorithms in automatic galaxy detection and classification

    Science.gov (United States)

    Tramacere, A.; Paraficz, D.; Dubath, P.; Kneib, J.-P.; Courbin, F.

    2016-12-01

We present a study on galaxy detection and shape classification using topometric clustering algorithms. We first use the DBSCAN algorithm to extract, from CCD frames, groups of adjacent pixels with significant fluxes, and we then apply the DENCLUE algorithm to separate the contributions of overlapping sources. The DENCLUE separation is based on the localization of patterns of local maxima through an iterative algorithm that associates each pixel with the closest local maximum. Our main classification goal is to separate elliptical from spiral galaxies. We introduce new sets of features derived from the computation of geometrical invariant moments of the pixel group shape and from the statistics of the spatial distribution of the DENCLUE local maxima patterns. Ellipticals are characterized by a single group of local maxima, related to the galaxy core, while spiral galaxies have additional groups related to segments of spiral arms. We use two different supervised ensemble classification algorithms: Random Forest and Gradient Boosting. Using a sample of ≃24 000 galaxies taken from the Galaxy Zoo 2 main sample with spectroscopic redshifts, we test our classification against the Galaxy Zoo 2 catalogue. We find that features extracted from our pipeline give, on average, an accuracy of ≃93 per cent when testing on a held-out set comprising 20 per cent of our full data set, with the features derived from the angular distribution of density attractors ranking at the top in discrimination power.
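The first stage of the pipeline, grouping adjacent significant pixels, follows the standard DBSCAN density-clustering scheme, which a toy implementation illustrates. The points stand in for pixel coordinates, and `eps` and `min_pts` are illustrative values:

```python
# Minimal DBSCAN sketch: points within `eps` of each other merge into
# clusters; sparse points are labeled noise (-1).

def dbscan(points, eps=1.5, min_pts=2):
    """Return a cluster label per point (-1 = noise)."""
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if (points[i][0] - q[0]) ** 2 + (points[i][1] - q[1]) ** 2 <= eps ** 2]

    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1                    # provisionally noise
            continue
        labels[i] = cluster                   # new cluster seeded at a core point
        queue = [j for j in seeds if j != i]
        while queue:                          # expand through density-reachable points
            j = queue.pop()
            if labels[j] in (None, -1):
                if labels[j] is None:
                    nb = neighbors(j)
                    if len(nb) >= min_pts:    # j is itself a core point
                        queue.extend(nb)
                labels[j] = cluster
        cluster += 1
    return labels

pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
print(dbscan(pts))  # [0, 0, 0, 1, 1]
```

In the paper the "points" are pixels above a flux threshold, so each DBSCAN cluster corresponds to one candidate source region, which DENCLUE then subdivides.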

  8. PCIU: Hardware Implementations of an Efficient Packet Classification Algorithm with an Incremental Update Capability

    Directory of Open Access Journals (Sweden)

    O. Ahmed

    2011-01-01

Full Text Available Packet classification plays a crucial role for a number of network services such as policy-based routing, firewalls, and traffic billing, to name a few. However, classification can be a bottleneck in the above-mentioned applications if not implemented properly and efficiently. In this paper, we propose PCIU, a novel classification algorithm, which improves upon previously published work. PCIU provides lower preprocessing time, lower memory consumption, ease of incremental rule update, and reasonable classification time compared to state-of-the-art algorithms. The proposed algorithm was evaluated and compared to RFC and HiCut using several benchmarks. Results obtained indicate that PCIU outperforms these algorithms in terms of speed, memory usage, incremental update capability, and preprocessing time. The algorithm, furthermore, was improved and made more accessible for a variety of applications through implementation in hardware. Two such implementations are detailed and discussed in this paper. The results indicate that a hardware/software co-design approach results in a slower PCIU solution that is, however, easier to optimize and improve within time constraints. A hardware accelerator based on an ESL approach using Handel-C, on the other hand, achieved a 31x speed-up over a pure software implementation running on a state-of-the-art Xeon processor.

  9. Data classification with radial basis function networks based on a novel kernel density estimation algorithm.

    Science.gov (United States)

    Oyang, Yen-Jen; Hwang, Shien-Ching; Ou, Yu-Yen; Chen, Chien-Yu; Chen, Zhi-Wei

    2005-01-01

This paper presents a novel learning algorithm for efficient construction of radial basis function (RBF) networks that can deliver the same level of accuracy as support vector machines (SVMs) in data classification applications. The proposed learning algorithm works by constructing one RBF subnetwork to approximate the probability density function of each class of objects in the training data set. With respect to algorithm design, the main distinction of the proposed learning algorithm is the novel kernel density estimation algorithm that features an average time complexity of O(n log n), where n is the number of samples in the training data set. One important advantage of the proposed learning algorithm, in comparison with the SVM, is that it generally takes far less time to construct a data classifier with an optimized parameter setting. This feature is significant for many contemporary applications, in particular for those in which new objects are continuously added into an already large database. Another desirable feature is that the RBF networks constructed are capable of carrying out data classification with more than two classes of objects in one single run; in other words, unlike with the SVM, there is no need to resort to mechanisms such as one-against-one or one-against-all for handling datasets with more than two classes of objects. The comparison with the SVM is of particular interest, because a number of recent studies have shown that SVMs are generally able to deliver higher classification accuracy than the other existing data classification algorithms. As the proposed learning algorithm is instance-based, the data reduction issue is also addressed in this paper. One interesting observation in this regard is that, for all three data sets used in data reduction experiments, the number of training samples remaining after a naive data reduction mechanism is
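The core idea, estimating each class's density and classifying by the larger estimate, can be sketched with a naive one-dimensional Gaussian KDE. Note this is the textbook O(n) estimator with a fixed bandwidth and implicit equal class priors, not the paper's O(n log n) construction:

```python
from math import exp, pi, sqrt

# Class-conditional Gaussian kernel density estimates; classify a point
# by the class with the higher estimated density. Bandwidth is an
# illustrative choice.

def kde(x, samples, bandwidth=0.5):
    norm = 1.0 / (len(samples) * bandwidth * sqrt(2 * pi))
    return norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

def classify(x, class_samples):
    """Pick the class whose estimated density at x is highest."""
    return max(class_samples, key=lambda c: kde(x, class_samples[c]))

data = {"A": [0.0, 0.2, 0.4], "B": [3.0, 3.1, 3.3]}
print(classify(0.1, data), classify(3.2, data))  # A B
```

Because the decision is an argmax over per-class densities, any number of classes is handled in a single run, which is the multi-class property the abstract highlights.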

  10. Research of information classification and strategy intelligence extract algorithm based on military strategy hall

    Science.gov (United States)

    Chen, Lei; Li, Dehua; Yang, Jie

    2007-12-01

Constructing a virtual international strategy environment requires many kinds of information, covering the economy, politics, the military, diplomacy, culture, science, etc. It is therefore very important to build a highly efficient system for automatic information extraction, classification, recombination, and analysis as the foundation and a component of the military strategy hall. This paper first uses an improved Boost algorithm to classify the collected initial information, and then applies a strategy intelligence extraction algorithm to extract strategic intelligence from that information, helping strategists analyze it.

  11. Classification of posture maintenance data with fuzzy clustering algorithms

    Science.gov (United States)

    Bezdek, James C.

    1992-01-01

Sensory inputs from the visual, vestibular, and proprioceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various sensory organization test (SOT) conditions were collected in conjunction with Johnson Space Center postural control studies using a tilt-translation device (TTD). The University of West Florida applied the fuzzy c-means (FCM) clustering algorithms to these data with a view towards identifying the various states and stages of subjects experiencing such changes. Feature analysis, time step analysis, pooling of data, the responses of the subjects, and the algorithms used are discussed.
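A minimal one-dimensional fuzzy c-means loop (fuzzifier m = 2, deterministic initialization over the data range) illustrates the alternating membership and center updates. The data are toy values; the JSC study clustered multivariate posture features:

```python
# One-dimensional fuzzy c-means sketch. Unlike hard k-means, each point
# holds a graded membership in every cluster.

def fcm(xs, c=2, m=2.0, iters=50):
    lo, hi = min(xs), max(xs)
    centers = [lo + i * (hi - lo) / (c - 1) for i in range(c)]
    for _ in range(iters):
        # membership update: closer centers receive memberships nearer 1
        u = []
        for x in xs:
            d = [abs(x - ck) + 1e-9 for ck in centers]
            u.append([1.0 / sum((d[i] / d[j]) ** (2 / (m - 1)) for j in range(c))
                      for i in range(c)])
        # center update: fuzzy-weighted means of the data
        centers = [sum(u[k][i] ** m * xs[k] for k in range(len(xs))) /
                   sum(u[k][i] ** m for k in range(len(xs)))
                   for i in range(c)]
    return sorted(centers)

data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
print([round(ck, 2) for ck in fcm(data)])  # [0.1, 5.1]
```

The graded memberships are what make FCM attractive for identifying intermediate "states and stages" rather than forcing each trial into a single crisp cluster.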

  12. Walking pattern classification and walking distance estimation algorithms using gait phase information.

    Science.gov (United States)

    Wang, Jeen-Shing; Lin, Che-Wei; Yang, Ya-Ting C; Ho, Yu-Jen

    2012-10-01

    This paper presents a walking pattern classification and a walking distance estimation algorithm using gait phase information. A gait phase information retrieval algorithm was developed to analyze the duration of the phases in a gait cycle (i.e., stance, push-off, swing, and heel-strike phases). Based on the gait phase information, a decision tree based on the relations between gait phases was constructed for classifying three different walking patterns (level walking, walking upstairs, and walking downstairs). Gait phase information was also used for developing a walking distance estimation algorithm. The walking distance estimation algorithm consists of the processes of step count and step length estimation. The proposed walking pattern classification and walking distance estimation algorithm have been validated by a series of experiments. The accuracy of the proposed walking pattern classification was 98.87%, 95.45%, and 95.00% for level walking, walking upstairs, and walking downstairs, respectively. The accuracy of the proposed walking distance estimation algorithm was 96.42% over a walking distance.

  13. Data classification using metaheuristic Cuckoo Search technique for Levenberg Marquardt back propagation (CSLM) algorithm

    Science.gov (United States)

    Nawi, Nazri Mohd.; Khan, Abdullah; Rehman, M. Z.

    2015-05-01

Nature-inspired metaheuristic techniques provide derivative-free solutions to complex problems. One of the latest additions to this group of optimization procedures is the Cuckoo Search (CS) algorithm. Artificial neural network (ANN) training is an optimization task, since the goal is to find the optimal weight set of a neural network during the training process. Traditional training algorithms have limitations such as getting trapped in local minima and slow convergence. This study proposes a new technique, CSLM, that combines the best features of two known algorithms, back-propagation (BP) and Levenberg-Marquardt (LM), to improve the convergence speed of ANN training and avoid the local minima problem. Selected benchmark classification datasets are used for simulation. The experimental results show that the proposed Cuckoo Search with Levenberg-Marquardt algorithm performs better than the other algorithms used in this study.

  14. Engineering and Image Classification Framework Using Multi Instance Learning with KCCA Algorithm

    Directory of Open Access Journals (Sweden)

    P. Bhuvaneswari

    2012-12-01

Full Text Available Image classification is a challenging task with many applications in computer vision. Images are annotated with multiple keywords that may or may not be correlated; therefore, image classification may be naturally modelled as a multiple instance learning problem. The main challenge of this problem is that the classes are usually overlapped and correlated. In single-label classification the correlation among instances is not taken into account, yet an instance in an image may belong to several classes, and the correlations among different tags can significantly help in predicting precise labels and improving the performance of multi-label image classification. This study proposes a method combining kernel canonical correlation analysis (KCCA) and multiple instance learning for multi-label image classification, to improve classification accuracy. In the proposed framework, an input image is partitioned into image patches and features are extracted. The original training set is broken into several disjoint clusters of data, and a multi-label classifier is trained on the data of each cluster. K-means clustering is used to perform automatic instance clustering, and kernel canonical correlation analysis between the disjoint clusters establishes the exact correspondence between image patches. Multiple instance learning is one potential solution for addressing the issue of large inter-concept visual similarity and improving classification accuracy. The proposed approach reduces the training time of standard multi-label classification algorithms, particularly in the case of a large number of labels.

  15. Improved Fault Classification in Series Compensated Transmission Line: Comparative Evaluation of Chebyshev Neural Network Training Algorithms.

    Science.gov (United States)

    Vyas, Bhargav Y; Das, Biswarup; Maheshwari, Rudra Prakash

    2016-08-01

    This paper presents the Chebyshev neural network (ChNN) as an improved artificial intelligence technique for power system protection studies and examines the performances of two ChNN learning algorithms for fault classification of series compensated transmission line. The training algorithms are least-square Levenberg-Marquardt (LSLM) and recursive least-square algorithm with forgetting factor (RLSFF). The performances of these algorithms are assessed based on their generalization capability in relating the fault current parameters with an event of fault in the transmission line. The proposed algorithm is fast in response as it utilizes postfault samples of three phase currents measured at the relaying end corresponding to half-cycle duration only. After being trained with only a small part of the generated fault data, the algorithms have been tested over a large number of fault cases with wide variation of system and fault parameters. Based on the studies carried out in this paper, it has been found that although the RLSFF algorithm is faster for training the ChNN in the fault classification application for series compensated transmission lines, the LSLM algorithm has the best accuracy in testing. The results prove that the proposed ChNN-based method is accurate, fast, easy to design, and immune to the level of compensations. Thus, it is suitable for digital relaying applications.
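The RLSFF update can be illustrated on a plain linear model; the same recursion drives the Chebyshev-expanded weights in the paper. The forgetting factor, initialization, and data below are illustrative assumptions:

```python
import numpy as np

# Recursive least squares with a forgetting factor: each new sample
# updates the weights and a discounted covariance matrix, so older
# samples gradually lose influence.

def rlsff(X, y, lam=0.99):
    n_features = X.shape[1]
    w = np.zeros(n_features)
    P = np.eye(n_features) * 1e4           # large initial covariance
    for x_t, y_t in zip(X, y):
        Px = P @ x_t
        k = Px / (lam + x_t @ Px)          # gain vector
        w = w + k * (y_t - w @ x_t)        # correct weights by prediction error
        P = (P - np.outer(k, Px)) / lam    # discounted covariance update
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -3.0])              # noiseless targets, true w = (2, -3)
print(np.round(rlsff(X, y), 2))            # ≈ [ 2. -3.]
```

A forgetting factor below 1 is what lets the filter track drifting fault-current statistics; with λ = 1 the recursion reduces to ordinary recursive least squares.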

  16. Model classification rate control algorithm for video coding

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

A model classification rate control method for video coding is proposed. The macroblocks are classified according to their prediction errors, and different parameters are used in the rate-quantization and distortion-quantization models. The model parameters for each class are calculated from the previous frame of the same type during coding, and these models are used to estimate the relations among rate, distortion, and quantization for the current frame. Further steps, such as R-D-optimization-based quantization adjustment and smoothing of quantization across adjacent macroblocks, are used to improve quality. The experiments prove that the technique is effective and can be realized easily, and the method presented in the paper is well suited for MPEG and H.264 rate control.

  17. Improved neural network algorithm for classification of UAV imagery related to Wenchuan earthquake

    Science.gov (United States)

    Lin, Na; Yang, Wunian; Wang, Bin

    2009-06-01

When the Wenchuan earthquake struck, the terrain of the region changed violently. Unmanned aerial vehicle (UAV) remote sensing is effective in extracting first-hand information, and the resulting high-resolution images are of great importance in disaster management and relief operations. The back-propagation (BP) neural network is an artificial neural network that combines a multi-layer feed-forward network with the error back-propagation algorithm. It has strong input-output mapping capability and does not require the objects to be identified to obey a particular distribution law; its strong non-linear modelling and error-tolerance capabilities allow remotely sensed image classification to achieve high accuracy. However, it also has drawbacks such as slow convergence and the possibility of being trapped in local minima. To solve these problems, we improved the algorithm by introducing a self-adaptive training rate and adding a momentum factor. A UAV high-resolution aerial image of Taoguan District in Wenchuan County is used as the data source. First, we preprocess the UAV aerial images and rectify geometric distortion. Training samples are then selected and purified, and the image is classified using the improved BP neural network algorithm. Finally, we compare this classification result with the maximum likelihood classification (MLC) result: the overall accuracy of MLC is 83.8%, while that of the improved BP neural network classification is 89.7%. The testing results indicate that the latter is better.
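The two modifications, a momentum term and a self-adaptive training rate, can be sketched on a toy one-parameter objective. The grow/shrink factors, the cap, and the momentum value are illustrative choices, not the paper's settings:

```python
# Gradient descent with momentum and a self-adaptive learning rate:
# the rate grows while the loss improves and shrinks after an overshoot.

def train(grad, loss, w=5.0, lr=0.1, momentum=0.5, steps=100):
    velocity, prev_loss = 0.0, loss(w)
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad(w)   # momentum update
        w += velocity
        cur = loss(w)
        # adaptive rate: grow on improvement (capped), shrink on overshoot
        lr = min(lr * 1.05, 0.5) if cur < prev_loss else lr * 0.7
        prev_loss = cur
    return w

# Toy objective: (w - 2)^2 with minimum at w = 2.
w_star = train(grad=lambda w: 2 * (w - 2), loss=lambda w: (w - 2) ** 2)
print(round(w_star, 3))
```

In the BP network the scalar `w` becomes the full weight vector and `grad` is the back-propagated error gradient, but the momentum and rate-adaptation rules take the same form.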

  18. A Comparative Study of Classification and Regression Algorithms for Modelling Students' Academic Performance

    Science.gov (United States)

    Strecht, Pedro; Cruz, Luís; Soares, Carlos; Mendes-Moreira, João; Abreu, Rui

    2015-01-01

    Predicting the success or failure of a student in a course or program is a problem that has recently been addressed using data mining techniques. In this paper we evaluate some of the most popular classification and regression algorithms on this problem. We address two problems: prediction of approval/failure and prediction of grade. The former is…

  19. Experiments in Discourse Analysis Impact on Information Classification and Retrieval Algorithms.

    Science.gov (United States)

    Morato, Jorge; Llorens, J.; Genova, G.; Moreiro, J. A.

    2003-01-01

    Discusses the inclusion of contextual information in indexing and retrieval systems to improve results and the ability to carry out text analysis by means of linguistic knowledge. Presents research that investigated whether discourse variables have an impact on information and retrieval and classification algorithms. (Author/LRW)

  20. Aneurysmal subarachnoid hemorrhage prognostic decision-making algorithm using classification and regression tree analysis

    Directory of Open Access Journals (Sweden)

    Benjamin W. Y. Lo

    2016-01-01

    Conclusions: A clinically useful classification tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH.

  1. The surgical algorithm for the AOSpine thoracolumbar spine injury classification system

    NARCIS (Netherlands)

    Vaccaro, Alexander R.; Schroeder, Gregory D.; Kepler, Christopher K.; Cumhur Oner, F.; Vialle, Luiz R.; Kandziora, Frank; Koerner, John D.; Kurd, Mark F.; Reinhold, Max; Schnake, Klaus J.; Chapman, Jens; Aarabi, Bizhan; Fehlings, Michael G.; Dvorak, Marcel F.

    2016-01-01

    Purpose: The goal of the current study is to establish a surgical algorithm to accompany the AOSpine thoracolumbar spine injury classification system. Methods: A survey was sent to AOSpine members from the six AO regions of the world, and surgeons were asked if a patient should undergo an initial tr

  2. Classification of EEG Signals using adaptive weighted distance nearest neighbor algorithm

    Directory of Open Access Journals (Sweden)

    E. Parvinnia

    2014-01-01

    Full Text Available Electroencephalogram (EEG) signals are often used to diagnose diseases such as seizure disorders, Alzheimer's disease, and schizophrenia. One main problem with recorded EEG samples is that they are not equally reliable, due to artifacts at the time of recording. EEG signal classification algorithms should have a mechanism to handle this issue, and adaptive classifiers appear well suited to biological signals such as EEG. In this paper, a general adaptive method named weighted distance nearest neighbor (WDNN) is applied to EEG signal classification to tackle this problem. The algorithm assigns a weight to each training sample to control its influence in classifying test samples; these weights are used to find the nearest neighbor of an input query pattern. To assess the performance of this scheme, EEG signals of thirteen schizophrenic patients and eighteen normal subjects are analyzed for the classification of these two groups. Several features, including fractal dimension, band power, and autoregressive (AR) model coefficients, are extracted from the EEG signals. Classification results are evaluated using leave-one-subject-out cross-validation for reliable estimation. The results indicate that the combination of WDNN and the selected features significantly outperforms the basic nearest neighbor classifier and other previously proposed methods for the classification of these two groups. The method can therefore serve as a complementary tool for specialists in diagnosing schizophrenia disorder.
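The record does not reproduce WDNN's weight-learning procedure, but the core idea is easy to sketch: each training sample carries a multiplier on its distance to a query, so a sample judged unreliable (e.g. artifact-laden) can be pushed away by a large multiplier. The weights below are hand-set stand-ins for the learned ones:

```python
import numpy as np

def wdnn_predict(X_train, y_train, sample_w, X_test):
    """1-NN where the effective distance to training sample i is
    ||x - x_i|| * sample_w[i]; a large weight pushes an unreliable
    recording away from every query."""
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1) * sample_w
        preds.append(y_train[np.argmin(d)])
    return np.array(preds)

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [0.9, 0.9]])
y_train = np.array([0, 1, 1])
weights = np.array([1.0, 1.0, 5.0])   # hand-set: third sample deemed unreliable
preds = wdnn_predict(X_train, y_train, weights,
                     np.array([[0.8, 0.8], [0.1, 0.1]]))
```

Here the query at (0.8, 0.8) sits closest to the down-weighted third sample in raw distance, but the weighted distance hands the decision to the second sample instead.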

  3. Consistent Classification of Landsat Time Series with an Improved Automatic Adaptive Signature Generalization Algorithm

    Directory of Open Access Journals (Sweden)

    Matthew P. Dannenberg

    2016-08-01

    Full Text Available Classifying land cover is perhaps the most common application of remote sensing, yet classification at frequent temporal intervals remains a challenging task due to radiometric differences among scenes, time and budget constraints, and semantic differences among class definitions from different dates. The automatic adaptive signature generalization (AASG algorithm overcomes many of these limitations by locating stable sites between two images and using them to adapt class spectral signatures from a high-quality reference classification to a new image, which mitigates the impacts of radiometric and phenological differences between images and ensures that class definitions remain consistent between the two classifications. We refined AASG to adapt stable site identification parameters to each individual land cover class, while also incorporating improved input data and a random forest classifier. In the Research Triangle region of North Carolina, our new version of AASG demonstrated an improved ability to update existing land cover classifications compared to the initial version of AASG, particularly for low intensity developed, mixed forest, and woody wetland classes. Topographic indices were particularly important for distinguishing woody wetlands from other forest types, while multi-seasonal imagery contributed to improved classification of water, developed, forest, and hay/pasture classes. These results demonstrate both the flexibility of the AASG algorithm and the potential for using it to produce high-quality land cover classifications that can utilize the entire temporal range of the Landsat archive in an automated fashion while maintaining consistent class definitions through time.
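The heart of AASG is locating radiometrically stable sites between the two dates and drawing new training signatures from the reference classification there. A minimal sketch of that step (the threshold, which the refined algorithm tunes per class, is an illustrative free parameter here):

```python
import numpy as np

def stable_sites(img_t1, img_t2, threshold):
    """Mark pixels whose spectral change between the two dates falls below
    a threshold; training signatures for the new image are then drawn from
    the reference classification at these stable locations only."""
    change = np.linalg.norm(img_t1.astype(float) - img_t2.astype(float), axis=-1)
    return change < threshold

# two tiny 2x2 "images" with 3 bands each; one pixel changes strongly
t1 = np.zeros((2, 2, 3))
t2 = t1.copy()
t2[0, 0] = [50.0, 60.0, 70.0]        # e.g. forest cleared between dates
mask = stable_sites(t1, t2, threshold=10.0)
```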

  4. A Novel Algorithm for Imbalance Data Classification Based on Neighborhood Hypergraph

    Directory of Open Access Journals (Sweden)

    Feng Hu

    2014-01-01

    Full Text Available The classification problem for imbalanced data has attracted increasing attention. Many significant methods have been proposed and applied in many fields, but still more efficient methods are needed. Although the hypergraph is an efficient tool for knowledge discovery, it may not be powerful enough to deal with data in the boundary region. In this paper, the neighborhood hypergraph is presented, combining rough set theory and the hypergraph. A novel classification algorithm for imbalanced data based on the neighborhood hypergraph is then developed, composed of three steps: initialization of hyperedges, classification of the training data set, and substitution of hyperedges. In a 10-fold cross-validation experiment on 18 data sets, the proposed algorithm achieved higher average accuracy than the others.

  5. Multispectral image classification of MRI data using an empirically-derived clustering algorithm

    Energy Technology Data Exchange (ETDEWEB)

    Horn, K.M.; Osbourn, G.C.; Bouchard, A.M. [Sandia National Labs., Albuquerque, NM (United States); Sanders, J.A. [Univ. of New Mexico, Albuquerque, NM (United States)]|[VA Hospital, Albuquerque, NM (United States)

    1998-08-01

    Multispectral image analysis of magnetic resonance imaging (MRI) data has been performed using an empirically derived clustering algorithm. This algorithm groups image pixels into distinct classes which exhibit similar response in the T{sub 2} 1st- and 2nd-echo and T{sub 1} (with and without gadolinium) MRI images. The grouping is performed in an n-dimensional mathematical space; the n-dimensional volumes bounding each class define each specific tissue type. The classification results are rendered again in real space by color-coding each grouped class of pixels (associated with differing tissue types). This classification method is especially well suited for class volumes with complex boundary shapes, and is also expected to robustly detect abnormal tissue classes. The classification process is demonstrated using a three-dimensional data set of MRI scans of a human brain tumor.

  6. Synthesis of supervised classification algorithm using intelligent and statistical tools

    Directory of Open Access Journals (Sweden)

    Ali Douik

    2009-09-01

    Full Text Available A fundamental task in detecting foreground objects in both static and dynamic scenes is choosing the best color-system representation and an efficient technique for background modeling. We propose in this paper a non-parametric algorithm dedicated to segmenting and detecting objects in color images taken from a football match. Per-pixel segmentation is relevant to many applications, and our results show that the method is robust in detecting objects even in the presence of strong shadows and highlights. On the other hand, to refine playing strategy in sports such as football, handball, volleyball, or rugby, the coach needs as much technical and tactical information as possible about the on-going game and the players. We therefore also propose a range of algorithms that resolve many problems arising in automated team identification, where each player is assigned to his corresponding team based on visual data. The developed system was tested on a match from the Tunisian national competition. This work is relevant to many future computer vision studies, as detailed below.

  7. Synthesis of supervised classification algorithm using intelligent and statistical tools

    CERN Document Server

    Douik, Ali

    2009-01-01

    A fundamental task in detecting foreground objects in both static and dynamic scenes is choosing the best color-system representation and an efficient technique for background modeling. We propose in this paper a non-parametric algorithm dedicated to segmenting and detecting objects in color images taken from a football match. Per-pixel segmentation is relevant to many applications, and our results show that the method is robust in detecting objects even in the presence of strong shadows and highlights. On the other hand, to refine playing strategy in sports such as football, handball, volleyball, rugby..., the coach needs as much technical and tactical information as possible about the on-going game and the players. We therefore also propose a range of algorithms that resolve many problems arising in automated team identification, where each player is assigned to his corresponding team based on visual data. The developed system was tested on a match of the Tunisian national c...

  8. [Comparative efficiency of algorithms based on support vector machines for binary classification].

    Science.gov (United States)

    Kadyrova, N O; Pavlova, L V

    2015-01-01

    Methods for constructing support vector machines require no further a priori information and scale to big data processing, which is especially important for various problems in computational biology. The question of the quality of the learning algorithms is considered. The main support vector machine algorithms for binary classification are reviewed and their efficiencies compared experimentally. Critical analysis of the results of this study revealed the most effective support vector classifiers. A description of the recommended algorithms, sufficient for their practical implementation, is presented.

  9. Land use mapping from CBERS-2 images with open source tools by applying different classification algorithms

    Science.gov (United States)

    Sanhouse-García, Antonio J.; Rangel-Peraza, Jesús Gabriel; Bustos-Terrones, Yaneth; García-Ferrer, Alfonso; Mesas-Carrascosa, Francisco J.

    2016-02-01

    Land cover classification relies on large differences between classes but great homogeneity within each of them. This cover information is obtained through field work or by processing satellite images. Field work involves high costs; digital image processing techniques have therefore become an important alternative for this task. However, in some developing countries, and particularly in the Casacoima municipality in Venezuela, geographic information systems are lacking due to outdated information and the high cost of software licenses. This research proposes a low-cost methodology for thematic mapping of local land use and cover types in areas with scarce resources. Thematic maps were developed from CBERS-2 images and spatial information available on the network, using open source tools. Supervised classification was applied both per pixel and per region, using different classification algorithms and comparing them against each other. Per-pixel classification was based on the Maxver (maximum likelihood) and Euclidean distance (minimum distance) algorithms, while per-region classification was based on the Bhattacharya algorithm. Per-region classification gave satisfactory results, with an overall reliability of 83.93% and a kappa index of 0.81. The Maxver algorithm showed a reliability of 73.36% and a kappa index of 0.69, while Euclidean distance obtained 67.17% and 0.61, respectively. The proposed methodology proved very useful for cartographic processing and updating, which in turn supports the development of management and land-use plans. Open source tools thus showed themselves to be an economically viable alternative, not only for forestry organizations but for the general public, enabling projects in economically depressed and/or environmentally threatened areas.
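The minimum-distance (Euclidean) classifier used as one of the per-pixel methods is simple enough to sketch. The class means below are hypothetical 3-band values for illustration, not signatures from the study:

```python
import numpy as np

def min_distance_classify(pixels, class_means):
    """Minimum-distance classifier: assign each pixel to the class whose
    training-sample mean is nearest in Euclidean spectral distance."""
    labels = list(class_means)
    means = np.stack([class_means[k] for k in labels])   # (classes, bands)
    d = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
    return [labels[i] for i in d.argmin(axis=1)]

# hypothetical 3-band class means
means = {"forest": np.array([30.0, 60.0, 25.0]),
         "water":  np.array([10.0, 15.0, 5.0])}
pixels = np.array([[28.0, 55.0, 30.0],
                   [12.0, 14.0, 6.0]])
labels = min_distance_classify(pixels, means)
```

Maximum likelihood classification generalizes this by replacing the Euclidean distance with a Mahalanobis-like distance weighted by each class's covariance.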

  10. Comparison of GOES Cloud Classification Algorithms Employing Explicit and Implicit Physics

    Science.gov (United States)

    Bankert, Richard L.; Mitrescu, Cristian; Miller, Steven D.; Wade, Robert H.

    2009-01-01

    Cloud-type classification based on multispectral satellite imagery data has been widely researched and demonstrated to be useful for distinguishing a variety of classes using a wide range of methods. The research described here is a comparison of the classifier output from two very different algorithms applied to Geostationary Operational Environmental Satellite (GOES) data over the course of one year. The first algorithm employs spectral channel thresholding and additional physically based tests. The second algorithm was developed through a supervised learning method with characteristic features of expertly labeled image samples used as training data for a 1-nearest-neighbor classification. The latter's ability to identify classes is also based in physics, but those relationships are embedded implicitly within the algorithm. A pixel-to-pixel comparison analysis was done for hourly daytime scenes within a region in the northeastern Pacific Ocean. Considerable agreement was found in this analysis, with many of the mismatches or disagreements providing insight to the strengths and limitations of each classifier. Depending upon user needs, a rule-based or other postprocessing system that combines the output from the two algorithms could provide the most reliable cloud-type classification.

  11. Active Learning Algorithms for the Classification of Hyperspectral Sea Ice Images

    Directory of Open Access Journals (Sweden)

    Yanling Han

    2015-01-01

    Full Text Available Sea ice is one of the most serious marine hazards, especially in polar and high-latitude regions. Hyperspectral imagery is well suited to monitoring sea ice, since it contains continuous spectral information and offers better target-recognition capability. The principal bottleneck for hyperspectral image classification is the large number of labeled training samples required, whose collection is time-consuming and costly. To address this problem, we apply active learning (AL) to hyperspectral sea ice detection, selecting only the most informative samples for labeling. We propose a novel AL algorithm based on the evaluation of two criteria: uncertainty and diversity. The uncertainty criterion is based on the difference between the probabilities of the two classes with the highest estimated probabilities, while the diversity criterion is based on kernel k-means clustering. In experiments on a Baffin Bay scene from northwest Greenland acquired on April 12, 2014, the proposed AL algorithm achieves the highest classification accuracy (89.327%) compared with other AL algorithms and random sampling and, at equal classification accuracy, requires lower labeling cost.
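The uncertainty criterion described, the gap between the two highest class probabilities, is often called breaking ties or margin sampling and can be sketched directly; the diversity step via kernel k-means is omitted here, and the posteriors are made up:

```python
import numpy as np

def select_informative(probs, batch=2):
    """Breaking-ties uncertainty: rank unlabeled samples by the gap between
    their two highest posterior probabilities; the smallest gaps mark the
    most informative candidates for labeling."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    return np.argsort(margin)[:batch]

# made-up posteriors for four unlabeled pixels over three ice classes
probs = np.array([[0.90, 0.05, 0.05],
                  [0.40, 0.35, 0.25],
                  [0.50, 0.45, 0.05],
                  [0.70, 0.20, 0.10]])
picked = select_informative(probs)
```

Samples 1 and 2, whose top two classes are nearly tied, are selected for labeling ahead of the confidently classified ones.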

  12. Image processing and classification algorithm for yeast cell morphology in a microfluidic chip

    Science.gov (United States)

    Yang Yu, Bo; Elbuken, Caglar; Ren, Carolyn L.; Huissoon, Jan P.

    2011-06-01

    The study of yeast cell morphology requires consistent identification of cell cycle phases based on cell bud size. A computer-based image processing algorithm is designed to automatically classify microscopic images of yeast cells in a microfluidic channel environment. The images were enhanced to reduce background noise, and a robust segmentation algorithm was developed to extract geometrical features including compactness, axis ratio, and bud size. These features are then used for classification, and the accuracy of various machine-learning classifiers is compared: the linear support vector machine, distance-based classification, and the k-nearest-neighbor algorithm. The performance of the system under various illumination and focusing conditions was also tested. The results suggest that it is possible to automatically classify yeast cells based on their morphological characteristics even with noisy, low-contrast images.
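Of the geometric features named, compactness has a common closed form: perimeter squared over 4π times area, equal to 1 for a perfect circle (the paper may use a variant definition; this one is a standard choice):

```python
import math

def compactness(area, perimeter):
    """Dimensionless shape compactness: 1.0 for a circle, larger for
    elongated or budded outlines, which is what makes it useful for
    separating cell-cycle phases by bud size."""
    return perimeter ** 2 / (4 * math.pi * area)

# a circle of radius 10 vs. a thin 5 x 62.8 rectangle of similar perimeter
circle_val = compactness(math.pi * 100, 2 * math.pi * 10)
rect_val = compactness(5 * 62.8, 2 * (5 + 62.8))
```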

  13. Fanning - A classification algorithm for mixture landscapes applied to Landsat data of Maine forests

    Science.gov (United States)

    Ungar, S. G.; Bryant, E.

    1981-01-01

    It is pointed out that typical landscapes include a relatively small number of 'pure' land cover types, which combine in various proportions to form a myriad of mixture types. Most Landsat classification algorithms used today require a separate user specification for each category, including mixture categories. Attention is given to a simpler approach, which requires the user to specify only the 'pure' types; mixture pixels are then classified by the proportion of area covered by each pure type within the pixel. The 'fanning' algorithm quantifies varying proportions of two 'pure' land cover types in selected mixture pixels. The algorithm was applied to 200,000 ha of forest land in Maine and checked against standard inventory information; results compared well with a discrete-categories classification of the same area.
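The proportion estimate at the heart of a two-endmember mixture classifier has a closed least-squares form. A sketch with hypothetical band means (not the Maine signatures):

```python
import numpy as np

def mixture_fraction(pixel, pure_a, pure_b):
    """Least-squares proportion f of pure type A in a mixed pixel modeled
    as f*a + (1 - f)*b, clipped to the physically meaningful range [0, 1]."""
    d = pure_a - pure_b
    f = float((pixel - pure_b) @ d / (d @ d))
    return min(max(f, 0.0), 1.0)

# hypothetical 4-band means for two pure cover types
conifer = np.array([20.0, 45.0, 18.0, 90.0])
cleared = np.array([40.0, 70.0, 55.0, 60.0])
mixed = 0.3 * conifer + 0.7 * cleared          # synthetic 30/70 mixture pixel
f = mixture_fraction(mixed, conifer, cleared)
```

On a noise-free synthetic mixture the closed form recovers the proportion exactly; real pixels scatter around the line joining the two pure signatures.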

  15. Robust algorithm for arrhythmia classification in ECG using extreme learning machine

    Directory of Open Access Journals (Sweden)

    Shin Kwangsoo

    2009-10-01

    Full Text Available Abstract Background Recently, extensive studies have been carried out on arrhythmia classification algorithms using artificial-intelligence pattern recognition methods such as neural networks. To improve practicality, many studies have focused on the learning speed and accuracy of neural networks. However, algorithms based on neural networks still have problems concerning practical application, such as slow learning speeds and unstable performance caused by local minima. Methods In this paper we propose a novel arrhythmia classification algorithm with a fast learning speed and high accuracy, using morphology filtering, principal component analysis, and an extreme learning machine (ELM). The proposed algorithm can classify six beat types: normal beat, left bundle branch block, right bundle branch block, premature ventricular contraction, atrial premature beat, and paced beat. Results Experiments on the entire MIT-BIH arrhythmia database demonstrate that the proposed algorithm achieves 98.00% average sensitivity, 97.95% average specificity, and 98.72% average accuracy, higher than or comparable with existing methods. We also compare the ELM-based algorithm with algorithms using a back-propagation neural network (BPNN), radial basis function network (RBFN), or support vector machine (SVM). In terms of learning time, the ELM-based algorithm is about 290, 70, and 3 times faster than the BPNN-, RBFN-, and SVM-based algorithms, respectively. Conclusion The proposed algorithm shows effective accuracy with a short learning time; its robustness was ascertained by evaluation on the entire MIT-BIH arrhythmia database.
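The speed advantage reported comes from ELM's one-shot training: the hidden layer stays random and only the output weights are solved by least squares. A generic sketch on a toy problem (this is not the paper's feature pipeline, and the dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, Y, n_hidden=40):
    """Extreme Learning Machine: random, untrained hidden layer; output
    weights solved in one shot via the Moore-Penrose pseudoinverse, which
    is why training is so much faster than iterative back-propagation."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                # random hidden activations
    beta = np.linalg.pinv(H) @ Y          # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

# toy two-class problem standing in for beat types
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)
W, b, beta = elm_train(X, np.eye(2)[y])
accuracy = float(np.mean(elm_predict(X, W, b, beta) == y))
```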

  16. Seasonal cultivated and fallow cropland mapping using MODIS-based automated cropland classification algorithm

    Science.gov (United States)

    Wu, Zhuoting; Thenkabail, Prasad S.; Mueller, Rick; Zakzeski, Audra; Melton, Forrest; Johnson, Lee; Rosevelt, Carolyn; Dwyer, John; Jones, Jeanine; Verdin, James P.

    2013-01-01

    Increasing drought occurrence and growing populations demand accurate, routine, and consistent cultivated and fallow cropland products to enable water and food security analysis. The overarching goal of this research was to develop and test an automated cropland classification algorithm (ACCA) that provides accurate, consistent, and repeatable information on seasonal cultivated and seasonal fallow cropland extents and areas, based on Moderate Resolution Imaging Spectroradiometer remote sensing data. The seasonal ACCA development process involves writing a series of iterative decision-tree codes to separate cultivated and fallow croplands from noncroplands, aiming to accurately mirror reliable reference data sources. A pixel-by-pixel accuracy assessment against U.S. Department of Agriculture (USDA) cropland data showed, on average, a producer's accuracy of 93% and a user's accuracy of 85% across all months. Further, ACCA-derived cropland maps agreed well with USDA Farm Service Agency crop acreage-reported data for both cultivated and fallow croplands, with R-square values over 0.7, and with field surveys, with an accuracy of ≥95% for cultivated croplands and ≥76% for fallow croplands. Our results demonstrate the ability of ACCA to generate cropland products, such as cultivated and fallow cropland extents and areas, accurately, automatically, and repeatedly throughout the growing season.

  17. Liver disorder diagnosis using linear, nonlinear and decision tree classification algorithms

    Directory of Open Access Journals (Sweden)

    Aman Singh

    2016-10-01

    Full Text Available In India and across the globe, liver disease is a serious area of concern in medicine, so classification algorithms for assessing the disease are essential to improve the efficiency of medical diagnosis and enable appropriate, timely treatment. The study accordingly implemented various classification algorithms, including linear discriminant analysis (LDA), diagonal linear discriminant analysis (DLDA), quadratic discriminant analysis (QDA), diagonal quadratic discriminant analysis (DQDA), naive Bayes (NB), a feed-forward neural network (FFNN), and classification and regression trees (CART), in an attempt to enhance the diagnostic accuracy of liver disorder and to reduce the inefficiencies caused by false diagnosis. The results demonstrated that CART emerged as the best model, achieving higher diagnostic accuracy than LDA, DLDA, QDA, DQDA, NB, and FFNN; FFNN stood second and performed better than the rest of the classifiers. The precision of a classification algorithm thus depends on the type and features of the dataset: for the given dataset, the decision tree classifier CART outperforms all the other linear and nonlinear classifiers. The study also showed the method's capability to assist clinicians in determining the existence of liver disorder, attaining better diagnoses, and avoiding treatment delays.

  18. A Comprehensive Study of Features and Algorithms for URL-Based Topic Classification

    CERN Document Server

    Weber, I; Henzinger, M; Baykan, E

    2011-01-01

    Given only the URL of a Web page, can we identify its topic? We study this problem in detail by exploring a large number of different feature sets and algorithms on several datasets. We also show that the inherent overlap between topics and the sparsity of the information in URLs makes this a very challenging problem. Web page classification without a page's content is desirable when the content is not available at all, when a classification is needed before obtaining the content, or when classification speed is of utmost importance. For our experiments we used five different corpora comprising a total of about 3 million (URL, classification) pairs. We evaluated several techniques for feature generation and classification algorithms. The individual binary classifiers were then combined via boosting into metabinary classifiers. We achieve typical F-measure values between 80 and 85, and a typical precision of around 86. The precision can be pushed further over 90 while maintaining a typical level of recall betw...

  19. Forest classification trees and forest support vector machines algorithms: Demonstration using microarray data.

    Science.gov (United States)

    Zintzaras, Elias; Kowald, Axel

    2010-05-01

    Classification into multiple classes when the measured variables outnumber the samples is a major methodological challenge in -omics studies. Two algorithms that overcome the dimensionality problem are presented: the forest classification tree (FCT) and the forest support vector machines (FSVM). In FCT, a set of variables is randomly chosen and a classification tree (CT) is grown using a forward classification algorithm; the process is repeated and a forest of CTs is derived. Finally, the most frequent variables from the trees with the smallest apparent misclassification rate (AMR) are used to construct a productive tree. In FSVM, the CTs are replaced by SVMs. The methods are demonstrated using prostate gene expression data, classifying tissue samples into four tumor types. For a threshold split value of 0.001 and utilizing 100 markers, the productive CT consisted of 29 terminal nodes and achieved perfect classification (AMR=0). When the threshold value was set to 0.01, a tree with 17 terminal nodes was constructed based on 15 markers (AMR=7%). In FSVM, reducing the fraction of the forest used to construct the best classifier from the top 80% to the top 20% reduced the misclassification to 25% (when using 200 markers). The proposed methodologies may be used for identifying important variables in high-dimensional data; furthermore, the FCT allows exploring the data structure and provides a decision rule.

  20. The Application of Multiobjective Genetic Algorithm to the Parameter Optimization of Single-Well Potential Stochastic Resonance Algorithm Aimed at Simultaneous Determination of Multiple Weak Chromatographic Peaks

    Directory of Open Access Journals (Sweden)

    Haishan Deng

    2014-01-01

    Full Text Available Simultaneous determination of multiple weak chromatographic peaks via stochastic resonance algorithms has attracted much attention in recent years. However, optimization of the parameters is complicated and time-consuming, even though the single-well potential stochastic resonance algorithm (SSRA) has already reduced the number of parameters to only one and simplified the process significantly; even worse, it is often difficult to keep the amplified peaks well shaped. Therefore, a multiobjective genetic algorithm was employed to optimize the SSRA parameter for multiple optimization objectives (i.e., S/N and peak shape) and multiple chromatographic peaks. The applicability of the proposed method was evaluated on an experimental data set of Sudan dyes, and the results showed an excellent quantitative relationship between the different concentrations and responses.

  1. Spectral Classification of Similar Materials using the Tetracorder Algorithm: The Calcite-Epidote-Chlorite Problem

    Science.gov (United States)

    Dalton, J. Brad; Bove, Dana; Mladinich, Carol; Clark, Roger; Rockwell, Barnaby; Swayze, Gregg; King, Trude; Church, Stanley

    2001-01-01

    Recent work on automated spectral classification algorithms has sought to distinguish ever-more-similar materials. From modest beginnings (separating shade, soil, rock, and vegetation) to ambitious attempts to discriminate mineral types and specific plant species, the trend is toward using increasingly subtle spectral differences to perform the classification. Rule-based expert systems that exploit the underlying physics of spectroscopy, such as the US Geological Survey Tetracorder system, now take advantage of the high spectral resolution and dimensionality of current imaging spectrometer designs to discriminate spectrally similar materials. This paper details recent efforts to discriminate three minerals with absorptions centered at the same wavelength, with encouraging results.

  2. Classification of Aerosol Retrievals from Spaceborne Polarimetry Using a Multiparameter Algorithm

    Science.gov (United States)

    Russell, Philip B.; Kacenelenbogen, Meloe; Livingston, John M.; Hasekamp, Otto P.; Burton, Sharon P.; Schuster, Gregory L.; Johnson, Matthew S.; Knobelspiesse, Kirk D.; Redemann, Jens; Ramachandran, S.; Holben, Brent

    2013-01-01

    In this presentation, we demonstrate the application of a new aerosol classification algorithm to retrievals from the POLDER-3 polarimeter on the PARASOL spacecraft. Motivation and method: since the advent of global aerosol measurements by satellites and AERONET, classifying observed aerosols into several types (e.g., urban-industrial, biomass burning, mineral dust, maritime, and various subtypes or mixtures of these) has proven useful for understanding aerosol sources, transformations, effects, and feedback mechanisms; for improving the accuracy of satellite retrievals; and for quantifying assessments of aerosol radiative impacts on climate.

  3. Comparison of some classification algorithms based on deterministic and nondeterministic decision rules

    KAUST Repository

    Delimata, Paweł

    2010-01-01

    We discuss two, in a sense extreme, kinds of nondeterministic rules in decision tables. Rules of the first kind, called inhibitory rules, block only one decision value (i.e., they have all but one of the possible decisions on their right-hand sides). In contrast, a rule of the second kind, called a bounded nondeterministic rule, can have only a few decisions on its right-hand side. We show that both kinds of rules can be used to improve the quality of classification. In the paper, two lazy classification algorithms of polynomial time complexity are considered. These algorithms are based on deterministic and inhibitory decision rules, but direct generation of the rules is not required; instead, for any new object the algorithms efficiently extract from a given decision table some information about the set of rules, which is then used by a decision-making procedure. The reported experimental results show that the algorithms based on inhibitory decision rules are often better than those based on deterministic decision rules. We also present an application of bounded nondeterministic rules in the construction of rule-based classifiers, and include experimental results showing that combining rule-based classifiers built on minimal decision rules with bounded nondeterministic rules having confidence close to 1 and sufficiently large support can improve the classification quality. © 2010 Springer-Verlag.

  4. A Decision-Tree-Based Algorithm for Speech/Music Classification and Segmentation

    Directory of Open Access Journals (Sweden)

    Lavner Yizhar

    2009-01-01

    Full Text Available We present an efficient algorithm for segmentation of audio signals into speech or music. The central motivation for our study is consumer audio applications, where various real-time enhancements are often applied. The algorithm consists of a learning phase and a classification phase. In the learning phase, predefined training data are used for computing various time-domain and frequency-domain features, for speech and music signals separately, and for estimating the optimal speech/music thresholds, based on the probability density functions of the features. An automatic procedure is employed to select the best features for separation. In the classification phase, initial classification is performed for each segment of the audio signal, using a three-stage sieve-like approach that applies both Bayesian and rule-based methods. To avoid erroneous rapid alternations in the classification, a smoothing technique is applied, averaging the decision on each segment with past segment decisions. Extensive evaluation of the algorithm, on a database of more than 12 hours of speech and more than 22 hours of music, showed correct identification rates of 99.4% and 97.8%, respectively, and quick adjustment to alternating speech/music sections. In addition to its accuracy and robustness, the algorithm can be easily adapted to different audio types, and is suitable for real-time operation.
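    The decision-smoothing step described above can be sketched as follows. This is a minimal illustration: the majority-vote window size and the label names are invented here, not taken from the paper.

```python
def smooth_decisions(decisions, window=5):
    """Replace each per-segment decision with the majority label over the
    current segment and the (window - 1) preceding segments."""
    smoothed = []
    for i, current in enumerate(decisions):
        history = decisions[max(0, i - window + 1):i + 1]
        speech = sum(1 for label in history if label == "speech")
        music = len(history) - speech
        if speech > music:
            smoothed.append("speech")
        elif music > speech:
            smoothed.append("music")
        else:
            smoothed.append(current)  # tie: keep the raw decision
    return smoothed

raw = ["speech", "speech", "music", "speech", "music", "music", "music"]
print(smooth_decisions(raw, window=3))
```

    A single spurious "music" segment inside a speech run is voted away, which is exactly the rapid-alternation artifact the smoothing targets.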

  5. Human Talent Prediction in HRM using C4.5 Classification Algorithm

    Directory of Open Access Journals (Sweden)

    Hamidah Jantan,

    2010-11-01

    Full Text Available In HRM, among the challenges for HR professionals is to manage an organization’s talents, especially to ensure the right person for the right job at the right time. Human talent prediction is an alternative way to handle this issue. For that reason, classification and prediction in data mining, which are commonly used in many areas, can also be applied to human talent. There are many classification techniques in data mining, such as Decision Tree, Neural Network, Rough Set Theory, Bayesian theory and Fuzzy logic. The decision tree is among the popular classification techniques, as it can produce interpretable rules or logic statements. The rules generated from the selected technique can be used for future prediction. In this article, we present a study on how potential human talent can be predicted using a decision tree classifier. By using this technique, the pattern of talent performance can be identified through the classification process. In that case, the hidden and valuable knowledge discovered in the related databases will be summarized in the decision tree structure. In this study, we use the C4.5 decision tree classification algorithm to generate the classification rules for human talent performance records. Finally, the generated rules are evaluated using unseen data in order to estimate the accuracy of the prediction result.
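    As a rough sketch of the approach, a decision tree can be trained on invented talent-record features. Note that scikit-learn's DecisionTreeClassifier implements CART with an entropy option rather than C4.5 proper, so it is only a stand-in, and the features, labels and rule below are made up for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Invented talent-record features: normalized tenure and appraisal score.
X = rng.random((200, 2))
# Invented promotion rule: some tenure AND a high appraisal score.
y = ((X[:, 0] > 0.3) & (X[:, 1] > 0.5)).astype(int)

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)
accuracy = tree.score(X, y)
print(f"training accuracy: {accuracy:.2f}")
```

    The fitted split thresholds are readable from the tree structure, which is the "interpretable rules" property the abstract highlights.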

  6. Application of the probability-based covering algorithm model in text classification

    Institute of Scientific and Technical Information of China (English)

    ZHOU; Ying

    2009-01-01

    The probability-based covering algorithm (PBCA) is a new algorithm based on probability distribution. It decides, by voting, the class of the tested samples on the border of the coverage area, based on the probability of training samples. When using the original covering algorithm (CA), many tested samples that are located on the border of the coverage cannot be classified by the spherical neighborhood gained. The network structure of PBCA is a mixed structure composed of both a feed-forward network and a feedback network. By using this method of adding some heterogeneous samples and enlarging the coverage radius, it is possible to decrease the number of rejected samples and improve the rate of recognition accuracy. Relevant computer experiments indicate that the algorithm improves the study precision and achieves reasonably good results in text classification.

  7. Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians

    Science.gov (United States)

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y.; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J.; Adatia, Ian

    2016-09-01

    We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74% compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% versus 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and that could be used to screen for PH and encourage earlier specialist referral.
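    The Gaussian-mixture classification step, one model per class compared by log-likelihood, can be sketched as below; well-separated synthetic 2-D features stand in for the paper's mel-frequency cepstral coefficients, and every number here is illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Synthetic feature clusters standing in for MFCC vectors of each class.
ph_train = rng.normal(loc=2.0, scale=1.0, size=(300, 2))
normal_train = rng.normal(loc=-2.0, scale=1.0, size=(300, 2))

# One Gaussian-mixture model per class.
gmm_ph = GaussianMixture(n_components=2, random_state=0).fit(ph_train)
gmm_normal = GaussianMixture(n_components=2, random_state=0).fit(normal_train)

def classify(sample):
    """Assign the sample to whichever class model explains it better."""
    s = np.atleast_2d(sample)
    return "PH" if gmm_ph.score(s) > gmm_normal.score(s) else "normal"

print(classify([2.1, 1.8]), classify([-2.2, -1.9]))
```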

  8. A Region-Based GeneSIS Segmentation Algorithm for the Classification of Remotely Sensed Images

    Directory of Open Access Journals (Sweden)

    Stelios K. Mylonas

    2015-03-01

    Full Text Available This paper proposes an object-based segmentation/classification scheme for remotely sensed images, based on a novel variant of the recently proposed Genetic Sequential Image Segmentation (GeneSIS) algorithm. GeneSIS segments the image in an iterative manner, whereby at each iteration a single object is extracted via a genetic-based object extraction algorithm. Contrary to the previous pixel-based GeneSIS, where the candidate objects to be extracted were evaluated through the fuzzy content of their included pixels, in the newly developed region-based GeneSIS algorithm, a watershed-driven fine segmentation map is initially obtained from the original image, which serves as the basis for the forthcoming GeneSIS segmentation. Furthermore, in order to enhance the spatial search capabilities, we introduce a more descriptive encoding scheme in the object extraction algorithm, where the structural search modules are represented by polygonal shapes. Our objectives in the new framework are posed as follows: enhance the flexibility of the algorithm in extracting more flexible object shapes, assure high level classification accuracies, and reduce the execution time of the segmentation, while at the same time preserving all the inherent attributes of the GeneSIS approach. Finally, exploiting the inherent attribute of GeneSIS to produce multiple segmentations, we also propose two segmentation fusion schemes that operate on the ensemble of segmentations generated by GeneSIS. Our approaches are tested on an urban and two agricultural images. The results show that region-based GeneSIS has considerably lower computational demands compared to the pixel-based one. Furthermore, the suggested methods achieve higher classification accuracies and good segmentation maps compared to a series of existing algorithms.

  9. Algorithms for the Automatic Classification and Sorting of Conifers in the Garden Nursery Industry

    DEFF Research Database (Denmark)

    Petri, Stig

    , resulting in a prototype data acquisition system that can possibly be integrated into a production line (conveyor) system. The developed software includes the necessary functions for acquiring images, normalizing these, extracting features, creating and optimizing classification models, and evaluating...... was used as the basis for evaluating the constructed feature extraction algorithms. Through an analysis of the construction of a machine vision system suitable for classifying and sorting plants, the needs with regard to physical frame, lighting system, camera and software algorithms have been uncovered...

  10. A simulation of remote sensor systems and data processing algorithms for spectral feature classification

    Science.gov (United States)

    Arduini, R. F.; Aherron, R. M.; Samms, R. W.

    1984-01-01

    A computational model of the deterministic and stochastic processes involved in multispectral remote sensing was designed to evaluate the performance of sensor systems and data processing algorithms for spectral feature classification. Accuracy in distinguishing between categories of surfaces or between specific types is developed as a means to compare sensor systems and data processing algorithms. The model allows studies to be made of the effects of variability of the atmosphere and of surface reflectance, as well as the effects of channel selection and sensor noise. Examples of these effects are shown.

  11. Algorithm for optimizing bipolar interconnection weights with applications in associative memories and multitarget classification.

    Science.gov (United States)

    Chang, S; Wong, K W; Zhang, W; Zhang, Y

    1999-08-10

    An algorithm for optimizing a bipolar interconnection weight matrix with the Hopfield network is proposed. The effectiveness of this algorithm is demonstrated by computer simulation and optical implementation. In the optical implementation of the neural network the interconnection weights are biased to yield a nonnegative weight matrix. Moreover, a threshold subchannel is added so that the system can realize, in real time, the bipolar weighted summation in a single channel. Preliminary experimental results obtained from the applications in associative memories and multitarget classification with rotation invariance are shown.
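    For context, the kind of bipolar interconnection weight matrix being optimized can be illustrated with a plain Hebbian Hopfield memory; the paper's optimization algorithm and optical biasing are not reproduced here, and the stored patterns are arbitrary.

```python
import numpy as np

# Two orthogonal bipolar (+1/-1) patterns to store.
patterns = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                     [1, 1, 1, 1, -1, -1, -1, -1]])

# Hebbian interconnection weights: sum of outer products, zero diagonal.
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0)

def recall(state, steps=5):
    """Synchronous bipolar updates until a fixed point (or steps exhausted)."""
    s = np.asarray(state)
    for _ in range(steps):
        nxt = np.where(W @ s >= 0, 1, -1)
        if np.array_equal(nxt, s):
            break
        s = nxt
    return s

noisy = patterns[0].copy()
noisy[0] = -noisy[0]  # corrupt one bit
print(recall(noisy))  # converges back to the stored pattern
```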

  12. Study on the classification algorithm of degree of arteriosclerosis based on fuzzy pattern recognition

    Science.gov (United States)

    Ding, Li; Zhou, Runjing; Liu, Guiying

    2010-08-01

    The pulse wave of the human body contains a large amount of physiological and pathological information, so a classification algorithm for the degree of arteriosclerosis based on fuzzy pattern recognition is studied in this paper. Taking the human pulse wave as the research object, we extract time-domain and frequency-domain characteristics of the pulse signal and select the parameters with a better clustering effect for arteriosclerosis identification. Moreover, the validity of the characteristic parameters is verified by the fuzzy ISODATA clustering method (FISOCM). Finally, the fuzzy pattern recognition system can quantitatively distinguish the degree of arteriosclerosis in patients. By testing 50 samples in the built pulse database, the experimental results show that the algorithm is practical and achieves good classification recognition results.

  13. On the Automated Segmentation of Epicardial and Mediastinal Cardiac Adipose Tissues Using Classification Algorithms.

    Science.gov (United States)

    Rodrigues, Érick Oliveira; Cordeiro de Morais, Felipe Fernandes; Conci, Aura

    2015-01-01

    The quantification of fat depots on the surroundings of the heart is an accurate procedure for evaluating health risk factors correlated with several diseases. However, this type of evaluation is not widely employed in clinical practice due to the required human workload. This work proposes a novel technique for the automatic segmentation of cardiac fat pads. The technique is based on applying classification algorithms to the segmentation of cardiac CT images. Furthermore, we extensively evaluate the performance of several algorithms on this task and discuss which provided better predictive models. Experimental results have shown that the mean accuracy for the classification of epicardial and mediastinal fats was 98.4%, with a mean true positive rate of 96.2%. On average, the Dice similarity index, regarding the segmented patients and the ground truth, was equal to 96.8%. Therefore, our technique has achieved the most accurate results to date for the automatic segmentation of cardiac fats.

  14. Activity recognition in planetary navigation field tests using classification algorithms applied to accelerometer data.

    Science.gov (United States)

    Song, Wen; Ade, Carl; Broxterman, Ryan; Barstow, Thomas; Nelson, Thomas; Warren, Steve

    2012-01-01

    Accelerometer data provide useful information about subject activity in many different application scenarios. For this study, single-accelerometer data were acquired from subjects participating in field tests that mimic tasks that astronauts might encounter in reduced gravity environments. The primary goal of this effort was to apply classification algorithms that could identify these tasks based on features present in their corresponding accelerometer data, where the end goal is to establish methods to unobtrusively gauge subject well-being based on sensors that reside in their local environment. In this initial analysis, six different activities that involve leg movement are classified. The k-Nearest Neighbors (kNN) algorithm was found to be the most effective, with an overall classification success rate of 90.8%.
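    A minimal version of the feature-then-classify pipeline might look like the following; the accelerometer windows are synthetic, the two activity labels are invented, and only two summary features are used, whereas a real system would add spectral and other features.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)

def window_features(signal):
    # Per-window summary features extracted from the raw accelerometer trace.
    return [signal.mean(), signal.std()]

# Synthetic windows: "walking" (oscillatory) vs. "standing" (near-constant).
walking = [window_features(np.sin(np.linspace(0, 20, 100))
                           + rng.normal(0, 0.1, 100)) for _ in range(30)]
standing = [window_features(rng.normal(1.0, 0.05, 100)) for _ in range(30)]

X = np.array(walking + standing)
y = np.array([0] * 30 + [1] * 30)  # 0 = walking, 1 = standing

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
accuracy = knn.score(X, y)
print(f"training accuracy: {accuracy:.2f}")
```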

  15. Granular computing classification algorithms based on distance measures between granules from the view of set.

    Science.gov (United States)

    Liu, Hongbing; Liu, Chunhua; Wu, Chang-an

    2014-01-01

    Granular computing classification algorithms are proposed based on distance measures between two granules from the view of set. Firstly, granules are represented in the forms of hyperdiamond, hypersphere, hypercube, and hyperbox. Secondly, the distance measure between two granules is defined from the view of set, and the union operator between two granules is formed to obtain the granule set including granules with different granularity. Thirdly, the threshold of granularity determines the union between two granules and is used to form the granular computing classification algorithms based on distance measures (DGrC). The benchmark datasets in the UCI Machine Learning Repository are used to verify the performance of DGrC, and experimental results show that DGrC improves testing accuracy.
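    One plausible instance of a set-view distance between hypersphere granules is the gap between the spheres' surfaces, floored at zero when they overlap; this is an illustrative assumption, not the paper's exact definition.

```python
import numpy as np

def granule_distance(g1, g2):
    """Distance between two hypersphere granules, each a (center, radius)
    pair: the gap between surfaces, or 0.0 if the granules overlap."""
    (c1, r1), (c2, r2) = g1, g2
    gap = np.linalg.norm(np.asarray(c1) - np.asarray(c2)) - (r1 + r2)
    return max(gap, 0.0)

a = ((0.0, 0.0), 1.0)
b = ((4.0, 0.0), 1.0)
print(granule_distance(a, b))  # centers 4 apart, radii sum 2 -> gap of 2
```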

  16. Classification performance of a block-compressive sensing algorithm for hyperspectral data processing

    Science.gov (United States)

    Arias, Fernando X.; Sierra, Heidy; Arzuaga, Emmanuel

    2016-05-01

    Compressive Sensing is an area of great recent interest for efficient signal acquisition, manipulation and reconstruction tasks in areas where sensor utilization is a scarce and valuable resource. The current work shows that approaches based on this technology can improve the efficiency of manipulation, analysis and storage processes already established for hyperspectral imagery, with little discernible loss in data performance upon reconstruction. We present the results of a comparative analysis of classification performance between a hyperspectral data cube acquired by traditional means, and one obtained through reconstruction from compressively sampled data points. To obtain a broad measure of the classification performance of compressively sensed cubes, we classify a commonly used scene in hyperspectral image processing algorithm evaluation using a set of five classifiers commonly used in hyperspectral image classification. Global accuracy statistics are presented and discussed, as well as class-specific statistical properties of the evaluated data set.

  17. Analysis of magnetic source localization of P300 using the MUSIC (multiple signal classification) algorithm

    OpenAIRE

    魚橋, 哲夫

    2006-01-01

    The authors studied the localization of P300 magnetic sources using the multiple signal classification (MUSIC) algorithm. Six healthy subjects (aged 24–34 years old) were investigated with 148-channel whole-head type magnetoencephalography using an auditory oddball paradigm in passive mode. The authors also compared six stimulus combinations in order to find the optimal stimulus parameters for the P300 magnetic field (P300m) in passive mode. Bilateral MUSIC peaks were located on the mesial tempora...

  18. Classification of EEG Signals using adaptive weighted distance nearest neighbor algorithm

    OpenAIRE

    E. Parvinnia; M. Sabeti; M. Zolghadri Jahromi; Boostani, R

    2014-01-01

    Electroencephalogram (EEG) signals are often used to diagnose diseases such as seizure, Alzheimer's disease, and schizophrenia. One main problem with the recorded EEG samples is that they are not equally reliable due to artifacts at the time of recording. EEG signal classification algorithms should have a mechanism to handle this issue. It seems that using adaptive classifiers can be useful for biological signals such as EEG. In this paper, a general adaptive method named weighted distance near...

  19. GriMa: a Grid Mining Algorithm for Bag-of-Grid-Based Classification

    OpenAIRE

    Deville, Romain; Fromont, Elisa; Jeudy, Baptiste; Solnon, Christine

    2016-01-01

    International audience; General-purpose exhaustive graph mining algorithms have seldom been used in real life contexts due to the high complexity of the process that is mostly based on costly isomorphism tests and countless expansion possibilities. In this paper, we explain how to exploit grid-based representations of problems to efficiently extract frequent grid subgraphs and create Bag-of-Grids which can be used as new features for classification purposes. We provide an efficient grid minin...

  20. Hybrid SPR algorithm to select predictive genes for effectual cancer classification

    OpenAIRE

    2012-01-01

    Designing an automated system for classifying DNA microarray data is an extremely challenging problem because of its high dimension and low amount of sample data. In this paper, a hybrid statistical pattern recognition algorithm is proposed to reduce the dimensionality and select the predictive genes for the classification of cancer. Colon cancer gene expression profiles having 62 samples of 2000 genes were used for the experiment. A gene subset of 6 highly informative genes was selecte...

  1. TSG: a new algorithm for binary and multi-class cancer classification and informative genes selection

    Directory of Open Access Journals (Sweden)

    Wang Haiyan

    2013-01-01

    Full Text Available Abstract Background One of the challenges in the classification of cancer tissue samples based on gene expression data is to establish an effective method that can select a parsimonious set of informative genes. The Top Scoring Pair (TSP), k-Top Scoring Pairs (k-TSP), Support Vector Machines (SVM), and prediction analysis of microarrays (PAM) are four popular classifiers that have comparable performance on multiple cancer datasets. SVM and PAM tend to use a large number of genes, and TSP and k-TSP always use an even number of genes. In addition, the selection of distinct gene pairs in k-TSP simply combines the pairs of top-ranking genes without considering the fact that the gene set with the best discrimination power may not be the combined pairs. The k-TSP algorithm also requires the user to specify an upper bound for the number of gene pairs. Here we introduce a computational algorithm to address these problems. The algorithm is named the Chi-square-statistic-based Top Scoring Genes (Chi-TSG) classifier, simplified as TSG. Results The TSG classifier starts with the top two genes and sequentially adds additional genes into the candidate gene set to perform informative gene selection. The algorithm automatically reports the total number of informative genes selected with cross validation. We provide the algorithm for both binary and multi-class cancer classification. The algorithm was applied to 9 binary and 10 multi-class gene expression datasets involving human cancers. The TSG classifier outperforms TSP family classifiers by a large margin in most of the 19 datasets. In addition to improved accuracy, our classifier shares all the advantages of the TSP family classifiers, including easy interpretation, invariance to monotone transformations, often selecting a small number of informative genes allowing follow-up studies, and resistance to sampling variations due to within-sample operations. Conclusions Redefining the scores for gene set and the classification rules in TSP family
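    The chi-square scoring idea behind TSG can be sketched with scikit-learn's chi2 ranking on synthetic expression data; the sequential candidate-set growth and cross-validated stopping rule of the actual TSG algorithm are omitted, and the "informative gene" below is planted for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(3)
n_samples, n_genes = 60, 50
X = rng.random((n_samples, n_genes))   # non-negative, as chi2 requires
y = rng.integers(0, 2, n_samples)
# Plant one informative gene: shift gene 7's expression in class-1 samples.
X[y == 1, 7] += 2.0

selector = SelectKBest(chi2, k=2).fit(X, y)
top_genes = np.argsort(selector.scores_)[::-1][:2]
print("top-ranked genes:", top_genes)
```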

  2. Genetic Algorithm Optimized Back Propagation Neural Network for Knee Osteoarthritis Classification

    Directory of Open Access Journals (Sweden)

    Jian WeiKoh

    2014-10-01

    Full Text Available Osteoarthritis (OA) is the most common form of arthritis, caused by degeneration of the articular cartilage that functions as a shock-absorbing cushion in our joints. The joints most commonly affected by osteoarthritis are the hand, hip, spine and knee. Knee osteoarthritis is the focus of this study. These days, the Magnetic Resonance Imaging (MRI) technique is widely applied in diagnosing the progression of osteoarthritis due to its ability to display the contrast between bone and cartilage. Traditionally, interpretation of MR images is done manually by physicians, which is inconsistent and time consuming. Hence, an automated classifier is needed to minimize the processing time of classification. In this study, a genetic-algorithm-optimized neural network technique is used for knee osteoarthritis classification. This classifier consists of four stages: feature extraction by the Discrete Wavelet Transform (DWT), the training stage of the neural network, the testing stage of the neural network, and the optimization stage by a Genetic Algorithm (GA). This technique obtained 98.5% classification accuracy in training and 94.67% in the testing stage. Besides, classification time is reduced by 17.24% after optimization of the neural network.

  3. Automatic classification of pathological gait patterns using ground reaction forces and machine learning algorithms.

    Science.gov (United States)

    Alaqtash, Murad; Sarkodie-Gyan, Thompson; Yu, Huiying; Fuentes, Olac; Brower, Richard; Abdelgawad, Amr

    2011-01-01

    An automated gait classification method is developed in this study, which can be applied to analyze and classify pathological gait patterns using 3D ground reaction force (GRF) data. The study involved the discrimination of the gait patterns of healthy, cerebral palsy (CP) and multiple sclerosis subjects. The acquired 3D GRF data were categorized into three groups. Two different algorithms were used to extract the gait features: the GRF parameters and the discrete wavelet transform (DWT), respectively. A nearest neighbor classifier (NNC) and artificial neural networks (ANN) were also investigated for the classification of the gait features in this study. Furthermore, different feature sets were formed using combinations of the 3D GRF components (mediolateral, anterioposterior, and vertical), and their various impacts on the acquired results were evaluated. The best leave-one-out (LOO) classification accuracy achieved was 85%. The results showed some improvement through the application of a feature selection algorithm based on the M-shaped value of the vertical force and the statistical test ANOVA of the mediolateral and anterioposterior forces. The optimal feature set of six features enhanced the accuracy to 95%. This work can provide an automated gait classification tool that may be useful to the clinician in the diagnosis and identification of pathological gait impairments.

  4. Classification and authentication of unknown water samples using machine learning algorithms.

    Science.gov (United States)

    Kundu, Palash K; Panchariya, P C; Kundu, Madhusree

    2011-07-01

    This paper proposes the development of real-life water sample classification and authentication based on machine learning algorithms. The proposed techniques use experimental measurements from a pulse voltammetry method based on an electronic tongue (E-tongue) instrumentation system with silver and platinum electrodes. E-tongues include arrays of solid-state ion sensors, transducers (even of different types), data collectors and data analysis tools, all oriented toward the classification of liquid samples and the authentication of unknown liquid samples. The time series signal and the corresponding raw data represent the measurement from a multi-sensor system. The E-tongue system, implemented in a laboratory environment for six different ISI (Bureau of Indian Standards) certified water samples (Aquafina, Bisleri, Kingfisher, Oasis, Dolphin, and McDowell), was the data source for developing two types of machine learning algorithms, classification and regression. A water data set consisting of six sample classes with 4402 features was considered. A PCA (principal component analysis) based classification and authentication tool was developed in this study as the machine learning component of the E-tongue system. A proposed partial least squares (PLS) based classifier, dedicated to authenticating a specific category of water sample, evolved as an integral part of the E-tongue instrumentation system. The developed PCA and PLS based E-tongue system delivered encouraging overall authentication accuracy, with excellent performance for the aforesaid categories of water samples.
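    The PCA component of such an E-tongue pipeline might be sketched as follows, with simulated voltammetric response curves in place of real sensor data; the two "brands" and all signal shapes are invented.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
# Two simulated water "brands", each with a characteristic response shape
# plus noise, measured over 100 sensor readings per sample.
base_a = np.sin(np.linspace(0, 3, 100))
base_b = np.cos(np.linspace(0, 3, 100))
X = np.vstack([base_a + rng.normal(0, 0.05, (20, 100)),
               base_b + rng.normal(0, 0.05, (20, 100))])

# Project the 100-dimensional responses onto two principal components;
# the brand difference should dominate the first component.
pca = PCA(n_components=2)
scores = pca.fit_transform(X)
print("explained variance ratio:", pca.explained_variance_ratio_.round(2))
```

    With such well-separated response shapes, the first component carries nearly all the variance, and a simple classifier in the 2-D score space can then separate the brands.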

  5. Evolving Neural Network Using Variable String Genetic Algorithm for Color Infrared Aerial Image Classification

    Institute of Scientific and Technical Information of China (English)

    FU Xiaoyang; P E R Dale; ZHANG Shuqing

    2008-01-01

    Coastal wetlands are characterized by complex patterns in both their geomorphic and ecological features. Besides field observations, it is necessary to analyze the land cover of wetlands through color infrared (CIR) aerial photography or remote sensing images. In this paper, we designed an evolving neural network classifier using a variable string genetic algorithm (VGA) for the land cover classification of CIR aerial images. With the VGA, the classifier we designed is able to automatically evolve the appropriate number of hidden nodes for modeling the neural network topology optimally and to find a near-optimal set of connection weights globally. Then, with the backpropagation algorithm (BP), it can find the best connection weights. The VGA-BP classifier, which is derived from the hybrid algorithms mentioned above, is demonstrated to be effective on CIR image classification. Compared with standard classifiers, such as the Bayes maximum-likelihood classifier, the VGA classifier and the BP-MLP (multi-layer perceptron) classifier, it is shown that the VGA-BP classifier can achieve better performance on high-resolution land cover classification.

  6. Detecting cognitive impairment by eye movement analysis using automatic classification algorithms.

    Science.gov (United States)

    Lagun, Dmitry; Manzanares, Cecelia; Zola, Stuart M; Buffalo, Elizabeth A; Agichtein, Eugene

    2011-09-30

    The Visual Paired Comparison (VPC) task is a recognition memory test that has shown promise for the detection of memory impairments associated with mild cognitive impairment (MCI). Because patients with MCI often progress to Alzheimer's Disease (AD), the VPC may be useful in predicting the onset of AD. VPC uses noninvasive eye tracking to identify how subjects view novel and repeated visual stimuli. Healthy control subjects demonstrate memory for the repeated stimuli by spending more time looking at the novel images, i.e., novelty preference. Here, we report an application of machine learning methods from computer science to improve the accuracy of detecting MCI by modeling eye movement characteristics such as fixations, saccades, and re-fixations during the VPC task. These characteristics are represented as features provided to automatic classification algorithms such as Support Vector Machines (SVMs). Using the SVM classification algorithm, in tandem with modeling the patterns of fixations, saccade orientation, and regression patterns, our algorithm was able to automatically distinguish age-matched normal control subjects from MCI subjects with 87% accuracy, 97% sensitivity and 77% specificity, compared to the best available classification performance of 67% accuracy, 60% sensitivity, and 73% specificity when using only the novelty preference information. These results demonstrate the effectiveness of applying machine-learning techniques to the detection of MCI, and suggest a promising approach for detection of cognitive impairments associated with other disorders.
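    Computing sensitivity and specificity from an SVM's predictions, the metrics reported above, can be sketched as below; the three-dimensional "eye movement" features are synthetic placeholders rather than real fixation or saccade statistics.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(4)
# Synthetic feature vectors for two groups (0 = control, 1 = MCI).
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(2, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="rbf").fit(X, y)
pred = clf.predict(X)

# Sensitivity = true positive rate; specificity = true negative rate.
tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```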

  7. A Hybrid Multiobjective Differential Evolution Algorithm and Its Application to the Optimization of Grinding and Classification

    Directory of Open Access Journals (Sweden)

    Yalin Wang

    2013-01-01

    Full Text Available Grinding-classification is the prerequisite process for full recovery of nonrenewable minerals, with both production quality and quantity objectives concerned. Its natural formulation is a constrained multiobjective optimization problem of complex expression, since the process is composed of one grinding machine and two classification machines. In this paper, a hybrid differential evolution (DE) algorithm with multiple populations is proposed. Some infeasible solutions with better performance are allowed to be saved, and they participate randomly in the evolution. In order to exploit the meaningful infeasible solutions, a functionally partitioned multi-population mechanism is designed to find an optimal solution from all possible directions. Meanwhile, a simplex method for local search is inserted into the evolution process to enhance the searching strategy in the optimization process. Simulation results from tests on some benchmark problems indicate that the proposed algorithm tends to converge quickly and effectively to the Pareto frontier with better distribution. Finally, the proposed algorithm is applied to solve a multiobjective optimization model of a grinding and classification process. Based on the technique for order performance by similarity to ideal solution (TOPSIS), the satisfactory solution is obtained by using a decision-making method for multiple attributes.

  8. Classification of Medical Datasets Using SVMs with Hybrid Evolutionary Algorithms Based on Endocrine-Based Particle Swarm Optimization and Artificial Bee Colony Algorithms.

    Science.gov (United States)

    Lin, Kuan-Cheng; Hsieh, Yi-Hsiu

    2015-10-01

    The classification and analysis of data is an important issue in today's research. Selecting a suitable set of features makes it possible to classify an enormous quantity of data quickly and efficiently. Feature selection is generally viewed as a feature subset selection problem, i.e., a combinatorial optimization problem. Evolutionary algorithms using random search methods have proven highly effective in obtaining solutions to optimization problems in a diversity of applications. In this study, we developed a hybrid evolutionary algorithm based on endocrine-based particle swarm optimization (EPSO) and artificial bee colony (ABC) algorithms in conjunction with a support vector machine (SVM) for the selection of optimal feature subsets for the classification of datasets. The results of experiments using specific UCI medical datasets demonstrate that the accuracy of the proposed hybrid evolutionary algorithm is superior to that of basic PSO, EPSO and ABC algorithms with regard to classification accuracy using subsets with a reduced number of features.
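    The wrapper idea, scoring each candidate feature subset by the cross-validated accuracy of an SVM, can be sketched with plain random search standing in for the EPSO/ABC optimizers; the dataset and its two informative features are invented for illustration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
n, d = 120, 10
X = rng.normal(0, 1, (n, d))
y = (X[:, 0] + X[:, 3] > 0).astype(int)  # only features 0 and 3 matter

def fitness(mask):
    """Cross-validated SVM accuracy of a boolean feature-subset mask."""
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(), X[:, mask], y, cv=3).mean()

best_mask, best_score = None, -1.0
for _ in range(30):  # random search in place of the swarm/colony search
    mask = rng.random(d) < 0.4
    score = fitness(mask)
    if score > best_score:
        best_mask, best_score = mask, score

print("selected features:", np.flatnonzero(best_mask),
      f"accuracy={best_score:.2f}")
```

    EPSO and ABC replace the random proposal step with guided updates, but the fitness evaluation is the same SVM-in-the-loop scoring shown here.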

  9. Classification of frontal cortex haemodynamic responses during cognitive tasks using wavelet transforms and machine learning algorithms.

    Science.gov (United States)

    Abibullaev, Berdakh; An, Jinung

    2012-12-01

    Recent advances in neuroimaging demonstrate the potential of functional near-infrared spectroscopy (fNIRS) for use in brain-computer interfaces (BCIs). fNIRS uses light in the near-infrared range to measure brain surface haemoglobin concentrations and thus determine human neural activity. Our primary goal in this study is to analyse brain haemodynamic responses for application in a BCI. Specifically, we develop an efficient signal processing algorithm to extract important mental-task-relevant neural features and obtain the best possible classification performance. We recorded brain haemodynamic responses due to frontal cortex brain activity from nine subjects using a 19-channel fNIRS system. Our algorithm is based on continuous wavelet transforms (CWTs) for multi-scale decomposition and a soft thresholding algorithm for de-noising. We adopted three machine learning algorithms and compared their performance. Good performance can be achieved by using the de-noised wavelet coefficients as input features for the classifier. Moreover, the classifier performance varied depending on the type of mother wavelet used for wavelet decomposition. Our quantitative results showed that CWTs can be used efficiently to extract important brain haemodynamic features at multiple frequencies if an appropriate mother wavelet function is chosen. The best classification results were obtained by a specific combination of input feature type and classifier.
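    A minimal sketch of the soft-thresholding de-noising step described above, using Fourier coefficients as a stand-in for the record's CWT coefficients (signal, noise level and threshold rule are illustrative assumptions):

    ```python
    import numpy as np

    def soft_threshold(c, t):
        """Shrink coefficients toward zero: sign(c) * max(|c| - t, 0)."""
        return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

    rng = np.random.default_rng(0)
    n = 512
    t_axis = np.arange(n) / n
    clean = np.sin(2 * np.pi * 5 * t_axis)        # toy oscillatory signal
    noisy = clean + 0.5 * rng.normal(size=n)

    # Threshold in a transform domain (rFFT here as a stand-in for CWT).
    coeffs = np.fft.rfft(noisy)
    sigma = 0.5 * np.sqrt(n / 2)                  # per-coefficient noise level
    denoised = np.fft.irfft(soft_threshold(coeffs.real, 3 * sigma)
                            + 1j * soft_threshold(coeffs.imag, 3 * sigma), n)

    mse_noisy = np.mean((noisy - clean) ** 2)
    mse_denoised = np.mean((denoised - clean) ** 2)
    ```
    
    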

  10. Clinical significance of bcl-2 protein expression and classification algorithm in diffuse large B-cell lymphoma

    Institute of Scientific and Technical Information of China (English)

    李敏

    2013-01-01

    Objective To investigate the clinical significance of bcl-2 protein expression and three classification algorithms including Hans model,Chan model and Muris model in patients with diffuse large B-cell lymphoma(DLBCL).

  11. A BENCHMARK TO SELECT DATA MINING BASED CLASSIFICATION ALGORITHMS FOR BUSINESS INTELLIGENCE AND DECISION SUPPORT SYSTEMS

    Directory of Open Access Journals (Sweden)

    Pardeep Kumar

    2012-09-01

    Full Text Available In today’s business scenario, we perceive major changes in how managers use computerized support in making decisions. As more decision-makers use computerized support, decision support systems (DSS) are evolving from their beginnings as personal support tools into a common organizational resource. DSS serve the management, operations and planning levels of an organization and help to make decisions that may be rapidly changing and not easily specified in advance. Data mining plays a vital role in extracting important information to aid decision making in a decision support system, and it has been an active field of research over the last two to three decades. Integration of data mining and decision support systems (DSS) can lead to improved performance and can enable the tackling of new types of problems. Artificial intelligence methods are improving the quality of decision support and have become embedded in many applications, ranging from anti-lock automobile brakes to today's interactive search engines, and they provide various machine learning techniques to support data mining. Classification is one of the main and most valuable tasks of data mining. Several types of classification algorithms have been suggested, tested and compared to determine future trends based on unseen data. No single algorithm has been found to be superior over all others for all data sets. Various issues such as predictive accuracy, training time to build the model, robustness and scalability must be considered and can involve trade-offs, further complicating the quest for an overall superior method. The objective of this paper is to compare various classification algorithms that have been frequently used in data mining for decision support systems. Three decision-tree-based algorithms, one artificial neural network, one statistical method, one support vector machine with and without AdaBoost, and one clustering algorithm are tested and compared on

  12. Kernel Clustering with a Differential Harmony Search Algorithm for Scheme Classification

    Directory of Open Access Journals (Sweden)

    Yu Feng

    2017-01-01

    Full Text Available This paper presents kernel fuzzy clustering with a novel differential harmony search algorithm for diversion scheduling scheme classification. First, we employed a self-adaptive solution generation strategy and a differential evolution-based population update strategy to improve the classical harmony search. Second, we applied the differential harmony search algorithm to kernel fuzzy clustering to help the clustering method obtain better solutions. Finally, the combination of kernel fuzzy clustering and differential harmony search is applied to water diversion scheduling in East Lake. A comparison of the proposed method with other methods has been carried out. The results show that kernel clustering with the differential harmony search algorithm performs well on water diversion scheduling problems.
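    The differential harmony search itself is not spelled out in the abstract; the sketch below shows only the classical harmony search it modifies, minimizing a toy sphere function (parameter values are illustrative assumptions):

    ```python
    import numpy as np

    def harmony_search(f, bounds, hms=10, hmcr=0.9, par=0.3,
                       bw=0.1, iters=2000, seed=1):
        """Classical harmony search (no differential-evolution hybridisation)."""
        rng = np.random.default_rng(seed)
        lo, hi = np.array(bounds).T
        dim = len(bounds)
        memory = rng.uniform(lo, hi, (hms, dim))      # harmony memory
        cost = np.array([f(x) for x in memory])
        for _ in range(iters):
            new = np.empty(dim)
            for d in range(dim):
                if rng.random() < hmcr:               # pick from memory ...
                    new[d] = memory[rng.integers(hms), d]
                    if rng.random() < par:            # ... with pitch adjustment
                        new[d] += bw * rng.uniform(-1, 1)
                else:                                 # ... or improvise at random
                    new[d] = rng.uniform(lo[d], hi[d])
            new = np.clip(new, lo, hi)
            fnew = f(new)
            worst = np.argmax(cost)
            if fnew < cost[worst]:                    # replace worst harmony
                memory[worst], cost[worst] = new, fnew
        best = np.argmin(cost)
        return memory[best], cost[best]

    x_best, f_best = harmony_search(lambda x: float(np.sum(x ** 2)),
                                    bounds=[(-5, 5)] * 2)
    ```
    
    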

  13. Classification decision tree algorithm assisting in diagnosing solitary pulmonary nodule by SPECT/CT fusion imaging

    Institute of Scientific and Technical Information of China (English)

    Qiang Yongqian; Guo Youmin; Jin Chenwang; Liu Min; Yang Aimin; Wang Qiuping; Niu Gang

    2008-01-01

    Objective To develop a classification tree algorithm to improve the diagnostic performance of 99mTc-MIBI SPECT/CT fusion imaging in differentiating solitary pulmonary nodules (SPNs). Methods Forty-four SPNs, including 30 malignant cases and 14 benign ones that were eventually pathologically identified, were included in this prospective study. All patients received 99mTc-MIBI SPECT/CT scanning at an early stage and a delayed stage before operation. Thirty predictor variables, including 11 clinical variables, 4 variables of emission and 15 variables of transmission information from SPECT/CT scanning, were analyzed independently by the classification tree algorithm and by radiological residents. Diagnostic rules were demonstrated in tree topology, and diagnostic performances were compared using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. Results A classification decision tree with a lowest relative cost of 0.340 was developed for 99mTc-MIBI SPECT/CT scanning, in which the target/normal-region ratio of 99mTc-MIBI uptake in the delayed and early stages, age, cough and the spiculation sign were the five most important contributors. The sensitivity and specificity were 93.33% and 78.57%, respectively, slightly higher than those of the expert. The sensitivity and specificity achieved by first-year residents were 76.67% and 28.57%, respectively. The AUC of the CART model and of the expert was 0.886±0.055 and 0.829±0.062, respectively, and the corresponding AUC of the residents was 0.566±0.092. Comparisons of AUCs suggest that the performance of CART was similar to that of the expert (P=0.204), but greater than that of the residents (P<0.001). Conclusion Our data mining technique using a classification decision tree has a much higher accuracy than residents, suggesting that the application of this algorithm will significantly improve the diagnostic performance of residents.
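    The original CART model and patient data are not available; the sketch below illustrates the general idea with a depth-limited decision tree fitted to synthetic data whose three predictors loosely mimic the reported contributors (all distributions and names are assumptions):

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(7)
    n = 200
    # Hypothetical predictors: delayed and early uptake ratios, age.
    malignant = rng.integers(0, 2, n).astype(bool)
    X = np.column_stack([
        rng.normal(np.where(malignant, 2.0, 1.1), 0.3),  # delayed T/N ratio
        rng.normal(np.where(malignant, 1.6, 1.0), 0.3),  # early T/N ratio
        rng.normal(np.where(malignant, 65, 55), 10),     # age
    ])
    y = malignant.astype(int)

    # A shallow tree yields interpretable diagnostic rules in tree topology.
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    train_acc = tree.score(X, y)
    sens = tree.score(X[y == 1], y[y == 1])   # sensitivity on training data
    ```
    
    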

  14. Improved SMO Text Classification Algorithm

    Institute of Scientific and Technical Information of China (English)

    王欣欣; 赖惠成

    2011-01-01

    The support vector machine (SVM) text classification algorithm is widely applied; a special case is the sequential minimal optimization (SMO) algorithm. The SMO algorithm, which uses chunking and decomposition techniques, is simple and easy to implement, but it converges slowly and requires many iterations. The remedy is to improve the working-set selection in the SMO algorithm and to update the step-size factor so that the objective function decreases as much as possible. With this goal, an improved SMO algorithm is proposed to further improve SVM training speed and classification accuracy.
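    The improved SMO variant is not public; as a baseline illustration, the sketch below trains a linear SVM text classifier on a toy corpus (libsvm, which backs scikit-learn's SVC, solves the dual with an SMO-type decomposition method; corpus and labels are invented):

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    # Tiny toy corpus; real experiments would use a benchmark text collection.
    docs = [
        "stock market shares trading profit", "market investors stock prices fell",
        "trading profit quarterly earnings stock", "shares market economy investors",
        "match team goal score league", "players team season league win",
        "goal score football match win", "team coach players match season",
    ]
    labels = ["finance"] * 4 + ["sports"] * 4

    # SVC solves its dual problem with an SMO-type decomposition method.
    model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear", C=1.0))
    model.fit(docs, labels)

    pred = model.predict(["investors watch stock prices",
                          "the team won the match"])
    ```
    
    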

  15. Heterogeneous Ensemble Combination Search Using Genetic Algorithm for Class Imbalanced Data Classification.

    Science.gov (United States)

    Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo

    2016-01-01

    Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble depends on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, uses 10-fold cross-validation on training data to evaluate the quality of each candidate ensemble. To combine the base classifiers' decisions into the ensemble's output, we used the simple and widely used majority voting approach. The proposed algorithm, along with a random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β)-k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmark datasets from the UCI Machine Learning repository, one Alzheimer's disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study, we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction, and we expect the proposed GA-EoC to perform consistently in other cases.
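    A minimal sketch of GA-based ensemble-combination search with majority voting, using simulated base-classifier predictions instead of trained classifiers and plain accuracy instead of the record's 10-fold cross-validated fitness (all rates and GA settings are assumptions):

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    n_samples, pool = 300, 7
    y = rng.integers(0, 2, n_samples)

    # Simulated base-classifier predictions: y flipped with independent
    # per-classifier error rates.
    error_rates = np.array([0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.45])
    preds = np.array([np.where(rng.random(n_samples) < e, 1 - y, y)
                      for e in error_rates])

    def ensemble_accuracy(mask):
        if not mask.any():
            return 0.0
        votes = preds[mask].mean(axis=0) > 0.5     # majority vote (ties -> 0)
        return float((votes.astype(int) == y).mean())

    # Simple GA over classifier-subset bitmasks.
    pop = rng.random((20, pool)) < 0.5
    for _ in range(40):
        fit = np.array([ensemble_accuracy(m) for m in pop])
        order = np.argsort(fit)[::-1]
        parents = pop[order[:10]]                  # truncation selection
        children = []
        for _ in range(10):
            a, b = parents[rng.integers(10, size=2)]
            child = np.where(rng.random(pool) < 0.5, a, b)  # uniform crossover
            flip = rng.random(pool) < 0.1                   # bit-flip mutation
            children.append(child ^ flip)
        pop = np.vstack([parents] + children)

    fit = np.array([ensemble_accuracy(m) for m in pop])
    best_mask = pop[np.argmax(fit)]
    best_acc = fit.max()
    ```
    
    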

  16. Optimal Combination of Classification Algorithms and Feature Ranking Methods for Object-Based Classification of Submeter Resolution Z/I-Imaging DMC Imagery

    OpenAIRE

    Fulgencio Cánovas-García; Francisco Alonso-Sarría

    2015-01-01

    Object-based image analysis allows several different features to be calculated for the resulting objects. However, a large number of features means longer computing times and might even result in a loss of classification accuracy. In this study, we use four feature ranking methods (maximum correlation, average correlation, Jeffries–Matusita distance and mean decrease in the Gini index) and five classification algorithms (linear discriminant analysis, naive Bayes, weighted k-nearest neighbors,...

  17. Ensemble learning algorithms for classification of mtDNA into haplogroups.

    Science.gov (United States)

    Wong, Carol; Li, Yuran; Lee, Chih; Huang, Chun-Hsi

    2011-01-01

    Classification of mitochondrial DNA (mtDNA) into its respective haplogroups makes it possible to address various anthropological and forensic questions. Unique to mtDNA are its abundance and non-recombining uniparental mode of inheritance; consequently, mutations are the only changes observed in the genetic material. These individual mutations are classified into cladistic haplogroups, allowing the tracing of different genetic branch points in the evolution of humans (and other organisms). Due to the large number of samples, it becomes necessary to automate the classification process. Using 5-fold cross-validation, we investigated two classification techniques on the consented database of 21,141 samples published by the Genographic project. The support vector machine (SVM) algorithm achieved a macro-accuracy of 88.06% and micro-accuracy of 96.59%, while the random forest (RF) algorithm achieved a macro-accuracy of 87.35% and micro-accuracy of 96.19%. In addition to being faster and more memory-economical in making predictions, SVM and RF are better than or comparable to the nearest-neighbor method employed by the Genographic project in terms of prediction accuracy.

  18. SMOTE_EASY: AN ALGORITHM TO TREAT THE CLASSIFICATION ISSUE IN REAL DATABASES

    Directory of Open Access Journals (Sweden)

    Hugo Leonardo Pereira Rufino

    2016-04-01

    Full Text Available Most classification tools assume that the data distribution is balanced or that misclassification costs are similar. Nevertheless, in practice, databases with unbalanced classes are commonplace, as in the diagnosis of diseases, where confirmed cases are usually rare compared with the healthy population. Other examples are the detection of fraudulent calls and of system intruders. In these cases, improperly classifying a minority class (for instance, diagnosing a person with cancer as healthy) may have more serious consequences than incorrectly classifying a majority class. Therefore, it is important to treat databases in which unbalanced classes occur. This paper presents the SMOTE_Easy algorithm, which can classify data even when there is a high level of imbalance between the classes. To demonstrate its efficiency, it was compared with the main algorithms for classifying unbalanced data, and this process was successful in nearly all tested databases.
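    SMOTE_Easy itself is not reproduced in the abstract; the sketch below implements the classic SMOTE step it builds on, interpolating synthetic minority samples between nearest minority-class neighbours (data and parameters are illustrative):

    ```python
    import numpy as np

    def smote(X_min, n_new, k=5, seed=0):
        """Generate n_new synthetic minority samples by interpolating each
        chosen sample with one of its k nearest minority-class neighbours."""
        rng = np.random.default_rng(seed)
        d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(d2, np.inf)
        nn = np.argsort(d2, axis=1)[:, :k]        # k nearest neighbours
        synth = np.empty((n_new, X_min.shape[1]))
        for i in range(n_new):
            j = rng.integers(len(X_min))          # random minority sample
            nb = X_min[nn[j, rng.integers(k)]]    # one of its neighbours
            synth[i] = X_min[j] + rng.random() * (nb - X_min[j])
        return synth

    rng = np.random.default_rng(1)
    X_major = rng.normal(0, 1, (100, 2))
    X_minor = rng.normal(3, 1, (10, 2))
    X_new = smote(X_minor, n_new=90)              # balance the classes
    X_balanced = np.vstack([X_major, X_minor, X_new])
    ```
    
    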

  19. Defining and evaluating classification algorithm for high-dimensional data based on latent topics.

    Directory of Open Access Journals (Sweden)

    Le Luo

    Full Text Available Automatic text categorization is one of the key techniques in information retrieval and data mining. Classification is usually time-consuming when the training dataset is large and high-dimensional. Many methods have been proposed to solve this problem, but few achieve satisfactory efficiency. In this paper, we present a method that combines the Latent Dirichlet Allocation (LDA) algorithm and the Support Vector Machine (SVM). LDA is first used to generate a reduced-dimensional representation of topics as features in the vector space model (VSM). It reduces the number of features dramatically while keeping the necessary semantic information. The SVM is then employed to classify the data based on the generated features. We evaluate the algorithm on the 20 Newsgroups and Reuters-21578 datasets. The experimental results show that classification based on our proposed LDA+SVM model achieves high performance in terms of precision, recall and F1 measure, and does so within a much shorter time-frame. Our approach improves greatly upon previous work in this field and shows strong potential to achieve a streamlined classification process for a wide range of applications.
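    A minimal sketch of the LDA+SVM pipeline on a toy corpus: documents are mapped to a low-dimensional topic space, which then feeds a linear SVM (corpus, topic count and labels are illustrative assumptions, not the 20 Newsgroups setup):

    ```python
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.svm import SVC

    docs = [
        "stock market trading shares profit", "market economy stock investors",
        "investors shares profit trading", "economy market prices stock",
        "team goal match league score", "players season team match win",
        "football goal score match team", "league players win season goal",
    ]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]

    counts = CountVectorizer().fit_transform(docs)
    # Reduce the bag-of-words space to a small number of latent topics.
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    topic_features = lda.fit_transform(counts)    # shape: (n_docs, n_topics)

    # Classify in the reduced topic space instead of the full term space.
    clf = SVC(kernel="linear").fit(topic_features, labels)
    train_acc = clf.score(topic_features, labels)
    ```
    
    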

  20. Aims & Scope

    Institute of Scientific and Technical Information of China (English)

    2014-01-01

    Asian Pacific Journal of Tropical Biomedicine (APJTB) aims to set up and provide an international academic communication platform for physicians, medical scientists, allied health scientists and public health workers, especially those in the Asian Pacific region and worldwide, on tropical biomedicine, infectious diseases and public health, and to meet the growing challenges of understanding, preventing and controlling the dramatic global emergence and reemergence of infectious diseases in the Asian Pacific region.

  1. Feasibility of Genetic Algorithm for Textile Defect Classification Using Neural Network

    Directory of Open Access Journals (Sweden)

    Md. Tarek Habib

    2012-07-01

    Full Text Available The global market for the textile industry is highly competitive nowadays. Quality control in the production process has been a key factor for survival in such a competitive market. Automated textile inspection systems are very useful in this respect, because manual inspection is time-consuming and not accurate enough. Hence, automated textile inspection systems have been attracting considerable attention from researchers in different countries as a replacement for manual inspection. Defect detection and defect classification are the two major problems posed in research on automated textile inspection systems. In this paper, we perform an extensive investigation of the applicability of a genetic algorithm (GA) to textile defect classification using a neural network (NN). We observe the effect of tuning different network parameters and explain the reasons. We empirically find a suitable NN model in the context of textile defect classification and compare the performance of this model with that of the classification models implemented by others.

  2. Contact-state classification in human-demonstrated robot compliant motion tasks using the boosting algorithm.

    Science.gov (United States)

    Cabras, Stefano; Castellanos, María Eugenia; Staffetti, Ernesto

    2010-10-01

    Robot programming by demonstration is a robot programming paradigm in which a human operator directly demonstrates the task to be performed. In this paper, we focus on programming by demonstration of compliant motion tasks, which are tasks that involve contacts between an object manipulated by the robot and the environment in which it operates. Critical issues in this paradigm are to distinguish essential actions from those that are not relevant for the correct execution of the task and to transform this information into a robot-independent representation. Essential actions in compliant motion tasks are the contacts that take place, and therefore, it is important to understand the sequence of contact states that occur during a demonstration, called contact classification or contact segmentation. We propose a contact classification algorithm based on a supervised learning algorithm, in particular on a stochastic gradient boosting algorithm. The approach described in this paper is accurate and does not depend on the geometric model of the objects involved in the demonstration. It neither relies on the kinestatic model of the contact interactions nor on the contact state graph, whose computation is usually of prohibitive complexity even for very simple geometric object models.
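    The record's stochastic gradient boosting classifier can be sketched with scikit-learn's implementation on synthetic force/torque-style data (contact states, feature distributions and parameters are invented for illustration; `subsample < 1.0` is what makes the boosting stochastic):

    ```python
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier

    rng = np.random.default_rng(5)
    # Synthetic force/torque snapshots for three hypothetical contact states:
    # free motion, vertex-face contact, edge-face contact.
    means = np.array([[0.0, 0.0, 0.0], [4.0, 0.5, 1.0], [2.0, 3.0, 0.5]])
    X = np.vstack([rng.normal(m, 0.6, (60, 3)) for m in means])
    y = np.repeat([0, 1, 2], 60)

    # Stochastic gradient boosting: each stage fits a shallow tree on a
    # random subsample of the training data.
    clf = GradientBoostingClassifier(n_estimators=100, subsample=0.7,
                                     random_state=0).fit(X, y)
    train_acc = clf.score(X, y)
    ```
    
    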

  3. Operational algorithm for ice-water classification on dual-polarized RADARSAT-2 images

    Science.gov (United States)

    Zakhvatkina, Natalia; Korosov, Anton; Muckenhuber, Stefan; Sandven, Stein; Babiker, Mohamed

    2017-01-01

    Synthetic Aperture Radar (SAR) data from RADARSAT-2 (RS2) in dual-polarization mode provide additional information for discriminating sea ice and open water compared to single-polarization data. We have developed an automatic algorithm based on dual-polarized RS2 SAR images to distinguish open water (rough and calm) and sea ice. Several technical issues inherent in RS2 data were solved in the pre-processing stage, including thermal noise reduction in the HV polarization and correction of the angular backscatter dependency in the HH polarization. Texture features were explored and used as input to supervised image classification based on the support vector machine (SVM) approach. The study was conducted in the ice-covered area between Greenland and Franz Josef Land. The algorithm was trained on 24 RS2 scenes acquired during winter months in 2011 and 2012, and the results were validated against manually derived ice charts of the Norwegian Meteorological Institute. The algorithm was applied to a total of 2705 RS2 scenes obtained from 2013 to 2015, and the validation results showed that the average classification accuracy was 91 ± 4%.

  4. Spectral areas and ratios classifier algorithm for pancreatic tissue classification using optical spectroscopy

    Science.gov (United States)

    Chandra, Malavika; Scheiman, James; Simeone, Diane; McKenna, Barbara; Purdy, Julianne; Mycek, Mary-Ann

    2010-01-01

    Pancreatic adenocarcinoma is one of the leading causes of cancer death, in part because of the inability of current diagnostic methods to reliably detect early-stage disease. We present the first assessment of the diagnostic accuracy of algorithms developed for pancreatic tissue classification using data from fiber-optic probe-based bimodal optical spectroscopy, a real-time approach that would be compatible with minimally invasive diagnostic procedures for early cancer detection in the pancreas. A total of 96 fluorescence and 96 reflectance spectra were considered from 50 freshly excised tissue sites, including human pancreatic adenocarcinoma, chronic pancreatitis (inflammation) and normal tissues, on nine patients. Classification algorithms using linear discriminant analysis were developed to distinguish among tissues, and leave-one-out cross-validation was employed to assess the classifiers' performance. The spectral areas and ratios classifier (SpARC) algorithm employs a combination of reflectance and fluorescence data and had the best performance, with sensitivity, specificity, negative predictive value and positive predictive value for correctly identifying adenocarcinoma of 85, 89, 92 and 80%, respectively.
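    A minimal sketch of the evaluation recipe described above: linear discriminant analysis scored with leave-one-out cross-validation, on synthetic stand-ins for the spectral area/ratio features (class means and sample counts are invented):

    ```python
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    rng = np.random.default_rng(9)
    # Hypothetical spectral features (e.g. areas and ratios of fluorescence
    # and reflectance bands) for three synthetic tissue classes.
    means = np.array([[1.0, 0.2, 0.5], [0.3, 1.1, 0.4], [0.6, 0.5, 1.2]])
    X = np.vstack([rng.normal(m, 0.15, (17, 3)) for m in means])
    y = np.repeat([0, 1, 2], 17)

    lda = LinearDiscriminantAnalysis()
    # Leave-one-out cross-validation, as in small-sample tissue studies.
    scores = cross_val_score(lda, X, y, cv=LeaveOneOut())
    loocv_acc = scores.mean()
    ```
    
    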

  5. Classification accuracy of algorithms for blood chemistry data for three aquaculture-affected marine fish species.

    Science.gov (United States)

    Coz-Rakovac, R; Topic Popovic, N; Smuc, T; Strunjak-Perovic, I; Jadan, M

    2009-11-01

    The objective of this study was the determination and discrimination of blood chemistry data among three aquaculture-affected marine fish species (sea bass, Dicentrarchus labrax; sea bream, Sparus aurata L.; and mullet, Mugil spp.) based on machine-learning methods. The machine-learning approach yields more usable classification solutions and provides better insight into the collected data. So far, these methods have been applied to the problem of discriminating blood chemistry data with respect to the season and feed of a single species. This is the first time these classification algorithms have been used as a framework for rapid differentiation among three fish species. Among the machine-learning methods used, decision trees provided the clearest model, which correctly classified 210 samples (85.71%), incorrectly classified 35 samples (14.29%), and clearly identified the three investigated species from their biochemical traits.

  6. Aims & Scope

    Institute of Scientific and Technical Information of China (English)

    2015-01-01

    Asian Pacific Journal of Tropical Biomedicine (APJTB) aims to set up and provide an international academic communication platform for physicians, medical scientists, allied health scientists and public health workers, especially those in the Asian Pacific region and worldwide, on tropical biomedicine, infectious diseases and public health, and to meet the growing challenges of understanding, preventing and controlling the dramatic global emergence and reemergence of infectious diseases in the Asian Pacific region.

  7. Aims & Scope

    Institute of Scientific and Technical Information of China (English)

    2013-01-01

    Asian Pacific Journal of Tropical Biomedicine (APJTB) aims to set up and provide an international academic communication platform for physicians, medical scientists, allied health scientists and public health workers, especially those in the Asian Pacific region and worldwide, on tropical biomedicine, infectious diseases and public health, and to meet the growing challenges of understanding, preventing and controlling the dramatic global emergence and reemergence of infectious diseases in the Asian Pacific region.

  8. PMSVM: An Optimized Support Vector Machine Classification Algorithm Based on PCA and Multilevel Grid Search Methods

    Directory of Open Access Journals (Sweden)

    Yukai Yao

    2015-01-01

    Full Text Available We propose an optimized support vector machine classifier, named PMSVM, in which system normalization, PCA and multilevel grid search methods are comprehensively applied for data preprocessing and parameter optimization, respectively. The main goals of this study are to improve the classification efficiency and accuracy of SVM. Sensitivity, specificity, precision, the ROC curve and other metrics are adopted to appraise the performance of PMSVM. Experimental results show that PMSVM has relatively better accuracy and remarkably higher efficiency compared with traditional SVM algorithms.
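    A rough sketch of the PMSVM recipe with scikit-learn building blocks: normalization, PCA and a grid search over SVM parameters (a single coarse grid here; the record's multilevel search would refine the grid around the best cell in further passes; dataset and grid values are illustrative):

    ```python
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    pipe = Pipeline([
        ("scale", StandardScaler()),      # normalization
        ("pca", PCA(n_components=2)),     # dimensionality reduction
        ("svm", SVC()),
    ])

    # Coarse grid; a multilevel search would refine around the best cell.
    grid = {"svm__C": [0.1, 1, 10, 100], "svm__gamma": [0.01, 0.1, 1]}
    search = GridSearchCV(pipe, grid, cv=5).fit(X, y)
    best_score = search.best_score_
    ```
    
    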

  9. Gastric Cancer Risk Analysis in Unhealthy Habits Data with Classification Algorithms

    OpenAIRE

    2015-01-01

    Data mining methods are applied to a medical task that seeks information about the influence of Helicobacter pylori on increased gastric cancer risk by analysing adverse factors of individual lifestyle. During data pre-processing, the data are cleaned of noise, reduced in dimensionality, transformed for the task and cleared of non-informative attributes. Data classification using the C4.5, CN2 and k-nearest-neighbour algorithms is carried out...

  10. Classification decision tree algorithm assisting in diagnosing solitary pulmonary nodule by SPECT/CT fusion imaging

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    Objective To develop a classification tree algorithm to improve diagnostic performances of 99mTc-MIBI SPECT/CT fusion imaging in differentiating solitary pulmonary nodules (SPNs). Methods Forty-four SPNs, including 30 malignant cases and 14 benign ones that were eventually pathologically identified, were included in this prospective study. All patients received 99Tcm-MIBI SPECT/CT scanning at an early stage and a delayed stage before operation. Thirty predictor variables, including 11 clinical variables, 4 variable...

  11. A classification system based on a new wrapper feature selection algorithm for the diagnosis of primary and secondary polycythemia.

    Science.gov (United States)

    Korfiatis, Vasileios Ch; Asvestas, Pantelis A; Delibasis, Konstantinos K; Matsopoulos, George K

    2013-12-01

    Primary and Secondary Polycythemia are diseases of the bone marrow that affect the blood's composition and prohibit patients from becoming blood donors. Since these diseases may become fatal, their early diagnosis is important. In this paper, a classification system for the diagnosis of Primary and Secondary Polycythemia is proposed. The proposed system classifies input data into three classes: Healthy, Primary Polycythemic (PP) and Secondary Polycythemic (SP), and is implemented using two separate binary classification levels. The first level performs the Healthy/non-Healthy classification and the second level the PP/SP classification. To this end, a novel wrapper feature selection algorithm, called the LM-FM algorithm, is presented in order to maximize the classifier's performance. The algorithm comprises two stages that are applied sequentially: the Local Maximization (LM) stage and the Floating Maximization (FM) stage. The LM stage finds the best possible subset of a fixed predefined size, which is then used as an input for the next stage. The FM stage uses a floating-size technique to search for an even better solution by varying the initially provided subset size. The Support Vector Machine (SVM) classifier is then used for the discrimination of the data at each classification level. The proposed classification system is compared with various well-established feature selection techniques such as the Sequential Floating Forward Selection (SFFS) and Maximum Output Information (MOI) wrapper schemes, and with standalone classification techniques such as the Multilayer Perceptron (MLP) and the SVM classifier. The proposed LM-FM feature selection algorithm combined with the SVM classifier increases the overall performance of the classification system, scoring up to 98.9% overall accuracy at the first classification level and up to 96.6% at the second. Moreover, it provides excellent robustness regardless of the size of the input feature
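    The LM-FM algorithm itself is not public; the sketch below shows a generic wrapper feature-selection step of the same family, greedy forward selection with an SVM and cross-validated scoring on synthetic data (feature distributions and the selector used are assumptions):

    ```python
    import numpy as np
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.svm import SVC

    rng = np.random.default_rng(2)
    n = 120
    y = rng.integers(0, 2, n)
    # Two informative blood-count-like features plus six noise features.
    informative = np.column_stack([
        rng.normal(np.where(y == 1, 1.5, 0.0), 0.7),
        rng.normal(np.where(y == 1, -1.2, 0.0), 0.7),
    ])
    X = np.hstack([informative, rng.normal(0, 1, (n, 6))])

    # Greedy forward wrapper selection with an SVM as the induction algorithm.
    sfs = SequentialFeatureSelector(SVC(kernel="linear"),
                                    n_features_to_select=2,
                                    direction="forward", cv=5).fit(X, y)
    selected = np.flatnonzero(sfs.get_support())
    ```
    
    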

  12. Genetic Algorithms and Classification Trees in Feature Discovery: Diabetes and the NHANES database

    Energy Technology Data Exchange (ETDEWEB)

    Heredia-Langner, Alejandro; Jarman, Kristin H.; Amidan, Brett G.; Pounds, Joel G.

    2013-09-01

    This paper presents a feature selection methodology that can be applied to datasets containing a mixture of continuous and categorical variables. Using a Genetic Algorithm (GA), this method explores a dataset and selects a small set of features relevant for the prediction of a binary (1/0) response. Binary classification trees and an objective function based on conditional probabilities are used to measure the fitness of a given subset of features. The method is applied to health data in order to find factors useful for the prediction of diabetes. Results show that our algorithm is capable of narrowing down the set of predictors to around 8 factors that can be validated using reputable medical and public health resources.

  13. Code Syntax-Comparison Algorithm Based on Type-Redefinition-Preprocessing and Rehash Classification

    Directory of Open Access Journals (Sweden)

    Baojiang Cui

    2011-08-01

    Full Text Available Code comparison technology plays an important role in software security protection and plagiarism detection. Nowadays, there are five main approaches to plagiarism detection: file-attribute-based, text-based, token-based, syntax-based and semantic-based. The first three approaches have their own limitations, while the syntax-based technique suffers from limited detection ability and low efficiency, so none of these approaches meets the requirements of large-scale software plagiarism detection. Based on our prior research, we propose an algorithm for type-redefinition plagiarism detection, which can detect simple type redefinition, repeating-pattern redefinition and redefinition of types with pointers. Besides, this paper also proposes a code syntax-comparison algorithm based on rehash classification, which enhances the node storage structure of the syntax tree and greatly improves efficiency.
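    The rehash classification scheme is not detailed in the abstract; a simplified stand-in is to hash the identifier-free shape of the syntax tree, so that copies that merely rename variables fall into the same hash bucket (hash choice and normalization are assumptions):

    ```python
    import ast
    import hashlib

    def syntax_fingerprint(source):
        """Hash the syntax-tree shape of a code fragment, ignoring identifier
        names, so renamed copies collide into the same bucket."""
        tree = ast.parse(source)
        shape = [type(node).__name__ for node in ast.walk(tree)]
        return hashlib.sha256(" ".join(shape).encode()).hexdigest()

    original = "def f(a, b):\n    total = a + b\n    return total\n"
    renamed = "def g(x, y):\n    s = x + y\n    return s\n"
    different = "def h(a, b):\n    return a * b * 2\n"

    # Renaming does not change the tree shape; restructuring does.
    same = syntax_fingerprint(original) == syntax_fingerprint(renamed)
    diff = syntax_fingerprint(original) != syntax_fingerprint(different)
    ```
    
    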

  14. Generation of a Supervised Classification Algorithm for Time-Series Variable Stars with an Application to the LINEAR Dataset

    CERN Document Server

    Johnston, Kyle B

    2016-01-01

    With the advent of digital astronomy, new benefits and new problems have been presented to the modern-day astronomer. While data can be captured more efficiently and accurately using digital means, the efficiency of data retrieval has led to an overload of scientific data for processing and storage. This paper focuses on the construction and application of a supervised pattern classification algorithm for the identification of variable stars. Given the reduction of a survey of stars into a standard feature space, the problem of using prior patterns to identify newly observed patterns can be reduced to time-tested classification methodologies and algorithms. Such supervised methods, so called because the user trains the algorithms prior to application using patterns with known classes or labels, provide a means to probabilistically determine the estimated class type of new observations. This paper will demonstrate the construction and application of a supervised classification algorithm on variable sta...

  15. Optimal Combination of Classification Algorithms and Feature Ranking Methods for Object-Based Classification of Submeter Resolution Z/I-Imaging DMC Imagery

    Directory of Open Access Journals (Sweden)

    Fulgencio Cánovas-García

    2015-04-01

    Full Text Available Object-based image analysis allows several different features to be calculated for the resulting objects. However, a large number of features means longer computing times and might even result in a loss of classification accuracy. In this study, we use four feature ranking methods (maximum correlation, average correlation, Jeffries–Matusita distance and mean decrease in the Gini index) and five classification algorithms (linear discriminant analysis, naive Bayes, weighted k-nearest neighbors, support vector machines and random forest). The objective is to discover the optimal algorithm and feature subset that maximize accuracy when classifying a set of 1,076,937 objects, produced by the prior segmentation of a 0.45-m resolution multispectral image, with 356 features calculated for each object. The study area is both large (9070 ha) and diverse, which increases the generalizability of the results. The mean decrease in the Gini index was found to be the feature ranking method that provided the highest accuracy for all of the classification algorithms. In addition, support vector machines and random forest obtained the highest classification accuracy, both using their default parameters. This is a useful result that could be taken into account in the processing of high-resolution images of large and diverse areas to obtain a land cover classification.
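
    The "mean decrease in the Gini index" ranking used above can be illustrated with a toy sketch: each synthetic feature is scored by the Gini impurity decrease of its best single-threshold split, a single-split simplification of the random-forest importance. The data, feature count, and effect sizes are invented for the example, not taken from the study.

    ```python
    import numpy as np

    def gini(y):
        """Gini impurity of a label vector."""
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def gini_decrease(x, y):
        """Best single-threshold Gini impurity decrease for one feature."""
        best = 0.0
        for t in np.unique(x)[:-1]:
            left, right = y[x <= t], y[x > t]
            w = len(left) / len(y)
            best = max(best, gini(y) - (w * gini(left) + (1 - w) * gini(right)))
        return best

    rng = np.random.default_rng(0)
    n = 200
    y = rng.integers(0, 2, n)
    X = rng.normal(size=(n, 5))
    X[:, 2] += 2.0 * y          # strongly informative feature
    X[:, 4] += 0.5 * y          # weakly informative feature

    scores = np.array([gini_decrease(X[:, j], y) for j in range(X.shape[1])])
    ranking = np.argsort(scores)[::-1]  # best feature first
    print(ranking, scores.round(3))
    ```

    A full random-forest importance averages such decreases over all splits of many trees; the single-split score above is only the core idea.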

  16. Fuzzy Classification of Ocean Color Satellite Data for Bio-optical Algorithm Constituent Retrievals

    Science.gov (United States)

    Campbell, Janet W.

    1998-01-01

    The ocean has traditionally been viewed as a two-class system. Morel and Prieur (1977) classified ocean water according to the dominant absorbent particle suspended in the water column. Case 1 water is described as having a high concentration of phytoplankton (and detritus) relative to other particles. Conversely, case 2 water is described as having inorganic particles, such as suspended sediments, in high concentrations. Little work has gone into the problem of mixing bio-optical models for these different water types. An approach is put forth here to blend bio-optical algorithms based on a fuzzy classification scheme. This scheme involves two procedures. First, a clustering procedure identifies classes and builds class statistics from in-situ optical measurements. Next, a classification procedure assigns satellite pixels partial memberships to these classes based on their ocean color reflectance signature. These membership assignments can be used as the basis for weighting retrievals from class-specific bio-optical algorithms. This technique is demonstrated with in-situ optical measurements and an image from the SeaWiFS ocean color satellite.
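
    The blending step can be sketched as follows; the class statistics, Gaussian kernel width, and toy "class-specific algorithms" are all invented for illustration, standing in for the clustered in-situ optical statistics and real bio-optical retrievals.

    ```python
    import numpy as np

    # Hypothetical class statistics: mean reflectance signatures in 3 bands
    class_means = np.array([[0.05, 0.03, 0.02],   # "case 1"-like signature
                            [0.10, 0.09, 0.08]])  # "case 2"-like signature
    sigma = 0.02  # invented kernel width

    def memberships(pixel):
        """Gaussian-kernel fuzzy memberships, normalized to sum to 1."""
        d2 = np.sum((class_means - pixel) ** 2, axis=1)
        w = np.exp(-d2 / (2 * sigma ** 2))
        return w / w.sum()

    def blended_retrieval(pixel, algorithms):
        """Weight class-specific retrievals by the pixel's fuzzy memberships."""
        m = memberships(pixel)
        return np.sum(m * np.array([f(pixel) for f in algorithms]))

    # Toy class-specific retrieval algorithms (purely illustrative)
    algos = [lambda p: 10.0 * p[0],        # "case 1" retrieval
             lambda p: 5.0 * p[0] + 1.0]   # "case 2" retrieval

    pixel = np.array([0.05, 0.03, 0.02])   # matches class 1's signature
    m = memberships(pixel)
    print(m, blended_retrieval(pixel, algos))
    ```

    A pixel between the two signatures would receive partial memberships in both classes, and the blended retrieval would vary smoothly rather than jumping at a hard class boundary.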

  17. A HYBRID CLASSIFICATION ALGORITHM TO CLASSIFY ENGINEERING STUDENTS’ PROBLEMS AND PERKS

    Directory of Open Access Journals (Sweden)

    Mitali Desai

    2016-03-01

    Full Text Available Social networking sites have brought a new horizon for expressing the views and opinions of individuals. Moreover, they provide a medium for students to share their sentiments, including struggles and joy, during the learning process. Such informal information is a valuable resource for decision making. The large and growing scale of this information calls for automatic classification techniques. Sentiment analysis is one such automated technique for classifying large data. Existing predictive sentiment analysis techniques are widely used to classify reviews on e-commerce sites to provide business intelligence. However, they are not very useful for drawing decisions in the education system, since they classify sentiments into merely three pre-set categories: positive, negative and neutral. Moreover, classifying students’ sentiments into positive or negative categories does not provide deeper insight into their problems and perks. In this paper, we propose a novel Hybrid Classification Algorithm to classify engineering students’ sentiments. Unlike traditional predictive sentiment analysis techniques, the proposed algorithm makes the sentiment analysis process descriptive. Moreover, it classifies engineering students’ perks, in addition to problems, into several categories to help future students and the education system in decision making.

  18. Fast Algorithm for Vectorcardiogram and Interbeat Intervals Analysis: Application for Premature Ventricular Contractions Classification

    Directory of Open Access Journals (Sweden)

    Irena Jekova

    2005-12-01

    Full Text Available In this study we investigated the adequacy of two non-orthogonal ECG leads from Holter recordings to provide reliable vectorcardiogram (VCG) parameters. The VCG loop was constructed using the QRS samples in a fixed-size window around the fiducial point. We developed an algorithm for fast approximation of the VCG loop, estimation of its area and calculation of relative VCG characteristics, which are expected to be minimally dependent on patient individuality and ECG recording conditions. Moreover, in order to obtain temporal QRS characteristics independent of the heart rate, we introduced a parameter estimating the differences between interbeat RR intervals. The statistical assessment of the proposed VCG and RR interval parameters showed distinguishable distributions for N and PVC beats. The reliability of the extracted parameter set for PVC detection was estimated independently with two classification methods, a stepwise discriminant analysis and a decision-tree-like classification algorithm, using the publicly available MIT-BIH arrhythmia database. The stepwise discriminant analysis achieved a sensitivity of 91% and a specificity of 95.6%, while the decision-tree-like technique assured a sensitivity of 93.3% and a specificity of 94.6%. We suggest possibilities for accuracy improvement, including adequate electrode placement of the Holter leads, supplementary analysis of the type of the predominant beats in the reference VCG matrix and a smaller step for the VCG loop approximation.

  19. HaploGrep: a fast and reliable algorithm for automatic classification of mitochondrial DNA haplogroups.

    Science.gov (United States)

    Kloss-Brandstätter, Anita; Pacher, Dominic; Schönherr, Sebastian; Weissensteiner, Hansi; Binna, Robert; Specht, Günther; Kronenberg, Florian

    2011-01-01

    An ongoing source of controversy in mitochondrial DNA (mtDNA) research is the detection of numerous errors in mtDNA profiles that have led to erroneous conclusions and false disease associations. Most of these controversies could be avoided if the samples' haplogroup status were taken into consideration. Knowing the mtDNA haplogroup affiliation is a critical prerequisite for studying mechanisms of human evolution and discovering genes involved in complex diseases, and validating phylogenetic consistency using haplogroup classification is an important step in quality control. However, despite the availability of Phylotree, a regularly updated classification tree of global mtDNA variation, the process of haplogroup classification is still time-consuming and error-prone, as researchers have to manually compare the polymorphisms found in a population sample to those summarized in Phylotree, polymorphism by polymorphism, sample by sample. We present HaploGrep, a fast, reliable and straightforward algorithm implemented in a Web application to determine the haplogroup affiliation of thousands of mtDNA profiles genotyped for the entire mtDNA or any part of it. HaploGrep uses the latest version of Phylotree and offers an all-in-one solution for quality assessment of mtDNA profiles in clinical genetics, population genetics and forensics. HaploGrep can be accessed freely at http://haplogrep.uibk.ac.at.

  20. Genetic algorithm for the optimization of features and neural networks in ECG signals classification

    Science.gov (United States)

    Li, Hongqiang; Yuan, Danyang; Ma, Xiangdong; Cui, Dianyin; Cao, Lu

    2017-01-01

    Feature extraction and classification of electrocardiogram (ECG) signals are necessary for the automatic diagnosis of cardiac diseases. In this study, a novel method based on genetic algorithm-back propagation neural network (GA-BPNN) for classifying ECG signals with feature extraction using wavelet packet decomposition (WPD) is proposed. WPD combined with the statistical method is utilized to extract the effective features of ECG signals. The statistical features of the wavelet packet coefficients are calculated as the feature sets. GA is employed to decrease the dimensions of the feature sets and to optimize the weights and biases of the back propagation neural network (BPNN). Thereafter, the optimized BPNN classifier is applied to classify six types of ECG signals. In addition, an experimental platform is constructed for ECG signal acquisition to supply the ECG data for verifying the effectiveness of the proposed method. The GA-BPNN method with the MIT-BIH arrhythmia database achieved a dimension reduction of nearly 50% and produced good classification results with an accuracy of 97.78%. The experimental results based on the established acquisition platform indicated that the GA-BPNN method achieved a high classification accuracy of 99.33% and could be efficiently applied in the automatic identification of cardiac arrhythmias.
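
    The GA-driven feature reduction can be sketched in miniature. The toy below evolves a binary feature mask against a nearest-centroid fitness rather than a BPNN, to stay self-contained; the data, population size, mutation rate, and size penalty are invented for the example and are not the paper's settings.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Synthetic stand-in for wavelet-packet statistical features:
    # 10 features, only the first 3 carry class information.
    n = 120
    y = rng.integers(0, 2, n)
    X = rng.normal(size=(n, 10))
    X[:, :3] += 1.5 * y[:, None]

    def accuracy(mask):
        """Resubstitution nearest-centroid accuracy on the selected features."""
        if not mask.any():
            return 0.0
        Xs = X[:, mask]
        c0, c1 = Xs[y == 0].mean(0), Xs[y == 1].mean(0)
        pred = np.linalg.norm(Xs - c1, axis=1) < np.linalg.norm(Xs - c0, axis=1)
        return (pred == y).mean()

    # Minimal genetic algorithm over binary feature masks
    pop = rng.random((20, 10)) < 0.5
    for _ in range(30):
        fit = np.array([accuracy(m) - 0.01 * m.sum() for m in pop])  # size penalty
        pop = pop[np.argsort(fit)[::-1]]          # elitist: keep the best half
        children = []
        for _ in range(10):
            a, b = pop[rng.integers(0, 10)], pop[rng.integers(0, 10)]
            cut = rng.integers(1, 10)
            child = np.concatenate([a[:cut], b[cut:]])   # one-point crossover
            children.append(child ^ (rng.random(10) < 0.05))  # bit-flip mutation
        pop = np.vstack([pop[:10], children])

    best = pop[np.argmax([accuracy(m) for m in pop])]
    print(best.astype(int), accuracy(best))
    ```

    In the paper's method the chromosome additionally encodes BPNN weights and biases, so fitness evaluation involves training the network; the selection/crossover/mutation loop is the same shape.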

  1. SVM-based multimodal classification of activities of daily living in Health Smart Homes: sensors, algorithms, and first experimental results.

    Science.gov (United States)

    Fleury, Anthony; Vacher, Michel; Noury, Norbert

    2010-03-01

    By 2050, about one third of the French population will be over 65. Our laboratory's current research focuses on the monitoring of elderly people at home, to detect a loss of autonomy as early as possible. Our aim is to quantify criteria such as the international activities of daily living (ADL) or the French Autonomie Gerontologie Groupes Iso-Ressources (AGGIR) scales, by automatically classifying the different ADLs performed by the subject during the day. A Health Smart Home is used for this. Our Health Smart Home includes, in a real flat, infrared presence sensors (location), door contacts (to monitor the use of some facilities), a temperature and hygrometry sensor in the bathroom, and microphones (sound classification and speech recognition). A wearable kinematic sensor also detects postural transitions (using pattern recognition) and walking periods (frequency analysis). The data collected from the various sensors are then used to classify each temporal frame into one of the previously acquired ADLs (seven activities: hygiene, toilet use, eating, resting, sleeping, communication, and dressing/undressing). This is done using support vector machines. We performed a 1-h experiment with 13 young, healthy subjects to determine the models of the different activities, and then tested the classification algorithm (cross-validation) on real data.

  2. Discrimination of Rice Varieties using LS-SVM Classification Algorithms and Hyperspectral Data

    Directory of Open Access Journals (Sweden)

    Jin Xiaming

    2015-03-01

    Full Text Available Fast discrimination of rice varieties plays a key role in the rice processing industry and benefits the management of rice in supermarkets. In order to discriminate rice varieties in a fast and nondestructive way, hyperspectral technology and several classification algorithms were used in this study. The hyperspectral data of 250 rice samples of 5 varieties were obtained using a FieldSpec®3 spectrometer. Multiplicative Scatter Correction (MSC) was used to preprocess the raw spectra. Principal Component Analysis (PCA) was used to reduce the dimensionality of the raw spectra. To investigate the influence of different linear and non-linear classification algorithms on the discrimination results, K-Nearest Neighbors (KNN), Support Vector Machine (SVM) and Least Squares Support Vector Machine (LS-SVM) were used to develop the discrimination models, respectively. The performances of these three multivariate classification methods were then compared according to discrimination accuracy. The number of Principal Components (PCs), the K parameter of KNN, and the kernel function of SVM or LS-SVM were optimized by cross-validation in the corresponding models. One hundred and twenty-five rice samples (25 of each variety) were chosen as the calibration set and the remaining 125 rice samples formed the prediction set. The experimental results showed that the optimal number of PCs was 8, and that the cross-validation accuracies of KNN (K = 2), SVM and LS-SVM were 94.4, 96.8 and 100%, respectively, while the prediction accuracies of KNN (K = 2), SVM and LS-SVM were 89.6, 93.6 and 100%, respectively. The results indicated that LS-SVM performed best in the discrimination of rice varieties.
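
    The PCA-then-classify pipeline can be sketched on synthetic "spectra". Only the KNN branch is shown (the SVM/LS-SVM models would slot into the same pipeline after the projection step), and the class structure, band count, and noise level are invented for the example.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    # Synthetic stand-in for preprocessed spectra: 3 "varieties",
    # 60 samples each, 200 spectral bands.
    n_per, bands = 60, 200
    means = rng.normal(size=(3, bands))
    X = np.vstack([m + 0.8 * rng.normal(size=(n_per, bands)) for m in means])
    y = np.repeat([0, 1, 2], n_per)

    # PCA via SVD on the centered data, keeping 8 PCs as in the study
    Xc = X - X.mean(0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    n_pcs = 8
    pc_scores = Xc @ Vt[:n_pcs].T    # project spectra onto the leading PCs

    # Split into calibration and prediction sets, then classify with KNN (K = 2)
    idx = rng.permutation(len(y))
    train, test = idx[: len(y) // 2], idx[len(y) // 2:]

    def knn_predict(x, k=2):
        d = np.linalg.norm(pc_scores[train] - x, axis=1)
        votes = y[train][np.argsort(d)[:k]]
        return np.bincount(votes).argmax()

    pred = np.array([knn_predict(pc_scores[i]) for i in test])
    acc = (pred == y[test]).mean()
    print(acc)
    ```

    With real spectra, MSC preprocessing would precede the centering step, and the number of PCs and K would be chosen by cross-validation on the calibration set rather than fixed.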

  3. Contracted Nose after Silicone Implantation: A New Classification System and Treatment Algorithm

    Science.gov (United States)

    Kim, Yong Kyu; Shin, Seungho; Kim, Joo Heon

    2017-01-01

    Background Silicone implants are frequently used in augmentation rhinoplasty in Asians. A common complication of silicone augmentation rhinoplasty is capsular contracture. This is similar to the capsular contracture after augmentation mammoplasty, but a classification for secondary contracture after augmentation rhinoplasty with silicone implants has not yet been established, and treatment algorithms by grade or severity have yet to be developed. Methods Photographs of 695 patients who underwent augmentation rhinoplasty with a silicone implant from May 2001 to May 2015 were analyzed. The mean observation period was 11.4 months. Of the patients, 81 were male and 614 were female, with a mean age of 35.9 years. Grades were assigned according to postoperative appearance. Grade I was a natural appearance, as if an implant had not been inserted. Grade II was an unnatural lateral margin of the implant. Clearly identifiable implant deviation was classified as grade III, and short nose deformation was grade IV. Results Grade I outcomes were found in 498 patients (71.7%), grade II outcomes in 101 (14.5%), grade III outcomes in 75 (10.8%), and grade IV outcomes in 21 patients (3.0%). Revision surgery was indicated for the 13.8% of all patients who had grade III or IV outcomes. Conclusions It is important to clinically classify the deformations due to secondary contracture after surgery and to establish treatment algorithms to improve scientific communication among rhinoplasty surgeons. In this study, we suggest guidelines for the clinical classification of secondary capsular contracture after augmentation rhinoplasty, and also propose a treatment algorithm. PMID:28194349

  4. Development of an algorithm for heartbeats detection and classification in Holter records based on temporal and morphological features

    Science.gov (United States)

    García, A.; Romano, H.; Laciar, E.; Correa, R.

    2011-12-01

    In this work a detection and classification algorithm for heartbeat analysis in Holter records was developed. First, a QRS complex detector was implemented and the temporal and morphological characteristics of the complexes were extracted. A vector was built with these features; this vector is the input of the classification module, based on discriminant analysis. The beats were classified into three groups: Premature Ventricular Contraction (PVC), Atrial Premature Contraction (APC) and Normal Beat (NB). These beat categories represent the most important groups for commercial Holter systems. The developed algorithms were evaluated on 76 ECG records from two validated open-access databases, the MIT-BIH Arrhythmia Database and the MIT-BIH Supraventricular Arrhythmia Database. A total of 166343 beats were detected and analyzed; the QRS detection algorithm provided a sensitivity of 99.69% and a positive predictive value of 99.84%. The classification stage gave sensitivities of 97.17% for NB, 97.67% for PVC and 92.78% for APC.
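
    A detection stage of this kind is often built from an amplitude threshold with a refractory period. The sketch below is a generic detector of that type on a synthetic ECG, not the authors' implementation; the sampling rate, beat shape, and thresholds are invented for the example.

    ```python
    import numpy as np

    fs = 360  # Hz, the MIT-BIH sampling rate
    t = np.arange(0, 10, 1 / fs)

    # Synthetic ECG stand-in: narrow Gaussian "QRS" spikes at 75 bpm plus noise
    beat_times = np.arange(0.5, 10, 0.8)
    ecg = sum(np.exp(-((t - bt) ** 2) / (2 * 0.01 ** 2)) for bt in beat_times)
    ecg += 0.05 * np.random.default_rng(3).normal(size=t.size)

    def detect_qrs(x, fs, thresh_frac=0.5, refractory=0.25):
        """Amplitude-threshold peak detector with a refractory period."""
        thresh = thresh_frac * x.max()
        peaks, last = [], -np.inf
        for i in range(1, len(x) - 1):
            if x[i] > thresh and x[i] >= x[i - 1] and x[i] >= x[i + 1]:
                if i / fs - last > refractory:   # skip re-triggers within a beat
                    peaks.append(i)
                    last = i / fs
        return np.array(peaks)

    peaks = detect_qrs(ecg, fs)
    rr = np.diff(peaks) / fs  # interbeat (RR) intervals in seconds
    print(len(peaks), rr.mean())
    ```

    The temporal features fed to the discriminant-analysis stage (e.g. RR-interval ratios) would be derived from the `peaks` array; morphological features come from the samples around each fiducial point.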

  5. AN UNSUPERVISED CLASSIFICATION FOR FULLY POLARIMETRIC SAR DATA USING SPAN/H/α IHSL TRANSFORM AND THE FCM ALGORITHM

    Institute of Scientific and Technical Information of China (English)

    Wu Yirong; Cao Fang; Hong Wen

    2007-01-01

    In this paper, the IHSL transform and the Fuzzy C-Means (FCM) segmentation algorithm are combined to perform unsupervised classification of fully polarimetric Synthetic Aperture Radar (SAR) data. We apply the IHSL colour transform to the H/α/SPAN space to obtain a new space (an RGB colour space) which has uniform distinguishability among its inner parameters and contains the whole polarimetric information of H/α/SPAN. The FCM algorithm is then applied to this RGB space to complete the classification procedure. The main advantages of this method are that the parameters in the colour space have similar interclass distinguishability, so it can achieve high performance in the pixel-based segmentation algorithm, and that, since the parameters can be treated in the same way, the segmentation procedure is simplified. The experiments show that it provides an improved classification result compared with the method that uses the H/α/SPAN space directly during segmentation.
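
    The FCM step can be sketched directly from its standard update equations; the three-dimensional "colour space" samples below are synthetic stand-ins for the RGB-transformed pixels, and the cluster count and fuzzifier are invented for the example.

    ```python
    import numpy as np

    def fcm(X, c, m=2.0, iters=100, seed=0):
        """Fuzzy C-Means: returns membership matrix U (n x c) and cluster centers."""
        rng = np.random.default_rng(seed)
        U = rng.random((len(X), c))
        U /= U.sum(1, keepdims=True)
        for _ in range(iters):
            W = U ** m
            centers = (W.T @ X) / W.sum(0)[:, None]          # weighted means
            d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
            # U_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
            U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
        return U, centers

    # Two synthetic clusters standing in for RGB-transformed pixel values
    rng = np.random.default_rng(4)
    A = rng.normal([0, 0, 0], 0.3, (100, 3))
    B = rng.normal([3, 3, 3], 0.3, (100, 3))
    X = np.vstack([A, B])

    U, centers = fcm(X, c=2)
    labels = U.argmax(1)
    # Clustering succeeds if each true cluster maps to one label (either order)
    purity = max((labels[:100] == 0).mean() + (labels[100:] == 1).mean(),
                 (labels[:100] == 1).mean() + (labels[100:] == 0).mean()) / 2
    print(purity)
    ```

    Because the colour-space axes have similar interclass distinguishability, the Euclidean distance in the update above treats all three parameters uniformly, which is the simplification the abstract highlights.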

  6. Investigation into machine learning algorithms as applied to motor cortex signals for classification of movement stages.

    Science.gov (United States)

    Hollingshead, Robert L; Putrino, David; Ghosh, Soumya; Tan, Tele

    2014-01-01

    Neuroinformatics has recently emerged as a powerful field for the statistical analysis of neural data. This study uses machine learning techniques to analyze neural spiking activities within a population of neurons with the aim of finding spiking patterns associated with different stages of movement. Neural data was recorded during many experimental trials of a cat performing a skilled reach and withdrawal task. Using Weka and the LibSVM classifier, movement stages of the skilled task were identified with a high degree of certainty achieving an area-under-curve (AUC) of the Receiver Operating Characteristic of between 0.900 and 0.997 for the combined data set. Through feature selection, the identification of significant neurons has been made easier. Given this encouraging classification performance, the extension to automatic classification and updating of control models for use with neural prostheses will enable regular adjustments capable of compensating for neural changes.

  7. Slow Learner Prediction using Multi-Variate Naïve Bayes Classification Algorithm

    Directory of Open Access Journals (Sweden)

    Shiwani Rana

    2016-12-01

    Full Text Available Machine learning is a field of computer science that learns from data by studying algorithms and their constructions. In machine learning, algorithms help to make predictions for specific inputs. Classification is a supervised learning approach, which maps a data item into predefined classes. For predicting slow learners in an institute, a modified Naïve Bayes algorithm was implemented. The implementation was carried out using Python. It takes into account a combination of multi-valued attributes. A dataset of 60 students of BE (Information Technology), Third Semester, for the subject of Digital Electronics at the University Institute of Engineering and Technology (UIET), Panjab University (PU), Chandigarh, India, is used to carry out the simulations. The analysis is done by choosing the forty-eight most significant attributes. The experimental results show that the modified Naïve Bayes model outperformed the Naïve Bayes classifier in accuracy but requires significant improvement in terms of elapsed time. Using the modified Naïve Bayes approach, the accuracy is found to be 71.66%, whereas it is 66.66% using the existing Naïve Bayes model. Further, a comparison is drawn using the WEKA tool, where the accuracy of Naïve Bayes is obtained as 58.33%.
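
    A categorical Naïve Bayes over multi-valued attributes, with Laplace smoothing as one common modification, can be sketched as follows; the attributes, labels, and query are invented toy data, and the paper's actual modification may differ from this variant.

    ```python
    import numpy as np

    def fit_nb(X, y, alpha=1.0):
        """Categorical naive Bayes with Laplace (add-alpha) smoothing."""
        classes = np.unique(y)
        priors = {c: (y == c).mean() for c in classes}
        likelihood = {}
        for c in classes:
            Xc = X[y == c]
            for j in range(X.shape[1]):
                vals = np.unique(X[:, j])
                for v in vals:
                    n_v = (Xc[:, j] == v).sum()
                    # P(x_j = v | class = c), smoothed so no probability is zero
                    likelihood[(c, j, v)] = (n_v + alpha) / (len(Xc) + alpha * len(vals))
        return classes, priors, likelihood

    def predict_nb(model, x):
        classes, priors, likelihood = model
        scores = {c: np.log(priors[c]) + sum(np.log(likelihood[(c, j, v)])
                                             for j, v in enumerate(x))
                  for c in classes}
        return max(scores, key=scores.get)

    # Toy multi-valued attributes: attendance level and test grade -> learner type
    X = np.array([["low", "C"], ["low", "C"], ["low", "B"], ["high", "A"],
                  ["high", "B"], ["high", "A"], ["low", "A"], ["high", "C"]])
    y = np.array(["slow", "slow", "slow", "regular",
                  "regular", "regular", "regular", "slow"])

    model = fit_nb(X, y)
    print(predict_nb(model, ["low", "C"]), predict_nb(model, ["high", "A"]))
    ```

    Working in log space keeps the product of many per-attribute likelihoods numerically stable, which matters once forty-eight attributes are multiplied together.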

  8. An Automated Algorithm to Screen Massive Training Samples for a Global Impervious Surface Classification

    Science.gov (United States)

    Tan, Bin; Brown de Colstoun, Eric; Wolfe, Robert E.; Tilton, James C.; Huang, Chengquan; Smith, Sarah E.

    2012-01-01

    An algorithm is developed to automatically screen outliers from the massive training samples of the Global Land Survey - Imperviousness Mapping Project (GLS-IMP). GLS-IMP will produce a global 30 m spatial resolution impervious cover data set for the years 2000 and 2010 based on the Landsat Global Land Survey (GLS) data set. This unprecedented high resolution impervious cover data set is not only significant for urbanization studies but is also needed by global carbon, hydrology, and energy balance research. A supervised classification method, regression tree, is applied in this project, and a set of accurate training samples is the key to supervised classifications. Here we develop global-scale training samples from fine resolution (about 1 m) satellite data (Quickbird and Worldview2) and then aggregate the fine resolution impervious cover map to 30 m resolution. In order to improve the classification accuracy, the training samples should be screened before being used to train the regression tree. It is impossible to manually screen 30 m resolution training samples collected globally. For example, in Europe alone there are 174 training sites, whose sizes range from 4.5 km by 4.5 km to 8.1 km by 3.6 km, giving over six million training samples. Therefore, we developed this automated statistics-based algorithm to screen the training samples at two levels: the site level and the scene level. At the site level, all the training samples are divided into 10 groups according to the percentage of impervious surface within a sample pixel; the samples falling in each 10% interval form one group. For each group, both univariate and multivariate outliers are detected and removed. The screening process then escalates to the scene level, where a similar screening process with a looser threshold is applied, considering the possible variance due to site differences.
    We do not perform the screening process across scenes because the scenes might vary due to
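
    The site-level screening described above (10% impervious-fraction groups, univariate outlier removal per group) can be sketched like this; the data, the single screened attribute, and the z-score threshold are invented for the example, and the real algorithm also removes multivariate outliers.

    ```python
    import numpy as np

    rng = np.random.default_rng(5)

    # Synthetic training samples: per-pixel impervious fraction (%) plus one
    # correlated spectral value; a few corrupted samples are injected as outliers.
    imperv = rng.uniform(0, 100, 1000)
    spectral = 0.5 * imperv + rng.normal(0, 3, 1000)
    spectral[:10] += 60  # simulated labeling/registration errors

    def screen(imperv, spectral, z_max=3.0):
        """Group samples into 10% impervious bins, drop univariate outliers per group."""
        keep = np.ones(len(imperv), dtype=bool)
        for b in range(10):
            grp = (imperv >= 10 * b) & (imperv < 10 * (b + 1))
            vals = spectral[grp]
            z = np.abs(vals - vals.mean()) / vals.std()
            keep[np.where(grp)[0][z > z_max]] = False
        return keep

    keep = screen(imperv, spectral)
    print(keep.sum(), "samples kept;", (~keep[:10]).sum(), "of 10 outliers removed")
    ```

    Binning by impervious fraction before screening matters because the spectral value varies systematically with imperviousness: a global z-score would flag legitimate high-impervious samples instead of the corrupted ones.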

  9. A Void Reference Sensor-Multiple Signal Classification Algorithm for More Accurate Direction of Arrival Estimation of Low Altitude Target

    Institute of Scientific and Technical Information of China (English)

    XIAO Hui; SUN Jin-cai; YUAN Jun; NIU Yi-long

    2007-01-01

    The MUSIC (multiple signal classification) algorithm is a standard method for direction of arrival (DOA) estimation. This paper presents a modified MUSIC algorithm for more accurate estimation of low altitude targets. The possibility of better performance is analyzed using a void reference sensor (VRS) in the MUSIC algorithm. The following two topics are discussed: 1) the time delay formula and the VRS-MUSIC algorithm with the VRS located on the negative z-axis; 2) the DOA estimation results of the VRS-MUSIC and MUSIC algorithms. The simulation results show that the VRS-MUSIC algorithm has three advantages compared with MUSIC: 1) when the signal-to-noise ratio (SNR) is more than -5 dB, the direction estimation error is half that obtained by MUSIC; 2) the side lobes are lower and the stability is better; 3) the array size that the algorithm requires is smaller.
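
    For reference, the baseline MUSIC estimator (not the VRS variant) can be sketched for a uniform linear array; the array geometry, source angle, snapshot count, and noise level are invented for the example.

    ```python
    import numpy as np

    rng = np.random.default_rng(6)

    M, d = 8, 0.5          # sensors, spacing in wavelengths
    true_doa = 20.0        # degrees
    snapshots = 200

    def steering(theta_deg):
        """Steering vector of a uniform linear array."""
        k = 2 * np.pi * d * np.sin(np.deg2rad(theta_deg))
        return np.exp(1j * k * np.arange(M))

    # One narrowband source plus complex white noise
    s = (rng.normal(size=snapshots) + 1j * rng.normal(size=snapshots)) / np.sqrt(2)
    noise = 0.1 * (rng.normal(size=(M, snapshots))
                   + 1j * rng.normal(size=(M, snapshots)))
    X = np.outer(steering(true_doa), s) + noise

    # MUSIC: noise subspace from the sample covariance eigendecomposition
    R = X @ X.conj().T / snapshots
    w, V = np.linalg.eigh(R)      # eigenvalues ascending
    En = V[:, :-1]                # noise subspace (one source -> drop 1 eigenvector)

    # Pseudospectrum peaks where the steering vector is orthogonal to En
    grid = np.arange(-90, 90, 0.1)
    spectrum = np.array([1.0 / np.linalg.norm(En.conj().T @ steering(th)) ** 2
                         for th in grid])
    doa_est = grid[spectrum.argmax()]
    print(doa_est)
    ```

    The VRS modification of the paper changes the array geometry (and hence the time-delay formula behind `steering`), but the subspace decomposition and pseudospectrum search are the same.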

  10. Classification

    Science.gov (United States)

    Clary, Renee; Wandersee, James

    2013-01-01

    In this article, Renee Clary and James Wandersee describe the beginnings of "Classification," which lies at the very heart of science and depends upon pattern recognition. Clary and Wandersee approach patterns by first telling the story of the "Linnaean classification system," introduced by Carl Linnaeus (1707-1778), who is…

  11. Classification of Atrial Septal Defect and Ventricular Septal Defect with Documented Hemodynamic Parameters via Cardiac Catheterization by Genetic Algorithms and Multi-Layered Artificial Neural Network

    Directory of Open Access Journals (Sweden)

    Mustafa Yıldız

    2012-08-01

    Full Text Available Introduction: We aimed to develop a classification method to discriminate ventricular septal defect and atrial septal defect by using several hemodynamic parameters. Patients and Methods: Forty-three patients (30 atrial septal defect, 13 ventricular septal defect; 26 female, 17 male) with hemodynamic parameters documented via cardiac catheterization were included in the study. Parameters such as blood pressure values from different areas, gender, age and Qp/Qs ratios were used for classification. The parameters used in classification were determined by the divergence analysis method. Those parameters are: (i) pulmonary artery diastolic pressure, (ii) Qp/Qs ratio, (iii) right atrium pressure, (iv) age, (v) pulmonary artery systolic pressure, (vi) left ventricular systolic pressure, (vii) aortic mean pressure, (viii) left ventricular diastolic pressure, (ix) aortic diastolic pressure, (x) aortic systolic pressure. The parameters detected from our study population were fed to a multi-layered artificial neural network, and the network was trained by a genetic algorithm. Results: The training cluster consists of 14 cases (7 atrial septal defect and 7 ventricular septal defect). The overall success ratio is 79.2%, and with proper training of the artificial neural network this ratio increases up to 89%. Conclusion: Parameters of the artificial neural network, which in classical methods need to be determined by the investigator, can easily be determined with the help of genetic algorithms. During the training of the artificial neural network by genetic algorithms, both the topology of the network and the factors of the network can be determined. During the test stage, elements not included in the training cluster are assigned to the test cluster, and as a result of this study we observed that a multi-layered artificial neural network can be trained properly, and that the neural network is a successful method for the aimed classification.

  12. Improving accuracy for cancer classification with a new algorithm for genes selection

    Directory of Open Access Journals (Sweden)

    Zhang Hongyan

    2012-11-01

    feature space that includes possible interactions of many genes. Performance of the algorithm applied to 9 datasets suggests that it is possible to improve the accuracy of cancer classification by a big margin when joint effects of many genes are considered.

  13. E-mail Spam Classification With Artificial Neural Network and Negative Selection Algorithm

    Directory of Open Access Journals (Sweden)

    Ismaila Idris

    2011-12-01

    Full Text Available This paper applies a neural network and a spam model based on the Negative Selection Algorithm to solve complex problems in spam detection. This is achieved by distinguishing spam from non-spam (self from non-self). We propose an optimized technique for e-mail classification: the e-mails are classified as self and non-self, with redundancy removed from the detector set in our previous research to generate self and non-self detector memories. A two-element vector of self and non-self concentrations is generated as the feature vector, which is used as the input to a neural network classifier that classifies self and non-self programs. The hybridization of the neural network with our previous model further enhances the spam detector by improving the false detection rate, and also gives the two different detectors a uniform platform for effective performance.

  14. Correlation Equations for Condensing Heat Exchangers Based on an Algorithmic Performance-Data Classification

    Science.gov (United States)

    Pacheco-Vega, Arturo

    2016-09-01

    In this work a new set of correlation equations is developed and introduced to accurately describe the thermal performance of compact heat exchangers with possible condensation. The feasible operating conditions for the thermal system correspond to dry surface, dropwise condensation, and film condensation. Using a prescribed form for each condition, a global regression analysis for the best-fit correlation to experimental data is carried out with a simulated annealing optimization technique. The experimental data were taken from the literature and algorithmically classified into three groups, related to the possible operating conditions, with a previously-introduced Gaussian-mixture-based methodology. Prior to their use in the analysis, the correct data classification was assessed and confirmed via artificial neural networks. Predictions from the correlations obtained for the different conditions are within the uncertainty of the experiments and substantially more accurate than those commonly used.
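
    The regression-by-simulated-annealing idea can be sketched on a toy prescribed form y = a·x^b; the form, data, proposal width, and cooling schedule are invented for the example and are not the paper's correlations.

    ```python
    import math
    import random

    random.seed(7)

    # Synthetic data following a prescribed correlation form y = a * x**b
    xs = [0.5 + 0.1 * i for i in range(30)]
    a_true, b_true = 2.0, 0.8
    ys = [a_true * x ** b_true for x in xs]

    def sse(params):
        """Sum of squared errors of the correlation against the data."""
        a, b = params
        return sum((a * x ** b - y) ** 2 for x, y in zip(xs, ys))

    # Minimal simulated annealing over the (a, b) coefficients
    state, cost = [1.0, 1.0], sse([1.0, 1.0])
    T = 1.0
    for step in range(20000):
        cand = [state[0] + random.gauss(0, 0.05), state[1] + random.gauss(0, 0.05)]
        c = sse(cand)
        # Accept improvements always; accept worse moves with Boltzmann probability
        if c < cost or random.random() < math.exp(-(c - cost) / T):
            state, cost = cand, c
        T = max(1e-4, T * 0.9995)  # geometric cooling with a floor

    print(state, cost)
    ```

    The occasional acceptance of worse coefficient sets at high temperature is what lets the search escape local minima of the squared-error surface before the cooling schedule makes it effectively greedy.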

  15. HOS network-based classification of power quality events via regression algorithms

    Science.gov (United States)

    Palomares Salas, José Carlos; González de la Rosa, Juan José; Sierra Fernández, José María; Pérez, Agustín Agüera

    2015-12-01

    This work compares seven regression algorithms implemented in artificial neural networks (ANNs), supported by 14 power-quality features based on higher-order statistics. Combining time and frequency domain estimators to deal with non-stationary measurement sequences, the final goal of the system is implementation in the future smart grid to guarantee compatibility among all connected equipment. The principal results are based on spectral kurtosis measurements, which easily adapt to the impulsive nature of power quality events. These results verify that the proposed technique is capable of offering interesting results for power quality (PQ) disturbance classification. The best results are obtained using radial basis networks, generalized regression, and multilayer perceptrons, mainly due to the non-linear nature of the data.

  16. Traumatic subarachnoid pleural fistula in children: case report, algorithm and classification proposal

    Directory of Open Access Journals (Sweden)

    Moscote-Salazar Luis Rafael

    2016-06-01

    Full Text Available Subarachnoid pleural fistulas are rare. They have been described as complications of thoracic surgery, penetrating injuries and spinal surgery, among others. We present the case of a 3-year-old girl who suffered spinal cord trauma secondary to a car accident and subsequently developed a subarachnoid pleural fistula. To our knowledge this is the first reported case of a pediatric patient with a subarachnoid pleural fistula resulting from closed trauma and requiring intensive multimodal management. We also present a management algorithm and a proposed classification. The diagnosis of this pathology is difficult when it is not associated with a neurological deficit. A high degree of suspicion, multidisciplinary management and timely surgical intervention allow optimal management.

  17. Gastric Cancer Risk Analysis in Unhealthy Habits Data with Classification Algorithms

    Directory of Open Access Journals (Sweden)

    Kirshners Arnis

    2015-12-01

    Full Text Available Data mining methods are applied to a medical task that seeks information about the influence of Helicobacter pylori on increased gastric cancer risk by analysing the adverse factors of individual lifestyle. In the process of data preprocessing, the data are cleared of noise and other artifacts, reduced in dimensionality, transformed for the task and cleared of non-informative attributes. Data classification using the C4.5, CN2 and k-nearest neighbour algorithms is carried out to find relationships between the analysed attributes and the descriptive class attribute, Helicobacter pylori presence, which could have an influence on cancer development risk. Experimental analysis is carried out using data from the database of the Latvian-based project “Interdisciplinary Research Group for Early Cancer Detection and Cancer Prevention”.

  18. Study of Fault Diagnosis Method for Wind Turbine with Decision Classification Algorithms and Expert System

    Directory of Open Access Journals (Sweden)

    Feng Yongxin

    2012-09-01

Full Text Available This paper studies a fault diagnosis method that combines decision classification algorithms with an expert system. A method of extracting diagnosis rules with the CTree software is given, and a fault diagnosis system based on CLIPS was developed. To verify the feasibility of the method, sample data were first obtained through simulations of faults in a direct-drive wind turbine and a gearbox; diagnosis rules were then extracted with the CTree software; finally, the proposed fault diagnosis system used the extracted rules to diagnose the simulated faults. Test results showed a misdiagnosis rate within 5% in both cases, verifying the feasibility of the method.
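Tree-extracted if-then diagnosis rules can be pictured as ordered condition/conclusion pairs fired in the spirit of a CLIPS rule base; the fault names, measurement keys and thresholds below are hypothetical, not the paper's actual rules:

```python
# Hypothetical diagnosis rules of the kind a tree-induction tool might emit:
# each rule is (predicate over a measurement dict, diagnosis). More specific
# rules are listed first, as a decision tree's deeper branches would be.
RULES = [
    (lambda m: m["gear_vib"] > 4.0 and m["oil_temp"] > 80, "gearbox bearing fault"),
    (lambda m: m["gear_vib"] > 4.0, "gearbox misalignment"),
    (lambda m: m["gen_current_thd"] > 0.05, "generator winding fault"),
]

def diagnose(measurements, rules=RULES, default="no fault detected"):
    """Fire the first rule whose condition matches (simple forward chaining)."""
    for condition, diagnosis in rules:
        if condition(measurements):
            return diagnosis
    return default

print(diagnose({"gear_vib": 5.2, "oil_temp": 85, "gen_current_thd": 0.01}))
```

A real CLIPS system would pattern-match facts against rules continuously rather than scan a fixed list once.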

  19. Multiple Signal Classification Algorithm (MUSICAL) for super-resolution fluorescence microscopy

    CERN Document Server

    Agarwal, Krishna

    2016-01-01

Super-resolution microscopy is providing unprecedented insights into biology by resolving details much below the diffraction limit. State-of-the-art Single Molecule Localization Microscopy (SMLM) techniques for super-resolution are restricted by long acquisition and computational times, or the need for special fluorophores or chemical environments. Here, we propose a novel statistical super-resolution technique of wide-field fluorescence microscopy called MUltiple SIgnal Classification ALgorithm (MUSICAL) which has several advantages over SMLM techniques. MUSICAL provides resolution down to at least 50 nm, has low requirements on the number of frames and on excitation power, and works even at high fluorophore concentrations. Further, it works with any fluorophore that exhibits blinking on the time scale of the recording. We compare imaging results of MUSICAL with SMLM and four contemporary statistical super-resolution methods for experiments on in-vitro actin filaments and datasets provided by independent research gro...

  20. Interval-value Based Particle Swarm Optimization algorithm for cancer-type specific gene selection and sample classification

    Directory of Open Access Journals (Sweden)

    D. Ramyachitra

    2015-09-01

Full Text Available Microarray technology allows simultaneous measurement of the expression levels of thousands of genes within a biological tissue sample. The fundamental power of microarrays lies within the ability to conduct parallel surveys of gene expression using microarray data. The classification of tissue samples based on gene expression data is an important problem in medical diagnosis of diseases such as cancer. In gene expression data, the number of genes is usually very high compared to the number of data samples. The difficulty is thus that the data are of high dimensionality while the sample size is small. This research work addresses the problem by classifying the resultant dataset using existing algorithms such as Support Vector Machine (SVM), K-nearest neighbor (KNN), Interval Valued Classification (IVC) and the improvised Interval Value based Particle Swarm Optimization (IVPSO) algorithm. The results show that the IVPSO algorithm outperformed the other algorithms under several performance evaluation functions.

  1. Content-based and Algorithmic Classifications of Journals: Perspectives on the Dynamics of Scientific Communication and Indexer Effects

    CERN Document Server

    Rafols, Ismael

    2008-01-01

    The aggregated journal-journal citation matrix -based on the Journal Citation Reports (JCR) of the Science Citation Index- can be decomposed by indexers and/or algorithmically. In this study, we test the results of two recently available algorithms for the decomposition of large matrices against two content-based classifications of journals: the ISI Subject Categories and the field/subfield classification of Glaenzel & Schubert (2003). The content-based schemes allow for the attribution of more than a single category to a journal, whereas the algorithms maximize the ratio of within-category citations over between-category citations in the aggregated category-category citation matrix. By adding categories, indexers generate between-category citations, which may enrich the database, for example, in the case of inter-disciplinary developments. The consequent indexer effects are significant in sparse areas of the matrix more than in denser ones. Algorithmic decompositions, on the other hand, are more heavily ...

  2. Interval-value Based Particle Swarm Optimization algorithm for cancer-type specific gene selection and sample classification.

    Science.gov (United States)

    Ramyachitra, D; Sofia, M; Manikandan, P

    2015-09-01

Microarray technology allows simultaneous measurement of the expression levels of thousands of genes within a biological tissue sample. The fundamental power of microarrays lies within the ability to conduct parallel surveys of gene expression using microarray data. The classification of tissue samples based on gene expression data is an important problem in medical diagnosis of diseases such as cancer. In gene expression data, the number of genes is usually very high compared to the number of data samples. The difficulty is thus that the data are of high dimensionality while the sample size is small. This research work addresses the problem by classifying the resultant dataset using existing algorithms such as Support Vector Machine (SVM), K-nearest neighbor (KNN), Interval Valued Classification (IVC) and the improvised Interval Value based Particle Swarm Optimization (IVPSO) algorithm. The results show that the IVPSO algorithm outperformed the other algorithms under several performance evaluation functions.
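A particle-swarm search over binary gene masks, of the general kind IVPSO builds on, might be sketched as below; the toy fitness function, move probabilities and "relevant gene" indices are invented for illustration and do not reproduce the paper's interval-valued variant:

```python
import random

def fitness(mask, relevant=(0, 3)):
    """Toy objective: reward selecting the (assumed) relevant genes, penalise subset size."""
    hits = sum(1 for i in relevant if mask[i])
    return hits - 0.1 * sum(mask)

def binary_pso(n_feats=8, n_particles=10, iters=50, seed=1):
    """Minimal binary PSO: particles drift toward the global best with occasional random flips."""
    rng = random.Random(seed)
    swarm = [[rng.randint(0, 1) for _ in range(n_feats)] for _ in range(n_particles)]
    best = max(swarm, key=fitness)
    for _ in range(iters):
        for p in swarm:
            for j in range(n_feats):
                if rng.random() < 0.2:        # attraction toward the global best
                    p[j] = best[j]
                elif rng.random() < 0.05:     # random flip keeps exploration alive
                    p[j] = 1 - p[j]
        best = max(swarm + [best], key=fitness)
    return best

best_mask = binary_pso()
print(best_mask, fitness(best_mask))
```

A real implementation would track per-particle bests and velocities; this sketch keeps only the global-best attraction to stay short.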

  3. Reliable classification of two-class cancer data using evolutionary algorithms.

    Science.gov (United States)

    Deb, Kalyanmoy; Raji Reddy, A

    2003-11-01

In the area of bioinformatics, the identification of gene subsets responsible for classifying available disease samples into two or more of their variants is an important task. Such problems have been solved in the past by means of unsupervised learning methods (hierarchical clustering, self-organizing maps, k-means clustering, etc.) and supervised learning methods (weighted voting approach, k-nearest neighbor method, support vector machine method, etc.). Such problems can also be posed as optimization problems of minimizing gene subset size to achieve reliable and accurate classification. The main difficulties in solving the resulting optimization problem are the availability of only a few samples compared to the number of genes in the samples and the exorbitantly large search space of solutions. Although there exist a few applications of evolutionary algorithms (EAs) for this task, here we treat the problem as a multiobjective optimization problem of minimizing the gene subset size and minimizing the number of misclassified samples. Moreover, for a more reliable classification, we consider multiple training sets in evaluating a classifier. Contrary to past studies, the use of a multiobjective EA (NSGA-II) has enabled us to discover a smaller gene subset size (such as four or five) that correctly classifies 100% or near 100% of samples for three cancer datasets (Leukemia, Lymphoma, and Colon). We have also extended the NSGA-II to obtain multiple non-dominated solutions, discovering as many as 352 different three-gene combinations providing a 100% correct classification on the Leukemia data. In order to have further confidence in the identification task, we have also introduced a prediction strength threshold for determining a sample's membership in one class or the other. All simulation results show consistent gene subset identifications on the three disease samples and exhibit the flexibility and efficacy of using a multiobjective EA for the gene subset identification task.
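The two objectives above, gene subset size and number of misclassified samples, are traded off by keeping only non-dominated solutions. A minimal sketch of that Pareto filtering follows (not NSGA-II itself, whose non-dominated sorting and crowding-distance machinery is more involved); the solution list is invented:

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly better in one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points):
    """Return the Pareto front: points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]

# Hypothetical candidates as (gene subset size, misclassified samples).
solutions = [(4, 0), (5, 0), (3, 2), (6, 1), (4, 3)]
print(non_dominated(solutions))  # -> [(4, 0), (3, 2)]
```

Here (5, 0) is dominated by (4, 0): same error with one more gene, so it is dropped from the front.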

  4. Robust evaluation of time series classification algorithms for structural health monitoring

    Science.gov (United States)

    Harvey, Dustin Y.; Worden, Keith; Todd, Michael D.

    2014-03-01

    Structural health monitoring (SHM) systems provide real-time damage and performance information for civil, aerospace, and mechanical infrastructure through analysis of structural response measurements. The supervised learning methodology for data-driven SHM involves computation of low-dimensional, damage-sensitive features from raw measurement data that are then used in conjunction with machine learning algorithms to detect, classify, and quantify damage states. However, these systems often suffer from performance degradation in real-world applications due to varying operational and environmental conditions. Probabilistic approaches to robust SHM system design suffer from incomplete knowledge of all conditions a system will experience over its lifetime. Info-gap decision theory enables nonprobabilistic evaluation of the robustness of competing models and systems in a variety of decision making applications. Previous work employed info-gap models to handle feature uncertainty when selecting various components of a supervised learning system, namely features from a pre-selected family and classifiers. In this work, the info-gap framework is extended to robust feature design and classifier selection for general time series classification through an efficient, interval arithmetic implementation of an info-gap data model. Experimental results are presented for a damage type classification problem on a ball bearing in a rotating machine. The info-gap framework in conjunction with an evolutionary feature design system allows for fully automated design of a time series classifier to meet performance requirements under maximum allowable uncertainty.

  5. A Population Classification Evolution Algorithm for the Parameter Extraction of Solar Cell Models

    Directory of Open Access Journals (Sweden)

    Yiqun Zhang

    2016-01-01

Full Text Available To quickly and precisely extract the parameters for solar cell models, inspired by the simplified bird mating optimizer (SBMO), a new optimization technique referred to as population classification evolution (PCE) is proposed. PCE divides the population into two groups, elite and ordinary, to reach a better compromise between exploitation and exploration. For the evolution of elite individuals, we adopt the idea of parthenogenesis in nature to afford fast exploitation. For the evolution of ordinary individuals, we adopt an effective differential evolution strategy, and a random movement of small probability is added to strengthen the ability to jump out of a local optimum, which affords fast exploration. The proposed PCE is first evaluated on 13 classic benchmark functions. The experimental results demonstrate that PCE yields the best results on 11 functions in comparison with six evolutionary algorithms. Then, PCE is applied to extract the parameters for solar cell models, namely the single-diode and double-diode models. The experimental analyses demonstrate that the proposed PCE is superior to other optimization algorithms for parameter identification. Moreover, PCE is tested using three different sources of data with good accuracy.
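The differential-evolution strategy used for the ordinary individuals can be sketched in its classic DE/rand/1/bin form; the sphere objective below stands in for the actual diode-model fitting error, and all constants are illustrative:

```python
import random

def differential_evolution(f, bounds, pop_size=20, F=0.7, CR=0.9, iters=100, seed=0):
    """Minimal DE/rand/1/bin: scaled difference mutation, binomial crossover, greedy selection."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(iters):
        for i in range(pop_size):
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            trial = [
                min(max(a[d] + F * (b[d] - c[d]), bounds[d][0]), bounds[d][1])
                if rng.random() < CR else pop[i][d]
                for d in range(dim)
            ]
            if f(trial) <= f(pop[i]):   # keep the trial only if it is no worse
                pop[i] = trial
    return min(pop, key=f)

# Toy stand-in for the diode-model fitting objective: a 2-D sphere function.
sphere = lambda x: sum(v * v for v in x)
best = differential_evolution(sphere, [(-5, 5), (-5, 5)])
print(best, sphere(best))
```

PCE layers a small random-movement probability on top of such a scheme to escape local optima.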

  6. Application of supervised machine learning algorithms for the classification of regulatory RNA riboswitches.

    Science.gov (United States)

    Singh, Swadha; Singh, Raghvendra

    2016-04-03

Riboswitches, the small structured RNA elements, were discovered about a decade ago. It has been the subject of intense interest to identify riboswitches, understand their mechanisms of action and use them in genetic engineering. The accumulation of genome and transcriptome sequence data and comparative genomics provide unprecedented opportunities to identify riboswitches in the genome. In the present study, we have evaluated the following six machine learning algorithms for their efficiency in classifying riboswitches: J48, BayesNet, Naïve Bayes, Multilayer Perceptron, sequential minimal optimization, and the hidden Markov model (HMM). To determine the most effective classifier, the algorithms were compared on the statistical measures of specificity, sensitivity, accuracy, F-measure and receiver operating characteristic (ROC) plot analysis. The Multilayer Perceptron classifier achieved the best performance, with the highest specificity, sensitivity, F-score and accuracy, and with the largest area under the ROC curve, whereas HMM was the poorest performer. At present, the available tools for the prediction and classification of riboswitches are based on the covariance model, support vector machine and HMM. The present study determines the Multilayer Perceptron to be a better classifier for genome-wide riboswitch searches.

  7. Effective classification of microRNA precursors using feature mining and AdaBoost algorithms.

    Science.gov (United States)

    Zhong, Ling; Wang, Jason T L; Wen, Dongrong; Aris, Virginie; Soteropoulos, Patricia; Shapiro, Bruce A

    2013-09-01

    MicroRNAs play important roles in most biological processes, including cell proliferation, tissue differentiation, and embryonic development, among others. They originate from precursor transcripts (pre-miRNAs), which contain phylogenetically conserved stem-loop structures. An important bioinformatics problem is to distinguish the pre-miRNAs from pseudo pre-miRNAs that have similar stem-loop structures. We present here a novel method for tackling this bioinformatics problem. Our method, named MirID, accepts an RNA sequence as input, and classifies the RNA sequence either as positive (i.e., a real pre-miRNA) or as negative (i.e., a pseudo pre-miRNA). MirID employs a feature mining algorithm for finding combinations of features suitable for building pre-miRNA classification models. These models are implemented using support vector machines, which are combined to construct a classifier ensemble. The accuracy of the classifier ensemble is further enhanced by the utilization of an AdaBoost algorithm. When compared with two closely related tools on twelve species analyzed with these tools, MirID outperforms the existing tools on the majority of the twelve species. MirID was also tested on nine additional species, and the results showed high accuracies on the nine species. The MirID web server is fully operational and freely accessible at http://bioinformatics.njit.edu/MirID/ . Potential applications of this software in genomics and medicine are also discussed.
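The AdaBoost step that strengthens MirID's classifier ensemble can be illustrated with one-feature decision stumps on invented data; this is the textbook algorithm, not MirID's actual SVM-based ensemble:

```python
import math

def stump_predict(stump, x):
    """A stump is (feature index, threshold, sign): predict sign if x[feat] <= thr, else -sign."""
    feat, thr, sign = stump
    return sign if x[feat] <= thr else -sign

def train_adaboost(X, y, rounds=5):
    """AdaBoost over threshold stumps: reweight samples, keep a weighted vote of weak learners."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best, best_err = None, float("inf")
        for feat in range(len(X[0])):
            for thr in sorted({x[feat] for x in X}):
                for sign in (1, -1):
                    stump = (feat, thr, sign)
                    err = sum(wi for wi, x, yi in zip(w, X, y)
                              if stump_predict(stump, x) != yi)
                    if err < best_err:
                        best, best_err = stump, err
        best_err = max(best_err, 1e-12)          # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - best_err) / best_err)
        ensemble.append((alpha, best))
        w = [wi * math.exp(-alpha * y[i] * stump_predict(best, X[i]))
             for i, wi in enumerate(w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy surrogate for real vs pseudo pre-miRNA: label +1 iff the single feature is small.
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [1, 1, 1, -1, -1, -1]
model = train_adaboost(X, y)
print([predict(model, x) for x in X])
```

In MirID the weak learners are SVM-based classifiers over mined feature combinations rather than single-feature stumps.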

  8. The CR‐Ω+ Classification Algorithm for Spatio‐Temporal Prediction of Criminal Activity

    Directory of Open Access Journals (Sweden)

    S. Godoy‐Calderón

    2010-04-01

Full Text Available We present a spatio-temporal prediction model that allows forecasting of criminal activity behavior in a particular region by using supervised classification. The degree of membership of each pattern is interpreted as the forecasted increase or decrease in the criminal activity for the specified time and location. The proposed forecasting model (CR-Ω+) is based on the family of Kora-Ω logical-combinatorial algorithms operating on large data volumes from several heterogeneous sources using an inductive learning process. We propose several modifications to the original algorithms by Bongard and by Baskakova and Zhuravlëv which improve the prediction performance on the studied dataset of criminal activity. We perform two analyses, punctual prediction and tendency analysis, which show that it is possible to punctually predict one of four crimes to be perpetrated (crime family) in a specific space and time, with 66% effectiveness in the prediction of the place of crime, despite the noise of the dataset. The tendency analysis yielded an STRMSE (spatio-temporal RMSE) of less than 1.0.

  9. Classification algorithms to improve the accuracy of identifying patients hospitalized with community-acquired pneumonia using administrative data.

    Science.gov (United States)

    Yu, O; Nelson, J C; Bounds, L; Jackson, L A

    2011-09-01

    In epidemiological studies of community-acquired pneumonia (CAP) that utilize administrative data, cases are typically defined by the presence of a pneumonia hospital discharge diagnosis code. However, not all such hospitalizations represent true CAP cases. We identified 3991 hospitalizations during 1997-2005 in a managed care organization, and validated them as CAP or not by reviewing medical records. To improve the accuracy of CAP identification, classification algorithms that incorporated additional administrative information associated with the hospitalization were developed using the classification and regression tree analysis. We found that a pneumonia code designated as the primary discharge diagnosis and duration of hospital stay improved the classification of CAP hospitalizations. Compared to the commonly used method that is based on the presence of a primary discharge diagnosis code of pneumonia alone, these algorithms had higher sensitivity (81-98%) and positive predictive values (82-84%) with only modest decreases in specificity (48-82%) and negative predictive values (75-90%).
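The four reported quantities follow directly from a 2x2 table of algorithm output against chart-review validation; a small helper with hypothetical counts (not the study's data) shows the arithmetic:

```python
def screening_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 validation table."""
    return {
        "sensitivity": tp / (tp + fn),   # true CAP cases the algorithm catches
        "specificity": tn / (tn + fp),   # non-CAP hospitalizations correctly excluded
        "ppv": tp / (tp + fp),           # flagged hospitalizations that are true CAP
        "npv": tn / (tn + fn),           # unflagged hospitalizations that are truly non-CAP
    }

# Hypothetical counts for an algorithm validated against medical-record review.
m = screening_metrics(tp=820, fp=160, fn=80, tn=640)
print(m)
```

The trade-off the abstract describes (higher sensitivity and PPV with only modest specificity loss) is exactly a movement of counts between these four cells.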

  10. Land Use Classification using Support Vector Machine and Maximum Likelihood Algorithms by Landsat 5 TM Images

    Directory of Open Access Journals (Sweden)

    Abbas TAATI

    2015-08-01

Full Text Available Nowadays, remote sensing images have been identified and exploited as the latest information to study land cover and land use. These digital images are of significant importance, since they can present timely information and are capable of providing land use maps. The aim of this study is to create a land use classification using a support vector machine (SVM) and a maximum likelihood classifier (MLC) in Qazvin, Iran, from TM images of the Landsat 5 satellite. In the pre-processing stage, the necessary corrections were applied to the images. In order to evaluate the accuracy of the two algorithms, the overall accuracy and kappa coefficient were used. The evaluation results verified that the SVM algorithm, with an overall accuracy of 86.67 % and a kappa coefficient of 0.82, has a higher accuracy than the MLC algorithm in land use mapping. Therefore, this algorithm has been suggested as an optimal classifier for the extraction of land use maps due to its higher accuracy and better consistency within the study area.
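Overall accuracy and the kappa coefficient used to compare SVM and MLC both come from the confusion matrix; a minimal sketch with an invented 2-class accuracy-assessment table:

```python
def overall_accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix
    (rows: reference classes, columns: mapped classes)."""
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / n
    # Chance agreement: product of matching row and column marginals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(len(confusion))
    ) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical 2-class table: 45 + 45 correct out of 100 checked pixels.
conf = [[45, 5],
        [5, 45]]
acc, kappa = overall_accuracy_and_kappa(conf)
print(acc, kappa)
```

Kappa discounts the agreement expected by chance, which is why it is reported alongside raw overall accuracy in accuracy assessments like this one.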

  11. Feature Selection Method Based on Artificial Bee Colony Algorithm and Support Vector Machines for Medical Datasets Classification

    Directory of Open Access Journals (Sweden)

    Mustafa Serter Uzer

    2013-01-01

Full Text Available This paper offers a hybrid approach that uses the artificial bee colony (ABC) algorithm for feature selection and support vector machines for classification. The purpose of this paper is to test the effect of eliminating unimportant and obsolete features of the datasets on the success of classification with the SVM classifier. The approach is applied to the diagnosis of liver disease and diabetes, conditions that are commonly observed and reduce quality of life. For the diagnosis of these diseases, the hepatitis, liver disorders and diabetes datasets from the UCI database were used, and the proposed system reached classification accuracies of 94.92%, 74.81%, and 79.29%, respectively. For these datasets, the classification accuracies were obtained with the help of the 10-fold cross-validation method. The results show that the performance of the method is highly successful compared to other reported results and seems very promising for pattern recognition applications.
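The 10-fold cross-validation behind those accuracy figures partitions the data into k disjoint test folds, training on the rest each time. A minimal index-splitting sketch (shown with k=5 on 25 samples for brevity):

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices once, then yield (train, test) index lists for each of k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # round-robin split keeps fold sizes balanced
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

splits = list(k_fold_indices(25, k=5))
print([len(test) for _, test in splits])
```

Each sample lands in exactly one test fold, so the k accuracy scores average into one unbiased estimate.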

  12. Classification algorithms with multi-modal data fusion could accurately distinguish neuromyelitis optica from multiple sclerosis.

    Science.gov (United States)

    Eshaghi, Arman; Riyahi-Alam, Sadjad; Saeedi, Roghayyeh; Roostaei, Tina; Nazeri, Arash; Aghsaei, Aida; Doosti, Rozita; Ganjgahi, Habib; Bodini, Benedetta; Shakourirad, Ali; Pakravan, Manijeh; Ghana'ati, Hossein; Firouznia, Kavous; Zarei, Mojtaba; Azimi, Amir Reza; Sahraian, Mohammad Ali

    2015-01-01

    Neuromyelitis optica (NMO) exhibits substantial similarities to multiple sclerosis (MS) in clinical manifestations and imaging results and has long been considered a variant of MS. With the advent of a specific biomarker in NMO, known as anti-aquaporin 4, this assumption has changed; however, the differential diagnosis remains challenging and it is still not clear whether a combination of neuroimaging and clinical data could be used to aid clinical decision-making. Computer-aided diagnosis is a rapidly evolving process that holds great promise to facilitate objective differential diagnoses of disorders that show similar presentations. In this study, we aimed to use a powerful method for multi-modal data fusion, known as a multi-kernel learning and performed automatic diagnosis of subjects. We included 30 patients with NMO, 25 patients with MS and 35 healthy volunteers and performed multi-modal imaging with T1-weighted high resolution scans, diffusion tensor imaging (DTI) and resting-state functional MRI (fMRI). In addition, subjects underwent clinical examinations and cognitive assessments. We included 18 a priori predictors from neuroimaging, clinical and cognitive measures in the initial model. We used 10-fold cross-validation to learn the importance of each modality, train and finally test the model performance. The mean accuracy in differentiating between MS and NMO was 88%, where visible white matter lesion load, normal appearing white matter (DTI) and functional connectivity had the most important contributions to the final classification. In a multi-class classification problem we distinguished between all of 3 groups (MS, NMO and healthy controls) with an average accuracy of 84%. In this classification, visible white matter lesion load, functional connectivity, and cognitive scores were the 3 most important modalities. 
Our work provides preliminary evidence that computational tools can be used to help make an objective differential diagnosis of NMO and MS.

  13. Classification algorithms with multi-modal data fusion could accurately distinguish neuromyelitis optica from multiple sclerosis

    Directory of Open Access Journals (Sweden)

    Arman Eshaghi

    2015-01-01

Full Text Available Neuromyelitis optica (NMO) exhibits substantial similarities to multiple sclerosis (MS) in clinical manifestations and imaging results and has long been considered a variant of MS. With the advent of a specific biomarker in NMO, known as anti-aquaporin 4, this assumption has changed; however, the differential diagnosis remains challenging and it is still not clear whether a combination of neuroimaging and clinical data could be used to aid clinical decision-making. Computer-aided diagnosis is a rapidly evolving process that holds great promise to facilitate objective differential diagnoses of disorders that show similar presentations. In this study, we aimed to use a powerful method for multi-modal data fusion, known as a multi-kernel learning and performed automatic diagnosis of subjects. We included 30 patients with NMO, 25 patients with MS and 35 healthy volunteers and performed multi-modal imaging with T1-weighted high resolution scans, diffusion tensor imaging (DTI) and resting-state functional MRI (fMRI). In addition, subjects underwent clinical examinations and cognitive assessments. We included 18 a priori predictors from neuroimaging, clinical and cognitive measures in the initial model. We used 10-fold cross-validation to learn the importance of each modality, train and finally test the model performance. The mean accuracy in differentiating between MS and NMO was 88%, where visible white matter lesion load, normal appearing white matter (DTI) and functional connectivity had the most important contributions to the final classification. In a multi-class classification problem we distinguished between all of 3 groups (MS, NMO and healthy controls) with an average accuracy of 84%. In this classification, visible white matter lesion load, functional connectivity, and cognitive scores were the 3 most important modalities. 
Our work provides preliminary evidence that computational tools can be used to help make an objective differential diagnosis

  14. Algorithms of Expert Classification Applied in Quickbird Satellite Images for Land Use Mapping

    Directory of Open Access Journals (Sweden)

    Alberto Jesús Perea

    2009-09-01

Full Text Available The objective of this paper was the development of a methodology for the classification of digital aerial images which, with the aid of object-based classification and the Normalized Difference Vegetation Index (NDVI), can quantify agricultural areas by using algorithms of expert classification, with the aim of improving the final results of thematic classifications. QuickBird satellite images and data from 2532 plots in Hinojosa del Duque, Spain, were used to validate the different classifications, obtaining an overall classification accuracy of 91.9% and an excellent Kappa statistic (87.6%) for the algorithm of expert classification.

  15. The Self-Directed Violence Classification System and the Columbia Classification Algorithm for Suicide Assessment: A Crosswalk

    Science.gov (United States)

    Matarazzo, Bridget B.; Clemans, Tracy A.; Silverman, Morton M.; Brenner, Lisa A.

    2013-01-01

    The lack of a standardized nomenclature for suicide-related thoughts and behaviors prompted the Centers for Disease Control and Prevention, with the Veterans Integrated Service Network 19 Mental Illness Research Education and Clinical Center, to create the Self-Directed Violence Classification System (SDVCS). SDVCS has been adopted by the…

  16. A comparison of classification techniques for glacier change detection using multispectral images

    OpenAIRE

    Rahul Nijhawan; Pradeep Garg; Praveen Thakur

    2016-01-01

The main aim of this paper is to compare the classification accuracies of glacier change detection by the following classifiers: a sub-pixel classification algorithm, indices-based supervised classification and an object-based algorithm, using Landsat imagery. It was observed that the shadow effect, which the indices-based method removed, was not removed in sub-pixel-based classification. The accuracy was further improved by object-based classification. The objective of the paper is to analyse different classific...

  17. A Novel Position-based Sentiment Classification Algorithm for Facebook Comments

    Directory of Open Access Journals (Sweden)

    Khunishkah Surroop

    2016-10-01

    Full Text Available With the popularisation of social networks, people are now more at ease to share their thoughts, ideas, opinions and views about all kinds of topics on public platforms. Millions of users are connected each day on social networks and they often contribute to online crimes by their comments or posts through cyberbullying, identity theft, online blackmailing, etc. Mauritius has also registered a surge in the number of cybercrime cases during the past decade. In this study, a trilingual dataset of 1031 comments was extracted from public pages on Facebook. This dataset was manually categorised into four different sentiment classes: positive, negative, very negative and neutral, using a novel sentiment classification algorithm. Out of these 1031 comments, it was found that 97.8% of the very negative sentiments, 70.7% of the negative sentiments and 77.0% of the positive sentiments were correctly extracted. Despite the added complexity of our dataset, the accuracy of our system is slightly better than similar works in the field. The accuracy of the lexicon-based approach was also much higher than when we used machine learning techniques. The outcome of this research work can be used by the Mauritius Police Force to track down potential cases of cybercrime on social networks. Decisive actions can then be implemented in time.
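A lexicon-based scorer mapping comments to the four sentiment classes might look like the sketch below; the word list, score values and class thresholds are invented for illustration and are far smaller than the trilingual lexicon the study would need:

```python
# Hypothetical mini-lexicon; a real system would use far larger, multilingual word lists.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "hate": -2, "awful": -2}

def classify_comment(text, very_negative_at=-3):
    """Sum lexicon scores over the tokens; map the total to one of four sentiment classes."""
    score = sum(LEXICON.get(w, 0) for w in text.lower().split())
    if score > 0:
        return "positive"
    if score <= very_negative_at:
        return "very negative"
    if score < 0:
        return "negative"
    return "neutral"

print(classify_comment("I love this great idea"))   # -> positive
print(classify_comment("bad service"))              # -> negative
print(classify_comment("I hate this awful page"))   # -> very negative
print(classify_comment("meeting at noon"))          # -> neutral
```

The "very negative" class is just a lower threshold on the same summed score, which mirrors how the paper separates strongly abusive comments from merely negative ones.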

  18. Evolutionary Algorithm Based Feature Optimization for Multi-Channel EEG Classification

    Science.gov (United States)

    Wang, Yubo; Veluvolu, Kalyana C.

    2017-01-01

Most BCI systems that rely on EEG signals employ Fourier-based methods for time-frequency decomposition and feature extraction. The band-limited multiple Fourier linear combiner is well-suited for such band-limited signals due to its real-time applicability. Despite the improved performance of these techniques in two-channel settings, their application to multiple-channel EEG is not straightforward and is challenging. As more channels become available, a spatial filter is required to eliminate noise and preserve the useful information. Moreover, multiple-channel EEG also adds high dimensionality to the frequency feature space. Feature selection is required to stabilize the performance of the classifier. In this paper, we develop a new method based on an Evolutionary Algorithm (EA) to solve these two problems simultaneously. The real-valued EA encodes both the spatial filter estimates and the feature selection into its solution and optimizes them with respect to the classification error. Three Fourier-based designs are tested in this paper. Our results show that the combination of the Fourier-based method with the covariance matrix adaptation evolution strategy (CMA-ES) has the best overall performance. PMID:28203141

  19. Verdict Accuracy of Quick Reduct Algorithm using Clustering and Classification Techniques for Gene Expression Data

    Directory of Open Access Journals (Sweden)

    T.Chandrasekhar

    2012-01-01

Full Text Available In most gene expression data, the number of training samples is very small compared to the large number of genes involved in the experiments. However, among the large number of genes, only a small fraction is effective for performing a certain task. Furthermore, a small subset of genes is desirable in developing gene expression based diagnostic tools for delivering reliable and understandable results. With the gene selection results, the cost of biological experiments and decisions can be greatly reduced by analyzing only the marker genes. An important application of gene expression data in functional genomics is to classify samples according to their gene expression profiles. Feature selection (FS) is a process which attempts to select more informative features. It is one of the important steps in knowledge discovery. Conventional supervised FS methods evaluate various feature subsets using an evaluation function or metric to select only those features which are related to the decision classes of the data under consideration. This paper studies a feature selection method based on rough set theory. Further, the K-Means and Fuzzy C-Means (FCM) algorithms have been implemented for the reduced feature set without considering class labels. The obtained results are then compared with the original class labels. A Back Propagation Network (BPN) has also been used for classification. The performance of K-Means, FCM and BPN is then analysed through the confusion matrix. It is found that the BPN performs comparatively well.

  20. Performance analysis of image processing algorithms for classification of natural vegetation in the mountains of southern California

    Science.gov (United States)

    Yool, S. R.; Star, J. L.; Estes, J. E.; Botkin, D. B.; Eckhardt, D. W.

    1986-01-01

    The earth's forests fix carbon from the atmosphere during photosynthesis. Scientists are concerned that massive forest removals may promote an increase in atmospheric carbon dioxide, with possible global warming and related environmental effects. Space-based remote sensing may enable the production of accurate world forest maps needed to examine this concern objectively. To test the limits of remote sensing for large-area forest mapping, we use Landsat data acquired over a site in the forested mountains of southern California to examine the relative capacities of a variety of popular image processing algorithms to discriminate different forest types. Results indicate that certain algorithms are better suited to forest classification than others. Differences in performance between the algorithms tested appear related to variations in their sensitivities to spectral variations caused by background reflectance, differential illumination, and spatial pattern by species. Results emphasize the complexity of the interactions between the land-cover regime, the remotely sensed data, and the algorithms used to process these data.

  1. Evaluation of machine learning algorithms for classification of primary biological aerosol using a new UV-LIF spectrometer

    Science.gov (United States)

    Ruske, Simon; Topping, David O.; Foot, Virginia E.; Kaye, Paul H.; Stanley, Warren R.; Crawford, Ian; Morse, Andrew P.; Gallagher, Martin W.

    2017-03-01

    Characterisation of bioaerosols has important implications within environment and public health sectors. Recent developments in ultraviolet light-induced fluorescence (UV-LIF) detectors such as the Wideband Integrated Bioaerosol Spectrometer (WIBS) and the newly introduced Multiparameter Bioaerosol Spectrometer (MBS) have allowed for the real-time collection of fluorescence, size and morphology measurements for the purpose of discriminating between bacteria, fungal spores and pollen. This new generation of instruments has enabled ever larger data sets to be compiled with the aim of studying more complex environments. In real-world data sets, particularly those from an urban environment, the population may be dominated by non-biological fluorescent interferents, bringing into question the accuracy of measurements of quantities such as concentrations. It is therefore imperative that we validate the performance of different algorithms which can be used for the task of classification. For unsupervised learning we tested hierarchical agglomerative clustering with various different linkages. For supervised learning, 11 methods were tested, including decision trees, ensemble methods (random forests, gradient boosting and AdaBoost), two implementations of support vector machines (libsvm and liblinear), Gaussian methods (Gaussian naïve Bayes, quadratic and linear discriminant analysis), the k-nearest-neighbours algorithm and artificial neural networks. The methods were applied to two different data sets produced using the new MBS, which provides multichannel UV-LIF fluorescence signatures for single airborne biological particles. The first data set contained mixed PSLs and the second contained a variety of laboratory-generated aerosol. Clustering in general performs slightly worse than the supervised learning methods, correctly classifying, at best, only 67.6 and 91.1 % for the two data sets respectively. For supervised learning the gradient boosting algorithm was

  2. Development and validation of a computerized algorithm for International Standards for Neurological Classification of Spinal Cord Injury (ISNCSCI)

    DEFF Research Database (Denmark)

    Walden, K; Bélanger, L M; Biering-Sørensen, F

    2016-01-01

    STUDY DESIGN: Validation study. OBJECTIVES: To describe the development and validation of a computerized application of the international standards for neurological classification of spinal cord injury (ISNCSCI). SETTING: Data from acute and rehabilitation care. METHODS: The Rick Hansen Institute-ISNCSCI Algorithm (RHI-ISNCSCI Algorithm) was developed based on the 2011 version of the ISNCSCI and the 2013 version of the worksheet. International experts developed the design and logic with a focus on usability and features to standardize the correct classification of challenging cases. A five-phased process ... a standardized method to accurately derive the level and severity of SCI from the raw data of the ISNCSCI examination. The web interface assists in maximizing usability while minimizing the impact of human error in classifying SCI. SPONSORSHIP: This study is sponsored by the Rick Hansen Institute and supported ...

  3. Business Analysis and Decision Making Through Unsupervised Classification of Mixed Data Type of Attributes Through Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Rohit Rastogi

    2014-01-01

    Full Text Available Grouping, or unsupervised classification, has a variety of demands, the major one being the capability of the chosen clustering approach to deal with scalability and to handle mixed varieties of data. There are several kinds of attributes: categorical/nominal, ordinal, binary (symmetric or asymmetric), and ratio- and interval-scaled variables. In the present scenario, the latest approaches to unsupervised classification are swarm optimization based, customer segmentation based, soft computing methods such as fuzzy-based and GA-based, entropy-based methods, and hierarchical approaches. These approaches have two serious bottlenecks: either they are hybrid mathematical techniques, or they demand heavy computation, which increases their complexity and hence compromises accuracy. It is easy to compare and analyze that unsupervised classification by Genetic Algorithm is feasible, suitable and efficient for high-dimensional data sets with mixed data values obtained from real-life results, events and happenings.
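
    A bare-bones sketch of GA-based unsupervised classification in the spirit described above, with a cluster-assignment chromosome and within-cluster scatter as the cost; the operators, parameters and points are invented for illustration:

```python
import random

# Hedged sketch of GA-based clustering: each chromosome assigns every point to
# one of K clusters; fitness is the within-cluster scatter (lower is better).
random.seed(1)
points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8), (8, 9), (9, 8), (9, 9)]
K = 2

def cost(assign):
    """Sum of squared distances of points to their cluster centroid."""
    total = 0.0
    for k in range(K):
        members = [p for p, a in zip(points, assign) if a == k]
        if not members:
            continue
        cx = sum(p[0] for p in members) / len(members)
        cy = sum(p[1] for p in members) / len(members)
        total += sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in members)
    return total

pop = [[random.randrange(K) for _ in points] for _ in range(40)]
initial_best = min(cost(c) for c in pop)
for _ in range(60):
    pop.sort(key=cost)
    parents = pop[:10]                       # truncation selection (elitist)
    children = []
    while len(children) < 30:
        a, b = random.sample(parents, 2)
        child = [ai if random.random() < 0.5 else bi for ai, bi in zip(a, b)]
        if random.random() < 0.3:            # mutation: reassign one point
            child[random.randrange(len(child))] = random.randrange(K)
        children.append(child)
    pop = parents + children
final_best = min(cost(c) for c in pop)
```

    Because the best chromosomes survive each generation, the best cost is monotonically non-increasing; mixed attribute types would only change the distance function inside `cost`.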

  4. Classification of Aerosol Retrievals from Spaceborne Polarimetry Using a Multi-Parameter Algorithm

    Science.gov (United States)

    Russell, P. B.; Kacenelenbogen, M. S.; Livingston, J. M.; Hasekamp, O.; Burton, S. P.; Schuster, G. L.; Redemann, J.; Ramachandran, S.; Holben, B. N.

    2013-12-01

    In this presentation we demonstrate application of a new aerosol classification algorithm to retrievals from the POLDER-3 polarimeter on the PARASOL spacecraft. Motivation and method: Since the development of global aerosol measurements by satellites and AERONET, classification of observed aerosols into several types (e.g., urban-industrial, biomass burning, mineral dust, maritime, and various subtypes or mixtures of these) has proven useful to: understanding aerosol sources, transformations, effects, and feedback mechanisms; improving accuracy of satellite retrievals; and quantifying assessments of aerosol radiative impacts on climate. With ongoing improvements in satellite measurement capability, the number of aerosol parameters retrieved from spaceborne sensors has been growing, from the initial aerosol optical depth at one or a few wavelengths to a list that now includes complex refractive index, single scattering albedo (SSA), and depolarization of backscatter, each at several wavelengths; wavelength dependences of extinction, scattering, absorption, SSA, and backscatter; and several particle size and shape parameters. Making optimal use of these varied data products requires objective, multi-dimensional analysis methods. We describe such a method, which uses a modified Mahalanobis distance to quantify how far a data point described by N aerosol parameters is from each of several prespecified classes. The method makes explicit use of uncertainties in input parameters, treating a point and its N-dimensional uncertainty as an extended data point or pseudo-cluster E. It then uses a modified Mahalanobis distance, DEC, to assign that observation to the class (cluster) C that has minimum DEC from the point (equivalently, the class to which the point has maximum probability of belonging). The method also uses Wilks' overall lambda to indicate how well the input data lend themselves to separation into classes and Wilks' partial lambda to indicate the relative
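
    The core assignment rule, minimum Mahalanobis distance over prespecified classes, can be sketched in two dimensions as follows; the paper's method additionally folds in the observation's own uncertainty, which is omitted here, and the class statistics are invented:

```python
# Minimal 2-D sketch of Mahalanobis-distance classification. The class means
# and covariances below are illustrative placeholders, not values from the
# presentation.

def mahalanobis2(x, mean, cov):
    """Squared Mahalanobis distance for a 2-D point, inverting cov by hand."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy))

# Two hypothetical aerosol classes described by (mean, covariance) of
# (Angstrom exponent, single scattering albedo).
classes = {
    "dust":    ([0.3, 0.93], [[0.04, 0.0], [0.0, 0.001]]),
    "biomass": ([1.8, 0.88], [[0.09, 0.0], [0.0, 0.002]]),
}

obs = [1.7, 0.89]  # observation to classify
label = min(classes, key=lambda k: mahalanobis2(obs, *classes[k]))
print(label)       # "biomass": the nearest class in Mahalanobis distance
```

    Dividing by the covariance makes the distance unitless, so parameters with very different scales (e.g. SSA vs. Ångström exponent) contribute comparably.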

  5. Gaussian maximum likelihood and contextual classification algorithms for multicrop classification experiments using thematic mapper and multispectral scanner sensor data

    Science.gov (United States)

    Di Zenzo, Silvano; Degloria, Stephen D.; Bernstein, R.; Kolsky, Harwood G.

    1987-01-01

    The paper presents the results of a four-factor, two-level analysis-of-variance experiment designed to evaluate the combined effect of the improved quality of remote-sensor data and the classifier's use of context on classification accuracy. The improvement achievable by using context via relaxation techniques is significantly smaller than that provided by an increase in the radiometric resolution of the sensor from 6 to 8 bits per sample (the relative increase in radiometric resolution of TM relative to MSS). It is almost equal to that achievable by an increase in spectral coverage as provided by TM relative to MSS.

  6. A global aerosol classification algorithm incorporating multiple satellite data sets of aerosol and trace gas abundances

    Directory of Open Access Journals (Sweden)

    M. J. M. Penning de Vries

    2015-09-01

    Full Text Available Detecting the optical properties of aerosols using passive satellite-borne measurements alone is a difficult task due to the broadband effect of aerosols on the measured spectra and the influences of surface and cloud reflection. We present an alternative approach to determine aerosol type, namely by studying the relationship of aerosol optical depth (AOD) with trace gas abundance, aerosol absorption, and mean aerosol size. Our new Global Aerosol Classification Algorithm, GACA, examines relationships between aerosol properties (AOD and extinction Ångström exponent from the Moderate Resolution Imaging Spectroradiometer (MODIS), UV Aerosol Index from the second Global Ozone Monitoring Experiment, GOME-2) and trace gas column densities (NO2, HCHO, SO2 from GOME-2, and CO from MOPITT, the Measurements of Pollution in the Troposphere instrument) on a monthly mean basis. First, aerosol types are separated based on size (Ångström exponent) and absorption (UV Aerosol Index); then the dominating sources are identified based on mean trace gas columns and their correlation with AOD. In this way, global maps of dominant aerosol type and main source type are constructed for each season and compared with maps of aerosol composition from the global MACC (Monitoring Atmospheric Composition and Climate) model. Although GACA cannot correctly characterize transported or mixed aerosols, GACA and MACC show good agreement regarding the global seasonal cycle, particularly for urban/industrial aerosols. The seasonal cycles of both aerosol type and source are also studied in more detail for selected 5° × 5° regions. Again, good agreement between GACA and MACC is found for all regions, but some systematic differences become apparent: the variability of aerosol composition (yearly and/or seasonal) is often not well captured by MACC, the amount of mineral dust outside of the dust belt appears to be overestimated, and the abundance of secondary organic aerosols is underestimated in
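
    The two-stage logic, separating aerosols by size and absorption and then attributing a source from co-varying trace gases, can be caricatured as a decision rule; all thresholds below are invented placeholders, not values from GACA:

```python
# Schematic sketch of GACA-style two-stage typing. Thresholds and class names
# are illustrative assumptions only.

def classify_aerosol(angstrom, uv_ai, no2, co):
    size = "fine" if angstrom > 1.0 else "coarse"   # stage 1: size
    absorbing = uv_ai > 1.0                          # stage 1: absorption
    if size == "coarse" and absorbing:
        return "mineral dust"
    if size == "fine" and absorbing:
        # stage 2: attribute a source by which trace gas tracks the aerosol
        return "biomass burning" if co > no2 else "urban/industrial"
    return "weakly absorbing (e.g. maritime)"

print(classify_aerosol(0.4, 2.1, 0.2, 0.1))  # coarse + absorbing -> mineral dust
```

    The real algorithm works with monthly mean columns and their correlation with AOD rather than single-scene thresholds.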

  7. A global aerosol classification algorithm incorporating multiple satellite data sets of aerosol and trace gas abundances

    Directory of Open Access Journals (Sweden)

    M. J. M. Penning de Vries

    2015-05-01

    Full Text Available Detecting the optical properties of aerosols using passive satellite-borne measurements alone is a difficult task due to the broadband effect of aerosols on the measured spectra and the influences of surface and cloud reflection. We present an alternative approach to determine aerosol type, namely by studying the relationship of aerosol optical depth (AOD) with trace gas abundance, aerosol absorption, and mean aerosol size. Our new Global Aerosol Classification Algorithm, GACA, examines relationships between aerosol properties (AOD and extinction Ångström exponent from the Moderate Resolution Imaging Spectroradiometer (MODIS), UV Aerosol Index from the second Global Ozone Monitoring Experiment, GOME-2) and trace gas column densities (NO2, HCHO, SO2 from GOME-2, and CO from MOPITT, the Measurements of Pollution in the Troposphere instrument) on a monthly mean basis. First, aerosol types are separated based on size (Ångström exponent) and absorption (UV Aerosol Index); then the dominating sources are identified based on mean trace gas columns and their correlation with AOD. In this way, global maps of dominant aerosol type and main source type are constructed for each season and compared with maps of aerosol composition from the global MACC (Monitoring Atmospheric Composition and Climate) model. Although GACA cannot correctly characterize transported or mixed aerosols, GACA and MACC show good agreement regarding the global seasonal cycle, particularly for urban/industrial aerosols. The seasonal cycles of both aerosol type and source are also studied in more detail for selected 5° × 5° regions. Again, good agreement between GACA and MACC is found for all regions, but some systematic differences become apparent: the variability of aerosol composition (yearly and/or seasonal) is often not well captured by MACC, the amount of mineral dust outside of the dust belt appears to be overestimated, and the abundance of secondary organic aerosols is underestimated

  8. Classification-based summation of cerebral digital subtraction angiography series for image post-processing algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Schuldhaus, D; Spiegel, M; Polyanskaya, M; Hornegger, J [Pattern Recognition Lab, University Erlangen-Nuremberg (Germany); Redel, T [Siemens AG Healthcare Sector, Forchheim (Germany); Struffert, T; Doerfler, A, E-mail: martin.spiegel@informatik.uni-erlangen.de [Department of Neuroradiology, University Erlangen-Nuremberg (Germany)

    2011-03-21

    X-ray-based 2D digital subtraction angiography (DSA) plays a major role in the diagnosis, treatment planning and assessment of cerebrovascular disease, i.e. aneurysms, arteriovenous malformations and intracranial stenosis. DSA information is increasingly used for secondary image post-processing such as vessel segmentation, registration and comparison to hemodynamic calculation using computational fluid dynamics. Depending on the amount of injected contrast agent and the duration of injection, these DSA series may not exhibit one single DSA image showing the entire vessel tree. The interesting information for these algorithms, however, is usually depicted within a few images. If these images were combined into one image, the complexity of segmentation or registration methods using DSA series would decrease drastically. In this paper, we propose a novel method that automatically splits a DSA series into three parts, i.e. mask, arterial and parenchymal phase, to provide one final image showing all important vessels with less noise and fewer motion artifacts. This final image covers all arterial-phase images, either by image summation or by taking the minimum intensities. The phase classification is done by a two-step approach. The mask/arterial phase border is determined by a Perceptron-based method trained on a set of DSA series. The arterial/parenchymal phase border is specified by a threshold-based method. The evaluation of the proposed method is two-sided: (1) comparison between automatic and medical-expert-based phase selection and (2) the quality of the final image, measured by gradient magnitudes inside the vessels and the signal-to-noise ratio (SNR) outside. Experimental results show a match between expert and automatic phase separation of 93%/50% and an average SNR increase of up to 182% compared to summing up the entire series.
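
    The final-image step, combining the classified arterial-phase frames by summation or per-pixel minimum, reduces to a few lines; the toy 2x3 "frames" below stand in for real subtracted X-ray images:

```python
# Sketch of the combination step described above: given the frames classified
# as arterial phase, build one final image by per-pixel minimum (vessels are
# dark in subtracted frames) or by summation.

def min_combine(frames):
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[min(f[r][c] for f in frames) for c in range(cols)]
            for r in range(rows)]

def sum_combine(frames):
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) for c in range(cols)]
            for r in range(rows)]

arterial = [
    [[9, 9, 1], [9, 9, 9]],   # frame 1: proximal vessel segment filled
    [[9, 9, 9], [1, 9, 9]],   # frame 2: distal vessel segment filled
]
print(min_combine(arterial))  # [[9, 9, 1], [1, 9, 9]] - both segments visible
```

    Neither single frame shows the whole vessel; the minimum projection keeps the darkest (contrast-filled) value each pixel ever takes.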

  9. Classification-based summation of cerebral digital subtraction angiography series for image post-processing algorithms

    Science.gov (United States)

    Schuldhaus, D.; Spiegel, M.; Redel, T.; Polyanskaya, M.; Struffert, T.; Hornegger, J.; Doerfler, A.

    2011-03-01

    X-ray-based 2D digital subtraction angiography (DSA) plays a major role in the diagnosis, treatment planning and assessment of cerebrovascular disease, i.e. aneurysms, arteriovenous malformations and intracranial stenosis. DSA information is increasingly used for secondary image post-processing such as vessel segmentation, registration and comparison to hemodynamic calculation using computational fluid dynamics. Depending on the amount of injected contrast agent and the duration of injection, these DSA series may not exhibit one single DSA image showing the entire vessel tree. The interesting information for these algorithms, however, is usually depicted within a few images. If these images were combined into one image, the complexity of segmentation or registration methods using DSA series would decrease drastically. In this paper, we propose a novel method that automatically splits a DSA series into three parts, i.e. mask, arterial and parenchymal phase, to provide one final image showing all important vessels with less noise and fewer motion artifacts. This final image covers all arterial-phase images, either by image summation or by taking the minimum intensities. The phase classification is done by a two-step approach. The mask/arterial phase border is determined by a Perceptron-based method trained on a set of DSA series. The arterial/parenchymal phase border is specified by a threshold-based method. The evaluation of the proposed method is two-sided: (1) comparison between automatic and medical-expert-based phase selection and (2) the quality of the final image, measured by gradient magnitudes inside the vessels and the signal-to-noise ratio (SNR) outside. Experimental results show a match between expert and automatic phase separation of 93%/50% and an average SNR increase of up to 182% compared to summing up the entire series.

  10. Novel round-robin tabu search algorithm for prostate cancer classification and diagnosis using multispectral imagery.

    Science.gov (United States)

    Tahir, Muhammad Atif; Bouridane, Ahmed

    2006-10-01

    Quantitative cell imagery in cancer pathology has progressed greatly in the last 25 years. The application areas are mainly those in which the diagnosis is still critically reliant upon the analysis of biopsy samples, which remains the only conclusive method for making an accurate diagnosis of the disease. Biopsies are usually analyzed by a trained pathologist who, by analyzing the biopsies under a microscope, assesses the normality or malignancy of the samples submitted. Different grades of malignancy correspond to different structural patterns as well as to apparent textures. In the case of prostate cancer, four major groups have to be recognized: stroma, benign prostatic hyperplasia, prostatic intraepithelial neoplasia, and prostatic carcinoma. Recently, multispectral imagery has been used to solve this multiclass problem. Unlike conventional RGB color space, multispectral images allow the acquisition of a large number of spectral bands within the visible spectrum, resulting in a large feature vector size. For such a high dimensionality, pattern recognition techniques suffer from the well-known "curse-of-dimensionality" problem. This paper proposes a novel round-robin tabu search (RR-TS) algorithm to address the curse-of-dimensionality for this multiclass problem. The experiments have been carried out on a number of prostate cancer textured multispectral images, and the results obtained have been assessed and compared with previously reported works. The system achieved 98%-100% classification accuracy when testing on two datasets. It outperformed principal component/linear discriminant classifier (PCA-LDA), tabu search/nearest neighbor classifier (TS-1NN), and bagging/boosting with decision tree (C4.5) classifier.

  11. Classification-based summation of cerebral digital subtraction angiography series for image post-processing algorithms.

    Science.gov (United States)

    Schuldhaus, D; Spiegel, M; Redel, T; Polyanskaya, M; Struffert, T; Hornegger, J; Doerfler, A

    2011-03-21

    X-ray-based 2D digital subtraction angiography (DSA) plays a major role in the diagnosis, treatment planning and assessment of cerebrovascular disease, i.e. aneurysms, arteriovenous malformations and intracranial stenosis. DSA information is increasingly used for secondary image post-processing such as vessel segmentation, registration and comparison to hemodynamic calculation using computational fluid dynamics. Depending on the amount of injected contrast agent and the duration of injection, these DSA series may not exhibit one single DSA image showing the entire vessel tree. The interesting information for these algorithms, however, is usually depicted within a few images. If these images were combined into one image, the complexity of segmentation or registration methods using DSA series would decrease drastically. In this paper, we propose a novel method that automatically splits a DSA series into three parts, i.e. mask, arterial and parenchymal phase, to provide one final image showing all important vessels with less noise and fewer motion artifacts. This final image covers all arterial-phase images, either by image summation or by taking the minimum intensities. The phase classification is done by a two-step approach. The mask/arterial phase border is determined by a Perceptron-based method trained on a set of DSA series. The arterial/parenchymal phase border is specified by a threshold-based method. The evaluation of the proposed method is two-sided: (1) comparison between automatic and medical-expert-based phase selection and (2) the quality of the final image, measured by gradient magnitudes inside the vessels and the signal-to-noise ratio (SNR) outside. Experimental results show a match between expert and automatic phase separation of 93%/50% and an average SNR increase of up to 182% compared to summing up the entire series.

  12. Multispectral imaging burn wound tissue classification system: a comparison of test accuracies between several common machine learning algorithms

    Science.gov (United States)

    Squiers, John J.; Li, Weizhi; King, Darlene R.; Mo, Weirong; Zhang, Xu; Lu, Yang; Sellke, Eric W.; Fan, Wensheng; DiMaio, J. Michael; Thatcher, Jeffrey E.

    2016-03-01

    The clinical judgment of expert burn surgeons is currently the standard on which diagnostic and therapeutic decision-making regarding burn injuries is based. Multispectral imaging (MSI) has the potential to increase the accuracy of burn depth assessment and the intraoperative identification of viable wound bed during surgical debridement of burn injuries. A highly accurate classification model must be developed using machine-learning techniques in order to translate MSI data into clinically relevant information. An animal burn model was developed to build an MSI training database and to study the burn tissue classification ability of several models trained via common machine-learning algorithms. The algorithms tested, from least to most complex, were: K-nearest neighbors (KNN), decision tree (DT), linear discriminant analysis (LDA), weighted linear discriminant analysis (W-LDA), quadratic discriminant analysis (QDA), ensemble linear discriminant analysis (EN-LDA), ensemble K-nearest neighbors (EN-KNN), and ensemble decision tree (EN-DT). After the ground-truth database of six tissue types (healthy skin, wound bed, blood, hyperemia, partial injury, full injury) was generated by histopathological analysis, we used 10-fold cross validation to compare the algorithms' performances based on their accuracies in classifying data against the ground truth, and each algorithm was tested 100 times. The mean test accuracies of the algorithms were KNN 68.3%, DT 61.5%, LDA 70.5%, W-LDA 68.1%, QDA 68.9%, EN-LDA 56.8%, EN-KNN 49.7%, and EN-DT 36.5%. LDA had the highest test accuracy, reflecting the bias-variance tradeoff over the range of complexities inherent to the algorithms tested. Several algorithms were able to match the current standard in burn tissue classification, the clinical judgment of expert burn surgeons. These results will guide further development of an MSI burn tissue classification system. Given that there are few surgeons and facilities specializing in burn care
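
    The evaluation protocol above, 10-fold cross validation with a mean test accuracy, can be sketched as follows; the stand-in "classifier" is a majority vote on the training folds, not one of the paper's MSI models:

```python
# Sketch of k-fold cross-validation: partition the indices into k folds, train
# on k-1 folds, test on the held-out fold, and average the test accuracies.

def k_fold_indices(n, k):
    """Split range(n) into k contiguous, near-equal folds."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

labels = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0] * 3   # 30 toy ground-truth labels
folds = k_fold_indices(len(labels), 10)

accuracies = []
for test in folds:
    train = [i for i in range(len(labels)) if i not in test]
    # stand-in classifier: predict the majority label of the training folds
    majority = max(set(labels[i] for i in train),
                   key=[labels[i] for i in train].count)
    correct = sum(labels[i] == majority for i in test)
    accuracies.append(correct / len(test))
mean_acc = sum(accuracies) / len(accuracies)
```

    Every sample is tested exactly once, so the mean accuracy here equals the majority label's overall frequency (0.6 for this toy label set).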

  13. Study on the multi-feature remote sensing data classification based on ACO rule mining algorithm

    Institute of Scientific and Technical Information of China (English)

    戴芹; 刘建波

    2009-01-01

    Ant colony optimization, a novel intelligent optimization algorithm, has been applied successfully in many fields, yet using it for remote sensing data processing is a new research topic. The ant colony rule mining algorithm classifies data by mining classification rules and can handle multi-feature data. This paper therefore applies the ant colony rule mining algorithm to multi-feature remote sensing data classification, using Landsat TM and Envisat ASAR data over the Beijing area as experimental data. Comparing the experimental results with the maximum likelihood classifier and the C4.5 method shows that: (1) the ant colony rule mining algorithm is a parameter-free intelligent classification method with good robustness; (2) it mines relatively simple classification rules; and (3) it makes full use of multi-source remote sensing data, so it can exploit multi-feature data for land-cover classification and thereby improve classification efficiency. Remote sensing data classification is an important source of land cover maps, and remote sensing research focusing on image classification has long attracted the attention of the remote sensing community. For several decades remote sensing data classification technology has achieved a great deal, but with increasingly multi-source and multi-dimensional data, the conventional remote sensing data classification methods based on statistical theory have some weaknesses. For instance, when the remote sensing data do not obey the pre-assumption of normal distribution, the classification result using the Maximum Likelihood Classifier (MLC) will deviate from the actual situation, and the classification accuracy will not be satisfactory. So in recent years, many artificial intelligence techniques have been applied to remote sensing data classification, aiming to reduce the undesired limitations of the conventional classification methods. Ant colony algorithm as a novel intelligent optimization algorithm has been used successfully in many fields, but its application in remote sensing data processing is a new research topic. Due to the ant colony rule mining

  14. Prediction of S-Nitrosylation Modification Sites Based on Kernel Sparse Representation Classification and mRMR Algorithm

    Directory of Open Access Journals (Sweden)

    Guohua Huang

    2014-01-01

    Full Text Available Protein S-nitrosylation plays a very important role in a wide variety of cellular biological activities. Hitherto, accurate prediction of S-nitrosylation sites remains a great challenge. In this paper, we present a framework to computationally predict S-nitrosylation sites based on kernel sparse representation classification and the minimum Redundancy Maximum Relevance (mRMR) algorithm. As many as 666 features derived from five categories of amino acid properties and one protein structure feature are used for numerical representation of proteins. A total of 529 protein sequences collected from open-access databases and published literature are used to train and test our predictor. Computational results show that our predictor achieves Matthews' correlation coefficients of 0.1634 and 0.2919 for the training set and the testing set, respectively, which are better than those of the k-nearest neighbor algorithm, the random forest algorithm, and the sparse representation classification algorithm. The experimental results also indicate that 134 optimal features can better represent the peptides of protein S-nitrosylation than the original 666 redundant features. Furthermore, we constructed an independent testing set of 113 protein sequences to evaluate the robustness of our predictor. Experimental results showed that our predictor also yielded good performance on the independent testing set, with a Matthews' correlation coefficient of 0.2239.
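
    A minimal sketch of the mRMR idea referenced above, greedily picking features that maximize relevance to the label minus mean redundancy with the already-selected features, on an invented discrete toy dataset:

```python
import math

# Hedged sketch of greedy mRMR feature selection with discrete mutual
# information. The tiny dataset below is invented for illustration and has
# nothing to do with the paper's 666 protein features.

def mutual_information(a, b):
    n = len(a)
    pa, pb, pab = {}, {}, {}
    for x, y in zip(a, b):
        pa[x] = pa.get(x, 0) + 1 / n
        pb[y] = pb.get(y, 0) + 1 / n
        pab[(x, y)] = pab.get((x, y), 0) + 1 / n
    return sum(p * math.log2(p / (pa[x] * pb[y]))
               for (x, y), p in pab.items())

def mrmr(features, y, k):
    selected = []
    while len(selected) < k:
        def score(j):
            rel = mutual_information(features[j], y)   # relevance MI(f_j; y)
            if not selected:
                return rel
            red = sum(mutual_information(features[j], features[s])
                      for s in selected) / len(selected)  # mean redundancy
            return rel - red
        candidates = [j for j in range(len(features)) if j not in selected]
        selected.append(max(candidates, key=score))
    return selected

y  = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [0, 0, 0, 0, 1, 1, 1, 1]   # perfectly informative
f1 = [0, 1, 0, 1, 0, 1, 0, 1]   # uninformative
f2 = [0, 0, 0, 1, 1, 1, 1, 1]   # noisy copy of f0 (largely redundant)
selected = mrmr([f0, f1, f2], y, 2)
print(selected[0])               # 0: the most relevant feature is picked first
```

    The redundancy term is what keeps the near-duplicate `f2` from being favored once `f0` is in the selected set.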

  15. A Spectral Signature Shape-Based Algorithm for Landsat Image Classification

    Directory of Open Access Journals (Sweden)

    Yuanyuan Chen

    2016-08-01

    Full Text Available Land-cover datasets are crucial for earth system modeling and human-nature interaction research at local, regional and global scales. They can be obtained from remotely sensed data using image classification methods. However, in image classification, spectral values have received considerable attention in most classification methods, while the shape of the spectral curve has seldom been used because it is difficult to quantify. This study presents a classification method based on the observation that the spectral curve is composed of segments and certain extreme values. The presented classification method quantifies the spectral curve shape and makes full use of the spectral shape differences among land covers to classify remotely sensed images. Using this method, classification maps from TM (Thematic Mapper) data were obtained with overall accuracies of 0.834 and 0.854 for the two respective test areas. The approach presented in this paper, which differs from previous image classification methods that were mostly concerned with spectral "value" similarity characteristics, emphasizes the "shape" similarity characteristics of the spectral curve. Moreover, this study will be helpful for classification research on hyperspectral and multi-temporal images.
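
    One crude way to quantify "shape", encoding the spectral curve as the rise/fall sign of each segment and matching against reference curves, is sketched below; the band values and classes are invented, and the paper's actual quantification is richer:

```python
# Illustrative sketch of shape-based matching: encode a spectral curve by the
# sign of each band-to-band segment, then classify a pixel by the reference
# curve whose shape code agrees most. Reflectance values are invented.

def shape_code(curve):
    """+1 rising, -1 falling, 0 flat, for each segment of the spectral curve."""
    return [(v2 > v1) - (v2 < v1) for v1, v2 in zip(curve, curve[1:])]

def shape_similarity(a, b):
    ca, cb = shape_code(a), shape_code(b)
    return sum(x == y for x, y in zip(ca, cb)) / len(ca)

vegetation_ref = [0.05, 0.08, 0.04, 0.45, 0.30, 0.20]  # red dip, NIR peak
water_ref      = [0.10, 0.08, 0.06, 0.03, 0.02, 0.01]  # monotonically falling

pixel = [0.06, 0.09, 0.05, 0.40, 0.28, 0.22]
label = max(["vegetation", "water"],
            key=lambda name: shape_similarity(
                pixel, vegetation_ref if name == "vegetation" else water_ref))
print(label)  # "vegetation": the pixel's rise/fall pattern matches the NIR peak
```

    Note the pixel's absolute values differ from the reference, but its segment signs match the vegetation curve exactly, which is the point of a shape-based rather than value-based comparison.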

  16. Data classification algorithm on multi-manifold

    Institute of Scientific and Technical Information of China (English)

    符茂胜; 罗斌; 孔敏; 刘仁金

    2011-01-01

    Unlike most traditional manifold-based data classification algorithms, which assume that all data points lie on a single manifold, the proposed approach supposes that data from multiple classes may reside on different manifolds. A data classification algorithm on multiple manifolds is presented. The algorithm divides roughly into two steps: a learning process and a testing process. In the learning process, a manifold is first learned for each class separately using linear manifold learning, yielding the low-dimensional coordinates and the mapping matrix of the training data. In the testing process, classification is performed using the minimum reconstruction error between a test point and its k nearest neighbors in the embedding space. Experimental results on both synthetic data and the COIL-20 database show the effectiveness of the proposed algorithm.
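
    The testing step, assigning a point to the class whose neighbours reconstruct it best, can be sketched with a simplification: here reconstruction is just the centroid of the k nearest training points per class, whereas the paper solves a least-squares reconstruction in the learned embedding space:

```python
# Simplified sketch of classification by minimum reconstruction error. Each
# class keeps its own training points (its "manifold"); a test point goes to
# the class whose k nearest points reconstruct it with the smallest error.
# The data and class names are invented toys.

def reconstruction_error(x, class_points, k=2):
    neighbours = sorted(class_points,
                        key=lambda p: sum((a - b) ** 2
                                          for a, b in zip(p, x)))[:k]
    centroid = [sum(c) / k for c in zip(*neighbours)]
    return sum((a - b) ** 2 for a, b in zip(centroid, x))

train = {
    "line":   [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],   # points near y = x
    "blob":   [(5.0, 0.0), (5.0, 1.0), (6.0, 0.0)],   # a second "manifold"
}
x = (1.4, 1.6)
label = min(train, key=lambda c: reconstruction_error(x, train[c]))
print(label)  # "line"
```

    Replacing the centroid with a least-squares combination of the neighbours (as in the paper) lets the reconstruction follow the local manifold direction instead of collapsing to a point average.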

  17. Comparison of Clustering Algorithms using Neural Network Classifier for Satellite Image Classification

    Directory of Open Access Journals (Sweden)

    S.Praveena

    2015-06-01

    Full Text Available This paper presents a hybrid clustering algorithm and a feed-forward neural network classifier for land-cover mapping of trees, shade, buildings and roads. It starts with a single-step preprocessing procedure to make the image suitable for segmentation. The pre-processed image is segmented using a hybrid genetic-Artificial Bee Colony (ABC) algorithm, developed by hybridizing ABC and FCM to obtain effective segmentation of the satellite image, and is then classified using a neural network. The performance of the proposed hybrid algorithm is compared with algorithms such as k-means, Fuzzy C-Means (FCM), Moving K-Means, the Artificial Bee Colony (ABC) algorithm, the ABC-GA algorithm, Moving KFCM and the KFCM algorithm.

  18. An Automated Cropland Classification Algorithm (ACCA for Tajikistan by Combining Landsat, MODIS, and Secondary Data

    Directory of Open Access Journals (Sweden)

    Prasad S. Thenkabail

    2012-09-01

    Full Text Available The overarching goal of this research was to develop and demonstrate an automated Cropland Classification Algorithm (ACCA) that will rapidly, routinely, and accurately classify agricultural cropland extent, areas, and characteristics (e.g., irrigated vs. rainfed) over large areas such as a country or a region through combination of multi-sensor remote sensing and secondary data. In this research, a rule-based ACCA was conceptualized, developed, and demonstrated for the country of Tajikistan using mega file data cubes (MFDCs) involving data from Landsat Global Land Survey (GLS), Landsat Enhanced Thematic Mapper Plus (ETM+) 30 m, Moderate Resolution Imaging Spectroradiometer (MODIS) 250 m time-series, a suite of secondary data (e.g., elevation, slope, precipitation, temperature), and in situ data. First, the process involved producing an accurate reference (or truth) cropland layer (TCL), consisting of cropland extent, areas, and irrigated vs. rainfed cropland areas, for the entire country of Tajikistan based on the MFDC of year 2005 (MFDC2005). The methods involved in producing the TCL included ISOCLASS clustering, Tasseled Cap bi-spectral plots, spectro-temporal characteristics from MODIS 250 m monthly normalized difference vegetation index (NDVI) maximum value composite (MVC) time-series, and textural characteristics of higher resolution imagery. The TCL statistics accurately matched the national statistics of Tajikistan for irrigated and rainfed croplands, where about 70% of croplands were irrigated and the rest rainfed. Second, a rule-based ACCA was developed to replicate the TCL accurately (~80% producer’s and user’s accuracies, or within 20% quantity disagreement, involving about 10 million Landsat 30 m sized cropland pixels of Tajikistan). Development of ACCA was an iterative process involving a series of rules that are coded, refined, tweaked, and re-coded till ACCA-derived croplands (ACLs) match accurately with TCLs. Third, the ACCA derived

  19. An Automated Cropland Classification Algorithm (ACCA) for Tajikistan by combining Landsat, MODIS, and secondary data

    Science.gov (United States)

    Thenkabail, Prasad S.; Wu, Zhuoting

    2012-01-01

    The overarching goal of this research was to develop and demonstrate an automated Cropland Classification Algorithm (ACCA) that will rapidly, routinely, and accurately classify agricultural cropland extent, areas, and characteristics (e.g., irrigated vs. rainfed) over large areas such as a country or a region through combination of multi-sensor remote sensing and secondary data. In this research, a rule-based ACCA was conceptualized, developed, and demonstrated for the country of Tajikistan using mega file data cubes (MFDCs) involving data from Landsat Global Land Survey (GLS), Landsat Enhanced Thematic Mapper Plus (ETM+) 30 m, Moderate Resolution Imaging Spectroradiometer (MODIS) 250 m time-series, a suite of secondary data (e.g., elevation, slope, precipitation, temperature), and in situ data. First, the process involved producing an accurate reference (or truth) cropland layer (TCL), consisting of cropland extent, areas, and irrigated vs. rainfed cropland areas, for the entire country of Tajikistan based on MFDC of year 2005 (MFDC2005). The methods involved in producing TCL included using ISOCLASS clustering, Tasseled Cap bi-spectral plots, spectro-temporal characteristics from MODIS 250 m monthly normalized difference vegetation index (NDVI) maximum value composites (MVC) time-series, and textural characteristics of higher resolution imagery. The TCL statistics accurately matched with the national statistics of Tajikistan for irrigated and rainfed croplands, where about 70% of croplands were irrigated and the rest rainfed. Second, a rule-based ACCA was developed to replicate the TCL accurately (~80% producer’s and user’s accuracies or within 20% quantity disagreement involving about 10 million Landsat 30 m sized cropland pixels of Tajikistan). Development of ACCA was an iterative process involving series of rules that are coded, refined, tweaked, and re-coded till ACCA derived croplands (ACLs) match accurately with TCLs. Third, the ACCA derived cropland

  20. Development and comparative assessment of Raman spectroscopic classification algorithms for lesion discrimination in stereotactic breast biopsies with microcalcifications.

    Science.gov (United States)

    Dingari, Narahara Chari; Barman, Ishan; Saha, Anushree; McGee, Sasha; Galindo, Luis H; Liu, Wendy; Plecha, Donna; Klein, Nina; Dasari, Ramachandra Rao; Fitzmaurice, Maryann

    2013-04-01

    Microcalcifications are an early mammographic sign of breast cancer and a target for stereotactic breast needle biopsy. Here, we develop and compare different approaches for developing Raman classification algorithms to diagnose invasive and in situ breast cancer, fibrocystic change and fibroadenoma that can be associated with microcalcifications. In this study, Raman spectra were acquired from tissue cores obtained from fresh breast biopsies and analyzed using a constituent-based breast model. Diagnostic algorithms based on the breast model fit coefficients were devised using logistic regression, C4.5 decision tree classification, k-nearest neighbor (k-NN) and support vector machine (SVM) analysis, and subjected to leave-one-out cross validation. The best performing algorithm was based on SVM analysis (with radial basis function), which yielded a positive predictive value of 100% and negative predictive value of 96% for cancer diagnosis. Importantly, these results demonstrate that Raman spectroscopy provides adequate diagnostic information for lesion discrimination even in the presence of microcalcifications, which to the best of our knowledge has not been previously reported.
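The positive and negative predictive values reported above are simple confusion-matrix ratios. A sketch with made-up binary labels (1 = cancer, 0 = benign; the example labels are assumptions, not the study's data):

```python
import numpy as np

def ppv_npv(y_true, y_pred):
    """Positive predictive value TP/(TP+FP) and
    negative predictive value TN/(TN+FN)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return tp / (tp + fp), tn / (tn + fn)

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]   # one false positive
ppv, npv = ppv_npv(y_true, y_pred)
```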

  1. Land-cover classification in a moist tropical region of Brazil with Landsat TM imagery

    OpenAIRE

    Li, Guiying; Lu, Dengsheng; MORAN, EMILIO; Hetrick, Scott

    2011-01-01

    This research aims to improve land-cover classification accuracy in a moist tropical region in Brazil by examining the use of different remote sensing-derived variables and classification algorithms. Different scenarios based on Landsat Thematic Mapper (TM) spectral data and derived vegetation indices and textural images, and different classification algorithms – maximum likelihood classification (MLC), artificial neural network (ANN), classification tree analysis (CTA), and object-based clas...

  2. Automated Classification and Correlation of Drill Cores using High-Resolution Hyperspectral Images and Supervised Pattern Classification Algorithms. Applications to Paleoseismology

    Science.gov (United States)

    Ragona, D. E.; Minster, B.; Rockwell, T.; Jasso, H.

    2006-12-01

    The standard methodology to describe, classify and correlate geologic materials in the field or lab relies on physical inspection of samples, sometimes with the assistance of conventional analytical techniques (e.g. XRD, microscopy, particle size analysis). This is commonly both time-consuming and inherently subjective. Many geological materials share identical visible properties (e.g. fine grained materials, alteration minerals) and therefore cannot be mapped using the human eye alone. Recent investigations have shown that ground-based hyperspectral imaging provides an effective method to study and digitally store stratigraphic and structural data from cores or field exposures. Neural networks and naive Bayesian classifiers supply a variety of well-established techniques for pattern recognition, especially for examples with high-dimensional inputs and outputs. In this poster, we present a new methodology for automatic mapping of sedimentary stratigraphy in the lab (drill cores, samples) or the field (outcrops, exposures) using short wave infrared (SWIR) hyperspectral images and these two supervised classification algorithms. High spatial/spectral resolution data from large sediment samples (drill cores) from a paleoseismic excavation site were collected using a portable hyperspectral scanner with 245 continuous channels measured across the 960 to 2404 nm spectral range. The data were corrected for geometric and radiometric distortions and pre-processed to obtain reflectance at each pixel of the images. We built an example set using hundreds of reflectance spectra collected from the sediment core images. The examples were grouped into eight classes corresponding to materials found in the samples. We constructed two additional example sets by computing the 2-norm normalization and the derivative of the smoothed original reflectance examples. Each example set was divided into four subsets: training, training test, verification and validation. A multi

  3. Feasibility of Genetic Algorithm for Textile Defect Classification Using Neural Network

    Directory of Open Access Journals (Sweden)

    Md. Tarek Habib

    2012-08-01

    Full Text Available The global market for the textile industry is highly competitive nowadays. Quality control in the production process in the textile industry has been a key factor for retaining existence in such a competitive market. Automated textile inspection systems are very useful in this respect, because manual inspection is time-consuming and not accurate enough. Hence, automated textile inspection systems have been drawing plenty of attention from researchers of different countries in order to replace manual inspection. Defect detection and defect classification are the two major problems posed by the research of automated textile inspection systems. In this paper, we perform an extensive investigation on the applicability of the genetic algorithm (GA) in the context of textile defect classification using a neural network (NN). We observe the effect of tuning different network parameters and explain the reasons. We empirically find a suitable NN model in the context of textile defect classification. We compare the performance of this model with that of the classification models implemented by others.

  4. TEXTURE BASED LAND COVER CLASSIFICATION ALGORITHM USING GABOR WAVELET AND ANFIS CLASSIFIER

    Directory of Open Access Journals (Sweden)

    S. Jenicka

    2016-05-01

    Full Text Available Texture features play a predominant role in land cover classification of remotely sensed images. In this study, Gabor wavelets have been used for extracting texture features from a data-intensive remotely sensed image. The Gabor wavelet transform filters frequency components of an image through decomposition and produces useful features. For classification of fuzzy land cover patterns in the remotely sensed image, an Adaptive Neuro Fuzzy Inference System (ANFIS) has been used. The strength of the ANFIS classifier is that it combines the merits of fuzzy logic and neural networks. Hence, in this article, land cover classification of a remotely sensed image has been performed using Gabor wavelets and an ANFIS classifier. The classification accuracy of the classified image is found to be 92.8%.
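A Gabor filter of the kind used above for texture extraction is a Gaussian envelope modulated by a sinusoidal carrier. The sketch below (toy stripe image and all parameter values are assumptions) shows that the filter whose carrier is oriented along the stripes' frequency direction responds most strongly.

```python
import numpy as np

def gabor_kernel(ksize=15, theta=0.0, lam=4.0, sigma=3.0, gamma=0.5):
    """Real Gabor kernel: Gaussian envelope times a cosine carrier at angle theta."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def gabor_features(img, thetas=(0, np.pi/4, np.pi/2, 3*np.pi/4)):
    """Mean |response| of a small filter bank: one texture feature per orientation."""
    feats = []
    for th in thetas:
        k = gabor_kernel(theta=th)
        # 'valid' 2-D correlation via sliding windows (avoids a SciPy dependency)
        win = np.lib.stride_tricks.sliding_window_view(img, k.shape)
        resp = (win * k).sum(axis=(-2, -1))
        feats.append(np.abs(resp).mean())
    return np.array(feats)

# vertical stripes (intensity varies along x) match the theta = 0 carrier
img = np.tile((np.arange(32) % 4 < 2).astype(float), (32, 1))
f = gabor_features(img)
```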

  5. Hierarchical, multi-sensor based classification of daily life activities: comparison with state-of-the-art algorithms using a benchmark dataset.

    Science.gov (United States)

    Leutheuser, Heike; Schuldhaus, Dominik; Eskofier, Bjoern M

    2013-01-01

    Insufficient physical activity is the 4th leading risk factor for mortality. Methods for assessing individual daily life activity (DLA) are of major interest in order to monitor current health status and to provide feedback about individual quality of life. The conventional assessment of DLAs with self-reports induces problems of reliability, validity, and sensitivity. The assessment of DLAs with small and lightweight wearable sensors (e.g. inertial measurement units) provides a reliable and objective method. State-of-the-art human physical activity classification systems differ in, e.g., the number and kind of sensors, the performed activities, and the sampling rate. Hence, it is difficult to compare newly proposed classification algorithms to existing approaches in the literature, and no commonly used dataset exists. We generated a publicly available benchmark dataset for the classification of DLAs. Inertial data were recorded with four sensor nodes, each consisting of a triaxial accelerometer and a triaxial gyroscope, placed on the wrist, hip, chest, and ankle. Further, we developed a novel, hierarchical, multi-sensor based classification system for the distinction of a large set of DLAs. Our hierarchical classification system reached an overall mean classification rate of 89.6% and was diligently compared to existing state-of-the-art algorithms using our benchmark dataset. For future research, the dataset can be used in the evaluation process of new classification algorithms and could speed up the process of finding the best performing and most appropriate DLA classification system.

  6. DOA Estimation of Low Altitude Target Based on Adaptive Step Glowworm Swarm Optimization-multiple Signal Classification Algorithm

    Directory of Open Access Journals (Sweden)

    Zhou Hao

    2015-06-01

    Full Text Available The traditional MUltiple SIgnal Classification (MUSIC) algorithm requires significant computational effort and cannot be employed for the Direction Of Arrival (DOA) estimation of targets in a low-altitude multipath environment. As such, a novel MUSIC approach is proposed on the basis of the Adaptive Step Glowworm Swarm Optimization (ASGSO) algorithm. The virtual spatial smoothing of the matrix formed by each snapshot is used to realize the decorrelation of the multipath signal and the establishment of a full-order correlation matrix. ASGSO optimizes the function and estimates the elevation of the target. The simulation results suggest that the proposed method can overcome the low-altitude multipath effect and estimate the DOA of the target readily and precisely without loss of effective radar aperture.
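For reference, the conventional MUSIC pseudospectrum that the swarm search optimizes can be computed directly on a small angle grid. A numpy sketch for a uniform linear array with one simulated uncorrelated source (array geometry, SNR, and all parameters are assumptions; the paper's multipath/spatial-smoothing setup is not reproduced here):

```python
import numpy as np

def music_spectrum(X, n_sources, angles_deg, d=0.5):
    """Classical MUSIC pseudospectrum for a uniform linear array.
    X: (n_sensors, n_snapshots) complex data; d: element spacing in wavelengths."""
    m = X.shape[0]
    R = X @ X.conj().T / X.shape[1]             # sample covariance
    eigval, eigvec = np.linalg.eigh(R)          # eigenvalues ascending
    En = eigvec[:, : m - n_sources]             # noise subspace
    p = []
    for a in np.deg2rad(angles_deg):
        s = np.exp(-2j * np.pi * d * np.arange(m) * np.sin(a))  # steering vector
        p.append(1.0 / np.linalg.norm(En.conj().T @ s) ** 2)
    return np.array(p)

# one source at +20 degrees, 8-element array, 200 snapshots, mild noise
rng = np.random.default_rng(1)
m, n, true_deg = 8, 200, 20.0
s = np.exp(-2j * np.pi * 0.5 * np.arange(m) * np.sin(np.deg2rad(true_deg)))
sig = rng.standard_normal(n) + 1j * rng.standard_normal(n)
X = np.outer(s, sig) + 0.1 * (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n)))
grid = np.arange(-90, 91, 1.0)
est = grid[np.argmax(music_spectrum(X, 1, grid))]
```

The pseudospectrum peaks where the steering vector is orthogonal to the noise subspace, i.e. at the true DOA.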

  7. Modeling excitation-emission fluorescence matrices with pattern recognition algorithms for classification of Argentine white wines according to grape variety.

    Science.gov (United States)

    Azcarate, Silvana M; de Araújo Gomes, Adriano; Alcaraz, Mirta R; Ugulino de Araújo, Mário C; Camiña, José M; Goicoechea, Héctor C

    2015-10-01

    This paper reports the modeling of excitation-emission matrices for classification of Argentinean white wines according to grape variety, employing chemometric tools for pattern recognition. The discriminative power of the data was first investigated using Principal Component Analysis (PCA) and Parallel Factor Analysis (PARAFAC). The score plots showed strong overlapping between classes. A set of forty-one samples was partitioned into training and test sets by the Kennard-Stone algorithm. The algorithms evaluated were SIMCA, N- and U-PLS-DA, and SPA-LDA. The fit of the implemented models was assessed by means of accuracy, sensitivity and specificity. These models were then used to assign the grape variety of the wines in the twenty-sample test set. The best results were obtained for U-PLS-DA and SPA-LDA, with 76% and 80% accuracy, respectively.
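The Kennard-Stone partition used above selects training samples by a max-min distance criterion: start from the two most distant points, then repeatedly add the candidate farthest from the already-selected set. A small numpy sketch (toy 1-D data assumed):

```python
import numpy as np

def kennard_stone(X, n_train):
    """Kennard-Stone sample selection; returns (train_indices, test_indices)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(D), D.shape)   # two most distant points
    selected = [i, j]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_train:
        # for each candidate, its distance to the nearest selected point
        dmin = D[np.ix_(remaining, selected)].min(axis=1)
        nxt = remaining[int(np.argmax(dmin))]
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining

X = np.array([[0.0], [1.0], [2.0], [10.0]])
train, test = kennard_stone(X, 3)
```

Here the endpoints 0.0 and 10.0 are picked first, then 2.0 (farther from both than 1.0 is), leaving 1.0 for the test set.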

  8. Analysis and Classification of Stride Patterns Associated with Children Development Using Gait Signal Dynamics Parameters and Ensemble Learning Algorithms.

    Science.gov (United States)

    Wu, Meihong; Liao, Lifang; Luo, Xin; Ye, Xiaoquan; Yao, Yuchen; Chen, Pinnan; Shi, Lei; Huang, Hui; Wu, Yunfeng

    2016-01-01

    Measuring stride variability and dynamics in children is useful for the quantitative study of gait maturation and neuromotor development in childhood and adolescence. In this paper, we computed the sample entropy (SampEn) and average stride interval (ASI) parameters to quantify the stride series of 50 gender-matched children participants in three age groups. We also normalized the SampEn and ASI values by leg length and body mass for each participant, respectively. Results show that the original and normalized SampEn values consistently decrease over the significance level of the Mann-Whitney U test (p algorithms were used to effectively distinguish the children's gait patterns. These ensemble learning algorithms both provided excellent gait classification results in terms of overall accuracy (≥90%), recall (≥0.8), and precision (≥0.8077).
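Sample entropy as used above measures regularity as the negative log ratio of template matches of length m+1 to length m. A direct quadratic-time sketch (toy series and tolerance r are assumptions; a periodic series is highly predictable and scores low):

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """SampEn(m, r) = -ln(A/B), where B counts Chebyshev-distance matches
    of length-m templates and A of length-(m+1), self-matches excluded."""
    x = np.asarray(x, float)
    def count(mm):
        templ = np.array([x[i:i + mm] for i in range(len(x) - mm + 1)])
        c = 0
        for i in range(len(templ)):
            d = np.max(np.abs(templ - templ[i]), axis=1)
            c += np.sum(d < r) - 1   # exclude the self-match
        return c
    return -np.log(count(m + 1) / count(m))

periodic = np.tile([0.0, 1.0], 50)      # strictly periodic -> low SampEn
rng = np.random.default_rng(2)
noisy = rng.standard_normal(100)        # irregular -> higher SampEn
```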

  9. Multi-Agent Pathfinding with n Agents on Graphs with n Vertices: Combinatorial Classification and Tight Algorithmic Bounds

    DEFF Research Database (Denmark)

    Förster, Klaus-Tycho; Groner, Linus; Hoefler, Torsten

    2017-01-01

    We investigate the multi-agent pathfinding (MAPF) problem with $n$ agents on graphs with $n$ vertices: each agent has a unique start and goal vertex, with the objective of moving all agents in parallel movements to their goals such that each vertex and each edge may only be used by one agent at a time. We give a combinatorial classification of all graphs where this problem is solvable in general, including cases where the solvability depends on the initial agent placement. Furthermore, we present an algorithm solving the MAPF problem in our setting, requiring O(n²) rounds, or O(n³) moves of individual agents. Complementing these results, we show that there are graphs where Omega(n²) rounds and Omega(n³) moves are required for any algorithm.

  10. Comparative estimate of the effectiveness of different algorithms for the radar classification of thunderstorms and showers

    Science.gov (United States)

    Linev, A. G.; Oprishko, V. S.; Popova, N. D.; Salman, Y. M.

    1975-01-01

    Several schemes for discriminating severe weather phenomena with the aid of different algorithms are examined. The schemes were tested on the same sample. A comparative estimate of the effectiveness of the different algorithms for classifying thunderstorms and showers is carried out.

  11. Greedy heuristic algorithm for solving series of EEE components classification problems

    Science.gov (United States)

    Kazakovtsev, A. L.; Antamoshkin, A. N.; Fedosov, V. V.

    2016-04-01

    Algorithms based on agglomerative greedy heuristics demonstrate precise and stable results for clustering problems based on k-means and p-median models. Such algorithms are successfully implemented in the production of specialized EEE components for use in space systems, which includes testing each EEE device and detecting homogeneous production batches of the EEE components from the test results using p-median models. In this paper, the authors propose a new version of the genetic algorithm with the greedy agglomerative heuristic which allows solving series of problems. Such an algorithm is useful for solving the k-means and p-median clustering problems when the number of clusters is unknown. Computational experiments on real data show that the preciseness of the result decreases insignificantly in comparison with the initial genetic algorithm for solving a single problem.

  12. Algorithms for Hyperspectral Endmember Extraction and Signature Classification with Morphological Dendritic Networks

    Science.gov (United States)

    Schmalz, M.; Ritter, G.

    Accurate multispectral or hyperspectral signature classification is key to the nonimaging detection and recognition of space objects. Additionally, signature classification accuracy depends on accurate spectral endmember determination [1]. Previous approaches to endmember computation and signature classification were based on linear operators or neural networks (NNs) expressed in terms of the algebra (R, +, x) [1,2]. Unfortunately, class separation in these methods tends to be suboptimal, and the number of signatures that can be accurately classified often depends linearly on the number of NN inputs. This can lead to poor endmember distinction, as well as potentially significant classification errors in the presence of noise or densely interleaved signatures. In contrast to traditional NNs, autoassociative morphological memories (AMMs) are a construct similar to Hopfield autoassociative memories defined on the (R, +, ∨, ∧) lattice algebra [3]. Unlimited storage and perfect recall of noiseless real-valued patterns has been proven for AMMs [4]. However, AMMs suffer from sensitivity to specific noise models, which can be characterized as erosive and dilative noise. On the other hand, the prior definition of a set of endmembers corresponds to material spectra lying on vertices of the minimum convex region covering the image data. These vertices can be characterized as morphologically independent patterns. It has further been shown that AMMs can be based on dendritic computation [3,6]. These techniques yield improved accuracy and class segmentation/separation ability in the presence of highly interleaved signature data. In this paper, we present a procedure for endmember determination based on AMM noise sensitivity, which employs morphological dendritic computation. We show that detected endmembers can be exploited by AMM based classification techniques, to achieve accurate signature classification in the presence of noise, closely spaced or interleaved signatures, and

  13. A sequential nonparametric pattern classification algorithm based on the Wald SPRT. [Sequential Probability Ratio Test

    Science.gov (United States)

    Poage, J. L.

    1975-01-01

    A sequential nonparametric pattern classification procedure is presented. The method presented is an estimated version of the Wald sequential probability ratio test (SPRT). This method utilizes density function estimates, and the density estimate used is discussed, including a proof of convergence in probability of the estimate to the true density function. The classification procedure proposed makes use of the theory of order statistics, and estimates of the probabilities of misclassification are given. The procedure was tested on discriminating between two classes of Gaussian samples and on discriminating between two kinds of electroencephalogram (EEG) responses.
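Wald's SPRT, which the paper estimates nonparametrically, accumulates a log-likelihood ratio between two thresholds set by the target error rates. A parametric Gaussian sketch for contrast (the error targets, densities, and data are assumptions, not the paper's density-estimate version):

```python
import numpy as np

def sprt(samples, logpdf0, logpdf1, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test: accumulate the
    log-likelihood ratio until it crosses one of two thresholds."""
    a = np.log(beta / (1 - alpha))        # accept H0 at or below this
    b = np.log((1 - beta) / alpha)        # accept H1 at or above this
    llr = 0.0
    for n, x in enumerate(samples, 1):
        llr += logpdf1(x) - logpdf0(x)
        if llr <= a:
            return "H0", n
        if llr >= b:
            return "H1", n
    return "undecided", len(samples)

# Gaussian means 0 vs 1, unit variance (constants in the log-pdfs cancel)
logpdf0 = lambda x: -0.5 * x**2
logpdf1 = lambda x: -0.5 * (x - 1.0)**2
decision, n_used = sprt([1.0] * 10, logpdf0, logpdf1)
```

Each sample equal to 1.0 adds 0.5 to the log-likelihood ratio, so the upper threshold log(19) ≈ 2.944 is crossed at the sixth sample.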

  14. A Systematic Evaluation of Feature Selection and Classification Algorithms Using Simulated and Real miRNA Sequencing Data

    Directory of Open Access Journals (Sweden)

    Sheng Yang

    2015-01-01

    Full Text Available Sequencing is widely used to discover associations between microRNAs (miRNAs) and diseases. However, the negative binomial distribution (NB) and high dimensionality of data obtained using sequencing can lead to low-power results and low reproducibility. Several statistical learning algorithms have been proposed to address sequencing data, and although evaluation of these methods is essential, such studies are relatively rare. The performance of seven feature selection (FS) algorithms, including baySeq, DESeq, edgeR, the rank sum test, lasso, particle swarm optimistic decision tree, and random forest (RF), was compared by simulation under different conditions based on the difference of the mean, the dispersion parameter of the NB, and the signal to noise ratio. Real data were used to evaluate the performance of RF, logistic regression, and support vector machine. Based on the simulation and real data, we discuss the behaviour of the FS and classification algorithms. The Apriori algorithm identified frequent item sets (mir-133a, mir-133b, mir-183, mir-937, and mir-96) from among the deregulated miRNAs of six datasets from The Cancer Genomics Atlas. Taking these findings altogether and considering computational memory requirements, we propose a strategy that combines edgeR and DESeq for large sample sizes.

  15. A Systematic Evaluation of Feature Selection and Classification Algorithms Using Simulated and Real miRNA Sequencing Data.

    Science.gov (United States)

    Yang, Sheng; Guo, Li; Shao, Fang; Zhao, Yang; Chen, Feng

    2015-01-01

    Sequencing is widely used to discover associations between microRNAs (miRNAs) and diseases. However, the negative binomial distribution (NB) and high dimensionality of data obtained using sequencing can lead to low-power results and low reproducibility. Several statistical learning algorithms have been proposed to address sequencing data, and although evaluation of these methods is essential, such studies are relatively rare. The performance of seven feature selection (FS) algorithms, including baySeq, DESeq, edgeR, the rank sum test, lasso, particle swarm optimistic decision tree, and random forest (RF), was compared by simulation under different conditions based on the difference of the mean, the dispersion parameter of the NB, and the signal to noise ratio. Real data were used to evaluate the performance of RF, logistic regression, and support vector machine. Based on the simulation and real data, we discuss the behaviour of the FS and classification algorithms. The Apriori algorithm identified frequent item sets (mir-133a, mir-133b, mir-183, mir-937, and mir-96) from among the deregulated miRNAs of six datasets from The Cancer Genomics Atlas. Taking these findings altogether and considering computational memory requirements, we propose a strategy that combines edgeR and DESeq for large sample sizes.

  16. Automated condition-invariable neurite segmentation and synapse classification using textural analysis-based machine-learning algorithms.

    Science.gov (United States)

    Kandaswamy, Umasankar; Rotman, Ziv; Watt, Dana; Schillebeeckx, Ian; Cavalli, Valeria; Klyachko, Vitaly A

    2013-02-15

    High-resolution live-cell imaging studies of neuronal structure and function are characterized by large variability in image acquisition conditions due to background and sample variations as well as low signal-to-noise ratio. The lack of automated image analysis tools that can be generalized for varying image acquisition conditions represents one of the main challenges in the field of biomedical image analysis. Specifically, segmentation of axonal/dendritic arborizations in brightfield or fluorescence imaging studies is extremely labor-intensive and still performed mostly manually. Here we describe a fully automated machine-learning approach based on textural analysis algorithms for segmenting neuronal arborizations in high-resolution brightfield images of live cultured neurons. We compare the performance of our algorithm to manual segmentation and show that it combines 90% accuracy with similarly high levels of specificity and sensitivity. Moreover, the algorithm maintains high performance levels under a wide range of image acquisition conditions, indicating that it is largely condition-invariable. We further describe an application of this algorithm to fully automated synapse localization and classification in fluorescence imaging studies based on synaptic activity. The textural analysis-based machine-learning approach thus offers a high-performance, condition-invariable tool for automated neurite segmentation.
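One common family of textural features derives statistics such as contrast from a gray-level co-occurrence matrix (GLCM). The sketch below is purely illustrative (the abstract does not specify GLCM features, and the toy images are assumptions): a uniform region has zero contrast, while alternating gray levels score high.

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=4):
    """Normalized gray-level co-occurrence matrix for one pixel offset."""
    G = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(h - dy):
        for x in range(w - dx):
            G[img[y, x], img[y + dy, x + dx]] += 1
    return G / G.sum()

def glcm_contrast(P):
    """Contrast: expected squared gray-level difference of co-occurring pairs."""
    i, j = np.indices(P.shape)
    return np.sum(P * (i - j) ** 2)

flat = np.zeros((8, 8), dtype=int)       # uniform region
stripes = np.tile([0, 3], (8, 4))        # alternating levels 0 and 3
```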

  17. Comparative analysis of different implementations of a parallel algorithm for automatic target detection and classification of hyperspectral images

    Science.gov (United States)

    Paz, Abel; Plaza, Antonio; Plaza, Javier

    2009-08-01

    Automatic target detection in hyperspectral images is a task that has attracted a lot of attention recently. In the last few years, several algorithms have been developed for this purpose, including the well-known RX algorithm for anomaly detection, or the automatic target detection and classification algorithm (ATDCA), which uses an orthogonal subspace projection (OSP) approach to extract a set of spectrally distinct targets automatically from the input hyperspectral data. Depending on the complexity and dimensionality of the analyzed image scene, the target/anomaly detection process may be computationally very expensive, a fact that limits the possibility of utilizing this process in time-critical applications. In this paper, we develop computationally efficient parallel versions of both the RX and ATDCA algorithms for near real-time exploitation. In the case of ATGP, we use several distance metrics in addition to the OSP approach. The parallel versions are quantitatively compared in terms of target detection accuracy, using hyperspectral data collected by NASA's Airborne Visible Infra-Red Imaging Spectrometer (AVIRIS) over the World Trade Center in New York, five days after the terrorist attack of September 11th, 2001, and also in terms of parallel performance, using a massively parallel Beowulf cluster available at NASA's Goddard Space Flight Center in Maryland.
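The RX anomaly detector mentioned above scores each pixel spectrum by its Mahalanobis distance from the global scene statistics. A minimal global-RX sketch on simulated spectra (the synthetic background/anomaly data are assumptions):

```python
import numpy as np

def rx_scores(X):
    """Global RX detector: squared Mahalanobis distance of each pixel
    spectrum (rows of X) from the scene mean."""
    mu = X.mean(axis=0)
    Ci = np.linalg.inv(np.cov(X, rowvar=False))
    d = X - mu
    return np.einsum('ij,jk,ik->i', d, Ci, d)

rng = np.random.default_rng(4)
bg = rng.normal(0, 1, size=(500, 5))     # 500 background spectra, 5 bands
anomaly = np.full((1, 5), 6.0)           # one strongly outlying spectrum
scores = rx_scores(np.vstack([bg, anomaly]))
```

The anomalous pixel receives by far the largest score.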

  18. An enhanced algorithm for knee joint sound classification using feature extraction based on time-frequency analysis.

    Science.gov (United States)

    Kim, Keo Sik; Seo, Jeong Hwan; Kang, Jin U; Song, Chul Gyu

    2009-05-01

    Vibroarthrographic (VAG) signals, generated by human knee movement, are non-stationary and multi-component in nature, and their time-frequency distribution (TFD) provides a powerful means to analyze such signals. The objective of this paper is to improve the classification accuracy of features, obtained from the TFD of normal and abnormal VAG signals, using segmentation by dynamic time warping (DTW) and a denoising algorithm based on singular value decomposition (SVD). VAG and knee angle signals, recorded simultaneously during one flexion and one extension of the knee, were segmented and normalized at 0.5 Hz by the DTW method. Also, the noise within the TFD of the segmented VAG signals was reduced by the SVD algorithm, and a back-propagation neural network (BPNN) was used to classify the normal and abnormal VAG signals. The characteristic parameters of VAG signals consist of the energy, energy spread, frequency and frequency spread parameters extracted from the TFD. A total of 1408 segments (normal 1031, abnormal 377) were used for training and evaluating the BPNN. As a result, the average classification accuracy was 91.4% (standard deviation ±1.7%). The proposed method showed good potential for the non-invasive diagnosis and monitoring of joint disorders such as osteoarthritis.
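The DTW segmentation step relies on the classic dynamic-programming alignment, which tolerates time stretching but not amplitude shifts. A minimal distance sketch with toy sequences (the sequences are illustrative, not VAG data):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping distance
    with absolute-difference local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

x = [0.0, 1.0, 2.0, 1.0, 0.0]
stretched = [0.0, 0.0, 1.0, 2.0, 2.0, 1.0, 0.0]   # time-stretched copy
shifted = [1.0, 2.0, 3.0, 2.0, 1.0]               # amplitude-shifted copy
```

The stretched copy aligns at zero cost; the shifted one cannot.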

  19. A Benchmark Data Set to Evaluate the Illumination Robustness of Image Processing Algorithms for Object Segmentation and Classification.

    Science.gov (United States)

    Khan, Arif Ul Maula; Mikut, Ralf; Reischl, Markus

    2015-01-01

    Developers of image processing routines rely on benchmark data sets to give qualitative comparisons of new image analysis algorithms and pipelines. Such data sets need to include artifacts in order to occlude and distort the required information to be extracted from an image. Robustness, the quality of an algorithm in relation to the amount of distortion, is often important. However, using available benchmark data sets, an evaluation of illumination robustness is difficult or even impossible due to missing ground truth data about object margins and classes and missing information about the distortion. We present a new framework for robustness evaluation. The key aspect is an image benchmark containing 9 object classes and the required ground truth for segmentation and classification. Varying levels of shading and background noise are integrated to distort the data set. To quantify illumination robustness, we provide measures for image quality, segmentation and classification success, and robustness. We set a high value on giving users easy access to the new benchmark; therefore, all routines are provided within a software package, but can just as easily be replaced to emphasize other aspects.

  20. Classification and Diagnostic Output Prediction of Cancer Using Gene Expression Profiling and Supervised Machine Learning Algorithms

    DEFF Research Database (Denmark)

    Yoo, C.; Gernaey, Krist

    2008-01-01

    In this paper, a new supervised clustering and classification method is proposed. First, the application of discriminant partial least squares (DPLS) for the selection of a minimum number of key genes is applied on a gene expression microarray data set. Second, supervised hierarchical clustering ...

  1. Classification of positive blood cultures: computer algorithms versus physicians' assessment - development of tools for surveillance of bloodstream infection prognosis using population-based laboratory databases

    Directory of Open Access Journals (Sweden)

    Gradel Kim O

    2012-09-01

    Full Text Available Abstract Background Information from blood cultures is utilized for infection control, public health surveillance, and clinical outcome research. This information can be enriched by physicians’ assessments of positive blood cultures, which are, however, often available only for selected patient groups or pathogens. The aim of this work was to determine whether patients with positive blood cultures can be classified effectively for outcome research in epidemiological studies by the use of administrative data and computer algorithms, taking physicians’ assessments as reference. Methods Physicians’ assessments of positive blood cultures were routinely recorded at two Danish hospitals from 2006 through 2008. The physicians’ assessments classified positive blood cultures as: (a) contamination or bloodstream infection; (b) bloodstream infection as mono- or polymicrobial; (c) bloodstream infection as community- or hospital-onset; (d) community-onset bloodstream infection as healthcare-associated or not. We applied the computer algorithms to data from laboratory databases and the Danish National Patient Registry to classify the same groups and compared these with the physicians’ assessments as reference episodes. For each classification, we tabulated episodes derived by the physicians’ assessment and the computer algorithm and compared 30-day mortality between concordant and discrepant groups with adjustment for age, gender, and comorbidity. Results Physicians derived 9,482 reference episodes from 21,705 positive blood cultures. The agreement between computer algorithms and physicians’ assessments was high for contamination vs. bloodstream infection (8,966/9,482 reference episodes [96.6%], Kappa = 0.83) and mono- vs. polymicrobial bloodstream infection (6,932/7,288 reference episodes [95.2%], Kappa = 0.76), but lower for community- vs. hospital-onset bloodstream infection (6,056/7,288 reference episodes [83.1%], Kappa = 0.57) and
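
    The Kappa values reported above measure chance-corrected agreement between the computer algorithm and the physicians' assessments; Cohen's kappa can be computed from the paired labels like this (a minimal stdlib sketch, not the study's code):

```python
from collections import Counter

def cohens_kappa(pairs):
    """Cohen's kappa between two raters given (label_a, label_b) pairs."""
    pairs = list(pairs)
    n = len(pairs)
    p_observed = sum(1 for a, b in pairs if a == b) / n
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    # expected chance agreement from the raters' marginal label frequencies
    p_chance = sum(count_a[k] * count_b.get(k, 0) for k in count_a) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)
```

    Kappa of 1 indicates perfect agreement, 0 agreement no better than chance, and negative values systematic disagreement.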

  2. Software Defect Cause Analysis Based on Classification Algorithms

    Institute of Scientific and Technical Information of China (English)

    戴春博; 傅蓉; 马力

    2012-01-01

    Software defect prevention is an important part of software quality assurance, and software defect cause analysis is its core task. Current cause-analysis methods cannot quickly and deeply locate the causes of defects in the defect data of large projects. To address this, a defect cause analysis method based on machine learning classification algorithms is presented. An improved defect quantization method is built by comparing the two current analysis methods; mainstream classification algorithms are then compared, and the quantized data is classified. Experiments show that the presented method is effective in practice and greatly reduces the analysis cost.

  3. Classification of juvenile myoclonic epilepsy data acquired through scanning electromyography with machine learning algorithms.

    Science.gov (United States)

    Goker, Imran; Osman, Onur; Ozekes, Serhat; Baslo, M Baris; Ertas, Mustafa; Ulgen, Yekta

    2012-10-01

    In this paper, classification of Juvenile Myoclonic Epilepsy (JME) patients and healthy volunteers in a Normal Control (NC) group was established using Feed-Forward Neural Networks (NN), Support Vector Machines (SVM), Decision Trees (DT), and Naïve Bayes (NB) methods, utilizing data obtained through the scanning EMG method used in a clinical study. An experimental setup was built for this purpose. 105 motor units were measured: 44 belonged to the JME group, consisting of 9 patients, and 61 belonged to the NC group, comprising 10 healthy volunteers. k-fold cross-validation was applied to train and test the models, and ROC curves were drawn for k values of 4, 6, 8, and 10. Detection sensitivity of 100% was obtained for the DT, NN, and NB classification methods. The lowest false-positive count, 5, was obtained by the NN.
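
    The k-fold cross-validation protocol used to train and test the models can be sketched as follows (an assumed index-splitting helper, not the authors' code):

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation
    over n samples; the last fold absorbs any remainder."""
    idx = list(range(n))
    fold = n // k
    for i in range(k):
        start = i * fold
        stop = n if i == k - 1 else start + fold
        test = idx[start:stop]
        train = idx[:start] + idx[stop:]
        yield train, test
```

    Each sample appears in exactly one test fold, so every model is evaluated on data it never saw during training; in practice indices are shuffled first.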

  4. Hybrid Medical Image Classification Using Association Rule Mining with Decision Tree Algorithm

    OpenAIRE

    Rajendran, P.; M.Madheswaran

    2010-01-01

    The main focus of image mining in the proposed method is the classification of brain tumors in CT scan brain images. The major steps involved in the system are: pre-processing, feature extraction, association rule mining, and a hybrid classifier. The pre-processing step is done using median filtering, and edge features are extracted using the Canny edge detection technique. Two image mining approaches combined in a hybrid manner are proposed in this paper....

  5. A comparison of classification techniques for glacier change detection using multispectral images

    Directory of Open Access Journals (Sweden)

    Rahul Nijhawan

    2016-09-01

    Full Text Available The main aim of this paper is to compare the classification accuracies of glacier change detection by the following classifiers: a sub-pixel classification algorithm, indices-based supervised classification, and an object-based algorithm, using Landsat imagery. It was observed that the shadow effect, not removed by sub-pixel classification, was removed by the indices method, and accuracy was further improved by object-based classification. The objective of the paper is to analyse different classification algorithms and interpret which gives the best results in mountainous regions. The study showed that the object-based method was best in mountainous regions, as optimum results were obtained in the shadow-covered regions.

  6. Analytical Study of Some Selected Classification Algorithms in WEKA Using Real Crime Data

    Directory of Open Access Journals (Sweden)

    Obuandike Georgina N.

    2015-12-01

    Full Text Available Data mining in the field of computer science is an answered prayer to the demand of this digital age. It is used to unravel hidden information from large volumes of data, usually kept in data repositories, to help improve management decision making. Classification is an essential data mining task used to predict unknown class labels, and it has been applied to many types of data. There are different techniques that can be applied in building a classification model. In this study the performance of three such techniques is compared: J48, a decision-tree classifier; Naïve Bayes, a classifier that applies probability functions; and ZeroR, a rule-induction classifier. These classifiers are tested using real crime data collected from the Nigeria Prisons Service. The metrics used to measure the performance of each classifier include accuracy, time, True Positive (TP) Rate, False Positive (FP) Rate, Kappa statistic, precision, and recall. The study showed that the J48 classifier has the highest accuracy of the three classifiers considered. Choosing the right classifier for a data mining task will help increase mining accuracy.
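
    The listed metrics follow directly from the binary confusion counts; a minimal sketch (illustrative only, not WEKA's implementation):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall (TP rate) and FP rate from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,    # TP rate
        "fp_rate": fp / (fp + tn) if fp + tn else 0.0,
    }
```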

  7. REAL TIME CLASSIFICATION AND CLUSTERING OF IDS ALERTS USING MACHINE LEARNING ALGORITHMS

    Directory of Open Access Journals (Sweden)

    T. Subbulakshmi

    2010-01-01

    Full Text Available Intrusion Detection Systems (IDS) monitor a secured network for the evidence of malicious activities originating either inside or outside. Upon identifying suspicious traffic, IDS generates and logs an alert. Unfortunately, most of the alerts generated are either false positive, i.e. benign traffic that has been classified as intrusions, or irrelevant, i.e. attacks that are not successful. The abundance of false positive alerts makes it difficult for the security analyst to find successful attacks and take remedial action. This paper describes a two-phase automatic alert classification system to assist the human analyst in identifying the false positives. In the first phase, the alerts collected from one or more sensors are normalized and similar alerts are grouped to form a meta-alert. These meta-alerts are passively verified with an asset database to find out irrelevant alerts. In addition, an optional alert generalization is also performed for root cause analysis and thereby reduces false positives with human interaction. In the second phase, the reduced alerts are labeled and passed to an alert classifier which uses machine learning techniques for building the classification rules. This helps the analyst in automatic classification of the alerts. The system is tested in real environments and found to be effective in reducing the number of alerts as well as false positives dramatically, and thereby reducing the workload of the human analyst.

  8. Scalable Algorithms for Unsupervised Classification and Anomaly Detection in Large Geospatiotemporal Data Sets

    Science.gov (United States)

    Mills, R. T.; Hoffman, F. M.; Kumar, J.

    2015-12-01

    The increasing availability of high-resolution geospatiotemporal datasets from sources such as observatory networks, remote sensing platforms, and computational Earth system models has opened new possibilities for knowledge discovery and mining of ecological data sets fused from disparate sources. Traditional algorithms and computing platforms are impractical for the analysis and synthesis of data sets of this size; however, new algorithmic approaches that can effectively utilize the complex memory hierarchies and the extremely high levels of available parallelism in state-of-the-art high-performance computing platforms can enable such analysis. We describe some unsupervised knowledge discovery and anomaly detection approaches based on highly scalable parallel algorithms for k-means clustering and singular value decomposition, consider a few practical applications thereof to the analysis of climatic and remotely-sensed vegetation phenology data sets, and speculate on some of the new applications that such scalable analysis methods may enable.
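
    The k-means kernel at the heart of such clustering can be sketched serially as follows (the paper's versions are parallel and far more scalable; this shows only the basic alternating algorithm):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: alternate nearest-center assignment and mean update.

    points: list of equal-length numeric tuples; returns (centers, clusters).
    """
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # assignment step: each point joins its nearest center (squared distance)
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # update step: move each non-empty center to its cluster mean
        for j, cl in enumerate(clusters):
            if cl:
                centers[j] = tuple(sum(coords) / len(cl) for coords in zip(*cl))
    return centers, clusters
```

    The scalable variants distribute the assignment step across processors and reduce the partial sums of the update step, but the per-iteration logic is the same.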

  9. KohonAnts: A Self-Organizing Ant Algorithm for Clustering and Pattern Classification

    CERN Document Server

    Fernandes, C; Merelo, J J; Ramos, V; Laredo, J L J

    2008-01-01

    In this paper we introduce a new ant-based method that takes advantage of the cooperative self-organization of Ant Colony Systems to create a naturally inspired clustering and pattern recognition method. The approach considers each data item as an ant, which moves inside a grid changing the cells it goes through, in a fashion similar to Kohonen's Self-Organizing Maps. The resulting algorithm is conceptually simpler, has fewer free parameters than other ant-based clustering algorithms, and, after some parameter tuning, yields very good results on some benchmark problems.

  10. CLASSIFICATION OF NEURAL NETWORK FOR TECHNICAL CONDITION OF TURBOFAN ENGINES BASED ON HYBRID ALGORITHM

    Directory of Open Access Journals (Sweden)

    Valentin Potapov

    2016-12-01

    Full Text Available Purpose: This work presents a method for diagnosing the technical condition of turbofan engines using a hybrid neural network algorithm, based on software developed for analysing data obtained over the aircraft's service life. Methods: The method allows engine diagnostics with recognition down to the structural assembly, both in the presence of a single damaged engine component and under multiple damage. Results: The neural network structure was optimized with genetic algorithms to solve the problem of evaluating the technical state of the bypass turbofan engine.

  11. Analysis of family health history based risk assessment algorithms: classification and data requirements.

    Science.gov (United States)

    Ranade-Kharkar, Pallavi; Del Fiol, Guilherme; Williams, Janet L; Hulse, Nathan C; Haug, Peter

    2013-01-01

    Family Health History (FHH) is a valuable and potentially low-cost tool for risk assessment and diagnosis in patient-centered healthcare. In this study, we identified and analyzed existing FHH-based risk assessment algorithms (RAAs) for cardio-vascular disease (CVD) and colorectal cancer (CRC) to guide implementers of electronic health record (EHR) systems regarding the data requirements for computing risk using these algorithms. We found a core set of data elements that are required by most RAAs. While some of these data are available in EHR systems, the patients can be empowered to contribute the remainder.

  12. A Novel Detection and Classification Algorithm for Power Quality Disturbances using Wavelets

    Directory of Open Access Journals (Sweden)

    C. Sharmeela

    2006-01-01

    Full Text Available This study presents a novel method to detect and classify power quality disturbances using wavelets. The proposed algorithm uses different wavelets each for a particular class of disturbance. The method uses wavelet filter banks in an effective way and does multiple filtering to detect the disturbances. A qualitative comparison of results shows the advantages and drawbacks of each wavelet when applied to the detection of the disturbances. This method is tested for a large class of test conditions simulated in MATLAB. Power quality monitoring together with the ability of the proposed algorithm to classify the disturbances will be a powerful tool for the power system engineers.
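
    The underlying detection principle, that a transient disturbance produces large wavelet detail coefficients, can be illustrated with a single Haar decomposition step (the paper itself uses different, purpose-chosen wavelets per disturbance class):

```python
def haar_step(signal):
    """One level of the Haar DWT on an even-length signal:
    returns (approximation, detail) coefficient lists."""
    s = 2 ** -0.5
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def detect_disturbance(signal, threshold):
    """Flag a disturbance when any detail coefficient exceeds the threshold."""
    _, detail = haar_step(signal)
    return any(abs(d) > threshold for d in detail)
```

    A smooth waveform yields near-zero detail coefficients, while a sag, swell, or transient produces a localized spike in them; filter banks extend this idea across scales.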

  13. Algorithm for the classification of multi-modulating signals on the electrocardiogram.

    Science.gov (United States)

    Mita, Mitsuo

    2007-03-01

    This article discusses an algorithm to measure the electrocardiogram (ECG) and respiration simultaneously, with diagnostic potential for sleep apnoea from ECG recordings. The algorithm combines three particular scale transforms, a(j)(t), u(j)(t) and o(j)(a(j)), with the statistical Fourier transform (SFT). The time and magnitude scale transforms a(j)(t), u(j)(t) change the source into a periodic signal, and tau(j) = o(j)(a(j)) confines its harmonics into a few instantaneous components at tau(j), a common instant on the two scales t and tau(j). As a result, the multi-modulating source is decomposed by the SFT and reconstructed into ECG, respiration and other signals by the inverse transform. The algorithm is expected to recover partial ventilation and heart rate variability from the scale transforms among a(j)(t), a(j+1)(t) and u(j+1)(t) joined with each modulation. The algorithm has high potential as a clinical checkup for the diagnosis of sleep apnoea from ECG recordings.

  14. A comparison of two open source LiDAR surface classification algorithms

    Science.gov (United States)

    With the progression of LiDAR (Light Detection and Ranging) towards a mainstream resource management tool, it has become necessary to understand how best to process and analyze the data. While most ground surface identification algorithms remain proprietary and have high purchase costs, a few are op...

  15. An Index-Inspired Algorithm for Anytime Classification on Evolving Data Streams

    DEFF Research Database (Denmark)

    Kranen, Phillip; Assent, Ira; Seidl, Thomas

    2012-01-01

    Due to the ever growing presence of data streams there has been a considerable amount of research on stream data mining over the past years. Anytime algorithms are particularly well suited for stream mining, since they flexibly use all available time on streams of varying data rates, and are also...

  16. Evaluation of feature selection algorithms for classification in temporal lobe epilepsy based on MR images

    Science.gov (United States)

    Lai, Chunren; Guo, Shengwen; Cheng, Lina; Wang, Wensheng; Wu, Kai

    2017-02-01

    It is very important to differentiate temporal lobe epilepsy (TLE) patients from healthy people and to localize the abnormal brain regions of TLE patients. Cortical features and changes can reveal the unique anatomical patterns of brain regions in structural MR images. In this study, structural MR images from 28 normal controls (NC), 18 left TLE (LTLE), and 21 right TLE (RTLE) patients were acquired, and four types of cortical feature, namely cortical thickness (CTh), cortical surface area (CSA), gray matter volume (GMV), and mean curvature (MCu), were explored for discriminative analysis. Three feature selection methods, independent-sample t-test filtering, the sparse-constrained dimensionality reduction model (SCDRM), and support vector machine-recursive feature elimination (SVM-RFE), were investigated to extract dominant regions with significant differences among the compared groups for classification using an SVM classifier. The results showed that SVM-RFE achieved the highest performance (most classifications with more than 92% accuracy), followed by the SCDRM and the t-test. In particular, surface area and gray matter volume exhibited prominent discriminative ability, and the performance of the SVM improved significantly when the four cortical features were combined. Additionally, the dominant regions with higher classification weights were mainly located in the temporal and frontal lobes, including the inferior temporal, entorhinal cortex, fusiform, parahippocampal cortex, middle frontal and frontal pole. The cortical features thus provide effective information to determine the abnormal anatomical pattern, and the proposed method has the potential to improve the clinical diagnosis of TLE.
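
    The t-test filter ranks each cortical feature by a two-sample t statistic between the compared groups; a Welch-style sketch (illustrative only, not the authors' pipeline):

```python
def t_statistic(x, y):
    """Welch's t statistic between two samples, usable for
    filter-style feature ranking (larger |t| = more discriminative)."""
    def mean(v):
        return sum(v) / len(v)

    def var(v):  # unbiased sample variance
        m = mean(v)
        return sum((u - m) ** 2 for u in v) / (len(v) - 1)

    return (mean(x) - mean(y)) / ((var(x) / len(x) + var(y) / len(y)) ** 0.5)
```

    Features whose |t| clears a significance threshold are kept; wrapper methods such as SVM-RFE instead rank features by their contribution to the trained classifier.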

  17. An application of the Self Organizing Map Algorithm to computer aided classification of ASTER multispectral data

    Directory of Open Access Journals (Sweden)

    Ferdinando Giacco

    2008-01-01

    Full Text Available In this paper we employ Kohonen’s Self-Organizing Map (SOM) as a strategy for unsupervised analysis of ASTER multispectral (MS) images. In order to obtain an accurate clusterization we introduce as network input, in addition to spectral data, texture measures extracted from IKONOS images, which contribute to the classification of man-made structures. After clustering the SOM outcomes, we associated each cluster with a major land cover and compared them with prior knowledge of the analyzed scene.
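
    The core Kohonen update, pulling the best-matching unit and its grid neighbours toward each input vector, can be sketched as follows (the 1-D grid, decay schedules, and parameters here are illustrative assumptions, not the paper's configuration):

```python
import random

def train_som(data, n_units, iters=200, seed=0):
    """Toy 1-D Kohonen SOM over vectors in [0, 1]^d: each unit is a weight
    vector pulled toward inputs, with grid neighbours pulled along under a
    shrinking neighbourhood radius and decaying learning rate."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for t in range(iters):
        lr = 0.5 * (1 - t / iters)                       # decaying learning rate
        radius = max(1, int(n_units * (1 - t / iters) / 2))
        x = rng.choice(data)
        # best matching unit = nearest weight vector (squared distance)
        bmu = min(range(n_units),
                  key=lambda u: sum((w - v) ** 2 for w, v in zip(units[u], x)))
        for u in range(n_units):
            if abs(u - bmu) <= radius:                   # grid neighbourhood
                units[u] = [w + lr * (v - w) for w, v in zip(units[u], x)]
    return units
```

    After training, each unit's weight vector is a prototype; assigning each pixel to its best-matching unit yields the clusters that are then labelled with land-cover classes.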

  18. Sparsity-Based Representation for Classification Algorithms and Comparison Results for Transient Acoustic Signals

    Science.gov (United States)

    2016-05-01

    … the source domain. The benefits of solving the transfer learning problem (Eq. 30) are 2-fold: first, it efficiently exploits information on the … feature learning for audio classification using convolutional deep belief networks (Advances in Neural Information Processing Systems, 2009).

  19. A Semi-supervised Heat Kernel Pagerank MBO Algorithm for Data Classification

    Science.gov (United States)

    2016-07-01

    Introduction: Classification is an important problem in machine learning and computer vision. The goal is to partition a data set efficiently and … characteristic depending on the problem. The R(u) term in some sense smooths the data, and the fidelity term allows one to incorporate any known … metric computing the distance, in some sense, between x and y, and σ is a parameter to be chosen. Also, √τ(x) = d(x, z), where z is the M-th closest vertex to

  20. PERFORMANCE EVALUATION OF THE DATA MINING CLASSIFICATION METHODS

    Directory of Open Access Journals (Sweden)

    CRISTINA OPREA

    2014-05-01

    Full Text Available The paper analyzes the performance evaluation of different classification models in the data mining process. Classification is the most widely used supervised-learning technique in data mining; it is the process of identifying a set of features and templates that describe data classes or concepts. We applied various classification algorithms to different data sets to streamline and improve algorithm performance.

  1. A Novel Dispersion Degree and EBFNN-based Fingerprint Classification Algorithm

    Institute of Scientific and Technical Information of China (English)

    罗菁; 林树忠; 倪建云; 宋丽梅

    2009-01-01

    To handle shift and rotation in fingerprint images, a novel fingerprint classification algorithm based on dispersion degree and an Ellipsoidal Basis Function Neural Network (EBFNN) is proposed. First, a feature space is obtained through a wavelet transform of the fingerprint image. Then, optimal feature combinations of different dimensions are acquired by searching the feature space, and the feature vector is determined by studying how the divergence degree of those optimal feature combinations changes with dimension. Finally, the EBFNN is trained with the feature vector and fingerprint classification is performed. Experimental results on FVC2000 and FVC2002-DB1 show an average classification accuracy of 91.45% with 11 hidden neurons. Moreover, the proposed algorithm is robust to shift and rotation in fingerprint images and thus has practical value.

  2. The McCollough Facial Rejuvenation System: a condition-specific classification algorithm.

    Science.gov (United States)

    McCollough, E Gaylon

    2011-02-01

    The search for the holy grail in facial rejuvenation is an ongoing quest. Perhaps the reason the "ideal" face-lift has yet to be discovered is a result of three factors. First, the term FACE-LIFT has never been adequately defined. Second, fads and trends play a role in how the operation is taught and performed. Third, surgeons searching for the prototypic technique have not had a way to index the physical signs of facial aging. After 37 years of practicing facial plastic surgery and performing more than 5000 face-lifts, the author determined that replacing chaos with order is long overdue. To achieve this goal, he developed a classification system that is designed to match each potential patient's problems with the most appropriate facial rejuvenation treatment plan and a "language" by which facial rejuvenation surgeons can communicate. Five progressive stages of aging have been identified and matched with recommended courses of face-lifting, blepharoplasty, volume augmentation, and skin resurfacing techniques. Ancillary procedures have also been included when indicated. It is the author's hope that a new classification system will bring order to mounting confusion within the aesthetic surgery professions as well as within the public sector.

  3. Real Time Classification and Clustering Of IDS Alerts Using Machine Learning Algorithms

    Directory of Open Access Journals (Sweden)

    T. Subbulakshmi

    2010-01-01

    Full Text Available Intrusion Detection Systems (IDS) monitor a secured network for the evidence of malicious activities originating either inside or outside. Upon identifying suspicious traffic, IDS generates and logs an alert. Unfortunately, most of the alerts generated are either false positive, i.e. benign traffic that has been classified as intrusions, or irrelevant, i.e. attacks that are not successful. The abundance of false positive alerts makes it difficult for the security analyst to find successful attacks and take remedial action. This paper describes a two-phase automatic alert classification system to assist the human analyst in identifying the false positives. In the first phase, the alerts collected from one or more sensors are normalized and similar alerts are grouped to form a meta-alert. These meta-alerts are passively verified with an asset database to find out irrelevant alerts. In addition, an optional alert generalization is also performed for root cause analysis and thereby reduces false positives with human interaction. In the second phase, the reduced alerts are labeled and passed to an alert classifier which uses machine learning techniques for building the classification rules. This helps the analyst in automatic classification of the alerts. The system is tested in real environments and found to be effective in reducing the number of alerts as well as false positives dramatically, and thereby reducing the workload of the human analyst.

  4. Decision making in double-pedicled DIEP and SIEA abdominal free flap breast reconstructions: An algorithmic approach and comprehensive classification.

    Directory of Open Access Journals (Sweden)

    Charles M Malata

    2015-10-01

    Full Text Available Introduction: The deep inferior epigastric artery perforator (DIEP) free flap is the gold standard for autologous breast reconstruction. However, using a single vascular pedicle may not yield sufficient tissue in patients with midline scars or an insufficient lower abdominal pannus. Double-pedicled free flaps overcome this problem using different vascular arrangements to harvest the entire lower abdominal flap. The literature is, however, sparse regarding technique selection. We therefore reviewed our experience in order to formulate an algorithm and comprehensive classification for this purpose. Methods: All patients undergoing unilateral double-pedicled abdominal perforator free flap breast reconstruction (AFFBR) by a single surgeon (CMM) over 40 months were reviewed from a prospectively collected database. Results: Of the 112 consecutive breast free flaps performed, 25 (22%) utilised two vascular pedicles. The mean patient age was 45 years (range 27-54). All flaps but one (which used the thoracodorsal system) were anastomosed to the internal mammary vessels using the rib-preservation technique. The surgical duration was 656 minutes (range 468-690 minutes). The median flap weight was 618 g (range 432-1275 g) and the mastectomy weight was 445 g (range 220-896 g). All flaps were successful, and only three patients requested minor liposuction to reduce and reshape their reconstructed breasts. Conclusion: Bipedicled free abdominal perforator flaps, employed in a fifth of all our AFFBRs, are a reliable and safe option for unilateral breast reconstruction. They do, however, necessitate clear indications to justify the additional technical complexity and surgical duration. Our algorithm and comprehensive classification facilitate technique selection among the anastomotic permutations and the successful execution of these operations.

  5. A review on speech enhancement algorithms and why to combine with environment classification

    Science.gov (United States)

    Nidhyananthan, S. Selva; Kumari, R. Shantha Selva; Prakash, A. Arun

    2014-04-01

    Speech enhancement has been an intensive research area for several decades, aiming to enhance noisy speech corrupted by additive, multiplicative or convolutional noise. Even after decades of research it remains a most challenging problem, because most papers rely on estimating the noise during non-speech activity, assuming that the background noise is uncorrelated (statistically independent of the speech signal), nonstationary and slowly varying, so that the noise characteristics estimated in the absence of speech can be used subsequently in the presence of speech; in a real environment such assumptions do not hold all the time. In this paper, we discuss the historical development of approaches, from 1970 to the recent 2013, for enhancing noisy speech corrupted by additive background noise. Looking at this history, there are algorithms that enhance noisy speech very well as long as a specific application is concerned, such as in-car noisy environments. It must be observed that a speech enhancement algorithm performs well given a good estimate of the noise Power Spectral Density (PSD) from the noisy speech. Our idea arises from this observation: for online speech enhancement (i.e. in a real-time environment) such as mobile phone applications, instead of estimating the noise from the noisy speech alone, the system should continuously monitor the environment and classify it. Based on the user's current environment, the system should then adapt the enhancement or estimation algorithm to that environment to enhance the noisy speech.

  6. A classification of bioinformatics algorithms from the viewpoint of maximizing expected accuracy (MEA).

    Science.gov (United States)

    Hamada, Michiaki; Asai, Kiyoshi

    2012-05-01

    Many estimation problems in bioinformatics are formulated as point estimation problems in a high-dimensional discrete space. In general, it is difficult to design reliable estimators for this type of problem, because the number of possible solutions is immense, which leads to an extremely low probability for every solution, even for the one with the highest probability. Therefore, maximum score and maximum likelihood estimators do not work well in this situation, although they are widely employed in a number of applications. Maximizing expected accuracy (MEA) estimation, in which accuracy measures of the target problem and the entire distribution of solutions are considered, is a more successful approach. In this review, we provide an extensive discussion of algorithms and software based on MEA. We describe how a number of algorithms used in previous studies can be classified from the viewpoint of MEA. We believe that this review will be useful not only for users wishing to utilize software to solve the estimation problems appearing in this article, but also for developers wishing to design algorithms on the basis of MEA.

  7. Application of Fisher fusion techniques to improve the individual performance of sonar computer-aided detection/computer-aided classification (CAD/CAC) algorithms

    Science.gov (United States)

    Ciany, Charles M.; Zurawski, William C.

    2009-05-01

    Raytheon has extensively processed high-resolution sidescan sonar images with its CAD/CAC algorithms to provide classification of targets in a variety of shallow underwater environments. The Raytheon CAD/CAC algorithm is based on non-linear image segmentation into highlight, shadow, and background regions, followed by extraction, association, and scoring of features from candidate highlight and shadow regions of interest (ROIs). The targets are classified by thresholding an overall classification score, which is formed by summing the individual feature scores. The algorithm performance is measured in terms of probability of correct classification as a function of false alarm rate, and is determined by both the choice of classification features and the manner in which the classifier rates and combines these features to form its overall score. In general, the algorithm performs very reliably against targets that exhibit "strong" highlight and shadow regions in the sonar image, i.e., both the highlight echo and its associated shadow region from the target are distinct relative to the ambient background. However, many real-world undersea environments can produce sonar images in which a significant percentage of the targets exhibit either "weak" highlight or shadow regions in the sonar image. The challenge of achieving robust performance in these environments has traditionally been addressed by modifying the individual feature scoring algorithms to optimize the separation between the corresponding highlight or shadow feature scores of targets and non-targets. This study examines an alternate approach that employs principles of Fisher fusion to determine a set of optimal weighting coefficients that are applied to the individual feature scores before summing to form the overall classification score. The results demonstrate improved performance of the CAD/CAC algorithm on at-sea data sets.
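
    The Fisher-fusion principle, weighting each feature score by between-class mean separation over within-class variance before summing, can be sketched as follows (an illustrative diagonal variant under assumed score layouts, not Raytheon's implementation):

```python
def fisher_weights(target_scores, clutter_scores):
    """Per-feature Fisher weights: class-mean separation over pooled variance.
    Rows are samples; columns are individual feature scores."""
    def col_stats(rows, j):
        col = [r[j] for r in rows]
        m = sum(col) / len(col)
        v = sum((c - m) ** 2 for c in col) / (len(col) - 1)
        return m, v

    n_feat = len(target_scores[0])
    weights = []
    for j in range(n_feat):
        m1, v1 = col_stats(target_scores, j)
        m2, v2 = col_stats(clutter_scores, j)
        weights.append((m1 - m2) / (v1 + v2) if v1 + v2 else 0.0)
    return weights

def fused_score(features, weights):
    """Weighted sum replacing the plain sum of feature scores."""
    return sum(w * f for w, f in zip(weights, features))
```

    A feature whose scores barely separate targets from clutter receives a small weight, so it no longer dilutes the overall classification score the way an unweighted sum would.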

  8. Research on remote sensing image segmentation based on ant colony algorithm: take the land cover classification of middle Qinling Mountains for example

    Science.gov (United States)

    Mei, Xin; Wang, Qian; Wang, Quanfang; Lin, Wenfang

    2009-10-01

Remote sensing images have complex backgrounds and a wealth of spatial information, and extracting regions of interest from such huge amounts of data is a serious problem. Image segmentation divides an image into different regions according to specified characteristics, and it is the key to remote sensing image recognition and information extraction. A reasonably fast image segmentation algorithm is the basis of image processing, and traditional segmentation methods have many limitations; traditional threshold segmentation is, in essence, an exhaustive search, and its low efficiency limits its application. The ant colony algorithm is a population-based, biomimetic heuristic evolutionary algorithm; since it was proposed, it has been successfully applied to the TSP, job-shop scheduling, network routing, vehicle routing, and cluster analysis. Ant colony optimization is a fast heuristic optimization algorithm, integrates easily with other methods, and is robust. An improved ant colony algorithm can greatly increase the speed of image segmentation while reducing noise in the image. The background of this research is a land cover classification experiment using SPOT images of the Qinling area. Image segmentation based on the ant colony algorithm is carried out and compared with traditional methods. Experimental results show that the improved ant colony algorithm can segment targets quickly and accurately; it is an effective method of image segmentation and lays a good foundation for the follow-up image classification work.

  9. Classification Features of US Images Liver Extracted with Co-occurrence Matrix Using the Nearest Neighbor Algorithm

    Science.gov (United States)

    Moldovanu, Simona; Bibicu, Dorin; Moraru, Luminita; Nicolae, Mariana Carmen

    2011-12-01

The co-occurrence matrix has been applied successfully for echographic image characterization because it contains information about the spatial distribution of grey-scale levels in an image. The paper deals with the analysis of pixels in selected regions of interest of US images of the liver. The useful information obtained refers to texture features such as entropy, contrast, dissimilarity and correlation, extracted with the co-occurrence matrix. The analyzed US images were grouped in two distinct sets: healthy liver and steatosis (fatty) liver. These two sets of echographic images of the liver build a database that includes only histologically confirmed cases: 10 images of healthy liver and 10 images of steatosis liver. The healthy subjects were used to compute the four textural indices and also served as the control dataset. We chose to study this disease because steatosis is the abnormal retention of lipids in cells. The texture features are statistical measures and they can be used to characterize the irregularity of tissues. The goal is to extract the information using the nearest neighbor classification algorithm. The k-NN algorithm is a powerful tool to classify texture features by grouping a training set built from the healthy livers, on the one hand, and a holdout set built from the texture features of steatosis livers, on the other hand. The results could be used to quantify the texture information and will allow a clear discrimination between healthy and steatotic liver.
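The pipeline above (co-occurrence matrix, the four texture indices, then nearest-neighbour voting) can be sketched as follows. The patch sizes, quantization levels and offset are illustrative choices, not the paper's settings, and synthetic patches stand in for liver ROIs.

```python
import numpy as np

def glcm(img, levels=8, dx=1, dy=0):
    """Normalized grey-level co-occurrence matrix for one pixel offset."""
    q = np.clip((img * levels / (img.max() + 1e-9)).astype(int), 0, levels - 1)
    m = np.zeros((levels, levels))
    h, w = q.shape
    for y in range(h - dy):
        for x in range(w - dx):
            m[q[y, x], q[y + dy, x + dx]] += 1
    return m / m.sum()

def texture_features(img):
    """The four indices used in the paper: entropy, contrast, dissimilarity,
    correlation, all computed from the GLCM."""
    p = glcm(img)
    i, j = np.indices(p.shape)
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    contrast = np.sum(p * (i - j) ** 2)
    dissimilarity = np.sum(p * np.abs(i - j))
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    correlation = np.sum((i - mu_i) * (j - mu_j) * p) / (sd_i * sd_j + 1e-12)
    return np.array([entropy, contrast, dissimilarity, correlation])

def knn_predict(x, train_X, train_y, k=3):
    """Vote among the k nearest training feature vectors."""
    order = np.argsort(np.linalg.norm(train_X - x, axis=1))
    return np.bincount(train_y[order[:k]]).argmax()
```

Smooth patches (a stand-in for homogeneous tissue) yield low contrast and entropy, while noisy patches score high on both, so even this minimal k-NN separates the two classes.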

  10. Analysis and Classification of Stride Patterns Associated with Children Development Using Gait Signal Dynamics Parameters and Ensemble Learning Algorithms

    Directory of Open Access Journals (Sweden)

    Meihong Wu

    2016-01-01

Full Text Available Measuring stride variability and dynamics in children is useful for the quantitative study of gait maturation and neuromotor development in childhood and adolescence. In this paper, we computed the sample entropy (SampEn) and average stride interval (ASI) parameters to quantify the stride series of 50 gender-matched children participants in three age groups. We also normalized the SampEn and ASI values by leg length and body mass for each participant, respectively. Results show that the original and normalized SampEn values consistently decrease with age (Mann-Whitney U test, p < 0.01) in children of 3–14 years old, which indicates that stride irregularity is significantly ameliorated with body growth. The original and normalized ASI values also change significantly when comparing any two of the young (aged 3–5 years), middle (aged 6–8 years), and elder (aged 10–14 years) groups of children. Such results suggest that healthy children may better modulate their gait cadence rhythm with the development of their musculoskeletal and neurological systems. In addition, the AdaBoost.M2 and Bagging algorithms were used to effectively distinguish the children's gait patterns. These ensemble learning algorithms both provided excellent gait classification results in terms of overall accuracy (≥90%), recall (≥0.8), and precision (≥0.8077).
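The sample entropy statistic used above can be sketched directly from its definition. This is a minimal implementation with the common defaults m = 2 and r = 0.2 times the series standard deviation; it assumes the series is long enough that at least one (m+1)-point match exists.

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    """SampEn(m, r): negative log of the conditional probability that two
    subsequences matching for m points (within tolerance r, Chebyshev
    distance) also match for m + 1 points. Self-matches are excluded."""
    x = np.asarray(x, dtype=float)
    r = r_factor * x.std()

    def matches(length):
        t = np.array([x[i:i + length] for i in range(len(x) - length)])
        total = 0
        for i in range(len(t) - 1):
            total += np.sum(np.max(np.abs(t[i + 1:] - t[i]), axis=1) <= r)
        return total

    return -np.log(matches(m + 1) / matches(m))
```

A regular (periodic) series yields low SampEn, while an uncorrelated noisy series yields high SampEn, which is the sense in which decreasing SampEn indicates reduced stride irregularity.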

  11. Dual-polarization C-band weather radar algorithms for rain rate estimation and hydrometeor classification in an alpine region

    Directory of Open Access Journals (Sweden)

    H. Paulitsch

    2009-03-01

    Full Text Available Dual polarization is becoming the standard for new weather radar systems. In contrast to conventional weather radars, where the reflectivity is measured in one polarization plane only, a dual polarization radar provides transmission in either horizontal, vertical, or both polarizations while receiving both the horizontal and vertical channels simultaneously. Since hydrometeors are often far from being spherical, the backscatter and propagation are different for horizontal and vertical polarization. Comparing the reflected horizontal and vertical power returns and their ratio and correlation, information on size, shape, and material density of cloud and precipitation particles can be obtained. The use of polarimetric radar variables can therefore increase the accuracy of the rain rate estimation compared to standard Z-R relationships of non-polarimetric radars. It is also possible to derive the type of precipitation from dual polarization parameters, although this is not an easy task, since there is no clear discrimination between the different values. Fuzzy logic approaches have been shown to work well with overlapping conditions and imprecisely defined class output.

    In this paper the implementation of different polarization algorithms for the new Austrian weather radar on Mt. Valluga is described, and first results from operational use are presented. This study also presents first observations of rain events in August 2007 during the test run of the radar. Further, the designated rain rate estimation and hydrometeor classification algorithms are explained.
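The two estimator families contrasted above can be sketched in a few lines. The Z-R coefficients below are the classic Marshall-Palmer values (Z = 200 R^1.6); the R(Kdp) coefficients are illustrative placeholders only, since operational values depend on radar band, temperature, and drop-size assumptions.

```python
def rain_rate_zr(dbz, a=200.0, b=1.6):
    """Standard Z-R relation Z = a * R^b (Marshall-Palmer coefficients);
    Z in mm^6/m^3 from reflectivity in dBZ, R in mm/h."""
    z = 10.0 ** (dbz / 10.0)
    return (z / a) ** (1.0 / b)

def rain_rate_kdp(kdp, c=25.0, d=0.78):
    """Polarimetric estimator R = c * Kdp^d, with Kdp in deg/km.
    The coefficients here are illustrative placeholders, not tuned
    C-band values; negative Kdp is clamped to zero."""
    return c * (max(kdp, 0.0) ** d)
```

Because Kdp is immune to absolute calibration errors and partial beam blockage, blending R(Kdp) with R(Z) is one way polarimetric variables improve on a fixed Z-R relationship.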

  12. Seasonal Separation of African Savanna Components Using Worldview-2 Imagery: A Comparison of Pixel- and Object-Based Approaches and Selected Classification Algorithms

    Directory of Open Access Journals (Sweden)

    Żaneta Kaszta

    2016-09-01

Full Text Available Separation of savanna land cover components is challenging due to the high heterogeneity of this landscape and the spectral similarity of compositionally different vegetation types. In this study, we tested the usability of very high spatial and spectral resolution WorldView-2 (WV-2) imagery to classify land cover components of African savanna in the wet and dry seasons. We compared the performance of Object-Based Image Analysis (OBIA) and the pixel-based approach with several algorithms: k-nearest neighbor (k-NN), maximum likelihood (ML), random forests (RF), classification and regression trees (CART) and support vector machines (SVM). Results showed that classifications of WV-2 imagery produce high accuracy results (>77%) regardless of the applied classification approach. However, OBIA had a significantly higher accuracy for almost every classifier, with the highest overall accuracy score of 93%. Amongst the tested classifiers, SVM and RF provided the highest accuracies. Overall, classifications of the wet season image provided better results, with 93% for RF. However, considering woody leaf-off conditions, the dry season classification also performed well, with an overall accuracy of 83% (SVM) and high producer accuracy for the tree cover (91%). Our findings demonstrate the potential of imagery like WorldView-2 with OBIA and advanced supervised machine-learning algorithms in seasonal fine-scale land cover classification of African savanna.

  13. Comparative performance analysis of state-of-the-art classification algorithms applied to lung tissue categorization.

    Science.gov (United States)

    Depeursinge, Adrien; Iavindrasana, Jimison; Hidki, Asmâa; Cohen, Gilles; Geissbuhler, Antoine; Platon, Alexandra; Poletti, Pierre-Alexandre; Müller, Henning

    2010-02-01

    In this paper, we compare five common classifier families in their ability to categorize six lung tissue patterns in high-resolution computed tomography (HRCT) images of patients affected with interstitial lung diseases (ILD) and with healthy tissue. The evaluated classifiers are naive Bayes, k-nearest neighbor, J48 decision trees, multilayer perceptron, and support vector machines (SVM). The dataset used contains 843 regions of interest (ROI) of healthy and five pathologic lung tissue patterns identified by two radiologists at the University Hospitals of Geneva. Correlation of the feature space composed of 39 texture attributes is studied. A grid search for optimal parameters is carried out for each classifier family. Two complementary metrics are used to characterize the performances of classification. These are based on McNemar's statistical tests and global accuracy. SVM reached best values for each metric and allowed a mean correct prediction rate of 88.3% with high class-specific precision on testing sets of 423 ROIs.
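One of the two metrics mentioned, McNemar's test, compares two classifiers on the same test set using only the cases on which they disagree. A minimal sketch with the usual continuity correction:

```python
def mcnemar(y_true, pred_a, pred_b):
    """McNemar's chi-squared statistic (with continuity correction) for two
    classifiers evaluated on the same samples; only discordant pairs count."""
    n01 = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    n10 = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    if n01 + n10 == 0:
        return 0.0
    return (abs(n01 - n10) - 1) ** 2 / (n01 + n10)
```

A statistic above 3.84 rejects the hypothesis of equal error rates at the 5% level (chi-squared with one degree of freedom), which is how pairwise classifier comparisons like the ones in this paper are typically judged.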

  14. Development of visible/infrared/microwave agriculture classification and biomass estimation algorithms

    Science.gov (United States)

    Rosenthal, W. D.; Blanchard, B. J.; Blanchard, A. J.

    1983-01-01

This paper describes the results of a study to determine if crop acreage and biomass estimates could be improved by using visible, IR and microwave data. The objectives were to (1) develop and test agricultural crop classification models using two or more spectral regions (visible through microwave), and (2) estimate biomass by including microwave with visible and infrared data. Aircraft multispectral data collected during the study included visible and infrared data (multiband data from 0.5–12 μm), and active microwave data at K band (2 cm), C band (6 cm), L band (20 cm), and P band (75 cm) with HH and HV polarizations. Ground truth data from each field consisted of soil moisture and biomass measurements. Results indicated that C, L, and P band active microwave data combined with visible and infrared data improved crop discrimination and biomass estimates compared to results using only visible and infrared data. The active microwave frequencies were sensitive to different biomass levels: K and C band were sensitive to differences at low biomass levels, while P band was sensitive to differences at high biomass levels.

  15. Classification of EEG-P300 Signals Extracted from Brain Activities in BCI Systems Using ν-SVM and BLDA Algorithms

    Directory of Open Access Journals (Sweden)

    Ali MOMENNEZHAD

    2014-06-01

Full Text Available In this paper, a linear predictive coding (LPC) model is used to improve the classification accuracy, convergence speed to maximum accuracy, and maximum bitrate of a brain computer interface (BCI) system based on extracting EEG-P300 signals. First, the EEG signal is filtered in order to eliminate high frequency noise. Then, the parameters of the filtered EEG signal are extracted using the LPC model. Finally, the samples are reconstructed from the LPC coefficients, and two classifiers, (a) Bayesian linear discriminant analysis (BLDA) and (b) the ν-support vector machine (ν-SVM), are applied for classification. The performance of the proposed algorithm is compared with Fisher linear discriminant analysis (FLDA). Results show that our algorithm is much more effective at improving classification accuracy and convergence speed to maximum accuracy. For example, with an 8-electrode configuration for subject S1, the proposed BLDA-with-LPC and ν-SVM-with-LPC algorithms improve the total classification accuracy by 9.4% and 1.7%, respectively. Moreover, for subject S7, the LPC+BLDA and LPC+ν-SVM algorithms converged to maximum accuracy after the 11th block, whereas the FLDA algorithm did not converge to maximum accuracy (with the same configuration). Thus, the approach can be used as a promising tool in designing BCI systems.
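The LPC feature-extraction step described above can be sketched with the standard autocorrelation method and Levinson-Durbin recursion. This is a generic textbook implementation, not the paper's exact pipeline; the AR(1) check below is synthetic.

```python
import numpy as np

def lpc(x, order):
    """LPC coefficients via the autocorrelation method and Levinson-Durbin
    recursion; returns a with a[0] = 1 such that
    x[t] ~= -(a[1]*x[t-1] + ... + a[order]*x[t-order])."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = np.array([x[:n - k] @ x[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[1:i][::-1]
        k = -acc / err                       # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)                 # residual prediction error
    return a

def lpc_reconstruct(x, a):
    """One-step-ahead prediction of each sample from the previous ones,
    i.e. the 'reconstruction' role the coefficients play above."""
    order = len(a) - 1
    pred = np.array(x, dtype=float)
    for t in range(order, len(x)):
        pred[t] = -sum(a[j] * x[t - j] for j in range(1, order + 1))
    return pred
```

Feeding the reconstructed (whitened) samples to a classifier such as BLDA or ν-SVM is then a separate step.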

  16. CANDIDATE TREE-IN-BUD PATTERN SELECTION AND CLASSIFICATION USING BALL SCALE ENCODING ALGORITHM

    Directory of Open Access Journals (Sweden)

    T. Akilandeswari

    2013-10-01

Full Text Available Asthma, chronic obstructive pulmonary disease, influenza, pneumonia, tuberculosis, lung cancer and many other breathing problems are leading causes of death and disability all over the world. These diseases affect the lung. Radiology is a primary assessment method, but it has low specificity for predicting the presence of these diseases. Computer Assisted Detection (CAD) will help specialists detect these diseases at an early stage. A method has been proposed by Ulas Bagci to detect lung abnormalities using fuzzy connected object estimation and ball scale encoding, comparing various features extracted from local patches of lung CT images. In this paper, the Tree-in-Bud patterns are selected after segmentation by using the ball scale encoding algorithm.

  17. Application of classification algorithms for analysis of road safety risk factor dependencies.

    Science.gov (United States)

    Kwon, Oh Hoon; Rhee, Wonjong; Yoon, Yoonjin

    2015-02-01

    Transportation continues to be an integral part of modern life, and the importance of road traffic safety cannot be overstated. Consequently, recent road traffic safety studies have focused on analysis of risk factors that impact fatality and injury level (severity) of traffic accidents. While some of the risk factors, such as drug use and drinking, are widely known to affect severity, an accurate modeling of their influences is still an open research topic. Furthermore, there are innumerable risk factors that are waiting to be discovered or analyzed. A promising approach is to investigate historical traffic accident data that have been collected in the past decades. This study inspects traffic accident reports that have been accumulated by the California Highway Patrol (CHP) since 1973 for which each accident report contains around 100 data fields. Among them, we investigate 25 fields between 2004 and 2010 that are most relevant to car accidents. Using two classification methods, the Naive Bayes classifier and the decision tree classifier, the relative importance of the data fields, i.e., risk factors, is revealed with respect to the resulting severity level. Performances of the classifiers are compared to each other and a binary logistic regression model is used as the basis for the comparisons. Some of the high-ranking risk factors are found to be strongly dependent on each other, and their incremental gains on estimating or modeling severity level are evaluated quantitatively. The analysis shows that only a handful of the risk factors in the data dominate the severity level and that dependency among the top risk factors is an imperative trait to consider for an accurate analysis.
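One of the two classifiers used above, naive Bayes over categorical accident-report fields, can be sketched as follows. The field values below ("drinking", "lighting") are hypothetical stand-ins, not actual CHP report codes, and the smoothing scheme is a simple add-one variant.

```python
from collections import Counter, defaultdict
import math

class NaiveBayes:
    """Categorical naive Bayes with add-one smoothing: each risk factor
    (data field) contributes an independent log-likelihood term per class."""

    def fit(self, rows, labels):
        self.classes = Counter(labels)
        self.counts = defaultdict(Counter)   # (class, field index) -> value counts
        for row, y in zip(rows, labels):
            for f, v in enumerate(row):
                self.counts[(y, f)][v] += 1
        return self

    def predict(self, row):
        best, best_lp = None, -math.inf
        total = sum(self.classes.values())
        for y, ny in self.classes.items():
            lp = math.log(ny / total)        # class prior
            for f, v in enumerate(row):
                c = self.counts[(y, f)]
                lp += math.log((c[v] + 1) / (ny + len(c) + 1))  # smoothed likelihood
            if lp > best_lp:
                best, best_lp = y, lp
        return best
```

Because each field enters as a separate term, dropping a field and measuring the change in accuracy gives exactly the kind of incremental-gain ranking of risk factors the study performs.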

  18. GA(M)E-QSAR: a novel, fully automatic genetic-algorithm-(meta)-ensembles approach for binary classification in ligand-based drug design.

    Science.gov (United States)

    Pérez-Castillo, Yunierkis; Lazar, Cosmin; Taminau, Jonatan; Froeyen, Mathy; Cabrera-Pérez, Miguel Ángel; Nowé, Ann

    2012-09-24

    Computer-aided drug design has become an important component of the drug discovery process. Despite the advances in this field, there is not a unique modeling approach that can be successfully applied to solve the whole range of problems faced during QSAR modeling. Feature selection and ensemble modeling are active areas of research in ligand-based drug design. Here we introduce the GA(M)E-QSAR algorithm that combines the search and optimization capabilities of Genetic Algorithms with the simplicity of the Adaboost ensemble-based classification algorithm to solve binary classification problems. We also explore the usefulness of Meta-Ensembles trained with Adaboost and Voting schemes to further improve the accuracy, generalization, and robustness of the optimal Adaboost Single Ensemble derived from the Genetic Algorithm optimization. We evaluated the performance of our algorithm using five data sets from the literature and found that it is capable of yielding similar or better classification results to what has been reported for these data sets with a higher enrichment of active compounds relative to the whole actives subset when only the most active chemicals are considered. More important, we compared our methodology with state of the art feature selection and classification approaches and found that it can provide highly accurate, robust, and generalizable models. In the case of the Adaboost Ensembles derived from the Genetic Algorithm search, the final models are quite simple since they consist of a weighted sum of the output of single feature classifiers. Furthermore, the Adaboost scores can be used as ranking criterion to prioritize chemicals for synthesis and biological evaluation after virtual screening experiments.
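The final models described above, weighted sums of single-feature classifiers produced by Adaboost, can be sketched with one-feature threshold stumps. This is an illustrative sketch of that ensemble structure, not the GA(M)E-QSAR implementation, and the data in the check below is synthetic.

```python
import numpy as np

def train_adaboost(X, y, rounds=10):
    """AdaBoost over single-feature threshold stumps; labels y must be +/-1.
    Each round picks the stump with the lowest weighted error, then
    re-weights samples to emphasize the mistakes."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)
    stumps = []
    for _ in range(rounds):
        best = None
        for f in range(d):
            for thr in np.unique(X[:, f]):
                for sign in (1, -1):
                    pred = np.where(X[:, f] >= thr, sign, -sign)
                    err = np.sum(w[pred != y])
                    if best is None or err < best[0]:
                        best = (err, f, thr, sign)
        err, f, thr, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)        # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)        # stump weight
        pred = np.where(X[:, f] >= thr, sign, -sign)
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()
        stumps.append((alpha, f, thr, sign))
    return stumps

def adaboost_predict(stumps, X):
    """Final classifier: sign of the weighted sum of stump outputs."""
    score = sum(a * np.where(X[:, f] >= t, s, -s) for a, f, t, s in stumps)
    return np.sign(score)
```

The continuous score before the sign, analogous to the Adaboost scores in the paper, can also serve as a ranking criterion for prioritizing compounds after virtual screening.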

  19. Improved HyperSplit Packet Classification Algorithm

    Institute of Scientific and Technical Information of China (English)

    马腾; 陈庶樵; 张校辉

    2014-01-01

In order to solve the problem of excessive memory usage in existing high-speed, large-volume, multi-field packet classification algorithms, an improved HyperSplit algorithm is proposed. By analyzing the causes of the excessive memory usage, the heuristic algorithms that choose the cutting dimensions and cutting points and that eliminate redundant structures are modified and redesigned, so that rule replication is greatly reduced, redundant rules and nodes are removed, and the structure of the decision tree is optimized. Simulation results demonstrate that, compared with existing multi-field packet classification algorithms and independently of the rule set's type and characteristics, the algorithm greatly reduces memory usage without increasing the number of memory accesses, ensuring that packets are still processed at wire speed; when the rule set contains 10^5 rules, memory usage drops to about 80% of that of HyperSplit.
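The decision-tree idea behind HyperSplit, recursively picking a dimension and a split point over the rules' ranges so that few rules straddle the cut, can be sketched as follows. The cost function and leaf size here are toy choices for illustration; the published heuristics differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Rule = Tuple[Tuple[int, int], ...]          # per-field inclusive [lo, hi] ranges

@dataclass
class Node:
    dim: int = -1
    split: int = 0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    rules: Optional[List[int]] = None       # leaf: rule indices, priority order

def build(rules: List[Rule], idx: List[int], leaf_size: int = 2) -> Node:
    if len(idx) <= leaf_size:
        return Node(rules=idx)
    best = None
    for d in range(len(rules[0])):
        # candidate split points: the rules' range endpoints in this dimension
        points = sorted({e for i in idx for e in (rules[i][d][0], rules[i][d][1] + 1)})
        for p in points[1:-1] or points:
            left = [i for i in idx if rules[i][d][0] < p]
            right = [i for i in idx if rules[i][d][1] >= p]
            # toy cost: penalize imbalance plus rules replicated into both sides
            cost = max(len(left), len(right)) + (len(left) + len(right) - len(idx))
            if best is None or cost < best[0]:
                best = (cost, d, p, left, right)
    _, d, p, left, right = best
    if max(len(left), len(right)) == len(idx):   # no progress: stop splitting
        return Node(rules=idx)
    return Node(dim=d, split=p,
                left=build(rules, left, leaf_size),
                right=build(rules, right, leaf_size))

def classify(rules: List[Rule], node: Node, pkt: Tuple[int, ...]) -> Optional[int]:
    """Walk the tree on the packet's field values, then linear-search the
    small leaf; the first match is the highest-priority rule."""
    while node.rules is None:
        node = node.left if pkt[node.dim] < node.split else node.right
    for i in node.rules:
        if all(lo <= f <= hi for f, (lo, hi) in zip(pkt, rules[i])):
            return i
    return None
```

Rules whose range straddles a cut are copied into both children; choosing cuts that minimize such replication is precisely what controls the memory footprint discussed above.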

  20. Effectiveness of Partition and Graph Theoretic Clustering Algorithms for Multiple Source Partial Discharge Pattern Classification Using Probabilistic Neural Network and Its Adaptive Version: A Critique Based on Experimental Studies

    Directory of Open Access Journals (Sweden)

    S. Venkatesh

    2012-01-01

Full Text Available Partial discharge (PD) is a major cause of failure of power apparatus and hence its measurement and analysis have emerged as a vital field in assessing the condition of the insulation system. Several efforts have been undertaken by researchers to classify PD pulses utilizing artificial intelligence techniques. Recently, the focus has shifted to the identification of multiple sources of PD since it is often encountered in real-time measurements. Studies have indicated that classification of multi-source PD becomes difficult with the degree of overlap and that several techniques such as mixed Weibull functions, neural networks, and wavelet transformation have been attempted with limited success. Since digital PD acquisition systems record data for a substantial period, the database becomes large, posing considerable difficulties during classification. This research work aims firstly at analyzing aspects concerning classification capability during the discrimination of multisource PD patterns. Secondly, it attempts at extending the previous work of the authors in utilizing the novel approach of probabilistic neural network versions from classifying moderate sets of PD sources to large sets. The third focus is on comparing the ability of partition-based algorithms, namely the labelled (learning vector quantization) and unlabelled (K-means) versions, with that of a novel hypergraph-based clustering method in providing parsimonious sets of centers during classification.
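The unlabelled partition-based baseline mentioned above, K-means, can be sketched in a few lines; the centers it returns are the "parsimonious sets of centers" such algorithms supply to the downstream classifier. The data in the check below is synthetic.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain K-means: alternate assigning points to the nearest center and
    moving each center to the mean of its points, until convergence."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels
```

The labelled counterpart (learning vector quantization) differs only in that each center carries a class label and is pushed toward or away from samples depending on whether the labels agree.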

  1. Precision disablement aiming system

    Energy Technology Data Exchange (ETDEWEB)

    Monda, Mark J.; Hobart, Clinton G.; Gladwell, Thomas Scott

    2016-02-16

A disrupter may be precisely aimed at a target by positioning a radiation source to direct radiation towards the target and positioning a detector to detect radiation that passes through the target. An aiming device is positioned between the radiation source and the target, such that a mechanical feature of the aiming device is superimposed on the target in a captured radiographic image. The location of the aiming device in the radiographic image is then used to aim the disrupter towards the target.

  2. A New Nearest Neighbor Classification Algorithm Based on Local Probability Centers

    Directory of Open Access Journals (Sweden)

    I-Jing Li

    2014-01-01

Full Text Available The nearest neighbor classifier is one of the most popular classifiers, and it has been successfully used in pattern recognition and machine learning. One drawback of kNN is that it performs poorly when class distributions overlap. Recently, the local probability center (LPC) algorithm was proposed to solve this problem; its main idea is to weight samples according to their posterior probabilities. However, LPC performs poorly when the value of k is very small or when higher-dimensional datasets are used. To deal with this problem, this paper shows that the gradient of the posterior probability function can be estimated under suitable assumptions. This theoretical property makes it possible to faithfully calculate the inner product of two vectors. To increase performance on high-dimensional datasets, the multidimensional Parzen window and the Euler-Richardson method are utilized, and a new classifier based on local probability centers is developed in this paper. Experimental results show that the proposed method yields stable performance over a wide range of k, robustness to the class-overlap issue, and good performance with respect to dimensionality. The proposed theorem can be applied to mathematical problems and other applications. Furthermore, the proposed method is an attractive classifier because of its simplicity.

  3. Pattern recognition in lithology classification: modeling using neural networks, self-organizing maps and genetic algorithms

    Science.gov (United States)

    Sahoo, Sasmita; Jha, Madan K.

    2017-03-01

    Effective characterization of lithology is vital for the conceptualization of complex aquifer systems, which is a prerequisite for the development of reliable groundwater-flow and contaminant-transport models. However, such information is often limited for most groundwater basins. This study explores the usefulness and potential of a hybrid soft-computing framework; a traditional artificial neural network with gradient descent-momentum training (ANN-GDM) and a traditional genetic algorithm (GA) based ANN (ANN-GA) approach were developed and compared with a novel hybrid self-organizing map (SOM) based ANN (SOM-ANN-GA) method for the prediction of lithology at a basin scale. This framework is demonstrated through a case study involving a complex multi-layered aquifer system in India, where well-log sites were clustered on the basis of sand-layer frequencies; within each cluster, subsurface layers were reclassified into four depth classes based on the maximum drilling depth. ANN models for each depth class were developed using each of the three approaches. Of the three, the hybrid SOM-ANN-GA models were able to recognize incomplete geologic pattern more reasonably, followed by ANN-GA and ANN-GDM models. It is concluded that the hybrid soft-computing framework can serve as a promising tool for characterizing lithology in groundwater basins with missing lithologic patterns.
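The SOM stage of the hybrid framework above, clustering sites before class-wise modeling, can be sketched with a minimal self-organizing map. The grid size, decay schedules and data below are illustrative choices, not the study's configuration.

```python
import numpy as np

def train_som(X, grid=(3, 3), iters=800, lr0=0.5, sigma0=1.5, seed=0):
    """Minimal self-organizing map: for each random sample, find the
    best-matching unit (BMU) and pull it and its grid neighbours toward the
    sample, with learning rate and neighbourhood radius decaying over time."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h * w, X.shape[1]))
    coords = np.array([(i, j) for i in range(h) for j in range(w)], dtype=float)
    for t in range(iters):
        x = X[rng.integers(len(X))]
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        lr = lr0 * (1 - t / iters)
        sigma = sigma0 * (1 - t / iters) + 1e-3
        dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
        influence = np.exp(-dist2 / (2 * sigma ** 2))
        weights += lr * influence[:, None] * (x - weights)
    return weights
```

After training, each well-log site would be assigned to its best-matching unit, and a separate ANN (here, trained by GA) fitted per cluster, which is the division of labour in the SOM-ANN-GA pipeline.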

  4. Effects of Pooling Samples on the Performance of Classification Algorithms: A Comparative Study

    Directory of Open Access Journals (Sweden)

    Kanthida Kusonmano

    2012-01-01

Full Text Available A pooling design can be used as a powerful strategy to compensate for limited amounts of samples or high biological variation. In this paper, we perform a comparative study to model and quantify the effects of virtual pooling on the performance of the widely applied classifiers: support vector machines (SVMs), random forest (RF), k-nearest neighbors (k-NN), penalized logistic regression (PLR), and prediction analysis for microarrays (PAM). We evaluate a variety of experimental designs using mock omics datasets with varying levels of pool sizes and considering effects from feature selection. Our results show that feature selection significantly improves classifier performance for non-pooled and pooled data. All investigated classifiers yield lower misclassification rates with smaller pool sizes. RF mainly outperforms the other investigated algorithms, while accuracy levels are comparable among all the remaining ones. Guidelines are derived to identify an optimal pooling scheme for obtaining adequate predictive power and, hence, to motivate a study design that meets best experimental objectives and budgetary conditions, including time constraints.

  5. Aim to unify the narrow band imaging (NBI) magnifying classification for colorectal tumors: current status in Japan from a summary of the consensus symposium in the 79th Annual Meeting of the Japan Gastroenterological Endoscopy Society.

    Science.gov (United States)

    Tanaka, Shinji; Sano, Yasushi

    2011-05-01

At present, there are many narrow band imaging (NBI) magnifying observation classifications for colorectal tumors in Japan. To internationally standardize the NBI observation criteria, a simple classification system is required. When a colorectal tumor is closely observed using a recent high-resolution videocolonoscope, a pit-like pattern on the tumor can be observed to a certain degree without magnification. At the symposium, a consensus was reached to name this pit-like pattern the 'surface pattern.' Using the NBI system, the microvessels on the tumor surface can also be recognized to a certain degree. When the NBI system is used, the structure is emphasized, and consequently the surface pattern can be recognized easily. Recently, an international cooperative group consisting of members from Japan, the USA and Europe was formed, named the Colon Tumor NBI Interest Group. This group has developed a simple category classification (NBI international colorectal endoscopic [NICE] classification), which classifies colorectal tumors into types 1-3 by close observation with a high-resolution videocolonoscope, even without magnification (a validation study by the Colon Tumor NBI Interest Group is now ongoing). The key advantage of this classification is its simplicity. Although magnifying observation is best for obtaining detailed NBI findings, close observation and magnifying observation using the NICE classification might give almost similar results. Of course, the NICE classification can be used more precisely with magnification. In this report we also address issues concerning NBI magnification that should be resolved as early as possible.

  6. Web Document Classification Algorithm Based on Manifold Learning and SVM

    Institute of Scientific and Technical Information of China (English)

    王自强; 钱旭

    2009-01-01

To efficiently solve the Web document classification problem, a novel Web document classification algorithm based on manifold learning and support vector machines (SVM) is proposed. The high-dimensional Web document space of the training set is non-linearly reduced to a lower-dimensional space with the manifold learning algorithm LPP, so that the meaningful low-dimensional structure hidden in the high-dimensional observations can be discovered. Classification and prediction in the lower-dimensional feature space are then performed with an SVM optimized by multiplicative update rules. Experimental results show that the algorithm achieves higher classification accuracy with less running time.

  7. Fast, Simple and Accurate Handwritten Digit Classification by Training Shallow Neural Network Classifiers with the 'Extreme Learning Machine' Algorithm.

    Science.gov (United States)

    McDonnell, Mark D; Tissera, Migel D; Vladusich, Tony; van Schaik, André; Tapson, Jonathan

    2015-01-01

    Recent advances in training deep (multi-layer) architectures have inspired a renaissance in neural network use. For example, deep convolutional networks are becoming the default option for difficult tasks on large datasets, such as image and speech recognition. However, here we show that error rates below 1% on the MNIST handwritten digit benchmark can be replicated with shallow non-convolutional neural networks. This is achieved by training such networks using the 'Extreme Learning Machine' (ELM) approach, which also enables a very rapid training time (∼ 10 minutes). Adding distortions, as is common practise for MNIST, reduces error rates even further. Our methods are also shown to be capable of achieving less than 5.5% error rates on the NORB image database. To achieve these results, we introduce several enhancements to the standard ELM algorithm, which individually and in combination can significantly improve performance. The main innovation is to ensure each hidden-unit operates only on a randomly sized and positioned patch of each image. This form of random 'receptive field' sampling of the input ensures the input weight matrix is sparse, with about 90% of weights equal to zero. Furthermore, combining our methods with a small number of iterations of a single-batch backpropagation method can significantly reduce the number of hidden-units required to achieve a particular performance. Our close to state-of-the-art results for MNIST and NORB suggest that the ease of use and accuracy of the ELM algorithm for designing a single-hidden-layer neural network classifier should cause it to be given greater consideration either as a standalone method for simpler problems, or as the final classification stage in deep neural networks applied to more difficult problems.
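The core of the ELM approach described above is that the hidden layer is random and fixed, so training reduces to a single least-squares solve for the output weights. A minimal sketch on synthetic data; the paper's receptive-field masking and backpropagation refinements are omitted.

```python
import numpy as np

def train_elm(X, Y, n_hidden=50, seed=0):
    """Extreme Learning Machine training: random fixed input weights, one
    least-squares solve for the output weights beta (no backprop)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                    # fixed random hidden layer
    beta, *_ = np.linalg.lstsq(H, Y, rcond=None)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Class = argmax over the linear readout of the hidden activations."""
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)
```

Making most entries of W zero except within a random patch of each image, as the paper's 'receptive field' sampling does, would only change how W is generated; the least-squares readout stays the same.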

  8. Fast, Simple and Accurate Handwritten Digit Classification by Training Shallow Neural Network Classifiers with the 'Extreme Learning Machine' Algorithm.

    Directory of Open Access Journals (Sweden)

    Mark D McDonnell

Full Text Available Recent advances in training deep (multi-layer) architectures have inspired a renaissance in neural network use. For example, deep convolutional networks are becoming the default option for difficult tasks on large datasets, such as image and speech recognition. However, here we show that error rates below 1% on the MNIST handwritten digit benchmark can be replicated with shallow non-convolutional neural networks. This is achieved by training such networks using the 'Extreme Learning Machine' (ELM) approach, which also enables a very rapid training time (∼ 10 minutes). Adding distortions, as is common practise for MNIST, reduces error rates even further. Our methods are also shown to be capable of achieving less than 5.5% error rates on the NORB image database. To achieve these results, we introduce several enhancements to the standard ELM algorithm, which individually and in combination can significantly improve performance. The main innovation is to ensure each hidden-unit operates only on a randomly sized and positioned patch of each image. This form of random 'receptive field' sampling of the input ensures the input weight matrix is sparse, with about 90% of weights equal to zero. Furthermore, combining our methods with a small number of iterations of a single-batch backpropagation method can significantly reduce the number of hidden-units required to achieve a particular performance. Our close to state-of-the-art results for MNIST and NORB suggest that the ease of use and accuracy of the ELM algorithm for designing a single-hidden-layer neural network classifier should cause it to be given greater consideration either as a standalone method for simpler problems, or as the final classification stage in deep neural networks applied to more difficult problems.

  9. Love thy neighbour: automatic animal behavioural classification of acceleration data using the K-nearest neighbour algorithm.

    Directory of Open Access Journals (Sweden)

    Owen R Bidder

    Full Text Available Researchers hoping to elucidate the behaviour of species that are not readily observed are able to do so using biotelemetry methods. Accelerometers in particular have proved effective and have been used on terrestrial, aquatic and volant species with success. In the past, behavioural modes were detected in accelerometer data through manual inspection, but with developments in technology, modern accelerometers now record at frequencies that make this impractical. In light of this, some researchers have suggested the use of various machine learning approaches as a means to classify accelerometer data automatically. We feel uptake of this approach by the scientific community is inhibited for two reasons: (1) most machine learning algorithms require selection of summary statistics, which obscures the decision mechanisms by which classifications are reached, and (2) they are difficult to implement without appreciable computational skill. We present a method which allows researchers to classify accelerometer data into behavioural classes automatically using a primitive machine learning algorithm, k-nearest neighbour (KNN). Raw acceleration data may be used in KNN without selection of summary statistics, and it is easily implemented using the freeware program R. The method is evaluated by detecting 5 behavioural modes in 8 species, with examples of quadrupedal, bipedal and volant species. Accuracy and precision were found to be comparable with other, more complex methods. In order to assist in the application of this method, the script required to run KNN analysis in R is provided. We envisage that the KNN method may be coupled with methods for investigating animal position, such as GPS telemetry or dead-reckoning, in order to implement an integrated approach to movement ecology research.
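    The KNN classification on raw data described above can be sketched as follows; this is an illustrative Python stand-in for the authors' R script, with invented toy "behaviour" windows rather than real acceleration recordings:

```python
# KNN on raw acceleration windows: no summary statistics, just Euclidean
# distance between raw sample windows (toy data, for illustration).
import numpy as np

def knn_classify(train_X, train_y, query, k=3):
    """Assign the majority label among the k nearest training windows."""
    d = np.linalg.norm(train_X - query, axis=1)      # distance to every window
    nearest = np.argsort(d)[:k]                      # indices of k closest
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]                 # majority vote

# Toy data: two "behaviours" with different mean acceleration, 6 raw samples per window.
rng = np.random.default_rng(1)
resting = rng.normal(0.0, 0.1, (20, 6))
walking = rng.normal(1.0, 0.1, (20, 6))
X = np.vstack([resting, walking])
y = np.array(["rest"] * 20 + ["walk"] * 20)

pred = knn_classify(X, y, np.full(6, 0.9), k=3)
```

Operating on the raw window keeps the decision mechanism transparent: a classification is simply "this window looks like these k labelled windows."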

  10. Method of network traffic classification using improved LM algorithm

    Institute of Scientific and Technical Information of China (English)

    胡婷; 王勇; 陶晓玲

    2011-01-01

    To address the shortcomings of traditional traffic classification methods, such as low accuracy, high overhead and a limited application range, an effective approach for network traffic classification named GA-LM is proposed. The method uses a neural network as the classification model for network traffic and constructs the classifier with the L-M algorithm, an improved BP algorithm, while a Genetic Algorithm (GA) optimizes the network's initial connection weights, which speeds up convergence and improves classification performance. Experiments conducted on data sets collected from real networks show that GA-LM converges faster than the standard BP and L-M algorithms and achieves high accuracy, so it can be used effectively for network traffic classification.

  11. Impact of image normalization and quantization on the performance of sonar computer-aided detection/computer-aided classification (CAD/CAC) algorithms

    Science.gov (United States)

    Ciany, Charles M.; Zurawski, William C.

    2007-04-01

    Raytheon has extensively processed high-resolution sonar images with its CAD/CAC algorithms to provide real-time classification of mine-like bottom objects in a wide range of shallow-water environments. The algorithm performance is measured in terms of probability of correct classification (Pcc) as a function of false alarm rate, and is impacted by variables associated with both the physics of the problem and the signal processing design choices. Some examples of prominent variables pertaining to the choices of signal processing parameters are image resolution (i.e., pixel dimensions), image normalization scheme, and pixel intensity quantization level (i.e., number of bits used to represent the intensity of each image pixel). Improvements in image resolution associated with the technology transition from sidescan to synthetic aperture sonars have prompted the use of image decimation algorithms to reduce the number of pixels per image that are processed by the CAD/CAC algorithms, in order to meet real-time processor throughput requirements. Additional improvements in digital signal processing hardware have also facilitated the use of an increased quantization level in converting the image data from analog to digital format. This study evaluates modifications to the normalization algorithm and image pixel quantization level within the image processing prior to CAD/CAC processing, and examines their impact on the resulting CAD/CAC algorithm performance. The study utilizes a set of at-sea data from multiple test exercises in varying shallow water environments.
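    As a toy illustration of the pixel-intensity quantization level discussed above (the number of bits used to represent each pixel), the sketch below maps intensities in [0, 1) to integer levels at two bit depths; it is not part of the Raytheon processing chain:

```python
# Pixel-intensity quantization sketch: more bits -> finer intensity resolution.
def quantize(pixels, n_bits):
    """Map floating-point intensities in [0, 1) to integer levels at the given bit depth."""
    levels = 2 ** n_bits
    return [min(int(p * levels), levels - 1) for p in pixels]

row = [0.0, 0.25, 0.5, 0.99]
coarse = quantize(row, 2)   # 4 levels: heavy information loss
fine = quantize(row, 8)     # 256 levels: near-continuous intensities
```

A study such as the one described varies `n_bits` (and the decimation factor) in the front-end image processing and measures the resulting change in Pcc versus false alarm rate.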

  12. Improved multi-classification algorithm of one-against-one SVM

    Institute of Scientific and Technical Information of China (English)

    单玉刚; 王宏; 董爽

    2012-01-01

    The one-against-one multi-class SVM algorithm performs well, but it leaves an unclassifiable region, which limits its application. A multi-classification method combining one-against-one voting with an affinity-based decision is therefore proposed. Samples are first classified with the one-against-one algorithm; samples falling into the unclassifiable region are then assigned by the affinity decision, whose decision function combines the distance from the sample to each class centre with the local sample distribution obtained by kNN (k nearest neighbour). Tests on UCI (University of California Irvine) data sets show that the algorithm effectively resolves the unclassifiable-region problem and outperforms comparable algorithms.
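    The combination of one-against-one voting with a fallback decision for the unclassifiable region can be sketched as below. The pairwise "classifiers" here are simple nearest-centre rules standing in for trained SVMs, and the tie-break uses only distance to class centres (the paper's affinity decision additionally uses a kNN-based distribution term):

```python
# One-against-one voting with a nearest-centre fallback for vote ties
# (simplified stand-in for the paper's affinity decision; toy centres).
import numpy as np

def ovo_predict(x, centres):
    """Vote over all class pairs; break vote ties by distance to class centres."""
    classes = sorted(centres)
    votes = {c: 0 for c in classes}
    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            da = np.linalg.norm(x - centres[a])   # pairwise "classifier": closer centre wins
            db = np.linalg.norm(x - centres[b])
            votes[a if da <= db else b] += 1
    best = max(votes.values())
    tied = [c for c, v in votes.items() if v == best]
    if len(tied) == 1:
        return tied[0]
    # Unclassifiable region: fall back to the closest centre among tied classes.
    return min(tied, key=lambda c: np.linalg.norm(x - centres[c]))

centres = {"A": np.array([0.0, 0.0]), "B": np.array([4.0, 0.0]), "C": np.array([0.0, 4.0])}
label = ovo_predict(np.array([0.5, 0.2]), centres)
```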

  13. Multi-layer Attribute Selection and Classification Algorithm for the Diagnosis of Cardiac Autonomic Neuropathy Based on HRV Attributes

    Directory of Open Access Journals (Sweden)

    Herbert F. Jelinek

    2015-12-01

    Full Text Available Cardiac autonomic neuropathy (CAN) poses an important clinical problem, which often remains undetected due to the difficulty of conducting the current tests and their lack of sensitivity. CAN has been associated with an increased risk of unexpected death in cardiac patients with diabetes mellitus. Heart rate variability (HRV) attributes have been actively investigated, since they are important for diagnostics in diabetes, Parkinson's disease, and cardiac and renal disease. Because of the adverse effects of CAN, it is important to obtain a robust and highly accurate diagnostic tool for identification of early CAN, when treatment has the best outcome. Use of HRV attributes to enhance the effectiveness of diagnosis of CAN progression may provide such a tool. In the present paper we propose a new machine learning algorithm, Multi-Layer Attribute Selection and Classification (MLASC), for the diagnosis of CAN progression based on HRV attributes. It incorporates our new automated attribute selection procedure, the Double Wrapper Subset Evaluator with Particle Swarm Optimization (DWSE-PSO). We present the results of experiments comparing MLASC with simpler versions of itself and with counterpart methods. The experiments used our large and well-known diabetes complications database. The results demonstrate that MLASC significantly outperformed the other, simpler techniques.

  14. Emotion Recognition of Weblog Sentences Based on an Ensemble Algorithm of Multi-label Classification and Word Emotions

    Science.gov (United States)

    Li, Ji; Ren, Fuji

    Weblogs have greatly changed the ways mankind communicates. Affective analysis of blog posts is valuable for many applications, such as text-to-speech synthesis or computer-assisted recommendation. Traditional emotion recognition in text based on single-label classification cannot satisfy the higher requirements of affective computing. In this paper, the automatic identification of sentence emotion in weblogs is modeled as a multi-label text categorization task. Experiments are carried out on 12273 blog sentences from the Chinese emotion corpus Ren_CECps with 8-dimension emotion annotation. An ensemble algorithm, RAKEL, is used to recognize dominant emotions from the writer's perspective. Our emotion feature, which uses a detailed intensity representation for word emotions, outperforms the other main features such as the word frequency feature and the traditional lexicon-based feature. In order to deal with relatively complex sentences, we integrate grammatical characteristics of punctuation, disjunctive connectives, modification relations and negation into the features. This achieves 13.51% and 12.49% increases in Micro-averaged F1 and Macro-averaged F1 respectively, compared to the traditional lexicon-based feature. The results show that a multiple-dimension emotion representation with grammatical features can efficiently classify sentence emotion in a multi-label problem.

  15. The "Life Potential": a new complex algorithm to assess "Heart Rate Variability" from Holter records for cognitive and diagnostic aims. Preliminary experimental results showing its dependence on age, gender and health conditions

    CERN Document Server

    Barra, Orazio A

    2013-01-01

    Although HRV (Heart Rate Variability) analyses have been carried out for several decades, several limiting factors still make these analyses useless from a clinical point of view. The present paper aims at overcoming some of these limits by introducing the "Life Potential" (BMP), a new mathematical algorithm which seems to exhibit surprising cognitive and predictive capabilities. BMP is defined as a linear combination of five HRV Non-Linear Variables, in turn derived from the thermodynamic formalism of chaotic dynamic systems. The paper presents experimental measurements of BMP (Average Values and Standard Deviations) derived from 1048 Holter tests, matched in age and gender, including a control group of 356 healthy subjects. The main results are: (a) BMP always decreases when the age increases, and its dependence on age and gender is well established; (b) the shape of the age dependence within "healthy people" is different from that found in the general group: this behavior provides evidence of possible illn...

  16. Ontology-Based Classification System Development Methodology

    Directory of Open Access Journals (Sweden)

    Grabusts Peter

    2015-12-01

    Full Text Available The aim of the article is to analyse and develop an ontology-based classification system methodology that uses decision tree learning with statement-propositionalized attributes. Classical decision tree learning algorithms, as well as decision tree learning with taxonomy and propositionalized attributes, are reviewed. Domain ontology can thus be extracted from the data sets and used for data classification with the help of a decision tree. The use of ontology methods in decision tree-based classification systems has been researched; using such methodologies, the classification accuracy can in some cases be improved.

  17. Precision laser aiming system

    Science.gov (United States)

    Ahrens, Brandon R.; Todd, Steven N.

    2009-04-28

    A precision laser aiming system comprises a disrupter tool, a reflector, and a laser fixture. The disrupter tool, the reflector and the laser fixture are configurable for iterative alignment and aiming toward an explosive device threat. The invention enables a disrupter to be quickly and accurately set up, aligned, and aimed in order to render safe or to disrupt a target from a standoff position.

  18. Application Research on the Classification of Data Mining Using Ant Colony Algorithm

    Institute of Scientific and Technical Information of China (English)

    熊斌; 熊娟

    2012-01-01

    Classification is an important task in data mining. This paper studies the application of the ant colony algorithm to the classification task: in essence, the ant foraging principle is used to search the database, selecting and optimizing a randomly generated group of rules until the database is covered by the rule set, thereby mining the rules implicit in the database and building an optimal classification model.

  19. Absolute calibration of the colour index and O4 absorption derived from Multi AXis (MAX-)DOAS measurements and their application to a standardised cloud classification algorithm

    Science.gov (United States)

    Wagner, Thomas; Beirle, Steffen; Remmers, Julia; Shaiganfar, Reza; Wang, Yang

    2016-09-01

    A method is developed for the calibration of the colour index (CI) and the O4 absorption derived from differential optical absorption spectroscopy (DOAS) measurements of scattered sunlight. The method is based on the comparison of measurements and radiative transfer simulations for well-defined atmospheric conditions and viewing geometries. Calibrated measurements of the CI and the O4 absorption are important for the detection and classification of clouds from MAX-DOAS observations. Such information is needed for the identification and correction of the cloud influence on Multi AXis (MAX-)DOAS profile inversion results, but might also be of interest in its own right, e.g. for meteorological applications. The calibration algorithm was successfully applied to measurements at two locations: Cabauw in the Netherlands and Wuxi in China. We used CI and O4 observations calibrated by the new method as input for our recently developed cloud classification scheme and also adapted the corresponding threshold values accordingly. For the observations at Cabauw, good agreement is found with the results of the original algorithm. Together with the calibration procedure of the CI and O4 absorption, the cloud classification scheme, which has been tuned to specific locations/conditions so far, can now be applied consistently to MAX-DOAS measurements at different locations. In addition to the new threshold values, further improvements were introduced to the cloud classification algorithm, namely a better description of the SZA (solar zenith angle) dependence of the threshold values and a new set of wavelengths for the determination of the CI. We also indicate specific areas for future research to further improve the cloud classification scheme.

  20. Mapping the distributions of C3 and C4 grasses in the mixed-grass prairies of southwest Oklahoma using the Random Forest classification algorithm

    Science.gov (United States)

    Yan, Dong; de Beurs, Kirsten M.

    2016-05-01

    The objective of this paper is to demonstrate a new method to map the distributions of C3 and C4 grasses at 30 m resolution and over a 25-year period of time (1988-2013) by combining the Random Forest (RF) classification algorithm and patch stable areas identified using the spatial pattern analysis software FRAGSTATS. Predictor variables for RF classifications consisted of ten spectral variables, four soil edaphic variables and three topographic variables. We provided a confidence score in terms of obtaining pure land cover at each pixel location by retrieving the classification tree votes. Classification accuracy assessments and predictor variable importance evaluations were conducted based on a repeated stratified sampling approach. Results show that patch stable areas obtained from larger patches are more appropriate to be used as sample data pools to train and validate RF classifiers for historical land cover mapping purposes and it is more reasonable to use patch stable areas as sample pools to map land cover in a year closer to the present rather than years further back in time. The percentage of obtained high confidence prediction pixels across the study area ranges from 71.18% in 1988 to 73.48% in 2013. The repeated stratified sampling approach is necessary in terms of reducing the positive bias in the estimated classification accuracy caused by the possible selections of training and validation pixels from the same patch stable areas. The RF classification algorithm was able to identify the important environmental factors affecting the distributions of C3 and C4 grasses in our study area such as elevation, soil pH, soil organic matter and soil texture.
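    The per-pixel confidence score obtained by retrieving classification tree votes can be illustrated as follows; the vote list is invented for the example, not taken from the study:

```python
# Confidence from ensemble tree votes: the fraction of trees agreeing with
# the majority class for one pixel (toy illustration of the paper's idea).
def vote_confidence(tree_votes):
    """Return (majority class, fraction of trees voting for it)."""
    counts = {}
    for v in tree_votes:
        counts[v] = counts.get(v, 0) + 1
    winner = max(counts, key=counts.get)
    return winner, counts[winner] / len(tree_votes)

# 100 trees: 73 vote C4 grass, 27 vote C3 grass at this pixel.
label, conf = vote_confidence(["C4"] * 73 + ["C3"] * 27)
```

Thresholding this fraction is one simple way to define the "high confidence prediction pixels" reported in the abstract.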

  1. Webpage Classification Based on Deep Learning Algorithm

    Institute of Scientific and Technical Information of China (English)

    陈芊希; 范磊

    2016-01-01

    Webpage classification can select accurate webpages for users, improving the accuracy of information retrieval. Deep learning is a new field of machine learning. It is essentially a multi-layer neural network learning algorithm that achieves very high accuracy through layer-by-layer initialization, and it has been used in image recognition, speech recognition and text classification. This paper applies a deep learning algorithm to webpage classification; experimental data show that deep learning has clear advantages for this task and can effectively improve webpage classification accuracy.

  2. Improved Ontology Concept Classification Algorithm Based on Extend Tag

    Institute of Scientific and Technical Information of China (English)

    吕素刚; 郑洪源

    2011-01-01

    This paper studies the ontology concept classification algorithm of the Pellet system and its optimization techniques, and on this basis presents an improved algorithm based on extended tags. By exploiting the known subsumption relations between concepts, the algorithm controls the order in which concepts are inserted during the traversal of the classification process, and propagates these relations in both directions as far as possible, effectively reducing the number of concept subsumption tests. Verification results show that the improved algorithm raises concept classification performance by about 22% on average.

  3. Knowledge discovery and sequence-based prediction of pandemic influenza using an integrated classification and association rule mining (CBA) algorithm.

    Science.gov (United States)

    Kargarfard, Fatemeh; Sami, Ashkan; Ebrahimie, Esmaeil

    2015-10-01

    Pandemic influenza is a major concern worldwide. The availability of advanced technologies and of the nucleotide sequences of a large number of pandemic and non-pandemic influenza viruses in 2009 provides a great opportunity to investigate the underlying rules of pandemic induction through data mining tools. Here, for the first time, an integrated classification and association rule mining algorithm (CBA) was used to discover the rules underpinning the alteration of non-pandemic sequences into pandemic ones. We hypothesized that the extracted rules could lead to the development of an efficient expert system for prediction of influenza pandemics. To this end, we used a large dataset containing 5373 HA (hemagglutinin) segments of 2009 H1N1 pandemic and non-pandemic influenza sequences. The analysis was carried out for both nucleotide and protein sequences. We found a number of new rules which potentially point to undiscovered antigenic sites in the influenza structure. At the nucleotide level, alteration of thymine (T) at position 260 was the key discriminating feature in distinguishing non-pandemic from pandemic sequences. At the protein level, rules including I233K and M334L were the differentiating features. CBA efficiently classifies pandemic and non-pandemic sequences with high accuracy at both the nucleotide and protein levels. Identifying hotspots in influenza sequences is significant, as they represent the regions with low antibody reactivity. We argue that the virus breaks the host immune response by mutation at these spots. Based on the discovered rules, we developed the software "Prediction of Pandemic Influenza" for discrimination of pandemic from non-pandemic sequences. This study opens a new vista in the discovery of association rules between mutation points during the evolution of pandemic influenza.
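    A single-position rule such as "T at position 260 implies non-pandemic" can be scored by its support and confidence, the standard measures behind CBA-style rule mining. The sequences below are invented toy strings, not real HA segments:

```python
# Support/confidence of a single-position classification rule
# (toy sequences; not the study's data).
def rule_confidence(sequences, labels, pos, base, target_label):
    """Score the rule: sequence[pos] == base -> target_label."""
    matching = [l for s, l in zip(sequences, labels) if s[pos] == base]
    if not matching:
        return 0.0, 0.0
    support = len(matching) / len(sequences)               # how often the rule applies
    confidence = matching.count(target_label) / len(matching)  # how often it is right
    return support, confidence

seqs = ["ATTG", "ATTA", "ACTG", "ACTA"]
labels = ["non-pandemic", "non-pandemic", "pandemic", "pandemic"]
support, confidence = rule_confidence(seqs, labels, pos=1, base="T",
                                      target_label="non-pandemic")
```

CBA then keeps high-confidence rules and orders them into a classifier, so a position like 260 surfaces precisely because its rule confidence is high across the dataset.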

  4. Classification of traumatic brain injury severity using informed data reduction in a series of binary classifier algorithms.

    Science.gov (United States)

    Prichep, Leslie S; Jacquin, Arnaud; Filipenko, Julie; Dastidar, Samanwoy Ghosh; Zabele, Stephen; Vodencarević, Asmir; Rothman, Neil S

    2012-11-01

    Assessment of medical disorders is often aided by objective diagnostic tests which can lead to early intervention and appropriate treatment. In the case of brain dysfunction caused by head injury, there is an urgent need for quantitative evaluation methods to aid in acute triage of those subjects who have sustained traumatic brain injury (TBI). Current clinical tools to detect mild TBI (mTBI/concussion) are limited to subjective reports of symptoms and short neurocognitive batteries, which offer little objective evidence for clinical decisions, or to computed tomography (CT) scans, which carry radiation risk and are most often negative in mTBI. This paper describes a novel methodology for the development of algorithms to provide multi-class classification in a substantial population of brain injured subjects, across a broad age range and representative subpopulations. The method is based on age-regressed quantitative features (linear and nonlinear) extracted from brain electrical activity recorded from a limited montage of scalp electrodes. These features are used as input to a unique "informed data reduction" method, maximizing confidence of prospective validation and minimizing over-fitting. A training set for supervised learning was used, including: "normal control," "concussed," and "structural injury/CT positive (CT+)." The classifier function separating CT+ from the other groups demonstrated a sensitivity of 96% and specificity of 78%; the classifier separating "normal controls" from the other groups demonstrated a sensitivity of 81% and specificity of 74%, suggesting high utility of such classifiers in acute clinical settings. The use of a sequence of classifiers where the desired risk can be stratified further supports clinical utility.
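    The idea of stratifying risk with a sequence of binary classifiers can be sketched as below; the scores and thresholds are invented placeholders, not the paper's EEG-derived discriminant functions:

```python
# Two-stage binary classifier cascade for triage (toy scores/thresholds;
# the paper's classifiers operate on quantitative EEG features).
def triage(score_ct_positive, score_concussed, t1=0.5, t2=0.5):
    """Apply binary classifiers in sequence: CT+ vs rest, then concussed vs normal."""
    if score_ct_positive >= t1:       # stage 1: rule in structural injury first
        return "CT+"
    if score_concussed >= t2:         # stage 2: concussed vs normal on the remainder
        return "concussed"
    return "normal"

result = triage(score_ct_positive=0.2, score_concussed=0.8)
```

Arranging the highest-risk decision first means its threshold can be tuned for sensitivity, while later stages refine the lower-risk remainder, which is the risk stratification the abstract alludes to.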

  5. Evaluation of the false recent classification rates of multiassay algorithms in estimating HIV type 1 subtype C incidence.

    Science.gov (United States)

    Moyo, Sikhulile; LeCuyer, Tessa; Wang, Rui; Gaseitsiwe, Simani; Weng, Jia; Musonda, Rosemary; Bussmann, Hermann; Mine, Madisa; Engelbrecht, Susan; Makhema, Joseph; Marlink, Richard; Baum, Marianna K; Novitsky, Vladimir; Essex, M

    2014-01-01

    Laboratory cross-sectional assays are useful for the estimation of HIV incidence, but are known to misclassify individuals with long-standing infection as recently infected. The false recent rate (FRR) varies widely across geographic areas; therefore, accurate estimates of HIV incidence require a locally defined FRR. We determined FRR for Botswana, where HIV-1 subtype C infection is predominant, using the BED capture enzyme immunoassay (BED), a Bio-Rad Avidity Index (BAI) assay (a modification of the Bio-Rad HIV1/2+O EIA), and two multiassay algorithms (MAA) that included clinical data. To estimate FRR, stored blood samples from 512 antiretroviral (ARV)-naive HIV-1 subtype C-infected individuals from a prospective cohort in Botswana were tested at 18-24 months postenrollment. The following FRR mean (95% CI) values were obtained: BED 6.05% (4.15-8.48), BAI 5.57% (3.70-8.0), BED-BAI 2.25% (1.13-4.0), and a combination of BED-BAI with CD4 (>200) and viral load (>400) threshold 1.43% (0.58-2.93). The interassay agreement between BED and BAI was 92.8% (95% CI, 90.1-94.5) for recent/long-term classification. Misclassification was associated with viral suppression for BED [adjusted OR (aOR) 10.31; p=0.008], BAI [aOR 9.72; p=0.019], and MAA1 [aOR 16.6; p=0.006]. Employing MAA can reduce FRR to <2%. A local FRR can improve cross-sectional HIV incidence estimates.
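    A false recent rate with a normal-approximation 95% confidence interval can be computed as below; the counts are invented for illustration and do not reproduce the study's values:

```python
# FRR point estimate with a normal-approximation 95% CI
# (toy counts, not the Botswana cohort data).
import math

def frr_with_ci(n_false_recent, n_total, z=1.96):
    """Proportion misclassified as recent, with a Wald-style confidence interval."""
    p = n_false_recent / n_total
    half = z * math.sqrt(p * (1 - p) / n_total)   # normal-approximation half-width
    return p, max(p - half, 0.0), p + half

p, lo, hi = frr_with_ci(12, 400)   # e.g. 12 of 400 long-term infections misclassified
```

In practice, small proportions like these are often better served by exact or score intervals, but the Wald form shows why larger samples tighten the FRR estimate.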

  6. Dynamic classification of a document stream: a preliminary static evaluation of the GERMEN algorithm

    CERN Document Server

    Lelu, Alain; Johansson, Joel

    2008-01-01

    Data-stream clustering is an ever-expanding subdomain of knowledge extraction. Most of the past and present research effort aims at efficient scaling up for huge data repositories. Our approach focuses on qualitative improvement, mainly for "weak signals" detection and precise tracking of topical evolutions in the framework of information watch - though scalability is intrinsically guaranteed in a possibly distributed implementation. Our GERMEN algorithm exhaustively picks up the whole set of density peaks of the data at time t, by identifying the local perturbations induced by the current document vector, such as changing cluster borders, or new/vanishing clusters. Optimality follows from the uniqueness (1) of the density landscape for any value of our zoom parameter, and (2) of the cluster allocation operated by our border propagation rule. This results in a rigorous independence from the data presentation ranking or any initialization parameter. We present here as a first step the only assessment of a static ...

  7. Unsupervised 3D Shape Classification Algorithm Using Density Peaks

    Institute of Scientific and Technical Information of China (English)

    舒振宇; 祁成武; 辛士庆; 胡超; 韩祥兰; 刘利刚

    2016-01-01

    In this paper, we propose an unsupervised classification algorithm using density peaks for automatic content-based 3D model classification. First, the algorithm extracts multiple kinds of feature vectors for each model in the given shape collection. Second, it uses robust principal component analysis to denoise the feature vectors and reduce their dimensions simultaneously. Finally, the algorithm determines the number of categories of the 3D models and realizes unsupervised classification in an intuitive and visual way by computing the density peaks of the feature vectors' distribution together with a corresponding decision graph. Extensive experimental results show that, compared with traditional algorithms, the number of clustering categories is much easier to determine and the results are more accurate and robust.
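    The density-peaks computation at the heart of such algorithms assigns each point a local density rho and a distance delta to the nearest denser point; cluster centres stand out with both values large, which is what the decision graph visualizes. A minimal sketch on toy 2-D points (stand-ins for the paper's reduced feature vectors):

```python
# Density-peaks quantities (Rodriguez-Laio style) on toy 2-D data.
import numpy as np

def density_peaks(X, d_c):
    """Per point: density rho (neighbours within d_c) and distance delta to nearest denser point."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    rho = (dist < d_c).sum(axis=1) - 1            # exclude self
    delta = np.empty(n)
    for i in range(n):
        denser = np.where(rho > rho[i])[0]
        # densest points get the maximum distance; others the distance to the nearest denser point
        delta[i] = dist[i].max() if denser.size == 0 else dist[i, denser].min()
    return rho, delta

# Two tight clusters whose central points (indices 0 and 5) are the density peaks.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [-0.1, 0.0], [0.0, -0.1],
              [3.0, 3.0], [3.1, 3.0], [3.0, 3.1], [2.9, 3.0], [3.0, 2.9]])
rho, delta = density_peaks(X, d_c=0.15)
```

Plotting delta against rho (the decision graph) shows the two centres isolated in the upper right, so the number of categories can be read off visually.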

  8. AN IMAGE CLASSIFICATION ALGORITHM BASED ON ANT COLONY OPTIMISATION

    Institute of Scientific and Technical Information of China (English)

    屠莉; 杨立志

    2015-01-01

    In existing image dimension-reduction methods, feature information is compressed excessively, which degrades image classification. This paper presents the IC-ACO algorithm, which employs ant colony optimisation to solve the image classification problem. The algorithm fully extracts and retains various morphological features of an image, then uses ant colony optimisation to automatically mine effective features and feature values from the feature sets and to construct classification rules for each class, thereby realising classified recognition of images. Experimental results on a real vehicle-logo image data set show that IC-ACO outperforms other similar algorithms in classification recognition accuracy.

  9. Research of resource allocation algorithm based on classification data mining

    Institute of Scientific and Technical Information of China (English)

    刘林东

    2013-01-01

    Using a classification data mining algorithm to analyse users' historical access information, this paper derives users' classification access rules and patterns in a cluster environment. It first constructs a classification-mining-based resource scheduling model, then designs a UA algorithm to allocate users' tasks across the clusters, and finally applies a new resource allocation algorithm, CDMRA, to assign users' tasks to idle CPU resources on each node. Experiments show that, compared with other algorithms, CDMRA reduces the number of resource reallocations and improves the efficiency and accuracy of resource allocation, thereby increasing the utilization of grid resources.

  10. Multi-classification algorithm based on contraction of closed convex hull

    Institute of Scientific and Technical Information of China (English)

    李雪辉; 魏立力

    2011-01-01

    Building on the maximal-margin linear classifier and the idea of contracting the closed convex hull, a linearly non-separable two-class problem can be transformed into a linearly separable one by the proposed contraction of the closed convex hulls. Extending this idea to the multi-class case, a class of multi-classification algorithms based on closed-convex-hull contraction is presented. The method has a clear geometric meaning and, to some extent, avoids the overly complex objective functions of previous multi-classification methods; using the kernel trick, it is further extended to nonlinear classification problems.

  11. Research on the Military Vehicle Classification Algorithm Based on Modeling and Simulation

    Institute of Scientific and Technical Information of China (English)

    马云飞

    2014-01-01

    Recognition and classification of military vehicles is an important topic in battlefield information acquisition. To collect data for studying military vehicle classification algorithms, real outdoor field experiments are commonly used, but they are time-consuming and expensive. In this paper, tank, armored vehicle and troop truck models are built on a virtual battlefield simulation platform. The noise, magnetic field and vibration signals of the simulated vehicles are collected and used as sample data for research on military vehicle classification. A classification algorithm based on the one-against-one multi-class SVM is then designed, together with a cross-validation strategy for tuning the classifier parameters. Experiments show that, compared with the AdaBoost algorithm, the proposed algorithm achieves higher classification accuracy on military vehicles.

  12. THE USE OF RANDOM FOREST CLASSIFICATION AND K-MEANS CLUSTERING ALGORITHM FOR DETECTING TIME STAMPED SIGNATURES IN THE ACTIVE NETWORKS

    Directory of Open Access Journals (Sweden)

    Kamalanaban Ethala

    2013-01-01

    Full Text Available Intrusion detection is indispensable in day-to-day information security infrastructure. Signature-based intrusion detection mechanisms can detect many types of attacks, but on their own they are not sufficient in many cases; another intrusion detection method, K-means, is therefore employed for clustering and classifying unlabelled data. An IDS is a dedicated embedded device or software package that monitors the events occurring in a computer system or network (WLAN (Wi-Fi, WiMAX) or LAN (Ethernet, FDDI, ADSL, token ring based)) and analyses them for signs of possible incidents, which are violations or imminent threats of violation of computer security policies or standard security policies (i.e., DMA acts). We propose a new methodology for detecting intrusions by means of clustering and classification algorithms, using correlation clustering and the K-means algorithm for clustering and the random forest algorithm for classification. This extension establishes a layer that refines the escalated alerts using signature-based correlation. In this study, a signature-based intrusion detection system with an optimised algorithm for better prediction of intrusions is addressed. Results are presented and discussed.
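The cluster-then-classify combination in this record (K-means for grouping unlabelled records, random forest for classification) can be sketched as follows. The two-Gaussian data is a synthetic stand-in for network traffic, and training the forest on the discovered cluster labels is one simple way to pair the two algorithms, not the authors' exact pipeline.

```python
# Minimal sketch: K-means groups unlabelled records, then a random forest
# learns to reproduce the cluster labels for new data. Synthetic data only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 4)),    # "normal" traffic cluster
               rng.normal(5, 1, (100, 4))])   # "anomalous" traffic cluster

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, clusters)

# The forest now labels records with the discovered cluster identities.
agreement = (forest.predict(X) == clusters).mean()
print(round(agreement, 2))
```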

  13. A Complete Solution Classification and Unified Algorithmic Treatment for the One- and Two-Step Asymmetric S-Transverse Mass (MT2) Event Scale Statistic

    CERN Document Server

    Walker, Joel W

    2014-01-01

    The MT2, or "s-transverse mass", statistic was developed to cope with the difficulty of associating a parent mass scale with a missing transverse energy signature, given that models of new physics generally predict production of escaping particles in pairs, while collider experiments are sensitive to just a single vector sum over all sources of missing transverse momentum. This document focuses on the generalized extension of that statistic to asymmetric one- and two-step decay chains, with arbitrary child particle masses and upstream missing transverse momentum. It provides a unified theoretical formulation, complete solution classification, taxonomy of critical points, and technical algorithmic prescription for treatment of the MT2 event scale. An implementation of the described algorithm is available for download, and is also a deployable component of the author's fully-featured selection cut software package AEACuS (Algorithmic Event Arbiter and Cut Selector).

  14. KNN Algorithm for Text Classification Based on KD-Tree

    Institute of Scientific and Technical Information of China (English)

    刘忠; 刘洋; 建晓

    2012-01-01

    This paper applies a KD-tree to the KNN text classification algorithm. A KD-tree is first built from the training text set; the tree is then searched for all ancestor nodes of the test text's node, and this set of ancestor texts forms the nearest-neighbour set. The class of the test text is taken to be that of the nearest text with the greatest similarity to it. The algorithm greatly reduces the number of texts that must be compared, with a time complexity of O(log2 N). Experiments show that the improved KNN text classification algorithm is more efficient than the traditional KNN classifier.
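A KD-tree-backed nearest-neighbour text classifier can be sketched with off-the-shelf components: scikit-learn's `KNeighborsClassifier` builds the KD-tree internally when `algorithm="kd_tree"`. The tiny corpus and labels below are illustrative stand-ins for the paper's training text set, and this standard tree lookup does not reproduce the paper's ancestor-node search.

```python
# KD-tree nearest-neighbour classification over TF-IDF text vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

train_docs = ["stock market shares", "market trading profit",
              "football match goal", "goal keeper saves match"]
labels = ["finance", "finance", "sport", "sport"]

vec = TfidfVectorizer()
X = vec.fit_transform(train_docs).toarray()   # dense vectors for the KD-tree
knn = KNeighborsClassifier(n_neighbors=1, algorithm="kd_tree").fit(X, labels)

# The test text inherits the class of its single nearest training text.
pred = knn.predict(vec.transform(["trading shares profit"]).toarray())
print(pred[0])
```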

  15. Logistic Model Classification Algorithm via L1+L2 Regularization

    Institute of Scientific and Technical Information of China (English)

    刘建伟; 付捷; 罗雄麟

    2012-01-01

    This paper proposes a classification algorithm based on an L1+L2 norm regularized logistic model. The L2 norm regularization resolves the singularity that arises in the iterations of the L1-regularized logistic algorithm; the non-smooth L1 term is handled by augmenting the sample vectors and introducing a new weight vector, and the resulting smooth optimization problem is solved with the conjugate gradient method. Experiments on real data sets show that the algorithm outperforms L2-, L1- and Lp-regularized logistic models, with good feature selection and classification performance.
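The combined L1+L2 penalty on a logistic model is what scikit-learn calls an elastic-net penalty; a minimal sketch follows. The `saga` solver here stands in for the paper's conjugate-gradient treatment, and the data and hyperparameters are illustrative only.

```python
# Elastic-net (L1+L2) regularized logistic regression: the L1 component
# drives coefficients to exactly zero (feature selection) while the L2
# component keeps the problem well-conditioned.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=0.5, max_iter=5000).fit(X, y)

n_zero = int((clf.coef_ == 0).sum())   # coefficients eliminated by L1
print(n_zero)
```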

  16. AIMES Final Technical Report

    Energy Technology Data Exchange (ETDEWEB)

    Katz, Daniel S [Univ. of Illinois, Urbana-Champaign, IL (United States). National Center for Supercomputing Applications (NCSA); Jha, Shantenu [Rutgers Univ., New Brunswick, NJ (United States); Weissman, Jon [Univ. of Minnesota, Minneapolis, MN (United States); Turilli, Matteo [Rutgers Univ., New Brunswick, NJ (United States)

    2017-01-31

    This is the final technical report for the AIMES project. Many important advances in science and engineering are due to large-scale distributed computing. Notwithstanding this reliance, we are still learning how to design and deploy large-scale production Distributed Computing Infrastructures (DCI). This is evidenced by missing design principles for DCI, and an absence of generally acceptable and usable distributed computing abstractions. The AIMES project was conceived against this backdrop, following on the heels of a comprehensive survey of scientific distributed applications. AIMES laid the foundations to address the tripartite challenge of dynamic resource management, integrating information, and portable and interoperable distributed applications. Four abstractions were defined and implemented: skeleton, resource bundle, pilot, and execution strategy. The four abstractions were implemented into software modules and then aggregated into the AIMES middleware. This middleware successfully integrates information across the application layer (skeletons) and resource layer (Bundles), derives a suitable execution strategy for the given skeleton and enacts its execution by means of pilots on one or more resources, depending on the application requirements, and resource availabilities and capabilities.

  17. A comparison of machine learning algorithms for chemical toxicity classification using a simulated multi-scale data model

    Directory of Open Access Journals (Sweden)

    Li Zhen

    2008-05-01

    Full Text Available Abstract Background Bioactivity profiling using high-throughput in vitro assays can reduce the cost and time required for toxicological screening of environmental chemicals and can also reduce the need for animal testing. Several public efforts are aimed at discovering patterns or classifiers in high-dimensional bioactivity space that predict tissue, organ or whole animal toxicological endpoints. Supervised machine learning is a powerful approach to discover combinatorial relationships in complex in vitro/in vivo datasets. We present a novel model to simulate complex chemical-toxicology data sets and use this model to evaluate the relative performance of different machine learning (ML) methods. Results The classification performance of Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Naïve Bayes (NB), Recursive Partitioning and Regression Trees (RPART), and Support Vector Machines (SVM), in the presence and absence of filter-based feature selection, was analyzed using K-way cross-validation testing and independent validation on simulated in vitro assay data sets with varying levels of model complexity, numbers of irrelevant features and measurement noise. While the prediction accuracy of all ML methods decreased as non-causal (irrelevant) features were added, some ML methods performed better than others. In the limit of using a large number of features, ANN and SVM were always in the top performing set of methods while RPART and KNN (k = 5) were always in the poorest performing set. The addition of measurement noise and irrelevant features decreased the classification accuracy of all ML methods, with LDA suffering the greatest performance degradation. LDA performance is especially sensitive to the use of feature selection. Filter-based feature selection generally improved performance, most strikingly for LDA. Conclusion We have developed a novel simulation model to evaluate machine learning methods for the
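The benchmarking protocol in this record (several classifiers, cross-validation, irrelevant features added to the data) can be sketched as below. The synthetic data set stands in for the study's toxicology simulator, and only three of its six classifiers are shown.

```python
# Cross-validated comparison of classifiers on data padded with many
# irrelevant (non-causal) features, mirroring the study's design.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# 5 informative features plus 35 irrelevant ones.
X, y = make_classification(n_samples=300, n_features=40, n_informative=5,
                           random_state=0)
models = {"KNN(k=5)": KNeighborsClassifier(n_neighbors=5),
          "LDA": LinearDiscriminantAnalysis(),
          "SVM": SVC()}
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in models.items()}
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(name, round(s, 2))
```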

  18. Aims of the Workshop

    CERN Document Server

    Dornan, P J

    2010-01-01

    There are challenges and opportunities for the European particle physics community to engage with innovative and exciting developments which could lead to precision measurements in the neutrino sector. These have the potential to yield significant advances in the understanding of CP violation, the flavour riddle and theories beyond the Standard Model. This workshop aims to start the process of a dialogue in Europe so that informed decisions on the appropriate directions to pursue can be made in a few years time.

  19. Aims and Scope

    Institute of Scientific and Technical Information of China (English)

    2015-01-01

    Chinese Nursing Research(CNR)is the official peer-reviewed research journal and an international platform of nursing academic exchange.This journal aims to promote excellence in nursing and health care through the dissemination of the latest,evidence-based,peer-reviewed research report,academic papers,review articles and clinical research,to reflect nursing academic trends,scientific research and

  20. Load characteristics classification based on adaptive genetic algorithm

    Institute of Scientific and Technical Information of China (English)

    白建勋; 杨洪耕; 吴传来; 唐山

    2012-01-01

    A new method based on an improved genetic algorithm is presented for classifying power load characteristics. Genetic operations on the samples yield the individual with the highest fitness, which is decoded to obtain the optimal cluster centers; the samples are then divided by their distance to these centers, giving the optimal classification, and the samples in each category are fitted with their cluster center to check the classification quality. In the improved algorithm the crossover and mutation probabilities adapt during evolution, which preserves the global search ability and randomness of the genetic algorithm while avoiding both premature convergence and overly slow convergence. Practical examples show that classifying load characteristics with this adaptive genetic algorithm effectively avoids excessive influence of the initial conditions on the results and achieves good classification performance.
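The adaptive crossover/mutation idea can be sketched on a toy objective. The Srinivas-style linear adaptation below (high probabilities for at-or-below-average individuals, lower for fitter ones) is an assumption about the paper's unspecified adaptation rule, and the one-max fitness function stands in for the load-clustering objective.

```python
# Toy adaptive GA: crossover and mutation probabilities shrink for
# above-average individuals and stay high for poor ones.
import random

random.seed(0)

def fitness(bits):                       # toy objective: count of ones
    return sum(bits)

def adaptive_prob(f, f_avg, f_max, high, low):
    # High probability at or below average fitness, decreasing linearly
    # toward `low` as fitness approaches the population maximum.
    if f <= f_avg or f_max == f_avg:
        return high
    return high - (high - low) * (f - f_avg) / (f_max - f_avg)

def tournament(pop, fits):
    i, j = random.randrange(len(pop)), random.randrange(len(pop))
    return pop[i] if fits[i] >= fits[j] else pop[j]

pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
for _ in range(40):
    fits = [fitness(ind) for ind in pop]
    f_avg, f_max = sum(fits) / len(fits), max(fits)
    new_pop = [max(pop, key=fitness)]            # elitism
    while len(new_pop) < len(pop):
        a, b = tournament(pop, fits), tournament(pop, fits)
        child = list(a)
        if random.random() < adaptive_prob(min(fitness(a), fitness(b)),
                                           f_avg, f_max, 0.9, 0.5):
            cut = random.randrange(1, len(a))
            child = a[:cut] + b[cut:]            # one-point crossover
        p_m = adaptive_prob(fitness(child), f_avg, f_max, 0.05, 0.01)
        child = [bit ^ 1 if random.random() < p_m else bit for bit in child]
        new_pop.append(child)
    pop = new_pop

best = max(fitness(ind) for ind in pop)
print(best)
```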

  1. Ontology-Based Classification System Development Methodology

    OpenAIRE

    2015-01-01

    The aim of the article is to analyse and develop an ontology-based classification system methodology that uses decision tree learning with statement propositionalized attributes. Classical decision tree learning algorithms, as well as decision tree learning with taxonomy and propositionalized attributes have been observed. Thus, domain ontology can be extracted from the data sets and can be used for data classification with the help of a decision tree. The use of ontology methods in decision ...

  2. Knowledge discovery from patients’ behavior via clustering-classification algorithms based on weighted eRFM and CLV model: An empirical study in public health care services

    Science.gov (United States)

    Zare Hosseini, Zeinab; Mohammadzadeh, Mahdi

    2016-01-01

    The rapid growth of information technology (IT) creates competitive advantages in the health care industry. Nowadays, many hospitals try to build successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction, and finally maximize profitability. Many hospitals have large data warehouses containing customer demographic and transaction information, and data mining techniques can be used to analyze this data and discover hidden knowledge about customers. This research develops an extended RFM model, namely RFML (with an added Length parameter), adapted to health care services for a public-sector hospital in Iran, on the premise that patient loyalty differs from customer loyalty, to estimate the customer lifetime value (CLV) of each patient. Two-step and K-means algorithms are used as clustering methods and a decision tree (CHAID) as the classification technique to segment the patients and identify target, potential and loyal customers for a strengthened CRM. Two approaches to classification are used: first, the clustering result is taken as the decision attribute in the classification process; second, the segmentation based on the patients' CLV (estimated by RFML) is taken as the decision attribute. Finally, the results of the CHAID algorithm reveal significant hidden rules and identify existing patterns among the hospital's consumers. PMID:27610177

  4. An Evaluation of Different Training Sample Allocation Schemes for Discrete and Continuous Land Cover Classification Using Decision Tree-Based Algorithms

    Directory of Open Access Journals (Sweden)

    René Roland Colditz

    2015-07-01

    Full Text Available Land cover mapping for large regions often employs satellite images of medium to coarse spatial resolution, which complicates mapping of discrete classes. Class memberships, which estimate the proportion of each class for every pixel, have been suggested as an alternative. This paper compares different strategies of training data allocation for discrete and continuous land cover mapping using classification and regression tree algorithms. In addition to measures of discrete and continuous map accuracy, the correct estimation of the area is another important criterion. A subset of the 30 m national land cover dataset of 2006 (NLCD2006) of the United States was used as the reference set to classify NADIR BRDF-adjusted surface reflectance time series of MODIS at 900 m spatial resolution. Results show that sampling of heterogeneous pixels and sample allocation according to the expected area of each class works best for classification trees. Regression trees for continuous land cover mapping should be trained with random allocation, and predictions should be normalized with a linear scaling function to correctly estimate the total area. Of the tested algorithms, random forest classification yields lower errors than boosted trees of C5.0, and Cubist shows higher accuracies than random forest regression.

  5. Binary classification of chalcone derivatives with LDA or KNN based on their antileishmanial activity and molecular descriptors selected using the Successive Projections Algorithm feature-selection technique.

    Science.gov (United States)

    Goodarzi, Mohammad; Saeys, Wouter; de Araujo, Mario Cesar Ugulino; Galvão, Roberto Kawakami Harrop; Vander Heyden, Yvan

    2014-01-23

    Chalcones are naturally occurring aromatic ketones, which consist of an α-, β-unsaturated carbonyl system joining two aryl rings. These compounds are reported to exhibit several pharmacological activities, including antiparasitic, antibacterial, antifungal, anticancer, immunomodulatory, nitric oxide inhibition and anti-inflammatory effects. In the present work, a Quantitative Structure-Activity Relationship (QSAR) study is carried out to classify chalcone derivatives with respect to their antileishmanial activity (active/inactive) on the basis of molecular descriptors. For this purpose, two techniques to select descriptors are employed, the Successive Projections Algorithm (SPA) and the Genetic Algorithm (GA). The selected descriptors are initially employed to build Linear Discriminant Analysis (LDA) models. An additional investigation is then carried out to determine whether the results can be improved by using a non-parametric classification technique (One Nearest Neighbour, 1NN). In a case study involving 100 chalcone derivatives, the 1NN models were found to provide better rates of correct classification than LDA, both in the training and test sets. The best result was achieved by a SPA-1NN model with six molecular descriptors, which provided correct classification rates of 97% and 84% for the training and test sets, respectively.
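The select-then-classify pipeline in this record can be sketched with standard components. A univariate filter (`SelectKBest`) stands in for the Successive Projections Algorithm, the data is synthetic rather than the chalcone descriptor set, and keeping six features only mirrors the paper's six-descriptor model.

```python
# Feature selection followed by one-nearest-neighbour classification.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=120, n_features=50, n_informative=6,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Keep six descriptors, then classify each test compound by its single
# nearest neighbour in the reduced descriptor space.
model = make_pipeline(SelectKBest(f_classif, k=6),
                      KNeighborsClassifier(n_neighbors=1))
model.fit(X_tr, y_tr)
score = model.score(X_te, y_te)
print(round(score, 2))
```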

  6. A Document Classification Algorithm Based on LDA

    Institute of Scientific and Technical Information of China (English)

    何锦群; 刘朋杰

    2014-01-01

    Latent Dirichlet Allocation is a classic topic model that can extract latent topics from a large corpus. The model assumes that if a document is relevant to a topic, then all tokens in the document are relevant to that topic; on real large-scale data this widens the scope of each topic and prevents accurate localization of the latent semantics of topic words, limiting the model's robustness and effectiveness. By narrowing the scope from which each document is generated, this paper presents an improved text classification algorithm, gLDA, which adds a topic-category distribution parameter to Latent Dirichlet Allocation; documents in this model are generated from the category to which they are most relevant. Gibbs sampling is employed for approximate inference. Experiments on the Reuters-21578 data set and the Fudan University text corpus show that the model improves classification accuracy over traditional topic-based classification models.

  7. Classification of textures in satellite image with Gabor filters and a multi layer perceptron with back propagation algorithm obtaining high accuracy

    Directory of Open Access Journals (Sweden)

    Adriano Beluco, Paulo M. Engel, Alexandre Beluco

    2015-01-01

    Full Text Available The classification of images is in many cases applied to identify an alphanumeric string, a facial expression or some other characteristic. In the case of satellite images it is necessary to classify all the pixels of the image. This article describes a supervised classification method for remote sensing images that combines the importance of attributes in feature selection with the efficiency of artificial neural networks in the classification process, resulting in high accuracy on real images. The method consists of a texture segmentation based on Gabor filtering followed by the image classification itself, using a multi-layer artificial neural network trained with the back-propagation algorithm. The method was first applied to a synthetic image, for training, and then to a satellite image. Some experimental results are presented in detail and discussed. Applied to the synthetic image the method correctly identified 89.05% of the pixels, and applied to the satellite image 85.15% of the pixels; the latter can be considered a high-accuracy result.

  8. Performance Analysis of Anti-Phishing Tools and Study of Classification Data Mining Algorithms for a Novel Anti-Phishing System

    Directory of Open Access Journals (Sweden)

    Rajendra Gupta

    2015-11-01

    Full Text Available Phishing is a kind of website spoofing used to steal sensitive and important information from web users, such as online banking passwords and credit card information. In a phishing attack, the attacker warns the user about supposed security issues, asks for confidential information through phishing e-mails, asks the user to update account information, and so on. Several experimental designs have been proposed earlier to counter phishing attacks, but existing systems do not achieve more than 90 percent success; in some cases a tool gives only 50-60 percent success. In this paper, a novel algorithm is developed to check the performance of the anti-phishing system, and the resulting data set is compared with the data sets of existing anti-phishing tools. The performance of the novel anti-phishing system is studied with four classification data mining algorithms, namely Class Imbalance Problem (CIP), a rule-based classifier using the Sequential Covering Algorithm (SCA), Nearest Neighbour Classification (NNC) and a Bayesian Classifier (BC), on a data set of phishing and legitimate websites. The proposed system shows a lower error rate and better performance than existing system tools.

  9. Multi-class Classification Algorithm Based on Local Optimization

    Institute of Scientific and Technical Information of China (English)

    单瑾; 刘明纲; 罗侃

    2016-01-01

    To address the bias and imbalance that commonly appear in traditional multi-class classification, this paper designs an improved, localized multi-class classification algorithm based on mutual communication entropy and support vector data description (SVDD), called EL-SVDD. The algorithm first computes the mutual communication entropy from local class samples; it then places each class inside a hypersphere according to this entropy; finally, analysing both the sample size and the entropy value, it reinterprets the C value of the SVDD algorithm. Experiments show that EL-SVDD is not only feasible but also improves multi-class classification accuracy effectively and stably.
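The SVDD-per-class idea can be sketched as below: one data-description model per class, with a new point assigned to the class whose description it fits best. `OneClassSVM` with an RBF kernel stands in for SVDD (the two are closely related in that setting); the entropy-based C tuning of EL-SVDD is not reproduced, and the two Gaussian clusters are synthetic.

```python
# One data-description model per class; classify by the model whose
# decision function places the point deepest inside its description.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
classes = {0: rng.normal(0, 1, (60, 2)),    # class 0 around the origin
           1: rng.normal(6, 1, (60, 2))}    # class 1 around (6, 6)
models = {c: OneClassSVM(kernel="rbf", nu=0.1, gamma=0.5).fit(Xc)
          for c, Xc in classes.items()}

def predict(x):
    # Higher decision_function value = better fit to that class's ball.
    return max(models, key=lambda c: models[c].decision_function([x])[0])

print(predict([0.2, -0.1]), predict([5.8, 6.2]))
```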

  10. A Steganography Detection Algorithm Based on Histogram Statistical Classification

    Institute of Scientific and Technical Information of China (English)

    邱志宏

    2013-01-01

    To improve the detection performance of steganalysis, this paper proposes a steganography detection algorithm based on histogram statistical classification. By extracting histogram feature parameters from image data and using a classification model built on an artificial neural network, the algorithm accurately identifies images with embedded steganographic information. The design, principle and workflow of the algorithm are analysed in detail, and an experimental test environment is constructed. Test results indicate that the detection success rate and false alarm rate of the proposed algorithm are both better than those of the EzStego detection tool.

  11. Classification Rule Mining Based on Improved Ant-miner Algorithm

    Institute of Scientific and Technical Information of China (English)

    肖菁; 梁燕辉

    2012-01-01

    To improve the predictive accuracy of classification rules mined by the classical Ant-miner algorithm, this paper proposes an improved Ant-miner algorithm for classification rule mining. A heuristic function built from sample density and sample proportion avoids the bias caused by choices with equal probability in Ant-miner. A pruning strategy with single-point mutation, governed by a variation coefficient, expands the search space and improves rule accuracy. An evaporation coefficient added to Ant-miner's pheromone update formula brings it closer to the foraging behaviour of real ants and prevents premature convergence. Experimental results on UCI data sets show that the proposed algorithm achieves higher predictive accuracy than the original Ant-miner algorithm.

  12. An evidence gathering and assessment technique designed for a forest cover classification algorithm based on the Dempster-Shafer theory of evidence

    Science.gov (United States)

    Szymanski, David Lawrence

    This thesis presents a new approach for classifying Landsat 5 Thematic Mapper (TM) imagery that utilizes digitally represented, non-spectral data in the classification step. A classification algorithm that is based on the Dempster-Shafer theory of evidence is developed and tested for its ability to provide an accurate representation of forest cover on the ground at the Anderson et al (1976) level II. The research focuses on defining an objective, systematic method of gathering and assessing the evidence from digital sources including TM data, the normalized difference vegetation index, soils, slope, aspect, and elevation. The algorithm is implemented using the ESRI ArcView Spatial Analyst software package and the Grid spatial data structure with software coded in both ArcView Avenue and also C. The methodology uses frequency of occurrence information to gather evidence and also introduces measures of evidence quality that quantify the ability of the evidence source to differentiate the Anderson forest cover classes. The measures are derived objectively and empirically and are based on common principles of legal argument. The evidence assessment measures augment the Dempster-Shafer theory and the research will determine if they provide an argument that is mentally sound, credible, and consistent. This research produces a method for identifying, assessing, and combining evidence sources using the Dempster-Shafer theory that results in a classified image containing the Anderson forest cover class. Test results indicate that the new classifier performs with accuracy that is similar to the traditional maximum likelihood approach. However, confusion among the deciduous and mixed classes remains. The utility of the evidence gathering method and also the evidence assessment method is demonstrated and confirmed. The algorithm presents an operational method of using the Dempster-Shafer theory of evidence for forest classification.
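Dempster's rule of combination, the core of the evidence-fusion step described in this thesis abstract, can be sketched as below. The masses and class names are illustrative, not the thesis's TM-derived values, and only two of its evidence sources are combined.

```python
# Dempster's rule of combination over a small frame of discernment.
from itertools import product

def combine(m1, m2):
    # Focal elements are frozensets of labels; mass falling on the empty
    # intersection is the conflict K, renormalised away by 1/(1-K).
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    return {s: m / (1.0 - conflict) for s, m in combined.items()}

D, C = frozenset({"deciduous"}), frozenset({"conifer"})
either = D | C
spectral = {D: 0.6, C: 0.1, either: 0.3}   # e.g. evidence from TM bands
terrain  = {D: 0.5, C: 0.2, either: 0.3}   # e.g. evidence from slope/aspect
fused = combine(spectral, terrain)
print(max(fused, key=fused.get) == D)
```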

  13. A New Algorithm for Mining Classification Rules Based on Decision Tables

    Institute of Scientific and Technical Information of China (English)

    谢娟英; 冯德民

    2003-01-01

    Mining classification rules is an important field of data mining, and the decision table of rough set theory is an efficient tool for it. This paper introduces the elementary concepts of rough-set decision tables and presents a new algorithm for mining classification rules based on decision tables, along with a discernibility function for the reduction of attribute values and a new criterion for rule accuracy. An example of its application to a car classification problem is included, and the accuracy of the discovered rules is analysed. Potential applications in data mining are also discussed.

  14. Simultaneous data pre-processing and SVM classification model selection based on a parallel genetic algorithm applied to spectroscopic data of olive oils.

    Science.gov (United States)

    Devos, Olivier; Downey, Gerard; Duponchel, Ludovic

    2014-04-01

    Classification is an important task in chemometrics. For several years now, support vector machines (SVMs) have proven to be powerful for infrared spectral data classification. However, such methods require the optimisation of parameters in order to control the risk of overfitting and the complexity of the boundary. Furthermore, it is established that the prediction ability of classification models can be improved by using pre-processing to remove unwanted variance from the spectra. In this paper we propose a new methodology based on a genetic algorithm (GA) for the simultaneous optimisation of SVM parameters and pre-processing (GENOPT-SVM). The method has been tested for the discrimination of the geographical origin of Italian olive oil (Ligurian and non-Ligurian) on the basis of near infrared (NIR) or mid infrared (FTIR) spectra. Different classification models (PLS-DA, SVM with mean-centred data, GENOPT-SVM) were tested and statistically compared using McNemar's test. For both datasets, SVM with optimised pre-processing gives models with higher accuracy than those obtained with PLS-DA on pre-processed data. In the case of the NIR dataset, most of this accuracy improvement (86.3% compared with 82.8% for PLS-DA) occurred using only a single pre-processing step. For the FTIR dataset, three optimised pre-processing steps are required to obtain an SVM model with a significant accuracy improvement (82.2%) over the one obtained with PLS-DA (78.6%). Furthermore, this study demonstrates that even SVM models have to be developed on the basis of well-corrected spectral data in order to achieve high classification rates.
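The joint optimisation of pre-processing and SVM parameters can be sketched in simplified form: a small grid search stands in for the paper's genetic algorithm, jointly selecting a pre-processing step and the SVM `C` in one cross-validated search. The "spectra" are synthetic, not the olive-oil NIR/FTIR data.

```python
# Joint search over {pre-processing step} x {SVM parameters} via a
# cross-validated pipeline; a grid search stands in for GENOPT-SVM's GA.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=150, n_features=20, n_informative=6,
                           random_state=0)

pipe = Pipeline([("prep", StandardScaler()), ("svm", SVC())])
param_grid = {"prep": [StandardScaler(), MinMaxScaler(), "passthrough"],
              "svm__C": [0.1, 1, 10]}   # "passthrough" = no pre-processing
search = GridSearchCV(pipe, param_grid, cv=3).fit(X, y)
print(search.best_params_["svm__C"], round(search.best_score_, 2))
```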

  15. Graph Classification Algorithm Based on Divide-and-Conquer Strategy and Hash Linked List

    Institute of Scientific and Technical Information of China (English)

    孙伟; 朱正礼

    2013-01-01

    Mainstream graph-data classification algorithms are based on frequent-substructure mining, a strategy that inevitably leads to repeated searches of the global data space, making such algorithms inefficient and unable to meet specific requirements. To address these shortcomings, a divide-and-conquer strategy is used to modularize the data space, with a hash linked list storing pattern addresses and support counts. The original database is partitioned into a limited number of sub-modules according to fixed rules, and the gSpan algorithm is run on each module to obtain locally frequent sub-patterns; a hash function then maps each module's mining results to unique storage addresses, with the corresponding support counts recorded in the hash linked list. Finally, the globally frequent sub-patterns are obtained and a graph-data classifier is built. The algorithm avoids repeated searches of the global space, which greatly improves execution efficiency, and the modularized data can be loaded into memory at once, reducing memory overhead. Experiments show that the new algorithm builds its classification model 1.2 to 3.2 times faster than mainstream graph classification algorithms, with no loss of classification accuracy.
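The partition-mine-merge idea can be illustrated on a deliberately simplified problem: single items stand in for gSpan's frequent subgraphs, and a plain dict (Python's built-in hash table) stands in for the hash linked list that accumulates per-module supports into global counts.

```python
# Divide-and-conquer frequent-pattern counting with a hash-table merge.
from collections import defaultdict

transactions = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"},
                {"a"}, {"b"}, {"a", "b"}, {"c"}]
modules = [transactions[i::2] for i in range(2)]   # split into 2 modules

global_support = defaultdict(int)                  # pattern -> total support
for module in modules:
    local = defaultdict(int)
    for t in module:
        for item in t:
            local[item] += 1                       # local "mining" step
    for pattern, count in local.items():
        global_support[pattern] += count           # merge via hash table

min_support = 4
frequent = {p for p, c in global_support.items() if c >= min_support}
print(sorted(frequent))
```

Each module is scanned only once, and the merge touches only locally frequent results, which is the source of the efficiency gain the record describes.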

  16. [Aiming for zero blindness].

    Science.gov (United States)

    Nakazawa, Toru

    2015-03-01

    -independent factors, as well as our investigation of ways to improve the clinical evaluation of the disease. Our research was prompted by the multifactorial nature of glaucoma. There is a high degree of variability in the pattern and speed of the progression of visual field defects in individual patients, presenting a major obstacle for successful clinical trials. To overcome this, we classified the eyes of glaucoma patients into 4 types, corresponding to the 4 patterns of glaucomatous optic nerve head morphology described by Nicolela et al., and then tested the validity of this method by assessing the uniformity of clinical features in each group. We found that in normal tension glaucoma (NTG) eyes, each disc morphology group had a characteristic location in which the loss of circumpapillary retinal nerve fiber layer thickness (cpRNFLT; measured with optical coherence tomography, OCT) was most likely to occur. Furthermore, the incidence of reductions in visual acuity differed between the groups, as did the speed of visual field loss, the distribution of defective visual field test points, and the location of test points that were most susceptible to progressive damage, measured by Humphrey static perimetry. These results indicate that Nicolela's method of classifying eyes with glaucoma was able to overcome the difficulties caused by the diverse nature of the disease, at least to a certain extent. Building on these findings, we then set out to identify sectors of the visual field that correspond to the distribution of retinal nerve fibers, with the aim of detecting glaucoma progression with improved sensitivity. We first mapped the statistical correlation between visual field test points and cpRNFLT in each temporal clock-hour sector (from 6 to 12 o'clock), using OCT data from NTG patients. The resulting series of maps allowed us to identify areas containing visual field test points that were prone to be affected together as a group. We also used a similar method to identify visual

  17. Quality-Oriented Classification of Aircraft Material Based on SVM

    Directory of Open Access Journals (Sweden)

    Hongxia Cai

    2014-01-01

    Full Text Available Existing material classification schemes were proposed to improve inventory management. However, different materials have different quality-related attributes, especially in the aircraft industry. In order to reduce cost without sacrificing quality, we propose a quality-oriented material classification system considering material quality characteristics, quality cost, and quality influence. The Analytic Hierarchy Process supports feature selection and classification decisions, and an improved Kraljic Portfolio Matrix is used to establish the three-dimensional classification model. The aircraft materials can be divided into eight types, including general, key, risk, and leveraged types. To improve the classification accuracy for the various materials, the Support Vector Machine algorithm is introduced. Finally, we compare SVM with a BP neural network in this application. The results show that the SVM algorithm is more efficient and accurate, and that the quality-oriented material classification is valuable.

  18. Optimization of classification based on combination of Adaboost and CART algorithm

    Institute of Scientific and Technical Information of China (English)

    丁雍; 李小霞

    2011-01-01

    This paper presents a classification method based on a combination of the Adaboost and CART algorithms. CART binary trees, generated with features as nodes, replace the weak classifiers of the traditional Adaboost algorithm, and these weak classifiers are then combined into a strong classifier. Compared with the conventional Adaboost algorithm, using the strong classifier on digit and face samples reduces the error rates by 20% and 86.5%, respectively. Applied to object detection, the strong classifier quickly finds and locates both kinds of targets in images. The results show that the improved algorithm not only reduces the classification error but also retains the speed of the traditional Adaboost algorithm in object detection.

  19. Content-based and algorithmic classifications of journals: perspectives on the dynamics of scientific communication and indexer effects

    NARCIS (Netherlands)

    Rafols, I.; Leydesdorff, L.; Larsen, B.; Leta, J.

    2009-01-01

    The aggregated journal-journal citation matrix—based on the Journal Citation Reports (JCR) of the Science Citation Index—can be decomposed by indexers and/or algorithmically. In this study, we test the results of two recently available algorithms for the decomposition of large matrices against two c

  20. Content-based and algorithmic classifications of journals: Perspectives on the dynamics of scientific communication and indexer effects

    NARCIS (Netherlands)

    Rafols, I; Leydesdorff, L.

    2009-01-01

    The aggregated journal-journal citation matrix—based on the Journal Citation Reports (JCR) of the Science Citation Index—can be decomposed by indexers or algorithmically. In this study, we test the results of two recently available algorithms for the decomposition of large matrices against two conte

  1. Prediction models discriminating between nonlocomotive and locomotive activities in children using a triaxial accelerometer with a gravity-removal physical activity classification algorithm.

    Directory of Open Access Journals (Sweden)

    Yuki Hikihara

    Full Text Available The aims of our study were to examine whether a gravity-removal physical activity classification algorithm (GRPACA) is applicable for discrimination between nonlocomotive and locomotive activities for various physical activities (PAs) of children, and to prove that this approach improves the estimation accuracy of a prediction model for children using an accelerometer. Japanese children (42 boys and 26 girls) attending primary school were invited to participate in this study. We used a triaxial accelerometer with a sampling frequency of 32 Hz and a measurement range of ±6 G. Participants were asked to perform 6 nonlocomotive and 5 locomotive activities. We measured raw synthetic acceleration with the triaxial accelerometer and monitored oxygen consumption and carbon dioxide production during each activity with the Douglas bag method. In addition, the resting metabolic rate (RMR) was measured with the subject sitting on a chair to calculate metabolic equivalents (METs). When the cut-off on the ratio of unfiltered synthetic acceleration (USA) to filtered synthetic acceleration (FSA) was 1.12, the rate of correct discrimination between nonlocomotive and locomotive activities was excellent, at 99.1% on average. As a result, a strong linear relationship was found for both nonlocomotive (METs = 0.013 × synthetic acceleration + 1.220, R2 = 0.772) and locomotive (METs = 0.005 × synthetic acceleration + 0.944, R2 = 0.880) activities, except for climbing down and up. The mean differences between the values predicted by our model and measured METs were -0.50 to 0.23 for moderate-to-vigorous intensity (>3.5 METs) PAs like running, ball throwing and washing the floor, which were regarded as unpredictable PAs. In addition, the difference was within 0.25 METs for sedentary to mild-moderate PAs (<3.5 METs). Our specific calibration model that discriminates between nonlocomotive and locomotive activities for children can be useful to evaluate the sedentary to vigorous
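    The two-step estimation in the abstract (classify an activity by the USA/FSA ratio against the 1.12 cut-off, then apply the activity-type-specific linear MET equation) can be sketched like this. The coefficients and cut-off are those reported above, but several details are our assumptions: the abstract does not say which synthetic acceleration enters the regressions (FSA is used here), which side of the cut-off is locomotive, or the acceleration unit.

```python
def predict_mets(usa, fsa):
    """Sketch of the GRPACA two-step MET prediction.

    usa: unfiltered synthetic acceleration; fsa: filtered synthetic
    acceleration (unit assumed). Activities with a low USA/FSA ratio are
    treated as nonlocomotive, an assumed reading of the 1.12 cut-off.
    """
    if usa / fsa <= 1.12:
        # Nonlocomotive regression from the abstract.
        return 0.013 * fsa + 1.220
    # Locomotive regression from the abstract.
    return 0.005 * fsa + 0.944

print(round(predict_mets(100.0, 95.0), 3))  # → 2.455 (nonlocomotive branch)
print(round(predict_mets(300.0, 200.0), 3))  # → 1.944 (locomotive branch)
```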

  2. Comparison Research on the Algorithms of Network Traffic Classification

    Institute of Scientific and Technical Information of China (English)

    彭勃

    2012-01-01

    Accurate traffic classification is of fundamental importance to numerous network activities and has long been a hot topic in network measurement. A comparison of six traffic classification algorithms based on flow features is conducted. Analysis and experiment show that, with a feature selection method, the support vector machine (SVM) achieves high overall accuracy and good computational performance, making it suitable for network traffic classification.

  3. Impact of Reducing Polarimetric SAR Input on the Uncertainty of Crop Classifications Based on the Random Forests Algorithm

    DEFF Research Database (Denmark)

    Loosvelt, Lien; Peters, Jan; Skriver, Henning;

    2012-01-01

    Although the use of multidate polarimetric synthetic aperture radar (SAR) data for highly accurate land cover classification has been acknowledged in the literature, the high dimensionality of the data set remains a major issue. This study presents two different strategies to reduce the number...

  4. Paper 5 : Surveillance of Multiple Congenital Anomalies: Implementation of a Computer Algorithm in European Registers for Classification of Cases

    NARCIS (Netherlands)

    Garne, Ester; Dolk, Helen; Loane, Maria; Wellesley, Diana; Barisic, Ingeborg; Calzolari, Elisa; Densem, James

    2011-01-01

    BACKGROUND: Surveillance of multiple congenital anomalies is considered to be more sensitive for the detection of new teratogens than surveillance of all or isolated congenital anomalies. Current literature proposes the manual review of all cases for classification into isolated or multiple congenit

  5. Paper 5: Surveillance of multiple congenital anomalies: implementation of a computer algorithm in European registers for classification of cases

    DEFF Research Database (Denmark)

    Garne, Ester; Dolk, Helen; Loane, Maria;

    2011-01-01

    Surveillance of multiple congenital anomalies is considered to be more sensitive for the detection of new teratogens than surveillance of all or isolated congenital anomalies. Current literature proposes the manual review of all cases for classification into isolated or multiple congenital...

  6. A Multi-Level Fingerprint Classification Algorithm Based on Independent Classification Features

    Institute of Scientific and Technical Information of China (English)

    左龙; 彭小奇; 钟云飞; 彭曦; 唐英

    2013-01-01

    In order to improve the retrieval efficiency of automated fingerprint identification systems built on large-scale fingerprint databases, a multi-level fingerprint classification algorithm based on independent classification features is proposed. Quality evaluation indexes are applied to assess each input fingerprint, and the user is asked to re-enter the fingerprint if its quality is poor. For good-quality fingerprints, three mutually independent classification features (the pattern type, the number of ridges between singular points, and the average ridge frequency in the central region) are used in turn to realise multi-level classification, gradually narrowing the retrieval space. Experimental results show that the proposed algorithm has high retrieval efficiency and strong robustness, providing a fast and effective index mechanism for large-scale fingerprint databases with good practicality.

  7. An Improved KNN Algorithm Based on Multi-attribute Classification

    Institute of Scientific and Technical Information of China (English)

    张炯辉; 许尧舜

    2013-01-01

    To improve the classification accuracy of the conventional Euclidean KNN algorithm and of the entropy-based improved KNN algorithm, this paper proposes an improved KNN algorithm based on multi-attribute classification. The new algorithm proceeds as follows: i) classify the attributes, according to the proportion of distinct attribute values within the whole sample set, into discrete attributes suited to the entropy-based KNN algorithm and continuous attributes suited to the conventional Euclidean KNN similarity measure; ii) process the two types of attributes separately, then combine the two results as a weighted sum and take that sum as the distance between samples; iii) select the k samples closest to the test sample to determine the decision attribute class of the test sample.
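    The three-step procedure above can be sketched as follows. The entropy-weighted mismatch used for the discrete attributes and the `alpha` balance are simplifying assumptions of ours, standing in for the paper's entropy-based KNN distance and its unspecified weighting.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy of a discrete attribute's value distribution."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def mixed_distance(a, b, disc_idx, cont_idx, weights, alpha=0.5):
    """Weighted sum of an entropy-weighted mismatch over discrete attributes
    and a Euclidean distance over continuous attributes; alpha is an assumed
    balance parameter (the paper sums the two parts 'with weighing')."""
    d_disc = sum(weights[i] * (a[i] != b[i]) for i in disc_idx)
    d_cont = math.sqrt(sum((a[i] - b[i]) ** 2 for i in cont_idx))
    return alpha * d_disc + (1 - alpha) * d_cont

def knn_predict(train, labels, x, disc_idx, cont_idx, k=3):
    # Step i/ii: per-attribute entropy weights, mixed distance to each sample.
    weights = {i: entropy([row[i] for row in train]) for i in disc_idx}
    order = sorted(range(len(train)),
                   key=lambda j: mixed_distance(train[j], x, disc_idx,
                                                cont_idx, weights))
    # Step iii: majority vote among the k nearest samples.
    votes = Counter(labels[j] for j in order[:k])
    return votes.most_common(1)[0][0]

train = [("red", 1.0), ("red", 1.2), ("blue", 5.0), ("blue", 5.5)]
labels = ["A", "A", "B", "B"]
print(knn_predict(train, labels, ("red", 1.1), disc_idx=[0], cont_idx=[1]))  # → A
```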

  8. Application of Classification Algorithms in Blood-activating and Stasis-dissolving Drug Mining

    Institute of Scientific and Technical Information of China (English)

    马亮

    2015-01-01

    [Objective] To analyse and discuss the application of classification algorithms in mining blood-activating and stasis-dissolving drugs. [Method] Part of the data in a curated traditional Chinese medicine database was extracted for classification experiments; the classification accuracy of the Naive Bayes and support vector machine algorithms was tested in order to find drug classification rules and efficacy characteristics. [Results] The classification algorithms achieved high accuracy, and some categories showed distinctive classification features. [Conclusion] Classification algorithms are helpful for the efficacy classification of blood-activating and stasis-dissolving drugs.

  9. The Image Classification System Based on the Bayesian Algorithm

    Institute of Scientific and Technical Information of China (English)

    席伟

    2012-01-01

    Image classification is an important research direction in information processing. It involves image feature extraction, construction of an image-data decision table, and selection of an appropriate pattern recognition algorithm to classify the images. This paper adopts the minimum-error-probability Bayesian algorithm, commonly used in pattern recognition, to solve a two-class image classification problem. A user-friendly main interface for the human-computer interaction system was designed using the MATLAB graphical user interface (GUI) tools, and the program's results on practical examples are given. The work has practical significance for promoting the application of pattern recognition theory to image classification.

  10. Classification of Subjective and Objective Clauses Based on Adaboost Algorithm

    Institute of Scientific and Technical Information of China (English)

    黄瑾娉; 陶杰

    2015-01-01

    The semantic expression of Chinese sentences is complex, so a single classifier generally performs poorly at separating subjective and objective sentences. This paper proposes a method based on the Adaboost algorithm for classifying subjective and objective sentences. The research status and general workflow of subjective/objective classification are first introduced; the Adaboost ensemble learning algorithm is then applied, with improvements made to counter its degradation phenomenon. Finally, lexical cue features and 2-POS features are used as input to classify short texts. The experimental results show that Adaboost works well for subjective/objective classification.

  11. Application of a classification algorithm in scientific research management system data mining

    Institute of Scientific and Technical Information of China (English)

    李景民

    2016-01-01

    This paper analyses the current state of automatic classification in scientific research management systems and notes that automatic web page classification currently relies mainly on automatic text classification methods. After identifying the key difficulties of the classification task, it proposes a new classification algorithm that organically combines the KNN and Rocchio algorithms according to the practical application. Practical use shows that the resulting Rocchio-KNN classification algorithm not only guarantees a certain classification accuracy but also improves classification efficiency.

  12. A genetic algorithm-Bayesian network approach for the analysis of metabolomics and spectroscopic data: application to the rapid identification of Bacillus spores and classification of Bacillus species

    Directory of Open Access Journals (Sweden)

    Goodacre Royston

    2011-01-01

    Full Text Available Abstract Background The rapid identification of Bacillus spores and bacterial identification are paramount because of their implications in food poisoning, pathogenesis and their use as potential biowarfare agents. Many automated analytical techniques such as Curie-point pyrolysis mass spectrometry (Py-MS) have been used to identify bacterial spores, giving rise to large amounts of analytical data. This high number of features makes interpretation of the data extremely difficult. We analysed Py-MS data from 36 different strains of aerobic endospore-forming bacteria encompassing seven different species. These bacteria were grown axenically on nutrient agar, and vegetative biomass and spores were analyzed by Curie-point Py-MS. Results We develop a novel genetic algorithm-Bayesian network algorithm that accurately identifies and selects a small subset of key relevant mass spectra (biomarkers) to be further analysed. Once identified, this subset of relevant biomarkers was used to identify Bacillus spores successfully and to identify Bacillus species via a Bayesian network model specifically built for this reduced set of features. Conclusions This final compact Bayesian network classification model is parsimonious, computationally fast to run, and its graphical visualization allows easy interpretation of the probabilistic relationships among the selected biomarkers. In addition, we compare the features selected by the genetic algorithm-Bayesian network approach with the features selected by partial least squares-discriminant analysis (PLS-DA). The classification accuracy results show that the set of features selected by the GA-BN is far superior to that selected by PLS-DA.

  13. Error-based Hybrid Classification Algorithm

    Institute of Scientific and Technical Information of China (English)

    丛雪燕

    2014-01-01

    A new error-model-based hybrid classification approach is presented for data sets with binary target variables, which can increase classification accuracy. The paper tests the proposed approach on real data sets and compares it with single classifiers. The results show that the method performs better than other hybrid algorithms and the existing single classification methods; in particular, when the rate of disagreement between the two component predictors is high, the hybrid approach significantly improves prediction accuracy.

  14. Development of a two-stage gene selection method that incorporates a novel hybrid approach using the cuckoo optimization algorithm and harmony search for cancer classification.

    Science.gov (United States)

    Elyasigomari, V; Lee, D A; Screen, H R C; Shaheed, M H

    2017-03-01

    For each cancer type, only a few genes are informative. Due to the so-called 'curse of dimensionality' problem, the gene selection task remains a challenge. To overcome this problem, we propose a two-stage gene selection method called MRMR-COA-HS. In the first stage, minimum redundancy and maximum relevance (MRMR) feature selection is used to select a subset of relevant genes. The selected genes are then fed into a wrapper setup that combines a new algorithm, COA-HS, with a support vector machine classifier. The method was applied to four microarray datasets, and the performance was assessed by leave-one-out cross-validation. Comparative performance assessment against other evolutionary algorithms suggested that the proposed algorithm significantly outperforms them, selecting fewer genes while maintaining the highest classification accuracy. The functions of the selected genes were further investigated, and it was confirmed that they are biologically relevant to each cancer type.
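    A toy version of the two-stage filter-then-wrapper idea can be written with scikit-learn as below. Mutual information stands in for MRMR and a greedy forward search stands in for the cuckoo/harmony-search hybrid (both substitutions are ours, for illustration only); the wrapper score is leave-one-out accuracy with a linear SVM, as in the paper. The dataset is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for a microarray dataset: 50 samples, 200 "genes".
X, y = make_classification(n_samples=50, n_features=200, n_informative=5,
                           random_state=0)

# Stage 1 (filter): keep the 15 genes most relevant to the class label.
filt = SelectKBest(mutual_info_classif, k=15).fit(X, y)
candidates = list(np.flatnonzero(filt.get_support()))

# Stage 2 (wrapper): greedily add the gene that most improves the
# leave-one-out cross-validated accuracy of a linear SVM.
selected, best = [], 0.0
improved = True
while improved:
    improved = False
    for g in candidates:
        if g in selected:
            continue
        score = cross_val_score(SVC(kernel="linear"), X[:, selected + [g]], y,
                                cv=LeaveOneOut()).mean()
        if score > best:
            best, pick, improved = score, g, True
    if improved:
        selected.append(pick)
print("selected genes:", selected, "LOOCV accuracy: %.3f" % best)
```

A metaheuristic such as COA-HS explores the candidate subsets far more broadly than this greedy loop, which is the point of the paper's second stage.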

  15. Research and implementation of a parallel classification algorithm based on GPU

    Institute of Scientific and Technical Information of China (English)

    王坤

    2014-01-01

    This paper describes a method for web page classification and analyses the feasibility of implementing the KNN algorithm as a parallel computation on the GPU, presenting a scheme that implements KNN using CUDA. After studying the GPU memory-access mechanism, a method of avoiding bank conflicts through careful data layout and algorithmic improvements is proposed. The results show that this implementation effectively increases KNN performance: the parallel computation on the GPU is clearly faster than on the CPU, with a good speedup.

  16. An evaluation of ISOCLS and CLASSY clustering algorithms for forest classification in northern Idaho. [Elk River quadrangle of the Clearwater National Forest]

    Science.gov (United States)

    Werth, L. F. (Principal Investigator)

    1981-01-01

    Both the iterative self-organizing clustering system (ISOCLS) and the CLASSY algorithm were applied to forest and nonforest classes for one 1:24,000 quadrangle map of northern Idaho, and the classification and mapping accuracies were evaluated with 1:30,000 color infrared aerial photography. Confusion matrices for the two clustering algorithms were generated and studied to determine which is most applicable to forest and rangeland inventories in future projects. In an unsupervised mode, ISOCLS requires many trial-and-error runs to find the proper parameters to separate the desired information classes. CLASSY reveals more in a single run about the classes that can be separated, and shows more promise than ISOCLS for forest stratification and for consistency. One major drawback of CLASSY is that important forest and range classes smaller than the minimum cluster size will be combined with other classes. The algorithm requires so much computer storage that only data sets as small as a quadrangle can be used at one time.

  17. Visual-Based Clothing Attribute Classification Algorithm

    Institute of Scientific and Technical Information of China (English)

    刘聪; 丁贵广

    2016-01-01

    We propose an algorithm for classifying clothing image attributes. To handle the noise in clothing images, key parts of clothing are located by a well-trained human part detector and redundant information is eliminated, which improves the accuracy of clothing attribute classification. Additionally, a novel feature descriptor based on the human skeleton and skin is proposed; it describes clothing features with fewer dimensions, which significantly speeds up the classifiers of the related attributes. To deal with the complex semantics of clothing attributes and the diversity of requirements, different SVM decision tree models are built for different attributes, improving classification efficiency while meeting both coarse-grained and fine-grained classification needs. Experiments demonstrate the effectiveness of the proposed algorithm on multiple clothing attribute classification tasks.

  18. Pap Smear Diagnosis Using a Hybrid Intelligent Scheme Focusing on Genetic Algorithm Based Feature Selection and Nearest Neighbor Classification

    DEFF Research Database (Denmark)

    Marinakis, Yannis; Dounias, Georgios; Jantzen, Jan

    2009-01-01

    The term pap-smear refers to samples of human cells stained by the so-called Papanicolaou method. The purpose of the Papanicolaou method is to diagnose pre-cancerous cell changes before they progress to invasive carcinoma. In this paper a metaheuristic algorithm is proposed in order to classify t...... other previously applied intelligent approaches....

  19. Classification based on CART algorithm for microarray data of lung cancer

    Institute of Scientific and Technical Information of China (English)

    陈磊; 刘毅慧

    2011-01-01

    Gene chip technology is an important research tool in genomics, but gene chip (microarray) data are often high-dimensional, making dimensionality reduction a necessary step in microarray data analysis. This paper analyses the lung cancer microarray data provided by G. J. Gordon et al. of Harvard Medical School. First, t-test and Wilcoxon rank-sum test methods are used for feature selection to reduce the dimensionality of the microarray data. Then, following the CART (Classification and Regression Tree) algorithm with the Gini diversity index as the error function, an extensive classification tree is fitted on the selected feature attributes, and pruning is used to find the tree of optimal size, improving its generalization performance so that it adapts well to new samples. Experimental results show that the recognition rate for lung cancer microarray data classification reaches over 96% with this method and is very stable; the method also yields easily understandable classification rules and the key genes for classification.
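    The pipeline described above (t-test screening followed by a Gini-criterion CART tree with pruning) can be sketched with SciPy and scikit-learn on synthetic data standing in for the lung-cancer microarrays. The feature count, the fixed `ccp_alpha`, and the dataset are arbitrary assumptions; the paper instead searches for the optimal tree size explicitly.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 150 samples, 500 "genes", two classes.
X, y = make_classification(n_samples=150, n_features=500, n_informative=10,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Feature selection: keep the 30 genes with the smallest two-sample
# t-test p-values between the classes (computed on the training set).
_, pvals = ttest_ind(X_tr[y_tr == 0], X_tr[y_tr == 1], axis=0)
top = np.argsort(pvals)[:30]

# CART with the Gini criterion; cost-complexity pruning via ccp_alpha
# plays the role of cutting the grown tree back to a smaller size.
tree = DecisionTreeClassifier(criterion="gini", ccp_alpha=0.01, random_state=1)
tree.fit(X_tr[:, top], y_tr)
print("test accuracy: %.3f" % tree.score(X_te[:, top], y_te))
```

Fitting the selector on the training split only, as above, avoids the selection bias that arises when features are chosen using the test samples.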

  20. Dual-energy cone-beam CT with a flat-panel detector: Effect of reconstruction algorithm on material classification

    Energy Technology Data Exchange (ETDEWEB)

    Zbijewski, W., E-mail: wzbijewski@jhu.edu; Gang, G. J.; Xu, J.; Wang, A. S.; Stayman, J. W. [Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21205 (United States); Taguchi, K.; Carrino, J. A. [Russell H. Morgan Department of Radiology, Johns Hopkins University, Baltimore, Maryland 21205 (United States); Siewerdsen, J. H. [Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21205 and Russell H. Morgan Department of Radiology, Johns Hopkins University, Baltimore, Maryland 21205 (United States)

    2014-02-15

    Purpose: Cone-beam CT (CBCT) with a flat-panel detector (FPD) is finding application in areas such as breast and musculoskeletal imaging, where dual-energy (DE) capabilities offer potential benefit. The authors investigate the accuracy of material classification in DE CBCT using filtered backprojection (FBP) and penalized likelihood (PL) reconstruction and optimize contrast-enhanced DE CBCT of the joints as a function of dose, material concentration, and detail size. Methods: Phantoms consisting of a 15 cm diameter water cylinder with solid calcium inserts (50–200 mg/ml, 3–28.4 mm diameter) and solid iodine inserts (2–10 mg/ml, 3–28.4 mm diameter), as well as a cadaveric knee with intra-articular injection of iodine were imaged on a CBCT bench with a Varian 4343 FPD. The low energy (LE) beam was 70 kVp (+0.2 mm Cu), and the high energy (HE) beam was 120 kVp (+0.2 mm Cu, +0.5 mm Ag). Total dose (LE+HE) was varied from 3.1 to 15.6 mGy with equal dose allocation. Image-based DE classification involved a nearest distance classifier in the space of LE versus HE attenuation values. Recognizing the differences in noise between LE and HE beams, the LE and HE data were differentially filtered (in FBP) or regularized (in PL). Both a quadratic (PLQ) and a total-variation penalty (PLTV) were investigated for PL. The performance of DE CBCT material discrimination was quantified in terms of voxelwise specificity, sensitivity, and accuracy. Results: Noise in the HE image was primarily responsible for classification errors within the contrast inserts, whereas noise in the LE image mainly influenced classification in the surrounding water. For inserts of diameter 28.4 mm, DE CBCT reconstructions were optimized to maximize the total combined accuracy across the range of calcium and iodine concentrations, yielding values of ∼88% for FBP and PLQ, and ∼95% for PLTV at 3.1 mGy total dose, increasing to ∼95% for FBP and PLQ, and ∼98% for PLTV at 15.6 mGy total dose. For a
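    The image-based DE classification step above (a nearest-distance classifier in the space of LE versus HE attenuation values) can be sketched as follows. The basis values are made-up illustrative numbers, not calibrated attenuation coefficients.

```python
import numpy as np

def classify_voxels(mu_le, mu_he, bases):
    """Voxelwise nearest-distance classification in the (LE, HE) attenuation
    plane. `bases` maps a material name to its nominal (LE, HE) attenuation
    pair; each voxel is assigned the material whose basis point is closest."""
    names = list(bases)
    centers = np.array([bases[n] for n in names])          # shape (M, 2)
    voxels = np.stack([mu_le.ravel(), mu_he.ravel()], 1)   # shape (N, 2)
    # Euclidean distance from every voxel to every material basis point.
    d = np.linalg.norm(voxels[:, None, :] - centers[None, :, :], axis=2)
    labels = np.array(names)[d.argmin(axis=1)]
    return labels.reshape(mu_le.shape)

# Illustrative (uncalibrated) basis points and a tiny 2x2 "image" pair.
bases = {"water": (0.20, 0.18), "calcium": (0.60, 0.40), "iodine": (0.90, 0.35)}
mu_le = np.array([[0.21, 0.58], [0.88, 0.19]])
mu_he = np.array([[0.19, 0.41], [0.36, 0.17]])
print(classify_voxels(mu_le, mu_he, bases))
```

Noise in either the LE or HE image shifts voxels in this plane, which is why the abstract's differential filtering/regularization of the two beams directly affects classification accuracy.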

  1. Application of a full spectral matching algorithm in apple classification

    Institute of Scientific and Technical Information of China (English)

    周万怀; 谢丽娟; 应义斌

    2013-01-01

    while 1 means the raw spectrum is ascending in the corresponding small region. Unlike common full spectral matching algorithms, the newly proposed one calculates the similarity between spectra from the spectral waveform rather than directly from absorbance or reflectance. The influence of absolute absorbance or reflectance intensity is therefore reduced and the influence of the similarity of the spectral waveform is enhanced; that is, which substances a sample contains matters more than their amounts. In this way, the influence of noise, and of the differences caused by collecting spectra from different areas of solid samples, was reduced to quite a low level. Comparisons between common full spectral matching algorithms and the proposed algorithm showed that 94.5% of the samples were correctly classified by the new algorithm (4 varieties of apples, 100 samples each), while the second highest classification accuracy, 73%, was obtained with a Euclidean distance (ED) method. This indicates that the proposed algorithm is better suited to classifying different kinds of samples and should help to narrow the database query scope, shorten the time consumed, and improve the accuracy of data queries. From the principle of the algorithm, it is clearly affected by the interval between the data points of the spectra, so the effect of spectral resolution on the proposed algorithm was studied. In total, seven different resolutions (2-128 cm-1) were tested. Unfortunately, the new algorithm is sensitive to spectral resolution, and the optimal resolution for apples' near infrared spectra is approximately 8 or 16 cm-1. The optimal resolution should therefore be determined first when the algorithm is applied to the spectral matching of new objects. In short, our proposed spectral matching algorithm can
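    The waveform encoding described above (a spectrum reduced to a binary up/down code over adjacent points, compared region by region) can be sketched as follows. The similarity measure (fraction of matching up/down regions) is our illustrative reading of the abstract, not the authors' exact formula.

```python
import numpy as np

def waveform_code(spectrum):
    """Encode a spectrum as a binary waveform: 1 where it ascends between
    adjacent points, 0 where it descends (the abstract's encoding)."""
    return (np.diff(np.asarray(spectrum, dtype=float)) > 0).astype(int)

def waveform_similarity(s1, s2):
    """Similarity as the fraction of regions where two spectra move in the
    same direction, independent of absolute absorbance/reflectance level."""
    c1, c2 = waveform_code(s1), waveform_code(s2)
    return float(np.mean(c1 == c2))

a = [0.10, 0.15, 0.12, 0.20, 0.18]  # reference spectrum
b = [0.50, 0.58, 0.51, 0.66, 0.60]  # same shape, different intensity scale
c = [0.10, 0.05, 0.12, 0.04, 0.18]  # opposite waveform
print(waveform_similarity(a, b))  # → 1.0 (identical up/down pattern)
print(waveform_similarity(a, c))  # → 0.0
```

Note how `b` matches `a` perfectly despite its very different absolute values, which is exactly the intensity-invariance the abstract claims; the sensitivity to resolution also follows, since `np.diff` depends on the point spacing.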

  2. Development of a deterministic downscaling algorithm for remote sensing soil moisture footprint using soil and vegetation classifications

    Science.gov (United States)

    Shin, Yongchul; Mohanty, Binayak P.

    2013-10-01

Soil moisture (SM) at the local scale is required to account for the small-scale spatial heterogeneity of the land surface, because many hydrological processes manifest at scales ranging from centimeters to kilometers. Although remote sensing (RS) platforms provide large-scale soil moisture dynamics, the scale discrepancy between the observation scale (e.g., several kilometers) and the modeling scale (e.g., a few hundred meters) leads to uncertainties in the performance of land surface hydrologic models. To overcome this drawback, we developed a new deterministic downscaling algorithm (DDA) for estimating fine-scale soil moisture from pixel-based RS soil moisture and evapotranspiration (ET) products using a genetic algorithm. This approach was evaluated under various synthetic and field experiment (Little Washita-LW 13 and 21, Oklahoma) conditions, including homogeneous and heterogeneous land surfaces composed of different soil textures and vegetations. Our algorithm is based on determining effective soil hydraulic properties for different subpixels within an RS pixel and estimating the long-term soil moisture dynamics of individual subpixels using the hydrological model with the extracted soil hydraulic parameters. The soil moisture dynamics of subpixels from the synthetic experiments matched the observations well under heterogeneous land surface conditions, although uncertainties (Mean Bias Error, MBE: -0.073 to -0.049) exist. Field experiments typically show more variation due to weather conditions, measurement errors, unknown bottom boundary conditions, and the scale discrepancy between the remote sensing pixel and the model grid resolution. However, the soil moisture estimates of individual subpixels (from the airborne Electronically Scanned Thinned Array Radiometer (ESTAR) footprints of 800 m × 800 m) downscaled by this approach matched well (R: 0.724-0.914, MBE: -0.203 to -0.169 for LW 13; R: 0.343-0.865, MBE: -0.165 to -0.122 for LW 21) with the in situ local scale soil

  3. A New Classification Method to Overcome Over-Branching

    Institute of Scientific and Technical Information of China (English)

    ZHOU Aoying(周傲英); QIAN Weining(钱卫宁); QIAN Hailei(钱海蕾); JIN Wen(金文)

    2002-01-01

Classification is an important technique in data mining. The decision trees built by most existing classification algorithms commonly suffer from over-branching, which leads to poor efficiency in the subsequent classification period. In this paper, we present a new value-oriented classification method which aims at building accurate, properly sized decision trees while reducing over-branching as much as possible, based on the concepts of frequent-pattern-node and exceptive-child-node. The experiments show that, using relevance analysis as pre-processing, our classification method can, without loss of accuracy, greatly reduce over-branching in decision trees more effectively and efficiently than other algorithms.

  4. ASSESSMENT OF THE SFIM ALGORITHM

    Institute of Scientific and Technical Information of China (English)

    XU Han-qiu

    2004-01-01

Fusion of images with different spatial and spectral resolutions can improve the visualization of the images. Many fusion techniques have been developed to improve the spectral fidelity and/or spatial texture quality of fused imagery. Of them, a recently proposed algorithm, the SFIM (Smoothing Filter-based Intensity Modulation), is known for its high spectral fidelity and simplicity. However, the study and evaluation of the algorithm have so far been based only on spectral and spatial criteria. Therefore, this paper aims to further study the classification accuracy of SFIM-fused imagery. Three other simple fusion algorithms, High-Pass Filter (HPF), Multiplication (MLT), and Modified Brovey (MB), have been employed for further evaluation of the SFIM. The study is based on a Landsat-7 ETM+ sub-scene covering the urban fringe of southeastern Fuzhou City, China. The effectiveness of the algorithm has been evaluated on the basis of spectral fidelity, high spatial frequency information absorption, and classification accuracy. The study reveals that the difference in smoothing-filter kernel sizes used in producing the SFIM-fused images can affect the classification accuracy. Compared with the three other algorithms, the SFIM transform is the best method for retaining the spectral information of the original image and for obtaining the best classification results.
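
The SFIM transform itself is compact: each fused pixel is the low-resolution band value modulated by the ratio of the co-registered panchromatic pixel to its local mean, so spatial detail comes from the pan/smoothed-pan ratio while spectral values come from the multispectral band. A small sketch under that standard formulation (pure Python with a box mean filter; the 2x2 example data are invented):

```python
def mean_filter(img, k=3):
    """Box (mean) filter over a 2-D list, clipping the window at the edges."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [img[ii][jj]
                    for ii in range(max(0, i - r), min(h, i + r + 1))
                    for jj in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out

def sfim_fuse(ms, pan, k=3):
    """SFIM: fused = ms * pan / local_mean(pan). The kernel size k is the
    parameter whose choice the study found to affect classification accuracy."""
    low = mean_filter(pan, k)
    return [[ms[i][j] * pan[i][j] / low[i][j] for j in range(len(ms[0]))]
            for i in range(len(ms))]

ms = [[1.0, 2.0], [3.0, 4.0]]      # low-resolution multispectral band (invented)
pan = [[4.0, 4.0], [4.0, 4.0]]     # flat panchromatic band: no extra detail
fused = sfim_fuse(ms, pan)
```

A flat panchromatic band contributes no spatial detail, so the fused result reproduces the multispectral band exactly, which is the spectral-fidelity property the record highlights.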

  5. Tissue Classification

    DEFF Research Database (Denmark)

    Van Leemput, Koen; Puonti, Oula

    2015-01-01

Computational methods for automatically segmenting magnetic resonance images of the brain have seen tremendous advances in recent years. So-called tissue classification techniques, aimed at extracting the three main brain tissue classes (white matter, gray matter, and cerebrospinal fluid), are now well established. In their simplest form, these methods classify voxels independently based on their intensity alone, although much more sophisticated models are typically used in practice. This article aims to give an overview of often-used computational techniques for brain tissue classification...

  6. Two Linear Unmixing Algorithms to Recognize Targets Using Supervised Classification and Orthogonal Rotation in Airborne Hyperspectral Images

    Directory of Open Access Journals (Sweden)

    Michael Zheludev

    2012-02-01

The goal of the paper is to detect pixels that contain targets of known spectra. The target can be present at sub-pixel size or fill more than a pixel. Pixels without targets are classified as background pixels. Each pixel is treated via the content of its neighborhood. A pixel whose spectrum differs from its neighborhood is classified as a "suspicious point". In each suspicious point there is a mix of target(s) and background. The main objective in supervised detection (also called "target detection") is to search for a specific given spectral material (target) in hyperspectral imaging (HSI), where the spectral signature of the target is known a priori from laboratory measurements. In addition, the fractional abundance of the target is computed. To achieve this we present two linear unmixing algorithms that recognize targets with known (given) spectral signatures. The CLUN is based on automatic feature extraction from the target's spectrum; these features separate the target from the background. The ROTU algorithm is based on embedding the spectra space into a special space by random orthogonal transformation and on the statistical properties of the embedded result. Experimental results demonstrate that the targets' locations were extracted correctly and that these algorithms are robust and efficient.
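
For the fractional-abundance part, a two-endmember linear mixing model pixel ≈ a·target + (1-a)·background admits a closed-form least-squares estimate of a. A sketch of that step only (illustrative; the paper's CLUN and ROTU algorithms are considerably more involved, and the spectra below are invented):

```python
def fractional_abundance(pixel, target, background):
    """Least-squares estimate of the target fraction a in the linear mixing
    model pixel ~= a*target + (1-a)*background (all arguments are spectra).
    Rearranged: (pixel - background) ~= a*(target - background), solved for a."""
    num = sum((p - b) * (t - b) for p, t, b in zip(pixel, target, background))
    den = sum((t - b) ** 2 for t, b in zip(target, background))
    return num / den

# Invented endmember spectra and a synthetic 30% target / 70% background mix.
target = [1.0, 0.2, 0.8]
background = [0.1, 0.5, 0.3]
mixed = [0.3 * t + 0.7 * b for t, b in zip(target, background)]
```

On noise-free data the estimator recovers the mixing fraction exactly; with noise it returns the least-squares projection onto the target-background axis.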

  7. KNN Text Classification Algorithm Based on Chaotic Binary Particle Swarm Optimization%基于混沌二进制粒子群优化的KNN文本分类算法

    Institute of Scientific and Technical Information of China (English)

    徐辉

    2012-01-01

The main problem of Chinese text classification is the high dimensionality of the feature space. A KNN text classification algorithm based on chaotic binary particle swarm optimization is proposed. It uses the chaotic binary particle swarm algorithm to traverse the feature space of the training set and select a feature subspace, and then applies the KNN algorithm to classify texts within that subspace. During the swarm's iterative optimization, a chaotic map guides the swarm's search so that the algorithm escapes local optima, extending its ability to find the global optimum. Experimental results show that the proposed algorithm is effective for Chinese text classification, with classification accuracy and recall both better than those of the plain KNN algorithm.

  8. Automated classification of seismic sources in large database using random forest algorithm: First results at Piton de la Fournaise volcano (La Réunion).

    Science.gov (United States)

    Hibert, Clément; Provost, Floriane; Malet, Jean-Philippe; Stumpf, André; Maggi, Alessia; Ferrazzini, Valérie

    2016-04-01

In the past decades, the increasing quality of seismic sensors and the capability to transfer large quantities of data remotely have led to a fast densification of local, regional and global seismic networks for near real-time monitoring. This technological advance permits the use of seismology to document geological and natural/anthropogenic processes (volcanoes, ice calving, landslides, snow and rock avalanches, geothermal fields), but has also led to an ever-growing quantity of seismic data. This wealth of seismic data makes the construction of complete seismicity catalogs, which include earthquakes but also other sources of seismic waves, more challenging and very time-consuming, as this critical pre-processing stage is classically done by human operators. To overcome this issue, the development of automatic methods for processing continuous seismic data appears to be a necessity. The classification algorithm should satisfy the need for a method that is robust, precise and versatile enough to be deployed to monitor seismicity in very different contexts. We propose a multi-class detection method based on the random forests algorithm to automatically classify the sources of seismic signals. Random forests is a supervised machine learning technique based on the computation of a large number of decision trees, constructed from training sets in which each signal is described by a set of attributes and labeled with one of the target classes. In the case of seismic signals, these attributes may encompass spectral features but also waveform characteristics, multi-station observations and other relevant information. The random forests classifier is used because it provides state-of-the-art performance when compared with other machine learning techniques (e.g. SVM, neural networks) and requires no fine tuning. Furthermore it is relatively fast, robust, easy to parallelize, and inherently suitable for multi-class problems.
In this work, we present the first results of the classification method applied

  9. Algorithm for Chinese short-text classification using concept description%使用概念描述的中文短文本分类算法

    Institute of Scientific and Technical Information of China (English)

    杨天平; 朱征宇

    2012-01-01

In order to solve the problem that traditional classification of short texts is not very satisfactory because short texts have few features, an algorithm using concept description is presented. First, a global semantic concept word list is built. Then the test set and training set are conceptualized with this word list, so that each test short text is combined with training short texts having similar concept descriptions into a longer test text, while short texts within the training set are likewise combined into longer training texts. Finally, the long texts are classified with a traditional classification algorithm. Experiments show that the proposed method can efficiently mine the implicit semantic information in short texts, adequately expand them semantically, and improve the accuracy of short-text classification.

10. Development of visible/infrared/microwave agriculture classification and biomass estimation algorithms. [Guymon, Oklahoma and Dalhart, Texas

    Science.gov (United States)

    Rosenthal, W. D.; Mcfarland, M. J.; Theis, S. W.; Jones, C. L. (Principal Investigator)

    1982-01-01

Agricultural crop classification models using two or more spectral regions (visible through microwave) are considered in an effort to estimate biomass at Guymon, Oklahoma and Dalhart, Texas. Both ground truth and aerial data were used. Results indicate that inclusion of C-, L-, and P-band active microwave data, from look angles greater than 35 deg from nadir, with visible and infrared data improves crop discrimination and biomass estimates compared to results using only visible and infrared data. The microwave frequencies were sensitive to different biomass levels: the K and C bands were sensitive to differences at low biomass levels, while the P band was sensitive to differences at high biomass levels. Two indices, one using only active microwave data and the other using data from the middle and near infrared bands, were well correlated with total biomass. It is implied that inclusion of active microwave sensors with visible and infrared sensors on future satellites could aid in crop discrimination and biomass estimation.

  11. Classification of bladder cancer cell lines using Raman spectroscopy: a comparison of excitation wavelength, sample substrate and statistical algorithms

    Science.gov (United States)

    Kerr, Laura T.; Adams, Aine; O'Dea, Shirley; Domijan, Katarina; Cullen, Ivor; Hennelly, Bryan M.

    2014-05-01

Raman microspectroscopy can be applied to the urinary bladder for highly accurate classification and diagnosis of bladder cancer. This technique can be applied in vitro to bladder epithelial cells obtained from urine cytology or in vivo as an "optical biopsy" to provide results in real time with higher sensitivity and specificity than current clinical methods. However, there exists a high degree of variability across experimental parameters which needs to be standardised before this technique can be utilized in an everyday clinical environment. In this study, we investigate different laser wavelengths (473 nm and 532 nm), sample substrates (glass, fused silica and calcium fluoride) and multivariate statistical methods in order to gain insight into how these various experimental parameters impact the sensitivity and specificity of Raman cytology.

  12. Development of visible/infrared/microwave agriculture classification and biomass estimation algorithms, volume 2. [Oklahoma and Texas

    Science.gov (United States)

    Rosenthal, W. D.; Mcfarland, M. J.; Theis, S. W.; Jones, C. L. (Principal Investigator)

    1982-01-01

    Agricultural crop classification models using two or more spectral regions (visible through microwave) were developed and tested and biomass was estimated by including microwave with visible and infrared data. The study was conducted at Guymon, Oklahoma and Dalhart, Texas utilizing aircraft multispectral data and ground truth soil moisture and biomass information. Results indicate that inclusion of C, L, and P band active microwave data from look angles greater than 35 deg from nadir with visible and infrared data improved crop discrimination and biomass estimates compared to results using only visible and infrared data. The active microwave frequencies were sensitive to different biomass levels. In addition, two indices, one using only active microwave data and the other using data from the middle and near infrared bands, were well correlated to total biomass.

  13. Automated classifications of topography from DEMs by an unsupervised nested-means algorithm and a three-part geometric signature

    Science.gov (United States)

    Iwahashi, J.; Pike, R.J.

    2007-01-01

An iterative procedure that implements the classification of continuous topography as a problem in digital image processing automatically divides an area into categories of surface form; three taxonomic criteria (slope gradient, local convexity, and surface texture) are calculated from a square-grid digital elevation model (DEM). The sequence of programmed operations combines twofold-partitioned maps of the three variables converted to greyscale images, using the mean of each variable as the dividing threshold. To subdivide increasingly subtle topography, grid cells sloping at less than the mean gradient of the input DEM are classified by designating mean values of successively lower-sloping subsets of the study area (nested means) as taxonomic thresholds, thereby increasing the number of output categories from the minimum 8 to 12 or 16. Program output is exemplified by 16 topographic types for the world at 1-km spatial resolution (SRTM30 data), the Japanese Islands at 270 m, and part of Hokkaido at 55 m. Because the procedure is unsupervised and reflects frequency distributions of the input variables rather than pre-set criteria, the resulting classes are undefined and must be calibrated empirically by subsequent analysis. Maps of the example classifications reflect physiographic regions, geological structure, and landform as well as slope materials and processes; fine-textured terrain categories tend to correlate with erosional topography or older surfaces, coarse-textured classes with areas of little dissection. In Japan the resulting classes approximate landform types mapped from airphoto analysis, while in the Americas they create map patterns resembling Hammond's terrain types or surface-form classes; SRTM30 output for the United States compares favorably with Fenneman's physical divisions. Experiments are suggested for further developing the method; the Arc/Info AML and the map of terrain classes for the world are available as online downloads.
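
The first, eightfold partition can be sketched as follows (the 8-class pass only; the nested-means refinement to 12 or 16 classes repeats the same mean-thresholding on successively lower-sloping subsets; the cell values below are invented):

```python
def mean(xs):
    return sum(xs) / len(xs)

def eightfold_classes(cells):
    """cells: list of (slope, convexity, texture) tuples, one per grid cell.
    Threshold each variable at its mean over the study area and combine the
    three binary maps into one of 8 terrain classes (a 3-bit code)."""
    thresholds = [mean([c[i] for c in cells]) for i in range(3)]
    return [sum(1 << i for i in range(3) if c[i] >= thresholds[i])
            for c in cells]

# Invented toy grid: means are (2.0, 2.0, 4.0), giving codes 0, 5 and 7.
cells = [(0.0, 0.0, 0.0), (2.0, 0.0, 4.0), (4.0, 6.0, 8.0)]
```

Because the thresholds are the data's own means rather than pre-set values, the resulting class codes are unlabeled and, as the record notes, must be calibrated empirically afterwards.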

  14. Identification of Data Fragment Classification Algorithm Based on PCA-LDA and KNN-SMO%基于 PCA-LDA 和 KNN-SMO 的数据碎片分类识别算法

    Institute of Scientific and Technical Information of China (English)

    傅德胜; 经正俊

    2015-01-01

In the field of computer forensics, forensic analysis of data fragments has become an important means of obtaining digital evidence. Aiming at the problem of data fragment forensics, this paper proposes a novel content-feature-based algorithm for identifying data fragment types. First, it performs block-wise principal component analysis (PCA) on the data fragment; second, it applies linear discriminant analysis (LDA) to the PCA feature vectors to obtain combined feature vectors; finally, it identifies the type of the data fragment from the combined feature vector using a fusion classifier composed of the k-nearest-neighbor (KNN) algorithm and the sequential minimal optimization (SMO) algorithm. Experimental results show that, compared with related algorithms, the proposed algorithm achieves better identification accuracy and identification speed.

  15. An Efficient, Scalable Time-Frequency Method for Tracking Energy Usage of Domestic Appliances Using a Two-Step Classification Algorithm

    Directory of Open Access Journals (Sweden)

    Paula Meehan

    2014-10-01

Load monitoring is the practice of measuring electrical signals in a domestic environment in order to identify which electrical appliances are consuming power. One reason for developing a load monitoring system is to reduce power consumption by increasing consumers' awareness of which appliances consume the most energy. Another example of an application of load monitoring is activity sensing in the home for the provision of healthcare services. This paper outlines the development of a load disaggregation method that measures the aggregate electrical signals of a domestic environment and extracts features to identify each power-consuming appliance. A single sensor is deployed at the main incoming power point to sample the aggregate current signal. The method senses when an appliance switches ON or OFF and uses a two-step classification algorithm to identify which appliance caused the event. Parameters from the current in the temporal and frequency domains are used as features to define each appliance: the steady-state current harmonics and the rate of change of the transient signal. Each appliance's electrical characteristics are distinguishable using these parameters. An appliance falls into one of three load types: linear non-reactive, linear reactive, or nonlinear reactive. It has been found that by identifying the load type first and then using a second classifier to identify individual appliances within that type, the overall accuracy of the identification algorithm is improved.
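
The two-step idea, classify the load type first and then search only among appliances of that type, can be sketched as below. The features (power factor, total harmonic distortion), thresholds and appliance signatures here are hypothetical stand-ins, not taken from the paper:

```python
def load_type(power_factor, thd):
    """Step 1: coarse load type from two illustrative features.
    The thresholds are hypothetical, not the paper's values."""
    if thd > 0.2:
        return "nonlinear-reactive"
    return "linear-reactive" if power_factor < 0.95 else "linear-nonreactive"

def identify(event, signatures):
    """Step 2: nearest neighbour, but only among appliances whose stored
    load type matches the event's type (the two-step idea)."""
    pf, thd = event
    t = load_type(pf, thd)
    candidates = [(name, f) for name, (typ, f) in signatures.items() if typ == t]
    return min(candidates,
               key=lambda nf: (nf[1][0] - pf) ** 2 + (nf[1][1] - thd) ** 2)[0]

# Hypothetical appliance signature database: name -> (load type, feature vector).
signatures = {
    "kettle": ("linear-nonreactive", (1.00, 0.02)),
    "fridge": ("linear-reactive",    (0.80, 0.05)),
    "laptop": ("nonlinear-reactive", (0.60, 0.45)),
}
```

Restricting the second classifier to one load type shrinks the candidate set, which is one plausible reason the paper reports improved overall identification accuracy.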

  16. 基于K-均值聚类的小样本集KNN分类算法%KNN CLASSIFICATION ALGORITHM FOR SMALL SAMPLE SETS BASED ON K-MEANS CLUSTERING

    Institute of Scientific and Technical Information of China (English)

    刘应东; 牛惠民

    2011-01-01

When KNN and its improved algorithms perform classification, accuracy suffers if the samples are too dense or too few, or if the density differences among the classes are too large. This paper proposes a small-sample-set KNN classification algorithm based on clustering. Through clustering and pruning, a new sample set is generated in which the densities of the various classes are close; this new sample set is then used to classify and label data objects whose class labels are unknown. Tests on standard data sets show that the algorithm can improve KNN classification accuracy and obtain satisfactory results.
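
The clustering-then-KNN idea can be sketched as follows: each class's samples are condensed to a few k-means prototypes so every class contributes representatives of comparable density, and unknown objects are then labeled by the nearest prototype. An illustrative reconstruction (the data, k, and 1-NN choice are invented, not the paper's settings):

```python
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm returning k cluster centers."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda c: dist2(p, centers[c]))].append(p)
        centers = [tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers

def condense(training, k=2):
    """Replace each class's raw samples by k cluster prototypes, so every
    class contributes a comparably dense set of representatives."""
    return [(c, label) for label, pts in training.items() for c in kmeans(pts, k)]

def classify(x, prototypes):
    """Label an unknown object by its nearest prototype (1-NN)."""
    return min(prototypes, key=lambda pl: dist2(x, pl[0]))[1]

# Invented two-class training data with well-separated clusters.
training = {
    "A": [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2)],
    "B": [(5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3)],
}
protos = condense(training, k=2)
```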

  17. 基于随机森林算法的B2B客户分级系统的设计%Design of B2B client classification system based on random forest algorithm

    Institute of Scientific and Technical Information of China (English)

    李军

    2015-01-01

This paper studies classification data mining algorithms. The random forest algorithm offers high accuracy, fast training, and support for online learning, so it is adopted in the system. Since the random forest algorithm's noise resistance is only moderate, the bagging method is used to randomly select several groups of historical customer grading data as the algorithm's training data; a grading model is then trained with the random forest algorithm, and new customer data are graded automatically by this model.

  18. Application of the Honeybee Mating Optimization Algorithm to Patent Document Classification in Combination with the Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Chui-Yu Chiu

    2013-08-01

Patent rights have the property of exclusiveness. Inventors can protect their rights within the legal range and hold a monopoly on their published inventions; others may not use an invention before the inventors permit it. Companies try to avoid research and development investment in inventions that are already protected by patents, and patent retrieval and categorization technologies are used to uncover patent information to reduce the cost of torts. In this research, we propose a novel method that integrates the Honey-Bee Mating Optimization algorithm with Support Vector Machines for patent categorization. First, the CKIP method is utilized to extract phrases from the patent summary and title. Then we calculate the probability that a specific key phrase contains a certain concept based on the Term Frequency-Inverse Document Frequency (TF-IDF) method. By combining the frequencies and probabilities of key phrases generated using the Honey-Bee Mating Optimization algorithm, our proposed method is expected to obtain better representative input values for the SVM model. Finally, this research uses patents from Chemical Mechanical Polishing (CMP) as case examples to illustrate and demonstrate the superior results produced by the proposed methodology.
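
The TF-IDF weighting step mentioned above is standard and small enough to sketch (the token lists are illustrative; the paper's CKIP phrase extraction and the honey-bee weighting on top of TF-IDF are not reproduced here):

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document,
    with tf = count / len(doc) and idf = log(N / document frequency)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))   # document frequency per term
    out = []
    for d in docs:
        tf = Counter(d)
        out.append({t: (c / len(d)) * math.log(n / df[t]) for t, c in tf.items()})
    return out

# Invented CMP-flavored toy corpus: "cmp" occurs in every document.
docs = [["cmp", "polish", "wafer"],
        ["cmp", "wafer", "pad", "pad"],
        ["cmp", "slurry", "pad"]]
weights = tf_idf(docs)
```

A term present in every document gets idf = log(1) = 0, so it carries no weight; within one document, a repeated term outweighs a term of equal document frequency that occurs once.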

  19. ESVM Algorithm in Transfer Learning Data Classification%迁移学习数据分类中的ESVM算法

    Institute of Scientific and Technical Information of China (English)

    张建军; 王士同; 王骏

    2012-01-01

In transfer learning, noise can make classification results unreasonable when classifying a dataset that has changed. An algorithm called Extended Support Vector Machine (ESVM) is proposed to solve this problem. It uses the probability distribution of the original dataset and the learning experience gained from it to guide the classification of the slowly changed dataset, so that the separating surface both accurately partitions the current dataset and retains some properties of the original one. Because it makes full use of this prior information, ESVM can correctly classify the changing dataset while inheriting characteristics from the previous dataset. Experimental results show the anti-noise performance of the algorithm.

  20. Classification of Non-Small Cell Lung Cancer Using Significance Analysis of Microarray-Gene Set Reduction Algorithm

    Directory of Open Access Journals (Sweden)

    Lei Zhang

    2016-01-01

Among non-small cell lung cancers (NSCLC), adenocarcinoma (AC) and squamous cell carcinoma (SCC) are the two major histology subtypes, accounting for roughly 40% and 30% of all lung cancer cases, respectively. Since AC and SCC differ in their cell of origin, location within the lung, and growth pattern, they are considered distinct diseases. Gene expression signatures have been demonstrated to be an effective tool for distinguishing AC and SCC. Gene set analysis is generally regarded as irrelevant to the identification of gene expression signatures. Nevertheless, we found that one specific gene set analysis method, significance analysis of microarray-gene set reduction (SAMGSR), can be adopted directly to select relevant features and to construct gene expression signatures. In this study, we applied SAMGSR to an NSCLC gene expression dataset. When compared with several novel feature selection algorithms, for example LASSO, SAMGSR has equivalent or better performance in terms of predictive ability and model parsimony. Therefore, SAMGSR is indeed a feature selection algorithm. Additionally, we applied SAMGSR to the AC and SCC subtypes separately to discriminate their respective stages, that is, stage II versus stage I. The few overlaps between the two resulting gene signatures illustrate that AC and SCC are technically distinct diseases. Therefore, stratified analyses on subtypes are recommended when diagnostic or prognostic signatures of these two NSCLC subtypes are constructed.

  1. Classification of heavy metal ions present in multi-frequency multi-electrode potable water data using evolutionary algorithm

    Science.gov (United States)

    Karkra, Rashmi; Kumar, Prashant; Bansod, Baban K. S.; Bagchi, Sudeshna; Sharma, Pooja; Krishna, C. Rama

    2016-12-01

Access to potable water for the common people is one of the most challenging tasks of the present era. Contamination of drinking water has become a serious problem due to various anthropogenic and geogenic events. The paper demonstrates the application of evolutionary algorithms, viz., particle swarm optimization and genetic algorithm, to 24 water samples containing eight different heavy metal ions (Cd, Cu, Co, Pb, Zn, As, Cr and Ni) for the optimal estimation of electrode and frequency to classify the heavy metal ions. The work has been carried out on multi-variate data, viz., single-electrode multi-frequency, single-frequency multi-electrode and multi-frequency multi-electrode water samples. The electrodes used are platinum, gold, silver nanoparticle and glassy carbon electrodes. The various hazardous metal ions present in the water samples have been optimally classified and validated by application of the Davies-Bouldin index. Such studies are useful in the segregation of hazardous heavy metal ions found in water resources, thereby quantifying the degree of water quality.
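
The Davies-Bouldin index used for validation is straightforward to compute: for each cluster take the worst-case ratio of summed within-cluster scatter to between-centroid distance, then average over clusters (lower is better). A sketch with invented 2-D points (requires Python 3.8+ for math.dist):

```python
import math

def centroid(cluster):
    return tuple(sum(x) / len(cluster) for x in zip(*cluster))

def davies_bouldin(clusters):
    """Mean over clusters i of max_j (s_i + s_j) / d(c_i, c_j), where s_i is
    the mean distance of cluster i's members to their centroid c_i."""
    cents = [centroid(c) for c in clusters]
    scatter = [sum(math.dist(p, c) for p in cl) / len(cl)
               for cl, c in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k

# Two tight, well-separated clusters: scatter 0.5 each, centroids 10 apart,
# so the index is (0.5 + 0.5) / 10 = 0.1.
a = [(0.0, 0.0), (0.0, 1.0)]
b = [(10.0, 0.0), (10.0, 1.0)]
db = davies_bouldin([a, b])
```

A lower index indicates compact, well-separated clusters, which is the criterion by which the electrode/frequency combinations are validated in the record.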

  2. Audio Classification from Time-Frequency Texture

    CERN Document Server

    Yu, Guoshen

    2008-01-01

    Time-frequency representations of audio signals often resemble texture images. This paper derives a simple audio classification algorithm based on treating sound spectrograms as texture images. The algorithm is inspired by an earlier visual classification scheme particularly efficient at classifying textures. While solely based on time-frequency texture features, the algorithm achieves surprisingly good performance in musical instrument classification experiments.

  3. 基于朴素贝叶斯与ID3算法的决策树分类%Decision Tree Classification Based on Naive Bayesian and ID3 Algorithm

    Institute of Scientific and Technical Information of China (English)

    黄宇达; 王迤冉

    2012-01-01

This paper proposes an improved decision tree classification algorithm based on the naive Bayes and ID3 algorithms. It introduces an objective attribute-importance parameter, adopts a weakened form of the naive Bayes conditional independence assumption, and uses weighted independent information entropy as the criterion for selecting splitting attributes. Theoretical analysis and experimental results show that the improved algorithm overcomes, to a certain extent, ID3's bias toward multi-valued attributes, and offers high execution efficiency and classification accuracy.
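
The entropy-based attribute selection that ID3 performs, plus an importance-weighted variant of the kind the record describes, can be sketched as below. The weighting scheme shown (a simple multiplicative importance factor) is an illustrative guess at the idea, not the paper's exact formula:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, labels):
    """Classic ID3 information gain of splitting on column `attr`."""
    n = len(rows)
    base = entropy(labels)
    split = 0.0
    for v in set(r[attr] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[attr] == v]
        split += len(idx) / n * entropy([labels[i] for i in idx])
    return base - split

def weighted_gain(rows, attr, labels, importance):
    """Illustrative variant: scale the gain by an objective attribute-importance
    weight, damping ID3's bias toward many-valued attributes."""
    return importance[attr] * info_gain(rows, attr, labels)

# Invented toy data: attribute 0 (weather) predicts the label perfectly,
# attribute 1 (humidity) is uninformative.
rows = [("sunny", "high"), ("sunny", "low"), ("rain", "high"), ("rain", "low")]
labels = ["no", "no", "yes", "yes"]
```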

  4. A Comparison of Machine Learning Algorithms for Chemical Toxicity Classification Using a Simulated Multi-Scale Data Model

    Science.gov (United States)

    Bioactivity profiling using high-throughput in vitro assays can reduce the cost and time required for toxicological screening of environmental chemicals and can also reduce the need for animal testing. Several public efforts are aimed at discovering patterns or classifiers in hig...

  5. Towards automatic classification of all WISE sources

    CERN Document Server

    Kurcz, Agnieszka; Solarz, Aleksandra; Krupa, Magdalena; Pollo, Agnieszka; Małek, Katarzyna

    2016-01-01

    The WISE satellite has detected hundreds of millions sources over the entire sky. Classifying them reliably is however a challenging task due to degeneracies in WISE multicolour space and low levels of detection in its two longest-wavelength bandpasses. Here we aim at obtaining comprehensive and reliable star, galaxy and quasar catalogues based on automatic source classification in full-sky WISE data. This means that the final classification will employ only parameters available from WISE itself, in particular those reliably measured for a majority of sources. For the automatic classification we applied the support vector machines (SVM) algorithm, which requires a training sample with relevant classes already identified, and we chose to use the SDSS spectroscopic dataset for that purpose. By calibrating the classifier on the test data drawn from SDSS, we first established that a polynomial kernel is preferred over a radial one for this particular dataset. Next, using three classification parameters (W1 magnit...
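
The record's classifier is an SVM with a polynomial kernel; as a self-contained stand-in, a kernel perceptron with the same kind of kernel illustrates why a polynomial kernel separates classes that are not linearly separable (XOR here; a real SVM would add margin maximization on top of the kernel trick):

```python
def poly_kernel(x, y, degree=2, c=1.0):
    """Polynomial kernel (x . y + c)^degree."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** degree

def train_kernel_perceptron(X, y, kernel, max_epochs=1000):
    """Dual perceptron: alpha[j] counts the mistakes made on training point j.
    Stops after the first mistake-free pass (guaranteed for separable data)."""
    alpha = [0] * len(X)
    for _ in range(max_epochs):
        mistakes = 0
        for j, (xj, yj) in enumerate(zip(X, y)):
            s = sum(a * yi * kernel(xi, xj) for a, yi, xi in zip(alpha, y, X))
            if (1 if s > 0 else -1) != yj:
                alpha[j] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

def predict(x, X, y, alpha, kernel):
    s = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X))
    return 1 if s > 0 else -1

# XOR: not linearly separable, but separable with a degree-2 polynomial kernel.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, 1, 1, -1]
alpha = train_kernel_perceptron(X, y, poly_kernel)
```

A radial (RBF) kernel could be swapped in by changing only the kernel function, which mirrors the kernel comparison the record describes.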

  6. A Scalable IP Packet Classification Algorithm Using Indexed Pointers%一种基于索引指针的可扩展IP包分类算法

    Institute of Scientific and Technical Information of China (English)

    李金库; 马建峰; 张德运

    2012-01-01

This paper proposes and implements a scalable IP packet classification algorithm using indexed pointers. Based on the distribution of the source port, destination port, and protocol type fields in real applications, the algorithm maps these three fields to an eight-bit value, compressing the classification dimensions, and divides the whole rule set into 256 subsets accordingly. Each subset is assigned an indexed pointer that points to the starting address of its storage space. By computing the information entropy of each bit of the combined source/destination IP address field, the algorithm finds the best bit sequence to serve as root and child nodes and builds a trie lookup tree for each rule subset. This minimizes storage space and lookup time, and no backtracking is needed. Experimental results indicate that the new algorithm is highly efficient.
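
The field-compression idea, mapping the three header fields to one 8-bit bucket index with a pointer per bucket, can be sketched as below. The abstract does not give the exact mapping, so the bit allocation here (port range category and protocol category) is hypothetical:

```python
# Hypothetical 8-bit compression of (src port, dst port, protocol): each field
# contributes a few bits based on common-case categories, so related rules land
# in the same one of up to 256 buckets.

def port_bits(port):
    if port < 1024:
        return 0            # well-known ports
    if port < 49152:
        return 1            # registered ports
    return 2                # dynamic/private ports

def proto_bits(proto):
    return {6: 0, 17: 1}.get(proto, 2)   # TCP, UDP, other

def rule_index(src_port, dst_port, proto):
    """Pack the three categorised fields into one 8-bit bucket index."""
    return (port_bits(src_port) << 6) | (port_bits(dst_port) << 3) | proto_bits(proto)

# bucket index -> list of rules, i.e. the indexed rule subsets.
buckets = {}

def add_rule(rule):
    """rule: (src_port, dst_port, proto, action)."""
    buckets.setdefault(rule_index(*rule[:3]), []).append(rule)

add_rule((80, 50000, 6, "allow"))   # well-known src, dynamic dst, TCP
```

Lookup then dispatches each packet to a single small bucket (in the paper, the per-bucket search is an entropy-optimized trie rather than a list).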

  7. Research on Classification of Bank Customers Based on Adaptive GA-BP Algorithm

    Institute of Scientific and Technical Information of China (English)

    汤亚玲; 黄华; 程泽凯

    2014-01-01

    Bank marketing targets the majority of customers. If high-quality customers can be identified in advance and a reasonable marketing strategy developed for them, a bank can achieve greater competitiveness. This paper combines a genetic algorithm with a BP neural network to classify bank customers and predict whether they will buy the bank's products. The method effectively overcomes the shortcomings of BP networks, namely trapping in local minima and slow training speed. Aiming at the computation time and accuracy of the genetic algorithm, a new adaptive GA-BP algorithm is proposed. Experimental results show that the adaptive GA-BP algorithm reaches higher prediction accuracy with a shorter calculation time and can classify bank customers accurately.

  8. An automated sleep-state classification algorithm for quantifying sleep timing and sleep-dependent dynamics of electroencephalographic and cerebral metabolic parameters

    Directory of Open Access Journals (Sweden)

    Rempe MJ

    2015-09-01

    Full Text Available Michael J Rempe,1,2 William C Clegern,2 Jonathan P Wisor2 1Mathematics and Computer Science, Whitworth University, Spokane, WA, USA; 2College of Medical Sciences and Sleep and Performance Research Center, Washington State University, Spokane, WA, USA. Introduction: Rodent sleep research uses electroencephalography (EEG) and electromyography (EMG) to determine the sleep state of an animal at any given time. EEG and EMG signals, typically sampled at >100 Hz, are segmented arbitrarily into epochs of equal duration (usually 2–10 seconds), and each epoch is scored as wake, slow-wave sleep (SWS), or rapid-eye-movement sleep (REMS), on the basis of visual inspection. Automated state scoring can minimize the burden associated with state scoring and thereby facilitate the use of shorter epoch durations. Methods: We developed a semiautomated state-scoring procedure that uses a combination of principal component analysis and naïve Bayes classification, with the EEG and EMG as inputs. We validated this algorithm against human-scored sleep-state scoring of data from C57BL/6J and BALB/CJ mice. We then applied a general homeostatic model to characterize the state-dependent dynamics of sleep slow-wave activity and cerebral glycolytic flux, measured as lactate concentration. Results: More than 89% of epochs scored as wake or SWS by the human were scored as the same state by the machine, whether scoring in 2-second or 10-second epochs. The majority of epochs scored as REMS by the human were also scored as REMS by the machine. However, of epochs scored as REMS by the human, more than 10% were scored as SWS by the machine and 18% (10-second epochs) to 28% (2-second epochs) were scored as wake. These biases were not strain-specific, as strain differences in sleep-state timing relative to the light/dark cycle, EEG power spectral profiles, and the homeostatic dynamics of both slow waves and lactate were detected equally effectively with the automated method or the manual scoring.
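The PCA + naïve Bayes combination at the core of the scoring procedure can be sketched as a scikit-learn pipeline; the synthetic "epoch features" below are stand-ins for real EEG/EMG-derived inputs, not the authors' data.

```python
# Hedged sketch of a PCA + Gaussian naive Bayes state-scoring pipeline on
# synthetic two-state "epoch feature" data (illustrative, not real EEG/EMG).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
# two synthetic states (e.g. wake vs SWS) with shifted feature distributions
X = np.vstack([rng.normal(0.0, 1.0, (200, 20)), rng.normal(1.5, 1.0, (200, 20))])
y = np.repeat([0, 1], 200)

# PCA compresses the raw features; naive Bayes scores the state per epoch
model = make_pipeline(PCA(n_components=3), GaussianNB()).fit(X, y)
acc = model.score(X, y)
```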

  9. A constructive algorithm to solve "convex recursive deletion" (CoRD) classification problems via two-layer perceptron networks.

    Science.gov (United States)

    Cabrelli, C; Molter, U; Shonkwiler, R

    2000-01-01

    A sufficient condition that a region be classifiable by a two-layer feedforward neural net (a two-layer perceptron) using threshold activation functions is that either it be a convex polytope, or the intersection of a convex polytope with the complement of a convex polytope in its interior, and so on recursively. These have been called convex recursive deletion (CoRD) regions. We give a simple algorithm for finding the weights and thresholds in both layers for a feedforward net that implements such a region. The results of this work help in understanding the relationship between the decision region of a perceptron and its corresponding geometry in input space. Our construction extends in a simple way to the case that the decision region is the disjoint union of CoRD regions (requiring three layers). Therefore this work also helps in understanding how many neurons are needed in the second layer of a general three-layer network. In the event that the decision region of a network is known and is the union of CoRD regions, our results enable the calculation of the weights and thresholds of the implementing network directly and rapidly without the need for thousands of backpropagation iterations.

  10. Classification and uptake of reservoir operation optimization methods

    Science.gov (United States)

    Dobson, Barnaby; Pianosi, Francesca; Wagener, Thorsten

    2016-04-01

    Reservoir operation optimization algorithms aim to improve the quality of reservoir release and transfer decisions. They achieve this by creating and optimizing the reservoir operating policy; a function that returns decisions based on the current system state. A range of mathematical optimization algorithms and techniques has been applied to the reservoir operation problem of policy optimization. In this work, we propose a classification of reservoir optimization approaches by focusing on the formulation of the water management problem rather than the optimization algorithm type. We believe that decision makers and operators will find it easier to navigate a classification system based on the problem characteristics, something they can clearly define, rather than the optimization algorithm. Part of this study includes an investigation regarding the extent of algorithm uptake and the possible reasons that limit real world application.

  11. APPLYING MULTIDIMENSIONAL PACKET CLASSIFICATION ALGORITHM IN FIREWALL

    Institute of Scientific and Technical Information of China (English)

    夏淑华

    2011-01-01

    With the globalisation of Internet applications, the attendant problems of network information security have affected users' trust in the safety and reliability of Internet services. Firewall technology is currently an important security means of dealing with network security problems. After introducing the classification of firewall technologies, this paper studies the main idea of the AFBV algorithm. To address the excessive time consumption this algorithm may exhibit on large multidimensional rule libraries, we optimise and improve it, effectively overcoming the deficiency of the AFBV algorithm in complex network environments. Comparative experimental results are given through simulation experiments.

  12. Classification of voltage-gated K(+) ion channels from 3D pseudo-folding graph representation of protein sequences using genetic algorithm-optimized support vector machines.

    Science.gov (United States)

    Fernández, Michael; Fernández, Leyden; Abreu, Jose Ignacio; Garriga, Miguel

    2008-06-01

    Voltage-gated K(+) ion channels (VKCs) are membrane proteins that regulate the passage of potassium ions through membranes. This work reports a classification scheme of VKCs according to the signs of three electrophysiological variables: activation threshold voltage (V(t)), half-activation voltage (V(a50)) and half-inactivation voltage (V(h50)). A novel 3D pseudo-folding graph representation of protein sequences encoded the VKC sequences. Amino acid pseudo-folding 3D distances count (AAp3DC) descriptors, calculated from the Euclidean distances matrices (EDMs) were tested for building the classifiers. Genetic algorithm (GA)-optimized support vector machines (SVMs) with a radial basis function (RBF) kernel well discriminated between VKCs having negative and positive/zero V(t), V(a50) and V(h50) values with overall accuracies about 80, 90 and 86%, respectively, in crossvalidation test. We found contributions of the "pseudo-core" and "pseudo-surface" of the 3D pseudo-folded proteins to the discrimination between VKCs according to the three electrophysiological variables.

  13. Multi-Level Audio Classification Architecture

    Directory of Open Access Journals (Sweden)

    Jozef Vavrek

    2015-01-01

    Full Text Available A multi-level classification architecture for solving a binary discrimination problem is proposed in this paper. The main idea of the proposed solution is derived from the fact that solving one binary discrimination problem multiple times can reduce the overall miss-classification error. We aimed our effort towards building a classification architecture employing a combination of multiple binary SVM (Support Vector Machine) classifiers for solving the two-class discrimination problem. Therefore, we developed a binary discrimination architecture employing the SVM classifier (BDASVM), with the intention of using it for the classification of broadcast news (BN) audio data. The fundamental element of BDASVM is the binary decision (BD) algorithm that performs discrimination between each pair of acoustic classes utilizing a decision function modeled by a separating hyperplane. The overall classification accuracy is conditioned by finding the optimal parameters for the discrimination function, resulting in higher computational complexity. The final form of the proposed BDASVM is created by combining four BDSVM discriminators supplemented by a decision table. Experimental results show that the proposed classification architecture can decrease the overall classification error in comparison with the binary decision trees SVM (BDTSVM) architecture.

  14. Two Classification Algorithms of Data Mining Frequently Adopted in Creating Diagnosis Model of Traditional Chinese Medicine

    Institute of Scientific and Technical Information of China (English)

    车立娟; 马利庄

    2013-01-01

    The creation of a diagnosis model of traditional Chinese medicine is, in essence, the classification of traditional Chinese medicine samples. The decision tree and the neural network are two classical data mining classification technologies. This paper introduces these two technologies and their application in creating diagnosis models of traditional Chinese medicine, and analyzes the advantages and disadvantages of each in detail, providing a basis for researchers to choose the appropriate algorithm.
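Of the two techniques the record compares, the decision tree is the more compact to illustrate. The toy binary "symptom" vectors and labels below are invented for illustration, not real TCM data.

```python
# Hedged sketch: a small decision tree on toy binary "symptom" vectors; the
# features and diagnosis labels are invented, purely for illustration.
from sklearn.tree import DecisionTreeClassifier

X = [[1, 0, 1, 0], [1, 1, 1, 0], [0, 0, 1, 1],
     [0, 1, 0, 1], [1, 0, 0, 0], [0, 1, 1, 1]]
y = [0, 0, 1, 1, 0, 1]   # two toy diagnosis classes

# a shallow tree keeps the learned rules human-readable, which is one of the
# advantages the record attributes to decision trees over neural networks
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
pred = int(tree.predict([[1, 0, 1, 0]])[0])
```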

  15. Bus Fare Classification with Genetic Algorithm

    Institute of Scientific and Technical Information of China (English)

    白翰; 刘浩学; 倪怀州; 杨震

    2012-01-01

    A reasonable fare classification strategy can often make bus passengers and operators achieve a "win-win" situation. To realize that, information on passenger choice is collected by an SP survey, the key factors affecting the generalized cost are identified, and the generalized cost function is formulated by statistical analysis. Then a fare classification method based on a bi-level programming model is developed, taking the minimum passenger generalized cost and the maximum operating benefit as objectives. The model is solved by a genetic algorithm, and bus line 2 in Jinan, China is taken as an example to verify its rationality. The result shows the model can both increase the overall utility of bus passengers and raise the operators' revenue.
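The bi-level model itself is beyond a short sketch, but the genetic algorithm used to solve it can be illustrated on a toy fare problem. Everything below — the welfare function, its peak at (2, 4), and the GA settings — is invented for illustration.

```python
# Hedged sketch: a tiny real-coded GA (tournament selection, blend crossover,
# Gaussian mutation) maximising a made-up two-level-fare welfare function.
import numpy as np

def ga_maximize(fitness, lo, hi, pop=30, gens=40, seed=1):
    rng = np.random.default_rng(seed)
    P = rng.uniform(lo, hi, (pop, len(lo)))
    for _ in range(gens):
        f = np.array([fitness(x) for x in P])
        # tournament selection: better of two random individuals survives
        idx = rng.integers(0, pop, (pop, 2))
        winners = np.where(f[idx[:, 0]] > f[idx[:, 1]], idx[:, 0], idx[:, 1])
        parents = P[winners]
        mates = parents[rng.permutation(pop)]
        # blend crossover plus small Gaussian mutation, clipped to bounds
        a = rng.random((pop, 1))
        P = np.clip(a * parents + (1 - a) * mates
                    + rng.normal(0.0, 0.05, parents.shape), lo, hi)
    f = np.array([fitness(x) for x in P])
    return P[f.argmax()]

# toy objective: pick (base fare, long-trip fare) with an invented optimum (2, 4)
lo, hi = np.array([0.0, 0.0]), np.array([5.0, 5.0])
best = ga_maximize(lambda x: -(x[0] - 2) ** 2 - (x[1] - 4) ** 2, lo, hi)
```

In the paper's setting the fitness evaluation would itself solve the lower-level (passenger assignment) problem; here it is a closed-form stand-in.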

  16. A Classification Leveraged Object Detector

    OpenAIRE

    Sun, Miao; Han, Tony X.; He, Zhihai

    2016-01-01

    Currently, the state-of-the-art image classification algorithms outperform the best available object detectors by a big margin in terms of average precision. We, therefore, propose a simple yet principled approach that allows us to leverage object detection through image classification on supporting regions specified by a preliminary object detector. Using a simple bag-of-words model based image classification algorithm, we leveraged the performance of the deformable model object detector from 35.9%...

  17. Classification, disease, and diagnosis.

    Science.gov (United States)

    Jutel, Annemarie

    2011-01-01

    Classification shapes medicine and guides its practice. Understanding classification must be part of the quest to better understand the social context and implications of diagnosis. Classifications are part of the human work that provides a foundation for the recognition and study of illness: deciding how the vast expanse of nature can be partitioned into meaningful chunks, stabilizing and structuring what is otherwise disordered. This article explores the aims of classification, their embodiment in medical diagnosis, and the historical traditions of medical classification. It provides a brief overview of the aims and principles of classification and their relevance to contemporary medicine. It also demonstrates how classifications operate as social framing devices that enable and disable communication, assert and refute authority, and are important items for sociological study.

  18. An Efficient Audio Classification Approach Based on Support Vector Machines

    Directory of Open Access Journals (Sweden)

    Lhoucine Bahatti

    2016-05-01

    Full Text Available In order to achieve an audio classification aimed at identifying the composer, the use of adequate and relevant features is important to improve performance, especially when the classification algorithm is based on support vector machines. As opposed to conventional approaches that often use timbral features based on a time-frequency representation of the musical signal using a constant window, this paper deals with a new audio classification method which improves feature extraction according to the Constant Q Transform (CQT) approach and includes original audio features related to the musical context in which the notes appear. The enhancement made by this work also lies in the proposal of an optimal feature selection procedure which combines filter and wrapper strategies. Experimental results show the accuracy and efficiency of the adopted approach in binary classification as well as in multi-class classification.

  19. A comparison of selected classification algorithms for mapping bamboo patches in lower Gangetic plains using very high resolution WorldView 2 imagery

    Science.gov (United States)

    Ghosh, Aniruddha; Joshi, P. K.

    2014-02-01

    Bamboo is used by different communities in India to develop indigenous products, maintain livelihoods and sustain life. The Indian National Bamboo Mission focuses on the evaluation, monitoring and development of bamboo as an important plant resource. Knowledge of the spatial distribution of bamboo therefore becomes necessary in this context. The present study attempts to map bamboo patches using very high resolution (VHR) WorldView 2 (WV 2) imagery in parts of South 24 Parganas, West Bengal, India using both pixel- and object-based approaches. A combined layer of pan-sharpened multi-spectral (MS) bands, the first 3 principal components (PC) of these bands and seven second-order texture measures based on Gray Level Co-occurrence Matrices (GLCM) of the first three PCs was used as the input variables. For pixel-based image analysis (PBIA), recursive feature elimination (RFE) based feature selection was carried out to identify the most important input variables. Results of the feature selection indicate that the 10 most important variables include PC 1, PC 2 and their GLCM mean, along with 6 MS bands. Three different sets of predictor variables (the 5 and 10 most important variables, and all 32 variables) were classified with the Support Vector Machine (SVM) and Random Forest (RF) algorithms. The producer accuracy of bamboo was found to be highest when the 10 most important variables selected by RFE were classified with SVM (82%). However, object-based image analysis (OBIA) achieved higher classification accuracy than PBIA using the same 32 variables, but with fewer training samples. Using an object-based SVM classifier, the producer accuracy of bamboo reached 94%. The significance of this study is that the present framework is capable of accurately identifying bamboo patches as well as detecting other tree species in a tropical region with heterogeneous land use land cover (LULC), which could further aid the mandate of the National Bamboo Mission and related programs.
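The RFE step described above — ranking input variables by repeatedly dropping the weakest — can be sketched with scikit-learn. The synthetic data below, with two deliberately informative features, is an illustrative assumption.

```python
# Hedged sketch of recursive feature elimination (RFE) with a linear SVM,
# mirroring the variable-selection step above on synthetic data where only
# features 0 and 3 carry the class signal.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 8))
y = (X[:, 0] + X[:, 3] > 0).astype(int)   # only features 0 and 3 are informative

# drop one feature at a time by SVM weight magnitude until two remain
rfe = RFE(SVC(kernel="linear"), n_features_to_select=2).fit(X, y)
selected = sorted(np.where(rfe.support_)[0].tolist())
```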

  20. AUTOMATIC REMOTE SENSING IMAGE CLASSIFICATION ALGORITHM BASED ON FCM AND BP NEURAL NETWORK

    Institute of Scientific and Technical Information of China (English)

    黄奇瑞

    2015-01-01

    Unsupervised classification algorithms suffer from low classification accuracy, while the training samples of supervised algorithms must be selected manually and are easily chosen wrongly. To address these problems, an automatic classification algorithm for remote sensing images based on the combination of fuzzy C-means clustering (FCM) and a BP neural network is proposed. First, FCM is used to perform an initial clustering of the image. Then, according to the clustering results, the algorithm automatically selects the pure pixels as training samples and feeds them to the BP network for learning. The BP neural network classifier obtained from the final training is used to classify TM remote sensing images. Experimental results show that the algorithm achieves high classification accuracy and can meet the needs of determining land cover classes at a large scale.
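The FCM step that bootstraps the training samples is compact enough to write directly in NumPy. The two-blob data below is synthetic, and this is the textbook FCM update, not the paper's exact configuration.

```python
# Hedged sketch: minimal fuzzy C-means (the initial-clustering step above).
import numpy as np

def fcm(X, c, m=2.0, iters=100, seed=0):
    """Return (centers, membership matrix U) for fuzzy c-means with fuzzifier m."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]       # weighted class means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / d ** (2.0 / (m - 1.0))                    # standard FCM update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(3.0, 0.3, (50, 2))])
centers, U = fcm(X, c=2)
labels = U.argmax(axis=1)   # pixels with high membership would become samples
```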

  1. Two Projection Pursuit Algorithms for Machine Learning under Non-Stationarity

    CERN Document Server

    Blythe, Duncan A J

    2011-01-01

    This thesis derives, tests and applies two linear projection algorithms for machine learning under non-stationarity. The first finds a direction in a linear space upon which a data set is maximally non-stationary. The second aims to robustify two-way classification against non-stationarity. The algorithm is tested on a key application scenario, namely Brain Computer Interfacing.

  2. Classification algorithm based on extreme learning machine and its application in fault identification of Tennessee Eastman process

    Institute of Scientific and Technical Information of China (English)

    裘日辉; 刘康玲; 谭海龙; 梁军

    2016-01-01

    A new extreme learning machine (ELM) classifier for multi-classification tasks was designed based on the structural features of the ELM classifier, and an improved classification algorithm based on ELM (One-Class-PCA-ELM) was proposed. The algorithm is realized as follows. PCA is used to process the fault data for dimensionality reduction and to remove noise and redundant information. The training data are then split by class, a one-class model is built for each category, and the models are combined into the One-Class-PCA-ELM classification model. Unclassified fault data are fed into the trained One-Class-PCA-ELM model to obtain their class labels, completing the classification. Experimental results show that the proposed algorithm maintains the extremely fast training speed of ELM, and has high classification accuracy and ideal classification stability.
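The core of an ELM — a random hidden layer followed by closed-form least-squares output weights — is compact enough to sketch in NumPy. The binary toy data is synthetic, and the One-Class-PCA-ELM wrapper from the record is not reproduced here.

```python
# Hedged sketch of a basic extreme learning machine: random hidden weights,
# output weights solved in one pseudo-inverse step (no iterative training).
import numpy as np

def elm_train(X, y_onehot, hidden=50, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], hidden))
    b = rng.normal(size=hidden)
    H = np.tanh(X @ W + b)               # random nonlinear feature map
    beta = np.linalg.pinv(H) @ y_onehot  # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return (np.tanh(X @ W + b) @ beta).argmax(axis=1)

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 0.5, (80, 4)), rng.normal(2.0, 0.5, (80, 4))])
y = np.repeat([0, 1], 80)
params = elm_train(X, np.eye(2)[y])
acc = (elm_predict(X, *params) == y).mean()
```

The single pseudo-inverse solve is what gives ELM the fast training speed the record emphasises.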

  3. AIM: Attracting Women into Sciences.

    Science.gov (United States)

    Hartman, Icial S.

    1995-01-01

    Addresses how to attract more college women into the sciences. Attracting Women into Sciences (AIM) is a comprehensive approach that begins with advising, advertising, and ambiguity. The advising process includes dispelling stereotypes and reviewing the options open to a female basic science major. Interaction, involvement and instruction, finding…

  4. Efficient Fingercode Classification

    Science.gov (United States)

    Sun, Hong-Wei; Law, Kwok-Yan; Gollmann, Dieter; Chung, Siu-Leung; Li, Jian-Bin; Sun, Jia-Guang

    In this paper, we present an efficient fingerprint classification algorithm which is an essential component in many critical security application systems, e.g. systems in the e-government and e-finance domains. Fingerprint identification is one of the most important security requirements in homeland security systems such as personnel screening and anti-money laundering. The problem of fingerprint identification involves searching (matching) the fingerprint of a person against each of the fingerprints of all registered persons. To enhance performance and reliability, a common approach is to reduce the search space by first classifying the fingerprints and then performing the search in the respective class. Jain et al. proposed a fingerprint classification algorithm based on a two-stage classifier, which uses a K-nearest neighbor classifier in its first stage. The fingerprint classification algorithm is based on the fingercode representation, an encoding of fingerprints that has been demonstrated to be an effective fingerprint biometric scheme because of its ability to capture both local and global details in a fingerprint image. We enhance this approach by improving the efficiency of the K-nearest neighbor classifier for fingercode-based fingerprint classification. Our research first investigates the various fast search algorithms in vector quantization (VQ) and their potential application in fingerprint classification, and then proposes two efficient algorithms based on the pyramid-based search algorithms in VQ. Experimental results on DB1 of FVC 2004 demonstrate that our algorithms can outperform the full search algorithm and the original pyramid-based search algorithms in terms of computational efficiency without sacrificing accuracy.

  5. Graduates employment classification using data mining approach

    Science.gov (United States)

    Aziz, Mohd Tajul Rizal Ab; Yusof, Yuhanis

    2016-08-01

    Data Mining is a platform to extract hidden knowledge from a collection of data. This study investigates a suitable classification model to classify graduate employment for one of the MARA Professional Colleges (KPM) in Malaysia. The aim is to classify graduates as employed, unemployed or in further study. Five data mining algorithms offered in WEKA were used: Naïve Bayes, Logistic regression, Multilayer perceptron, k-nearest neighbor and Decision tree J48. Based on the obtained results, Logistic regression produces the highest classification accuracy, at 92.5%. This result was obtained using 80% of the data for training and 20% for testing. The produced classification model will benefit the management of the college as it provides insight into the quality of the graduates they produce and how their curriculum can be improved to cater to the needs of the industry.
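A comparison of this kind — several learners, one 80/20 split, accuracies side by side — can be sketched with scikit-learn equivalents of four of the five WEKA algorithms (the dataset below is synthetic, and the multilayer perceptron is omitted for brevity).

```python
# Hedged sketch of the model comparison on synthetic data with the same
# 80/20 train/test split described in the record.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "naive_bayes": GaussianNB(),
    "knn": KNeighborsClassifier(),
    "tree": DecisionTreeClassifier(random_state=0),
}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
```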

  6. AN IMPROVED SVM-BASED MULTI-CLASS CLASSIFICATION ALGORITHM

    Institute of Scientific and Technical Information of China (English)

    赵亮

    2014-01-01

    针对现有支持向量机多类分类算法在分类精度上的不足,提出一种改进的支持向量机决策树多类分类算法。为了最大限度地减少误差积累的影响,该算法利用投影向量的思想作为衡量类分离性的标准,由此构建非平衡决策树,并且在决策树节点处对正负样本选取不同的惩罚因子来处理不平衡数据集的影响,最后引入KNN算法与SVM共同识别数据集。通过在手写体数字识别数据集上的仿真实验,分析比较各种方法,表明该方法能有效提高分类精度。%In light of the deficiency of existing SVM multi-class classification algorithm in classification accuracy, we propose an improved SVM decision tree multi-class classification algorithm.In order to minimise the impact of the error accumulation to greatest extent, the algorithm uses the idea of vector projection as the standard to measure class separation, thus constructs an unbalanced decision tree.Furthermore, it selects different punishment factors from positive and negative samples at the nodes of decision tree to counteract the impact from unbalanced data sets.At last, it introduces KNN to co-recognise the data sets with SVM.Analysing and comparing diffident methods by the simulation experiment on handwritten digit recognition data sets, it is shown that this method can effectively improve the classification accuracy.

  7. Algorithm of remote sensing image classification improved by bands selection and hybrid kernel functions

    Institute of Scientific and Technical Information of China (English)

    徐倩; 何建农

    2012-01-01

    针对遥感图像多波段不易成像、其图像信息冗余不适合图像分类以及传统LMBP算法迭代次数多且分类不够精确的问题,改进了OIF指数和可分性距离公式,分组并选出遥感图像最佳波段组合,并运用改进的LMBP混合核函数算法进行分类.仿真实验表明,改进算法对各波段信息分析更加全面客观,波段选择更加优化;与传统算法相比,网络训练迭代次数有明显减少,分类精度及Kappa系数分别提高了5%和6.625%,遥感图像分类更有效.%As the multi-band of remote sensing image is not easy to imaging, ila redundancy image information is not suitable for image classification, what's more, ihe traditional LMBP algorithm has. large iteration number and classification imprecise problems. This paper improved the formula of the OIF index number and separability distance, separated to chose the best band combination, and then used the LMBP algorithm refinement of hybrid kernel function to classify. The simulation results show that the improved method can analyze information of the bands more comprehensive and objective, comparing with the traditional algorithm,the network training iterations are significantly reduced,the classification accuracy and Kappa coefficient can be increased by 5% and 6. 625% , the classification of remote sensing image more effectively.

  8. The Application of Data Stream Classification Algorithm in Water Quality Environment

    Institute of Scientific and Technical Information of China (English)

    曹红; 郑鑫

    2014-01-01

    In many real-world applications, the characteristics of data streams make it difficult to obtain class labels for all data. To solve the problem of classifying data streams with incomplete class labels, this paper first analyses the influence of the labeled data set on the classification error of semi-supervised classification algorithms based on the cluster assumption. Then, using this error analysis and the characteristics of data streams, it proposes a semi-supervised data stream ensemble classifier algorithm under the cluster assumption (SSDSEC) and discusses the weight setting of the individual classifiers. Finally, simulation experiments verify the effectiveness of the proposed algorithm.

  9. The application of mixed recommendation algorithm with user clustering in the microblog advertisements promotion

    Science.gov (United States)

    Gong, Lina; Xu, Tao; Zhang, Wei; Li, Xuhong; Wang, Xia; Pan, Wenwen

    2017-03-01

    The traditional microblog recommendation algorithm has the problems of low efficiency and modest effect in the era of big data. With the aim of solving these issues, this paper proposed a mixed recommendation algorithm with user clustering. This paper first introduced the situation of the microblog marketing industry. Then, this paper elaborated the user interest modeling process and detailed advertisement recommendation methods. Finally, this paper compared the mixed recommendation algorithm with the traditional classification algorithm and with the mixed recommendation algorithm without user clustering. The results show that the mixed recommendation algorithm with user clustering has good accuracy and recall rate in microblog advertisement promotion.
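The user-clustering idea — group users by interest vectors so recommendations are drawn from a user's cluster rather than the whole population — can be sketched with k-means. The interest vectors and two-group structure below are synthetic assumptions.

```python
# Hedged sketch: clustering synthetic user-interest vectors so that a new
# user can be served ads from their nearest cluster.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(8)
# two synthetic interest groups (e.g. sports-leaning vs fashion-leaning users)
users = np.vstack([rng.normal(0.0, 0.3, (40, 5)), rng.normal(2.0, 0.3, (40, 5))])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(users)

# a new user is assigned to the nearest cluster and would receive that
# cluster's aggregate ad profile
new_user = rng.normal(2.0, 0.3, (1, 5))
cluster = int(km.predict(new_user)[0])
```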

  10. China Aims to Promote Import

    Institute of Scientific and Technical Information of China (English)

    Wang Ting

    2010-01-01

    With the theme of "An Opening Market and Global Trade", and aiming to promote communications and exchanges among governments, industries and businesses to achieve mutual benefit and a win-win situation, nearly 300 representatives from the relevant departments of the Chinese government, foreign embassies in China, industrial associations and major enterprises, as well as well-known Chinese and foreign experts and scholars, were invited to take part in the forum and share their views on the Chinese market and foreign trade policies.

  11. A New Chinese Text Classification Algorithm: One Class SVM-KNN

    Institute of Scientific and Technical Information of China (English)

    刘文; 吴陈

    2012-01-01

    Chinese text classification is widely used in databases and search engines. The K-nearest neighbor (KNN) algorithm is a common classification method in Chinese text classification, but KNN must keep all training samples and defers building the classifier until a test sample needs to be classified; it also suffers from class skew and from large storage and computation overheads, which can cause classification deviation. One-class SVM is simple and effective for single-class problems but is not applicable to multi-class classification. Aiming at the defects of KNN and the characteristics of one-class SVM, the One Class SVM-KNN algorithm is proposed, and its definition and a detailed analysis are given. Experiments show that this method overcomes the defects of KNN well, and that its recall and precision are obviously better than those of the K-nearest neighbor algorithm.
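One plausible reading of the combination is: a one-class SVM per category accepts samples it recognises, and ambiguous samples fall back to a KNN vote. The sketch below follows that reading on synthetic 2-D data; the acceptance rule and data are illustrative assumptions, not the paper's exact scheme.

```python
# Hedged sketch of a One Class SVM + KNN hybrid on synthetic two-class data.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0.0, 0.4, (100, 2)), rng.normal(3.0, 0.4, (100, 2))])
y = np.repeat([0, 1], 100)

# one one-class SVM per category, plus a KNN fallback over all samples
ocsvms = {c: OneClassSVM(gamma="scale", nu=0.1).fit(X[y == c]) for c in (0, 1)}
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

def predict(x):
    accepted = [c for c, m in ocsvms.items() if m.predict([x])[0] == 1]
    if len(accepted) == 1:            # exactly one one-class model accepts
        return accepted[0]
    return int(knn.predict([x])[0])   # ambiguous: defer to the KNN vote

pred = predict([3.1, 2.9])
```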

  12. IMAGE CLASSIFICATION ALGORITHM BASED ON PCA-SIFT FEATURES AND BAYESIAN DECISION

    Institute of Scientific and Technical Information of China (English)

    涂秋洁; 王晅

    2016-01-01

    To cope with the problems that existing SIFT-based image classification algorithms require a large amount of storage space and are sensitive to image backgrounds, this paper presents a novel image classification algorithm based on PCA-SIFT features and Bayesian decision. The algorithm first applies principal component analysis (PCA) to reduce the dimensionality of the SIFT descriptors from 128 to 36. In the training process, it performs region matching on the PCA-SIFT descriptors of the training sample images. To improve robustness against background interference, the stable PCA-SIFT descriptors in the object images are selected based on their matching rates, and maximum likelihood estimation is then used to estimate the probability distribution parameters. Finally, Bayesian decision theory is used to implement the image classification. Simulation experiments show that this algorithm has higher classification accuracy than existing SIFT-based image classification methods, as well as the smallest storage space requirement and higher computational efficiency.
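The 128-to-36 PCA reduction step can be sketched directly in NumPy; the random vectors below are stand-ins for real SIFT descriptors.

```python
# Hedged sketch of the PCA dimensionality-reduction step (128 -> 36 dims).
import numpy as np

def pca_reduce(D, k):
    """Project descriptors (n x d) onto the top-k principal components."""
    Dc = D - D.mean(axis=0)
    # the right singular vectors of the centered data are the principal axes
    _, _, Vt = np.linalg.svd(Dc, full_matrices=False)
    return Dc @ Vt[:k].T

rng = np.random.default_rng(9)
descriptors = rng.normal(size=(500, 128))   # stand-ins for SIFT vectors
reduced = pca_reduce(descriptors, 36)       # 128 -> 36, as in the record
```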

  13. Incremental Classification Algorithm of Multiple Center Vectors Based on Minimum Distance

    Institute of Scientific and Technical Information of China (English)

    王伟

    2015-01-01

    Classification is an important research topic in data mining. After analyzing existing minimum-distance incremental classification methods, this paper proposes a new incremental classification algorithm based on multiple center vectors and minimum distance. The method first clusters the training data according to the class attribute, then eliminates overlap between class regions by adjusting the inter-class boundaries. For incremental classification, the multiple-center-vector algorithm partitions the feature space by region, reducing the number of representative samples that must be retained. Experiments show that, compared with the incremental algorithm of literature [14], the classification accuracy is essentially the same while the storage requirement and time complexity both decrease to varying degrees, which is significant for large-scale data classification.
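    The core minimum-distance rule with several center vectors per class can be sketched as follows; the clustering and boundary-adjustment steps of the paper are not reproduced, and the 2-D points are hypothetical.

    ```python
    import math

    def centroid(points):
        """Mean vector of a list of equal-length tuples."""
        dims = len(points[0])
        return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))

    def predict(centers, x):
        """Assign x to the class owning the nearest center vector.
        Each class may own several centers (one per sub-cluster)."""
        _, label = min((math.dist(c, x), label)
                       for label, cs in centers.items() for c in cs)
        return label

    # class "A" splits into two sub-clusters, so it keeps two centers
    centers = {
        "A": [centroid([(0, 0), (1, 1)]), centroid([(10, 0), (11, 1)])],
        "B": [centroid([(0, 10), (1, 11)])],
    }
    print(predict(centers, (10, 1)))  # A
    print(predict(centers, (0, 10)))  # B
    ```

    Keeping a handful of centers per class, rather than every training sample, is what lowers the storage cost relative to instance-based methods.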

  14. Classification of eight dimensional perfect forms

    NARCIS (Netherlands)

    Dutour Sikiric, M.; Schuermann, A.; Vallentin, F.

    2007-01-01

    In this paper, we classify the perfect lattices in dimension 8. There are 10916 of them. Our classification heavily relies on exploiting symmetry in polyhedral computations. Here we describe algorithms making the classification possible.

  15. Classification of Spreadsheet Errors

    OpenAIRE

    Rajalingham, Kamalasen; Chadwick, David R.; Knight, Brian

    2008-01-01

    This paper describes a framework for a systematic classification of spreadsheet errors. This classification or taxonomy of errors is aimed at facilitating analysis and comprehension of the different types of spreadsheet errors. The taxonomy is an outcome of an investigation of the widespread problem of spreadsheet errors and an analysis of specific types of these errors. This paper contains a description of the various elements and categories of the classification and is supported by appropri...

  16. Supernova Photometric Classification Challenge

    CERN Document Server

    Kessler, Richard; Jha, Saurabh; Kuhlmann, Stephen

    2010-01-01

    We have publicly released a blinded mix of simulated SNe, with types (Ia, Ib, Ic, II) selected in proportion to their expected rate. The simulation is realized in the griz filters of the Dark Energy Survey (DES) with realistic observing conditions (sky noise, point spread function and atmospheric transparency) based on years of recorded conditions at the DES site. Simulations of non-Ia type SNe are based on spectroscopically confirmed light curves that include unpublished non-Ia samples donated from the Carnegie Supernova Project (CSP), the Supernova Legacy Survey (SNLS), and the Sloan Digital Sky Survey-II (SDSS-II). We challenge scientists to run their classification algorithms and report a type for each SN. A spectroscopically confirmed subset is provided for training. The goals of this challenge are to (1) learn the relative strengths and weaknesses of the different classification algorithms, (2) use the results to improve classification algorithms, and (3) understand what spectroscopically confirmed sub-...

  17. Comparison of Support Vector Machine, Neural Network, and CART Algorithms for the Land-Cover Classification Using Limited Training Data Points

    Science.gov (United States)

    Support vector machine (SVM) was applied for land-cover characterization using MODIS time-series data. Classification performance was examined with respect to training sample size, sample variability, and landscape homogeneity (purity). The results were compared to two convention...

  18. Methods of Knowledge Database Integration Based on Rough Set Classification and Genetic Algorithm

    Institute of Scientific and Technical Information of China (English)

    郭平; 程代杰

    2003-01-01

    As the foundation of an intelligent system, a knowledge database must guarantee the consistency and non-redundancy of its knowledge. Because knowledge comes from varied sources, redundant, overlapping, and even contradictory knowledge must be handled during knowledge database integration. This paper studies an integration method for multiple knowledge databases. First, it identifies the inconsistent knowledge sets between the databases by rough set classification and presents a method for eliminating the inconsistency using test data. It then treats the consistent knowledge sets as the initial population of a genetic computation and constructs a genetic fitness function based on the accuracy, practicability, and generalizability of the knowledge representation. Finally, classifying the results of the genetic computation reduces the knowledge redundancy of the database. The paper also presents a framework for knowledge database integration based on rough set classification and genetic algorithms.
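    The genetic step can be illustrated with a toy loop over bit-string individuals. The fitness below is a simple stand-in for the paper's accuracy/practicability/generalizability score, and the rough-set preprocessing is not modeled; everything here is a hypothetical sketch.

    ```python
    import random

    def evolve(fitness, pop, generations=50, mut=0.02):
        """Toy genetic loop: tournament selection, one-point crossover,
        and per-bit mutation over bit-string individuals."""
        for _ in range(generations):
            nxt = []
            while len(nxt) < len(pop):
                # tournament selection: best of 3 random individuals, twice
                a = max(random.sample(pop, 3), key=fitness)
                b = max(random.sample(pop, 3), key=fitness)
                cut = random.randrange(1, len(a))       # one-point crossover
                child = a[:cut] + b[cut:]
                child = [g ^ 1 if random.random() < mut else g
                         for g in child]                # bit-flip mutation
                nxt.append(child)
            pop = nxt
        return max(pop, key=fitness)

    random.seed(1)
    fitness = lambda bits: sum(bits)   # toy objective: maximize 1-bits
    pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
    best = evolve(fitness, pop)
    print(fitness(best))
    ```

    In the paper's setting, each individual would encode a candidate knowledge set and the fitness would score its quality rather than a bit count.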

  19. Significance of Classification Techniques in Prediction of Learning Disabilities

    CERN Document Server

    David, Julie M.; Balakrishnan, Kannan

    2010-01-01

    The aim of this study is to show the importance of two classification techniques, viz. decision trees and clustering, in the prediction of learning disabilities (LD) in school-age children. LDs affect about 10 percent of all children enrolled in schools, and the problems of children with specific learning disabilities have long been a cause of concern to parents and teachers. Decision trees and clustering are powerful and popular tools for classification and prediction in data mining. Different rules extracted from the decision tree are used for prediction of learning disabilities, while clustering assigns a set of observations into subsets, called clusters, which are useful in finding the different signs and symptoms (attributes) present in an LD-affected child. In this paper, the J48 algorithm is used for constructing the decision tree and the K-means algorithm for creating the clusters. By applying these classification techniques, LD in any child can be identified.
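    The clustering half of the study can be sketched with plain Lloyd's K-means on one-dimensional attribute scores; the J48 decision tree is not shown, and the scores below are hypothetical stand-ins for symptom attributes.

    ```python
    import random

    def kmeans(points, k, iters=20, seed=0):
        """Plain Lloyd's K-means on 1-D values: assign each point to its
        nearest center, then recompute each center as its cluster mean."""
        random.seed(seed)
        centers = random.sample(points, k)
        for _ in range(iters):
            clusters = [[] for _ in range(k)]
            for p in points:
                i = min(range(k), key=lambda j: abs(p - centers[j]))
                clusters[i].append(p)
            centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        return centers, clusters

    # hypothetical symptom-attribute scores: two well-separated groups
    scores = [1, 2, 1.5, 8, 9, 8.5, 2.2, 9.2]
    centers, clusters = kmeans(scores, 2)
    print(sorted(centers))
    ```

    On data this well separated, the two centers converge to the means of the low and high groups regardless of the random initialization.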

  20. Classifier in Age classification

    Directory of Open Access Journals (Sweden)

    B. Santhi

    2012-12-01

    Full Text Available The face is an important feature of human beings, and many properties of a person can be derived by analyzing it. The objective of this study is to design a classifier for age from facial images. Age classification is essential in many applications such as crime detection, employment screening, and face detection. The proposed algorithm contains four phases: preprocessing, feature extraction, feature selection, and classification. The classification employs two class labels, namely Child and Old. This study addresses the limitations of existing classifiers by using the Grey Level Co-occurrence Matrix (GLCM) for feature extraction and a Support Vector Machine (SVM) for classification, which improves classification accuracy and outperforms the existing methods.
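    The GLCM feature-extraction step can be sketched directly: for a chosen pixel offset, count how often grey level i co-occurs with grey level j. Only this stage is shown; the face preprocessing and the SVM classifier are outside the sketch, and the tiny 4-level image is hypothetical.

    ```python
    def glcm(img, dx=1, dy=0, levels=4):
        """Grey Level Co-occurrence Matrix for one offset (dx, dy):
        m[i][j] counts pixel pairs where level i is followed by level j."""
        m = [[0] * levels for _ in range(levels)]
        h, w = len(img), len(img[0])
        for y in range(h):
            for x in range(w):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    m[img[y][x]][img[ny][nx]] += 1
        return m

    img = [[0, 0, 1, 1],
           [0, 0, 1, 1],
           [0, 2, 2, 2],
           [2, 2, 3, 3]]
    m = glcm(img)                 # horizontal neighbors, offset (1, 0)
    print(m[0][0], m[2][2])       # 2 3
    ```

    Texture statistics such as contrast, energy, and homogeneity are then computed from this matrix and fed to the SVM as the feature vector.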