WorldWideScience

Sample records for supervised classification algorithms

  1. Comparison of Supervised and Unsupervised Learning Algorithms for Pattern Classification

    Directory of Open Access Journals (Sweden)

    R. Sathya

    2013-02-01

Full Text Available This paper presents a comparative account of unsupervised and supervised learning models and their pattern classification evaluations as applied to the higher education scenario. Classification plays a vital role in machine-based learning algorithms, and in the present study we found that, although the error back-propagation algorithm provided by the supervised learning model is very efficient for a number of non-linear real-time problems, the KSOM of the unsupervised learning model also offers an efficient solution and classification.
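
The KSOM (Kohonen self-organizing map) side of the comparison can be sketched minimally. The following toy is our illustration, not the paper's implementation or data: a 1-D SOM trained without labels, with hypothetical cluster centres and parameter choices.

```python
import numpy as np

def train_som(data, n_units=4, epochs=50, lr=0.5, seed=0):
    """Minimal 1-D Kohonen SOM: move the best-matching unit (BMU) and its
    topological neighbours toward each sample (unsupervised, no labels)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(n_units, data.shape[1]))
    for epoch in range(epochs):
        sigma = max(1.0 * (1 - epoch / epochs), 0.1)  # shrinking neighbourhood
        rate = lr * (1 - epoch / epochs)              # decaying learning rate
        for x in data:
            bmu = np.argmin(np.linalg.norm(w - x, axis=1))
            dist = np.abs(np.arange(n_units) - bmu)   # distance on the 1-D grid
            h = np.exp(-dist**2 / (2 * sigma**2))     # neighbourhood function
            w += rate * h[:, None] * (x - w)
    return w

# Two synthetic clusters; after training, some unit sits near each cluster.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
weights = train_som(data)
```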

  2. QUEST: Eliminating Online Supervised Learning for Efficient Classification Algorithms

    Directory of Open Access Journals (Sweden)

    Ardjan Zwartjes

    2016-10-01

Full Text Available In this work, we introduce QUEST (QUantile Estimation after Supervised Training), an adaptive classification algorithm for Wireless Sensor Networks (WSNs) that eliminates the necessity for online supervised learning. Online processing is important for many sensor network applications. Transmitting raw sensor data puts high demands on the battery, reducing network lifetime. By merely transmitting partial results or classifications based on the sampled data, the amount of traffic on the network can be significantly reduced. Such classifications can be made by learning-based algorithms using sampled data. An important issue, however, is the training phase of these learning-based algorithms. Training a deployed sensor network requires a lot of communication and an impractical amount of human involvement. QUEST is a hybrid algorithm that combines supervised learning in a controlled environment with unsupervised learning at the location of deployment. Using the SITEX02 dataset, we demonstrate that the presented solution works with a performance penalty of less than 10% in 90% of the tests. Under some circumstances, it even outperforms a network of classifiers completely trained with supervised learning. As a result, the need for on-site supervised learning and communication for training is completely eliminated by our solution.
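
The hybrid scheme described above can be sketched with a one-feature toy: supervised training offline selects a quantile rank, and the deployed node re-estimates the matching threshold from unlabeled local data. This is our simplified illustration of the quantile-estimation idea, not the QUEST implementation; all names and distributions are hypothetical.

```python
import numpy as np

def train_quantile(features, labels):
    """Offline supervised step: find which quantile of the feature
    distribution best separates the two classes on labeled data."""
    ranks = np.linspace(0.05, 0.95, 19)
    best_q, best_acc = 0.5, 0.0
    for q in ranks:
        t = np.quantile(features, q)
        acc = np.mean((features > t) == labels)
        if acc > best_acc:
            best_q, best_acc = q, acc
    return best_q  # only the quantile rank is shipped to the node

def deploy(quantile_rank, unlabeled_stream):
    """On-site unsupervised step: re-estimate the threshold from
    unlabeled local data, so no labels are needed after deployment."""
    threshold = np.quantile(unlabeled_stream, quantile_rank)
    return unlabeled_stream > threshold

rng = np.random.default_rng(0)
x_train = np.concatenate([rng.normal(0, 1, 500), rng.normal(3, 1, 500)])
y_train = np.repeat([False, True], 500)
q = train_quantile(x_train, y_train)
# At deployment the sensor sees a shifted distribution but no labels:
x_field = np.concatenate([rng.normal(1, 1, 500), rng.normal(4, 1, 500)])
pred = deploy(q, x_field)
```

The point of the sketch is that only the quantile rank survives the covariate shift between lab and field, which is why no on-site supervised retraining is needed.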

  3. Benchmarking protein classification algorithms via supervised cross-validation.

    Science.gov (United States)

    Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor

    2008-04-24

    Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. A combination of supervised and
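
The supervised cross-validation idea, holding out whole known subtypes rather than random samples so the test set measures generalization to novel subtypes, can be sketched as follows. This is a generic illustration with made-up subtype labels, not the benchmark collection's actual code.

```python
from collections import defaultdict

def subtype_holdout_splits(sample_ids, subtypes):
    """Yield (train, test) index lists where each test set is one whole
    subtype, so no test sample has a same-subtype relative in training."""
    by_subtype = defaultdict(list)
    for idx, st in zip(sample_ids, subtypes):
        by_subtype[st].append(idx)
    for held_out in by_subtype:
        test_idx = by_subtype[held_out]
        train_idx = [i for st, idxs in by_subtype.items()
                     if st != held_out for i in idxs]
        yield train_idx, test_idx

samples = list(range(6))
subtypes = ["a", "a", "b", "b", "c", "c"]  # e.g. SCOP-style subtype labels
splits = list(subtype_holdout_splits(samples, subtypes))
```

Compared with random k-fold splitting, every fold here simulates "novel, distantly related subtype" conditions, which is why the resulting performance estimates are lower but more realistic.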

  4. A Supervised Classification Algorithm for Note Onset Detection

    Directory of Open Access Journals (Sweden)

    Douglas Eck

    2007-01-01

Full Text Available This paper presents a novel approach to detecting onsets in music audio files. We use a supervised learning algorithm to classify spectrogram frames extracted from digital audio as being onsets or non-onsets. Frames classified as onsets are then treated with a simple peak-picking algorithm based on a moving average. We present two versions of this approach. The first version uses a single neural network classifier. The second version combines the predictions of several networks trained using different hyperparameters. We describe the details of the algorithm and summarize the performance of both variants on several datasets. We also examine our choice of hyperparameters by describing results of cross-validation experiments done on a custom dataset. We conclude that a supervised learning approach to note onset detection performs well and warrants further investigation.
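
The post-processing step described above, keeping local maxima of the classifier's activation curve that exceed a moving average, can be sketched like this. It is our illustration of generic moving-average peak picking, not the paper's exact procedure or parameters.

```python
import numpy as np

def pick_onsets(activations, window=5, bias=0.1):
    """Return frame indices that are local maxima and exceed the
    moving average of the activation curve by a small bias."""
    kernel = np.ones(window) / window
    moving_avg = np.convolve(activations, kernel, mode="same")
    onsets = []
    for i in range(1, len(activations) - 1):
        is_peak = (activations[i] >= activations[i - 1]
                   and activations[i] > activations[i + 1])
        if is_peak and activations[i] > moving_avg[i] + bias:
            onsets.append(i)
    return onsets

# Synthetic network output: two clear peaks over a low noise floor
act = np.array([0.1, 0.1, 0.9, 0.2, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1])
onsets = pick_onsets(act)
```

The `bias` term plays the role of a sensitivity threshold: raising it trades missed onsets for fewer false positives.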

  5. Benchmarking protein classification algorithms via supervised cross-validation

    NARCIS (Netherlands)

    Kertész-Farkas, A.; Dhir, S.; Sonego, P.; Pacurar, M.; Netoteia, S.; Nijveen, H.; Kuzniar, A.; Leunissen, J.A.M.; Kocsor, A.; Pongor, S.

    2008-01-01

    Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-o

  6. Research of Plant-Leaves Classification Algorithm Based on Supervised LLE

    Directory of Open Access Journals (Sweden)

    Yan Qing

    2013-06-01

Full Text Available A new supervised LLE method based on the Fisher projection is proposed in this paper and combined with a new classification algorithm based on manifold learning to recognize plant leaves. First, the method uses the Fisher projection distance in place of the sample's geodesic distance, yielding a new supervised LLE algorithm. Then, a classification algorithm that uses the manifold reconstruction error to distinguish sample classes directly is adopted. This algorithm makes better use of category information and improves the recognition rate effectively, while also having the advantage of easy parameter estimation. Experimental results on real-world plant leaf databases show an average recognition accuracy of up to 95.17%.

  7. Synthesis of supervised classification algorithm using intelligent and statistical tools

    Directory of Open Access Journals (Sweden)

    Ali Douik

    2009-09-01

Full Text Available A fundamental task in detecting foreground objects in both static and dynamic scenes is choosing the best color-system representation and an efficient technique for background modeling. We propose in this paper a non-parametric algorithm dedicated to segmenting and detecting objects in color images taken from a football match. Per-pixel segmentation concerns many applications, and the method proved robust at detecting objects even in the presence of strong shadows and highlights. On the other hand, to refine playing strategy in sports such as football, handball, volleyball and rugby, the coach needs as much technical-tactical information as possible about the on-going game and the players. We propose in this paper a range of algorithms for resolving many of the problems that appear in the automated process of team identification, where each player is assigned to his corresponding team based on visual data. The developed system was tested on a match of the Tunisian national competition. This work is relevant for many future computer vision studies, as detailed in this study.

  8. Synthesis of supervised classification algorithm using intelligent and statistical tools

    CERN Document Server

    Douik, Ali

    2009-01-01

A fundamental task in detecting foreground objects in both static and dynamic scenes is choosing the best color-system representation and an efficient technique for background modeling. We propose in this paper a non-parametric algorithm dedicated to segmenting and detecting objects in color images taken from a football match. Per-pixel segmentation concerns many applications, and the method proved robust at detecting objects even in the presence of strong shadows and highlights. On the other hand, to refine playing strategy in sports such as football, handball, volleyball and rugby, the coach needs as much technical-tactical information as possible about the on-going game and the players. We propose in this paper a range of algorithms for resolving many of the problems that appear in the automated process of team identification, where each player is assigned to his corresponding team based on visual data. The developed system was tested on a match of the Tunisian national c...

  9. Generation of a Supervised Classification Algorithm for Time-Series Variable Stars with an Application to the LINEAR Dataset

    CERN Document Server

    Johnston, Kyle B

    2016-01-01

With the advent of digital astronomy, new benefits and new problems have been presented to the modern-day astronomer. While data can be captured in a more efficient and accurate manner using digital means, the efficiency of data retrieval has led to an overload of scientific data for processing and storage. This paper will focus on the construction and application of a supervised pattern classification algorithm for the identification of variable stars. Given the reduction of a survey of stars into a standard feature space, the problem of using prior patterns to identify new observed patterns can be reduced to time-tested classification methodologies and algorithms. Such supervised methods, so called because the user trains the algorithms prior to application using patterns with known classes or labels, provide a means to probabilistically determine the estimated class type of new observations. This paper will demonstrate the construction and application of a supervised classification algorithm on variable sta...

  10. A Novel Classification Algorithm Based on Incremental Semi-Supervised Support Vector Machine

    Science.gov (United States)

    Gao, Fei; Mei, Jingyuan; Sun, Jinping; Wang, Jun; Yang, Erfu; Hussain, Amir

    2015-01-01

For current computational intelligence techniques, a major challenge is how to learn new concepts in a changing environment. Traditional learning schemes could not adequately address this problem due to the lack of a dynamic data selection mechanism. In this paper, inspired by the human learning process, a novel classification algorithm based on an incremental semi-supervised support vector machine (SVM) is proposed. Through the analysis of the prediction confidence of samples and the data distribution in a changing environment, a “soft-start” approach, a data selection mechanism and a data cleaning mechanism are designed, which complete the construction of our incremental semi-supervised learning system. Notably, with the careful design of the proposed algorithm, the computational complexity is effectively reduced. In addition, the possible appearance of new labeled samples during the learning process is analyzed in detail. The results show that our algorithm does not rely on a model of the sample distribution, has an extremely low rate of introducing wrong semi-labeled samples and can effectively make use of the unlabeled samples to enrich the knowledge system of the classifier and improve the accuracy rate. Moreover, our method also has outstanding generalization performance and the ability to overcome concept drift in a changing environment. PMID:26275294

  11. A Novel Classification Algorithm Based on Incremental Semi-Supervised Support Vector Machine.

    Directory of Open Access Journals (Sweden)

    Fei Gao

Full Text Available For current computational intelligence techniques, a major challenge is how to learn new concepts in a changing environment. Traditional learning schemes could not adequately address this problem due to the lack of a dynamic data selection mechanism. In this paper, inspired by the human learning process, a novel classification algorithm based on an incremental semi-supervised support vector machine (SVM) is proposed. Through the analysis of the prediction confidence of samples and the data distribution in a changing environment, a "soft-start" approach, a data selection mechanism and a data cleaning mechanism are designed, which complete the construction of our incremental semi-supervised learning system. Notably, with the careful design of the proposed algorithm, the computational complexity is effectively reduced. In addition, the possible appearance of new labeled samples during the learning process is analyzed in detail. The results show that our algorithm does not rely on a model of the sample distribution, has an extremely low rate of introducing wrong semi-labeled samples and can effectively make use of the unlabeled samples to enrich the knowledge system of the classifier and improve the accuracy rate. Moreover, our method also has outstanding generalization performance and the ability to overcome concept drift in a changing environment.
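
The core semi-supervised ingredient, admitting only unlabeled samples whose prediction confidence is high so that few wrong semi-labels enter the training set, can be sketched with a simple self-training loop. A nearest-centroid rule stands in for the SVM here; this is a generic illustration under our own toy setup, not the paper's algorithm.

```python
import numpy as np

def self_train(X_lab, y_lab, X_unlab, margin=1.0, rounds=5):
    """Iteratively pseudo-label unlabeled points, keeping only those whose
    gap between distances to the two class centroids exceeds `margin`."""
    X, y = X_lab.copy(), y_lab.copy()
    pool = X_unlab.copy()
    for _ in range(rounds):
        if len(pool) == 0:
            break
        c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
        d0 = np.linalg.norm(pool - c0, axis=1)
        d1 = np.linalg.norm(pool - c1, axis=1)
        confident = np.abs(d0 - d1) > margin   # data selection mechanism
        if not confident.any():
            break
        pseudo = (d1 < d0).astype(int)         # pseudo-labels for confident points
        X = np.vstack([X, pool[confident]])
        y = np.concatenate([y, pseudo[confident]])
        pool = pool[~confident]
    return X, y

rng = np.random.default_rng(1)
X_lab = np.array([[0.0, 0.0], [4.0, 4.0]])     # two labeled seeds
y_lab = np.array([0, 1])
X_unlab = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
X_aug, y_aug = self_train(X_lab, y_lab, X_unlab)
```

Raising `margin` makes the selection stricter, trading slower knowledge growth for a lower rate of wrong semi-labels, which is the trade-off the abstract emphasizes.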

  12. Classification and Diagnostic Output Prediction of Cancer Using Gene Expression Profiling and Supervised Machine Learning Algorithms

    DEFF Research Database (Denmark)

    Yoo, C.; Gernaey, Krist

    2008-01-01

    In this paper, a new supervised clustering and classification method is proposed. First, the application of discriminant partial least squares (DPLS) for the selection of a minimum number of key genes is applied on a gene expression microarray data set. Second, supervised hierarchical clustering ...

  13. Generation of a supervised classification algorithm for time-series variable stars with an application to the LINEAR dataset

    Science.gov (United States)

    Johnston, K. B.; Oluseyi, H. M.

    2017-04-01

    With the advent of digital astronomy, new benefits and new problems have been presented to the modern day astronomer. While data can be captured in a more efficient and accurate manner using digital means, the efficiency of data retrieval has led to an overload of scientific data for processing and storage. This paper will focus on the construction and application of a supervised pattern classification algorithm for the identification of variable stars. Given the reduction of a survey of stars into a standard feature space, the problem of using prior patterns to identify new observed patterns can be reduced to time-tested classification methodologies and algorithms. Such supervised methods, so called because the user trains the algorithms prior to application using patterns with known classes or labels, provide a means to probabilistically determine the estimated class type of new observations. This paper will demonstrate the construction and application of a supervised classification algorithm on variable star data. The classifier is applied to a set of 192,744 LINEAR data points. Of the original samples, 34,451 unique stars were classified with high confidence (high level of probability of being the true class).

  14. A Semi-supervised Heat Kernel Pagerank MBO Algorithm for Data Classification

    Science.gov (United States)

    2016-07-01

closed-form expression for the class of each node is derived. Moreover, the authors of [50] describe a semi-supervised method for classifying data using...manifold smoothing and image denoising. In addition to image processing, methods involving spectral graph theory [17,56], based on a graphical setting...pagerank and Section 3 presents a model using heat kernel pagerank directly as a classifier. Section 4 formulates the new algorithm as well as provides

  15. Application of supervised machine learning algorithms for the classification of regulatory RNA riboswitches.

    Science.gov (United States)

    Singh, Swadha; Singh, Raghvendra

    2016-04-03

Riboswitches, the small structured RNA elements, were discovered about a decade ago. It has been the subject of intense interest to identify riboswitches, understand their mechanisms of action and use them in genetic engineering. The accumulation of genome and transcriptome sequence data and comparative genomics provides unprecedented opportunities to identify riboswitches in the genome. In the present study, we have evaluated the following six machine learning algorithms for their efficiency in classifying riboswitches: J48, BayesNet, Naïve Bayes, Multilayer Perceptron, sequential minimal optimization and hidden Markov model (HMM). To determine the most effective classifier, the algorithms were compared on the statistical measures of specificity, sensitivity, accuracy, F-measure and receiver operating characteristic (ROC) plot analysis. The Multilayer Perceptron classifier achieved the best performance, with the highest specificity, sensitivity, F-score and accuracy, and with the largest area under the ROC curve, whereas HMM was the poorest performer. At present, the available tools for the prediction and classification of riboswitches are based on covariance models, support vector machines and HMMs. The present study identifies the Multilayer Perceptron as the better classifier for genome-wide riboswitch searches.
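
The comparison metrics named above all derive from the confusion matrix. A minimal sketch of their standard definitions (generic textbook formulas, not the study's evaluation code):

```python
import numpy as np

def metrics(y_true, y_pred):
    """Sensitivity, specificity, and F-measure from binary predictions."""
    tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
    tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
    sensitivity = tp / (tp + fn)                # a.k.a. recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f_measure

# Toy labels: 8 sequences, one false negative and one false positive
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 0])
sens, spec, f1 = metrics(y_true, y_pred)
```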

  16. Classification models for clear cell renal carcinoma stage progression, based on tumor RNAseq expression trained supervised machine learning algorithms.

    Science.gov (United States)

    Jagga, Zeenia; Gupta, Dinesh

    2014-01-01

Clear-cell Renal Cell Carcinoma (ccRCC) is the most prevalent, chemotherapy-resistant and lethal adult kidney cancer. There is a need for novel diagnostic and prognostic biomarkers for ccRCC, due to its heterogeneous molecular profiles and asymptomatic early stage. This study aims to develop classification models to distinguish the early and late stages of ccRCC based on gene expression profiles. We employed the supervised learning algorithms J48, Random Forest, SMO and Naïve Bayes, with model learning enriched by fast correlation-based feature selection, to develop classification models trained on sequencing-based gene expression data from RNAseq experiments obtained from The Cancer Genome Atlas. The different models developed in the study were evaluated on the basis of 10-fold cross-validation and independent dataset testing. The Random Forest based prediction model performed best amongst the models developed in the study, with a sensitivity of 89%, an accuracy of 77% and an area under the Receiver Operating Characteristic curve of 0.8. We anticipate that the prioritized subset of 62 genes and the prediction models developed in this study will aid experimental oncologists in expediting the understanding of the molecular mechanisms of stage progression and the discovery of prognostic factors for ccRCC tumors.

  17. Supervised Classification Performance of Multispectral Images

    CERN Document Server

    Perumal, K

    2010-01-01

Nowadays government and private agencies use remote sensing imagery for a wide range of applications, from military uses to farm development. The images may be panchromatic, multispectral, hyperspectral or even ultraspectral, amounting to terabytes of data. Remote sensing image classification is one of the most significant applications of remote sensing. A number of image classification algorithms have demonstrated good precision in classifying remote sensing data. But, of late, due to the increasing spatiotemporal dimensions of remote sensing data, traditional classification algorithms have exposed weaknesses, necessitating further research in the field of remote sensing image classification. So an efficient classifier is needed to classify remote sensing images and extract information. We experiment with both supervised and unsupervised classification. Here we compare the different classification methods and their performances. It is found that the Mahalanobis classifier performed the best in our...

  18. Automated Classification and Correlation of Drill Cores using High-Resolution Hyperspectral Images and Supervised Pattern Classification Algorithms. Applications to Paleoseismology

    Science.gov (United States)

    Ragona, D. E.; Minster, B.; Rockwell, T.; Jasso, H.

    2006-12-01

The standard methodology to describe, classify and correlate geologic materials in the field or lab relies on physical inspection of samples, sometimes with the assistance of conventional analytical techniques (e.g. XRD, microscopy, particle size analysis). This is commonly both time-consuming and inherently subjective. Many geological materials share identical visible properties (e.g. fine-grained materials, alteration minerals) and therefore cannot be mapped using the human eye alone. Recent investigations have shown that ground-based hyperspectral imaging provides an effective method to study and digitally store stratigraphic and structural data from cores or field exposures. Neural networks and Naive Bayesian classifiers supply a variety of well-established techniques for pattern recognition, especially for data examples with high-dimensional inputs and outputs. In this poster, we present a new methodology for automatic mapping of sedimentary stratigraphy in the lab (drill cores, samples) or the field (outcrops, exposures) using short wave infrared (SWIR) hyperspectral images and these two supervised classification algorithms. High-spatial/spectral resolution data from large sediment samples (drill cores) from a paleoseismic excavation site were collected using a portable hyperspectral scanner with 245 continuous channels measured across the 960 to 2404 nm spectral range. The data were corrected for geometric and radiometric distortions and pre-processed to obtain reflectance at each pixel of the images. We built an example set using hundreds of reflectance spectra collected from the sediment core images. The examples were grouped into eight classes corresponding to materials found in the samples. We constructed two additional example sets by computing the 2-norm normalization and the derivative of the smoothed original reflectance examples. Each example set was divided into four subsets: training, training test, verification and validation. A multi

  19. A New Method for Solving Supervised Data Classification Problems

    Directory of Open Access Journals (Sweden)

    Parvaneh Shabanzadeh

    2014-01-01

Full Text Available Supervised data classification is one of the techniques used to extract nontrivial information from data. Classification is a widely used technique in various fields, including data mining, industry, medicine, science, and law. This paper considers a new algorithm for supervised data classification problems associated with cluster analysis. The mathematical formulations for this algorithm are based on nonsmooth, nonconvex optimization. A new algorithm for solving this optimization problem is utilized; it uses a derivative-free technique and is both robust and efficient. To improve classification performance and the efficiency of generating the classification model, a new feature selection algorithm based on convex programming techniques is suggested. The proposed methods are tested on real-world datasets, and the results of the numerical experiments demonstrate their effectiveness.

  20. Two Linear Unmixing Algorithms to Recognize Targets Using Supervised Classification and Orthogonal Rotation in Airborne Hyperspectral Images

    Directory of Open Access Journals (Sweden)

    Michael Zheludev

    2012-02-01

Full Text Available The goal of the paper is to detect pixels that contain targets of known spectra. The target can be present at sub- or above-pixel size. Pixels without targets are classified as background pixels. Each pixel is treated via the content of its neighborhood. A pixel whose spectrum is different from its neighborhood is classified as a “suspicious point”. In each suspicious point there is a mix of target(s) and background. The main objective in supervised detection (also called “target detection”) is to search for a specific given spectral material (target) in hyperspectral imaging (HSI), where the spectral signature of the target is known a priori from laboratory measurements. In addition, the fractional abundance of the target is computed. To achieve this we present two linear unmixing algorithms that recognize targets with known (given) spectral signatures. The CLUN is based on automatic feature extraction from the target’s spectrum. These features separate the target from the background. The ROTU algorithm is based on embedding the spectra space into a special space by random orthogonal transformation and on the statistical properties of the embedded result. Experimental results demonstrate that the targets’ locations were extracted correctly and that these algorithms are robust and efficient.
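
The fractional-abundance step can be illustrated with the textbook linear mixing model: treat each suspicious pixel as a linear combination of the known target spectrum and a background spectrum, and solve for the mixing coefficients by least squares. This is a generic sketch with hypothetical four-band spectra, not the CLUN or ROTU algorithms.

```python
import numpy as np

# Hypothetical endmember spectra (columns): target and background
target = np.array([0.9, 0.1, 0.4, 0.8])
background = np.array([0.2, 0.7, 0.6, 0.1])
E = np.column_stack([target, background])

# Observed pixel: an exact 30/70 mix, for illustration
pixel = 0.3 * target + 0.7 * background

# Solve pixel ≈ E @ abund for the abundance vector
abund, *_ = np.linalg.lstsq(E, pixel, rcond=None)
abund = np.clip(abund, 0, None)    # enforce non-negativity
abund = abund / abund.sum()        # enforce sum-to-one
```

In practice the abundances are estimated from noisy spectra, so constrained solvers (non-negative least squares) are typically preferred over the clip-and-renormalize shortcut used here.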

  1. TV-SVM: Total Variation Support Vector Machine for Semi-Supervised Data Classification

    OpenAIRE

    Bresson, Xavier; Zhang, Ruiliang

    2012-01-01

We introduce semi-supervised data classification algorithms based on total variation (TV), Reproducing Kernel Hilbert Space (RKHS), support vector machine (SVM), Cheeger cut, and labeled and unlabeled data points. We design binary and multi-class semi-supervised classification algorithms. We compare the TV-based classification algorithms with the related Laplacian-based algorithms, and show that TV classification performs significantly better when the number of labeled data points is small.

  2. Document Classification Using Expectation Maximization with Semi Supervised Learning

    CERN Document Server

    Nigam, Bhawna; Salve, Sonal; Vamney, Swati

    2011-01-01

As the amount of online documents increases, the demand for document classification to aid their analysis and management is increasing. Text is cheap, but information, in the form of knowing what classes a document belongs to, is expensive. The main purpose of this paper is to explain the expectation maximization technique of data mining to classify documents and to learn how to improve accuracy using a semi-supervised approach. The expectation maximization algorithm is applied with both supervised and semi-supervised approaches. It is found that the semi-supervised approach is more accurate and effective. The main advantage of the semi-supervised approach is the dynamic generation of new classes. The algorithm first trains a classifier using the labeled documents and probabilistically classifies the unlabeled documents. The car dataset used for evaluation was collected from the UCI repository, with some changes made on our side.
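
The train-then-relabel loop described above can be sketched with a multinomial naive Bayes EM step on toy word-count vectors. This is a generic illustration of semi-supervised EM, with made-up data, not the paper's implementation.

```python
import numpy as np

def nb_fit(X, resp):
    """M-step: class priors and word probabilities from (soft) labels."""
    priors = resp.sum(axis=0) / resp.sum()
    word_counts = resp.T @ X + 1.0          # Laplace smoothing
    word_probs = word_counts / word_counts.sum(axis=1, keepdims=True)
    return priors, word_probs

def nb_predict_proba(X, priors, word_probs):
    """E-step: posterior class responsibilities for each document."""
    log_post = np.log(priors) + X @ np.log(word_probs).T
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

# Toy corpus: word-count vectors, 2 labeled docs and 4 unlabeled ones
X_lab = np.array([[5, 0, 1], [0, 5, 1]], dtype=float)
resp_lab = np.eye(2)                       # labeled docs stay clamped
X_unlab = np.array([[4, 1, 0], [3, 0, 2], [1, 6, 0], [0, 4, 3]], dtype=float)

# Initialize on labeled data, then iterate EM over all documents
priors, word_probs = nb_fit(X_lab, resp_lab)
for _ in range(10):
    resp_unlab = nb_predict_proba(X_unlab, priors, word_probs)
    X_all = np.vstack([X_lab, X_unlab])
    resp_all = np.vstack([resp_lab, resp_unlab])
    priors, word_probs = nb_fit(X_all, resp_all)

pred = nb_predict_proba(X_unlab, priors, word_probs).argmax(axis=1)
```

Keeping the labeled responsibilities fixed while the unlabeled ones are re-estimated is what anchors the class identities across EM iterations.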

  3. Supervised Ensemble Classification of Kepler Variable Stars

    CERN Document Server

    Bass, Gideon

    2016-01-01

Variable star analysis and classification is an important task in the understanding of stellar features and processes. While historically classifications have been done manually by highly skilled experts, the recent and rapid expansion in the quantity and quality of data has demanded new techniques, most notably automatic classification through supervised machine learning. We present an expansion of existing work in the field by analyzing variable stars in the Kepler field using an ensemble approach, combining multiple characterization and classification techniques to produce improved classification rates. Classifications for each of the roughly 150,000 stars observed by Kepler are produced, separating the stars into one of 14 variable star classes.

  4. Results of Evolution Supervised by Genetic Algorithms

    CERN Document Server

    Jäntschi, Lorentz; Bălan, Mugur C; Sestraş, Radu E

    2010-01-01

A series of results of evolution supervised by genetic algorithms, of interest to agricultural and horticultural fields, is reviewed. New original results from the use of genetic algorithms on structure-activity relationships are reported.

  5. 7 CFR 27.80 - Fees; classification, Micronaire, and supervision.

    Science.gov (United States)

    2010-01-01

... 7 Agriculture 2 2010-01-01 ... Fees; classification, Micronaire, and supervision. 27... Classification and Micronaire § 27.80 Fees; classification, Micronaire, and supervision. For services rendered by... classification and Micronaire determination results certified on cotton class certificates. (e) Supervision,...

  6. An AdaBoost algorithm for multiclass semi-supervised learning

    NARCIS (Netherlands)

    Tanha, J.; van Someren, M.; Afsarmanesh, H.; Zaki, M.J.; Siebes, A.; Yu, J.X.; Goethals, B.; Webb, G.; Wu, X.

    2012-01-01

We present an algorithm for multiclass semi-supervised learning, which is learning from a limited amount of labeled data and plenty of unlabeled data. Existing semi-supervised algorithms use approaches such as one-versus-all to convert the multiclass problem to several binary classification problems.

  7. Quintic spline smooth semi-supervised support vector classification machine

    Institute of Scientific and Technical Information of China (English)

    Xiaodan Zhang; Jinggai Ma; Aihua Li; Ang Li

    2015-01-01

A semi-supervised vector machine is a relatively new learning method using both labeled and unlabeled data in classification. Since the objective function of the model for an unconstrained semi-supervised vector machine is not smooth, many fast optimization algorithms cannot be applied to solve the model. In order to overcome the difficulty of dealing with non-smooth objective functions, new methods that can solve the semi-supervised vector machine with the desired classification accuracy are in great demand. A quintic spline function with three-times differentiability at the origin is constructed by a general three-moment method, which can be used to approximate the symmetric hinge loss function. The approximation accuracy of the quintic spline function is estimated. Moreover, a quintic spline smooth semi-supervised support vector machine is obtained, and the convergence accuracy of the smooth model to the non-smooth one is analyzed. Three experiments are performed to test the efficiency of the model. The experimental results show that the new model outperforms other smooth models in terms of classification performance. Furthermore, the new model is not sensitive to an increasing number of labeled samples, which means that the new model is more efficient.
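
The motivation for the construction, replacing the hinge loss's kink with a differentiable surrogate so that smooth optimizers apply, can be illustrated with a simpler quadratically smoothed hinge (our stand-in with a chosen smoothing width; the paper builds a quintic spline that is three-times differentiable):

```python
import numpy as np

def hinge(t):
    """Standard hinge loss max(0, 1 - t); non-smooth at t = 1."""
    return np.maximum(0.0, 1.0 - t)

def smooth_hinge(t, eps=0.5):
    """Quadratic smoothing of the hinge kink on [1-eps, 1+eps];
    matches the hinge exactly outside that interval and is C^1."""
    u = 1.0 - t
    out = np.where(u >= eps, u, 0.0)             # linear and flat pieces
    mid = np.abs(u) < eps
    out = np.where(mid, (u + eps) ** 2 / (4 * eps), out)  # quadratic patch
    return out
```

The maximum gap between the surrogate and the true hinge is eps/4, at the kink itself, so the approximation error can be driven down by shrinking eps, which is the same accuracy-versus-smoothness trade-off the quintic spline construction manages with higher-order differentiability.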

  8. Generative supervised classification using Dirichlet process priors.

    Science.gov (United States)

    Davy, Manuel; Tourneret, Jean-Yves

    2010-10-01

Choosing the appropriate parameter prior distributions associated with a given Bayesian model is a challenging problem. Conjugate priors can be selected for simplicity. However, conjugate priors can be too restrictive to accurately model the available prior information. This paper studies a new generative supervised classifier which assumes that the parameter prior distributions conditioned on each class are mixtures of Dirichlet processes. The motivation for using mixtures of Dirichlet processes is their known ability to model accurately a large class of probability distributions. A Monte Carlo method allowing one to sample according to the resulting class-conditional posterior distributions is then studied. The parameters appearing in the class-conditional densities can then be estimated using these generated samples (following Bayesian learning). The proposed supervised classifier is applied to the classification of altimetric waveforms backscattered from different surfaces (oceans, ice, forests, and deserts). This classification is a first step toward developing tools for the extraction of useful geophysical information from altimetric waveforms backscattered from non-oceanic surfaces.

  9. Random forest automated supervised classification of Hipparcos periodic variable stars

    CERN Document Server

    Dubath, P; Süveges, M; Blomme, J; López, M; Sarro, L M; De Ridder, J; Cuypers, J; Guy, L; Lecoeur, I; Nienartowicz, K; Jan, A; Beck, M; Mowlavi, N; De Cat, P; Lebzelter, T; Eyer, L

    2011-01-01

    We present an evaluation of the performance of an automated classification of the Hipparcos periodic variable stars into 26 types. The sub-sample with the most reliable variability types available in the literature is used to train supervised algorithms to characterize the type dependencies on a number of attributes. The most useful attributes evaluated with the random forest methodology include, in decreasing order of importance, the period, the amplitude, the V-I colour index, the absolute magnitude, the residual around the folded light-curve model, the magnitude distribution skewness and the amplitude of the second harmonic of the Fourier series model relative to that of the fundamental frequency. Random forests and a multi-stage scheme involving Bayesian network and Gaussian mixture methods lead to statistically equivalent results. In standard 10-fold cross-validation experiments, the rate of correct classification is between 90 and 100%, depending on the variability type. The main mis-classification cases, up to a rate of about 10%, arise due to confusion between SPB and ACV blue variables and between eclipsing binaries, ellipsoidal variables and other variability types.

  10. Genetic classification of populations using supervised learning.

    LENUS (Irish Health Repository)

    Bridges, Michael

    2011-01-01

    There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case-control studies and quality control (when participants in a study have been genotyped at different laboratories). This latter application is of particular importance in the era of large-scale genome-wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations, are termed unsupervised. Supervised methods, on the other hand, are able to utilise this prior knowledge when it is available. In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results, that a supervised learning approach should be the method of choice when classifying individuals into pre-defined populations, particularly in quality control for large-scale genome-wide association studies.

  11. Semi-supervised binary classification algorithm based on global and local regularization

    Institute of Scientific and Technical Information of China (English)

    吕佳

    2012-01-01

    As for the semi-supervised classification problem, it is difficult to obtain a good classification function for the entire input space if global learning is used alone, while local learning alone can yield a good classification function on certain specified regions of the input space. Accordingly, a new semi-supervised binary classification algorithm based on mixed local and global regularization was presented in this paper. The algorithm integrates the benefits of the global regularizer and the local regularizers. The global regularizer, built from prior knowledge, smooths the class labels of the data so as to lessen insufficient training of the local regularizers, and the local regularizers, constructed from sample information within each neighboring region, give the class label of each sample the desired properties; the objective function of the semi-supervised binary classification problem is constructed from these terms. Comparative semi-supervised binary classification experiments on standard benchmark datasets validate that the average classification accuracy and the standard error of the proposed algorithm are clearly superior to those of the methods based on the Laplacian regularizer, the regularized Laplacian regularizer, and the local-learning regularizer.
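
    The label-smoothing idea behind the global regularizer can be sketched with scikit-learn's graph-based LabelSpreading; this is a graph-Laplacian stand-in for the paper's combined global/local objective, not its exact algorithm.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading

# Two-moons toy data: 200 points, only 10 of them labeled
# (-1 marks an unlabeled point for scikit-learn).
X, y = make_moons(n_samples=200, noise=0.05, random_state=0)
rng = np.random.RandomState(0)
y_train = np.full_like(y, -1)
labeled = rng.choice(len(y), size=10, replace=False)
y_train[labeled] = y[labeled]

# Graph-based label spreading: class labels are smoothed over a k-NN
# graph, so they vary slowly along the data manifold.
model = LabelSpreading(kernel="knn", n_neighbors=7, alpha=0.2)
model.fit(X, y_train)
acc = float((model.transduction_ == y).mean())
print(f"transductive accuracy with 10 labels: {acc:.2f}")
```

    With only ten labels, the graph term alone recovers the two-moons structure; the paper's contribution is to add local regularizers on top of this kind of global smoothness.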

  12. Towards designing an email classification system using multi-view based semi-supervised learning

    NARCIS (Netherlands)

    Li, Wenjuan; Meng, Weizhi; Tan, Zhiyuan; Xiang, Yang

    2014-01-01

    The goal of email classification is to classify user emails into spam and legitimate ones. Many supervised learning algorithms have been invented in this domain to accomplish the task, and these algorithms require a large number of labeled training data. However, data labeling is a labor-intensive task...

  13. Projected estimators for robust semi-supervised classification

    DEFF Research Database (Denmark)

    Krijthe, Jesse H.; Loog, Marco

    2017-01-01

    For semi-supervised techniques to be applied safely in practice we at least want methods to outperform their supervised counterparts. We study this question for classification using the well-known quadratic surrogate loss function. Unlike other approaches to semi-supervised learning, the procedure proposed in this work does not rely on assumptions that are not intrinsic to the classifier at hand. Using a projection of the supervised estimate onto a set of constraints imposed by the unlabeled data, we find we can safely improve over the supervised solution in terms of this quadratic loss. More specifically, we prove that, measured on the labeled and unlabeled training data, this semi-supervised procedure never gives a higher quadratic loss than the supervised alternative. To our knowledge this is the first approach that offers such strong, albeit conservative, guarantees for improvement over...

  14. Random forest automated supervised classification of Hipparcos periodic variable stars

    Science.gov (United States)

    Dubath, P.; Rimoldini, L.; Süveges, M.; Blomme, J.; López, M.; Sarro, L. M.; De Ridder, J.; Cuypers, J.; Guy, L.; Lecoeur, I.; Nienartowicz, K.; Jan, A.; Beck, M.; Mowlavi, N.; De Cat, P.; Lebzelter, T.; Eyer, L.

    2011-07-01

    We present an evaluation of the performance of an automated classification of the Hipparcos periodic variable stars into 26 types. The sub-sample with the most reliable variability types available in the literature is used to train supervised algorithms to characterize the type dependencies on a number of attributes. The most useful attributes evaluated with the random forest methodology include, in decreasing order of importance, the period, the amplitude, the V-I colour index, the absolute magnitude, the residual around the folded light-curve model, the magnitude distribution skewness and the amplitude of the second harmonic of the Fourier series model relative to that of the fundamental frequency. Random forests and a multi-stage scheme involving Bayesian network and Gaussian mixture methods lead to statistically equivalent results. In standard 10-fold cross-validation (CV) experiments, the rate of correct classification is between 90 and 100 per cent, depending on the variability type. The main mis-classification cases, up to a rate of about 10 per cent, arise due to confusion between SPB and ACV blue variables and between eclipsing binaries, ellipsoidal variables and other variability types. Our training set and the predicted types for the other Hipparcos periodic stars are available online.
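
    The workflow in this record, a random forest evaluated with 10-fold cross-validation and an impurity-based ranking of attributes, can be sketched with scikit-learn; the digits dataset is only a stand-in for the light-curve attribute table (period, amplitude, colour, ...), which is not bundled here.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in data: digits plays the role of the attribute table.
X, y = load_digits(return_X_y=True)
rf = RandomForestClassifier(n_estimators=200, random_state=0)

# 10-fold cross-validated rate of correct classification, as in the study.
scores = cross_val_score(rf, X, y, cv=10)
print(f"10-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Attribute ranking by mean decrease in impurity, the same kind of
# importance measure the study uses to order its attributes.
rf.fit(X, y)
top = np.argsort(rf.feature_importances_)[::-1][:5]
print("top-5 attribute indices:", top)
```

    The ranked importances are what would single out period and amplitude as the dominant attributes on the real Hipparcos feature set.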

  15. Out-of-Sample Generalizations for Supervised Manifold Learning for Classification

    Science.gov (United States)

    Vural, Elif; Guillemot, Christine

    2016-03-01

    Supervised manifold learning methods for data classification map data samples residing in a high-dimensional ambient space to a lower-dimensional domain in a structure-preserving way, while enhancing the separation between different classes in the learned embedding. Most nonlinear supervised manifold learning methods compute the embedding of the manifolds only at the initially available training points, while the generalization of the embedding to novel points, known as the out-of-sample extension problem in manifold learning, becomes especially important in classification applications. In this work, we propose a semi-supervised method for building an interpolation function that provides an out-of-sample extension for general supervised manifold learning algorithms studied in the context of classification. The proposed algorithm computes a radial basis function (RBF) interpolator that minimizes an objective function consisting of the total embedding error of unlabeled test samples, defined as their distance to the embeddings of the manifolds of their own class, as well as a regularization term that controls the smoothness of the interpolation function in a direction-dependent way. The class labels of test data and the interpolation function parameters are estimated jointly with a progressive procedure. Experimental results on face and object images demonstrate the potential of the proposed out-of-sample extension algorithm for the classification of manifold-modeled data sets.
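
    The out-of-sample mechanism can be sketched with SciPy's RBFInterpolator: learn an embedding on training points, then let an RBF interpolator map novel ambient-space points into it. LDA is a deliberately simple linear stand-in for the paper's nonlinear supervised manifold learner, and the direction-dependent regularization is omitted.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Learn a supervised embedding on half the data (LDA as a linear
# stand-in for a nonlinear supervised manifold learner).
X, y = load_iris(return_X_y=True)
X_tr, y_tr, X_te = X[::2], y[::2], X[1::2]
emb = LinearDiscriminantAnalysis(n_components=2).fit(X_tr, y_tr)
Z_tr = emb.transform(X_tr)

# The RBF interpolator is the out-of-sample extension: it maps novel
# points into the learned embedding without re-running the learner.
rbf = RBFInterpolator(X_tr, Z_tr, kernel="thin_plate_spline",
                      smoothing=1e-10)
Z_te = rbf(X_te)
gap = float(np.mean(np.linalg.norm(Z_te - emb.transform(X_te), axis=1)))
print(f"mean gap between interpolated and exact embeddings: {gap:.4f}")
```

    Because LDA is linear and the thin-plate-spline interpolant contains an affine term, the extension here is essentially exact; for a genuinely nonlinear embedding the gap measures the quality of the out-of-sample generalization.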

  16. Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data

    OpenAIRE

    Kurth, Thorsten; Zhang, Jian; Satish, Nadathur; Mitliagkas, Ioannis; Racah, Evan; Patwary, Mostofa Ali; Malas, Tareq; Sundaram, Narayanan; Bhimji, Wahid; Smorkalov, Mikhail; Deslippe, Jack; Shiryaev, Mikhail; Sridharan, Srinivas; Prabhat; Dubey, Pradeep

    2017-01-01

    This paper presents the first 15-PetaFLOP deep learning system for solving scientific pattern classification problems on contemporary HPC architectures. We develop supervised convolutional architectures for discriminating signals in high-energy physics data as well as semi-supervised architectures for localizing and classifying extreme weather in climate data. Our Intel-Caffe-based implementation obtains $\sim$2 TFLOP/s on a single Cori Phase-II Xeon Phi node. We use a hybrid strategy employing...

  17. Use of Sub-Aperture Decomposition for Supervised PolSAR Classification in Urban Area

    Directory of Open Access Journals (Sweden)

    Lei Deng

    2015-01-01

    Full Text Available A novel approach is proposed for classifying polarimetric SAR (PolSAR) data by integrating polarimetric decomposition, sub-aperture decomposition and a decision tree algorithm. It is composed of three key steps: sub-aperture decomposition, feature extraction and combination, and decision tree classification. Feature extraction and combination is the main contribution and innovation of the proposed method. Firstly, the full-resolution PolSAR image and its two sub-aperture images are decomposed to obtain the scattering entropy, average scattering angle and anisotropy, respectively. Then, the difference information between the two sub-aperture images is extracted and combined with the target decomposition features from the full-resolution image to form the classification feature set. Finally, the C5.0 decision tree algorithm is used to classify the PolSAR image. A comparison between the proposed method and the commonly used Wishart supervised classification was made to verify the improvement brought by the proposed method. The overall accuracy using the proposed method was 88.39%, much higher than that using the Wishart supervised classification, which exhibited an overall accuracy of 69.82%. The Kappa coefficient was 0.83, whereas that using the Wishart supervised classification was 0.56. The results indicate that the proposed method performed better than Wishart supervised classification for landscape classification in urban areas using PolSAR data. Further investigation was carried out on the contribution of the difference information to PolSAR classification. It was found that the sub-aperture decomposition effectively improved the classification accuracy of forest, buildings and grassland in high-density urban areas. Compared with the support vector machine (SVM) and QUEST classifiers, the C5.0 decision tree classifier is more efficient in terms of time consumption, feature selection and construction of decision rules.
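
    A minimal sketch of the combine-features-then-classify step, on synthetic stand-in features since the PolSAR imagery is not reproduced here; C5.0 is not available in scikit-learn, so a CART decision tree is used in its place.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-ins: H/A/alpha decomposition features from the
# full-resolution image plus sub-aperture difference features.
rng = np.random.RandomState(0)
n = 1500
X_full = rng.rand(n, 3)            # entropy, mean alpha angle, anisotropy
X_diff = 0.5 * rng.rand(n, 3)      # sub-aperture difference features
X = np.hstack([X_full, X_diff])
y = (X[:, 0] + 2.0 * X[:, 3] > 1.0).astype(int)  # synthetic class rule

# CART stands in for C5.0, which scikit-learn does not provide.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X_tr, y_tr)
acc = tree.score(X_te, y_te)
print(f"test accuracy on synthetic features: {acc:.2f}")
```

    Note that the synthetic label rule deliberately depends on one full-resolution feature and one difference feature, mirroring the paper's finding that the sub-aperture differences carry class information the full-resolution features alone miss.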

  18. Semi-supervised SVM for individual tree crown species classification

    Science.gov (United States)

    Dalponte, Michele; Ene, Liviu Theodor; Marconcini, Mattia; Gobakken, Terje; Næsset, Erik

    2015-12-01

    In this paper a novel semi-supervised SVM classifier is presented, specifically developed for tree species classification at individual tree crown (ITC) level. In ITC tree species classification, all the pixels belonging to an ITC should have the same label. This assumption is used in the learning of the proposed semi-supervised SVM classifier (ITC-S3VM). This method exploits the information contained in the unlabeled ITC samples in order to improve the classification accuracy of a standard SVM. The ITC-S3VM method can be easily implemented using freely available software libraries. The datasets used in this study include hyperspectral imagery and laser scanning data acquired over two boreal forest areas characterized by the presence of three information classes (Pine, Spruce, and Broadleaves). The experimental results quantify the effectiveness of the proposed approach, which provides classification accuracies significantly higher (from 2% to above 27%) than those obtained by the standard supervised SVM and by a state-of-the-art semi-supervised SVM (S3VM). Particularly, by reducing the number of training samples (i.e. from 100% to 25%, and from 100% to 5% for the two datasets, respectively) the proposed method still exhibits results comparable to the ones of a supervised SVM trained with the full available training set. This property of the method makes it particularly suitable for practical forest inventory applications in which collection of in situ information can be very expensive both in terms of cost and time.
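
    The key ITC assumption, that every pixel inside a crown must share one species label, can be sketched by pooling pixel-level SVM predictions with a per-crown majority vote. Data here are synthetic stand-ins (three well-separated "species" in a three-band feature space), and a plain supervised SVC replaces the paper's ITC-S3VM objective.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic crowns: 30 crowns of 12 pixels each, three species.
rng = np.random.RandomState(0)
n_crowns, pix = 30, 12
crown_species = np.tile([0, 1, 2], 10)      # Pine / Spruce / Broadleaves
centers = 2.0 * np.eye(3)                   # class centers in feature space
X = np.vstack([centers[s] + rng.randn(pix, 3) for s in crown_species])
crown_id = np.repeat(np.arange(n_crowns), pix)

# Train on the first 10 crowns only, then predict every pixel.
svm = SVC(kernel="rbf").fit(X[:10 * pix], np.repeat(crown_species[:10], pix))
pixel_pred = svm.predict(X)

# Majority vote within each crown enforces one label per ITC.
crown_pred = np.array([np.bincount(pixel_pred[crown_id == c]).argmax()
                       for c in range(n_crowns)])
acc = float((crown_pred == crown_species).mean())
print(f"crown-level accuracy: {acc:.2f}")
```

    The vote corrects scattered pixel-level errors, which is the same effect the ITC constraint exploits inside the semi-supervised learning itself.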

  19. Comparison of Classification Algorithms and Training Sample Sizes in Urban Land Classification with Landsat Thematic Mapper Imagery

    OpenAIRE

    Congcong Li; Jie Wang; Lei Wang; Luanyun Hu; Peng Gong

    2014-01-01

    Although a large number of new image classification algorithms have been developed, they are rarely tested with the same classification task. In this research, with the same Landsat Thematic Mapper (TM) data set and the same classification scheme over Guangzhou City, China, we tested two unsupervised and 13 supervised classification algorithms, including a number of machine learning algorithms that became popular in remote sensing during the past 20 years. Our analysis focused primarily on ...

  20. Classification of Global Illumination Algorithms

    OpenAIRE

    Lesev, Hristo

    2010-01-01

    This article describes and classifies various approaches for solving the global illumination problem. The classification aims to show the similarities between different types of algorithms. We introduce the concept of Light Manager, as a central element and mediator between illumination algorithms in a heterogeneous environment of a graphical system. We present results and analysis of the implementation of the described ideas.

  1. Enhanced manifold regularization for semi-supervised classification.

    Science.gov (United States)

    Gan, Haitao; Luo, Zhizeng; Fan, Yingle; Sang, Nong

    2016-06-01

    Manifold regularization (MR) has become one of the most widely used approaches in the semi-supervised learning field. It has shown superiority by exploiting the local manifold structure of both labeled and unlabeled data. The manifold structure is modeled by constructing a Laplacian graph and then incorporated in learning through a smoothness regularization term. Hence the labels of labeled and unlabeled data vary smoothly along the geodesics on the manifold. However, MR has ignored the discriminative ability of the labeled and unlabeled data. To address the problem, we propose an enhanced MR framework for semi-supervised classification in which the local discriminative information of the labeled and unlabeled data is explicitly exploited. To make full use of labeled data, we firstly employ a semi-supervised clustering method to discover the underlying data space structure of the whole dataset. Then we construct a local discrimination graph to model the discriminative information of labeled and unlabeled data according to the discovered intrinsic structure. Therefore, the data points that may be from different clusters, though similar on the manifold, are enforced far away from each other. Finally, the discrimination graph is incorporated into the MR framework. In particular, we utilize semi-supervised fuzzy c-means and Laplacian regularized Kernel minimum squared error for semi-supervised clustering and classification, respectively. Experimental results on several benchmark datasets and face recognition demonstrate the effectiveness of our proposed method.

  2. Quality of Service Routing Strategy Using Supervised Genetic Algorithm

    Institute of Scientific and Technical Information of China (English)

    WANG Zhaoxia; SUN Yugeng; WANG Zhiyong; SHEN Huayu

    2007-01-01

    A supervised genetic algorithm (SGA) is proposed to solve quality of service (QoS) routing problems in computer networks. Supervised rules based on intelligent concepts are introduced into genetic algorithms (GAs) to solve the constrained optimization problem. One of the main characteristics of SGA is that its search space can be limited to feasible regions rather than infeasible ones. The superiority of SGA over other GAs lies in the incorporation of supervised search rules in which the information comes from the problem itself. The simulation results show that SGA improves the ability of searching for an optimal solution and accelerates convergence by up to 20 times.

  3. Semi Supervised Weighted K-Means Clustering for Multi Class Data Classification

    Directory of Open Access Journals (Sweden)

    Vijaya Geeta Dharmavaram

    2013-01-01

    Full Text Available Supervised learning techniques require a large number of labeled examples to train a classifier model. Research on semi-supervised learning is motivated by the abundance of unlabeled examples even in domains with a limited number of labeled examples. In such domains a semi-supervised classifier uses the results of clustering for classifier development, since clustering does not rely only on labeled examples as it groups objects based on their similarities. In this paper, the authors propose a new algorithm for semi-supervised classification, namely Semi Supervised Weighted K-Means (SSWKM). In this algorithm, the authors suggest using a weighted Euclidean distance metric, designed according to the purpose of the clustering, for estimating the proximity between a pair of points, and use it for building the semi-supervised classifier. The authors propose a new approach for estimating the feature weights by appropriately adopting the results of multiple discriminant analysis. The proposed method was tested on benchmark datasets from the UCI repository with varied percentages of labeled examples and found to be consistent and promising.

  4. An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species

    Directory of Open Access Journals (Sweden)

    Deborah Galpert

    2015-01-01

    Full Text Available Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles are combined in a supervised pairwise ortholog detection approach to improve effectiveness considering low ortholog ratios in relation to the possible pairwise comparison between two genomes. In this scenario, big data supervised classifiers managing imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with RBH, RSD, and OMA algorithms by using the following yeast genome pairs: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe as benchmark datasets. Because of the large amount of imbalanced data, the building and testing of the supervised model were only possible by using big data supervised classifiers managing imbalance. Evaluation metrics taking low ortholog ratios into account were applied. From the effectiveness perspective, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because of the consideration of gene pair features beyond alignment similarities combined with the advances in big data supervised classification.

  5. Musical Instrument Classification Based on Nonlinear Recurrence Analysis and Supervised Learning

    Directory of Open Access Journals (Sweden)

    R. Rui

    2013-04-01

    Full Text Available In this paper, the phase space reconstruction of time series produced by different instruments is discussed based on nonlinear dynamic theory. The dense ratio, a novel quantitative recurrence parameter, is proposed to describe the differences between wind instruments, stringed instruments and keyboard instruments in the phase space by analyzing the recursive properties of each instrument. Furthermore, a novel supervised learning algorithm for automatic classification of individual musical instrument signals is addressed, derived from the idea of the supervised non-negative matrix factorization (NMF) algorithm. In our approach, the orthogonal basis matrix can be obtained without updating the matrix iteratively, which standard NMF is unable to do. The experimental results indicate that the accuracy of the proposed method is improved by 3% compared with conventional features in individual instrument classification.

  6. A new supervised learning algorithm for spiking neurons.

    Science.gov (United States)

    Xu, Yan; Zeng, Xiaoqin; Zhong, Shuiming

    2013-06-01

    The purpose of supervised learning with temporal encoding for spiking neurons is to make the neurons emit a specific spike train encoded by the precise firing times of spikes. If only the run time is considered, supervised learning for a spiking neuron is equivalent to distinguishing the times of desired output spikes from all other times during the running of the neuron by adjusting synaptic weights, which can be regarded as a classification problem. Based on this idea, this letter proposes a new supervised learning method for spiking neurons with temporal encoding; it first transforms the supervised learning into a classification problem and then solves the problem by using the perceptron learning rule. The experimental results show that the proposed method has higher learning accuracy and efficiency than the existing learning methods, so it is more powerful for solving complex and real-time problems.
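
    The fire-versus-don't-fire reformulation can be sketched with a plain perceptron: each time step gets a feature vector of presynaptic trace values, the desired spike times form the positive class, and weights are adjusted only on misclassified steps. All data here are synthetic, and the separability bump added to feature 0 is an illustrative assumption, not a neuron model.

```python
import numpy as np

rng = np.random.RandomState(0)
T, n_syn = 100, 20
psp = rng.rand(T, n_syn)              # synthetic presynaptic trace values
desired = np.zeros(T, dtype=bool)
desired[[20, 50, 80]] = True          # desired output spike times
psp[desired, 0] += 2.0                # make target times separable (synthetic)
target = np.where(desired, 1.0, -1.0)

# Perceptron rule on the "fire at time t or not" classification.
w, theta, lr = np.zeros(n_syn), 0.0, 0.1
for _ in range(500):
    for t in range(T):
        fired = 1.0 if psp[t] @ w - theta > 0 else -1.0
        if fired != target[t]:        # update only on misclassification
            w += lr * target[t] * psp[t]
            theta -= lr * target[t]
errors = sum((1.0 if psp[t] @ w - theta > 0 else -1.0) != target[t]
             for t in range(T))
print("misclassified time steps after training:", errors)
```

    Because the synthetic data are linearly separable, the perceptron convergence theorem guarantees the rule eventually emits spikes exactly at the desired times and nowhere else.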

  7. Supervised Classification Methods for Seismic Phase Identification

    Science.gov (United States)

    Schneider, Jeff; Given, Jeff; Le Bras, Ronan; Fisseha, Misrak

    2010-05-01

    The Comprehensive Nuclear Test Ban Treaty Organization (CTBTO) is tasked with monitoring compliance with the CTBT. The organization is installing the International Monitoring System (IMS), a global network of seismic, hydroacoustic, infrasound, and radionuclide sensor stations. The International Data Centre (IDC) receives the data from seismic stations either in real time or on request. These data are first processed on a station per station basis. This initial step yields discrete detections which are then assembled on a network basis (with the addition of hydroacoustic and infrasound data) to produce automatic and analyst-reviewed bulletins containing seismic, hydroacoustic, and infrasound detections. The initial station processing step includes the identification of seismic and acoustic phases, which are given a label. Subsequent network processing relies on this preliminary labeling, and as a consequence, the accuracy and reliability of automatic and reviewed bulletins also depend on this initial step. A very large ground truth database containing massive amounts of detections with analyst-reviewed labels is available to improve on the current operational system using machine learning methods. An initial study using a limited amount of data was conducted during the ISS09 project of the CTBTO. Several classification methods were tested: decision trees with bagging; logistic regression; neural networks trained with back-propagation; Bayesian networks as generative class models; naive Bayes classification; and support vector machines. The initial assessment was that the phase identification process could be improved by at least 13% over the current operational system and that the method obtaining the best results was the decision tree with bagging. We present the results of a study using a much larger learning dataset and preliminary implementation results.
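
    The winning method from the initial study, a decision tree with bagging, can be sketched against a simple baseline. The features below are a synthetic stand-in, since the IDC ground-truth database itself is not public; four classes play the role of phase labels.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for station-level detection features.
X, y = make_classification(n_samples=2000, n_features=15, n_informative=8,
                           n_classes=4, n_clusters_per_class=1,
                           random_state=0)

# Bagged decision trees versus a logistic-regression baseline,
# both scored with 5-fold cross-validation.
results = {}
for name, clf in [
        ("bagged trees", BaggingClassifier(DecisionTreeClassifier(),
                                           n_estimators=50, random_state=0)),
        ("logistic regression", LogisticRegression(max_iter=2000))]:
    results[name] = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {results[name]:.3f}")
```

    Bagging averages many deep, high-variance trees, which is what made it competitive on the heterogeneous phase-identification features in the ISS09 study.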

  8. Semi-supervised Learning for Photometric Supernova Classification

    CERN Document Server

    Richards, Joseph W; Freeman, Peter E; Schafer, Chad M; Poznanski, Dovi

    2011-01-01

    We present a semi-supervised method for photometric supernova typing. Our approach is to first use the nonlinear dimension reduction technique diffusion map to detect structure in a database of supernova light curves and subsequently employ random forest classification on a spectroscopically confirmed training set to learn a model that can predict the type of each newly observed supernova. We demonstrate that this is an effective method for supernova typing. As supernova numbers increase, our semi-supervised method efficiently utilizes this information to improve classification, a property not enjoyed by template based methods. Applied to supernova data simulated by Kessler et al. (2010b) to mimic those of the Dark Energy Survey, our methods achieve (cross-validated) 96% Type Ia purity and 86% Type Ia efficiency on the spectroscopic sample, but only 56% Type Ia purity and 48% efficiency on the photometric sample due to their spectroscopic followup strategy. To improve the performance on the photometric sample...

  9. Supervised and unsupervised classification - The case of IRAS point sources

    Science.gov (United States)

    Adorf, Hans-Martin; Meurs, E. J. A.

    Progress is reported on a project which aims at mapping the extragalactic sky in order to derive the large scale distribution of luminous matter. The approach consists in selecting from the IRAS Point Source Catalog a set of galaxies which is as clean and as complete as possible. The decision and discrimination problems involved lend themselves to a treatment using methods from multivariate statistics, in particular statistical pattern recognition. Two different approaches, one based on supervised Bayesian classification, the other on unsupervised data-driven classification, are presented and some preliminary results are reported.

  10. Phenotype classification of zebrafish embryos by supervised learning.

    Directory of Open Access Journals (Sweden)

    Nathalie Jeanray

    Full Text Available Zebrafish is increasingly used to assess biological properties of chemical substances and thus is becoming a specific tool for toxicological and pharmacological studies. The effects of chemical substances on embryo survival and development are generally evaluated manually through microscopic observation by an expert and documented by several typical photographs. Here, we present a methodology to automatically classify brightfield images of wild-type zebrafish embryos according to their defects by using an image analysis approach based on supervised machine learning. We show that, compared to manual classification, automatic classification results in 90 to 100% agreement with consensus voting of biological experts in nine out of eleven considered defects in 3-day-old zebrafish larvae. Automation of the analysis and classification of zebrafish embryo pictures reduces the workload and time required for the biological expert and increases the reproducibility and objectivity of this classification.

  11. Phenotype classification of zebrafish embryos by supervised learning.

    Science.gov (United States)

    Jeanray, Nathalie; Marée, Raphaël; Pruvot, Benoist; Stern, Olivier; Geurts, Pierre; Wehenkel, Louis; Muller, Marc

    2015-01-01

    Zebrafish is increasingly used to assess biological properties of chemical substances and thus is becoming a specific tool for toxicological and pharmacological studies. The effects of chemical substances on embryo survival and development are generally evaluated manually through microscopic observation by an expert and documented by several typical photographs. Here, we present a methodology to automatically classify brightfield images of wild-type zebrafish embryos according to their defects by using an image analysis approach based on supervised machine learning. We show that, compared to manual classification, automatic classification results in 90 to 100% agreement with consensus voting of biological experts in nine out of eleven considered defects in 3-day-old zebrafish larvae. Automation of the analysis and classification of zebrafish embryo pictures reduces the workload and time required for the biological expert and increases the reproducibility and objectivity of this classification.

  12. Improved supervised classification of accelerometry data to distinguish behaviors of soaring birds

    Science.gov (United States)

    Sur, Maitreyi; Suffredini, Tony; Wessells, Stephen M.; Bloom, Peter H; Lanzone, Michael; Blackshire, Sheldon; Sridhar, Srisarguru; Katzner, Todd

    2017-01-01

    Soaring birds can balance the energetic costs of movement by switching between flapping, soaring and gliding flight. Accelerometers can allow quantification of flight behavior and thus a context to interpret these energetic costs. However, models to interpret accelerometry data are still being developed, rarely trained with supervised datasets, and difficult to apply. We collected accelerometry data at 140 Hz from a trained golden eagle (Aquila chrysaetos) whose flight we recorded with video that we used to characterize behavior. We applied two forms of supervised classification, random forest (RF) models and K-nearest neighbor (KNN) models. The KNN model was substantially easier to implement than the RF approach but both were highly accurate in classifying basic behaviors such as flapping (85.5% and 83.6% accurate, respectively), soaring (92.8% and 87.6%) and sitting (84.1% and 88.9%), with overall accuracies of 86.6% and 92.3% respectively. More detailed classification schemes, with specific behaviors such as banking and straight flights, were well classified only by the KNN model (91.24% accurate; RF = 61.64% accurate). The RF model maintained its classification accuracy for basic behaviors at sampling frequencies as low as 10 Hz, the KNN model at sampling frequencies as low as 20 Hz. Classification of accelerometer data collected from free-ranging birds demonstrated a strong dependence of predicted behavior on the type of classification model used. Our analyses demonstrate the consequences of different approaches to classification of accelerometry data, the potential to optimize classification algorithms with validated flight behaviors to improve classification accuracy, ideal sampling frequencies for different classification algorithms, and a number of ways to improve commonly used analytical techniques and best practices for classification of accelerometry data.
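
    The RF-versus-KNN comparison can be sketched on synthetic windowed-accelerometer summary features (three per window) for three behaviours; the eagle data themselves are not public, so the class centers below are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic per-window features for three behaviours.
rng = np.random.RandomState(0)
n_per = 300
centers = {"flap": [1.0, 2.0, 0.8],
           "soar": [0.2, 0.3, 0.1],
           "sit":  [0.0, 0.05, 0.0]}
X = np.vstack([np.asarray(c) + 0.1 * rng.randn(n_per, 3)
               for c in centers.values()])
y = np.repeat(np.arange(3), n_per)

# Same train/test split for both classifiers, as in the paper's
# side-by-side evaluation.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)
accs = {}
for name, clf in [("KNN", KNeighborsClassifier(n_neighbors=5)),
                  ("RF", RandomForestClassifier(n_estimators=100,
                                                random_state=0))]:
    accs[name] = clf.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name} accuracy: {accs[name]:.2f}")
```

    On cleanly separated synthetic classes both models score similarly; the paper's point is that their relative performance diverges on finer-grained behaviours and lower sampling rates.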

  13. [RVM supervised feature extraction and Seyfert spectra classification].

    Science.gov (United States)

    Li, Xiang-Ru; Hu, Zhan-Yi; Zhao, Yong-Heng; Li, Xiao-Ming

    2009-06-01

    With recent technological advances in wide-field survey astronomy and the implementation of several large-scale astronomical survey proposals (e.g. SDSS, 2dF and LAMOST), celestial spectra are becoming very abundant and rich. Therefore, research on automated classification methods based on celestial spectra has been attracting more and more attention in recent years. Feature extraction is a fundamental problem in automated spectral classification, which not only influences the difficulty and complexity of the problem, but also determines the performance of the designed classifying system. The available methods of feature extraction for spectra classification are usually unsupervised, e.g. principal component analysis (PCA), wavelet transform (WT), artificial neural networks (ANN) and Rough Set theory. These methods extract features not by their capability to classify spectra, but by some kind of power to approximate the original celestial spectra. Therefore, the features extracted by these methods are usually not the best ones for classification. In the present work, the authors pointed out the necessity of investigating supervised feature extraction by analyzing the characteristics of the spectra classification research in the available literature and the limitations of unsupervised feature extraction methods. The authors also studied supervised feature extraction based on the relevance vector machine (RVM) and its application in Seyfert spectra classification. RVM is a recently introduced method based on Bayesian methodology, automatic relevance determination (ARD), regularization techniques and a hierarchical prior structure. With this method, the authors can easily fuse the information in the training data, their prior knowledge and beliefs about the problem, etc. RVM can also effectively extract the features and reduce the data based on classifying capability. Extensive experiments show its superior performance in dimensional reduction and feature extraction for Seyfert spectra.

  14. Developing a supervised training algorithm for limited precision feed-forward spiking neural networks

    CERN Document Server

    Stromatias, Evangelos

    2011-01-01

    Spiking neural networks have been referred to as the third generation of artificial neural networks, in which information is coded in the timing of spikes. There are a number of different spiking neuron models available, categorized by their level of abstraction. In addition, there are two known learning methods, unsupervised and supervised learning. This thesis focuses on supervised learning, for which a new algorithm based on genetic algorithms is proposed. The proposed algorithm is able to train both synaptic weights and delays and also allows each neuron to emit multiple spikes, thus taking full advantage of the spatial-temporal coding power of spiking neurons. In addition, limited synaptic precision is applied; only six bits are used to describe and train a synapse, three bits for the weights and three bits for the delays. Two limited precision schemes are investigated. The proposed algorithm is tested on the XOR classification problem where it produces better results for even smaller netwo...

  15. Detection and Evaluation of Cheating on College Exams using Supervised Classification

    Directory of Open Access Journals (Sweden)

    Elmano Ramalho CAVALCANTI

    2012-10-01

    Full Text Available Text mining has been used for various purposes, such as document classification and extraction of domain-specific information from text. In this paper we present a study in which text mining methodology and algorithms were properly employed for academic dishonesty (cheating) detection and evaluation on open-ended college exams, based on document classification techniques. Firstly, we propose two classification models for cheating detection by using a decision tree supervised algorithm. Then, both classifiers are compared against the result produced by a domain expert. The results point out that one of the classifiers achieved excellent quality in detecting and evaluating cheating in exams, making its use possible in real school and college environments.
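
A toy sketch of the idea: represent pairs of open-ended exam answers by a simple text-overlap feature and train a decision tree to flag likely cheating. The answer pairs and the single cosine-similarity feature are illustrative assumptions, not the paper's actual feature set.

```python
# Decision tree for pairwise cheating detection on tiny invented data.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.tree import DecisionTreeClassifier

pairs = [
    ("the mitochondria is the powerhouse of the cell",
     "the mitochondria is the powerhouse of the cell", 1),  # verbatim copy
    ("osmosis moves water across a membrane",
     "water crosses a membrane by osmosis", 1),             # paraphrased copy
    ("enzymes lower activation energy",
     "photosynthesis occurs in chloroplasts", 0),           # independent answers
    ("dna stores genetic information",
     "proteins are chains of amino acids", 0),
]

vec = TfidfVectorizer().fit([t for a, b, _ in pairs for t in (a, b)])

def pair_features(a, b):
    va, vb = vec.transform([a]), vec.transform([b])
    return [cosine_similarity(va, vb)[0, 0]]

X = np.array([pair_features(a, b) for a, b, _ in pairs])
y = np.array([label for _, _, label in pairs])

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
train_acc = clf.score(X, y)
```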

  16. Supervised learning algorithms for visual object categorization

    NARCIS (Netherlands)

    bin Abdullah, A.

    2010-01-01

    This thesis presents novel techniques for image recognition systems for better understanding image content. More specifically, it looks at the algorithmic aspects and experimental verification to demonstrate the capability of the proposed algorithms. These techniques aim to improve the three major

  17. Supervised and Unsupervised Classification for Pattern Recognition Purposes

    Directory of Open Access Journals (Sweden)

    Catalina COCIANU

    2006-01-01

    Full Text Available A cluster analysis task has to identify the grouping trends of data, to decide on the sound clusters, as well as to somehow validate the resulting structure. The identification of the grouping tendency existing in a data collection assumes the selection of a framework stated in terms of a mathematical model allowing one to express the similarity degree between couples of particular objects, with quasi-metrics expressing the similarity between an object and a cluster and between clusters, respectively. In supervised classification, we are provided with a collection of preclassified patterns, and the problem is to label a newly encountered pattern. Typically, the given training patterns are used to learn the descriptions of classes, which in turn are used to label a new pattern. The final section of the paper presents a new methodology for supervised learning based on PCA. The classes are represented in the measurement/feature space by continuous repartitions

  18. Artificial neural network classification using a minimal training set - Comparison to conventional supervised classification

    Science.gov (United States)

    Hepner, George F.; Logan, Thomas; Ritter, Niles; Bryant, Nevin

    1990-01-01

    Recent research has shown an artificial neural network (ANN) to be capable of pattern recognition and the classification of image data. This paper examines the potential for the application of neural network computing to satellite image processing. A second objective is to provide a preliminary comparison of ANN classification with conventional supervised classification. An artificial neural network can be trained to do land-cover classification of satellite imagery using selected sites representative of each class in a manner similar to conventional supervised classification. One of the major problems associated with recognition and classification of patterns from remotely sensed data is the time and cost of developing a set of training sites. This research compares the use of an ANN back-propagation classification procedure with a conventional supervised maximum likelihood classification procedure using a minimal training set. When using a minimal training set, the neural network is able to provide a land-cover classification superior to the classification derived from the conventional classification procedure. This research is the foundation for developing application parameters for further prototyping of software and hardware implementations for artificial neural networks in satellite image and geographic information processing.
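
A hedged sketch contrasting the two procedures named above: a small back-propagation network versus a conventional Gaussian maximum-likelihood classifier (approximated here by quadratic discriminant analysis), both trained on a deliberately minimal training set. The three-band "land-cover" data are simulated.

```python
# ANN vs. Gaussian maximum-likelihood on a minimal training set.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(1)
centers = np.array([[0, 0, 0], [3, 3, 0], [0, 3, 3]])  # 3 cover classes, 3 "bands"
X = np.vstack([rng.normal(c, 1.0, size=(60, 3)) for c in centers])
y = np.repeat([0, 1, 2], 60)

# minimal training set: 5 pixels per class
train = np.r_[0:5, 60:65, 120:125]
test = np.setdiff1d(np.arange(180), train)

ann = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
mlc = QuadraticDiscriminantAnalysis()
ann_acc = ann.fit(X[train], y[train]).score(X[test], y[test])
mlc_acc = mlc.fit(X[train], y[train]).score(X[test], y[test])
```

With only five samples per class the Gaussian classifier's covariance estimates are poorly conditioned, which is the regime in which the paper reports the ANN pulling ahead.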

  19. Classification of ETM+ Remote Sensing Image Based on Hybrid Algorithm of Genetic Algorithm and Back Propagation Neural Network

    Directory of Open Access Journals (Sweden)

    Haisheng Song

    2013-01-01

    Full Text Available The back propagation neural network (BPNN) algorithm can be used as a supervised classification in the processing of remote sensing image classification. But its defects are obvious: falling into the local minimum value easily, slow convergence speed, and being difficult to determine intermediate hidden layer nodes. Genetic algorithm (GA) has the advantages of global optimization and not falling into local minimum values easily, but it has the disadvantage of poor local searching capability. This paper uses GA to generate the initial structure of the BPNN. Then, a stable, efficient, and fast BP classification network is obtained by fine-tuning the improved BP algorithm. Finally, we use the hybrid algorithm to execute classification on a remote sensing image and compare it with the improved BP algorithm and the traditional maximum likelihood classification (MLC) algorithm. Results of experiments show that the hybrid algorithm outperforms the improved BP algorithm and the MLC algorithm.
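
A rough sketch of the hybrid idea: a tiny genetic algorithm searches over the BP network's initial structure (hidden-layer size here, as a simplified stand-in for the paper's GA-generated structure), and the winner is then fine-tuned by ordinary back-propagation. Population size, mutation range and generation count are arbitrary choices for the demo.

```python
# GA over network structure, then BP fine-tuning of the best candidate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)
Xtr, Xval, ytr, yval = train_test_split(X, y, random_state=0)

def fitness(hidden):
    net = MLPClassifier(hidden_layer_sizes=(hidden,), max_iter=50,
                        random_state=0)  # short training = cheap fitness
    return net.fit(Xtr, ytr).score(Xval, yval)

rng = np.random.default_rng(0)
pop = rng.integers(2, 33, size=6)            # initial population of hidden sizes
for _ in range(3):                           # a few GA generations
    scores = np.array([fitness(h) for h in pop])
    parents = pop[np.argsort(scores)[-3:]]   # selection of the fittest
    children = [max(2, p + rng.integers(-4, 5)) for p in parents]  # mutation
    pop = np.concatenate([parents, children])

best = pop[np.argmax([fitness(h) for h in pop])]
final = MLPClassifier(hidden_layer_sizes=(int(best),), max_iter=1000,
                      random_state=0).fit(Xtr, ytr)
hybrid_acc = final.score(Xval, yval)
```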

  20. Weakly supervised histopathology cancer image segmentation and classification.

    Science.gov (United States)

    Xu, Yan; Zhu, Jun-Yan; Chang, Eric I-Chao; Lai, Maode; Tu, Zhuowen

    2014-04-01

    Labeling a histopathology image as having cancerous regions or not is a critical task in cancer diagnosis; it is also clinically important to segment the cancer tissues and cluster them into various classes. Existing supervised approaches for image classification and segmentation require detailed manual annotations for the cancer pixels, which are time-consuming to obtain. In this paper, we propose a new learning method, multiple clustered instance learning (MCIL) (along the line of weakly supervised learning) for histopathology image segmentation. The proposed MCIL method simultaneously performs image-level classification (cancer vs. non-cancer image), medical image segmentation (cancer vs. non-cancer tissue), and patch-level clustering (different classes). We embed the clustering concept into the multiple instance learning (MIL) setting and derive a principled solution to performing the above three tasks in an integrated framework. In addition, we introduce contextual constraints as a prior for MCIL, which further reduces the ambiguity in MIL. Experimental results on histopathology colon cancer images and cytology images demonstrate the great advantage of MCIL over the competing methods.

  1. Semi-Supervised Learning for Classification of Protein Sequence Data

    Directory of Open Access Journals (Sweden)

    Brian R. King

    2008-01-01

    Full Text Available Protein sequence data continue to become available at an exponential rate. Annotation of functional and structural attributes of these data lags far behind, with only a small fraction of the data understood and labeled by experimental methods. Classification methods that are based on semi-supervised learning can increase the overall accuracy of classifying partly labeled data in many domains, but very few methods exist that have shown their effect on protein sequence classification. We show how proven methods from text classification can be applied to protein sequence data, as we consider both existing and novel extensions to the basic methods, and demonstrate restrictions and differences that must be considered. We demonstrate comparative results against the transductive support vector machine, and show superior results on the most difficult classification problems. Our results show that large repositories of unlabeled protein sequence data can indeed be used to improve predictive performance, particularly in situations where there are fewer labeled protein sequences available, and/or the data are highly unbalanced in nature.
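
A sketch of the semi-supervised idea on protein-like sequences: encode each sequence as k-mer counts (a standard text-classification trick carried over to sequence data) and let self-training exploit the unlabeled examples. The two "families" and their residue biases are invented for the demo, and self-training is only one of several semi-supervised schemes the paper considers.

```python
# Self-training with Naive Bayes on k-mer features of synthetic sequences.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)

def sample_seq(bias):
    return "".join(rng.choice(list("ACDEFGHIKL"), size=40, p=bias))

bias_a = np.array([4, 4, 1, 1, 1, 1, 1, 1, 1, 1], float); bias_a /= bias_a.sum()
bias_b = np.array([1, 1, 1, 1, 1, 1, 1, 1, 4, 4], float); bias_b /= bias_b.sum()
seqs = [sample_seq(bias_a) for _ in range(50)] + [sample_seq(bias_b) for _ in range(50)]
labels = np.array([0] * 50 + [1] * 50)

X = CountVectorizer(analyzer="char", ngram_range=(2, 2)).fit_transform(seqs)

# keep only 5 labels per class; mark the rest unlabeled (-1)
y_semi = np.full(100, -1)
y_semi[:5], y_semi[50:55] = 0, 1

model = SelfTrainingClassifier(MultinomialNB()).fit(X, y_semi)
semi_acc = model.score(X, labels)
```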

  2. Supervised Cross-Modal Factor Analysis for Multiple Modal Data Classification

    KAUST Repository

    Wang, Jingbin

    2015-10-09

    In this paper we study the problem of learning from multiple modal data for the purpose of document classification. In this problem, each document is composed of two different modalities of data, i.e., an image and a text. Cross-modal factor analysis (CFA) has been proposed to project the two different modalities of data to a shared data space, so that the classification of an image or a text can be performed directly in this space. A disadvantage of CFA is that it ignores the supervision information. In this paper, we improve CFA by incorporating the supervision information to represent and classify both the image and text modalities of documents. We project both image and text data to a shared data space by factor analysis, and then train a class label predictor in the shared space to use the class label information. The factor analysis parameter and the predictor parameter are learned jointly by solving one single objective function. With this objective function, we minimize the distance between the projections of the image and text of the same document, and the classification error of the projection measured by a hinge loss function. The objective function is optimized by an alternate optimization strategy in an iterative algorithm. Experiments on two different multiple modal document data sets show the advantage of the proposed algorithm over other CFA methods.

  3. A supervised contextual classifier based on a region-growth algorithm

    DEFF Research Database (Denmark)

    Lira, Jorge; Maletti, Gabriela Mariel

    2002-01-01

    A supervised classification scheme to segment optical multi-spectral images has been developed. In this classifier, an automated region-growth algorithm delineates the training sets. This algorithm handles three parameters: an initial pixel seed, a window size and a threshold for each class. A suitable pixel seed is manually implanted through visual inspection of the image classes. The best values for the window and the threshold are obtained from a spectral distance and heuristic criteria. This distance is calculated from a mathematical model of spectral separability. The grown regions therefore constitute suitable training sets for each class. Comparing the statistical behavior of the pixel population of a sliding window with that of each class performs the classification. For region-growth, a window size is employed for each class. For classification, a pixel is incorporated...

  4. Classification algorithms using adaptive partitioning

    KAUST Repository

    Binev, Peter

    2014-12-01

    © 2014 Institute of Mathematical Statistics. Algorithms for binary classification based on adaptive tree partitioning are formulated and analyzed for both their risk performance and their friendliness to numerical implementation. The algorithms can be viewed as generating a set approximation to the Bayes set and thus fall into the general category of set estimators. In contrast with the most studied tree-based algorithms, which utilize piecewise constant approximation on the generated partition [IEEE Trans. Inform. Theory 52 (2006) 1335–1353; Mach. Learn. 66 (2007) 209–242], we consider decorated trees, which allow us to derive higher order methods. Convergence rates for these methods are derived in terms of a parameter of the margin conditions and a rate s of best approximation of the Bayes set by decorated adaptive partitions. They can also be expressed in terms of the Besov smoothness β of the regression function that governs its approximability by piecewise polynomials on adaptive partitions. The execution of the algorithms does not require knowledge of the smoothness or margin conditions. Besov smoothness conditions are weaker than the commonly used Hölder conditions, which govern approximation by nonadaptive partitions, and therefore for a given regression function can result in a higher rate of convergence. This in turn mitigates the compatibility conflict between smoothness and margin parameters.

  5. Automatic age and gender classification using supervised appearance model

    Science.gov (United States)

    Bukar, Ali Maina; Ugail, Hassan; Connah, David

    2016-11-01

    Age and gender classification are two important problems that recently gained popularity in the research community, due to their wide range of applications. Research has shown that both age and gender information are encoded in the face shape and texture, hence the active appearance model (AAM), a statistical model that captures shape and texture variations, has been one of the most widely used feature extraction techniques for the aforementioned problems. However, AAM suffers from some drawbacks, especially when used for classification. This is primarily because principal component analysis (PCA), which is at the core of the model, works in an unsupervised manner, i.e., PCA dimensionality reduction does not take into account how the predictor variables relate to the response (class labels). Rather, it explores only the underlying structure of the predictor variables, thus, it is no surprise if PCA discards valuable parts of the data that represent discriminatory features. Toward this end, we propose a supervised appearance model (sAM) that improves on AAM by replacing PCA with partial least-squares regression. This feature extraction technique is then used for the problems of age and gender classification. Our experiments show that sAM has better predictive power than the conventional AAM.

  6. Sentiment Analysis of Twitter tweets using supervised classification technique

    Directory of Open Access Journals (Sweden)

    Pranav Waykar

    2016-05-01

    Full Text Available Making use of social media for analyzing the perceptions of the masses over a product, event or a person has gained momentum in recent times. Out of a wide array of social networks, we chose Twitter for our analysis as the opinions expressed there are concise and bear a distinctive polarity. Here, we collect the most recent tweets on users' area of interest and analyze them. The extracted tweets are then segregated as positive, negative and neutral. We do the classification in the following manner: collect the tweets using the Twitter API; then process the collected tweets to convert all letters to lowercase, eliminate special characters, etc., which makes the classification more efficient; the processed tweets are classified using a supervised classification technique. We make use of a Naive Bayes classifier to segregate the tweets as positive, negative and neutral. We use a set of sample tweets to train the classifier. The percentage of tweets in each category is then computed and the result is represented graphically. The result can be used further to gain an insight into the views of the people using Twitter about a particular topic that is being searched by the user. It can help corporate houses devise strategies on the basis of the popularity of their product among the masses. It may help consumers to make informed choices based on the general sentiment expressed by Twitter users on a product.
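
A minimal sketch of the described pipeline: lowercase the text, strip special characters, then train a Naive Bayes classifier on a handful of hand-labelled sample "tweets". The tiny corpus is invented; a real run would pull recent tweets through the Twitter API instead.

```python
# Preprocess + Naive Bayes three-way sentiment classification.
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

def preprocess(tweet):
    tweet = tweet.lower()
    return re.sub(r"[^a-z0-9\s]", "", tweet)   # drop special characters

train_tweets = [
    "Love this phone, great battery! :)", "Absolutely fantastic service",
    "Worst update ever, totally broken", "Terrible camera, very disappointed",
    "The package arrived on Tuesday", "Store opens at 9am",
]
train_labels = ["positive", "positive", "negative", "negative",
                "neutral", "neutral"]

vec = CountVectorizer()
X = vec.fit_transform(preprocess(t) for t in train_tweets)
clf = MultinomialNB().fit(X, train_labels)

pred = clf.predict(vec.transform([preprocess("great phone, love it")]))[0]
```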

  7. Genomic-enabled prediction with classification algorithms.

    Science.gov (United States)

    Ornella, L; Pérez, P; Tapia, E; González-Camacho, J M; Burgueño, J; Zhang, X; Singh, S; Vicente, F S; Bonnett, D; Dreisigacker, S; Singh, R; Long, N; Crossa, J

    2014-06-01

    Pearson's correlation coefficient (ρ) is the most commonly reported metric of the success of prediction in genomic selection (GS). However, in real breeding ρ may not be very useful for assessing the quality of the regression in the tails of the distribution, where individuals are chosen for selection. This research used 14 maize and 16 wheat data sets with different trait-environment combinations. Six different models were evaluated by means of a cross-validation scheme (50 random partitions each, with 90% of the individuals in the training set and 10% in the testing set). The predictive accuracy of these algorithms for selecting individuals belonging to the best α=10, 15, 20, 25, 30, 35, 40% of the distribution was estimated using Cohen's kappa coefficient (κ) and an ad hoc measure, which we call relative efficiency (RE), which indicates the expected genetic gain due to selection when individuals are selected based on GS exclusively. We put special emphasis on the analysis for α=15%, because it is a percentile commonly used in plant breeding programmes (for example, at CIMMYT). We also used ρ as a criterion for overall success. The algorithms used were: Bayesian LASSO (BL), Ridge Regression (RR), Reproducing Kernel Hilbert Spaces (RKHS), Random Forest Regression (RFR), and Support Vector Regression (SVR) with linear (lin) and Gaussian (rbf) kernels. The performance of regression methods for selecting the best individuals was compared with that of three supervised classification algorithms: Random Forest Classification (RFC) and Support Vector Classification (SVC) with linear (lin) and Gaussian (rbf) kernels. Classification methods were evaluated using the same cross-validation scheme but with the response vector of the original training sets dichotomised using a given threshold. For α=15%, SVC-lin presented the highest κ coefficients in 13 of the 14 maize data sets, with best values ranging from 0.131 to 0.722 (statistically significant in 9 data sets
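
A sketch of the evaluation idea: dichotomise a continuous trait at the best α = 15% and score a predictor with Cohen's kappa. The "relative efficiency" computed here is a simplified stand-in for the paper's RE measure: realised mean of the true values among the predicted-best individuals, divided by the mean among the truly best. The simulated breeding values and prediction noise are invented.

```python
# Kappa and a simplified relative-efficiency measure for top-alpha selection.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
true = rng.normal(size=500)                     # true breeding values
pred = true + rng.normal(scale=0.7, size=500)   # noisy GS predictions

alpha = 0.15
cut_t = np.quantile(true, 1 - alpha)
cut_p = np.quantile(pred, 1 - alpha)
top_true, top_pred = true >= cut_t, pred >= cut_p

kappa = cohen_kappa_score(top_true, top_pred)
re_gain = true[top_pred].mean() / true[top_true].mean()  # relative efficiency
```

Note how κ can be modest even when Pearson's ρ between `true` and `pred` is high, which is exactly the abstract's motivation for looking beyond ρ.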

  8. Supervised neural networks for the classification of structures.

    Science.gov (United States)

    Sperduti, A; Starita, A

    1997-01-01

    Standard neural networks and statistical methods are usually believed to be inadequate when dealing with complex structures because of their feature-based approach. In fact, feature-based approaches usually fail to give satisfactory solutions because of the sensitivity of the approach to the a priori selection of the features, and the incapacity to represent any specific information on the relationships among the components of the structures. However, we show that neural networks can, in fact, represent and classify structured patterns. The key idea underpinning our approach is the use of the so called "generalized recursive neuron", which is essentially a generalization to structures of a recurrent neuron. By using generalized recursive neurons, all the supervised networks developed for the classification of sequences, such as backpropagation through time networks, real-time recurrent networks, simple recurrent networks, recurrent cascade correlation networks, and neural trees can, on the whole, be generalized to structures. The results obtained by some of the above networks (with generalized recursive neurons) on the classification of logic terms are presented.

  9. Semi-Supervised Classification based on Gaussian Mixture Model for remote imagery

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    Semi-Supervised Classification (SSC), which makes use of both labeled and unlabeled data to determine classification borders in feature space, has great advantages in extracting classification information from mass data. In this paper, a novel SSC method based on the Gaussian Mixture Model (GMM) is proposed, in which each class's feature space is described by one GMM. Experiments show the proposed method can achieve high classification accuracy with a small amount of labeled data. However, for the same accuracy, supervised classification methods such as Support Vector Machine, Object-Oriented Classification, etc. should be provided with much more labeled data.
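
A bare-bones sketch of the core representation: one Gaussian mixture per class scores a sample's features, and the class with the higher likelihood wins. (The paper additionally refines the mixtures using unlabeled data; this sketch only shows the per-class GMM classifier on invented 2-D features, with one deliberately bimodal class.)

```python
# One GMM per class; classify by maximum per-class likelihood.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X0 = np.vstack([rng.normal([0, 0], 0.5, (40, 2)),
                rng.normal([4, 0], 0.5, (40, 2))])   # class 0: two modes
X1 = rng.normal([2, 3], 0.5, (80, 2))                # class 1: single blob

gmm0 = GaussianMixture(n_components=2, random_state=0).fit(X0)
gmm1 = GaussianMixture(n_components=1, random_state=0).fit(X1)

def classify(pts):
    # score_samples returns per-point log-likelihood under each class model
    return (gmm1.score_samples(pts) > gmm0.score_samples(pts)).astype(int)

test_pts = np.vstack([X0, X1])
true = np.r_[np.zeros(80, int), np.ones(80, int)]
gmm_acc = (classify(test_pts) == true).mean()
```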

  10. Establishing a Supervised Classification of Global Blue Carbon Mangrove Ecosystems

    Science.gov (United States)

    Baltezar, P.

    2016-12-01

    Understanding change in mangroves over time will aid forest management systems working to protect them from overexploitation. Mangroves are one of the most carbon-dense terrestrial ecosystems on the planet and are therefore a high priority for sustainable forest management. Although they represent 1% of terrestrial cover, they could account for about 10% of global carbon emissions. The foundation of this analysis uses remote sensing to establish a supervised classification of mangrove forests for discrete regions in the Zambezi Delta of Mozambique and the Rufiji Delta of Tanzania. Open-source mapping platforms provided a dynamic space for analyzing satellite imagery in the Google Earth Engine (GEE) coding environment. C-band Synthetic Aperture Radar data from Sentinel-1 was used in the model as a mask by optimizing SAR parameters. Exclusion metrics identified within Global Land Surface Temperature data from MODIS and the Shuttle Radar Topography Mission were used to accentuate mangrove features. Variance was accounted for in the exclusion metrics by statistically calculating thresholds for the radar, thermal, and elevation data. Optical imagery from the Landsat 8 archive aided a quality mosaic in extracting the highest spectral index values most appropriate for vegetative mapping. The enhanced radar, thermal, and digital elevation imagery were then incorporated into the quality mosaic. Training sites were selected from Google Earth imagery and used in the classification, with a resulting output of four mangrove cover map models for each site. The model was assessed for accuracy by observing the differences between the mangrove classification models and the reference maps. Although the model was overpredicting mangroves in non-mangrove regions, it more accurately classified the mangrove regions established by the references. Future refinements will expand the model with an objective degree of accuracy.

  11. A new tool for supervised classification of satellite images available on web servers: Google Maps as a case study

    Science.gov (United States)

    García-Flores, Agustín.; Paz-Gallardo, Abel; Plaza, Antonio; Li, Jun

    2016-10-01

    This paper describes a new web platform dedicated to the classification of satellite images called Hypergim. The current implementation of this platform enables users to perform classification of satellite images from any part of the world thanks to the worldwide maps provided by Google Maps. To perform this classification, Hypergim uses unsupervised algorithms like Isodata and K-means. Here, we present an extension of the original platform in which we adapt Hypergim in order to use supervised algorithms to improve the classification results. This involves a significant modification of the user interface, providing the user with a way to obtain samples of the classes present in the images to use in the training phase of the classification process. Another main goal of this development is to improve the runtime of the image classification process. To achieve this goal, we use a parallel implementation of the Random Forest classification algorithm. This implementation is a modification of the well-known CURFIL software package. The use of this type of algorithm to perform image classification is widespread today thanks to its precision and ease of training. The actual implementation of Random Forest was developed using the CUDA platform, which enables us to exploit the potential of several models of NVIDIA graphics processing units, using them to execute general-purpose computing tasks such as image classification algorithms. As well as CUDA, we use other parallel libraries such as Intel Boost, taking advantage of the multithreading capabilities of modern CPUs. To ensure the best possible results, the platform is deployed in a cluster of commodity graphics processing units (GPUs), so that multiple users can use the tool concurrently. The experimental results indicate that this new algorithm widely outperforms the previous unsupervised algorithms implemented in Hypergim, both in runtime and in the precision of the actual classification of the images.

  12. Tuning, Diagnostics & Data Preparation for Generalized Linear Models Supervised Algorithm in Data Mining Technologies

    Directory of Open Access Journals (Sweden)

    Sachin Bhaskar

    2015-07-01

    Full Text Available Data mining techniques are the result of a long process of research and product development. Large amounts of data are searched by data mining to find trends and patterns that go beyond simple analysis; complex mathematical algorithms are used to segment the data and to evaluate the probability of future events. Each data mining model is produced by a specific algorithm, and some data mining problems are best solved by using more than one algorithm. Data mining technologies can be used through Oracle. The Generalized Linear Models (GLM) algorithm is used in the Regression and Classification Oracle Data Mining functions. GLM is one of the popular statistical techniques for linear modelling. Oracle Data Mining implements GLM for regression and binary classification. GLM provides row diagnostics as well as model statistics and extensive coefficient statistics, and it also supports confidence bounds. This paper outlines and analyses the GLM algorithm, which will guide the understanding of the tuning, diagnostics and data preparation process and the importance of the Regression and Classification supervised Oracle Data Mining functions, which are utilized in marketing, time series prediction, financial forecasting, overall business planning, trend analysis, environmental modelling, biomedical and drug response modelling, etc.
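
A NumPy-only sketch of the GLM machinery the abstract describes for binary classification: logistic regression fitted by iteratively reweighted least squares, with Wald standard errors and 95% confidence bounds for each coefficient (the kind of coefficient statistics and confidence bounds GLM reports). The simulated predictors and true coefficients are invented; this is not Oracle's implementation.

```python
# Logistic-regression GLM via IRLS with Wald confidence bounds.
import numpy as np

rng = np.random.default_rng(0)
n = 400
X = np.c_[np.ones(n), rng.normal(size=(n, 2))]     # intercept + 2 predictors
beta_true = np.array([-0.5, 2.0, 0.0])
y = (rng.random(n) < 1 / (1 + np.exp(-X @ beta_true))).astype(float)

beta = np.zeros(3)
for _ in range(25):                                # IRLS / Newton iterations
    p = 1 / (1 + np.exp(-X @ beta))                # fitted probabilities
    W = p * (1 - p)                                # working weights
    H = X.T @ (W[:, None] * X)                     # Fisher information
    beta = beta + np.linalg.solve(H, X.T @ (y - p))

se = np.sqrt(np.diag(np.linalg.inv(H)))            # Wald standard errors
ci_low, ci_high = beta - 1.96 * se, beta + 1.96 * se
```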

  13. Classification of autism spectrum disorder using supervised learning of brain connectivity measures extracted from synchrostates

    Science.gov (United States)

    Jamal, Wasifa; Das, Saptarshi; Oprescu, Ioana-Anastasia; Maharatna, Koushik; Apicella, Fabio; Sicca, Federico

    2014-08-01

    Objective. The paper investigates the presence of autism using functional brain connectivity measures derived from the electro-encephalogram (EEG) of children during face perception tasks. Approach. Phase-synchronized patterns from 128-channel EEG signals are obtained for typical children and children with autism spectrum disorder (ASD). The phase-synchronized states, or synchrostates, temporally switch amongst themselves as an underlying process for the completion of a particular cognitive task. We used 12 subjects in each group (ASD and typical) for analyzing their EEG while processing fearful, happy and neutral faces. The minimally and maximally occurring synchrostates for each subject are chosen for extraction of brain connectivity features, which are used for classification between these two groups of subjects. Among different supervised learning techniques, we here explored discriminant analysis and the support vector machine, both with polynomial kernels, for the classification task. Main results. The leave-one-out cross-validation of the classification algorithm gives 94.7% accuracy as the best performance, with corresponding sensitivity and specificity values of 85.7% and 100% respectively. Significance. The proposed method gives high classification accuracies and outperforms other contemporary research results. The effectiveness of the proposed method for classification of autistic and typical children suggests the possibility of using it on a larger population to validate it for clinical practice.
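
A sketch of the evaluation protocol only: an SVM with a polynomial kernel under leave-one-out cross-validation, scored by accuracy, sensitivity and specificity. The 12-per-group "connectivity features" are simulated Gaussian vectors, not synchrostate-derived measures.

```python
# Polynomial-kernel SVM with leave-one-out CV on simulated group data.
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_per_group = 12                                   # 12 subjects per group
X = np.vstack([rng.normal(0.0, 1.0, (n_per_group, 10)),
               rng.normal(1.2, 1.0, (n_per_group, 10))])  # "connectivity" features
y = np.r_[np.zeros(n_per_group), np.ones(n_per_group)]    # 0 = typical, 1 = ASD

pred = cross_val_predict(SVC(kernel="poly", degree=2, coef0=1),
                         X, y, cv=LeaveOneOut())
accuracy = (pred == y).mean()
sensitivity = pred[y == 1].mean()                  # true-positive rate
specificity = 1 - pred[y == 0].mean()              # true-negative rate
```

With only 24 subjects, leave-one-out is the natural resampling scheme, which is presumably why the paper uses it.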

  14. Fall detection using supervised machine learning algorithms: A comparative study

    KAUST Repository

    Zerrouki, Nabil

    2017-01-05

    Fall incidents are considered the leading cause of disability and even mortality among older adults. To address this problem, the fall detection and prevention fields have received a lot of attention over the past years and attracted many research efforts. In the current study we present an overall performance comparison between fall detection systems using the most popular machine learning approaches, which are: Naïve Bayes, K-nearest neighbor, neural network, and support vector machine. The analysis of the classification power associated with these most widely utilized algorithms is conducted on two fall detection databases, namely FDD and URFD. Since the performance of a classification algorithm is inherently dependent on the features, we extracted and used the same features for all classifiers. The classification evaluation is conducted using different state-of-the-art statistical measures such as the overall accuracy, the F-measure coefficient, and the area under the ROC curve (AUC).
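
A sketch of the comparison protocol: the same feature matrix is fed to the four classifiers named above, each scored with accuracy, F-measure and AUC. The binary "fall vs. daily activity" features are simulated rather than drawn from FDD/URFD.

```python
# Same features, four classifiers, three metrics.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "naive bayes": GaussianNB(),
    "knn": KNeighborsClassifier(),
    "neural net": MLPClassifier(max_iter=1500, random_state=0),
    "svm": SVC(probability=True, random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(Xtr, ytr)
    proba = model.predict_proba(Xte)[:, 1]
    pred = model.predict(Xte)
    results[name] = (accuracy_score(yte, pred), f1_score(yte, pred),
                     roc_auc_score(yte, proba))
```

Holding the features fixed across models, as the abstract stresses, is what makes the resulting metric table a comparison of classifiers rather than of feature pipelines.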

  15. Supervised classification of solar features using prior information

    Directory of Open Access Journals (Sweden)

    De Visscher Ruben

    2015-01-01

Full Text Available Context: The Sun as seen by Extreme Ultraviolet (EUV) telescopes exhibits a variety of large-scale structures. Of particular interest for space-weather applications is the extraction of active regions (AR) and coronal holes (CH). The next generation of GOES-R satellites will provide continuous monitoring of the solar corona in six EUV bandpasses that are similar to the ones provided by the SDO-AIA EUV telescope since May 2010. Supervised segmentations of EUV images that are consistent with manual segmentations by, for example, space-weather forecasters help in extracting useful information from the raw data. Aims: We present a supervised segmentation method that is based on the Maximum A Posteriori rule. Our method allows integrating both manually segmented images and other types of information. It is applied to SDO-AIA images to segment them into AR, CH, and the remaining Quiet Sun (QS) part. Methods: A Bayesian classifier is applied on training masks provided by the user. The noise structure in EUV images is non-trivial, and this suggests the use of a non-parametric kernel density estimator to fit the intensity distribution within each class. Under the Naive Bayes assumption we can add information such as the latitude distribution and total coverage of each class in a consistent manner. This information can be prescribed by an expert or estimated with an Expectation-Maximization algorithm. Results: The segmentation masks are in line with the training masks given as input and show consistency over time. Introduction of additional information besides pixel intensity improves the quality of the final segmentation. Conclusions: Such a tool can aid in building automated segmentations that are consistent with some ‘ground truth’ defined by the users.
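The MAP rule with a per-class kernel density estimate can be sketched in a few lines. This is a one-feature toy illustration with invented intensity samples and priors, not the authors' pipeline:

```python
import numpy as np

def kde_logpdf(samples, x, bandwidth=0.2):
    """Non-parametric Gaussian kernel density estimate, evaluated in log scale."""
    d = (x[:, None] - samples[None, :]) / bandwidth
    k = np.exp(-0.5 * d ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return np.log(k.mean(axis=1) + 1e-300)

def map_classify(intensity, class_samples, log_priors):
    """Maximum A Posteriori label per pixel: argmax of log-likelihood + log-prior."""
    scores = np.stack([kde_logpdf(s, intensity) + lp
                       for s, lp in zip(class_samples, log_priors)])
    return scores.argmax(axis=0)

# Hypothetical training intensities for three classes (QS, CH, AR)
qs = np.array([1.0, 1.2, 0.9, 1.1])
ch = np.array([0.1, 0.2, 0.15, 0.05])
ar = np.array([3.0, 3.3, 2.8, 3.1])
log_priors = np.log([0.8, 0.1, 0.1])     # e.g. expected coverage of each class
labels = map_classify(np.array([0.12, 1.05, 3.2]), [qs, ch, ar], log_priors)
```

The coverage prior enters only as an additive log term, which is what makes it easy to prescribe it by hand or estimate it with EM, as the abstract describes.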

  16. A novel supervised trajectory segmentation algorithm identifies distinct types of human adenovirus motion in host cells.

    Science.gov (United States)

    Helmuth, Jo A; Burckhardt, Christoph J; Koumoutsakos, Petros; Greber, Urs F; Sbalzarini, Ivo F

    2007-09-01

    Biological trajectories can be characterized by transient patterns that may provide insight into the interactions of the moving object with its immediate environment. The accurate and automated identification of trajectory motifs is important for the understanding of the underlying mechanisms. In this work, we develop a novel trajectory segmentation algorithm based on supervised support vector classification. The algorithm is validated on synthetic data and applied to the identification of trajectory fingerprints of fluorescently tagged human adenovirus particles in live cells. In virus trajectories on the cell surface, periods of confined motion, slow drift, and fast drift are efficiently detected. Additionally, directed motion is found for viruses in the cytoplasm. The algorithm enables the linking of microscopic observations to molecular phenomena that are critical in many biological processes, including infectious pathogen entry and signal transduction.

17. Image Classification through integrated K-Means Algorithm

    Directory of Open Access Journals (Sweden)

    Balasubramanian Subbiah

    2012-03-01

Full Text Available Image classification has a significant role in the field of medical diagnosis as well as mining analysis, and in recent years it has even been used for cancer diagnosis. Clustering analysis is a valuable and useful tool for image classification and object diagnosis. A variety of clustering algorithms are available, and this is still a topic of interest in the image processing field. However, these clustering algorithms are confronted with difficulties in meeting optimum quality, automation, and robustness requirements. In this paper, we propose two clustering algorithm combinations that integrate the K-Means algorithm and can tackle some of these problems. A comparative study is made between these two novel combination algorithms. The experimental results demonstrate that the proposed algorithms are very effective in producing the desired clusters of the given data sets as well as diagnosis. These algorithms are very useful for image classification as well as extraction of objects.
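The K-Means core that such combinations build on can be sketched as plain Lloyd iterations. The initialization (first k points) and the toy "pixel" data are simplifying assumptions, not the paper's integrated variants:

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain Lloyd's algorithm; centers start at the first k points (naive)."""
    centers = X[:k].astype(float).copy()
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)          # assign each point to nearest center
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)  # recompute centers
    return labels, centers

# Toy "pixel" feature vectors: two well-separated clusters
X = np.array([[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9],
              [0.1, 0.2], [4.9, 5.0]])
labels, centers = kmeans(X, k=2)
```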

  18. CFSO3: A New Supervised Swarm-Based Optimization Algorithm

    Directory of Open Access Journals (Sweden)

    Antonino Laudani

    2013-01-01

Full Text Available We present CFSO3, an optimization heuristic within the class of swarm intelligence, based on a synergy among three different features of the Continuous Flock-of-Starlings Optimization. One of the main novelties is that this optimizer is no longer a classical numerical algorithm, since it can now be seen as a continuous dynamic system that can be treated by using all the mathematical instruments available for managing state equations. In addition, CFSO3 allows passing from stochastic approaches to supervised deterministic ones, since the random updating of parameters, a typical feature of numerical swarm-based optimization algorithms, is fully substituted by a supervised strategy: in CFSO3 the tuning of parameters is designed a priori for obtaining both exploration and exploitation. Indeed, the exploration, that is, the escape from a local minimum, as well as the convergence and refinement to a solution, can be designed simply by managing the eigenvalues of the CFSO state equations. Virtually, in CFSO3 just the initial values of positions and velocities of the swarm members have to be randomly assigned. Both standard and parallel versions of CFSO3, together with validations on classical benchmarks, are presented.

  19. MULTI-LABEL ASRS DATASET CLASSIFICATION USING SEMI-SUPERVISED SUBSPACE CLUSTERING

    Data.gov (United States)

    National Aeronautics and Space Administration — MULTI-LABEL ASRS DATASET CLASSIFICATION USING SEMI-SUPERVISED SUBSPACE CLUSTERING MOHAMMAD SALIM AHMED, LATIFUR KHAN, NIKUNJ OZA, AND MANDAVA RAJESWARI Abstract....

  20. Summarizing Relational Data Using Semi-Supervised Genetic Algorithm-Based Clustering Techniques

    Directory of Open Access Journals (Sweden)

    Rayner Alfred

    2010-01-01

Full Text Available Problem statement: In solving a classification problem in relational data mining, traditional methods, for example C4.5 and its variants, usually require data transformation from datasets stored in multiple tables into a single table. Unfortunately, we may lose some information when we join tables with a high degree of one-to-many association. Therefore, data transformation becomes a tedious trial-and-error process and the classification result is often not very promising, especially when the number of tables and the degree of one-to-many association are large. Approach: We proposed a genetic semi-supervised clustering technique as a means of aggregating data stored in multiple tables to facilitate the task of solving a classification problem in a relational database. This algorithm is suitable for classification of datasets with a high degree of one-to-many associations. It can be used in two ways. One is user-controlled clustering, where the user may control the result of clustering by varying the compactness of the spherical cluster. The other is automatic clustering, where a non-overlap clustering strategy is applied. In this study, we use the latter method to dynamically cluster multiple instances, as a means of aggregating them, and illustrate the effectiveness of this method using the semi-supervised genetic algorithm-based clustering technique. Results: The experimental results showed that using the reciprocal of the Davies-Bouldin Index for cluster dispersion and the reciprocal of the Gini Index for cluster purity as the fitness function in the Genetic Algorithm (GA) finds solutions with much greater accuracy. The results obtained in this study showed that automatic clustering (seeding), by optimizing the cluster dispersion or cluster purity alone using GA, provides good results compared to traditional k-means clustering. However, the best result can be achieved by optimizing the combination values of both the cluster
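The two fitness ingredients named above, cluster purity via the Gini index and cluster dispersion via the Davies-Bouldin index (the GA maximizes their reciprocals), can be sketched directly. Data and values are toy assumptions, not the authors' implementation:

```python
import numpy as np
from collections import Counter

def gini_index(class_labels):
    """Gini impurity of the class labels inside one cluster (0 = pure)."""
    n = len(class_labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(class_labels).values())

def davies_bouldin(X, assign):
    """Davies-Bouldin index: lower means compact, well-separated clusters."""
    ks = np.unique(assign)
    cents = np.array([X[assign == k].mean(axis=0) for k in ks])
    scatter = np.array([np.linalg.norm(X[assign == k] - cents[i], axis=1).mean()
                        for i, k in enumerate(ks)])
    total = 0.0
    for i in range(len(ks)):
        total += max((scatter[i] + scatter[j]) / np.linalg.norm(cents[i] - cents[j])
                     for j in range(len(ks)) if j != i)
    return total / len(ks)

X = np.array([[0.0], [1.0], [10.0], [11.0]])
assign = np.array([0, 0, 1, 1])
db = davies_bouldin(X, assign)     # a GA fitness could use e.g. 1.0 / db
```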

1. Semi-supervised image classification algorithm based on fuzzy rough sets

    Institute of Scientific and Technical Information of China (English)

    张德军; 何发智; 袁志勇; 石强

    2016-01-01

To address the problem that only a small number of samples are labeled in image classification, a semi-supervised image classification approach based on fuzzy rough sets was proposed. Firstly, the fuzziness and roughness of the data were modeled simultaneously by fuzzy rough sets; the relevance between features and decisions was measured by fuzzy entropy, and the membership of a sample to a class was approximated by fuzzy rough approximation operators. Secondly, the feature evaluation approach was improved with fuzzy entropy under the regularization framework, and the optimal feature subset was selected within the framework of semi-supervised feature selection via spectral analysis. Thirdly, the prediction of unlabeled samples was improved with neighbor constraints, and informative unlabeled samples were selected by constrained self-learning based on fuzzy rough sets to update the training set. Finally, the classifier was trained on the updated sample set to complete the image classification task. Several experiments demonstrate that the proposed method achieves higher classification accuracy with only a small number of labeled samples.

  2. A supervised contextual classifier based on a region-growth algorithm

    DEFF Research Database (Denmark)

    Lira, Jorge; Maletti, Gabriela Mariel

    2002-01-01

A supervised classification scheme to segment optical multi-spectral images has been developed. In this classifier, an automated region-growth algorithm delineates the training sets. This algorithm handles three parameters: an initial pixel seed, a window size and a threshold for each class. A suitable pixel seed is manually implanted through visual inspection of the image classes. The best values for the window and the threshold are obtained from a spectral distance and heuristic criteria. This distance is calculated from a mathematical model of spectral separability. A pixel is incorporated into a region if a spectral homogeneity criterion is satisfied in the pixel-centered window for a given threshold. The homogeneity criterion is obtained from the model of spectral distance. The set of pixels forming a region represents a statistically valid sample of a defined class signaled by the initial…
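The pixel-aggregation step can be sketched in bare-bones form. This toy version tests each candidate pixel against the running region mean, rather than the paper's window-based spectral-distance criterion; the image, seed, and threshold are invented:

```python
import numpy as np
from collections import deque

def region_grow(img, seed, threshold):
    """Grow a region from a seed pixel: a 4-neighbour joins if its intensity
    is within `threshold` of the current region mean (simplified criterion)."""
    h, w = img.shape
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    total, count = float(img[seed]), 1
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]:
                if abs(img[ny, nx] - total / count) <= threshold:
                    mask[ny, nx] = True
                    total += float(img[ny, nx])
                    count += 1
                    queue.append((ny, nx))
    return mask

img = np.zeros((8, 8))
img[2:6, 2:6] = 10.0                       # a bright 4x4 "class" patch
mask = region_grow(img, seed=(3, 3), threshold=1.0)
```

Growth stops exactly at the patch boundary because background pixels fail the homogeneity test.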

  3. Supervised Classification of Polarimetric SAR Imagery Using Temporal and Contextual Information

    Science.gov (United States)

    Dargahi, A.; Maghsoudi, Y.; Abkar, A. A.

    2013-09-01

Using the context as a source of ancillary information in the classification process provides a powerful tool to obtain better class discrimination. Modelling context using Markov Random Fields (MRFs) and combining this with a Bayesian approach, a context-based supervised classification method is proposed. In this framework, to make full use of the statistical a priori knowledge of the data, the spatial relation of the neighbouring pixels was used. The proposed context-based algorithm combines a Gaussian-based Wishart distribution of PolSAR images with temporal and contextual information. This combination was done through Bayes decision theory: the class-conditional probability density function and the prior probability are modelled by the Wishart distribution and the MRF model. Given the complexity and similarity of classes, in order to enhance class separation, two PolSAR images from two different seasons (leaf-on and leaf-off) were used simultaneously. According to the achieved results, the maximum improvement in the overall accuracy of classification using WMRF (combining Wishart and MRF) over the Wishart classifier occurred when the leaf-on image was used. The highest accuracy was obtained when using the combined datasets. In this case, the overall accuracies of the Wishart and WMRF methods were 72.66% and 78.95%, respectively.

  5. SPAM CLASSIFICATION BASED ON SUPERVISED LEARNING USING MACHINE LEARNING TECHNIQUES

    Directory of Open Access Journals (Sweden)

    T. Hamsapriya

    2011-12-01

Full Text Available E-mail is one of the most popular and frequently used ways of communication due to its worldwide accessibility, relatively fast message transfer, and low sending cost. The flaws in the e-mail protocols and the increasing amount of electronic business and financial transactions directly contribute to the increase in e-mail-based threats. E-mail spam is one of the major problems of today’s Internet, bringing financial damage to companies and annoying individual users. Spam e-mails invade users’ mailboxes without their consent, and they consume network capacity as well as the time spent checking and deleting spam mail. The vast majority of Internet users are outspoken in their disdain for spam, although enough of them respond to commercial offers that spam remains a viable source of income to spammers. While most users want to do the right thing to avoid and get rid of spam, they need clear and simple guidelines on how to behave. In spite of all the measures taken to eliminate spam, it is not yet eradicated. Moreover, when the countermeasures are over-sensitive, even legitimate e-mails are eliminated. Among the approaches developed to stop spam, filtering is one of the most important techniques. Much research in spam filtering has centered on the more sophisticated classifier-related issues. In recent years, machine learning for spam classification has become an important research issue. The proposed work explores and identifies the use of different learning algorithms for classifying spam messages from e-mail. A comparative analysis among the algorithms has also been presented.
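Among the learning algorithms typically compared for spam filtering, multinomial naive Bayes over bag-of-words counts is the classic baseline. A self-contained sketch on invented toy messages (not the paper's data or code):

```python
import numpy as np

def train_nb(docs, labels, alpha=1.0):
    """Multinomial naive Bayes with Laplace smoothing (labels: 0=ham, 1=spam)."""
    vocab = sorted({w for d in docs for w in d.split()})
    idx = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((2, len(vocab)))
    prior = np.zeros(2)
    for d, y in zip(docs, labels):
        prior[y] += 1
        for w in d.split():
            counts[y, idx[w]] += 1
    log_prior = np.log(prior / prior.sum())
    log_lik = np.log((counts + alpha) /
                     (counts + alpha).sum(axis=1, keepdims=True))
    return idx, log_prior, log_lik

def classify_nb(doc, idx, log_prior, log_lik):
    score = log_prior.copy()
    for w in doc.split():
        if w in idx:                       # unseen words are simply skipped
            score += log_lik[:, idx[w]]
    return int(score.argmax())

docs = ["win free money now", "free prize claim now",
        "meeting agenda attached", "project report draft"]
labels = [1, 1, 0, 0]
idx, log_prior, log_lik = train_nb(docs, labels)
pred = classify_nb("claim free money", idx, log_prior, log_lik)
```

Laplace smoothing (`alpha`) is what keeps a single unseen-in-class word from zeroing out an entire class posterior.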

  6. Automated segmentation of geographic atrophy in fundus autofluorescence images using supervised pixel classification.

    Science.gov (United States)

    Hu, Zhihong; Medioni, Gerard G; Hernandez, Matthias; Sadda, Srinivas R

    2015-01-01

Geographic atrophy (GA) is a manifestation of the advanced or late stage of age-related macular degeneration (AMD). AMD is the leading cause of blindness in people over the age of 65 in the western world. The purpose of this study is to develop a fully automated supervised pixel classification approach for segmenting GA, including uni- and multifocal patches, in fundus autofluorescence (FAF) images. The image features include region-wise intensity measures, gray-level co-occurrence matrix measures, and Gaussian filter banks. A [Formula: see text]-nearest-neighbor pixel classifier is applied to obtain a GA probability map, representing the likelihood that the image pixel belongs to GA. Sixteen randomly chosen FAF images were obtained from 16 subjects with GA. The algorithm-defined GA regions are compared with manual delineation performed by a certified image reading center grader. Eight-fold cross-validation is applied to evaluate the algorithm performance. The mean overlap ratio (OR), area correlation (Pearson's [Formula: see text]), accuracy (ACC), true positive rate (TPR), specificity (SPC), positive predictive value (PPV), and false discovery rate (FDR) between the algorithm- and manually defined GA regions are [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text], respectively.
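The k-NN probability-map idea, the fraction of the nearest labeled training pixels that are GA, reduces to a few lines. The 2-D features and their values below are hypothetical stand-ins for the paper's intensity/GLCM/filter-bank features:

```python
import numpy as np

def knn_probability_map(train_feats, train_labels, pixel_feats, k=3):
    """Per-pixel GA likelihood: mean label of the k nearest training pixels."""
    d = ((pixel_feats[:, None, :] - train_feats[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    return train_labels[nearest].mean(axis=1)

# Hypothetical per-pixel features (e.g. mean intensity, a texture measure)
train_feats = np.array([[0.1, 0.2], [0.2, 0.1], [0.0, 0.3],
                        [0.9, 0.8], [1.0, 0.9], [0.8, 1.0]])
train_labels = np.array([0, 0, 0, 1, 1, 1])       # 1 = GA pixel
prob = knn_probability_map(train_feats, train_labels,
                           np.array([[0.05, 0.15], [0.95, 0.85]]))
```

Thresholding the resulting map yields a binary GA segmentation to compare against the grader's delineation.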

  7. A Comparison of Supervised Machine Learning Algorithms and Feature Vectors for MS Lesion Segmentation Using Multimodal Structural MRI

    Science.gov (United States)

    Sweeney, Elizabeth M.; Vogelstein, Joshua T.; Cuzzocreo, Jennifer L.; Calabresi, Peter A.; Reich, Daniel S.; Crainiceanu, Ciprian M.; Shinohara, Russell T.

    2014-01-01

    Machine learning is a popular method for mining and analyzing large collections of medical data. We focus on a particular problem from medical research, supervised multiple sclerosis (MS) lesion segmentation in structural magnetic resonance imaging (MRI). We examine the extent to which the choice of machine learning or classification algorithm and feature extraction function impacts the performance of lesion segmentation methods. As quantitative measures derived from structural MRI are important clinical tools for research into the pathophysiology and natural history of MS, the development of automated lesion segmentation methods is an active research field. Yet, little is known about what drives performance of these methods. We evaluate the performance of automated MS lesion segmentation methods, which consist of a supervised classification algorithm composed with a feature extraction function. These feature extraction functions act on the observed T1-weighted (T1-w), T2-weighted (T2-w) and fluid-attenuated inversion recovery (FLAIR) MRI voxel intensities. Each MRI study has a manual lesion segmentation that we use to train and validate the supervised classification algorithms. Our main finding is that the differences in predictive performance are due more to differences in the feature vectors, rather than the machine learning or classification algorithms. Features that incorporate information from neighboring voxels in the brain were found to increase performance substantially. For lesion segmentation, we conclude that it is better to use simple, interpretable, and fast algorithms, such as logistic regression, linear discriminant analysis, and quadratic discriminant analysis, and to develop the features to improve performance. PMID:24781953
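One of the simple, fast classifiers the study recommends, Fisher's linear discriminant, fits in a few lines. Synthetic features below stand in for the T1-w/T2-w/FLAIR voxel feature vectors; everything here is an illustrative assumption:

```python
import numpy as np

def lda_fit(X, y):
    """Two-class Fisher LDA: w = Sw^{-1}(mu1 - mu0), midpoint threshold."""
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    Sw = np.cov(X[y == 0].T) + np.cov(X[y == 1].T)   # pooled within-class scatter
    w = np.linalg.solve(Sw, mu1 - mu0)
    b = -0.5 * w @ (mu0 + mu1)
    return w, b

def lda_predict(X, w, b):
    return (X @ w + b > 0).astype(int)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(2, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
w, b = lda_fit(X, y)
train_acc = (lda_predict(X, w, b) == y).mean()
```

The study's point is that gains come mostly from enriching `X` with neighboring-voxel features, not from swapping this classifier for a more complex one.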

  8. Web Classification Using DYN FP Algorithm

    Directory of Open Access Journals (Sweden)

    Bhanu Pratap Singh

    2014-01-01

Full Text Available Web mining is the application of data mining techniques to extract knowledge from the Web. Web mining has been explored to a vast degree and different techniques have been proposed for a variety of applications, including Web search, classification and personalization. The primary goal of a web site is to provide relevant information to its users. Web mining techniques are used to categorize users and pages by analyzing users’ behavior, the content of pages, and the order of URLs accessed. This paper proposes an auto-classification algorithm for web pages using data mining techniques. We consider the problem of discovering association rules between terms in a set of web pages belonging to a category in a search engine database, and present an auto-classification algorithm for solving this problem that is fundamentally based on the FP-growth algorithm.

  9. Supervised learning classification models for prediction of plant virus encoded RNA silencing suppressors.

    Directory of Open Access Journals (Sweden)

    Zeenia Jagga

Full Text Available Viral encoded RNA silencing suppressor proteins interfere with the host RNA silencing machinery, facilitating viral infection by evading host immunity. In plant hosts, the viral proteins have several basic science implications and biotechnology applications. However, in silico identification of these proteins is limited by their high sequence diversity. In this study we developed supervised learning based classification models for RNA silencing suppressor proteins in plant viruses. We developed four classifiers based on supervised learning algorithms: J48, Random Forest, LibSVM and Naïve Bayes, with model learning enriched by correlation-based feature selection. Structural and physicochemical features calculated for experimentally verified primary protein sequences were used to train the classifiers. The training features include amino acid composition; autocorrelation coefficients; composition, transition, and distribution of various physicochemical properties; and pseudo amino acid composition. Performance analysis of the predictive models based on 10-fold cross-validation and independent data testing revealed that the Random Forest based model was the best, achieving 86.11% overall accuracy and 86.22% balanced accuracy with a remarkably high area under the Receiver Operating Characteristic curve of 0.95 for predicting viral RNA silencing suppressor proteins. The prediction models for plant viral RNA silencing suppressors can potentially aid identification of novel viral RNA silencing suppressors, which will provide valuable insights into the mechanism of RNA silencing and could be further explored as potential targets for designing novel antiviral therapeutics. Also, the key subset of identified optimal features may help in determining compositional patterns in the viral proteins which are important determinants for RNA silencing suppressor activities. The best prediction model developed in the study is available as a
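The amino-acid-composition features listed first are simple 20-dimensional frequency vectors. A minimal sketch with invented toy sequences and a nearest-centroid classifier standing in for the paper's Random Forest (the sequences and the "basic-rich vs acidic-rich" split are purely illustrative):

```python
AMINO = "ACDEFGHIKLMNPQRSTVWY"     # the 20 standard residues

def aac(seq):
    """Amino acid composition: 20-dim relative-frequency vector."""
    return [seq.count(a) / len(seq) for a in AMINO]

def nearest_centroid_label(train, labels, query):
    """Toy stand-in for the paper's Random Forest classifier."""
    def centroid(cls):
        rows = [aac(s) for s, y in zip(train, labels) if y == cls]
        return [sum(col) / len(rows) for col in zip(*rows)]
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    q = aac(query)
    return min(set(labels), key=lambda c: sqdist(q, centroid(c)))

train = ["KKKRRKKAR", "RKKARKKRA", "DDEEDDEED", "EEDDEDEDE"]
labels = [1, 1, 0, 0]             # 1 = hypothetical suppressor-like class
pred = nearest_centroid_label(train, labels, "KRKKRAKKR")
```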

  10. Classification and Weakly Supervised Pain Localization using Multiple Segment Representation

    Science.gov (United States)

    Sikka, Karan; Dhall, Abhinav; Bartlett, Marian Stewart

    2014-01-01

    Automatic pain recognition from videos is a vital clinical application and, owing to its spontaneous nature, poses interesting challenges to automatic facial expression recognition (AFER) research. Previous pain vs no-pain systems have highlighted two major challenges: (1) ground truth is provided for the sequence, but the presence or absence of the target expression for a given frame is unknown, and (2) the time point and the duration of the pain expression event(s) in each video are unknown. To address these issues we propose a novel framework (referred to as MS-MIL) where each sequence is represented as a bag containing multiple segments, and multiple instance learning (MIL) is employed to handle this weakly labeled data in the form of sequence level ground-truth. These segments are generated via multiple clustering of a sequence or running a multi-scale temporal scanning window, and are represented using a state-of-the-art Bag of Words (BoW) representation. This work extends the idea of detecting facial expressions through ‘concept frames’ to ‘concept segments’ and argues through extensive experiments that algorithms such as MIL are needed to reap the benefits of such representation. The key advantages of our approach are: (1) joint detection and localization of painful frames using only sequence-level ground-truth, (2) incorporation of temporal dynamics by representing the data not as individual frames but as segments, and (3) extraction of multiple segments, which is well suited to signals with uncertain temporal location and duration in the video. Extensive experiments on UNBC-McMaster Shoulder Pain dataset highlight the effectiveness of the approach by achieving competitive results on both tasks of pain classification and localization in videos. We also empirically evaluate the contributions of different components of MS-MIL. The paper also includes the visualization of discriminative facial patches, important for pain detection, as discovered by

  12. RSTFC: A Novel Algorithm for Spatio-Temporal Filtering and Classification of Single-Trial EEG.

    Science.gov (United States)

    Qi, Feifei; Li, Yuanqing; Wu, Wei

    2015-12-01

Learning optimal spatio-temporal filters is a key to feature extraction for single-trial electroencephalogram (EEG) classification. The challenges are controlling the complexity of the learning algorithm so as to alleviate the curse of dimensionality and attaining computational efficiency to facilitate online applications, e.g., brain-computer interfaces (BCIs). To tackle these barriers, this paper presents a novel algorithm, termed regularized spatio-temporal filtering and classification (RSTFC), for single-trial EEG classification. RSTFC consists of two modules. In the feature extraction module, an l2-regularized algorithm is developed for supervised spatio-temporal filtering of the EEG signals. Unlike the existing supervised spatio-temporal filter optimization algorithms, the developed algorithm can simultaneously optimize spatial and high-order temporal filters in an eigenvalue decomposition framework and thus be implemented highly efficiently. In the classification module, a convex optimization algorithm for sparse Fisher linear discriminant analysis is proposed for simultaneous feature selection and classification of the typically high-dimensional spatio-temporally filtered signals. The effectiveness of RSTFC is demonstrated by comparing it with several state-of-the-art methods on three brain-computer interface (BCI) competition data sets collected from 17 subjects. Results indicate that RSTFC yields significantly higher classification accuracies than the competing methods. This paper also discusses the advantage of optimizing channel-specific temporal filters over optimizing a temporal filter common to all channels.

  13. Distribution Bottlenecks in Classification Algorithms

    NARCIS (Netherlands)

    Zwartjes, G.J.; Havinga, P.J.M.; Smit, G.J.M.; Hurink, J.L.

    2012-01-01

    The abundance of data available on Wireless Sensor Networks makes online processing necessary. In industrial applications for example, the correct operation of equipment can be the point of interest while raw sampled data is of minor importance. Classification algorithms can be used to make state cla

  14. Cortex transform and its application for supervised texture classification of digital images

    Science.gov (United States)

    Bashar, M. K.; Ohnishi, Noboru; Shevgaonkar, R. K.

    2002-02-01

This paper proposes a localized multi-channel filtering approach to image texture analysis based on the cortical behavior of the Human Visual System (HVS). In our approach, a 2D Gaussian function in the frequency domain, called the cortex filter, is used to model the band-pass nature of simple cells in the HVS. A block-based iterative method is addressed. In each pass, a square block of data is captured and cortex filters at various directions and radial bands are applied to extract the texture information available in that block. Such decomposition results in a set of band-pass images from a single input image, and we call it the Cortex Transform (CT). We use the filter responses in each pass to compute the representative texture features, i.e., the average filtered energies. The procedure is repeated for the subsequent blocks of data until the whole image has been scanned. The energy values calculated above are stored in different arrays or files and are regarded as feature images. The feature images thus obtained are integrated with a minimum distance classifier for supervised texture classification. We demonstrated the algorithm on various real-world and synthetic images from various sources. Confusion matrix analysis shows a high average overall classification accuracy (97.01%) for our CT-based approach, compared with that (71.27%) of the popular gray-level co-occurrence matrix (GLCM) approach.
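The band-pass energy features can be sketched with radial Gaussian filters applied in the frequency domain. This is a simplified, orientation-free version of the cortex filters; the bandwidth, center frequencies, and test textures are assumptions:

```python
import numpy as np

def cortex_energies(img, center_freqs, sigma=0.05):
    """Mean energy of the image filtered by radial Gaussian band-pass filters."""
    F = np.fft.fftshift(np.fft.fft2(img))
    fy = np.fft.fftshift(np.fft.fftfreq(img.shape[0]))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(img.shape[1]))[None, :]
    r = np.hypot(fy, fx)                        # radial frequency (cycles/pixel)
    energies = []
    for f0 in center_freqs:
        band = np.exp(-0.5 * ((r - f0) / sigma) ** 2)    # radial Gaussian band
        resp = np.fft.ifft2(np.fft.ifftshift(F * band))
        energies.append(float(np.mean(np.abs(resp) ** 2)))
    return np.array(energies)

n = 32
x = np.arange(n)
coarse = np.sin(2 * np.pi * 2 * x / n)[None, :] * np.ones((n, 1))   # low-freq texture
fine = np.sin(2 * np.pi * 12 * x / n)[None, :] * np.ones((n, 1))    # high-freq texture
e_coarse = cortex_energies(coarse, [2 / n, 12 / n])
e_fine = cortex_energies(fine, [2 / n, 12 / n])
```

Each texture concentrates its energy in the band matching its dominant frequency, which is what makes the energy vector a usable feature for a minimum distance classifier.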

  15. Semi-supervised hyperspectral classification from a small number of training samples using a co-training approach

    Science.gov (United States)

    Romaszewski, Michał; Głomb, Przemysław; Cholewa, Michał

    2016-11-01

    We present a novel semi-supervised algorithm for classification of hyperspectral data from remote sensors. Our method is inspired by the Tracking-Learning-Detection (TLD) framework, originally applied for tracking objects in a video stream. TLD introduced the co-training approach called P-N learning, making use of two independent 'experts' (or learners) that scored samples in different feature spaces. In a similar fashion, we formulated the hyperspectral classification task as a co-training problem, that can be solved with the P-N learning scheme. Our method uses both spatial and spectral features of data, extending a small set of initial labelled samples during the process of region growing. We show that this approach is stable and achieves very good accuracy even for small training sets. We analyse the algorithm's performance on several publicly available hyperspectral data sets.
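The co-training loop can be sketched with two nearest-centroid learners acting on two feature views, each confidently labeling pool samples for the shared training set. The synthetic, well-separated views below are assumptions; the real method uses spectral and spatial features:

```python
import numpy as np

def centroid_scores(Xl, yl, Xu):
    """Signed confidence: positive means closer to the class-1 centroid."""
    c0, c1 = Xl[yl == 0].mean(axis=0), Xl[yl == 1].mean(axis=0)
    return (np.linalg.norm(Xu - c0, axis=1) -
            np.linalg.norm(Xu - c1, axis=1))

def co_train(view_a, view_b, labeled, y, pool, rounds=5, per_round=2):
    """Each view labels its most confident pool samples for the shared set."""
    labeled, y, pool = list(labeled), list(y), list(pool)
    for _ in range(rounds):
        for view in (view_a, view_b):
            if not pool:
                break
            s = centroid_scores(view[labeled], np.array(y), view[pool])
            picks = np.argsort(-np.abs(s))[:per_round]
            for i in sorted(picks, reverse=True):   # pop from the back first
                labeled.append(pool[i])
                y.append(int(s[i] > 0))
                pool.pop(i)
    return np.array(labeled), np.array(y)

rng = np.random.default_rng(2)
n = 20                                        # 10 samples per class
truth = np.array([0] * 10 + [1] * 10)
view_a = np.where(truth[:, None], 5.0, 0.0) + rng.normal(0, 0.3, (n, 2))
view_b = np.where(truth[:, None], -3.0, 3.0) + rng.normal(0, 0.3, (n, 2))
idx, yhat = co_train(view_a, view_b, labeled=[0, 10], y=[0, 1],
                     pool=[i for i in range(n) if i not in (0, 10)])
agree = (yhat == truth[idx]).mean()
```

Starting from one labeled sample per class, the loop grows the training set much as the paper's region-growing extension of the initial labels does.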

  16. Polarimetric SAR Image Supervised Classification Method Integrating Eigenvalues

    Directory of Open Access Journals (Sweden)

    Xing Yanxiao

    2016-04-01

    Full Text Available Since classification methods based on the H/α space have the drawback of yielding poor results for terrains with similar scattering features, in this study we propose a polarimetric Synthetic Aperture Radar (SAR) image classification method based on eigenvalues. First, we extract the eigenvalues and fit their distribution with an adaptive Gaussian mixture model. Then, using a naive Bayesian classifier, we obtain preliminary classification results. Because the eigenvalue distributions of two kinds of terrain may be similar, leading to incorrect classifications in the preliminary step, we calculate the similarity of every terrain pair and add a pair to the similarity table if its similarity exceeds a given threshold. We then apply a Wishart distance-based KNN classifier to these similar pairs to refine the classification. We applied the proposed method to both airborne and spaceborne SAR datasets; the results show that our method overcomes the shortcoming of the H/α-based unsupervised classification method in its use of eigenvalues, and produces results comparable to those of the Support Vector Machine (SVM)-based classification method.

  17. Supervised pixel classification for segmenting geographic atrophy in fundus autofluorescence images

    Science.gov (United States)

    Hu, Zhihong; Medioni, Gerard G.; Hernandez, Matthias; Sadda, SriniVas R.

    2014-03-01

    Age-related macular degeneration (AMD) is the leading cause of blindness in people over the age of 65. Geographic atrophy (GA) is a manifestation of the advanced or late stage of AMD, which may result in severe vision loss and blindness. Techniques to rapidly and precisely detect and quantify GA lesions would be of considerable value in advancing the understanding of the pathogenesis of GA and the management of GA progression. The purpose of this study is to develop an automated supervised pixel classification approach for segmenting GA, including uni-focal and multi-focal patches, in fundus autofluorescence (FAF) images. The image features include region-wise intensity measures (mean and variance), gray level co-occurrence matrix measures (angular second moment, entropy, and inverse difference moment), and Gaussian filter banks. A k-nearest-neighbor (k-NN) pixel classifier is applied to obtain a GA probability map representing the likelihood that each image pixel belongs to GA. A voting binary iterative hole-filling filter is then applied to fill in small holes. Sixteen randomly chosen FAF images were obtained from sixteen subjects with GA, and the algorithm-defined GA regions were compared with manual delineations performed by certified graders. Two-fold cross-validation was applied to evaluate the classification performance. The mean Dice similarity coefficients (DSC) between the algorithm- and manually-defined GA regions were 0.84 +/- 0.06 for one test and 0.83 +/- 0.07 for the other, and the area correlations between them were 0.99 (p < 0.05) and 0.94 (p < 0.05), respectively.
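
    The Dice similarity coefficient used in the evaluation can be computed directly from two binary masks; a minimal sketch, with flat 0/1 lists standing in for the algorithm- and manually-defined segmentation images:

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) between two
    binary masks given as flat lists of 0/1 pixel labels."""
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size = sum(mask_a) + sum(mask_b)
    return 2.0 * inter / size if size else 1.0
```

    For instance, masks [1, 1, 0, 0] and [1, 0, 0, 0] overlap in one pixel, giving DSC = 2/3.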

  18. A multi-label, semi-supervised classification approach applied to personality prediction in social media.

    Science.gov (United States)

    Lima, Ana Carolina E S; de Castro, Leandro Nunes

    2014-10-01

    Social media allow web users to create and share content pertaining to different subjects, exposing their activities, opinions, feelings and thoughts. In this context, online social media have attracted the interest of data scientists seeking to understand behaviours and trends, whilst collecting statistics for social sites. One potential application for these data is personality prediction, which aims to understand a user's behaviour within social media. Traditional personality prediction relies on users' profiles, their status updates, the messages they post, etc. Here, a personality prediction system for social media data is introduced that differs from most approaches in the literature in that it works with groups of texts instead of single texts, and does not take users' profiles into account. Also, the proposed approach extracts meta-attributes from the texts rather than working directly with the content of the messages. The set of possible personality traits is taken from the Big Five model, which allows the problem to be characterised as a multi-label classification task. The problem is then transformed into a set of five binary classification problems and solved by means of a semi-supervised learning approach, owing to the difficulty of annotating the massive amounts of data generated in social media. In our implementation, the proposed system was trained with three well-known machine-learning algorithms, namely a Naïve Bayes classifier, a Support Vector Machine, and a Multilayer Perceptron neural network. The system was applied to predict personality from tweets taken from three datasets available in the literature, and achieved approximately 83% prediction accuracy, with some personality traits presenting better individual classification rates than others.
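
    The transformation of the multi-label Big Five task into five binary problems (commonly called binary relevance) can be sketched as follows; the feature placeholders and trait sets below are invented examples, not the paper's meta-attributes:

```python
BIG_FIVE = ["openness", "conscientiousness", "extraversion",
            "agreeableness", "neuroticism"]

def binary_relevance(samples):
    """Turn a multi-label dataset into one binary dataset per trait.
    `samples` is a list of (features, set_of_traits) pairs; any binary
    classifier can then be trained independently on each problem."""
    return {trait: [(x, 1 if trait in labels else 0)
                    for x, labels in samples]
            for trait in BIG_FIVE}
```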

  19. EVALUATION OF REGISTRATION, COMPRESSION AND CLASSIFICATION ALGORITHMS

    Science.gov (United States)

    Jayroe, R. R.

    1994-01-01

    Several types of algorithms are generally used to process digital imagery such as Landsat data. The most commonly used algorithms perform the task of registration, compression, and classification. Because there are different techniques available for performing registration, compression, and classification, imagery data users need a rationale for selecting a particular approach to meet their particular needs. This collection of registration, compression, and classification algorithms was developed so that different approaches could be evaluated and the best approach for a particular application determined. Routines are included for six registration algorithms, six compression algorithms, and two classification algorithms. The package also includes routines for evaluating the effects of processing on the image data. This collection of routines should be useful to anyone using or developing image processing software. Registration of image data involves the geometrical alteration of the imagery. Registration routines available in the evaluation package include image magnification, mapping functions, partitioning, map overlay, and data interpolation. The compression of image data involves reducing the volume of data needed for a given image. Compression routines available in the package include adaptive differential pulse code modulation, two-dimensional transforms, clustering, vector reduction, and picture segmentation. Classification of image data involves analyzing the uncompressed or compressed image data to produce inventories and maps of areas of similar spectral properties within a scene. The classification routines available include a sequential linear technique and a maximum likelihood technique. The choice of the appropriate evaluation criteria is quite important in evaluating the image processing functions. The user is therefore given a choice of evaluation criteria with which to investigate the available image processing functions. All of the available

  20. ASTErIsM - Application of topometric clustering algorithms in automatic galaxy detection and classification

    CERN Document Server

    Tramacere, A; Dubath, P; Kneib, J-P; Courbin, F

    2016-01-01

    We present a study of galaxy detection and shape classification using topometric clustering algorithms. We first use the DBSCAN algorithm to extract, from CCD frames, groups of adjacent pixels with significant fluxes, and we then apply the DENCLUE algorithm to separate the contributions of overlapping sources. The DENCLUE separation is based on the localization of patterns of local maxima, through an iterative algorithm which associates each pixel with the closest local maximum. Our main classification goal is to separate elliptical from spiral galaxies. We introduce new sets of features derived from the computation of geometrical invariant moments of the pixel-group shape and from the statistics of the spatial distribution of the DENCLUE local maxima patterns. Ellipticals are characterized by a single group of local maxima, related to the galaxy core, while spiral galaxies have additional ones related to segments of spiral arms. We use two different supervised ensemble classification algorithms, Random Forest,...
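
    A minimal, pure-Python sketch of the DBSCAN step (grouping adjacent significant pixels), here run on plain 2-D points; the `eps` and `min_pts` values and the coordinates are illustrative, and a real pipeline would use an optimized implementation:

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN over 2-D points: returns one cluster id per point,
    with -1 marking noise. Neighbourhoods include the point itself."""
    def neighbours(i):
        px, py = points[i]
        return [j for j, (qx, qy) in enumerate(points)
                if (px - qx) ** 2 + (py - qy) ** 2 <= eps ** 2]

    labels = [None] * len(points)
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1            # provisional noise
            continue
        labels[i] = cid
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:       # noise reached from a core point
                labels[j] = cid       # ...becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            nb = neighbours(j)
            if len(nb) >= min_pts:    # j is itself a core point: expand
                queue.extend(nb)
        cid += 1
    return labels
```

    Two tight groups of points come out as clusters 0 and 1, while an isolated point is labelled noise.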

  1. A review of supervised object-based land-cover image classification

    Science.gov (United States)

    Ma, Lei; Li, Manchun; Ma, Xiaoxue; Cheng, Liang; Du, Peijun; Liu, Yongxue

    2017-08-01

    Object-based image classification for land-cover mapping purposes using remote-sensing imagery has attracted significant attention in recent years. Numerous studies conducted over the past decade have investigated a broad array of sensors, feature selection, classifiers, and other factors of interest. However, these research results have not yet been synthesized to provide coherent guidance on the effect of different supervised object-based land-cover classification processes. In this study, we first construct a database with 28 fields using qualitative and quantitative information extracted from 254 experimental cases described in 173 scientific papers. Second, the results of the meta-analysis are reported, including general characteristics of the studies (e.g., the geographic range of relevant institutes, preferred journals) and the relationships between factors of interest (e.g., spatial resolution and study area or optimal segmentation scale, accuracy and number of targeted classes), especially with respect to the classification accuracy of different sensors, segmentation scale, training set size, supervised classifiers, and land-cover types. Third, useful data on supervised object-based image classification are determined from the meta-analysis. For example, we find that supervised object-based classification is currently experiencing rapid advances, while development of the fuzzy technique is limited in the object-based framework. Furthermore, spatial resolution correlates with the optimal segmentation scale and study area, and Random Forest (RF) shows the best performance in object-based classification. The area-based accuracy assessment method can obtain stable classification performance, and indicates a strong correlation between accuracy and training set size, while the accuracy of the point-based method is likely to be unstable due to mixed objects. In addition, the overall accuracy benefits from higher spatial resolution images (e.g., unmanned aerial

  2. Supervised Classification: The Naive Beyesian Returns to the Old Bailey

    Directory of Open Access Journals (Sweden)

    Vilja Hulden

    2014-12-01

    Full Text Available A few years back, William Turkel wrote a series of blog posts called A Naive Bayesian in the Old Bailey, which showed how one could use machine learning to extract interesting documents out of a digital archive. This tutorial is a kind of an update on that blog essay, with roughly the same data but a slightly different version of the machine learner. The idea is to show why machine learning methods are of interest to historians, as well as to present a step-by-step implementation of a supervised machine learner. This learner is then applied to the Old Bailey digital archive, which contains several centuries’ worth of transcripts of trials held at the Old Bailey in London. We will be using Python for the implementation.
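
    In the same spirit as the tutorial, a minimal multinomial naive Bayes learner with Laplace smoothing fits in a few lines of Python; the token lists and trial-category labels below are invented examples, not actual Old Bailey data:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (token_list, label). Returns the counts needed
    by predict_nb."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Pick the label maximising log P(label) + sum log P(token|label),
    with add-one (Laplace) smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for label, n in class_counts.items():
        lp = math.log(n / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            lp += math.log((word_counts[label][t] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best
```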

  3. An Experimental Comparative Study on Three Classification Algorithms

    Institute of Scientific and Technical Information of China (English)

    蔡巍; 王永成; 李伟; 尹中航

    2003-01-01

    The classification algorithm is one of the key techniques affecting the performance of an automatic text classification system, and it plays an important role in automatic classification research. This paper comparatively analyzes k-NN, VSM, and a hybrid classification algorithm presented by our research group. Some 2000 Internet news items provided by ChinaInfoBank were used in the experiment. The results show that the performance of the hybrid algorithm presented by the group is superior to that of the other two algorithms.

  4. Unsupervised classification algorithm based on EM method for polarimetric SAR images

    Science.gov (United States)

    Fernández-Michelli, J. I.; Hurtado, M.; Areta, J. A.; Muravchik, C. H.

    2016-07-01

    In this work, we develop an iterative classification algorithm using complex Gaussian mixture models for polarimetric complex SAR data. It is an unsupervised algorithm which requires neither training data nor an initial set of classes. Additionally, it determines the model order from the data, which allows the data structure to be represented with minimum complexity. The algorithm consists of four steps: initialization, model selection, refinement, and smoothing. After a simple initialization stage, the EM algorithm is iteratively applied in the model selection step to compute the model order and an initial classification for the refinement step. The refinement step uses Classification EM (CEM) to reach the final classification, and the smoothing stage improves the results by means of non-linear filtering. The algorithm is applied to both simulated and real Single Look Complex data from the EMISAR mission and compared with the Wishart classification method. We use the confusion matrix and kappa statistic for the comparison on simulated data, whose ground truth is known, and apply the Davies-Bouldin index to compare the two classifications on real data. The results obtained for both types of data validate our algorithm and show that its performance is comparable to Wishart's in terms of classification quality.
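
    The EM step at the heart of such algorithms can be illustrated on a toy one-dimensional, two-component Gaussian mixture. This is only a sketch: the actual method fits complex Gaussian mixtures to PolSAR data and also selects the model order, which is omitted here.

```python
import math

def gauss(x, mu, var):
    """Univariate normal density."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm(data, iters=50):
    """EM for a two-component 1-D Gaussian mixture, initialised from the
    data extremes."""
    mu = [min(data), max(data)]
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            w = [pi[k] * gauss(x, mu[k], var[k]) for k in range(2)]
            s = sum(w)
            resp.append([wk / s for wk in w])
        # M-step: re-estimate means, variances and mixing weights
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = max(1e-6, sum(r[k] * (x - mu[k]) ** 2
                                   for r, x in zip(resp, data)) / nk)
            pi[k] = nk / len(data)
    return mu, var, pi
```

    A classification step would then assign each sample to the component with the highest responsibility.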

  5. A Novel Approach to Developing a Supervised Spatial Decision Support System for Image Classification: A Study of Paddy Rice Investigation

    Directory of Open Access Journals (Sweden)

    Shih-Hsun Chang

    2014-01-01

    Full Text Available Paddy rice area estimation via remote sensing techniques has been well established in recent years. Texture information and vegetation indicators are widely used to improve the classification accuracy of satellite images. Accordingly, this study employs texture information and vegetation indicators as ancillary information for classifying paddy rice in remote sensing images. In the first stage, the images are obtained using a remote sensing technique and ancillary information is employed to increase the accuracy of classification. In the second stage, we construct an efficient supervised classifier, which is used to evaluate the ancillary information. In the third stage, linear discriminant analysis (LDA) is introduced. LDA is a well-known method for classifying images into various categories. Also, the particle swarm optimization (PSO) algorithm is employed to optimize the LDA classification outcomes and increase classification performance. In the fourth stage, we discuss the strategy of selecting different window sizes and analyze particle numbers and iteration numbers with the corresponding accuracy, and a rational strategy for combining the ancillary information is introduced. The PSO algorithm improves the accuracy rate from 82.26% to 89.31%, and the improved accuracy results in a much lower salt-and-pepper effect in the thematic map.
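
    The PSO component can be sketched in isolation. Here a minimal swarm minimizes a one-dimensional objective with commonly used (assumed, not the paper's) inertia and acceleration coefficients; the study itself applies PSO to optimize LDA classification outcomes rather than a toy function.

```python
import random

def pso(f, bounds, n_particles=10, iters=60, seed=2):
    """Minimal particle swarm optimisation for a 1-D objective f."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]                       # each particle's best position
    gbest = min(pos, key=f)              # swarm-wide best position
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (0.7 * vel[i]
                      + 1.5 * r1 * (pbest[i] - pos[i])   # cognitive pull
                      + 1.5 * r2 * (gbest - pos[i]))     # social pull
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))   # keep in bounds
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = pos[i]
            if f(pos[i]) < f(gbest):
                gbest = pos[i]
    return gbest
```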

  6. Automatic modulation classification principles, algorithms and applications

    CERN Document Server

    Zhu, Zhechen

    2014-01-01

    Automatic Modulation Classification (AMC) has been a key technology in many military, security, and civilian telecommunication applications for decades. In military and security applications, modulation often serves as another level of encryption; in modern civilian applications, multiple modulation types can be employed by a signal transmitter to control the data rate and link reliability. This book offers comprehensive documentation of AMC models, algorithms and implementations for successful modulation recognition. It provides an invaluable theoretical and numerical comparison of AMC algo

  7. Classification Algorithms for Big Data Analysis, a Map Reduce Approach

    Science.gov (United States)

    Ayma, V. A.; Ferreira, R. S.; Happ, P.; Oliveira, D.; Feitosa, R.; Costa, G.; Plaza, A.; Gamba, P.

    2015-03-01

    For many years, the scientific community has been concerned with how to increase the accuracy of different classification methods, and major achievements have been made so far. Beyond this issue, the increasing amount of data generated every day by remote sensors raises further challenges to be overcome. In this work, a tool within the scope of the InterIMAGE Cloud Platform (ICP), an open-source, distributed framework for automatic image interpretation, is presented. The tool, named ICP: Data Mining Package, is able to perform supervised classification procedures on huge amounts of data, usually referred to as big data, on a distributed infrastructure using Hadoop MapReduce. The tool has four classification algorithms implemented, taken from WEKA's machine learning library, namely: Decision Trees, Naïve Bayes, Random Forest, and Support Vector Machines (SVM). The results of an experimental analysis using an SVM classifier on data sets of different sizes and cluster configurations demonstrate the potential of the tool, as well as the aspects that affect its performance.

  8. CLASSIFICATION ALGORITHMS FOR BIG DATA ANALYSIS, A MAP REDUCE APPROACH

    Directory of Open Access Journals (Sweden)

    V. A. Ayma

    2015-03-01

    Full Text Available For many years, the scientific community has been concerned with how to increase the accuracy of different classification methods, and major achievements have been made so far. Beyond this issue, the increasing amount of data generated every day by remote sensors raises further challenges to be overcome. In this work, a tool within the scope of the InterIMAGE Cloud Platform (ICP), an open-source, distributed framework for automatic image interpretation, is presented. The tool, named ICP: Data Mining Package, is able to perform supervised classification procedures on huge amounts of data, usually referred to as big data, on a distributed infrastructure using Hadoop MapReduce. The tool has four classification algorithms implemented, taken from WEKA's machine learning library, namely: Decision Trees, Naïve Bayes, Random Forest, and Support Vector Machines (SVM). The results of an experimental analysis using an SVM classifier on data sets of different sizes and cluster configurations demonstrate the potential of the tool, as well as the aspects that affect its performance.

  9. Semi-supervised Learning for Classification of Polarimetric SAR Images Based on SVM-Wishart

    Directory of Open Access Journals (Sweden)

    Hua Wen-qiang

    2015-02-01

    Full Text Available In this study, we propose a new semi-supervised classification method for Polarimetric SAR (PolSAR) images, aimed at handling the case where the training set is small. First, considering the scattering characteristics of PolSAR data, the method extracts multiple scattering features using a target decomposition approach. Then, a semi-supervised learning model is established based on a co-training framework and Support Vector Machines (SVM). Both labeled and unlabeled data are utilized in this model to obtain high classification accuracy. Third, a recovery scheme based on the Wishart classifier is proposed to improve classification performance. The experiments conducted in this study show that the proposed method performs more effectively than other traditional methods when the training set is small.

  10. SEMI-SUPERVISED RADIO TRANSMITTER CLASSIFICATION BASED ON ELASTIC SPARSITY REGULARIZED SVM

    Institute of Scientific and Technical Information of China (English)

    Hu Guyu; Gong Yong; Chen Yande; Pan Zhisong; Deng Zhantao

    2012-01-01

    Non-collaborative radio transmitter recognition is a significant but challenging issue, since it is hard or costly to obtain labeled training samples. In order to make effective use of the unlabeled samples, which can be obtained much more easily, a novel semi-supervised classification method named Elastic Sparsity Regularized Support Vector Machine (ESRSVM) is proposed for radio transmitter classification. ESRSVM first constructs an elastic-net graph over the data samples to capture robust and natural discriminating information, and then incorporates this information into the manifold learning framework through an elastic sparsity regularization term. Experimental results on 10 GMSK-modulated Automatic Identification System radios and 15 FM walkie-talkie radios show that ESRSVM achieves clearly better performance than KNN and SVM, which use only labeled samples for classification, and also outperforms the semi-supervised classifier LapSVM, which is based on manifold regularization.

  11. Extending self-organizing maps for supervised classification of remotely sensed data

    Institute of Scientific and Technical Information of China (English)

    CHEN Yongliang

    2009-01-01

    An extended self-organizing map (SOM) for supervised classification is proposed in this paper. Unlike traditional SOMs, the model has an input layer, a Kohonen layer, and an output layer. The number of neurons in the input layer depends on the dimensionality of the input patterns, and the number of neurons in the output layer equals the number of desired classes. The number of neurons in the Kohonen layer may range from a few to several thousand, depending on the complexity of the classification problem and the required classification precision. Each training sample is expressed by a pair of vectors: an input vector and a class codebook vector. When a training sample is presented to the model, Kohonen's competitive learning rule is applied to select the winning neuron in the Kohonen layer. The weight coefficients connecting the input layer to the winning neuron and its neighbors in the Kohonen layer are modified to be closer to the input vector, and those connecting the neurons around the winning neuron (within a certain diameter in the Kohonen layer) to the output layer are adjusted to be closer to the class codebook vector. If the number of training samples is sufficiently large and the learning epochs are iterated enough times, the model can serve as a supervised classifier. The model has been tentatively applied to the supervised classification of multispectral remotely sensed data, and its performance compared with that of a back-propagation network (BPN). The investigation shows that the extended SOM is feasible for supervised classification.
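
    A heavily simplified sketch of the competitive-learning core: each Kohonen unit keeps a weight vector and, in place of the paper's output layer with class codebook vectors, a simple class-vote counter. Neighborhood updates around the winner are omitted for brevity, and the initialisation from training samples is one common choice, not necessarily the paper's.

```python
def train_som(samples, n_units, epochs=20, lr=0.5):
    """Toy supervised SOM: competitive learning plus per-unit class votes.
    `samples` is a list of (feature_tuple, label) pairs."""
    dim = len(samples[0][0])
    # initialise unit weights from the first training samples
    weights = [list(samples[i % len(samples)][0]) for i in range(n_units)]
    votes = [{} for _ in range(n_units)]

    def winner(x):
        return min(range(n_units),
                   key=lambda u: sum((weights[u][d] - x[d]) ** 2
                                     for d in range(dim)))

    for _ in range(epochs):
        for x, label in samples:
            w = winner(x)
            for d in range(dim):            # pull the winner toward the input
                weights[w][d] += lr * (x[d] - weights[w][d])
            votes[w][label] = votes[w].get(label, 0) + 1

    def classify(x):
        v = votes[winner(x)]
        return max(v, key=v.get) if v else None
    return classify
```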

  12. Semi-supervised least squares support vector machine algorithm: application to offshore oil reservoir

    Science.gov (United States)

    Luo, Wei-Ping; Li, Hong-Qi; Shi, Ning

    2016-06-01

    At the early stages of deep-water oil exploration and development, wells are drilled more sparsely than in onshore oilfields. Supervised least squares support vector machine algorithms have been used to predict reservoir parameters, but their prediction accuracy is low. We combined the least squares support vector machine (LSSVM) algorithm with semi-supervised learning and established a semi-supervised regression model, which we call the semi-supervised least squares support vector machine (SLSSVM) model. Iterative matrix inversion is also introduced to improve the training ability and training time of the model. We used UCI data to test the generalization of the semi-supervised and supervised LSSVM models. The test results suggest that the generalization performance of the semi-supervised model is greatly improved, and that its advantage grows as the number of training samples decreases. Moreover, for small-sample models, the SLSSVM method has higher precision than the semi-supervised K-nearest neighbor (SKNN) method. The new semi-supervised LSSVM algorithm was used to predict the distribution of porosity and sandstone in the Jingzhou study area.

  13. Fast deterministic algorithm for EEE components classification

    Science.gov (United States)

    Kazakovtsev, L. A.; Antamoshkin, A. N.; Masich, I. S.

    2015-10-01

    The authors consider the problem of automatic classification of electronic, electrical, and electromechanical (EEE) components based on the results of test control. Electronic components of the same type used in a high-quality unit must be produced as a single production batch from a single batch of raw materials, and the test-control data are used to split a shipped lot of components into several classes representing the production batches. Methods such as k-means++ clustering or evolutionary algorithms combine local search and random search heuristics, whereas the proposed fast algorithm returns a unique, comparatively precise result for each data set. If the data processing is performed by the customer of the EEE components, this feature of the algorithm allows the results to be easily checked by a producer or supplier.

  14. Gaussian maximum likelihood and contextual classification algorithms for multicrop classification

    Science.gov (United States)

    Di Zenzo, Silvano; Bernstein, Ralph; Kolsky, Harwood G.; Degloria, Stephen D.

    1987-01-01

    The paper reviews some of the ways in which context has been handled in the remote-sensing literature, and additional possibilities are introduced. The problem of computing exhaustive and normalized class-membership probabilities from the likelihoods provided by the Gaussian maximum likelihood classifier (to be used as initial probability estimates to start relaxation) is discussed. An efficient implementation of probabilistic relaxation, suited to the needs of actual remote-sensing applications, is proposed. A modified fuzzy-relaxation algorithm using generalized operations between fuzzy sets is presented. Combined use of the two relaxation algorithms is proposed to exploit context in the multispectral classification of remotely sensed data. Results on an artificially created image and an MSS data set are reported.

  15. Gene classification using parameter-free semi-supervised manifold learning.

    Science.gov (United States)

    Huang, Hong; Feng, Hailiang

    2012-01-01

    A new manifold learning method, called parameter-free semi-supervised local Fisher discriminant analysis (pSELF), is proposed to map the gene expression data into a low-dimensional space for tumor classification. Motivated by the fact that semi-supervised and parameter-free are two desirable and promising characteristics for dimension reduction, a new difference-based optimization objective function with unlabeled samples has been designed. The proposed method preserves the global structure of unlabeled samples in addition to separating labeled samples in different classes from each other. The semi-supervised method has an analytic form of the globally optimal solution, which can be computed efficiently by eigen decomposition. Experimental results on synthetic data and SRBCT, DLBCL, and Brain Tumor gene expression data sets demonstrate the effectiveness of the proposed method.

  16. Structure-Based Algorithms for Microvessel Classification

    KAUST Repository

    Smith, Amy F.

    2015-02-01

    Objective: Recent developments in high-resolution imaging techniques have enabled digital reconstruction of three-dimensional sections of microvascular networks down to the capillary scale. To better interpret these large data sets, our goal is to distinguish branching trees of arterioles and venules from capillaries. Methods: Two novel algorithms are presented for classifying vessels in microvascular anatomical data sets without requiring flow information. The algorithms are compared with a classification based on observed flow directions (considered the gold standard), and with an existing resistance-based method that relies only on structural data. Results: The first algorithm, developed for networks with one arteriolar and one venular tree, performs well in identifying arterioles and venules and is robust to parameter changes, but incorrectly labels a significant number of capillaries as arterioles or venules. The second algorithm, developed for networks with multiple inlets and outlets, correctly identifies more arterioles and venules, but is more sensitive to parameter changes. Conclusions: The algorithms presented here can be used to classify microvessels in large microvascular data sets lacking flow information. This provides a basis for analyzing the distinct geometrical properties and modelling the functional behavior of arterioles, capillaries, and venules.

  17. A new classification algorithm based on RGH-tree search

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    In this paper, we put forward a new classification algorithm based on RGH-tree search and perform a classification analysis and comparison study. The algorithm saves computing resources and increases classification efficiency. The experiments show that it performs well on three-dimensional, multi-class data, and that it has better generalization ability when the training set is small and the test set is large.

  18. Machine Learning Algorithms in Web Page Classification

    Directory of Open Access Journals (Sweden)

    W.A.AWAD

    2012-11-01

    Full Text Available In this paper we use machine learning algorithms such as SVM, KNN, and GIS to perform a behavior comparison on the web page classification problem. From the experiments we see that SVM, with a small number of negative documents used to build the centroids, has the smallest storage requirement and the least online test computation cost, whereas almost all GIS configurations with different numbers of nearest neighbors have an even higher storage requirement and online test computation cost than KNN. This suggests that future work should try to reduce the storage requirement and online test cost of GIS.

  19. Automatic Building Detection based on Supervised Classification using High Resolution Google Earth Images

    OpenAIRE

    Ghaffarian, S.

    2014-01-01

    This paper presents a novel approach to detecting buildings by automating the training-area collection stage of supervised classification. The method is based on the fact that a 3D building structure should cast a shadow under suitable imaging conditions. Therefore, the methodology begins with the detection and masking out of shadow areas using the luminance component of the LAB color space, which indicates the lightness of the image, and a novel double thresholding technique. Furth...

  20. Mapping of riparian invasive species with supervised classification of Unmanned Aerial System (UAS) imagery

    Science.gov (United States)

    Michez, Adrien; Piégay, Hervé; Jonathan, Lisein; Claessens, Hugues; Lejeune, Philippe

    2016-02-01

    Riparian zones are key landscape features, representing the interface between terrestrial and aquatic ecosystems. Although they have been influenced by human activities for centuries, their degradation increased during the 20th century. Concomitant with (or as a consequence of) these disturbances, the invasion of exotic species has increased throughout the world's riparian zones. In our study, we propose an easily reproducible methodological framework to map three riparian invasive taxa using Unmanned Aerial Systems (UAS) imagery: Impatiens glandulifera Royle, Heracleum mantegazzianum Sommier and Levier, and Japanese knotweed (Fallopia sachalinensis (F. Schmidt Petrop.), Fallopia japonica (Houtt.), and hybrids). From visible and near-infrared UAS orthophotos, we derived simple spectral and texture image metrics computed at various scales of image segmentation (10, 30, 45, 60) using eCognition software. Supervised classification based on the random forests algorithm was used to identify the most relevant variable (or combination of variables) derived from UAS imagery for mapping riparian invasive plant species. The models were built using 20% of the dataset, with the remaining 80% used as a test set. Except for H. mantegazzianum, the best results in terms of global accuracy were achieved at the finest scale of analysis (segmentation scale parameter = 10). The best overall accuracies reached 72%, 68%, and 97% for I. glandulifera, Japanese knotweed, and H. mantegazzianum, respectively. Among the selected metrics, simple spectral metrics (layer mean/camera brightness) were the most used. Our results also confirm the added value of texture metrics (GLCM derivatives) for mapping riparian invasive species. The results obtained for I. glandulifera and Japanese knotweed do not reach sufficient accuracy for operational applications; however, the results achieved for H. mantegazzianum are encouraging. The high accuracy values combined to
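
    A miniature stand-in for the random forests classifier used above: bagged single-split decision stumps voting by majority. Real random forests also grow deeper trees and subsample features at each split, and the data values below are illustrative, not actual UAS metrics.

```python
import random

def stump(data):
    """Best single-feature threshold split by training accuracy.
    Returns (feature, threshold, label_below, label_above)."""
    best = None
    for f in range(len(data[0][0])):
        for x, _ in data:
            t = x[f]
            for lo, hi in ((0, 1), (1, 0)):
                acc = sum((hi if xi[f] > t else lo) == y
                          for xi, y in data) / len(data)
                if best is None or acc > best[0]:
                    best = (acc, f, t, lo, hi)
    return best[1:]

def forest(data, n_trees=15, seed=1):
    """Train stumps on bootstrap resamples; predict by majority vote."""
    rng = random.Random(seed)
    trees = [stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]

    def predict(x):
        votes = sum(hi if x[f] > t else lo for f, t, lo, hi in trees)
        return 1 if 2 * votes >= len(trees) else 0
    return predict
```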

  1. Online semi-supervised learning: algorithm and application in metagenomics

    NARCIS (Netherlands)

    S. Imangaliyev; B. Keijser; W. Crielaard; E. Tsivtsivadze

    2013-01-01

    As the amount of metagenomic data grows rapidly, online statistical learning algorithms are poised to play a key role in metagenome analysis tasks. Frequently, data are only partially labeled; that is, the dataset contains only partial information about the problem of interest. This work presents an algorithm an

  2. Online Semi-Supervised Learning: Algorithm and Application in Metagenomics

    NARCIS (Netherlands)

    Imangaliyev, S.; Keijser, B.J.F.; Crielaard, W.; Tsivtsivadze, E.

    2013-01-01

    As the amount of metagenomic data grows rapidly, online statistical learning algorithms are poised to play a key role in metagenome analysis tasks. Frequently, data are only partially labeled; that is, the dataset contains only partial information about the problem of interest. This work presents an algorithm and

  3. Contextual classification of multispectral image data: Approximate algorithm

    Science.gov (United States)

    Tilton, J. C. (Principal Investigator)

    1980-01-01

    A computationally less intensive approximation to a classification algorithm incorporating spatial context information in a general, statistical manner is presented. It produces classifications that are nearly as accurate.

  4. Semi-Supervised Clustering Fingerprint Positioning Algorithm Based on Distance Constraints

    Institute of Scientific and Technical Information of China (English)

    Ying Xia; Zhongzhao Zhang; Lin Ma; Yao Wang

    2015-01-01

    With the rapid development of WLAN (Wireless Local Area Network) technology, an important goal of indoor positioning systems is to improve positioning accuracy while reducing online computation. This paper proposes a novel fingerprint positioning algorithm: semi-supervised affinity propagation clustering based on distance function constraints. We show that by employing affinity propagation techniques, a small fraction of labeled data can be used to adjust the similarity matrix of the signal space so as to cluster reference points with high accuracy. The semi-supervised APC combines machine learning, clustering analysis, and fingerprinting. By collecting data and testing our algorithm in a realistic indoor WLAN environment, we show experimentally that the proposed algorithm improves positioning accuracy while reducing the online localization computation, as compared with the widely used K nearest neighbor and maximum likelihood estimation algorithms.
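    The core idea (a small set of labeled fingerprints adjusting the similarity matrix before affinity propagation) can be sketched roughly as below. This is our loose interpretation, not the paper's algorithm; the RSS fingerprints, the labeling fractions, and the ±50 constraint weights are all invented for illustration.

```python
# Sketch: semi-supervised affinity propagation on RSS fingerprints.
# A few labeled reference points adjust the similarity matrix before clustering.
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
# 40 reference points, 5 access points: two noisy zones of RSS vectors
rss = np.vstack([rng.normal(-50, 2, (20, 5)), rng.normal(-70, 2, (20, 5))])
labels = np.full(40, -1)            # -1 = unlabeled
labels[:4] = 0                      # a few labeled points per zone
labels[20:24] = 1

# Base similarity: negative squared Euclidean distance in signal space
S = -((rss[:, None, :] - rss[None, :, :]) ** 2).sum(axis=2)

# Constraint adjustment: pull same-labeled pairs together, push different apart
for i in range(40):
    for j in range(40):
        if labels[i] >= 0 and labels[j] >= 0 and i != j:
            S[i, j] += 50.0 if labels[i] == labels[j] else -50.0

ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(S)
cluster_ids = ap.labels_
```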

  5. Greylevel Difference Classification Algorithm in Fractal Image Compression

    Institute of Scientific and Technical Information of China (English)

    陈毅松; 卢坚; 孙正兴; 张福炎

    2002-01-01

    This paper proposes the notion of a greylevel difference classification algorithm in fractal image compression. Then an example of the greylevel difference classification algorithm is given as an improvement of the quadrant greylevel and variance classification in the quadtree-based encoding algorithm. The algorithm incorporates the frequency feature in spatial analysis using the notion of average quadrant greylevel difference, leading to an enhancement in terms of encoding time, PSNR value and compression ratio.

  6. Parallel Implementation of Classification Algorithms Based on Cloud Computing Environment

    Directory of Open Access Journals (Sweden)

    Wenbo Wang

    2012-09-01

    Full Text Available As an important task of data mining, classification has received considerable attention in many applications, such as information retrieval and web searching. The growing volume of information produced by technological progress and the growing individual needs of data mining make classifying very large datasets a challenging task. To deal with this problem, many researchers have tried to design efficient parallel classification algorithms. This paper briefly introduces classification algorithms and cloud computing, analyzes the shortcomings of existing parallel classification algorithms, and then proposes a new model for parallel classification. In particular, it presents a parallel Naïve Bayes classification algorithm based on MapReduce, a simple yet powerful parallel programming technique. The experimental results demonstrate that the proposed algorithm improves on the original algorithm's performance and can process large datasets efficiently on commodity hardware.
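    The MapReduce pattern behind a parallel Naïve Bayes trainer can be illustrated with a toy single-process sketch: mappers emit ((class, feature), 1) pairs and a reducer sums them into the counts from which the classifier's conditional probabilities are built. The dataset and feature names are invented; this is not the paper's implementation.

```python
# Toy sketch of MapReduce-style counting for Naive Bayes training.
from collections import defaultdict
from itertools import chain

records = [
    ("spam", ["offer", "win", "money"]),
    ("spam", ["win", "money"]),
    ("ham",  ["meeting", "schedule"]),
    ("ham",  ["schedule", "money"]),
]

def mapper(record):
    """Emit one ((class, word), 1) pair per word, as a mapper task would."""
    label, words = record
    for w in words:
        yield (label, w), 1

def reducer(pairs):
    """Sum the counts per (class, word) key, as the reduce phase would."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

counts = reducer(chain.from_iterable(mapper(r) for r in records))
# Naive Bayes then estimates P(word | class) from counts[(class, word)]
```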

  7. Soft supervised self-organizing mapping (3SOM) for improving land cover classification with MODIS time-series

    Science.gov (United States)

    Lawawirojwong, Siam

    Classification of remote sensing data has long been a fundamental technique for studying vegetation and land cover. Furthermore, land use and land cover maps are a basic need for environmental science. These maps are important for crop system monitoring and are also valuable resources for decision makers. Therefore, an up-to-date and highly accurate land cover map with detailed and timely information is required for the global environmental change research community to support natural resource management, environmental protection, and policy making. However, there appears to be a number of limitations associated with data utilization such as weather conditions, data availability, cost, and the time needed for acquiring and processing large numbers of images. Additionally, improving the classification accuracy and reducing the classification time have long been goals of remote sensing research, and they still require further study. To manage these challenges, the primary goal of this research is to improve classification algorithms that utilize MODIS-EVI time-series images. A supervised self-organizing map (SSOM) and a soft supervised self-organizing map (3SOM) are modified and improved to increase classification efficiency and accuracy. To accomplish the main goal, the performance of the proposed methods is investigated using synthetic and real landscape data derived from MODIS-EVI time-series images. Two study areas are selected based on differences in land cover characteristics: one in Thailand and one in the Midwestern U.S. The results indicate that time-series imagery is a potentially useful input dataset for land cover classification. Moreover, the SSOM with time-series data significantly outperforms the conventional classification techniques of the Gaussian maximum likelihood classifier (GMLC) and backpropagation neural network (BPNN).
In addition, the 3SOM employed as a soft classifier delivers a more accurate classification than the SSOM applied as

  8. A new semi-supervised classification strategy combining active learning and spectral unmixing of hyperspectral data

    Science.gov (United States)

    Sun, Yanli; Zhang, Xia; Plaza, Antonio; Li, Jun; Dópido, Inmaculada; Liu, Yi

    2016-10-01

    Hyperspectral remote sensing allows for the detailed analysis of the surface of the Earth by providing high-dimensional images with hundreds of spectral bands. Hyperspectral image classification plays a significant role in hyperspectral image analysis and has been a very active research area in the last few years. In the context of hyperspectral image classification, supervised techniques (which have achieved wide acceptance) must address a difficult task due to the imbalance between the high dimensionality of the data and the limited availability of labeled training samples in real analysis scenarios. While the collection of labeled samples is generally difficult, expensive, and time-consuming, unlabeled samples can be generated in a much easier way. Semi-supervised learning offers an effective solution that can take advantage of both unlabeled and a small amount of labeled samples. Spectral unmixing is another widely used technique in hyperspectral image analysis, developed to retrieve pure spectral components and determine their abundance fractions in mixed pixels. In this work, we propose a method to perform semi-supervised hyperspectral image classification by combining the information retrieved with spectral unmixing and classification. Two kinds of samples that are highly mixed in nature are automatically selected, aiming at finding the most informative unlabeled samples. One kind is given by the samples minimizing the distance between the first two most probable classes, obtained by calculating the difference between the two highest abundances. Another kind is given by the samples minimizing the distance between the most probable class and the least probable class, obtained by calculating the difference between the highest and lowest abundances. The effectiveness of the proposed method is evaluated using a real hyperspectral data set collected by the airborne visible infrared imaging spectrometer (AVIRIS) over the Indian Pines region in Northwestern Indiana.
In the
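    The two sample-selection rules described above can be sketched directly from an abundance matrix: rule (a) minimizes the gap between the two highest abundances, rule (b) minimizes the gap between the highest and lowest abundances. This is an illustrative reading of the abstract on random abundances, not the authors' code.

```python
# Sketch: selecting the most informative unlabeled pixels from unmixing
# abundances, following the two rules described in the abstract.
import numpy as np

rng = np.random.default_rng(1)
abundances = rng.dirichlet(alpha=np.ones(4), size=100)  # 100 pixels, 4 endmembers

sorted_ab = np.sort(abundances, axis=1)              # ascending per pixel
gap_top2 = sorted_ab[:, -1] - sorted_ab[:, -2]       # two most probable classes
gap_extremes = sorted_ab[:, -1] - sorted_ab[:, 0]    # most vs least probable

k = 5
informative_a = np.argsort(gap_top2)[:k]     # most mixed between top classes
informative_b = np.argsort(gap_extremes)[:k]
```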

  9. Automatic Building Detection based on Supervised Classification using High Resolution Google Earth Images

    Science.gov (United States)

    Ghaffarian, S.; Ghaffarian, S.

    2014-08-01

    This paper presents a novel approach to building detection that automates the training-area collection stage of supervised classification. The method is based on the fact that a 3D building structure should cast a shadow under suitable imaging conditions. Therefore, the methodology begins by detecting and masking out shadow areas using the luminance component of the LAB color space, which indicates the lightness of the image, together with a novel double-thresholding technique. Next, the training areas for supervised classification are selected by automatically determining a buffer zone on each building whose shadow is detected, using the shadow shape and the sun illumination direction. Thereafter, by calculating statistics over each buffer zone collected from the building areas, an Improved Parallelepiped Supervised Classification is executed to detect the buildings. Standard deviation thresholding is applied to the parallelepiped classification method to improve its accuracy. Finally, simple morphological operations are conducted to remove noise and increase the accuracy of the results. The experiments were performed on a set of high-resolution Google Earth images. The performance of the proposed approach was assessed by comparing its results with reference data using well-known quality measurements (precision, recall and F1-score) to evaluate the pixel-based and object-based performance of the proposed approach. Evaluation of the results illustrates that buildings detected from dense and suburban districts with diverse characteristics and color combinations using our proposed method have 88.4 % and 85.3 % overall pixel-based and object-based precision, respectively.
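    The shadow-masking step can be loosely illustrated as follows: approximate a lightness channel, mark pixels below a low threshold as sure shadow, and grow that mask into pixels below a higher threshold (a crude stand-in for the paper's double thresholding, whose exact details are not reproduced here). The thresholds and the luminance proxy are assumptions.

```python
# Sketch: double-threshold shadow masking on a lightness channel
# (illustrative approximation, not the paper's exact method).
import numpy as np

rng = np.random.default_rng(2)
rgb = rng.uniform(0, 1, (64, 64, 3))
rgb[10:20, 10:20] *= 0.1                 # a dark "shadow" patch

# Simple luminance proxy standing in for the LAB lightness component
lightness = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

low, high = 0.08, 0.15                   # hypothetical thresholds
sure_shadow = lightness < low            # confidently shadow
candidate = lightness < high             # possibly shadow

# Crude one-step hysteresis: grow sure-shadow into adjacent candidate pixels
grown = sure_shadow.copy()
for dx in (-1, 0, 1):
    for dy in (-1, 0, 1):
        grown |= np.roll(np.roll(sure_shadow, dx, axis=0), dy, axis=1)
shadow_mask = sure_shadow | (candidate & grown)
```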

  10. Predicting incomplete gene microarray data with the use of supervised learning algorithms

    CSIR Research Space (South Africa)

    Twala, B

    2010-10-01

    Full Text Available of many well-established supervised learning (SL) algorithms in an attempt to provide more accurate and automatic diagnosis class (cancer/non cancer) prediction. Virtually all research on SL addresses the task of learning to classify complete domain...

  11. Cost-conscious comparison of supervised learning algorithms over multiple data sets

    OpenAIRE

    Ulaş, Aydın; Yıldız, Olcay Taner; Alpaydın, Ahmet İbrahim Ethem

    2012-01-01

    In the literature, there exist statistical tests to compare supervised learning algorithms on multiple data sets in terms of accuracy but they do not always generate an ordering. We propose Multi(2)Test, a generalization of our previous work, for ordering multiple learning algorithms on multiple data sets from "best" to "worst" where our goodness measure is composed of a prior cost term additional to generalization error. Our simulations show that Multi2Test generates orderings using pairwise...

  12. Algorithm and Implementation of the Blog-Post Supervision Process

    CERN Document Server

    Biswas, Kamanashis; Harun, S A M

    2010-01-01

    A web log, or blog for short, is a popular way to share personal entries with others through a website. A typical blog may consist of text, images, audio and video. Most blogs work as personal online diaries, while others focus on a specific interest such as photographs (photoblog), art (artblog), travel (tourblog) or IT (techblog). Another type of blogging, called microblogging, is also very well known nowadays and consists of very short posts. As in developed countries, the number of blog users is gradually increasing in developing countries, e.g. Bangladesh. Due to the nature of open access for all users, some people misuse blogs to spread fake news for individual or political goals. Some also post vulgar material that creates an embarrassing situation for other bloggers and can even harm the reputation of the victim. The only way to overcome this problem is to bring all posts under the supervision of the blog moderator, but this totally contradicts blogging concepts. In thi...

  13. AN IMPROVED ALGORITHM FOR SUPERVISED FUZZY C-MEANS CLUSTERING OF REMOTELY SENSED DATA

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    This paper describes an improved algorithm for fuzzy c-means clustering of remotely sensed data, by which the degree of fuzziness of the resultant classification is decreased compared with that of a conventional algorithm; that is, the classification accuracy is increased. This is achieved by incorporating covariance matrices at the level of individual classes rather than assuming a global one. Empirical results from a fuzzy classification of an Edinburgh suburban land cover confirmed the improved performance of the new algorithm for fuzzy c-means clustering, in particular when fuzziness is also accommodated in the assumed reference data.
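    The effect of per-class covariance matrices can be illustrated with a simplified supervised variant (our sketch, not the paper's algorithm): estimate a mean and covariance per class from training pixels, then compute fuzzy memberships from per-class Mahalanobis distances using the standard FCM membership formula.

```python
# Sketch: fuzzy memberships from per-class (not global) covariance matrices.
import numpy as np

rng = np.random.default_rng(3)
# Two "land cover" classes with different covariance structures
class0 = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], 50)
class1 = rng.multivariate_normal([4, 4], [[1.0, -0.5], [-0.5, 1.0]], 50)
train = {0: class0, 1: class1}

# Per-class mean and inverse covariance estimated from training pixels
stats = {c: (x.mean(axis=0), np.linalg.inv(np.cov(x.T))) for c, x in train.items()}

def fuzzy_memberships(p, m=2.0):
    """Membership of pixel p in each class (fuzzifier m, standard FCM form)."""
    d2 = np.array([(p - mu) @ inv_cov @ (p - mu) for mu, inv_cov in stats.values()])
    w = d2 ** (-1.0 / (m - 1.0))
    return w / w.sum()

u = fuzzy_memberships(np.array([0.2, 0.1]))   # a pixel near class 0's mean
```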

  14. An Incremental Classification Algorithm for Mining Data with Feature Space Heterogeneity

    Directory of Open Access Journals (Sweden)

    Yu Wang

    2014-01-01

    Full Text Available Feature space heterogeneity often exists in many real world data sets so that some features are of different importance for classification over different subsets. Moreover, the pattern of feature space heterogeneity might dynamically change over time as more and more data are accumulated. In this paper, we develop an incremental classification algorithm, Supervised Clustering for Classification with Feature Space Heterogeneity (SCCFSH), to address this problem. In our approach, supervised clustering is implemented to obtain a number of clusters such that samples in each cluster are from the same class. After the removal of outliers, relevance of features in each cluster is calculated based on their variations in this cluster. The feature relevance is incorporated into distance calculation for classification. The main advantage of SCCFSH lies in the fact that it is capable of solving a classification problem with feature space heterogeneity in an incremental way, which is favorable for online classification tasks with continuously changing data. Experimental results on a series of data sets and application to a database marketing problem show the efficiency and effectiveness of the proposed approach.

  15. Multi-Modal Curriculum Learning for Semi-Supervised Image Classification.

    Science.gov (United States)

    Gong, Chen; Tao, Dacheng; Maybank, Stephen J; Liu, Wei; Kang, Guoliang; Yang, Jie

    2016-07-01

    Semi-supervised image classification aims to classify a large quantity of unlabeled images by typically harnessing scarce labeled images. Existing semi-supervised methods often suffer from inadequate classification accuracy when encountering difficult yet critical images, such as outliers, because they treat all unlabeled images equally and conduct classifications in an imperfectly ordered sequence. In this paper, we employ the curriculum learning methodology by investigating the difficulty of classifying every unlabeled image. The reliability and the discriminability of these unlabeled images are particularly investigated for evaluating their difficulty. As a result, an optimized image sequence is generated during the iterative propagations, and the unlabeled images are logically classified from simple to difficult. Furthermore, since images are usually characterized by multiple visual feature descriptors, we associate each kind of features with a teacher, and design a multi-modal curriculum learning (MMCL) strategy to integrate the information from different feature modalities. In each propagation, each teacher analyzes the difficulties of the currently unlabeled images from its own modality viewpoint. A consensus is subsequently reached among all the teachers, determining the currently simplest images (i.e., a curriculum), which are to be reliably classified by the multi-modal learner. This well-organized propagation process leveraging multiple teachers and one learner enables our MMCL to outperform five state-of-the-art methods on eight popular image data sets.
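    The simple-to-difficult principle behind curriculum-style propagation can be illustrated with a minimal self-training loop (a loose analogue of the idea above, not the MMCL algorithm): at each step, the unlabeled point classified most confidently, here the one with the largest distance margin between class centroids, is labeled first.

```python
# Sketch: simple-to-difficult self-training with nearest-centroid confidence.
import numpy as np

rng = np.random.default_rng(4)
a = rng.normal([0, 0], 0.5, (30, 2))
b = rng.normal([3, 3], 0.5, (30, 2))
X = np.vstack([a, b])
y = np.full(60, -1)                     # -1 = unlabeled
y[0], y[30] = 0, 1                      # one seed label per class

while (y == -1).any():
    centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])
    unl = np.flatnonzero(y == -1)
    d = np.linalg.norm(X[unl, None, :] - centroids[None, :, :], axis=2)
    margin = np.abs(d[:, 0] - d[:, 1])  # large margin = easy, confident point
    easiest = np.argmax(margin)
    y[unl[easiest]] = int(np.argmin(d[easiest]))   # label the easiest first

accuracy = np.mean(y == np.repeat([0, 1], 30))
```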

  16. Multiscale modeling for classification of SAR imagery using hybrid EM algorithm and genetic algorithm

    Institute of Scientific and Technical Information of China (English)

    Xianbin Wen; Hua Zhang; Jianguang Zhang; Xu Jiao; Lei Wang

    2009-01-01

    A novel method that hybridizes the genetic algorithm (GA) and the expectation maximization (EM) algorithm for the classification of synthetic aperture radar (SAR) imagery is proposed, based on the finite Gaussian mixture model (GMM) and the multiscale autoregressive (MAR) model. This algorithm is capable of improving the global optimality and consistency of the classification performance. Experiments on SAR images show that the proposed algorithm outperforms the standard EM method significantly in classification accuracy.

  17. Machine Learning Algorithms for Automatic Classification of Marmoset Vocalizations

    Science.gov (United States)

    Ribeiro, Sidarta; Pereira, Danillo R.; Papa, João P.; de Albuquerque, Victor Hugo C.

    2016-01-01

    Automatic classification of vocalization type could potentially become a useful tool for the acoustic monitoring of captive colonies of highly vocal primates. However, for classification to be useful in practice, a reliable algorithm that can be successfully trained on small datasets is necessary. In this work, we consider seven different classification algorithms with the goal of finding a robust classifier that can be successfully trained on small datasets. We found good classification performance (accuracy > 0.83 and F1-score > 0.84) using the Optimum Path Forest classifier. Dataset and algorithms are made publicly available. PMID:27654941

  18. Supervised pixel classification using a feature space derived from an artificial visual system

    Science.gov (United States)

    Baxter, Lisa C.; Coggins, James M.

    1991-01-01

    Image segmentation involves labelling pixels according to their membership in image regions. This requires an understanding of what a region is. Using supervised pixel classification, the paper investigates how groups of pixels labelled manually according to perceived image semantics map onto the feature space created by an Artificial Visual System. The multiscale structure of regions is investigated, and it is shown that pixels form clusters based on their geometric roles in the image intensity function, not on image semantics. A tentative abstract definition of a 'region' is proposed based on this behavior.

  19. Detection of malicious attacks by Meta classification algorithms

    Directory of Open Access Journals (Sweden)

    G.Michael

    2015-03-01

    Full Text Available We address the problem of malicious node detection in a network based on characteristics of the network's behavior. This issue has motivated a challenging body of recent research contributing a critical component to securing the network, and the solution strategies continue to evolve. In this work, we carefully build learning models with cautious selection of attributes, parameter thresholds and number of iterations. We evaluate the performance of a set of meta classifier algorithms (AdaBoost, attribute-selected classifier, Bagging, classification via regression, filtered classifier, LogitBoost, multiclass classifier). The ratio between training and testing data is set in such a way that the data patterns in both sets are compatible. We thus apply a set of supervised machine learning schemes with meta classifiers on the selected dataset to predict the attack risk of the network environment. The trained models can then be used for predicting the risk of attacks in a web server environment, by any network administrator, or by any security expert. The prediction accuracy of the classifiers was evaluated using 10-fold cross-validation and the results were compared to obtain the accuracy.
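    The evaluation protocol mentioned above, comparing meta classifiers with 10-fold cross-validation, can be sketched as follows. The synthetic data stands in for the intrusion dataset, and only two of the listed meta classifiers are shown; this is an illustrative example, not the paper's setup.

```python
# Sketch: 10-fold cross-validated comparison of two meta classifiers.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

scores = {}
for name, clf in [("AdaBoost", AdaBoostClassifier(random_state=0)),
                  ("Bagging", BaggingClassifier(random_state=0))]:
    scores[name] = cross_val_score(clf, X, y, cv=10).mean()
```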

  20. Supervised Self-Organizing Classification of Superresolution ISAR Images: An Anechoic Chamber Experiment

    Directory of Open Access Journals (Sweden)

    Radoi Emanuel

    2006-01-01

    Full Text Available The problem of the automatic classification of superresolution ISAR images is addressed in the paper. We describe an anechoic chamber experiment involving ten scale-reduced aircraft models. The radar images of these targets are reconstructed using the MUSIC-2D (multiple signal classification) method coupled with two additional processing steps: phase unwrapping and symmetry enhancement. A feature vector is then proposed including Fourier descriptors and moment invariants, which are calculated from the target shape and the scattering center distribution extracted from each reconstructed image. The classification is finally performed by a new self-organizing neural network called SART (supervised ART), which is compared to two standard classifiers, MLP (multilayer perceptron) and fuzzy KNN (k nearest neighbors). While the classification accuracy is similar, SART is shown to outperform the two other classifiers in terms of training speed and classification speed, especially for large databases. It is also easier to use since it does not require any input parameter related to its structure.

  1. Supervised Self-Organizing Classification of Superresolution ISAR Images: An Anechoic Chamber Experiment

    Science.gov (United States)

    Radoi, Emanuel; Quinquis, André; Totir, Felix

    2006-12-01

    The problem of the automatic classification of superresolution ISAR images is addressed in the paper. We describe an anechoic chamber experiment involving ten scale-reduced aircraft models. The radar images of these targets are reconstructed using the MUSIC-2D (multiple signal classification) method coupled with two additional processing steps: phase unwrapping and symmetry enhancement. A feature vector is then proposed including Fourier descriptors and moment invariants, which are calculated from the target shape and the scattering center distribution extracted from each reconstructed image. The classification is finally performed by a new self-organizing neural network called SART (supervised ART), which is compared to two standard classifiers, MLP (multilayer perceptron) and fuzzy KNN (k nearest neighbors). While the classification accuracy is similar, SART is shown to outperform the two other classifiers in terms of training speed and classification speed, especially for large databases. It is also easier to use since it does not require any input parameter related to its structure.

  2. Support vector classification algorithm based on variable parameter linear programming

    Institute of Scientific and Technical Information of China (English)

    Xiao Jianhua; Lin Jian

    2007-01-01

    To solve the problems of SVM in dealing with large sample sizes and asymmetrically distributed samples, a support vector classification algorithm based on variable-parameter linear programming is proposed. In the proposed algorithm, linear programming is employed to solve the classification optimization problem, decreasing the computation time and reducing the complexity compared with the original model. The adjusted punishment parameter greatly reduces the classification error resulting from asymmetrically distributed samples, and the detailed procedure of the proposed algorithm is given. An experiment is conducted to verify whether the proposed algorithm is suitable for asymmetrically distributed samples.

  3. ASTErIsM: application of topometric clustering algorithms in automatic galaxy detection and classification

    Science.gov (United States)

    Tramacere, A.; Paraficz, D.; Dubath, P.; Kneib, J.-P.; Courbin, F.

    2016-12-01

    We present a study on galaxy detection and shape classification using topometric clustering algorithms. We first use the DBSCAN algorithm to extract, from CCD frames, groups of adjacent pixels with significant fluxes, and we then apply the DENCLUE algorithm to separate the contributions of overlapping sources. The DENCLUE separation is based on the localization of patterns of local maxima through an iterative algorithm, which associates each pixel with the closest local maximum. Our main classification goal is to separate elliptical from spiral galaxies. We introduce new sets of features derived from the computation of geometrical invariant moments of the pixel group shape and from the statistics of the spatial distribution of the DENCLUE local maxima patterns. Ellipticals are characterized by a single group of local maxima, related to the galaxy core, while spiral galaxies have additional groups related to segments of spiral arms. We use two different supervised ensemble classification algorithms: Random Forest and Gradient Boosting. Using a sample of ≃24 000 galaxies taken from the Galaxy Zoo 2 main sample with spectroscopic redshifts, we test our classification against the Galaxy Zoo 2 catalogue. We find that features extracted from our pipeline give, on average, an accuracy of ≃93 per cent when testing on a test set whose size is 20 per cent of our full data set, with features derived from the angular distribution of density attractors ranking at the top of the discrimination power.
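    The detection step above (threshold the frame for significant fluxes, then let DBSCAN group adjacent pixels into sources) can be sketched on a synthetic frame. This is an illustrative toy, not the ASTErIsM pipeline; the 5-sigma threshold and DBSCAN parameters are assumptions.

```python
# Sketch: DBSCAN grouping of significant-flux pixels into sources.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(5)
frame = rng.normal(0, 1, (50, 50))       # background noise
frame[10:14, 10:14] += 20                # two bright synthetic "galaxies"
frame[30:35, 38:42] += 20

ys, xs = np.nonzero(frame > 5 * frame.std())   # significant-flux pixels
coords = np.column_stack([ys, xs])

# eps=1.5 links 8-connected neighbours (diagonal distance ~1.41)
labels = DBSCAN(eps=1.5, min_samples=4).fit_predict(coords)
n_sources = len(set(labels) - {-1})
```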

  4. EEG source space analysis of the supervised factor analytic approach for the classification of multi-directional arm movement

    Science.gov (United States)

    Shenoy Handiru, Vikram; Vinod, A. P.; Guan, Cuntai

    2017-08-01

    Objective. In electroencephalography (EEG)-based brain-computer interface (BCI) systems for motor control tasks the conventional practice is to decode motor intentions by using scalp EEG. However, scalp EEG only reveals certain limited information about the complex tasks of movement with a higher degree of freedom. Therefore, our objective is to investigate the effectiveness of source-space EEG in extracting relevant features that discriminate arm movement in multiple directions. Approach. We have proposed a novel feature extraction algorithm based on supervised factor analysis that models the data from source-space EEG. To this end, we computed the features from the source dipoles confined to Brodmann areas of interest (BA4a, BA4p and BA6). Further, we embedded class-wise labels of multi-direction (multi-class) source-space EEG to an unsupervised factor analysis to make it into a supervised learning method. Main Results. Our approach provided an average decoding accuracy of 71% for the classification of hand movement in four orthogonal directions, that is significantly higher (>10%) than the classification accuracy obtained using state-of-the-art spatial pattern features in sensor space. Also, the group analysis on the spectral characteristics of source-space EEG indicates that the slow cortical potentials from a set of cortical source dipoles reveal discriminative information regarding the movement parameter, direction. Significance. This study presents evidence that low-frequency components in the source space play an important role in movement kinematics, and thus it may lead to new strategies for BCI-based neurorehabilitation.

  5. Classification of Autism Spectrum Disorder Using Supervised Learning of Brain Connectivity Measures Extracted from Synchrostates

    CERN Document Server

    Jamal, Wasifa; Oprescu, Ioana-Anastasia; Maharatna, Koushik; Apicella, Fabio; Sicca, Federico

    2014-01-01

    Objective. The paper investigates the presence of autism using functional brain connectivity measures derived from the electro-encephalogram (EEG) of children during face perception tasks. Approach. Phase synchronized patterns from 128-channel EEG signals are obtained for typical children and children with autism spectrum disorder (ASD). The phase synchronized states, or synchrostates, temporally switch amongst themselves as an underlying process for the completion of a particular cognitive task. We used 12 subjects in each group (ASD and typical) for analyzing their EEG while processing fearful, happy and neutral faces. The minimally and maximally occurring synchrostates for each subject are chosen for extraction of brain connectivity features, which are used for classification between these two groups of subjects. Among different supervised learning techniques, we explored discriminant analysis and support vector machines, both with polynomial kernels, for the classification task. Main results. The leave ...

  6. Semi-supervised prediction of gene regulatory networks using machine learning algorithms

    Indian Academy of Sciences (India)

    Nihir Patel; T L Wang

    2015-10-01

    Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we propose semi-supervised methods for GRN prediction by utilizing two machine learning algorithms, namely, support vector machines (SVM) and random forests (RF). The semi-supervised methods make use of unlabelled data for training. We investigated inductive and transductive learning approaches, both of which adopt an iterative procedure to obtain reliable negative training data from the unlabelled data. We then applied our semi-supervised methods to gene expression data of Escherichia coli and Saccharomyces cerevisiae, and evaluated the performance of our methods using the expression data. Our analysis indicated that the transductive learning approach outperformed the inductive learning approach for both organisms. However, there was no conclusive difference identified in the performance of SVM and RF. Experimental results also showed that the proposed semi-supervised methods performed better than existing supervised methods for both organisms.
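    The iterative reliable-negative idea described above can be sketched with a simplified loop (our interpretation, not the authors' exact procedure): score unlabelled examples with a random forest trained on the current positive and negative sets, then move the lowest-scoring ones into the negative training set. The features, cluster structure, and iteration count are all invented for illustration.

```python
# Sketch: iteratively selecting reliable negatives from unlabelled data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(6)
pos = rng.normal(2, 1, (40, 5))               # known regulatory links (features)
unl = np.vstack([rng.normal(2, 1, (20, 5)),   # hidden positives
                 rng.normal(-2, 1, (60, 5))]) # hidden negatives

neg = unl[rng.choice(len(unl), 10, replace=False)]  # rough initial negatives
for _ in range(3):
    X = np.vstack([pos, neg])
    y = np.r_[np.ones(len(pos)), np.zeros(len(neg))]
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    scores = rf.predict_proba(unl)[:, 1]      # P(positive) for unlabelled data
    neg = unl[np.argsort(scores)[:30]]        # keep 30 most reliable negatives

reliable_negatives = neg
```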

  7. Detection and Evaluation of Cheating on College Exams Using Supervised Classification

    Science.gov (United States)

    Cavalcanti, Elmano Ramalho; Pires, Carlos Eduardo; Cavalcanti, Elmano Pontes; Pires, Vládia Freire

    2012-01-01

    Text mining has been used for various purposes, such as document classification and extraction of domain-specific information from text. In this paper we present a study in which text mining methodology and algorithms were properly employed for academic dishonesty (cheating) detection and evaluation on open-ended college exams, based on document…

  8. Text Classification Retrieval Based on Complex Network and ICA Algorithm

    Directory of Open Access Journals (Sweden)

    Hongxia Li

    2013-08-01

    Full Text Available With the development of computer science and information technology, libraries are moving toward digitization and networking. Digitization converts books into digital information, whose high-quality preservation and management are achieved through computer technology and text classification techniques, adding value to the collection. This paper introduces complex network theory into the text classification process and puts forward an ICA semantic clustering algorithm that performs independent component analysis for complex-network text classification. Through ICA clustering on the independent components, the characteristic words of each text class are extracted, improving the visualization of text retrieval. Finally, we compare a collocation algorithm with the ICA clustering algorithm in text classification and keyword search experiments, and report the clustering degree and accuracy of each algorithm. Simulation analysis shows that the ICA clustering algorithm improves the clustering degree of text classification by 1.2% and accuracy by up to 11.1%. It improves the efficiency and accuracy of text classification retrieval and provides a theoretical reference for the text retrieval classification of eBooks.

  9. Intelligent Hybrid Cluster Based Classification Algorithm for Social Network Analysis

    Directory of Open Access Journals (Sweden)

    S. Muthurajkumar

    2014-05-01

    Full Text Available In this paper, we propose a hybrid clustering-based classification algorithm, built on a mean approach, to mine ordered sequences (paths) from weblog data for social network analysis. In the proposed system for social pattern analysis, sequences of human activities are typically analyzed through switching behaviors, which are likely to produce overlapping clusters. A robust modified boosting algorithm is therefore proposed for the hybrid clustering-based classification stage. This work helps connect the aggregated features extracted from the network data with traditional indices used in social network analysis. Experimental results show that the proposed algorithm improves on plain data clustering when combined with the proposed classification step, providing better classification accuracy on the weblog dataset. In addition, the algorithm improves predictive performance, especially for multiclass datasets, where it increases accuracy.

  10. Weighted K-Nearest Neighbor Classification Algorithm Based on Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Xuesong Yan

    2013-10-01

    Full Text Available K-Nearest Neighbor (KNN) is one of the most popular algorithms for data classification, and many researchers have found that it performs very well in experiments on different datasets. The traditional KNN text classification algorithm has limitations: high calculation complexity, performance that depends solely on the training set, and so on. To overcome these limitations, an improved version of KNN is proposed in this paper: we combine a genetic algorithm with weighted KNN to improve its classification performance. Experimental results show that our proposed algorithm outperforms standard KNN in accuracy.
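
The idea can be sketched as a feature-weighted KNN whose weights are evolved by a toy genetic algorithm with leave-one-out accuracy as fitness (the encoding, operators, and parameters below are illustrative assumptions, not the paper's actual design):

```python
import random

def weighted_knn_predict(train, query, weights, k=3):
    """Vote among the k neighbours under a feature-weighted squared Euclidean distance."""
    dist = lambda item: sum(w * (a - b) ** 2 for w, a, b in zip(weights, item[0], query))
    votes = [label for _, label in sorted(train, key=dist)[:k]]
    return max(set(votes), key=votes.count)

def loo_accuracy(data, weights, k=3):
    """Leave-one-out accuracy: the GA's fitness function."""
    hits = sum(weighted_knn_predict(data[:i] + data[i + 1:], x, weights, k) == y
               for i, (x, y) in enumerate(data))
    return hits / len(data)

def evolve_weights(data, n_feats, pop_size=10, gens=10, k=3, seed=1):
    """Truncation-selection GA with Gaussian mutation over the feature weights."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_feats)] for _ in range(pop_size)]
    pop[0] = [1.0] * n_feats                  # keep plain KNN as a baseline member
    for _ in range(gens):
        pop.sort(key=lambda w: -loo_accuracy(data, w, k))
        elite = pop[:pop_size // 2]           # survivors
        pop = elite + [[max(0.0, g + rng.gauss(0, 0.3)) for g in rng.choice(elite)]
                       for _ in range(pop_size - len(elite))]
    pop.sort(key=lambda w: -loo_accuracy(data, w, k))
    return pop[0]
```

Because plain KNN (uniform weights) is seeded into the initial population and elites always survive, the evolved weights can never score worse than unweighted KNN on the fitness criterion.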

  11. Comparison of GOES Cloud Classification Algorithms Employing Explicit and Implicit Physics

    Science.gov (United States)

    Bankert, Richard L.; Mitrescu, Cristian; Miller, Steven D.; Wade, Robert H.

    2009-01-01

    Cloud-type classification based on multispectral satellite imagery data has been widely researched and demonstrated to be useful for distinguishing a variety of classes using a wide range of methods. The research described here is a comparison of the classifier output from two very different algorithms applied to Geostationary Operational Environmental Satellite (GOES) data over the course of one year. The first algorithm employs spectral channel thresholding and additional physically based tests. The second algorithm was developed through a supervised learning method with characteristic features of expertly labeled image samples used as training data for a 1-nearest-neighbor classification. The latter's ability to identify classes is also based in physics, but those relationships are embedded implicitly within the algorithm. A pixel-to-pixel comparison analysis was done for hourly daytime scenes within a region in the northeastern Pacific Ocean. Considerable agreement was found in this analysis, with many of the mismatches or disagreements providing insight to the strengths and limitations of each classifier. Depending upon user needs, a rule-based or other postprocessing system that combines the output from the two algorithms could provide the most reliable cloud-type classification.
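
The second algorithm's core, a 1-nearest-neighbor rule over characteristic features of expertly labeled samples, reduces to a few lines (the feature values and class names below are invented for illustration; real use would normalize features to comparable scales):

```python
def classify_1nn(training_set, sample):
    """Assign the label of the closest expertly labeled feature vector.

    training_set: list of (feature_vector, cloud_type) pairs.
    """
    dist = lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], sample))
    return min(training_set, key=dist)[1]
```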

  13. Detection of facilities in satellite imagery using semi-supervised image classification and auxiliary contextual observables

    Energy Technology Data Exchange (ETDEWEB)

    Harvey, Neal R [Los Alamos National Laboratory; Ruggiero, Christy E [Los Alamos National Laboratory; Pawley, Norma H [Los Alamos National Laboratory; Brumby, Steven P [Los Alamos National Laboratory; Macdonald, Brian [Los Alamos National Laboratory; Balick, Lee [Los Alamos National Laboratory; Oyer, Alden [Los Alamos National Laboratory

    2009-01-01

    Detecting complex targets, such as facilities, in commercially available satellite imagery is a difficult problem that human analysts try to solve by applying world knowledge. Often there are known observables that can be extracted by pixel-level feature detectors that can assist in the facility detection process. Individually, each of these observables is not sufficient for an accurate and reliable detection, but in combination, these auxiliary observables may provide sufficient context for detection by a machine learning algorithm. We describe an approach for automatic detection of facilities that uses an automated feature extraction algorithm to extract auxiliary observables, and a semi-supervised assisted target recognition algorithm to then identify facilities of interest. We illustrate the approach using an example of finding schools in Quickbird image data of Albuquerque, New Mexico. We use Los Alamos National Laboratory's Genie Pro automated feature extraction algorithm to find a set of auxiliary features that should be useful in the search for schools, such as parking lots, large buildings, sports fields and residential areas and then combine these features using Genie Pro's assisted target recognition algorithm to learn a classifier that finds schools in the image data.

  14. Semi-Supervised Bayesian Classification of Materials with Impact-Echo Signals

    Directory of Open Access Journals (Sweden)

    Jorge Igual

    2015-05-01

    Full Text Available The detection and identification of internal defects in a material require the use of some technology that translates the hidden interior damages into observable signals with different signature-defect correspondences. We apply impact-echo techniques for this purpose. The materials are classified according to their defective status (homogeneous, one defect or multiple defects) and kind of defect (hole or crack, passing through or not). Every specimen is impacted by a hammer, and the spectrum of the propagated wave is recorded. This spectrum is the input data to a Bayesian classifier that is based on the modeling of the conditional probabilities with a mixture of Gaussians. The parameters of the Gaussian mixtures and the class probabilities are estimated using an extended expectation-maximization algorithm. The advantage of our proposal is that it is flexible, since it obtains good results for a wide range of models even under little supervision; e.g., it obtains a harmonic average of precision and recall value of 92.38% given only a 10% supervision ratio. We test the method with real specimens made of aluminum alloy. The results show that the algorithm works very well. This technique could be applied in many industrial problems, such as the optimization of the marble cutting process.
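
The Gaussian-mixture estimation at the heart of the classifier follows the classic alternation of responsibilities (E-step) and parameter re-estimation (M-step). A minimal one-dimensional, two-component sketch of that core (the article works with spectra and an extended EM variant, so this is only the textbook building block):

```python
import math

def em_gmm_1d(xs, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization."""
    mu = [min(xs), max(xs)]          # crude but effective initialisation
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample
        resp = []
        for x in xs:
            p = [pi[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate means, variances and mixing proportions
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(1e-6, sum(r[k] * (x - mu[k]) ** 2
                                   for r, x in zip(resp, xs)) / nk)
            pi[k] = nk / len(xs)
    return mu, var, pi
```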

  15. Semi-supervised Bayesian classification of materials with impact-echo signals.

    Science.gov (United States)

    Igual, Jorge; Salazar, Addisson; Safont, Gonzalo; Vergara, Luis

    2015-05-19

    The detection and identification of internal defects in a material require the use of some technology that translates the hidden interior damages into observable signals with different signature-defect correspondences. We apply impact-echo techniques for this purpose. The materials are classified according to their defective status (homogeneous, one defect or multiple defects) and kind of defect (hole or crack, passing through or not). Every specimen is impacted by a hammer, and the spectrum of the propagated wave is recorded. This spectrum is the input data to a Bayesian classifier that is based on the modeling of the conditional probabilities with a mixture of Gaussians. The parameters of the Gaussian mixtures and the class probabilities are estimated using an extended expectation-maximization algorithm. The advantage of our proposal is that it is flexible, since it obtains good results for a wide range of models even under little supervision; e.g., it obtains a harmonic average of precision and recall value of 92.38% given only a 10% supervision ratio. We test the method with real specimens made of aluminum alloy. The results show that the algorithm works very well. This technique could be applied in many industrial problems, such as the optimization of the marble cutting process.

  16. Algorithm of Supervised Learning on Outlier Manifold

    Institute of Scientific and Technical Information of China (English)

    黄添强; 李凯; 郑之

    2011-01-01

    Manifold learning algorithms are important tools for dimensionality reduction and data visualization, and improving their efficiency and robustness matters for practical applications. Classical manifold learning algorithms are generally sensitive to noise points, and existing improved algorithms remain imperfect. This paper presents a robust manifold learning algorithm based on supervised learning and kernel functions. It introduces kernel methods and supervised learning into the dimensionality reduction process, taking advantage of the labels available for part of the data and of the properties of the kernel function, so that samples of the same class become compact while samples of different classes spread apart. This improves the subsequent classification task and reduces the algorithm's sensitivity to noise on the manifold. Experiments on UCI data and on Raman spectroscopy data of leukemia show that the improved algorithm has better noise immunity.

  17. Semi-supervised vibration-based classification and condition monitoring of compressors

    Science.gov (United States)

    Potočnik, Primož; Govekar, Edvard

    2017-09-01

    Semi-supervised vibration-based classification and condition monitoring of the reciprocating compressors installed in refrigeration appliances is proposed in this paper. The method addresses the problem of industrial condition monitoring where prior class definitions are often not available or difficult to obtain from local experts. The proposed method combines feature extraction, principal component analysis, and statistical analysis for the extraction of initial class representatives, and compares the capability of various classification methods, including discriminant analysis (DA), neural networks (NN), support vector machines (SVM), and extreme learning machines (ELM). The use of the method is demonstrated on a case study which was based on industrially acquired vibration measurements of reciprocating compressors during the production of refrigeration appliances. The paper presents a comparative qualitative analysis of the applied classifiers, confirming the good performance of several nonlinear classifiers. If the model parameters are properly selected, then very good classification performance can be obtained from NN trained by Bayesian regularization, SVM and ELM classifiers. The method can be effectively applied for the industrial condition monitoring of compressors.
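
The feature-extraction front end of such a pipeline can be illustrated with a few standard time-domain vibration features; these particular features (RMS, peak, crest factor) are common choices in condition monitoring, not necessarily the ones used in the paper:

```python
import math

def vibration_features(signal):
    """Simple time-domain features for a vibration segment."""
    n = len(signal)
    mean = sum(signal) / n
    rms = math.sqrt(sum((s - mean) ** 2 for s in signal) / n)   # energy content
    peak = max(abs(s - mean) for s in signal)                   # largest excursion
    return {"rms": rms, "peak": peak, "crest": peak / rms}      # crest = impulsiveness
```

For a pure sinusoid the crest factor is sqrt(2) ≈ 1.414; impacting faults raise it well above that, which is why it is a popular screening feature.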

  18. Modeling electroencephalography waveforms with semi-supervised deep belief nets: fast classification and anomaly measurement

    Science.gov (United States)

    Wulsin, D. F.; Gupta, J. R.; Mani, R.; Blanco, J. A.; Litt, B.

    2011-06-01

    Clinical electroencephalography (EEG) records vast amounts of complex human data, yet it is still reviewed primarily by human readers. Deep belief nets (DBNs) are a relatively new type of multi-layer neural network commonly tested on two-dimensional image data but rarely applied to time-series data such as EEG. We apply DBNs in a semi-supervised paradigm to model EEG waveforms for classification and anomaly detection. DBN performance was comparable to standard classifiers on our EEG dataset, and classification time was found to be 1.7-103.7 times faster than the other high-performing classifiers. We demonstrate how the unsupervised step of DBN learning produces an autoencoder that can naturally be used in anomaly measurement. We compare the use of raw, unprocessed data—a rarity in automated physiological waveform analysis—with hand-chosen features and find that raw data produce comparable classification and better anomaly measurement performance. These results indicate that DBNs and raw data inputs may be more effective for online automated EEG waveform recognition than other common techniques.
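
The anomaly-measurement idea, scoring a waveform by how poorly the learned encoder reconstructs it, can be sketched with a linear stand-in for the DBN autoencoder: projection onto one principal direction found by power iteration. Everything below is an illustrative simplification, not the paper's network:

```python
def top_component(data, n_iter=200):
    """Leading principal direction via power iteration on the centred data."""
    dim = len(data[0])
    mean = [sum(x[i] for x in data) / len(data) for i in range(dim)]
    xs = [[a - m for a, m in zip(x, mean)] for x in data]
    v = [1.0] * dim
    for _ in range(n_iter):
        # w = (X^T X) v, then renormalise
        scores = [sum(a * b for a, b in zip(x, v)) for x in xs]
        w = [sum(s * x[i] for s, x in zip(scores, xs)) for i in range(dim)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return mean, v

def anomaly_score(x, mean, v):
    """Reconstruction error after projecting onto the single retained component."""
    xc = [a - m for a, m in zip(x, mean)]
    s = sum(a * b for a, b in zip(xc, v))
    recon = [s * b for b in v]
    return sum((a - r) ** 2 for a, r in zip(xc, recon)) ** 0.5
```

Samples that lie along the learned structure reconstruct almost perfectly; samples orthogonal to it get a large score, which is the same mechanism the DBN autoencoder exploits with a nonlinear code.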

  19. Semi-Supervised Projective Non-Negative Matrix Factorization for Cancer Classification.

    Directory of Open Access Journals (Sweden)

    Xiang Zhang

    Full Text Available Advances in DNA microarray technologies have made gene expression profiles a significant candidate in identifying different types of cancers. Traditional learning-based cancer identification methods utilize labeled samples to train a classifier, but they are inconvenient for practical application because labels are quite expensive in the clinical cancer research community. This paper proposes a semi-supervised projective non-negative matrix factorization method (Semi-PNMF to learn an effective classifier from both labeled and unlabeled samples, thus boosting subsequent cancer classification performance. In particular, Semi-PNMF jointly learns a non-negative subspace from concatenated labeled and unlabeled samples and indicates classes by the positions of the maximum entries of their coefficients. Because Semi-PNMF incorporates statistical information from the large volume of unlabeled samples in the learned subspace, it can learn more representative subspaces and boost classification performance. We developed a multiplicative update rule (MUR to optimize Semi-PNMF and proved its convergence. The experimental results of cancer classification for two multiclass cancer gene expression profile datasets show that Semi-PNMF outperforms the representative methods.
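
The multiplicative update rule (MUR) family that Semi-PNMF builds on follows the classic Lee-Seung template; below is a bare unsupervised sketch of those updates (Semi-PNMF itself adds the projective factorization and label information, which are not reproduced here):

```python
import random

def nmf(V, r, n_iter=500, seed=0):
    """Basic NMF V ~ W H via Lee-Seung multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(r)]

    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    def T(A):
        return [list(row) for row in zip(*A)]

    eps = 1e-9
    for _ in range(n_iter):
        WH = matmul(W, H)
        num, den = matmul(T(W), V), matmul(T(W), WH)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(r)]           # H <- H * (W^T V) / (W^T W H)
        WH = matmul(W, H)
        num, den = matmul(V, T(H)), matmul(WH, T(H))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)]
             for i in range(n)]           # W <- W * (V H^T) / (W H H^T)
    return W, H
```

The multiplicative form keeps both factors non-negative automatically, which is why MUR-style updates are the standard optimizer for NMF variants like Semi-PNMF.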

  20. Evaluation of supervised machine-learning algorithms to distinguish between inflammatory bowel disease and alimentary lymphoma in cats.

    Science.gov (United States)

    Awaysheh, Abdullah; Wilcke, Jeffrey; Elvinger, François; Rees, Loren; Fan, Weiguo; Zimmerman, Kurt L

    2016-11-01

    Inflammatory bowel disease (IBD) and alimentary lymphoma (ALA) are common gastrointestinal diseases in cats. The very similar clinical signs and histopathologic features of these diseases make the distinction between them diagnostically challenging. We tested the use of supervised machine-learning algorithms to differentiate between the 2 diseases using data generated from noninvasive diagnostic tests. Three prediction models were developed using 3 machine-learning algorithms: naive Bayes, decision trees, and artificial neural networks. The models were trained and tested on data from complete blood count (CBC) and serum chemistry (SC) results for the following 3 groups of client-owned cats: normal, inflammatory bowel disease (IBD), or alimentary lymphoma (ALA). Naive Bayes and artificial neural networks achieved higher classification accuracy (sensitivities of 70.8% and 69.2%, respectively) than the decision tree algorithm (63%). Machine learning thus provided a method for distinguishing between ALA-IBD, ALA-normal, and IBD-normal. The naive Bayes and artificial neural network classifiers used 10 and 4 of the CBC and SC variables, respectively, to outperform the C4.5 decision tree, which used 5 CBC and SC variables in classifying cats into the 3 classes. These models can provide another noninvasive diagnostic tool to assist clinicians with differentiating between IBD and ALA, and between diseased and nondiseased cats. © 2016 The Author(s).
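
Of the three models compared, naive Bayes over continuous bloodwork-style variables is the simplest to sketch: a Gaussian naive Bayes in plain Python (the variable values and class labels below are invented, not from the study's CBC/SC data):

```python
import math

def fit_gnb(rows, labels):
    """Estimate per-class priors, means and variances for Gaussian naive Bayes."""
    by_class = {}
    for x, y in zip(rows, labels):
        by_class.setdefault(y, []).append(x)
    model = {}
    for y, xs in by_class.items():
        n = len(xs)
        means = [sum(col) / n for col in zip(*xs)]
        vars_ = [max(1e-6, sum((v - m) ** 2 for v in col) / n)
                 for col, m in zip(zip(*xs), means)]
        model[y] = (n / len(rows), means, vars_)
    return model

def predict_gnb(model, x):
    """Pick the class maximising log prior + sum of per-feature Gaussian log-likelihoods."""
    def loglik(prior, means, vars_):
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, vars_))
    return max(model, key=lambda y: loglik(*model[y]))
```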

  1. Weakly Supervised Segmentation-Aided Classification of Urban Scenes from 3d LIDAR Point Clouds

    Science.gov (United States)

    Guinard, S.; Landrieu, L.

    2017-05-01

    We consider the problem of the semantic classification of 3D LiDAR point clouds obtained from urban scenes when the training set is limited. We propose a non-parametric segmentation model for urban scenes composed of anthropic objects of simple shapes, partitioning the scene into geometrically homogeneous segments whose size is determined by the local complexity. This segmentation can be integrated into a conditional random field classifier (CRF) in order to capture the high-level structure of the scene. For each cluster, this allows us to aggregate the noisy predictions of a weakly supervised classifier to produce a higher-confidence data term. We demonstrate the improvement provided by our method over two publicly available large-scale data sets.
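
The aggregation step, pooling a weakly supervised classifier's noisy per-point predictions inside one geometrically homogeneous segment into a single higher-confidence data term, is essentially an average; a sketch (the class names and probability dictionaries are illustrative):

```python
def segment_data_term(point_probs):
    """Average per-point class probabilities over the points of one segment.

    point_probs: list of {class_name: probability} dicts, one per point.
    """
    n = len(point_probs)
    classes = point_probs[0].keys()
    return {c: sum(p[c] for p in point_probs) / n for c in classes}
```

Averaging over many points in a segment suppresses the per-point noise, which is what lets the CRF work from a more reliable unary term.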

  2. Photometric classification of type Ia supernovae in the SuperNova Legacy Survey with supervised learning

    CERN Document Server

    Möller, A; Leloup, C; Neveu, J; Palanque-Delabrouille, N; Rich, J; Carlberg, R; Lidman, C; Pritchet, C

    2016-01-01

    In the era of large astronomical surveys, photometric classification of supernovae (SNe) has become an important research field due to limited spectroscopic resources for candidate follow-up and classification. In this work, we present a method to photometrically classify type Ia supernovae based on machine learning with redshifts that are derived from the SN light-curves. This method is implemented on real data from the SNLS deferred pipeline, a purely photometric pipeline that identifies SNe Ia at high redshifts (z > 0.2) for classification. We study the performance of different algorithms such as Random Forest and Boosted Decision Trees. We evaluate the performance using SN simulations and real data from the first 3 years of the Supernova Legacy Survey (SNLS), which contains large spectroscopically and photometrically classified type Ia samples.

  3. Manifold regularized multitask learning for semi-supervised multilabel image classification.

    Science.gov (United States)

    Luo, Yong; Tao, Dacheng; Geng, Bo; Xu, Chao; Maybank, Stephen J

    2013-02-01

    It is a significant challenge to classify images with multiple labels by using only a small number of labeled samples. One option is to learn a binary classifier for each label and use manifold regularization to improve the classification performance by exploring the underlying geometric structure of the data distribution. However, such an approach does not perform well in practice when images from multiple concepts are represented by high-dimensional visual features. Thus, manifold regularization is insufficient to control the model complexity. In this paper, we propose a manifold regularized multitask learning (MRMTL) algorithm. MRMTL learns a discriminative subspace shared by multiple classification tasks by exploiting the common structure of these tasks. It effectively controls the model complexity because different tasks limit one another's search volume, and the manifold regularization ensures that the functions in the shared hypothesis space are smooth along the data manifold. We conduct extensive experiments, on the PASCAL VOC'07 dataset with 20 classes and the MIR dataset with 38 classes, by comparing MRMTL with popular image classification algorithms. The results suggest that MRMTL is effective for image classification.

  4. Land use mapping from CBERS-2 images with open source tools by applying different classification algorithms

    Science.gov (United States)

    Sanhouse-García, Antonio J.; Rangel-Peraza, Jesús Gabriel; Bustos-Terrones, Yaneth; García-Ferrer, Alfonso; Mesas-Carrascosa, Francisco J.

    2016-02-01

    Land cover classes are usually defined by strong differences between classes but great homogeneity within each of them, and land cover information is obtained either through field work or by processing satellite images. Field work involves high costs, so digital image processing techniques have become an important alternative for this task. However, in some developing countries, and particularly in the Casacoima municipality in Venezuela, geographic information systems are scarce due to outdated information and the high cost of software licenses. This research proposes a low-cost methodology for thematic mapping of local land use and cover types in areas with scarce resources. Thematic maps were developed from CBERS-2 images and spatial information available on the network using open source tools. Supervised per-pixel and per-region classification methods were applied using different algorithms and compared among themselves. Per-pixel classification was based on the Maxver (maximum likelihood) and Euclidean distance (minimum distance) algorithms, while per-region classification was based on the Bhattacharya algorithm. The per-region classification gave satisfactory results, with an overall reliability of 83.93% and a kappa index of 0.81. The Maxver algorithm showed a reliability of 73.36% and a kappa index of 0.69, while Euclidean distance obtained 67.17% and 0.61, respectively. The proposed methodology proved very useful for cartographic processing and updating, which in turn supports the development of management and land-use plans. Open source tools thus showed themselves to be an economically viable alternative not only for forestry organizations but also for the general public, allowing projects in economically depressed and/or environmentally threatened areas.
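
Of the algorithms compared, the minimum-distance (Euclidean) rule is the simplest: each pixel is assigned to the class whose mean spectral signature is closest. A per-pixel sketch (the band values and class names are invented for illustration):

```python
def min_distance_classify(class_means, pixel):
    """Assign the class whose mean spectral signature is closest to the pixel.

    class_means: {class_name: mean_band_vector}; pixel: band vector.
    """
    dist = lambda item: sum((m - b) ** 2 for m, b in zip(item[1], pixel))
    return min(class_means.items(), key=dist)[0]
```

Maximum likelihood (Maxver) refines this rule by also modelling each class's covariance, which is why it usually beats plain minimum distance, as the reliability figures above reflect.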

  5. A modified decision tree algorithm based on genetic algorithm for mobile user classification problem.

    Science.gov (United States)

    Liu, Dong-sheng; Fan, Shu-jiang

    2014-01-01

    In order to offer mobile customers better service, mobile users must first be classified. Given the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification that introduces a genetic algorithm to optimize the results of the decision tree. We also take context information as a classification attribute for the mobile user, dividing context into public and private context classes. We then analyze the processes and operators of the algorithm. Finally, we run an experiment on mobile user data: users can be classified into Basic service, E-service, Plus service, and Total service classes, and rules about the mobile users can be derived. Compared to the C4.5 decision tree algorithm and the SVM algorithm, the algorithm we propose in this paper has higher accuracy and is simpler.

  6. Mass spectrometry cancer data classification using wavelets and genetic algorithm.

    Science.gov (United States)

    Nguyen, Thanh; Nahavandi, Saeid; Creighton, Douglas; Khosravi, Abbas

    2015-12-21

    This paper introduces a hybrid feature extraction method applied to mass spectrometry (MS) data for cancer classification. Haar wavelets are employed to transform MS data into orthogonal wavelet coefficients. The most prominent discriminant wavelets are then selected by genetic algorithm (GA) to form feature sets. The combination of wavelets and GA yields highly distinct feature sets that serve as inputs to classification algorithms. Experimental results show the robustness and significant dominance of the wavelet-GA against competitive methods. The proposed method therefore can be applied to cancer classification models that are useful as real clinical decision support systems for medical practitioners.
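
The Haar transform used for feature extraction pairs neighbouring samples into averages (approximation) and differences (detail); one decomposition level takes a few lines (the GA selection stage over the resulting coefficients is not shown):

```python
import math

def haar_step(signal):
    """One level of the Haar wavelet transform, scaled to preserve energy.

    Returns (approximation, detail) coefficient lists; len(signal) must be even.
    """
    s = 1 / math.sqrt(2)
    approx = [(a + b) * s for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) * s for a, b in zip(signal[0::2], signal[1::2])]
    return approx, detail
```

Applying the step recursively to the approximation yields the full multilevel decomposition from which discriminant coefficients are selected.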

  7. Solving Classification Problems Using Genetic Programming Algorithms on GPUs

    Science.gov (United States)

    Cano, Alberto; Zafra, Amelia; Ventura, Sebastián

    Genetic Programming is very efficient in problem solving compared to other proposals, but its performance becomes very slow as the size of the data increases. This paper proposes a model for multi-threaded Genetic Programming classification evaluation using the NVIDIA CUDA GPU programming model to parallelize the evaluation phase and reduce computational time. Three different well-known Genetic Programming classification algorithms are evaluated using the proposed parallel evaluation model. Experimental results on UCI Machine Learning data sets compare the performance of the three classification algorithms in single- and multi-threaded Java, C, and CUDA GPU code. Results show that our proposal is much more efficient.

  8. Comparison of supervised machine learning algorithms for waterborne pathogen detection using mobile phone fluorescence microscopy

    Science.gov (United States)

    Ceylan Koydemir, Hatice; Feng, Steve; Liang, Kyle; Nadkarni, Rohan; Benien, Parul; Ozcan, Aydogan

    2017-06-01

    Giardia lamblia is a waterborne parasite that affects millions of people every year worldwide, causing a diarrheal illness known as giardiasis. Timely detection of the presence of the cysts of this parasite in drinking water is important to prevent the spread of the disease, especially in resource-limited settings. Here we provide extended experimental testing and evaluation of the performance and repeatability of a field-portable and cost-effective microscopy platform for automated detection and counting of Giardia cysts in water samples, including tap water, non-potable water, and pond water. This compact platform is based on our previous work, and is composed of a smartphone-based fluorescence microscope, a disposable sample processing cassette, and a custom-developed smartphone application. Our mobile phone microscope has a large field of view of 0.8 cm2 and weighs only 180 g, excluding the phone. A custom-developed smartphone application provides a user-friendly graphical interface, guiding the users to capture a fluorescence image of the sample filter membrane and analyze it automatically at our servers using an image processing algorithm and training data, consisting of >30,000 images of cysts and >100,000 images of other fluorescent particles that are captured, including, e.g. dust. The total time that it takes from sample preparation to automated cyst counting is less than an hour for each 10 ml of water sample that is tested. We compared the sensitivity and the specificity of our platform using multiple supervised classification models, including support vector machines and nearest neighbors, and demonstrated that a bootstrap aggregating (i.e. bagging) approach using raw image file format provides the best performance for automated detection of Giardia cysts. We evaluated the performance of this machine learning enabled pathogen detection device with water samples taken from different sources (e.g. tap water, non-potable water, pond water) and achieved a
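
The bootstrap-aggregating (bagging) scheme that performed best can be sketched generically: resample the training set with replacement, fit a base classifier per resample, and take a majority vote. The 1-nearest-neighbour base learner and the class names below are stand-ins for illustration, not the study's actual models:

```python
import random

def nn1(train, sample):
    """Tiny 1-nearest-neighbour base classifier over (features, label) pairs."""
    dist = lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], sample))
    return min(train, key=dist)[1]

def bagging_predict(train, sample, base=nn1, n_models=21, seed=0):
    """Majority vote over base classifiers trained on bootstrap resamples."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        bootstrap = [rng.choice(train) for _ in train]   # sample with replacement
        votes.append(base(bootstrap, sample))
    return max(set(votes), key=votes.count)
```

Averaging over many bootstrap models reduces the variance of an unstable base learner, which is the property that made bagging the best performer on the raw image features.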

  9. Comparison of supervised machine learning algorithms for waterborne pathogen detection using mobile phone fluorescence microscopy

    Directory of Open Access Journals (Sweden)

    Ceylan Koydemir Hatice

    2017-06-01

    Full Text Available Giardia lamblia is a waterborne parasite that affects millions of people every year worldwide, causing a diarrheal illness known as giardiasis. Timely detection of the presence of the cysts of this parasite in drinking water is important to prevent the spread of the disease, especially in resource-limited settings. Here we provide extended experimental testing and evaluation of the performance and repeatability of a field-portable and cost-effective microscopy platform for automated detection and counting of Giardia cysts in water samples, including tap water, non-potable water, and pond water. This compact platform is based on our previous work, and is composed of a smartphone-based fluorescence microscope, a disposable sample processing cassette, and a custom-developed smartphone application. Our mobile phone microscope has a large field of view of ~0.8 cm2 and weighs only ~180 g, excluding the phone. A custom-developed smartphone application provides a user-friendly graphical interface, guiding the users to capture a fluorescence image of the sample filter membrane and analyze it automatically at our servers using an image processing algorithm and training data, consisting of >30,000 images of cysts and >100,000 images of other fluorescent particles that are captured, including, e.g. dust. The total time that it takes from sample preparation to automated cyst counting is less than an hour for each 10 ml of water sample that is tested. We compared the sensitivity and the specificity of our platform using multiple supervised classification models, including support vector machines and nearest neighbors, and demonstrated that a bootstrap aggregating (i.e. bagging) approach using raw image file format provides the best performance for automated detection of Giardia cysts. We evaluated the performance of this machine learning enabled pathogen detection device with water samples taken from different sources (e.g. tap water, non-potable water, pond

  10. Comparison of supervised machine learning algorithms for waterborne pathogen detection using mobile phone fluorescence microscopy

    KAUST Repository

    Ceylan Koydemir, Hatice

    2017-06-14

    Giardia lamblia is a waterborne parasite that affects millions of people every year worldwide, causing a diarrheal illness known as giardiasis. Timely detection of the presence of the cysts of this parasite in drinking water is important to prevent the spread of the disease, especially in resource-limited settings. Here we provide extended experimental testing and evaluation of the performance and repeatability of a field-portable and cost-effective microscopy platform for automated detection and counting of Giardia cysts in water samples, including tap water, non-potable water, and pond water. This compact platform is based on our previous work, and is composed of a smartphone-based fluorescence microscope, a disposable sample processing cassette, and a custom-developed smartphone application. Our mobile phone microscope has a large field of view of ~0.8 cm2 and weighs only ~180 g, excluding the phone. A custom-developed smartphone application provides a user-friendly graphical interface, guiding the users to capture a fluorescence image of the sample filter membrane and analyze it automatically at our servers using an image processing algorithm and training data, consisting of >30,000 images of cysts and >100,000 images of other fluorescent particles that are captured, including, e.g. dust. The total time that it takes from sample preparation to automated cyst counting is less than an hour for each 10 ml of water sample that is tested. We compared the sensitivity and the specificity of our platform using multiple supervised classification models, including support vector machines and nearest neighbors, and demonstrated that a bootstrap aggregating (i.e. bagging) approach using raw image file format provides the best performance for automated detection of Giardia cysts. We evaluated the performance of this machine learning enabled pathogen detection device with water samples taken from different sources (e.g. 
tap water, non-potable water, pond water) and achieved
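The model comparison the abstract describes (support vector machines, nearest neighbors, and a bagging ensemble) can be sketched with scikit-learn. This is a minimal illustration, not the authors' pipeline: the synthetic features stand in for their image data, and all model settings are assumptions.

```python
# Hedged sketch: comparing supervised classifiers, including a bagging
# ensemble, on synthetic two-class data standing in for cyst images.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "svm": SVC(kernel="rbf", gamma="scale"),
    "knn": KNeighborsClassifier(n_neighbors=5),
    # Bootstrap aggregating over decision trees, the paper's best performer.
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                 random_state=0),
}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
          for name, m in models.items()}
print(scores)
```

On real data, sensitivity and specificity per class would replace the plain accuracy used here.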

  11. An Algorithm for Classification of 3-D Spherical Spatial Points

    Institute of Scientific and Technical Information of China (English)

    ZHU Qing-xin; Mudur SP; LIU Chang; PENG Bo; WU Jia

    2003-01-01

This paper presents a highly efficient algorithm for the classification of 3D points sampled from many spheres, using the neighboring relations of spatial points to construct a neighbor graph from a point cloud. This algorithm can be used in object recognition, computer vision, CAD model building, etc.

  12. Supervised Classification Processes for the Characterization of Heritage Elements, Case Study: Cuenca-Ecuador

    Science.gov (United States)

    Briones, J. C.; Heras, V.; Abril, C.; Sinchi, E.

    2017-08-01

The proper control of built heritage entails many challenges related to the complexity of heritage elements and the extent of the area to be managed, for which the available resources must be used efficiently. In this scenario, the preventive conservation approach, based on the concept that prevention is better than cure, emerges as a strategy to avoid the progressive and imminent loss of monuments and heritage sites. Regular monitoring appears as a key tool to identify changes in heritage assets in a timely manner. This research demonstrates that the supervised learning model (Support Vector Machines - SVM) is an ideal tool to support the monitoring process by detecting visible elements in aerial images, such as roof structures, vegetation and pavements. The linear, Gaussian and polynomial kernel functions were tested; the linear function provided better results than the other functions. It is important to mention that, due to the high level of segmentation generated by the classification procedure, it was necessary to apply a generalization process through an opening, a mathematical morphological operation, which simplified the over-classification of the monitored elements.
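The two steps described above, kernel selection for an SVM and a morphological opening to clean up an over-segmented result, can be sketched as follows. The data are purely synthetic placeholders, and the 2x2 structuring element is an assumption, not the study's setting.

```python
# Hedged sketch: try linear, Gaussian (RBF) and polynomial SVM kernels, then
# simplify an over-segmented binary mask with a morphological opening.
import numpy as np
from scipy.ndimage import binary_opening
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=8, random_state=1)
for kernel in ("linear", "rbf", "poly"):
    score = cross_val_score(SVC(kernel=kernel), X, y, cv=5).mean()
    print(kernel, round(score, 3))

# Opening removes isolated misclassified pixels while keeping large regions.
mask = np.zeros((10, 10), dtype=bool)
mask[2:7, 2:7] = True          # a coherent "roof" region
mask[9, 9] = True              # speckle noise from over-classification
cleaned = binary_opening(mask, structure=np.ones((2, 2)))
print(cleaned.sum() < mask.sum())
```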

  13. Using Genetic Algorithms for Texts Classification Problems

    Directory of Open Access Journals (Sweden)

    A. A. Shumeyko

    2009-01-01

Full Text Available The avalanche of information produced by mankind has led to the concept of automated knowledge extraction, or Data Mining [1]. This field spans a wide spectrum of problems, from fuzzy-set recognition to the creation of search engines. An important component of Data Mining is the processing of text information. Such problems rest on the concepts of classification and clustering [2]. Classification consists in assigning an element (a text) to one of several predefined classes. Clustering means splitting a set of elements (texts) into clusters, whose number is determined by the localization of the elements of the given set around natural cluster centers. The realization of a classification task should initially rest on given postulates, the basic one being a priori information about the primary set of texts and a measure of affinity between elements and classes.

  14. A New Clustering Algorithm for Face Classification

    Directory of Open Access Journals (Sweden)

    Shaker K. Ali

    2016-06-01

Full Text Available In this paper, we propose a new clustering algorithm that builds on ideas from other clustering algorithms. The idea is to compute a distance matrix, then exclude the matrix points that have been clustered by saving their locations (row, column), determine the minimum distance of the points that will belong to the group (class), and keep the remaining points that are not yet clustered. The proposed algorithm is applied to an image database of human faces captured in different environments (directions, angles, etc.). These data were collected from different sources (the ORL site and real images collected from a random sample of the Thi_Qar city population in Iraq). Our algorithm has been implemented with three types of distance for calculating the minimum distance between points (Euclidean, correlation and Minkowski distance). The efficiency ratio of the proposed algorithm varies with the database and the threshold, and exceeds 96%. Matlab (2014) has been used in this work.
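The distance-matrix idea above can be sketched in a few lines: build a pairwise distance matrix under each of the three metrics and attach every point to its nearest exemplar. The blobs below are illustrative stand-ins for the paper's face images, and the single-exemplar assignment is a simplification of the iterative exclusion procedure.

```python
# Hedged sketch: nearest-exemplar assignment from a distance matrix, under
# the three metrics named in the abstract.
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
# Two well-separated blobs standing in for two face classes.
data = np.vstack([rng.normal(0, 0.5, (20, 16)),
                  rng.normal(5, 0.5, (20, 16))])
seeds = data[[0, 20]]          # one exemplar per class

for metric in ("euclidean", "correlation", "minkowski"):
    d = cdist(data, seeds, metric=metric)   # distance matrix to exemplars
    labels = d.argmin(axis=1)               # attach each point to its nearest
    print(metric, np.bincount(labels, minlength=2))
```

Note that the correlation metric is shift-invariant, so on data that differ only by a mean offset (as here) it separates classes far less cleanly than Euclidean or Minkowski distance.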

  15. Discovering Fuzzy Censored Classification Rules (Fccrs: A Genetic Algorithm Approach

    Directory of Open Access Journals (Sweden)

    Renu Bala

    2012-08-01

Full Text Available Classification Rules (CRs) are often discovered in the form of 'If-Then' Production Rules (PRs). PRs, being high-level symbolic rules, are comprehensible and easy to implement. However, they are not capable of dealing with cognitive uncertainties like vagueness and ambiguity, which are imperative to real-world decision making situations. Fuzzy Classification Rules (FCRs) based on fuzzy logic provide a framework for flexible, human-like reasoning involving linguistic variables. Moreover, a classification system consisting of simple 'If-Then' rules is not competent in handling exceptional circumstances. In this paper, we propose a Genetic Algorithm approach to discover Fuzzy Censored Classification Rules (FCCRs). A FCCR is a Fuzzy Classification Rule (FCR) augmented with censors. Here, censors are exceptional conditions under which the behaviour of a rule gets modified. The proposed algorithm works in two phases. In the first phase, the Genetic Algorithm discovers Fuzzy Classification Rules. Subsequently, these Fuzzy Classification Rules are mutated to produce FCCRs in the second phase. An appropriate encoding scheme, fitness function and genetic operators are designed for the discovery of FCCRs. The proposed approach for discovering FCCRs is then illustrated on a synthetic dataset.

  16. A Semi-Supervised WLAN Indoor Localization Method Based on ℓ1-Graph Algorithm

    Institute of Scientific and Technical Information of China (English)

    Liye Zhang; Lin Ma; Yubin Xu

    2015-01-01

For indoor location estimation based on received signal strength (RSS) in wireless local area networks (WLAN), a large number of RSS samples must be collected in the offline phase in order to reduce the influence of noise on positioning accuracy. Collecting training data with positioning information is therefore time consuming, which becomes the bottleneck of WLAN indoor localization. In this paper, traditional semi-supervised learning methods based on k-NN and ε-NN graphs for reducing the collection workload of the offline phase are analyzed, and the results show that k-NN and ε-NN graphs are sensitive to data noise, which limits the performance of semi-supervised WLAN indoor localization systems. To address this problem, we propose an ℓ1-graph-algorithm-based semi-supervised learning (LG-SSL) indoor localization method in which the graph is built by the ℓ1-norm algorithm. Our system first labels the unlabeled data using LG-SSL and the labeled data to build the radio map in the offline training phase, and then uses LG-SSL to estimate the user's location in the online phase. Extensive experimental results show that, benefiting from the robustness to noise and the sparsity of the ℓ1-graph, LG-SSL exhibits superior performance by effectively reducing the collection workload in the offline phase and improving localization accuracy in the online phase.
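The graph-based semi-supervised step can be illustrated with scikit-learn. scikit-learn has no ℓ1-graph builder, so `LabelSpreading` over a k-NN graph stands in for the paper's LG-SSL, and the blobs below are synthetic substitutes for RSS fingerprints; only a few points carry location labels, mimicking a sparse offline survey.

```python
# Hedged sketch: propagate a handful of labels over a k-NN graph to label
# the rest of the (synthetic) fingerprint database.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.semi_supervised import LabelSpreading

X, y = make_blobs(n_samples=200, centers=4, cluster_std=1.0, random_state=2)
labels = np.full(len(y), -1)         # -1 marks unlabeled fingerprints
labels[::10] = y[::10]               # keep only a few labeled survey points

model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, labels)
acc = (model.transduction_ == y).mean()
print(round(acc, 3))
```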

  17. Determination of Land Cover/land Use Using SPOT 7 Data with Supervised Classification Methods

    Science.gov (United States)

    Bektas Balcik, F.; Karakacan Kuzucu, A.

    2016-10-01

Land use/land cover (LULC) classification is a key research field in remote sensing. With the recent development of high-spatial-resolution sensors, Earth-observation technology offers a viable solution for land use/land cover identification and management in the rural parts of cities. There is a strong need to produce accurate, reliable, and up-to-date land use/land cover maps for sustainable monitoring and management. In this study, SPOT 7 imagery was used to test the potential of the data for land cover/land use mapping. Catalca, the selected region, is located northwest of Istanbul, Turkey, and is mostly covered by agricultural fields and forest lands. The potential of two classification algorithms, maximum likelihood and support vector machine, was tested, and accuracy assessment of the land cover maps was performed through an error matrix and Kappa statistics. The results indicated that both of the selected classifiers were highly useful (over 83% accuracy) for mapping land use/cover in the study region. The support vector machine classification approach slightly outperformed the maximum likelihood classification in both overall accuracy and Kappa statistics.
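The two accuracy-assessment measures used above, the error (confusion) matrix and Cohen's Kappa, can be computed directly. The class labels below are invented for illustration only.

```python
# Hedged sketch: error matrix, overall accuracy and Cohen's kappa on made-up
# reference/predicted land cover labels.
from sklearn.metrics import cohen_kappa_score, confusion_matrix

reference = ["forest", "crop", "crop", "water",
             "forest", "crop", "water", "forest"]
predicted = ["forest", "crop", "forest", "water",
             "forest", "crop", "water", "crop"]

print(confusion_matrix(reference, predicted,
                       labels=["crop", "forest", "water"]))
kappa = cohen_kappa_score(reference, predicted)
overall = sum(r == p for r, p in zip(reference, predicted)) / len(reference)
print(round(overall, 3), round(kappa, 3))
```

Kappa discounts chance agreement, which is why it is reported alongside overall accuracy in studies like this one.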

  18. Android Malware Classification Using K-Means Clustering Algorithm

    Science.gov (United States)

    Hamid, Isredza Rahmi A.; Syafiqah Khalid, Nur; Azma Abdullah, Nurul; Rahman, Nurul Hidayah Ab; Chai Wen, Chuah

    2017-08-01

Malware is designed to gain access to or damage a computer system without the user's knowledge. Attackers also exploit malware to commit crime or fraud. This paper proposes an Android malware classification approach based on the K-Means clustering algorithm. We evaluate the proposed model in terms of accuracy using machine learning algorithms. Two datasets, Virus Total and Malgenome, were selected to demonstrate the application of the K-Means clustering algorithm. We classify the Android malware into three clusters: ransomware, scareware and goodware. Nine features were considered for each dataset: Lock Detected, Text Detected, Text Score, Encryption Detected, Threat, Porn, Law, Copyright and Moneypak. We used IBM SPSS Statistics software for data classification and the WEKA tool to evaluate the built clusters. The proposed K-Means clustering algorithm shows promising results with high accuracy when tested using the Random Forest algorithm.
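The pipeline above, cluster with K-Means, then check the clusters with a supervised learner, can be sketched as follows. The nine random features are stand-ins for the malware attributes listed in the abstract, and Random Forest cross-validation replaces the SPSS/WEKA tooling.

```python
# Hedged sketch: K-Means into three clusters, then a Random Forest predicts
# the cluster labels as a separability check.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, _ = make_blobs(n_samples=300, centers=3, n_features=9, random_state=3)
clusters = KMeans(n_clusters=3, n_init=10, random_state=3).fit_predict(X)

acc = cross_val_score(RandomForestClassifier(random_state=3),
                      X, clusters, cv=5).mean()
print(len(set(clusters)), round(acc, 3))
```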

  19. Gaia eclipsing binary and multiple systems. Supervised classification and self-organizing maps

    Science.gov (United States)

    Süveges, M.; Barblan, F.; Lecoeur-Taïbi, I.; Prša, A.; Holl, B.; Eyer, L.; Kochoska, A.; Mowlavi, N.; Rimoldini, L.

    2017-07-01

    Context. Large surveys producing tera- and petabyte-scale databases require machine-learning and knowledge discovery methods to deal with the overwhelming quantity of data and the difficulties of extracting concise, meaningful information with reliable assessment of its uncertainty. This study investigates the potential of a few machine-learning methods for the automated analysis of eclipsing binaries in the data of such surveys. Aims: We aim to aid the extraction of samples of eclipsing binaries from such databases and to provide basic information about the objects. We intend to estimate class labels according to two different, well-known classification systems, one based on the light curve morphology (EA/EB/EW classes) and the other based on the physical characteristics of the binary system (system morphology classes; detached through overcontact systems). Furthermore, we explore low-dimensional surfaces along which the light curves of eclipsing binaries are concentrated, and consider their use in the characterization of the binary systems and in the exploration of biases of the full unknown Gaia data with respect to the training sets. Methods: We have explored the performance of principal component analysis (PCA), linear discriminant analysis (LDA), Random Forest classification and self-organizing maps (SOM) for the above aims. We pre-processed the photometric time series by combining a double Gaussian profile fit and a constrained smoothing spline, in order to de-noise and interpolate the observed light curves. We achieved further denoising, and selected the most important variability elements from the light curves using PCA. Supervised classification was performed using Random Forest and LDA based on the PC decomposition, while SOM gives a continuous 2-dimensional manifold of the light curves arranged by a few important features. 
We estimated the uncertainty of the supervised methods due to the specific finite training set using ensembles of models constructed

  20. An Improved Back Propagation Neural Network Algorithm on Classification Problems

    Science.gov (United States)

    Nawi, Nazri Mohd; Ransing, R. S.; Salleh, Mohd Najib Mohd; Ghazali, Rozaida; Hamid, Norhamreeza Abdul

The back propagation algorithm is one of the most popular algorithms for training feed-forward neural networks. However, the convergence of this algorithm is slow, mainly because of the underlying gradient descent method. Previous research demonstrated that in the feed-forward algorithm, the slope of the activation function is directly influenced by a parameter referred to as the 'gain'. This research proposes an algorithm for improving the performance of the back propagation algorithm by introducing an adaptive gain of the activation function. The gain values change adaptively for each node. The influence of the adaptive gain on the learning ability of a neural network is analysed. Multilayer feed-forward neural networks have been assessed. A physical interpretation of the relationship between the gain value and the learning rate and weight values is given. The efficiency of the proposed algorithm is compared with the conventional gradient descent method and verified by means of simulation on four classification problems. In learning the patterns, the simulation results demonstrate that the proposed method converged faster on the Wisconsin breast cancer dataset with an improvement ratio of nearly 2.8, 1.76 on the diabetes problem, 65% better on the thyroid datasets and 97% faster on the IRIS classification problem. The results clearly show that the proposed algorithm significantly improves the learning speed of the conventional back-propagation algorithm.
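The gain idea can be illustrated on a single-neuron toy problem: with activation sigmoid(gain * net), the gain scales the slope of the activation and therefore enters the gradient, and it can itself be updated during training. This is a sketch of the general idea only, not the paper's exact update rules.

```python
# Hedged sketch: squared-error backprop on one sigmoid unit with an adaptive
# gain parameter. Data, learning rate and updates are illustrative.
import numpy as np

def sigmoid(net, gain):
    return 1.0 / (1.0 + np.exp(-gain * net))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # linearly separable target

w, gain, lr = np.zeros(2), 1.0, 0.5
for _ in range(200):
    net = X @ w
    out = sigmoid(net, gain)
    err = out - y
    # dL/dw carries a factor of gain; dL/dgain carries a factor of net.
    grad_w = X.T @ (err * out * (1 - out) * gain) / len(X)
    grad_g = np.mean(err * out * (1 - out) * net)
    w -= lr * grad_w
    gain -= lr * grad_g      # the gain itself adapts during training

acc = ((sigmoid(X @ w, gain) > 0.5) == (y > 0.5)).mean()
print(round(acc, 3))
```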

  1. Application of CART Algorithm in Blood Donors Classification

    Directory of Open Access Journals (Sweden)

    T. Santhanam

    2010-01-01

Full Text Available Problem statement: This study used data mining modeling techniques to examine blood donor classification. The availability of blood in blood banks is a critical and important aspect of a healthcare system. Blood banks (in the developing countries context) typically rely on healthy people voluntarily donating blood, which is used for transfusions or made into medications. The ability to identify regular blood donors will enable blood banks and voluntary organizations to plan systematically for organizing blood donation camps in an effective manner. Approach: Identify blood donation behavior using the classification algorithms of data mining. The analysis was carried out on a standard blood transfusion dataset using the CART decision tree algorithm implemented in Weka. Results: Numerical experiments on the UCI ML blood transfusion data, with the enhancements, helped to identify donor classification. Conclusion: The CART-derived model, along with the extended definition for identifying regular voluntary donors, provided a model with good classification accuracy.
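A CART-style decision tree on features shaped like the UCI blood transfusion data (recency, frequency, monetary, time) can be sketched in a few lines. The rows below are fabricated examples, not the actual dataset, and scikit-learn's `DecisionTreeClassifier` stands in for the Weka implementation.

```python
# Hedged sketch: CART decision tree on invented donor records.
from sklearn.tree import DecisionTreeClassifier

# [recency (months), frequency, total c.c. donated, months since first donation]
X = [[2, 50, 12500, 98], [0, 13, 3250, 28], [1, 16, 4000, 35],
     [2, 20, 5000, 45], [23, 1, 250, 23], [16, 3, 750, 86],
     [21, 2, 500, 52], [14, 1, 250, 14]]
y = [1, 1, 1, 1, 0, 0, 0, 0]        # 1 = donated again (regular donor)

tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
preds = tree.predict([[1, 24, 6000, 77], [20, 1, 250, 20]])
print(list(preds))
```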

  2. Classification of remote sensed data using Artificial Bee Colony algorithm

    Directory of Open Access Journals (Sweden)

    J. Jayanth

    2015-06-01

Full Text Available The present study employs a swarm intelligence technique in the classification of satellite data, since traditional statistical classification techniques show limited success in classifying remote sensing data. Traditional statistical classifiers examine only the spectral variance, ignoring the spatial distribution of the pixels corresponding to the land cover classes and the correlation between various bands. The Artificial Bee Colony (ABC) algorithm, based on swarm intelligence, is used to characterise spatial variations within imagery as a means of extracting information, and forms the basis of object recognition and classification in several domains while avoiding the issues related to band correlation. The results indicate that the ABC algorithm shows an improvement of 5% in overall classification accuracy for 6 classes over the traditional Maximum Likelihood Classifier (MLC) and Artificial Neural Network (ANN), and 3% against the support vector machine.

  3. A Syntactic Classification based Web Page Ranking Algorithm

    CERN Document Server

    Mukhopadhyay, Debajyoti; Kim, Young-Chon

    2011-01-01

The existing search engines sometimes give unsatisfactory search results for lack of any categorization of the results. If there were some means to know the user's preference about the search results and to rank pages according to that preference, the results would be more useful and accurate to the user. In the present paper a web page ranking algorithm is proposed based on syntactic classification of web pages. Syntactic classification does not concern itself with the meaning of the content of a web page. The proposed approach mainly consists of three steps: select some properties of web pages based on the user's demand, measure them, and give different weightage to each property during ranking for different types of pages. The existence of syntactic classification is supported by running the fuzzy c-means algorithm and neural network classification on a set of web pages. The change in ranking for different types of pages but the same query string is also demonstrated.

  4. AN ENHANCEMENT OF ASSOCIATION CLASSIFICATION ALGORITHM FOR IDENTIFYING PHISHING WEBSITES

    Directory of Open Access Journals (Sweden)

G. Parthasarathy

    2016-08-01

Full Text Available Phishing is a fraudulent activity in which an attacker creates a replica of an existing web page in order to obtain sensitive information, such as credit card details and passwords, from users. This paper presents an enhancement of the existing association classification algorithm to detect phishing websites. We can enhance the accuracy to a great extent by applying association rules to classification. In addition, we can also obtain valuable information and rules which cannot be captured by other classification approaches. However, the rule generation procedure is very time consuming when encountering large datasets. The proposed algorithm makes use of the Apriori algorithm for identifying frequent itemsets and hence derives a decision tree based on the features of the URL.

  5. Protein fold classification with genetic algorithms and feature selection.

    Science.gov (United States)

    Chen, Peng; Liu, Chunmei; Burge, Legand; Mahmood, Mohammad; Southerland, William; Gloster, Clay

    2009-10-01

Protein fold classification is a key step in predicting protein tertiary structures. This paper proposes a novel approach based on genetic algorithms and feature selection for classifying protein folds. Our dataset is divided into a training dataset and a test dataset. Each individual for the genetic algorithms represents a selection function over the feature vectors of the training dataset. A support vector machine is applied to each individual to evaluate its fitness value (fold classification rate). The aim of the genetic algorithms is to search for the best individual that produces the highest fold classification rate. The best individual is then applied to the feature vectors of the test dataset and a support vector machine is built to classify protein folds based on the selected features. Our experimental results on Ding and Dubchak's benchmark dataset of 27 fold classes show that our approach achieves an accuracy of 71.28%, which outperforms current state-of-the-art protein fold predictors.
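The GA-plus-SVM scheme above can be sketched compactly: each individual is a bit mask over features, and its fitness is the cross-validated SVM accuracy on the selected columns. This sketch simplifies the paper's method (mutation only, no crossover; tiny population; synthetic data in place of protein features).

```python
# Hedged sketch: genetic-algorithm feature selection with SVM fitness.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=12, n_informative=4,
                           n_redundant=0, random_state=0)

def fitness(mask):
    """Cross-validated SVM accuracy on the selected feature columns."""
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(), X[:, mask], y, cv=3).mean()

pop = rng.integers(0, 2, size=(10, 12)).astype(bool)
for _ in range(8):                                   # a few GA generations
    scores = np.array([fitness(m) for m in pop])
    parents = pop[scores.argsort()[-5:]]             # selection: keep the best
    children = parents[rng.integers(0, 5, 10)].copy()
    flips = rng.random(children.shape) < 0.1         # mutation
    pop = children ^ flips

best = max(pop, key=fitness)
print(best.sum(), round(fitness(best), 3))
```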

  6. Material classification and automatic content enrichment of images using supervised learning and knowledge bases

    Science.gov (United States)

    Mallepudi, Sri Abhishikth; Calix, Ricardo A.; Knapp, Gerald M.

    2011-02-01

    In recent years there has been a rapid increase in the size of video and image databases. Effective searching and retrieving of images from these databases is a significant current research area. In particular, there is a growing interest in query capabilities based on semantic image features such as objects, locations, and materials, known as content-based image retrieval. This study investigated mechanisms for identifying materials present in an image. These capabilities provide additional information impacting conditional probabilities about images (e.g. objects made of steel are more likely to be buildings). These capabilities are useful in Building Information Modeling (BIM) and in automatic enrichment of images. I2T methodologies are a way to enrich an image by generating text descriptions based on image analysis. In this work, a learning model is trained to detect certain materials in images. To train the model, an image dataset was constructed containing single material images of bricks, cloth, grass, sand, stones, and wood. For generalization purposes, an additional set of 50 images containing multiple materials (some not used in training) was constructed. Two different supervised learning classification models were investigated: a single multi-class SVM classifier, and multiple binary SVM classifiers (one per material). Image features included Gabor filter parameters for texture, and color histogram data for RGB components. All classification accuracy scores using the SVM-based method were above 85%. The second model helped in gathering more information from the images since it assigned multiple classes to the images. A framework for the I2T methodology is presented.

  7. Evaluation of partial classification algorithms using ROC curves.

    Science.gov (United States)

    Tusch, G

    1995-01-01

When using computer programs for decision support in clinical routine, an assessment or a comparison of the underlying classification algorithms is essential. In classical (forced) classification, the classification rule always selects exactly one alternative. A number of proven discriminant measures are available here, e.g. sensitivity and error rate. For probabilistic classification, a series of additional measures has been developed [1]. However, for many clinical applications, there are models where an observation is classified into several classes (partial classification), e.g., models from artificial intelligence, decision analysis, or fuzzy set theory. In partial classification, the discriminatory ability (Murphy) can be adjusted a priori to any level in most practical cases. Here the usual measures do not apply. We investigate the preconditions for assessment and comparison based on medical decision theory. We focus on problems in the medical domain and establish a methodological framework. When using partial classification procedures, a ROC analysis in the classical sense is no longer appropriate. In forced classification for two classes, the problem is to find a cutoff point on the ROC curve, while in partial classification two of them have to be found. They characterize the elements being classified as coming from both classes. This extends to several classes. We propose measures corresponding to the usual discriminant measures for forced classification (e.g., sensitivity and error rate) and demonstrate the effects using the ROC approach. For this purpose, we extend the existing method for forced classification in a mathematically sound manner. Algorithms for the construction of thresholds can easily be adapted. Two specific measurement models, based on parametric and non-parametric approaches, will be introduced. The basic methodology is suitable for all partial classification problems, whereas the extended ROC analysis assumes a rank order of the

  8. Comparison research on iot oriented image classification algorithms

    Directory of Open Access Journals (Sweden)

    Du Ke

    2016-01-01

Full Text Available Image classification belongs to the machine learning and computer vision fields; it aims to recognize and classify objects in image contents. How to apply image classification algorithms to large-scale data in the IoT framework is the focus of current research. Based on Anaconda, this article implements the k-NN, SVM, Softmax and Neural Network algorithms in Python, performs data normalization, random search, and HOG and colour histogram feature extraction to enhance the algorithms, experiments on the CIFAR-10 dataset, and then conducts a comparison on three aspects: training time, test time and classification accuracy. The experimental results show that: the vectorized implementation of the algorithms is more efficient than the loop implementation; the training time of k-NN is the shortest, SVM and Softmax take more time, and the training time of the Neural Network is the longest; the test times of SVM, Softmax and the Neural Network are much shorter than that of k-NN; the Neural Network achieves the highest classification accuracy, SVM and Softmax achieve lower and similar accuracies, and k-NN achieves the lowest accuracy. The effects of the three algorithm improvement methods are evident.
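The comparison protocol above (normalize, train each model, record training time, test time and accuracy) can be sketched as a small harness. CIFAR-10 is replaced by small synthetic data so the sketch runs quickly, and `LogisticRegression` stands in for the Softmax classifier.

```python
# Hedged sketch: compare k-NN, SVM and a Softmax classifier on training time,
# test time and accuracy, mirroring the abstract's three comparison axes.
import time
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=800, n_features=20, n_classes=4,
                           n_informative=10, random_state=4)
X = StandardScaler().fit_transform(X)           # data normalization
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=4)

results = {}
for name, model in [("k-NN", KNeighborsClassifier()),
                    ("SVM", SVC()),
                    ("Softmax", LogisticRegression(max_iter=1000))]:
    t0 = time.perf_counter()
    model.fit(X_tr, y_tr)
    train_t = time.perf_counter() - t0
    t0 = time.perf_counter()
    acc = model.score(X_te, y_te)
    test_t = time.perf_counter() - t0
    results[name] = (train_t, test_t, acc)
    print(f"{name}: train {train_t:.4f}s  test {test_t:.4f}s  acc {acc:.3f}")
```

Note the pattern the abstract reports: k-NN defers all work to test time, so its training is fast but its prediction is slow relative to the parametric models.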

  9. Semi-supervised classification of emotional pictures based on feature combination

    Science.gov (United States)

    Li, Shuo; Zhang, Yu-Jin

    2011-02-01

Can the abundant emotions reflected in pictures be classified automatically by computer? Previous research considered only the visual features extracted from images, which have limited capability to reveal various emotions. In addition, the training database utilized by previous methods is a subset of the International Affective Picture System (IAPS) that has a relatively small scale, which exerts negative effects on the discrimination of emotion classifiers. To solve the above problems, this paper proposes a novel and practical emotional picture classification approach, using a semi-supervised learning scheme with both visual features and keyword tag information. Besides the IAPS, with both emotion labels and keyword tags, as part of the training dataset, nearly 2000 pictures with only keyword tags downloaded from the website Flickr form an auxiliary training dataset. The visual feature of the latent emotional semantic factors is extracted by a probabilistic Latent Semantic Analysis (pLSA) model, while the text feature is described by binary vectors over the tag vocabulary. A first Linear Programming Boost (LPBoost) classifier, trained on the samples from IAPS, combines the above two features and aims to label the other training samples from the internet. Then a second SVM classifier, trained on all training images using only the visual feature, focuses on the test images. In the experiments, the categorization performance of our approach is better than that of the latest methods.

  10. Improving Landsat and IRS Image Classification: Evaluation of Unsupervised and Supervised Classification through Band Ratios and DEM in a Mountainous Landscape in Nepal

    Directory of Open Access Journals (Sweden)

    Krishna Bahadur K.C.

    2009-12-01

Full Text Available Modification of the original bands and integration of ancillary data in digital image classification has been shown to improve land use land cover classification accuracy. There are not many studies demonstrating such techniques in the context of the mountains of Nepal. The objective of this study was to explore and evaluate the use of modified bands and ancillary data in Landsat and IRS image classification, and to produce a land use land cover map of the Galaudu watershed of Nepal. Classification of land uses was explored using supervised and unsupervised classification for 12 feature sets containing the Landsat MSS, TM and IRS original bands, ratios, the normalized difference vegetation index, principal components and a digital elevation model. Overall, the supervised classification method produced higher accuracy than the unsupervised approach. The result from the combination of band ratios 4/3, 5/4 and 5/7 ranked the highest in terms of accuracy (82.86%), while the combination of bands 2, 3 and 4 ranked the lowest (45.29%). Inclusion of the DEM as a component band shows promising results.
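The band-ratio and NDVI features named above are simple per-pixel arithmetic. The reflectance values below are invented, and the band numbering follows Landsat TM (band 3 = red, band 4 = near-infrared).

```python
# Hedged sketch: the 4/3 band ratio and NDVI on toy reflectance arrays.
import numpy as np

red = np.array([[0.10, 0.30], [0.05, 0.25]])    # band 3 (red)
nir = np.array([[0.50, 0.35], [0.45, 0.30]])    # band 4 (near-infrared)

ratio_43 = nir / red                            # simple band ratio 4/3
ndvi = (nir - red) / (nir + red)                # normalized difference: [-1, 1]

print(np.round(ratio_43, 2))
print(np.round(ndvi, 3))
```

High NDVI marks vegetated pixels; such derived bands were stacked with the originals to form the 12 feature sets the study classified.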

  11. Optimized Audio Classification and Segmentation Algorithm by Using Ensemble Methods

    OpenAIRE

    Saadia Zahid; Fawad Hussain; Muhammad Rashid; Muhammad Haroon Yousaf; Hafiz Adnan Habib

    2015-01-01

    Audio segmentation is a basis for multimedia content analysis which is the most important and widely used application nowadays. An optimized audio classification and segmentation algorithm is presented in this paper that segments a superimposed audio stream on the basis of its content into four main audio types: pure-speech, music, environment sound, and silence. An algorithm is proposed that preserves important audio content and reduces the misclassification rate without using large amount o...

  12. Machine learning algorithms for mode-of-action classification in toxicity assessment.

    Science.gov (United States)

    Zhang, Yile; Wong, Yau Shu; Deng, Jian; Anton, Cristina; Gabos, Stephan; Zhang, Weiping; Huang, Dorothy Yu; Jin, Can

    2016-01-01

Real Time Cell Analysis (RTCA) technology is used to monitor cellular changes continuously over the entire exposure period. Combined with different testing concentrations, the profiles have potential for probing the mode of action (MOA) of the tested substances. In this paper, we present machine learning approaches for MOA assessment. Computational tools based on artificial neural networks (ANN) and support vector machines (SVM) are developed to analyze the time-concentration response curves (TCRCs) of human cell lines responding to tested chemicals. The techniques are capable of learning from given TCRCs with known MOA information and then making MOA classifications for unknown toxicities. A novel data processing step based on the wavelet transform is introduced to extract important features from the original TCRC data. From the dose response curves, the time interval leading to a higher classification success rate can be selected as input to enhance the performance of the machine learning algorithm. This is particularly helpful when handling cases with limited and imbalanced data. The validation of the proposed method is demonstrated by the supervised learning algorithm applied to the exposure data of the HepG2 cell line to 63 chemicals with 11 concentrations in each test case. Classification success rates in the range of 85 to 95% are obtained using SVM for MOA classification for cases with two to four clusters. The wavelet transform is capable of capturing important features of TCRCs for MOA classification. The proposed SVM scheme incorporating the wavelet transform has great potential for large-scale MOA classification and high-throughput chemical screening.
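The wavelet-then-SVM idea can be sketched with a hand-rolled one-level Haar transform as the feature extractor. The two curve families below are invented stand-ins for TCRCs of two modes of action, and the choice of a single Haar level is an assumption, not the paper's wavelet configuration.

```python
# Hedged sketch: Haar wavelet features from synthetic response curves, then
# SVM classification of two invented "modes of action".
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def haar_level1(signal):
    """One Haar level: pairwise scaled averages (approx) and differences (detail)."""
    s = signal.reshape(-1, 2)
    approx = (s[:, 0] + s[:, 1]) / np.sqrt(2)
    detail = (s[:, 0] - s[:, 1]) / np.sqrt(2)
    return np.concatenate([approx, detail])

rng = np.random.default_rng(5)
t = np.linspace(0, 1, 32)
# MOA 1: monotone decline; MOA 2: transient rise then decline.
curves = np.vstack(
    [np.exp(-3 * t) + rng.normal(0, 0.05, 32) for _ in range(40)] +
    [np.sin(np.pi * t) * np.exp(-t) + rng.normal(0, 0.05, 32)
     for _ in range(40)])
labels = np.array([0] * 40 + [1] * 40)

features = np.array([haar_level1(c) for c in curves])
acc = cross_val_score(SVC(), features, labels, cv=5).mean()
print(round(acc, 3))
```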

  13. Backpropagation Learning Algorithms for Email Classification.

    Directory of Open Access Journals (Sweden)

    *David Ndumiyana and Tarirayi Mukabeta

    2016-07-01

    Full Text Available Today email has become one of the fastest and most effective forms of communication. The popularity of this mode of transmitting goods, information and services has motivated spammers to perfect their technical skills to fool spam filters. This development has worsened the problems faced by Internet users, who have to deal with email congestion, email overload and unprioritised email messages. The result has been an exponential increase in the number of email classification management tools over the past few decades. In this paper we propose a new spam classifier that uses the learning process of a multilayer neural network to implement the backpropagation technique. Our contribution to the body of knowledge is the use of an improved empirical analysis to choose an optimal, novel collection of attributes of a user's email contents that allows quick detection of the most important words in emails. We also demonstrate the effectiveness of the approach using two equally sized sets of emails for training and testing.
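
    The core of such a classifier is backpropagation through a multilayer network. Below is a minimal sketch in plain NumPy on a toy bag-of-words dataset; the four word features, the network size, and the learning rate are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bag-of-words data: counts of 4 hypothetical words per email
# ("free", "winner", "meeting", "report"); label 1 = spam.
X = np.array([[3, 2, 0, 0], [4, 1, 0, 1], [0, 0, 3, 2],
              [0, 1, 4, 3], [5, 3, 1, 0], [1, 0, 2, 4]], dtype=float)
y = np.array([1, 1, 0, 0, 1, 0], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 5 units, trained by batch backpropagation
# on the cross-entropy loss.
W1 = rng.normal(scale=0.5, size=(4, 5))
W2 = rng.normal(scale=0.5, size=(5, 1))
lr = 0.5
for _ in range(2000):
    h = sigmoid(X @ W1)                     # forward pass
    p = sigmoid(h @ W2).ravel()
    grad_out = (p - y)[:, None]             # dLoss/dlogit at the output
    grad_h = grad_out @ W2.T * h * (1 - h)  # error backpropagated to hidden layer
    W2 -= lr * h.T @ grad_out / len(y)
    W1 -= lr * X.T @ grad_h / len(y)

preds = (sigmoid(sigmoid(X @ W1) @ W2).ravel() > 0.5).astype(int)
print(preds)
```

    On this separable toy set the network fits the training labels; a real spam filter would of course use far more attributes and held-out evaluation.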

  14. Algorithms for classification of astronomical object spectra

    Science.gov (United States)

    Wasiewicz, P.; Szuppe, J.; Hryniewicz, K.

    2015-09-01

    Obtaining interesting celestial objects from tens of thousands or even millions of recorded optical-ultraviolet spectra depends not only on the data quality but also on the accuracy of spectra decomposition. Additionally, rapidly growing data volumes demand higher computing power and/or more efficient algorithm implementations. In this paper we speed up the process of subtracting iron transitions and fitting Gaussian functions to emission peaks using C++ and OpenCL methods together with a NoSQL database. We also implemented typical astronomical peak-detection methods for comparison with our previous hybrid methods implemented in CUDA.
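
    Fitting a Gaussian to an emission peak can be sketched without heavy machinery: the logarithm of a Gaussian is quadratic in the abscissa, so a degree-2 polynomial fit to log-intensities recovers the peak parameters. This is a generic illustration on a noise-free synthetic peak, not the authors' OpenCL implementation; with real, noisy spectra one would restrict the fit to points near the peak.

```python
import numpy as np

def fit_gaussian_peak(x, y):
    """Fit a Gaussian A*exp(-(x-mu)^2 / (2*sigma^2)) to an emission
    peak by fitting a parabola to log(y): log of a Gaussian is
    a*x^2 + b*x + c, so a degree-2 polyfit recovers the parameters."""
    a, b, c = np.polyfit(x, np.log(y), 2)
    sigma = np.sqrt(-1.0 / (2.0 * a))   # a = -1 / (2 sigma^2)
    mu = -b / (2.0 * a)                 # b = mu / sigma^2
    amp = np.exp(c - b ** 2 / (4.0 * a))
    return amp, mu, sigma

# Synthetic emission peak with known parameters.
x = np.linspace(-2, 2, 41)
y = 3.0 * np.exp(-(x - 0.5) ** 2 / (2 * 0.4 ** 2))
amp, mu, sigma = fit_gaussian_peak(x, y)
print(round(mu, 3), round(sigma, 3))  # 0.5 0.4
```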

  15. Automatic Mining of Numerical Classification Rules with Parliamentary Optimization Algorithm

    Directory of Open Access Journals (Sweden)

    KIZILOLUK, S.

    2015-11-01

    Full Text Available In recent years, classification rule mining has been one of the most important data mining tasks. In this study, one of the newest social-based metaheuristic methods, the Parliamentary Optimization Algorithm (POA), is used for the first time to automatically mine comprehensible and accurate classification rules from datasets with numerical attributes. Four different numerical datasets have been selected from the UCI data warehouse, and high-quality classification rules have been obtained. Furthermore, the results obtained from the designed POA have been compared with the results of four different popular classification rule mining algorithms used in WEKA. Although POA is very new and has not previously been applied to complex data mining problems, the results seem promising. The objective function used is very flexible, and many different objectives can easily be added to it. The intervals of the numerical attributes in the rules are found automatically without any a priori discretization, which in other classification rule mining algorithms requires modification of the datasets.

  16. Robust evaluation of time series classification algorithms for structural health monitoring

    Science.gov (United States)

    Harvey, Dustin Y.; Worden, Keith; Todd, Michael D.

    2014-03-01

    Structural health monitoring (SHM) systems provide real-time damage and performance information for civil, aerospace, and mechanical infrastructure through analysis of structural response measurements. The supervised learning methodology for data-driven SHM involves computation of low-dimensional, damage-sensitive features from raw measurement data that are then used in conjunction with machine learning algorithms to detect, classify, and quantify damage states. However, these systems often suffer from performance degradation in real-world applications due to varying operational and environmental conditions. Probabilistic approaches to robust SHM system design suffer from incomplete knowledge of all conditions a system will experience over its lifetime. Info-gap decision theory enables nonprobabilistic evaluation of the robustness of competing models and systems in a variety of decision making applications. Previous work employed info-gap models to handle feature uncertainty when selecting various components of a supervised learning system, namely features from a pre-selected family and classifiers. In this work, the info-gap framework is extended to robust feature design and classifier selection for general time series classification through an efficient, interval arithmetic implementation of an info-gap data model. Experimental results are presented for a damage type classification problem on a ball bearing in a rotating machine. The info-gap framework in conjunction with an evolutionary feature design system allows for fully automated design of a time series classifier to meet performance requirements under maximum allowable uncertainty.

  17. Optimal Subset Selection of Time-Series MODIS Images and Sample Data Transfer with Random Forests for Supervised Classification Modelling.

    Science.gov (United States)

    Zhou, Fuqun; Zhang, Aining

    2016-10-25

    Nowadays, various time-series Earth Observation data with multiple bands are freely available, such as Moderate Resolution Imaging Spectroradiometer (MODIS) datasets, including 8-day composites from NASA and 10-day composites from the Canada Centre for Remote Sensing (CCRS). It is challenging to use these time-series MODIS datasets efficiently for long-term environmental monitoring due to their vast volume and information redundancy. This challenge will be greater when Sentinel 2-3 data become available. Another challenge researchers face is the lack of in-situ data for supervised modelling, especially for time-series data analysis. In this study, we attempt to tackle these two important issues in a case study of land cover mapping using CCRS 10-day MODIS composites with the help of two of Random Forests' built-in features: variable importance and outlier identification. The variable importance feature is used to analyze and select optimal subsets of time-series MODIS imagery for efficient land cover mapping, and the outlier identification feature is utilized for transferring sample data available from one year to an adjacent year for supervised classification modelling. The results of the case study of agricultural land cover classification at a regional scale show that using only about half of the variables, we can achieve land cover classification accuracy close to that generated using the full dataset. The proposed simple but effective sample transferring solution could make supervised modelling possible for applications lacking sample data.
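
    The variable-importance idea can be sketched with permutation importance: shuffle one variable at a time and measure the accuracy drop. The sketch below substitutes a simple nearest-centroid classifier for the paper's Random Forest (whose built-in importance is computed differently), and the synthetic "composite dates" are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def nearest_centroid_accuracy(X, y):
    """Accuracy of a nearest-centroid classifier (an illustrative
    stand-in for the Random Forest used in the paper)."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    pred = (np.linalg.norm(X - c1, axis=1)
            < np.linalg.norm(X - c0, axis=1)).astype(int)
    return (pred == y).mean()

def permutation_importance(X, y, n_repeats=20):
    """Importance of each band/date = mean accuracy drop when that
    variable is shuffled, breaking its link to the labels."""
    base = nearest_centroid_accuracy(X, y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            drops.append(base - nearest_centroid_accuracy(Xp, y))
        importances[j] = np.mean(drops)
    return importances

# 200 samples, 5 "composite dates": only date 0 carries class signal.
y = rng.integers(0, 2, 200)
X = rng.normal(size=(200, 5))
X[:, 0] += 2.0 * y
imp = permutation_importance(X, y)
print(imp.argmax())  # 0: the informative date dominates
```

    Keeping only the dates with high importance mimics the paper's optimal-subset selection.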

  18. Evaluation of registration, compression and classification algorithms. Volume 1: Results

    Science.gov (United States)

    Jayroe, R.; Atkinson, R.; Callas, L.; Hodges, J.; Gaggini, B.; Peterson, J.

    1979-01-01

    The registration, compression, and classification algorithms were selected on the basis that such a group would include most of the different and commonly used approaches. The results of the investigation indicate clear-cut, cost-effective choices for registering, compressing, and classifying multispectral imagery.

  19. Optimal classification of standoff bioaerosol measurements using evolutionary algorithms

    Science.gov (United States)

    Nyhavn, Ragnhild; Moen, Hans J. F.; Farsund, Øystein; Rustad, Gunnar

    2011-05-01

    Early warning systems based on standoff detection of biological aerosols require real-time signal processing of a large quantity of high-dimensional data, challenging the system's efficiency in terms of both computational complexity and classification accuracy. Hence, optimal feature selection is essential in forming a stable and efficient classification system. This involves finding optimal signal processing parameters, characteristic spectral frequencies and other data transformations in a large variable space, underscoring the need for an efficient and smart search algorithm. Evolutionary algorithms are population-based optimization methods inspired by Darwinian evolutionary theory. These methods apply selection, mutation and recombination to a population of competing solutions and optimize this set by evolving the population over generations. We have employed genetic algorithms in the search for optimal feature selection and signal processing parameters for classification of biological agents. The experimental data were acquired with a spectrally resolved lidar based on ultraviolet laser-induced fluorescence, and included several releases of 5 common simulants. The genetic algorithm outperforms analytic, sequential and random benchmark methods such as support vector machines, Fisher's linear discriminant and principal component analysis, with significantly improved classification accuracy compared to the best classical method.
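
    A genetic algorithm for feature (spectral channel) selection can be sketched as follows. The population of binary masks, the Fisher-like fitness with a per-channel penalty, and the synthetic two-class "spectra" are all illustrative assumptions; in the paper the fitness is the downstream classification accuracy.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for fluorescence spectra: 20 spectral channels, of
# which channels 3 and 7 discriminate the two simulants.
n, d = 300, 20
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, d))
X[:, 3] += 1.5 * y
X[:, 7] -= 1.5 * y

def fitness(mask):
    """Class separation (sum of absolute mean differences) of the
    selected channels, minus a penalty per channel to favour
    compact subsets."""
    if not mask.any():
        return -1.0
    Xs = X[:, mask]
    gap = np.abs(Xs[y == 1].mean(0) - Xs[y == 0].mean(0)).sum()
    return gap - 0.15 * mask.sum()

pop = rng.random((30, d)) < 0.5          # random initial population of masks
for gen in range(40):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]  # selection: keep top third
    children = []
    for _ in range(20):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(1, d)         # one-point recombination
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(d) < 0.02      # mutation
        children.append(child ^ flip)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print(best[3], best[7])  # the informative channels survive selection
```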

  20. Optimization of deep learning algorithms for object classification

    Science.gov (United States)

    Horváth, András.

    2017-02-01

    Deep learning is currently the state-of-the-art approach for image classification. The complexity of these feedforward neural networks has passed a critical point, resulting in algorithmic breakthroughs in various fields. On the other hand, their complexity restricts them to tasks where high-throughput computing power is available. The optimization of these networks, considering computational complexity and applicability on embedded systems, has not yet been studied and investigated in detail. In this paper I show some examples of how these algorithms can be optimized and accelerated on embedded systems.

  1. GLAST Burst Monitor Trigger Classification Algorithm

    Science.gov (United States)

    Perrin, D. J.; Sidman, E. D.; Meegan, C. A.; Briggs, M. S.; Connaughton, V.

    2004-01-01

    The Gamma Ray Large Area Space Telescope (GLAST), currently set for launch in the first quarter of 2007, will consist of two instruments, the GLAST Burst Monitor (GBM) and the Large Area Telescope (LAT). One of the goals of the GBM is to identify and locate gamma-ray bursts using on-board software. The GLAST observatory can then be re-oriented to allow observations by the LAT. A Bayesian analysis will be used to distinguish gamma-ray bursts from other triggering events, such as solar flares, magnetospheric particle precipitation, soft gamma repeaters (SGRs), and Cygnus X-1 flaring. The trigger parameters used in the analysis are the burst celestial coordinates, angle from the Earth's horizon, spectral hardness, and the spacecraft geomagnetic latitude. The algorithm will be described and the results of testing will be presented.
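
    A Bayesian classification over trigger parameters can be illustrated with a generic Gaussian naive Bayes model. The class-conditional distributions below are invented for illustration only and have no relation to the actual GBM flight software or its tuning.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic trigger parameters: [angle above horizon (deg), spectral
# hardness, |geomagnetic latitude| (deg)].  Class 0 = GRB, class 1 =
# magnetospheric particle event.  All values are illustrative only.
grb      = rng.normal([60.0, 1.5, 30.0], [20.0, 0.5, 15.0], (200, 3))
particle = rng.normal([10.0, 0.4, 65.0], [15.0, 0.3, 10.0], (200, 3))
X = np.vstack([grb, particle])
y = np.repeat([0, 1], 200)

# Gaussian naive Bayes: per-class mean/variance of each parameter.
means = np.array([X[y == k].mean(0) for k in (0, 1)])
vars_ = np.array([X[y == k].var(0) for k in (0, 1)])
priors = np.array([0.5, 0.5])

def log_posterior(x):
    """Unnormalized log posterior over the two classes for one trigger."""
    loglik = -0.5 * (np.log(2 * np.pi * vars_)
                     + (x - means) ** 2 / vars_).sum(axis=1)
    return loglik + np.log(priors)

# A high, spectrally hard trigger at low geomagnetic latitude: GRB-like.
print(log_posterior(np.array([70.0, 1.6, 25.0])).argmax())  # 0
```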

  2. Supervised, Multivariate, Whole-brain Reduction Did Not Help to Achieve High Classification Performance in Schizophrenia Research

    Directory of Open Access Journals (Sweden)

    Eva Janousova

    2016-08-01

    Full Text Available We examined how penalized linear discriminant analysis with resampling, which is a supervised, multivariate, whole-brain reduction technique, can help schizophrenia diagnostics and research. In an experiment with magnetic resonance brain images of 52 first-episode schizophrenia patients and 52 healthy controls, this method allowed us to select brain areas relevant to schizophrenia, such as the left prefrontal cortex, the anterior cingulum, the right anterior insula, the thalamus and the hippocampus. Nevertheless, the classification performance based on such reduced data was not significantly better than the classification of data reduced by mass univariate selection using a t-test or by unsupervised multivariate reduction using principal component analysis. Moreover, we found no important influence of the type of imaging features, namely local deformations or grey matter volumes, or of the classification method, specifically linear discriminant analysis or linear support vector machines, on the classification results. However, we ascertained a significant effect of the cross-validation setting on classification performance: classification results were overestimated even though the resampling was performed during the selection of brain imaging features. Therefore, when there is no external validation set, it is critically important to perform cross-validation in all steps of the analysis (not only during classification) to avoid optimistically biasing the results of classification studies.
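
    The cross-validation pitfall described above is easy to demonstrate: on pure-noise data, feature selection must happen inside each fold, otherwise apparent accuracy is optimistically biased. A minimal sketch with a nearest-centroid classifier (an illustrative stand-in for the classifiers used in the study):

```python
import numpy as np

rng = np.random.default_rng(3)

# Pure-noise data: 40 "subjects", 500 "voxel" features, random labels.
# Any apparent classification accuracy above chance is optimism.
X = rng.normal(size=(40, 500))
y = rng.integers(0, 2, 40)

def select_and_classify(X_tr, y_tr, X_te, k=10):
    """Select the k features most correlated with the labels on the
    TRAINING fold only, then apply a nearest-centroid classifier."""
    corr = np.abs(np.corrcoef(X_tr.T, y_tr)[:-1, -1])
    top = np.argsort(corr)[-k:]
    c0 = X_tr[y_tr == 0][:, top].mean(0)
    c1 = X_tr[y_tr == 1][:, top].mean(0)
    d0 = np.linalg.norm(X_te[:, top] - c0, axis=1)
    d1 = np.linalg.norm(X_te[:, top] - c1, axis=1)
    return (d1 < d0).astype(int)

# Correct protocol: selection happens inside each cross-validation fold.
folds = np.array_split(rng.permutation(40), 5)
correct = 0
for te in folds:
    tr = np.setdiff1d(np.arange(40), te)
    correct += (select_and_classify(X[tr], y[tr], X[te]) == y[te]).sum()
print(correct / 40)  # hovers near chance (0.5), as it should on noise
```

    Selecting the top-correlated features on all 40 subjects before splitting would instead report accuracy well above chance, the very bias the study warns about.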

  3. Supervised Learning in Multilayer Spiking Neural Networks

    CERN Document Server

    Sporea, Ioana

    2012-01-01

    The current article introduces a supervised learning algorithm for multilayer spiking neural networks. The algorithm presented here overcomes some limitations of existing learning algorithms, as it can be applied to neurons firing multiple spikes and, in principle, to any linearisable neuron model. The algorithm is applied successfully to various benchmarks, such as the XOR problem and the Iris data set, as well as to complex classification problems. The simulations also show the flexibility of this supervised learning algorithm, which permits different encodings of the spike timing patterns, including precise spike train encoding.

  4. Operational algorithm for ice-water classification on dual-polarized RADARSAT-2 images

    Science.gov (United States)

    Zakhvatkina, Natalia; Korosov, Anton; Muckenhuber, Stefan; Sandven, Stein; Babiker, Mohamed

    2017-01-01

    Synthetic Aperture Radar (SAR) data from RADARSAT-2 (RS2) in dual-polarization mode provide additional information for discriminating sea ice and open water compared to single-polarization data. We have developed an automatic algorithm based on dual-polarized RS2 SAR images to distinguish open water (rough and calm) and sea ice. Several technical issues inherent in RS2 data were solved in the pre-processing stage, including thermal noise reduction in HV polarization and correction of angular backscatter dependency in HH polarization. Texture features were explored and used as input to supervised image classification based on the support vector machine (SVM) approach. The study was conducted in the ice-covered area between Greenland and Franz Josef Land. The algorithm has been trained using 24 RS2 scenes acquired in winter months in 2011 and 2012, and the results were validated against manually derived ice charts of the Norwegian Meteorological Institute. The algorithm was applied to a total of 2705 RS2 scenes obtained from 2013 to 2015, and the validation results showed that the average classification accuracy was 91 ± 4 %.

  5. Contact-state classification in human-demonstrated robot compliant motion tasks using the boosting algorithm.

    Science.gov (United States)

    Cabras, Stefano; Castellanos, María Eugenia; Staffetti, Ernesto

    2010-10-01

    Robot programming by demonstration is a robot programming paradigm in which a human operator directly demonstrates the task to be performed. In this paper, we focus on programming by demonstration of compliant motion tasks, which are tasks that involve contacts between an object manipulated by the robot and the environment in which it operates. Critical issues in this paradigm are to distinguish essential actions from those that are not relevant for the correct execution of the task and to transform this information into a robot-independent representation. Essential actions in compliant motion tasks are the contacts that take place, and therefore, it is important to understand the sequence of contact states that occur during a demonstration, called contact classification or contact segmentation. We propose a contact classification algorithm based on a supervised learning algorithm, in particular on a stochastic gradient boosting algorithm. The approach described in this paper is accurate and does not depend on the geometric model of the objects involved in the demonstration. It neither relies on the kinestatic model of the contact interactions nor on the contact state graph, whose computation is usually of prohibitive complexity even for very simple geometric object models.
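
    The boosting idea, re-weighting training examples toward those the current ensemble misclassifies, can be sketched with AdaBoost on decision stumps. This is the simplest member of the boosting family, not the stochastic gradient boosting algorithm used in the paper, and the two "contact" features are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy contact-state data: 2 features (e.g. force magnitude and
# velocity along the constraint), labels in {-1, +1}.
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] ** 2 > 0.5, 1, -1)

def best_stump(X, y, w):
    """Weighted-error-minimizing decision stump (axis, threshold, sign)."""
    best = (1.0, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] > t, s, -s)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, t, s)
    return best

# AdaBoost: reweight examples toward those the ensemble gets wrong.
w = np.full(len(y), 1 / len(y))
stumps = []
for _ in range(20):
    err, j, t, s = best_stump(X, y, w)
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
    pred = np.where(X[:, j] > t, s, -s)
    w *= np.exp(-alpha * y * pred)   # upweight misclassified examples
    w /= w.sum()
    stumps.append((alpha, j, t, s))

score = sum(a * np.where(X[:, j] > t, s, -s) for a, j, t, s in stumps)
acc = (np.sign(score) == y).mean()
print(acc)  # the weighted ensemble fits the nonlinear boundary well
```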

  6. Novel Approaches for Diagnosing Melanoma Skin Lesions Through Supervised and Deep Learning Algorithms.

    Science.gov (United States)

    Premaladha, J; Ravichandran, K S

    2016-04-01

    Dermoscopy is a technique used to capture images of the skin, and these images are useful for analyzing different types of skin diseases. Malignant melanoma is a kind of skin cancer whose severity can even lead to death. Earlier detection of melanoma prevents death, and clinicians can treat patients to increase the chances of survival. Only a few machine learning algorithms have been developed to detect melanoma from its features. This paper proposes a Computer Aided Diagnosis (CAD) system that incorporates efficient algorithms to classify and predict melanoma. Enhancement of the images is done using the Contrast Limited Adaptive Histogram Equalization (CLAHE) technique and a median filter. A new segmentation algorithm called Normalized Otsu's Segmentation (NOS) is implemented to segment the affected skin lesion from the normal skin, which overcomes the problem of variable illumination. Fifteen features derived from the segmented images are fed into the proposed classification techniques: Deep Learning based Neural Networks and a hybrid AdaBoost-Support Vector Machine (SVM) algorithm. The proposed system is tested and validated with nearly 992 images (malignant and benign lesions), and it provides a high classification accuracy of 93 %. The proposed CAD system can assist dermatologists in confirming the diagnosis and avoiding excisional biopsies.
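
    Normalized Otsu's Segmentation is a variant of the standard Otsu method, which chooses the grey level maximizing between-class variance. A sketch of the standard method (not the authors' NOS variant) on a synthetic bimodal intensity distribution:

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Standard Otsu threshold: choose the grey level that maximizes
    the between-class variance of the resulting two-class split."""
    hist, edges = np.histogram(image, bins=bins)
    p = hist.astype(float) / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                 # weight of the dark class
    mu = np.cumsum(p * centers)       # cumulative mean
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * w0 - mu) ** 2 / (w0 * (1 - w0))
    return centers[np.nanargmax(between)]

# Bimodal synthetic "image": dark skin background, brighter lesion.
rng = np.random.default_rng(9)
img = np.concatenate([rng.normal(60, 10, 4000), rng.normal(160, 12, 1000)])
t = otsu_threshold(img)
print(90 < t < 140)  # True: the threshold falls between the two modes
```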

  7. An ellipse detection algorithm based on edge classification

    Science.gov (United States)

    Yu, Liu; Chen, Feng; Huang, Jianming; Wei, Xiangquan

    2015-12-01

    In order to enhance the speed and accuracy of ellipse detection, an ellipse detection algorithm based on edge classification is proposed. Redundant edge points are removed by serializing edges into point form and applying a distance constraint between edge points. Effective classification is achieved using the angle between edge points as a criterion, which greatly increases the probability that randomly selected edge points fall on the same ellipse. Ellipse fitting accuracy is significantly improved by optimization of the RED algorithm, using the Euclidean distance from an edge point to the elliptical boundary. Experimental results show that the method detects ellipses well when edges suffer interference or block each other, and that it has higher detection precision and lower time consumption than the RED algorithm.

  8. Bayesian of inductive cognition algorithm for adaptive classification

    Science.gov (United States)

    Jin, Longcun; Wan, Wanggen; Cui, Bin; Wu, Yongliang

    2009-07-01

    In this paper, we propose a Bayesian of inductive cognition algorithm for use in virtual reality multimedia classification, and present a framework model for adaptively classifying scenes in virtual reality multimedia data. The multimedia stream can switch between different shots, unknown objects can leave or enter the scene at multiple times, and the scenes can be adaptively classified. The proposed algorithm consists of a Bayesian inductive cognition part and a Dirichlet process part, and it has several advantages over traditional distance-based agglomerative classification algorithms. Hypothesis testing based on the Dirichlet process is used to decide which merges are advantageous and to output the recommended depth of the scenes. The algorithm can be interpreted as a novel fast bottom-up approximate inference method for a Dirichlet process mixture model. We describe procedures for learning the model hyperparameters, computing the predictive distribution, and extensions to the algorithm. Experimental results on virtual reality multimedia data sets demonstrate useful properties of the algorithm.

  9. Supervised machine learning on a network scale: application to seismic event classification and detection

    Science.gov (United States)

    Reynen, Andrew; Audet, Pascal

    2017-09-01

    A new method using a machine learning technique is applied to event classification and detection at seismic networks. This method is applicable to a variety of network sizes and settings. The algorithm makes use of a small catalogue of known observations across the entire network. Two attributes, the polarization and frequency content, are used as input to regression. These attributes are extracted at predicted arrival times for P and S waves using only an approximate velocity model, as the attributes are calculated over large time spans. This method of waveform characterization is shown to be able to distinguish between blasts and earthquakes with 99 per cent accuracy using a network of 13 stations located in Southern California. The combination of machine learning with generalized waveform features is further applied to event detection in Oklahoma, United States. The event detection algorithm makes use of a pair of unique seismic phases to locate events, with a precision directly related to the sampling rate of the generalized waveform features. Over a week of data from 30 stations in Oklahoma is used to automatically detect 25 times more events than the catalogue of the local geological survey, with a false detection rate of less than 2 per cent. This method provides a highly confident way of detecting and locating events. Furthermore, a large number of seismic events can be automatically detected with low false alarm rates, allowing for a larger automatic event catalogue with a high degree of trust.

  10. 监督学习的发展动态%Current Directions in Supervised Learning Research

    Institute of Scientific and Technical Information of China (English)

    蒋艳凰; 周海芳; 杨学军

    2003-01-01

    Supervised learning is very important in the machine learning area. It has been making great progress in many directions. This article summarizes three of these directions, which are hot problems in the supervised learning field. These three directions are (a) improving classification accuracy by learning ensembles of classifiers, (b) methods for scaling up supervised learning algorithms, and (c) extracting understandable rules from classifiers.

  11. Automated Quality Assessment of Structural Magnetic Resonance Brain Images Based on a Supervised Machine Learning Algorithm

    Directory of Open Access Journals (Sweden)

    Ricardo Andres Pizarro

    2016-12-01

    Full Text Available High-resolution three-dimensional magnetic resonance imaging (3D-MRI) is being increasingly used to delineate morphological changes underlying neuropsychiatric disorders. Unfortunately, artifacts frequently compromise the utility of 3D-MRI, yielding irreproducible results through both type I and type II errors. It is therefore critical to screen 3D-MRIs for artifacts before use. Currently, quality assessment involves slice-wise visual inspection of 3D-MRI volumes, a procedure that is both subjective and time-consuming. Automating the quality rating of 3D-MRI could improve the efficiency and reproducibility of the procedure. The present study is one of the first efforts to apply a support vector machine (SVM) algorithm to the quality assessment of structural brain images, using global and region-of-interest (ROI) automated image quality features developed in-house. SVM is a supervised machine-learning algorithm that can predict the category of test datasets based on the knowledge acquired from a learning dataset. The performance (accuracy) of the automated SVM approach was assessed by comparing the SVM-predicted quality labels to investigator-determined quality labels. The accuracy for classifying 1457 3D-MRI volumes from our database using the SVM approach is around 80 %. These results are promising and illustrate the possibility of using SVM as an automated quality assessment tool for 3D-MRI.

  12. An algorithm for the arithmetic classification of multilattices

    CERN Document Server

    Indelicato, Giuliana

    2009-01-01

    A procedure for the construction and classification of multilattices in arbitrary dimension is proposed. The algorithm makes it possible to determine explicitly the location of the points of a multilattice given its space group, and to determine whether two multilattices are arithmetically equivalent. The algorithm is based on ideas from integer matrix theory, in particular reduction to the Smith normal form. Among the applications of this procedure is a software package that allows the classification of complex crystalline structures and the determination of their space groups. It can also be used to determine the symmetry of regular systems of points in high dimension, with applications to the study of quasicrystals and of point sets with noncrystallographic symmetry in low dimension, such as viral capsid structures.

  13. a Review of Point Clouds Segmentation and Classification Algorithms

    Science.gov (United States)

    Grilli, E.; Menna, F.; Remondino, F.

    2017-02-01

    Today 3D models and point clouds are very popular, currently being used in several fields, shared through the internet and even accessed on mobile phones. Despite their broad availability, there is still a relevant need for methods, preferably automatic, to provide 3D data with meaningful attributes that characterize and give significance to the objects represented in 3D. Segmentation is the process of grouping point clouds into multiple homogeneous regions with similar properties, whereas classification is the step that labels these regions. The main goal of this paper is to analyse the most popular methodologies and algorithms to segment and classify 3D point clouds. Strong and weak points of the different solutions presented in the literature or implemented in commercial software will be listed and briefly explained. For some algorithms, the results of the segmentation and classification are shown using real examples at different scales in the Cultural Heritage field. Finally, open issues and research topics will be discussed.

  14. New algorithm of target classification in polarimetric SAR

    Institute of Scientific and Technical Information of China (English)

    Wang Yang; Lu Jiaguo; Wu Xianliang

    2008-01-01

    The different approaches used in target decomposition (TD) theory in radar polarimetry are reviewed and three main types of theorems are introduced: those based on the Mueller matrix, those using an eigenvector analysis of the coherency matrix, and those employing coherent decomposition of the scattering matrix. The support vector machine (SVM), a novel approach in pattern recognition, has demonstrated success in many fields. A new target classification algorithm combining target decomposition and the support vector machine is proposed. To conduct the experiment, polarimetric synthetic aperture radar (SAR) data are used. Experimental results show that it is feasible and efficient to perform target classification by applying target decomposition to extract scattering mechanisms, and that the effects of the kernel function and its parameters on classification efficiency are significant.

  15. DTL: a language to assist cardiologists in improving classification algorithms.

    Science.gov (United States)

    Kors, J A; Kamp, D M; Henkemans, D P; van Bemmel, J H

    1991-06-01

    Heuristic classifiers, e.g., for diagnostic classification of the electrocardiogram, can be very complex. The development and refinement of such classifiers is cumbersome and time-consuming. Generally, it requires a computer expert to implement the cardiologist's diagnostic reasoning in computer language. The average cardiologist, however, is not able to verify whether his intentions have been properly realized and perform as hoped. And even for the initiated, it often remains obscure how a particular result was reached by a complex classification program. An environment is presented which solves these problems. The environment consists of a language, DTL (Decision Tree Language), that allows cardiologists to express their classification algorithms in a way that is familiar to them, and an interpreter and translator for that language. The considerations in the design of DTL are described, and the structure and capabilities of the interpreter and translator are discussed.

  16. Low-Level Vision Algorithms for Localization, Classification, and Tracking

    OpenAIRE

    Kevin N. Gabayan

    2003-01-01

    Camera networks can provide images of detected objects that vary in perspective and level of obstruction. To improve the understanding of visual events, vision algorithms are implemented in a wireless sensor network. Methods were developed to fuse data from multiple cameras to improve object identification and location in the presence of obstructions. Training sets of images allow classification of objects into familiar categories. Feature-based object correspondence is used to track multiple...

  17. Protein sequence classification with improved extreme learning machine algorithms.

    Science.gov (United States)

    Cao, Jiuwen; Xiong, Lianglin

    2014-01-01

    Precisely classifying a protein sequence from a large biological protein sequence database plays an important role in developing competitive pharmacological products. Conventional methods, which compare an unseen sequence with all identified protein sequences and return the category index of the protein with the highest similarity score, are usually time-consuming. Therefore, it is urgent and necessary to build an efficient protein sequence classification system. In this paper, we study the performance of protein sequence classification using single-hidden-layer feedforward networks (SLFNs). The recent efficient extreme learning machine (ELM) and its variants are utilized as the training algorithms. The optimally pruned ELM (OP-ELM) is first employed for protein sequence classification in this paper. To further enhance the performance, an ensemble-based SLFN structure is constructed in which multiple SLFNs with the same number of hidden nodes and the same activation function are used as ensembles. For each ensemble, the same training algorithm is adopted. The final category index is derived using the majority voting method. Two approaches, namely the basic ELM and the OP-ELM, are adopted for the ensemble-based SLFNs. The performance is analyzed and compared with several existing methods using datasets obtained from the Protein Information Resource center. The experimental results show the superiority of the proposed algorithms.
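
    The basic ELM is compact enough to sketch in full: the input weights and biases of the single hidden layer are drawn at random and never trained, and only the output weights are solved in closed form by least squares. The 8-dimensional "protein-like" features below are an illustrative assumption, not the paper's sequence encoding.

```python
import numpy as np

rng = np.random.default_rng(11)

# Toy protein-like data: 2 classes, 8-dimensional feature vectors
# (e.g. amino-acid composition fractions), one-hot targets.
n = 200
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 8)) + 1.2 * y[:, None]
T = np.eye(2)[y]

# Basic ELM: random, untrained hidden layer + least-squares output.
L = 50                                   # number of hidden nodes
W = rng.normal(size=(8, L))              # random input weights (fixed)
b = rng.normal(size=L)                   # random biases (fixed)
H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # sigmoid hidden activations
beta = np.linalg.pinv(H) @ T             # output weights in closed form

pred = (H @ beta).argmax(axis=1)
print((pred == y).mean())  # high training accuracy on separable data
```

    The ensemble variant of the paper would train several such networks with different random hidden layers and combine their predictions by majority vote.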

  18. MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes

    Directory of Open Access Journals (Sweden)

    Yang Yi-Fan

    2007-03-01

    Full Text Available Abstract Background Despite remarkable success in the computational prediction of genes in Bacteria and Archaea, a lack of comprehensive understanding of prokaryotic gene structures prevents further elucidation of differences among genomes. It remains interesting to develop new ab initio algorithms which not only accurately predict genes, but also facilitate comparative studies of prokaryotic genomes. Results This paper describes a new prokaryotic gene-finding algorithm based on a comprehensive statistical model of protein-coding Open Reading Frames (ORFs) and Translation Initiation Sites (TISs). The former is based on a linguistic "Entropy Density Profile" (EDP) model of coding DNA sequence and the latter comprises several relevant features related to translation initiation. They are combined to form a so-called Multivariate Entropy Distance (MED) algorithm, MED 2.0, that incorporates several strategies in an iterative program. The iterations enable a non-supervised learning process that obtains a set of genome-specific parameters for the gene structure before the genes are predicted. Conclusion Results of extensive tests show that MED 2.0 achieves competitively high performance in gene prediction for both 5' and 3' end matches, compared to the current best prokaryotic gene finders. The advantage of MED 2.0 is particularly evident for GC-rich genomes and archaeal genomes. Furthermore, the genome-specific parameters given by MED 2.0 match the current understanding of prokaryotic genomes and may serve as tools for comparative genomic studies. In particular, MED 2.0 is shown to reveal divergent translation initiation mechanisms in archaeal genomes while making a more accurate prediction of TISs compared to existing gene finders and the current GenBank annotation.

  19. Feature extraction and classification algorithms for high dimensional data

    Science.gov (United States)

    Lee, Chulhee; Landgrebe, David

    1993-01-01

Feature extraction and classification algorithms for high dimensional data are investigated. Developments with regard to sensors for Earth observation are moving in the direction of providing much higher dimensional multispectral imagery than is now possible. In analyzing such high dimensional data, processing time becomes an important factor. With large increases in dimensionality and the number of classes, processing time will increase significantly. To address this problem, a multistage classification scheme is proposed which reduces the processing time substantially by eliminating unlikely classes from further consideration at each stage. Several truncation criteria are developed and the relationship between thresholds and the error caused by the truncation is investigated. Next an approach to feature extraction for classification is proposed based directly on the decision boundaries. It is shown that all the features needed for classification can be extracted from decision boundaries. A characteristic of the proposed method arises by noting that only a portion of the decision boundary is effective in discriminating between classes, and the concept of the effective decision boundary is introduced. The proposed feature extraction algorithm has several desirable properties: it predicts the minimum number of features necessary to achieve the same classification accuracy as in the original space for a given pattern recognition problem; and it finds the necessary feature vectors. The proposed algorithm does not deteriorate under the circumstances of equal means or equal covariances as some previous algorithms do. In addition, the decision boundary feature extraction algorithm can be used both for parametric and non-parametric classifiers. Finally, some problems encountered in analyzing high dimensional data are studied and possible solutions are proposed. First, the increased importance of the second order statistics in analyzing high dimensional data is recognized.

  20. Improvements on coronal hole detection in SDO/AIA images using supervised classification

    Science.gov (United States)

    Reiss, Martin A.; Hofmeister, Stefan J.; De Visscher, Ruben; Temmer, Manuela; Veronig, Astrid M.; Delouille, Véronique; Mampaey, Benjamin; Ahammer, Helmut

    2015-07-01

We demonstrate the use of machine learning algorithms in combination with segmentation techniques to distinguish coronal holes and filaments in SDO/AIA EUV images of the Sun. Based on two coronal hole detection techniques (intensity-based thresholding and SPoCA), we prepared datasets of manually labeled coronal hole and filament channel regions present on the Sun during the time range 2011-2013. By mapping the extracted regions from EUV observations onto HMI line-of-sight magnetograms we also include their magnetic characteristics. We computed shape measures from the segmented binary maps as well as first order and second order texture statistics from the segmented regions in the EUV images and magnetograms. These attributes were used in data mining investigations to identify the best-performing rule for differentiating between coronal holes and filament channels. We applied several classifiers, namely Support Vector Machine (SVM), Linear Support Vector Machine, Decision Tree, and Random Forest, and found that all classification rules achieve good results in general, with the linear SVM providing the best performance (with a true skill statistic of ≈ 0.90). Additional information from magnetic field data systematically improves the performance across all four classifiers for the SPoCA detection. Since the calculation is inexpensive in computing time, this approach is well suited for applications on real-time data. This study demonstrates how a machine learning approach may help improve upon an unsupervised feature extraction method.
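The true skill statistic quoted above (≈ 0.90 for the linear SVM) is computed from the binary confusion matrix; a minimal sketch in Python (function name and labels are illustrative):

```python
def true_skill_statistic(y_true, y_pred):
    """TSS = hit rate - false alarm rate for a binary classification.

    Ranges from -1 to +1; +1 means perfect discrimination between the
    two classes (here: coronal hole = 1, filament channel = 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn) - fp / (fp + tn)
```

Unlike plain accuracy, TSS is insensitive to class imbalance, which matters when one region type far outnumbers the other in a labeled sample.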

  1. Exploring high dimensional data with Butterfly: a novel classification algorithm based on discrete dynamical systems.

    Science.gov (United States)

    Geraci, Joseph; Dharsee, Moyez; Nuin, Paulo; Haslehurst, Alexandria; Koti, Madhuri; Feilotter, Harriet E; Evans, Ken

    2014-03-01

We introduce a novel method for visualizing high dimensional data via a discrete dynamical system. This method provides a 2D representation of the relationship between subjects according to a set of variables without geometric projections, transformed axes or principal components. The algorithm exploits a memory-type mechanism inherent in a certain class of discrete dynamical systems collectively referred to as the chaos game, which are closely related to iterated function systems. The goal of the algorithm is to create a human readable representation of high dimensional patient data that is capable of detecting unrevealed subclusters of patients from within anticipated classifications. This provides a mechanism to further pursue a more personalized exploration of pathology when used with medical data. For clustering and classification protocols, the dynamical system portion of the algorithm is designed to come after some feature selection filter and before some model evaluation (e.g. clustering accuracy) protocol. In the version given here, a univariate feature selection step is performed (in practice more complex feature selection methods are used), a discrete dynamical system is driven by this reduced set of variables (which results in a set of 2D cluster models), these models are evaluated for their accuracy (according to a user-defined binary classification), and finally a visual representation of the top classification models is returned. Thus, in addition to the visualization component, this methodology can be used for both supervised and unsupervised machine learning, as the top performing models are returned in the protocol we describe here. Butterfly, the algorithm we introduce and provide working code for, uses a discrete dynamical system to classify high dimensional data and provide a 2D representation of the relationship between subjects. We report results on three datasets (two in the article; one in the appendix) including a public lung cancer
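The chaos game that drives this family of methods can be illustrated with the classic iteration: each symbol in a sequence pulls the current point halfway toward the corner assigned to that symbol, so the end point encodes the sequence history with exponentially decaying memory. A minimal sketch (the corner assignment and alphabet are chosen for illustration, not taken from Butterfly):

```python
def chaos_game(sequence, corners):
    """Map a symbol sequence to 2D points via the chaos game.

    Each step moves the current point halfway toward the corner
    assigned to the current symbol; the end point is a compact,
    memory-like summary of the sequence."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    trajectory = [(x, y)]
    for s in sequence:
        cx, cy = corners[s]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        trajectory.append((x, y))
    return trajectory

# Illustrative corner assignment for a 4-letter alphabet.
CORNERS = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 1.0), "D": (1.0, 1.0)}
```

Later symbols dominate the end point while earlier ones contribute exponentially smaller displacements, which is the "memory-type mechanism" the abstract refers to.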

  2. Performance comparison of SLFN training algorithms for DNA microarray classification.

    Science.gov (United States)

    Huynh, Hieu Trung; Kim, Jung-Ja; Won, Yonggwan

    2011-01-01

The classification of biological samples measured by DNA microarrays has been a major topic of interest in the last decade, and several approaches to this topic have been investigated. However, classifying the high-dimensional data of microarrays still presents a challenge to researchers. In this chapter, we focus on evaluating the performance of training algorithms for single hidden layer feedforward neural networks (SLFNs) in classifying DNA microarrays. The training algorithms consist of backpropagation (BP), the extreme learning machine (ELM), regularized least squares ELM (RLS-ELM), and a recently proposed effective algorithm called neural-SVD. We also compare the performance of the neural network approaches with popular classifiers such as the support vector machine (SVM), principal component analysis (PCA) and Fisher discriminant analysis (FDA).
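Of the SLFN training algorithms compared, the extreme learning machine is the easiest to sketch: hidden-layer weights are drawn at random and never trained, and only the output weights are fit in closed form by least squares. A toy pure-Python implementation (dimensions and data are illustrative, not microarray-scale):

```python
import math
import random

def solve(A, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

class ELM:
    """Extreme learning machine sketch: random sigmoid hidden layer,
    output weights fit in closed form by least squares."""

    def __init__(self, n_hidden=5, seed=0):
        self.n_hidden, self.seed = n_hidden, seed

    def _features(self, x):
        h = [1.0]  # constant bias feature
        for w, b in self.params:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            h.append(1.0 / (1.0 + math.exp(-z)))
        return h

    def fit(self, X, y):
        rng = random.Random(self.seed)
        d = len(X[0])
        # hidden layer weights are random and never trained
        self.params = [([rng.uniform(-1.0, 1.0) for _ in range(d)], rng.uniform(-1.0, 1.0))
                       for _ in range(self.n_hidden)]
        H = [self._features(x) for x in X]
        m = len(H[0])
        # normal equations (H^T H) beta = H^T y
        A = [[sum(h[i] * h[j] for h in H) for j in range(m)] for i in range(m)]
        c = [sum(h[i] * t for h, t in zip(H, y)) for i in range(m)]
        self.beta = solve(A, c)
        return self

    def predict(self, x):
        return sum(b * h for b, h in zip(self.beta, self._features(x)))
```

Because the output layer is a plain least-squares fit, training is a single linear solve rather than an iterative gradient descent, which is the speed advantage ELM-style methods have over backpropagation.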

  3. Preliminary results from the ASF/GPS ice classification algorithm

    Science.gov (United States)

    Cunningham, G.; Kwok, R.; Holt, B.

    1992-01-01

The European Space Agency Remote Sensing Satellite (ERS-1) carried a C-band synthetic aperture radar (SAR) to study the Earth's polar regions. The radar returns from sea ice can be used to infer properties of the ice, including ice type. An algorithm has been developed for the Alaska SAR Facility (ASF)/Geophysical Processor System (GPS) to infer ice type from SAR observations over sea ice and open water. The algorithm utilizes look-up tables containing expected backscatter values for various ice types. An analysis has been made of two overlapping strips with 14 SAR images. The backscatter values of specific ice regions were sampled to study the backscatter characteristics of the ice in time and space. Results show both stability of the backscatter values in time and a good separation of multiyear and first-year ice signals, verifying the approach used in the classification algorithm.

  4. An Automated Algorithm to Screen Massive Training Samples for a Global Impervious Surface Classification

    Science.gov (United States)

    Tan, Bin; Brown de Colstoun, Eric; Wolfe, Robert E.; Tilton, James C.; Huang, Chengquan; Smith, Sarah E.

    2012-01-01

An algorithm is developed to automatically screen outliers from massive training samples for the Global Land Survey - Imperviousness Mapping Project (GLS-IMP). GLS-IMP aims to produce a global 30 m spatial resolution impervious cover data set for the years 2000 and 2010 based on the Landsat Global Land Survey (GLS) data set. This unprecedented high resolution impervious cover data set is not only significant for urbanization studies but also needed for global carbon, hydrology, and energy balance research. A supervised classification method, regression tree, is applied in this project, and a set of accurate training samples is the key to supervised classification. Here we developed the global scale training samples from fine resolution (about 1 m) satellite data (Quickbird and Worldview2), and then aggregated the fine resolution impervious cover map to 30 m resolution. In order to improve the classification accuracy, the training samples should be screened before being used to train the regression tree. It is impossible to manually screen 30 m resolution training samples collected globally. For example, in Europe alone there are 174 training sites, with site sizes ranging from 4.5 km by 4.5 km to 8.1 km by 3.6 km and over six million training samples in total. Therefore, we developed this automated statistics-based algorithm to screen the training samples at two levels: site level and scene level. At the site level, all the training samples are divided into 10 groups according to the percentage of impervious surface within a sample pixel; the samples falling in each 10% interval form one group. For each group, both univariate and multivariate outliers are detected and removed. Then the screening process escalates to the scene level. A similar screening process, but with a looser threshold, is applied at the scene level to account for possible variance due to site differences. We do not perform the screening process across scenes because the scenes might vary due to
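The site-level step can be sketched as follows: bin samples by 10% impervious-fraction intervals, then drop within-bin outliers. The abstract does not specify the univariate test, so a median absolute deviation (MAD) rule stands in for it here, and the multivariate step is omitted:

```python
def screen_training_samples(samples, k=5.0):
    """Screen training samples grouped by 10% impervious-fraction bins.

    samples: list of (impervious_fraction, feature_value) pairs.
    Within each bin, drop samples whose feature deviates from the bin
    median by more than k times the median absolute deviation (MAD).
    The MAD rule is an illustrative stand-in for the project's
    unspecified univariate outlier test."""
    def median(vals):
        s = sorted(vals)
        n = len(s)
        return s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])

    bins = {}
    for frac, feat in samples:
        bins.setdefault(min(int(frac * 10), 9), []).append((frac, feat))

    kept = []
    for group in bins.values():
        med = median([f for _, f in group])
        mad = median([abs(f - med) for _, f in group])
        if mad == 0.0:  # degenerate bin: nothing to screen against
            kept.extend(group)
            continue
        kept.extend(s for s in group if abs(s[1] - med) <= k * mad)
    return kept
```

A robust statistic such as the MAD is a natural choice here because the whole point of the screen is that the mean and variance are themselves contaminated by the outliers being removed.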

  5. A Generalized Image Scene Decomposition-Based System for Supervised Classification of Very High Resolution Remote Sensing Imagery

    Directory of Open Access Journals (Sweden)

    ZhiYong Lv

    2016-09-01

Full Text Available Very high resolution (VHR) remote sensing images are widely used for land cover classification. However, to the best of our knowledge, few approaches have been shown to improve classification accuracies through image scene decomposition. In this paper, a simple yet powerful observational scene scale decomposition (OSSD)-based system is proposed for the classification of VHR images. Different from traditional methods, the OSSD-based system aims to improve classification performance by decomposing the complexity of an image's content. First, an image scene is divided into sub-image blocks through segmentation to decompose the image content. Subsequently, each sub-image block is classified separately, or each block is first processed through an image filter or a spectral–spatial feature extraction method, and each processed segment is then taken as the feature input of a classifier. Finally, the classified sub-maps are fused together for accuracy evaluation. The effectiveness of our proposed approach was investigated through experiments performed on different images with different supervised classifiers, namely, the support vector machine, k-nearest neighbor, naive Bayes classifier, and maximum likelihood classifier. Compared with the accuracy achieved without OSSD processing, the accuracy of each classifier improved significantly, and our proposed approach shows outstanding performance in terms of classification accuracy.

  6. Slow Learner Prediction using Multi-Variate Naïve Bayes Classification Algorithm

    Directory of Open Access Journals (Sweden)

    Shiwani Rana

    2016-12-01

Full Text Available Machine Learning is a field of computer science that learns from data by studying algorithms and their constructions. In machine learning, algorithms help to make predictions for specific inputs. Classification is a supervised learning approach which maps a data item into predefined classes. For predicting slow learners in an institute, a modified Naïve Bayes algorithm is implemented. The implementation is carried out using Python. It takes into account a combination of likewise multi-valued attributes. A dataset of 60 students of BE (Information Technology) Third Semester for the subject of Digital Electronics at the University Institute of Engineering and Technology (UIET), Panjab University (PU), Chandigarh, India is taken to carry out the simulations. The analysis is done by choosing the forty-eight most significant attributes. The experimental results have shown that the modified Naïve Bayes model outperforms the Naïve Bayes classifier in accuracy but requires significant improvement in terms of elapsed time. Using the modified Naïve Bayes approach, the accuracy is found to be 71.66%, whereas it is 66.66% using the existing Naïve Bayes model. Further, a comparison is drawn by using the WEKA tool, where an accuracy of 58.33% is obtained for Naïve Bayes.
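Naïve Bayes over multi-valued attributes of the kind described here can be sketched as a categorical Naïve Bayes with Laplace smoothing; the attribute values and data below are invented for illustration, not taken from the UIET dataset:

```python
import math
from collections import Counter, defaultdict

class CategoricalNB:
    """Naive Bayes for multi-valued (categorical) attributes with Laplace smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.priors = Counter(y)        # class counts, for P(class)
        self.n = len(y)
        n_attrs = len(X[0])
        # value counts per (class, attribute position)
        self.counts = defaultdict(Counter)
        self.values = [set() for _ in range(n_attrs)]
        for row, label in zip(X, y):
            for j, v in enumerate(row):
                self.counts[(label, j)][v] += 1
                self.values[j].add(v)
        return self

    def predict(self, row):
        best, best_score = None, float("-inf")
        for c in self.classes:
            score = math.log(self.priors[c] / self.n)
            for j, v in enumerate(row):
                num = self.counts[(c, j)][v] + 1            # Laplace smoothing
                den = self.priors[c] + len(self.values[j])
                score += math.log(num / den)
            if score > best_score:
                best, best_score = c, score
        return best
```

The log-space scoring avoids underflow when many attributes are multiplied together, which matters once a dataset has dozens of attributes as in the study above.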

  7. How to measure metallicity from five-band photometry with supervised machine learning algorithms

    CERN Document Server

    Acquaviva, Viviana

    2015-01-01

We demonstrate that it is possible to measure metallicity from the SDSS five-band photometry to better than 0.1 dex using supervised machine learning algorithms. Using spectroscopic estimates of metallicity as ground truth, we build, optimize and train several estimators to predict metallicity. We use the observed photometry, as well as derived quantities such as stellar mass and photometric redshift, as features, and we build two sample data sets at median redshifts of 0.103 and 0.218 and median r-band magnitudes of 17.5 and 18.3 respectively. We find that ensemble methods, such as Random Forests of Trees and Extremely Randomized Trees, and Support Vector Machines all perform comparably well and can measure metallicity with a Root Mean Square Error (RMSE) of 0.081 and 0.090 for the two data sets when all objects are included. The fraction of outliers (objects for which the difference between true and predicted metallicity is larger than 0.2 dex) is only 2.2% and 3.9% respectively, and the RMSE decreases to 0.0...
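The two summary metrics used in this record, RMSE and the fraction of outliers beyond 0.2 dex, are straightforward to reproduce; a sketch with made-up metallicity values:

```python
import math

def rmse(true_vals, pred_vals):
    """Root mean square error between true and predicted metallicities."""
    n = len(true_vals)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true_vals, pred_vals)) / n)

def outlier_fraction(true_vals, pred_vals, threshold=0.2):
    """Fraction of objects whose |true - predicted| exceeds `threshold` dex."""
    n = len(true_vals)
    return sum(1 for t, p in zip(true_vals, pred_vals) if abs(t - p) > threshold) / n
```

Reporting both matters: RMSE summarizes typical error, while the outlier fraction catches the rare catastrophic failures that an RMSE of 0.08-0.09 dex can hide.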

  8. Novel Approach to Unsupervised Change Detection Based on a Robust Semi-Supervised FCM Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Pan Shao

    2016-03-01

Full Text Available This study presents a novel approach for unsupervised change detection in multitemporal remotely sensed images. This method addresses the problem of the analysis of the difference image by proposing a novel and robust semi-supervised fuzzy C-means (RSFCM) clustering algorithm. Compared with existing change detection methods, which mainly use difference intensity levels and spatial context, the advantage of the RSFCM is that it further introduces pseudolabels from the difference image. First, the patterns with a high probability of belonging to the changed or unchanged class are identified by selectively thresholding the difference image histogram. Second, the pseudolabels of these nearly certain pixel-patterns are jointly exploited with the intensity levels and spatial information in the properly defined RSFCM classifier in order to discriminate the changed pixels from the unchanged pixels. Specifically, labeling knowledge is used to guide the RSFCM clustering process to enhance the change information and obtain a more accurate membership; information on spatial context helps to lower the effect of noise and outliers by modifying the membership. RSFCM can detect more changes and provide noise immunity through the synergistic exploitation of pseudolabels and spatial context. The two main contributions of this study are as follows: (1) it proposes the idea of combining the three information types from the difference image, namely, (a) intensity levels, (b) labels, and (c) spatial context; and (2) it develops the novel RSFCM algorithm for image segmentation and forms the proposed change detection framework. The proposed method is effective and efficient for change detection as confirmed by the six experimental results of this study.
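The first step, selecting nearly certain changed/unchanged pixels by thresholding the difference-image histogram, can be sketched as a simple quantile rule (the quantiles here are illustrative; the paper's actual threshold selection is more involved):

```python
def pseudolabels(diff_values, low_q=0.1, high_q=0.9):
    """Assign pseudolabels to difference-image pixels.

    Pixels with very low difference intensity are labeled unchanged (0),
    pixels with very high difference intensity are labeled changed (1),
    and everything in between stays unlabeled (None), left for the
    semi-supervised FCM clustering to decide."""
    ordered = sorted(diff_values)
    n = len(ordered)
    t_low = ordered[int(low_q * (n - 1))]
    t_high = ordered[int(high_q * (n - 1))]
    return [0 if v <= t_low else 1 if v >= t_high else None for v in diff_values]
```

The point of labeling only the extremes is that these pseudolabels are almost certainly correct, so they can safely guide the clustering of the ambiguous middle of the histogram.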

  9. Unsupervised classification of multivariate geostatistical data: Two algorithms

    Science.gov (United States)

    Romary, Thomas; Ors, Fabien; Rivoirard, Jacques; Deraisme, Jacques

    2015-12-01

With the increasing development of remote sensing platforms and the evolution of sampling facilities in the mining and oil industry, spatial datasets are becoming increasingly large, include a growing number of variables, and cover wider and wider areas. Therefore, it is often necessary to split the domain of study to account for radically different behaviors of the natural phenomenon over the domain and to simplify the subsequent modeling step. The definition of these areas can be seen as a problem of unsupervised classification, or clustering, where we try to divide the domain into homogeneous domains with respect to the values taken by the variables at hand. The application of classical clustering methods, designed for independent observations, does not ensure the spatial coherence of the resulting classes. Image segmentation methods, based on e.g. Markov random fields, are not adapted to irregularly sampled data. Other existing approaches, based on mixtures of Gaussian random functions estimated via the expectation-maximization algorithm, are limited to reasonable sample sizes and a small number of variables. In this work, we propose two algorithms based on adaptations of classical algorithms to multivariate geostatistical data. Both algorithms are model-free and can handle large volumes of multivariate, irregularly spaced data. The first one proceeds by agglomerative hierarchical clustering. The spatial coherence is ensured by a proximity condition imposed for two clusters to merge. This proximity condition relies on a graph organizing the data in the coordinate space. The hierarchical algorithm can then be seen as a graph-partitioning algorithm. Following this interpretation, a spatial version of the spectral clustering algorithm is also proposed. The performance of both algorithms is assessed on toy examples and a mining dataset.
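The first algorithm's key idea, agglomerative clustering in which only spatially adjacent clusters may merge, can be sketched in a few lines; single linkage on the attribute values and a precomputed adjacency graph stand in for the paper's proximity condition:

```python
def spatial_agglomerative(values, edges, n_clusters):
    """Agglomerative clustering restricted to a spatial adjacency graph.

    values: attribute value per sample; edges: pairs of spatially
    neighboring sample indices. Only clusters connected by at least one
    graph edge may merge (the spatial coherence condition); the merge
    cost is single-linkage distance on the attribute values."""
    clusters = [{i} for i in range(len(values))]
    edge_set = {frozenset(e) for e in edges}

    def adjacent(a, b):
        return any(frozenset((i, j)) in edge_set for i in a for j in b)

    def linkage(a, b):
        return min(abs(values[i] - values[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                if adjacent(clusters[p], clusters[q]):
                    d = linkage(clusters[p], clusters[q])
                    if best is None or d < best[0]:
                        best = (d, p, q)
        if best is None:  # graph disconnected: nothing left to merge
            break
        _, p, q = best
        clusters[p] |= clusters[q]
        del clusters[q]
    return clusters
```

Without the adjacency test this is ordinary single-linkage clustering; with it, two clusters with similar values but no spatial contact can never be merged, which is exactly the spatial coherence the abstract describes.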

  10. Improved RMR Rock Mass Classification Using Artificial Intelligence Algorithms

    Science.gov (United States)

    Gholami, Raoof; Rasouli, Vamegh; Alimoradi, Andisheh

    2013-09-01

    Rock mass classification systems such as rock mass rating (RMR) are very reliable means to provide information about the quality of rocks surrounding a structure as well as to propose suitable support systems for unstable regions. Many correlations have been proposed to relate measured quantities such as wave velocity to rock mass classification systems to limit the associated time and cost of conducting the sampling and mechanical tests conventionally used to calculate RMR values. However, these empirical correlations have been found to be unreliable, as they usually overestimate or underestimate the RMR value. The aim of this paper is to compare the results of RMR classification obtained from the use of empirical correlations versus machine-learning methodologies based on artificial intelligence algorithms. The proposed methods were verified based on two case studies located in northern Iran. Relevance vector regression (RVR) and support vector regression (SVR), as two robust machine-learning methodologies, were used to predict the RMR for tunnel host rocks. RMR values already obtained by sampling and site investigation at one tunnel were taken into account as the output of the artificial networks during training and testing phases. The results reveal that use of empirical correlations overestimates the predicted RMR values. RVR and SVR, however, showed more reliable results, and are therefore suggested for use in RMR classification for design purposes of rock structures.

  11. Hardware Accelerators Targeting a Novel Group Based Packet Classification Algorithm

    Directory of Open Access Journals (Sweden)

    O. Ahmed

    2013-01-01

Full Text Available Packet classification is a ubiquitous and key building block for many critical network devices. However, it remains one of the main bottlenecks faced when designing fast network devices. In this paper, we propose a novel Group Based Search packet classification Algorithm (GBSA) that is scalable, fast, and efficient. GBSA consumes an average of 0.4 megabytes of memory for a 10 k rule set. The worst-case classification time per packet is 2 microseconds, and the preprocessing speed is 3 M rules/second based on a Xeon processor operating at 3.4 GHz. When compared with other state-of-the-art classification techniques, the results showed that GBSA outperforms the competition with respect to speed, memory usage, and processing time. Moreover, GBSA is amenable to implementation in hardware. Three different hardware implementations are also presented in this paper, including an Application Specific Instruction Set Processor (ASIP) implementation and two pure Register-Transfer Level (RTL) implementations based on Impulse-C and Handel-C flows, respectively. Speedups achieved with these hardware accelerators ranged from 9x to 18x compared with a pure software implementation running on a Xeon processor.

  12. Optimized Audio Classification and Segmentation Algorithm by Using Ensemble Methods

    Directory of Open Access Journals (Sweden)

    Saadia Zahid

    2015-01-01

Full Text Available Audio segmentation is a basis for multimedia content analysis, which is the most important and widely used application nowadays. An optimized audio classification and segmentation algorithm is presented in this paper that segments a superimposed audio stream on the basis of its content into four main audio types: pure-speech, music, environment sound, and silence. The proposed algorithm preserves important audio content and reduces the misclassification rate without using a large amount of training data; it handles noise and is suitable for real-time applications. Noise in an audio stream is segmented out as environment sound. A hybrid classification approach is used: bagged support vector machines (SVMs) with artificial neural networks (ANNs). The audio stream is first classified into speech and non-speech segments by using bagged support vector machines; the non-speech segment is further classified into music and environment sound by using artificial neural networks; and lastly, the speech segment is classified into silence and pure-speech segments by a rule-based classifier. Minimal data is used for training the classifiers; ensemble methods are used for minimizing the misclassification rate, and approximately 98% accurate segments are obtained. The resulting fast and efficient algorithm can be used with real-time multimedia applications.

  13. Implementation of several mathematical algorithms to breast tissue density classification

    Science.gov (United States)

    Quintana, C.; Redondo, M.; Tirao, G.

    2014-02-01

The accuracy of mammographic abnormality detection methods is strongly dependent on breast tissue characteristics, where dense breast tissue can hide lesions, causing cancer to be detected at later stages. In addition, breast tissue density is widely accepted to be an important risk indicator for the development of breast cancer. This paper presents the implementation and the performance of different mathematical algorithms designed to standardize the categorization of mammographic images according to the American College of Radiology classifications. These mathematical techniques are based on calculations of intrinsic properties and on comparison with an ideal homogeneous image, using joint entropy, mutual information, normalized cross correlation and the index Q as categorization parameters. The algorithms were evaluated on 100 cases from the mammographic data sets provided by the Ministerio de Salud de la Provincia de Córdoba, Argentina, Programa de Prevención del Cáncer de Mama (Department of Public Health, Córdoba, Argentina, Breast Cancer Prevention Program). The obtained breast classifications were compared with expert medical diagnoses, showing good performance. The implemented algorithms revealed high potential for classifying breasts into tissue density categories.
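Two of the comparison-based categorization parameters, joint entropy and mutual information, can be computed directly from the joint gray-level histogram of two images; a compact sketch for flattened pixel lists:

```python
import math
from collections import Counter

def entropy(a):
    """Shannon entropy (bits) of a gray-level image given as a flat pixel list."""
    n = len(a)
    return -sum((c / n) * math.log2(c / n) for c in Counter(a).values())

def joint_entropy(a, b):
    """Shannon joint entropy (bits) of two equally sized gray-level images."""
    n = len(a)
    joint = Counter(zip(a, b))
    return -sum((c / n) * math.log2(c / n) for c in joint.values())

def mutual_information(a, b):
    """MI(A; B) = H(A) + H(B) - H(A, B)."""
    return entropy(a) + entropy(b) - joint_entropy(a, b)
```

In the setting above, each mammogram is compared against a reference image, and the resulting entropy and similarity values serve as one-number summaries of tissue heterogeneity for density categorization.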

  14. Review of WiMAX Scheduling Algorithms and Their Classification

    Science.gov (United States)

    Yadav, A. L.; Vyavahare, P. D.; Bansod, P. P.

    2014-07-01

Providing quality of service (QoS) in wireless communication networks has become an important consideration for supporting a variety of applications. IEEE 802.16 based WiMAX is the most promising technology for broadband wireless access, with the best QoS features for triple play (voice, video and data) service users. Unlike wired networks, QoS support is difficult in wireless networks due to the variable and unpredictable nature of wireless channels. In the transmission of voice and video, the main issue is the allocation of available resources among users to meet QoS criteria such as delay, jitter and throughput requirements, so as to maximize goodput and minimize power consumption while keeping the algorithm feasibly flexible and ensuring system scalability. WiMAX assures guaranteed QoS by including several mechanisms at the MAC layer, such as admission control and scheduling. Packet scheduling is the process of resolving contention for bandwidth, which determines the allocation of bandwidth among users and their transmission order. Various approaches to the classification of scheduling algorithms in WiMAX have appeared in the literature: homogeneous, hybrid and opportunistic scheduling algorithms. This paper consolidates the parameters and performance metrics that need to be considered in developing a scheduler, and surveys recently proposed scheduling algorithms, their shortcomings, assumptions, suitability and the improvement issues associated with these uplink scheduling algorithms.

  15. Supervised classification of aerial imagery and multi-source data fusion for flood assessment

    Science.gov (United States)

    Sava, E.; Harding, L.; Cervone, G.

    2015-12-01

Floods are among the most devastating natural hazards, and the ability to produce an accurate and timely flood assessment before, during, and after an event is critical for their mitigation and response. Remote sensing technologies have become the de facto approach for observing the Earth and its environment. However, satellite remote sensing data are not always available, so it is crucial to develop new techniques to produce flood assessments during and after an event. Recent advancements in techniques for fusing remote sensing with near-real-time heterogeneous datasets have allowed emergency responders to more efficiently extract increasingly precise and relevant knowledge from the available information. This research presents a fusion technique using satellite remote sensing imagery coupled with non-authoritative data such as Civil Air Patrol (CAP) imagery and tweets. A new computational methodology is proposed based on machine learning algorithms to automatically identify water pixels in CAP imagery. Specifically, wavelet transformations are paired with multiple classifiers, run in parallel, to build models discriminating water and non-water regions. The learned classification models are first tested against a set of control cases, and then used to automatically classify each image separately. A measure of uncertainty is computed for each pixel in an image, proportional to the number of models classifying the pixel as water. Geo-tagged tweets are continuously harvested, stored in a MongoDB database, and queried in real time. They are fused with the CAP classified data and with satellite remote sensing derived flood extents to produce comprehensive flood assessment maps. The final maps are then compared with FEMA-generated flood extents to assess their accuracy. The proposed methodology is applied to two test cases, relative to the 2013 floods in Boulder, CO, and the 2015 floods in Texas.
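The per-pixel uncertainty measure, proportional to the number of parallel models that classify a pixel as water, reduces to a vote fraction; a minimal sketch (the model outputs are illustrative):

```python
def water_vote_fraction(model_predictions):
    """Fuse binary water masks from several models run in parallel.

    model_predictions: one list of 0/1 (non-water/water) labels per
    model, all over the same pixels. Returns, per pixel, the fraction
    of models voting 'water': 0.5 is maximally uncertain, 0 or 1 is a
    unanimous vote."""
    n_models = len(model_predictions)
    return [sum(votes) / n_models for votes in zip(*model_predictions)]
```

Thresholding this fraction gives a fused water mask, while the fraction itself is the uncertainty layer that can be propagated into the final flood assessment map.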

16. Video Analytics Algorithm for Automatic Vehicle Classification (Intelligent Transport System)

    Directory of Open Access Journals (Sweden)

Arta Iftikhar

    2013-04-01

Full Text Available Automated vehicle detection and classification is an important component of intelligent transport systems. Due to its significant importance in various fields, such as the avoidance of traffic accidents, toll collection, congestion avoidance, monitoring of terrorist activities, and security and surveillance systems, the intelligent transport system has become an important field of study. Various technologies have been used for detecting and classifying vehicles automatically. Automated vehicle detection is broadly divided into two types: hardware-based and software-based detection. Various algorithms have been implemented to classify different vehicles from videos. In this paper an efficient and economical solution for automatic vehicle detection and classification is proposed. The proposed system first isolates the object through background subtraction, followed by vehicle detection using an ontology. Vehicle detection is based on low level features such as shape, size, and spatial location. Finally, the system classifies each vehicle into one of the known vehicle classes based on size.

17. Study on Web page content classification technology based on semi-supervised learning

    Institute of Scientific and Technical Information of China (English)

Zhao Fuqun

    2016-01-01

To address the key issue of how to use both labeled and unlabeled data for Web page classification, a classifier combining a generative model with a discriminative model is explored. Maximum likelihood estimation is adopted on the unlabeled training set to construct a semi-supervised classifier with good classification performance. A Dirichlet-multinomial mixture distribution is used to model the text, and a hybrid model suitable for semi-supervised learning is proposed. Since the EM algorithm for semi-supervised learning converges too quickly and easily falls into local optima, two intelligent optimization methods, the simulated annealing algorithm and the genetic algorithm, are introduced for analysis and processing. Combining these two algorithms yields a new intelligent semi-supervised classification algorithm, whose feasibility is verified.

  18. Supervised Classification of Benthic Reflectance in Shallow Subtropical Waters Using a Generalized Pixel-Based Classifier across a Time Series

    Directory of Open Access Journals (Sweden)

    Tara Blakey

    2015-04-01

Full Text Available We tested a supervised classification approach with Landsat 5 Thematic Mapper (TM) data for time-series mapping of seagrass in a subtropical lagoon. Seagrass meadows are an integral link between marine and inland ecosystems and are at risk from upstream processes such as runoff and erosion. Despite the prevalence of image-specific approaches, the classification accuracies we achieved show that pixel-based spectral classes may be generalized and applied to a time series of images that were not included in the classifier training. We employed in-situ data on seagrass abundance from 2007 to 2011 to train and validate a classification model. We created depth-invariant bands from TM bands 1, 2, and 3 to correct for variations in water column depth prior to building the classification model. In-situ data showed that mean total seagrass cover remained relatively stable over the study area and period, with seagrass cover generally denser in the west than in the east. Our approach achieved mapping accuracies (67% and 76% for two validation years) comparable with those attained using spectral libraries, but was simpler to implement. We produced a series of annual maps illustrating inter-annual variability in seagrass occurrence. Accuracies may be improved in future work by better addressing the spatial mismatch between the pixel size of the remotely sensed data and the footprint of the field data, and by employing atmospheric correction techniques that normalize reflectances across images.
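The depth-invariant bands follow Lyzenga's classic construction: after log-transforming two bands, the ratio of their water attenuation coefficients is estimated from the variances and covariance over pixels of a uniform bottom type, and the index ln(Li) - (ki/kj)·ln(Lj) then no longer depends on depth. A sketch with synthetic, noise-free radiances (the numbers are illustrative, not Landsat values):

```python
import math

def attenuation_ratio(log_i, log_j):
    """Estimate k_i / k_j from log-radiances over a uniform substrate
    (Lyzenga's variance/covariance method)."""
    n = len(log_i)
    mi, mj = sum(log_i) / n, sum(log_j) / n
    var_i = sum((x - mi) ** 2 for x in log_i) / n
    var_j = sum((x - mj) ** 2 for x in log_j) / n
    cov = sum((x - mi) * (y - mj) for x, y in zip(log_i, log_j)) / n
    a = (var_i - var_j) / (2.0 * cov)
    return a + math.sqrt(a * a + 1.0)

def depth_invariant_index(band_i, band_j):
    """Combine two bands into a depth-invariant bottom-reflectance index."""
    log_i = [math.log(v) for v in band_i]
    log_j = [math.log(v) for v in band_j]
    r = attenuation_ratio(log_i, log_j)
    return [xi - r * xj for xi, xj in zip(log_i, log_j)]
```

On real imagery the estimation pixels must come from one substrate (e.g. bare sand) at varying depth; the index can then separate bottom types such as seagrass and sand regardless of how deep they sit.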

  19. Classification of damage in structural systems using time series analysis and supervised and unsupervised pattern recognition techniques

    Science.gov (United States)

    Omenzetter, Piotr; de Lautour, Oliver R.

    2010-04-01

    Developed for studying long, periodic records of various measured quantities, time series analysis methods are inherently suited and offer interesting possibilities for Structural Health Monitoring (SHM) applications. However, their use in SHM can still be regarded as an emerging application and deserves more studies. In this research, Autoregressive (AR) models were used to fit experimental acceleration time histories from two experimental structural systems, a 3-storey bookshelf-type laboratory structure and the ASCE Phase II SHM Benchmark Structure, in healthy and several damaged states. The coefficients of the AR models were chosen as damage sensitive features. Preliminary visual inspection of the large, multidimensional sets of AR coefficients to check the presence of clusters corresponding to different damage severities was achieved using Sammon mapping - an efficient nonlinear data compression technique. Systematic classification of damage into states based on the analysis of the AR coefficients was achieved using two supervised classification techniques: Nearest Neighbor Classification (NNC) and Learning Vector Quantization (LVQ), and one unsupervised technique: Self-organizing Maps (SOM). This paper discusses the performance of AR coefficients as damage sensitive features and compares the efficiency of the three classification techniques using experimental data.
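
    The feature-extraction step described above, fitting AR coefficients to acceleration histories and classifying them by nearest neighbour, can be sketched as follows. This is a generic least-squares AR fit with invented names, not the authors' code.

```python
import numpy as np

def ar_coefficients(signal, order):
    """Least-squares fit of an AR(p) model y_t = sum_k a_k * y_{t-k}.
    The fitted coefficients serve as damage-sensitive features."""
    n = len(signal)
    # column k holds the lag-(k+1) values aligned with the targets
    X = np.column_stack([signal[order - k - 1 : n - k - 1] for k in range(order)])
    target = signal[order:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

def nearest_neighbour_label(train_feats, train_labels, feats):
    """1-NN classification of an AR feature vector against labeled exemplars."""
    d = np.linalg.norm(train_feats - feats, axis=1)
    return train_labels[np.argmin(d)]
```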

  20. A Computational Algorithm for Metrical Classification of Verse

    Directory of Open Access Journals (Sweden)

    Rama N.

    2010-03-01

    Full Text Available The science of versification and analysis of verse in Sanskrit is governed by rules of metre or chandas. Such metre-wise classification of verses has numerous uses for scholars and researchers alike, such as in the study of poets and their style of Sanskrit poetical works. This paper presents a comprehensive computational scheme and set of algorithms to identify the metre of verses given as Sanskrit (Unicode) or English E-text (Latin Unicode). The paper also demonstrates the use of euphonic conjunction rules to correct verses in which these conjunctions, which are compulsory in verse, have erroneously not been implemented.

  1. Chaotic genetic algorithm for gene selection and classification problems.

    Science.gov (United States)

    Chuang, Li-Yeh; Yang, Cheng-San; Li, Jung-Chike; Yang, Cheng-Hong

    2009-10-01

    Pattern recognition techniques suffer from a well-known curse, the dimensionality problem. The microarray data classification problem is a classical complex pattern recognition problem. Selecting relevant genes from microarray data poses a formidable challenge to researchers due to the high-dimensionality of features, multiclass categories being involved, and the usually small sample size. The goal of feature (gene) selection is to select those subsets of differentially expressed genes that are potentially relevant for distinguishing the sample classes. In this paper, information gain and a chaotic genetic algorithm are proposed for the selection of relevant genes, and a K-nearest neighbor classifier with the leave-one-out cross-validation method serves as the classifier. The chaotic genetic algorithm is modified by using the chaotic mutation operator to increase the population diversity. The enhanced population diversity expands the GA's search ability. The proposed approach is tested on 10 microarray data sets from the literature. The experimental results show that the proposed method not only effectively reduced the number of gene expression levels, but also achieved lower classification error rates than other methods.
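
    The evaluation scheme named above, a K-nearest-neighbour classifier scored by leave-one-out cross-validation, can be sketched as follows (the information-gain and chaotic-GA gene selection steps are omitted; the function name is invented here):

```python
import math
from collections import Counter

def knn_loocv_error(X, y, k=3):
    """Leave-one-out cross-validation error rate of a k-NN classifier:
    each sample is classified by majority vote of its k nearest
    neighbours among all remaining samples."""
    errors = 0
    for i in range(len(X)):
        neighbours = sorted(
            (math.dist(X[i], X[j]), y[j]) for j in range(len(X)) if j != i)
        votes = Counter(label for _, label in neighbours[:k])
        if votes.most_common(1)[0][0] != y[i]:
            errors += 1
    return errors / len(X)
```

    A gene-selection wrapper would call this repeatedly, using the LOOCV error of each candidate gene subset as its fitness.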

  2. Neighborhood Hypergraph Based Classification Algorithm for Incomplete Information System

    Directory of Open Access Journals (Sweden)

    Feng Hu

    2015-01-01

    Full Text Available The problem of classification in incomplete information systems is a hot issue in intelligent information processing. The hypergraph is a new intelligent method for machine learning. However, it is hard to process an incomplete information system with the traditional hypergraph, for two reasons: (1) the hyperedges are generated randomly in the traditional hypergraph model; (2) the existing methods are unsuitable for incomplete information systems because of their missing values. In this paper, we propose a novel classification algorithm for incomplete information systems based on the hypergraph model and rough set theory. First, we initialize the hypergraph. Second, we classify the training set by the neighborhood hypergraph. Third, under the guidance of rough sets, we replace the poor hyperedges. After that, we obtain a good classifier. The proposed approach is tested on 15 data sets from the UCI machine learning repository and is compared with existing methods such as C4.5, SVM, Naive Bayes, and KNN. The experimental results show that the proposed algorithm has better performance in terms of Precision, Recall, AUC, and F-measure.

  3. Multiple signal classification algorithm for super-resolution fluorescence microscopy

    Science.gov (United States)

    Agarwal, Krishna; Macháň, Radek

    2016-12-01

    Single-molecule localization techniques are restricted by long acquisition and computational times, or the need of special fluorophores or biologically toxic photochemical environments. Here we propose a statistical super-resolution technique of wide-field fluorescence microscopy we call the multiple signal classification algorithm which has several advantages. It provides resolution down to at least 50 nm, requires fewer frames and lower excitation power and works even at high fluorophore concentrations. Further, it works with any fluorophore that exhibits blinking on the timescale of the recording. The multiple signal classification algorithm shows comparable or better performance in comparison with single-molecule localization techniques and four contemporary statistical super-resolution methods for experiments of in vitro actin filaments and other independently acquired experimental data sets. We also demonstrate super-resolution at timescales of 245 ms (using 49 frames acquired at 200 frames per second) in samples of live-cell microtubules and live-cell actin filaments imaged without imaging buffers.

  4. The CR‐Ω+ Classification Algorithm for Spatio‐Temporal Prediction of Criminal Activity

    Directory of Open Access Journals (Sweden)

    S. Godoy‐Calderón

    2010-04-01

    Full Text Available We present a spatio‐temporal prediction model that allows forecasting of criminal activity behavior in a particular region by using supervised classification. The degree of membership of each pattern is interpreted as the forecasted increase or decrease in the criminal activity for the specified time and location. The proposed forecasting model (CR‐Ω+) is based on the family of Kora‐Ω logical‐combinatorial algorithms operating on large data volumes from several heterogeneous sources using an inductive learning process. We propose several modifications to the original algorithms by Bongard and by Baskakova and Zhuravlëv which improve the prediction performance on the studied dataset of criminal activity. We perform two analyses, punctual prediction and tendency analysis, which show that it is possible to punctually predict one of four crimes to be perpetrated (crime family) in a specific space and time, with 66% effectiveness in the prediction of the place of crime, despite the noise in the dataset. The tendency analysis yielded an STRMSE (spatio‐temporal RMSE) of less than 1.0.

  5. Classification algorithms for predicting sleepiness and sleep apnea severity.

    Science.gov (United States)

    Eiseman, Nathaniel A; Westover, M Brandon; Mietus, Joseph E; Thomas, Robert J; Bianchi, Matt T

    2012-02-01

    Identifying predictors of subjective sleepiness and severity of sleep apnea are important yet challenging goals in sleep medicine. Classification algorithms may provide insights, especially when large data sets are available. We analyzed polysomnography and clinical features available from the Sleep Heart Health Study. The Epworth Sleepiness Scale and the apnea-hypopnea index were the targets of three classifiers: k-nearest neighbor, naive Bayes and support vector machine algorithms. Classification was based on up to 26 features including demographics, polysomnogram, and electrocardiogram (spectrogram). Naive Bayes was best for predicting abnormal Epworth class (0-10 versus 11-24), although prediction was weak: polysomnogram features had 16.7% sensitivity and 88.8% specificity; spectrogram features had 5.3% sensitivity and 96.5% specificity. The support vector machine performed similarly to naive Bayes for predicting sleep apnea class (0-5 versus >5): 59.0% sensitivity and 74.5% specificity using clinical features and 43.4% sensitivity and 83.5% specificity using spectrographic features compared with the naive Bayes classifier, which had 57.5% sensitivity and 73.7% specificity (clinical), and 39.0% sensitivity and 82.7% specificity (spectrogram). Mutual information analysis confirmed the minimal dependency of the Epworth score on any feature, while the apnea-hypopnea index showed modest dependency on body mass index, arousal index, oxygenation and spectrogram features. Apnea classification was modestly accurate, using either clinical or spectrogram features, and showed lower sensitivity and higher specificity than common sleep apnea screening tools. Thus, clinical prediction of sleep apnea may be feasible with easily obtained demographic and electrocardiographic analysis, but the utility of the Epworth is questioned by its minimal relation to clinical, electrocardiographic, or polysomnographic features.
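
    The sensitivity and specificity figures quoted throughout the abstract come from the standard confusion-matrix definitions, which can be computed as follows (a generic sketch, with class 1 taken as positive):

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP) for binary
    labels, where 1 is the positive class (e.g. abnormal Epworth)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)
```

    The low sensitivity / high specificity pattern reported above means the classifiers rarely flag a case, but a flagged case is usually real.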

  6. A Comparison of RBF Neural Network Training Algorithms for Inertial Sensor Based Terrain Classification

    Directory of Open Access Journals (Sweden)

    Erkan Beşdok

    2009-08-01

    Full Text Available This paper introduces a comparison of training algorithms of radial basis function (RBF) neural networks for classification purposes. RBF networks provide effective solutions in many science and engineering fields. They are especially popular in the pattern classification and signal processing areas. Several algorithms have been proposed for training RBF networks. The Artificial Bee Colony (ABC) algorithm is a new, very simple and robust population based optimization algorithm that is inspired by the intelligent behavior of honey bee swarms. The training performance of the ABC algorithm is compared with the Genetic algorithm, Kalman filtering algorithm and gradient descent algorithm. In the experiments, not only were well-known classification problems from the UCI repository, such as the Iris, Wine and Glass datasets, used, but an experimental setup was also designed for inertial sensor based terrain classification for autonomous ground vehicles. Experimental results show that the use of the ABC algorithm results in better learning than the other algorithms.
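
    For context on what is being trained: a simple RBF-network baseline fixes the Gaussian centers and solves only the linear output weights by least squares. This is not the ABC algorithm from the paper (ABC, GA, Kalman and gradient methods also adapt the hidden layer), and the names here are invented:

```python
import numpy as np

def rbf_design_matrix(X, centers, width):
    """Gaussian hidden-layer activations for each input row of X."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-(d / width) ** 2)

def train_rbf(X, targets, centers, width):
    """Train only the linear output layer of an RBF network by least
    squares, with Gaussian units at fixed centers."""
    H = rbf_design_matrix(X, centers, width)
    W, *_ = np.linalg.lstsq(H, targets, rcond=None)
    return W
```

    Population-based trainers such as ABC instead search over centers, widths and weights jointly, scoring each candidate network by its classification error.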

  7. Land-cover classification with an expert classification algorithm using digital aerial photographs

    Directory of Open Access Journals (Sweden)

    José L. de la Cruz

    2010-05-01

    Full Text Available The purpose of this study was to evaluate the usefulness of the spectral information of digital aerial sensors in determining land-cover classification using new digital techniques. The land covers that have been evaluated are the following: (1) bare soil; (2) cereals, including maize (Zea mays L.), oats (Avena sativa L.), rye (Secale cereale L.), wheat (Triticum aestivum L.) and barley (Hordeum vulgare L.); (3) high protein crops, such as peas (Pisum sativum L.) and beans (Vicia faba L.); (4) alfalfa (Medicago sativa L.); (5) woodlands and scrublands, including holly oak (Quercus ilex L.) and common retama (Retama sphaerocarpa L.); (6) urban soil; (7) olive groves (Olea europaea L.); and (8) burnt crop stubble. The best result was obtained using an expert classification algorithm, achieving a reliability rate of 95%. This result showed that the images of digital airborne sensors hold considerable promise for the future in the field of digital classifications because these images contain valuable information that takes advantage of the geometric viewpoint. Moreover, new classification techniques reduce problems encountered using high-resolution images, while reliabilities are achieved that are better than those achieved with traditional methods.

  8. Semi-supervised classification of remote sensing image based on probabilistic topic model%利用概率主题模型的遥感影像半监督分类

    Institute of Scientific and Technical Information of China (English)

    易文斌; 冒亚明; 慎利

    2013-01-01

    Land cover lies at the center of the interaction between the natural environment and human activities, and land cover information is mainly acquired through the classification of remote sensing images; image classification is therefore one of the most basic issues in remote sensing image analysis. Building on probabilistic-topic-model-based clustering analysis of high-resolution remote sensing imagery, this paper analyzes the generative model, a typical approach in semi-supervised learning, and derives a classification method based on a probabilistic topic model and semi-supervised learning (SS-LDA). The workflow of the SS-LDA model in text recognition applications is adapted to construct a basic process for classifying high-resolution remote sensing imagery. Experiments demonstrate that the SS-LDA algorithm obtains more accurate classification results than traditional unsupervised and supervised classification algorithms.

  9. Reliable classification of two-class cancer data using evolutionary algorithms.

    Science.gov (United States)

    Deb, Kalyanmoy; Raji Reddy, A

    2003-11-01

    In the area of bioinformatics, the identification of gene subsets responsible for classifying available disease samples to two or more of its variants is an important task. Such problems have been solved in the past by means of unsupervised learning methods (hierarchical clustering, self-organizing maps, k-mean clustering, etc.) and supervised learning methods (weighted voting approach, k-nearest neighbor method, support vector machine method, etc.). Such problems can also be posed as optimization problems of minimizing gene subset size to achieve reliable and accurate classification. The main difficulties in solving the resulting optimization problem are the availability of only a few samples compared to the number of genes in the samples and the exorbitantly large search space of solutions. Although there exist a few applications of evolutionary algorithms (EAs) for this task, here we treat the problem as a multiobjective optimization problem of minimizing the gene subset size and minimizing the number of misclassified samples. Moreover, for a more reliable classification, we consider multiple training sets in evaluating a classifier. Contrary to the past studies, the use of a multiobjective EA (NSGA-II) has enabled us to discover a smaller gene subset size (such as four or five) to correctly classify 100% or near 100% samples for three cancer samples (Leukemia, Lymphoma, and Colon). We have also extended the NSGA-II to obtain multiple non-dominated solutions discovering as much as 352 different three-gene combinations providing a 100% correct classification to the Leukemia data. In order to have further confidence in the identification task, we have also introduced a prediction strength threshold for determining a sample's belonging to one class or the other. 
All simulation results show consistent gene subset identifications on three disease samples and exhibit the flexibilities and efficacies in using a multiobjective EA for the gene subset identification task.

  10. Photometric classification of type Ia supernovae in the SuperNova Legacy Survey with supervised learning

    Science.gov (United States)

    Möller, A.; Ruhlmann-Kleider, V.; Leloup, C.; Neveu, J.; Palanque-Delabrouille, N.; Rich, J.; Carlberg, R.; Lidman, C.; Pritchet, C.

    2016-12-01

    In the era of large astronomical surveys, photometric classification of supernovae (SNe) has become an important research field due to limited spectroscopic resources for candidate follow-up and classification. In this work, we present a method to photometrically classify type Ia supernovae based on machine learning with redshifts that are derived from the SN light-curves. This method is implemented on real data from the SNLS deferred pipeline, a purely photometric pipeline that identifies SNe Ia at high redshift (0.2 < z < 1.1). Our classifiers are based on supervised learning algorithms such as Random Forest and Boosted Decision Trees. We evaluate the performance using SN simulations and real data from the first 3 years of the Supernova Legacy Survey (SNLS), which contains large spectroscopically and photometrically classified type Ia samples. Using the Area Under the Curve (AUC) metric, where perfect classification is given by 1, we find that our best-performing classifier (Extreme Gradient Boosting Decision Tree) has an AUC of 0.98. We show that it is possible to obtain a large photometrically selected type Ia SN sample with an estimated contamination of less than 5%. When applied to data from the first three years of SNLS, we obtain 529 events. We investigate the differences between classifying simulated SNe, and real SN survey data. In particular, we find that applying a thorough set of selection cuts to the SN sample is essential for good classification. This work demonstrates for the first time the feasibility of machine learning classification in a high-z SN survey with application to real SN data.
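
    The AUC metric used above equals the probability that a randomly chosen positive example outscores a randomly chosen negative one (the Mann-Whitney view), which gives a direct way to compute it:

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic: the
    fraction of positive/negative pairs where the positive scores
    higher (ties count one half). Perfect separation gives 1.0."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

    Unlike accuracy, this measure is insensitive to the choice of classification threshold, which is why it suits comparing classifiers that output continuous scores.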

  11. Entropy-based generation of supervised neural networks for classification of structured patterns.

    Science.gov (United States)

    Tsai, Hsien-Leing; Lee, Shie-Jue

    2004-03-01

    Sperduti and Starita proposed a new type of neural network which consists of generalized recursive neurons for classification of structures. In this paper, we propose an entropy-based approach for constructing such neural networks for classification of acyclic structured patterns. Given a classification problem, the architecture, i.e., the number of hidden layers and the number of neurons in each hidden layer, and all the values of the link weights associated with the corresponding neural network are automatically determined. Experimental results have shown that the networks constructed by our method can have a better performance, with respect to network size, learning speed, or recognition accuracy, than the networks obtained by other methods.

  12. FPGA implementation of Generalized Hebbian Algorithm for texture classification.

    Science.gov (United States)

    Lin, Shiow-Jyu; Hwang, Wen-Jyi; Lee, Wei-Hao

    2012-01-01

    This paper presents a novel hardware architecture for principal component analysis. The architecture is based on the Generalized Hebbian Algorithm (GHA) because of its simplicity and effectiveness. The architecture is separated into three portions: the weight vector updating unit, the principal computation unit and the memory unit. In the weight vector updating unit, the computation of different synaptic weight vectors shares the same circuit for reducing the area costs. To show the effectiveness of the circuit, a texture classification system based on the proposed architecture is physically implemented by Field Programmable Gate Array (FPGA). It is embedded in a System-On-Programmable-Chip (SOPC) platform for performance measurement. Experimental results show that the proposed architecture is an efficient design for attaining both high speed performance and low area costs.
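
    The Generalized Hebbian Algorithm realized in this hardware is, in software form, Sanger's rule: each weight row learns one principal component of the (zero-mean) data online. A minimal sketch (learning rate and epoch count are arbitrary choices made here):

```python
import numpy as np

def gha(X, n_components, eta=0.005, epochs=30, seed=0):
    """Sanger's Generalized Hebbian Algorithm for online PCA.
    The lower-triangular term deflates earlier components so that
    successive rows of W converge to successive principal directions."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(n_components, X.shape[1]))
    for _ in range(epochs):
        for x in X:
            y = W @ x
            W += eta * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W
```

    The hardware architecture described above maps exactly this per-sample update onto shared circuitry for the different synaptic weight vectors.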

  13. FPGA Implementation of Generalized Hebbian Algorithm for Texture Classification

    Directory of Open Access Journals (Sweden)

    Wei-Hao Lee

    2012-05-01

    Full Text Available This paper presents a novel hardware architecture for principal component analysis. The architecture is based on the Generalized Hebbian Algorithm (GHA) because of its simplicity and effectiveness. The architecture is separated into three portions: the weight vector updating unit, the principal computation unit and the memory unit. In the weight vector updating unit, the computation of different synaptic weight vectors shares the same circuit for reducing the area costs. To show the effectiveness of the circuit, a texture classification system based on the proposed architecture is physically implemented by Field Programmable Gate Array (FPGA). It is embedded in a System-On-Programmable-Chip (SOPC) platform for performance measurement. Experimental results show that the proposed architecture is an efficient design for attaining both high speed performance and low area costs.

  14. An Evolutionary Algorithm for Enhanced Magnetic Resonance Imaging Classification

    Directory of Open Access Journals (Sweden)

    T.S. Murunya

    2014-11-01

    Full Text Available This study presents an image classification method for retrieval of images from a multi-varied MRI database. With the development of sophisticated medical imaging technology which helps doctors in diagnosis, medical image databases contain a huge amount of digital images. Magnetic Resonance Imaging (MRI) is a widely used imaging technique which picks signals from a body's magnetic particles spinning to magnetic tune and through a computer converts scanned data into pictures of internal organs. Image processing techniques are required to analyze medical images and retrieve them from the database. The proposed framework extracts features using Moment Invariants (MI) and the Wavelet Packet Tree (WPT). Extracted features are reduced using Correlation based Feature Selection (CFS), and a CFS with cuckoo search algorithm is proposed. Naïve Bayes and K-Nearest Neighbor (KNN) classifiers are applied to the selected features. The National Biomedical Imaging Archive (NBIA) dataset, including colon, brain and chest images, is used to evaluate the framework.

  15. Predicting disease risk using bootstrap ranking and classification algorithms.

    Science.gov (United States)

    Manor, Ohad; Segal, Eran

    2013-01-01

    Genome-wide association studies (GWAS) are widely used to search for genetic loci that underlie human disease. Another goal is to predict disease risk for different individuals given their genetic sequence. Such predictions could either be used as a "black box" in order to promote changes in life-style and screening for early diagnosis, or as a model that can be studied to better understand the mechanism of the disease. Current methods for risk prediction typically rank single nucleotide polymorphisms (SNPs) by the p-value of their association with the disease, and use the top-associated SNPs as input to a classification algorithm. However, the predictive power of such methods is relatively poor. To improve the predictive power, we devised BootRank, which uses bootstrapping in order to obtain a robust prioritization of SNPs for use in predictive models. We show that BootRank improves the ability to predict disease risk of unseen individuals in the Wellcome Trust Case Control Consortium (WTCCC) data and results in a more robust set of SNPs and a larger number of enriched pathways being associated with the different diseases. Finally, we show that combining BootRank with seven different classification algorithms improves performance compared to previous studies that used the WTCCC data. Notably, diseases for which BootRank results in the largest improvements were recently shown to have more heritability than previously thought, likely due to contributions from variants with low minimum allele frequency (MAF), suggesting that BootRank can be beneficial in cases where SNPs affecting the disease are poorly tagged or have low MAF. Overall, our results show that improving disease risk prediction from genotypic information may be a tangible goal, with potential implications for personalized disease screening and treatment.
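
    The bootstrap-ranking idea, re-scoring features on many bootstrap resamples and aggregating the ranks, can be sketched as follows. The correlation score used here is purely illustrative, and BootRank's actual ranking statistic may differ; all names are invented.

```python
import random

def pearson(xs, ys):
    """Pearson correlation, returning 0.0 for degenerate (constant) inputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) ** 2 for a in xs) * sum((b - my) ** 2 for b in ys)
    return cov / var ** 0.5 if var > 0 else 0.0

def bootstrap_rank(X, y, n_boot=100, seed=0):
    """Rank features by their average |correlation|-with-phenotype rank
    over bootstrap resamples; averaging over resamples yields a
    prioritisation more robust than a single ranking on the full data."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mean_rank = [0.0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores = [abs(pearson([X[i][j] for i in idx], [y[i] for i in idx]))
                  for j in range(p)]
        for r, j in enumerate(sorted(range(p), key=lambda j: -scores[j])):
            mean_rank[j] += r / n_boot
    return sorted(range(p), key=lambda j: mean_rank[j])   # most robust first
```

    The top-ranked features would then feed a downstream classifier, as the top p-value SNPs do in the conventional pipeline the abstract criticises.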

  16. Predicting disease risk using bootstrap ranking and classification algorithms.

    Directory of Open Access Journals (Sweden)

    Ohad Manor

    Full Text Available Genome-wide association studies (GWAS) are widely used to search for genetic loci that underlie human disease. Another goal is to predict disease risk for different individuals given their genetic sequence. Such predictions could either be used as a "black box" in order to promote changes in life-style and screening for early diagnosis, or as a model that can be studied to better understand the mechanism of the disease. Current methods for risk prediction typically rank single nucleotide polymorphisms (SNPs) by the p-value of their association with the disease, and use the top-associated SNPs as input to a classification algorithm. However, the predictive power of such methods is relatively poor. To improve the predictive power, we devised BootRank, which uses bootstrapping in order to obtain a robust prioritization of SNPs for use in predictive models. We show that BootRank improves the ability to predict disease risk of unseen individuals in the Wellcome Trust Case Control Consortium (WTCCC) data and results in a more robust set of SNPs and a larger number of enriched pathways being associated with the different diseases. Finally, we show that combining BootRank with seven different classification algorithms improves performance compared to previous studies that used the WTCCC data. Notably, diseases for which BootRank results in the largest improvements were recently shown to have more heritability than previously thought, likely due to contributions from variants with low minimum allele frequency (MAF), suggesting that BootRank can be beneficial in cases where SNPs affecting the disease are poorly tagged or have low MAF. Overall, our results show that improving disease risk prediction from genotypic information may be a tangible goal, with potential implications for personalized disease screening and treatment.

  17. CLASSIFICATION OF DEFECTS IN SOFTWARE USING DECISION TREE ALGORITHM

    Directory of Open Access Journals (Sweden)

    M. SURENDRA NAIDU

    2013-06-01

    Full Text Available Software defects due to coding errors continue to plague the industry with disastrous impact, especially in the enterprise application software category. Identifying how many of these defects are specifically due to coding errors is a challenging problem. Defect prevention is the most vivid but usually neglected aspect of software quality assurance in any project. If applied at all stages of software development, it can condense the time, overheads and resources entailed to engineer a high quality product. In order to reduce time and cost, we focus on finding the total number of defects that have occurred in the software development process when a test case shows that the software is not executing properly. The proposed system classifies the various defects using a decision tree based defect classification technique, which is used to group the defects after identification. The classification can be done by employing algorithms such as ID3 or C4.5. After the classification, the defect patterns are measured by employing a pattern mining technique. Finally, quality is assured by using various quality metrics, such as defect density. The proposed system will be implemented in JAVA.

  18. A comparison of supervised, unsupervised and synthetic land use classification methods in the north of Iran

    NARCIS (Netherlands)

    Mohammady, M.; Moradi, H.R.; Zeinivand, H.; Temme, A.J.A.M.

    2015-01-01

    Land use classification is often the first step in land use studies and thus forms the basis for many earth science studies. In this paper, we focus on low-cost techniques for combining Landsat images with geographic information system approaches to create a land use map. In the Golestan region of Iran

  19. EMD-Based Temporal and Spectral Features for the Classification of EEG Signals Using Supervised Learning.

    Science.gov (United States)

    Riaz, Farhan; Hassan, Ali; Rehman, Saad; Niazi, Imran Khan; Dremstrup, Kim

    2016-01-01

    This paper presents a novel method for feature extraction from electroencephalogram (EEG) signals using empirical mode decomposition (EMD). Its use is motivated by the fact that the EMD gives an effective time-frequency analysis of nonstationary signals. The intrinsic mode functions (IMF) obtained as a result of EMD give the decomposition of a signal according to its frequency components. We present the use of up to third-order temporal moments, and spectral features including the spectral centroid, coefficient of variation and spectral skew of the IMFs, for feature extraction from EEG signals. These features are physiologically relevant given that the normal EEG signals have different temporal and spectral centroids, dispersions and symmetries when compared with the pathological EEG signals. The calculated features are fed into the standard support vector machine (SVM) for classification purposes. The performance of the proposed method is studied on a publicly available dataset which is designed to handle various classification problems including the identification of epilepsy patients and detection of seizures. Experiments show that good classification results are obtained using the proposed methodology for the classification of EEG signals. Our proposed method also compares favorably to other state-of-the-art feature extraction methods.
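
    The temporal-moment and spectral-shape features listed above can be computed per intrinsic mode function roughly as follows (a sketch with invented names; the EMD decomposition itself and the SVM stage are omitted):

```python
import numpy as np

def spectral_features(imf, fs):
    """Temporal moments plus spectral centroid, coefficient of variation
    and spectral skewness for one intrinsic mode function sampled at fs."""
    m1, m2 = imf.mean(), imf.var()
    m3 = ((imf - m1) ** 3).mean()                       # third temporal moment
    power = np.abs(np.fft.rfft(imf)) ** 2               # power spectrum
    freqs = np.fft.rfftfreq(len(imf), 1.0 / fs)
    centroid = (freqs * power).sum() / power.sum()
    spread = np.sqrt(((freqs - centroid) ** 2 * power).sum() / power.sum())
    cv = spread / centroid                              # coefficient of variation
    skew = (((freqs - centroid) / spread) ** 3 * power).sum() / power.sum()
    return np.array([m1, m2, m3, centroid, cv, skew])
```

    One such feature vector per IMF, concatenated across IMFs, would form the input to the SVM.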

  20. Multi-classification algorithm and its realization based on least square support vector machine algorithm

    Institute of Scientific and Technical Information of China (English)

    Fan Youping; Chen Yunping; Sun Wansheng; Li Yu

    2005-01-01

    As a new type of learning machine developed on the basis of statistical learning theory, the support vector machine (SVM) plays an important role in knowledge discovery and knowledge updating by constructing a non-linear optimal classifier. However, realizing an SVM requires solving a quadratic program under inequality constraints, which makes the calculation difficult as the set of learning samples grows. Besides, the standard SVM is incapable of tackling multi-classification. To overcome this bottleneck, a training algorithm is presented that converts the problem of quadratic programming into that of solving a linear system of equations, by adopting the least square SVM (LS-SVM) and introducing an error variable that changes the inequality constraints into equality constraints, which simplifies the calculation. With regard to multi-classification, an LS-SVM applicable to multi-classification is deduced. Finally, the efficiency of the algorithm is checked with the well-known circle-in-the-square and two-spirals benchmarks to measure the performance of the classifier.
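
    The central trick described above, replacing the SVM quadratic program by a single linear system through equality constraints, is Suykens' LS-SVM formulation and can be sketched as follows (an RBF kernel and arbitrary hyperparameters are assumed; the multi-class extension is omitted):

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return np.exp(-sq / (2 * sigma ** 2))

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """LS-SVM training: one linear system in bias b and multipliers alpha,
        [ 0   y^T          ] [b]     [0]
        [ y   Omega + I/g  ] [alpha] = [1],
    with Omega_ij = y_i y_j K(x_i, x_j)."""
    n = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    A = np.zeros((n + 1, n + 1))
    A[0, 1:], A[1:, 0], A[1:, 1:] = y, y, Omega
    rhs = np.concatenate([[0.0], np.ones(n)])
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                       # bias, multipliers

def lssvm_predict(X_train, y, alpha, b, X_new, sigma=1.0):
    """Decision sign(sum_i alpha_i y_i K(x, x_i) + b) for new inputs."""
    return np.sign(rbf_kernel(X_new, X_train, sigma) @ (alpha * y) + b)
```

    Solving one dense linear system scales better in practice than a constrained quadratic program of the same size, which is the simplification the abstract emphasizes.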

  1. Interactive exploration of uncertainty in fuzzy classifications by isosurface visualization of class clusters

    NARCIS (Netherlands)

    Lucieer, A.; Veen, L.E.

    2009-01-01

    Uncertainty and vagueness are important concepts when dealing with transition zones between vegetation communities or land-cover classes. In this study, classification uncertainty is quantified by applying a supervised fuzzy classification algorithm. New visualization techniques are proposed and presented.

  2. A Novel Training Algorithm of Genetic Neural Networks and Its Application to Classification

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    First, this paper discusses the drawbacks of the multilayer perceptron (MLP) trained by the traditional back propagation (BP) algorithm when applied to a particular classification problem. A new training algorithm for neural networks, combining a genetic algorithm with the BP algorithm, is then developed. The difference between the new training algorithm and the BP algorithm in nonlinear approximation ability is demonstrated through an example, and the application prospects are illustrated by a further example.
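
    The genetic half of such a hybrid can be sketched as follows: a population of weight vectors for a tiny network is evolved by selection, crossover and mutation, and the fittest individual would then seed BP fine-tuning. The 2-2-1 tanh network, XOR task, and all GA settings below are assumptions for illustration, not the paper's configuration.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([0, 1, 1, 0], float)                 # XOR targets

def loss(w):
    """MSE of a tiny 2-2-1 tanh network whose weights are packed in a flat vector."""
    W1, b1 = w[:4].reshape(2, 2), w[4:6]
    W2, b2 = w[6:8], w[8]
    h = np.tanh(X @ W1 + b1)
    out = 0.5 * np.tanh(h @ W2 + b2) + 0.5        # squash output into (0, 1)
    return ((out - y) ** 2).mean()

rng = np.random.default_rng(1)
pop = rng.normal(0, 1, (40, 9))                   # population of weight vectors
init_best = min(loss(w) for w in pop)
for generation in range(60):
    order = np.argsort([loss(w) for w in pop])
    parents = pop[order[:10]]                     # truncation selection + elitism
    children = []
    for _ in range(30):
        pa, pb = parents[rng.integers(10)], parents[rng.integers(10)]
        mask = rng.random(9) < 0.5                # uniform crossover
        children.append(np.where(mask, pa, pb) + rng.normal(0, 0.1, 9))  # mutation
    pop = np.vstack([parents, children])

best = min(pop, key=loss)   # would serve as the starting point for BP fine-tuning
```

    Because the top individuals survive unchanged (elitism), the best loss never increases across generations.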

  3. Semi-automatic supervised classification of minerals from x-ray mapping images

    DEFF Research Database (Denmark)

    Nielsen, Allan Aasbjerg; Flesche, Harald; Larsen, Rasmus

    1998-01-01

    spectroscopy (EDS) in a scanning electron microscope (SEM). Extensions to traditional multivariate statistical methods are applied to perform the classification. Training sets are grown from one or a few seed points by a method that ensures spatial and spectral closeness of observations. Spectral closeness...... to a small area in order to allow for the estimation of a variance-covariance matrix. This expansion is controlled by upper limits for the spatial and Euclidean spectral distances from the seed point. Second, after this initial expansion the growing of the training set is controlled by an upper limit...... training, a standard quadratic classifier is applied. The performance for each parameter setting is measured by the overall misclassification rate on an independently generated validation set. The classification method is presently used as a routine petrographical analysis method at Norsk Hydro Research...

  4. Restructuring supervision and reconfiguration of skill mix in community pharmacy: Classification of perceived safety and risk.

    Science.gov (United States)

    Bradley, Fay; Willis, Sarah C; Noyce, Peter R; Schafheutle, Ellen I

    2016-01-01

    Broadening the range of services provided through community pharmacy increases workloads for pharmacists that could be alleviated by reconfiguring roles within the pharmacy team. To examine pharmacists' and pharmacy technicians' (PTs') perceptions of how safe it would be for support staff to undertake a range of pharmacy activities during a pharmacist's absence. Views on supervision, support staff roles, competency and responsibility were also sought. Informed by nominal group discussions, a questionnaire was developed and distributed to a random sample of 1500 pharmacists and 1500 PTs registered in England. Whilst focused on community pharmacy practice, hospital pharmacy respondents were included, as more advanced skill mix models may provide valuable insights. Respondents were asked to rank a list of 22 pharmacy activities in terms of perceived risk and safety of these activities being performed by support staff during a pharmacist's absence. Descriptive and comparative statistical analyses were conducted. Six hundred and forty-two pharmacists (43.2%) and 854 PTs (57.3%) responded; the majority worked in community pharmacy. Dependent on agreement levels with perceived safety, from community pharmacists and PTs, and hospital pharmacists and PTs, the 22 activities were grouped into 'safe' (n = 7), 'borderline' (n = 9) and 'unsafe' (n = 6). Activities such as assembly and labeling were considered 'safe,' clinical activities were considered 'unsafe.' There were clear differences between pharmacists and PTs, and between sectors (community pharmacy vs. hospital). Community pharmacists were most cautious (particularly mobile and portfolio pharmacists) about which activities they felt support staff could safely perform; PTs in both sectors felt significantly more confident performing particularly technical activities than pharmacists. This paper presents novel empirical evidence informing the categorization of pharmacy activities into 'safe,' 'borderline' or 'unsafe.'

  5. Contributions to "k"-Means Clustering and Regression via Classification Algorithms

    Science.gov (United States)

    Salman, Raied

    2012-01-01

    The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using…

  6. Study and Implementation of Web Mining Classification Algorithm Based on Building Tree of Detection Class Threshold

    Institute of Scientific and Technical Information of China (English)

    CHEN Jun-jie; SONG Han-tao; LU Yu-chang

    2005-01-01

    A new classification algorithm for web mining is proposed on the basis of general classification algorithms for data mining in order to implement personalized information services. The building tree method of detecting class thresholds is used for the construction of a decision tree according to the concept of user expectation so as to find classification rules in different layers. Compared with the traditional C4.5 algorithm, the excessive adaptation problem of C4.5 has been improved, so that classification results not only have much higher accuracy but also statistical meaning.

  7. Verdict Accuracy of Quick Reduct Algorithm using Clustering and Classification Techniques for Gene Expression Data

    Directory of Open Access Journals (Sweden)

    T.Chandrasekhar

    2012-01-01

    Full Text Available In most gene expression data, the number of training samples is very small compared to the large number of genes involved in the experiments. However, among the large amount of genes, only a small fraction is effective for performing a certain task. Furthermore, a small subset of genes is desirable in developing gene expression based diagnostic tools for delivering reliable and understandable results. With the gene selection results, the cost of biological experiments and decisions can be greatly reduced by analyzing only the marker genes. An important application of gene expression data in functional genomics is to classify samples according to their gene expression profiles. Feature selection (FS is a process which attempts to select more informative features. It is one of the important steps in knowledge discovery. Conventional supervised FS methods evaluate various feature subsets using an evaluation function or metric to select only those features which are related to the decision classes of the data under consideration. This paper studies a feature selection method based on rough set theory. Further, the K-Means and Fuzzy C-Means (FCM algorithms have been implemented for the reduced feature set without considering class labels. Then the obtained results are compared with the original class labels. A Back Propagation Network (BPN has also been used for classification. Then the performance of K-Means, FCM, and BPN are analyzed through the confusion matrix. It is found that the BPN performs comparatively well.
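
    The confusion-matrix comparison described above can be sketched as follows. Since clustering algorithms return arbitrary cluster ids, each cluster is first relabeled with the majority true class among its members; the tiny label arrays are hypothetical stand-ins for real K-Means/FCM output.

```python
import numpy as np

def confusion_matrix(true, pred, k):
    cm = np.zeros((k, k), int)
    for t, p in zip(true, pred):
        cm[t, p] += 1
    return cm

def map_clusters_to_labels(true, clusters, k):
    """Relabel each cluster with the majority true class among its members."""
    mapping = {}
    for c in range(k):
        members = true[clusters == c]
        mapping[c] = np.bincount(members, minlength=k).argmax() if members.size else c
    return np.array([mapping[c] for c in clusters])

true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
clusters = np.array([2, 2, 2, 0, 0, 1, 1, 1, 1])   # arbitrary ids from K-Means/FCM
pred = map_clusters_to_labels(true, clusters, 3)
cm = confusion_matrix(true, pred, 3)
accuracy = np.trace(cm) / cm.sum()
```

    The diagonal of `cm` counts agreements per class; the same matrix is what the study uses to compare K-Means, FCM and the BPN.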

  8. Automatic classification of endogenous landslide seismicity using the Random Forest supervised classifier

    Science.gov (United States)

    Provost, F.; Hibert, C.; Malet, J.-P.

    2017-01-01

    The deformation of slow-moving landslides developed in clays induces endogenous seismicity of mostly low-magnitude events ... for the detection of four types of seismic sources. The automatic algorithm achieves 93% sensitivity compared with a manually interpreted catalog taken as reference.

  9. Supervised pre-processing approaches in multiple class variables classification for fish recruitment forecasting

    KAUST Repository

    Fernandes, José Antonio

    2013-02-01

    A multi-species approach to fisheries management requires taking into account the interactions between species in order to improve recruitment forecasting of the fish species. Recent advances in Bayesian networks direct the learning of models with several interrelated variables to be forecasted simultaneously. These models are known as multi-dimensional Bayesian network classifiers (MDBNs). Pre-processing steps are critical for the posterior learning of the model in these kinds of domains. Therefore, in the present study, a set of 'state-of-the-art' uni-dimensional pre-processing methods, within the categories of missing data imputation, feature discretization and feature subset selection, are adapted to be used with MDBNs. A framework that includes the proposed multi-dimensional supervised pre-processing methods, coupled with a MDBN classifier, is tested with synthetic datasets and the real domain of fish recruitment forecasting. The rate of correctly forecasting three fish species (anchovy, sardine and hake) simultaneously is nearly doubled (from 17.3% to 29.5%) using the multi-dimensional approach in comparison to mono-species models. The probability assessments also show high improvement, reducing the average error (estimated by means of the Brier score) from 0.35 to 0.27. Finally, these differences are superior to the forecasting of species by pairs. © 2012 Elsevier Ltd.

  10. A New Approach Using Data Envelopment Analysis for Ranking Classification Algorithms

    Directory of Open Access Journals (Sweden)

    A. Bazleh

    2011-01-01

    Full Text Available Problem statement: A variety of methods and algorithms for classification problems have been developed recently. The main question, however, is how to select an appropriate and effective classification algorithm; this has always been an important and difficult issue. Approach: Since the classification algorithm selection task needs to examine more than one criterion, such as accuracy and computational time, it can be modeled and ranked by the Data Envelopment Analysis (DEA) technique. Results: In this study, 44 standard databases were modeled by 7 well-known classification algorithms and examined by an accreditation method. Conclusion/Recommendation: The results indicate that Data Envelopment Analysis (DEA) is an appropriate tool for evaluating classification algorithms.
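
    The DEA idea can be sketched with the input-oriented CCR model: each classifier is a decision-making unit (DMU) whose efficiency is the best achievable ratio of weighted outputs (e.g. accuracy) to weighted inputs (e.g. computation time), solved as a linear program. The three classifiers and their numbers below are hypothetical, and `scipy` is assumed to be available.

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, o):
    """CCR efficiency of DMU o: max u.y_o  s.t.  v.x_o = 1,  u.Y_j - v.X_j <= 0."""
    n, m = X.shape                    # n DMUs, m inputs
    s = Y.shape[1]                    # s outputs; variables are [u (s), v (m)]
    c = np.concatenate([-Y[o], np.zeros(m)])              # maximize u.y_o
    A_ub = np.hstack([Y, -X])                             # u.Y_j - v.X_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.zeros(s), X[o]])[None, :]   # v.x_o = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (s + m), method="highs")
    return -res.fun

# hypothetical classifiers: input = CPU seconds, outputs = accuracy and F1
X = np.array([[1.0], [2.0], [4.0]])
Y = np.array([[0.90, 0.88], [0.92, 0.91], [0.93, 0.90]])
effs = [ccr_efficiency(X, Y, o) for o in range(3)]
```

    Efficient classifiers score 1.0; dominated ones (more time for barely better accuracy) score below 1, which yields the ranking.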

  11. A comparative study on classification of sleep stage based on EEG signals using feature selection and classification algorithms.

    Science.gov (United States)

    Şen, Baha; Peker, Musa; Çavuşoğlu, Abdullah; Çelebi, Fatih V

    2014-03-01

    Sleep scoring is one of the most important diagnostic methods in psychiatry and neurology. Sleep staging is a time-consuming and difficult task undertaken by sleep experts. This study aims to identify a method which would classify sleep stages automatically and with a high degree of accuracy and, in this manner, assist sleep experts. This study consists of three stages: feature extraction from EEG signals, feature selection, and classification of these signals. In the feature extraction stage, 20 attribute algorithms in four categories are used, from which 41 feature parameters were obtained. Feature selection is important in the elimination of irrelevant and redundant features; in this manner prediction accuracy is improved and the computational overhead of classification is reduced. Effective feature selection algorithms such as minimum redundancy maximum relevance (mRMR), fast correlation-based feature selection (FCBF), ReliefF, t-test, and Fisher score are applied at the feature selection stage to select a set of features which best represent the EEG signals. The features obtained are used as input parameters for the classification algorithms. At the classification stage, five different classification algorithms (random forest (RF), feed-forward neural network (FFNN), decision tree (DT), support vector machine (SVM), and radial basis function neural network (RBF)) classify the problem. The results obtained from the different classification algorithms are provided so that a comparison can be made between computation times and accuracy rates. Finally, 97.03% classification accuracy is obtained using the proposed method. The results show that the proposed method demonstrates the ability to design a new intelligent sleep-scoring assistance system.
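
    One of the selection criteria the study compares, the Fisher score, is simple enough to sketch directly: it ranks each feature by between-class variance over within-class variance. The two-feature synthetic dataset is a stand-in for real EEG feature parameters.

```python
import numpy as np

def fisher_score(X, y):
    """Fisher score per feature: between-class scatter / within-class scatter."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        num += Xc.shape[0] * (Xc.mean(axis=0) - overall_mean) ** 2
        den += Xc.shape[0] * Xc.var(axis=0)
    return num / den

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n)
informative = y + rng.normal(0, 0.3, n)       # shifts with the class label
noise = rng.normal(0, 1.0, n)                 # carries no class signal
X = np.column_stack([noise, informative])
scores = fisher_score(X, y)
top = np.argsort(scores)[::-1][:1]            # keep the highest-scoring feature
```

    The top-ranked subset would then feed the five classifiers being compared.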

  12. Automated supervised classification of variable stars II. Application to the OGLE database

    CERN Document Server

    Sarro, L M; López, M; Aerts, C

    2008-01-01

    We aim to extend and test the classifiers presented in a previous work against an independent dataset. We complement the assessment of the validity of the classifiers by applying them to the set of OGLE light curves, treated as variable objects of unknown class. The results are compared to published classification results based on the so-called extractor methods. Two complementary analyses are carried out in parallel. In both cases, the original time series of OGLE observations of the Galactic bulge and Magellanic Clouds are processed in order to identify and characterize the frequency components. In the first approach, the classifiers are applied to the data and the results analyzed in terms of systematic errors and differences between the definition samples in the training set and in the extractor rules. In the second approach, the original classifiers are extended with colour information and, again, applied to OGLE light curves. We have constructed a classification system that can process huge amounts of tim...

  13. Supervised Learning Approach for Spam Classification Analysis using Data Mining Tools

    Directory of Open Access Journals (Sweden)

    R.Deepa Lakshmi

    2010-12-01

    Full Text Available E-mail is one of the most popular and frequently used ways of communication due to its worldwide accessibility, relatively fast message transfer, and low sending cost. The flaws in the e-mail protocols and the increasing amount of electronic business and financial transactions directly contribute to the increase in e-mail-based threats. Email spam is one of the major problems of today's Internet, bringing financial damage to companies and annoying individual users. Among the approaches developed to stop spam, filtering is one of the most important techniques. Much research in spam filtering has centered on more sophisticated classifier-related issues. In recent years, machine learning for spam classification has become an important research issue. This paper explores and identifies the use of different learning algorithms for classifying spam messages from e-mail. A comparative analysis among the algorithms has also been presented.
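
    One of the learning algorithms such comparisons typically include, Naive Bayes, can be sketched end-to-end on a toy corpus. The vocabulary, messages and Bernoulli bag-of-words representation below are all illustrative assumptions.

```python
import numpy as np

VOCAB = ["free", "winner", "meeting", "project", "cash", "report"]

def vectorize(msg):
    words = msg.lower().split()
    return np.array([w in words for w in VOCAB], float)

def train_bernoulli_nb(X, y):
    """Laplace-smoothed per-class word-presence probabilities and class priors."""
    priors, cond = {}, {}
    for c in (0, 1):
        Xc = X[y == c]
        priors[c] = len(Xc) / len(X)
        cond[c] = (Xc.sum(axis=0) + 1) / (len(Xc) + 2)   # Laplace smoothing
    return priors, cond

def predict(x, priors, cond):
    scores = {}
    for c in (0, 1):
        p = cond[c]
        scores[c] = np.log(priors[c]) + (x * np.log(p) + (1 - x) * np.log(1 - p)).sum()
    return max(scores, key=scores.get)

spam = ["free cash winner", "winner free cash now"]
ham = ["project meeting today", "report for the meeting"]
X = np.array([vectorize(m) for m in spam + ham])
y = np.array([1, 1, 0, 0])
priors, cond = train_bernoulli_nb(X, y)
label = predict(vectorize("free cash offer"), priors, cond)
```

    Swapping in an SVM or a decision tree over the same vectors is what turns this into the comparative analysis the paper describes.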

  15. Evaluation of the impact of convolution masks on algorithm to supervise scenery changes at space vehicle integration pads

    Directory of Open Access Journals (Sweden)

    Francisco Carlos P. Bizarria

    2009-06-01

    Full Text Available The Satellite Launch Vehicle developed in Brazil employs a specialized unit at the launch center known as the Movable Integration Tower. On that tower, fixed and movable work floors are installed for use by specialists, at predefined periods of time, to carry out tests mainly related to the pre-launch phase of that vehicle. Outside of those periods it is necessary to detect unexpected movements of platforms and unauthorized people on the site. Within that context, this work presents an evaluation of how different convolution mask resolutions and tolerances affect the efficiency of a proposed algorithm for supervising scenery changes on these work floors. The results obtained from this evaluation are satisfactory and show that the proposed algorithm is suitable for its intended purpose.
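
    The mask-and-tolerance mechanism can be sketched as frame differencing with an averaging convolution mask: smoothing suppresses pixel noise, and a change is flagged only when the smoothed difference exceeds the tolerance. The frame sizes, mask size and tolerance value are assumptions for illustration.

```python
import numpy as np

def convolve2d_same(img, mask):
    """Naive same-size 2-D convolution with zero padding."""
    kh, kw = mask.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img, float)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = (padded[i:i + kh, j:j + kw] * mask).sum()
    return out

def changed(reference, frame, mask_size=3, tolerance=0.1):
    """Flag a scenery change when any smoothed pixel difference exceeds tolerance."""
    mask = np.full((mask_size, mask_size), 1.0 / mask_size ** 2)  # averaging mask
    diff = np.abs(convolve2d_same(frame, mask) - convolve2d_same(reference, mask))
    return bool((diff > tolerance).any())

ref = np.zeros((16, 16))
moved = ref.copy()
moved[5:9, 5:9] = 1.0       # an object appears in the scene
```

    Larger masks and tolerances make the detector less sensitive to noise but slower to react, which is exactly the trade-off the evaluation measures.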

  16. Integrating genetic algorithm method with neural network for land use classification using SZ-3 CMODIS data

    Institute of Scientific and Technical Information of China (English)

    WANG Changyao; LUO Chengfeng; LIU Zhengjun

    2005-01-01

    This paper presents a methodology for land use mapping using CMODIS (Chinese Moderate Resolution Imaging Spectroradiometer) data on-board the SZ-3 (Shenzhou 3) spacecraft. The integrated method is composed of a genetic algorithm (GA) for feature extraction and a neural network classifier for land use classification. In the data preprocessing, a moment matching method was adopted ... classification was obtained. To generate a land use map, a three-layer back propagation neural network classifier is used for training on the samples and for classification. Compared with the Maximum Likelihood classification algorithm, the results show that the accuracy of land use classification is obviously improved by the proposed method, the number of bands selected in the classification process is reduced, and the computational performance for training and classification is improved. The results also show that CMODIS data can be effectively used for land use/land cover classification and change monitoring at regional and global scales.

  17. Automated classification of female facial beauty by image analysis and supervised learning

    Science.gov (United States)

    Gunes, Hatice; Piccardi, Massimo; Jan, Tony

    2004-01-01

    The fact that perception of facial beauty may be a universal concept has long been debated amongst psychologists and anthropologists. In this paper, we performed experiments to evaluate the extent of beauty universality by asking a number of diverse human referees to grade the same collection of female facial images. The results obtained show that the different individuals gave similar votes, thus well supporting the concept of beauty universality. We then trained an automated classifier using the human votes as the ground truth and used it to classify an independent test set of facial images. The high accuracy achieved proves that this classifier can be used as a general, automated tool for objective classification of female facial beauty. Potential applications exist in the entertainment industry and plastic surgery.

  18. Comparison of different classification algorithms for landmine detection using GPR

    Science.gov (United States)

    Karem, Andrew; Fadeev, Aleksey; Frigui, Hichem; Gader, Paul

    2010-04-01

    The Edge Histogram Detector (EHD) is a landmine detection algorithm that has been developed for ground penetrating radar (GPR) sensor data. It has been tested extensively and has demonstrated excellent performance. The EHD consists of two main components. The first one maps the raw data to a lower dimension using edge histogram based feature descriptors. The second component uses a possibilistic K-Nearest Neighbors (pK-NN) classifier to assign a confidence value. In this paper we show that performance of the baseline EHD could be improved by replacing the pK-NN classifier with model based classifiers. In particular, we investigate two such classifiers: Support Vector Regression (SVR), and Relevance Vector Machines (RVM). We investigate the adaptation of these classifiers to the landmine detection problem with GPR, and we compare their performance to the baseline EHD with a pK-NN classifier. As in the baseline EHD, we treat the problem as a two class classification problem: mine vs. clutter. Model parameters for the SVR and the RVM classifiers are estimated from training data using logarithmic grid search. For testing, soft labels are assigned to the test alarms. A confidence of zero indicates the maximum probability of being a false alarm. Similarly, a confidence of one represents the maximum probability of being a mine. Results on large and diverse GPR data collections show that the proposed modification to the classifier component can improve the overall performance of the EHD significantly.
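
    The model-based-regressor-with-log-grid-search pattern can be sketched without the paper's actual SVR/RVM machinery. As a lightweight, dependency-free stand-in we use kernel ridge regression (closed-form solve); the paper's SVR or RVM would slot into the same loop. All data, grid ranges and hyperparameters below are assumptions for illustration.

```python
import numpy as np

def krr_fit(X, y, lam, sigma):
    """Kernel ridge regression: alpha = (K + lam*I)^-1 y with an RBF kernel."""
    K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (2 * sigma ** 2))
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(Xnew, X, alpha, sigma):
    K = np.exp(-((Xnew[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (2 * sigma ** 2))
    return np.clip(K @ alpha, 0.0, 1.0)       # soft labels (confidences) in [0, 1]

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (80, 4))
y = (X[:, 0] > 0).astype(float)               # 1 = "mine", 0 = "false alarm"
Xtr, ytr, Xval, yval = X[:60], y[:60], X[60:], y[60:]

best = None
for lam in np.logspace(-3, 1, 5):             # logarithmic grid search
    for sigma in np.logspace(-1, 1, 5):
        alpha = krr_fit(Xtr, ytr, lam, sigma)
        err = np.abs(krr_predict(Xval, Xtr, alpha, sigma) - yval).mean()
        if best is None or err < best[0]:
            best = (err, lam, sigma)
```

    The clipped predictions play the role of the confidences described above: 0 for maximally clutter-like alarms, 1 for maximally mine-like ones.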

  19. Improving supervised classification accuracy using non-rigid multimodal image registration: detecting prostate cancer

    Science.gov (United States)

    Chappelow, Jonathan; Viswanath, Satish; Monaco, James; Rosen, Mark; Tomaszewski, John; Feldman, Michael; Madabhushi, Anant

    2008-03-01

    Computer-aided diagnosis (CAD) systems for the detection of cancer in medical images require precise labeling of training data. For magnetic resonance (MR) imaging (MRI) of the prostate, training labels define the spatial extent of prostate cancer (CaP); the most common source for these labels is expert segmentations. When ancillary data such as whole mount histology (WMH) sections, which provide the gold standard for cancer ground truth, are available, the manual labeling of CaP can be improved by referencing WMH. However, manual segmentation is error prone, time consuming and not reproducible. Therefore, we present the use of multimodal image registration to automatically and accurately transcribe CaP from histology onto MRI following alignment of the two modalities, in order to improve the quality of training data and hence classifier performance. We quantitatively demonstrate the superiority of this registration-based methodology by comparing its results to the manual CaP annotation of expert radiologists. Five supervised CAD classifiers were trained using the labels for CaP extent on MRI obtained by the expert and 4 different registration techniques. Two of the registration methods were affine schemes: one based on maximization of mutual information (MI), and the other a method we previously developed, Combined Feature Ensemble Mutual Information (COFEMI), which incorporates high-order statistical features for robust multimodal registration. Two non-rigid schemes were obtained by following the two affine registration methods with an elastic deformation step using thin-plate splines (TPS). In the absence of definitive ground truth for CaP extent on MRI, classifier accuracy was evaluated against 7 ground truth surrogates obtained by different combinations of the expert and registration segmentations. 
For 26 multimodal MRI-WMH image pairs, all four registration methods produced a higher area under the receiver operating characteristic curve compared to that

  20. Text Classification using Association Rule with a Hybrid Concept of Naive Bayes Classifier and Genetic Algorithm

    CERN Document Server

    Kamruzzaman, S M; Hasan, Ahmed Ryadh

    2010-01-01

    Text classification is the automated assignment of natural language texts to predefined categories based on their content. Text classification is the primary requirement of text retrieval systems, which retrieve texts in response to a user query, and of text understanding systems, which transform text in some way such as producing summaries, answering questions or extracting data. Nowadays the demand for text classification is increasing tremendously. Keeping this demand in consideration, new and updated techniques are being developed for automated text classification. This paper presents a new algorithm for text classification. Instead of using words, word relations, i.e. association rules, are used to derive the feature set from pre-classified text documents. The concept of the Naive Bayes Classifier is then used on the derived features, and finally a Genetic Algorithm is applied for the final classification. A system based on the proposed algorithm has been implemented and tested. The experimental ...

  1. Hierarchical trie packet classification algorithm based on expectation-maximization clustering

    Science.gov (United States)

    Bi, Xia-an; Zhao, Junxia

    2017-01-01

    With the development of computer network bandwidth, packet classification algorithms which are able to deal with large-scale rule sets are in urgent need. Among the existing algorithms, research on packet classification algorithms based on the hierarchical trie has become an important branch because of its wide practical use. Although the hierarchical trie is beneficial for saving large storage space, it has several shortcomings such as the existence of backtracking and empty nodes. This paper proposes a new packet classification algorithm, Hierarchical Trie Algorithm Based on Expectation-Maximization Clustering (HTEMC). First, this paper uses a formalization method to deal with the packet classification problem by mapping the rules and data packets into a two-dimensional space. Second, this paper uses the expectation-maximization algorithm to cluster the rules based on their aggregate characteristics, thereby forming diversified clusters. Third, this paper proposes a hierarchical trie based on the results of the expectation-maximization clustering. Finally, this paper conducts simulation experiments and real-environment experiments to compare the performance of the proposed algorithm with other typical algorithms, and analyzes the results of the experiments. The hierarchical trie structure in the proposed algorithm not only adopts trie path compression to eliminate backtracking, but also solves the problem of low efficiency of trie updates, which greatly improves the performance of the algorithm. PMID:28704476
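
    The clustering step can be sketched with a small expectation-maximization loop for a spherical Gaussian mixture over rules mapped to 2-D points. The mapping of rules to coordinates, the farthest-point initialization, and the two synthetic groups are assumptions for illustration, not HTEMC itself.

```python
import numpy as np

def em_cluster(points, k=2, iters=40):
    """EM for a spherical Gaussian mixture; returns hard cluster assignments."""
    n, d = points.shape
    means = points[[0]]                               # farthest-point initialization
    for _ in range(1, k):
        dist = ((points[:, None, :] - means[None, :, :]) ** 2).sum(-1).min(axis=1)
        means = np.vstack([means, points[dist.argmax()]])
    var = np.full(k, points.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities r[i, j] of component j for point i
        d2 = ((points[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        logp = np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - d2 / (2 * var)
        logp -= logp.max(axis=1, keepdims=True)       # numerical stability
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means and variances
        nk = r.sum(axis=0)
        pi = nk / n
        means = (r.T @ points) / nk[:, None]
        d2 = ((points[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        var = (r * d2).sum(axis=0) / (d * nk)
    return r.argmax(axis=1)

# hypothetical rules mapped into a two-dimensional space, forming two groups
rng = np.random.default_rng(1)
rules = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(5, 0.5, (30, 2))])
labels = em_cluster(rules)
```

    In HTEMC one hierarchical trie would then be built per resulting cluster.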

  2. Impact of corpus domain for sentiment classification: An evaluation study using supervised machine learning techniques

    Science.gov (United States)

    Karsi, Redouane; Zaim, Mounia; El Alami, Jamila

    2017-07-01

    Thanks to the development of the internet, a large community now has the possibility to communicate and express its opinions and preferences through multiple media such as blogs, forums, social networks and e-commerce sites. Today, it becomes clearer that opinions published on the web are a very valuable source for decision-making, so a rapidly growing field of research called “sentiment analysis” was born to address the problem of automatically determining the polarity (positive, negative, neutral, …) of textual opinions. People expressing themselves in a particular domain often use specific domain language expressions; thus, building a classifier which performs well in different domains is a challenging problem. The purpose of this paper is to evaluate the impact of domain for sentiment classification when using machine learning techniques. In our study, three popular machine learning techniques, Support Vector Machines (SVM), Naive Bayes and K-nearest neighbors (KNN), were applied on datasets collected from different domains. Experimental results show that Support Vector Machines outperforms the other classifiers in all domains, since it achieved at least 74.75% accuracy with a standard deviation of 4.08.

  3. Retinal vessel segmentation using the 2-D Gabor wavelet and supervised classification.

    Science.gov (United States)

    Soares, João V B; Leandro, Jorge J G; Cesar Júnior, Roberto M; Jelinek, Herbert F; Cree, Michael J

    2006-09-01

    We present a method for automated segmentation of the vasculature in retinal images. The method produces segmentations by classifying each image pixel as vessel or nonvessel, based on the pixel's feature vector. Feature vectors are composed of the pixel's intensity and two-dimensional Gabor wavelet transform responses taken at multiple scales. The Gabor wavelet is capable of tuning to specific frequencies, thus allowing noise filtering and vessel enhancement in a single step. We use a Bayesian classifier with class-conditional probability density functions (likelihoods) described as Gaussian mixtures, yielding a fast classification, while being able to model complex decision surfaces. The probability distributions are estimated based on a training set of labeled pixels obtained from manual segmentations. The method's performance is evaluated on the publicly available DRIVE (Staal et al., 2004) and STARE (Hoover et al., 2000) databases of manually labeled images. On the DRIVE database, it achieves an area under the receiver operating characteristic curve of 0.9614, slightly superior to that presented by state-of-the-art approaches. We are making our implementation available as open source MATLAB scripts for researchers interested in implementation details, evaluation, or development of methods.
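
    The Bayesian pixel classification step can be sketched with a single full-covariance Gaussian per class (the paper uses Gaussian mixtures per class, which would refine this). The two-dimensional synthetic feature vectors and the class priors are assumptions for illustration.

```python
import numpy as np

def fit_gaussian(X):
    mu = X.mean(axis=0)
    cov = np.cov(X.T) + 1e-6 * np.eye(X.shape[1])   # small ridge for stability
    return mu, cov

def log_likelihood(X, mu, cov):
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet
                   + np.einsum('ij,jk,ik->i', diff, inv, diff))

def bayes_classify(X, params, priors):
    """MAP rule: pick the class maximizing log-likelihood + log-prior."""
    scores = np.stack([log_likelihood(X, mu, cov) + np.log(p)
                       for (mu, cov), p in zip(params, priors)], axis=1)
    return scores.argmax(axis=1)   # 0 = nonvessel, 1 = vessel

rng = np.random.default_rng(0)
nonvessel = rng.normal([0.2, 0.1], 0.05, (300, 2))   # hypothetical feature vectors
vessel = rng.normal([0.6, 0.5], 0.05, (100, 2))
params = [fit_gaussian(nonvessel), fit_gaussian(vessel)]
priors = [0.75, 0.25]
pred = bayes_classify(np.vstack([nonvessel, vessel]), params, priors)
```

    In the paper's setting the feature vectors would be pixel intensity plus multi-scale Gabor responses, and each likelihood a Gaussian mixture rather than a single Gaussian.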

  4. Improvements on coronal hole detection in SDO/AIA images using supervised classification

    CERN Document Server

    Reiss, Martin A; De Visscher, Ruben; Temmer, Manuela; Veronig, Astrid M; Delouille, Véronique; Mampaey, Benjamin; Ahammer, Helmut

    2015-01-01

    We demonstrate the use of machine learning algorithms in combination with segmentation techniques in order to distinguish coronal holes and filaments in SDO/AIA EUV images of the Sun. Based on two coronal hole detection techniques (intensity-based thresholding, SPoCA), we prepared data sets of manually labeled coronal hole and filament channel regions present on the Sun during the time range 2011 - 2013. By mapping the extracted regions from EUV observations onto HMI line-of-sight magnetograms we also include their magnetic characteristics. We computed shape measures from the segmented binary maps as well as first order and second order texture statistics from the segmented regions in the EUV images and magnetograms. These attributes were used for data mining investigations to identify the most performant rule to differentiate between coronal holes and filament channels. We applied several classifiers, namely Support Vector Machine, Linear Support Vector Machine, Decision Tree, and Random Forest and found tha...

  5. Experimental assessment of an automatic breast density classification algorithm based on principal component analysis applied to histogram data

    Science.gov (United States)

    Angulo, Antonio; Ferrer, Jose; Pinto, Joseph; Lavarello, Roberto; Guerrero, Jorge; Castaneda, Benjamín.

    2015-01-01

    Breast parenchymal density is considered a strong indicator of cancer risk. However, measures of breast density are often qualitative and require the subjective judgment of radiologists. This work proposes a supervised algorithm to automatically assign a BI-RADS breast density score to a digital mammogram. The algorithm applies principal component analysis to the histograms of a training dataset of digital mammograms to create four different spaces, one for each BI-RADS category. Scoring is achieved by projecting the histogram of the image to be classified onto the four spaces and assigning it to the closest class. In order to validate the algorithm, a training set of 86 images and a separate testing database of 964 images were built. All mammograms were acquired in the craniocaudal view from female patients without any visible pathology. Eight experienced radiologists categorized the mammograms according to a BI-RADS score and the mode of their evaluations was considered as ground truth. Results show better agreement between the algorithm and ground truth for the training set (kappa = 0.74) than for the test set (kappa = 0.44), which suggests the method may be used for BI-RADS classification but better training is required.
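
    The project-onto-four-spaces rule can be sketched as one PCA subspace per class with classification by smallest reconstruction error. The synthetic "histograms" (Gaussian bumps at class-specific bins), the bin count, and the number of components are all assumptions standing in for real mammogram histograms.

```python
import numpy as np

def fit_pca_space(H, n_components=3):
    """PCA space for one class: mean histogram plus top principal directions."""
    mean = H.mean(axis=0)
    U, S, Vt = np.linalg.svd(H - mean, full_matrices=False)
    return mean, Vt[:n_components]

def reconstruction_error(h, mean, components):
    coeffs = components @ (h - mean)         # project onto the class space
    recon = mean + components.T @ coeffs     # reconstruct from the projection
    return np.linalg.norm(h - recon)

def classify(h, spaces):
    errs = [reconstruction_error(h, m, C) for m, C in spaces]
    return int(np.argmin(errs))              # closest class wins

def synthetic_histograms(center, n, bins=64, seed=0):
    """Stand-in histograms: Gaussian bumps near a class-specific bin."""
    rng = np.random.default_rng(seed)
    x = np.arange(bins)
    H = np.exp(-(x - rng.normal(center, 2, (n, 1))) ** 2 / 50.0)
    return H / H.sum(axis=1, keepdims=True)

centers = [10, 25, 40, 55]                   # one hypothetical mode per category
spaces = [fit_pca_space(synthetic_histograms(c, 30, seed=i))
          for i, c in enumerate(centers)]
test_h = synthetic_histograms(25, 1, seed=99)[0]
label = classify(test_h, spaces)
```

    A histogram is assigned to the category whose subspace reconstructs it with the least error, which mirrors the "closest class" rule in the abstract.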

  6. A Locality-Constrained and Label Embedding Dictionary Learning Algorithm for Image Classification.

    Science.gov (United States)

    Zhengming Li; Zhihui Lai; Yong Xu; Jian Yang; Zhang, David

    2017-02-01

    Locality and label information of training samples play an important role in image classification. However, previous dictionary learning algorithms do not take the locality and label information of atoms into account together in the learning process, and thus their performance is limited. In this paper, a discriminative dictionary learning algorithm, called the locality-constrained and label embedding dictionary learning (LCLE-DL) algorithm, was proposed for image classification. First, the locality information was preserved using the graph Laplacian matrix of the learned dictionary instead of the conventional one derived from the training samples. Then, the label embedding term was constructed using the label information of atoms instead of the classification error term, which contained discriminating information of the learned dictionary. The optimal coding coefficients derived by the locality-based and label-based reconstruction were effective for image classification. Experimental results demonstrated that the LCLE-DL algorithm can achieve better performance than some state-of-the-art algorithms.
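
    The locality term above rests on the graph Laplacian of an affinity graph over dictionary atoms. As a hedged illustration (the paper's exact affinity construction may differ), one common choice is a symmetric k-NN graph with L = D - W, so that the quadratic form x'Lx penalises coding coefficients that differ across strongly connected atoms:

```python
import numpy as np

def knn_affinity(D, k=2):
    """Symmetric k-NN affinity between dictionary atoms (columns of D)."""
    n = D.shape[1]
    dist = np.linalg.norm(D[:, :, None] - D[:, None, :], axis=0)
    W = np.zeros((n, n))
    for j in range(n):
        for i in np.argsort(dist[:, j])[1:k + 1]:  # skip self (distance 0)
            W[i, j] = W[j, i] = 1.0
    return W

def graph_laplacian(W):
    """Unnormalised graph Laplacian L = D - W of an affinity matrix W."""
    return np.diag(W.sum(axis=1)) - W
```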

  7. A survey of supervised machine learning models for mobile-phone based pathogen identification and classification

    Science.gov (United States)

    Ceylan Koydemir, Hatice; Feng, Steve; Liang, Kyle; Nadkarni, Rohan; Tseng, Derek; Benien, Parul; Ozcan, Aydogan

    2017-03-01

    Giardia lamblia causes a disease known as giardiasis, which results in diarrhea, abdominal cramps, and bloating. Although conventional pathogen detection methods used in water analysis laboratories offer high sensitivity and specificity, they are time consuming, and need experts to operate bulky equipment and analyze the samples. Here we present a field-portable and cost-effective smartphone-based waterborne pathogen detection platform that can automatically classify Giardia cysts using machine learning. Our platform enables the detection and quantification of Giardia cysts in one hour, including sample collection, labeling, filtration, and automated counting steps. We evaluated the performance of three prototypes using Giardia-spiked water samples from different sources (e.g., reagent-grade, tap, non-potable, and pond water samples). We populated a training database with >30,000 cysts and estimated our detection sensitivity and specificity using 20 different classifier models, including decision trees, nearest neighbor classifiers, support vector machines (SVMs), and ensemble classifiers, and compared their speed of training and classification, as well as predicted accuracies. Among them, cubic SVM, medium Gaussian SVM, and bagged-trees were the most promising classifier types with accuracies of 94.1%, 94.2%, and 95%, respectively; we selected the latter as our preferred classifier for the detection and enumeration of Giardia cysts that are imaged using our mobile-phone fluorescence microscope. Without the need for any experts or microbiologists, this field-portable pathogen detection platform can present a useful tool for water quality monitoring in resource-limited-settings.

  8. Supervised novelty detection in brain tissue classification with an application to white matter hyperintensities

    Science.gov (United States)

    Kuijf, Hugo J.; Moeskops, Pim; de Vos, Bob D.; Bouvy, Willem H.; de Bresser, Jeroen; Biessels, Geert Jan; Viergever, Max A.; Vincken, Koen L.

    2016-03-01

    Novelty detection is concerned with identifying test data that differs from the training data of a classifier. In the case of brain MR images, pathology or imaging artefacts are examples of untrained data. In this proof-of-principle study, we measure the behaviour of a classifier during the classification of trained labels (i.e. normal brain tissue). Next, we devise a measure that distinguishes normal classifier behaviour from the abnormal behaviour that occurs in the case of a novelty. This is evaluated by training a kNN classifier on normal brain tissue, applying it to images with an untrained pathology (white matter hyperintensities (WMH)), and determining whether our measure is able to identify abnormal classifier behaviour at WMH locations. For our kNN classifier, behaviour is modelled as the mean, median, or q1 distance to the k nearest points. Healthy tissue was trained on 15 images; classifier behaviour was trained/tested on 5 images with leave-one-out cross-validation. For each trained class, we measure the distribution of mean/median/q1 distances to the k nearest points. Next, for each test voxel, we compute its Z-score with respect to the measured distribution of its predicted label. We consider a Z-score >=4 abnormal behaviour of the classifier, having a probability due to chance of 0.000032. Our measure identified >90% of WMH volume and also highlighted other non-trained findings, predominantly vessels, the cerebral falx, brain mask errors, and choroid plexus. This measure is generalizable to other classifiers and might help in detecting unexpected findings or novelties by measuring classifier behaviour.
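
    The behaviour measure can be sketched as follows, assuming (as one plausible reading) Euclidean distances and the mean-distance variant: the distribution of mean kNN distances is estimated on held-out normal data, and a novelty is flagged when the Z-score reaches 4.

```python
import numpy as np

def knn_mean_distance(X_train, x, k):
    """Mean Euclidean distance from x to its k nearest training points."""
    d = np.sort(np.linalg.norm(X_train - x, axis=1))
    return d[:k].mean()

def fit_behaviour(X_train, X_val, k=5):
    """Model 'normal' classifier behaviour as the distribution of mean
    kNN distances observed on held-out normal data."""
    dists = np.array([knn_mean_distance(X_train, x, k) for x in X_val])
    return dists.mean(), dists.std()

def novelty_zscore(X_train, x, mu, sigma, k=5):
    """Z-score of a test point's mean kNN distance; >=4 flags a novelty."""
    return (knn_mean_distance(X_train, x, k) - mu) / sigma
```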

  9. Seasonal Separation of African Savanna Components Using Worldview-2 Imagery: A Comparison of Pixel- and Object-Based Approaches and Selected Classification Algorithms

    Directory of Open Access Journals (Sweden)

    Żaneta Kaszta

    2016-09-01

    Full Text Available Separation of savanna land cover components is challenging due to the high heterogeneity of this landscape and spectral similarity of compositionally different vegetation types. In this study, we tested the usability of very high spatial and spectral resolution WorldView-2 (WV-2) imagery to classify land cover components of African savanna in the wet and dry seasons. We compared the performance of Object-Based Image Analysis (OBIA) and the pixel-based approach with several algorithms: k-nearest neighbor (k-NN), maximum likelihood (ML), random forests (RF), classification and regression trees (CART) and support vector machines (SVM). Results showed that classifications of WV-2 imagery produce high accuracy results (>77%) regardless of the applied classification approach. However, OBIA had a significantly higher accuracy for almost every classifier, with the highest overall accuracy score of 93%. Amongst the tested classifiers, SVM and RF provided the highest accuracies. Overall, classifications of the wet season image provided better results, with 93% for RF. However, considering woody leaf-off conditions, the dry season classification also performed well, with an overall accuracy of 83% (SVM) and a high producer accuracy for tree cover (91%). Our findings demonstrate the potential of imagery like WorldView-2 with OBIA and advanced supervised machine-learning algorithms in seasonal fine-scale land cover classification of African savanna.

  10. Cloud classification from satellite data using a fuzzy sets algorithm: A polar example

    Science.gov (United States)

    Key, J. R.; Maslanik, J. A.; Barry, R. G.

    1988-01-01

    Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.
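
    A minimal fuzzy c-means iteration, for illustration (the initialisation and fixed iteration count below are simplified choices, not those of the cited study): each point's membership in a cluster is inversely related to its distance from that cluster's centre, and centres are updated as fuzzy-weighted means.

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=50):
    """Minimal fuzzy c-means sketch: every point receives a membership in
    every cluster rather than a hard assignment."""
    # simple deterministic init: evenly spaced samples as starting centres
    centres = X[np.linspace(0, len(X) - 1, c).astype(int)]
    for _ in range(n_iter):
        # point-to-centre distances, kept strictly positive for stability
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-9
        u = 1.0 / d ** (2.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)           # memberships sum to one
        w = u ** m
        centres = (w.T @ X) / w.sum(axis=0)[:, None]  # fuzzy-weighted means
    return u, centres
```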

  11. Cloud classification from satellite data using a fuzzy sets algorithm - A polar example

    Science.gov (United States)

    Key, J. R.; Maslanik, J. A.; Barry, R. G.

    1989-01-01

    Where spatial boundaries between phenomena are diffuse, classification methods which construct mutually exclusive clusters seem inappropriate. The Fuzzy c-means (FCM) algorithm assigns each observation to all clusters, with membership values as a function of distance to the cluster center. The FCM algorithm is applied to AVHRR data for the purpose of classifying polar clouds and surfaces. Careful analysis of the fuzzy sets can provide information on which spectral channels are best suited to the classification of particular features, and can help determine likely areas of misclassification. General agreement in the resulting classes and cloud fraction was found between the FCM algorithm, a manual classification, and an unsupervised maximum likelihood classifier.

  12. Automatic classification of endogenous seismic sources within a landslide body using random forest algorithm

    Science.gov (United States)

    Provost, Floriane; Hibert, Clément; Malet, Jean-Philippe; Stumpf, André; Doubre, Cécile

    2016-04-01

    Different studies have shown the presence of microseismic activity in soft-rock landslides. The seismic signals exhibit significantly different features in the time and frequency domains which allow their classification and interpretation. Most of the classes could be associated with different mechanisms of deformation occurring within and at the surface (e.g. rockfall, slide-quake, fissure opening, fluid circulation). However, some signals remain not fully understood, and some classes contain too few examples to permit any interpretation. To move toward a more complete interpretation of the links between the dynamics of soft-rock landslides and the physical processes controlling their behaviour, a complete catalog of the endogenous seismicity is needed. We propose a multi-class detection method based on the random forests algorithm to automatically classify the source of seismic signals. Random forests is a supervised machine learning technique based on the computation of a large number of decision trees. The multiple decision trees are constructed from training sets including each of the target classes. In the case of seismic signals, these attributes may encompass spectral features but also waveform characteristics, multi-station observations and other relevant information. The Random Forest classifier is used because it provides state-of-the-art performance when compared with other machine learning techniques (e.g. SVM, Neural Networks) and requires no fine tuning. Furthermore it is relatively fast, robust, easy to parallelize, and inherently suitable for multi-class problems. In this work, we present the first results of the classification method applied to the seismicity recorded at the Super-Sauze landslide between 2013 and 2015. We selected a dozen seismic signal features that precisely characterize the spectral content of the signals (e.g. central frequency, spectrum width, energy in several frequency bands, spectrogram shape, spectrum local and global maxima).
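
    Two of the listed spectral attributes, the central frequency (spectral centroid) and the spectrum width (spectral spread), can be computed from the power spectrum. This is a generic sketch, not the authors' feature extractor:

```python
import numpy as np

def spectral_features(signal, fs):
    """Spectral centroid ('central frequency') and spread ('spectrum
    width') of a signal sampled at fs Hz, from its power spectrum."""
    spec = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    p = spec / spec.sum()                     # normalised power distribution
    centroid = (freqs * p).sum()              # power-weighted mean frequency
    spread = np.sqrt(((freqs - centroid) ** 2 * p).sum())
    return centroid, spread
```

Features like these, stacked per event, would form the attribute vectors fed to the Random Forest.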

  14. Study on An Absolute Non-Collision Hash and Jumping Table IP Classification Algorithms

    Institute of Scientific and Technical Information of China (English)

    SHANG Feng-jun; PAN Ying-jun

    2004-01-01

    To classify packets, we propose a novel IP classification algorithm based on a non-collision hash and jumping-table Trie-tree (NHJTTT), which builds on the non-collision hash Trie-tree and the 2-dimensional classification algorithm of Lakshman and Stiliadis (LS algorithm). The core of the algorithm consists of two parts: constructing the non-collision hash function, based mainly on the destination/source port and protocol type fields so that the hash function avoids the space explosion problem; and introducing a jumping-table Trie-tree based on the LS algorithm in order to reduce time complexity. The test results show that the classification rate of the NHJTTT algorithm is up to 1 million packets per second and the maximum memory consumed is 9 MB for 10,000 rules.
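
    The role of the hash stage can be illustrated with a toy rule table keyed on the protocol and destination-port fields named above. A Python dict stands in for the non-collision hash table (the rule names are invented for illustration), and the jumping-table Trie-tree used for prefix fields is omitted:

```python
# Hypothetical rule table keyed on (protocol, destination port),
# mapping each exact-match key to an action.
rules = {
    ("tcp", 80): "allow-http",
    ("tcp", 443): "allow-https",
    ("udp", 53): "allow-dns",
}

def classify_packet(protocol, dst_port, default="drop"):
    """O(1) average-case lookup of the matching rule for a packet."""
    return rules.get((protocol, dst_port), default)
```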

  15. A Novel Algorithm of Network Trade Customer Classification Based on Fourier Basis Functions

    Directory of Open Access Journals (Sweden)

    Li Xinwu

    2013-11-01

    Full Text Available Learning algorithms are a central topic in neural network theory, research, and application, and feed-forward network learning in particular still lacks a satisfactory solution owing to its slow calculation speed. This paper presents a new Fourier basis function neural network algorithm and applies it to classifying network trade customers. First, 21 customer classification indicators are designed based on an analysis of the characteristics and behaviours of network trade customers, including customer characteristic variables and customer behaviour variables. Second, Fourier basis functions are used to improve the calculation flow and algorithm structure of the original BP neural network algorithm to speed up its convergence, yielding a new Fourier basis neural network model. Finally, experimental results show that the convergence speed problem is solved and classification accuracy is maintained when the new algorithm is applied to network trade customer classification in practice.

  16. A comparison of classification techniques for glacier change detection using multispectral images

    OpenAIRE

    Rahul Nijhawan; Pradeep Garg; Praveen Thakur

    2016-01-01

    The main aim of this paper is to compare the classification accuracies of glacier change detection by the following classifiers: a sub-pixel classification algorithm, indices-based supervised classification, and an object-based algorithm, using Landsat imageries. It was observed that the shadow effect was not removed in sub-pixel based classification but was removed by the indices method. The accuracy was further improved by object-based classification. The objective of the paper is to analyse different classific...

  17. SLEAS: Supervised Learning using Entropy as Attribute Selection Measure

    Directory of Open Access Journals (Sweden)

    Kishor Kumar Reddy C

    2014-10-01

    Full Text Available There is growing importance in scaling up the broadly used decision tree learning algorithms to huge datasets. Even though abundant diverse methodologies have been proposed, a fast tree-growing algorithm without a substantial decrease in accuracy or a substantial increase in space complexity is still needed to a great extent. This paper aims at improving the performance of the SLIQ (Supervised Learning in Quest) decision tree algorithm for classification in data mining. In the present research, we adopted entropy as the attribute selection measure, which overcomes the problems faced with the Gini index. The classification accuracy of the proposed supervised learning using entropy as attribute selection measure (SLEAS) algorithm is compared with the existing SLIQ algorithm using twelve datasets taken from the UCI Machine Learning Repository, and the results show that SLEAS outperforms the SLIQ decision tree. Further, the error rate is also computed, and the results clearly show that the SLEAS algorithm gives a lower error rate than the SLIQ decision tree.
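
    The entropy-based attribute selection at the heart of SLEAS can be sketched as information gain, i.e. the reduction in label entropy achieved by splitting on an attribute. This is a generic formulation, not the paper's implementation:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Gain = H(labels) minus the weighted entropy of the label subsets
    induced by splitting on the attribute at attr_index."""
    total = entropy(labels)
    by_value = {}
    for row, y in zip(rows, labels):
        by_value.setdefault(row[attr_index], []).append(y)
    remainder = sum(len(ys) / len(labels) * entropy(ys)
                    for ys in by_value.values())
    return total - remainder
```

A tree grower would split on the attribute with the highest gain at each node.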

  18. Competitive evaluation of data mining algorithms for use in classification of leukocyte subtypes with Raman microspectroscopy.

    Science.gov (United States)

    Maguire, A; Vega-Carrascal, I; Bryant, J; White, L; Howe, O; Lyng, F M; Meade, A D

    2015-04-07

    Raman microspectroscopy has been investigated for some time for use in label-free cell sorting devices. These approaches require coupling of the Raman spectrometer to complex data mining algorithms for identification of cellular subtypes such as the leukocyte subpopulations of lymphocytes and monocytes. In this study, three distinct multivariate classification approaches (PCA-LDA, SVMs and Random Forests) are developed and tested on their ability to classify the cellular subtype in extracted peripheral blood mononuclear cells (T-cell lymphocytes from myeloid cells), and are evaluated in terms of their respective classification performance. A strategy for optimisation of each of the classification algorithms is presented, with emphasis on reduction of model complexity in each of the algorithms. The relative classification performance and performance characteristics are highlighted, overall suggesting the radial basis function SVM as a robust option for classification of leukocytes with Raman microspectroscopy.

  19. Preliminary hard and soft bottom seafloor substrate map derived from a supervised classification of bathymetry derived from multispectral World View-2 satellite imagery of Ni'ihau Island, Territory of Main Hawaiian Islands, USA

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — Preliminary hard and soft seafloor substrate map derived from a supervised classification from multispectral World View-2 satellite imagery of Ni'ihau Island,...

  20. A Non-Collision Hash Trie-Tree Based FastIP Classification Algorithm

    Institute of Scientific and Technical Information of China (English)

    徐恪; 吴建平; 喻中超; 徐明伟

    2002-01-01

    With the development of network applications, routers must support functions such as firewalls, provision of QoS, traffic billing, etc. All these functions require the classification of IP packets, which determines how each packet is subsequently processed. In this article, a novel IP classification algorithm is proposed based on the Grid of Tries algorithm. The new algorithm not only eliminates the original limitations in the case of multiple fields but also shows better performance in regard to both time and space. It has better overall performance than many other algorithms.

  1. A TCAM-based Two-dimensional Prefix Packet Classification Algorithm

    Institute of Scientific and Technical Information of China (English)

    王志恒; 刘刚; 白英彩

    2004-01-01

    Packet classification (PC) has become the main method to support the quality of service and security of network applications, and two-dimensional prefix packet classification (PPC) is a popular variant. This paper analyzes the problem of rule conflict and then presents a TCAM-based two-dimensional PPC algorithm. The algorithm makes use of the parallelism of TCAM to look up the longest prefix in one instruction cycle. It then uses a memory image and associated data structures to eliminate the conflicts between rules, and performs fast two-dimensional PPC. Compared with other algorithms, this algorithm has the lowest time complexity and lower space complexity.

  2. Classification of hyperspectral remote sensing images based on simulated annealing genetic algorithm and multiple instance learning

    Institute of Scientific and Technical Information of China (English)

    高红民; 周惠; 徐立中; 石爱业

    2014-01-01

    A hybrid feature selection and classification strategy was proposed based on the simulated annealing genetic algorithm and multiple instance learning (MIL). The band selection method was based on subspace decomposition, combining the simulated annealing algorithm with the genetic algorithm by choosing different cross-over and mutation probabilities, as well as mutation individuals. MIL was then combined with image segmentation, clustering and support vector machine algorithms to classify hyperspectral images. The experimental results show that the proposed method achieves a high classification accuracy of 93.13% with small training samples and overcomes the weaknesses of the conventional methods.

  3. Data Mining Algorithms for Classification of Complex Biomedical Data

    Science.gov (United States)

    Lan, Liang

    2012-01-01

    In my dissertation, I will present my research, which contributes to solving the following three open problems in biomedical informatics: (1) Multi-task approaches for microarray classification; (2) Multi-label classification of gene and protein prediction from multi-source biological data; (3) Spatial scan for movement data. In microarray…

  4. Genetic Bee Colony (GBC) algorithm: A new gene selection method for microarray cancer classification.

    Science.gov (United States)

    Alshamlan, Hala M; Badr, Ghada H; Alohali, Yousef A

    2015-06-01

    Naturally inspired evolutionary algorithms prove effective when used for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, namely the Genetic Bee Colony (GBC) algorithm. The proposed algorithm combines the use of a Genetic Algorithm (GA) with the Artificial Bee Colony (ABC) algorithm. The goal is to integrate the advantages of both algorithms. The proposed algorithm is applied to a microarray gene expression profile in order to select the most predictive and informative genes for cancer classification. In order to test the accuracy performance of the proposed algorithm, extensive experiments were conducted. Three binary microarray datasets are used, which include: colon, leukemia, and lung. In addition, another three multi-class microarray datasets are used, which are: SRBCT, lymphoma, and leukemia. Results of the GBC algorithm are compared with our recently proposed technique: mRMR when combined with the Artificial Bee Colony algorithm (mRMR-ABC). We also compared the combination of mRMR with GA (mRMR-GA) and Particle Swarm Optimization (mRMR-PSO) algorithms. In addition, we compared the GBC algorithm with other related algorithms that have been recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance as it achieved the highest classification accuracy along with the lowest average number of selected genes. This proves that the GBC algorithm is a promising approach for solving the gene selection problem in both binary and multi-class cancer classification.

  5. Applications of feature selection. [development of classification algorithms for LANDSAT data

    Science.gov (United States)

    Guseman, L. F., Jr.

    1976-01-01

    The use of satellite-acquired (LANDSAT) multispectral scanner (MSS) data to conduct an inventory of some crop of economic interest such as wheat over a large geographical area is considered in relation to the development of accurate and efficient algorithms for data classification. The dimension of the measurement space and the computational load for a classification algorithm is increased by the use of multitemporal measurements. Feature selection/combination techniques used to reduce the dimensionality of the problem are described.

  6. Hybrid model based on Genetic Algorithms and SVM applied to variable selection within fruit juice classification.

    Science.gov (United States)

    Fernandez-Lozano, C; Canto, C; Gestal, M; Andrade-Garda, J M; Rabuñal, J R; Dorado, J; Pazos, A

    2013-01-01

    Given the background of the use of Neural Networks in problems of apple juice classification, this paper aims at implementing a newly developed method from the field of machine learning: Support Vector Machines (SVM). A hybrid model that combines genetic algorithms and support vector machines is therefore suggested in such a way that, by using the SVM as the fitness function of the Genetic Algorithm (GA), the most representative variables for a specific classification problem can be selected.
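
    The GA-wrapped selection loop can be sketched as follows. For self-containment, a cross-validated nearest-centroid accuracy stands in for the SVM fitness function (an SVM scorer would be a drop-in replacement), and the GA operators used here (truncation selection, uniform crossover, bit-flip mutation) are illustrative choices, not the paper's exact configuration:

```python
import numpy as np

def cv_accuracy(X, y, mask, folds=4):
    """Fitness: cross-validated nearest-centroid accuracy on the selected
    variables (a lightweight stand-in for the SVM used as fitness)."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    idx = np.arange(len(y))
    correct = 0
    for f in range(folds):
        test = idx % folds == f
        train = ~test
        centroids = {c: Xs[train & (y == c)].mean(axis=0) for c in np.unique(y)}
        for xi, yi in zip(Xs[test], y[test]):
            pred = min(centroids, key=lambda c: np.linalg.norm(xi - centroids[c]))
            correct += pred == yi
    return correct / len(y)

def ga_select(X, y, pop=20, gens=15, seed=0):
    """Binary-encoded GA: each chromosome is a variable subset, evolved
    toward higher fitness; the top half survives each generation."""
    rng = np.random.default_rng(seed)
    P = rng.random((pop, X.shape[1])) < 0.5
    for _ in range(gens):
        fit = np.array([cv_accuracy(X, y, m) for m in P])
        P = P[np.argsort(fit)[::-1]]                       # best first
        children = []
        while len(children) < pop - pop // 2:
            a, b = P[rng.integers(pop // 2, size=2)]       # parents from top half
            child = np.where(rng.random(X.shape[1]) < 0.5, a, b)  # uniform crossover
            flip = rng.random(X.shape[1]) < 0.05           # bit-flip mutation
            children.append(child ^ flip)
        P = np.vstack([P[: pop // 2], children])
    fit = np.array([cv_accuracy(X, y, m) for m in P])
    return P[fit.argmax()]
```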

  7. Development of a Fingerprint Gender Classification Algorithm Using Fingerprint Global Features

    OpenAIRE

    S. F. Abdullah; A.F.N.A. Rahman; Z.A.Abas; W.H.M Saad

    2016-01-01

    In the forensic world, the process of identifying and calculating fingerprint features is complex and time-consuming when done manually using a fingerprint laboratory magnifying glass. This study is meant to enhance the forensic manual method by proposing a new algorithm for fingerprint global feature extraction for gender classification. The result shows that the new algorithm gives acceptable readings, with a classification rate above 70%, when compared to the manual method...

  8. Value Focused Thinking Applications to Supervised Pattern Classification With Extensions to Hyperspectral Anomaly Detection Algorithms

    Science.gov (United States)

    2015-03-26

    upfront resampling, with replacement, of both the response values and the feature values. This in essence is treating the original set of data as if... accounts for the stochastic noise by resampling it, with replacement, from the original conditional probability distribution. In this research, both... Domingos' variance values seem to be larger than either TPF or FPF values when accounting for all of the runs, which is not apparent when simply

  9. Random forest algorithm for classification of multiwavelength data

    Institute of Scientific and Technical Information of China (English)

    Dan Gao; Yan-Xia Zhang; Yong-Heng Zhao

    2009-01-01

    We introduced a decision tree method called Random Forests for multiwavelength data classification. The data were adopted from different databases, including the Sloan Digital Sky Survey (SDSS) Data Release Five, USNO, FIRST and ROSAT. We then studied the discrimination of quasars from stars and the classification of quasars, stars and galaxies with the sample from optical and radio bands and with that from optical and X-ray bands. Moreover, feature selection and feature weighting based on Random Forests were investigated. The performances based on different input patterns were compared. The experimental results show that the random forest method is an effective method for astronomical object classification and can be applied to other classification problems faced in astronomy. In addition, Random Forests shows its strengths through its own merits, e.g. classification, feature selection, feature weighting, as well as outlier detection.

  10. Based on Perceptron Object Classification Algorithms for Processing of Agricultural Field Images

    OpenAIRE

    Ganchenko, V.; Doudkin, A.; Pawlowski, T.; Petrovsky, A.; Sadykhov, R.

    2012-01-01

    Neural network algorithms for object classification are considered in this paper as applied to disease area recognition in agricultural field images. The images are represented as reduced normalized histograms. The classification is carried out in RGB and HSV space using a multilayer perceptron.

  11. Classification of Noisy Data: An Approach Based on Genetic Algorithms and Voronoi Tessellation

    DEFF Research Database (Denmark)

    Khan, Abdul Rauf; Schiøler, Henrik; Knudsen, Torben

    2016-01-01

    on the partitioning of information space; and (2) use of the genetic algorithm to solve combinatorial problems for classification. In particular, we will implement our methodology to solve complex classification problems and compare the performance of our classifier with other well-known methods (SVM, KNN, and ANN...

  13. Quantum Algorithm for K-Nearest Neighbors Classification Based on the Metric of Hamming Distance

    Science.gov (United States)

    Ruan, Yue; Xue, Xiling; Liu, Heng; Tan, Jianing; Li, Xi

    2017-08-01

    K-nearest neighbors (KNN) is a common algorithm used for classification, and also a sub-routine in various complicated machine learning tasks. In this paper, we present a quantum algorithm (QKNN) implementing this algorithm based on the metric of Hamming distance. We put forward a quantum circuit for computing the Hamming distance between a testing sample and each feature vector in the training set. Taking advantage of this method, we realized a good analog of the classical KNN algorithm by setting a distance threshold value t to select the k nearest neighbors. As a result, QKNN achieves O(n^3) performance, which depends only on the dimension of the feature vectors, and high classification accuracy, outperforming Lloyd's algorithm (Lloyd et al. 2013) and Wiebe's algorithm (Wiebe et al. 2014).
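
    The classical counterpart of the QKNN decision step, KNN under the Hamming metric with a majority vote, is straightforward (a plain sketch, not the quantum circuit):

```python
import numpy as np

def hamming_knn(train_X, train_y, x, k=3):
    """Rank training bit-vectors by Hamming distance to x and take a
    majority vote over the k nearest."""
    d = (train_X != x).sum(axis=1)            # Hamming distance per row
    nearest = np.argsort(d, kind="stable")[:k]
    values, counts = np.unique(train_y[nearest], return_counts=True)
    return values[counts.argmax()]
```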

  14. Examining applying high performance genetic data feature selection and classification algorithms for colon cancer diagnosis.

    Science.gov (United States)

    Al-Rajab, Murad; Lu, Joan; Xu, Qiang

    2017-07-01

    This paper examines the accuracy and efficiency (time complexity) of high performance genetic data feature selection and classification algorithms for colon cancer diagnosis. The need for this research derives from the urgent and increasing need for accurate and efficient algorithms. Colon cancer is a leading cause of death worldwide, hence it is vitally important for the cancer tissues to be expertly identified and classified in a rapid and timely manner, to assure both a fast detection of the disease and to expedite the drug discovery process. In this research, a three-phase approach was proposed and implemented: Phases One and Two examined the feature selection algorithms and classification algorithms employed separately, and Phase Three examined the performance of the combination of these. It was found from Phase One that the Particle Swarm Optimization (PSO) algorithm performed best with the colon dataset as a feature selector (29 genes selected) and from Phase Two that the Support Vector Machine (SVM) algorithm outperformed other classifications, with an accuracy of almost 86%. It was also found from Phase Three that the combined use of PSO and SVM surpassed other algorithms in accuracy and performance, and was faster in terms of time analysis (94%). It is concluded that applying feature selection algorithms prior to classification algorithms results in better accuracy than when the latter are applied alone. This conclusion is important and significant to industry and society. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Fast weighted K-view-voting algorithm for image texture classification

    Science.gov (United States)

    Liu, Hong; Lan, Yihua; Wang, Qian; Jin, Renchao; Song, Enmin; Hung, Chih-Cheng

    2012-02-01

    We propose an innovative and efficient approach to improve K-view-template (K-view-T) and K-view-datagram (K-view-D) algorithms for image texture classification. The proposed approach, called the weighted K-view-voting algorithm (K-view-V), uses a novel voting method for texture classification and an accelerating method based on the efficient summed square image (SSI) scheme as well as fast Fourier transform (FFT) to enable overall faster processing. Decision making, which assigns a pixel to a texture class, occurs by using our weighted voting method among the "promising" members in the neighborhood of a classified pixel. In other words, this neighborhood consists of all the views, and each view has a classified pixel in its territory. Experimental results on benchmark images, which are randomly taken from Brodatz Gallery and natural and medical images, show that this new classification algorithm gives higher classification accuracy than existing K-view algorithms. In particular, it improves the accurate classification of pixels near the texture boundary. In addition, the proposed acceleration method improves the processing speed of K-view-V as it requires much less computation time than other K-view algorithms. Compared with the results of earlier developed K-view algorithms and the gray level co-occurrence matrix (GLCM), the proposed algorithm is more robust, faster, and more accurate.
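
    The summed square image (SSI) acceleration rests on the integral-image trick: after one cumulative pass over the image, the sum of squared intensities inside any rectangular view costs four lookups, independent of window size. A minimal sketch (the half-open window bounds are an implementation choice here, not taken from the paper):

```python
import numpy as np

def summed_square_image(img):
    """Integral image of squared intensities, zero-padded on the top/left
    so that ssi[i, j] equals (img[:i, :j] ** 2).sum()."""
    return np.pad(np.cumsum(np.cumsum(img.astype(float) ** 2, 0), 1),
                  ((1, 0), (1, 0)))

def window_sum_sq(ssi, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] ** 2 via the inclusion-exclusion corners."""
    return ssi[r1, c1] - ssi[r0, c1] - ssi[r1, c0] + ssi[r0, c0]
```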

  16. Differential characteristic set algorithm for the complete symmetry classification of partial differential equations

    Institute of Scientific and Technical Information of China (English)

    Chaolu Temuer; Yu-shan BAI

    2009-01-01

    In this paper, we present a differential polynomial characteristic set algorithm for the complete symmetry classification of partial differential equations (PDEs) with some parameters. It can make the solution to the complete symmetry classification problem for PDEs direct and systematic. As an illustrative example, the complete potential symmetry classifications of nonlinear and linear wave equations with an arbitrary function parameter are presented. This is a new application of the differential form characteristic set algorithm, i.e., Wu's method, in differential equations.

  17. Performance evaluation of algorithms for the classification of metabolic 1H NMR fingerprints.

    Science.gov (United States)

    Hochrein, Jochen; Klein, Matthias S; Zacharias, Helena U; Li, Juan; Wijffels, Gene; Schirra, Horst Joachim; Spang, Rainer; Oefner, Peter J; Gronwald, Wolfram

    2012-12-07

    Nontargeted metabolite fingerprinting is increasingly applied to biomedical classification. The choice of classification algorithm may have a considerable impact on outcome. In this study, employing nested cross-validation for assessing predictive performance, six binary classification algorithms in combination with different strategies for data-driven feature selection were systematically compared on five data sets of urine, serum, plasma, and milk one-dimensional fingerprints obtained by proton nuclear magnetic resonance (NMR) spectroscopy. Support Vector Machines and Random Forests combined with t-score-based feature filtering performed well on most data sets, whereas the performance of the other tested methods varied between data sets.
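    The t-score-based feature filtering that preceded classification in this study can be sketched in a few lines of numpy; the synthetic fingerprint data, the shift injected into feature 3, and the top-5 cutoff are illustrative assumptions, not values from the paper:

```python
import numpy as np

def t_score_filter(X, y, k):
    """Rank features by absolute two-sample t-score and keep the top k."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    v0, v1 = X0.var(axis=0, ddof=1), X1.var(axis=0, ddof=1)
    t = (m1 - m0) / np.sqrt(v0 / len(X0) + v1 / len(X1) + 1e-12)
    return np.argsort(-np.abs(t))[:k]

rng = np.random.default_rng(0)
y = np.array([0] * 20 + [1] * 20)
X = rng.normal(size=(40, 50))        # 50 mock spectral bins
X[y == 1, 3] += 3.0                  # make feature 3 strongly discriminative
keep = t_score_filter(X, y, 5)
print(keep)
```

The surviving feature indices would then be passed to the downstream SVM or Random Forest.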

  18. IMPROVEMENT OF TCAM-BASED PACKET CLASSIFICATION ALGORITHM

    Institute of Scientific and Technical Information of China (English)

    Xu Zhen; Zhang Jun; Rui Liyang; Sun Jun

    2008-01-01

    The features of Ternary Content Addressable Memories (TCAMs) make them particularly attractive for IP address lookup and packet classification applications in a router system. However, the limitations of TCAMs impede their utilization. In this paper, solutions for decreasing the power consumption and avoiding entry expansion in range matching are addressed. Experimental results demonstrate that the proposed techniques can significantly improve the performance of TCAMs in IP address lookup and packet classification.

  19. A Weighted Block Dictionary Learning Algorithm for Classification

    OpenAIRE

    Zhongrong Shi

    2016-01-01

    Discriminative dictionary learning, playing a critical role in sparse representation based classification, has led to state-of-the-art classification results. Among the existing discriminative dictionary learning methods, two different approaches, shared dictionary and class-specific dictionary, which associate each dictionary atom to all classes or a single class, have been studied. The shared dictionary is a compact method but with lack of discriminative information; the class-specific dict...

  20. A method for classification of network traffic based on C5.0 Machine Learning Algorithm

    DEFF Research Database (Denmark)

    Bujlow, Tomasz; Riaz, M. Tahir; Pedersen, Jens Myrup

    2012-01-01

    current network traffic. To overcome the drawbacks of existing methods for traffic classification, usage of C5.0 Machine Learning Algorithm (MLA) was proposed. On the basis of statistical traffic information received from volunteers and C5.0 algorithm we constructed a boosted classifier, which was shown...

  1. Analysis of Distributed and Adaptive Genetic Algorithm for Mining Interesting Classification Rules

    Institute of Scientific and Technical Information of China (English)

    YI Yunfei; LIN Fang; QIN Jun

    2008-01-01

    A distributed genetic algorithm can be combined with an adaptive genetic algorithm for mining interesting and comprehensible classification rules. The paper gives the method to encode the rules and the fitness function; the selection, crossover, mutation and migration operators for the DAGA are designed at the same time.

  2. Using genetic algorithms to select and create features for pattern classification. Technical report

    Energy Technology Data Exchange (ETDEWEB)

    Chang, E.I.; Lippmann, R.P.

    1991-03-11

    Genetic algorithms were used to select and create features and to select reference exemplar patterns for machine vision and speech pattern classification tasks. On a 15-feature machine-vision inspection task, it was found that genetic algorithms performed no better than conventional approaches to feature selection but required much more computation. For a speech recognition task, genetic algorithms required no more computation time than traditional approaches but reduced the number of features required by a factor of five (from 153 to 33 features). On a difficult artificial machine-vision task, genetic algorithms were able to create new features (polynomial functions of the original features) that reduced classification error rates from 10 to almost 0 percent. Neural net and nearest-neighbor classifiers were unable to provide such low error rates using only the original features. Genetic algorithms were also used to reduce the number of reference exemplar patterns and to select the value of k for a k-nearest-neighbor classifier. On a 338-training-pattern vowel recognition problem with 10 classes, genetic algorithms simultaneously reduced the number of stored exemplars from 338 to 63 and selected k without significantly decreasing classification accuracy. In all applications, genetic algorithms were easy to apply and found good solutions in many fewer trials than would be required by an exhaustive search. Run times were long but not unreasonable. These results suggest that genetic algorithms may soon be practical for pattern classification problems as faster serial and parallel computers are developed.
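    A minimal sketch of genetic-algorithm feature selection for a nearest-neighbour classifier, in the spirit of the report; the toy data, population size, mutation rate and truncation selection scheme are assumptions made for illustration, not the report's actual settings:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: only features 0 and 1 carry class information.
y = np.array([0] * 30 + [1] * 30)
X = rng.normal(size=(60, 10))
X[y == 1, 0] += 3.0
X[y == 1, 1] += 3.0

def loo_1nn_accuracy(mask):
    """Leave-one-out 1-NN accuracy using only the selected features."""
    if not mask.any():
        return 0.0
    Z = X[:, mask]
    d = np.linalg.norm(Z[:, None] - Z[None, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a point may not vote for itself
    return float((y[d.argmin(axis=1)] == y).mean())

pop = rng.random((20, 10)) < 0.5         # random boolean feature masks
for _ in range(15):
    fit = np.array([loo_1nn_accuracy(m) for m in pop])
    parents = pop[np.argsort(-fit)[:10]]  # elitist truncation selection
    cut = int(rng.integers(1, 9))         # one-point crossover
    kids_a = np.concatenate([parents[:5, :cut], parents[5:, cut:]], axis=1)
    kids_b = np.concatenate([parents[5:, :cut], parents[:5, cut:]], axis=1)
    kids = np.concatenate([kids_a, kids_b])
    kids ^= rng.random(kids.shape) < 0.05  # bit-flip mutation
    pop = np.concatenate([parents, kids])

best = max(pop, key=loo_1nn_accuracy)
print(round(loo_1nn_accuracy(best), 2))
```

Because the parents survive each generation, the best mask found so far is never lost.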

  3. New Dandelion Algorithm Optimizes Extreme Learning Machine for Biomedical Classification Problems

    Directory of Open Access Journals (Sweden)

    Xiguang Li

    2017-01-01

    Full Text Available Inspired by the behavior of dandelion sowing, a novel swarm intelligence algorithm, namely the dandelion algorithm (DA), is proposed for global optimization of complex functions in this paper. In DA, the dandelion population is divided into two subpopulations, and different subpopulations undergo different sowing behaviors. Moreover, another sowing method is designed to help jump out of local optima. In order to demonstrate the validity of DA, we compare the proposed algorithm with other existing algorithms, including the bat algorithm, particle swarm optimization, and the enhanced fireworks algorithm. Simulations show that the proposed algorithm is superior to the other algorithms. The proposed algorithm can also be applied to optimize the extreme learning machine (ELM) for biomedical classification problems, with considerable effect. Finally, we use different fusion methods to form different fusion classifiers, and the fusion classifiers can achieve higher accuracy and better stability to some extent.

  4. Application and comparison of classification algorithms for recognition of Alzheimer's disease in electrical brain activity (EEG).

    Science.gov (United States)

    Lehmann, Christoph; Koenig, Thomas; Jelic, Vesna; Prichep, Leslie; John, Roy E; Wahlund, Lars-Olof; Dodge, Yadolah; Dierks, Thomas

    2007-04-15

    The early detection of subjects with probable Alzheimer's disease (AD) is crucial for the effective application of treatment strategies. Here we explored the ability of a multitude of linear and non-linear classification algorithms to discriminate between the electroencephalograms (EEGs) of patients with varying degrees of AD and their age-matched control subjects. Absolute and relative spectral power, distribution of spectral power, and measures of spatial synchronization were calculated from recordings of resting eyes-closed continuous EEGs of 45 healthy controls, 116 patients with mild AD and 81 patients with moderate AD, recruited in two different centers (Stockholm, New York). The applied classification algorithms were: principal component linear discriminant analysis (PC LDA), partial least squares LDA (PLS LDA), principal component logistic regression (PC LR), partial least squares logistic regression (PLS LR), bagging, random forest, support vector machines (SVM) and feed-forward neural network. Based on 10-fold cross-validation runs it could be demonstrated that even though modern computer-intensive classification algorithms such as random forests, SVM and neural networks show a slight superiority, more classical classification algorithms performed nearly equally well. Using random forest classification, a considerable sensitivity of up to 85% and a specificity of 78% were reached even for the test of only mild AD patients, whereas for the comparison of moderate AD vs. controls, using SVM and neural networks, values of 89% and 88% for sensitivity and specificity were achieved. Such a remarkable performance proves the value of these classification algorithms for clinical diagnostics.
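    PC LDA, the simplest of the listed pipelines, reduces the data with PCA and then applies Fisher's linear discriminant to the scores. A numpy sketch on synthetic two-class data; the feature dimensionality, class shift and number of retained components are assumptions for illustration, not values from the study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class "spectral feature" data standing in for the EEG measures.
y = np.array([0] * 40 + [1] * 40)
X = rng.normal(size=(80, 20))
X[y == 1] += 0.8

def pc_lda_fit(X, y, n_pc=5):
    """PCA for dimensionality reduction, then Fisher LDA on the scores."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:n_pc].T                        # principal directions
    Z = (X - mu) @ W
    m0, m1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
    Sw = np.cov(Z[y == 0].T) + np.cov(Z[y == 1].T)
    w = np.linalg.solve(Sw, m1 - m0)       # Fisher direction
    thr = (m0 + m1) @ w / 2                # midpoint decision threshold
    return mu, W, w, thr

def pc_lda_predict(X, model):
    mu, W, w, thr = model
    return (((X - mu) @ W) @ w > thr).astype(int)

model = pc_lda_fit(X, y)
acc = float((pc_lda_predict(X, model) == y).mean())
print(round(acc, 2))
```

In the study such a classifier would of course be scored on held-out folds rather than on its own training data.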

  5. A NEW UNSUPERVISED CLASSIFICATION ALGORITHM FOR POLARIMETRIC SAR IMAGES BASED ON FUZZY SET THEORY

    Institute of Scientific and Technical Information of China (English)

    Fu Yusheng; Xie Yan; Pi Yiming; Hou Yinming

    2006-01-01

    In this letter, a new method is proposed for unsupervised classification of terrain types and man-made objects using POLarimetric Synthetic Aperture Radar (POLSAR) data. This technique is a combination of the usage of polarimetric information of SAR images and the unsupervised classification method based on fuzzy set theory. Image quantization and image enhancement are used to preprocess the POLSAR data. Then the polarimetric information and Fuzzy C-Means (FCM) clustering algorithm are used to classify the preprocessed images. The advantages of this algorithm are the automated classification, its high classification accuracy, fast convergence and high stability. The effectiveness of this algorithm is demonstrated by experiments using SIR-C/X-SAR (Spaceborne Imaging Radar-C/X-band Synthetic Aperture Radar) data.
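    The Fuzzy C-Means step at the core of the method alternates membership and centroid updates until the partition stabilizes. A compact numpy sketch on toy 2-D data standing in for preprocessed polarimetric features; the fuzzifier m=2 and the blob geometry are illustrative assumptions:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=50, seed=0):
    """Plain fuzzy C-means: alternate membership and centroid updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)      # memberships sum to 1 per point
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))     # inverse-distance memberships
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

# Two well-separated blobs standing in for two terrain classes.
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(0, 0.3, (50, 2)),
                    rng.normal(3, 0.3, (50, 2))])
centers, U = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=1)                  # hard labels from soft memberships
print(labels[:5], labels[-5:])
```

For POLSAR data each "point" would instead be a vector of polarimetric features per pixel.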

  6. Evaluation of machine learning algorithms for classification of primary biological aerosol using a new UV-LIF spectrometer

    Science.gov (United States)

    Ruske, Simon; Topping, David O.; Foot, Virginia E.; Kaye, Paul H.; Stanley, Warren R.; Crawford, Ian; Morse, Andrew P.; Gallagher, Martin W.

    2017-03-01

    Characterisation of bioaerosols has important implications within the environment and public health sectors. Recent developments in ultraviolet light-induced fluorescence (UV-LIF) detectors such as the Wideband Integrated Bioaerosol Spectrometer (WIBS) and the newly introduced Multiparameter Bioaerosol Spectrometer (MBS) have allowed for the real-time collection of fluorescence, size and morphology measurements for the purpose of discriminating between bacteria, fungal spores and pollen. This new generation of instruments has enabled ever larger data sets to be compiled with the aim of studying more complex environments. In real world data sets, particularly those from an urban environment, the population may be dominated by non-biological fluorescent interferents, bringing into question the accuracy of measurements of quantities such as concentrations. It is therefore imperative that we validate the performance of different algorithms which can be used for the task of classification. For unsupervised learning we tested hierarchical agglomerative clustering with various different linkages. For supervised learning, 11 methods were tested, including decision trees, ensemble methods (random forests, gradient boosting and AdaBoost), two implementations for support vector machines (libsvm and liblinear) and Gaussian methods (Gaussian naïve Bayesian, quadratic and linear discriminant analysis, the k-nearest neighbours algorithm and artificial neural networks). The methods were applied to two different data sets produced using the new MBS, which provides multichannel UV-LIF fluorescence signatures for single airborne biological particles. The first data set contained mixed PSLs and the second contained a variety of laboratory-generated aerosol. Clustering in general performs slightly worse than the supervised learning methods, correctly classifying, at best, only 67.6% and 91.1% for the two data sets respectively. For supervised learning the gradient boosting algorithm was

  7. A Critical Study of Selected Classification Algorithms for Liver Disease Diagnosis

    Directory of Open Access Journals (Sweden)

    Bendi Venkata Ramana

    2011-03-01

    Full Text Available Patients with liver disease have been continuously increasing because of excessive consumption of alcohol, inhalation of harmful gases, and intake of contaminated food, pickles and drugs. Automatic classification tools may reduce the burden on doctors. This paper evaluates selected classification algorithms for the classification of some liver patient datasets. The classification algorithms considered here are the Naïve Bayes classifier, C4.5, the back-propagation neural network algorithm, and Support Vector Machines. These algorithms are evaluated based on four criteria: accuracy, precision, sensitivity and specificity.
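    The four evaluation criteria are simple functions of the 2x2 confusion matrix; a small sketch (the toy label vectors are made up for illustration):

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity and specificity from a 2x2 confusion matrix."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(((y_true == 1) & (y_pred == 1)).sum())
    tn = int(((y_true == 0) & (y_pred == 0)).sum())
    fp = int(((y_true == 0) & (y_pred == 1)).sum())
    fn = int(((y_true == 1) & (y_pred == 0)).sum())
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "precision":   tp / (tp + fp),
        "sensitivity": tp / (tp + fn),   # recall on the diseased class
        "specificity": tn / (tn + fp),   # recall on the healthy class
    }

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(binary_metrics(y_true, y_pred))
# accuracy 0.7, precision 0.6, sensitivity 0.75, specificity ~0.67
```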

  8. An Improved BP Algorithm and Its Application in Classification of Surface Defects of Steel Plate

    Institute of Scientific and Technical Information of China (English)

    ZHAO Xiang-yang; LAI Kang-sheng; DAI Dong-ming

    2007-01-01

    Artificial neural networks are a new approach to pattern recognition and classification. The multilayer perceptron (MLP) model with back-propagation (BP) is used to train the network. An improved fast algorithm for the BP network is presented, which adopts singular value decomposition (SVD) and a generalized inverse matrix. It not only increases the speed of network learning but also achieves satisfying precision. The simulation and experiment results show the effectiveness of the improved BP algorithm for classifying the surface defects of steel plate.

  9. Using genetic algorithm feature selection in neural classification systems for image pattern recognition

    Directory of Open Access Journals (Sweden)

    Margarita R. Gamarra A.

    2012-09-01

    Full Text Available Pattern recognition performance depends on variations during the extraction, selection and classification stages. This paper presents an approach to feature selection using genetic algorithms, with regard to digital image recognition and quality control. Error rate and kappa coefficient were used for evaluating the genetic algorithm approach. Neural networks were used for classification, involving the features selected by the genetic algorithms. The neural network approach was compared to a K-nearest neighbor classifier. The proposed approach performed better than the other methods.

  10. Ice classification algorithm development and verification for the Alaska SAR Facility using aircraft imagery

    Science.gov (United States)

    Holt, Benjamin; Kwok, Ronald; Rignot, Eric

    1989-01-01

    The Alaska SAR Facility (ASF) at the University of Alaska, Fairbanks is a NASA program designed to receive, process, and archive SAR data from ERS-1 and to support investigations that will use this regional data. As part of ASF, specialized subsystems and algorithms to produce certain geophysical products from the SAR data are under development. Of particular interest are ice motion, ice classification, and ice concentration. This work focuses on the algorithm under development for ice classification, and the verification of the algorithm using C-band aircraft SAR imagery recently acquired over the Alaskan arctic.

  11. Application of a Genetic Algorithm to Nearest Neighbour Classification

    NARCIS (Netherlands)

    Simkin, S.; Verwaart, D.; Vrolijk, H.C.J.

    2005-01-01

    This paper describes the application of a genetic algorithm to nearest-neighbour based imputation of sample data into a census data dataset. The genetic algorithm optimises the selection and weights of variables used for measuring distance. The results show that the measure of fit can be improved by

  12. Multiclass Semi-Supervised Boosting and Similarity Learning

    NARCIS (Netherlands)

    Tanha, J.; Saberian, M.J.; van Someren, M.; Xiong, H.; Karypis, G.; Thuraisingham, B.; Cook, D.; Wu, X.

    2013-01-01

    In this paper, we consider the multiclass semi-supervised classification problem. A boosting algorithm is proposed to solve the multiclass problem directly. The proposed multiclass approach uses a new multiclass loss function, which includes two terms. The first term is the cost of the multiclass ma

  13. Two-step Classification Algorithm Based on Decision-Theoretic Rough Set Theory

    Directory of Open Access Journals (Sweden)

    Jun Wang

    2013-07-01

    Full Text Available This paper introduces rough set theory and decision-theoretic rough set (DTRS) theory. Based on the latter, a two-step classification algorithm is proposed. Compared with primitive DTRS algorithms, our method decreases the range of the negative domain and employs a two-step strategy in classification. When new or unknown samples are found, the algorithm estimates whether they belong to the negative domain, so fewer samples are wrongly classified into the negative domain, lowering the error rate and the loss of classification. Compared with traditional information filtering methods, such as the Naive Bayes algorithm and the primitive DTRS algorithm, the proposed method achieves high accuracy and low loss.
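    In decision-theoretic rough sets, the positive, boundary and negative regions follow from a pair of thresholds derived from a loss matrix. A sketch with illustrative loss values (the numbers are not taken from the paper):

```python
# Losses for taking action P (accept), B (defer), N (reject)
# when the object IS in the class ...
l_PP, l_BP, l_NP = 0.0, 2.0, 6.0
# ... and when it is NOT.
l_PN, l_BN, l_NN = 5.0, 1.0, 0.0

# Standard decision-theoretic rough set thresholds.
alpha = (l_PN - l_BN) / ((l_PN - l_BN) + (l_BP - l_PP))
beta = (l_BN - l_NN) / ((l_BN - l_NN) + (l_NP - l_BP))

def three_way(p):
    """Accept, defer (boundary) or reject based on class probability p."""
    if p >= alpha:
        return "positive"
    if p <= beta:
        return "negative"
    return "boundary"

print(round(alpha, 3), round(beta, 3))
print(three_way(0.8), three_way(0.5), three_way(0.1))
```

Samples falling in the boundary region are exactly those deferred to the algorithm's second classification step.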

  14. Application of ant colony algorithm in plant leaves classification based on infrared spectroscopy

    Science.gov (United States)

    Guo, Tiantai; Hong, Bo; Kong, Ming; Zhao, Jun

    2014-04-01

    This paper proposes to use the ant colony algorithm in the analysis of spectral data of plant leaves to achieve the best classification of different plants within a short time. Intelligent classification is realized according to the different components of featured information included in the near-infrared spectrum data of plants. The near-infrared diffuse emission spectrum curves of the leaves of Cinnamomum camphora and Acer saccharum Marsh were acquired, 75 leaves of each, divided into two groups. The acquired data were then processed with the ant colony algorithm, leaves of the same kind being grouped into one class by ant colony clustering; finally, the two groups of data were classified into two classes. Experimental results show that the algorithm can distinguish the different species with an accuracy of 100%. The classification of plant leaves has important application value in agricultural development, research on species invasion, floriculture, etc.

  15. Packet Classification by Multilevel Cutting of the Classification Space: An Algorithmic-Architectural Solution for IP Packet Classification in Next Generation Networks

    Directory of Open Access Journals (Sweden)

    Motasem Aldiab

    2008-01-01

    Full Text Available Traditionally, the Internet provides only a “best-effort” service, treating all packets going to the same destination equally. However, providing differentiated services for different users based on their quality requirements is increasingly becoming a demanding issue. For this, routers need to have the capability to distinguish and isolate traffic belonging to different flows. This ability to determine the flow each packet belongs to is called packet classification. Technology vendors are reluctant to support algorithmic solutions for classification due to their nondeterministic performance. Although content addressable memories (CAMs are favoured by technology vendors due to their deterministic high-lookup rates, they suffer from the problems of high-power consumption and high-silicon cost. This paper provides a new algorithmic-architectural solution for packet classification that mixes CAMs with algorithms based on multilevel cutting of the classification space into smaller spaces. The provided solution utilizes the geometrical distribution of rules in the classification space. It provides the deterministic performance of CAMs, support for dynamic updates, and added flexibility for system designers.

  16. Tomato classification based on laser metrology and computer algorithms

    Science.gov (United States)

    Igno Rosario, Otoniel; Muñoz Rodríguez, J. Apolinar; Martínez Hernández, Haydeé P.

    2011-08-01

    An automatic technique for tomato classification is presented based on size and color. The size is determined based on surface contouring by laser line scanning, where a Bezier network computes the tomato height based on the line position. The tomato color is determined by the CIELCH color space and the red and green components. Thus, the tomato size is classified into large, medium and small. Also, the tomato is classified into six colors associated with its maturity. The performance and accuracy of the classification system are evaluated based on methods reported in recent years. The technique is tested and experimental results are presented.
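    A rule-based sketch of the two classification outputs (size class and maturity stage). The height thresholds and hue cut-points below are hypothetical values invented for illustration; the paper derives size from laser-line contouring and colour from the CIELCH space:

```python
def classify_size(height_mm):
    """Three size classes from a hypothetical height measurement."""
    if height_mm >= 70:
        return "large"
    if height_mm >= 55:
        return "medium"
    return "small"

def classify_maturity(hue_deg):
    """Map CIELCH hue to a coarse six-stage maturity label (red -> green)."""
    stages = [(40, "red"), (55, "light red"), (70, "pink"),
              (85, "turning"), (100, "breaker"), (360, "green")]
    for limit, stage in stages:
        if hue_deg < limit:
            return stage
    return "green"

print(classify_size(72), classify_maturity(45))
```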

  17. Supervised Machine Learning Algorithms Can Classify Open-Text Feedback of Doctor Performance With Human-Level Accuracy.

    Science.gov (United States)

    Gibbons, Chris; Richards, Suzanne; Valderas, Jose Maria; Campbell, John

    2017-03-15

    Machine learning techniques may be an effective and efficient way to classify open-text reports on doctors' activity for the purposes of quality assurance, safety, and continuing professional development. The objective of the study was to evaluate the accuracy of machine learning algorithms trained to classify open-text reports of doctor performance and to assess the potential for classifications to identify significant differences in doctors' professional performance in the United Kingdom. We used 1636 open-text comments (34,283 words) relating to the performance of 548 doctors collected from a survey of clinicians' colleagues using the General Medical Council Colleague Questionnaire (GMC-CQ). We coded 77.75% (1272/1636) of the comments into 5 global themes (innovation, interpersonal skills, popularity, professionalism, and respect) using a qualitative framework. We trained 8 machine learning algorithms to classify comments and assessed their performance using several training samples. We evaluated doctor performance using the GMC-CQ and compared scores between doctors with different classifications using t tests. Individual algorithm performance was high (range F score=.68 to .83). Interrater agreement between the algorithms and the human coder was highest for the "popular" (recall=.97), "innovator" (recall=.98), and "respected" (recall=.87) codes and was lower for the "interpersonal" (recall=.80) and "professional" (recall=.82) codes. A 10-fold cross-validation demonstrated similar performance in each analysis. When combined into an ensemble of multiple algorithms, mean human-computer interrater agreement was .88. Comments classified as "respected," "professional," and "interpersonal" related to higher doctor scores on the GMC-CQ compared with comments that were not classified (P<.05), whereas scores did not differ between doctors who were rated as popular or innovative and those who were not rated at all (P>.05). Machine learning algorithms can classify open-text feedback of doctor performance with human-level accuracy.
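    The ensemble step, combining the per-comment predictions of several algorithms by majority vote and measuring agreement against the human coder, can be sketched with stdlib tools; the comments, themes and votes below are made up for illustration:

```python
from collections import Counter

# Human codes for four hypothetical comments, and three algorithms' predictions.
human = ["respect", "innovation", "professionalism", "respect"]
algos = [
    ["respect", "innovation", "professionalism", "popularity"],
    ["respect", "innovation", "interpersonal",  "respect"],
    ["respect", "popularity", "professionalism", "respect"],
]

# Majority vote per comment, then agreement with the human coder.
ensemble = [Counter(votes).most_common(1)[0][0] for votes in zip(*algos)]
agreement = sum(e == h for e, h in zip(ensemble, human)) / len(human)
print(ensemble, agreement)
```

In the study the individual votes come from 8 trained classifiers rather than hand-written lists.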

  18. A VLSI optimal constructive algorithm for classification problems

    Energy Technology Data Exchange (ETDEWEB)

    Beiu, V. [Los Alamos National Lab., NM (United States); Draghici, S.; Sethi, I.K. [Wayne State Univ., Detroit, MI (United States)

    1997-10-01

    If neural networks are to be used on a large scale, they have to be implemented in hardware. However, the cost of the hardware implementation is critically sensitive to factors like the precision used for the weights, the total number of bits of information and the maximum fan-in used in the network. This paper presents a version of the Constraint Based Decomposition training algorithm which is able to produce networks using limited precision integer weights and units with limited fan-in. The algorithm is tested on the 2-spiral problem and the results are compared with other existing algorithms.

  19. Classification of Noisy Data: An Approach Based on Genetic Algorithms and Voronoi Tessellation

    DEFF Research Database (Denmark)

    Khan, Abdul Rauf; Schiøler, Henrik; Knudsen, Torben;

    2016-01-01

    Classification is one of the major constituents of the data-mining toolkit. The well-known methods for classification are built on either the principle of logic or statistical/mathematical reasoning. In this article we propose: (1) a different strategy, which is based on the partitioning of the information space; and (2) use of the genetic algorithm to solve combinatorial problems for classification. In particular, we implement our methodology to solve complex classification problems and compare the performance of our classifier with other well-known methods (SVM, KNN, and ANN). The results of this study suggest that our proposed methodology is specialized to deal with the classification problem of highly imbalanced classes with significant overlap.

  20. Determining The Effect of Some Mechanical Properties on Color Maturity of Tomato With K-Star, Random Forest and Decision Tree (C4.5 Classification Algorithms

    Directory of Open Access Journals (Sweden)

    Hande Küçükönder

    2015-02-01

    Full Text Available This study was conducted in order to determine the effect of mechanical properties such as the maximum force at the skin rupture point, the energy at the skin rupture point and the skin firmness on the color maturity of tomato, using supervised learning algorithms of data mining. A total of 88 tomato samples were used; color measurements were performed in 4 different equatorial regions of each tomato, giving a total of 352 color measurement units. In the classification processes performed according to these mechanical properties, the K-Star, Random Forest and Decision Tree (C4.5) algorithms of data mining were utilized. In comparing the resulting classification models, low values of Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Root Relative Squared Error (RRSE) and Relative Absolute Error (RAE), which are criteria of error variance, were sought together with a high classification accuracy rate. As a result of the comparison, the classification model formed according to the instance-based K-Star algorithm [MAE: 0.004, RMSE: 0.006, RAE: 1.73%, RRSE: 1.70%] was found to be a better classifier than the others. According to the K-Star classification, the effects of the maximum force at the skin rupture point and of the skin firmness on the degree of maturity of tomato were non-significant during the green, light red and color conversion periods and significant during the other periods, while the energy at the skin rupture point was significant only during the pink and color conversion stages and non-significant during the other stages.
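    The four error criteria used to compare the models are straightforward to compute; RAE and RRSE normalise the absolute and squared errors by those of a predict-the-mean baseline. A numpy sketch with made-up numbers:

```python
import numpy as np

def error_report(y_true, y_pred):
    """MAE, RMSE, and the relative errors (RAE, RRSE) used to rank the models."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    e = y_true - y_pred
    base = y_true - y_true.mean()        # naive predict-the-mean baseline
    return {
        "MAE": np.abs(e).mean(),
        "RMSE": np.sqrt((e ** 2).mean()),
        "RAE%": 100 * np.abs(e).sum() / np.abs(base).sum(),
        "RRSE%": 100 * np.sqrt((e ** 2).sum() / (base ** 2).sum()),
    }

r = error_report([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
print({k: round(float(v), 3) for k, v in r.items()})
```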

  1. Study on Increasing the Accuracy of Classification Based on Ant Colony algorithm

    Science.gov (United States)

    Yu, M.; Chen, D.-W.; Dai, C.-Y.; Li, Z.-L.

    2013-05-01

    The application of GIS advances the ability of data analysis on remote sensing images. The classification and extraction of remote sensing images are the primary information source for GIS in LUCC applications, and how to increase the accuracy of classification is an important topic of remote sensing research. Adding features and researching new classification methods are the ways to improve the accuracy of classification. With the mode framework defined, agents of ant colony algorithms in the nature-inspired computation field can show a kind of uniform intelligent computation mode, and their application to remote sensing image classification is a new, preliminary use of swarm intelligence. Studying the applicability of the ant colony algorithm based on more features, and exploring the advantages and performance of the ant colony algorithm, are of very important significance. The study takes as study area the outskirts of Fuzhou in Fujian Province, with its complicated land use. A multi-source database was built which integrates spectral information (TM1-5, TM7, NDVI, NDBI), topographic characters (DEM, Slope, Aspect) and textural information (Mean, Variance, Homogeneity, Contrast, Dissimilarity, Entropy, Second Moment, Correlation). Classification rules based on different characters are discovered from the samples through the ant colony algorithm, and a classification test is performed based on these rules. At the same time, we compare the accuracies with traditional maximum likelihood, C4.5 and rough set classifications. The study showed that the accuracy of classification based on the ant colony algorithm is higher than that of the other methods. In addition, the near-term land use and cover changes in Fuzhou are studied and the figures displayed by using remote sensing technology based on the ant colony algorithm.

  2. Spatiotemporal representations of rapid visual target detection: a single-trial EEG classification algorithm.

    Science.gov (United States)

    Fuhrmann Alpert, Galit; Manor, Ran; Spanier, Assaf B; Deouell, Leon Y; Geva, Amir B

    2014-08-01

    Brain computer interface applications, developed for both healthy and clinical populations, critically depend on decoding brain activity in single trials. The goal of the present study was to detect distinctive spatiotemporal brain patterns within a set of event-related responses. We introduce a novel classification algorithm, the spatially weighted FLD-PCA (SWFP), which is based on a two-step linear classification of event-related responses, using a Fisher linear discriminant (FLD) classifier and principal component analysis (PCA) for dimensionality reduction. As a benchmark algorithm, we consider the hierarchical discriminant component analysis (HDCA), introduced by Parra et al. (2007). We also consider a modified version of the HDCA, namely the hierarchical discriminant principal component analysis (HDPCA) algorithm. We compare the single-trial classification accuracies of all three algorithms, each applied to detect target images within a rapid serial visual presentation (RSVP, 10 Hz) of images from five different object categories, based on single-trial brain responses. We find a systematic superiority of our classification algorithm in the tested paradigm. Additionally, HDPCA significantly increases classification accuracies compared to the HDCA. Finally, we show that presenting several repetitions of the same image exemplars improves accuracy, and thus may be important in cases where high accuracy is crucial.

  3. Analysis and Evaluation of IKONOS Image Fusion Algorithm Based on Land Cover Classification

    Institute of Scientific and Technical Information of China (English)

    Xia; JING; Yan; BAO

    2015-01-01

    Each fusion algorithm has its own advantages and limitations, so it is very difficult to simply judge the strengths and weaknesses of a fusion algorithm; which algorithm is selected to fuse images also depends on the sensor types and the specific research purposes. Firstly, five fusion methods, i.e. IHS, Brovey, PCA, SFIM and Gram-Schmidt, are briefly described in the paper. Then visual judgment and quantitative statistical parameters are used to assess the five algorithms. Finally, in order to determine the most suitable fusion method for land cover classification of IKONOS images, maximum likelihood classification (MLC) was applied to the five fused images. The results showed that the fusion effects of the SFIM and Gram-Schmidt transforms were better than those of the other three image fusion methods in spatial detail improvement and spectral information fidelity, and the Gram-Schmidt technique was superior to the SFIM transform in expressing image details. The classification accuracy of the images fused using the Gram-Schmidt and SFIM algorithms was higher than that of the other three image fusion methods, with an overall accuracy greater than 98%. The IHS-fused image classification accuracy was the lowest; the overall accuracy and kappa coefficient were 83.14% and 0.76, respectively. Thus the IKONOS fusion images obtained by Gram-Schmidt and SFIM were better for improving land cover classification accuracy.

  4. 数据流分类器算法在水质环境中的应用%The Application of Data Stream Classification Algorithm in Water Quality Environment

    Institute of Scientific and Technical Information of China (English)

    曹红; 郑鑫

    2014-01-01

    In many real-world applications, the characteristics of data streams make it difficult to obtain class labels for all of the data. To address the classification of data streams with incomplete class labels, this paper first analyzes how the labeled data set affects the classification error of semi-supervised classification algorithms based on the cluster assumption. Then, using this error analysis together with the characteristics of data streams, it proposes a semi-supervised data stream ensemble classifier algorithm under the cluster assumption (SSDSEC) and discusses how to set the weights of the individual classifiers. Finally, simulation experiments verify the effectiveness of the proposed algorithm.

  5. Performance of Activity Classification Algorithms in Free-Living Older Adults.

    Science.gov (United States)

    Sasaki, Jeffer Eidi; Hickey, Amanda M; Staudenmayer, John W; John, Dinesh; Kent, Jane A; Freedson, Patty S

    2016-05-01

    The objective of this study was to compare activity type classification rates of machine learning algorithms trained on laboratory versus free-living accelerometer data in older adults. Thirty-five older adults (21 females and 14 males, 70.8 ± 4.9 yr) performed selected activities in the laboratory while wearing three ActiGraph GT3X+ activity monitors (on the dominant hip, wrist, and ankle; ActiGraph, LLC, Pensacola, FL). Monitors were initialized to collect raw acceleration data at a sampling rate of 80 Hz. Fifteen of the participants also wore the GT3X+ in free-living settings and were directly observed for 2-3 h. Time- and frequency-domain features from the acceleration signals of each monitor were used to train random forest (RF) and support vector machine (SVM) models to classify five activity types: sedentary, standing, household, locomotion, and recreational activities. All algorithms were trained on laboratory data (RFLab and SVMLab) and free-living data (RFFL and SVMFL) using 20-s signal sampling windows. Classification accuracy rates of both types of algorithms were tested on free-living data using a leave-one-out technique. Overall classification accuracy rates for the algorithms developed from laboratory data were between 49% (wrist) and 55% (ankle) for the SVMLab algorithms and 49% (wrist) to 54% (ankle) for the RFLab algorithms. The classification accuracy rates for the SVMFL and RFFL algorithms ranged from 58% (wrist) to 69% (ankle) and from 61% (wrist) to 67% (ankle), respectively. The algorithms developed on free-living accelerometer data were more accurate in classifying activity type in free-living older adults than those developed on laboratory accelerometer data. Future studies should consider using free-living accelerometer data to train machine learning algorithms for older adults.
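The pipeline of windowed feature extraction followed by a random forest can be sketched as below. The window features and the synthetic two-class data are illustrative stand-ins, not the study's actual feature set:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)

def window_features(sig):
    """Simple time-domain features for one 20-s window (80 Hz -> 1600 samples)."""
    return [sig.mean(), sig.std(), np.abs(np.diff(sig)).mean(),
            np.percentile(sig, 90)]

# Synthetic windows for two activity types; "locomotion" has larger dynamics
X, y = [], []
for label, scale in [(0, 0.05), (1, 0.5)]:  # 0 = sedentary, 1 = locomotion
    for _ in range(60):
        X.append(window_features(rng.normal(0.0, scale, size=1600)))
        y.append(label)
X, y = np.array(X), np.array(y)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(rf, X, y, cv=5)
print("mean CV accuracy:", round(scores.mean(), 2))
```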

  6. Plant leaf image classification based on supervised orthogonal locality preserving projections%基于监督正交局部保持映射的植物叶片图像分类方法

    Institute of Scientific and Technical Information of China (English)

    张善文; 张传雷; 程雷

    2013-01-01

    problem degrades the recognition performance of these algorithms. To overcome the problem, a supervised orthogonal LPP (SOLPP) algorithm based on locality preserving projections (LPP) is presented and applied to plant classification using leaf images. LPP can be trained and applied as a linear projection, yet it can model feature vectors assumed to lie on a nonlinear embedding subspace by preserving local relations among input features, so it has an advantage over conventional linear dimensionality reduction algorithms such as principal component analysis (PCA) and linear discriminant analysis (LDA). First, the class information matrix is computed by the Warshall algorithm, an efficient method for computing the transitive closure of a relation: it takes as input a matrix representing the relationships of the observed data and outputs the matrix of the transitive closure of those relationships. Based on this matrix, the within-class and between-class matrices are obtained by making full use of the local information and class information of the data. After dimensionality reduction, in the reduced subspace, the distances between same-class samples become smaller while the distances between different-class samples become larger, which improves the classification performance of the proposed algorithm. Compared with classical supervised subspace dimensionality reduction algorithms, the proposed method does not need to judge whether any two samples belong to the same class when constructing the within-class and between-class scatter matrices, which further improves its classification performance. Finally, the K-nearest neighbor classifier is applied to classify plants. 
Comparison experiments with other existing algorithms, such as neighborhood rough set (NRS), support vector machine (SVM), efficient moving center hypersphere (MCH), modified locally linear discriminant embedding (MLLDE) and

  7. Online co-regularized algorithms

    NARCIS (Netherlands)

    Ruijter, T. de; Tsivtsivadze, E.; Heskes, T.

    2012-01-01

    We propose an online co-regularized learning algorithm for classification and regression tasks. We demonstrate that by sequentially co-regularizing prediction functions on unlabeled data points, our algorithm provides improved performance in comparison to supervised methods on several UCI benchmarks.

  9. Improved Algorithm of Pattern Classification and Recognition Applied in a Coal Dust Sensor

    Institute of Scientific and Technical Information of China (English)

    MA Feng-ying; SONG Shu

    2007-01-01

    To resolve the conflicting requirements of measurement precision and real-time performance, an improved algorithm for pattern classification and recognition was developed. The angular distribution of diffracted light varies with particle size. These patterns can be classified into groups using an innovative classification scheme based upon reference dust samples. After such classification, patterns can be recognized easily and rapidly by minimizing the variance between the reference pattern and the dust sample eigenvectors. Simulation showed that the maximum recognition speed improves 20-fold, which enables a single-chip, real-time inversion algorithm. An increased number of reference patterns reduced the errors in total and respirable coal dust measurements. Experiments in a coal mine confirm that the sensor achieves 95% accuracy. The results indicate that the improved algorithm effectively enhances the precision and real-time capability of the coal dust sensor.

  10. Analysis of data mining classification by comparison of C4.5 and ID3 algorithms

    Science.gov (United States)

    Sudrajat, R.; Irianingsih, I.; Krisnawan, D.

    2017-01-01

    The rapid development of information technology has triggered its intensive use; for example, data mining is widely used in investment. Among the many techniques that can assist in investment, the method used here for classification is the decision tree. Decision trees have a variety of algorithms, such as C4.5 and ID3. The two algorithms can generate different models and different accuracies for similar data sets. The C4.5 and ID3 algorithms with discrete data provide accuracies of 87.16% and 99.83%, respectively, and the C4.5 algorithm with numerical data provides 89.69%. The C4.5 and ID3 algorithms with discrete data yield 520 and 598 customers, and the C4.5 algorithm with numerical data yields 546 customers. The analysis shows that both algorithms classify quite well, since the error rate is less than 15%.
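An information-gain decision tree of the kind ID3 and C4.5 build can be approximated with scikit-learn's `criterion="entropy"` option. A minimal sketch on the Iris data, not the paper's investment data set:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# criterion="entropy" splits on information gain, the rule behind ID3/C4.5;
# the default "gini" criterion is the CART-style alternative
tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X_tr, y_tr)
acc = tree.score(X_te, y_te)
print(f"test accuracy: {acc:.2f}")
```

Note that scikit-learn's tree is a CART variant, so it is only an approximation of C4.5's handling of discrete attributes and pruning.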

  11. Improved semi-supervised online boosting for object tracking

    Science.gov (United States)

    Li, Yicui; Qi, Lin; Tan, Shukun

    2016-10-01

    The advantage of an online semi-supervised boosting method, which treats object tracking as a classification problem, is that it trains a binary classifier from labeled and unlabeled examples. Appropriate object features are selected based on real-time changes in the object. However, the online semi-supervised boosting method faces one key problem: traditional self-training, which uses the classification results to update the classifier itself, often leads to drifting or tracking failure due to the error accumulated during each update of the tracker. To overcome the disadvantages of object tracking methods based on semi-supervised online boosting, the contribution of this paper is an improved online semi-supervised boosting method in which the learning process is guided by positive (P) and negative (N) constraints, termed P-N constraints, which restrict the labeling of the unlabeled samples. First, we train the classifier by online semi-supervised boosting. Then, this classifier is used to process the next frame. Finally, the classification is analyzed by the P-N constraints, which verify whether the labels assigned to the unlabeled data by the classifier are in line with the assumptions made about positive and negative samples. The proposed algorithm can effectively improve the discriminative ability of the classifier and significantly alleviate the drifting problem in tracking applications. In the experiments, we demonstrate real-time tracking on several challenging test sequences, where our tracker outperforms other related online tracking methods and achieves promising tracking performance.

  12. Supervised Mineral Classification with Semi-automatic Training and Validation Set Generation in Scanning Electron Microscope Energy Dispersive Spectroscopy Images of Thin Sections

    DEFF Research Database (Denmark)

    Flesche, Harald; Nielsen, Allan Aasbjerg; Larsen, Rasmus

    2000-01-01

    This paper addresses the problem of classifying minerals common in siliciclastic and carbonate rocks. Twelve chemical elements are mapped from thin sections by energy dispersive spectroscopy in a scanning electron microscope (SEM). Extensions to traditional multivariate statistical methods...... are applied to perform the classification. First, training and validation sets are grown from one or a few seed points by a method that ensures spatial and spectral closeness of observations. Spectral closeness is obtained by excluding observations that have high Mahalanobis distances to the training class......–Matusita distance and the posterior probability of a class mean being classified as another class. Fourth, the actual classification is carried out based on four supervised classifiers all assuming multinormal distributions: simple quadratic, a contextual quadratic, and two hierarchical quadratic classifiers...

  13. Assessing the Accuracy of Prediction Algorithms for Classification

    DEFF Research Database (Denmark)

    Baldi, P.; Brunak, Søren; Chauvin, Y.

    2000-01-01

    We provide a unified overview of methods that currently are widely used to assess the accuracy of prediction algorithms, from raw percentages, quadratic error measures and other distances, and correlation coefficients, to information theoretic measures such as relative entropy and mutual...
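Two of the measures surveyed here, raw percentage accuracy and the Matthews correlation coefficient, can be computed directly from a binary confusion matrix. The counts below are a made-up example:

```python
import numpy as np

def metrics(tp, fp, fn, tn):
    """Raw percentage correct and the Matthews correlation coefficient."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc

# A hypothetical confusion matrix: 90 true positives, 10 false positives, ...
acc, mcc = metrics(tp=90, fp=10, fn=5, tn=95)
print(round(acc, 3), round(mcc, 3))  # MCC is stricter than raw accuracy
```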

  14. Incremental Image Classification Method Based on Semi-Supervised Learning%基于半监督学习的增量图像分类方法

    Institute of Scientific and Technical Information of China (English)

    梁鹏; 黎绍发; 覃姜维; 罗剑高

    2012-01-01

    In order to use large numbers of unlabeled images effectively, an image classification method based on semi-supervised learning is proposed. The proposed method bridges a large number of unlabeled images and a limited number of labeled images by exploiting their common topics. The classification accuracy is improved by using the must-link and cannot-link constraints of the labeled images. Experimental results on the Caltech-101 and 7-classes image datasets demonstrate that the proposed method improves classification accuracy by about 10%. Furthermore, because present semi-supervised image classification methods lack incremental learning ability, an incremental implementation of the method is also proposed. Compared with the non-incremental learning model in the literature, the incremental learning method improves computation efficiency by nearly 90%.

  15. A Support Vector Machine Hydrometeor Classification Algorithm for Dual-Polarization Radar

    Directory of Open Access Journals (Sweden)

    Nicoletta Roberto

    2017-07-01

    Full Text Available An algorithm based on a support vector machine (SVM) is proposed for hydrometeor classification. The training phase is driven by the output of a fuzzy logic hydrometeor classification algorithm, the most popular approach for hydrometeor classification with ground-based weather radar. The performance of the SVM is evaluated on a weather scenario generated by a weather model; the corresponding radar measurements are obtained by simulation, and the results of the SVM classification are compared with those obtained by a fuzzy logic classifier. Results based on the weather model and simulations show a higher accuracy for the SVM classification. An objective comparison of the two classifiers applied to real radar data shows that the SVM classification maps are spatially more homogeneous (the textural indices energy and homogeneity increase by 21% and 12%, respectively) and do not contain non-classified data. The improvements found with the SVM classifier, even though it is applied pixel by pixel, can be attributed to its ability to learn from the entire hyperspace of radar measurements and to the accurate training. The reliability of the results and the higher computing performance make the SVM attractive for challenging tasks such as implementation in decision support systems that help pilots make optimal decisions about changes in the flight route caused by unexpected adverse weather.
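A pixel-by-pixel SVM classifier of this kind can be sketched with scikit-learn. The two "radar" features and three class centers below are hypothetical stand-ins for dual-polarization measurements, not the paper's data:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy stand-ins for dual-pol measurements (e.g., reflectivity and differential
# reflectivity) per pixel, with three hypothetical hydrometeor classes
rng = np.random.default_rng(3)
centers = np.array([[20.0, 0.5], [35.0, 1.5], [50.0, 3.0]])
X = np.vstack([rng.normal(c, [3.0, 0.3], size=(80, 2)) for c in centers])
y = np.repeat([0, 1, 2], 80)

# Standardize features, then fit an RBF-kernel SVM (one-vs-one multiclass)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0)).fit(X, y)
acc = clf.score(X, y)
print("training accuracy:", round(acc, 2))
```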

  16. Sequential Classification of Palm Gestures Based on A* Algorithm and MLP Neural Network for Quadrocopter Control

    Directory of Open Access Journals (Sweden)

    Wodziński Marek

    2017-06-01

    Full Text Available This paper presents an alternative approach to sequential data classification, based on traditional machine learning algorithms (neural networks, principal component analysis, a multivariate Gaussian anomaly detector) and on finding the shortest path in a directed acyclic graph using the A* algorithm with a regression-based heuristic. Palm gestures were used as an example of sequential data, and a quadrocopter was the controlled object. The study includes the creation of a conceptual model and the practical construction of a system using the GPU to ensure real-time operation. The results present the classification accuracy of the chosen gestures and a comparison of computation time between the CPU- and GPU-based solutions.

  17. Pap-smear Classification Using Efficient Second Order Neural Network Training Algorithms

    DEFF Research Database (Denmark)

    Ampazis, Nikolaos; Dounias, George; Jantzen, Jan

    2004-01-01

    In this paper we make use of two highly efficient second order neural network training algorithms, namely the LMAM (Levenberg-Marquardt with Adaptive Momentum) and OLMAM (Optimized Levenberg-Marquardt with Adaptive Momentum), for the construction of an efficient pap-smear test classifier... problem. The classification results obtained from the application of the algorithms on a standard benchmark pap-smear data set reveal the power of the two methods to obtain excellent solutions in difficult classification problems, whereas other standard computational intelligence techniques achieve...

  18. Land cover classification using random forest with genetic algorithm-based parameter optimization

    Science.gov (United States)

    Ming, Dongping; Zhou, Tianning; Wang, Min; Tan, Tian

    2016-07-01

    Land cover classification based on remote sensing imagery is an important means to monitor, evaluate, and manage land resources. However, it requires robust classification methods that allow accurate mapping of complex land cover categories. Random forest (RF) is a powerful machine-learning classifier that can be used in land remote sensing. However, two important parameters of RF classification, namely, the number of trees and the number of variables tried at each split, affect classification accuracy. Thus, optimal parameter selection is an inevitable problem in RF-based image classification. This study uses the genetic algorithm (GA) to optimize the two parameters of RF to produce optimal land cover classification accuracy. HJ-1B CCD2 image data are used to classify six different land cover categories in Changping, Beijing, China. Experimental results show that GA-RF can avoid arbitrariness in the selection of parameters. The experiments also compare land cover classification results obtained with the GA-RF method, the traditional RF method (with default parameters), and the support vector machine method. With the GA-RF method, classification accuracy improves by 1.02% and 6.64% over these two methods, respectively. The comparison results show that GA-RF is a feasible solution for land cover classification without compromising accuracy or incurring excessive time.
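A GA wrapper around the two RF hyperparameters can be sketched as follows. The population size, mutation scheme, and the Iris stand-in data are illustrative choices, not the paper's configuration:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X, y = load_iris(return_X_y=True)

def fitness(n_trees, max_feats):
    """Cross-validated accuracy of an RF with the candidate parameters."""
    rf = RandomForestClassifier(n_estimators=n_trees, max_features=max_feats,
                                random_state=0)
    return cross_val_score(rf, X, y, cv=3).mean()

# Tiny GA: truncation selection plus random mutation over the two parameters
pop = [(int(rng.integers(10, 100)), int(rng.integers(1, 5))) for _ in range(6)]
for generation in range(2):
    parents = sorted(pop, key=lambda p: fitness(*p), reverse=True)[:3]
    children = [(max(10, p[0] + int(rng.integers(-20, 21))),
                 int(np.clip(p[1] + rng.integers(-1, 2), 1, 4)))
                for p in parents]
    pop = parents + children  # elitism: keep the best half, add mutated offspring

best = max(pop, key=lambda p: fitness(*p))
print("best (n_estimators, max_features):", best)
```

A full GA would add crossover and larger populations; the point here is only the fitness-driven search over the parameter space.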

  19. A RBF classification method of remote sensing image based on genetic algorithm

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    Remote sensing image classification has stimulated considerable interest as an effective method for retrieving information from the rapidly growing volume of complex, distributed, large-scale and cross-time satellite remote sensing data, driven by increases in image quantity and resolution. In this paper, genetic algorithms were employed to solve for the weights of radial basis function networks in order to improve the precision of remote sensing image classification. The remote sensing image classification was also introduced into GIS spatial analysis and spatial online analytical processing (OLAP), and its effectiveness was demonstrated in an analysis of land utilization variation in Daqing city.

  20. PCIU: Hardware Implementations of an Efficient Packet Classification Algorithm with an Incremental Update Capability

    Directory of Open Access Journals (Sweden)

    O. Ahmed

    2011-01-01

    Full Text Available Packet classification plays a crucial role in a number of network services, such as policy-based routing, firewalls, and traffic billing, to name a few. However, classification can become a bottleneck in the above-mentioned applications if not implemented properly and efficiently. In this paper, we propose PCIU, a novel classification algorithm that improves upon previously published work. PCIU provides lower preprocessing time, lower memory consumption, ease of incremental rule update, and reasonable classification time compared to state-of-the-art algorithms. The proposed algorithm was evaluated and compared to RFC and HiCut using several benchmarks. The results obtained indicate that PCIU outperforms these algorithms in terms of speed, memory usage, incremental update capability, and preprocessing time. The algorithm, furthermore, was improved and made more accessible for a variety of applications through implementation in hardware. Two such implementations are detailed and discussed in this paper. The results indicate that a hardware/software codesign approach results in a PCIU solution that is slower but easier to optimize and improve within time constraints. A hardware accelerator based on an ESL approach using Handel-C, on the other hand, achieved a 31x speed-up over a pure software implementation running on a state-of-the-art Xeon processor.
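For reference, the baseline that packet classification algorithms accelerate is a priority-ordered linear search over the rule table. A minimal sketch with a hypothetical three-rule set (the rules and field layout are illustrative, not PCIU's data structures):

```python
from dataclasses import dataclass

@dataclass
class Rule:
    """A classification rule over (src, dst, port) with half-open ranges."""
    src: range
    dst: range
    ports: range
    action: str

# Hypothetical rule set, highest-priority first (as in firewall rule tables)
rules = [
    Rule(range(0, 2**32), range(0, 2**32), range(80, 81), "allow-http"),
    Rule(range(0, 2**32), range(0, 2**32), range(22, 23), "allow-ssh"),
    Rule(range(0, 2**32), range(0, 2**32), range(0, 2**16), "drop"),
]

def classify(src, dst, port):
    # Linear search: the O(rules) baseline that algorithms such as PCIU,
    # RFC and HiCut replace with precomputed lookup structures
    for r in rules:
        if src in r.src and dst in r.dst and port in r.ports:
            return r.action
    return "default-drop"

print(classify(0x0A000001, 0x0A000002, 80))   # matches the HTTP rule
print(classify(0x0A000001, 0x0A000002, 443))  # falls through to the drop rule
```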

  1. Image-Derived Input Function Derived from a Supervised Clustering Algorithm: Methodology and Validation in a Clinical Protocol Using [11C](R)-Rolipram

    OpenAIRE

    Chul Hyoung Lyoo; Paolo Zanotti-Fregonara; Zoghbi, Sami S.; Jeih-San Liow; Rong Xu; Pike, Victor W.; Zarate, Carlos A.; Masahiro Fujita; Innis, Robert B.

    2014-01-01

    Image-derived input function (IDIF) obtained by manually drawing carotid arteries (manual-IDIF) can be reliably used in [(11)C](R)-rolipram positron emission tomography (PET) scans. However, manual-IDIF is time consuming and subject to inter- and intra-operator variability. To overcome this limitation, we developed a fully automated technique for deriving IDIF with a supervised clustering algorithm (SVCA). To validate this technique, 25 healthy controls and 26 patients with moderate to severe...

  2. Classification of posture maintenance data with fuzzy clustering algorithms

    Science.gov (United States)

    Bezdek, James C.

    1992-01-01

    Sensory inputs from the visual, vestibular, and proprioreceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various sensory organization test (SOT) conditions were collected in conjunction with Johnson Space Center postural control studies using a tilt-translation device (TTD). The University of West Florida applied the fuzzy c-means (FCM) clustering algorithms to this data with a view towards identifying various states and stages of subjects experiencing such changes. Feature analysis, time step analysis, pooling data, response of the subjects, and the algorithms used are discussed.
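The fuzzy c-means algorithm alternates soft membership updates with weighted center updates. A minimal NumPy sketch on synthetic two-cluster data, not the posture data set:

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, seed=0):
    """Basic fuzzy c-means: returns cluster centers and the membership matrix."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(X), c))
    u /= u.sum(axis=1, keepdims=True)  # memberships sum to 1 for each sample
    for _ in range(n_iter):
        um = u ** m
        # centers are membership-weighted means of the data
        centers = (um.T @ X) / um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # standard FCM membership update: u_ik proportional to d_ik^(-2/(m-1))
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return centers, u

# Two well-separated synthetic clusters
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0.0, 0.2, (50, 2)), rng.normal(3.0, 0.2, (50, 2))])
centers, u = fuzzy_c_means(X, c=2)
print(np.round(centers, 1))
```

The fuzzifier `m` controls how soft the memberships are; `m = 2` is the common default.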

  3. Data classification with radial basis function networks based on a novel kernel density estimation algorithm.

    Science.gov (United States)

    Oyang, Yen-Jen; Hwang, Shien-Ching; Ou, Yu-Yen; Chen, Chien-Yu; Chen, Zhi-Wei

    2005-01-01

    This paper presents a novel learning algorithm for efficient construction of the radial basis function (RBF) networks that can deliver the same level of accuracy as the support vector machines (SVMs) in data classification applications. The proposed learning algorithm works by constructing one RBF subnetwork to approximate the probability density function of each class of objects in the training data set. With respect to algorithm design, the main distinction of the proposed learning algorithm is the novel kernel density estimation algorithm that features an average time complexity of O(n log n), where n is the number of samples in the training data set. One important advantage of the proposed learning algorithm, in comparison with the SVM, is that the proposed learning algorithm generally takes far less time to construct a data classifier with an optimized parameter setting. This feature is of significance for many contemporary applications, in particular, for those applications in which new objects are continuously added into an already large database. Another desirable feature of the proposed learning algorithm is that the RBF networks constructed are capable of carrying out data classification with more than two classes of objects in one single run. In other words, unlike the SVM, there is no need to resort to mechanisms such as one-against-one or one-against-all for handling datasets with more than two classes of objects. The comparison with the SVM is of particular interest, because it has been shown in a number of recent studies that SVMs generally are able to deliver higher classification accuracy than the other existing data classification algorithms. As the proposed learning algorithm is instance-based, the data reduction issue is also addressed in this paper. One interesting observation in this regard is that, for all three data sets used in data reduction experiments, the number of training samples remaining after a naive data reduction mechanism is

  4. Supervised Classification in the Presence of Misclassified Training Data: A Monte Carlo Simulation Study in the Three Group Case

    Directory of Open Access Journals (Sweden)

    Jocelyn E Bolin

    2014-02-01

    Full Text Available Statistical classification of phenomena into observed groups is very common in the social and behavioral sciences. Statistical classification methods, however, are affected by the characteristics of the data under study, and classification can be further complicated by initial misclassification of the observed groups. The purpose of this study is to investigate the impact of initial training data misclassification on several statistical classification and data mining techniques. Misclassification conditions in the three-group case are simulated, and results are presented in terms of overall as well as subgroup classification accuracy. Results show decreased classification accuracy as sample size, group separation and group size ratio decrease, and as misclassification percentage increases, with random forests demonstrating the highest accuracy across conditions.
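The core of such a simulation, injecting label noise into the training data and measuring the effect on test accuracy, can be sketched as below (a random forest on synthetic three-group data; the noise levels are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)
X, y = make_classification(n_samples=600, n_informative=5, n_classes=3,
                           n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

accs = []
for noise in (0.0, 0.1, 0.3):
    y_noisy = y_tr.copy()
    flip = rng.random(len(y_noisy)) < noise  # which training labels to corrupt
    y_noisy[flip] = rng.integers(0, 3, size=int(flip.sum()))  # random relabeling
    rf = RandomForestClassifier(random_state=0).fit(X_tr, y_noisy)
    accs.append(rf.score(X_te, y_te))  # always evaluated on clean test labels
    print(f"label noise {noise:.0%}: test accuracy {accs[-1]:.2f}")
```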

  5. Experimental analysis of the performance of machine learning algorithms in the classification of navigation accident records

    Directory of Open Access Journals (Sweden)

    REIS, M V. S. de A.

    2017-06-01

    Full Text Available This paper aims to evaluate the use of machine learning techniques on a database of marine accidents. We analyzed and evaluated the main causes and types of marine accidents in the Northern Fluminense region. The study showed that the modeling can be done satisfactorily using different configurations of classification algorithms, varying the activation functions and training parameters. The SMO (Sequential Minimal Optimization) algorithm showed the best performance.

  6. Research of information classification and strategy intelligence extract algorithm based on military strategy hall

    Science.gov (United States)

    Chen, Lei; Li, Dehua; Yang, Jie

    2007-12-01

    Constructing a virtual international strategy environment needs many kinds of information, such as economics, politics, military affairs, diplomacy, culture, and science. It is therefore very important to build a highly efficient system for automatic information extraction, classification, recombination and analysis as the foundation and a component of a military strategy hall. This paper first uses an improved Boost algorithm to classify the obtained initial information, and then uses a strategy intelligence extraction algorithm to extract strategy intelligence from the initial information to help strategists analyze it.

  7. Data classification using metaheuristic Cuckoo Search technique for Levenberg Marquardt back propagation (CSLM) algorithm

    Science.gov (United States)

    Nawi, Nazri Mohd.; Khan, Abdullah; Rehman, M. Z.

    2015-05-01

    Nature-inspired metaheuristic techniques provide derivative-free solutions to complex problems. One of the latest additions to this group of nature-inspired optimization procedures is the Cuckoo Search (CS) algorithm. Artificial neural network (ANN) training is an optimization task, since it is desired to find an optimal weight set for the neural network during training. Traditional training algorithms have limitations such as getting trapped in local minima and slow convergence. This study proposes a new technique, CSLM, which combines the best features of two known algorithms, back-propagation (BP) and the Levenberg-Marquardt (LM) algorithm, to improve the convergence speed of ANN training and avoid the local minima problem. Selected benchmark classification datasets are used for simulation. The experimental results show that the proposed Cuckoo Search with Levenberg-Marquardt algorithm performs better than the other algorithms used in this study.
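The Cuckoo Search component can be sketched as a Lévy-flight search over candidate solutions. The sketch below minimizes a simple sphere function rather than network weights, and its step scale and abandonment scheme are common textbook choices, not necessarily the paper's:

```python
import numpy as np
from math import gamma, sin, pi

rng = np.random.default_rng(7)

def levy_step(dim, beta=1.5):
    """Mantegna's algorithm for Levy-flight step lengths."""
    sigma = (gamma(1 + beta) * sin(pi * beta / 2)
             / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)

def cuckoo_search(f, dim=5, n_nests=15, n_iter=200, pa=0.25):
    nests = rng.uniform(-5, 5, (n_nests, dim))
    fit = np.array([f(x) for x in nests])
    for _ in range(n_iter):
        best = nests[fit.argmin()].copy()
        for i in range(n_nests):
            # Levy flight around nest i, biased toward the current best nest
            cand = nests[i] + 0.01 * levy_step(dim) * (nests[i] - best)
            fc = f(cand)
            if fc < fit[i]:  # greedy replacement
                nests[i], fit[i] = cand, fc
        # abandon a fraction pa of the worst nests (never the best one)
        n_drop = int(pa * n_nests)
        worst = fit.argsort()[-n_drop:]
        nests[worst] = rng.uniform(-5, 5, (n_drop, dim))
        fit[worst] = [f(x) for x in nests[worst]]
    return nests[fit.argmin()], float(fit.min())

sphere = lambda x: float((x ** 2).sum())
best_x, best_f = cuckoo_search(sphere)
print("best objective:", best_f)
```

In CSLM, the objective would instead be the network's training error over its weight vector, with LM refinement applied around candidates.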

  8. Walking pattern classification and walking distance estimation algorithms using gait phase information.

    Science.gov (United States)

    Wang, Jeen-Shing; Lin, Che-Wei; Yang, Ya-Ting C; Ho, Yu-Jen

    2012-10-01

    This paper presents a walking pattern classification and a walking distance estimation algorithm using gait phase information. A gait phase information retrieval algorithm was developed to analyze the duration of the phases in a gait cycle (i.e., stance, push-off, swing, and heel-strike phases). Based on the gait phase information, a decision tree based on the relations between gait phases was constructed for classifying three different walking patterns (level walking, walking upstairs, and walking downstairs). Gait phase information was also used for developing a walking distance estimation algorithm. The walking distance estimation algorithm consists of the processes of step count and step length estimation. The proposed walking pattern classification and walking distance estimation algorithm have been validated by a series of experiments. The accuracy of the proposed walking pattern classification was 98.87%, 95.45%, and 95.00% for level walking, walking upstairs, and walking downstairs, respectively. The accuracy of the proposed walking distance estimation algorithm was 96.42% over a walking distance.

  9. Acoustic diagnosis of pulmonary hypertension: automated speech- recognition-inspired classification algorithm outperforms physicians

    Science.gov (United States)

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y.; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J.; Adatia, Ian

    2016-09-01

    We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). The accuracy of the speech-recognition-inspired algorithm was 74%, compared to 56% for physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% versus 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and could be used to screen for PH and encourage earlier specialist referral.
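The classification stage, one Gaussian mixture model per class with a maximum-likelihood decision, can be sketched as below. The 12-dimensional vectors are synthetic stand-ins for MFCC features, not the study's recordings:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 12-dim "MFCC" feature vectors for two groups of recordings
rng = np.random.default_rng(8)
X_ph = rng.normal(1.0, 1.0, size=(200, 12))   # hypothetical "PH" features
X_nl = rng.normal(-1.0, 1.0, size=(200, 12))  # hypothetical "normal" features

# One Gaussian mixture model per class, classified by maximum log-likelihood
gmm_ph = GaussianMixture(n_components=2, random_state=0).fit(X_ph)
gmm_nl = GaussianMixture(n_components=2, random_state=0).fit(X_nl)

def classify(frames):
    """Label a frame 1 (PH) when the PH model gives the higher likelihood."""
    return (gmm_ph.score_samples(frames) > gmm_nl.score_samples(frames)).astype(int)

test_X = np.vstack([rng.normal(1.0, 1.0, (50, 12)), rng.normal(-1.0, 1.0, (50, 12))])
truth = np.array([1] * 50 + [0] * 50)
acc = (classify(test_X) == truth).mean()
print("accuracy:", acc)
```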

  10. Model classification rate control algorithm for video coding

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    A model classification rate control method for video coding is proposed. The macroblocks are classified according to their prediction errors, and different parameters are used in the rate-quantization and distortion-quantization models. The model parameters for each class are calculated from the previous frame of the same type during coding. These models are used to estimate the relations among rate, distortion, and quantization for the current frame. Further steps, such as R-D optimization based quantization adjustment and smoothing of quantization across adjacent macroblocks, are used to improve quality. Experimental results prove that the technique is effective and easily realized. The method presented in the paper is well suited to MPEG and H.264 rate control.

  11. Improved Fault Classification in Series Compensated Transmission Line: Comparative Evaluation of Chebyshev Neural Network Training Algorithms.

    Science.gov (United States)

    Vyas, Bhargav Y; Das, Biswarup; Maheshwari, Rudra Prakash

    2016-08-01

    This paper presents the Chebyshev neural network (ChNN) as an improved artificial intelligence technique for power system protection studies and examines the performances of two ChNN learning algorithms for fault classification of series compensated transmission line. The training algorithms are least-square Levenberg-Marquardt (LSLM) and recursive least-square algorithm with forgetting factor (RLSFF). The performances of these algorithms are assessed based on their generalization capability in relating the fault current parameters with an event of fault in the transmission line. The proposed algorithm is fast in response as it utilizes postfault samples of three phase currents measured at the relaying end corresponding to half-cycle duration only. After being trained with only a small part of the generated fault data, the algorithms have been tested over a large number of fault cases with wide variation of system and fault parameters. Based on the studies carried out in this paper, it has been found that although the RLSFF algorithm is faster for training the ChNN in the fault classification application for series compensated transmission lines, the LSLM algorithm has the best accuracy in testing. The results prove that the proposed ChNN-based method is accurate, fast, easy to design, and immune to the level of compensations. Thus, it is suitable for digital relaying applications.
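    A ChNN is essentially a functional-link network: each input feature is expanded with Chebyshev polynomials and only a linear output layer is trained. The sketch below uses a plain least-squares fit as a stand-in for the LSLM and RLSFF trainers studied in the paper:

```python
import numpy as np

def chebyshev_expand(X, order=4):
    """Expand each feature (assumed scaled to [-1, 1]) with Chebyshev
    polynomials T_0..T_order via T_n(x) = 2x*T_{n-1}(x) - T_{n-2}(x)."""
    terms = [np.ones_like(X), X]
    for _ in range(2, order + 1):
        terms.append(2 * X * terms[-1] - terms[-2])
    return np.hstack(terms)  # (n_samples, n_features * (order + 1))

def fit_chnn(X, y, order=4):
    """Least-squares fit of the linear output layer (stand-in for LSLM/RLSFF)."""
    Phi = chebyshev_expand(X, order)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def predict_chnn(X, w, order=4):
    return chebyshev_expand(X, order) @ w
```

Because only the output weights are learned, training reduces to one linear solve, which is what makes such networks attractive for fast relaying applications.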

  12. The Optimization of Trained and Untrained Image Classification Algorithms for Use on Large Spatial Datasets

    Science.gov (United States)

    Kocurek, Michael J.

    2005-01-01

    The HARVIST project seeks to automatically provide an accurate, interactive interface to predict crop yield over the entire United States. In order to accomplish this goal, large images must be quickly and automatically classified by crop type. Current trained and untrained classification algorithms, while accurate, are highly inefficient when operating on large datasets. This project sought to develop new variants of two standard trained and untrained classification algorithms that are optimized to take advantage of the spatial nature of image data. The first algorithm, harvist-cluster, utilizes divide-and-conquer techniques to precluster an image in the hopes of increasing overall clustering speed. The second algorithm, harvistSVM, utilizes support vector machines (SVMs), a type of trained classifier. It seeks to increase classification speed by applying a "meta-SVM" to a quick (but inaccurate) SVM to approximate a slower, yet more accurate, SVM. Speedups were achieved by tuning the algorithm to quickly identify when the quick SVM was incorrect, and then reclassifying low-confidence pixels as necessary. Comparing the classification speeds of both algorithms to known baselines showed a slight speedup for large values of k (the number of clusters) for harvist-cluster, and a significant speedup for harvistSVM. Future work aims to automate the parameter tuning process required for harvistSVM, and further improve classification accuracy and speed. Additionally, this research will move documents created in Canvas into ArcGIS. The launch of the Mars Reconnaissance Orbiter (MRO) will provide a wealth of image data such as global maps of Martian weather and high resolution global images of Mars. The ability to store this new data in a georeferenced format will support future Mars missions by providing data for landing site selection and the search for water on Mars.
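    The quick-then-accurate idea behind harvistSVM can be sketched as a two-stage cascade. The class below is a hypothetical illustration (not the project's code): a fast linear SVM classifies everything, and samples whose decision margin is small are reclassified by a slower RBF-kernel SVM:

```python
import numpy as np
from sklearn.svm import LinearSVC, SVC

class CascadeSVM:
    """Quick linear SVM first; low-confidence samples (small |margin|)
    are re-classified by a slower, more accurate kernel SVM."""
    def __init__(self, margin=0.5):
        self.quick = LinearSVC(random_state=0)
        self.slow = SVC(kernel="rbf")
        self.margin = margin

    def fit(self, X, y):
        self.quick.fit(X, y)
        self.slow.fit(X, y)
        return self

    def predict(self, X):
        pred = self.quick.predict(X)
        low_conf = np.abs(self.quick.decision_function(X)) < self.margin
        if low_conf.any():
            pred[low_conf] = self.slow.predict(X[low_conf])
        return pred
```

The speedup comes from calling the expensive classifier only on the (hopefully small) low-confidence subset.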

  13. Classification of Parkinson's disease utilizing multi-edit nearest-neighbor and ensemble learning algorithms with speech samples.

    Science.gov (United States)

    Zhang, He-Hua; Yang, Liuyang; Liu, Yuchuan; Wang, Pin; Yin, Jun; Li, Yongming; Qiu, Mingguo; Zhu, Xueru; Yan, Fang

    2016-11-16

    The use of speech-based data in the classification of Parkinson's disease (PD) has been shown to provide an effective, non-invasive mode of classification in recent years. Thus, there has been increased interest in speech pattern analysis methods applicable to parkinsonism for building predictive tele-diagnosis and tele-monitoring models. One of the obstacles in optimizing classification is reducing noise within the collected speech samples, thus ensuring better classification accuracy and stability. While the currently used methods are effective, the ability to invoke instance selection has seldom been examined. In this study, a PD classification algorithm was proposed and examined that combines a multi-edit nearest-neighbor (MENN) algorithm and an ensemble learning algorithm. First, the MENN algorithm is applied to select optimal training speech samples iteratively, thereby obtaining samples with high separability. Next, an ensemble learning algorithm, random forest (RF) or decorrelated neural network ensembles (DNNE), is trained on the selected training samples. Lastly, the trained ensemble learning algorithms are applied to the test samples for PD classification. The proposed method was examined using a recently deposited public dataset and compared against other currently used algorithms for validation. Experimental results showed that the proposed algorithm obtained the largest improvement in classification accuracy (29.44%) compared with the other algorithms examined. Furthermore, the MENN algorithm alone was found to improve classification accuracy by as much as 45.72%. Moreover, the proposed algorithm exhibited higher stability, particularly when combining the MENN and RF algorithms. This study showed that the proposed method can improve PD classification from speech data and can be applied to future studies seeking to improve PD classification methods.
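    The spirit of MENN instance selection can be illustrated with a simplified, iterated Wilson-style editing pass (the paper's exact multi-edit procedure differs): samples whose k nearest neighbours disagree with their label are discarded, and the pass repeats until the training set stabilizes:

```python
import numpy as np

def edit_nearest_neighbor(X, y, k=3, max_iter=10):
    """Iteratively discard samples whose k nearest neighbours (excluding
    the sample itself) mostly disagree with their label."""
    X, y = np.asarray(X, float), np.asarray(y)
    for _ in range(max_iter):
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)          # a sample is not its own neighbour
        nn = np.argsort(d, axis=1)[:, :k]
        votes = (y[nn] == y[:, None]).sum(axis=1)
        keep = votes > k // 2                # keep only majority-consistent samples
        if keep.all():
            break
        X, y = X[keep], y[keep]
    return X, y
```

An ensemble learner (e.g. a random forest) would then be trained on the edited, higher-separability set.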

  14. Kollegial supervision

    DEFF Research Database (Denmark)

    Andersen, Ole Dibbern; Petersson, Erling

    The publication examines how collegial supervision can be organized within an educational institution.

  15. Co-Training Semi-Supervised Active Learning Algorithm with Noise Filter

    Institute of Scientific and Technical Information of China (English)

    詹永照; 陈亚必

    2009-01-01

    The classification performance of classifiers based on semi-supervised learning degrades when noisy samples are introduced through training on unlabeled data. To overcome this, an algorithm called co-training semi-supervised active learning with a noise filter is presented. In this algorithm, three fuzzy buried Markov models perform semi-supervised learning cooperatively. Human-computer interaction is actively introduced at appropriate times to supply class labels, avoiding both rejection when the classifiers disagree and the misjudgment of accepting initially unanimous decisions as correct. Meanwhile, a noise filter removes machine-labeled samples that are likely to be noise. The proposed algorithm is applied to facial expression recognition. Experimental results show that the algorithm effectively improves the utilization of unlabeled samples, reduces the noise introduced by semi-supervised learning, and raises the accuracy of expression recognition.

  16. Engineering and Image Classification Framework Using Multi Instance Learning with KCCA Algorithm

    Directory of Open Access Journals (Sweden)

    P. Bhuvaneswari

    2012-12-01

    Full Text Available Image classification is a challenging task with many applications in computer vision. Images are annotated with multiple keywords that may or may not be correlated, so image classification may be naturally modelled as a Multiple Instance Learning problem. The main challenge is that classes are usually overlapped and correlated. In single-label classification the correlation among instances is not taken into account, yet in an image an instance may belong to several classes. The correlations among different tags can significantly help in predicting precise labels and improving the performance of multi-label image classification. This study proposes a method combining Kernel Canonical Correlation Analysis (KCCA) and Multiple Instance Learning for multi-label image classification, to improve classification accuracy. In the proposed framework, an input image is partitioned into patches and features are extracted. The original training set is broken into several disjoint clusters of data, and a multi-label classifier is trained on the data of each cluster. K-means clustering is used to perform automatic instance clustering. Kernel canonical correlation analysis is then applied between disjoint clusters to find exact correspondences between image patches. Multiple Instance Learning is one potential solution for addressing large inter-concept visual similarity and improving classification accuracy. The proposed approach reduces the training time of standard multi-label classification algorithms, particularly in the case of a large number of labels.

  17. Improved neural network algorithm for classification of UAV imagery related to Wenchuan earthquake

    Science.gov (United States)

    Lin, Na; Yang, Wunian; Wang, Bin

    2009-06-01

    When the Wenchuan earthquake struck, the terrain of the region changed violently. Unmanned aerial vehicle (UAV) remote sensing is effective for extracting first-hand information, and the high resolution images are of great importance in disaster management and relief operations. The back propagation (BP) neural network is an artificial neural network which combines a multi-layer feed-forward network with the error back-propagation algorithm. It has a strong input-output mapping capability, does not require the objects to be identified to obey a particular distribution law, and has strong non-linear modelling and error-tolerance capabilities, so remotely-sensed image classification can achieve high accuracy. But it also has drawbacks, such as slow convergence and a tendency to become trapped in local minima. In order to solve these problems, we improved the algorithm by introducing a self-adaptive learning rate and adding a momentum factor. A UAV high-resolution aerial image of the Taoguan District of Wenchuan County is used as the data source. First, we preprocess the UAV aerial images and rectify geometric distortion. Training samples are then selected and purified, and the image is classified using the improved BP neural network algorithm. Finally, we compare this classification result with the maximum likelihood classification (MLC) result. The comparison shows that the overall accuracy of maximum likelihood classification is 83.8%, while that of the improved BP neural network classification is 89.7%, indicating that the latter is better.
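    The two fixes named above, a momentum term and a self-adaptive learning rate, can be sketched in a minimal one-hidden-layer BP trainer. The "bold driver" rate adaptation below (grow the step on improvement, shrink it on regression) is one common choice, not necessarily the authors' exact scheme:

```python
import numpy as np

def train_bp(X, y, hidden=8, lr=0.5, momentum=0.9, epochs=500, seed=0):
    """One-hidden-layer BP with a momentum term and a bold-driver style
    self-adaptive learning rate."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 1, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 1, (hidden, 1));          b2 = np.zeros(1)
    vel = [np.zeros_like(p) for p in (W1, b1, W2, b2)]
    losses = []
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                       # forward pass
        out = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        if len(losses) > 1:                            # adapt the step size
            lr = lr * 1.05 if losses[-1] < losses[-2] else lr * 0.5
        d_out = err * out * (1.0 - out) / len(X)       # backward pass
        d_h = (d_out @ W2.T) * (1.0 - h ** 2)
        grads = (X.T @ d_h, d_h.sum(0), h.T @ d_out, d_out.sum(0))
        for p, v, g in zip((W1, b1, W2, b2), vel, grads):
            v *= momentum                              # momentum update
            v -= lr * g
            p += v
    return (W1, b1, W2, b2), losses
```

The momentum term smooths the descent direction, while the adaptive rate counters the slow convergence the abstract mentions.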

  18. The surgical algorithm for the AOSpine thoracolumbar spine injury classification system

    NARCIS (Netherlands)

    Vaccaro, Alexander R.; Schroeder, Gregory D.; Kepler, Christopher K.; Cumhur Oner, F.; Vialle, Luiz R.; Kandziora, Frank; Koerner, John D.; Kurd, Mark F.; Reinhold, Max; Schnake, Klaus J.; Chapman, Jens; Aarabi, Bizhan; Fehlings, Michael G.; Dvorak, Marcel F.

    2016-01-01

    Purpose: The goal of the current study is to establish a surgical algorithm to accompany the AOSpine thoracolumbar spine injury classification system. Methods: A survey was sent to AOSpine members from the six AO regions of the world, and surgeons were asked if a patient should undergo an initial tr

  19. A Comparative Study of Classification and Regression Algorithms for Modelling Students' Academic Performance

    Science.gov (United States)

    Strecht, Pedro; Cruz, Luís; Soares, Carlos; Mendes-Moreira, João; Abreu, Rui

    2015-01-01

    Predicting the success or failure of a student in a course or program is a problem that has recently been addressed using data mining techniques. In this paper we evaluate some of the most popular classification and regression algorithms on this problem. We address two problems: prediction of approval/failure and prediction of grade. The former is…

  20. Experiments in Discourse Analysis Impact on Information Classification and Retrieval Algorithms.

    Science.gov (United States)

    Morato, Jorge; Llorens, J.; Genova, G.; Moreiro, J. A.

    2003-01-01

    Discusses the inclusion of contextual information in indexing and retrieval systems to improve results and the ability to carry out text analysis by means of linguistic knowledge. Presents research that investigated whether discourse variables have an impact on information and retrieval and classification algorithms. (Author/LRW)

  1. Aneurysmal subarachnoid hemorrhage prognostic decision-making algorithm using classification and regression tree analysis

    Directory of Open Access Journals (Sweden)

    Benjamin W. Y. Lo

    2016-01-01

    Conclusions: A clinically useful classification tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH.

  2. Classification of EEG Signals using adaptive weighted distance nearest neighbor algorithm

    Directory of Open Access Journals (Sweden)

    E. Parvinnia

    2014-01-01

    Full Text Available Electroencephalogram (EEG) signals are often used to diagnose diseases such as seizures, Alzheimer's disease, and schizophrenia. One main problem with recorded EEG samples is that they are not equally reliable, due to artifacts at the time of recording, so EEG signal classification algorithms should have a mechanism to handle this issue. Adaptive classifiers appear useful for biological signals such as EEG. In this paper, a general adaptive method named weighted distance nearest neighbor (WDNN) is applied to EEG signal classification to tackle this problem. The algorithm assigns a weight to each training sample to control its influence in classifying test samples; the weights of the training samples are used to find the nearest neighbor of an input query pattern. To assess the performance of this scheme, EEG signals of thirteen schizophrenic patients and eighteen normal subjects are analyzed for the classification of these two groups. Several features, including fractal dimension, band power, and autoregressive (AR) model coefficients, are extracted from the EEG signals. The classification results are evaluated using leave-one-subject-out cross-validation for reliable estimation. The results indicate that the combination of WDNN and the selected features can significantly outperform the basic nearest-neighbor method and other methods proposed in the past for the classification of these two groups. Therefore, this method can be a complementary tool for specialists to distinguish schizophrenia disorder.
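    The WDNN decision rule can be sketched as follows, assuming the per-sample weights have already been learned. Here a weight multiplies a sample's distance, so a larger weight pushes the sample "further away" and reduces its influence (an assumption about the convention, for illustration):

```python
import numpy as np

def wdnn_predict(X_train, y_train, w, X_test):
    """Weighted-distance nearest neighbor: training sample i contributes
    distance ||x - x_i|| * w[i]; the query takes the label of the argmin."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2) * w
    return y_train[np.argmin(d, axis=1)]
```

With all weights equal to 1 this reduces to the ordinary nearest-neighbor rule; unreliable (artifact-laden) training samples would receive larger weights.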

  3. Consistent Classification of Landsat Time Series with an Improved Automatic Adaptive Signature Generalization Algorithm

    Directory of Open Access Journals (Sweden)

    Matthew P. Dannenberg

    2016-08-01

    Full Text Available Classifying land cover is perhaps the most common application of remote sensing, yet classification at frequent temporal intervals remains a challenging task due to radiometric differences among scenes, time and budget constraints, and semantic differences among class definitions from different dates. The automatic adaptive signature generalization (AASG algorithm overcomes many of these limitations by locating stable sites between two images and using them to adapt class spectral signatures from a high-quality reference classification to a new image, which mitigates the impacts of radiometric and phenological differences between images and ensures that class definitions remain consistent between the two classifications. We refined AASG to adapt stable site identification parameters to each individual land cover class, while also incorporating improved input data and a random forest classifier. In the Research Triangle region of North Carolina, our new version of AASG demonstrated an improved ability to update existing land cover classifications compared to the initial version of AASG, particularly for low intensity developed, mixed forest, and woody wetland classes. Topographic indices were particularly important for distinguishing woody wetlands from other forest types, while multi-seasonal imagery contributed to improved classification of water, developed, forest, and hay/pasture classes. These results demonstrate both the flexibility of the AASG algorithm and the potential for using it to produce high-quality land cover classifications that can utilize the entire temporal range of the Landsat archive in an automated fashion while maintaining consistent class definitions through time.

  4. Multispectral image classification of MRI data using an empirically-derived clustering algorithm

    Energy Technology Data Exchange (ETDEWEB)

    Horn, K.M.; Osbourn, G.C.; Bouchard, A.M. [Sandia National Labs., Albuquerque, NM (United States); Sanders, J.A. [Univ. of New Mexico, Albuquerque, NM (United States)]|[VA Hospital, Albuquerque, NM (United States)

    1998-08-01

    Multispectral image analysis of magnetic resonance imaging (MRI) data has been performed using an empirically-derived clustering algorithm. This algorithm groups image pixels into distinct classes which exhibit similar response in the T{sub 2} 1st and 2nd-echo, and T{sub 1} (with ad without gadolinium) MRI images. The grouping is performed in an n-dimensional mathematical space; the n-dimensional volumes bounding each class define each specific tissue type. The classification results are rendered again in real-space by colored-coding each grouped class of pixels (associated with differing tissue types). This classification method is especially well suited for class volumes with complex boundary shapes, and is also expected to robustly detect abnormal tissue classes. The classification process is demonstrated using a three dimensional data set of MRI scans of a human brain tumor.

  5. Incremental Classification Algorithm of Hyperspectral Remote Sensing Images Based on Spectral-spatial Information

    Directory of Open Access Journals (Sweden)

    WANG Junshu

    2015-09-01

    Full Text Available An incremental classification algorithm, INC_SPEC_MPext, is proposed for hyperspectral remote sensing images based on spectral and spatial information. Spatial information is extracted by building morphological profiles from several principal components of the hyperspectral image, and the profiles are combined into extended morphological profiles (MPext). Spectral information and MPext are combined to enrich the feature set and to exploit the useful information in unlabeled data to the greatest extent in optimizing the classifier. High-confidence samples are picked out and added to the training set, and the classifier is retrained on the augmented set to predict the remaining samples; this process is performed iteratively. The proposed algorithm was tested on AVIRIS Indian Pines and Hyperion EO-1 Botswana data, which contain different land covers. Experimental results show low classification cost and significant improvements in accuracy and Kappa coefficient under limited training samples, compared with classification based on spectral information, MPext, and the combination of spectral information and MPext.
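    The iterative "add high-confidence samples, retrain" loop can be sketched generically. The spectral/MPext feature construction is omitted, and logistic regression is an illustrative stand-in for the paper's classifier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, conf=0.95, max_rounds=10):
    """Iteratively move unlabeled samples the classifier is confident
    about into the training set, then retrain."""
    X_lab, y_lab = X_lab.copy(), y_lab.copy()
    clf = LogisticRegression().fit(X_lab, y_lab)
    for _ in range(max_rounds):
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        sure = proba.max(axis=1) >= conf
        if not sure.any():
            break
        X_lab = np.vstack([X_lab, X_unlab[sure]])
        y_lab = np.concatenate([y_lab, clf.classes_[proba[sure].argmax(axis=1)]])
        X_unlab = X_unlab[~sure]
        clf = LogisticRegression().fit(X_lab, y_lab)  # retrain on augmented set
    return clf
```

The confidence threshold controls the trade-off between exploiting unlabeled data and the risk of reinforcing early mistakes.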

  6. A Novel Algorithm for Imbalance Data Classification Based on Neighborhood Hypergraph

    Directory of Open Access Journals (Sweden)

    Feng Hu

    2014-01-01

    Full Text Available The classification problem for imbalanced data has received increasing attention. Many effective methods have been proposed and applied in many fields, but more efficient methods are still needed. Hypergraphs are an efficient tool for knowledge discovery, but they may not be powerful enough to deal with data in the boundary region. In this paper, the neighborhood hypergraph is presented, combining rough set theory and hypergraphs. A novel classification algorithm for imbalanced data based on the neighborhood hypergraph is then developed, composed of three steps: initialization of hyperedges, classification of the training data set, and substitution of hyperedges. In a 10-fold cross-validation experiment on 18 data sets, the proposed algorithm achieved higher average accuracy than the others.

  7. [Comparative efficiency of algorithms based on support vector machines for binary classification].

    Science.gov (United States)

    Kadyrova, N O; Pavlova, L V

    2015-01-01

    Methods for constructing support vector machines require no further a priori information and support large-scale data processing, which is especially important for various problems in computational biology. The question of the quality of the learning algorithms is considered. The main support vector machine algorithms for binary classification are reviewed and their efficiencies comparatively explored. A critical analysis of the results of this study revealed the most effective support vector classifiers. A description of the recommended algorithms, sufficient for their practical implementation, is presented.

  8. Active Learning Algorithms for the Classification of Hyperspectral Sea Ice Images

    Directory of Open Access Journals (Sweden)

    Yanling Han

    2015-01-01

    Full Text Available Sea ice is one of the most serious marine hazards, especially in polar and high-latitude regions. Hyperspectral imagery is well suited to monitoring sea ice, as it contains continuous spectral information and offers better target recognition. The principal bottleneck for the classification of hyperspectral imagery is the large number of labeled training samples required, and the collection of labeled samples is time consuming and costly. In order to solve this problem, we apply active learning (AL) to hyperspectral sea ice detection, selecting only the most informative samples for labeling. Moreover, we propose a novel AL algorithm based on the evaluation of two criteria: uncertainty and diversity. The uncertainty criterion is based on the difference between the probabilities of the two classes with the highest estimated probabilities, while the diversity criterion is based on kernel k-means clustering. In experiments on data from Baffin Bay in northwest Greenland (April 12, 2014), the proposed AL algorithm achieves the highest classification accuracy, 89.327%, compared with other AL algorithms and random sampling, and it needs less labeling cost to reach the same classification accuracy.
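    The uncertainty criterion described, the gap between the two highest class probabilities, is often called breaking-ties or margin sampling and can be sketched in a few lines (the kernel k-means diversity step is omitted here):

```python
import numpy as np

def select_by_margin(proba, n_select):
    """Breaking-ties uncertainty sampling: rank samples by the gap between
    their two highest class probabilities; smallest gaps are most informative."""
    srt = np.sort(proba, axis=1)
    margin = srt[:, -1] - srt[:, -2]
    return np.argsort(margin)[:n_select]
```

A diversity step would then cluster the selected candidates (e.g. with kernel k-means) and keep one representative per cluster, so the labeling budget is not spent on near-duplicates.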

  9. Classification of textures in satellite image with Gabor filters and a multi layer perceptron with back propagation algorithm obtaining high accuracy

    Directory of Open Access Journals (Sweden)

    Adriano Beluco, Paulo M. Engel, Alexandre Beluco

    2015-01-01

    Full Text Available The classification of images is, in many cases, applied to identify an alphanumeric string, a facial expression, or some other characteristic. In the case of satellite images it is necessary to classify all the pixels of the image. This article describes a supervised classification method for remote sensing images that integrates the importance of attributes in feature selection with the efficiency of artificial neural networks in the classification process, resulting in high accuracy for real images. The method consists of a texture segmentation based on Gabor filtering, followed by the image classification itself using a multilayer artificial neural network with a back-propagation algorithm. The method was first applied to a synthetic image, as a training case, and then to a satellite image. Some experimental results are presented in detail and discussed. Applied to the synthetic image, the method correctly identified 89.05% of the pixels; applied to the satellite image, it correctly identified 85.15% of the pixels. The result for the satellite image can be considered high accuracy.
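    Gabor texture features of the kind used before the neural-network stage can be sketched with a hand-rolled kernel and a naive convolution. Parameter values here are illustrative, not the paper's:

```python
import numpy as np

def gabor_kernel(freq, theta, sigma=3.0, size=15):
    """Real (cosine-carrier) Gabor kernel at orientation theta (radians):
    a Gaussian envelope times a sinusoid along the rotated x-axis."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * freq * xr)

def gabor_energy(img, kernel):
    """Mean absolute filter response over all valid positions (naive convolution)."""
    kh, kw = kernel.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return np.abs(out).mean()
```

A bank of such kernels at several frequencies and orientations yields one energy per filter, and those energies form the per-pixel (or per-window) texture feature vector fed to the classifier.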

  10. Hyperspectral Image Land Cover Classification Algorithm Based on Spatial-spectral Coordination Embedding

    Directory of Open Access Journals (Sweden)

    HUANG Hong

    2016-08-01

    Full Text Available Traditional hyperspectral land cover classification methods use only spectral information and ignore the relationships between spatial neighbors. To address this, a new dimensionality reduction algorithm called spatial-spectral coordination embedding (SSCE) and a new classifier called spatial-spectral coordination nearest neighbor (SSCNN) are proposed in this paper. First, a spatial-spectral coordination distance is defined and applied to neighbor selection and low-dimensional embedding. A spatial-spectral neighborhood graph is then constructed to preserve the manifold structure of the data set, and the aggregation of the data is enhanced by raising the weight of spatial neighbor points, so as to extract discriminant features. Finally, the SSCNN classifies the dimension-reduced data. Experimental results on the PaviaU and Salinas data sets show that the proposed method effectively improves land cover classification accuracy compared with traditional spectral classification methods.

  11. Fanning - A classification algorithm for mixture landscapes applied to Landsat data of Maine forests

    Science.gov (United States)

    Ungar, S. G.; Bryant, E.

    1981-01-01

    It is pointed out that typical landscapes include a relatively small number of 'pure' land cover types which combine in various proportions to form a myriad of mixture types. Most Landsat classification algorithms used today require a separate user specification for each category, including mixture categories. Attention is given to a simpler approach, which requires the user to specify only the 'pure' types; mixture pixels are then classified on the basis of the proportion of the area covered by each pure type within the pixel. The 'fanning' algorithm quantifies varying proportions of two 'pure' land cover types in selected mixture pixels. This algorithm was applied to 200,000 ha of forest land in Maine and compared with standard inventory information. Results compared well with a discrete-categories classification of the same area.
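    Classifying a mixture pixel by the proportions of two pure types can be sketched as a one-parameter least-squares unmixing problem. This illustrates the general idea, not necessarily the exact fanning computation:

```python
import numpy as np

def mixture_proportion(pixel, pure1, pure2):
    """Least-squares alpha minimizing ||pixel - (alpha*pure1 + (1-alpha)*pure2)||,
    clipped to [0, 1]. Inputs are spectral vectors (one value per band)."""
    d = pure1 - pure2
    alpha = np.dot(pixel - pure2, d) / np.dot(d, d)
    return float(np.clip(alpha, 0.0, 1.0))
```

A pixel with alpha near 0 or 1 is effectively a pure pixel; intermediate values quantify the mixture.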

  12. Image processing and classification algorithm for yeast cell morphology in a microfluidic chip.

    Science.gov (United States)

    Yang Yu, Bo; Elbuken, Caglar; Ren, Carolyn L; Huissoon, Jan P

    2011-06-01

    The study of yeast cell morphology requires consistent identification of cell cycle phases based on cell bud size. A computer-based image processing algorithm is designed to automatically classify microscopic images of yeast cells in a microfluidic channel environment. The images are enhanced to reduce background noise, and a robust segmentation algorithm is developed to extract geometrical features including compactness, axis ratio, and bud size. The features are then used for classification, and the accuracy of various machine-learning classifiers is compared. The linear support vector machine, distance-based classification, and the k-nearest-neighbor algorithm were the classifiers used in this experiment. The performance of the system under various illumination and focusing conditions was also tested. The results suggest it is possible to automatically classify yeast cells based on their morphological characteristics, even with noisy and low-contrast images.
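    Two of the geometrical features named, compactness and axis ratio, can be computed from a segmented binary cell mask roughly as follows (the boundary-pixel perimeter estimate is deliberately crude):

```python
import numpy as np

def shape_features(mask):
    """Compactness (4*pi*area / perimeter^2) and major/minor axis ratio
    of a boolean mask; axis ratio comes from the coordinate covariance."""
    ys, xs = np.nonzero(mask)
    area = len(xs)
    pad = np.pad(mask, 1)
    # interior pixels: all four 4-neighbours are foreground
    interior = pad[:-2, 1:-1] & pad[2:, 1:-1] & pad[1:-1, :-2] & pad[1:-1, 2:]
    perimeter = int((mask & ~interior).sum())   # boundary pixel count
    cov = np.cov(np.vstack([xs, ys]).astype(float))
    evals = np.sort(np.linalg.eigvalsh(cov))
    axis_ratio = float(np.sqrt(evals[1] / max(evals[0], 1e-12)))
    compactness = 4 * np.pi * area / max(perimeter, 1) ** 2
    return compactness, axis_ratio
```

A round mother cell yields an axis ratio near 1, while a budded (elongated) blob yields a larger ratio, which is the kind of cue the cell-cycle classifier relies on.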

  13. Algorithms for the Automatic Classification and Sorting of Conifers in the Garden Nursery Industry

    DEFF Research Database (Denmark)

    Petri, Stig

    The ultimate purpose of this work is the development of general feature extraction algorithms useful for the classification and sorting of plants in the garden nursery industry, narrowing the area of focus to bare-root plants, more specifically Nordmann firs. The scientific literature dealing... Through an analysis of the construction of a machine vision system suitable for classifying and sorting plants, the needs with regard to physical frame, lighting system, camera and software algorithms have been uncovered..., resulting in a prototype data acquisition system that can possibly be integrated into a production line (conveyor) system. The developed software includes the necessary functions for acquiring images, normalizing these, extracting features, creating and optimizing classification models, and evaluating... ...was used as the basis for evaluating the constructed feature extraction algorithms.

  14. PERFORMANCE EVALUATION OF THE DATA MINING CLASSIFICATION METHODS

    Directory of Open Access Journals (Sweden)

    CRISTINA OPREA

    2014-05-01

    Full Text Available The paper aims to analyze the performance evaluation of different classification models in the data mining process. Classification is the most widely used data mining technique of supervised learning: the process of identifying a set of features and models that describe data classes or concepts. We applied various classification algorithms to different data sets to streamline and improve algorithm performance.

  15. Active semi-supervised learning method with hybrid deep belief networks.

    Science.gov (United States)

    Zhou, Shusen; Chen, Qingcai; Wang, Xiaolong

    2014-01-01

    In this paper, we develop a novel semi-supervised learning algorithm called active hybrid deep belief networks (AHD) to address the semi-supervised sentiment classification problem with deep learning. First, we construct the first several hidden layers using restricted Boltzmann machines (RBM), which can reduce the dimension and abstract the information of the reviews quickly. Second, we construct the following hidden layers using convolutional restricted Boltzmann machines (CRBM), which can abstract the information of the reviews effectively. Third, the constructed deep architecture is fine-tuned by gradient-descent based supervised learning with an exponential loss function. Finally, an active learning method is combined with the proposed deep architecture. We conducted several experiments on five sentiment classification datasets, showing that AHD is competitive with previous semi-supervised learning algorithms. Experiments also verify the effectiveness of the proposed method with different numbers of labeled and unlabeled reviews.

  16. Robust algorithm for arrhythmia classification in ECG using extreme learning machine

    Directory of Open Access Journals (Sweden)

    Shin Kwangsoo

    2009-10-01

    Full Text Available Abstract Background Recently, extensive studies have been carried out on arrhythmia classification algorithms using artificial intelligence pattern recognition methods such as neural networks. To improve practicality, many studies have focused on the learning speed and accuracy of neural networks. However, algorithms based on neural networks still have some problems concerning practical application, such as slow learning speeds and unstable performance caused by local minima. Methods In this paper we propose a novel arrhythmia classification algorithm which has a fast learning speed and high accuracy, and uses Morphology Filtering, Principal Component Analysis and the Extreme Learning Machine (ELM). The proposed algorithm can classify six beat types: normal beat, left bundle branch block, right bundle branch block, premature ventricular contraction, atrial premature beat, and paced beat. Results The experimental results on the entire MIT-BIH arrhythmia database demonstrate that the performance of the proposed algorithm is 98.00% in terms of average sensitivity, 97.95% in terms of average specificity, and 98.72% in terms of average accuracy. These accuracy levels are higher than or comparable with those of existing methods. We also compare the proposed ELM-based algorithm with versions using a back propagation neural network (BPNN), a radial basis function network (RBFN), and a support vector machine (SVM). In terms of learning time, the proposed ELM-based algorithm is about 290, 70, and 3 times faster than algorithms using a BPNN, an RBFN, and an SVM, respectively. Conclusion The proposed algorithm shows effective accuracy performance with a short learning time. In addition, we ascertained the robustness of the proposed algorithm by evaluating it on the entire MIT-BIH arrhythmia database.
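The speed advantage reported above comes from how an ELM is trained: input weights are fixed at random and only the output weights are solved, in closed form, by a least-squares fit. A minimal sketch (synthetic two-class data, not the paper's ECG features):

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, y, n_hidden=40):
    # Randomly fix input weights/biases; solve output weights via pseudo-inverse.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)          # hidden-layer activations
    T = np.eye(y.max() + 1)[y]      # one-hot targets
    beta = np.linalg.pinv(H) @ T    # Moore-Penrose least-squares solution
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

# Toy two-class problem: two well-separated Gaussian blobs in 4-D.
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)
model = elm_train(X, y)
acc = (elm_predict(model, X) == y).mean()
```

Because no gradient iterations are needed, training cost is dominated by a single pseudo-inverse, which is why ELMs avoid the slow convergence and local minima mentioned in the abstract.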

  17. Extreme Learning Machine for land cover classification

    OpenAIRE

    Pal, Mahesh

    2008-01-01

    This paper explores the potential of the extreme learning machine based supervised classification algorithm for land cover classification. In comparison to a backpropagation neural network, which requires setting several user-defined parameters and may converge to local minima, the extreme learning machine requires setting only one parameter and produces a unique solution. An ETM+ multispectral data set (England) was used to judge the suitability of the extreme learning machine for remote sensing classifications...

  18. Partial imputation to improve predictive modelling in insurance risk classification using a hybrid positive selection algorithm and correlation-based feature selection

    CSIR Research Space (South Africa)

    Duma, M

    2013-09-01

    Full Text Available We propose a hybrid missing data imputation technique using positive selection and correlation-based feature selection for insurance data. The hybrid is used to help supervised learning methods improve their classification accuracy and resilience...

  19. Automatic classification of schizophrenia using resting-state functional language network via an adaptive learning algorithm

    Science.gov (United States)

    Zhu, Maohu; Jie, Nanfeng; Jiang, Tianzi

    2014-03-01

    A reliable and precise classification of schizophrenia is significant for its diagnosis and treatment. Functional magnetic resonance imaging (fMRI) is a novel tool increasingly used in schizophrenia research. Recent advances in statistical learning theory have led to applying pattern classification algorithms to assess the diagnostic value of functional brain networks discovered from resting-state fMRI data. The aim of this study was to propose an adaptive learning algorithm to distinguish schizophrenia patients from normal controls using the resting-state functional language network. Furthermore, the classification of schizophrenia was regarded here as a sample selection problem in which a sparse subset of samples was chosen from the labeled training set. Using these selected samples, which we call informative vectors, a classifier for the clinical diagnosis of schizophrenia was established. We experimentally demonstrated that the proposed algorithm, incorporating the resting-state functional language network, achieved 83.6% leave-one-out accuracy on resting-state fMRI data of 27 schizophrenia patients and 28 normal controls. In contrast with K-Nearest-Neighbor (KNN), Support Vector Machine (SVM) and l1-norm methods, our method yielded better classification performance. Moreover, our results suggested that a dysfunction of the resting-state functional language network plays an important role in the clinical diagnosis of schizophrenia.
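The leave-one-out accuracy quoted above is computed by holding each subject out in turn and training on the rest. A minimal sketch of that protocol (a nearest-centroid classifier on synthetic data stands in for the paper's informative-vector method; sample counts 27/28 mirror the abstract):

```python
import numpy as np

rng = np.random.default_rng(4)

def nearest_centroid_predict(Xtr, ytr, x):
    # Predict the class whose training mean is closest to x.
    cents = np.array([Xtr[ytr == c].mean(0) for c in np.unique(ytr)])
    return int(((cents - x) ** 2).sum(1).argmin())

# Leave-one-out: hold each sample out in turn, train on the remainder.
X = np.vstack([rng.normal(-1.5, 1, (27, 3)), rng.normal(1.5, 1, (28, 3))])
y = np.array([0] * 27 + [1] * 28)
hits = sum(
    nearest_centroid_predict(np.delete(X, i, 0), np.delete(y, i), X[i]) == y[i]
    for i in range(len(X))
)
loo_acc = hits / len(X)
```

Leave-one-out is the natural choice for small clinical cohorts like this one, since it uses all but one subject for training at every step.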

  20. Liver disorder diagnosis using linear, nonlinear and decision tree classification algorithms

    Directory of Open Access Journals (Sweden)

    Aman Singh

    2016-10-01

    Full Text Available In India and across the globe, liver disease is a serious area of concern in medicine. It is therefore essential to use classification algorithms for assessing the disease, in order to improve the efficiency of medical diagnosis, which eventually leads to appropriate and timely treatment. The study accordingly implemented various classification algorithms, including linear discriminant analysis (LDA), diagonal linear discriminant analysis (DLDA), quadratic discriminant analysis (QDA), diagonal quadratic discriminant analysis (DQDA), naive Bayes (NB), feed-forward neural network (FFNN) and classification and regression tree (CART), in an attempt to enhance the diagnostic accuracy of liver disorder and to reduce the inefficiencies caused by false diagnosis. The results demonstrated that CART emerged as the best model, achieving higher diagnostic accuracy than LDA, DLDA, QDA, DQDA, NB and FFNN. FFNN stood second in comparison and performed better than the rest of the classifiers. After evaluation, it can be said that the precision of a classification algorithm depends on the type and features of a dataset. For the given dataset, the decision tree classifier CART outperformed all other linear and nonlinear classifiers. It also showed the capability of assisting clinicians in determining the existence of liver disorder, attaining better diagnosis and avoiding delay in treatment.
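Of the probabilistic classifiers compared above, naive Bayes is the simplest to state: per-class Gaussian likelihoods with independent features. A minimal Gaussian NB sketch on synthetic data (illustrative only, not the study's liver dataset):

```python
import numpy as np

rng = np.random.default_rng(5)

def gnb_fit(X, y):
    # Gaussian naive Bayes: per-class feature means, variances and priors.
    classes = np.unique(y)
    mu = np.array([X[y == c].mean(0) for c in classes])
    var = np.array([X[y == c].var(0) + 1e-9 for c in classes])
    prior = np.array([(y == c).mean() for c in classes])
    return mu, var, prior

def gnb_predict(model, X):
    mu, var, prior = model
    # Log-likelihood under each class, assuming feature independence.
    ll = -0.5 * (((X[:, None] - mu) ** 2) / var + np.log(2 * np.pi * var)).sum(-1)
    return (ll + np.log(prior)).argmax(1)

X = np.vstack([rng.normal(0, 1, (60, 5)), rng.normal(2, 1.5, (60, 5))])
y = np.array([0] * 60 + [1] * 60)
acc = (gnb_predict(gnb_fit(X, y), X) == y).mean()
```

LDA and QDA differ mainly in replacing the diagonal variances with shared or per-class full covariance matrices; CART instead partitions the feature space with axis-aligned splits.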

  1. The Comparative Study of Remote Sensing Image Supervised Classification Methods Based on ENVI

    Institute of Scientific and Technical Information of China (English)

    闫琰; 董秀兰; 李燕

    2011-01-01

    Full Text Available Given the widespread use of supervised classification in remote sensing image classification, this paper describes four commonly used supervised classification methods provided by ENVI. The same TM image was classified using each of the four methods, and the classification results were compared, in order to analyze the differences in classification accuracy among the four methods.

  2. Forest classification trees and forest support vector machines algorithms: Demonstration using microarray data.

    Science.gov (United States)

    Zintzaras, Elias; Kowald, Axel

    2010-05-01

    Classification into multiple classes when the measured variables are outnumbered is a major methodological challenge in -omics studies. Two algorithms that overcome the dimensionality problem are presented: the forest classification tree (FCT) and the forest support vector machines (FSVM). In FCT, a set of variables is randomly chosen and a classification tree (CT) is grown using a forward classification algorithm. The process is repeated and a forest of CTs is derived. Finally, the most frequent variables from the trees with the smallest apparent misclassification rate (AMR) are used to construct a productive tree. In FSVM, the CTs are replaced by SVMs. The methods are demonstrated using prostate gene expression data for classifying tissue samples into four tumor types. For threshold split value 0.001 and utilizing 100 markers the productive CT consisted of 29 terminal nodes and achieved perfect classification (AMR=0). When the threshold value was set to 0.01, a tree with 17 terminal nodes was constructed based on 15 markers (AMR=7%). In FSVM, reducing the fraction of the forest that was used to construct the best classifier from the top 80% to the top 20% reduced the misclassification to 25% (when using 200 markers). The proposed methodologies may be used for identifying important variables in high dimensional data. Furthermore, the FCT allows exploring the data structure and provides a decision rule.
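The forest idea described above, growing many classifiers on random variable subsets and keeping those with the smallest apparent misclassification rate (AMR), can be sketched with decision stumps standing in for the forward-grown classification trees (synthetic data, not the prostate expression set):

```python
import numpy as np

rng = np.random.default_rng(6)

def fit_stump(X, y):
    # Exhaustive search for the best single-feature threshold split
    # (a one-node stand-in for a classification tree).
    best_err, best = 1.0, (0, 0.0, False)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for flip in (False, True):
                pred = (X[:, f] > t) ^ flip
                err = (pred != y).mean()
                if err < best_err:
                    best_err, best = err, (f, t, flip)
    return best, best_err

def stump_predict(stump, X):
    f, t, flip = stump
    return ((X[:, f] > t) ^ flip).astype(int)

# Synthetic "expression" matrix: only feature 3 of 20 carries class signal.
X = rng.normal(0, 1, (100, 20))
y = np.array([0] * 50 + [1] * 50)
X[50:, 3] += 3.0
forest = []
for _ in range(15):
    feats = rng.choice(20, 8, replace=False)      # random variable subset
    stump, amr = fit_stump(X[:, feats], y)
    forest.append((amr, feats, stump))
forest.sort(key=lambda m: m[0])                   # smallest AMR first
top = forest[:5]
votes = np.mean([stump_predict(s, X[:, fs]) for _, fs, s in top], axis=0)
acc = ((votes > 0.5).astype(int) == y).mean()
```

Selecting only the low-AMR members before voting is what lets the forest recover the few informative variables among many irrelevant ones.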

  3. Brake fault diagnosis using Clonal Selection Classification Algorithm (CSCA) – A statistical learning approach

    Directory of Open Access Journals (Sweden)

    R. Jegadeeshwaran

    2015-03-01

    Full Text Available In an automobile, the brake system is an essential part responsible for control of the vehicle. Any failure in the brake system affects the vehicle's motion and can have catastrophic effects on vehicle and passenger safety. Thus the brake system plays a vital role in an automobile, and hence condition monitoring of the brake system is essential. Vibration-based condition monitoring using machine learning techniques is gaining momentum. This study is one such attempt to perform condition monitoring of a hydraulic brake system through vibration analysis. In this research, the performance of a Clonal Selection Classification Algorithm (CSCA) for brake fault diagnosis is reported. A hydraulic brake system test rig was fabricated. Under good and faulty conditions of the brake system, vibration signals were acquired using a piezoelectric transducer. Statistical parameters were extracted from the vibration signal. The best feature set was identified for classification using an attribute evaluator. The selected features were then classified using CSCA. The classification accuracy of this artificial intelligence technique has been compared with other machine learning approaches and discussed. The Clonal Selection Classification Algorithm performs better and gives the maximum classification accuracy (96%) for the fault diagnosis of a hydraulic brake system.
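The abstract does not list which statistical parameters were extracted from the vibration signal, but typical choices in vibration-based monitoring include RMS, skewness, kurtosis and crest factor; impulsive faults show up strongly in the last two. A sketch on simulated signals (assumed features, not the authors' set):

```python
import numpy as np

rng = np.random.default_rng(9)

def stat_features(sig):
    # Common statistical parameters used in vibration-based monitoring.
    mu, sd = sig.mean(), sig.std()
    rms = np.sqrt((sig ** 2).mean())
    skew = ((sig - mu) ** 3).mean() / sd ** 3
    kurt = ((sig - mu) ** 4).mean() / sd ** 4
    crest = np.abs(sig).max() / rms
    return np.array([mu, sd, rms, skew, kurt, crest])

# Simulated vibration: healthy = noisy sine; faulty adds sporadic impacts.
good = np.sin(np.linspace(0, 40 * np.pi, 4000)) + rng.normal(0, 0.05, 4000)
faulty = good + (rng.random(4000) < 0.01) * 3.0
f_good, f_fault = stat_features(good), stat_features(faulty)
```

The impacts raise kurtosis and crest factor well above the healthy baseline, which is exactly the kind of separation a downstream classifier such as CSCA exploits.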

  4. A Comprehensive Study of Features and Algorithms for URL-Based Topic Classification

    CERN Document Server

    Weber, I; Henzinger, M; Baykan, E

    2011-01-01

    Given only the URL of a Web page, can we identify its topic? We study this problem in detail by exploring a large number of different feature sets and algorithms on several datasets. We also show that the inherent overlap between topics and the sparsity of the information in URLs makes this a very challenging problem. Web page classification without a page's content is desirable when the content is not available at all, when a classification is needed before obtaining the content, or when classification speed is of utmost importance. For our experiments we used five different corpora comprising a total of about 3 million (URL, classification) pairs. We evaluated several techniques for feature generation and classification algorithms. The individual binary classifiers were then combined via boosting into metabinary classifiers. We achieve typical F-measure values between 80 and 85, and a typical precision of around 86. The precision can be pushed further over 90 while maintaining a typical level of recall betw...

  5. A Decision-Tree-Based Algorithm for Speech/Music Classification and Segmentation

    Directory of Open Access Journals (Sweden)

    Lavner Yizhar

    2009-01-01

    Full Text Available We present an efficient algorithm for segmentation of audio signals into speech or music. The central motivation for our study is consumer audio applications, where various real-time enhancements are often applied. The algorithm consists of a learning phase and a classification phase. In the learning phase, predefined training data is used for computing various time-domain and frequency-domain features, for speech and music signals separately, and for estimating the optimal speech/music thresholds, based on the probability density functions of the features. An automatic procedure is employed to select the best features for separation. In the classification phase, initial classification is performed for each segment of the audio signal, using a three-stage sieve-like approach, applying both Bayesian and rule-based methods. To avoid erroneous rapid alternations in the classification, a smoothing technique is applied, averaging the decision on each segment with past segment decisions. Extensive evaluation of the algorithm, on a database of more than 12 hours of speech and more than 22 hours of music, showed correct identification rates of 99.4% and 97.8%, respectively, and quick adjustment to alternating speech/music sections. In addition to its accuracy and robustness, the algorithm can be easily adapted to different audio types, and is suitable for real-time operation.
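The pipeline above, frame-level features, a learned threshold, then decision smoothing, can be sketched with one classic time-domain feature. The signals and the 0.2 threshold are assumed stand-ins (a low-frequency tone for "music", wideband noise for "speech"), not the paper's feature set:

```python
import numpy as np

rng = np.random.default_rng(7)

def frame_features(sig, frame=200):
    f = sig[: len(sig) // frame * frame].reshape(-1, frame)
    zcr = (np.diff(np.sign(f), axis=1) != 0).mean(1)   # zero-crossing rate
    energy = (f ** 2).mean(1)
    return zcr, energy

t = np.arange(20000)
music = np.sin(2 * np.pi * 0.01 * t)                   # sustained tone
speech = rng.normal(0, 1, t.size)                      # wideband noise
zcr_m, _ = frame_features(music)
zcr_s, _ = frame_features(speech)
labels = np.r_[np.zeros(len(zcr_m), int), np.ones(len(zcr_s), int)]
raw = (np.r_[zcr_m, zcr_s] > 0.2).astype(int)          # per-frame decision
# Smooth with a 5-frame majority vote to suppress spurious flips.
smooth = (np.convolve(raw, np.ones(5) / 5, mode="same") > 0.5).astype(int)
acc = (smooth == labels).mean()
```

In the real algorithm several features vote through the sieve stages, but the averaging-over-past-decisions step plays the same role as the majority smoothing here.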

  6. Comparison of some classification algorithms based on deterministic and nondeterministic decision rules

    KAUST Repository

    Delimata, Paweł

    2010-01-01

    We discuss two, in a sense extreme, kinds of nondeterministic rules in decision tables. The first kind of rules, called inhibitory rules, block only one decision value (i.e., they have all but one of the possible decisions on their right hand sides). Contrary to this, any rule of the second kind, called a bounded nondeterministic rule, can have only a few decisions on its right hand side. We show that both kinds of rules can be used to improve the quality of classification. In the paper, two lazy classification algorithms of polynomial time complexity are considered. These algorithms are based on deterministic and inhibitory decision rules, but the direct generation of rules is not required. Instead, for any new object the considered algorithms efficiently extract from a given decision table some information about the set of rules. Next, this information is used by a decision-making procedure. The reported results of experiments show that the algorithms based on inhibitory decision rules are often better than those based on deterministic decision rules. We also present an application of bounded nondeterministic rules in the construction of rule based classifiers. We include the results of experiments showing that by combining rule based classifiers based on minimal decision rules with bounded nondeterministic rules having confidence close to 1 and sufficiently large support, it is possible to improve the classification quality. © 2010 Springer-Verlag.

  7. Spectral Classification of Similar Materials using the Tetracorder Algorithm: The Calcite-Epidote-Chlorite Problem

    Science.gov (United States)

    Dalton, J. Brad; Bove, Dana; Mladinich, Carol; Clark, Roger; Rockwell, Barnaby; Swayze, Gregg; King, Trude; Church, Stanley

    2001-01-01

    Recent work on automated spectral classification algorithms has sought to distinguish ever-more similar materials. From modest beginnings separating shade, soil, rock and vegetation to ambitious attempts to discriminate mineral types and specific plant species, the trend seems to be toward using increasingly subtle spectral differences to perform the classification. Rule-based expert systems exploiting the underlying physics of spectroscopy, such as the US Geological Survey Tetracorder system, are now taking advantage of the high spectral resolution and dimensionality of current imaging spectrometer designs to discriminate spectrally similar materials. The current paper details recent efforts to discriminate three minerals having absorptions centered at the same wavelength, with encouraging results.

  9. Classification of Aerosol Retrievals from Spaceborne Polarimetry Using a Multiparameter Algorithm

    Science.gov (United States)

    Russell, Philip B.; Kacenelenbogen, Meloe; Livingston, John M.; Hasekamp, Otto P.; Burton, Sharon P.; Schuster, Gregory L.; Johnson, Matthew S.; Knobelspiesse, Kirk D.; Redemann, Jens; Ramachandran, S.

    2013-01-01

    In this presentation, we demonstrate the application of a new aerosol classification algorithm to retrievals from the POLDER-3 polarimeter on the PARASOL spacecraft. Motivation and method: Since the development of global aerosol measurements by satellites and AERONET, classification of observed aerosols into several types (e.g., urban-industrial, biomass burning, mineral dust, maritime, and various subtypes or mixtures of these) has proven useful for understanding aerosol sources, transformations, effects, and feedback mechanisms; for improving the accuracy of satellite retrievals; and for quantifying assessments of aerosol radiative impacts on climate.

  10. Human Talent Prediction in HRM using C4.5 Classification Algorithm

    Directory of Open Access Journals (Sweden)

    Hamidah Jantan

    2010-11-01

    Full Text Available In HRM, among the challenges for HR professionals is to manage an organization's talents, especially to ensure the right person is in the right job at the right time. Human talent prediction is an alternative way to handle this issue. For that reason, classification and prediction in data mining, which are commonly used in many areas, can also be applied to human talent. There are many classification techniques in data mining, such as Decision Tree, Neural Network, Rough Set Theory, Bayesian theory and Fuzzy logic. Decision tree is among the popular classification techniques and can produce interpretable rules or logic statements. The generated rules from the selected technique can be used for future prediction. In this article, we present a study on how potential human talent can be predicted using a decision tree classifier. By using this technique, the pattern of talent performance can be identified through the classification process. In that case, the hidden and valuable knowledge discovered in the related databases will be summarized in the decision tree structure. In this study, we use the decision tree C4.5 classification algorithm to generate classification rules from human talent performance records. Finally, the generated rules are evaluated using unseen data in order to estimate the accuracy of the prediction result.
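What distinguishes C4.5 from earlier tree learners is its split criterion: information gain normalised by the entropy of the split itself (the gain ratio). A minimal sketch on invented HR-style records (the attribute names are hypothetical, not from the study):

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def gain_ratio(feature, labels):
    # C4.5's criterion: information gain divided by the split's own
    # entropy, which penalises attributes with many distinct values.
    vals, counts = np.unique(feature, return_counts=True)
    w = counts / counts.sum()
    cond = sum(wi * entropy(labels[feature == v]) for v, wi in zip(vals, w))
    split_info = -(w * np.log2(w)).sum()
    gain = entropy(labels) - cond
    return gain / split_info if split_info > 0 else 0.0

# Toy records: "talent" perfectly explained by experience, unrelated to department.
experience = np.array([0, 0, 1, 1])
department = np.array([0, 1, 0, 1])
talent = np.array([0, 0, 1, 1])
```

At each node C4.5 picks the attribute with the highest gain ratio, so here it would split on experience first.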

  11. A fast version of the k-means classification algorithm for astronomical applications

    CERN Document Server

    Ordovás-Pascual, I

    2014-01-01

    Context. K-means is a clustering algorithm that has been used to classify large datasets in astronomical databases. It is an unsupervised method, able to cope with very different types of problems. Aims. We check whether a variant of the algorithm called single-pass k-means can be used as a fast alternative to the traditional k-means. Methods. The execution times of the two algorithms are compared when classifying subsets drawn from the SDSS-DR7 catalog of galaxy spectra. Results. Single-pass k-means turns out to be between 20% and 40% faster than k-means and provides statistically equivalent classifications. This conclusion can be scaled up to other, larger databases because the execution time of both algorithms increases linearly with the number of objects. Conclusions. Single-pass k-means can be safely used as a fast alternative to k-means.
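The speed-up comes from replacing k-means' repeated sweeps to convergence with a single sweep that updates each centroid as a running mean. A hypothetical minimal variant of that idea (the paper's exact update scheme may differ; seeds are placed one per region for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def single_pass_kmeans(X, seeds):
    # One sweep: assign each point to its nearest centroid and move
    # that centroid by an incremental-mean update; no outer iteration.
    centroids = seeds.astype(float).copy()
    counts = np.ones(len(seeds))
    for x in X:
        j = int(((centroids - x) ** 2).sum(axis=1).argmin())
        counts[j] += 1
        centroids[j] += (x - centroids[j]) / counts[j]
    # Final relabel pass against the settled centroids.
    d = ((X[:, None, :] - centroids[None]) ** 2).sum(-1)
    return centroids, d.argmin(1)

# Two synthetic "spectral" clusters; seed one centroid in each region.
X = np.vstack([rng.normal(-4, 0.5, (30, 2)), rng.normal(4, 0.5, (30, 2))])
centroids, labels = single_pass_kmeans(X, X[[0, 30]])
```

Both variants cost O(nk) per sweep, which is consistent with the linear scaling in the number of objects reported above.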

  12. Application of the probability-based covering algorithm model in text classification

    Institute of Scientific and Technical Information of China (English)

    Zhou, Ying

    2009-01-01

    The probability-based covering algorithm (PBCA) is a new algorithm based on probability distribution. It decides, by voting, the class of the tested samples on the border of the coverage area, based on the probability of training samples. When using the original covering algorithm (CA), many tested samples that are located on the border of the coverage cannot be classified by the spherical neighborhood gained. The network structure of PBCA is a mixed structure composed of both a feed-forward network and a feedback network. By using this method of adding some heterogeneous samples and enlarging the coverage radius, it is possible to decrease the number of rejected samples and improve the rate of recognition accuracy. Relevant computer experiments indicate that the algorithm improves the study precision and achieves reasonably good results in text classification.

  13. A Region-Based GeneSIS Segmentation Algorithm for the Classification of Remotely Sensed Images

    Directory of Open Access Journals (Sweden)

    Stelios K. Mylonas

    2015-03-01

    Full Text Available This paper proposes an object-based segmentation/classification scheme for remotely sensed images, based on a novel variant of the recently proposed Genetic Sequential Image Segmentation (GeneSIS algorithm. GeneSIS segments the image in an iterative manner, whereby at each iteration a single object is extracted via a genetic-based object extraction algorithm. Contrary to the previous pixel-based GeneSIS where the candidate objects to be extracted were evaluated through the fuzzy content of their included pixels, in the newly developed region-based GeneSIS algorithm, a watershed-driven fine segmentation map is initially obtained from the original image, which serves as the basis for the forthcoming GeneSIS segmentation. Furthermore, in order to enhance the spatial search capabilities, we introduce a more descriptive encoding scheme in the object extraction algorithm, where the structural search modules are represented by polygonal shapes. Our objectives in the new framework are posed as follows: enhance the flexibility of the algorithm in extracting more flexible object shapes, assure high level classification accuracies, and reduce the execution time of the segmentation, while at the same time preserving all the inherent attributes of the GeneSIS approach. Finally, exploiting the inherent attribute of GeneSIS to produce multiple segmentations, we also propose two segmentation fusion schemes that operate on the ensemble of segmentations generated by GeneSIS. Our approaches are tested on an urban and two agricultural images. The results show that region-based GeneSIS has considerably lower computational demands compared to the pixel-based one. Furthermore, the suggested methods achieve higher classification accuracies and good segmentation maps compared to a series of existing algorithms.

  14. Discharges Classification using Genetic Algorithms and Feature Selection Algorithms on Time and Frequency Domain Data Extracted from Leakage Current Measurements

    Directory of Open Access Journals (Sweden)

    D. Pylarinos

    2013-12-01

    Full Text Available A set of 387 waveforms portraying discharges, recorded on 18 different 150 kV post insulators installed at two different substations in Crete, Greece, is considered in this paper. Twenty different features are extracted from each waveform and two feature selection algorithms (t-test and mRMR) are employed. Genetic algorithms are used to classify the waveforms into two different classes related to the portrayed discharges. Five different data sets are employed: 1. the original feature vector, 2. time domain features, 3. frequency domain features, 4. t-test selected features, 5. mRMR selected features. Results are discussed and compared with previous classification implementations on this particular data group.
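The t-test feature selection mentioned above ranks each feature by how strongly its class means differ relative to the within-class variance. A sketch with Welch's t-statistic on synthetic data shaped like the abstract's setup (387 waveforms, 20 features; which feature is informative is invented here):

```python
import numpy as np

rng = np.random.default_rng(8)

def t_statistic(a, b):
    # Welch's t: mean difference scaled by the pooled standard error.
    return (a.mean() - b.mean()) / np.sqrt(
        a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b)
    )

# 387 waveforms x 20 features; only feature 7 differs between classes
# (a toy assumption for illustration).
X = rng.normal(0, 1, (387, 20))
y = rng.integers(0, 2, 387)
X[y == 1, 7] += 2.0
scores = np.abs([t_statistic(X[y == 0, f], X[y == 1, f]) for f in range(20)])
ranked = np.argsort(scores)[::-1]          # most discriminative first
```

mRMR goes further by also penalising redundancy between the selected features, which a per-feature t-test ranking ignores.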

  15. Algorithms for Hyperspectral Signature Classification in Non-resolved Object Characterization Using Tabular Nearest Neighbor Encoding

    Science.gov (United States)

    Schmalz, M.; Key, G.

    Accurate spectral signature classification is key to the nonimaging detection and recognition of spaceborne objects. In classical hyperspectral recognition applications, signature classification accuracy depends on accurate spectral endmember determination [1]. However, in automatic target recognition (ATR) applications, it is possible to circumvent the endmember detection problem by employing a Bayesian classifier. Previous approaches to Bayesian classification of spectral signatures have been rule-based, or predicated on a priori parameterized information obtained from offline training, as in the case of neural networks [1,2]. Unfortunately, class separation and classifier refinement results in these methods tend to be suboptimal, and the number of signatures that can be accurately classified often depends linearly on the number of inputs. This can lead to potentially significant classification errors in the presence of noise or densely interleaved signatures. In this paper, we present an emerging technology for nonimaging spectral signature classification based on a highly accurate but computationally efficient search engine called Tabular Nearest Neighbor Encoding (TNE) [3]. Based on prior results, TNE can optimize its classifier performance to track input nonergodicities, as well as yield measures of confidence or caution for evaluation of classification results. Unlike neural networks, TNE does not have a hidden intermediate data structure (e.g., the neural net weight matrix). Instead, TNE generates and exploits a user-accessible data structure called the agreement map (AM), which can be manipulated by Boolean logic operations to effect accurate classifier refinement algorithms. This allows the TNE programmer or user to determine parameters for classification accuracy, and to mathematically analyze the signatures for which TNE did not obtain classification matches. This dual approach to analysis (i.e., correct vs. incorrect classification) has been shown to

  16. Algorithm for optimizing bipolar interconnection weights with applications in associative memories and multitarget classification.

    Science.gov (United States)

    Chang, S; Wong, K W; Zhang, W; Zhang, Y

    1999-08-10

    An algorithm for optimizing a bipolar interconnection weight matrix with the Hopfield network is proposed. The effectiveness of this algorithm is demonstrated by computer simulation and optical implementation. In the optical implementation of the neural network the interconnection weights are biased to yield a nonnegative weight matrix. Moreover, a threshold subchannel is added so that the system can realize, in real time, the bipolar weighted summation in a single channel. Preliminary experimental results obtained from the applications in associative memories and multitarget classification with rotation invariance are shown.

  17. Algorithms for the Automatic Classification and Sorting of Conifers in the Garden Nursery Industry

    DEFF Research Database (Denmark)

    Petri, Stig

    Through an analysis of the construction of a machine vision system suitable for classifying and sorting plants, the needs with regard to physical frame, lighting system, camera and software algorithms have been uncovered, resulting in a prototype data acquisition system that can possibly be integrated into a production line (conveyor) system. The developed software includes the necessary functions for acquiring images, normalizing these, extracting features, creating and optimizing classification models, and evaluating... ... was used as the basis for evaluating the constructed feature extraction algorithms.

  18. A simulation of remote sensor systems and data processing algorithms for spectral feature classification

    Science.gov (United States)

    Arduini, R. F.; Aherron, R. M.; Samms, R. W.

    1984-01-01

    A computational model of the deterministic and stochastic processes involved in multispectral remote sensing was designed to evaluate the performance of sensor systems and data processing algorithms for spectral feature classification. Accuracy in distinguishing between categories of surfaces or between specific types is developed as a means to compare sensor systems and data processing algorithms. The model allows studies to be made of the effects of variability of the atmosphere and of surface reflectance, as well as the effects of channel selection and sensor noise. Examples of these effects are shown.

  19. A comparison of classification techniques for glacier change detection using multispectral images

    Directory of Open Access Journals (Sweden)

    Rahul Nijhawan

    2016-09-01

    Full Text Available The main aim of this paper is to compare the classification accuracies of glacier change detection by the following classifiers: a sub-pixel classification algorithm, indices-based supervised classification and an object-based algorithm, using Landsat imagery. It was observed that the shadow effect was not removed by sub-pixel based classification, but was removed by the indices method. The accuracy was further improved by object-based classification. The objective of the paper is to analyse different classification algorithms and interpret which one gives the best results in mountainous regions. The study showed that the object-based method was best in mountainous regions, as optimum results were obtained in the shadow-covered regions.
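The abstract does not name the indices used, but for snow and ice mapping from Landsat a common choice is the Normalized Difference Snow Index, NDSI = (green − SWIR) / (green + SWIR), thresholded at roughly 0.4. A sketch with synthetic band values (both the threshold and the values are assumptions):

```python
import numpy as np

# Synthetic 2x2 reflectance tiles: top row snow/ice-like (bright green,
# dark SWIR), bottom row bare ground (the reverse).
green = np.array([[0.7, 0.6], [0.2, 0.1]])
swir = np.array([[0.1, 0.1], [0.3, 0.2]])
ndsi = (green - swir) / (green + swir)     # Normalized Difference Snow Index
glacier_mask = ndsi > 0.4                  # common snow/ice threshold
```

Because NDSI is a ratio, it is largely insensitive to illumination changes, which is why index-based classification handles the shadowed slopes better than sub-pixel classification in the comparison above.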

  20. On the Automated Segmentation of Epicardial and Mediastinal Cardiac Adipose Tissues Using Classification Algorithms.

    Science.gov (United States)

    Rodrigues, Érick Oliveira; Cordeiro de Morais, Felipe Fernandes; Conci, Aura

    2015-01-01

    The quantification of fat depots on the surroundings of the heart is an accurate procedure for evaluating health risk factors correlated with several diseases. However, this type of evaluation is not widely employed in clinical practice due to the required human workload. This work proposes a novel technique for the automatic segmentation of cardiac fat pads. The technique is based on applying classification algorithms to the segmentation of cardiac CT images. Furthermore, we extensively evaluate the performance of several algorithms on this task and discuss which provided better predictive models. Experimental results have shown that the mean accuracy for the classification of epicardial and mediastinal fats has been 98.4% with a mean true positive rate of 96.2%. On average, the Dice similarity index, regarding the segmented patients and the ground truth, was equal to 96.8%. Therefore, our technique has achieved the most accurate results for the automatic segmentation of cardiac fats to date.
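The Dice similarity index quoted above measures overlap between a predicted segmentation and the ground truth: 2|A∩B| / (|A|+|B|). A minimal sketch with toy binary masks:

```python
import numpy as np

def dice(a, b):
    # Dice similarity between two binary masks: 2|A∩B| / (|A| + |B|).
    a, b = a.astype(bool), b.astype(bool)
    return 2 * (a & b).sum() / (a.sum() + b.sum())

# Toy masks: predicted region vs a slightly smaller ground truth.
seg = np.zeros((10, 10), int); seg[2:8, 2:8] = 1
gt = np.zeros((10, 10), int); gt[3:8, 2:8] = 1
score = dice(seg, gt)
```

Dice equals 1 for a perfect match and 0 for disjoint masks, and unlike plain pixel accuracy it is not inflated by the large background region.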

  1. Activity recognition in planetary navigation field tests using classification algorithms applied to accelerometer data.

    Science.gov (United States)

    Song, Wen; Ade, Carl; Broxterman, Ryan; Barstow, Thomas; Nelson, Thomas; Warren, Steve

    2012-01-01

    Accelerometer data provide useful information about subject activity in many different application scenarios. For this study, single-accelerometer data were acquired from subjects participating in field tests that mimic tasks that astronauts might encounter in reduced gravity environments. The primary goal of this effort was to apply classification algorithms that could identify these tasks based on features present in their corresponding accelerometer data, where the end goal is to establish methods to unobtrusively gauge subject well-being based on sensors that reside in their local environment. In this initial analysis, six different activities that involve leg movement are classified. The k-Nearest Neighbors (kNN) algorithm was found to be the most effective, with an overall classification success rate of 90.8%.
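The kNN pipeline above can be sketched end to end: window the accelerometer stream, extract simple per-window features, and vote among the k nearest training windows. The simulated signals and the feature set (mean, standard deviation, energy) are assumptions for illustration, not the field-test data:

```python
import numpy as np

rng = np.random.default_rng(2)

def window_features(sig, win=50):
    # Per-window mean, standard deviation and signal energy.
    w = sig[: len(sig) // win * win].reshape(-1, win)
    return np.c_[w.mean(1), w.std(1), (w ** 2).mean(1)]

def knn_predict(Xtr, ytr, Xte, k=3):
    d = ((Xte[:, None, :] - Xtr[None]) ** 2).sum(-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(ytr[row]).argmax() for row in nearest])

# Simulated traces: "walking" (strong oscillation) vs "standing" (noise).
t = np.arange(2000)
walking = np.sin(t / 3.0) * 2 + rng.normal(0, 0.2, t.size)
standing = rng.normal(0, 0.2, t.size)
X = np.vstack([window_features(walking), window_features(standing)])
y = np.array([0] * 40 + [1] * 40)
idx = rng.permutation(len(X))
train, test = idx[:60], idx[60:]
pred = knn_predict(X[train], y[train], X[test])
acc = (pred == y[test]).mean()
```

kNN needs no training step at all, which makes it a convenient baseline when the activity classes are well separated in feature space.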

  2. Granular computing classification algorithms based on distance measures between granules from the view of set.

    Science.gov (United States)

    Liu, Hongbing; Liu, Chunhua; Wu, Chang-an

    2014-01-01

    Granular computing classification algorithms are proposed based on distance measures between two granules from the viewpoint of sets. Firstly, granules are represented in the forms of hyperdiamond, hypersphere, hypercube, and hyperbox. Secondly, the distance measure between two granules is defined from the set viewpoint, and a union operator between two granules is formed to obtain a granule set containing granules of different granularity. Thirdly, a granularity threshold determines the union between two granules and is used to form the granular computing classification algorithms based on distance measures (DGrC). Benchmark datasets from the UCI Machine Learning Repository are used to verify the performance of DGrC, and experimental results show that DGrC improves testing accuracy.

  3. Study on the classification algorithm of degree of arteriosclerosis based on fuzzy pattern recognition

    Science.gov (United States)

    Ding, Li; Zhou, Runjing; Liu, Guiying

    2010-08-01

    The pulse wave of the human body contains a large amount of physiological and pathological information, so a classification algorithm for the degree of arteriosclerosis based on fuzzy pattern recognition is studied in this paper. Taking the human pulse wave as the research object, we extract time- and frequency-domain characteristics of the pulse signal and select the parameters with better clustering behaviour for arteriosclerosis identification. Moreover, the validity of the characteristic parameters is verified by the fuzzy ISODATA clustering method (FISOCM). Finally, the fuzzy pattern recognition system can quantitatively distinguish the degree of arteriosclerosis in patients. Testing on the 50 samples in the constructed pulse database, the experimental results show that the algorithm is practical and achieves good classification and recognition results.

  4. Classification performance of a block-compressive sensing algorithm for hyperspectral data processing

    Science.gov (United States)

    Arias, Fernando X.; Sierra, Heidy; Arzuaga, Emmanuel

    2016-05-01

    Compressive Sensing is an area of great recent interest for efficient signal acquisition, manipulation and reconstruction tasks in areas where sensor utilization is a scarce and valuable resource. The current work shows that approaches based on this technology can improve the efficiency of manipulation, analysis and storage processes already established for hyperspectral imagery, with little discernible loss in data performance upon reconstruction. We present the results of a comparative analysis of classification performance between a hyperspectral data cube acquired by traditional means, and one obtained through reconstruction from compressively sampled data points. To obtain a broad measure of the classification performance of compressively sensed cubes, we classify a commonly used scene in hyperspectral image processing algorithm evaluation using a set of five classifiers commonly used in hyperspectral image classification. Global accuracy statistics are presented and discussed, as well as class-specific statistical properties of the evaluated data set.

  5. Classification of EEG Signals using adaptive weighted distance nearest neighbor algorithm

    OpenAIRE

    E. Parvinnia; M. Sabeti; M. Zolghadri Jahromi; R. Boostani

    2014-01-01

    Electroencephalogram (EEG) signals are often used to diagnose diseases such as seizure disorders, Alzheimer's disease, and schizophrenia. One main problem with recorded EEG samples is that they are not equally reliable, due to artifacts at the time of recording. EEG signal classification algorithms should have a mechanism to handle this issue. Adaptive classifiers appear well suited to biological signals such as EEG. In this paper, a general adaptive method named weighted distance near...

  6. Hybrid SPR algorithm to select predictive genes for effectual cancer classification

    OpenAIRE

    2012-01-01

    Designing an automated system for classifying DNA microarray data is an extremely challenging problem because of its high dimension and low amount of sample data. In this paper, a hybrid statistical pattern recognition algorithm is proposed to reduce the dimensionality and select the predictive genes for the classification of cancer. Colon cancer gene expression profiles having 62 samples of 2000 genes were used for the experiment. A gene subset of 6 highly informative genes was selecte...

  7. GriMa: a Grid Mining Algorithm for Bag-of-Grid-Based Classification

    OpenAIRE

    Deville, Romain; Fromont, Elisa; Jeudy, Baptiste; Solnon, Christine

    2016-01-01

    General-purpose exhaustive graph mining algorithms have seldom been used in real life contexts due to the high complexity of the process that is mostly based on costly isomorphism tests and countless expansion possibilities. In this paper, we explain how to exploit grid-based representations of problems to efficiently extract frequent grid subgraphs and create Bag-of-Grids which can be used as new features for classification purposes. We provide an efficient grid minin...

  8. Analysis of magnetic source localization of P300 using the MUSIC (multiple signal classification) algorithm

    OpenAIRE

    魚橋, 哲夫

    2006-01-01

    The authors studied the localization of P300 magnetic sources using the multiple signal classification (MUSIC) algorithm. Six healthy subjects (aged 24–34 years) were investigated with 148-channel whole-head magnetoencephalography using an auditory oddball paradigm in passive mode. The authors also compared six stimulus combinations in order to find the optimal stimulus parameters for the P300 magnetic field (P300m) in passive mode. Bilateral MUSIC peaks were located on the mesial tempora...

  9. TSG: a new algorithm for binary and multi-class cancer classification and informative genes selection

    Directory of Open Access Journals (Sweden)

    Wang Haiyan

    2013-01-01

    Full Text Available Abstract Background One of the challenges in the classification of cancer tissue samples based on gene expression data is to establish an effective method that can select a parsimonious set of informative genes. The Top Scoring Pair (TSP), k-Top Scoring Pairs (k-TSP), Support Vector Machines (SVM), and prediction analysis of microarrays (PAM) are four popular classifiers that have comparable performance on multiple cancer datasets. SVM and PAM tend to use a large number of genes, and TSP and k-TSP always use an even number of genes. In addition, the selection of distinct gene pairs in k-TSP simply combines the pairs of top-ranking genes without considering the fact that the gene set with the best discrimination power may not be the combined pairs. The k-TSP algorithm also needs the user to specify an upper bound for the number of gene pairs. Here we introduce a computational algorithm to address these problems. The algorithm is named Chi-square-statistic-based Top Scoring Genes (Chi-TSG) classifier, simplified as TSG. Results The TSG classifier starts with the top two genes and sequentially adds further genes into the candidate set to perform informative gene selection. The algorithm automatically reports the total number of informative genes selected under cross validation. We provide the algorithm for both binary and multi-class cancer classification. The algorithm was applied to 9 binary and 10 multi-class gene expression datasets involving human cancers. The TSG classifier outperforms TSP family classifiers by a large margin in most of the 19 datasets. In addition to improved accuracy, our classifier shares all the advantages of the TSP family classifiers, including easy interpretation, invariance to monotone transformation, selection of a small number of informative genes allowing follow-up studies, and resistance to sampling variations due to within-sample operations. Conclusions Redefining the scores for gene set and the classification rules in TSP family

  10. Classification of Medical Datasets Using SVMs with Hybrid Evolutionary Algorithms Based on Endocrine-Based Particle Swarm Optimization and Artificial Bee Colony Algorithms.

    Science.gov (United States)

    Lin, Kuan-Cheng; Hsieh, Yi-Hsiu

    2015-10-01

    The classification and analysis of data is an important issue in today's research. Selecting a suitable set of features makes it possible to classify an enormous quantity of data quickly and efficiently. Feature selection is generally viewed as a feature subset selection problem, i.e., a combinatorial optimization problem. Evolutionary algorithms using random search methods have proven highly effective in obtaining solutions to optimization problems in a diversity of applications. In this study, we developed a hybrid evolutionary algorithm based on endocrine-based particle swarm optimization (EPSO) and artificial bee colony (ABC) algorithms in conjunction with a support vector machine (SVM) for the selection of optimal feature subsets for the classification of datasets. The results of experiments using specific UCI medical datasets demonstrate that the accuracy of the proposed hybrid evolutionary algorithm is superior to that of basic PSO, EPSO and ABC algorithms, with regard to classification accuracy using subsets with a reduced number of features.
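    The wrapper-style feature selection the record describes can be illustrated with a much simpler random search over binary feature masks. The nearest-centroid fitness below is a deliberate stand-in for the paper's SVM, and the tiny dataset is invented for illustration; EPSO/ABC would explore the mask space more cleverly, but the fitness-driven loop is the same:

```python
import random

def centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier on the selected features."""
    feats = [i for i, m in enumerate(mask) if m]
    if not feats:
        return 0.0
    classes = sorted(set(y))
    cents = {c: [sum(X[j][i] for j in range(len(X)) if y[j] == c) /
                 sum(1 for j in range(len(X)) if y[j] == c) for i in feats]
             for c in classes}
    correct = 0
    for xj, yj in zip(X, y):
        pred = min(classes, key=lambda c: sum(
            (xj[i] - cents[c][k]) ** 2 for k, i in enumerate(feats)))
        correct += pred == yj
    return correct / len(X)

def random_subset_search(X, y, n_iter=200, seed=0):
    """Random search over feature masks; prefer smaller subsets on ties."""
    rng = random.Random(seed)
    d = len(X[0])
    best = [1] * d
    best_fit = centroid_accuracy(X, y, best)
    for _ in range(n_iter):
        mask = [rng.randint(0, 1) for _ in range(d)]
        fit = centroid_accuracy(X, y, mask)
        if fit > best_fit or (fit == best_fit and sum(mask) < sum(best)):
            best, best_fit = mask, fit
    return best, best_fit

# Hypothetical dataset: feature 0 separates the classes, feature 1 is noise
X = [(0, 5), (0, 1), (1, 5), (1, 1)]
y = [0, 0, 1, 1]
mask, fit = random_subset_search(X, y)
```

    In the paper's setting, the classifier used as the fitness function is the SVM, and the held-out accuracy (not training accuracy) drives the search.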

  11. Automatic classification of pathological gait patterns using ground reaction forces and machine learning algorithms.

    Science.gov (United States)

    Alaqtash, Murad; Sarkodie-Gyan, Thompson; Yu, Huiying; Fuentes, Olac; Brower, Richard; Abdelgawad, Amr

    2011-01-01

    An automated gait classification method is developed in this study, which can be applied to the analysis and classification of pathological gait patterns using 3D ground reaction force (GRF) data. The study involved the discrimination of gait patterns of healthy, cerebral palsy (CP) and multiple sclerosis subjects. The acquired 3D GRF data were categorized into three groups. Two different algorithms were used to extract the gait features: the GRF parameters and the discrete wavelet transform (DWT), respectively. A nearest neighbor classifier (NNC) and artificial neural networks (ANN) were also investigated for the classification of gait features in this study. Furthermore, different feature sets were formed using combinations of the 3D GRF components (mediolateral, anterioposterior, and vertical) and their various impacts on the acquired results were evaluated. The best leave-one-out (LOO) classification accuracy achieved was 85%. The results showed some improvement through the application of a feature selection algorithm based on the M-shaped value of the vertical force and the statistical test ANOVA of the mediolateral and anterioposterior forces. The optimal feature set of six features enhanced the accuracy to 95%. This work can provide an automated gait classification tool that may be useful to the clinician in the diagnosis and identification of pathological gait impairments.

  12. Classification and authentication of unknown water samples using machine learning algorithms.

    Science.gov (United States)

    Kundu, Palash K; Panchariya, P C; Kundu, Madhusree

    2011-07-01

    This paper proposes the development of water sample classification and authentication for real-life use, based on machine learning algorithms. The proposed techniques use experimental measurements from a pulse voltammetry method, based on an electronic tongue (E-tongue) instrumentation system with silver and platinum electrodes. E-tongues include arrays of solid-state ion sensors, transducers (even of different types), data collectors and data analysis tools, all oriented to the classification of liquid samples and the authentication of unknown liquid samples. The time series signal and the corresponding raw data represent the measurements from a multi-sensor system. The E-tongue system, implemented in a laboratory environment for six different ISI (Bureau of Indian Standards) certified water samples (Aquafina, Bisleri, Kingfisher, Oasis, Dolphin, and McDowell), was the data source for developing two types of machine learning algorithms, for classification and regression. A water data set consisting of six sample classes with 4402 features was considered. A PCA (principal component analysis) based classification and authentication tool was developed in this study as the machine learning component of the E-tongue system. A proposed partial least squares (PLS) based classifier, dedicated to authenticating a specific category of water sample, evolved as an integral part of the E-tongue instrumentation system. The developed PCA and PLS based E-tongue system delivered encouraging overall authentication accuracy, with excellent performance for the aforesaid categories of water samples.
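    The PCA step at the heart of such an E-tongue pipeline can be sketched with NumPy. The input matrix below is a made-up stand-in for voltammetry measurements, and the study's actual classification and PLS stages are not reproduced:

```python
import numpy as np

def pca_fit(X, n_components):
    """Center the data; principal axes are right singular vectors of the centered matrix."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:n_components]

def pca_transform(X, mu, axes):
    """Project centered samples onto the retained principal axes."""
    return (X - mu) @ axes.T

# Hypothetical 2-D "sensor" matrix whose samples lie exactly on a line,
# so a single component captures all the variance
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
mu, axes = pca_fit(X, n_components=1)
scores = pca_transform(X, mu, axes)
```

    With 4402 raw features per sample, projecting onto a handful of principal components is what makes a downstream classifier tractable.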

  13. Genetic Algorithm Optimized Back Propagation Neural Network for Knee Osteoarthritis Classification

    Directory of Open Access Journals (Sweden)

    Jian WeiKoh

    2014-10-01

    Full Text Available Osteoarthritis (OA) is the most common form of arthritis, caused by degeneration of articular cartilage, which functions as a shock-absorbing cushion in our joints. The joints most commonly affected by osteoarthritis are the hand, hip, spine and knee. Knee osteoarthritis is the focus of this study. These days, the Magnetic Resonance Imaging (MRI) technique is widely applied in diagnosing the progression of osteoarthritis due to its ability to display the contrast between bone and cartilage. Traditionally, interpretation of MR images is done manually by physicians, which is inconsistent and time-consuming. Hence, an automated classifier is needed to minimize the processing time of classification. In this study, a genetic algorithm optimized neural network technique is used for knee osteoarthritis classification. This classifier consists of 4 stages: feature extraction by Discrete Wavelet Transform (DWT), the training stage of the neural network, the testing stage of the neural network, and an optimization stage using a Genetic Algorithm (GA). This technique obtained 98.5% classification accuracy in training and 94.67% in the testing stage. Besides, classification time is reduced by 17.24% after optimization of the neural network.

  14. Evolving Neural Network Using Variable String Genetic Algorithm for Color Infrared Aerial Image Classification

    Institute of Scientific and Technical Information of China (English)

    FU Xiaoyang; P E R Dale; ZHANG Shuqing

    2008-01-01

    Coastal wetlands are characterized by complex patterns both in their geomorphic and ecological features. Besides field observations, it is necessary to analyze the land cover of wetlands through color infrared (CIR) aerial photography or remote sensing images. In this paper, we designed an evolving neural network classifier using a variable string genetic algorithm (VGA) for the land cover classification of CIR aerial images. With the VGA, the classifier that we designed is able to automatically evolve the appropriate number of hidden nodes for modeling the neural network topology optimally and to find a near-optimal set of connection weights globally. Then, with the backpropagation algorithm (BP), it can find the best connection weights. The VGA-BP classifier, derived from the hybrid algorithms mentioned above, is demonstrated to be effective on CIR image classification. Compared with standard classifiers, such as the Bayes maximum-likelihood classifier, the VGA classifier and the BP-MLP (multi-layer perceptron) classifier, it is shown that the VGA-BP classifier can achieve better performance on high-resolution land cover classification.

  15. Detecting cognitive impairment by eye movement analysis using automatic classification algorithms.

    Science.gov (United States)

    Lagun, Dmitry; Manzanares, Cecelia; Zola, Stuart M; Buffalo, Elizabeth A; Agichtein, Eugene

    2011-09-30

    The Visual Paired Comparison (VPC) task is a recognition memory test that has shown promise for the detection of memory impairments associated with mild cognitive impairment (MCI). Because patients with MCI often progress to Alzheimer's Disease (AD), the VPC may be useful in predicting the onset of AD. VPC uses noninvasive eye tracking to identify how subjects view novel and repeated visual stimuli. Healthy control subjects demonstrate memory for the repeated stimuli by spending more time looking at the novel images, i.e., novelty preference. Here, we report an application of machine learning methods from computer science to improve the accuracy of detecting MCI by modeling eye movement characteristics such as fixations, saccades, and re-fixations during the VPC task. These characteristics are represented as features provided to automatic classification algorithms such as Support Vector Machines (SVMs). Using the SVM classification algorithm, in tandem with modeling the patterns of fixations, saccade orientation, and regression patterns, our algorithm was able to automatically distinguish age-matched normal control subjects from MCI subjects with 87% accuracy, 97% sensitivity and 77% specificity, compared to the best available classification performance of 67% accuracy, 60% sensitivity, and 73% specificity when using only the novelty preference information. These results demonstrate the effectiveness of applying machine-learning techniques to the detection of MCI, and suggest a promising approach for detection of cognitive impairments associated with other disorders.
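    The SVM classification step can be illustrated with a minimal linear SVM trained by Pegasos-style stochastic subgradient descent on the hinge loss, a simple stand-in for the kernel SVMs typically used in practice. The feature vectors are invented, not eye-movement data:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss.

    X: (n, d) array; y: labels in {-1, +1}. Returns the weight vector w
    (no bias term, as in the original Pegasos formulation).
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)  # standard decreasing step size
            if y[i] * (X[i] @ w) < 1:   # margin violated: decay + push toward y_i x_i
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:                       # margin satisfied: decay only
                w = (1 - eta * lam) * w
    return w

# Hypothetical, linearly separable features (chosen so no bias term is needed)
X = np.array([[-2.0, -1.0], [-1.0, -2.0], [1.0, 2.0], [2.0, 1.0]])
y = np.array([-1, -1, 1, 1])
w = train_linear_svm(X, y)
```

    In the study, the inputs would be fixation, saccade and re-fixation features per subject, and the output the MCI/control decision.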

  16. A Hybrid Multiobjective Differential Evolution Algorithm and Its Application to the Optimization of Grinding and Classification

    Directory of Open Access Journals (Sweden)

    Yalin Wang

    2013-01-01

    Full Text Available Grinding-classification is the prerequisite process for full recovery of nonrenewable minerals, with both production quality and quantity objectives concerned. Its natural formulation is a constrained multiobjective optimization problem of complex expression, since the process is composed of one grinding machine and two classification machines. In this paper, a hybrid differential evolution (DE) algorithm with multiple populations is proposed. Some infeasible solutions with better performance are allowed to be saved, and they participate randomly in the evolution. In order to exploit the meaningful infeasible solutions, a functionally partitioned multi-population mechanism is designed to find an optimal solution from all possible directions. Meanwhile, a simplex method for local search is inserted into the evolution process to enhance the searching strategy in the optimization process. Simulation results from tests on some benchmark problems indicate that the proposed algorithm tends to converge quickly and effectively to the Pareto frontier with better distribution. Finally, the proposed algorithm is applied to solve a multiobjective optimization model of a grinding and classification process. Based on the technique for order preference by similarity to ideal solution (TOPSIS), a satisfactory solution is obtained by using a decision-making method for multiple attributes.

  17. EVALUATION OF SOUND CLASSIFICATION USING MODIFIED CLASSIFIER AND SPEECH ENHANCEMENT USING ICA ALGORITHM FOR HEARING AID APPLICATION

    Directory of Open Access Journals (Sweden)

    N. Shanmugapriya

    2016-03-01

    Full Text Available Hearing aid users are exposed to diverse acoustic scenarios, so sound classification algorithms become a vital factor in yielding a good listening experience. In this work, an approach is proposed to improve speech quality in hearing aids based on the Independent Component Analysis (ICA) algorithm with modified speech signal classification methods. The proposed algorithm achieves better speech intelligibility than existing algorithms, a result confirmed by intelligibility experiments. The ICA algorithm, together with a modified Bayesian classifier and an Adaptive Neuro-Fuzzy Inference System (ANFIS), improves speech quality, and this classification increases the noise resistance of the new speech processing algorithm proposed in the present work. This work indicates that the new modified classifier can be feasible in hearing aid applications.

  18. A Novel Co-training Object Tracking Algorithm Based on Online Semi-supervised Boosting

    Institute of Scientific and Technical Information of China (English)

    陈思; 苏松志; 李绍滋; 吕艳萍; 曹冬林

    2014-01-01

    The self-training based discriminative tracking methods use the classification results to update the classifier itself. However, these methods easily suffer from the drifting issue because the classification errors are accumulated during tracking. To overcome the disadvantages of self-training based tracking methods, a novel co-training tracking algorithm, termed Co-SemiBoost, is proposed based on online semi-supervised boosting. The proposed algorithm employs a new online co-training framework, where unlabeled samples are used to collaboratively train the classifiers respectively built on two feature views. Moreover, the pseudo-labels and weights of unlabeled samples are iteratively predicted by combining the decisions of a prior model and an online classifier. The proposed algorithm can effectively improve the discriminative ability of the classifier, and is robust to occlusions, illumination changes, etc. Thus the algorithm can better adapt to object appearance changes. Experimental results on several challenging video sequences show that the proposed algorithm achieves promising tracking performance.
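    The co-training idea, two feature views whose classifiers pseudo-label the unlabeled samples they are most confident about, can be sketched with nearest-centroid base learners standing in for the boosting classifiers of Co-SemiBoost. The two views and the data are hypothetical:

```python
import numpy as np

def centroids(X, y):
    """Per-class mean vectors."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_margin(cents, x):
    """Predicted label plus a confidence = gap between the two nearest centroids."""
    d = sorted((np.linalg.norm(x - m), c) for c, m in cents.items())
    return d[0][1], d[1][0] - d[0][0]

def co_train(Xa, Xb, labels, rounds=10):
    """labels holds a class id or None; the two views alternately pseudo-label
    the unlabeled sample they are most confident about."""
    labels = list(labels)
    for _ in range(rounds):
        for X in (Xa, Xb):
            unlabeled = [i for i, l in enumerate(labels) if l is None]
            if not unlabeled:
                return labels
            known = [i for i, l in enumerate(labels) if l is not None]
            cents = centroids(X[known], np.array([labels[i] for i in known]))
            best = max(unlabeled, key=lambda i: predict_margin(cents, X[i])[1])
            labels[best] = predict_margin(cents, X[best])[0]
    return labels
```

    Co-SemiBoost additionally weights each pseudo-label by combining a prior model with the online classifier, which this sketch omits.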

  19. An Up-to-Date Comparison of State-of-the-Art Classification Algorithms

    KAUST Repository

    Zhang, Chongsheng

    2017-04-05

    Current benchmark reports of classification algorithms generally concern common classifiers and their variants but do not include many algorithms that have been introduced in recent years. Moreover, important properties such as the dependency on the number of classes and features and CPU running time are typically not examined. In this paper, we carry out a comparative empirical study on both established classifiers and more recently proposed ones on 71 data sets originating from different domains, publicly available at the UCI and KEEL repositories. The list of 11 algorithms studied includes Extreme Learning Machine (ELM), Sparse Representation based Classification (SRC), and Deep Learning (DL), which have not been thoroughly investigated in existing comparative studies. It is found that Stochastic Gradient Boosting Trees (GBDT) matches or exceeds the prediction performance of Support Vector Machines (SVM) and Random Forests (RF), while being the fastest algorithm in terms of prediction efficiency. ELM also yields good accuracy results, ranking in the top-5 alongside GBDT, RF, SVM, and C4.5, but this performance varies widely across all data sets. Unsurprisingly, top accuracy performers have average or slow training time efficiency. DL is the worst performer in terms of accuracy but second fastest in prediction efficiency. SRC shows good accuracy performance but it is the slowest classifier in both training and testing.

  20. Novel speech signal processing algorithms for high-accuracy classification of Parkinson's disease.

    Science.gov (United States)

    Tsanas, Athanasios; Little, Max A; McSharry, Patrick E; Spielman, Jennifer; Ramig, Lorraine O

    2012-05-01

    There has been considerable recent research into the connection between Parkinson's disease (PD) and speech impairment. Recently, a wide range of speech signal processing algorithms (dysphonia measures) aiming to predict PD symptom severity using speech signals have been introduced. In this paper, we test how accurately these novel algorithms can be used to discriminate PD subjects from healthy controls. In total, we compute 132 dysphonia measures from sustained vowels. Then, we select four parsimonious subsets of these dysphonia measures using four feature selection algorithms, and map these feature subsets to a binary classification response using two statistical classifiers: random forests and support vector machines. We use an existing database consisting of 263 samples from 43 subjects, and demonstrate that these new dysphonia measures can outperform state-of-the-art results, reaching almost 99% overall classification accuracy using only ten dysphonia features. We find that some of the recently proposed dysphonia measures complement existing algorithms in maximizing the ability of the classifiers to discriminate healthy controls from PD subjects. We see these results as an important step toward noninvasive diagnostic decision support in PD.

  1. Classification of frontal cortex haemodynamic responses during cognitive tasks using wavelet transforms and machine learning algorithms.

    Science.gov (United States)

    Abibullaev, Berdakh; An, Jinung

    2012-12-01

    Recent advances in neuroimaging demonstrate the potential of functional near-infrared spectroscopy (fNIRS) for use in brain-computer interfaces (BCIs). fNIRS uses light in the near-infrared range to measure brain surface haemoglobin concentrations and thus determine human neural activity. Our primary goal in this study is to analyse brain haemodynamic responses for application in a BCI. Specifically, we develop an efficient signal processing algorithm to extract important mental-task-relevant neural features and obtain the best possible classification performance. We recorded brain haemodynamic responses due to frontal cortex brain activity from nine subjects using a 19-channel fNIRS system. Our algorithm is based on continuous wavelet transforms (CWTs) for multi-scale decomposition and a soft thresholding algorithm for de-noising. We adopted three machine learning algorithms and compared their performance. Good performance can be achieved by using the de-noised wavelet coefficients as input features for the classifier. Moreover, the classifier performance varied depending on the type of mother wavelet used for wavelet decomposition. Our quantitative results showed that CWTs can be used efficiently to extract important brain haemodynamic features at multiple frequencies if an appropriate mother wavelet function is chosen. The best classification results were obtained by a specific combination of input feature type and classifier.
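    The soft-thresholding de-noising step can be sketched independently of the wavelet transform itself (computing the CWT would normally rely on a dedicated wavelet library). The `universal_threshold` helper follows the common Donoho-Johnstone rule with a median-based noise estimate; it is a standard choice, not necessarily the study's exact one:

```python
import numpy as np

def soft_threshold(coeffs, thresh):
    """Shrink wavelet detail coefficients toward zero; values below thresh vanish."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - thresh, 0.0)

def universal_threshold(coeffs):
    """Donoho-Johnstone universal threshold, with the noise scale estimated
    from the median absolute deviation of the coefficients."""
    sigma = np.median(np.abs(coeffs)) / 0.6745
    return sigma * np.sqrt(2 * np.log(len(coeffs)))
```

    In the study's pipeline, the de-noised coefficients (rather than the raw fNIRS signal) become the classifier's input features.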

  2. Semi-Supervised Transfer Learning Text Classification Algorithm Based on SVM

    Institute of Scientific and Technical Information of China (English)

    谭建平; 刘波; 肖燕珊

    2016-01-01

    With the rapid development of the Internet, texts contain a huge amount of information, and large-scale text processing has become a challenge. An important text processing technique is classification, and traditional SVM-based text classification algorithms can no longer keep up with the rapid growth of text to be classified. How to transfer knowledge from outdated historical text data (source-task data) to help classify newly generated text is therefore especially important. This paper proposes a semi-supervised SVM transfer learning algorithm (Semi-supervised TL_SVM) for text classification. First, transfer learning is introduced into the semi-supervised SVM model to build the classification model. Second, an iterative method is used to solve the objective function, yielding a classifier oriented to the target domain. Experiments verify that the semi-supervised SVM transfer learning classifier achieves higher accuracy than traditional classifiers.

  3. Development of a polarimetric radar based hydrometeor classification algorithm for winter precipitation

    Science.gov (United States)

    Thompson, Elizabeth Jennifer

    The nation-wide WSR-88D radar network is currently being upgraded for dual-polarized technology. While many convective, warm-season fuzzy-logic hydrometeor classification algorithms based on this new suite of radar variables and temperature have been refined, less progress has been made thus far in developing hydrometeor classification algorithms for winter precipitation. Unlike previous studies, the focus of this work is to exploit the discriminatory power of polarimetric variables to distinguish the most common precipitation types found in winter storms without the use of temperature as an additional variable. For the first time, detailed electromagnetic scattering of plates, dendrites, dry aggregated snowflakes, rain, freezing rain, and sleet are conducted at X-, C-, and S-band wavelengths. These physics-based results are used to determine the characteristic radar variable ranges associated with each precipitation type. A variable weighting system was also implemented in the algorithm's decision process to capitalize on the strengths of specific dual-polarimetric variables to discriminate between certain classes of hydrometeors, such as wet snow to indicate the melting layer. This algorithm was tested on observations during three different winter storms in Colorado and Oklahoma with the dual-wavelength X- and S-band CSU-CHILL, C-band OU-PRIME, and X-band CASA IP1 polarimetric radars. The algorithm showed success at all three frequencies, but was slightly more reliable at X-band because of the algorithm's strong dependence on KDP. While plates were rarely distinguished from dendrites, the latter were satisfactorily differentiated from dry aggregated snowflakes and wet snow. Sleet and freezing rain could not be distinguished from rain or light rain based on polarimetric variables alone. However, high-resolution radar observations illustrated the refreezing process of raindrops into ice pellets, which has been documented before but not yet explained. Persistent

  4. A BENCHMARK TO SELECT DATA MINING BASED CLASSIFICATION ALGORITHMS FOR BUSINESS INTELLIGENCE AND DECISION SUPPORT SYSTEMS

    Directory of Open Access Journals (Sweden)

    Pardeep Kumar

    2012-09-01

    Full Text Available In today’s business scenario, we perceive major changes in how managers use computerized support in making decisions. As more decision-makers use computerized support in decision making, decision support systems (DSS) are developing from their beginnings as personal support tools into a common resource in an organization. DSS serve the management, operations, and planning levels of an organization and help to make decisions which may be rapidly changing and not easily specified in advance. Data mining has a vital role in extracting important information to support decision making in a decision support system. It has been an active field of research for the last two to three decades. Integration of data mining and decision support systems (DSS) can lead to improved performance and can enable the tackling of new types of problems. Artificial intelligence methods are improving the quality of decision support and have become embedded in many applications, ranging from anti-locking automobile brakes to today's interactive search engines. They provide various machine learning techniques to support data mining. Classification is one of the main and most valuable tasks of data mining. Several types of classification algorithms have been suggested, tested and compared to determine future trends based on unseen data. No single algorithm has been found to be superior over all others for all data sets. Various issues such as predictive accuracy, training time to build the model, robustness and scalability must be considered, and can involve tradeoffs, further complicating the quest for an overall superior method. The objective of this paper is to compare various classification algorithms that have been frequently used in data mining for decision support systems. Three decision tree based algorithms, one artificial neural network, one statistical, one support vector machine with and without adaboost, and one clustering algorithm are tested and compared on

  5. Grapevine Yield and Leaf Area Estimation Using Supervised Classification Methodology on RGB Images Taken under Field Conditions

    Science.gov (United States)

    Diago, Maria-Paz; Correa, Christian; Millán, Borja; Barreiro, Pilar; Valero, Constantino; Tardaguila, Javier

    2012-01-01

    The aim of this research was to implement a methodology through the generation of a supervised classifier based on the Mahalanobis distance to characterize the grapevine canopy and assess leaf area and yield using RGB images. The method automatically processes sets of images, and calculates the areas (number of pixels) corresponding to seven different classes (Grapes, Wood, Background, and four classes of Leaf, of increasing leaf age). Each class is initialized by the user, who selects a set of representative pixels for every class in order to induce the clustering around them. The proposed methodology was evaluated with 70 grapevine (V. vinifera L. cv. Tempranillo) images, acquired in a commercial vineyard located in La Rioja (Spain), after several defoliation and de-fruiting events on 10 vines, with a conventional RGB camera and no artificial illumination. The segmentation results showed a performance of 92% for leaves and 98% for clusters, and made it possible to assess the grapevine’s leaf area and yield with R2 values of 0.81 (p < 0.001) and 0.73 (p = 0.002), respectively. This methodology, which operates with a simple image acquisition setup and guarantees the right number and kind of pixel classes, has been shown to be suitable and robust enough to provide valuable information for vineyard management. PMID:23235443
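The pixel classifier described above can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: it simplifies to a diagonal covariance per class (a full Mahalanobis distance would use the inverse of the full covariance matrix), and the RGB training pixels are invented.

```python
# Sketch of a Mahalanobis-style supervised pixel classifier: each class is
# seeded with user-selected representative pixels; new pixels go to the
# class at the smallest (here diagonal-covariance) Mahalanobis distance.
def fit_class(pixels):
    n, d = len(pixels), len(pixels[0])
    mean = [sum(p[i] for p in pixels) / n for i in range(d)]
    var = [sum((p[i] - mean[i]) ** 2 for p in pixels) / n or 1e-9 for i in range(d)]
    return mean, var

def mahalanobis2(x, stats):
    mean, var = stats
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))

def classify(x, classes):
    return min(classes, key=lambda name: mahalanobis2(x, classes[name]))

# Hypothetical RGB training pixels for two of the seven classes
classes = {
    "Grapes": fit_class([(60, 20, 90), (55, 25, 85), (65, 18, 95)]),
    "Leaf":   fit_class([(40, 120, 30), (45, 130, 35), (38, 115, 28)]),
}
print(classify((58, 22, 88), classes))   # -> Grapes
```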

  6. The efficiency of the RULES-4 classification learning algorithm in predicting the density of agents

    Directory of Open Access Journals (Sweden)

    Ziad Salem

    2014-12-01

    Full Text Available Learning is the act of obtaining new, or modifying existing, knowledge, behaviours, skills or preferences. The ability to learn is found in humans, other organisms and some machines. Learning is always based on some sort of observations or data, such as examples, direct experience or instruction. This paper presents a classification algorithm to learn the density of agents in an arena based on the measurements of the six proximity sensors of a combined actuator-sensor unit (CASU). Rules induced by the learning algorithm are presented; the algorithm was trained with datasets built from the CASU’s sensor data streams collected during a number of experiments with “Bristlebots” (agents) in the arena (environment). It was found that a set of rules generated by the learning algorithm is able to predict the number of bristlebots in the arena from the CASU’s sensor readings with satisfactory accuracy.

  7. Kernel Clustering with a Differential Harmony Search Algorithm for Scheme Classification

    Directory of Open Access Journals (Sweden)

    Yu Feng

    2017-01-01

    Full Text Available This paper presents kernel fuzzy clustering with a novel differential harmony search algorithm for diversion scheduling scheme classification. First, we employed a self-adaptive solution generation strategy and a differential evolution-based population update strategy to improve the classical harmony search. Second, we applied the differential harmony search algorithm to kernel fuzzy clustering to help the clustering method obtain better solutions. Finally, the combination of kernel fuzzy clustering and differential harmony search was applied to water diversion scheduling in East Lake. A comparison of the proposed method with other methods has been carried out. The results show that kernel clustering with the differential harmony search algorithm performs well on water diversion scheduling problems.

  8. Clinical significance of bcl-2 protein expression and classification algorithm in diffuse large B-cell lymphoma

    Institute of Scientific and Technical Information of China (English)

    李敏

    2013-01-01

    Objective To investigate the clinical significance of bcl-2 protein expression and of three classification algorithms, including the Hans, Chan and Muris models, in patients with diffuse large B-cell lymphoma (DLBCL).

  9. Advanced EMI Models and Classification Algorithms: The Next Level of Sophistication to Improve Discrimination of Challenging Targets

    Science.gov (United States)

    2017-01-02

    FINAL REPORT, January 2017: Advanced EMI Models and Classification Algorithms: The Next Level of Sophistication to Improve Discrimination of Challenging Targets. (Only front matter is available: the title page and a table of contents listing, among others, an abstract, a global optimization technique, and discrimination parameters.)

  10. AUTOMATIC REMOTE SENSING IMAGE CLASSIFICATION ALGORITHM BASED ON FCM AND BP NEURAL NETWORK

    Institute of Scientific and Technical Information of China (English)

    黄奇瑞

    2015-01-01

    To address the low classification accuracy of unsupervised classification algorithms and the fact that the training samples of supervised algorithms must be selected manually and are easily mis-selected, an automatic remote sensing image classification algorithm based on the combination of fuzzy C-means clustering (FCM) and a BP neural network is proposed. First, FCM is used to produce an initial clustering of the image. Then, according to the clustering result, the algorithm automatically selects the pure pixels as training samples and feeds them to a BP network for learning; the finally trained BP neural network classifier is used to classify TM remote sensing images. Experimental results show that the algorithm achieves high classification accuracy and can meet the needs of large-scale land cover type determination.
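The first stage of the pipeline described above (FCM clustering, then automatic selection of near-pure pixels as training samples for a BP network) can be sketched as below. This is an assumption-laden toy on scalar "pixels", not the paper's implementation; the purity threshold of 0.9 is invented.

```python
# Sketch: fuzzy C-means (FCM) clustering, then automatic selection of
# "pure" pixels (maximum membership above a threshold) that would serve
# as training samples for a BP network.
import random

def fcm(data, c, m=2.0, iters=50):
    random.seed(0)
    centers = random.sample(data, c)
    for _ in range(iters):
        # membership u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        U = []
        for x in data:
            d = [abs(x - v) + 1e-9 for v in centers]
            U.append([1.0 / sum((d[i] / d[j]) ** (2 / (m - 1)) for j in range(c))
                      for i in range(c)])
        # fuzzy centroid update
        centers = [sum(U[k][i] ** m * data[k] for k in range(len(data)))
                   / sum(U[k][i] ** m for k in range(len(data)))
                   for i in range(c)]
    return centers, U

pixels = [0.1, 0.12, 0.15, 0.5, 0.52, 0.9, 0.93, 0.95]
centers, U = fcm(pixels, c=3)
# keep only near-pure pixels, labelled by their dominant cluster
pure = [(x, max(range(3), key=lambda i: u[i]))
        for x, u in zip(pixels, U) if max(u) > 0.9]
print(pure)
```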

  11. Classification decision tree algorithm assisting in diagnosing solitary pulmonary nodule by SPECT/CT fusion imaging

    Institute of Scientific and Technical Information of China (English)

    Qiang Yongqian; Guo Youmin; Jin Chenwang; Liu Min; Yang Aimin; Wang Qiuping; Niu Gang

    2008-01-01

    Objective To develop a classification tree algorithm to improve the diagnostic performance of 99mTc-MIBI SPECT/CT fusion imaging in differentiating solitary pulmonary nodules (SPNs). Methods Forty-four SPNs, including 30 malignant and 14 benign cases eventually identified pathologically, were included in this prospective study. All patients received 99mTc-MIBI SPECT/CT scanning at an early stage and a delayed stage before operation. Thirty predictor variables, including 11 clinical variables, 4 emission variables and 15 transmission variables from SPECT/CT scanning, were analyzed independently by the classification tree algorithm and by radiology residents. Diagnostic rules were demonstrated in tree topology, and diagnostic performance was compared using the area under the receiver operating characteristic (ROC) curve (AUC). Results A classification decision tree with a lowest relative cost of 0.340 was developed for 99mTc-MIBI SPECT/CT scanning, in which the target/normal-region 99mTc-MIBI uptake ratio in the delayed and early stages, age, cough and the spiculation sign were the five most important contributors. The sensitivity and specificity were 93.33% and 78.57%, respectively, slightly higher than those of the expert. The sensitivity and specificity of first-year residents were 76.67% and 28.57%, respectively; the AUCs of CART and the expert were 0.886±0.055 and 0.829±0.062, respectively, and the corresponding AUC of the residents was 0.566±0.092. Comparisons of AUCs suggest that the performance of CART was similar to that of the expert (P=0.204) but greater than that of the residents (P<0.001). Conclusion Our data mining technique using a classification decision tree has much higher accuracy than residents. It suggests that the application of this algorithm will significantly improve the diagnostic performance of residents.
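The kind of tree-structured diagnostic rule described above, and the sensitivity/specificity bookkeeping used to evaluate it, can be sketched as follows. The split variables echo the abstract's top contributors, but the cut-points and all case data are invented, not the study's.

```python
# Toy decision tree for SPN triage: the delayed-stage target/normal (T/N)
# uptake ratio is the root split, followed by age. Thresholds are
# hypothetical stand-ins, not the fitted CART values.
def classify_spn(tn_delayed, age):
    if tn_delayed > 1.5:            # hypothetical cut-point
        return "malignant"
    return "malignant" if age > 65 else "benign"

cases = [  # (tn_delayed, age, true label) -- synthetic, not study data
    (2.1, 54, "malignant"), (1.8, 70, "malignant"),
    (1.2, 71, "malignant"), (1.1, 48, "benign"),
    (0.9, 60, "benign"),    (1.3, 66, "malignant"),
]
pred = [classify_spn(t, a) for t, a, _ in cases]
tp = sum(p == y == "malignant" for p, (_, _, y) in zip(pred, cases))
fn = sum(p == "benign" and y == "malignant" for p, (_, _, y) in zip(pred, cases))
tn = sum(p == y == "benign" for p, (_, _, y) in zip(pred, cases))
fp = sum(p == "malignant" and y == "benign" for p, (_, _, y) in zip(pred, cases))
print(f"sensitivity={tp / (tp + fn):.2f}, specificity={tn / (tn + fp):.2f}")
```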

  12. Tree Species Abundance Predictions in a Tropical Agricultural Landscape with a Supervised Classification Model and Imbalanced Data

    Directory of Open Access Journals (Sweden)

    Sarah J. Graves

    2016-02-01

    Full Text Available Mapping species through classification of imaging spectroscopy data is facilitating research to understand tree species distributions at increasingly greater spatial scales. Classification requires a dataset of field observations matched to the image, which will often reflect natural species distributions, resulting in an imbalanced dataset with many samples for common species and few samples for less common species. Despite the high prevalence of imbalanced datasets in multiclass species predictions, the effect on species prediction accuracy and landscape species abundance has not yet been quantified. First, we trained and assessed the accuracy of a support vector machine (SVM) model with a highly imbalanced dataset of 20 tropical species and one mixed-species class of 24 species identified in a hyperspectral image mosaic (350–2500 nm) of Panamanian farmland and secondary forest fragments. The model, with an overall accuracy of 62% ± 2.3% and F-score of 59% ± 2.7%, was applied to the full image mosaic (23,000 ha) at a 2-m resolution to produce a species prediction map, which suggested that this tropical agricultural landscape is more diverse than what has been presented in field-based studies. Second, we quantified the effect of class imbalance on model accuracy. Model assessment showed a trend where species with more samples were consistently over-predicted while species with fewer samples were under-predicted. Standardizing sample size reduced model accuracy, but also reduced the level of species over- and under-prediction. This study advances operational species mapping of diverse tropical landscapes by detailing the effect of imbalanced data on classification accuracy and providing estimates of tree species abundance in an agricultural landscape. Species maps using data and methods presented here can be used in landscape analyses of species distributions to understand human or environmental effects, in addition to focusing conservation
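The sample-size standardization step mentioned above can be sketched as a simple down-sampling of every class to the size of the rarest class before training. The labels and counts below are synthetic, not the study's field data; the trade-off the abstract reports is that this reduces overall accuracy while reducing majority-class over-prediction.

```python
# Sketch: standardize per-class sample size by down-sampling each class to
# the size of the rarest class.
import random
from collections import Counter

def standardize(samples):
    by_class = {}
    for x, y in samples:
        by_class.setdefault(y, []).append((x, y))
    n = min(len(v) for v in by_class.values())
    rng = random.Random(0)
    balanced = []
    for v in by_class.values():
        balanced.extend(rng.sample(v, n))
    return balanced

data = [(i, "common") for i in range(50)] + [(i, "rare") for i in range(5)]
balanced = standardize(data)
print(Counter(y for _, y in balanced))
```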

  13. Improved SMO Text Classification Algorithm%改进的SMO文本分类算法

    Institute of Scientific and Technical Information of China (English)

    王欣欣; 赖惠成

    2011-01-01

    The support vector machine (SVM) text classification algorithm is widely applied; sequential minimal optimization (SMO) is a special case of it. The SMO algorithm, which uses blocking and decomposition techniques, is simple and easy to implement, but it converges slowly and requires many iterations. The solution is to improve the working-set selection in the SMO algorithm and to update the step factor, so that the objective function decreases as much as possible. With this goal, an improved SMO algorithm is proposed to further improve SVM training speed and classification accuracy.
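The working-set selection the abstract targets can be sketched with one common heuristic (a maximal-violating-pair style rule): pick the pair of multipliers whose prediction errors differ most, since the analytic SMO step for a pair (i, j) is proportional to E_i - E_j. This is a hedged illustration with toy data and all-zero initial multipliers, not the paper's improved selection rule.

```python
# Sketch of SMO working-set selection: compute errors E_k = f(x_k) - y_k
# for the current dual variables and choose the pair maximizing |E_i - E_j|.
def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))

def errors(X, y, alpha, b):
    return [sum(alpha[m] * y[m] * linear_kernel(X[m], X[k]) for m in range(len(X)))
            + b - y[k] for k in range(len(X))]

def select_working_set(E):
    i = max(range(len(E)), key=lambda k: E[k])
    j = min(range(len(E)), key=lambda k: E[k])
    return i, j            # the pair with the largest error gap

X = [(1.0, 2.0), (2.0, 3.0), (-1.0, -1.0), (-2.0, -1.5)]
y = [1, 1, -1, -1]
alpha, b = [0.0, 0.0, 0.0, 0.0], 0.0
E = errors(X, y, alpha, b)
print(select_working_set(E))
```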

  14. Classification of JERS-1 Image Mosaic of Central Africa Using A Supervised Multiscale Classifier of Texture Features

    Science.gov (United States)

    Saatchi, Sassan; DeGrandi, Franco; Simard, Marc; Podest, Erika

    1999-01-01

    In this paper, a multiscale approach is introduced to classify the Japanese Earth Resources Satellite-1 (JERS-1) mosaic image over the Central African rainforest. A series of texture maps is generated from the 100 m mosaic image at various scales. Using a quadtree model and relating classes at each scale by a Markovian relationship, the multiscale images are classified from coarse to fine scales. The results are verified at various scales, and the evolution of the classification is monitored by calculating the error at each stage.

  15. Heterogeneous Ensemble Combination Search Using Genetic Algorithm for Class Imbalanced Data Classification.

    Science.gov (United States)

    Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo

    2016-01-01

    Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble depends on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10-fold cross-validation on training data for evaluating the quality of each candidate ensemble. To combine the base classifiers' decisions into the ensemble's output, we used the simple and widely used majority voting approach. The proposed algorithm, along with a random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β) - k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI Machine Learning repository, one Alzheimer's disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction, and we expect that the proposed GA-EoC would perform consistently in other cases.
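The two core pieces of the search described above, majority voting over a candidate subset of base classifiers and the fitness of a subset encoded as a bitmask, can be sketched as below. The base-classifier predictions are invented, and exhaustive enumeration stands in for the genetic algorithm, which replaces this loop when the pool is too large to enumerate.

```python
# Sketch: fitness of an ensemble bitmask = validation accuracy of the
# majority vote over the selected base classifiers.
from collections import Counter

# predictions of 4 hypothetical base classifiers on 5 validation samples
preds = [
    ["a", "a", "b", "b", "a"],   # clf 0
    ["a", "b", "b", "b", "a"],   # clf 1
    ["b", "a", "b", "a", "a"],   # clf 2
    ["a", "a", "a", "b", "b"],   # clf 3
]
truth = ["a", "a", "b", "b", "a"]

def majority_vote(mask, sample):
    votes = [preds[c][sample] for c in range(len(preds)) if mask[c]]
    return Counter(votes).most_common(1)[0][0]

def fitness(mask):
    # fraction of validation samples the ensemble gets right
    hits = sum(majority_vote(mask, s) == truth[s] for s in range(len(truth)))
    return hits / len(truth)

# with only 4 classifiers the search space is small enough to enumerate
best = max(([int(b) for b in f"{m:04b}"] for m in range(1, 16)), key=fitness)
print(best, fitness(best))
```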

  16. Supervised Classification of Agricultural Land Cover Using a Modified k-NN Technique (MNN) and Landsat Remote Sensing Imagery

    Directory of Open Access Journals (Sweden)

    Karsten Schulz

    2009-11-01

    Full Text Available Nearest neighbor techniques are commonly used in remote sensing, pattern recognition and statistics to classify objects into a predefined number of categories based on a given set of predictors. These techniques are especially useful for highly nonlinear relationships between the variables. In most studies the distance measure is adopted a priori. In contrast, we propose a general procedure to find an adaptive metric that combines a local variance-reducing technique and a linear embedding of the observation space into an appropriate Euclidean space. To illustrate the application of this technique, two agricultural land cover classifications using mono-temporal and multi-temporal Landsat scenes are presented. The results of the study, compared with standard approaches used in remote sensing such as maximum likelihood (ML) or k-Nearest Neighbor (k-NN), indicate substantial improvement with regard to the overall accuracy and the cardinality of the calibration data set. Also, using MNN in a soft/fuzzy classification framework proved to be a very useful tool for deriving critical areas that need some further attention and investment concerning additional calibration data.
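The idea of classifying with k-NN after a linear embedding, rather than with an a priori distance, can be sketched with a deliberately simple embedding: rescaling each feature by its inverse standard deviation. This is a stand-in for the paper's adaptive MNN metric, not its actual construction, and the two-band "pixel" data are invented.

```python
# Sketch: k-NN where observations are first linearly embedded (here by
# per-feature standardization) so Euclidean distance in the embedded
# space acts as an adapted metric in the original space.
from collections import Counter

def fit_embedding(X):
    d, n = len(X[0]), len(X)
    mean = [sum(x[i] for x in X) / n for i in range(d)]
    std = [(sum((x[i] - mean[i]) ** 2 for x in X) / n) ** 0.5 or 1.0
           for i in range(d)]
    return lambda x: tuple((xi - m) / s for xi, m, s in zip(x, mean, std))

def knn_predict(X, y, query, k, embed):
    eq = embed(query)
    order = sorted(range(len(X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(embed(X[i]), eq)))
    return Counter(y[i] for i in order[:k]).most_common(1)[0][0]

# hypothetical two-band data: band 2 has a much larger numeric range
X = [(0.1, 100.0), (0.2, 120.0), (0.8, 110.0), (0.9, 130.0)]
y = ["crop", "crop", "forest", "forest"]
embed = fit_embedding(X)
print(knn_predict(X, y, (0.15, 125.0), k=3, embed=embed))
```

Without the embedding, band 2 would dominate the distance purely because of its scale; the embedding restores the discriminative band 1.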

  17. Classification of traumatic brain injury severity using informed data reduction in a series of binary classifier algorithms.

    Science.gov (United States)

    Prichep, Leslie S; Jacquin, Arnaud; Filipenko, Julie; Dastidar, Samanwoy Ghosh; Zabele, Stephen; Vodencarević, Asmir; Rothman, Neil S

    2012-11-01

    Assessment of medical disorders is often aided by objective diagnostic tests which can lead to early intervention and appropriate treatment. In the case of brain dysfunction caused by head injury, there is an urgent need for quantitative evaluation methods to aid in acute triage of those subjects who have sustained traumatic brain injury (TBI). Current clinical tools to detect mild TBI (mTBI/concussion) are limited to subjective reports of symptoms and short neurocognitive batteries, offering little objective evidence for clinical decisions; or computed tomography (CT) scans, with radiation risk, that are most often negative in mTBI. This paper describes a novel methodology for the development of algorithms to provide multi-class classification in a substantial population of brain-injured subjects, across a broad age range and representative subpopulations. The method is based on age-regressed quantitative features (linear and nonlinear) extracted from brain electrical activity recorded from a limited montage of scalp electrodes. These features are used as input to a unique "informed data reduction" method, maximizing confidence of prospective validation and minimizing over-fitting. A training set for supervised learning was used, including: "normal control," "concussed," and "structural injury/CT positive (CT+)." The classifier function separating CT+ from the other groups demonstrated a sensitivity of 96% and specificity of 78%; the classifier separating "normal controls" from the other groups demonstrated a sensitivity of 81% and specificity of 74%, suggesting high utility of such classifiers in acute clinical settings. The use of a sequence of classifiers where the desired risk can be stratified further supports clinical utility.
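The cascade of binary classifiers described above (first separate CT+, then separate concussed from normal) can be sketched as follows. The single "feature" per stage and the thresholds are invented stand-ins for the paper's age-regressed qEEG discriminant functions.

```python
# Toy sketch of a multi-class decision built from a sequence of binary
# classifiers, most severe class first.
def classify_tbi(score_ct, score_concussion):
    # Stage 1: separate structural injury (CT+) from everyone else.
    if score_ct > 0.7:              # hypothetical discriminant threshold
        return "CT+"
    # Stage 2: separate concussed from normal controls.
    return "concussed" if score_concussion > 0.5 else "normal"

for scores in [(0.9, 0.2), (0.3, 0.8), (0.1, 0.1)]:
    print(scores, "->", classify_tbi(*scores))
```

Ordering the stages by severity is what lets the desired risk be stratified: a subject only reaches the second classifier after the higher-risk class has been ruled out.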

  18. SMOTE_EASY: AN ALGORITHM TO TREAT THE CLASSIFICATION ISSUE IN REAL DATABASES

    Directory of Open Access Journals (Sweden)

    Hugo Leonardo Pereira Rufino

    2016-04-01

    Full Text Available Most classification tools assume that the data distribution is balanced or that misclassification costs are similar. Nevertheless, in practice, databases with unbalanced classes are commonplace, such as in the diagnosis of diseases, where confirmed cases are usually rare compared with the healthy population. Other examples are the detection of fraudulent calls and the detection of system intruders. In these cases, improperly classifying a minority class (for instance, diagnosing a person with cancer as healthy) may have more serious consequences than incorrectly classifying a majority class. Therefore, it is important to treat databases in which unbalanced classes occur. This paper presents the SMOTE_Easy algorithm, which can classify data even when there is a high level of imbalance between the classes. To prove its efficiency, a comparison with the main algorithms for treating classification issues with unbalanced data was made. This process was successful in nearly all tested databases.
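The classic SMOTE step that SMOTE_Easy builds on can be sketched as below: a synthetic minority sample is interpolated between a minority point and one of its minority-class nearest neighbours. This shows only that step, under invented data; the paper's algorithm adds further machinery on top.

```python
# Minimal SMOTE-style oversampling sketch: new point = x + gap * (nb - x)
# for a random minority neighbour nb and random gap in [0, 1).
import random

def smote(minority, n_new, k=2, seed=0):
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x itself)
        neigh = sorted((p for p in minority if p != x),
                       key=lambda p: sum((a - b) ** 2 for a, b in zip(p, x)))[:k]
        nb = rng.choice(neigh)
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3)]
print(smote(minority, n_new=4))
```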

  19. Ensemble learning algorithms for classification of mtDNA into haplogroups.

    Science.gov (United States)

    Wong, Carol; Li, Yuran; Lee, Chih; Huang, Chun-Hsi

    2011-01-01

    Classification of mitochondrial DNA (mtDNA) into its respective haplogroups allows various anthropological and forensic issues to be addressed. Unique to mtDNA are its abundance and its non-recombining, uni-parental mode of inheritance; consequently, mutations are the only changes observed in the genetic material. These individual mutations are classified into their cladistic haplogroups, allowing the tracing of different genetic branch points in human (and other organisms') evolution. Due to the large number of samples, it becomes necessary to automate the classification process. Using 5-fold cross-validation, we investigated two classification techniques on the consented database of 21,141 samples published by the Genographic project. The support vector machine (SVM) algorithm achieved a macro-accuracy of 88.06% and micro-accuracy of 96.59%, while the random forest (RF) algorithm achieved a macro-accuracy of 87.35% and micro-accuracy of 96.19%. In addition to being faster and more memory-economical in making predictions, SVM and RF are better than or comparable to the nearest-neighbor method employed by the Genographic project in terms of prediction accuracy.
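The two evaluation figures quoted above can be sketched as follows: micro-accuracy is the overall fraction of correct predictions, while macro-accuracy averages per-class accuracies so that rare haplogroups weigh equally. The labels below are invented; note how the example reproduces the pattern in the abstract, where micro-accuracy exceeds macro-accuracy because common classes are predicted well.

```python
# Sketch: micro- vs macro-accuracy for a multi-class prediction.
from collections import defaultdict

def micro_macro(y_true, y_pred):
    micro = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_class = defaultdict(lambda: [0, 0])        # class -> [correct, total]
    for t, p in zip(y_true, y_pred):
        per_class[t][1] += 1
        per_class[t][0] += t == p
    macro = sum(c / n for c, n in per_class.values()) / len(per_class)
    return micro, macro

y_true = ["H", "H", "H", "H", "U", "U", "J"]
y_pred = ["H", "H", "H", "H", "U", "H", "H"]
print(micro_macro(y_true, y_pred))
```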

  20. Optimal Combination of Classification Algorithms and Feature Ranking Methods for Object-Based Classification of Submeter Resolution Z/I-Imaging DMC Imagery

    OpenAIRE

    Fulgencio Cánovas-García; Francisco Alonso-Sarría

    2015-01-01

    Object-based image analysis allows several different features to be calculated for the resulting objects. However, a large number of features means longer computing times and might even result in a loss of classification accuracy. In this study, we use four feature ranking methods (maximum correlation, average correlation, Jeffries–Matusita distance and mean decrease in the Gini index) and five classification algorithms (linear discriminant analysis, naive Bayes, weighted k-nearest neighbors,...

  1. A supervised learning approach for taxonomic classification of core-photosystem-II genes and transcripts in the marine environment

    Directory of Open Access Journals (Sweden)

    Polz Martin F

    2009-05-01

    Full Text Available Abstract Background Cyanobacteria of the genera Synechococcus and Prochlorococcus play a key role in marine photosynthesis, which contributes to the global carbon cycle and to the world oxygen supply. Recently, genes encoding the photosystem II reaction center (psbA and psbD were found in cyanophage genomes. This phenomenon suggested that the horizontal transfer of these genes may be involved in increasing phage fitness. To date, a very small percentage of marine bacteria and phages has been cultured. Thus, mapping genomic data extracted directly from the environment to its taxonomic origin is necessary for a better understanding of phage-host relationships and dynamics. Results To achieve an accurate and rapid taxonomic classification, we employed a computational approach combining a multi-class Support Vector Machine (SVM with a codon usage position specific scoring matrix (cuPSSM. Our method has been applied successfully to classify core-photosystem-II gene fragments, including partial sequences coming directly from the ocean, to seven different taxonomic classes. Applying the method on a large set of DNA and RNA psbA clones from the Mediterranean Sea, we studied the distribution of cyanobacterial psbA genes and transcripts in their natural environment. Using our approach, we were able to simultaneously examine taxonomic and ecological distributions in the marine environment. Conclusion The ability to accurately classify the origin of individual genes and transcripts coming directly from the environment is of great importance in studying marine ecology. The classification method presented in this paper could be applied further to classify other genes amplified from the environment, for which training data is available.
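The feature-extraction side of the approach above can be sketched in a simplified form. The paper's cuPSSM is position-specific; as a hedged stand-in, the version below just computes overall codon-usage frequencies for a gene fragment, which a multi-class SVM could then consume as a feature vector. The sequence is an invented placeholder, not a real psbA fragment.

```python
# Sketch: codon-usage frequency features from a nucleotide fragment
# (reading frame assumed to start at position 0).
from collections import Counter

def codon_usage(seq):
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

fragment = "ATGGCTGCTAAAGCT"      # hypothetical fragment for illustration
print(codon_usage(fragment))
```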

  2. A novel hybrid classification model of genetic algorithms, modified k-Nearest Neighbor and developed backpropagation neural network.

    Science.gov (United States)

    Salari, Nader; Shohaimi, Shamarina; Najafi, Farid; Nallappan, Meenakshii; Karishnarajah, Isthrinayagy

    2014-01-01

    Among numerous artificial intelligence approaches, k-Nearest Neighbor algorithms, genetic algorithms, and artificial neural networks are considered the most common and effective methods for classification problems in numerous studies. In the present study, the results of the implementation of a novel hybrid feature selection-classification model using the above-mentioned methods are presented. The purpose is to benefit from the synergies obtained from combining these technologies for the development of classification models. Such a combination creates an opportunity to invest in the strengths of each algorithm, and is an approach to make up for their deficiencies. To develop the proposed model, with the aim of obtaining the best array of features, first, feature ranking techniques such as the Fisher's discriminant ratio and class separability criteria were used to prioritize features. Second, the obtained results, which included arrays of the top-ranked features, were used as the initial population of a genetic algorithm to produce optimum arrays of features. Third, using a modified k-Nearest Neighbor method as well as an improved method of backpropagation neural networks, the classification process was advanced based on optimum arrays of the features selected by genetic algorithms. The performance of the proposed model was compared with thirteen well-known classification models based on seven datasets. Furthermore, statistical analysis was performed using the Friedman test followed by post-hoc tests. The experimental findings indicated that the novel proposed hybrid model resulted in significantly better classification performance compared with all 13 classification methods. Finally, the performance results of the proposed model were benchmarked against the best ones reported as the state-of-the-art classifiers in terms of classification accuracy for the same data sets.
The substantial findings of the comprehensive comparative study revealed that performance of the
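The first step described above, ranking features before the genetic algorithm takes over, can be sketched with the two-class form of Fisher's discriminant ratio. The data values are invented; the top-ranked indices are what would seed the GA's initial population.

```python
# Sketch: rank features by Fisher's discriminant ratio
# FDR_j = (mu0_j - mu1_j)^2 / (var0_j + var1_j).
def fisher_ratio(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / len(a)
    vb = sum((x - mb) ** 2 for x in b) / len(b)
    return (ma - mb) ** 2 / (va + vb + 1e-12)

# rows = samples, columns = features; two classes of invented data
class0 = [[1.0, 5.0], [1.2, 4.0], [0.8, 6.0]]
class1 = [[3.0, 5.5], [3.1, 4.5], [2.9, 5.0]]
scores = [fisher_ratio([r[j] for r in class0], [r[j] for r in class1])
          for j in range(2)]
ranking = sorted(range(2), key=lambda j: -scores[j])
print(ranking)   # feature 0 separates the classes far better than feature 1
```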

  3. Improved Algorithms for the Classification of Rough Rice Using a Bionic Electronic Nose Based on PCA and the Wilks Distribution

    Directory of Open Access Journals (Sweden)

    Sai Xu

    2014-03-01

    Full Text Available Principal Component Analysis (PCA) is one of the main methods used for electronic nose pattern recognition. However, poor classification performance is common when using regular PCA. This paper aims to improve the classification performance of regular PCA based on the existing Wilks Λ-statistic (i.e., PCA combined with the Wilks distribution). The improved algorithms, which combine regular PCA with the Wilks Λ-statistic, were developed after analysing the functionality and defects of PCA. Verification tests were conducted using a PEN3 electronic nose. The collected samples consisted of the volatiles of six varieties of rough rice (Zhongxiang1, Xiangwan13, Yaopingxiang, WufengyouT025, Pin 36, and Youyou122), grown in the same area and season. With regular PCA, the first two principal components used as analysis vectors cannot accomplish the rough rice variety classification task. Using the improved algorithms, which combine regular PCA with the Wilks Λ-statistic, different principal components were selected as analysis vectors. The set of Mahalanobis distances between each of the varieties of rough rice was used to estimate the classification performance. The results illustrate that the rough rice variety classification task is achieved well using the improved algorithm. A Probabilistic Neural Network (PNN) was also established to test the effectiveness of the improved algorithms. The first two principal components (namely PC1 and PC2) and the first and fifth principal components (namely PC1 and PC5) were selected as the inputs of the PNN for the classification of the six rough rice varieties. The results indicate that the classification accuracy based on the improved algorithm was improved by 6.67% compared to the results of the regular method. These results prove the effectiveness of using the Wilks Λ-statistic to improve the classification accuracy of the regular PCA approach.
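The core idea above, scoring principal components by class separation instead of always taking PC1 and PC2, can be sketched with a one-dimensional Wilks'-lambda-style criterion: within-class scatter over total scatter, where smaller values mean better separation. The per-component scores of three hypothetical rice varieties below are invented.

```python
# Sketch: score each principal component by a Wilks'-lambda-style ratio
# (SSW / SST) and keep the most discriminative components.
def wilks_1d(groups):
    allv = [x for g in groups for x in g]
    grand = sum(allv) / len(allv)
    sst = sum((x - grand) ** 2 for x in allv)
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return ssw / sst if sst else 1.0

# scores of three rice varieties on two candidate PCs (invented values)
pc1 = [[0.1, 0.2], [0.9, 1.0], [2.0, 2.1]]   # separates the varieties well
pc5 = [[0.4, 1.9], [0.3, 2.0], [0.5, 1.8]]   # varieties overlap completely
print(wilks_1d(pc1), wilks_1d(pc5))
```

Sorting components by this score and keeping the smallest is what lets a discriminative PC5 displace a high-variance but uninformative PC2.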

  4. Feasibility of Genetic Algorithm for Textile Defect Classification Using Neural Network

    Directory of Open Access Journals (Sweden)

    Md. Tarek Habib

    2012-07-01

    Full Text Available The global market for the textile industry is highly competitive nowadays. Quality control in the production process has been a key factor for survival in such a competitive market. Automated textile inspection systems are very useful in this respect, because manual inspection is time-consuming and not accurate enough. Hence, automated textile inspection systems have been attracting considerable attention from researchers in different countries as a replacement for manual inspection. Defect detection and defect classification are the two major problems posed by research on automated textile inspection systems. In this paper, we perform an extensive investigation of the applicability of a genetic algorithm (GA) in the context of textile defect classification using a neural network (NN). We observe the effect of tuning different network parameters and explain the reasons. We empirically find a suitable NN model in the context of textile defect classification. We compare the performance of this model with that of the classification models implemented by others.

  5. Feature Selection and Classification of Electroencephalographic Signals: An Artificial Neural Network and Genetic Algorithm Based Approach.

    Science.gov (United States)

    Erguzel, Turker Tekin; Ozekes, Serhat; Tan, Oguz; Gultekin, Selahattin

    2015-10-01

    Feature selection is an important step in many pattern recognition systems aiming to overcome the so-called curse of dimensionality. In this study, an optimized classification method was tested in 147 patients with major depressive disorder (MDD) treated with repetitive transcranial magnetic stimulation (rTMS). The performance of the combination of a genetic algorithm (GA) and a back-propagation (BP) neural network (BPNN) was evaluated using 6-channel pre-rTMS electroencephalographic (EEG) patterns of theta and delta frequency bands. The GA was first used to eliminate the redundant and less discriminant features to maximize classification performance. The BPNN was then applied to test the performance of the feature subset. Finally, classification performance using the subset was evaluated using 6-fold cross-validation. Although the slow bands of the frontal electrodes are widely used to collect EEG data for patients with MDD and provide quite satisfactory classification results, the outcomes of the proposed approach indicate noticeably increased overall accuracy of 89.12% and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.904 using the reduced feature set.

  6. A novel algorithm for ventricular arrhythmia classification using a fuzzy logic approach.

    Science.gov (United States)

    Weixin, Nong

    2016-12-01

    In the present study, it has been shown that an unnecessary implantable cardioverter-defibrillator (ICD) shock is often delivered to patients with an ambiguous ECG rhythm in the overlap zone between ventricular tachycardia (VT) and ventricular fibrillation (VF); these shocks significantly increase mortality. Therefore, accurate classification of the arrhythmia into VT, organized VF (OVF) or disorganized VF (DVF) is crucial to assist ICDs to deliver appropriate therapy. A classification algorithm using a fuzzy logic classifier was developed for accurately classifying the arrhythmias into VT, OVF or DVF. Compared with other studies, our method aims to combine ten ECG detectors that are calculated in the time domain and the frequency domain in addition to different levels of complexity for detecting subtle structure differences between VT, OVF and DVF. The classification in the overlap zone between VT and VF is refined by this study to avoid ambiguous identification. The present method was trained and tested using public ECG signal databases. A two-level classification was performed to first detect VT with an accuracy of 92.6 %, and then the discrimination between OVF and DVF was detected with an accuracy of 84.5 %. The validation results indicate that the proposed method has superior performance in identifying the organization level between the three types of arrhythmias (VT, OVF and DVF) and is promising for improving the appropriate therapy choice and decreasing the possibility of sudden cardiac death.

  7. Spectral areas and ratios classifier algorithm for pancreatic tissue classification using optical spectroscopy.

    Science.gov (United States)

    Chandra, Malavika; Scheiman, James; Simeone, Diane; McKenna, Barbara; Purdy, Julianne; Mycek, Mary-Ann

    2010-01-01

    Pancreatic adenocarcinoma is one of the leading causes of cancer death, in part because current diagnostic methods cannot reliably detect early-stage disease. We present the first assessment of the diagnostic accuracy of algorithms developed for pancreatic tissue classification using data from fiber optic probe-based bimodal optical spectroscopy, a real-time approach that would be compatible with minimally invasive diagnostic procedures for early cancer detection in the pancreas. A total of 96 fluorescence and 96 reflectance spectra are considered from 50 freshly excised tissue sites, including human pancreatic adenocarcinoma, chronic pancreatitis (inflammation), and normal tissues, from nine patients. Classification algorithms using linear discriminant analysis are developed to distinguish among tissues, and leave-one-out cross-validation is employed to assess the classifiers' performance. The spectral areas and ratios classifier (SpARC) algorithm employs a combination of reflectance and fluorescence data and has the best performance, with sensitivity, specificity, negative predictive value, and positive predictive value for correctly identifying adenocarcinoma of 85%, 89%, 92%, and 80%, respectively.
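    The core of the evaluation above, two-class linear discriminant analysis scored by leave-one-out cross-validation, can be sketched on synthetic data (the Gaussian feature vectors and the small ridge term are assumptions, not the spectral features of the study).

```python
import numpy as np

def lda_fit(X, y):
    """Two-class linear discriminant: shared covariance, midpoint threshold."""
    m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
    Sw = np.cov(X[y == 0].T) + np.cov(X[y == 1].T)
    # small ridge keeps the pooled scatter matrix invertible
    w = np.linalg.solve(Sw + 1e-6 * np.eye(X.shape[1]), m1 - m0)
    b = -w @ (m0 + m1) / 2
    return w, b

def loocv_accuracy(X, y):
    """Leave-one-out cross-validation: hold out each sample in turn."""
    hits = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        w, b = lda_fit(X[mask], y[mask])
        hits += int((X[i] @ w + b > 0) == bool(y[i]))
    return hits / len(y)

# Synthetic stand-in for two spectral summary features per tissue site
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(3, 1, (30, 2))])
y = np.repeat([0, 1], 30)
acc = loocv_accuracy(X, y)
```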

  8. Hybrid Ant Bee Algorithm for Fuzzy Expert System Based Sample Classification.

    Science.gov (United States)

    GaneshKumar, Pugalendhi; Rani, Chellasamy; Devaraj, Durairaj; Victoire, T Aruldoss Albert

    2014-01-01

    Accuracy maximization and complexity minimization are the two main goals of fuzzy expert system based microarray data classification. Our previous Genetic Swarm Algorithm (GSA) approach improved the classification accuracy of the fuzzy expert system at the cost of interpretability: the if-then rules produced by the GSA are lengthy and complex, which makes them difficult for a physician to understand. To address this interpretability-accuracy tradeoff, the rule set is represented using integer numbers and rule generation is treated as a combinatorial optimization task. Ant colony optimization (ACO) with local and global pheromone updates is applied to find the fuzzy partitions based on the gene expression values for generating a simpler rule set. To handle the unstructured, continuous expression values of a gene, this paper employs the artificial bee colony (ABC) algorithm to evolve the points of the membership functions. Mutual information is used to identify informative genes. The performance of the proposed hybrid Ant Bee Algorithm (ABA) is evaluated using six gene expression data sets. The simulation study shows that the proposed approach generated an accurate fuzzy system with highly interpretable and compact rules for all data sets when compared with other approaches.

  10. A Hybrid Constrained Semi-Supervised Clustering Algorithm

    Institute of Scientific and Technical Information of China (English)

    李雪梅; 王立宏; 宋宜斌

    2011-01-01

    A hybrid constrained semi-supervised clustering algorithm (HCC) is proposed based on a consistency algorithm. Both labeled data and pairwise constraints are considered during clustering, so that the two types of prior knowledge play complementary roles in the clustering process. The theoretical derivation and the algorithm are presented in detail. Experimental results show that labeled data outperform pairwise constraints in improving the quality of clustering, and that HCC is better than the comparison algorithms on many indices, such as CRI, number of clusters and running time.
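    The idea of letting labeled points steer an otherwise unsupervised clustering can be sketched with a seeded k-means variant (a generic illustration of the principle, not the HCC algorithm itself; the toy data are invented).

```python
import numpy as np

def seeded_kmeans(X, k, labeled, iters=20):
    """Semi-supervised k-means sketch: labeled points (index -> cluster)
    initialize the centroids and keep their assignments fixed."""
    centers = np.array([X[[i for i, c in labeled.items() if c == j]].mean(0)
                        for j in range(k)])
    for _ in range(iters):
        # assign every point to its nearest centroid
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for i, c in labeled.items():      # supervision overrides the metric
            assign[i] = c
        centers = np.array([X[assign == j].mean(0) for j in range(k)])
    return assign

# Two well-separated groups; one labeled point per cluster seeds the process
X = np.array([[0.0, 0], [0.2, 0], [0.1, 0.1], [5.0, 5], [5.2, 5], [5.1, 5.1]])
assign = seeded_kmeans(X, 2, labeled={0: 0, 3: 1})
```

    A full constrained algorithm would additionally enforce must-link/cannot-link pairs during the assignment step.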

  11. Multi-objective evolutionary algorithms for fuzzy classification in survival prediction.

    Science.gov (United States)

    Jiménez, Fernando; Sánchez, Gracia; Juárez, José M

    2014-03-01

    This paper presents a novel rule-based fuzzy classification methodology for survival/mortality prediction in severely burnt patients. Because of the ethical issues involved in this medical scenario, physicians tend not to accept a computer-based evaluation unless they understand why and how such a recommendation is given, so any fuzzy classifier model must be both accurate and interpretable. The proposed methodology is a three-step process: (1) multi-objective constrained optimization on a patient data set, using Pareto-based elitist multi-objective evolutionary algorithms to maximize accuracy and minimize the complexity (number of rules) of classifiers, subject to interpretability constraints; this step produces a set of alternative (Pareto) classifiers; (2) linguistic labeling, which assigns a linguistic label to each fuzzy set of the classifiers; this step is essential to the interpretability of the classifiers; (3) decision making, whereby a classifier is chosen, if it is satisfactory, according to the preferences of the decision maker; if no classifier is satisfactory, the process restarts at step (1) with a different input parameter set. The performance of three multi-objective evolutionary algorithms, the niched pre-selection multi-objective algorithm, the elitist Pareto-based multi-objective evolutionary algorithm for diversity reinforcement (ENORA) and the non-dominated sorting genetic algorithm (NSGA-II), was tested using a patient data set from an intensive care burn unit and a standard data set from a machine learning repository. The results are compared using the hypervolume multi-objective metric. In addition, the results were compared with other non-evolutionary techniques and validated with a multi-objective cross-validation technique. Our proposal improves the classification rate obtained by other non-evolutionary techniques (decision trees, artificial neural networks, Naive Bayes, and case...
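    Step (1) above produces a set of Pareto classifiers. The underlying dominance filter, for two minimized objectives such as error rate and number of rules, can be sketched as follows (the candidate values are invented).

```python
def pareto_front(points):
    """Return the non-dominated points when minimizing both objectives.
    A point is dominated if some other point is no worse in both."""
    front = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p
                        for q in points)
        if not dominated:
            front.append(p)
    return front

# Hypothetical (error rate, number of rules) pairs for candidate classifiers
candidates = [(0.10, 9), (0.12, 4), (0.20, 3), (0.11, 8), (0.25, 3)]
front = pareto_front(candidates)
```

    The decision maker in step (3) would then pick one classifier from `front` according to how they trade accuracy against rule-set size.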

  12. Classification accuracy of algorithms for blood chemistry data for three aquaculture-affected marine fish species.

    Science.gov (United States)

    Coz-Rakovac, R; Topic Popovic, N; Smuc, T; Strunjak-Perovic, I; Jadan, M

    2009-11-01

    The objective of this study was the determination and discrimination of biochemical data among three aquaculture-affected marine fish species (sea bass, Dicentrarchus labrax; sea bream, Sparus aurata L.; and mullet, Mugil spp.) based on machine-learning methods. The approach relying on machine-learning methods gives more usable classification solutions and provides better insight into the collected data. So far, these methods have been applied to the problem of discriminating blood chemistry data with respect to season and feed for a single species. This is the first time these classification algorithms have been used as a framework for rapid differentiation among three fish species. Among the machine-learning methods used, decision trees provided the clearest model, which correctly classified 210 samples (85.71%), incorrectly classified 35 samples (14.29%), and clearly identified the three investigated species from their biochemical traits.

  13. Generalization performance of graph-based semisupervised classification

    Institute of Scientific and Technical Information of China (English)

    2009-01-01

    Semi-supervised learning has been of growing interest over the past few years and many methods have been proposed. Although various algorithms are available to implement semi-supervised learning, there are still gaps in our understanding of how the generalization error depends on the numbers of labeled and unlabeled data. In this paper, we consider a graph-based semi-supervised classification algorithm and establish its generalization error bounds. Our results show the close relation between the generalization performance and the structural invariants of the data graph.
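    A common member of the graph-based family analyzed here is label propagation over a similarity graph; a minimal sketch on a toy six-node graph (the specific algorithm whose bounds the paper establishes may differ).

```python
import numpy as np

def label_propagation(W, labels, iters=50):
    """Graph-based semi-supervised sketch: iteratively average neighbor
    label scores; labeled nodes (label >= 0) are clamped each step."""
    n = len(labels)
    k = max(labels) + 1
    F = np.zeros((n, k))
    clamp = labels >= 0
    F[clamp, labels[clamp]] = 1.0
    P = W / W.sum(1, keepdims=True)       # row-normalized transition matrix
    for _ in range(iters):
        F = P @ F
        F[clamp] = 0.0                    # re-clamp the labeled nodes
        F[clamp, labels[clamp]] = 1.0
    return F.argmax(1)

# Two chained triangles; only nodes 0 and 5 are labeled (-1 = unlabeled)
W = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], float)
labels = np.array([0, -1, -1, -1, -1, 1])
pred = label_propagation(W, labels)
```

    The generalization question studied in the paper is how predictions of this kind behave as the counts of labeled and unlabeled nodes grow.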

  14. Supervised Multi-Manifold Learning Algorithm Based on ISOMAP

    Institute of Scientific and Technical Information of China (English)

    邵超; 万春红

    2014-01-01

    The existing supervised multi-manifold learning algorithms adjust the distances between data points according to their class labels so that multiple manifolds can be classified successfully. However, the poor generalization ability of these algorithms results in an unfaithful display of the intrinsic geometric structure of some manifolds. A supervised multi-manifold learning algorithm based on isometric mapping (ISOMAP) is proposed. A shortest path algorithm suitable for the multi-manifold structure is used to compute shortest path distances, which effectively approximate the corresponding geodesic distances even in the multi-manifold setting. Then, Sammon mapping is used to further preserve shorter distances in the low-dimensional embedding space. Consequently, the intrinsic geometric structure of each manifold can be faithfully displayed. Moreover, the manifold of a new data point can be precisely judged from the similarities between neighboring local tangent spaces, exploiting the locally Euclidean nature of a manifold, and thus the proposed algorithm achieves good generalization ability. The effectiveness of the proposed algorithm is verified by experimental results.
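    The shortest-path approximation of geodesic distances on which ISOMAP rests can be sketched with a k-NN graph plus Floyd–Warshall (toy data; the paper's multi-manifold variant of the shortest path computation is not reproduced here).

```python
import numpy as np

def geodesic_distances(X, k=2):
    """ISOMAP-style sketch: a k-NN graph plus Floyd-Warshall shortest
    paths approximates geodesic distances along the manifold."""
    n = len(X)
    D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        for j in np.argsort(D[i])[1:k + 1]:   # connect k nearest neighbors
            G[i, j] = G[j, i] = D[i, j]
    for m in range(n):                        # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, [m]] + G[[m], :])
    return G

# Points along a bent curve: the geodesic (3 unit hops from end to end)
# exceeds the straight-line Euclidean distance sqrt(5)
X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [2.0, 1.0]])
G = geodesic_distances(X, k=1)
```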

  15. Text Classification using Artificial Intelligence

    CERN Document Server

    Kamruzzaman, S M

    2010-01-01

    Text classification is the process of classifying documents into predefined categories based on their content. It is the automated assignment of natural language texts to predefined categories. Text classification is the primary requirement of text retrieval systems, which retrieve texts in response to a user query, and text understanding systems, which transform text in some way such as producing summaries, answering questions or extracting data. Existing supervised learning algorithms for classifying text need sufficient documents to learn accurately. This paper presents a new algorithm for text classification using an artificial intelligence technique that requires fewer documents for training. Instead of using words, word relations, i.e. association rules from these words, are used to derive the feature set from pre-classified text documents. The concept of the naïve Bayes classifier is then used on the derived features, and finally only a single concept of genetic algorithm has been added for final classification. A syste...
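    The naïve Bayes stage of such a pipeline can be sketched as a multinomial classifier with Laplace smoothing (the tiny corpus and labels are invented; the association-rule feature derivation and the genetic-algorithm step are omitted).

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Multinomial naive Bayes training on (text, label) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        word_counts[label].update(words)
        class_counts[label] += 1
        vocab.update(words)
    return word_counts, class_counts, vocab

def predict_nb(model, text):
    """Pick the class with the highest smoothed log-posterior."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for label in class_counts:
        lp = math.log(class_counts[label] / total_docs)
        n_label = sum(word_counts[label].values())
        for w in text.lower().split():
            # Laplace (add-one) smoothing over the vocabulary
            lp += math.log((word_counts[label][w] + 1) / (n_label + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Invented toy corpus standing in for pre-classified training documents
docs = [("cheap pills buy now", "spam"), ("meeting agenda for monday", "ham"),
        ("buy cheap meds", "spam"), ("project meeting notes", "ham")]
model = train_nb(docs)
```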

  16. Text Classification using Data Mining

    CERN Document Server

    Kamruzzaman, S M; Hasan, Ahmed Ryadh

    2010-01-01

    Text classification is the process of classifying documents into predefined categories based on their content. It is the automated assignment of natural language texts to predefined categories. Text classification is the primary requirement of text retrieval systems, which retrieve texts in response to a user query, and text understanding systems, which transform text in some way such as producing summaries, answering questions or extracting data. Existing supervised learning algorithms to automatically classify text need sufficient documents to learn accurately. This paper presents a new algorithm for text classification using data mining that requires fewer documents for training. Instead of using words, word relation i.e. association rules from these words is used to derive feature set from pre-classified text documents. The concept of Naive Bayes classifier is then used on derived features and finally only a single concept of Genetic Algorithm has been added for final classification. A system based on the...

  17. Gastric Cancer Risk Analysis in Unhealthy Habits Data with Classification Algorithms

    OpenAIRE

    2015-01-01

    Data mining methods are applied to a medical task that seeks information about the influence of Helicobacter pylori on increased gastric cancer risk by analysing adverse individual lifestyle factors. During data pre-processing, the data are cleared of noise, reduced in dimensionality, transformed for the task, and stripped of non-informative attributes. Data classification using the C4.5, CN2 and k-nearest neighbour algorithms is carried out...

  18. Classification decision tree algorithm assisting in diagnosing solitary pulmonary nodule by SPECT/CT fusion imaging

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    Objective: To develop a classification tree algorithm to improve the diagnostic performance of 99mTc-MIBI SPECT/CT fusion imaging in differentiating solitary pulmonary nodules (SPNs). Methods: Forty-four SPNs, including 30 malignant cases and 14 benign ones that were eventually pathologically identified, were included in this prospective study. All patients underwent 99mTc-MIBI SPECT/CT scanning at an early stage and a delayed stage before operation. Thirty predictor variables, including 11 clinical variables, 4 variable...

  19. PMSVM: An Optimized Support Vector Machine Classification Algorithm Based on PCA and Multilevel Grid Search Methods

    Directory of Open Access Journals (Sweden)

    Yukai Yao

    2015-01-01

    Full Text Available We propose an optimized support vector machine classifier, named PMSVM, in which system normalization, PCA, and multilevel grid search methods are comprehensively applied for data preprocessing and parameter optimization, respectively. The main goals of this study are to improve the classification efficiency and accuracy of SVM. Sensitivity, specificity, precision, ROC curves, and other metrics are adopted to appraise the performance of PMSVM. Experimental results show that PMSVM achieves better accuracy and remarkably higher efficiency than traditional SVM algorithms.

  20. Classification

    Data.gov (United States)

    National Aeronautics and Space Administration — A supervised learning task involves constructing a mapping from an input data space (normally described by several features) to an output space. A set of training...

  1. Genetic Algorithms and Classification Trees in Feature Discovery: Diabetes and the NHANES database

    Energy Technology Data Exchange (ETDEWEB)

    Heredia-Langner, Alejandro; Jarman, Kristin H.; Amidan, Brett G.; Pounds, Joel G.

    2013-09-01

    This paper presents a feature selection methodology that can be applied to datasets containing a mixture of continuous and categorical variables. Using a Genetic Algorithm (GA), this method explores a dataset and selects a small set of features relevant for the prediction of a binary (1/0) response. Binary classification trees and an objective function based on conditional probabilities are used to measure the fitness of a given subset of features. The method is applied to health data in order to find factors useful for the prediction of diabetes. Results show that our algorithm is capable of narrowing down the set of predictors to around 8 factors that can be validated using reputable medical and public health resources.

  2. Code Syntax-Comparison Algorithm Based on Type-Redefinition-Preprocessing and Rehash Classification

    Directory of Open Access Journals (Sweden)

    Baojiang Cui

    2011-08-01

    Full Text Available Code comparison technology plays an important role in software security protection and plagiarism detection. Nowadays, there are five main approaches to plagiarism detection: file-attribute-based, text-based, token-based, syntax-based and semantic-based. The first three approaches have their own limitations, while the syntax-based technique suffers from limited detection ability and low efficiency, so none of these approaches meets the requirements of large-scale software plagiarism detection. Based on our prior research, we propose an algorithm for type-redefinition plagiarism detection, which can detect simple type redefinition, repeating-pattern redefinition, and redefinition of types with pointers. In addition, this paper proposes a code syntax-comparison algorithm based on rehash classification, which enhances the node storage structure of the syntax tree and greatly improves efficiency.

  3. A classification system based on a new wrapper feature selection algorithm for the diagnosis of primary and secondary polycythemia.

    Science.gov (United States)

    Korfiatis, Vasileios Ch; Asvestas, Pantelis A; Delibasis, Konstantinos K; Matsopoulos, George K

    2013-12-01

    Primary and Secondary Polycythemia are diseases of the bone marrow that affect the blood's composition and prohibit patients from becoming blood donors. Since these diseases may become fatal, their early diagnosis is important. In this paper, a classification system for the diagnosis of Primary and Secondary Polycythemia is proposed. The proposed system classifies input data into three classes, Healthy, Primary Polycythemic (PP) and Secondary Polycythemic (SP), and is implemented using two separate binary classification levels: the first level performs the Healthy/non-Healthy classification and the second level the PP/SP classification. To this end, a novel wrapper feature selection algorithm, called the LM-FM algorithm, is presented in order to maximize the classifier's performance. The algorithm comprises two stages that are applied sequentially: the Local Maximization (LM) stage and the Floating Maximization (FM) stage. The LM stage finds the best possible subset of a fixed predefined size, which is then used as input to the next stage. The FM stage uses a floating size technique to search for an even better solution by varying the initially provided subset size. The Support Vector Machine (SVM) classifier is then used for the discrimination of the data at each classification level. The proposed classification system is compared with various well-established feature selection techniques such as the Sequential Floating Forward Selection (SFFS) and the Maximum Output Information (MOI) wrapper schemes, and with standalone classification techniques such as the Multilayer Perceptron (MLP) and the SVM classifier. The proposed LM-FM feature selection algorithm combined with the SVM classifier increases the overall performance of the classification system, scoring up to 98.9% overall accuracy at the first classification level and up to 96.6% at the second. Moreover, it provides excellent robustness regardless of the size of the input feature...
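    A wrapper scheme in the same spirit, greedy forward selection that stops when the score no longer improves, can be sketched as follows (the centroid-separation score is an invented stand-in for the SVM accuracy a real wrapper would use).

```python
import numpy as np

def forward_select(X, y, score, max_k):
    """Wrapper feature selection sketch: greedily add the feature that
    most improves a classifier-based score, up to max_k features."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_k:
        best_f = max(remaining, key=lambda f: score(X[:, selected + [f]], y))
        if selected and score(X[:, selected + [best_f]], y) <= score(X[:, selected], y):
            break                      # stop: no improvement from growing
        selected.append(best_f)
        remaining.remove(best_f)
    return selected

def centroid_score(Xs, y):
    """Toy wrapper score: class-centroid separation relative to spread."""
    m0, m1 = Xs[y == 0].mean(0), Xs[y == 1].mean(0)
    return np.linalg.norm(m0 - m1) / (Xs.std() + 1e-9)

# Two informative features (columns 0-1) followed by three noise features
rng = np.random.default_rng(2)
y = np.repeat([0, 1], 50)
informative = rng.normal(y[:, None] * 2.0, 1.0, (100, 2))
noise = rng.normal(0.0, 1.0, (100, 3))
X = np.hstack([informative, noise])
chosen = forward_select(X, y, centroid_score, max_k=2)
```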

  4. Development of a thresholding algorithm for calcium classification at multiple CT energies

    Science.gov (United States)

    Ng, LY.; Alssabbagh, M.; Tajuddin, A. A.; Shuaib, I. L.; Zainon, R.

    2017-05-01

    The objective of this study was to develop a thresholding method for classifying calcium at different concentrations using single-energy computed tomography (SECT). Five different concentrations of calcium chloride were filled in PMMA tubes and placed inside a water-filled PMMA phantom (diameter 10 cm). The phantom was scanned at 70, 80, 100, 120 and 140 kV using SECT. CARE DOSE 4D was used and the slice thickness was set to 1 mm for all energies. ImageJ software, developed at the National Institutes of Health (NIH), was used to measure the CT numbers of each calcium concentration from the CT images, and the results were verified against a developed algorithm; the CT numbers measured by the developed algorithm and by ImageJ showed only small percentage differences. The multi-thresholding algorithm was able to distinguish different concentrations of calcium chloride. However, it was unable to separate low concentrations of calcium chloride from iron (III) nitrate at CT numbers between 25 HU and 65 HU. The developed thresholding method may help to differentiate calcium plaques from other types of plaques in blood vessels, as it proved able to detect high concentrations of calcium chloride. However, the algorithm needs to be improved to overcome its limitation in distinguishing calcium chloride solutions whose CT numbers are similar to those of iron (III) nitrate solution.
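    Multi-thresholding on CT numbers amounts to mapping HU values into bands; a minimal sketch with invented thresholds (the ambiguous 25-65 HU band mirrors the limitation reported above, but the band edges are not taken from the study).

```python
def classify_hu(hu, thresholds):
    """Multi-thresholding sketch: map a CT number (HU) to the first band
    whose upper bound it falls below."""
    for label, upper in thresholds:
        if hu < upper:
            return label
    return "calcium_high"

# Hypothetical bands; the low-HU band is where calcium chloride and
# iron (III) nitrate solutions overlap and cannot be separated
bands = [("water_like", 25), ("ambiguous_low", 65), ("calcium_mid", 200)]
low = classify_hu(10, bands)
overlap = classify_hu(40, bands)
high = classify_hu(300, bands)
```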

  5. Hip and Wrist Accelerometer Algorithms for Free-Living Behavior Classification.

    Science.gov (United States)

    Ellis, Katherine; Kerr, Jacqueline; Godbole, Suneeta; Staudenmayer, John; Lanckriet, Gert

    2016-05-01

    Accelerometers are a valuable tool for objective measurement of physical activity (PA). Wrist-worn devices may improve compliance over standard hip placement, but more research is needed to evaluate their validity for measuring PA in free-living settings. Traditional cut-point methods for accelerometers can be inaccurate and need testing in free living with wrist-worn devices. In this study, we developed and tested the performance of machine learning (ML) algorithms for classifying PA types from both hip and wrist accelerometer data. Forty overweight or obese women (mean age = 55.2 ± 15.3 yr; BMI = 32.0 ± 3.7) wore two ActiGraph GT3X+ accelerometers (right hip, nondominant wrist; ActiGraph, Pensacola, FL) for seven free-living days. Wearable cameras captured ground truth activity labels. A classifier consisting of a random forest and hidden Markov model classified the accelerometer data into four activities (sitting, standing, walking/running, and riding in a vehicle). Free-living wrist and hip ML classifiers were compared with each other, with traditional accelerometer cut points, and with an algorithm developed in a laboratory setting. The ML classifier obtained average values of 89.4% and 84.6% balanced accuracy over the four activities using the hip and wrist accelerometer, respectively. In our data set with average values of 28.4 min of walking or running per day, the ML classifier predicted average values of 28.5 and 24.5 min of walking or running using the hip and wrist accelerometer, respectively. Intensity-based cut points and the laboratory algorithm significantly underestimated walking minutes. Our results demonstrate the superior performance of our PA-type classification algorithm, particularly in comparison with traditional cut points. Although the hip algorithm performed better, additional compliance achieved with wrist devices might justify using a slightly lower performing algorithm.
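    The hidden-Markov smoothing step of such a classifier can be sketched with Viterbi decoding over per-window class probabilities (the two states, emission probabilities and sticky transition matrix below are invented; the study's forest outputs four activities).

```python
import numpy as np

def viterbi(emissions, transition, init):
    """Viterbi decoding: find the most probable state path given
    per-window class probabilities and transition priors."""
    T, K = emissions.shape
    logd = np.log(init) + np.log(emissions[0])
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = logd[:, None] + np.log(transition)   # (from, to)
        back[t] = cand.argmax(0)
        logd = cand.max(0) + np.log(emissions[t])
    path = [int(logd.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Two states (0 = sitting, 1 = walking); sticky transitions suppress the
# one-window classifier blip at t = 2
emis = np.array([[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1], [0.9, 0.1]])
trans = np.array([[0.95, 0.05], [0.05, 0.95]])
states = viterbi(emis, trans, init=np.array([0.5, 0.5]))
```

    This is why combining a frame-level classifier with an HMM reduces spurious single-window activity switches in free-living data.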

  6. Optimal Combination of Classification Algorithms and Feature Ranking Methods for Object-Based Classification of Submeter Resolution Z/I-Imaging DMC Imagery

    Directory of Open Access Journals (Sweden)

    Fulgencio Cánovas-García

    2015-04-01

    Full Text Available Object-based image analysis allows several different features to be calculated for the resulting objects. However, a large number of features means longer computing times and might even result in a loss of classification accuracy. In this study, we use four feature ranking methods (maximum correlation, average correlation, Jeffries–Matusita distance and mean decrease in the Gini index) and five classification algorithms (linear discriminant analysis, naive Bayes, weighted k-nearest neighbors, support vector machines and random forest). The objective is to discover the optimal algorithm and feature subset to maximize accuracy when classifying a set of 1,076,937 objects, produced by the prior segmentation of a 0.45-m resolution multispectral image, with 356 features calculated on each object. The study area is both large (9,070 ha) and diverse, which increases the possibility to generalize the results. The mean decrease in the Gini index was found to be the feature ranking method that provided the highest accuracy for all of the classification algorithms. In addition, support vector machines and random forest obtained the highest accuracy in the classification, both using their default parameters. This is a useful result that could be taken into account in the processing of high-resolution images in large and diverse areas to obtain a land cover classification.
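    The "mean decrease in the Gini index" ranks features by the impurity reduction they provide; a simplified single-split version on synthetic data is sketched below (a real random forest averages this decrease over many trees and nodes).

```python
import numpy as np

def gini(y):
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - (p ** 2).sum()

def gini_decrease(x, y):
    """Best single-split decrease in Gini impurity for one feature,
    a simplified stand-in for the forest-averaged importance."""
    best = 0.0
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        w = len(left) / len(y)
        best = max(best, gini(y) - w * gini(left) - (1 - w) * gini(right))
    return best

# One feature correlated with the class, one pure-noise feature
rng = np.random.default_rng(3)
y = np.repeat([0, 1], 50)
useful = y + rng.normal(0.0, 0.3, 100)
noisy = rng.normal(0.0, 1.0, 100)
scores = [("useful", gini_decrease(useful, y)),
          ("noisy", gini_decrease(noisy, y))]
ranked = sorted(scores, key=lambda kv: -kv[1])
```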

  7. Semi-Supervised Learning Based on Manifold in BCI

    Institute of Scientific and Technical Information of China (English)

    Ji-Ying Zhong; Xu Lei; De-Zhong Yao

    2009-01-01

    A Laplacian support vector machine (LapSVM) algorithm, a semi-supervised learning method based on manifolds, is introduced to brain-computer interfaces (BCI) to raise classification precision and reduce the subjects' training complexity. The data are collected from three subjects in a three-task mental imagery experiment. LapSVM and transductive SVM (TSVM) are trained with a few labeled samples and a large number of unlabeled samples. The results confirm that LapSVM achieves much better classification than TSVM.

  8. Fuzzy Classification of Ocean Color Satellite Data for Bio-optical Algorithm Constituent Retrievals

    Science.gov (United States)

    Campbell, Janet W.

    1998-01-01

    The ocean has traditionally been viewed as a two-class system. Morel and Prieur (1977) classified ocean water according to the dominant absorbing particles suspended in the water column: case 1 waters have a high concentration of phytoplankton (and detritus) relative to other particles, whereas case 2 waters have high concentrations of inorganic particles such as suspended sediments. Little work has gone into the problem of mixing bio-optical models for these different water types. An approach is put forth here to blend bio-optical algorithms based on a fuzzy classification scheme. This scheme involves two procedures. First, a clustering procedure identifies classes and builds class statistics from in-situ optical measurements. Next, a classification procedure assigns satellite pixels partial memberships in these classes based on their ocean color reflectance signature. These membership assignments can then be used as the basis for weighting retrievals from class-specific bio-optical algorithms. The technique is demonstrated with in-situ optical measurements and an image from the SeaWiFS ocean color satellite.
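    Fuzzy blending of class-specific retrievals can be sketched by weighting each class algorithm with the pixel's membership grade; the one-dimensional Gaussian class models and the two retrieval functions below are invented placeholders, not the study's bio-optical models.

```python
import numpy as np

def fuzzy_blend(x, classes, retrievals):
    """Weight class-specific retrieval algorithms by the pixel's
    (normalized) membership in each water class."""
    weights = np.array([np.exp(-0.5 * ((x - m) / s) ** 2) for m, s in classes])
    weights /= weights.sum()
    return sum(w * r(x) for w, r in zip(weights, retrievals))

# Two hypothetical water classes over a single reflectance feature
classes = [(0.2, 0.1), (0.8, 0.1)]        # (mean, std) of class signature
retrievals = [lambda x: 10 * x,           # case 1 style algorithm
              lambda x: 2 * x + 1]        # case 2 style algorithm
mid = fuzzy_blend(0.5, classes, retrievals)  # equidistant: equal weights
```

    A pixel squarely inside one class is retrieved almost entirely by that class's algorithm, while boundary pixels get a smooth mixture instead of a hard switch.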

  9. A HYBRID CLASSIFICATION ALGORITHM TO CLASSIFY ENGINEERING STUDENTS’ PROBLEMS AND PERKS

    Directory of Open Access Journals (Sweden)

    Mitali Desai

    2016-03-01

    Full Text Available Social networking sites have opened a new horizon for individuals to express their views and opinions. Moreover, they provide a medium for students to share their sentiments, including struggles and joys, during the learning process. Such informal information offers a valuable venue for decision making. The large and growing scale of this information calls for automatic classification techniques. Sentiment analysis is one of the automated techniques for classifying large data. Existing predictive sentiment analysis techniques are widely used to classify reviews on e-commerce sites to provide business intelligence. However, they are of limited use for decision making in education systems, since they classify sentiments into merely three pre-set categories: positive, negative and neutral. Moreover, classifying students' sentiments into a positive or negative category does not provide deeper insight into their problems and perks. In this paper, we propose a novel Hybrid Classification Algorithm to classify engineering students' sentiments. Unlike traditional predictive sentiment analysis techniques, the proposed algorithm makes the sentiment analysis process descriptive. Moreover, it classifies engineering students' perks, in addition to problems, into several categories to help future students and the education system in decision making.

  10. Fast Algorithm for Vectorcardiogram and Interbeat Intervals Analysis: Application for Premature Ventricular Contractions Classification

    Directory of Open Access Journals (Sweden)

    Irena Jekova

    2005-12-01

    Full Text Available In this study we investigated the adequacy of two non-orthogonal ECG leads from Holter recordings to provide reliable vectorcardiogram (VCG) parameters. The VCG loop was constructed using the QRS samples in a fixed-size window around the fiducial point. We developed an algorithm for fast approximation of the VCG loop, estimation of its area, and calculation of relative VCG characteristics that are expected to be minimally dependent on patient individuality and ECG recording conditions. Moreover, to obtain temporal QRS characteristics independent of heart rate, we introduced a parameter that estimates the differences between interbeat RR intervals. Statistical assessment of the proposed VCG and RR interval parameters showed distinct distributions for N and PVC beats. The reliability of the extracted parameter set for PVC detection was estimated independently with two classification methods, a stepwise discriminant analysis and a decision-tree-like classification algorithm, using the publicly available MIT-BIH arrhythmia database. The stepwise discriminant analysis achieved a sensitivity of 91% and a specificity of 95.6%, while the decision-tree-like technique assured a sensitivity of 93.3% and a specificity of 94.6%. We suggest possibilities for accuracy improvement through adequate electrode placement of the Holter leads, supplementary analysis of the type of the predominant beats in the reference VCG matrix, and a smaller step for the VCG loop approximation.

  11. HaploGrep: a fast and reliable algorithm for automatic classification of mitochondrial DNA haplogroups.

    Science.gov (United States)

    Kloss-Brandstätter, Anita; Pacher, Dominic; Schönherr, Sebastian; Weissensteiner, Hansi; Binna, Robert; Specht, Günther; Kronenberg, Florian

    2011-01-01

    An ongoing source of controversy in mitochondrial DNA (mtDNA) research is the detection of numerous errors in mtDNA profiles that have led to erroneous conclusions and false disease associations. Most of these controversies could be avoided if the samples' haplogroup status were taken into consideration. Knowing the mtDNA haplogroup affiliation is a critical prerequisite for studying mechanisms of human evolution and discovering genes involved in complex diseases, and validating phylogenetic consistency using haplogroup classification is an important step in quality control. However, despite the availability of Phylotree, a regularly updated classification tree of global mtDNA variation, the process of haplogroup classification is still time-consuming and error-prone, as researchers have to manually compare the polymorphisms found in a population sample to those summarized in Phylotree, polymorphism by polymorphism, sample by sample. We present HaploGrep, a fast, reliable and straightforward algorithm implemented in a Web application to determine the haplogroup affiliation of thousands of mtDNA profiles genotyped for the entire mtDNA or any part of it. HaploGrep uses the latest version of Phylotree and offers an all-in-one solution for quality assessment of mtDNA profiles in clinical genetics, population genetics and forensics. HaploGrep can be accessed freely at http://haplogrep.uibk.ac.at.

  12. Genetic algorithm for the optimization of features and neural networks in ECG signals classification

    Science.gov (United States)

    Li, Hongqiang; Yuan, Danyang; Ma, Xiangdong; Cui, Dianyin; Cao, Lu

    2017-01-01

    Feature extraction and classification of electrocardiogram (ECG) signals are necessary for the automatic diagnosis of cardiac diseases. In this study, a novel method based on genetic algorithm-back propagation neural network (GA-BPNN) for classifying ECG signals with feature extraction using wavelet packet decomposition (WPD) is proposed. WPD combined with the statistical method is utilized to extract the effective features of ECG signals. The statistical features of the wavelet packet coefficients are calculated as the feature sets. GA is employed to decrease the dimensions of the feature sets and to optimize the weights and biases of the back propagation neural network (BPNN). Thereafter, the optimized BPNN classifier is applied to classify six types of ECG signals. In addition, an experimental platform is constructed for ECG signal acquisition to supply the ECG data for verifying the effectiveness of the proposed method. The GA-BPNN method with the MIT-BIH arrhythmia database achieved a dimension reduction of nearly 50% and produced good classification results with an accuracy of 97.78%. The experimental results based on the established acquisition platform indicated that the GA-BPNN method achieved a high classification accuracy of 99.33% and could be efficiently applied in the automatic identification of cardiac arrhythmias.
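    The GA's dimension-reduction role above, shrinking a binary feature mask while a fitness function scores candidate subsets, can be sketched with a toy genetic algorithm. The operators, parameters and fitness function here are illustrative assumptions; the paper's GA additionally optimizes the BPNN weights and biases:

```python
import random

def ga_feature_selection(fitness, n_features, pop_size=20, generations=30, seed=0):
    """Toy GA over binary feature masks: truncation selection with
    elitism, one-point crossover, and single-bit mutation. `fitness`
    scores a mask (higher is better)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # elitist truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_features)    # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_features)         # single-bit mutation
            child[i] ^= 1
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)
```

    With a fitness that rewards the first four of eight features and penalizes the rest, the search converges toward masks that keep the informative half.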

  13. Sow-activity classification from acceleration patterns

    DEFF Research Database (Denmark)

    Escalante, Hugo Jair; Rodriguez, Sara V.; Cordero, Jorge

    2013-01-01

    This paper describes a supervised learning approach to sow-activity classification from accelerometer measurements. In the proposed methodology, pairs of accelerometer measurements and activity types are considered as labeled instances of a usual supervised classification task. Under this scenario...... sow-activity classification can be approached with standard machine learning methods for pattern classification. Individual predictions for elements of time series of arbitrary length are combined to classify it as a whole. An extensive comparison of representative learning algorithms, including...... neural networks, support vector machines, and ensemble methods, is presented. Experimental results are reported using a data set for sow-activity classification collected in a real production herd. The data set, which has been widely used in related works, includes measurements from active (Feeding...
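    The combination step above, turning per-window predictions into one label for a whole time series, can be done in several ways; a majority vote is one plausible rule, shown here as a hedged sketch rather than the paper's specific scheme:

```python
from collections import Counter

def classify_series(window_predictions):
    """Classify a whole time series by majority vote over the
    per-window activity predictions; ties resolve to the label
    that first reached the winning count."""
    counts = Counter(window_predictions)
    return counts.most_common(1)[0][0]
```

    For example, a series whose windows were predicted as feeding, feeding, walking is labeled feeding as a whole.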

  14. Discrimination of Rice Varieties using LS-SVM Classification Algorithms and Hyperspectral Data

    Directory of Open Access Journals (Sweden)

    Jin Xiaming

    2015-03-01

    Full Text Available Fast discrimination of rice varieties plays a key role in the rice processing industry and benefits the management of rice in the supermarket. In order to discriminate rice varieties in a fast and nondestructive way, hyperspectral technology and several classification algorithms were used in this study. The hyperspectral data of 250 rice samples of 5 varieties were obtained using a FieldSpec®3 spectrometer. Multiplicative Scatter Correction (MSC) was used to preprocess the raw spectra. Principal Component Analysis (PCA) was used to reduce the dimension of the raw spectra. To investigate the influence of different linear and non-linear classification algorithms on the discrimination results, K-Nearest Neighbors (KNN), Support Vector Machine (SVM) and Least Square Support Vector Machine (LS-SVM) were used to develop the discrimination models respectively. Then the performances of these three multivariate classification methods were compared according to the discrimination accuracy. The number of Principal Components (PCs), the K parameter of KNN, and the kernel function of SVM or LS-SVM were optimized by cross-validation in the corresponding models. One hundred and twenty-five rice samples (25 of each variety) were chosen as the calibration set and the remaining 125 rice samples formed the prediction set. The experimental results showed that the optimal number of PCs was 8 and the cross-validation accuracies of KNN (K = 2), SVM and LS-SVM were 94.4, 96.8 and 100%, respectively, while the prediction accuracies of KNN (K = 2), SVM and LS-SVM were 89.6, 93.6 and 100%, respectively. The results indicated that LS-SVM performed the best in the discrimination of rice varieties.
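    The PCA-then-classify pipeline above can be sketched in pure NumPy; this is a minimal stand-in (SVD-based PCA plus a KNN vote in score space), not the study's tuned models, and the toy spectra below are invented:

```python
import numpy as np

def pca_fit(X, n_components):
    """Fit PCA on calibration spectra: center the data, then keep the
    top principal directions from the SVD of the centered matrix."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def knn_predict(train_scores, train_labels, test_scores, k=2):
    """k-nearest-neighbour majority vote in PCA score space."""
    preds = []
    for t in test_scores:
        idx = np.argsort(np.linalg.norm(train_scores - t, axis=1))[:k]
        vals, counts = np.unique(train_labels[idx], return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)
```

    Usage mirrors the paper's split: fit PCA on the calibration set, project both sets into score space, then classify the prediction set by KNN.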

  15. Contracted Nose after Silicone Implantation: A New Classification System and Treatment Algorithm

    Science.gov (United States)

    Kim, Yong Kyu; Shin, Seungho; Kim, Joo Heon

    2017-01-01

    Background Silicone implants are frequently used in augmentation rhinoplasty in Asians. A common complication of silicone augmentation rhinoplasty is capsular contracture. This is similar to the capsular contracture after augmentation mammoplasty, but a classification for secondary contracture after augmentation rhinoplasty with silicone implants has not yet been established, and treatment algorithms by grade or severity have yet to be developed. Methods Photographs of 695 patients who underwent augmentation rhinoplasty with a silicone implant from May 2001 to May 2015 were analyzed. The mean observation period was 11.4 months. Of the patients, 81 were male and 614 were female, with a mean age of 35.9 years. Grades were assigned according to postoperative appearance. Grade I was a natural appearance, as if an implant had not been inserted. Grade II was an unnatural lateral margin of the implant. Clearly identifiable implant deviation was classified as grade III, and short nose deformation was grade IV. Results Grade I outcomes were found in 498 patients (71.7%), grade II outcomes in 101 (14.5%), grade III outcomes in 75 (10.8%), and grade IV outcomes in 21 patients (3.0%). Revision surgery was indicated for the 13.8% of all patients who had grade III or IV outcomes. Conclusions It is important to clinically classify the deformations due to secondary contracture after surgery and to establish treatment algorithms to improve scientific communication among rhinoplasty surgeons. In this study, we suggest guidelines for the clinical classification of secondary capsular contracture after augmentation rhinoplasty, and also propose a treatment algorithm. PMID:28194349

  16. Contracted Nose after Silicone Implantation: A New Classification System and Treatment Algorithm

    Directory of Open Access Journals (Sweden)

    Yong Kyu Kim

    2017-01-01

    Full Text Available Background Silicone implants are frequently used in augmentation rhinoplasty in Asians. A common complication of silicone augmentation rhinoplasty is capsular contracture. This is similar to the capsular contracture after augmentation mammoplasty, but a classification for secondary contracture after augmentation rhinoplasty with silicone implants has not yet been established, and treatment algorithms by grade or severity have yet to be developed. Methods Photographs of 695 patients who underwent augmentation rhinoplasty with a silicone implant from May 2001 to May 2015 were analyzed. The mean observation period was 11.4 months. Of the patients, 81 were male and 614 were female, with a mean age of 35.9 years. Grades were assigned according to postoperative appearance. Grade I was a natural appearance, as if an implant had not been inserted. Grade II was an unnatural lateral margin of the implant. Clearly identifiable implant deviation was classified as grade III, and short nose deformation was grade IV. Results Grade I outcomes were found in 498 patients (71.7%), grade II outcomes in 101 (14.5%), grade III outcomes in 75 (10.8%), and grade IV outcomes in 21 patients (3.0%). Revision surgery was indicated for the 13.8% of all patients who had grade III or IV outcomes. Conclusions It is important to clinically classify the deformations due to secondary contracture after surgery and to establish treatment algorithms to improve scientific communication among rhinoplasty surgeons. In this study, we suggest guidelines for the clinical classification of secondary capsular contracture after augmentation rhinoplasty, and also propose a treatment algorithm.

  17. Consensus embedding: theory, algorithms and application to segmentation and classification of biomedical data

    Directory of Open Access Journals (Sweden)

    Viswanath Satish

    2012-02-01

    of high-dimensional biomedical data classification and segmentation problems. Our generalizable framework allows for improved representation and classification in the context of both imaging and non-imaging data. The algorithm offers a promising solution to problems that currently plague DR methods, and may allow for extension to other areas of biomedical data analysis.

  18. HAMA-Based Semi-Supervised Hashing Algorithm

    Institute of Scientific and Technical Information of China (English)

    刘扬; 朱明

    2014-01-01

    In massive data retrieval applications, hashing-based approximate nearest neighbor (ANN) search has become popular due to its computational and memory efficiency for online search. Semi-supervised hashing (SSH) is a framework that minimizes the empirical error over the labeled set together with an information-theoretic regularizer over both the labeled and unlabeled sets, combining the regularization of unsupervised hashing with the ability of supervised methods to bridge the semantic gap. However, the offline training of the hashing function in this framework is very slow, because a complex training process must run over the entire data set. HAMA is a parallel computing framework built on top of Hadoop according to the Bulk Synchronous Parallel (BSP) model. In this paper, we analyze the computation of the adjusted covariance matrix in the SSH training process, split it into two parts, an unsupervised data-variance part and a supervised pairwise-labeled-data part, and parallelize each part on HAMA. This makes the training horizontally scalable on large commodity computing clusters and hence applicable in practice. Experiments show that the distributed algorithm effectively improves performance and scales on commodity hardware and network environments.
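    The covariance split at the heart of the abstract can be sketched as follows. The exact weighting and normalisation are assumptions; the point is that the unsupervised term and the supervised pairwise term are independent sums, which is what makes a BSP-style parallel computation of each part possible:

```python
import numpy as np

def adjusted_covariance(X, pos_pairs, neg_pairs, eta=1.0):
    """Schematic SSH 'adjusted covariance': an unsupervised variance
    term over all data plus a supervised term over labeled pairs.
    pos_pairs pull similar points together; neg_pairs push dissimilar
    points apart. Each term can be accumulated on a separate worker."""
    unsup = X.T @ X                        # variance over all (unlabeled) data
    sup = np.zeros_like(unsup)
    for i, j in pos_pairs:                 # similar (same-label) pairs
        sup += np.outer(X[i], X[j]) + np.outer(X[j], X[i])
    for i, j in neg_pairs:                 # dissimilar pairs
        sup -= np.outer(X[i], X[j]) + np.outer(X[j], X[i])
    return unsup + eta * sup
```

    Since both terms are symmetric, the adjusted matrix stays symmetric, so its eigenvectors (the hash projections) remain well defined.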

  19. Classification of biological cells using bio-inspired descriptors

    OpenAIRE

    Bel Haj Ali, Wafa; Giampaglia, Dario; Barlaud, Michel; Piro, Paolo; Nock, Richard; Pourcher, Thierry

    2012-01-01

    International audience; This paper proposes a novel automated approach for the categorization of cells in fluorescence microscopy images. Our supervised classification method aims at recognizing patterns of unlabeled cells based on an annotated dataset. First, the cell images need to be indexed by encoding them in a feature space. For this purpose, we propose tailored bio-inspired features relying on the distribution of contrast information. Then, a supervised learning algorithm is proposed f...

  20. A Study on Approaches of Classification Supervision on Property Insurance Companies with the Analytic Hierarchy Model

    Institute of Scientific and Technical Information of China (English)

    王智鑫; 罗军; 龙胤

    2012-01-01

    According to the Interim Measures for Classification Supervision of Insurance Company Branches issued by the China Insurance Regulatory Commission and the Measures for Classification Supervision of Shaanxi Provincial Insurance Companies (Consultative Draft) issued by the Shaanxi Bureau of the CIRC, this paper takes 2010 data from secondary-level branches of property insurance companies in Shaanxi province, analyzes and evaluates the relevant indicators with an analytic hierarchy model, and compares the results with those of the Shaanxi Bureau's draft measures in order to assess the validity and scientific soundness of the classification supervision method for provincial branches of property insurance companies. It actively explores new methods and ideas for their classification, and provides a scientific methodological basis and theoretical reference for establishing and improving the classification supervision system for secondary-level institutions of property insurance companies.
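    The analytic hierarchy model used above rests on deriving priority weights for the supervision indicators from a pairwise-comparison matrix; the standard eigenvector computation is sketched below (the paper's specific indicator hierarchy and judgment matrices are not reproduced):

```python
import numpy as np

def ahp_weights(pairwise):
    """Priority weights from an AHP pairwise-comparison matrix: the
    normalized principal eigenvector of the (positive, reciprocal)
    matrix gives each criterion's relative weight."""
    A = np.asarray(pairwise, dtype=float)
    vals, vecs = np.linalg.eig(A)
    k = np.argmax(vals.real)               # principal eigenvalue
    w = np.abs(vecs[:, k].real)
    return w / w.sum()
```

    For a perfectly consistent two-criterion matrix where the first criterion is judged twice as important as the second, the weights come out as 2/3 and 1/3.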