WorldWideScience

Sample records for classification model based

  1. Cluster Based Text Classification Model

    DEFF Research Database (Denmark)

    Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock

    2011-01-01

    We propose a cluster based classification model for suspicious email detection and other text classification tasks. The text classification tasks comprise many training examples that require a complex classification model. Using clusters for classification makes the model simpler and increases...
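    The two-stage idea described above -- cluster the training set first, then classify within the chosen cluster -- can be sketched as follows. This is an illustrative reconstruction on synthetic 2-D "document feature" vectors, not the authors' code.

```python
import numpy as np

# Illustrative sketch of cluster-based classification (synthetic data):
# cluster the training set, then answer queries in two stages --
# pick the nearest cluster, then run 1-NN within that cluster only.
rng = np.random.default_rng(0)

X0 = rng.normal([0.0, 0.0], 0.3, size=(20, 2))   # class-0 "documents"
X1 = rng.normal([5.0, 5.0], 0.3, size=(20, 2))   # class-1 "documents"
X = np.vstack([X0, X1])
y = np.array([0] * 20 + [1] * 20)

# Cluster centroids (here simply one cluster per class).
centroids = np.array([X0.mean(axis=0), X1.mean(axis=0)])
assign = np.argmin(
    np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2), axis=1)

def classify(q):
    # Stage 1: route the query to its nearest cluster.
    c = int(np.argmin(np.linalg.norm(centroids - q, axis=1)))
    idx = np.flatnonzero(assign == c)
    # Stage 2: 1-NN search restricted to that cluster's members,
    # which keeps each sub-model small and simple.
    j = idx[np.argmin(np.linalg.norm(X[idx] - q, axis=1))]
    return int(y[j])

print(classify(np.array([4.9, 5.1])))   # query near the second cluster -> 1
```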

  2. An Agent Based Classification Model

    CERN Document Server

    Gu, Feng; Greensmith, Julie

    2009-01-01

    The major function of this model is to access the UCI Wisconsin Breast Cancer data-set [1] and classify the data items into two categories: normal and anomalous. This kind of classification can be referred to as anomaly detection, which discriminates anomalous behaviour from normal behaviour in computer systems. One popular solution for anomaly detection is Artificial Immune Systems (AIS). AIS are adaptive systems inspired by theoretical immunology and observed immune functions, principles and models, which are applied to problem solving. The Dendritic Cell Algorithm (DCA) [2] is an AIS algorithm developed specifically for anomaly detection. It has been successfully applied to intrusion detection in computer security. It is believed that agent-based modelling is an ideal approach for implementing AIS, as intelligent agents could be the perfect representations of immune entities in AIS. This model evaluates the feasibility of re-implementing the DCA in an agent-based simulation environment...

  3. An Efficient Semantic Model For Concept Based Clustering And Classification

    Directory of Open Access Journals (Sweden)

    SaiSindhu Bandaru

    2012-03-01

    Usually in text mining techniques, basic measures such as the term frequency of a term (word or phrase) are computed to determine the importance of the term in the document. But with purely statistical analysis, the original semantics of the term may not carry its exact meaning. To overcome this problem, a new framework has been introduced which relies on a concept-based model and a synonym-based approach. The proposed model can efficiently find significant matching and related concepts between documents according to concept-based and synonym-based approaches. Large sets of experiments using the proposed model on different data sets in clustering and classification are conducted. Experimental results demonstrate the substantial enhancement of the clustering quality using sentence-based, document-based, corpus-based and combined-approach concept analysis. A new similarity measure has been proposed to find the similarity between a document and the existing clusters, which can be used in classification of the document against the existing clusters.

  4. Choice-Based Conjoint Analysis: Classification vs. Discrete Choice Models

    Science.gov (United States)

    Giesen, Joachim; Mueller, Klaus; Taneva, Bilyana; Zolliker, Peter

    Conjoint analysis is a family of techniques that originated in psychology and later became popular in market research. The main objective of conjoint analysis is to measure an individual's or a population's preferences on a class of options that can be described by parameters and their levels. We consider preference data obtained in choice-based conjoint analysis studies, where one observes test persons' choices on small subsets of the options. There are many ways to analyze choice-based conjoint analysis data. Here we discuss the intuition behind a classification based approach, and compare this approach to one based on statistical assumptions (discrete choice models) and to a regression approach. Our comparison on real and synthetic data indicates that the classification approach outperforms the discrete choice models.

  5. Sparse Representation Based Binary Hypothesis Model for Hyperspectral Image Classification

    Directory of Open Access Journals (Sweden)

    Yidong Tang

    2016-01-01

    The sparse representation based classifier (SRC) and its kernel version (KSRC) have been employed for hyperspectral image (HSI) classification. However, the state-of-the-art SRC often aims at extended surface objects with linear mixture in smooth scenes and assumes that the number of classes is given. Considering small targets with complex backgrounds, a sparse representation based binary hypothesis (SRBBH) model is established in this paper. In this model, a query pixel is represented in two ways: by the background dictionary and by the union dictionary. The background dictionary is composed of samples selected from the local dual concentric window centered at the query pixel. Thus, for each pixel the classification issue becomes an adaptive multiclass classification problem, where only the number of desired classes is required. Furthermore, the kernel method is employed to improve interclass separability. In kernel space, the coding vector is obtained using the kernel-based orthogonal matching pursuit (KOMP) algorithm. Then the query pixel can be labeled by the characteristics of the coding vectors. Instead of directly using the reconstruction residuals, the different impacts the background dictionary and the union dictionary have on reconstruction are used for validation and classification. This enhances the discrimination and hence improves performance.
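    The binary-hypothesis test at the heart of the model can be illustrated as below. Ordinary least squares stands in for the kernel OMP coding step, the dictionaries and the query pixel are synthetic, and the 0.5 residual-ratio threshold is an arbitrary choice for the sketch.

```python
import numpy as np

# Sketch of the binary-hypothesis residual test: reconstruct a query pixel
# with the background dictionary alone, and with the union dictionary
# (background + target); a much smaller union residual indicates a target.
rng = np.random.default_rng(1)
d = 30                               # number of spectral bands (synthetic)
B = rng.normal(size=(d, 10))         # background dictionary (10 atoms)
t = rng.normal(size=(d, 1))          # one target signature
U = np.hstack([B, t])                # union dictionary

def residual(D, x):
    # Least-squares coding (stand-in for kernel OMP) and its residual norm.
    a, *_ = np.linalg.lstsq(D, x, rcond=None)
    return np.linalg.norm(x - D @ a)

# A pixel containing the target plus some background and a little noise:
x = 0.9 * t[:, 0] + 0.3 * B[:, 0] + 0.01 * rng.normal(size=d)

r_b = residual(B, x)                 # background-only reconstruction
r_u = residual(U, x)                 # union reconstruction
is_target = r_u < 0.5 * r_b          # hypothesis test on the residual ratio
```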

  6. A Fuzzy Similarity Based Concept Mining Model for Text Classification

    CERN Document Server

    Puri, Shalini

    2012-01-01

    Text classification is a challenging and currently very active field, with great importance in text categorization applications. A lot of research work has been done in this field, but there is a need to categorize a collection of text documents into mutually exclusive categories by extracting the concepts or features using a supervised learning paradigm and different classification algorithms. In this paper, a new Fuzzy Similarity Based Concept Mining Model (FSCMM) is proposed to classify a set of text documents into pre-defined Category Groups (CG) by training and preparing them on the sentence, document and integrated corpora levels, along with feature reduction and ambiguity removal on each level to achieve high system performance. A Fuzzy Feature Category Similarity Analyzer (FFCSA) is used to analyze each extracted feature of the Integrated Corpora Feature Vector (ICFV) against the corresponding categories or classes. This model uses a Support Vector Machine Classifier (SVMC) to classify correct...

  7. A Fuzzy Similarity Based Concept Mining Model for Text Classification

    Directory of Open Access Journals (Sweden)

    Shalini Puri

    2011-11-01

    Text classification is a challenging and currently very active field, with great importance in text categorization applications. A lot of research work has been done in this field, but there is a need to categorize a collection of text documents into mutually exclusive categories by extracting the concepts or features using a supervised learning paradigm and different classification algorithms. In this paper, a new Fuzzy Similarity Based Concept Mining Model (FSCMM) is proposed to classify a set of text documents into pre-defined Category Groups (CG) by training and preparing them on the sentence, document and integrated corpora levels, along with feature reduction and ambiguity removal on each level to achieve high system performance. A Fuzzy Feature Category Similarity Analyzer (FFCSA) is used to analyze each extracted feature of the Integrated Corpora Feature Vector (ICFV) against the corresponding categories or classes. This model uses a Support Vector Machine Classifier (SVMC) to classify the training data patterns correctly into two groups, i.e., +1 and −1, thereby producing accurate results. The proposed model works efficiently and effectively, with great performance and high-accuracy results.
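    The final stage -- mapping term-frequency features onto the two category groups +1 and −1 with a linear classifier -- can be sketched as follows. The vocabulary and documents are invented, and a perceptron stands in for the paper's SVM training.

```python
import numpy as np

# Toy sketch of the last stage of the pipeline: bag-of-words term
# frequencies fed to a linear classifier separating two category
# groups (+1 / -1). Vocabulary and documents are invented.
vocab = ["goal", "match", "team", "stock", "market", "price"]
docs = [
    ("goal match team goal", +1),      # sports
    ("team match goal", +1),
    ("stock market price", -1),        # finance
    ("market price stock stock", -1),
]

def tf(text):
    # Term-frequency vector over the fixed vocabulary.
    words = text.split()
    return np.array([words.count(w) for w in vocab], float)

X = np.array([tf(t) for t, _ in docs])
y = np.array([lab for _, lab in docs], float)

# A few epochs of the perceptron rule stand in for SVM training here.
w, b = np.zeros(len(vocab)), 0.0
for _ in range(10):
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:
            w += yi * xi
            b += yi

pred = np.sign(tf("goal team") @ w + b)   # -> +1.0 (the sports group)
```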

  8. TENSOR MODELING BASED FOR AIRBORNE LiDAR DATA CLASSIFICATION

    Directory of Open Access Journals (Sweden)

    N. Li

    2016-06-01

    Feature selection and description is a key factor in classification of Earth observation data. In this paper a classification method based on tensor decomposition is proposed. First, multiple features are extracted from the raw LiDAR point cloud, and raster LiDAR images are derived by accumulating the features or the “raw” data attributes. Then, the feature rasters of the LiDAR data are stored as a tensor, and tensor decomposition is used to select component features. This tensor representation keeps the initial spatial structure and ensures that the neighborhood is taken into account. Based on a small number of component features, a k-nearest-neighbor classification is applied.
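    A minimal sketch of this pipeline on synthetic rasters: stack the feature rasters into a tensor, unfold along the feature mode, keep the leading components, then run k-NN on the per-pixel component features. A plain truncated SVD of the unfolding stands in for the paper's tensor decomposition.

```python
import numpy as np

# Synthetic stand-in for the LiDAR pipeline: feature rasters as a tensor,
# component features via truncated SVD of the feature-mode unfolding,
# then k-nearest-neighbour labelling from sparsely labelled pixels.
rng = np.random.default_rng(2)
H, W, F = 8, 8, 6                      # raster height, width, feature bands

T = rng.normal(size=(H, W, F))
T[:, : W // 2, :] += 3.0               # left half carries a class signature
labels = np.zeros((H, W), dtype=int)
labels[:, : W // 2] = 1

M = T.reshape(-1, F)                   # mode-3 unfolding: pixels x features
y = labels.reshape(-1)
Mc = M - M.mean(axis=0)
_, _, Vt = np.linalg.svd(Mc, full_matrices=False)
Z = Mc @ Vt[:2].T                      # keep 2 component features per pixel

train = np.arange(0, len(Z), 2)        # every other pixel is labelled
test = np.arange(1, len(Z), 2)

def knn_label(z, k=3):
    # Majority vote among the k nearest labelled pixels.
    d = np.linalg.norm(Z[train] - z, axis=1)
    nn = train[np.argsort(d)[:k]]
    return int(np.bincount(y[nn]).argmax())

acc = np.mean([knn_label(Z[i]) == y[i] for i in test])
```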

  9. Tensor Modeling Based for Airborne LiDAR Data Classification

    Science.gov (United States)

    Li, N.; Liu, C.; Pfeifer, N.; Yin, J. F.; Liao, Z. Y.; Zhou, Y.

    2016-06-01

    Feature selection and description is a key factor in classification of Earth observation data. In this paper a classification method based on tensor decomposition is proposed. First, multiple features are extracted from the raw LiDAR point cloud, and raster LiDAR images are derived by accumulating the features or the "raw" data attributes. Then, the feature rasters of the LiDAR data are stored as a tensor, and tensor decomposition is used to select component features. This tensor representation keeps the initial spatial structure and ensures that the neighborhood is taken into account. Based on a small number of component features, a k-nearest-neighbor classification is applied.

  10. State-Based Models for Light Curve Classification

    Science.gov (United States)

    Becker, A.

    I discuss here the application of continuous time autoregressive models to the characterization of astrophysical variability. These types of models are general enough to represent many classes of variability, and descriptive enough to provide features for light curve classification. Importantly, the features of these models may be interpreted in terms of the power spectrum of the light curve, enabling constraints on characteristic timescales and periodicity. These models may be extended to include vector-valued inputs, raising the prospect of a fully general modeling and classification environment that uses multi-passband inputs to create a single phenomenological model. These types of spectral-temporal models are an important extension of extant techniques, and necessary in the upcoming eras of Gaia and LSST.
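    On a regular time grid the simplest such model, CAR(1) (the "damped random walk"), reduces to a discrete AR(1) process, so its characteristic timescale can be recovered by a one-line regression. The light curve below is simulated, and τ = 20 samples is an assumed value for the sketch.

```python
import numpy as np

# CAR(1) sampled on a regular grid is AR(1): x[t] = a * x[t-1] + noise,
# with a = exp(-dt / tau). Fit a, then invert for the timescale tau.
rng = np.random.default_rng(8)
tau, dt, n = 20.0, 1.0, 5000
a_true = np.exp(-dt / tau)

x = np.zeros(n)
for t in range(1, n):
    x[t] = a_true * x[t - 1] + rng.normal()

# Least-squares estimate of the AR coefficient -> characteristic timescale,
# one of the interpretable features such models provide for classification.
a_hat = (x[:-1] @ x[1:]) / (x[:-1] @ x[:-1])
tau_hat = -dt / np.log(a_hat)
```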

  11. Semi-Supervised Classification based on Gaussian Mixture Model for remote imagery

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    Semi-Supervised Classification (SSC), which makes use of both labeled and unlabeled data to determine classification borders in feature space, has great advantages in extracting classification information from mass data. In this paper, a novel SSC method based on the Gaussian Mixture Model (GMM) is proposed, in which each class's feature space is described by one GMM. Experiments show the proposed method can achieve high classification accuracy with a small amount of labeled data. However, to reach the same accuracy, supervised classification methods such as Support Vector Machines, Object-Oriented Classification, etc. must be provided with much more labeled data.
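    The idea can be sketched with a deliberately simplified variant on synthetic data: each class's GMM collapses to a single Gaussian with equal spherical covariance, so the class model is just a mean, seeded by a few labeled points and refined with the unlabeled points through self-training rounds.

```python
import numpy as np

# Simplified semi-supervised sketch: two Gaussian classes, 5 labeled
# points each; unlabeled points refine the class means iteratively.
rng = np.random.default_rng(3)
n = 200
Xa = rng.normal([0, 0], 1.0, size=(n, 2))
Xb = rng.normal([4, 4], 1.0, size=(n, 2))
X = np.vstack([Xa, Xb])
y = np.array([0] * n + [1] * n)
labelled = np.r_[0:5, n : n + 5]           # only 5 labeled points per class

means = np.array([X[labelled[y[labelled] == c]].mean(axis=0) for c in (0, 1)])

for _ in range(5):                          # simple self-training rounds
    pred = np.argmin(((X[:, None, :] - means[None]) ** 2).sum(-1), axis=1)
    pred[labelled] = y[labelled]            # keep the labeled data fixed
    means = np.array([X[pred == c].mean(axis=0) for c in (0, 1)])

acc = (pred == y).mean()                    # accuracy over all points
```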

  12. Pitch Based Sound Classification

    DEFF Research Database (Denmark)

    Nielsen, Andreas Brinch; Hansen, Lars Kai; Kjems, U

    2006-01-01

    A sound classification model is presented that can classify signals into music, noise and speech. The model extracts the pitch of the signal using the harmonic product spectrum. Based on the pitch estimate and a pitch error measure, features are created and used in a probabilistic model with a soft-max output function. Both linear and quadratic inputs are used. The model is trained on 2 hours of sound and tested on publicly available data. A test classification error below 0.05 with 1 s classification windows is achieved. Furthermore, it is shown that linear input performs as well as quadratic, and that even though classification gets marginally better, not much is achieved by increasing the window size beyond 1 s.
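    The harmonic product spectrum step can be sketched in a few lines. A synthetic 220 Hz tone with three harmonics stands in for real audio; the sampling rate and number of downsampled copies are choices made for the sketch.

```python
import numpy as np

# Harmonic product spectrum (HPS) pitch estimate on a synthetic tone:
# multiply the magnitude spectrum with downsampled copies of itself so
# that the harmonics reinforce the fundamental.
fs = 8000
t = np.arange(fs) / fs                       # one second of signal
f0 = 220.0                                   # synthetic pitch
x = sum(np.sin(2 * np.pi * f0 * k * t) / k for k in (1, 2, 3))

spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
R = 3                                        # number of downsampled copies
hps = spec[: len(spec) // R].copy()
for r in range(2, R + 1):
    hps *= spec[::r][: len(hps)]             # harmonic k -> bin of f0

freqs = np.fft.rfftfreq(len(x), 1 / fs)[: len(hps)]
pitch = freqs[np.argmax(hps)]                # -> 220.0 Hz
```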

  13. About Classification Methods Based on Tensor Modelling for Hyperspectral Images

    Directory of Open Access Journals (Sweden)

    Salah Bourennane

    2010-03-01

    Denoising and Dimensionality Reduction (DR) are key issues in improving classifier efficiency for hyperspectral images (HSI). The recently developed multi-way Wiener filtering is used, and Principal Component Analysis (PCA), Independent Component Analysis (ICA) and Projection Pursuit (PP) approaches to DR have been investigated. These matrix algebra methods are applied to vectorized images, whereby the spatial arrangement is lost. To jointly take advantage of the spatial and spectral information, HSI has recently been represented as a tensor. Multilinear algebra offers multiple ways to decompose data orthogonally, and we introduce filtering and DR methods based on these tools. The DR is performed on the spectral way using PCA, or PP, joint to an orthogonal projection onto a lower-dimensional subspace of the spatial ways. We show the classification improvement of the introduced methods relative to existing methods. The experiments are exemplified using real-world HYDICE data. Keywords: multi-way filtering, dimensionality reduction, matrix and multilinear algebra tools, tensor processing.

  14. Objected-oriented remote sensing image classification method based on geographic ontology model

    Science.gov (United States)

    Chu, Z.; Liu, Z. J.; Gu, H. Y.

    2016-11-01

    Nowadays, with the development of high-resolution remote sensing imagery and the wide application of laser point cloud data, object-oriented remote sensing classification based on the characteristic knowledge of multi-source spatial data has become an important trend in the field of remote sensing image classification, gradually replacing the traditional approach of optimizing classification results through improved algorithms alone. For this purpose, the paper puts forward a remote sensing image classification method that uses the characteristic knowledge of multi-source spatial data to build a geographic ontology semantic network model, and carries out an object-oriented classification experiment on urban features. The experiment uses the Protégé software developed by Stanford University in the United States and the intelligent image analysis software eCognition as the platform, with hyperspectral imagery and LiDAR data acquired by flight over DaFeng City, JiangSu, as the main data sources. First, the hyperspectral imagery is used to obtain feature knowledge of the remote sensing image and related spectral indices. Second, the LiDAR data are used to generate an nDSM (Normalized Digital Surface Model) providing elevation information. Finally, the image feature knowledge, spectral indices and elevation information are combined to build the geographic ontology semantic network model that implements urban feature classification. The experimental results show that this method achieves significantly higher classification accuracy than traditional classification algorithms, particularly for building classification. The method not only exploits the advantages of multi-source spatial data such as remote sensing imagery and LiDAR data, but also realizes the integration and application of multi-source spatial data knowledge.

  15. The method of narrow-band audio classification based on universal noise background model

    Science.gov (United States)

    Rui, Rui; Bao, Chang-chun

    2013-03-01

    Audio classification is the basis of content-based audio analysis and retrieval. Conventional classification methods mainly depend on feature extraction over whole audio clips, which increases the time required for classification. An approach for classifying a narrow-band audio stream based on frame-level feature extraction is presented in this paper. The audio signals are divided into speech, instrumental music, song with accompaniment, and noise using a Gaussian mixture model (GMM). In order to cope with changing real-world environments, a universal noise background model (UNBM) covering white noise, street noise, factory noise and car interior noise is built. In addition, three feature schemes are considered to optimize feature selection. The experimental results show that the proposed algorithm achieves high accuracy for audio classification, especially under each of the noise backgrounds used, and keeps the classification time under one second.

  16. A unified classification model for modeling of seismic liquefaction potential of soil based on CPT.

    Science.gov (United States)

    Samui, Pijush; Hariharan, R

    2015-07-01

    The evaluation of the liquefaction potential of soil due to an earthquake is an important step in the geosciences. This article examines the capability of the Minimax Probability Machine (MPM) for the prediction of seismic liquefaction potential of soil based on Cone Penetration Test (CPT) data. The dataset has been taken from the Chi-Chi earthquake. MPM is developed based on the use of hyperplanes, and has been adopted here as a classification tool. This article uses two models (MODEL I and MODEL II). MODEL I employs cone resistance (qc) and Cyclic Stress Ratio (CSR) as input variables; qc and Peak Ground Acceleration (PGA) have been taken as inputs for MODEL II. The developed MPM gives 100% accuracy. The results show that the developed MPM can predict the liquefaction potential of soil based on qc and PGA.

  17. Content-based similarity for 3D model retrieval and classification

    Institute of Scientific and Technical Information of China (English)

    Ke Lü; Ning He; Jian Xue

    2009-01-01

    With the rapid development of 3D digital shape information, content-based 3D model retrieval and classification has become an important research area. This paper presents a novel 3D model retrieval and classification algorithm. For feature representation, a method combining a distance histogram and moment invariants is proposed to improve retrieval performance. The major advantage of using a distance histogram is its invariance to scaling, translation and rotation. Based on the premise that two similar objects should have high mutual information, a query on 3D data should convey a great deal of information about the shape of the two objects, so we propose a mutual-information distance measurement to perform the similarity comparison of 3D objects. The proposed algorithm is tested with a 3D model retrieval and classification prototype, and the experimental evaluation demonstrates satisfactory retrieval results and classification accuracy.
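    The distance-histogram feature can be illustrated with the classic D2 shape descriptor: a histogram of distances between randomly sampled point pairs, normalised for scale. The sphere and cube point sets below are synthetic stand-ins for 3D models, not the authors' exact feature.

```python
import numpy as np

rng = np.random.default_rng(4)

def d2_descriptor(points, bins=16, n_pairs=2000):
    # Histogram of distances between random point pairs, normalised by the
    # largest sampled distance for scale invariance (illustrative sketch).
    i = rng.integers(0, len(points), n_pairs)
    j = rng.integers(0, len(points), n_pairs)
    d = np.linalg.norm(points[i] - points[j], axis=1)
    d = d / d.max()
    h, _ = np.histogram(d, bins=bins, range=(0, 1), density=True)
    return h / h.sum()

sphere = rng.normal(size=(500, 3))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)   # unit sphere
cube = rng.uniform(-1, 1, size=(500, 3))                  # solid cube

h_sphere, h_cube = d2_descriptor(sphere), d2_descriptor(cube)
h_sphere2 = d2_descriptor(2.0 * sphere)   # scaled copy of the sphere

# The scaled sphere matches the sphere far better than the cube does,
# demonstrating scale invariance plus shape discrimination.
dist_same = np.abs(h_sphere - h_sphere2).sum()
dist_diff = np.abs(h_sphere - h_cube).sum()
```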

  18. Gaussian Mixture Model and Deep Neural Network based Vehicle Detection and Classification

    Directory of Open Access Journals (Sweden)

    S Sri Harsha

    2016-09-01

    The exponential rise in demand for vision-based traffic surveillance systems has motivated academia and industry to develop optimal vehicle detection and classification schemes. In this paper, an adaptive-learning-rate Gaussian mixture model (GMM) algorithm has been developed for background subtraction of multilane traffic data. Here, vehicle rear information and road dash-markings have been used for vehicle detection. After background subtraction, connected component analysis has been applied to retrieve the vehicle region. A multilayered AlexNet deep neural network (DNN) has been applied to extract higher-layer features. Furthermore, scale-invariant feature transform (SIFT) based vehicle feature extraction has been performed. The extracted 4096-dimensional features have been processed for dimensionality reduction using principal component analysis (PCA) and linear discriminant analysis (LDA). The features have then been mapped to SVM-based classification. The classification results show that AlexNet-FC6 features with LDA give an accuracy of 97.80%, followed by AlexNet-FC6 with PCA (96.75%). AlexNet-FC7 features with LDA and PCA have exhibited classification accuracies of 91.40% and 96.30%, respectively. By comparison, SIFT features with the LDA algorithm have exhibited 96.46% classification accuracy. The results reveal that enhanced GMM with the AlexNet DNN at FC6 and FC7 can be significant for optimal vehicle detection and classification.
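    The dimensionality-reduction step can be sketched with PCA computed via SVD. Random synthetic vectors with a class offset play the role of the 4096-dimensional AlexNet FC6 features, and a nearest-centroid rule replaces the SVM, so this is an illustration of the reduce-then-classify pattern, not the paper's pipeline.

```python
import numpy as np

# PCA (via SVD) of stand-in 4096-D "deep features", followed by a simple
# nearest-centroid classifier on held-out samples.
rng = np.random.default_rng(5)
n, d = 40, 4096                        # samples per class, feature dimension

mu = rng.normal(size=d)                # class-offset direction (synthetic)
Xa = rng.normal(size=(n, d))
Xb = rng.normal(size=(n, d)) + mu
X = np.vstack([Xa, Xb])
y = np.array([0] * n + [1] * n)

# PCA by SVD: project the 4096-D features onto 2 principal components.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T

# Nearest-centroid classification on held-out samples (SVM stand-in).
train = np.arange(0, 2 * n, 2)
test = np.arange(1, 2 * n, 2)
cents = np.array([Z[train][y[train] == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(np.linalg.norm(Z[test][:, None] - cents[None], axis=2), axis=1)
acc = (pred == y[test]).mean()
```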

  19. Predictive mapping of soil organic carbon in wet cultivated lands using classification-tree based models

    DEFF Research Database (Denmark)

    Kheir, Rania Bou; Greve, Mogens Humlekrog; Bøcher, Peder Klith

    2010-01-01

    … the geographic distribution of SOC across Denmark using remote sensing (RS), geographic information systems (GISs) and decision-tree modeling (un-pruned and pruned classification trees). Seventeen parameters, i.e. parent material, soil type, landscape type, elevation, slope gradient, slope aspect, mean curvature, … field measurements in the area of interest (Denmark). A large number of tree-based classification models (588) were developed using (i) all of the parameters, (ii) all Digital Elevation Model (DEM) parameters only, (iii) the primary DEM parameters only, (iv) the remote sensing (RS) indices only, (v) selected pairs of parameters, (vi) soil type, parent material and landscape type only, and (vii) the parameters having a high impact on SOC distribution in built pruned trees. The best constructed classification tree models (three in number) with the lowest misclassification error (ME) …

  20. Estimation and Model Selection for Model-Based Clustering with the Conditional Classification Likelihood

    CERN Document Server

    Baudry, Jean-Patrick

    2012-01-01

    The Integrated Completed Likelihood (ICL) criterion was proposed by Biernacki et al. (2000) in the model-based clustering framework to select a relevant number of classes, and has been used by statisticians in various application areas. A theoretical study of this criterion is proposed. A contrast related to the clustering objective is introduced: the conditional classification likelihood. This yields an estimator and a class of model selection criteria. The properties of these new procedures are studied, and ICL is proved to be an approximation of one of these criteria. We contrast these results with the currently prevailing view that ICL is not consistent. Moreover, these results give insight into the class notion underlying ICL and feed a reflection on the class notion in clustering. General results on penalized minimum-contrast criteria and on mixture models are derived, which are interesting in their own right.

  1. Wearable-Sensor-Based Classification Models of Faller Status in Older Adults

    Science.gov (United States)

    2016-01-01

    Wearable sensors have potential for quantitative, gait-based, point-of-care fall risk assessment that can be easily and quickly implemented in clinical-care and older-adult living environments. This investigation generated models for wearable-sensor-based fall-risk classification in older adults and identified the optimal sensor type, location, combination, and modelling method, for walking with and without a cognitive load task. A convenience sample of 100 older individuals (75.5 ± 6.7 years; 76 non-fallers, 24 fallers based on 6-month retrospective fall occurrence) walked 7.62 m under single-task and dual-task conditions while wearing pressure-sensing insoles and tri-axial accelerometers at the head, pelvis, and left and right shanks. Participants also completed the Activities-specific Balance Confidence scale, Community Health Activities Model Program for Seniors questionnaire, six-minute walk test, and ranked their fear of falling. Fall risk classification models were assessed for all sensor combinations and three model types: multi-layer perceptron neural network, naïve Bayesian, and support vector machine. The best performing model was a multi-layer perceptron neural network with input parameters from pressure-sensing insoles and head, pelvis, and left shank accelerometers (accuracy = 84%, F1 score = 0.600, MCC score = 0.521). Head sensor-based models had the best performance of the single-sensor models for single-task gait assessment. Single-task gait assessment models outperformed models based on dual-task walking or clinical assessment data. Support vector machines and neural networks were the best modelling techniques for fall risk classification. Fall risk classification models developed for point-of-care environments should be developed using support vector machines and neural networks, with a multi-sensor single-task gait assessment. PMID:27054878

  2. Wearable-Sensor-Based Classification Models of Faller Status in Older Adults.

    Science.gov (United States)

    Howcroft, Jennifer; Lemaire, Edward D; Kofman, Jonathan

    2016-01-01

    Wearable sensors have potential for quantitative, gait-based, point-of-care fall risk assessment that can be easily and quickly implemented in clinical-care and older-adult living environments. This investigation generated models for wearable-sensor-based fall-risk classification in older adults and identified the optimal sensor type, location, combination, and modelling method, for walking with and without a cognitive load task. A convenience sample of 100 older individuals (75.5 ± 6.7 years; 76 non-fallers, 24 fallers based on 6-month retrospective fall occurrence) walked 7.62 m under single-task and dual-task conditions while wearing pressure-sensing insoles and tri-axial accelerometers at the head, pelvis, and left and right shanks. Participants also completed the Activities-specific Balance Confidence scale, Community Health Activities Model Program for Seniors questionnaire, six-minute walk test, and ranked their fear of falling. Fall risk classification models were assessed for all sensor combinations and three model types: multi-layer perceptron neural network, naïve Bayesian, and support vector machine. The best performing model was a multi-layer perceptron neural network with input parameters from pressure-sensing insoles and head, pelvis, and left shank accelerometers (accuracy = 84%, F1 score = 0.600, MCC score = 0.521). Head sensor-based models had the best performance of the single-sensor models for single-task gait assessment. Single-task gait assessment models outperformed models based on dual-task walking or clinical assessment data. Support vector machines and neural networks were the best modelling techniques for fall risk classification. Fall risk classification models developed for point-of-care environments should be developed using support vector machines and neural networks, with a multi-sensor single-task gait assessment.

  3. Ligand and structure-based classification models for Prediction of P-glycoprotein inhibitors

    DEFF Research Database (Denmark)

    Klepsch, Freya; Poongavanam, Vasanthanathan; Ecker, Gerhard Franz

    2014-01-01

    The ABC transporter P-glycoprotein (P-gp) actively transports a wide range of drugs and toxins out of cells, and is therefore related to multidrug resistance and the ADME profile of therapeutics. Thus, development of predictive in silico models for the identification of P-gp inhibitors is of great interest in the field of drug discovery and development. So far, in silico P-gp inhibitor prediction has been dominated by ligand-based approaches, due to the lack of high-quality structural information about P-gp. The present study aims at comparing the P-gp inhibitor/non-inhibitor classification performance … an algorithm based on Euclidean distance. Results show that random forest and SVM performed best for classification of P-gp inhibitors and non-inhibitors, correctly predicting 73/75% of the external test set compounds. Classification based on the docking experiments using the scoring function Chem…

  4. Hierarchical Web Page Classification Based on a Topic Model and Neighboring Pages Integration

    OpenAIRE

    Sriurai, Wongkot; Meesad, Phayung; Haruechaiyasak, Choochart

    2010-01-01

    Most Web page classification models apply the bag-of-words (BOW) model to represent the feature space. The original BOW representation, however, is unable to recognize semantic relationships between terms. One possible solution is to apply the topic model approach based on the Latent Dirichlet Allocation algorithm to cluster the term features into a set of latent topics. Terms assigned to the same topic are semantically related. In this paper, we propose a novel hierarchical class...

  5. SVM classification model in depression recognition based on mutation PSO parameter optimization

    Directory of Open Access Journals (Sweden)

    Zhang Ming

    2017-01-01

    At present, the clinical diagnosis of depression is mainly made through structured interviews by psychiatrists, which lack objective diagnostic measures and therefore lead to a higher rate of misdiagnosis. In this paper, a method of depression recognition based on SVM and a mutation particle swarm optimization algorithm is proposed. To address the problem that the particle swarm optimization (PSO) algorithm easily becomes trapped in local optima, we propose a feedback mutation PSO algorithm (FBPSO) to balance local search and global exploration ability, so that the parameters of the classification model are optimal. We compared the depression classification accuracy of different PSO mutation algorithms, and found that the classification accuracy of a support vector machine (SVM) classifier based on the feedback mutation PSO algorithm is the highest. Our study provides an important reference for establishing auxiliary diagnostic tools for depression recognition in clinical diagnosis.
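    The abstract does not spell out the feedback mutation rule, but the general pattern -- PSO with a periodic mutation step to escape local optima, wrapped around an objective such as cross-validated classifier error -- can be sketched on a stand-in objective:

```python
import numpy as np

# Generic PSO with a mutation step (illustrative; not the paper's FBPSO).
rng = np.random.default_rng(6)

def sphere(p):                           # stand-in objective; in the paper
    return np.sum(p ** 2, axis=-1)       # it would be SVM validation error

n_particles, dim, iters = 20, 2, 60
pos = rng.uniform(-5, 5, (n_particles, dim))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = sphere(pos)
gbest = pbest[np.argmin(pbest_val)].copy()

for it in range(iters):
    r1, r2 = rng.random((2, n_particles, dim))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel
    # Mutation step: occasionally re-seed a particle to escape local optima.
    if it % 10 == 9:
        k = rng.integers(n_particles)
        pos[k] = rng.uniform(-5, 5, dim)
    val = sphere(pos)
    better = val < pbest_val
    pbest[better], pbest_val[better] = pos[better], val[better]
    gbest = pbest[np.argmin(pbest_val)].copy()

best_val = sphere(gbest)                 # near-zero at the optimum
```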

  6. A Trust Model Based on Service Classification in Mobile Services

    CERN Document Server

    Liu, Yang; Xia, Feng; Lv, Xiaoning; Bu, Fanyu

    2010-01-01

    Internet of Things (IoT) and B3G/4G communication are promoting pervasive mobile services with their advanced features. However, security problems also hamper this development. This paper proposes a trust model to protect the user's security. The billing or trust operator works as an agent to provide trust authentication for all the service providers. The services are classified by sensitive value calculation. With this value, the user's trustworthiness for the corresponding service can be obtained. For decision making, three trust regions are divided, corresponding to three ranks: high, medium and low. The trust region tells the customer, given his calculated trust value, which rank he has reached and which authentication methods should be used for access. Authentication history and penalties are also taken into account.

  7. A model presented for classification ECG signals base on Case-Based Reasoning

    Directory of Open Access Journals (Sweden)

    Elaheh Sayari

    2013-07-01

    Early detection of heart diseases/abnormalities can prolong life and enhance the quality of living through appropriate treatment; thus, classifying cardiac signals helps with the immediate diagnosis of heart beat type in cardiac patients. The present paper utilizes case-based reasoning (CBR) for classification of ECG signals. Four types of ECG beats (normal beat, congestive heart failure beat, ventricular tachyarrhythmia beat and atrial fibrillation beat) obtained from the PhysioBank database were classified by the proposed CBR model. The main purpose of this article is to classify heart signals and diagnose the type of heart beat in cardiac patients; in the proposed CBR system, training and testing data have been used for diagnosing and classifying heart beat types. The evaluation results show that the proposed model has high accuracy in classifying heart signals and supports clinical decisions in computer-aided diagnosis of heart beat type in cardiac patients.
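    The retrieve-and-reuse cycle of case-based reasoning can be sketched as follows, using the four beat types from the abstract with invented feature values for illustration:

```python
import numpy as np

# CBR sketch: a case base of labelled feature vectors (synthetic stand-ins
# for ECG beat features), retrieval by similarity, and reuse of the
# retrieved case's label as the classification.
case_features = np.array([
    [0.8, 0.1, 0.2],    # normal beat (invented feature values)
    [0.2, 0.9, 0.3],    # congestive heart failure beat
    [0.1, 0.3, 0.9],    # ventricular tachyarrhythmia beat
    [0.5, 0.5, 0.8],    # atrial fibrillation beat
])
case_labels = ["normal", "CHF", "VT", "AF"]

def classify(query):
    # Retrieve: rank stored cases by Euclidean distance to the query.
    d = np.linalg.norm(case_features - query, axis=1)
    # Reuse: adopt the nearest case's solution (its label).
    return case_labels[int(np.argmin(d))]

label = classify(np.array([0.75, 0.15, 0.25]))   # close to the normal case
```

    A full CBR system would also revise the proposed solution and retain the solved query as a new case; only the retrieve/reuse steps are shown here.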

  8. Model-based Methods of Classification: Using the mclust Software in Chemometrics

    Directory of Open Access Journals (Sweden)

    Chris Fraley

    2007-01-01

    Full Text Available Due to recent advances in methods and software for model-based clustering, and to the interpretability of the results, clustering procedures based on probability models are increasingly preferred over heuristic methods. The clustering process estimates a model for the data that allows for overlapping clusters, producing a probabilistic clustering that quantifies the uncertainty of observations belonging to components of the mixture. The resulting clustering model can also be used for some other important problems in multivariate analysis, including density estimation and discriminant analysis. Examples of the use of model-based clustering and classification techniques in chemometric studies include multivariate image analysis, magnetic resonance imaging, microarray image segmentation, statistical process control, and food authenticity. We review model-based clustering and related methods for density estimation and discriminant analysis, and show how the R package mclust can be applied in each instance.
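The probabilistic-clustering idea described above can be sketched outside R as well. The following is a minimal Python analogue using scikit-learn's GaussianMixture (an assumption of this sketch, since the paper itself works with the R package mclust), with BIC-based selection of the number of components standing in for mclust's default model-selection strategy, on synthetic data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two overlapping 2-D clusters of synthetic measurements.
X = np.vstack([
    rng.normal([0.0, 0.0], 0.5, size=(100, 2)),
    rng.normal([2.0, 2.0], 0.5, size=(100, 2)),
])

# Choose the number of mixture components by BIC, as mclust does by default.
models = [GaussianMixture(n_components=k, random_state=0).fit(X)
          for k in (1, 2, 3)]
best = min(models, key=lambda m: m.bic(X))

# Soft assignment: each row gives the probability of belonging to each
# cluster, quantifying the uncertainty of the classification.
probs = best.predict_proba(X)
print(best.n_components, probs.shape)
```

The soft assignments in `probs` are what distinguishes model-based clustering from heuristic hard-assignment methods: each observation carries its own classification uncertainty.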

  9. Classification and estimation in the Stochastic Block Model based on the empirical degrees

    CERN Document Server

    Channarond, Antoine; Robin, Stéphane

    2011-01-01

    The Stochastic Block Model (Holland et al., 1983) is a mixture model for heterogeneous network data. Unlike the usual statistical framework, new nodes give additional information about the previous ones in this model. Thereby the distribution of the degrees concentrates in points conditionally on the node class. We show under a mild assumption that classification, estimation and model selection can actually be achieved with no more than the empirical degree data. We provide an algorithm able to process very large networks and consistent estimators based on it. In particular, we prove a bound of the probability of misclassification of at least one node, including when the number of classes grows.
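A toy simulation illustrates why empirical degrees suffice: in a Stochastic Block Model the expected degree of a node depends only on its class. The sketch below uses assumed parameters and a simple 1-D k-means in place of the authors' exact procedure, so it is an illustration of the principle rather than their algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
n, labels = 400, np.repeat([0, 1], 200)
P = np.array([[0.10, 0.02],
              [0.02, 0.30]])          # within/between-class edge probabilities

# Sample a symmetric adjacency matrix from the block model.
probs = P[labels][:, labels]
upper = rng.random((n, n)) < probs
A = np.triu(upper, 1)
A = (A | A.T).astype(int)

deg = A.sum(axis=1)                   # empirical degrees

# Classify nodes with a tiny 1-D k-means on the degrees alone.
c = np.array([deg.min(), deg.max()], dtype=float)
for _ in range(10):
    assign = np.abs(deg[:, None] - c[None, :]).argmin(axis=1)
    c = np.array([deg[assign == k].mean() for k in (0, 1)])

# Account for arbitrary label switching when measuring accuracy.
accuracy = max((assign == labels).mean(), (assign != labels).mean())
print(round(accuracy, 2))
```

With these parameters the two classes have expected degrees of roughly 24 and 64, so the degree distributions concentrate well apart and the classes are recovered from degrees alone.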

  10. Latent classification models

    DEFF Research Database (Denmark)

    Langseth, Helge; Nielsen, Thomas Dyhre

    2005-01-01

    One of the simplest, and yet most consistently well-performing, sets of classifiers is the naive Bayes models. These models rely on two assumptions: (i) all the attributes used to describe an instance are conditionally independent given the class of that instance, and (ii) all attributes follow a specific … parametric family of distributions. In this paper we propose a new set of models for classification in continuous domains, termed latent classification models. The latent classification model can roughly be seen as combining the naive Bayes model with a mixture of factor analyzers, thereby relaxing the assumptions … classification model, and we demonstrate empirically that the accuracy of the proposed model is significantly higher than the accuracy of other probabilistic classifiers…

  11. A novel transferable individual tree crown delineation model based on Fishing Net Dragging and boundary classification

    Science.gov (United States)

    Liu, Tao; Im, Jungho; Quackenbush, Lindi J.

    2015-12-01

    This study provides a novel approach to individual tree crown delineation (ITCD) using airborne Light Detection and Ranging (LiDAR) data in dense natural forests, with two main steps: crown boundary refinement based on a proposed Fishing Net Dragging (FiND) method, and segment merging based on boundary classification. FiND starts with approximate tree crown boundaries derived using a traditional watershed method with Gaussian filtering and refines these boundaries using an algorithm that mimics how a fisherman drags a fishing net. Random forest machine learning is then used to classify boundary segments into two classes: boundaries between trees, and boundaries between branches that belong to a single tree. Three groups of LiDAR-derived features, two from the pseudo waveform generated along with the crown boundaries and one from a canopy height model (CHM), were used in the classification. The proposed ITCD approach was tested using LiDAR data collected over a mountainous region in the Adirondack Park, NY, USA. The overall accuracy of boundary classification was 82.4%. Features derived from the CHM were generally more important in the classification than the features extracted from the pseudo waveform. A comprehensive accuracy assessment scheme for ITCD was also introduced that considers both the area of crown overlap and crown centroids. Accuracy assessment using this new scheme shows the proposed ITCD approach achieved overall accuracies of 74% and 78% for deciduous and mixed forests, respectively.

  12. Dynamic Latent Classification Model

    DEFF Research Database (Denmark)

    Zhong, Shengtong; Martínez, Ana M.; Nielsen, Thomas Dyhre

    … as possible. Motivated by this problem setting, we propose a generative model for dynamic classification in continuous domains. At each time point the model can be seen as combining a naive Bayes model with a mixture of factor analyzers (FA). The latent variables of the FA are used to capture the dynamics … in the process as well as modeling dependences between attributes…

  13. A technical study and analysis on fuzzy similarity based models for text classification

    CERN Document Server

    Puri, Shalini; 10.5121/ijdkp.2012.2201

    2012-01-01

    In the current era of rapidly advancing technology, efficient and effective text document classification is becoming a challenging and much-needed capability for categorizing text documents into mutually exclusive categories. Fuzzy similarity provides a way to measure the similarity of features among various documents. In this paper, a technical review of various fuzzy similarity based models is given. These models are discussed and compared to frame their use and necessity, and a tour of different methodologies based on fuzzy similarity concerns is provided, showing how text and web documents are categorized efficiently into different categories. Various experimental results of these models are also discussed, and the technical comparisons among each model's parameters are shown in the form of a 3-D chart. This study and technical review provide a strong base for research on fuzzy similarity based text document categorization.

  14. Gene function classification using Bayesian models with hierarchy-based priors

    Directory of Open Access Journals (Sweden)

    Neal Radford M

    2006-10-01

    Full Text Available Abstract Background We investigate whether annotation of gene function can be improved using a classification scheme that is aware that functional classes are organized in a hierarchy. The classifiers look at phylogenetic descriptors, sequence-based attributes, and predicted secondary structure. We discuss three Bayesian models and compare their performance in terms of predictive accuracy. These models are the ordinary multinomial logit (MNL) model, a hierarchical model based on a set of nested MNL models, and an MNL model with a prior that introduces correlations between the parameters for classes that are nearby in the hierarchy. We also provide a new scheme for combining different sources of information. We use these models to predict the functional class of Open Reading Frames (ORFs) from the E. coli genome. Results The results from all three models show substantial improvement over previous methods, which were based on the C5 decision tree algorithm. The MNL model using a prior based on the hierarchy outperforms both the non-hierarchical MNL model and the nested MNL model. In contrast to previous attempts at combining the three sources of information in this dataset, our new approach to combining data sources produces a higher accuracy rate than applying our models to each data source alone. Conclusion Together, these results show that gene function can be predicted with higher accuracy than previously achieved, using Bayesian models that incorporate suitable prior information.

  15. Integrated knowledge-based modeling and its application for classification problems

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    Knowledge discovered directly from data can hardly avoid being biased towards the collected experimental data, whereas expert systems are perpetually baffled by the manual knowledge-acquisition bottleneck. It is therefore plausible that integrating the knowledge embedded in data with that possessed by experts can lead to a superior modeling approach. Aiming at classification problems, a novel integrated knowledge-based modeling methodology, oriented by experts and driven by data, is proposed. It starts with experts identifying the modeling parameters; the input space is then partitioned and fuzzified. Afterwards, single rules are generated and aggregated to form a rule base, on which a fuzzy inference mechanism is proposed. The experts are allowed to make necessary changes to the rule base to improve model accuracy. A real-world application, welding fault diagnosis, is presented to demonstrate the effectiveness of the methodology.

  16. SVM Based Descriptor Selection and Classification of Neurodegenerative Disease Drugs for Pharmacological Modeling.

    Science.gov (United States)

    Shahid, Mohammad; Shahzad Cheema, Muhammad; Klenner, Alexander; Younesi, Erfan; Hofmann-Apitius, Martin

    2013-03-01

    Systems pharmacological modeling of drug mode of action for the next generation of multitarget drugs may open new routes for drug design and discovery. Computational methods are widely used in this context, among which support vector machines (SVM) have proven successful in addressing the challenge of classifying drugs with similar features. We have applied one such SVM-based approach, SVM-based recursive feature elimination (SVM-RFE), to predict the pharmacological properties of drugs widely used against complex neurodegenerative disorders (NDD) and to build an in-silico computational model for the binary classification of NDD drugs versus other drugs. Applied to a set of drugs, the SVM-RFE model successfully separated NDD drugs from non-NDD drugs, with an overall accuracy of ∼80% under 10-fold cross-validation using the 40 top-ranked molecular descriptors selected out of 314 in total. Moreover, SVM-RFE outperformed linear discriminant analysis (LDA) based feature selection and classification. The model dramatically reduced the multidimensional descriptor space and predicted NDD drugs with high accuracy while avoiding overfitting. Based on these results, NDD-specific focused libraries of drug-like compounds can be designed, and existing NDD-specific drugs can be characterized by a well-defined set of molecular descriptors.
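A hedged sketch of the SVM-RFE step, on synthetic data standing in for the 314 molecular descriptors: scikit-learn's RFE with a linear SVM is assumed, and the numbers below mirror the paper's setup (40 selected features, 10-fold cross-validation) rather than reproduce its data.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# 314 descriptors with only a handful informative, mirroring the paper's
# 314-descriptor space reduced to 40 top-ranked features.
X, y = make_classification(n_samples=200, n_features=314, n_informative=10,
                           random_state=0)

# Recursive feature elimination driven by linear-SVM weights (SVM-RFE):
# repeatedly fit, drop the lowest-weight features, refit.
rfe = RFE(LinearSVC(dual=False, max_iter=5000),
          n_features_to_select=40, step=10).fit(X, y)

X_sel = X[:, rfe.support_]
acc = cross_val_score(LinearSVC(dual=False, max_iter=5000), X_sel, y,
                      cv=10).mean()
print(X_sel.shape, round(acc, 2))
```

The `step=10` setting removes ten descriptors per elimination round, which trades ranking granularity for speed on a wide feature matrix.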

  17. SVD-based modeling for image texture classification using wavelet transformation.

    Science.gov (United States)

    Selvan, Srinivasan; Ramakrishnan, Srinivasan

    2007-11-01

    This paper introduces a new model for image texture classification based on wavelet transformation and singular value decomposition. The probability density function of the singular values of the wavelet transformation coefficients of image textures is modeled as an exponential function, whose parameter is estimated using the maximum likelihood estimation technique. Truncation of the lower singular values is employed to classify textures in the presence of noise. The Kullback-Leibler distance (KLD) between the estimated model parameters of image textures is used as a similarity metric to perform classification with a minimum distance classifier. The exponential function permits closed-form expressions for the estimate of the model parameter and the computation of the KLD; these closed-form expressions reduce the computational complexity of the proposed approach. Experimental results are presented to demonstrate the effectiveness of this approach on all 111 textures from the Brodatz database. The experimental results demonstrate that the proposed approach improves recognition rates using a smaller number of parameters on large databases, achieving higher recognition rates than the traditional sub-band energy-based approach, the hybrid IMM/SVM approach, and the GGD-based approach.
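The closed forms mentioned above are easy to state: for an exponential density with rate parameter, the ML estimate from singular values is the reciprocal of their mean, and KL(Exp(l1) || Exp(l2)) = log(l1/l2) + l2/l1 - 1. A small sketch under simplifying assumptions, with random matrices standing in for wavelet subbands of real textures:

```python
import numpy as np

def exp_mle(s):
    """ML estimate of the exponential rate from singular values s."""
    return 1.0 / np.mean(s)

def kld_exp(l1, l2):
    """Closed-form KL divergence between Exp(l1) and Exp(l2) (rates)."""
    return np.log(l1 / l2) + l2 / l1 - 1.0

rng = np.random.default_rng(2)
# Two synthetic "textures": random matrices standing in for wavelet subbands.
s_a = np.linalg.svd(rng.normal(size=(64, 64)), compute_uv=False)
s_b = np.linalg.svd(rng.normal(scale=3.0, size=(64, 64)), compute_uv=False)
la, lb = exp_mle(s_a), exp_mle(s_b)

# A query drawn like "a" should be closer (in symmetrized KLD) to "a".
s_q = np.linalg.svd(rng.normal(size=(64, 64)), compute_uv=False)
lq = exp_mle(s_q)
d_a = kld_exp(lq, la) + kld_exp(la, lq)
d_b = kld_exp(lq, lb) + kld_exp(lb, lq)
print(d_a < d_b)
```

Because both the parameter estimate and the divergence reduce to a mean and a one-line formula, the minimum-distance classifier needs no numerical optimization at test time, which is the computational advantage the abstract refers to.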

  18. Research on evaluating water resource resilience based on projection pursuit classification model

    Science.gov (United States)

    Liu, Dong; Zhao, Dan; Liang, Xu; Wu, Qiuchen

    2016-03-01

    Water is a fundamental natural resource, and agricultural water guarantees grain output, so the utilization and management of water resources have significant practical meaning. Regional agricultural water resource systems are unpredictable, self-organizing, and non-linear, which makes evaluating regional agricultural water resource resilience difficult. Current research on water resource resilience remains focused on qualitative analysis, and quantitative analysis is still at a primary stage; to address these issues, a projection pursuit classification model is put forward. With the help of the artificial fish-swarm algorithm (AFSA), the model optimizes the projection index function and seeks the optimal projection direction, with AFSA improved through a self-adaptive artificial fish step and crowding factor. Taking the Hongxinglong Administration of Heilongjiang as the research base, and building on the improved AFSA, a projection pursuit classification model was established to evaluate agricultural water resource system resilience, alongside analysis of a projection pursuit classification model based on an accelerating genetic algorithm. The research shows that the water resource resilience of Hongxinglong is the best, followed by Raohe Farm, with 597 Farm last. Further analysis shows that the key driving factors influencing agricultural water resource resilience are precipitation and agricultural water consumption. The results reveal the restoration status of the local water resource system, providing a foundation for agricultural water resource management.

  19. A Quaternary-Stage User Interest Model Based on User Browsing Behavior and Web Page Classification

    Institute of Scientific and Technical Information of China (English)

    Zongli Jiang; Hang Su

    2012-01-01

    The key to a personalized search engine lies in its user model. In traditional personalized models, the results of a secondary search are biased towards long-term interests; moreover, forgetting of long-term interests prevents effective recollection of user interests. This paper presents a quaternary-stage user interest model based on user browsing behavior and web page classification. Drawing on the principles of the cache and the recycle bin in operating systems, it adds an illuminating text stage and a recycle-bin interest stage in front of and behind the traditional interest model, respectively, to constitute the quaternary-stage user interest model. By using an adaptive natural weight and its calculation method, and by efficiently integrating user browsing behavior and web document content, the model can better reflect user interests.

  20. Adaptation of motor imagery EEG classification model based on tensor decomposition

    Science.gov (United States)

    Li, Xinyang; Guan, Cuntai; Zhang, Haihong; Keng Ang, Kai; Ong, Sim Heng

    2014-10-01

    Objective. Session-to-session nonstationarity is inherent in brain-computer interfaces based on electroencephalography. The objective of this paper is to quantify the mismatch between the training model and test data caused by nonstationarity and to adapt the model towards minimizing the mismatch. Approach. We employ a tensor model to estimate the mismatch in a semi-supervised manner, and the estimate is regularized in the discriminative objective function. Main results. The performance of the proposed adaptation method was evaluated on a dataset recorded from 16 subjects performing motor imagery tasks on different days. The classification results validated the advantage of the proposed method in comparison with other regularization-based or spatial filter adaptation approaches. Experimental results also showed that there is a significant correlation between the quantified mismatch and the classification accuracy. Significance. The proposed method approached the nonstationarity issue from the perspective of data-model mismatch, which is more direct than data variation measurement. The results also demonstrated that the proposed method is effective in enhancing the performance of the feature extraction model.

  1. Stability classification model of mine-lane surrounding rock based on distance discriminant analysis method

    Institute of Scientific and Technical Information of China (English)

    ZHANG Wei; LI Xi-bing; GONG Feng-qiang

    2008-01-01

    Based on the principles of Mahalanobis distance discriminant analysis (DDA), a stability classification model for mine-lane surrounding rock was established, including six discriminant-factor indexes that reflect the engineering quality of the surrounding rock: lane depth below surface, span of lane, ratio of immediate roof thickness to coal thickness, uniaxial comprehensive strength of surrounding rock, development degree coefficient of surrounding rock joints, and extent of the broken surrounding rock zone. A DDA model was obtained by training on 15 practical measured samples. The re-substitution method was introduced to verify the stability of the DDA model, and the mis-discrimination ratio is zero. The DDA model was used to discriminate 3 new samples, and the results are identical with the actual rock types. Compared with the artificial neural network method and the support vector machine method, the results show that this model has high prediction accuracy and can be used in practical engineering.
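The DDA rule itself is simple: assign a sample to the class whose mean is nearest in Mahalanobis distance under a pooled covariance estimate. A minimal sketch on synthetic six-index data (not the paper's mine-lane measurements), including a re-substitution check like the paper's verification step:

```python
import numpy as np

rng = np.random.default_rng(3)
# Two classes of samples described by six indexes, mirroring the six
# discriminant factors; the data here is synthetic and well separated.
X0 = rng.normal(0.0, 1.0, size=(15, 6))
X1 = rng.normal(3.0, 1.0, size=(15, 6))
X = np.vstack([X0, X1])
y = np.repeat([0, 1], 15)

means = np.array([X[y == k].mean(axis=0) for k in (0, 1)])
# Pooled within-class covariance from the centered samples.
pooled = np.cov(np.vstack([X0 - means[0], X1 - means[1]]).T)
inv = np.linalg.inv(pooled)

def classify(x):
    # Squared Mahalanobis distance to each class mean; pick the nearest.
    d = [(x - m) @ inv @ (x - m) for m in means]
    return int(np.argmin(d))

# Re-substitution: classify the training samples themselves.
pred = np.array([classify(x) for x in X])
print((pred == y).mean())
```

On well-separated classes, re-substitution mis-discrimination is essentially zero, which is the behavior the abstract reports for the trained model.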

  2. Object trajectory-based activity classification and recognition using hidden Markov models.

    Science.gov (United States)

    Bashir, Faisal I; Khokhar, Ashfaq A; Schonfeld, Dan

    2007-07-01

    Motion trajectories provide rich spatiotemporal information about an object's activity. This paper presents novel classification algorithms for recognizing object activity using object motion trajectory. In the proposed classification system, trajectories are segmented at points of change in curvature, and the subtrajectories are represented by their principal component analysis (PCA) coefficients. We first present a framework to robustly estimate the multivariate probability density function based on PCA coefficients of the subtrajectories using Gaussian mixture models (GMMs). We show that GMM-based modeling alone cannot capture the temporal relations and ordering between underlying entities. To address this issue, we use hidden Markov models (HMMs) with a data-driven design in terms of number of states and topology (e.g., left-right versus ergodic). Experiments using a database of over 5700 complex trajectories (obtained from UCI-KDD data archives and Columbia University Multimedia Group) subdivided into 85 different classes demonstrate the superiority of our proposed HMM-based scheme using PCA coefficients of subtrajectories in comparison with other techniques in the literature.
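The first stage of the pipeline above, PCA coefficients of subtrajectories modeled by per-class Gaussian mixtures, can be sketched as follows. The trajectories are toy lines and arcs, scikit-learn is assumed, and the HMM stage is omitted, so this illustrates only the density-modeling step, not the full method:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)

def trajectory(kind, n=32):
    """Toy 2-D subtrajectories: straight lines vs circular arcs."""
    t = np.linspace(0, 1, n)
    if kind == "line":
        traj = np.c_[t, t] + rng.normal(0, 0.02, (n, 2))
    else:
        traj = np.c_[np.cos(np.pi * t), np.sin(np.pi * t)] \
               + rng.normal(0, 0.02, (n, 2))
    return traj.ravel()

X = np.array([trajectory(k) for k in ["line"] * 50 + ["arc"] * 50])
y = np.repeat([0, 1], 50)

coeffs = PCA(n_components=5).fit_transform(X)   # PCA coefficients

# One GMM per activity class; classify by the higher log-likelihood.
gmms = [GaussianMixture(2, random_state=0).fit(coeffs[y == k]) for k in (0, 1)]
scores = np.stack([g.score_samples(coeffs) for g in gmms], axis=1)
pred = scores.argmax(axis=1)
print((pred == y).mean())
```

As the abstract notes, this likelihood comparison alone ignores temporal ordering between subtrajectories; that is the gap the HMM stage fills.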

  3. A physiologically-inspired model of numerical classification based on graded stimulus coding

    Directory of Open Access Journals (Sweden)

    John Pearson

    2010-01-01

    Full Text Available In most natural decision contexts, the process of selecting among competing actions takes place in the presence of informative, but potentially ambiguous, stimuli. Decisions about magnitudes—quantities like time, length, and brightness that are linearly ordered—constitute an important subclass of such decisions. It has long been known that perceptual judgments about such quantities obey Weber's Law, wherein the just-noticeable difference in a magnitude is proportional to the magnitude itself. Current physiologically inspired models of numerical classification assume discriminations are made via a labeled line code of neurons selectively tuned for numerosity, a pattern observed in the firing rates of neurons in the ventral intraparietal area (VIP) of the macaque. By contrast, neurons in the contiguous lateral intraparietal area (LIP) signal numerosity in a graded fashion, suggesting the possibility that numerical classification could be achieved in the absence of neurons tuned for number. Here, we consider the performance of a decision model based on this analog coding scheme in a paradigmatic discrimination task—numerosity bisection. We demonstrate that a basic two-neuron classifier model, derived from experimentally measured monotonic responses of LIP neurons, is sufficient to reproduce the numerosity bisection behavior of monkeys, and that the threshold of the classifier can be set by reward maximization via a simple learning rule. In addition, our model predicts deviations from Weber's Law scaling of choice behavior at high numerosity. Together, these results suggest both a generic neuronal framework for magnitude-based decisions and a role for reward contingency in the classification of such stimuli.

  4. A Classification Model and an Open E-Learning System Based on Intuitionistic Fuzzy Sets for Instructional Design Concepts

    Science.gov (United States)

    Güyer, Tolga; Aydogdu, Seyhmus

    2016-01-01

    This study suggests a classification model, and an e-learning system based on this model, for all instructional theories, approaches, models, strategies, methods, and techniques used in the process of instructional design that constitute a direct or indirect resource for educational technology, based on the theory of intuitionistic fuzzy sets…

  5. Approach for Text Classification Based on the Similarity Measurement between Normal Cloud Models

    Directory of Open Access Journals (Sweden)

    Jin Dai

    2014-01-01

    Full Text Available The similarity between objects is a core research area of data mining. In order to reduce interference from the uncertainty of natural language, a similarity measurement between normal cloud models is adopted for text classification research. On this basis, a novel text classifier based on cloud concept jumping up (CCJU-TC) is proposed, which can efficiently convert between qualitative concepts and quantitative data. Through the conversion from a text set to a text information table based on the VSM model, the qualitative text concepts extracted from the same category are jumped up into a whole category concept. According to the cloud similarity between the test text and each category concept, the test text is assigned to the most similar category. Comparison among different text classifiers over different feature selection sets fully proves that not only does CCJU-TC have a strong ability to adapt to different text features, but its classification performance is also better than that of the traditional classifiers.

  6. Site effect classification based on microtremor data analysis using concentration–area fractal model

    Directory of Open Access Journals (Sweden)

    A. Adib

    2014-07-01

    Full Text Available The aim of this study is to classify the site effect using the concentration–area (C–A) fractal model in Meybod city, Central Iran, based on microtremor data analysis. Log–log plots of the frequency, amplification and vulnerability index (k-g) indicate a multifractal nature for these parameters in the area. The results obtained from the C–A fractal modeling reveal that proper soil types are located around the central city. The results derived via the fractal modeling were utilized to improve the Nogoshi classification results in Meybod city. The resulting categories are: (1) hard soil and weak rock with frequency of 6.2 to 8 Hz, (2) stiff soil with frequency of about 4.9 to 6.2 Hz, (3) moderately soft soil with frequency of 2.4 to 4.9 Hz, and (4) soft soil with frequency lower than 2.4 Hz.

  7. Model-based Clustering of Categorical Time Series with Multinomial Logit Classification

    Science.gov (United States)

    Frühwirth-Schnatter, Sylvia; Pamminger, Christoph; Winter-Ebmer, Rudolf; Weber, Andrea

    2010-09-01

    A common problem in many areas of applied statistics is to identify groups of similar time series in a panel of time series. However, distance-based clustering methods cannot easily be extended to time series data, where an appropriate distance measure is rather difficult to define, particularly for discrete-valued time series. Markov chain clustering, proposed by Pamminger and Frühwirth-Schnatter [6], is an approach for clustering discrete-valued time series obtained by observing a categorical variable with several states. This model-based clustering method is based on finite mixtures of first-order time-homogeneous Markov chain models. In order to further explain group membership we present an extension to the approach of Pamminger and Frühwirth-Schnatter [6] by formulating a probabilistic model for the latent group indicators within the Bayesian classification rule using a multinomial logit model. The parameters are estimated for a fixed number of clusters within a Bayesian framework using a Markov chain Monte Carlo (MCMC) sampling scheme representing a (full) Gibbs-type sampler which involves only draws from standard distributions. Finally, an application to a panel of Austrian wage mobility data is presented which leads to an interesting segmentation of the Austrian labour market.

  8. Hybrid model based on Genetic Algorithms and SVM applied to variable selection within fruit juice classification.

    Science.gov (United States)

    Fernandez-Lozano, C; Canto, C; Gestal, M; Andrade-Garda, J M; Rabuñal, J R; Dorado, J; Pazos, A

    2013-01-01

    Given the background of the use of neural networks in problems of apple juice classification, this paper aims at implementing a more recently developed machine learning method: support vector machines (SVM). A hybrid model that combines genetic algorithms and support vector machines is therefore suggested in such a way that, by using the SVM as the fitness function of the genetic algorithm (GA), the most representative variables for a specific classification problem can be selected.
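A hedged sketch of the hybrid idea: a tiny genetic-style search (selection plus mutation only, no crossover) whose fitness function is an SVM's cross-validated accuracy on the selected variables. The dataset and GA parameters here are illustrative, not the paper's juice data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X, y = make_classification(n_samples=120, n_features=30, n_informative=5,
                           random_state=0)

def fitness(mask):
    """SVM cross-validated accuracy on the variables selected by mask."""
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(), X[:, mask], y, cv=3).mean()

# Initial population of random variable subsets (boolean chromosomes).
pop = rng.random((20, 30)) < 0.5
for _ in range(10):                               # generations
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]       # keep the fittest half
    children = parents[rng.integers(0, 10, 10)].copy()
    flip = rng.random(children.shape) < 0.05      # bit-flip mutation
    children ^= flip
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print(best.sum(), round(fitness(best), 2))
```

Using the classifier's own cross-validated accuracy as the fitness, as the paper proposes, ties the variable selection directly to the end classification task rather than to a filter statistic.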

  9. In silico screening of estrogen-like chemicals based on different nonlinear classification models.

    Science.gov (United States)

    Liu, Huanxiang; Papa, Ester; Walker, John D; Gramatica, Paola

    2007-07-01

    Increasing concern is being shown by the scientific community, government regulators, and the public about endocrine-disrupting chemicals that are adversely affecting human and wildlife health through a variety of mechanisms. There is a great need for an effective means of rapidly assessing endocrine-disrupting activity, especially estrogen-simulating activity, because of the large number of such chemicals in the environment. In this study, quantitative structure–activity relationship (QSAR) models were developed to quickly and effectively identify possible estrogen-like chemicals based on 232 structurally diverse chemicals (training set) by using several nonlinear classification methodologies (least-square support vector machine (LS-SVM), counter-propagation artificial neural network (CP-ANN), and k nearest neighbour (kNN)) based on molecular structural descriptors. The models were externally validated by 87 chemicals (prediction set) not included in the training set. All three methods give satisfactory prediction results for both training and prediction sets, and comparison of performance shows that the most accurate model was obtained by the LS-SVM approach. In addition, our model was also applied to about 58,000 discrete organic chemicals; about 76% were predicted not to bind to the estrogen receptor. The obtained results indicate that the proposed QSAR models are robust, widely applicable and could provide a feasible and practical tool for the rapid screening of potential estrogens.

  10. Classification of thermal waters based on their inorganic fingerprint and hydrogeothermal modelling

    Directory of Open Access Journals (Sweden)

    I. Delgado-Outeiriño

    2011-05-01

    Full Text Available Hydrothermal features in Galicia have been used since ancient times for therapeutic purposes. A characterization of these thermal waters was carried out in order to understand their behaviour based on their inorganic pattern and water-rock interaction mechanisms. To this end, 15 thermal water samples were collected in the same hydrographical system. The results of the hydrogeochemical analysis showed one main water family, sodium bicarbonate-type waters, typical of the post-orogenic basins of Galicia. Principal component analysis (PCA) and partial least squares (PLS) clustered the selected thermal waters into two groups according to their chemical composition. This classification agreed with the results obtained by the use of geothermometers and hydrogeochemical modelling. The first group included thermal samples that could be in contact with surface waters; therefore, their residence time in the reservoir and their water-rock interaction would be less important than for the thermal waters of the second group.

  11. Objects Classification by Learning-Based Visual Saliency Model and Convolutional Neural Network

    Science.gov (United States)

    Li, Na; Yang, Yongjia

    2016-01-01

    Humans can easily classify different kinds of objects, whereas this is quite difficult for computers. As a hot and difficult problem, object classification has been receiving extensive interest with broad prospects. Inspired by neuroscience, the deep learning concept was proposed. The convolutional neural network (CNN), as one method of deep learning, can be used to solve classification problems. But most deep learning methods, including CNN, ignore the human visual information-processing mechanism at work when a person classifies objects. Therefore, in this paper, inspired by the complete process through which humans classify different kinds of objects, we bring forth a new classification method that combines a visual attention model and a CNN. Firstly, we use the visual attention model to simulate the human visual selection mechanism. Secondly, we use the CNN to simulate how humans select features, extracting the local features of the selected areas. Finally, our classification method not only depends on those local features but also adds human semantic features to classify objects. Our classification method has apparent advantages in biology. Experimental results demonstrated that our method significantly improves classification efficiency. PMID:27803711

  12. Mel-frequencies Stochastic Model for Gender Classification based on Pitch and Formant

    Directory of Open Access Journals (Sweden)

    Syifaun Nafisah

    2016-02-01

    Full Text Available Speech recognition applications are becoming more and more useful nowadays. Before this technology is applied, the first step is to test the system in order to measure its reliability, which can be assessed by its accuracy in recognizing the speaker, such as speaker identity or gender. This paper introduces a stochastic model based on mel-frequencies to identify the gender of a speaker in a noisy environment. The Euclidean minimum distance and back-propagation neural networks were used to create a model that recognizes gender from a speech signal based on the formants and pitch of the mel-frequencies. The system uses a threshold technique as the identification tool. Using this threshold value, the proposed method identifies the gender of the speaker with up to 94.11% accuracy, and the average processing duration is 15.47 ms. The implementation results show good performance of the proposed technique in gender classification based on speech signals in a noisy environment.
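The threshold idea can be illustrated with pitch alone (the paper additionally uses formants and mel-frequency features, and the F0 distributions below are assumed values for illustration, not measurements):

```python
import numpy as np

rng = np.random.default_rng(6)
# Assumed fundamental-frequency (F0) distributions in Hz, for illustration.
male = rng.normal(120, 20, 100)
female = rng.normal(210, 30, 100)

pitch = np.concatenate([male, female])
truth = np.repeat(["M", "F"], 100)

# Threshold technique: decide by a midpoint threshold learned from the
# per-class sample means.
threshold = (male.mean() + female.mean()) / 2
pred = np.where(pitch < threshold, "M", "F")
print(round((pred == truth).mean(), 2))
```

Even this one-feature threshold separates the assumed distributions well; the paper's combination of pitch, formants and mel-frequency features with a learned threshold is what makes the approach robust in noise.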

  13. Climatic Classification over Asia during the Middle Holocene Climatic Optimum Based on PMIP Models

    Institute of Scientific and Technical Information of China (English)

    Hyuntaik Oh; Ho-Jeong Shin

    2016-01-01

    ABSTRACT: When considering potential global warming projections, it is useful to understand the impact of each climate condition at 6 kyr before present. The Asian paleoclimate was simulated by performing an integration of the multi-model ensemble with the paleoclimate modeling intercomparison project (PMIP) models. The reconstructed winter (summer) surface air temperature at 6 kyr before present was 0.85 ºC lower (0.21 ºC higher) than the present day over Asia (60ºE–150ºE, 10ºN–60ºN). The seasonal variation and the differential heating of land and ocean in summer at 6 kyr before present might have been much larger than at present. The winter and summer precipitation at 6 kyr before present were 0.067 and 0.017 mm·day-1 larger than the present day, respectively. The Group B climate (the dry climates of the Köppen climate classification) at 6 kyr before present decreased by 17% compared to the present day, whereas Group D (the continental and microthermal climates) increased by over 7%. Comparison between the model simulation results and published paleo-proxy records shows agreement within the limits of the sparse paleo-proxy data.

  14. Classification of lung cancer tumors based on structural and physicochemical properties of proteins by bioinformatics models.

    Science.gov (United States)

    Hosseinzadeh, Faezeh; Ebrahimi, Mansour; Goliaei, Bahram; Shamabadi, Narges

    2012-01-01

    Rapid distinction between small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) tumors is very important in the diagnosis of this disease. Moreover, sequence-derived structural and physicochemical descriptors are very useful for machine-learning prediction of protein structural and functional classes and for classifying proteins. In this study, the classification of lung tumors based on 1497 attributes derived from the structural and physicochemical properties of protein sequences (based on genes defined by microarray analysis) was investigated through a combination of attribute weighting and supervised and unsupervised clustering algorithms. Eighty percent of the weighting methods selected features such as autocorrelation, dipeptide composition and distribution of hydrophobicity as the most important protein attributes for classifying the SCLC, NSCLC and COMMON classes of lung tumors. The same results were observed with most tree induction algorithms: hydrophobicity-distribution descriptors were high in protein sequences COMMON to both groups, while the distribution of charge in these proteins was very low, showing that COMMON proteins are very hydrophobic. Furthermore, the polar dipeptide composition of SCLC proteins was higher than that of NSCLC proteins. Some clustering models (alone or in combination with attribute weighting algorithms) were able to classify SCLC and NSCLC proteins reasonably well. The Random Forest tree induction algorithm (evaluated with leave-one-out and 10-fold cross-validation) showed more than 86% accuracy in clustering and predicting the three different lung cancer tumor classes. This is the first report of applying data mining tools to effectively classify three classes of lung cancer tumors, highlighting the importance of the dipeptide composition, autocorrelation and distribution descriptors.
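The evaluation workflow the abstract describes, a tree-ensemble classifier scored by 10-fold cross-validation, can be sketched with scikit-learn on synthetic data. The class and feature counts below merely stand in for the SCLC/NSCLC/COMMON classes and the 1497 sequence-derived descriptors; none of the paper's data is used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic 3-class problem standing in for the three tumor classes.
X, y = make_classification(n_samples=300, n_features=50, n_informative=10,
                           n_classes=3, random_state=0)

# Random Forest scored by 10-fold cross-validation, as in the abstract.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=10)
print(f"10-fold CV accuracy: {scores.mean():.3f}")
```

Attribute weighting could then be approximated by inspecting `clf.feature_importances_` after fitting, though the paper's specific weighting schemes are not reproduced here.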

  16. Discrimination-based Artificial Immune System: Modeling the Learning Mechanism of Self and Non-self Discrimination for Classification

    Directory of Open Access Journals (Sweden)

    Kazushi Igawa

    2007-01-01

    Full Text Available This study presents a new artificial immune system for classification, named the discrimination-based artificial immune system (DAIS), based on the principle of self/non-self discrimination by T cells in the human immune system. The natural immune system's ability to distinguish between self and non-self molecules is applicable to classification, in the sense that one class is distinguished from all others. We model this discrimination, together with the mechanism of education in the thymus, for classification. In particular, we introduce a method for deciding the recognition distance threshold of an artificial lymphocyte, as the negative selection algorithm. We apply DAIS to real-world datasets and show its performance to be comparable to that of other classifier systems. We conclude that this modeling is appropriate and that DAIS is a useful classifier.
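The negative selection idea behind such systems can be sketched in a few lines: candidate "lymphocytes" are random points, and any candidate within a distance threshold of a self sample is deleted, emulating thymic education; surviving detectors then flag non-self. The 2-D data, radius, and function names are illustrative, not DAIS itself.

```python
import numpy as np

rng = np.random.default_rng(0)
self_samples = rng.normal(loc=0.3, scale=0.05, size=(200, 2))  # the "self" cluster

def train_detectors(self_set, n_candidates=2000, radius=0.15):
    """Generate random detectors and delete those that match self (negative selection)."""
    candidates = rng.uniform(0.0, 1.0, size=(n_candidates, 2))
    dists = np.linalg.norm(candidates[:, None, :] - self_set[None, :, :], axis=2)
    survives = dists.min(axis=1) > radius      # keep only detectors that miss all self points
    return candidates[survives], radius

def is_nonself(x, detectors, radius):
    """A point is non-self if any surviving detector recognizes it."""
    return bool((np.linalg.norm(detectors - x, axis=1) <= radius).any())

detectors, r = train_detectors(self_samples)
print(is_nonself(self_samples[0], detectors, r))        # a self sample
print(is_nonself(np.array([0.9, 0.9]), detectors, r))   # an anomalous point
```

By construction no detector can lie within the radius of a self sample, so self points are never flagged; coverage of the non-self space depends on the number of candidates and the radius, which DAIS chooses in a principled way.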

  17. Classification model based on mutual information

    Institute of Scientific and Technical Information of China (English)

    张震; 胡学钢

    2011-01-01

    Concerning the relevance between attributes in a classification dataset and the differing contributions of attribute values to attribute weights, an improved classification model based on mutual information was proposed, together with formulas for calculating the impact factor and the sample forecast information. The classification model predicts the class labels of unlabelled objects using the sample forecast information. Experimental results show that the classification model based on mutual information can effectively improve the forecast precision and accuracy of the classification algorithm.
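The paper's impact-factor and forecast-information formulas are not reproduced in the abstract, but the quantity they build on, the mutual information between an attribute and the class label, can be computed directly. The sketch below weights each attribute by that mutual information on toy data; the weighting scheme is a plausible reconstruction, not the paper's exact model.

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    n = len(x)
    mi = 0.0
    for xv, cx in Counter(x).items():
        for yv, cy in Counter(y).items():
            cxy = np.sum((x == xv) & (y == yv))
            if cxy:
                mi += (cxy / n) * np.log(n * cxy / (cx * cy))
    return mi

# Toy data: attribute 0 determines the class, attribute 1 is pure noise.
rng = np.random.default_rng(1)
X = np.column_stack([rng.integers(0, 2, 200), rng.integers(0, 2, 200)])
y = X[:, 0].copy()                 # class label equals attribute 0

weights = np.array([mutual_information(X[:, j], y) for j in range(X.shape[1])])
print(weights)   # the informative attribute should receive a far larger weight
```

A classifier in this spirit would then vote with attribute-value evidence scaled by these weights, so correlated or uninformative attributes contribute little.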

  18. hERG classification model based on a combination of support vector machine method and GRIND descriptors

    DEFF Research Database (Denmark)

    Li, Qiyuan; Jorgensen, Flemming Steen; Oprea, Tudor

    2008-01-01

    invest substantial effort in the assessment of cardiac toxicity of drugs. The development of in silico tools to filter out potential hERG channel inhibitors in earlystages of the drug discovery process is of considerable interest. Here, we describe binary classification models based on a large...

  19. Learning Item-Attribute Relationship in Q-Matrix Based Diagnostic Classification Models

    CERN Document Server

    Liu, Jingchen; Ying, Zhiliang

    2011-01-01

    The recent surge of interest in cognitive assessment has led to the development of novel statistical models for diagnostic classification. Central to many such models is the well-known Q-matrix, which specifies the item-attribute relationship. This paper proposes a principled estimation procedure for the Q-matrix and the related model parameters. Desirable theoretical properties are established through large-sample analysis. The proposed method also provides a platform under which important statistical issues, such as hypothesis testing and model selection, can be addressed.

  20. BClass: A Bayesian Approach Based on Mixture Models for Clustering and Classification of Heterogeneous Biological Data

    Directory of Open Access Journals (Sweden)

    Arturo Medrano-Soto

    2004-12-01

    Full Text Available Based on mixture models, we present a Bayesian method (called BClass) to classify biological entities (e.g. genes) when variables of quite heterogeneous nature are analyzed. Various statistical distributions are used to model the continuous/categorical data commonly produced by genetic experiments and large-scale genomic projects. We calculate the posterior probability of each entry to belong to each element (group) in the mixture. In this way, an original set of heterogeneous variables is transformed into a set of purely homogeneous characteristics represented by the probabilities of each entry to belong to the groups. The number of groups in the analysis is controlled dynamically by rendering the groups as 'alive' and 'dormant' depending upon the number of entities classified within them. Using standard Metropolis-Hastings and Gibbs sampling algorithms, we constructed a sampler to approximate posterior moments and grouping probabilities. Since this method does not require the definition of similarity measures, it is especially suitable for data mining and knowledge discovery in biological databases. We applied BClass to classify genes in RegulonDB, a database specialized in information about the transcriptional regulation of gene expression in the bacterium Escherichia coli. The classification obtained is consistent with current knowledge and allowed prediction of missing values for a number of genes. BClass is object-oriented and fully programmed in Lisp-Stat. The output grouping probabilities are analyzed and interpreted using graphical (dynamically linked) plots and query-based approaches. We discuss the advantages of using Lisp-Stat as a programming language as well as the problems we faced when the data volume increased exponentially due to the ever-growing number of genomic projects.

  1. High speed classification of individual bacterial cells using a model-based light scatter system and multivariate statistics

    Science.gov (United States)

    Venkatapathi, Murugesan; Rajwa, Bartek; Ragheb, Kathy; Banada, Padmapriya P.; Lary, Todd; Robinson, J. Paul; Hirleman, E. Daniel

    2008-02-01

    We describe a model-based instrument design combined with a statistical classification approach for the development and realization of high speed cell classification systems based on light scatter. In our work, angular light scatter from cells of four bacterial species of interest, Bacillus subtilis, Escherichia coli, Listeria innocua, and Enterococcus faecalis, was modeled using the discrete dipole approximation. We then optimized a scattering detector array design subject to some hardware constraints, configured the instrument, and gathered experimental data from the relevant bacterial cells. Using these models and experiments, it is shown that optimization using a nominal bacteria model (i.e., using a representative size and refractive index) is insufficient for classification of most bacteria in realistic applications. Hence the computational predictions were constituted in the form of scattering-data-vector distributions that accounted for expected variability in the physical properties between individual bacteria within the four species. After the detectors were optimized using the numerical results, they were used to measure scatter from both the known control samples and unknown bacterial cells. A multivariate statistical method based on a support vector machine (SVM) was used to classify the bacteria species based on light scatter signatures. In our final instrument, we realized correct classification of B. subtilis in the presence of E. coli, L. innocua, and E. faecalis using SVM at 99.1%, 99.6%, and 98.5%, respectively, in the optimal detector array configuration. For comparison, the corresponding values for another set of angles were only 69.9%, 71.7%, and 70.2% using SVM, and more importantly, this improved performance is consistent with classification predictions.
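The final classification step, an SVM applied to scatter-signature feature vectors, can be sketched with a minimal linear SVM trained by Pegasos-style subgradient descent on two synthetic "species" clusters. A production system would use a mature solver (e.g. libsvm); the data, hyperparameters, and training loop here are illustrative only.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Pegasos-style training of a linear SVM; y must be in {-1, +1}."""
    rng = np.random.default_rng(0)
    n, d = X.shape
    w, b, t = np.zeros(d), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)                  # decaying step size
            margin = y[i] * (X[i] @ w + b)
            w *= (1 - eta * lam)                   # shrink step from the L2 penalty
            if margin < 1:                         # hinge-loss subgradient step
                w += eta * y[i] * X[i]
                b += eta * y[i]
    return w, b

# Two well-separated clusters standing in for two species' scatter signatures.
rng = np.random.default_rng(0)
A = rng.normal([0, 0], 0.3, size=(50, 2))
B = rng.normal([2, 2], 0.3, size=(50, 2))
X = np.vstack([A, B])
y = np.array([-1] * 50 + [1] * 50)

w, b = train_linear_svm(X, y)
acc = np.mean(np.sign(X @ w + b) == y)
print(f"training accuracy: {acc:.2f}")
```

The real problem is multi-class and uses higher-dimensional detector-array features, typically handled with one-vs-rest SVMs and a kernel if the classes are not linearly separable.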

  2. Modeling Wood Fibre Length in Black Spruce (Picea mariana (Mill.) B.S.P.) Based on Ecological Land Classification

    Directory of Open Access Journals (Sweden)

    Elisha Townshend

    2015-09-01

    Full Text Available Effective planning to optimize the forest value chain requires accurate and detailed information about the resource; however, estimates of the distribution of fibre properties on the landscape are largely unavailable prior to harvest. Our objective was to fit a model of tree-level average fibre length related to ecosite classification and other forest inventory variables depicted at the landscape scale. A series of black spruce increment cores were collected at breast height from trees in nine different ecosite groups within the boreal forest of northeastern Ontario, and processed using standard techniques for maceration and fibre length measurement. Regression tree analysis and random forests were used to fit hierarchical classification models and find the most important predictor variables for the response variable, area-weighted mean stem-level fibre length. Ecosite group was the best predictor in the regression tree. Longer mean fibre length was associated with more productive ecosites that supported faster growth. The explanatory power of the model on the fitted data was good; however, random forest simulations indicated poor generalizability. These results suggest the potential to develop localized models linking wood fibre length in black spruce to landscape-level attributes, and to improve the sustainability of forest management by identifying ideal locations to harvest wood that has desirable fibre characteristics.

  3. Application of the probability-based covering algorithm model in text classification

    Institute of Scientific and Technical Information of China (English)

    ZHOU; Ying

    2009-01-01

    The probability-based covering algorithm (PBCA) is a new algorithm based on probability distribution. It decides, by voting, the class of the tested samples on the border of the coverage area, based on the probability of the training samples. When the original covering algorithm (CA) is used, many tested samples located on the border of the coverage cannot be classified by the spherical neighborhood obtained. The network structure of the PBCA is a mixed structure composed of both a feed-forward network and a feedback network. By adding some heterogeneous samples and enlarging the coverage radius, it is possible to decrease the number of rejected samples and improve the recognition accuracy. Computer experiments indicate that the algorithm improves precision and achieves reasonably good results in text classification.

  4. Vineyard parcel identification from Worldview-2 images using object-based classification model

    Science.gov (United States)

    Sertel, Elif; Yay, Irmak

    2014-01-01

    Accurate identification of spatial distribution and characteristics of vineyard parcels is an important task for the effective management of vineyard areas, precision viticulture, and farmer registries. This study aimed to develop rule sets to be used in object-based classification of Worldview-2 satellite images to accurately delineate the boundaries of vineyards having different plantation styles. Multilevel segmentation was applied to Worldview-2 images to create different sizes of image objects representing different land cover categories with respect to scale parameter. Texture analysis and several new spectral indices were applied to objects at different segmentation levels to accurately classify land cover classes of forest, cultivated areas, harvested areas, impervious, bareland, and vineyards. A specific attention was given to vineyard class to identify vine areas at the parcel level considering their different plantation styles. The results illustrated that the combined usage of a newly developed decision tree and image segmentation during the object-based classification process could provide highly accurate results for the identification of vineyard parcels. Linearly planted vineyards could be classified with 100% producer's accuracy due to their regular textural characteristics, whereas regular gridwise and irregular gridwise (distributed) vineyard parcels could be classified with 94.87% producer's accuracy in this research.

  5. Emotion models for textual emotion classification

    Science.gov (United States)

    Bruna, O.; Avetisyan, H.; Holub, J.

    2016-11-01

    This paper deals with textual emotion classification, which has gained attention in recent years. Emotion classification is used in user experience, product evaluation, national security, and tutoring applications. It attempts to detect the emotional content in the input text and, based on different approaches, establish what kind of emotional content is present, if any. Textual emotion classification is the most difficult to handle, since it relies mainly on linguistic resources and introduces many challenges in assigning text to an emotion represented by a proper model. A crucial part of each emotion detector is its emotion model. The focus of this paper is to introduce the emotion models used for classification. Categorical and dimensional models of emotion are explained, and some more advanced approaches are mentioned.

  6. Biogeography based Satellite Image Classification

    CERN Document Server

    Panchal, V K; Kaur, Navdeep; Kundra, Harish

    2009-01-01

    Biogeography is the study of the geographical distribution of biological organisms. The mindset of the engineer is that we can learn from nature. Biogeography Based Optimization (BBO) is a burgeoning nature-inspired technique for finding the optimal solution of a problem. Satellite image classification is an important task because it is the only way we can learn the land cover map of inaccessible areas. Though satellite images have been classified in the past using various techniques, researchers continually seek alternative strategies for satellite image classification so that they may select the most appropriate technique for the feature extraction task at hand. This paper is focused on classification of a satellite image of a particular land cover using the theory of Biogeography Based Optimization. The original BBO algorithm does not have the inbuilt property of clustering, which is required during image classification. Hence modifications have been proposed to the original algorithm and...

  7. Fuzzy One-Class Classification Model Using Contamination Neighborhoods

    Directory of Open Access Journals (Sweden)

    Lev V. Utkin

    2012-01-01

    Full Text Available A fuzzy classification model is studied in the paper. It is based on the contaminated (robust) model, which produces fuzzy expected risk measures characterizing classification errors. Optimal classification parameters of the models are derived by minimizing the fuzzy expected risk. It is shown that an algorithm for computing the classification parameters is reduced to a set of standard support vector machine tasks with weighted data points. Experimental results with synthetic data illustrate the proposed fuzzy model.

  8. Development of an object-based classification model for mapping mountainous forest cover at high elevation using aerial photography

    Science.gov (United States)

    Lateb, Mustapha; Kalaitzidis, Chariton; Tompoulidou, Maria; Gitas, Ioannis

    2016-08-01

    Climate change and overall temperature increase result in changes in forest cover at high elevations. Due to the long life cycle of trees, these changes are very gradual and can be observed over long periods of time. In order to use remote sensing imagery for this purpose, it needs to have very high spatial resolution and to have been acquired at least 50 years ago. At the moment, the only type of remote sensing imagery with these characteristics is historical black-and-white aerial photographs. This study used an aerial photograph from 1945 to map the forest cover of the Olympus National Park at that date. An object-based classification (OBC) model was developed to classify forest and discriminate it from other types of vegetation. Due to the lack of near-infrared information, the model had to rely solely on the tone of the objects, as well as their geometric characteristics. The model operated on three segmentation levels, using sub-/super-object relationships and utilising vegetation density to discriminate forest from non-forest vegetation. The accuracy of the classification was assessed using 503 visually interpreted and randomly distributed points, resulting in 92% overall accuracy. The model uses unbiased parameters that are important for differentiating between forest and non-forest vegetation and should be transferable to other study areas of mountainous forests at high elevations.

  9. Multinomial mixture model with heterogeneous classification probabilities

    Science.gov (United States)

    Holland, M.D.; Gray, B.R.

    2011-01-01

    Royle and Link (Ecology 86(9):2505-2512, 2005) proposed an analytical method that allowed estimation of multinomial distribution parameters and classification probabilities from categorical data measured with error. While useful, we demonstrate algebraically and by simulations that this method yields biased multinomial parameter estimates when the probabilities of correct category classifications vary among sampling units. We address this shortcoming by treating these probabilities as logit-normal random variables within a Bayesian framework. We use Markov chain Monte Carlo to compute Bayes estimates from a simulated sample from the posterior distribution. Based on simulations, this elaborated Royle-Link model yields nearly unbiased estimates of multinomial and correct classification probability estimates when classification probabilities are allowed to vary according to the normal distribution on the logit scale or according to the Beta distribution. The method is illustrated using categorical submersed aquatic vegetation data. © 2010 Springer Science+Business Media, LLC.
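The core problem is easy to demonstrate by simulation: when category labels are recorded with error and the correct-classification probability varies among sampling units (here drawn logit-normally, as in the elaborated model), a naive estimator that ignores misclassification is biased. The two-category setup and all parameter values below are illustrative, not the paper's vegetation data.

```python
import numpy as np

rng = np.random.default_rng(42)
true_pi = 0.3                      # true proportion of category 1 (of 2)
n_units, n_per_unit = 500, 50

naive_est = []
for _ in range(n_units):
    # Correct-classification probability varies by unit (logit-normal spread).
    p_correct = 1 / (1 + np.exp(-rng.normal(1.5, 1.0)))
    truth = rng.random(n_per_unit) < true_pi
    correct = rng.random(n_per_unit) < p_correct
    observed = np.where(correct, truth, ~truth)    # misclassified -> flipped category
    naive_est.append(observed.mean())

bias = np.mean(naive_est) - true_pi
print(f"naive estimate: {np.mean(naive_est):.3f} (true {true_pi}); bias ~ {bias:.3f}")
```

The Bayesian model in the abstract recovers `true_pi` by putting a logit-normal prior on the per-unit classification probabilities and integrating over them with MCMC, which is beyond this sketch.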

  10. Impact of Missing Value Imputation on Classification for DNA Microarray Gene Expression Data—A Model-Based Study

    Directory of Open Access Journals (Sweden)

    Sun Youting

    2009-01-01

    Full Text Available Many missing-value (MV) imputation methods have been developed for microarray data, but only a few studies have investigated the relationship between MV imputation and classification accuracy. Furthermore, these studies are problematic in fundamental steps such as MV generation and classifier error estimation. In this work, we carry out a model-based study that addresses some of the issues in previous studies. Six popular imputation algorithms, two feature selection methods, and three classification rules are considered. The results suggest that it is beneficial to apply MV imputation when the noise level is high, variance is small, or gene-cluster correlation is strong, under small to moderate MV rates. In these cases, if data quality metrics are available, then it may be helpful to consider the data point with poor quality as missing and apply one of the most robust imputation algorithms to estimate the true signal based on the available high-quality data points. However, at large MV rates, we conclude that imputation methods are not recommended. Regarding the MV rate, our results indicate the presence of a peaking phenomenon: performance of imputation methods actually improves initially as the MV rate increases, but after an optimum point, performance quickly deteriorates with increasing MV rates.
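The kind of experiment the study describes can be sketched in miniature: generate labeled data, knock out a fraction of entries, impute, and measure classification accuracy. The sketch below uses per-feature mean imputation and a nearest-centroid classifier on synthetic two-class data; the paper's six imputation algorithms, feature selection, and microarray data model are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 200, 20
y = np.repeat([0, 1], n // 2)
X = rng.normal(0, 1, (n, d)) + y[:, None] * 1.0      # class 1 shifted by +1 per feature

mask = rng.random((n, d)) < 0.2                      # 20% missing-value rate
X_missing = np.where(mask, np.nan, X)

def nearest_centroid_accuracy(X, y):
    """Train and score a nearest-centroid classifier on the same data (sketch only)."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    pred = (np.linalg.norm(X - c1, axis=1) < np.linalg.norm(X - c0, axis=1)).astype(int)
    return (pred == y).mean()

# Mean imputation: replace each NaN with the per-feature mean of the observed values.
col_means = np.nanmean(X_missing, axis=0)
X_imputed = np.where(np.isnan(X_missing), col_means, X_missing)

acc_full = nearest_centroid_accuracy(X, y)
acc_imp = nearest_centroid_accuracy(X_imputed, y)
print(f"accuracy, full data: {acc_full:.2f}; mean-imputed: {acc_imp:.2f}")
```

Sweeping the MV rate in `mask` and plotting accuracy would expose the peaking phenomenon the abstract reports, at least qualitatively.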

  11. Quantitative structure-activity relationship modeling of polycyclic aromatic hydrocarbon mutagenicity by classification methods based on holistic theoretical molecular descriptors.

    Science.gov (United States)

    Gramatica, Paola; Papa, Ester; Marrocchi, Assunta; Minuti, Lucio; Taticchi, Aldo

    2007-03-01

    Various polycyclic aromatic hydrocarbons (PAHs), ubiquitous environmental pollutants, are recognized mutagens and carcinogens. A homogeneous set of mutagenicity data (TA98 and TA100,+S9) for 32 benzocyclopentaphenanthrenes/chrysenes was modeled by the quantitative structure-activity relationship classification methods k-nearest neighbor and classification and regression tree, using theoretical holistic molecular descriptors. Genetic algorithm provided the selection of the best subset of variables for modeling mutagenicity. The models were validated by leave-one-out and leave-50%-out approaches and have good performance, with sensitivity and specificity ranges of 90-100%. Mutagenicity assessment for these PAHs requires only a few theoretical descriptors of their molecular structure.

  12. Sentiment classification technology based on Markov logic networks

    Science.gov (United States)

    He, Hui; Li, Zhigang; Yao, Chongchong; Zhang, Weizhe

    2016-07-01

    With diverse online media emerging, there is growing concern with the sentiment classification problem. At present, text sentiment classification mainly uses supervised machine learning methods, which exhibit a degree of domain dependency. On the basis of Markov logic networks (MLNs), this study proposed a cross-domain, multi-task text sentiment classification method rooted in transfer learning. Through many-to-one knowledge transfer, labeled text sentiment classification knowledge was successfully transferred into other domains, and the precision of sentiment classification analysis in the target domain was improved. The experimental results revealed the following: (1) the MLN-based model demonstrated higher precision than the single individual learning plan model; (2) multi-task transfer learning based on Markov logic networks could acquire more knowledge than self-domain learning. The cross-domain text sentiment classification model could significantly improve the precision and efficiency of text sentiment classification.

  13. Modulation classification based on spectrogram

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    The aim of modulation classification (MC) is to identify the modulation type of a communication signal. It plays an important role in many cooperative and noncooperative communication applications. Three spectrogram-based modulation classification methods are proposed, and their recognition scope and performance are investigated and evaluated by theoretical analysis and extensive simulation studies. The method that uses moment-like features is robust to frequency offset, while the other two, which make use of principal component analysis (PCA) with different transformation inputs, can achieve satisfactory accuracy even at low SNR (as low as 2 dB). Owing to the properties of the spectrogram, the statistical pattern recognition techniques, and the image preprocessing steps, all of our methods are insensitive to unknown phase and frequency offsets, timing errors, and the arriving sequence of symbols.
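The front end shared by these methods, a magnitude spectrogram flattened into a feature vector and projected onto principal components, can be sketched without any signal processing library. The windowed-FFT spectrogram, the SVD-based PCA, and the two toy "modulations" below are illustrative assumptions, not the paper's feature design.

```python
import numpy as np

def spectrogram(signal, win=128, hop=64):
    """Magnitude spectrogram via a Hann-windowed short-time FFT: (frames, bins)."""
    window = np.hanning(win)
    frames = [signal[i:i + win] * window
              for i in range(0, len(signal) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def pca_project(features, k=2):
    """Project row-vectors onto the top-k principal components (via SVD)."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

# Two toy "modulations": a pure carrier vs. an on-off keyed carrier.
sr = 8000
t = np.arange(4096) / sr
carrier = np.sin(2 * np.pi * 1000 * t)
ook = carrier * (np.sin(2 * np.pi * 50 * t) > 0)

feats = np.vstack([spectrogram(carrier).ravel(), spectrogram(ook).ravel()])
proj = pca_project(feats, k=1)
print(proj.shape)   # each signal reduced to a low-dimensional PCA feature
```

A classifier would then be trained on such projections of many labeled signal realizations; with only two samples the projection here is purely illustrative.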

  14. Predicting student satisfaction with courses based on log data from a virtual learning environment – a neural network and classification tree model

    Directory of Open Access Journals (Sweden)

    Ivana Đurđević Babić

    2015-03-01

    Full Text Available Student satisfaction with courses in academic institutions is an important issue and is recognized as a form of support in ensuring effective and quality education, as well as enhancing the student course experience. This paper investigates whether there is a connection between student satisfaction with courses and log data on student courses in a virtual learning environment. Furthermore, it explores whether a successful classification model for predicting student satisfaction with a course can be developed based on course log data, and compares the results obtained from the implemented methods. The research was conducted at the Faculty of Education in Osijek and included analysis of log data and course satisfaction on a sample of third- and fourth-year students. Multilayer perceptron (MLP) networks with different activation functions, radial basis function (RBF) neural networks, and classification tree models were developed, trained and tested in order to classify students into one of two categories of course satisfaction. Type I and type II errors and input-variable importance were used for model comparison, together with classification accuracy. The results indicate that a successful classification model can be created using the tested methods. The MLP model provides the highest average classification accuracy and the lowest tendency to misclassify students with a low level of course satisfaction, although a t-test for the difference in proportions showed that the difference in performance between the compared models is not statistically significant. Student involvement in forum discussions is recognized as a valuable predictor of student satisfaction with courses in all observed models.

  15. The interplay of descriptor-based computational analysis with pharmacophore modeling builds the basis for a novel classification scheme for feruloyl esterases

    DEFF Research Database (Denmark)

    Udatha, D.B.R.K. Gupta; Kouskoumvekaki, Irene; Olsson, Lisbeth

    2011-01-01

    classification studies on FAEs were restricted on sequence similarity and substrate specificity on just four model substrates and considered only a handful of FAEs belonging to the fungal kingdom. This study centers on the descriptor-based classification and structural analysis of experimentally verified...... on amino acid composition and physico-chemical composition descriptors derived from the respective amino acid sequence. A Support Vector Machine model was subsequently constructed for the classification of new FAEs into the pre-assigned clusters. The model successfully recognized 98.2% of the training...... sequences and all the sequences of the blind test. The underlying functionality of the 12 proposed FAE families was validated against a combination of prediction tools and published experimental data. Another important aspect of the present work involves the development of pharmacophore models for the new...

  16. Models for concurrency: towards a classification

    DEFF Research Database (Denmark)

    Sassone, Vladimiro; Nielsen, Mogens; Winskel, Glynn

    1996-01-01

    Models for concurrency can be classified with respect to three relevant parameters: behaviour/ system, interleaving/noninterleaving, linear/branching time. When modelling a process, a choice concerning such parameters corresponds to choosing the level of abstraction of the resulting semantics....... In this paper, we move a step towards a classification of models for concurrency based on the parameters above. Formally, we choose a representative of any of the eight classes of models obtained by varying the three parameters, and we study the formal relationships between them using the language of category...

  17. SQL based cardiovascular ultrasound image classification.

    Science.gov (United States)

    Nandagopalan, S; Suryanarayana, Adiga B; Sudarshan, T S B; Chandrashekar, Dhanalakshmi; Manjunath, C N

    2013-01-01

    This paper proposes a novel method to analyze and classify cardiovascular ultrasound echocardiographic images using a Naïve-Bayesian model via database OLAP-SQL. Efficient data mining algorithms based on a tightly-coupled model are used to extract features. Three algorithms are proposed for classification, namely Naïve-Bayesian Classifier for Discrete variables (NBCD) with SQL, NBCD with OLAP-SQL, and Naïve-Bayesian Classifier for Continuous variables (NBCC) using OLAP-SQL. The proposed model is trained with 207 patient images containing normal and abnormal categories. Out of the three proposed algorithms, the highest classification accuracy of 96.59% was achieved with NBCC, which is better than earlier methods.
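
    The NBCC idea, a Naïve-Bayesian classifier for continuous variables, rests on per-class Gaussian likelihoods. A minimal pure-Python sketch of that core (the paper's OLAP-SQL coupling and echocardiographic features are omitted; the data below are toy values):

```python
import math

def train_gnb(X, y):
    """Estimate per-class mean/variance for each feature plus class priors."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-9)
                     for col, m in zip(zip(*rows), means)]
        model[c] = (n / len(y), means, variances)
    return model

def predict_gnb(model, x):
    """Pick the class maximizing log prior + sum of Gaussian log-likelihoods."""
    best, best_score = None, float("-inf")
    for c, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = c, score
    return best

# Toy data: two well-separated classes in a 2-D feature space.
X = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0], [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
y = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]
model = train_gnb(X, y)
```

    Each feature contributes an independent Gaussian log-likelihood, which is the naïve independence assumption the classifier is named for.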

  18. General regression and representation model for classification.

    Directory of Open Access Journals (Sweden)

    Jianjun Qian

    Full Text Available Recently, the regularized coding-based classification methods (e.g., SRC and CRC) have shown great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take the correlations of the representation residuals into account and develop a general regression and representation model (GRR) for classification. GRR not only has the advantages of CRC, but also makes full use of the prior information (e.g., the correlations between representation residuals and representation coefficients) and the specific information (the weight matrix of image pixels) to enhance classification performance. GRR uses generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel) weights of the test sample. With the proposed model as a platform, we design two classifiers: the basic general regression and representation classifier (B-GRR) and the robust general regression and representation classifier (R-GRR). The experimental results demonstrate the performance advantages of the proposed methods over state-of-the-art algorithms.
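
    A coding-based classifier of the CRC family, which GRR generalizes, can be sketched with generalized Tikhonov (ridge) regularization: code the test sample over all training samples, then assign the class whose atoms reconstruct it best. The dictionary, regularization weight, and data below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def crc_classify(D, labels, x, lam=0.01):
    """Code x over dictionary D (columns = training samples) with ridge
    regularization, then assign the class whose atoms give the smallest
    reconstruction residual."""
    n = D.shape[1]
    # Generalized Tikhonov with identity regularizer: (D^T D + lam I) a = D^T x
    a = np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ x)
    best, best_res = None, np.inf
    for c in set(labels):
        mask = np.array([l == c for l in labels])
        residual = np.linalg.norm(x - D[:, mask] @ a[mask])
        if residual < best_res:
            best, best_res = c, residual
    return best

rng = np.random.default_rng(0)
# Two classes living near different coordinate axes of a 10-D space.
D0 = np.abs(rng.normal(size=(10, 5))) * np.array([[1.0]] * 5 + [[0.01]] * 5)
D1 = np.abs(rng.normal(size=(10, 5))) * np.array([[0.01]] * 5 + [[1.0]] * 5)
D = np.hstack([D0, D1])
labels = [0] * 5 + [1] * 5
```

    GRR additionally learns a non-identity regularizer and per-pixel weights; with the identity regularizer above the model reduces to plain CRC.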

  19. Classification based polynomial image interpolation

    Science.gov (United States)

    Lenke, Sebastian; Schröder, Hartmut

    2008-02-01

    Due to the fast migration of high resolution displays into home and office environments there is a strong demand for high quality picture scaling. This is caused on the one hand by large picture sizes and on the other hand by the enhanced visibility of picture artifacts on these displays [1]. There are many proposals for enhanced spatial interpolation adaptively matched to picture contents, e.g. edges. The drawback of these approaches is the normally integer and often limited interpolation factor. In order to achieve rational factors there exist combinations of adaptive and non-adaptive linear filters, but due to the non-adaptive step the overall quality is notably limited. We present in this paper a content-adaptive polyphase interpolation method which uses "offline" trained filter coefficients and an "online" linear filtering depending on a simple classification of the input situation. Furthermore we present a new approach to a content-adaptive interpolation polynomial, which allows arbitrary polyphase interpolation factors at runtime and further improves the overall interpolation quality. The main goal of our new approach is to optimize interpolation quality by adapting higher-order polynomials directly to the image content. In addition we derive filter constraints for enhanced picture quality. Furthermore we extend the classification-based filtering to the temporal dimension in order to use it for intermediate image interpolation.

  20. Web Text Classification Model Study Based on SAS (基于SAS的web文本分类模型研究)

    Institute of Scientific and Technical Information of China (English)

    向来生; 孙威; 刘希玉

    2016-01-01

    In this paper, we establish a model to perform text classification analysis on the query data of e-commerce customers, helping companies understand users' purchasing habits and helping users find the goods they need promptly. The study first acquires customer query data and preprocesses the text data, then applies an improved TF-IDF method to obtain text feature vectors, and finally establishes a classification model combining Naïve Bayes text classification with a semi-supervised EM iterative algorithm; various criteria are used to evaluate the model and verify its effectiveness. When selecting text features for multi-class text collections, keyword weights are prone to fluctuation, so this study improves the keyword weight calculation formula to improve the classification results. The experimental results show that the classifier achieves a good classification effect.
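
    A baseline version of the described pipeline (TF-IDF term weighting plus a multinomial Naïve Bayes classifier; the paper's improved TF-IDF formula and the semi-supervised EM step are not reproduced) might look like:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Standard TF-IDF (the paper uses an improved weighting; this is the base)."""
    df = Counter(t for doc in docs for t in set(doc))
    n = len(docs)
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs], idf

def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing over raw term counts."""
    vocab = {t for doc in docs for t in doc}
    classes = set(labels)
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, l in zip(docs, labels):
        counts[l].update(doc)
    like = {}
    for c in classes:
        total = sum(counts[c].values()) + alpha * len(vocab)
        like[c] = {t: (counts[c][t] + alpha) / total for t in vocab}
    return prior, like

def predict_nb(prior, like, doc):
    """Maximum a posteriori class under the multinomial model."""
    scores = {}
    for c in prior:
        s = math.log(prior[c])
        for t in doc:
            if t in like[c]:
                s += math.log(like[c][t])
        scores[c] = s
    return max(scores, key=scores.get)

# Toy customer queries, tokenized, with hypothetical category labels.
docs = [["cheap", "phone", "case"], ["phone", "charger", "usb"],
        ["running", "shoes", "sport"], ["sport", "jacket", "winter"]]
labels = ["electronics", "electronics", "apparel", "apparel"]
vecs, idf = tfidf_vectors(docs)
prior, like = train_nb(docs, labels)
```

    The semi-supervised EM extension would iterate: classify unlabeled queries with the current model, then retrain on the (soft-)labeled union until the parameters stabilize.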

  1. Impact of distance-based metric learning on classification and visualization model performance and structure-activity landscapes

    Science.gov (United States)

    Kireeva, Natalia V.; Ovchinnikova, Svetlana I.; Kuznetsov, Sergey L.; Kazennov, Andrey M.; Tsivadze, Aslan Yu.

    2014-02-01

    This study concerns the large margin nearest neighbors classifier and its multi-metric extension as efficient approaches for metric learning, which aim to learn an appropriate distance/similarity function for the considered case studies. In recent years, many studies in data mining and pattern recognition have demonstrated that a learned metric can significantly improve performance in classification, clustering and retrieval tasks. The paper describes the application of the metric learning approach to in silico assessment of chemical liabilities. Chemical liabilities, such as adverse effects and toxicity, play a significant role in the drug discovery process; their in silico assessment is an important step aimed at reducing costs and animal testing by complementing or replacing in vitro and in vivo experiments. Here, to our knowledge for the first time, distance-based metric learning procedures have been applied to in silico assessment of chemical liabilities, the impact of metric learning on structure-activity landscapes and on the predictive performance of the developed models has been analyzed, and the learned metric has been used in support vector machines. The metric learning results have been illustrated using linear and non-linear data visualization techniques in order to indicate how the change of metric affected nearest-neighbor relations and the descriptor space.
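
    The effect of a learned metric on nearest-neighbor classification can be illustrated with a diagonal Mahalanobis-style metric; the scaling below is hand-picked for the toy data, whereas LMNN would learn it from labeled examples:

```python
def knn_predict(X, y, query, k=3, scale=None):
    """k-NN under a diagonal (Mahalanobis-style) metric: distances are computed
    after per-coordinate scaling; scale = (1, 1, ...) recovers Euclidean."""
    scale = scale or [1.0] * len(query)
    def dist(a):
        return sum((s * (ai - qi)) ** 2
                   for s, ai, qi in zip(scale, a, query)) ** 0.5
    nearest = sorted(range(len(X)), key=lambda i: dist(X[i]))[:k]
    votes = [y[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Two classes separated only along the first descriptor; the second is noise.
X = [[0.0, 0.0], [0.2, 3.0], [0.1, -3.0], [2.0, 0.1], [2.2, 2.9], [1.9, -2.8]]
y = ["inactive"] * 3 + ["active"] * 3
# A 'learned' diagonal metric that downweights the noisy second coordinate.
learned = [1.0, 0.1]
```

    Downweighting an uninformative descriptor pulls same-class points together, which is exactly the reshaping of the structure-activity landscape the abstract describes.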

  2. Nonlinear Inertia Classification Model and Application

    Directory of Open Access Journals (Sweden)

    Mei Wang

    2014-01-01

    Full Text Available The classification model of the support vector machine (SVM) overcomes the problem of a big number of samples, but the kernel parameter and the punishment factor have a great influence on the quality of the SVM model. Particle swarm optimization (PSO) is an evolutionary search algorithm based on swarm intelligence, which is suitable for parameter optimization. Accordingly, a nonlinear inertia convergence classification model (NICCM) is proposed after nonlinear inertia convergence PSO (NICPSO) is developed in this paper. The velocity of NICPSO is first defined as the weighted velocity of the inertia PSO, and the inertia factor is selected to be a nonlinear function. NICPSO is used to optimize the kernel parameter and the punishment factor of SVM. Then, the NICCM classifier is trained using the optimal punishment factor and the optimal kernel parameter obtained from the optimal particle. Finally, NICCM is applied to the classification of normal and fault states of online power cable. It is experimentally shown that the number of iterations for the proposed NICPSO to reach the optimal position decreases from 15 to 5 compared with PSO; the training duration is decreased by 0.0052 s and the recognition precision is increased by 4.12% compared with SVM.
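
    A sketch of PSO with a nonlinear inertia factor in the spirit of NICPSO. The inertia schedule and the stand-in objective below are assumptions for illustration; in the paper the objective would be the SVM cross-validation error over the kernel parameter and the punishment factor:

```python
import math
import random

def nic_pso(objective, bounds, n_particles=20, iters=60, seed=1):
    """Minimize `objective` over a box with PSO whose inertia weight decays
    nonlinearly (exponentially) over the iterations."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for t in range(iters):
        w = 0.4 + 0.5 * math.exp(-3.0 * t / iters)  # nonlinear inertia factor
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + 2.0 * r1 * (pbest[i][d] - pos[i][d])
                             + 2.0 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Stand-in objective with a known minimum at C = 10, gamma = 0.1.
obj = lambda p: (p[0] - 10.0) ** 2 + (p[1] - 0.1) ** 2
best, best_val = nic_pso(obj, [(0.1, 100.0), (0.001, 1.0)])
```

    Swapping the toy objective for an SVM cross-validation routine yields the NICCM training loop: run the swarm, then fit the final SVM with the best particle's parameters.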

  3. A Model for Classification Secondary School Student Enrollment Approval Based on E-Learning Management System and E-Games

    Directory of Open Access Journals (Sweden)

    Hany Mohamed El-katary

    2016-02-01

    Full Text Available The student is the key of the educational process, where students' creativity and interactions are strongly encouraged. There are many tools embedded in Learning Management Systems (LMS) that serve to evaluate learners. A problem that currently appears is that the assessment process is not always fair or accurate in classifying students according to accumulated knowledge. Therefore, there is a need for a new model for better decision making about students' enrollment and assessments. The proposed model may run along with an assessment tool within an LMS. It performs analysis and obtains knowledge regarding the classification capability of the assessment process, offering course managers knowledge about the course materials, quizzes, activities and e-games. The proposed model is an accurate assessment tool and thus yields better classification among learners. It was developed for learning management systems commonly used in e-learning in Egyptian language schools, and demonstrated good accuracy compared to real sample data (250 students).

  4. Waste Classification based on Waste Form Heat Generation in Advanced Nuclear Fuel Cycles Using the Fuel-Cycle Integration and Tradeoffs (FIT) Model

    Energy Technology Data Exchange (ETDEWEB)

    Denia Djokic; Steven J. Piet; Layne F. Pincock; Nick R. Soelberg

    2013-02-01

    This study explores the impact of wastes generated from potential future fuel cycles and the issues presented by classifying these under current classification criteria, and discusses the possibility of a comprehensive and consistent characteristics-based classification framework based on new waste streams created from advanced fuel cycles. A static mass flow model, Fuel-Cycle Integration and Tradeoffs (FIT), was used to calculate the composition of waste streams resulting from different nuclear fuel cycle choices. This analysis focuses on the impact of waste form heat load on waste classification practices, although classifying by metrics of radiotoxicity, mass, and volume is also possible. The value of separation of heat-generating fission products and actinides in different fuel cycles is discussed. It was shown that the benefits of reducing the short-term fission-product heat load of waste destined for geologic disposal are neglected under the current source-based radioactive waste classification system, and that it is useful to classify waste streams based on how favorable the impact of interim storage is in increasing repository capacity.

  5. Digital image-based classification of biodiesel.

    Science.gov (United States)

    Costa, Gean Bezerra; Fernandes, David Douglas Sousa; Almeida, Valber Elias; Araújo, Thomas Souto Policarpo; Melo, Jessica Priscila; Diniz, Paulo Henrique Gonçalves Dias; Véras, Germano

    2015-07-01

    This work proposes a simple, rapid, inexpensive, and non-destructive methodology based on digital images and pattern recognition techniques for classification of biodiesel according to oil type (cottonseed, sunflower, corn, or soybean). For this, differing color histograms in RGB (extracted from digital images), HSI, and Grayscale channels, and their combinations, were used as analytical information, which was then statistically evaluated using Soft Independent Modeling by Class Analogy (SIMCA), Partial Least Squares Discriminant Analysis (PLS-DA), and variable selection using the Successive Projections Algorithm associated with Linear Discriminant Analysis (SPA-LDA). Despite good performances by the SIMCA and PLS-DA classification models, SPA-LDA provided better results (up to 95% for all approaches) in terms of accuracy, sensitivity, and specificity for both the training and test sets. The variables selected by the Successive Projections Algorithm clearly contained the information necessary for biodiesel type classification. This is important since a product may exhibit different properties depending on the feedstock used. Such variations directly influence the quality, and consequently the price. Moreover, intrinsic advantages such as quick analysis, requiring no reagents, and a noteworthy reduction (the avoidance of chemical characterization) of waste generation, all contribute towards the primary objective of green chemistry.
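
    The feature-extraction step can be illustrated with per-channel RGB histograms; a nearest-class-mean rule stands in for the SIMCA/PLS-DA/SPA-LDA classifiers, and the "soybean"/"corn" pixel colors below are toy values, not measured biodiesel data:

```python
def rgb_histogram(pixels, bins=4):
    """Concatenated per-channel histogram of an image given as (r, g, b)
    tuples with values in 0..255, normalized by pixel count."""
    hist = [0] * (3 * bins)
    width = 256 // bins
    for r, g, b in pixels:
        hist[min(r // width, bins - 1)] += 1
        hist[bins + min(g // width, bins - 1)] += 1
        hist[2 * bins + min(b // width, bins - 1)] += 1
    total = len(pixels)
    return [h / total for h in hist]

def nearest_mean_class(train_feats, train_labels, feat):
    """Stand-in classifier: nearest class-mean in histogram space."""
    best, best_d = None, float("inf")
    for c in set(train_labels):
        feats = [f for f, l in zip(train_feats, train_labels) if l == c]
        mean = [sum(col) / len(feats) for col in zip(*feats)]
        d = sum((a - b) ** 2 for a, b in zip(feat, mean))
        if d < best_d:
            best, best_d = c, d
    return best

# Toy "images": yellowish vs reddish pixel sets with hypothetical oil labels.
yellow = [(220, 200, 40)] * 50 + [(200, 180, 60)] * 50
red = [(200, 80, 40)] * 50 + [(180, 60, 60)] * 50
feats = [rgb_histogram(yellow), rgb_histogram(red)]
labels = ["soybean", "corn"]
```

    The actual study concatenates histograms from several color spaces and lets SPA pick the informative bins before LDA; the histogram extraction itself is the same idea.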

  6. Cluster-based adaptive metric classification

    NARCIS (Netherlands)

    Giotis, Ioannis; Petkov, Nicolai

    2012-01-01

    Introducing adaptive metric has been shown to improve the results of distance-based classification algorithms. Existing methods are often computationally intensive, either in the training or in the classification phase. We present a novel algorithm that we call Cluster-Based Adaptive Metric (CLAM) c

  7. Ontology-Based Classification System Development Methodology

    Directory of Open Access Journals (Sweden)

    Grabusts Peter

    2015-12-01

    Full Text Available The aim of the article is to analyse and develop an ontology-based classification system methodology that uses decision tree learning with statement propositionalized attributes. Classical decision tree learning algorithms, as well as decision tree learning with taxonomy and propositionalized attributes have been observed. Thus, domain ontology can be extracted from the data sets and can be used for data classification with the help of a decision tree. The use of ontology methods in decision tree-based classification systems has been researched. Using such methodologies, the classification accuracy in some cases can be improved.
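
    The decision tree learning at the core of the methodology can be sketched with a plain ID3-style tree over nominal attributes; the animal-taxonomy data stands in for propositionalized ontology statements:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels):
    """Attribute with the highest information gain (the ID3 criterion)."""
    base = entropy(labels)
    def gain(a):
        g = base
        for v in {r[a] for r in rows}:
            sub = [l for r, l in zip(rows, labels) if r[a] == v]
            g -= len(sub) / len(labels) * entropy(sub)
        return g
    return max(rows[0], key=gain)

def build_tree(rows, labels):
    """Recursively split on the best attribute until a node is pure."""
    if len(set(labels)) == 1:
        return labels[0]
    a = best_attribute(rows, labels)
    node = {"attr": a, "branches": {}}
    for v in {r[a] for r in rows}:
        sub_rows = [r for r in rows if r[a] == v]
        sub_labels = [l for r, l in zip(rows, labels) if r[a] == v]
        node["branches"][v] = build_tree(sub_rows, sub_labels)
    return node

def classify(tree, row):
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["attr"]]]
    return tree

# Toy data: attributes could come from propositionalized taxonomy
# statements such as "has-covering fur".
rows = [{"covering": "fur", "aquatic": "no"}, {"covering": "fur", "aquatic": "yes"},
        {"covering": "scales", "aquatic": "yes"}, {"covering": "scales", "aquatic": "yes"}]
labels = ["mammal", "mammal", "fish", "fish"]
tree = build_tree(rows, labels)
```

    In the ontology-based variant, attributes derived from the domain taxonomy compete on information gain just like the raw attributes, which is how the extracted ontology can improve classification accuracy.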

  8. Computerized classification testing with the Rasch model

    NARCIS (Netherlands)

    Eggen, Theo J.H.M.

    2011-01-01

    If classification in a limited number of categories is the purpose of testing, computerized adaptive tests (CATs) with algorithms based on sequential statistical testing perform better than estimation-based CATs (e.g., Eggen & Straetmans, 2000). In these computerized classification tests (CCTs), the

  9. Simultaneous data pre-processing and SVM classification model selection based on a parallel genetic algorithm applied to spectroscopic data of olive oils.

    Science.gov (United States)

    Devos, Olivier; Downey, Gerard; Duponchel, Ludovic

    2014-04-01

    Classification is an important task in chemometrics. For several years now, support vector machines (SVMs) have proven to be powerful for infrared spectral data classification. However such methods require optimisation of parameters in order to control the risk of overfitting and the complexity of the boundary. Furthermore, it is established that the prediction ability of classification models can be improved using pre-processing in order to remove unwanted variance in the spectra. In this paper we propose a new methodology based on a genetic algorithm (GA) for the simultaneous optimisation of SVM parameters and pre-processing (GENOPT-SVM). The method has been tested for the discrimination of the geographical origin of Italian olive oil (Ligurian and non-Ligurian) on the basis of near infrared (NIR) or mid infrared (FTIR) spectra. Different classification models (PLS-DA, SVM with mean-centred data, GENOPT-SVM) have been tested and statistically compared using McNemar's test. For the two datasets, SVM with optimised pre-processing gives models with higher accuracy than the one obtained with PLS-DA on pre-processed data. In the case of the NIR dataset, most of this accuracy improvement (86.3% compared with 82.8% for PLS-DA) occurred using only a single pre-processing step. For the FTIR dataset, three optimised pre-processing steps are required to obtain an SVM model with a significant accuracy improvement (82.2%) compared to the one obtained with PLS-DA (78.6%). Furthermore, this study demonstrates that even SVM models have to be developed on the basis of well-corrected spectral data in order to obtain higher classification rates.

  10. A semi-empirical library of galaxy spectra for Gaia classification based on SDSS data and PÉGASE models

    Science.gov (United States)

    Tsalmantza, P.; Karampelas, A.; Kontizas, M.; Bailer-Jones, C. A. L.; Rocca-Volmerange, B.; Livanou, E.; Bellas-Velidis, I.; Kontizas, E.; Vallenari, A.

    2012-01-01

    Aims: This paper is the third in a series implementing a classification system for Gaia observations of unresolved galaxies. The system makes use of template galaxy spectra in order to determine spectral classes and estimate intrinsic astrophysical parameters. In previous work we used synthetic galaxy spectra produced by PÉGASE.2 code to simulate Gaia observations and to test the performance of support vector machine (SVM) classifiers and parametrizers. Here we produce a semi-empirical library of galaxy spectra by fitting SDSS spectra with the previously produced synthetic libraries. We present (1) the semi-empirical library of galaxy spectra; (2) a comparison between the observed and synthetic spectra; and (3) first results of classification and parametrization experiments with simulated Gaia spectrophotometry of this library. Methods: We use χ2-fitting to fit SDSS galaxy spectra with the synthetic library in order to construct a semi-empirical library of galaxy spectra in which (1) the real spectra are extended by the synthetic ones in order to cover the full wavelength range of Gaia; and (2) astrophysical parameters are assigned to the SDSS spectra by the best fitting synthetic spectrum. The SVM models were trained with and applied to semi-empirical spectra. Tests were performed for the classification of spectral types and the estimation of the most significant galaxy parameters (in particular redshift, mass to light ratio and star formation history). Results: We produce a semi-empirical library of 33 670 galaxy spectra covering the wavelength range 250 to 1050 nm at a sampling of 1 nm or less. Using the results of the fitting of the SDSS spectra with our synthetic library, we investigate the range of the input model parameters that produces spectra which are in good agreement with observations. In general the results are very good for the majority of the synthetic spectra of early type, spiral and irregular galaxies, while they reveal problems in the models

  11. Knowledge-Based Classification in Automated Soil Mapping

    Institute of Scientific and Technical Information of China (English)

    ZHOU BIN; WANG RENCHAO

    2003-01-01

    A machine-learning approach was developed for automated building of knowledge bases for soil resources mapping by using a classification tree to generate knowledge from training data. With this method, building a knowledge base for automated soil mapping was easier than using the conventional knowledge acquisition approach. The knowledge base built by the classification tree was used by the knowledge classifier to perform the soil type classification of Longyou County, Zhejiang Province, China using Landsat TM bi-temporal images and GIS data. To evaluate the performance of the resultant knowledge bases, the classification results were compared to an existing soil map based on a field survey. The accuracy assessment and analysis of the resultant soil maps suggested that the knowledge base built by the machine-learning method was of good quality for mapping the distribution model of soil classes over the study area.

  12. Parallel Implementation of Classification Algorithms Based on Cloud Computing Environment

    Directory of Open Access Journals (Sweden)

    Wenbo Wang

    2012-09-01

    Full Text Available As an important task of data mining, classification has received considerable attention in many applications, such as information retrieval, web searching, etc. The enlarging volume of information emerging from the progress of technology and the growing individual needs of data mining make classifying very large scales of data a challenging task. In order to deal with this problem, many researchers try to design efficient parallel classification algorithms. This paper briefly introduces classification algorithms and cloud computing, analyses the shortcomings of present parallel classification algorithms on that basis, and then presents a new model of parallel classification algorithms. It mainly introduces a parallel Naïve Bayes classification algorithm based on MapReduce, which is a simple yet powerful parallel programming technique. The experimental results demonstrate that the proposed algorithm improves on the original algorithm's performance, and that it can process large datasets efficiently on commodity hardware.
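
    The MapReduce formulation of Naïve Bayes training reduces to a word-count-style job: mappers emit per-class term counts and reducers sum them into the sufficient statistics of the model. A single-process sketch of that contract (no Hadoop specifics, toy data):

```python
from collections import defaultdict

def map_phase(doc_label_pairs):
    """Mapper: emit ((class, term), 1) pairs plus a per-class document-count
    key, exactly as a word-count-style MapReduce job would."""
    for doc, label in doc_label_pairs:
        yield (label, "__doc__"), 1
        for term in doc:
            yield (label, term), 1

def reduce_phase(pairs):
    """Reducer: sum the counts for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

# Each split of the corpus would run its own mapper; here we shuffle by hand.
split1 = [(["spam", "offer"], "junk"), (["meeting", "notes"], "ham")]
split2 = [(["free", "offer"], "junk")]
counts = reduce_phase(list(map_phase(split1)) + list(map_phase(split2)))
```

    The reduced counts are all that is needed to form the Naïve Bayes priors and smoothed likelihoods, so the expensive pass over the data parallelizes cleanly across splits.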

  13. Behavior Based Social Dimensions Extraction for Multi-Label Classification.

    Science.gov (United States)

    Li, Le; Xu, Junyi; Xiao, Weidong; Ge, Bin

    2016-01-01

    Classification based on social dimensions is commonly used to handle the multi-label classification task in heterogeneous networks. However, traditional methods, which mostly rely on the community detection algorithms to extract the latent social dimensions, produce unsatisfactory performance when community detection algorithms fail. In this paper, we propose a novel behavior based social dimensions extraction method to improve the classification performance in multi-label heterogeneous networks. In our method, nodes' behavior features, instead of community memberships, are used to extract social dimensions. By introducing Latent Dirichlet Allocation (LDA) to model the network generation process, nodes' connection behaviors with different communities can be extracted accurately, which are applied as latent social dimensions for classification. Experiments on various public datasets reveal that the proposed method can obtain satisfactory classification results in comparison to other state-of-the-art methods on smaller social dimensions.

  14. Classification using Hierarchical Naive Bayes models

    DEFF Research Database (Denmark)

    Langseth, Helge; Dyhre Nielsen, Thomas

    2006-01-01

    Classification problems have a long history in the machine learning literature. One of the simplest, and yet most consistently well-performing set of classifiers is the Naïve Bayes models. However, an inherent problem with these classifiers is the assumption that all attributes used to describe...... an instance are conditionally independent given the class of that instance. When this assumption is violated (which is often the case in practice) it can reduce classification accuracy due to “information double-counting” and interaction omission. In this paper we focus on a relatively new set of models...... in the context of classification. Experimental results show that the learned models can significantly improve classification accuracy as compared to other frameworks....

  15. Fuzzy Rule Base System for Software Classification

    Directory of Open Access Journals (Sweden)

    Adnan Shaout

    2013-07-01

    Full Text Available Given the central role that software development plays in the delivery and application of information technology, managers have been focusing on process improvement in the software development area. This improvement has increased the demand for software measures, or metrics, to manage the process. These metrics provide a quantitative basis for the development and validation of models during the software development process. In this paper a fuzzy rule-based system is developed to classify Java applications using object-oriented metrics. The system contains the following features: an automated method to extract the OO metrics from the source code; a default/base set of rules that can be easily configured via an XML file so that companies, developers, team leaders, etc. can modify the set of rules according to their needs; implementation of a framework so that new metrics, fuzzy sets and fuzzy rules can be added or removed depending on the needs of the end user; general classification of the software application and fine-grained classification of the Java classes based on OO metrics; and two interfaces provided for the system: GUI and command line.

  16. Structural Equation Modeling of Classification Managers Based on the Communication Skills and Cultural Intelligence in Sport Organizations

    Directory of Open Access Journals (Sweden)

    Rasool NAZARI

    2015-03-01

    Full Text Available The purpose of this research was to develop a structural equation model for classifying managers based on communication skills and cultural intelligence in Isfahan sport organizations. The study therefore used structural equation modeling. The statistical population consisted of the province's sports administrators, which according to official statistics numbered 550 people. Cochran's formula was used to determine the sample size, and a sample of 207 subjects was randomly selected. The measurement instruments were the Communication Skills questionnaire (reliability 0.81), the Cultural Intelligence Scale (0.85), and the manager classification questionnaire (0.86). For descriptive and inferential statistical analysis, SPSS and LISREL were used. The model relating communication skills, cultural intelligence and athletic directors' classification showed a good fit (RMSEA = 0.037, GFI = 0.902, AGFI = 0.910, NFI = 0.912). Proper planning to improve managers' communication skills and cultural intelligence is therefore an essential prerequisite, and authorities should take these factors into account when selecting managers for analyst and intuitive management positions, since such managers can be expected to bring a clearer perspective to sport.

  17. Texture Classification Using Sparse Frame-Based Representations

    Directory of Open Access Journals (Sweden)

    Skretting Karl

    2006-01-01

    Full Text Available A new method for supervised texture classification, denoted by frame texture classification method (FTCM, is proposed. The method is based on a deterministic texture model in which a small image block, taken from a texture region, is modeled as a sparse linear combination of frame elements. FTCM has two phases. In the design phase a frame is trained for each texture class based on given texture example images. The design method is an iterative procedure in which the representation error, given a sparseness constraint, is minimized. In the classification phase each pixel in a test image is labeled by analyzing its spatial neighborhood. This block is represented by each of the frames designed for the texture classes under consideration, and the frame giving the best representation gives the class. The FTCM is applied to nine test images of natural textures commonly used in other texture classification work, yielding excellent overall performance.
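
    The classification phase of FTCM can be sketched in a few lines: represent a block over each class's trained frame and pick the class with the smallest representation error. Plain least squares replaces the sparseness-constrained representation here, and the two "texture" frames below are toy constructions rather than trained dictionaries:

```python
import numpy as np

def frame_residual(F, x):
    """Residual of representing x over frame F (columns = atoms) via least
    squares; FTCM would instead minimize under a sparseness constraint."""
    coeff, *_ = np.linalg.lstsq(F, x, rcond=None)
    return np.linalg.norm(x - F @ coeff)

def ftcm_classify(frames, x):
    """Assign the class whose frame represents the block best."""
    return min(frames, key=lambda c: frame_residual(frames[c], x))

rng = np.random.default_rng(42)
# Two toy 'texture' frames: smooth (low-frequency) vs oscillating atoms.
n = 16
smooth = np.stack([np.linspace(0, 1, n), np.linspace(1, 0, n) ** 2], axis=1)
striped = np.stack([np.cos(np.arange(n) * np.pi),
                    np.cos(np.arange(n) * np.pi / 2)], axis=1)
frames = {"smooth": smooth, "striped": striped}
# A noisy block drawn from the oscillating texture.
block = np.cos(np.arange(n) * np.pi) + 0.05 * rng.normal(size=n)
```

    In the full method this residual test is run on the spatial neighborhood of every pixel, so each pixel is labeled by the frame that best explains its surrounding block.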

  18. Knowledge discovery from patients’ behavior via clustering-classification algorithms based on weighted eRFM and CLV model: An empirical study in public health care services

    Science.gov (United States)

    Zare Hosseini, Zeinab; Mohammadzadeh, Mahdi

    2016-01-01

    The rapid growth of information technology (IT) motivates and creates competitive advantages in the health care industry. Nowadays, many hospitals try to build successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction, and finally maximize their profitability. Many hospitals have large data warehouses containing customer demographic and transaction information, and data mining techniques can be used to analyze this data and discover hidden knowledge about customers. This research develops an extended RFM model, namely RFML (added parameter: Length), based on health care services for a public-sector hospital in Iran, with the idea that there is a contrast between patient and customer loyalty, to estimate customer lifetime value (CLV) for each patient. We used the Two-step and K-means algorithms as clustering methods and a decision tree (CHAID) as the classification technique to segment the patients and find target, potential and loyal customers in order to strengthen CRM. Two approaches are used for classification: first, the result of clustering is considered as the decision attribute in the classification process; second, the result of segmentation based on the CLV value of patients (estimated by RFML) is considered as the decision attribute. Finally, the results of the CHAID algorithm show the significant hidden rules and identify existing patterns of hospital consumers. PMID:27610177
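
    The weighted-RFML scoring and clustering steps can be sketched as follows; the weights, the 1-D k-means simplification and the patient records are illustrative assumptions, not the paper's parameters:

```python
def rfml_clv(recency, frequency, monetary, length, weights=(0.25, 0.25, 0.3, 0.2)):
    """Weighted RFML score as a CLV proxy: lower recency is better, so it
    enters negated. The weights here are illustrative, not from the paper."""
    wr, wf, wm, wl = weights
    return -wr * recency + wf * frequency + wm * monetary + wl * length

def kmeans_1d(values, k=2, iters=20):
    """Tiny 1-D k-means to split patients into low/high CLV segments."""
    centers = [min(values), max(values)][:k]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            groups[min(range(k), key=lambda i: abs(v - centers[i]))].append(v)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, [min(range(k), key=lambda i: abs(v - centers[i]))
                     for v in values]

# (recency in months, visit count, spend in arbitrary units, length in years)
patients = [(1, 12, 8.0, 5), (2, 10, 7.5, 4), (10, 1, 0.5, 1), (12, 2, 1.0, 1)]
clv = [rfml_clv(*p) for p in patients]
centers, segments = kmeans_1d(clv)
```

    The resulting segment labels would then serve as the decision attribute for the CHAID classification step described in the abstract.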

  20. Obtaining Diagnostic Classification Model Estimates Using Mplus

    Science.gov (United States)

    Templin, Jonathan; Hoffman, Lesa

    2013-01-01

    Diagnostic classification models (aka cognitive or skills diagnosis models) have shown great promise for evaluating mastery on a multidimensional profile of skills as assessed through examinee responses, but continued development and application of these models has been hindered by a lack of readily available software. In this article we…

  1. Ontology-Based Classification System Development Methodology

    OpenAIRE

    2015-01-01

    The aim of the article is to analyse and develop an ontology-based classification system methodology that uses decision tree learning with statement propositionalized attributes. Classical decision tree learning algorithms, as well as decision tree learning with taxonomy and propositionalized attributes have been observed. Thus, domain ontology can be extracted from the data sets and can be used for data classification with the help of a decision tree. The use of ontology methods in decision ...

  2. An Authentication Technique Based on Classification

    Institute of Scientific and Technical Information of China (English)

    李钢; 杨杰

    2004-01-01

    We present a novel watermarking approach based on classification for authentication, in which a watermark is embedded into the host image. When the marked image is modified, the extracted watermark differs from the original watermark, and different kinds of modification lead to different extracted watermarks. In this paper, different kinds of modification are treated as classes, and a classification algorithm is used to recognize the modifications with high probability. Simulation results show that the proposed method is promising and effective.

  3. Use of topographic and climatological models in a geographical data base to improve Landsat MSS classification for Olympic National Park

    Science.gov (United States)

    Cibula, William G.; Nyquist, Maurice O.

    1987-01-01

    An unsupervised computer classification of vegetation/landcover of Olympic National Park and surrounding environs was initially carried out using four bands of Landsat MSS data. The primary objective of the project was to derive a level of landcover classification useful for park management applications while maintaining an acceptably high level of classification accuracy. Initially, nine generalized vegetation/landcover classes were derived, with an overall classification accuracy of 91.7 percent. In an attempt to refine the level of classification, a geographic information system (GIS) approach was employed. Topographic data and watershed boundary (inferred precipitation/temperature) data were registered with the Landsat MSS data. The resulting Boolean operations yielded 21 vegetation/landcover classes while maintaining the same level of classification accuracy. The final classification provided much better identification and location of the major forest types within the park at the same high level of accuracy, meeting the project objective. This classification can now serve as input to a GIS which, coupled with other ancillary data, can help answer park management questions in programs such as fire management.

  4. Text Opinion Classification Method Based on Emotion Model

    Institute of Scientific and Technical Information of China (English)

    罗邦慧; 曾剑平; 段江娇; 吴承荣

    2015-01-01

    Traditional text opinion classification models such as the vector space model and latent semantic analysis map text into a vocabulary or semantic space, focusing on the discriminative power of words; they cannot give a clear semantic description of the mapped space, which limits their scalability and accuracy. To address this, based on the theory of human emotion classification in psychology, this paper assumes a strong association between the opinions expressed in text and human emotions. Combining lexical semantic extension and feature selection methods, three emotional representation models are constructed that map documents expressing human emotional tendencies into an emotion space. Emotion features are extracted from foreign stock-forum messages using these representation models to build the emotion space model, and a text opinion classification method is designed on top of it. Experiments on data from an actual stock forum show that the proposed classification method achieves high classification accuracy.

  5. Key-phrase based classification of public health web pages.

    Science.gov (United States)

    Dolamic, Ljiljana; Boyer, Célia

    2013-01-01

    This paper describes and evaluates a public health web page classification model based on key-phrase extraction and matching. Easily extendable both to new classes and to new languages, this method proves to be a good solution for text classification in the face of a total lack of training data. To evaluate the proposed solution we used a small collection of public-health-related web pages created by a double-blind manual classification. Our experiments have shown that by choosing an adequate threshold value, the desired level of either precision or recall can be achieved.
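The key-phrase matching idea above can be sketched with a few lines of code: each class carries a list of key phrases, and a page is assigned every class whose phrase-match score clears a threshold. The classes, phrases, and scoring rule here are illustrative assumptions, not the paper's actual lexicon.

```python
# Hedged sketch of key-phrase based classification (illustrative phrases).
CLASS_PHRASES = {
    "vaccination": ["vaccine", "immunization", "booster dose"],
    "nutrition": ["balanced diet", "vitamin", "calorie intake"],
}

def classify(text, threshold=0.34):
    """Return classes whose fraction of matched key phrases >= threshold."""
    text = text.lower()
    labels = []
    for label, phrases in CLASS_PHRASES.items():
        hits = sum(1 for p in phrases if p in text)
        if hits / len(phrases) >= threshold:
            labels.append(label)
    return labels

page = "The clinic recommends a booster dose of the flu vaccine each autumn."
print(classify(page))  # vaccination phrases match; nutrition ones do not
```

Raising the threshold trades recall for precision, which is the tuning knob the abstract refers to.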

  6. Support vector classification algorithm based on variable parameter linear programming

    Institute of Scientific and Technical Information of China (English)

    Xiao Jianhua; Lin Jian

    2007-01-01

    To solve the problems SVM faces with large sample sizes and asymmetrically distributed samples, a support vector classification algorithm based on variable-parameter linear programming is proposed. In the proposed algorithm, linear programming is employed to solve the classification optimization problem, decreasing computation time and complexity compared with the original model. The adjusted punishment parameter greatly reduces the classification error caused by asymmetrically distributed samples. The detailed procedure of the proposed algorithm is given, and an experiment is conducted to verify that the proposed algorithm is suitable for asymmetrically distributed samples.

  7. Distance-based features in pattern classification

    Directory of Open Access Journals (Sweden)

    Lin Wei-Yang

    2011-01-01

    In data mining and pattern classification, feature extraction and representation methods are a very important step, since the extracted features have a direct and significant impact on classification accuracy. In the literature, a number of novel feature extraction and representation methods have been proposed; however, many of them focus only on specific domain problems. In this article, we introduce a novel distance-based feature extraction method for various pattern classification problems. Specifically, two distances are extracted, based on (1) the distance between the data and its intra-cluster center and (2) the distance between the data and its extra-cluster centers. Experiments based on ten datasets containing different numbers of classes, samples, and dimensions are examined. The experimental results using naïve Bayes, k-NN, and SVM classifiers show that concatenating the distance-based features to the original features provided by the datasets can improve classification accuracy, except for image-related datasets. In particular, the distance-based features are suitable for datasets with smaller numbers of classes, smaller numbers of samples, and lower feature dimensionality. Moreover, two datasets with similar characteristics are further used to validate this finding. The result is consistent with the first experiment: adding the distance-based features can improve classification performance.
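The two distance-based features described above can be computed as in the following sketch: per-class centers are estimated from the training data, and each sample is augmented with (1) its distance to its own-class (intra-cluster) center and (2) its distances to the other classes' (extra-cluster) centers. The synthetic data and two-class setup are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of distance-based feature extraction on synthetic 3-D data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 3)), rng.normal(5, 1, (20, 3))])
y = np.array([0] * 20 + [1] * 20)

# One center per class (here a class is treated as a single cluster).
centers = {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def distance_features(X, y, centers):
    classes = sorted(centers)
    rows = []
    for xi, yi in zip(X, y):
        intra = np.linalg.norm(xi - centers[yi])                  # own center
        extra = [np.linalg.norm(xi - centers[c]) for c in classes if c != yi]
        rows.append([intra, *extra])
    # Concatenate distance features onto the original feature vectors.
    return np.hstack([X, np.array(rows)])

X_aug = distance_features(X, y, centers)
print(X.shape, "->", X_aug.shape)   # 3 original + 1 intra + 1 extra feature
```

The augmented matrix would then be fed to any standard classifier (naïve Bayes, k-NN, SVM), as in the article's experiments.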

  8. Research on the Military Vehicle Classification Algorithm Based on Modeling and Simulation

    Institute of Scientific and Technical Information of China (English)

    马云飞

    2014-01-01

    Recognition and classification of military vehicles is an important research topic in battlefield information acquisition. To collect data and study military vehicle classification algorithms, real field experiments are commonly used, but they are time-consuming and expensive. In this paper, tank, armored vehicle, and truck models are built in a virtual battlefield simulation platform. The noise, magnetic field, and vibration signals of the military vehicles in the simulation environment are collected and used as sample data for classification-algorithm research. A military vehicle classification algorithm based on a one-to-one multi-class SVM is then designed, together with a classifier parameter-tuning strategy based on cross-validation. Experiments show that, compared with the AdaBoost algorithm, the proposed algorithm achieves higher classification accuracy on military vehicles.

  9. Texture Classification based on Gabor Wavelet

    Directory of Open Access Journals (Sweden)

    Amandeep Kaur

    2012-07-01

    This paper presents a comparison of texture classification algorithms based on Gabor wavelets. The focus of this paper is the feature extraction scheme for texture classification. The texture features of an image can be classified using texture descriptors; here we use the homogeneous texture descriptor, which is built on Gabor wavelets. For texture classification, we use an online texture database, Brodatz's database, and three well-known classifiers: support vector machine, k-nearest neighbor, and decision tree induction. The results show that classification using support vector machines gives better results than the other classifiers and can accurately discriminate between testing and training image data.
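A Gabor-based texture feature of the kind used above can be sketched as follows: build one Gabor kernel, filter the image, and take the mean and standard deviation of the response magnitude as a (greatly reduced) homogeneous-texture-style descriptor. The kernel parameters and the synthetic stripe image are illustrative assumptions.

```python
import numpy as np

# Hedged sketch: one Gabor kernel and its filter response statistics.
def gabor_kernel(size=9, wavelength=4.0, theta=0.0, sigma=2.0):
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    xr = xs * np.cos(theta) + ys * np.sin(theta)     # rotate coordinates
    envelope = np.exp(-(xs**2 + ys**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength)    # oscillation along xr
    return envelope * carrier

def filter_image(img, kern):
    """Valid-mode 2-D correlation (slow but dependency-free)."""
    kh, kw = kern.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kern)
    return out

# Synthetic texture: vertical stripes of period 4 match the theta=0 kernel.
img = np.tile(np.sin(2 * np.pi * np.arange(32) / 4.0), (32, 1))
resp = np.abs(filter_image(img, gabor_kernel(theta=0.0)))
feature = (resp.mean(), resp.std())
print([round(v, 2) for v in feature])
```

A full descriptor would repeat this over a bank of scales and orientations; the resulting statistics vector is what the SVM, k-NN, or decision tree classifier consumes.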

  10. Inventory classification based on decoupling points

    Directory of Open Access Journals (Sweden)

    Joakim Wikner

    2015-01-01

    The ideal state of continuous one-piece flow may never be achieved. Still, the logistics manager can improve the flow by carefully positioning inventory to buffer against variations. Strategies such as lean, postponement, mass customization, and outsourcing all rely on strategic positioning of decoupling points to separate forecast-driven from customer-order-driven flows. Planning and scheduling of the flow are also based on classification of decoupling points as master scheduled or not. A comprehensive classification scheme for these types of decoupling points is introduced. The approach rests on identifying flows as either demand based or supply based. The demand or supply is then combined with exogenous factors, classified as independent, or endogenous factors, classified as dependent. As a result, eight types of strategic as well as tactical decoupling points are identified, resulting in a process-based framework for inventory classification that can be used for flow design.

  11. Blood-based gene expression profiles models for classification of subsyndromal symptomatic depression and major depressive disorder.

    Science.gov (United States)

    Yi, Zhenghui; Li, Zezhi; Yu, Shunying; Yuan, Chengmei; Hong, Wu; Wang, Zuowei; Cui, Jian; Shi, Tieliu; Fang, Yiru

    2012-01-01

    Subsyndromal symptomatic depression (SSD) is a subtype of subthreshold depression that, like major depressive disorder (MDD), leads to significant psychosocial functional impairment. Several studies have suggested that SSD is a transitory phenomenon in the depression spectrum and is thus considered a subtype of depression. However, the pathophysiology of depression remains largely obscure, and studies on SSD are limited. The present study compared expression profiles and performed classification on leukocytes using whole-genome cRNA microarrays among drug-free first-episode subjects with SSD, subjects with MDD, and matched controls (8 subjects in each group). Support vector machines (SVMs) were utilized for training and testing on candidate signature expression profiles from the signature selection step. First, we identified 63 differentially expressed SSD signatures relative to controls (P ≤ 5.0E-4) and 30 differentially expressed MDD signatures relative to controls, respectively. Then, 123 gene signatures were identified with significantly different expression levels between SSD and MDD. Second, to jointly prioritize biomarkers for SSD and MDD, we selected top gene signatures from each pairwise comparison and merged them to generate profiles that clearly classify the SSD and MDD sets at the same time. In detail, we tried different combinations of signatures from the three pairwise comparison results and finally determined 48 gene expression signatures with 100% accuracy. Our findings suggest that SSD and MDD do not exhibit the same expressed genome signature in peripheral blood leukocytes, and blood-cell-derived RNA from this 48-gene model may have significant value for performing diagnostic functions and classifying SSD, MDD, and healthy controls.

  12. Full-polarization radar remote sensing and data mining for tropical crops mapping: a successful SVM-based classification model

    Science.gov (United States)

    Denize, J.; Corgne, S.; Todoroff, P.; LE Mezo, L.

    2015-12-01

    In Reunion, a tropical island of 2,512 km², 700 km east of Madagascar in the Indian Ocean and constrained by a rugged relief, agricultural sectors compete in highly fragmented agricultural land comprising heterogeneous farming systems, from corporate to small-scale farming. Policymakers, planners and institutions are in dire need of reliable and updated land use references. Conventional land use mapping methods are inefficient in the tropics, with frequent cloud cover and loosely synchronized vegetative crop cycles due to the constant temperature. This study aims to provide an appropriate method for the identification and mapping of tropical crops by remote sensing. For this purpose, we assess the potential of polarimetric SAR imagery associated with machine learning algorithms. The method was developed and tested on a 25 km x 25 km study area using 6 full-polarization RADARSAT-2 images acquired in 2014. A set of radar indicators (backscatter coefficient, band ratios, indices, polarimetric decompositions (Freeman-Durden, Van Zyl, Yamaguchi, Cloude and Pottier, Krogager), texture, etc.) was calculated from the coherency matrix. A random forest procedure allowed the selection of the most important variables in each image, reducing the dimension of the dataset and the processing time. Support Vector Machines (SVM) then classified these indicators based on a learning database created from field observations in 2013. The method shows an overall accuracy of 88% with a Kappa index of 0.82 for the identification of four major crops.

  13. Visual words based approach for tissue classification in mammograms

    Science.gov (United States)

    Diamant, Idit; Goldberger, Jacob; Greenspan, Hayit

    2013-02-01

    The presence of microcalcifications (MC) is an important indicator for developing breast cancer. Additional indicators of cancer risk exist, such as breast tissue density type. Different methods have been developed for breast tissue classification for use in computer-aided diagnosis systems. Recently, the visual words (VW) model has been successfully applied to different classification tasks. The goal of our work is to explore VW-based methodologies for various mammography classification tasks. We start with the challenge of classifying breast density and then focus on classification of normal tissue versus microcalcifications. The presented methodology is based on a patch-based visual words model, which includes building a dictionary for a training set using local descriptors and representing each image by a visual word histogram. Classification is then performed using k-nearest-neighbour (KNN) and support vector machine (SVM) classifiers. We tested our algorithm on the publicly available MIAS and DDSM datasets. The input is a representative region of interest per mammography image, manually selected and labelled by an expert. In the tissue density task, classification accuracy reached 85% using KNN and 88% using SVM, which competes with state-of-the-art results. For MC vs. normal tissue, accuracy reached 95.6% using SVM. These results demonstrate the feasibility of classifying breast tissue using our model. Currently, we are improving the results further while also investigating the capability of VW to address additional important mammogram classification problems. We expect that the presented methodology will enable high levels of classification, suggesting new means for automated tools to support mammography diagnosis.
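The patch-based visual-words pipeline described above can be sketched as follows: extract small patches, quantize them against a dictionary learned with a few k-means iterations, and represent each image as a normalized visual-word histogram. The patch size, dictionary size, and random data are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Hedged sketch of a bag-of-visual-words image representation.
rng = np.random.default_rng(1)

def patches(img, size=4):
    """Non-overlapping size x size patches, flattened into descriptors."""
    h, w = img.shape
    return np.array([img[i:i + size, j:j + size].ravel()
                     for i in range(0, h - size + 1, size)
                     for j in range(0, w - size + 1, size)])

def build_dictionary(descriptors, k=8, iters=10):
    """A few k-means iterations over all training descriptors."""
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = descriptors[labels == c].mean(axis=0)
    return centers

def vw_histogram(img, centers):
    """Assign each patch to its nearest visual word; normalize the counts."""
    d = np.linalg.norm(patches(img)[:, None] - centers[None], axis=2)
    hist = np.bincount(d.argmin(axis=1), minlength=len(centers)).astype(float)
    return hist / hist.sum()

imgs = [rng.random((16, 16)) for _ in range(5)]     # stand-in for ROIs
centers = build_dictionary(np.vstack([patches(im) for im in imgs]))
h = vw_histogram(imgs[0], centers)
print(h.shape, round(h.sum(), 6))
```

The resulting histograms are the feature vectors that the KNN and SVM classifiers operate on.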

  14. Blood-based gene expression profiles models for classification of subsyndromal symptomatic depression and major depressive disorder.

    Directory of Open Access Journals (Sweden)

    Zhenghui Yi

    Subsyndromal symptomatic depression (SSD) is a subtype of subthreshold depression that, like major depressive disorder (MDD), leads to significant psychosocial functional impairment. Several studies have suggested that SSD is a transitory phenomenon in the depression spectrum and is thus considered a subtype of depression. However, the pathophysiology of depression remains largely obscure, and studies on SSD are limited. The present study compared expression profiles and performed classification on leukocytes using whole-genome cRNA microarrays among drug-free first-episode subjects with SSD, subjects with MDD, and matched controls (8 subjects in each group). Support vector machines (SVMs) were utilized for training and testing on candidate signature expression profiles from the signature selection step. First, we identified 63 differentially expressed SSD signatures relative to controls (P ≤ 5.0E-4) and 30 differentially expressed MDD signatures relative to controls, respectively. Then, 123 gene signatures were identified with significantly different expression levels between SSD and MDD. Second, to jointly prioritize biomarkers for SSD and MDD, we selected top gene signatures from each pairwise comparison and merged them to generate profiles that clearly classify the SSD and MDD sets at the same time. In detail, we tried different combinations of signatures from the three pairwise comparison results and finally determined 48 gene expression signatures with 100% accuracy. Our findings suggest that SSD and MDD do not exhibit the same expressed genome signature in peripheral blood leukocytes, and blood-cell-derived RNA from this 48-gene model may have significant value for performing diagnostic functions and classifying SSD, MDD, and healthy controls.

  15. Music genre classification via likelihood fusion from multiple feature models

    Science.gov (United States)

    Shiu, Yu; Kuo, C.-C. J.

    2005-01-01

    Music genre provides an efficient way to index songs in a music database and can be used as an effective means to retrieve music of a similar type, i.e., content-based music retrieval. A new two-stage scheme for music genre classification is proposed in this work. At the first stage, we examine several different features, construct their corresponding parametric models (e.g., GMM and HMM) and compute their likelihood functions to yield soft classification results. In particular, the timbre, rhythm and temporal variation features are considered. Then, at the second stage, these soft classification results are integrated into a hard decision for final music genre classification. Experimental results are given to demonstrate the performance of the proposed scheme.
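The second-stage fusion can be sketched in a few lines: each first-stage feature model (timbre, rhythm, temporal variation) emits per-genre log-likelihoods, and the hard decision sums them (equivalently, multiplies likelihoods) before taking the argmax. The numbers below are made up to show the mechanics, not real model outputs, and simple summation is only one possible fusion rule.

```python
import numpy as np

# Hedged sketch of likelihood fusion across feature models (toy numbers).
genres = ["classical", "jazz", "rock"]
loglik = {
    "timbre":   np.log([0.70, 0.20, 0.10]),
    "rhythm":   np.log([0.30, 0.50, 0.20]),
    "temporal": np.log([0.45, 0.35, 0.20]),
}
fused = sum(loglik.values())          # elementwise sum of log-likelihoods
print(genres[int(np.argmax(fused))])  # prints "classical"
```

Weighted sums, where more reliable feature models get larger weights, are a common refinement of this rule.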

  16. Texture Image Classification Based on Gabor Wavelet

    Institute of Scientific and Technical Information of China (English)

    DENG Wei-bing; LI Hai-fei; SHI Ya-li; YANG Xiao-hui

    2014-01-01

    For a texture image, by recognizing the class of every pixel, the image can be partitioned into disjoint regions of uniform texture. This paper proposes a texture image classification algorithm based on Gabor wavelets. In this algorithm, a characteristic for every pixel is obtained from the pixel and its neighborhood, and the algorithm can transfer information between neighborhoods of different sizes. Experiments on the standard Brodatz texture image dataset show that the proposed algorithm achieves good classification rates.

  17. Density Based Support Vector Machines for Classification

    Directory of Open Access Journals (Sweden)

    Zahra Nazari

    2015-04-01

    Full Text Available Support Vector Machines (SVM is the most successful algorithm for classification problems. SVM learns the decision boundary from two classes (for Binary Classification of training points. However, sometimes there are some less meaningful samples amongst training points, which are corrupted by noises or misplaced in wrong side, called outliers. These outliers are affecting on margin and classification performance, and machine should better to discard them. SVM as a popular and widely used classification algorithm is very sensitive to these outliers and lacks the ability to discard them. Many research results prove this sensitivity which is a weak point for SVM. Different approaches are proposed to reduce the effect of outliers but no method is suitable for all types of data sets. In this paper, the new method of Density Based SVM (DBSVM is introduced. Population Density is the basic concept which is used in this method for both linear and non-linear SVM to detect outliers. Experiments on artificial data sets, real high-dimensional benchmark data sets of Liver disorder and Heart disease, and data sets of new and fatigued banknotes’ acoustic signals can prove the efficiency of this method on noisy data classification and the better generalization that it can provide compared to the standard SVM.

  18. Classification of signature-only signature models

    Institute of Scientific and Technical Information of China (English)

    CAO ZhengJun; LIU MuLan

    2008-01-01

    We introduce a set of criteria for classifying signature-only signature models. By these criteria, we classify signature models into 5 basic types and 69 general classes. Theoretically, 21,141 kinds of signature models can be derived by appropriately combining different general classes. The result covers almost all existing signature models and will be helpful for exploring new signature models. To the best of our knowledge, this is the first investigation of the problem of classifying signature-only signature models.

  19. Classification of Base Sequences BS(n+1, n)

    Directory of Open Access Journals (Sweden)

    Dragomir Ž. Ðoković

    2010-01-01

    Base sequences BS(n+1, n) are quadruples of {±1}-sequences (A; B; C; D), with A and B of length n+1 and C and D of length n, such that the sum of their nonperiodic autocorrelation functions is a δ-function. The base sequence conjecture, asserting that BS(n+1, n) exist for all n, is stronger than the famous Hadamard matrix conjecture. We introduce a new definition of equivalence for base sequences BS(n+1, n) and construct a canonical form. By using this canonical form, we have enumerated the equivalence classes of BS(n+1, n) for n ≤ 30. As the number of equivalence classes grows rapidly (but not monotonically) with n, the tables in the paper cover only the cases n ≤ 13.

  20. Structure-Based Algorithms for Microvessel Classification

    KAUST Repository

    Smith, Amy F.

    2015-02-01

    © 2014 The Authors. Microcirculation published by John Wiley & Sons Ltd. Objective: Recent developments in high-resolution imaging techniques have enabled digital reconstruction of three-dimensional sections of microvascular networks down to the capillary scale. To better interpret these large data sets, our goal is to distinguish branching trees of arterioles and venules from capillaries. Methods: Two novel algorithms are presented for classifying vessels in microvascular anatomical data sets without requiring flow information. The algorithms are compared with a classification based on observed flow directions (considered the gold standard), and with an existing resistance-based method that relies only on structural data. Results: The first algorithm, developed for networks with one arteriolar and one venular tree, performs well in identifying arterioles and venules and is robust to parameter changes, but incorrectly labels a significant number of capillaries as arterioles or venules. The second algorithm, developed for networks with multiple inlets and outlets, correctly identifies more arterioles and venules, but is more sensitive to parameter changes. Conclusions: The algorithms presented here can be used to classify microvessels in large microvascular data sets lacking flow information. This provides a basis for analyzing the distinct geometrical properties and modelling the functional behavior of arterioles, capillaries, and venules.

  1. Fault Diagnosis for Fuel Cell Based on Naive Bayesian Classification

    Directory of Open Access Journals (Sweden)

    Liping Fan

    2013-07-01

    Many kinds of uncertain factors may exist in the process of fault diagnosis and affect diagnostic results. The Bayesian network is one of the most effective theoretical models for expressing and reasoning about uncertain knowledge. In this paper, the method of naive Bayesian classification is used for fault diagnosis of a proton exchange membrane fuel cell (PEMFC) system. Based on the PEMFC model, fault data are obtained through simulation experiments, the naive Bayesian classifier is trained, and some testing samples are selected to validate the method. Simulation results demonstrate that the method is feasible.
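The naive Bayesian classification step can be sketched as follows: fit per-class Gaussian feature models on (simulated) fuel-cell measurements, then pick the class maximizing the posterior. The feature names, fault classes, and simulated distributions are illustrative stand-ins for the PEMFC model's actual data.

```python
import numpy as np

# Hedged sketch of Gaussian naive Bayes fault classification on toy data.
rng = np.random.default_rng(2)
# Columns: stack-voltage drop, temperature deviation (illustrative features).
normal   = rng.normal([0.0,  0.0], 0.3, (50, 2))
flooding = rng.normal([2.0, -1.0], 0.3, (50, 2))
drying   = rng.normal([1.0,  2.0], 0.3, (50, 2))
X = np.vstack([normal, flooding, drying])
y = np.repeat([0, 1, 2], 50)
names = ["normal", "flooding", "membrane drying"]

# Per-class Gaussian parameters (the "naive" independence assumption:
# one mean/variance per feature per class).
mu = np.array([X[y == c].mean(axis=0) for c in range(3)])
var = np.array([X[y == c].var(axis=0) for c in range(3)])

def predict(x):
    # Class priors are uniform here, so only the Gaussian log-likelihoods matter.
    ll = -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var).sum(axis=1)
    return names[int(np.argmax(ll))]

print(predict(np.array([2.1, -0.9])))  # sample near the flooding prototype
```

In the paper's setting the training data come from the PEMFC simulation model rather than from random draws as here.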

  2. Adaptive stellar spectral subclass classification based on Bayesian SVMs

    Science.gov (United States)

    Du, Changde; Luo, Ali; Yang, Haifeng

    2017-02-01

    Stellar spectral classification is one of the most fundamental tasks in survey astronomy. Many automated classification methods have been applied to spectral data, but their main limitation is that the model parameters must be tuned repeatedly to deal with different data sets. In this paper, we utilize Bayesian support vector machines (BSVM) to classify spectral subclass data. Based on Gibbs sampling, BSVM can infer all model parameters adaptively according to different data sets, which allows us to circumvent the time-consuming cross-validation for the penalty parameter. We explored different normalization methods for stellar spectral data and suggest the best one in this study. Finally, experimental results on several stellar spectral subclass classification problems show that the BSVM model not only possesses good adaptability but also provides better prediction performance than traditional methods.

  3. MRI-Based Classification Models in Prediction of Mild Cognitive Impairment and Dementia in Late-Life Depression

    Science.gov (United States)

    Lebedeva, Aleksandra K.; Westman, Eric; Borza, Tom; Beyer, Mona K.; Engedal, Knut; Aarsland, Dag; Selbaek, Geir; Haberg, Asta K.

    2017-01-01

    Objective: Late-life depression (LLD) is associated with development of different types of dementia. Identification of LLD patients who will develop cognitive decline, i.e., the early stage of dementia, would help to implement interventions earlier. The purpose of this study was to assess whether structural brain magnetic resonance imaging (MRI) in LLD patients can predict mild cognitive impairment (MCI) or dementia 1 year prior to the diagnosis. Methods: LLD patients underwent brain MRI at baseline and repeated clinical assessment after 1 year. Structural brain measurements were obtained using FreeSurfer software (v. 5.1) from the T1W brain MRI images. An MRI-based Random Forest classifier was used to discriminate between LLD patients who developed MCI or dementia after 1-year follow-up and cognitively stable LLD patients. Additionally, a previously established Random Forest model trained on 185 patients with Alzheimer's disease (AD) vs. 225 cognitively normal elderly from the Alzheimer's Disease Neuroimaging Initiative was tested on the LLD data set (ADNI model). Results: MCI and dementia diagnoses were predicted in LLD patients with 76%/68%/84% accuracy/sensitivity/specificity. Adding the baseline Mini-Mental State Examination (MMSE) scores to the models improved accuracy/sensitivity/specificity to 81%/75%/86%. The best model predicted MCI status alone using MRI and baseline MMSE scores with accuracy/sensitivity/specificity of 89%/85%/90%. The most important region for all the models was the right ventral diencephalon, including the hypothalamus. Its volume correlated negatively with the number of depressive episodes. The ADNI model trained on AD vs. controls using SV could predict MCI-DEM patients with 67% accuracy. Conclusion: LLD patients developing MCI and dementia can be discriminated from LLD patients remaining cognitively stable with good accuracy based on baseline structural MRI alone. Baseline MMSE score improves prediction accuracy. Ventral diencephalon, including the hypothalamus

  4. Structural classification and a binary structure model for superconductors

    Institute of Scientific and Technical Information of China (English)

    Dong Cheng

    2006-01-01

    Based on structural and bonding features, a new classification scheme for superconductors is proposed. Superconductors can be partitioned into two parts, a superconducting active component and a supplementary component. Partially metallic covalent bonding is found to be a common feature in all superconducting active components, and the electron states of the atoms in the active components usually make a dominant contribution to the energy band near the Fermi surface. Possible directions for exploring new superconductors are discussed based on the structural classification and the binary structure model.

  5. Image-based Vehicle Classification System

    CERN Document Server

    Ng, Jun Yee

    2012-01-01

    Electronic toll collection (ETC) systems have become a common trend for toll collection on toll roads. The implementation of electronic toll collection allows vehicles to travel at low or full speed during toll payment, which helps to avoid traffic delays on the toll road. One of the major components of an electronic toll collection system is automatic vehicle detection and classification (AVDC), which is important for classifying vehicles so that the toll is charged according to the vehicle class. A vision-based vehicle classification system is one type of vehicle classification system, which adopts a camera as the input sensing device. This type of system has a cost advantage over the rest, as a low-cost camera is used. The implementation of a vision-based vehicle classification system requires a lower initial investment and is very suitable for the toll collection trend migration in Malaysia from a single ETC system to full-scale multi-lane free flow (MLFF). This project ...

  6. Classification of LiDAR Data with Point Based Classification Methods

    Science.gov (United States)

    Yastikli, N.; Cetin, Z.

    2016-06-01

    LiDAR is one of the most effective systems for three-dimensional (3D) data collection over wide areas. Nowadays, airborne LiDAR data are used frequently in various applications, such as object extraction, 3D modelling, change detection and map revision, with increasing point density and accuracy. Classification of the LiDAR points is the first step of the LiDAR data processing chain and should be handled properly, since 3D city modelling, building extraction, DEM generation, and similar applications directly use the classified point clouds. Different classification methods can be seen in recent research, and most work with a gridded LiDAR point cloud. In grid-based processing of LiDAR data, the loss of characteristic points, especially on vegetation and buildings, or the loss of height accuracy during the interpolation stage is inevitable. A possible solution is to use the raw point cloud for classification, avoiding the data and accuracy loss of the gridding process. In this study, the point-based classification possibilities of the LiDAR point cloud are investigated in order to obtain more accurate classes. Automatic point-based approaches, based on hierarchical rules, are proposed to derive ground, building and vegetation classes from the raw LiDAR point cloud. In the proposed approaches, every single LiDAR point is analyzed according to its features, such as height and multi-return, and then automatically assigned to the class to which it belongs. The use of an un-gridded point cloud in the proposed point-based classification process helped in determining more realistic rule sets. Detailed parameter analyses were performed to obtain the most appropriate parameters in the rule sets to achieve accurate classes. Hierarchical rule sets were created for the proposed Approach 1 (using selected spatial-based and echo-based features) and Approach 2 (using only selected spatial-based features).
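Hierarchical point-based rules in the spirit of the approaches above can be sketched as follows: each raw LiDAR point is tested against height-above-ground and return-count rules and assigned to ground, building, or vegetation. The thresholds and the point-record fields are illustrative assumptions, not the paper's actual rule set.

```python
# Hedged sketch of hierarchical rule-based LiDAR point classification.
def classify_point(height_above_ground, num_returns, return_number):
    if height_above_ground < 0.3:
        return "ground"
    # Multiple returns suggest the pulse penetrated a canopy (echo-based rule).
    if num_returns > 1 and return_number < num_returns:
        return "vegetation"
    if height_above_ground > 2.5:
        return "building"
    return "vegetation"  # low, single-return clutter defaults to vegetation

points = [
    (0.1, 1, 1),   # bare earth
    (8.0, 3, 1),   # first of three returns: canopy top
    (9.5, 1, 1),   # hard single return high up: roof
]
print([classify_point(*p) for p in points])
```

Real rule sets also use features such as intensity, local planarity, and neighborhood statistics; the parameter analyses described above determine the thresholds.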

  7. Text classification model framework based on social annotation quality

    Institute of Scientific and Technical Information of China (English)

    李劲; 张华; 吴浩雄; 向军; 辜希武

    2012-01-01

    Social annotation is a form of folksonomy that allows Web users to categorize Web resources freely with text tags, and it usually carries fundamental and valuable semantic information about those resources. Consequently, social annotation can improve retrieval quality when applied to information retrieval systems. This paper investigates and proposes an improved text classification algorithm based on social annotation. Because social annotation is a form of folksonomy, and social tags are usually generated arbitrarily without any control or expert knowledge, there is significant variance in tag quality. Under this consideration, the paper first proposes a quantitative approach to measuring the quality of social tags, based on the semantic similarity between Web pages and tags. Tags of relatively low quality are then filtered out according to this measure, and the remaining high-quality tags are used to extend the traditional vector space model: a Web page is represented by a vector whose components are the words in the page and the tags assigned to it. Finally, the support vector machine algorithm is employed to perform the classification task. Experimental results show that classification improves after filtering out low-quality tags and embedding the high-quality ones into the traditional vector space model; compared with other classification approaches, the F1 measure increases by 6.2% on average with the proposed algorithm.
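The tag-filtering step above can be sketched with a cosine-similarity quality score over bag-of-words vectors; the vocabulary, threshold and vectors are invented for illustration:

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse term-weight dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def filter_tags(page_vec, tag_vecs, threshold=0.2):
    """Keep only tags whose similarity to the page content passes the threshold
    (a stand-in for the paper's tag-quality measure)."""
    return {t: v for t, v in tag_vecs.items() if cosine(page_vec, v) >= threshold}

page = {"python": 3, "tutorial": 2}
tags = {"programming": {"python": 1, "code": 1},
        "funny": {"cat": 1, "meme": 2}}
kept = filter_tags(page, tags)
print(sorted(kept))  # ['programming']
```

The surviving tags would then be appended to the page's word vector before training the SVM.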

  8. Extension of Companion Modeling Using Classification Learning

    Science.gov (United States)

    Torii, Daisuke; Bousquet, François; Ishida, Toru

    Companion Modeling is a methodology of refining initial models for understanding reality through a role-playing game (RPG) and a multiagent simulation. In this research, we propose a novel agent model construction methodology in which classification learning is applied to the RPG log data in Companion Modeling. This methodology enables a systematic model construction that handles multiple parameters, independent of the modeler's ability. There are three problems in applying classification learning to the RPG log data: 1) It is difficult to gather enough data for the number of features because the cost of gathering data is high. 2) Noise can affect the learning results because the amount of data may be insufficient. 3) The learning results should be explainable as a human decision-making model and should be recognized by the expert as a result that reflects reality. We realized an agent model construction system using the following two approaches: 1) Using a feature selection method, the feature subset that has the best prediction accuracy is identified. In this process, the important features chosen by the expert are always included. 2) The expert eliminates irrelevant features from the learning results after evaluating the learning model through a visualization of the results. Finally, using the RPG log data from the Companion Modeling of agricultural economics in northeastern Thailand, we confirm the capability of this methodology.

  9. Porcelain shard image classification based on the Gaussian color model

    Institute of Scientific and Technical Information of China (English)

    郑霞; 胡浩基; 周明全; 樊亚春

    2012-01-01

    Since the RGB color space neither closely matches human visual perception nor describes spatial structure, the Gaussian color model, which integrates spatial and color information in a single model, is used to obtain more complete image features, and a color-texture approach based on the Gaussian color model and a multi-scale filter bank is introduced to classify porcelain shard images. First, the image is transformed from the RGB color space into the Gaussian color model; the normalized multi-scale LM filter bank is then used to construct filtered images on the three channels. Primary feature images are found using principal component analysis, and the maximum Laplacian-of-Gaussian and maximum Gaussian responses of each channel are selected; together with the primary feature images these form a feature image set from which the feature parameters are extracted. Finally, a support vector machine is used as the classifier for learning and classification. Experimental results show that, compared with grayscale-based, RGB-based and RGB_bior4.4 wavelet-based methods, the proposed method achieves better classification results, with an accuracy of 96.7% on the Outex texture database and 94.2% on the porcelain shard image set. The method can be extended to other color-texture classification tasks.
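The first step, converting RGB into the Gaussian color model, is a fixed linear transform. The sketch below uses one widely cited coefficient matrix from the color-invariance literature; treat the exact coefficients as an assumption rather than the paper's values:

```python
import numpy as np

# One commonly quoted linear approximation of the RGB -> Gaussian color model
# transform (intensity, yellow-blue, red-green opponent channels). The
# coefficients are an assumption here, not taken from the paper.
M = np.array([[0.06,  0.63,  0.27],   # E (intensity-like)
              [0.30,  0.04, -0.35],   # E_lambda (yellow-blue)
              [0.34, -0.60,  0.17]])  # E_lambda_lambda (red-green)

def rgb_to_gaussian(img):
    """img: H x W x 3 RGB array -> H x W x 3 Gaussian color model channels."""
    return img @ M.T

pixel = np.array([[[1.0, 0.0, 0.0]]])   # a pure-red pixel
print(rgb_to_gaussian(pixel)[0, 0])     # ≈ [0.06, 0.30, 0.34]
```

Each of the three resulting channels would then be filtered with the LM bank before feature extraction.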

  10. Identification of candidate categories of the International Classification of Functioning, Disability and Health (ICF) for a Generic ICF Core Set based on regression modelling

    Directory of Open Access Journals (Sweden)

    Üstün Bedirhan T

    2006-07-01

    Full Text Available Abstract Background The International Classification of Functioning, Disability and Health (ICF) is the framework developed by WHO to describe functioning and disability at both the individual and population levels. While condition-specific ICF Core Sets are useful, a Generic ICF Core Set is needed to describe and compare problems in functioning across health conditions. Methods The aims of the multi-centre, cross-sectional study presented here were: (a) to propose a method to select ICF categories when a large amount of ICF-based data has to be handled, and (b) to identify candidate ICF categories for a Generic ICF Core Set by examining their explanatory power in relation to item one of the SF-36. The data were collected from 1039 patients using the ICF checklist, the SF-36 and a Comorbidity Questionnaire. ICF categories to be entered in an initial regression model were selected following systematic steps in accordance with the ICF structure. Based on an initial regression model, additional models were designed by systematically substituting the ICF categories included in it with ICF categories with which they were highly correlated. Results Fourteen different regression models were performed. The variance accounted for by these models ranged from 22.27% to 24.0%. The ICF category that explained the highest amount of variance in all the models was sensation of pain. In total, thirteen candidate ICF categories for a Generic ICF Core Set were proposed. Conclusion The selection strategy based on the ICF structure and the examination of the best possible alternative models does not provide a final answer about which ICF categories must be considered, but leads to a selection of suitable candidates which needs further consideration and comparison with the results of other selection strategies in developing a Generic ICF Core Set.

  11. Classification Models for Symmetric Key Cryptosystem Identification

    Directory of Open Access Journals (Sweden)

    Shri Kant

    2012-01-01

    Full Text Available The present paper deals with the basic principle and theory behind prevalent classification models and their judicious application for symmetric key cryptosystem identification. These techniques have been implemented and verified on a variety of known and simulated data sets. After establishing the techniques, the problems of cryptosystem identification have been addressed. Defence Science Journal, 2012, 62(1), pp. 38-45, DOI: http://dx.doi.org/10.14429/dsj.62.1440

  12. Distance-based classification of keystroke dynamics

    Science.gov (United States)

    Tran Nguyen, Ngoc

    2016-07-01

    This paper applies keystroke dynamics to user authentication. The relationship between the distance metrics and the data template is analyzed for the first time, and a new distance-based algorithm for keystroke dynamics classification is proposed. Experiments on the CMU keystroke dynamics benchmark dataset yielded an equal error rate of 0.0614. Classifiers using the proposed distance metric outperform existing top-performing keystroke dynamics classifiers that use traditional distance metrics.
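Distance-based keystroke verification on the CMU benchmark is typically built around a per-user timing template. A minimal sketch using a scaled Manhattan distance (a common baseline metric, not necessarily the paper's proposed one), with invented timing values:

```python
def train(samples):
    """Per-feature mean and mean absolute deviation from enrollment samples."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    mad = [sum(abs(s[j] - mean[j]) for s in samples) / n or 1e-9
           for j in range(d)]
    return mean, mad

def scaled_manhattan(x, mean, mad):
    """Anomaly score: larger means less like the enrolled user."""
    return sum(abs(xj - mj) / aj for xj, mj, aj in zip(x, mean, mad))

# Illustrative hold/flight times (seconds) for a two-feature template.
enroll = [[0.10, 0.20], [0.12, 0.22], [0.11, 0.21]]
mean, mad = train(enroll)
genuine = scaled_manhattan([0.11, 0.21], mean, mad)
impostor = scaled_manhattan([0.30, 0.05], mean, mad)
print(genuine < impostor)  # True
```

Sweeping a threshold over such scores for genuine and impostor attempts is what produces the equal error rate reported on the benchmark.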

  13. Hierarchical Real-time Network Traffic Classification Based on ECOC

    Directory of Open Access Journals (Sweden)

    Yaou Zhao

    2013-09-01

    Full Text Available Classification of network traffic is basic and essential for many network research and management tasks. With the rapid development of peer-to-peer (P2P) applications using dynamic port disguising techniques and encryption to avoid detection, port-based and simple payload-based network traffic classification methods have become less effective. An alternative approach based on statistics and machine learning has attracted researchers' attention in recent years. However, most of the proposed algorithms are off-line and usually use a single classifier. In this paper a new hierarchical real-time model is proposed, comprising a three-tuple (source IP, destination IP and destination port) look-up table (TT-LUT) part and a layered milestone part. The TT-LUT is used to quickly classify short flows, which need not pass through the layered milestone part, while the milestones in the layered part classify the remaining flows in real time with real-time feature selection and statistics. Every milestone is an ECOC (Error-Correcting Output Codes) based model, used to improve classification performance. Experiments show that the proposed model can raise the proportion of flows classified in real time to 80%, and multi-class classification accuracy encouragingly to 91.4%, on datasets captured from the backbone router of our campus over a week.
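The ECOC idea behind each milestone can be sketched in a few lines: every class gets a code word, binary classifiers each predict one bit, and decoding picks the class at minimum Hamming distance. The code words and class names below are invented for illustration:

```python
# Minimal ECOC multi-class scheme: one code word per class; the class whose
# code word is closest in Hamming distance to the predicted bit string wins.
# Code words and traffic classes are illustrative assumptions.

CODEBOOK = {
    "web":  (0, 0, 1, 1),
    "p2p":  (1, 0, 0, 1),
    "mail": (1, 1, 1, 0),
}

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def ecoc_decode(predicted_bits):
    return min(CODEBOOK, key=lambda c: hamming(CODEBOOK[c], predicted_bits))

# One bit classifier erred (last bit of p2p's code word flipped), but the
# error-correcting decode still recovers the right class.
print(ecoc_decode((1, 0, 0, 0)))  # p2p
```

This tolerance to individual bit-classifier errors is what makes ECOC attractive for noisy, real-time traffic features.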

  14. Atmospheric circulation classification comparison based on wildfires in Portugal

    Science.gov (United States)

    Pereira, M. G.; Trigo, R. M.

    2009-04-01

    Atmospheric circulation classifications are not a simple description of atmospheric states but a tool to understand and interpret atmospheric processes and to model the relation between atmospheric circulation and surface climate and other related variables (Huth et al., 2008). Classifications were initially developed for weather forecasting purposes; however, with the progress in computer processing capability, new and more robust objective methods were developed and applied to large datasets, making atmospheric circulation classification one of the most important fields in synoptic and statistical climatology. Classification studies have been extensively used in climate change studies (e.g. reconstructed past climates, recent observed changes and future climates), in bioclimatological research (e.g. relating human mortality to climatic factors) and in a wide variety of synoptic climatological applications (e.g. comparison between datasets, air pollution, snow avalanches, wine quality, fish captures and forest fires). Likewise, atmospheric circulation classifications are important for the study of the role of weather in wildfire occurrence in Portugal, because daily synoptic variability is the most important driver of local weather conditions (Pereira et al., 2005). In particular, the objective classification scheme developed by Trigo and DaCamara (2000) to classify the atmospheric circulation affecting Portugal has proved quite useful in discriminating the occurrence and development of wildfires, as well as the distribution over Portugal of surface climatic variables with impact on wildfire activity, such as maximum and minimum temperature and precipitation. This work aims to present: (i) an overview of the existing circulation classifications for the Iberian Peninsula, and (ii) the results of a comparison study between these atmospheric circulation classifications based on their relation with wildfires and relevant meteorological

  15. A semi-empirical library of galaxy spectra for Gaia classification based on SDSS data and PEGASE models

    CERN Document Server

    Tsalmantza, P; Kontizas, M; Bailer-Jones, C A L; Rocca-Volmerange, B; Livanou, E; Bellas-Velidis, I; Kontizas, E; Vallenari, A

    2011-01-01

    Aims: This paper is the third in a series implementing a classification system for Gaia observations of unresolved galaxies. The system makes use of template galaxy spectra in order to determine spectral classes and estimate intrinsic astrophysical parameters. In previous work we used synthetic galaxy spectra produced by the PEGASE.2 code to simulate Gaia observations and to test the performance of Support Vector Machine (SVM) classifiers and parametrizers. Here we produce a semi-empirical library of galaxy spectra by fitting SDSS spectra with the previously produced synthetic libraries. We present (1) the semi-empirical library of galaxy spectra, (2) a comparison between the observed and synthetic spectra, and (3) first results of classification and parametrization experiments with simulated Gaia spectrophotometry of this library. Methods: We use chi2-fitting to fit SDSS galaxy spectra with the synthetic library in order to construct a semi-empirical library of galaxy spectra in which (1) the real spectra are ex...

  16. Collaborative Representation based Classification for Face Recognition

    CERN Document Server

    Zhang, Lei; Feng, Xiangchu; Ma, Yi; Zhang, David

    2012-01-01

    By coding a query sample as a sparse linear combination of all training samples and then classifying it by evaluating which class leads to the minimal coding residual, sparse representation based classification (SRC) leads to interesting results for robust face recognition. It is widely believed that the l1-norm sparsity constraint on coding coefficients plays a key role in the success of SRC, while its use of all training samples to collaboratively represent the query sample is rather ignored. In this paper we discuss how SRC works, and show that the collaborative representation mechanism used in SRC is much more crucial to its success in face classification. SRC is a special case of collaborative representation based classification (CRC), which has various instantiations obtained by applying different norms to the coding residual and coding coefficient. More specifically, the l1 or l2 norm characterization of the coding residual is related to the robustness of CRC to outlier facial pixels, while the l1 or l2 norm c...
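The CRC mechanism described here, coding the query over all training samples with an l2 penalty and classifying by per-class reconstruction residual, fits in a few lines of NumPy. The toy two-class data and the regularization value are assumptions for illustration:

```python
import numpy as np

def crc_classify(X, y, query, lam=0.01):
    """Collaborative representation with l2 regularization: code the query
    over ALL training samples, then pick the class whose coefficients give
    the smallest reconstruction residual."""
    # X: d x n matrix of training samples (columns); y: class label per column.
    coef = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ query)
    best, best_res = None, np.inf
    for c in set(y):
        mask = np.array([yi == c for yi in y])
        coef_c = np.where(mask, coef, 0.0)     # keep only class-c coefficients
        res = np.linalg.norm(query - X @ coef_c)
        if res < best_res:
            best, best_res = c, res
    return best

X = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9]])           # two samples per class (columns)
y = ["a", "a", "b", "b"]
print(crc_classify(X, y, np.array([0.95, 0.05])))  # 'a'
```

Replacing the l2 penalty with an l1 penalty on `coef` would recover the sparse (SRC) special case.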

  17. Texture feature based liver lesion classification

    Science.gov (United States)

    Doron, Yeela; Mayer-Wolf, Nitzan; Diamant, Idit; Greenspan, Hayit

    2014-03-01

    Liver lesion classification is a difficult clinical task. Computerized analysis can support the clinical workflow by enabling more objective and reproducible evaluation. In this paper, we evaluate the contribution of several types of texture features to a computer-aided diagnostic (CAD) system which automatically classifies liver lesions from CT images. Based on the assumption that liver lesions of different classes differ in their texture characteristics, a variety of texture features were examined as lesion descriptors. Although texture features are often used for this task, there is currently a lack of detailed research comparing different texture features, or their combinations, on a given dataset. In this work we investigated the performance of Gray Level Co-occurrence Matrix (GLCM), Local Binary Patterns (LBP), Gabor, gray level intensity values and Gabor-based LBP (GLBP) features, where the features are obtained from a given lesion's region of interest (ROI). For the classification module, SVM and KNN classifiers were examined. Using a single type of texture feature, the best result, 91% accuracy, was obtained with Gabor filtering and SVM classification. Combining Gabor, LBP and intensity features improved the results to a final accuracy of 97%.
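The GLCM features mentioned above are derived from a co-occurrence matrix of gray levels at a fixed pixel offset. A minimal NumPy sketch of the matrix and one classic statistic (contrast), on a tiny made-up image:

```python
import numpy as np

def glcm(img, levels, dx=1, dy=0):
    """Gray level co-occurrence matrix for a single pixel offset (dx, dy),
    normalized to joint probabilities."""
    M = np.zeros((levels, levels), dtype=float)
    h, w = img.shape
    for i in range(h - dy):
        for j in range(w - dx):
            M[img[i, j], img[i + dy, j + dx]] += 1
    return M / M.sum()

def contrast(P):
    """GLCM contrast: expected squared gray-level difference of pixel pairs."""
    idx = np.arange(P.shape[0])
    return float(((idx[:, None] - idx[None, :]) ** 2 * P).sum())

img = np.array([[0, 0, 1],
                [1, 1, 0]])
P = glcm(img, levels=2)
print(round(contrast(P), 3))  # 0.5
```

In a CAD pipeline, several such statistics (contrast, energy, homogeneity, ...) over multiple offsets would form the feature vector fed to the SVM or KNN classifier.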

  18. Texture classification based on EMD and FFT

    Institute of Scientific and Technical Information of China (English)

    XIONG Chang-zhen; XU Jun-yi; ZOU Jian-cheng; QI Dong-xu

    2006-01-01

    Empirical mode decomposition (EMD) is an adaptive and approximately orthogonal filtering process that reflects the human visual mechanism of differentiating textures. In this paper, we present a modified 2D EMD algorithm using the FastRBF and an appropriate number of iterations in the sifting process (SP), and then apply it to texture classification. Rotation-invariant texture feature vectors are extracted using auto-registration and circular regions of magnitude spectra of the 2D fast Fourier transform (FFT). In the experiments, we employ a Bayesian classifier to classify a set of 15 distinct natural textures selected from the Brodatz album. The experimental results, based on different testing datasets for images with different orientations, show the effectiveness of the proposed classification scheme.

  19. Feature-Based Classification of Networks

    CERN Document Server

    Barnett, Ian; Kuijjer, Marieke L; Mucha, Peter J; Onnela, Jukka-Pekka

    2016-01-01

    Network representations of systems from various scientific and societal domains are neither completely random nor fully regular, but instead appear to contain recurring structural building blocks. These features tend to be shared by networks belonging to the same broad class, such as the class of social networks or the class of biological networks. At a finer scale of classification within each such class, networks describing more similar systems tend to have more similar features. This occurs presumably because networks representing similar purposes or constructions would be expected to be generated by a shared set of domain-specific mechanisms, and it should therefore be possible to classify these networks into categories based on their features at various structural levels. Here we describe and demonstrate a new, hybrid approach that combines manual selection of features of potential interest with existing automated classification methods. In particular, selecting well-known and well-studied features that ...

  20. NIM: A Node Influence Based Method for Cancer Classification

    Directory of Open Access Journals (Sweden)

    Yiwen Wang

    2014-01-01

    Full Text Available The classification of different cancer types is of great significance in the medical field. However, the great majority of existing cancer classification methods are clinical-based and have relatively weak diagnostic ability. With the rapid development of gene expression technology, it has become possible to classify different kinds of cancers using DNA microarrays. Our main idea is to approach the problem of cancer classification using gene expression data from a graph-based view. Based on a new node influence model we propose, this paper presents a novel high-accuracy method for cancer classification, which is composed of four parts: the first is to calculate the similarity matrix of all samples; the second is to compute the node influence of the training samples; the third is to obtain the similarity between every test sample and each class using a weighted sum of node influence and the similarity matrix; and the last is to classify each test sample based on its similarity to every class. The data sets used in our experiments are breast cancer, central nervous system, colon tumor, prostate cancer, acute lymphoblastic leukemia, and lung cancer. Experimental results showed that our node influence based method (NIM) is more efficient and robust than the support vector machine, K-nearest neighbor, C4.5, naive Bayes, and CART.

  1. NIM: a node influence based method for cancer classification.

    Science.gov (United States)

    Wang, Yiwen; Yao, Min; Yang, Jianhua

    2014-01-01

    The classification of different cancer types is of great significance in the medical field. However, the great majority of existing cancer classification methods are clinical-based and have relatively weak diagnostic ability. With the rapid development of gene expression technology, it has become possible to classify different kinds of cancers using DNA microarrays. Our main idea is to approach the problem of cancer classification using gene expression data from a graph-based view. Based on a new node influence model we propose, this paper presents a novel high-accuracy method for cancer classification, which is composed of four parts: the first is to calculate the similarity matrix of all samples; the second is to compute the node influence of the training samples; the third is to obtain the similarity between every test sample and each class using a weighted sum of node influence and the similarity matrix; and the last is to classify each test sample based on its similarity to every class. The data sets used in our experiments are breast cancer, central nervous system, colon tumor, prostate cancer, acute lymphoblastic leukemia, and lung cancer. Experimental results showed that our node influence based method (NIM) is more efficient and robust than the support vector machine, K-nearest neighbor, C4.5, naive Bayes, and CART.
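The four-step pipeline described (similarity matrix, node influence, influence-weighted class similarity, decision) can be sketched with cosine similarity and a simple influence proxy; the data, the cosine metric, and the "sum of similarities" influence score are assumptions standing in for the paper's actual node influence model:

```python
import numpy as np

def cosine_sim(A, B):
    """Row-wise cosine similarity matrix between sample sets A and B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

def nim_predict(X_train, y_train, X_test):
    S = cosine_sim(X_train, X_train)           # step 1: similarity matrix
    influence = S.sum(axis=1)                  # step 2: crude influence proxy
    sims = cosine_sim(X_test, X_train)
    preds = []
    for s in sims:                             # steps 3-4 per test sample
        scores = {}
        for c in set(y_train):
            mask = np.array([yi == c for yi in y_train])
            scores[c] = float((influence[mask] * s[mask]).sum())
        preds.append(max(scores, key=scores.get))
    return preds

X = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
y = ["type1", "type1", "type2", "type2"]
print(nim_predict(X, y, np.array([[1.0, 0.0]])))  # ['type1']
```

Weighting each training sample's vote by its influence is what distinguishes this from a plain nearest-neighbor rule.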

  2. Modeling text classification based on a membership-degree-limiting characteristic VSM

    Institute of Scientific and Technical Information of China (English)

    周菁; 戴冠中; 周婷婷

    2009-01-01

    Starting from a representation of text features based on fuzzy qualifiers, a fuzzy membership function is defined for each feature, and a document is represented as a feature vector with limited (clipped) membership degrees. A membership-degree-limiting class feature matrix is constructed for the text collection, and each class of texts is mapped to its class expectation vector; together, all class expectation vectors constitute the membership-degree-limiting characteristic VSM. On this basis, a new text classification model is presented, and experiments show that the model classifies text effectively.
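The model above can be sketched as a centroid classifier over clipped membership vectors: each class's expectation vector is the average of its documents' clipped feature memberships, and a new document goes to the class with the most similar expectation vector. The vocabulary, membership values and similarity choice (cosine) are illustrative assumptions:

```python
import math

def clip(x, cap=1.0):
    return min(x, cap)                  # membership-degree limiting (clipping)

def class_expectation(docs, vocab, cap=1.0):
    """Class expectation vector: average clipped membership per vocabulary term."""
    return [sum(clip(d.get(t, 0.0), cap) for d in docs) / len(docs) for t in vocab]

def classify(doc, expectations, vocab, cap=1.0):
    v = [clip(doc.get(t, 0.0), cap) for t in vocab]
    def cos(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return num / den if den else 0.0
    return max(expectations, key=lambda c: cos(v, expectations[c]))

vocab = ["goal", "match", "stock", "price"]
sports = [{"goal": 0.9, "match": 0.8}, {"goal": 0.7, "match": 0.9}]
finance = [{"stock": 0.9, "price": 0.8}, {"stock": 0.8, "price": 0.7}]
exp = {"sports": class_expectation(sports, vocab),
       "finance": class_expectation(finance, vocab)}
print(classify({"match": 0.9, "goal": 0.5}, exp, vocab))  # sports
```

The clipping step caps the contribution of any single term, which is the "limiting" part of the model's name.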

  3. Optimizing Mining Association Rules for Artificial Immune System based Classification

    Directory of Open Access Journals (Sweden)

    SAMEER DIXIT

    2011-08-01

    Full Text Available The primary function of a biological immune system is to protect the body from foreign molecules known as antigens. It has great pattern recognition capability that may be used to distinguish between foreign cells entering the body (non-self, or antigens) and the body's own cells (self). Immune systems have many characteristics such as uniqueness, autonomy, recognition of foreigners, distributed detection, and noise tolerance. Inspired by biological immune systems, Artificial Immune Systems have emerged during the last decade. They have been used by many researchers to design and build immune-based models for a variety of application domains. Artificial immune systems can be defined as a computational paradigm inspired by theoretical immunology and observed immune functions, principles and mechanisms. Association rule mining is one of the most important and well researched techniques of data mining. The goal of association rules is to extract interesting correlations, frequent patterns, associations or causal structures among sets of items in transaction databases or other data repositories. Association rules are widely used in various areas such as inventory control, telecommunication networks, intelligent decision making, market analysis and risk management. Apriori is the most widely used algorithm for mining association rules. Other popular association rule mining algorithms are frequent pattern (FP) growth, Eclat, dynamic itemset counting (DIC), etc. Associative classification uses association rule mining in the rule discovery process to predict the class labels of the data. This technique has shown great promise over many other classification techniques. Associative classification also integrates the process of rule discovery and classification to build the classifier for the purpose of prediction. The main problem with the associative classification approach is the discovery of high-quality association rules in a very large space of

  4. Study on an Automatic Document Classification Model Based on Semantic Templates

    Institute of Scientific and Technical Information of China (English)

    李海蓉

    2012-01-01

    This paper briefly introduces the concept of semantic templates and proposes an automatic document classification model based on a semantic-template vector space. Using the support vector machine (SVM) classification algorithm, classification experiments were carried out on a document test corpus in both the semantic-template vector space and the word vector space. Experimental results show that text classification based on the semantic-template vector space performs better than classification based on the word vector space.

  5. Remote Sensing Image Classification Based on the Logistic Model

    Institute of Scientific and Technical Information of China (English)

    刘庆生; 刘高焕; 蔺启忠; 王志刚

    2001-01-01

    The logistic method is a nonlinear regression analysis method, named after the logistic model on which it is based, and can be used to predict and determine the class membership of unknown units. Unlike common classification methods, it calculates, for each unit, the probability of belonging to each of the known classes, and then classifies and predicts all units of the unknown study area. This paper first presents the basic principle of the logistic method, then uses it to classify two remote sensing image datasets from two study areas in the Inner Mongolia Autonomous Region, and finally discusses several factors that affect remote sensing image classification with the logistic method.
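The distinguishing property described, producing a per-class membership probability for each unit before assigning a label, is what a multinomial logistic (softmax) model provides. A minimal sketch with invented weights (not fitted to real imagery):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def class_probabilities(pixels, W, b):
    """Per-class membership probabilities for each pixel's feature vector."""
    return softmax(pixels @ W + b)

# Toy weights for 3 hypothetical land-cover classes over 2 band features;
# the values are illustrative, not estimated from data.
W = np.array([[ 2.0, -1.0, -1.0],
              [-1.0,  2.0, -1.0]])
b = np.zeros(3)

pixels = np.array([[1.0, 0.0]])                # one pixel's feature vector
p = class_probabilities(pixels, W, b)
print(p, p.argmax(axis=1))                     # probabilities, then the class
```

The probabilities sum to one per pixel, so the final map simply takes the class of maximum probability while the probabilities themselves remain available for uncertainty analysis.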

  6. Genome-based Taxonomic Classification of Bacteroidetes

    Directory of Open Access Journals (Sweden)

    Richard L. Hahnke

    2016-12-01

    Full Text Available The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogenetic analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved.

  7. Genome-Based Taxonomic Classification of Bacteroidetes.

    Science.gov (United States)

    Hahnke, Richard L; Meier-Kolthoff, Jan P; García-López, Marina; Mukherjee, Supratim; Huntemann, Marcel; Ivanova, Natalia N; Woyke, Tanja; Kyrpides, Nikos C; Klenk, Hans-Peter; Göker, Markus

    2016-01-01

    The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles, and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogenetic analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved.

  8. Fast rule-based bioactivity prediction using associative classification mining

    Directory of Open Access Journals (Sweden)

    Yu Pulan

    2012-11-01

    Full Text Available Abstract Relating chemical features to bioactivities is critical in molecular design and is used extensively in the lead discovery and optimization process. A variety of techniques from statistics, data mining and machine learning have been applied to this process. In this study, we utilize a collection of methods, called associative classification mining (ACM), which are popular in the data mining community but so far have not been applied widely in cheminformatics. More specifically, classification based on predictive association rules (CPAR), classification based on multiple association rules (CMAR) and classification based on association rules (CBA) are employed on three datasets using various descriptor sets. Experimental evaluations on anti-tuberculosis (antiTB), mutagenicity and hERG (human Ether-a-go-go-Related Gene) blocker datasets show that these three methods are computationally scalable and appropriate for high-speed mining. Additionally, they provide accuracy and efficiency comparable to the commonly used Bayesian and support vector machine (SVM) methods, and produce highly interpretable models.
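The core of any associative classifier is mining (itemset → class) rules with support and confidence thresholds and then classifying by the best matching rule. The toy sketch below is a drastically simplified stand-in for CBA/CMAR/CPAR; the feature names, thresholds and tie-breaking are all illustrative assumptions:

```python
from itertools import combinations

def mine_rules(records, min_support=2, min_conf=0.7):
    """Mine (itemset -> class) rules of size 1-2 meeting support/confidence
    thresholds; records are (feature_set, class_label) pairs."""
    rules = []
    items = {i for feats, _ in records for i in feats}
    for r in (1, 2):
        for itemset in combinations(sorted(items), r):
            covered = [c for feats, c in records if set(itemset) <= feats]
            if len(covered) < min_support:
                continue
            for cls in set(covered):
                conf = covered.count(cls) / len(covered)
                if conf >= min_conf:
                    rules.append((set(itemset), cls, conf))
    return sorted(rules, key=lambda x: -x[2])   # highest confidence first

def classify(feats, rules, default="inactive"):
    for itemset, cls, _ in rules:
        if itemset <= feats:
            return cls
    return default

# Hypothetical molecules described by structural flags with activity labels.
data = [({"ringA", "polar"}, "active"), ({"ringA", "polar"}, "active"),
        ({"ringB"}, "inactive"), ({"ringB", "polar"}, "inactive")]
rules = mine_rules(data)
print(classify({"ringA", "polar"}, rules))  # active
```

Because each prediction is traceable to an explicit rule, this style of model yields the interpretability the abstract highlights.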

  9. Cirrhosis Classification Based on Texture Classification of Random Features

    Directory of Open Access Journals (Sweden)

    Hui Liu

    2014-01-01

    Full Text Available Accurate staging of hepatic cirrhosis is important in investigating the cause and slowing down the effects of cirrhosis. Computer-aided diagnosis (CAD) can provide doctors with an alternative second opinion and assist them to make a specific treatment with accurate cirrhosis stage. MRI has many advantages, including high resolution for soft tissue, no radiation, and multiparameters imaging modalities. So in this paper, multisequences MRIs, including T1-weighted, T2-weighted, arterial, portal venous, and equilibrium phase, are applied. However, CAD does not meet the clinical needs of cirrhosis and few researchers are concerned with it at present. Cirrhosis is characterized by the presence of widespread fibrosis and regenerative nodules in the hepatic, leading to different texture patterns of different stages. So, extracting texture feature is the primary task. Compared with typical gray level cooccurrence matrix (GLCM) features, texture classification from random features provides an effective way, and we adopt it and propose CCTCRF for triple classification (normal, early, and middle and advanced stage). CCTCRF does not need strong assumptions except the sparse character of image, contains sufficient texture information, includes concise and effective process, and makes case decision with high accuracy. Experimental results also illustrate the satisfying performance and they are also compared with typical NN with GLCM.

  10. Cirrhosis classification based on texture classification of random features.

    Science.gov (United States)

    Liu, Hui; Shao, Ying; Guo, Dongmei; Zheng, Yuanjie; Zhao, Zuowei; Qiu, Tianshuang

    2014-01-01

    Accurate staging of hepatic cirrhosis is important in investigating the cause and slowing down the effects of cirrhosis. Computer-aided diagnosis (CAD) can provide doctors with an alternative second opinion and assist them to make a specific treatment with accurate cirrhosis stage. MRI has many advantages, including high resolution for soft tissue, no radiation, and multiparameters imaging modalities. So in this paper, multisequences MRIs, including T1-weighted, T2-weighted, arterial, portal venous, and equilibrium phase, are applied. However, CAD does not meet the clinical needs of cirrhosis and few researchers are concerned with it at present. Cirrhosis is characterized by the presence of widespread fibrosis and regenerative nodules in the hepatic, leading to different texture patterns of different stages. So, extracting texture feature is the primary task. Compared with typical gray level cooccurrence matrix (GLCM) features, texture classification from random features provides an effective way, and we adopt it and propose CCTCRF for triple classification (normal, early, and middle and advanced stage). CCTCRF does not need strong assumptions except the sparse character of image, contains sufficient texture information, includes concise and effective process, and makes case decision with high accuracy. Experimental results also illustrate the satisfying performance and they are also compared with typical NN with GLCM.
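
    The GLCM baseline that both records above compare against is simple to state in code. A minimal sketch with two Haralick-style features (contrast and energy); the offset and quantization level count are illustrative choices:

```python
def glcm(img, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for one pixel offset."""
    m = [[0.0] * levels for _ in range(levels)]
    total = 0
    for y in range(len(img) - dy):
        for x in range(len(img[0]) - dx):
            m[img[y][x]][img[y + dy][x + dx]] += 1
            total += 1
    return [[v / total for v in row] for row in m]

def glcm_features(p):
    """Haralick-style contrast and energy from a normalized GLCM."""
    k = len(p)
    contrast = sum(p[i][j] * (i - j) ** 2 for i in range(k) for j in range(k))
    energy = sum(p[i][j] ** 2 for i in range(k) for j in range(k))
    return contrast, energy

# toy 4-level quantized image with blocky texture
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [2, 2, 3, 3],
       [2, 2, 3, 3]]
contrast, energy = glcm_features(glcm(img, levels=4))
print(round(contrast, 3), round(energy, 3))  # 0.333 0.167
```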

  11. "Chromosome": a knowledge-based system for the chromosome classification.

    Science.gov (United States)

    Ramstein, G; Bernadet, M

    1993-01-01

    Chromosome, a knowledge-based analysis system, has been designed for the classification of human chromosomes. Its aim is to perform an optimal classification by driving a tool box containing the procedures of image processing, pattern recognition and classification. This paper presents the general architecture of Chromosome, based on a multiagent system generator. The image processing tool box is described from the metaphasic enhancement to the fine classification. Emphasis is then put on the knowledge base intended for the chromosome recognition. The global classification process is also presented, showing how Chromosome proceeds to classify a given chromosome. Finally, we discuss further extensions of the system for the karyotype building.

  12. A mixture model with a reference-based automatic selection of components for disease classification from protein and/or gene expression levels

    Directory of Open Access Journals (Sweden)

    Kopriva Ivica

    2011-12-01

    Full Text Available Abstract Background Bioinformatics data analysis often uses a linear mixture model representing samples as an additive mixture of components. Properly constrained blind matrix factorization methods extract those components using mixture samples only. However, automatic selection of extracted components to be retained for classification analysis remains an open issue. Results The method proposed here is applied to well-studied protein and genomic datasets of ovarian, prostate and colon cancers to extract components for disease prediction. It achieves average sensitivities of 96.2% (sd = 2.7%), 97.6% (sd = 2.8%) and 90.8% (sd = 5.5%) and average specificities of 93.6% (sd = 4.1%), 99% (sd = 2.2%) and 79.4% (sd = 9.8%) in 100 independent two-fold cross-validations. Conclusions We propose an additive mixture model of a sample for feature extraction using, in principle, sparseness constrained factorization on a sample-by-sample basis. As opposed to that, existing methods factorize the complete dataset simultaneously. The sample model is composed of a reference sample representing control and/or case (disease) groups and a test sample. Each sample is decomposed into two or more components that are selected automatically (without using label information) as control specific, case specific and not differentially expressed (neutral). The number of components is determined by cross-validation. Automatic assignment of features (m/z ratios or genes) to a particular component is based on thresholds estimated from each sample directly. Due to the locality of decomposition, the strength of the expression of each feature across the samples can vary. Yet, they will still be allocated to the related disease and/or control specific component. Since label information is not used in the selection process, case and control specific components can be used for classification. That is not the case with standard factorization methods. 
Moreover, the component selected by proposed method

  13. A Fuzzy Rule-Base Model for Classification of Spirometric FVC Graphs in Chronical Obstructive Pulmonary Diseases

    Science.gov (United States)

    2007-11-02

    of distinguishing COPD group diseases (chronic bronchitis, emphysema and asthma) by using fuzzy theory and to put into practice a “fuzzy rule-base...FVC Plots”. Keywords - asthma, chronic bronchitis, COPD (Chronic Obstructive Pulmonary Disease), emphysema, expert systems, FVC (forced vital... the group of chronic bronchitis, emphysema and asthma because of these reasons [4-7]. Additionally, similar symptoms may cause fuzziness in

  14. Network Traffic Anomalies Identification Based on Classification Methods

    Directory of Open Access Journals (Sweden)

    Donatas Račys

    2015-07-01

    Full Text Available A problem of network traffic anomaly detection in computer networks is analyzed. An overview of anomaly detection methods is given, then the advantages and disadvantages of the different methods are analyzed. A model for traffic anomaly detection was developed based on IBM SPSS Modeler and is used to analyze SNMP data of the router. Investigation of the traffic anomalies was done using three classification methods and different sets of the learning data. Based on the results of the investigation it was determined that the C5.1 decision tree method has the highest accuracy and performance and can be successfully used for identification of network traffic anomalies.
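
    The decision-tree idea can be illustrated with a depth-one tree (a stump) learned over hypothetical SNMP-style counters; the actual study uses the C5.1 method inside IBM SPSS Modeler, which this sketch does not reproduce:

```python
def best_stump(X, y):
    """Exhaustively pick the (feature, threshold, polarity) stump with the
    lowest training error -- a depth-one decision tree."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for polarity in (True, False):  # predict True when value > t (or <= t)
                pred = [(row[f] > t) == polarity for row in X]
                err = sum(p != label for p, label in zip(pred, y)) / len(y)
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    return best

# hypothetical per-interval SNMP features: [packets per second, error rate]
X = [[100, 0.01], [120, 0.02], [900, 0.30], [950, 0.25]]
y = [False, False, True, True]            # True = traffic anomaly
err, feature, threshold, polarity = best_stump(X, y)
print(err, feature)  # 0.0 0
```

    A full C5.x-style tree applies this split search recursively with an information-gain criterion and pruning; the stump shows the core mechanism.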

  15. Classification model of arousal and valence mental states by EEG signals analysis and Brodmann correlations

    Directory of Open Access Journals (Sweden)

    Adrian Rodriguez Aguinaga

    2015-06-01

    Full Text Available This paper proposes a methodology to perform emotional state classification by the analysis of EEG signals, wavelet decomposition and an electrode discrimination process that associates electrodes of a 10/20 model to Brodmann regions and reduces the computational burden. The classification was performed by a Support Vector Machine classifier, achieving an 81.46 percent classification rate for a multi-class problem, and the emotion modeling is based on an adjusted space from the Russell Arousal-Valence Space and the Geneva model.
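
    The electrode-discrimination step can be sketched as a lookup from 10/20 electrode names to Brodmann areas followed by channel selection. The mapping below is a hypothetical subset for illustration, not the paper's actual assignment:

```python
# Hypothetical subset of a 10/20-electrode -> Brodmann-area lookup;
# the real assignment used in the paper may differ.
ELECTRODE_TO_BRODMANN = {
    "Fp1": 10, "Fp2": 10,   # frontopolar
    "F7": 47, "F8": 47,     # inferior frontal
    "O1": 18, "O2": 18,     # visual cortex
    "T3": 21, "T4": 21,     # middle temporal
}

def select_channels(signals, areas):
    """Keep only channels whose mapped Brodmann area is of interest,
    reducing the computational burden before feature extraction."""
    return {ch: sig for ch, sig in signals.items()
            if ELECTRODE_TO_BRODMANN.get(ch) in areas}

recording = {"Fp1": [0.1, 0.2], "O1": [0.3, 0.1], "T3": [0.0, 0.4]}
print(sorted(select_channels(recording, {10, 21})))  # ['Fp1', 'T3']
```

    Only the retained channels would then go through wavelet decomposition and on to the SVM.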

  16. Malware Classification based on Call Graph Clustering

    CERN Document Server

    Kinable, Joris

    2010-01-01

    Each day, anti-virus companies receive tens of thousands of samples of potentially harmful executables. Many of the malicious samples are variations of previously encountered malware, created by their authors to evade pattern-based detection. Dealing with these large amounts of data requires robust, automatic detection approaches. This paper studies malware classification based on call graph clustering. By representing malware samples as call graphs, it is possible to abstract certain variations away, and enable the detection of structural similarities between samples. The ability to cluster similar samples together will make more generic detection techniques possible, thereby targeting the commonalities of the samples within a cluster. To compare call graphs mutually, we compute pairwise graph similarity scores via graph matchings which approximately minimize the graph edit distance. Next, to facilitate the discovery of similar malware samples, we employ several clustering algorithms, including k-medoids and DB...
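
    A much simpler stand-in for the paper's graph-edit-distance matching is a Jaccard index over call edges, which already captures the intuition that near-variants share most of their call structure (graphs and function names below are invented):

```python
def edge_jaccard(g1, g2):
    """Crude call-graph similarity: Jaccard index over labelled call edges.
    (A stand-in for graph-edit-distance-based matching, not the paper's method.)"""
    e1, e2 = set(g1), set(g2)
    return len(e1 & e2) / len(e1 | e2)

# call graphs as sets of (caller, callee) edges
a = {("main", "decrypt"), ("decrypt", "connect"), ("main", "spawn")}
b = {("main", "decrypt"), ("decrypt", "connect"), ("main", "sleep")}  # variant of a
c = {("start", "fmt"), ("fmt", "print")}                              # unrelated
print(round(edge_jaccard(a, b), 2), round(edge_jaccard(a, c), 2))  # 0.5 0.0
```

    The resulting pairwise similarity matrix is exactly the input that k-medoids or DBSCAN-style clustering consumes.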

  17. Age Classification Based On Integrated Approach

    Directory of Open Access Journals (Sweden)

    Pullela. SVVSR Kumar

    2014-05-01

    Full Text Available The present paper presents a new age classification method by integrating the features derived from the Grey Level Co-occurrence Matrix (GLCM) with a new structural approach derived from four distinct LBPs (4-DLBP) on a 3 x 3 image. The present paper derived four distinct patterns called Left Diagonal (LD), Right Diagonal (RD), Vertical Centre (VC) and Horizontal Centre (HC) LBPs. For all the LBPs the central pixel value of the 3 x 3 neighbourhood is significant. That is the reason why, in the present research, LBP values are evaluated by comparing all 9 pixels of the 3 x 3 neighbourhood with the average value of the neighbourhood. The four distinct LBPs are grouped into two distinct LBPs. Based on these two distinct LBPs the GLCM is computed and features are evaluated to classify the human age into four age groups, i.e.: Child (0-15), Young adult (16-30), Middle aged adult (31-50) and Senior adult (>50). The co-occurrence features extracted from the 4-DLBP provide complete texture information about an image, which is useful for classification. The proposed 4-DLBP reduces the size of the LBP from 6561 to 79 in the case of the original texture spectrum and from 2020 to 79 in the case of the Fuzzy Texture approach.
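
    The average-thresholding step of the 4-DLBP can be sketched for a single 3 x 3 patch. The four pattern read-outs follow the description above, while the exact bit encoding is an assumption for illustration:

```python
def dlbp_codes(patch):
    """LBP-style codes for a 3x3 patch: threshold every pixel against the
    neighbourhood average (as the 4-DLBP variant does), then read the four
    distinct patterns: left diagonal, right diagonal, vertical centre and
    horizontal centre."""
    avg = sum(sum(row) for row in patch) / 9.0
    b = [[1 if v >= avg else 0 for v in row] for row in patch]
    ld = (b[0][0], b[1][1], b[2][2])   # left diagonal
    rd = (b[0][2], b[1][1], b[2][0])   # right diagonal
    vc = (b[0][1], b[1][1], b[2][1])   # vertical centre
    hc = (b[1][0], b[1][1], b[1][2])   # horizontal centre
    return ld, rd, vc, hc

patch = [[9, 1, 1],
         [1, 9, 1],
         [1, 1, 9]]
print(dlbp_codes(patch)[0])  # (1, 1, 1): the left diagonal stands out
```

    The GLCM is then computed over these pattern codes rather than over raw gray levels, which is what shrinks the feature space from 6561 to 79.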

  18. Automatic web services classification based on rough set theory

    Institute of Scientific and Technical Information of China (English)

    陈立; 张英; 宋自林; 苗壮

    2013-01-01

    With the development of web services technology, the number of existing services on the internet is growing day by day. In order to achieve automatic and accurate services classification, which can be beneficial for service related tasks, a rough set theory based method for services classification was proposed. First, the service descriptions were preprocessed and represented as vectors. Inspired by discernibility-matrix-based attribute reduction in rough set theory, and taking into account the characteristics of the decision table of services classification, a method based on continuous discernibility matrices was proposed for dimensionality reduction. Finally, services classification was processed automatically. In the experiment, the proposed method achieves satisfactory classification results in all five testing categories. The experimental results show that the proposed method is accurate and could be used in practical web services classification.
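
    The discernibility matrix at the heart of rough-set attribute reduction can be sketched as follows (service attributes and categories are invented for illustration; the paper's continuous variant is not reproduced):

```python
def discernibility_matrix(table, decision):
    """For each pair of objects with different decisions, record the set of
    condition attributes on which they differ. Attributes appearing in many
    (especially singleton) cells are the reduction's core attributes."""
    cells = {}
    n = len(table)
    for i in range(n):
        for j in range(i + 1, n):
            if decision[i] != decision[j]:
                cells[(i, j)] = {a for a in table[i]
                                 if table[i][a] != table[j][a]}
    return cells

# hypothetical service descriptions and their categories
objs = [{"len": "short", "kw": "pay"},
        {"len": "short", "kw": "news"},
        {"len": "long",  "kw": "pay"}]
cls = ["billing", "media", "billing"]
m = discernibility_matrix(objs, cls)
print(sorted(m[(0, 1)]))  # ['kw']: 'kw' alone discerns objects 0 and 1
```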

  19. SPEECH/MUSIC CLASSIFICATION USING WAVELET BASED FEATURE EXTRACTION TECHNIQUES

    Directory of Open Access Journals (Sweden)

    Thiruvengatanadhan Ramalingam

    2014-01-01

    Full Text Available Audio classification is the fundamental step in coping with the rapid growth in audio data volume. Due to the increasing size of multimedia sources, speech and music classification is one of the most important issues for multimedia information retrieval. In this work a speech/music discrimination system is developed which utilizes the Discrete Wavelet Transform (DWT) as the acoustic feature. Multiresolution analysis is the most significant statistical way to extract the features from the input signal, and in this study a method is deployed to model the extracted wavelet feature. Support Vector Machines (SVM) are based on the principle of structural risk minimization. SVM is applied to classify audio into its classes, namely speech and music, by learning from training data. The proposed method then extends the application of Gaussian Mixture Models (GMM) to estimate the probability density function using maximum likelihood decision methods. The system shows significant results with an accuracy of 94.5%.
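
    A Haar-wavelet sub-band energy extractor, the kind of DWT feature such a system feeds to the SVM, can be sketched in a few lines (the signal and level count are illustrative; the paper does not specify the wavelet family used):

```python
def haar_step(x):
    """One level of the Haar DWT: approximation and detail coefficients."""
    a = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    d = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return a, d

def subband_energies(x, levels=2):
    """Energy of each detail sub-band -- a compact DWT feature vector."""
    feats = []
    for _ in range(levels):
        x, d = haar_step(x)
        feats.append(sum(v * v for v in d))
    return feats

# fast alternation puts its energy in the first (highest-frequency) sub-band
signal = [0, 4, 0, 4, 0, 4, 0, 4]
print(subband_energies(signal))  # [16.0, 0.0]
```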

  20. Data Classification Based on Confidentiality in Virtual Cloud Environment

    Directory of Open Access Journals (Sweden)

    Munwar Ali Zardari

    2014-10-01

    Full Text Available The aim of this study is to provide suitable security to data based on the security needs of the data. It is very difficult to decide (in the cloud) which data need what security and which data do not need security. However, it becomes easy to decide the security level for data after classifying the data according to their characteristics. In this study, we have proposed a data classification cloud model to solve the data confidentiality issue in the cloud computing environment. The data are classified into two major classes: sensitive and non-sensitive. The K-Nearest Neighbour (K-NN) classifier is used for data classification and the Rivest, Shamir and Adleman (RSA) algorithm is used to encrypt sensitive data. After implementing the proposed model, it is found that the confidentiality level of data is increased and this model is proved to be more cost and memory friendly for the users as well as for the cloud service providers. The data storage service is one of the cloud services where data servers are virtualized for all users. In a cloud server, the data are stored in two ways: first, encrypt the received data and store them on cloud servers; second, store the data on the cloud servers without encryption. Both of these data storage methods can face data confidentiality issues, because the data have different values and characteristics that must be identified before sending them to cloud servers.
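
    The two-stage pipeline (K-NN confidentiality classification, then RSA for the sensitive class) can be sketched with a one-dimensional toy feature and textbook RSA parameters; none of the numbers or feature definitions come from the study:

```python
def knn_label(train, x, k=3):
    """K-NN vote on absolute feature distance."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    labels = [lbl for _, lbl in nearest]
    return max(set(labels), key=labels.count)

def rsa_encrypt(m, e=17, n=3233):
    """Textbook RSA with tiny demo primes (p=61, q=53): c = m^e mod n.
    Real deployments use padded RSA with large keys."""
    return pow(m, e, n)

# hypothetical feature: fraction of regulated keywords in the document
train = [(0.9, "sensitive"), (0.8, "sensitive"),
         (0.1, "non-sensitive"), (0.2, "non-sensitive"), (0.15, "non-sensitive")]
doc_score, doc_bytes = 0.85, [72, 105]
verdict = knn_label(train, doc_score)
if verdict == "sensitive":                      # encrypt only the sensitive class
    doc_bytes = [rsa_encrypt(b) for b in doc_bytes]
print(verdict)  # sensitive
```

    With the matching private exponent d = 2753, `pow(c, 2753, 3233)` recovers each byte, which is what the cloud-side decryption would do.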

  1. Credal Classification based on AODE and compression coefficients

    CERN Document Server

    Corani, Giorgio

    2012-01-01

    Bayesian model averaging (BMA) is an approach to average over alternative models; yet, it usually gets excessively concentrated around the single most probable model, therefore achieving only sub-optimal classification performance. The compression-based approach (Boulle, 2007) overcomes this problem, averaging over the different models by applying a logarithmic smoothing over the models' posterior probabilities. This approach has shown excellent performances when applied to ensembles of naive Bayes classifiers. AODE is another ensemble of models with high performance (Webb, 2005), based on a collection of non-naive classifiers (called SPODE) whose probabilistic predictions are aggregated by simple arithmetic mean. Aggregating the SPODEs via BMA rather than by arithmetic mean deteriorates the performance; instead, we aggregate the SPODEs via the compression coefficients and we show that the resulting classifier obtains a slight but consistent improvement over AODE. However, an important issue in any Bayesian e...
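
    The contrast between concentrated BMA and AODE's arithmetic-mean aggregation can be illustrated on toy SPODE outputs (the compression-coefficient weighting itself is not reproduced here; numbers are invented):

```python
def bma_pick(posteriors, preds):
    """BMA in its degenerate, over-concentrated regime: the single most
    probable model effectively dictates the prediction."""
    return preds[posteriors.index(max(posteriors))]

def aode_mean(preds):
    """AODE-style aggregation: simple arithmetic mean of the SPODE outputs."""
    k = len(preds[0])
    return [sum(p[c] for p in preds) / len(preds) for c in range(k)]

spode_preds = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]  # per-model class probs
posteriors = [0.97, 0.02, 0.01]                     # BMA mass piles on model 0
print(bma_pick(posteriors, spode_preds), aode_mean(spode_preds))
```

    The compression-coefficient approach sits between these extremes: it still weights models by (log-smoothed) posterior evidence instead of ignoring it, which is why it can edge out the plain mean.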

  2. Graph-based Methods for Orbit Classification

    Energy Technology Data Exchange (ETDEWEB)

    Bagherjeiran, A; Kamath, C

    2005-09-29

    An important step in the quest for low-cost fusion power is the ability to perform and analyze experiments in prototype fusion reactors. One of the tasks in the analysis of experimental data is the classification of orbits in Poincare plots. These plots are generated by the particles in a fusion reactor as they move within the toroidal device. In this paper, we describe the use of graph-based methods to extract features from orbits. These features are then used to classify the orbits into several categories. Our results show that existing machine learning algorithms are successful in classifying orbits with few points, a situation which can arise in data from experiments.

  3. Hierarchical diagnostic classification models morphing into unidimensional 'diagnostic' classification models-a commentary.

    Science.gov (United States)

    von Davier, Matthias; Haberman, Shelby J

    2014-04-01

    This commentary addresses the modeling and final analytical path taken, as well as the terminology used, in the paper "Hierarchical diagnostic classification models: a family of models for estimating and testing attribute hierarchies" by Templin and Bradshaw (Psychometrika, doi: 10.1007/s11336-013-9362-0, 2013). It raises several issues concerning use of cognitive diagnostic models that either assume attribute hierarchies or assume a certain form of attribute interactions. The issues raised are illustrated with examples, and references are provided for further examination.

  4. An Efficient Method for Landscape Image Classification and Matching Based on MPEG-7 Descriptors

    OpenAIRE

    2011-01-01

    In this thesis, an efficient approach for a landscape image classification and matching system based on the MPEG-7 (Moving Picture Experts Group) color and shape descriptors is presented. Image classification is the task of deciding whether an image is a landscape or not. These classifications use the dominant color descriptor (DCD) method for finding the dominant color in the image. In DCD we examine whole-image pixel values. The pixel value contains red, green and blue color values in the RGB color model. After calcul...

  5. Multi-source information classification optimization based spare parts demand prediction model

    Institute of Scientific and Technical Information of China (English)

    索海龙; 高建民; 高智勇; 刘元浩

    2015-01-01

    In order to solve the difficult demand prediction problem of main key spare parts in large power equipment manufacturing and supply enterprises, multi-source heterogeneous information from multiple departments was collected, classified and analyzed, and a spare parts demand prediction model based on multi-source information classification optimization was proposed. This model mainly includes the establishment of a basic spare parts inventory, and model optimizations based on customer satisfaction rate, spare parts reserve strategy and product service status. The spare parts predictions from the hierarchical optimization model, together with those of a time series forecasting method and the enterprise's actual forecasting method, were compared in a practical example. The model's actual satisfaction rate improved from 90.32% and 98.81% respectively to 98.87%. The practical feasibility and economic efficiency of the model for demand prediction of main key spare parts of large equipment were thereby verified.

  6. Structure-based classification and ontology in chemistry

    Directory of Open Access Journals (Sweden)

    Hastings Janna

    2012-04-01

    Full Text Available Abstract Background Recent years have seen an explosion in the availability of data in the chemistry domain. With this information explosion, however, retrieving relevant results from the available information, and organising those results, become even harder problems. Computational processing is essential to filter and organise the available resources so as to better facilitate the work of scientists. Ontologies encode expert domain knowledge in a hierarchically organised machine-processable format. One such ontology for the chemical domain is ChEBI. ChEBI provides a classification of chemicals based on their structural features and a role or activity-based classification. An example of a structure-based class is 'pentacyclic compound' (compounds containing five-ring structures), while an example of a role-based class is 'analgesic', since many different chemicals can act as analgesics without sharing structural features. Structure-based classification in chemistry exploits elegant regularities and symmetries in the underlying chemical domain. As yet, there has been neither a systematic analysis of the types of structural classification in use in chemistry nor a comparison to the capabilities of available technologies. Results We analyze the different categories of structural classes in chemistry, presenting a list of patterns for features found in class definitions. We compare these patterns of class definition to tools which allow for automation of hierarchy construction within cheminformatics and within logic-based ontology technology, going into detail in the latter case with respect to the expressive capabilities of the Web Ontology Language and recent extensions for modelling structured objects. Finally we discuss the relationships and interactions between cheminformatics approaches and logic-based approaches. Conclusion Systems that perform intelligent reasoning tasks on chemistry data require a diverse set of underlying computational

  7. Sparse multivariate autoregressive modeling for mild cognitive impairment classification.

    Science.gov (United States)

    Li, Yang; Wee, Chong-Yaw; Jie, Biao; Peng, Ziwen; Shen, Dinggang

    2014-07-01

    Brain connectivity networks derived from functional magnetic resonance imaging (fMRI) are becoming increasingly prevalent in research related to cognitive and perceptual processes. The capability to detect causal or effective connectivity is highly desirable for understanding the cooperative nature of the brain network, particularly when the ultimate goal is to obtain good performance of control-patient classification with biologically meaningful interpretations. Understanding directed functional interactions between brain regions via a brain connectivity network is a challenging task. Since many genetic and biomedical networks are intrinsically sparse, incorporating the sparsity property into connectivity modeling can make the derived models more biologically plausible. Accordingly, we propose an effective connectivity modeling of resting-state fMRI data based on the multivariate autoregressive (MAR) modeling technique, which is widely used to characterize temporal information of dynamic systems. This MAR modeling technique allows for the identification of effective connectivity using the Granger causality concept and reduces the spurious causality connectivity in assessment of directed functional interaction from fMRI data. A forward orthogonal least squares (OLS) regression algorithm is further used to construct a sparse MAR model. By applying the proposed modeling to mild cognitive impairment (MCI) classification, we identify several most discriminative regions, including the middle cingulate gyrus, posterior cingulate gyrus, lingual gyrus and caudate regions, in line with results reported in previous findings. A relatively high classification accuracy of 91.89% is also achieved, with an increment of 5.4% compared to the fully-connected, non-directional Pearson-correlation-based functional connectivity approach.
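
    A dense first-order MAR fit by ordinary least squares, the starting point that the forward-OLS procedure then sparsifies, can be sketched on synthetic two-region data with a known directed influence (all data and coefficients below are invented):

```python
import numpy as np

def fit_mar(X, p=1):
    """Fit a first-order MAR model x_t = A x_{t-1} + e by ordinary least
    squares (the dense precursor of the paper's sparse forward-OLS fit)."""
    past, now = X[:-p], X[p:]
    coef, *_ = np.linalg.lstsq(past, now, rcond=None)
    return coef.T  # A[i, j]: influence of region j on region i

# synthetic two-region series: region 1 is driven by region 0 (Granger-style)
rng = np.random.default_rng(0)
n = 200
x = np.zeros((n, 2))
for t in range(1, n):
    x[t, 0] = 0.5 * x[t - 1, 0] + rng.normal(scale=0.1)
    x[t, 1] = 0.8 * x[t - 1, 0] + rng.normal(scale=0.1)
A = fit_mar(x)
print(np.round(A, 1))  # close to [[0.5, 0.0], [0.8, 0.0]]
```

    The large A[1, 0] with a near-zero A[0, 1] recovers the directed (effective) connectivity that symmetric Pearson correlation cannot express.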

  8. Intrusion Awareness Based on Data Fusion and SVM Classification

    Directory of Open Access Journals (Sweden)

    Ramnaresh Sharma

    2012-06-01

    Full Text Available Network intrusion awareness is an important factor for risk analysis of network security. In the current decade various methods and frameworks are available for intrusion detection and security awareness: some methods are based on the knowledge discovery process and some frameworks on neural networks. All of these models take rule-based decisions for the generation of security alerts. In this paper we propose a novel method for intrusion awareness using data fusion and SVM classification. Data fusion works on the basis of feature gathering from events. The support vector machine is a strong classifier of data; here we used SVM for the detection of closed items of the rule-based technique. Our proposed method is simulated on the KDD 1999 DARPA data set and achieves better empirical evaluation results in comparison with the rule-based technique and the neural network model.

  9. Intrusion Awareness Based on Data Fusion and SVM Classification

    Directory of Open Access Journals (Sweden)

    Ramnaresh Sharma

    2012-06-01

    Full Text Available Network intrusion awareness is an important factor for risk analysis of network security. In the current decade various methods and frameworks are available for intrusion detection and security awareness: some methods are based on the knowledge discovery process and some frameworks on neural networks. All of these models take rule-based decisions for the generation of security alerts. In this paper we propose a novel method for intrusion awareness using data fusion and SVM classification. Data fusion works on the basis of feature gathering from events. The support vector machine is a strong classifier of data; here we used SVM for the detection of closed items of the rule-based technique. Our proposed method is simulated on the KDD 1999 DARPA data set and achieves better empirical evaluation results in comparison with the rule-based technique and the neural network model.

  10. A novel classification method based on membership function

    Science.gov (United States)

    Peng, Yaxin; Shen, Chaomin; Wang, Lijia; Zhang, Guixu

    2011-03-01

    We propose a method for medical image classification using membership functions. Our aim is to classify the image into several classes based on prior knowledge. For every point, we calculate its membership function, i.e., the probability that the point belongs to each class. The point is finally labeled as the class with the highest value of membership function. The classification is reduced to a minimization problem of a functional with arguments of membership functions. There are three novelties in our paper. First, bias correction and the Rudin-Osher-Fatemi (ROF) model are applied to the input image to enhance the image quality. Second, an unconstrained functional is used. We use variable substitution to avoid the constraints that membership functions should be positive and sum to one. Third, several techniques are used to speed up the computation. The experimental result on the ventricle shows the validity of this approach.
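
    The membership-then-argmax labeling step can be sketched with Gaussian-style memberships. The class means and widths below are invented; the paper obtains its memberships by minimizing a functional rather than from fixed Gaussians:

```python
import math

def memberships(value, class_params):
    """Gaussian-style membership degrees of an intensity in each class,
    normalized so the degrees sum to one."""
    raw = {c: math.exp(-((value - mu) ** 2) / (2.0 * sigma ** 2))
           for c, (mu, sigma) in class_params.items()}
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()}

def label(value, class_params):
    m = memberships(value, class_params)
    return max(m, key=m.get)  # highest membership wins

# invented class statistics (mean intensity, width) for a two-class toy image
classes = {"background": (20.0, 10.0), "ventricle": (200.0, 15.0)}
print(label(190, classes))  # ventricle
```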

  11. Bearing Fault Classification Based on Conditional Random Field

    Directory of Open Access Journals (Sweden)

    Guofeng Wang

    2013-01-01

    Full Text Available Condition monitoring of rolling element bearings is paramount for predicting the lifetime and performing effective maintenance of mechanical equipment. To overcome the drawbacks of the hidden Markov model (HMM) and improve diagnosis accuracy, a conditional random field (CRF) model based classifier is proposed. In this model, the feature vector sequences and the fault categories are linked by an undirected graphical model in which their relationship is represented by a global conditional probability distribution. In comparison with the HMM, the main advantage of the CRF model is that it can depict the temporal dynamic information between the observation sequences and state sequences without assuming the independence of the input feature vectors. Therefore, the interrelationship between adjacent observation vectors can also be depicted and integrated into the model, which makes the classifier more robust and accurate than the HMM. To evaluate the effectiveness of the proposed method, four kinds of bearing vibration signals, which correspond to normal, inner race pit, outer race pit and roller pit conditions respectively, are collected from the test rig. The CRF and HMM models are then built respectively to perform fault classification by taking the sub-band energy features of wavelet packet decomposition (WPD) as the observation sequences. Moreover, the K-fold cross validation method is adopted to improve the evaluation accuracy of the classifier. The analysis and comparison under different fold times show that the accuracy rate of classification using the CRF model is higher than with the HMM. This method sheds some new light on the accurate classification of bearing faults.

  12. Using Discrete Loss Functions and Weighted Kappa for Classification: An Illustration Based on Bayesian Network Analysis

    Science.gov (United States)

    Zwick, Rebecca; Lenaburg, Lubella

    2009-01-01

    In certain data analyses (e.g., multiple discriminant analysis and multinomial log-linear modeling), classification decisions are made based on the estimated posterior probabilities that individuals belong to each of several distinct categories. In the Bayesian network literature, this type of classification is often accomplished by assigning…
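
    Weighted kappa itself is easy to state in code. A quadratic-weight version for ordinal categories (a common choice, not necessarily the authors' exact weights):

```python
def weighted_kappa(a, b, k):
    """Quadratic-weighted Cohen's kappa for ordinal labels 0..k-1:
    1 - (observed weighted disagreement / chance-expected disagreement)."""
    n = len(a)
    w = [[(i - j) ** 2 / (k - 1) ** 2 for j in range(k)] for i in range(k)]
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[x][y] += 1 / n
    pa = [sum(x == i for x in a) / n for i in range(k)]
    pb = [sum(y == j for y in b) / n for j in range(k)]
    d_obs = sum(w[i][j] * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(w[i][j] * pa[i] * pb[j] for i in range(k) for j in range(k))
    return 1 - d_obs / d_exp

print(weighted_kappa([0, 1, 2, 2], [0, 1, 2, 2], 3))  # 1.0: perfect agreement
```

    The quadratic weights are exactly the discrete loss the article discusses: confusing adjacent categories costs little, confusing distant ones costs much more.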

  13. Constructing Customer Consumption Classification Models Based on Rough Sets and Neural Networks

    Institute of Scientific and Technical Information of China (English)

    万映红; 胡万平; 曹小鹏

    2011-01-01

    Aiming at the multidimensional, correlated and uncertain characteristics of customer consumption attributes, a customer consumption classification model based on rough set neural networks (RS-NN) is proposed. After revealing the rough-set nature of the customer consumption classification problem, a research framework is designed consisting of preprocessing the classification knowledge space, building the consumption classification model, and applying the classification model. A series of key techniques is systematically described, including rough-set-based reduction of consumption attributes, extraction of classification rules, construction of the initial topology of the rough set neural network, and training and testing of the network model. Finally, telecom customer management in a certain region is taken as a modeling example. The results show that the RS-NN model outperforms the BP-NN algorithm in model structure, model efficiency and classification prediction accuracy, and is an effective and practical new method for customer classification. The customer consumption classification topic is receiving increasing attention from researchers in the field of customer relationship management. Current research on customer consumption classification can be further improved in many areas. For instance, customer consumption classification models should take multidimensional and other related consumption attributes into classification analysis, avoid attribute redundancy, and select core classification attributes. Customer consumption models should identify input neurons, hidden layers and hidden neurons in order to reduce the complexity of the classification structure and improve the model's explanatory power. Existing classification methods are not effective at representing the inconsistency of consumption attributes and classes. This paper proposed a customer consumption classification model by integrating rough sets and neural networks based on the rough set neural network (RS-NN) model. Rough set is the core theory underpinning this study. This paper reduced attribute values and adopted core consumption attributes in order to solve attribute redundancy and inconsistency problems. This paper also used customer classification rules and solved attribute inconsistency problems. In addition, by integrating classification
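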

  14. Classification/Categorization Model of Instruction for Learning Disabled Students.

    Science.gov (United States)

    Freund, Lisa A.

    1987-01-01

    Learning-disabled students deficient in classification and categorization require specific instruction in these skills. Use of a classification/categorization instructional model improved the questioning strategies of 60 learning-disabled students, aged 10 to 12. The use of similar models is discussed as a basis for instruction in science, social…

  15. Spectrum-based kernel length estimation for Gaussian process classification.

    Science.gov (United States)

    Wang, Liang; Li, Chuan

    2014-06-01

    Recent studies have shown that Gaussian process (GP) classification, a discriminative supervised learning approach, has achieved competitive performance in real applications compared with most state-of-the-art supervised learning methods. However, the problem of automatic model selection in GP classification, involving the kernel function form and the corresponding parameter values (which are unknown in advance), remains a challenge. To make GP classification a more practical tool, this paper presents a novel spectrum-analysis-based approach for model selection by refining the GP kernel function to match the given input data. Specifically, we target the problem of GP kernel length scale estimation. Spectra are first calculated analytically from the kernel function itself using the autocorrelation theorem, as well as being estimated numerically from the training data themselves. Then, the kernel length scale is automatically estimated by equating the two spectrum values, i.e., the kernel function spectrum equals the estimated training data spectrum. Compared with the classical Bayesian method for kernel length scale estimation via maximizing the marginal likelihood (which is time consuming and can suffer from multiple local optima), extensive experimental results on various data sets show that our proposed method is both efficient and accurate.
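
    The spectrum-matching idea can be illustrated for the 1-D RBF kernel: k(r) = exp(-r^2 / (2 l^2)) has a Gaussian power spectrum, so equating its bandwidth with the bandwidth of the empirical periodogram yields the length scale l in closed form. The sketch below is a hedged illustration under that assumption (the half-power matching rule and the function name are ours, not the paper's exact estimator):

```python
import numpy as np

def estimate_length_scale(x, y):
    """Spectrum-matching length-scale sketch (1-D, evenly spaced x).

    The RBF kernel k(r) = exp(-r^2 / (2 l^2)) has power spectrum
    S(f) proportional to exp(-2 (pi f l)^2); we match its half-power
    bandwidth to the empirical periodogram of the training targets.
    """
    n = len(x)
    dx = x[1] - x[0]
    # empirical power spectrum of the (mean-removed) training targets
    spec = np.abs(np.fft.rfft(y - y.mean())) ** 2
    freqs = np.fft.rfftfreq(n, d=dx)
    # crude bandwidth: highest frequency still holding half the peak power
    half = spec.max() / 2.0
    above = freqs[spec >= half]
    f_c = above.max() if above.size else freqs[1]
    # solve exp(-2 (pi f_c l)^2) = 1/2 for l
    return np.sqrt(np.log(2.0) / 2.0) / (np.pi * f_c)
```

A smooth, slowly varying signal yields a small f_c and hence a large length scale, matching the intuition that flat functions need wide kernels.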

  16. RECURSIVE CLASSIFICATION OF MQAM SIGNALS BASED ON HIGHER ORDER CUMULANTS

    Institute of Scientific and Technical Information of China (English)

    Chen Weidong; Yang Shaoquan

    2002-01-01

    A new feature based on higher order cumulants is proposed for classification of MQAM signals. Theoretical analysis justifies that the new feature is invariant with respect to translation (shift), scale, and rotation transforms of signal constellations, and can suppress colored or white additive Gaussian noise. Computer simulation shows that the proposed recursive order-reduction based classification algorithm can classify MQAM signals of any order.

  17. Spectral-Spatial Hyperspectral Image Classification Based on KNN

    Science.gov (United States)

    Huang, Kunshan; Li, Shutao; Kang, Xudong; Fang, Leyuan

    2016-12-01

    Fusion of spectral and spatial information is an effective way in improving the accuracy of hyperspectral image classification. In this paper, a novel spectral-spatial hyperspectral image classification method based on K nearest neighbor (KNN) is proposed, which consists of the following steps. First, the support vector machine is adopted to obtain the initial classification probability maps which reflect the probability that each hyperspectral pixel belongs to different classes. Then, the obtained pixel-wise probability maps are refined with the proposed KNN filtering algorithm that is based on matching and averaging nonlocal neighborhoods. The proposed method does not need sophisticated segmentation and optimization strategies while still being able to make full use of the nonlocal principle of real images by using KNN, and thus, providing competitive classification with fast computation. Experiments performed on two real hyperspectral data sets show that the classification results obtained by the proposed method are comparable to several recently proposed hyperspectral image classification methods.
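
    The KNN filtering step can be sketched with a brute-force version: each pixel's class-probability vector is replaced by the average over its k nearest neighbours in a joint intensity-position feature space. The feature construction and the `spatial_weight` parameter below are illustrative assumptions; the paper's matching scheme may differ:

```python
import numpy as np

def knn_filter(prob_maps, guide, k=5, spatial_weight=0.1):
    """Refine class-probability maps by KNN averaging (brute-force sketch).

    prob_maps: (n_classes, H, W) per-class probabilities
    guide:     (H, W) guidance image (e.g. one band of the scene)
    """
    h, w = guide.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # joint feature: intensity plus (down-weighted) spatial position
    feats = np.stack([guide.ravel(),
                      spatial_weight * xs.ravel(),
                      spatial_weight * ys.ravel()], axis=1)
    flat = prob_maps.reshape(prob_maps.shape[0], -1)
    out = np.empty_like(flat)
    for i in range(feats.shape[0]):
        d = np.sum((feats - feats[i]) ** 2, axis=1)
        nn = np.argpartition(d, k)[:k]          # k nearest pixels (incl. self)
        out[:, i] = flat[:, nn].mean(axis=1)    # average their probabilities
    return out.reshape(prob_maps.shape)
```

Because each output is a mean of probability vectors, the filtered maps still sum to one across classes at every pixel.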

  18. Integrating Globality and Locality for Robust Representation Based Classification

    Directory of Open Access Journals (Sweden)

    Zheng Zhang

    2014-01-01

    Full Text Available The representation based classification method (RBCM) has shown huge potential for face recognition since it first emerged. The linear regression classification (LRC) method and the collaborative representation classification (CRC) method are two well-known RBCMs. LRC and CRC exploit the training samples of each class and all the training samples, respectively, to represent the testing sample, and subsequently conduct classification on the basis of the representation residual. The LRC method can be viewed as a "locality representation" method because it uses only the training samples of each class to represent the testing sample, so it cannot embody the effectiveness of "globality representation." Conversely, the CRC method cannot enjoy the locality benefit of the general RBCM. Thus we propose to integrate CRC and LRC to perform more robust representation based classification. The experimental results on benchmark face databases substantially demonstrate that the proposed method achieves high classification accuracy.
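
    The LRC half of the scheme can be sketched in a few lines: each class's training samples form a column basis, the test vector is regressed onto each basis by least squares, and the class with the smallest reconstruction residual wins. This is a sketch of standard LRC only, not of the combined LRC/CRC method proposed in the record:

```python
import numpy as np

def lrc_classify(test, class_samples):
    """Linear regression classification (LRC) sketch.

    class_samples: dict mapping label -> (dim, n_samples) matrix whose
    columns are that class's training vectors.
    """
    best, best_res = None, np.inf
    for label, X in class_samples.items():
        beta, *_ = np.linalg.lstsq(X, test, rcond=None)  # class-wise regression
        res = np.linalg.norm(test - X @ beta)            # reconstruction residual
        if res < best_res:
            best, best_res = label, res
    return best
```

A test vector lying in the span of one class's samples is reconstructed with zero residual and is therefore assigned to that class.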

  19. MODELING DYNAMIC VEGETATION RESPONSE TO RAPID CLIMATE CHANGE USING BIOCLIMATIC CLASSIFICATION

    Science.gov (United States)

    Modeling potential global redistribution of terrestrial vegetation frequently is based on bioclimatic classifications which relate static regional vegetation zones (biomes) to a set of static climate parameters. The equilibrium character of the relationships limits our confidence...

  20. Semi-supervised classification of remote sensing image based on probabilistic topic model%利用概率主题模型的遥感影像半监督分类

    Institute of Scientific and Technical Information of China (English)

    易文斌; 冒亚明; 慎利

    2013-01-01

    Land cover is at the center of the interaction between the natural environment and human activities, and land cover information is mainly obtained through the classification of remote sensing images, so image classification is one of the most basic problems in remote sensing image analysis. Building on probabilistic-topic-model-based clustering of high-resolution remote sensing imagery, this paper analyses the generative model, a typical approach in semi-supervised learning, and derives a classification method based on a probabilistic topic model and semi-supervised learning (SS-LDA). Borrowing the workflow used when the SS-LDA model is applied to text recognition, a basic classification process for high-resolution remote sensing imagery is constructed. Experiments show that, compared with traditional unsupervised and supervised classification algorithms, the SS-LDA algorithm obtains more accurate image classification results.

  1. A new classification algorithm based on RGH-tree search

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    In this paper, we put forward a new classification algorithm based on RGH-tree search and perform classification analysis and a comparative study. The algorithm saves computing resources and increases classification efficiency. Experiments show that it performs well on three-dimensional, multi-class data. We find that the algorithm has good generalization ability for small training sets and large testing sets.

  2. Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions

    KAUST Repository

    Najibi, Seyed Morteza

    2017-02-08

    Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is how to efficiently model the continuous conformational space of the protein structures based on the differences and similarities between different Ramachandran plots. Despite the presence of statistical methods for modeling angular data of proteins, there is still a substantial need for more sophisticated and faster statistical tools to model the large-scale circular datasets. To address this need, we have developed a nonparametric method for collective estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The proposed method takes into account the circular nature of the angular data using trigonometric spline which is more efficient compared to existing methods. This collective density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of adaptive basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. The proposed method provides a novel and unique perspective to two important and challenging problems in protein structure research: structure-based protein classification and angular-sampling-based protein loop structure prediction.

  3. A deep learning approach to the classification of 3D CAD models

    Institute of Scientific and Technical Information of China (English)

    Fei-wei QIN; Lu-ye LI; Shu-ming GAO; Xiao-ling YANG; Xiang CHEN

    2014-01-01

    Model classification is essential to the management and reuse of 3D CAD models. Manual model classification is laborious and error prone. At the same time, automatic classification methods are scarce due to the intrinsic complexity of 3D CAD models. In this paper, we propose an automatic 3D CAD model classification approach based on deep neural networks. According to prior knowledge of the CAD domain, features are selected and extracted from 3D CAD models first, and then pre-processed as high dimensional input vectors for category recognition. By analogy with the thinking process of engineers, a deep neural network classifier for 3D CAD models is constructed with the aid of deep learning techniques. To obtain an optimal solution, multiple strategies are appropriately chosen and applied in the training phase, which makes our classifier achieve better performance. We demonstrate the efficiency and effectiveness of our approach through experiments on 3D CAD model datasets.

  4. Hierarchical structure for audio-video based semantic classification of sports video sequences

    Science.gov (United States)

    Kolekar, M. H.; Sengupta, S.

    2005-07-01

    A hierarchical structure for sports event classification based on audio and video content analysis is proposed in this paper. Compared to the event classifications in other games, those of cricket are very challenging and yet unexplored. We have successfully solved cricket video classification problem using a six level hierarchical structure. The first level performs event detection based on audio energy and Zero Crossing Rate (ZCR) of short-time audio signal. In the subsequent levels, we classify the events based on video features using a Hidden Markov Model implemented through Dynamic Programming (HMM-DP) using color or motion as a likelihood function. For some of the game-specific decisions, a rule-based classification is also performed. Our proposed hierarchical structure can easily be applied to any other sports. Our results are very promising and we have moved a step forward towards addressing semantic classification problems in general.

  5. Quality-Oriented Classification of Aircraft Material Based on SVM

    Directory of Open Access Journals (Sweden)

    Hongxia Cai

    2014-01-01

    Full Text Available Existing material classification schemes are designed to improve inventory management. However, different materials have different quality-related attributes, especially in the aircraft industry. In order to reduce cost without sacrificing quality, we propose a quality-oriented material classification system considering material quality character, quality cost, and quality influence. The Analytic Hierarchy Process helps to make feature selection and classification decisions. We use the improved Kraljic Portfolio Matrix to establish a three-dimensional classification model. Aircraft materials can be divided into eight types, including general, key, risk, and leveraged types. Aiming to improve the classification accuracy for various materials, the Support Vector Machine (SVM) algorithm is introduced. Finally, we compare SVM and a BP neural network in this application. The results prove that the SVM algorithm is more efficient and accurate and that the quality-oriented material classification is valuable.

  6. Development Of An Econometric Model Case Study: Romanian Classification System

    Directory of Open Access Journals (Sweden)

    Savescu Roxana

    2015-08-01

    Full Text Available The purpose of the paper is to illustrate an econometric model used to predict the lean meat content in pig carcasses, based on the muscle thickness and back fat thickness measured by means of an optical probe (OptiGrade PRO). The analysis goes through all steps involved in the development of the model: statement of theory, specification of the mathematical model, sampling and collection of data, estimation of the parameters of the chosen econometric model, tests of the hypotheses derived from the model, and prediction equations. The data were collected in a controlled experiment conducted by the Romanian Carcass Classification Commission in 2007. The purpose of the experiment was to develop the prediction formulae to be used in the implementation of the SEUROP classification system, imposed by European Union legislation. The research methodology used by the author in this study consisted in reviewing the existing literature and normative acts, analyzing the primary data provided by an organization conducting the experiment, and interviewing the representatives of the working team that participated in the trial.
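
    The prediction equation described here is a two-predictor linear regression of lean meat content on muscle and back fat thickness. A minimal sketch of the estimation step, with illustrative data (the coefficients below are hypothetical, not the Commission's actual formula):

```python
import numpy as np

def fit_lean_meat_model(muscle, backfat, lean):
    """OLS fit of lean = b0 + b1*muscle + b2*backfat (two-predictor form
    typical of SEUROP carcass-grading equations)."""
    X = np.column_stack([np.ones_like(muscle), muscle, backfat])
    coef, *_ = np.linalg.lstsq(X, lean, rcond=None)
    return coef  # (b0, b1, b2)
```

Given measurements that follow an exact linear relation, least squares recovers the generating coefficients, which is a useful sanity check before fitting real trial data.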

  7. Geographical classification of apple based on hyperspectral imaging

    Science.gov (United States)

    Guo, Zhiming; Huang, Wenqian; Chen, Liping; Zhao, Chunjiang; Peng, Yankun

    2013-05-01

    The geographical origin of an apple is often recognized and appreciated by consumers, and it is usually an important factor in determining the price of a commercial product. In this work, hyperspectral imaging technology and supervised pattern recognition were used to discriminate apples according to geographical origin. Hyperspectral images of 207 Fuji apple samples were collected by a hyperspectral camera (400-1000 nm). Principal component analysis (PCA) was performed on the hyperspectral imaging data to determine the main efficient wavelength images, and then characteristic variables were extracted by texture analysis based on the gray level co-occurrence matrix (GLCM) from the dominant waveband image. All characteristic variables were obtained by fusing the data of images in efficient spectra. A support vector machine (SVM) was used to construct the classification model, and showed excellent performance in the classification results. The model achieved high classification accuracies of 92.75% on the training set and 89.86% on the prediction set, respectively. The overall results demonstrated that the hyperspectral imaging technique coupled with an SVM classifier can be efficiently utilized to discriminate Fuji apples according to geographical origin.
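
    The GLCM texture step can be sketched as follows: quantize the band image, accumulate co-occurrences for one pixel offset, and derive Haralick-style statistics. The two features below (contrast and energy) are common choices; the exact feature set used in the paper may differ:

```python
import numpy as np

def glcm_features(img, levels=8, dx=1, dy=0):
    """Gray-level co-occurrence matrix for one offset, plus two
    Haralick-style features (contrast, energy). Sketch only."""
    # quantize intensities into `levels` gray levels
    q = (img * levels / (img.max() + 1e-12)).astype(int).clip(0, levels - 1)
    glcm = np.zeros((levels, levels))
    h, w = q.shape
    for y in range(h - dy):
        for x in range(w - dx):
            glcm[q[y, x], q[y + dy, x + dx]] += 1
    glcm /= glcm.sum()                       # normalize to a joint distribution
    i, j = np.mgrid[0:levels, 0:levels]
    contrast = ((i - j) ** 2 * glcm).sum()   # local intensity variation
    energy = (glcm ** 2).sum()               # textural uniformity
    return contrast, energy
```

A perfectly uniform image co-occurs only with itself, so its contrast is zero and its energy is one, which gives a quick sanity check.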

  8. A Novel Imbalanced Data Classification Approach Based on Logistic Regression and Fisher Discriminant

    Directory of Open Access Journals (Sweden)

    Baofeng Shi

    2015-01-01

    Full Text Available We introduce an imbalanced data classification approach based on logistic regression significant discriminant and Fisher discriminant. First of all, a key-indicator extraction model based on logistic regression significant discriminant and correlation analysis is derived to extract features for customer classification. Secondly, on the basis of linear weighting using the Fisher discriminant, a customer scoring model is established. Then, a customer rating model in which the number of customers across ratings follows a normal distribution is constructed. The performance of the proposed model and the classical SVM classification method are evaluated in terms of their ability to correctly classify consumers as default or non-default customers. Empirical results using data on 2157 customers in financial engineering suggest that the proposed approach performs better than the SVM model in dealing with imbalanced data classification. Moreover, our approach contributes to locating the qualified customers for banks and bond investors.

  9. Model classification rate control algorithm for video coding

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    A model classification rate control method for video coding is proposed. The macroblocks are classified according to their prediction errors, and different parameters are used in the rate-quantization and distortion-quantization models. The different model parameters are calculated from the previous frame of the same type in the process of coding. These models are used to estimate the relations among rate, distortion and quantization of the current frame. Further steps, such as R-D-optimization-based quantization adjustment and smoothing of quantization of adjacent macroblocks, are used to improve the quality. The results of the experiments prove that the technique is effective and can be realized easily. The method presented in the paper is well suited to MPEG and H.264 rate control.
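
    Rate-quantization models of this kind are commonly the quadratic form R(Q) = a/Q + b/Q^2, with (a, b) estimated from previously coded data. A generic sketch of that estimation step (an illustration of the standard quadratic model, not this paper's exact per-class procedure):

```python
import numpy as np

def fit_rq_model(qs, rates):
    """Least-squares fit of the quadratic rate-quantization model
    R(Q) = a/Q + b/Q^2 from (Q, bits) pairs of previously coded data."""
    A = np.column_stack([1.0 / qs, 1.0 / qs**2])
    (a, b), *_ = np.linalg.lstsq(A, rates, rcond=None)
    return a, b

def predict_bits(q, a, b):
    """Predicted bit budget at quantizer value q."""
    return a / q + b / q**2
```

In a rate controller, the fitted (a, b) are inverted to choose the quantizer that meets the frame's bit budget.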

  10. Music Genre Classification using the multivariate AR feature integration model

    DEFF Research Database (Denmark)

    Ahrendt, Peter; Meng, Anders

    2005-01-01

    Music genre classification systems are normally built as a feature extraction module followed by a classifier. The features are often short-time features with time frames of 10-30 ms, although several characteristics of music require larger time scales. Thus, larger time frames are needed to make informative decisions about musical genre. For the MIREX music genre contest several authors derive long-time features based either on statistical moments and/or temporal structure in the short-time features. In our contribution we model a segment (1.2 s) of short-time features (texture) using a multivariate autoregressive model. Other authors have applied simpler statistical models such as the mean-variance model, which also has been included in several of this year's MIREX submissions, see e.g. Tzanetakis (2005); Burred (2005); Bergstra et al. (2005); Lidy and Rauber (2005).

  11. AN OBJECT-BASED METHOD FOR CHINESE LANDFORM TYPES CLASSIFICATION

    Directory of Open Access Journals (Sweden)

    H. Ding

    2016-06-01

    Full Text Available Landform classification is a necessary task for various fields of landscape and regional planning, for example landscape evaluation, erosion studies, hazard prediction, etc. This study proposes an improved object-based classification for Chinese landform types using the factor importance analysis of random forest and the gray-level co-occurrence matrix (GLCM). In this research, based on a 1 km DEM of China, the combination of terrain factors extracted from the DEM is selected by correlation analysis and Sheffield's entropy method. A random forest classification tree is applied to evaluate the importance of the terrain factors, which are used as multi-scale segmentation thresholds. Then the GLCM is computed for the knowledge base of classification. The classification result was checked using the 1:4,000,000 Chinese Geomorphological Map as reference. The overall classification accuracy of the proposed method is 5.7% higher than ISODATA unsupervised classification, and 15.7% higher than the traditional object-based classification method.

  12. An Object-Based Method for Chinese Landform Types Classification

    Science.gov (United States)

    Ding, Hu; Tao, Fei; Zhao, Wufan; Na, Jiaming; Tang, Guo'an

    2016-06-01

    Landform classification is a necessary task for various fields of landscape and regional planning, for example landscape evaluation, erosion studies, hazard prediction, etc. This study proposes an improved object-based classification for Chinese landform types using the factor importance analysis of random forest and the gray-level co-occurrence matrix (GLCM). In this research, based on a 1 km DEM of China, the combination of terrain factors extracted from the DEM is selected by correlation analysis and Sheffield's entropy method. A random forest classification tree is applied to evaluate the importance of the terrain factors, which are used as multi-scale segmentation thresholds. Then the GLCM is computed for the knowledge base of classification. The classification result was checked using the 1:4,000,000 Chinese Geomorphological Map as reference. The overall classification accuracy of the proposed method is 5.7% higher than ISODATA unsupervised classification, and 15.7% higher than the traditional object-based classification method.

  13. GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran.

    Science.gov (United States)

    Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali

    2016-01-01

    Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristics curve (ROC). From 864 springs identified, 605 (≈70 %) locations were used for the spring potential mapping, while the remaining 259 (≈30 %) springs were used for the model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103 and for CART and RF the AUC were 0.7870 and 0.7119, respectively. Therefore, it was concluded that the BRT model produced the best prediction results while predicting locations of springs followed by CART and RF models, respectively. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy.
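
    The model comparison above rests on the AUC from the ROC analysis. A rank-based sketch of that statistic (the Mann-Whitney formulation, assuming no tied scores; this is a generic illustration, not the software used in the study):

```python
import numpy as np

def auc(y_true, y_score):
    """Rank-based AUC: the probability that a randomly chosen positive
    (e.g. a spring location) is scored higher than a random negative."""
    order = np.argsort(y_score)
    ranks = np.empty(len(y_score))
    ranks[order] = np.arange(1, len(y_score) + 1)   # 1-based ranks of scores
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    # Mann-Whitney U statistic normalized to [0, 1]
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is how the 0.81 (BRT) versus 0.71 (RF) figures above should be read.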

  14. Fast Wavelet-Based Visual Classification

    CERN Document Server

    Yu, Guoshen

    2008-01-01

    We investigate a biologically motivated approach to fast visual classification, directly inspired by the recent work of Serre et al. Specifically, trading-off biological accuracy for computational efficiency, we explore using wavelet and grouplet-like transforms to parallel the tuning of visual cortex V1 and V2 cells, alternated with max operations to achieve scale and translation invariance. A feature selection procedure is applied during learning to accelerate recognition. We introduce a simple attention-like feedback mechanism, significantly improving recognition and robustness in multiple-object scenes. In experiments, the proposed algorithm achieves or exceeds state-of-the-art success rate on object recognition, texture and satellite image classification, language identification and sound classification.
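
    The alternation of linear filtering with max operations can be sketched with a single Haar-like filter followed by a 2x2 max pool. This is a toy illustration of the simple-cell/complex-cell alternation described above, not the paper's wavelet/grouplet pipeline:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def haar_then_max(img):
    """One filter-then-pool stage: a 2x2 Haar-like vertical-edge filter
    (the linear 'S' step), then a 2x2 max pool for local translation
    invariance (the 'C' step)."""
    k = np.array([[1.0, -1.0], [1.0, -1.0]])
    # filter responses at every 2x2 window
    resp = (sliding_window_view(img, (2, 2)) * k).sum(axis=(-1, -2))
    # max over 2x2 neighbourhoods of the rectified responses
    return sliding_window_view(np.abs(resp), (2, 2)).max(axis=(-1, -2))
```

A vertical intensity edge produces a strong response that survives the max pool even if the edge shifts by a pixel, which is the invariance the max operation buys.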

  15. Shape classification based on singular value decomposition transform

    Institute of Scientific and Technical Information of China (English)

    SHAABAN Zyad; ARIF Thawar; BABA Sami; KREKOR Lala

    2009-01-01

    In this paper, a new shape classification system based on the singular value decomposition (SVD) transform and a nearest neighbour classifier is proposed. The gray scale image of the shape object was converted into a black and white image. The squared Euclidean distance transform was applied to the binary image to extract the boundary image of the shape. SVD transform features were extracted from the boundary of the object shapes. The proposed classification system based on the SVD transform feature extraction method was compared with a classifier based on moment invariants, both using the nearest neighbour classifier. The experimental results showed the advantage of our proposed classification system.
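
    The feature-plus-classifier pipeline can be sketched as follows: the leading singular values of the (boundary) image act as a compact shape descriptor, and a 1-NN rule assigns the label of the closest gallery template. The sum-normalization for scale robustness is our illustrative choice, not necessarily the paper's exact normalization:

```python
import numpy as np

def svd_shape_features(binary_img, k=8):
    """Leading singular values of the shape image, normalized so that
    uniform intensity scaling does not change the descriptor."""
    s = np.linalg.svd(binary_img.astype(float), compute_uv=False)[:k]
    return s / (s.sum() + 1e-12)

def nn_classify(query_img, gallery):
    """1-NN over SVD descriptors; `gallery` maps labels to template images."""
    q = svd_shape_features(query_img)
    dists = {lab: np.linalg.norm(q - svd_shape_features(img))
             for lab, img in gallery.items()}
    return min(dists, key=dists.get)
```

A filled square is rank one whatever its size, so a smaller square still matches the square template rather than, say, a diagonal line.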

  16. Multiclass Classification Based on the Analytical Center of Version Space

    Institute of Scientific and Technical Information of China (English)

    ZENG Fanzi; QIU Zhengding; YUE Jianhai; LI Xiangqian

    2005-01-01

    The analytical center machine, based on the analytical center of version space, outperforms the support vector machine, especially when the version space is elongated or asymmetric. While the analytical center machine for binary classification is well understood, little is known about the corresponding multiclass classification. Moreover, the current multiclass classification method, "one versus all," needs to repeatedly construct classifiers to separate a single class from all the others, which leads to daunting computation and low classification efficiency; and though the multiclass support vector machine corresponds to a simple quadratic optimization, it is not very effective when the version space is asymmetric or elongated. Thus, a multiclass classification approach based on the analytical center of version space is proposed to address the above problems. Experiments on wine recognition and glass identification datasets demonstrate the validity of the proposed approach.

  17. An Efficient Audio Classification Approach Based on Support Vector Machines

    Directory of Open Access Journals (Sweden)

    Lhoucine Bahatti

    2016-05-01

    Full Text Available In audio classification aimed at identifying the composer, the use of adequate and relevant features is important for improving performance, especially when the classification algorithm is based on support vector machines. As opposed to conventional approaches that often use timbral features based on a time-frequency representation of the musical signal with a constant window, this paper deals with a new audio classification method which improves feature extraction using the Constant Q Transform (CQT) approach and includes original audio features related to the musical context in which the notes appear. A further contribution of this work lies in the proposal of an optimal feature selection procedure which combines filter and wrapper strategies. Experimental results show the accuracy and efficiency of the adopted approach in binary classification as well as in multi-class classification.

  18. A NEW WASTE CLASSIFYING MODEL: HOW WASTE CLASSIFICATION CAN BECOME MORE OBJECTIVE?

    Directory of Open Access Journals (Sweden)

    Burcea Stefan Gabriel

    2015-07-01

    Full Text Available The waste management specialist must be able to identify and analyze waste generation sources and to propose proper solutions to prevent waste generation and encourage waste minimisation. In certain situations, such as implementing an integrated waste management system and configuring waste collection methods and capacities, practitioners can face the challenge of classifying the generated waste. This tends to be the more demanding as the literature does not provide a coherent system of criteria for an objective waste classification process. Waste incineration will no doubt lead to a different waste classification than waste composting or mechanical and biological treatment. In this case, the main question is: what are the proper classification criteria which can be used to perform an objective waste classification? The article provides a short critical literature review of the existing waste classification criteria and suggests the conclusion that the literature does not offer a unitary waste classification system which is unanimously accepted and assumed by theorists and practitioners. There are various classification criteria and many interesting perspectives in the literature regarding waste classification, but the most common criteria based on which specialists classify waste into classes, categories and types are the generation source, physical and chemical features, aggregation state, origin or derivation, hazardous degree, etc. The traditional classification criteria divide waste into various categories, subcategories and types; such an approach is a conjectural one, because it is inevitable that the criteria used will differ significantly according to the context in which the waste classification is required; hence the need to standardize waste classification systems. For the first part of the article, the indirect observation research method has been used, by analyzing the literature and the various…

  19. Compensatory neurofuzzy model for discrete data classification in biomedical

    Science.gov (United States)

    Ceylan, Rahime

    2015-03-01

    Biomedical data are separated into two main categories: signals and discrete data, so studies in this area concern either biomedical signal classification or biomedical discrete data classification. There are artificial intelligence models for the classification of ECG, EMG or EEG signals. In the same way, many models exist in the literature for the classification of discrete data, taken as sample values, which can be the results of blood analysis or biopsy in the medical process. No single algorithm achieves a high accuracy rate on both signal and discrete data classification. In this study, a compensatory neurofuzzy network model is presented for the classification of discrete data in the biomedical pattern recognition area. The compensatory neurofuzzy network is a hybrid, binary classifier in which the parameters of the fuzzy systems are updated by the backpropagation algorithm. The realized classifier model is applied to two benchmark datasets (the Wisconsin Breast Cancer dataset and the Pima Indian Diabetes dataset). Experimental studies show that the compensatory neurofuzzy network model achieved a 96.11% accuracy rate in classification of the breast cancer dataset, and a 69.08% accuracy rate was obtained in experiments on the diabetes dataset with only 10 iterations.

  20. Robust Pedestrian Classification Based on Hierarchical Kernel Sparse Representation

    Directory of Open Access Journals (Sweden)

    Rui Sun

    2016-08-01

    Full Text Available Vision-based pedestrian detection has become an active topic in computer vision and autonomous vehicles. It aims at detecting pedestrians appearing ahead of the vehicle using a camera so that autonomous vehicles can assess the danger and take action. Due to varied illumination and appearance, complex backgrounds and occlusion, pedestrian detection in outdoor environments is a difficult problem. In this paper, we propose a novel hierarchical feature extraction and weighted kernel sparse representation model for pedestrian classification. Initially, hierarchical feature extraction based on a CENTRIST descriptor is used to capture discriminative structures, and a max pooling operation is used to enhance invariance to varying appearance. Then, a kernel sparse representation model is proposed to fully exploit the discriminative information embedded in the hierarchical local features, and a Gaussian weight function is used as the measure to effectively handle occlusion in pedestrian images. Extensive experiments are conducted on benchmark databases, including INRIA, Daimler, an artificially generated dataset and a real occluded dataset, demonstrating the more robust performance of the proposed method compared to state-of-the-art pedestrian classification methods.
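
    The CENTRIST descriptor is built on the census transform, which encodes each pixel as an 8-bit code comparing it with its 8 neighbours. A sketch of that core step (CENTRIST itself then histograms these codes over a spatial grid, which is omitted here):

```python
import numpy as np

def census_transform(img):
    """Census transform: each interior pixel becomes an 8-bit code, one bit
    per neighbour, set when the neighbour is >= the centre pixel."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=int)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    centre = img[1:h - 1, 1:w - 1]
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]  # shifted neighbour view
        out |= (nb >= centre).astype(int) << bit
    return out
```

Because the code depends only on ordinal comparisons, it is unchanged by monotonic illumination changes, which is why it suits varied outdoor lighting.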

  1. Prediction of Breast Cancer using Rule Based Classification

    Directory of Open Access Journals (Sweden)

    Nagendra Kumar SINGH

    2015-12-01

    Full Text Available The current work proposes a model for prediction of breast cancer using the classification approach in data mining. The proposed model is based on various parameters, including symptoms of breast cancer, gene mutation and other risk factors causing breast cancer. Mutations are predicted in breast-cancer-causing genes with the help of alignment of normal and abnormal gene sequences; the class label of breast cancer (risky or safe) is then predicted on the basis of IF-THEN rules, using a Genetic Algorithm (GA). In this work, the GA uses a variable gene encoding mechanism for chromosome encoding and uniform population generation, and selects two chromosomes by the Roulette-Wheel selection technique for two-point crossover, which gives better solutions. The performance of the model is evaluated using the F-score measure, the Matthews Correlation Coefficient (MCC) and the Receiver Operating Characteristic (ROC) curve, obtained by plotting sensitivity against 1 − specificity.
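
The two GA operators named in the abstract, Roulette-Wheel selection and two-point crossover, can be sketched as follows; the chromosome strings and fitness values are illustrative placeholders, not the paper's gene encoding:

```python
import random

def roulette_select(population, fitnesses, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    total = sum(fitnesses)
    r = rng.uniform(0, total)
    acc = 0.0
    for chrom, fit in zip(population, fitnesses):
        acc += fit
        if acc >= r:
            return chrom
    return population[-1]

def two_point_crossover(a, b, rng):
    """Swap the middle segment between two equal-length chromosomes."""
    p1, p2 = sorted(rng.sample(range(1, len(a)), 2))
    return a[:p1] + b[p1:p2] + a[p2:], b[:p1] + a[p1:p2] + b[p2:]

rng = random.Random(42)
pop = ["0000", "1111", "1010"]
fit = [1.0, 5.0, 2.0]
parent1 = roulette_select(pop, fit, rng)
parent2 = roulette_select(pop, fit, rng)
child1, child2 = two_point_crossover("00000000", "11111111", rng)
```

Fitter chromosomes occupy a larger slice of the "wheel" and are drawn more often, while the two cut points exchange a contiguous gene segment between the parents.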

  2. Automated glioblastoma segmentation based on a multiparametric structured unsupervised classification.

    Science.gov (United States)

    Juan-Albarracín, Javier; Fuster-Garcia, Elies; Manjón, José V; Robles, Montserrat; Aparici, F; Martí-Bonmatí, L; García-Gómez, Juan M

    2015-01-01

    Automatic brain tumour segmentation has become a key component for the future of brain tumour treatment. Currently, most brain tumour segmentation approaches arise from the supervised learning standpoint, which requires a labelled training dataset from which to infer the models of the classes. The performance of these models is directly determined by the size and quality of the training corpus, whose retrieval becomes a tedious and time-consuming task. On the other hand, unsupervised approaches avoid these limitations but often do not reach results comparable to those of the supervised methods. To this end, we propose an automated unsupervised method for brain tumour segmentation based on anatomical Magnetic Resonance (MR) images. Four unsupervised classification algorithms, grouped by their structured or non-structured condition, were evaluated within our pipeline. As non-structured algorithms, we evaluated K-means, Fuzzy K-means and the Gaussian Mixture Model (GMM), whereas as structured classification algorithms we evaluated the Gaussian Hidden Markov Random Field (GHMRF). An automated postprocess based on a statistical approach supported by tissue probability maps is proposed to automatically identify the tumour classes after the segmentations. We evaluated our brain tumour segmentation method with the public BRAin Tumor Segmentation (BRATS) 2013 Test and Leaderboard datasets. Our approach based on the GMM model improves the results obtained by most of the supervised methods evaluated with the Leaderboard set and reaches the second position in the ranking. Our variant based on the GHMRF achieves the first position in the Test ranking of the unsupervised approaches and the seventh position in the general Test ranking, which confirms the method as a viable alternative for brain tumour segmentation.
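
As a minimal sketch of the GMM component of such a pipeline, a vanilla EM fit of a two-component mixture on 1-D data (far simpler than the paper's multiparametric MR setting) looks like this:

```python
import numpy as np

def em_gmm_1d(x, iters=50):
    """Fit a two-component 1-D Gaussian mixture with vanilla EM."""
    mu = np.array([x.min(), x.max()])            # spread-out initialisation
    var = np.array([x.var(), x.var()]) + 1e-6
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        d = x[:, None] - mu[None, :]
        dens = pi * np.exp(-0.5 * d ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        d = x[:, None] - mu[None, :]
        var = (resp * d ** 2).sum(axis=0) / nk + 1e-6
    return pi, mu, var

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.5, 200), rng.normal(5.0, 0.5, 200)])
pi, mu, var = em_gmm_1d(x)
```

On well-separated intensity clusters the recovered means land close to the true cluster centres; in a segmentation setting each voxel would then be assigned to its most responsible component.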

  3. Automated glioblastoma segmentation based on a multiparametric structured unsupervised classification.

    Directory of Open Access Journals (Sweden)

    Javier Juan-Albarracín

    Full Text Available Automatic brain tumour segmentation has become a key component for the future of brain tumour treatment. Currently, most brain tumour segmentation approaches arise from the supervised learning standpoint, which requires a labelled training dataset from which to infer the models of the classes. The performance of these models is directly determined by the size and quality of the training corpus, whose retrieval becomes a tedious and time-consuming task. On the other hand, unsupervised approaches avoid these limitations but often do not reach results comparable to those of the supervised methods. To this end, we propose an automated unsupervised method for brain tumour segmentation based on anatomical Magnetic Resonance (MR) images. Four unsupervised classification algorithms, grouped by their structured or non-structured condition, were evaluated within our pipeline. As non-structured algorithms, we evaluated K-means, Fuzzy K-means and the Gaussian Mixture Model (GMM), whereas as structured classification algorithms we evaluated the Gaussian Hidden Markov Random Field (GHMRF). An automated postprocess based on a statistical approach supported by tissue probability maps is proposed to automatically identify the tumour classes after the segmentations. We evaluated our brain tumour segmentation method with the public BRAin Tumor Segmentation (BRATS) 2013 Test and Leaderboard datasets. Our approach based on the GMM model improves the results obtained by most of the supervised methods evaluated with the Leaderboard set and reaches the second position in the ranking. Our variant based on the GHMRF achieves the first position in the Test ranking of the unsupervised approaches and the seventh position in the general Test ranking, which confirms the method as a viable alternative for brain tumour segmentation.

  4. Dihedral-Based Segment Identification and Classification of Biopolymers II: Polynucleotides

    Science.gov (United States)

    2013-01-01

    In an accompanying paper (Nagy, G.; Oostenbrink, C. Dihedral-based segment identification and classification of biopolymers I: Proteins. J. Chem. Inf. Model. 2013, DOI: 10.1021/ci400541d), we introduce a new algorithm for structure classification of biopolymeric structures based on main-chain dihedral angles. The DISICL algorithm (short for DIhedral-based Segment Identification and CLassification) classifies segments of structures containing two central residues. Here, we introduce the DISICL library for polynucleotides, which is based on the dihedral angles ε, ζ, and χ for the two central residues of a three-nucleotide segment of a single strand. Seventeen distinct structural classes are defined for nucleotide structures, some of which—to our knowledge—were not described previously in other structure classification algorithms. In particular, DISICL also classifies noncanonical single-stranded structural elements. DISICL is applied to databases of DNA and RNA structures containing 80,000 and 180,000 segments, respectively. The classifications according to DISICL are compared to those of another popular classification scheme in terms of the amount of classified nucleotides, average occurrence and length of structural elements, and pairwise matches of the classifications. While the detailed classification of DISICL adds sensitivity to a structure analysis, it can be readily reduced to eight simplified classes providing a more general overview of the secondary structure in polynucleotides. PMID:24364355
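
The bin-based classification idea behind DISICL can be illustrated with a toy lookup of dihedral angles against class regions. The angular ranges and class names below are invented for illustration and are not the published DISICL library definitions:

```python
# Hypothetical angular regions (degrees); NOT the published DISICL library.
REGIONS = {
    "classA": {"eps": (150, 210), "zeta": (240, 300)},
    "classB": {"eps": (30, 90),  "zeta": (150, 210)},
}

def in_range(angle, lo_hi):
    """Check an angle (wrapped to [0, 360)) against a half-open range."""
    lo, hi = lo_hi
    return lo <= angle % 360 < hi

def classify_segment(eps, zeta):
    """Assign a segment class from its (eps, zeta) dihedral pair."""
    for name, region in REGIONS.items():
        if in_range(eps, region["eps"]) and in_range(zeta, region["zeta"]):
            return name
    return "unclassified"

label = classify_segment(180, 260)
```

A real library would carry one region table per structural class (seventeen in the polynucleotide case) and also use the chi dihedral, but the lookup logic is the same.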

  5. Dihedral-based segment identification and classification of biopolymers II: polynucleotides.

    Science.gov (United States)

    Nagy, Gabor; Oostenbrink, Chris

    2014-01-27

    In an accompanying paper (Nagy, G.; Oostenbrink, C. Dihedral-based segment identification and classification of biopolymers I: Proteins. J. Chem. Inf. Model. 2013, DOI: 10.1021/ci400541d), we introduce a new algorithm for structure classification of biopolymeric structures based on main-chain dihedral angles. The DISICL algorithm (short for DIhedral-based Segment Identification and CLassification) classifies segments of structures containing two central residues. Here, we introduce the DISICL library for polynucleotides, which is based on the dihedral angles ε, ζ, and χ for the two central residues of a three-nucleotide segment of a single strand. Seventeen distinct structural classes are defined for nucleotide structures, some of which--to our knowledge--were not described previously in other structure classification algorithms. In particular, DISICL also classifies noncanonical single-stranded structural elements. DISICL is applied to databases of DNA and RNA structures containing 80,000 and 180,000 segments, respectively. The classifications according to DISICL are compared to those of another popular classification scheme in terms of the amount of classified nucleotides, average occurrence and length of structural elements, and pairwise matches of the classifications. While the detailed classification of DISICL adds sensitivity to a structure analysis, it can be readily reduced to eight simplified classes providing a more general overview of the secondary structure in polynucleotides.

  6. Classification

    Science.gov (United States)

    Clary, Renee; Wandersee, James

    2013-01-01

    In this article, Renee Clary and James Wandersee describe the beginnings of "Classification," which lies at the very heart of science and depends upon pattern recognition. Clary and Wandersee approach patterns by first telling the story of the "Linnaean classification system," introduced by Carl Linnaeus (1707-1778), who is…

  7. Models for warehouse management: classification and examples

    NARCIS (Netherlands)

    Berg, van den J.P.; Zijm, W.H.M.

    1999-01-01

    In this paper we discuss warehousing systems and present a classification of warehouse management problems. We start with a typology and a brief description of several types of warehousing systems. Next, we present a hierarchy of decision problems encountered in setting up warehousing systems, inclu

  8. Maximum-margin based representation learning from multiple atlases for Alzheimer's disease classification.

    Science.gov (United States)

    Min, Rui; Cheng, Jian; Price, True; Wu, Guorong; Shen, Dinggang

    2014-01-01

    In order to establish the correspondences between different brains for comparison, spatial normalization based morphometric measurements have been widely used in the analysis of Alzheimer's disease (AD). In the literature, different subjects are often compared in one atlas space, which may be insufficient in revealing complex brain changes. In this paper, instead of deploying one atlas for feature extraction and classification, we propose a maximum-margin based representation learning (MMRL) method to learn the optimal representation from multiple atlases. Unlike traditional methods that perform the representation learning separately from the classification, we propose to learn the new representation jointly with the classification model, which is more powerful in discriminating AD patients from normal controls (NC). We evaluated the proposed method on the ADNI database, and achieved 90.69% for AD/NC classification and 73.69% for p-MCI/s-MCI classification.

  9. Sugarcane Land Classification with Satellite Imagery using Logistic Regression Model

    Science.gov (United States)

    Henry, F.; Herwindiati, D. E.; Mulyono, S.; Hendryli, J.

    2017-03-01

    This paper discusses the classification of sugarcane plantation area from Landsat-8 satellite imagery. The classification process uses the binary logistic regression method with time series data of the normalized difference vegetation index as input. The process is divided into two steps: training and classification. The purpose of the training step is to identify the best parameters of the regression model using the gradient descent algorithm. The best fit of the model can then be used to classify sugarcane and non-sugarcane areas. The experiment shows high accuracy and successfully maps the sugarcane plantation area, with a best Cohen’s Kappa value of 0.7833 (strong agreement) and 89.167% accuracy.
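
A minimal sketch of the training step, binary logistic regression fitted by batch gradient descent, can be written in NumPy; the two-feature toy samples stand in for the NDVI time-series inputs:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, iters=2000):
    """Binary logistic regression trained with batch gradient descent."""
    Xb = np.hstack([X, np.ones((len(X), 1))])      # append intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))          # sigmoid probabilities
        w -= lr * Xb.T @ (p - y) / len(y)          # negative log-likelihood gradient
    return w

def predict(X, w):
    """Threshold the fitted probabilities at 0.5."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (1.0 / (1.0 + np.exp(-Xb @ w)) >= 0.5).astype(int)

# toy NDVI-like feature vectors (illustrative values, two time steps per pixel)
X = np.array([[0.2, 0.3], [0.25, 0.2], [0.7, 0.8], [0.8, 0.75]])
y = np.array([0, 0, 1, 1])   # 0 = non-sugarcane, 1 = sugarcane
w = fit_logistic(X, y)
acc = (predict(X, w) == y).mean()
```

The learned weight vector then classifies unseen pixels the same way; in the paper this per-pixel decision is what produces the plantation map.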

  10. Models of parallel computation :a survey and classification

    Institute of Scientific and Technical Information of China (English)

    ZHANG Yunquan; CHEN Guoliang; SUN Guangzhong; MIAO Qiankun

    2007-01-01

    In this paper, the state-of-the-art parallel computational model research is reviewed. We introduce various models that were developed during the past decades. According to their targeted architecture features, especially memory organization, we classify these parallel computational models into three generations, and discuss the models and their characteristics on the basis of this three-generation classification. We believe that with the ever-increasing speed gap between the CPU and memory systems, incorporating a non-uniform memory hierarchy into computational models will become unavoidable. With the emergence of multi-core CPUs, the parallelism hierarchy of current computing platforms becomes more and more complicated, and describing this complicated parallelism hierarchy in future computational models becomes more and more important. A semi-automatic toolkit that can extract model parameters and their values on real computers could reduce model analysis complexity, thus allowing more complicated models with more parameters to be adopted. Hierarchical memory and hierarchical parallelism will be two very important features that should be considered in future model design and research.

  11. Computer vision-based limestone rock-type classification using probabilistic neural network

    Institute of Scientific and Technical Information of China (English)

    Ashok Kumar Patel; Snehamoy Chatterjee

    2016-01-01

    Proper quality planning of limestone raw materials is essential to maintaining the desired feed in a cement plant, and rock-type identification is an integral part of quality planning for a limestone mine. In this paper, a computer vision-based rock-type classification algorithm is proposed for fast and reliable identification without human intervention. A laboratory-scale vision-based model was developed using a probabilistic neural network (PNN) with color histogram features as input. The color image histogram-based features, comprising the weighted mean, skewness and kurtosis, are extracted for each of the three color channels red, green, and blue, giving a total of nine features as input to the PNN classification model. The smoothing parameter of the PNN model is selected judiciously to develop an optimal, or close to optimal, classification model. The developed PNN is validated using the test data set, and the results reveal that the proposed vision-based model performs satisfactorily in classifying limestone rock types, with an overall misclassification error below 6%. When compared with three other classification algorithms, the proposed method performs substantially better than all three.
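
A minimal PNN can be sketched as a Gaussian Parzen-window density estimate per class; the toy two-dimensional features below stand in for the nine color-histogram features:

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma=0.5):
    """Probabilistic neural network: average Gaussian kernel score per class."""
    classes = np.unique(y_train)
    preds = []
    for x in X_test:
        scores = []
        for c in classes:
            Xc = X_train[y_train == c]
            d2 = ((Xc - x) ** 2).sum(axis=1)               # squared distances
            scores.append(np.exp(-d2 / (2 * sigma ** 2)).mean())
        preds.append(classes[int(np.argmax(scores))])      # densest class wins
    return np.array(preds)

X_train = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
pred = pnn_predict(X_train, y_train, np.array([[0.05, 0.0], [1.0, 0.95]]))
```

Each class contributes a kernel density estimate at the query point and the class with the highest estimated density is returned; sigma is the smoothing parameter the abstract refers to.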

  12. Computer vision-based limestone rock-type classification using probabilistic neural network

    Directory of Open Access Journals (Sweden)

    Ashok Kumar Patel

    2016-01-01

    Full Text Available Proper quality planning of limestone raw materials is essential to maintaining the desired feed in a cement plant, and rock-type identification is an integral part of quality planning for a limestone mine. In this paper, a computer vision-based rock-type classification algorithm is proposed for fast and reliable identification without human intervention. A laboratory-scale vision-based model was developed using a probabilistic neural network (PNN) with color histogram features as input. The color image histogram-based features, comprising the weighted mean, skewness and kurtosis, are extracted for each of the three color channels red, green, and blue, giving a total of nine features as input to the PNN classification model. The smoothing parameter of the PNN model is selected judiciously to develop an optimal, or close to optimal, classification model. The developed PNN is validated using the test data set, and the results reveal that the proposed vision-based model performs satisfactorily in classifying limestone rock types, with an overall misclassification error below 6%. When compared with three other classification algorithms, the proposed method performs substantially better than all three.

  13. Power Disturbances Classification Using S-Transform Based GA-PNN

    Science.gov (United States)

    Manimala, K.; Selvi, K.

    2015-09-01

    The significance of detecting and classifying power quality events that disturb the voltage and/or current waveforms in electrical power distribution networks is well known. Nevertheless, despite a large number of research reports in this area, the selection of proper parameters for specific classifiers has so far not been explored. Parameter selection is very important for successful modelling of the input-output relationship in a function approximation model. In this study, a probabilistic neural network (PNN) has been used as a function approximation tool for power disturbance classification, and a genetic algorithm (GA) is utilised to optimise the smoothing parameter of the PNN. The important features extracted from the raw power disturbance signal using the S-Transform are given to the PNN for effective classification. The choice of smoothing parameter for the PNN classifier significantly impacts the classification accuracy; hence, GA-based parameter optimisation is performed to ensure good classification accuracy by selecting a suitable parameter for the PNN classifier. Testing results show that the proposed S-Transform based GA-PNN model has better classification ability than classifiers based on the conventional grid search method for parameter selection. Noisy and practical signals are included in the classification process to show the effectiveness of the proposed method in comparison with existing methods.
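
The PNN's dependence on its smoothing parameter can be illustrated by scoring candidate values with leave-one-out accuracy. The paper searches this space with a GA; the sketch below uses a plain candidate list, and the one-dimensional data are purely illustrative:

```python
import numpy as np

def pnn_score(X, y, x, sigma):
    """Return the class whose Parzen density at x is highest."""
    best, best_c = -1.0, None
    for c in np.unique(y):
        d2 = ((X[y == c] - x) ** 2).sum(axis=1)
        s = np.exp(-d2 / (2 * sigma ** 2)).mean()
        if s > best:
            best, best_c = s, c
    return best_c

def loo_accuracy(X, y, sigma):
    """Leave-one-out accuracy of the PNN for a given smoothing parameter."""
    hits = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        hits += pnn_score(X[mask], y[mask], X[i], sigma) == y[i]
    return hits / len(X)

X = np.array([[0.0], [0.2], [0.1], [2.0], [2.2], [2.1]])
y = np.array([0, 0, 0, 1, 1, 1])
candidates = [0.01, 0.1, 0.5, 1.0, 5.0]
best_sigma = max(candidates, key=lambda s: loo_accuracy(X, y, s))
```

A GA would explore the same objective (validation accuracy as fitness) but over a continuous, evolving population instead of a fixed grid.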

  14. Hydrological landscape classification: investigating the performance of HAND based landscape classifications in a central European meso-scale catchment

    Directory of Open Access Journals (Sweden)

    S. Gharari

    2011-11-01

    Full Text Available This paper presents a detailed performance and sensitivity analysis of a recently developed hydrological landscape classification method based on dominant runoff mechanisms. Three landscape classes are distinguished: wetland, hillslope and plateau, corresponding to three dominant hydrological regimes: saturation excess overland flow, storage excess sub-surface flow, and deep percolation. Topography, geology and land use hold the key to identifying these landscapes. The height above the nearest drainage (HAND) and the surface slope, both of which can be easily obtained from a digital elevation model, appear to be the dominant topographical controls for hydrological classification. In this paper several indicators for classification are tested, as well as their sensitivity to scale and to the resolution of observed points (sample size). The best results are obtained by the simple use of HAND and slope. The results compare well with the topographical wetness index. The HAND-based landscape classification appears to be an efficient method to ''read the landscape'', on the basis of which conceptual models can be developed.
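
A HAND-and-slope rule of this kind reduces to simple thresholding of two raster grids. The thresholds below are invented for illustration; real values are calibrated per catchment:

```python
import numpy as np

# Illustrative thresholds (metres above drainage, degrees of slope); assumed values.
HAND_WETLAND = 5.0
SLOPE_HILLSLOPE = 4.0

def classify_landscape(hand, slope):
    """Map HAND and surface-slope grids to wetland / hillslope / plateau."""
    classes = np.full(hand.shape, "plateau", dtype=object)   # default: flat & high
    classes[slope >= SLOPE_HILLSLOPE] = "hillslope"          # steep cells
    classes[hand < HAND_WETLAND] = "wetland"                 # low-lying cells win
    return classes

hand = np.array([[2.0, 30.0], [12.0, 40.0]])
slope = np.array([[1.0, 10.0], [8.0, 2.0]])
labels = classify_landscape(hand, slope)
```

Proximity to the drainage network (low HAND) overrides slope here, reflecting that saturation excess dominates near streams regardless of local steepness.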

  15. An Approach to Unsupervised Character Classification Based on Similarity Measure in Fuzzy Model

    Institute of Scientific and Technical Information of China (English)

    卢达; 钱忆平; 谢铭培; 浦炜

    2002-01-01

    This paper presents a fuzzy logic approach to efficiently perform unsupervised character classification, improving the robustness, correctness and speed of a character recognition system. The characters are first split into eight typographical categories. The classification scheme then uses pattern matching to classify the characters in each category into a set of fuzzy prototypes based on a nonlinear weighted similarity function. Fuzzy unsupervised character classification, which is natural in the representation of prototypes for character matching, is developed, and a weighted fuzzy similarity measure is explored. The characteristics of the fuzzy model are discussed and used to speed up the classification process. After classification, character recognition, which need only be applied to a much smaller set of fuzzy prototypes, becomes easier and less time-consuming.

  16. Speech Segregation based on Binary Classification

    Science.gov (United States)

    2016-07-15

    …the adoption of the ideal ratio mask (IRM). A subsequent listening evaluation shows increased intelligibility in noise for human listeners. Subject terms: binary classification, time-frequency masking, supervised speech segregation, speech intelligibility, room reverberation.

  17. Radar Image Texture Classification based on Gabor Filter Bank

    Directory of Open Access Journals (Sweden)

    Mbainaibeye Jérôme

    2014-01-01

    Full Text Available The aim of this paper is to design and develop a filter bank for the detection and classification of textures in radar images with 4.6 m resolution obtained by airborne Synthetic Aperture Radar. The textures of this kind of image are highly correlated and contain forms with random disposition. The design and development of the filter bank are based on the Gabor filter. We have elaborated a set of filters, applied to each textural feature, allowing its identification and enhancement in comparison with the other textures. The filter bank we have elaborated is a combination of different texture filters. After processing, the selected filter bank is the one that identifies all the textures of an image with a significant identification rate. The developed filter bank is applied to a radar image, and the results obtained are compared with those obtained using filter banks derived from generalized Gaussian models (GGM). We show that the Gabor filter bank developed in this work gives a classification rate greater than the results obtained with the generalized Gaussian model. The main contribution of this work is the generation of filter banks able to give an optimal filter bank for a given texture, and in particular for radar image textures.

  18. Neighborhood Hypergraph Based Classification Algorithm for Incomplete Information System

    Directory of Open Access Journals (Sweden)

    Feng Hu

    2015-01-01

    Full Text Available The problem of classification in incomplete information systems is a hot issue in intelligent information processing. The hypergraph is a new intelligent method for machine learning. However, it is hard to process an incomplete information system with the traditional hypergraph, for two reasons: (1) the hyperedges are generated randomly in the traditional hypergraph model; (2) the existing methods are unsuitable for incomplete information systems because of their missing values. In this paper, we propose a novel classification algorithm for incomplete information systems based on the hypergraph model and rough set theory. First, we initialize the hypergraph. Second, we classify the training set by the neighborhood hypergraph. Third, under the guidance of rough sets, we replace the poor hyperedges. After that, we obtain a good classifier. The proposed approach is tested on 15 data sets from the UCI machine learning repository and compared with existing methods such as C4.5, SVM, Naive Bayes, and KNN. The experimental results show that the proposed algorithm has better performance in terms of Precision, Recall, AUC, and F-measure.
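
The evaluation measures named above are straightforward to compute from paired label lists; AUC additionally needs ranked scores rather than hard labels and is omitted from this sketch:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F-measure for one positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Here two of three true positives are recovered and one negative is mislabelled, so precision, recall and F-measure all come out to 2/3.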

  19. Vertebrae classification models - Validating classification models that use morphometrics to identify ancient salmonid (Oncorhynchus spp.) vertebrae to species

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — Using morphometric characteristics of modern salmonid (Oncorhynchus spp.) vertebrae, we have developed classification models to identify salmonid vertebrae to the...

  20. Highly comparative, feature-based time-series classification

    CERN Document Server

    Fulcher, Ben D

    2014-01-01

    A highly comparative, feature-based approach to time series classification is introduced that uses an extensive database of algorithms to extract thousands of interpretable features from time series. These features are derived from across the scientific time-series analysis literature, and include summaries of time series in terms of their correlation structure, distribution, entropy, stationarity, scaling properties, and fits to a range of time-series models. After computing thousands of features for each time series in a training set, those that are most informative of the class structure are selected using greedy forward feature selection with a linear classifier. The resulting feature-based classifiers automatically learn the differences between classes using a reduced number of time-series properties, and circumvent the need to calculate distances between time series. Representing time series in this way results in orders of magnitude of dimensionality reduction, allowing the method to perform well on ve...
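
The feature-based idea can be sketched with a tiny feature vector and a single-feature separation score; the full method extracts thousands of features and uses greedy forward selection with a linear classifier, so everything below is a deliberately small stand-in:

```python
import numpy as np

def features(ts):
    """A tiny interpretable feature vector: mean, spread, lag-1 autocorrelation."""
    ts = np.asarray(ts, dtype=float)
    ac1 = np.corrcoef(ts[:-1], ts[1:])[0, 1]
    return np.array([ts.mean(), ts.std(), ac1])

def best_feature(F, y):
    """Pick the single feature whose class means are furthest apart,
    relative to that feature's overall spread (one greedy step)."""
    sep = [abs(F[y == 0, j].mean() - F[y == 1, j].mean()) / (F[:, j].std() + 1e-12)
           for j in range(F.shape[1])]
    return int(np.argmax(sep))

rng = np.random.default_rng(1)
noisy = [rng.normal(0, 1, 100) for _ in range(10)]              # class 0: white noise
walks = [np.cumsum(rng.normal(0, 1, 100)) for _ in range(10)]   # class 1: random walks
F = np.array([features(t) for t in noisy + walks])
y = np.array([0] * 10 + [1] * 10)
j = best_feature(F, y)
```

Once each series is reduced to a fixed-length feature vector, any standard classifier can be trained on the vectors, with no series-to-series distance computation required.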

  1. Intelligent Hybrid Cluster Based Classification Algorithm for Social Network Analysis

    Directory of Open Access Journals (Sweden)

    S. Muthurajkumar

    2014-05-01

    Full Text Available In this paper, we propose a hybrid clustering-based classification algorithm, based on a mean approach, to effectively classify and mine the ordered sequences (paths) from weblog data in order to perform social network analysis. In the system proposed in this work for social pattern analysis, the sequences of human activities are typically analyzed through switching behaviors, which are likely to produce overlapping clusters. A robust modified boosting algorithm is proposed within the hybrid clustering-based classification to cluster the data. This work is useful in providing a connection between the aggregated features from the network data and traditional indices used in social network analysis. Experimental results show that the proposed algorithm improves the decision results from data clustering when combined with the proposed classification algorithm, and it provides better classification accuracy when tested with a weblog dataset. In addition, the algorithm improves predictive performance, especially for multiclass datasets, which increases the accuracy.
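
The cluster-then-classify idea can be sketched with plain k-means followed by a majority label per cluster; this is a generic stand-in for the proposed modified-boosting hybrid, on toy two-dimensional data:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means with random-point initialisation."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(k)])
    return centers, labels

def cluster_then_label(X, y, k=2):
    """Attach the majority training label to each learned cluster."""
    centers, labels = kmeans(X, k)
    majority = {c: np.bincount(y[labels == c]).argmax() for c in range(k)}
    return centers, majority

X = np.array([[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
              [5.0, 5.1], [5.1, 5.0], [4.9, 5.2]])
y = np.array([0, 0, 0, 1, 1, 1])
centers, majority = cluster_then_label(X, y)
# classify a new point by its nearest cluster's majority label
pred = majority[int(np.argmin(((centers - np.array([5.0, 5.0])) ** 2).sum(-1)))]
```

The clustering step simplifies the decision surface: a new sample is routed to its nearest cluster, and the cluster's dominant training label becomes the prediction.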

  2. A Soft Intelligent Risk Evaluation Model for Credit Scoring Classification

    Directory of Open Access Journals (Sweden)

    Mehdi Khashei

    2015-09-01

    Full Text Available Risk management is one of the most important branches of business and finance. Classification models are the most popular and widely used analytical group of data mining approaches that can greatly help financial decision makers and managers to tackle credit risk problems. However, the literature clearly indicates that, despite proposing numerous classification models, credit scoring is often a difficult task. On the other hand, there is no universal credit-scoring model in the literature that can be accurately and explanatorily used in all circumstances. Therefore, the research for improving the efficiency of credit-scoring models has never stopped. In this paper, a hybrid soft intelligent classification model is proposed for credit-scoring problems. In the proposed model, the unique advantages of the soft computing techniques are used in order to modify the performance of the traditional artificial neural networks in credit scoring. Empirical results of Australian credit card data classifications indicate that the proposed hybrid model outperforms its components, and also other classification models presented for credit scoring. Therefore, the proposed model can be considered as an appropriate alternative tool for binary decision making in business and finance, especially in high uncertainty conditions.

  3. EEG Signal Classification With Super-Dirichlet Mixture Model

    DEFF Research Database (Denmark)

    Ma, Zhanyu; Tan, Zheng-Hua; Prasad, Swati

    2012-01-01

    Classification of the Electroencephalogram (EEG) signal is a challenging task in brain-computer interface systems. The marginalized discrete wavelet transform (mDWT) coefficients extracted from the EEG signals have been frequently used in research since they reveal features related to the… The mDWT coefficients of a single channel are modelled by the Dirichlet distribution, and the distribution of the mDWT coefficients from more than one channel is described by a super-Dirichlet mixture model (SDMM). The Fisher ratio and the generalization error estimation are applied to select relevant channels, respectively. Compared to the state-of-the-art support vector machine (SVM) based classifier, the SDMM based classifier performs more stably and shows a promising improvement, with both channel selection strategies.

  4. Hybrid Support Vector Machines-Based Multi-fault Classification

    Institute of Scientific and Technical Information of China (English)

    GAO Guo-hua; ZHANG Yong-zhong; ZHU Yu; DUAN Guang-huang

    2007-01-01

    Support Vector Machines (SVM) is a new general machine-learning tool based on the structural risk minimization principle. This characteristic is very significant for fault diagnostics when the number of fault samples is limited. Considering that SVM theory is originally designed for two-class classification, a hybrid SVM scheme is proposed for multi-fault classification of rotating machinery in our paper. Two SVM strategies, 1-v-1 (one versus one) and 1-v-r (one versus rest), are adopted at different classification levels. At the parallel classification level, using the 1-v-1 strategy, the fault features extracted by various signal analysis methods are fed into multiple parallel SVMs and local classification results are obtained. At the serial classification level, these local results are fused by one serial SVM based on the 1-v-r strategy. The hybrid SVM scheme introduced in our paper not only generalizes the performance of single binary SVMs but also improves the precision and reliability of the fault classification results. The testing results show the availability and suitability of this new method.
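
The 1-v-1 decomposition can be sketched with a simple stand-in binary learner (a nearest-centroid rule here, where a real system would use an SVM); 1-v-r would instead train one learner per class against all remaining classes:

```python
import numpy as np

class CentroidBinary:
    """Stand-in binary learner; a real implementation would be an SVM."""
    def fit(self, X, y):
        self.c0, self.c1 = X[y == 0].mean(0), X[y == 1].mean(0)
        return self
    def predict(self, X):
        d0 = ((X - self.c0) ** 2).sum(-1)
        d1 = ((X - self.c1) ** 2).sum(-1)
        return (d1 < d0).astype(int)   # 1 if closer to class-1 centroid

def one_vs_one_predict(X_train, y_train, X_test):
    """Train one binary learner per class pair, then let the pairs vote."""
    classes = np.unique(y_train)
    votes = np.zeros((len(X_test), len(classes)))
    for i, a in enumerate(classes):
        for j, b in enumerate(classes):
            if i >= j:
                continue
            mask = np.isin(y_train, [a, b])
            clf = CentroidBinary().fit(X_train[mask],
                                       (y_train[mask] == b).astype(int))
            p = clf.predict(X_test)
            votes[p == 0, i] += 1
            votes[p == 1, j] += 1
    return classes[votes.argmax(axis=1)]

X = np.array([[0, 0], [0.2, 0], [5, 0], [5.2, 0], [0, 5], [0.2, 5]], dtype=float)
y = np.array([0, 0, 1, 1, 2, 2])
pred = one_vs_one_predict(X, y, np.array([[0.1, 0.1], [5.1, 0.1], [0.1, 5.1]]))
```

With k classes, 1-v-1 trains k(k-1)/2 pairwise learners and picks the class with the most votes, which is exactly the wiring used at the parallel level of the hybrid scheme.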

  5. Three-Class EEG-Based Motor Imagery Classification Using Phase-Space Reconstruction Technique

    Science.gov (United States)

    Djemal, Ridha; Bazyed, Ayad G.; Belwafi, Kais; Gannouni, Sofien; Kaaniche, Walid

    2016-01-01

    Over the last few decades, brain signals have been significantly exploited for brain-computer interface (BCI) applications. In this paper, we study the extraction of features using event-related desynchronization/synchronization techniques to improve the classification accuracy for three-class motor imagery (MI) BCI. The classification approach is based on combining the features of the phase and amplitude of the brain signals using the fast Fourier transform (FFT) and autoregressive (AR) modeling of the reconstructed phase space, as well as on modification of the BCI parameters (trial length, trial frequency band, classification method). Utilizing sequential forward floating selection (SFFS) and a multi-class linear discriminant analysis (LDA), our approach achieved classification accuracies of 86.06% and 93% on two BCI competition datasets, superior to results from previous studies. PMID:27563927
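
The FFT part of the feature extraction can be sketched as band-limited amplitude and phase features; the frequency bands and toy sinusoids below are assumptions, and the AR-modelling step is omitted:

```python
import numpy as np

def fft_features(sig, fs, bands=((8, 12), (18, 26))):
    """Mean amplitude and mean phase per frequency band, via the real FFT."""
    spec = np.fft.rfft(sig)
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fs)
    feats = []
    for lo, hi in bands:
        m = (freqs >= lo) & (freqs < hi)
        feats.append(np.abs(spec[m]).mean())     # amplitude feature
        feats.append(np.angle(spec[m]).mean())   # phase feature
    return np.array(feats)

fs = 128
t = np.arange(256) / fs
alpha = np.sin(2 * np.pi * 10 * t)    # toy 10 Hz (mu-band) oscillation
beta = np.sin(2 * np.pi * 22 * t)     # toy 22 Hz (beta-band) oscillation
fa, fb = fft_features(alpha, fs), fft_features(beta, fs)
```

Each signal produces a small feature vector whose band-power entries discriminate which rhythm is active, which is the raw material an LDA classifier would then separate.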

  6. Words semantic orientation classification based on HowNet

    Institute of Scientific and Technical Information of China (English)

    LI Dun; MA Yong-tao; GUO Jian-li

    2009-01-01

    Based on text orientation classification, a new measurement approach to the semantic orientation of words was proposed. According to the integrated and detailed definitions of words in HowNet, seed sets including words with intense orientations were built up. The orientation similarity between the seed words and a given word was then calculated using the sentiment weight priority to recognize the semantic orientation of common words. Finally, the word's semantic orientation and its context were combined to recognize the given word's orientation. The experiments show that the measurement approach achieves better results for common words' orientation classification and contributes particularly to text orientation classification at large granularities.
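
The seed-set similarity idea can be illustrated with a toy sketch; here Jaccard overlap of invented "sememe" sets stands in for HowNet's similarity measure, so all data and names are illustrative:

```python
def jaccard(a, b):
    """Set-overlap similarity, standing in for HowNet word similarity."""
    return len(a & b) / len(a | b) if a | b else 0.0

def orientation(word_sememes, pos_seeds, neg_seeds):
    """Average similarity to positive seeds minus average similarity
    to negative seeds; sign gives the semantic orientation."""
    pos = sum(jaccard(word_sememes, s) for s in pos_seeds) / len(pos_seeds)
    neg = sum(jaccard(word_sememes, s) for s in neg_seeds) / len(neg_seeds)
    return pos - neg

# Toy sememe annotations (hypothetical, standing in for HowNet entries):
POS = [{"good", "quality"}, {"praise", "emotion"}]
NEG = [{"bad", "quality"}, {"blame", "emotion"}]

word = {"good", "emotion"}
score = orientation(word, POS, NEG)
print("positive" if score > 0 else "negative")  # -> positive
```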

  7. Radar Target Classification using Recursive Knowledge-Based Methods

    DEFF Research Database (Denmark)

    Jochumsen, Lars Wurtz

    The topic of this thesis is target classification of radar tracks from a 2D mechanically scanning coastal surveillance radar. The measurements provided by the radar are position data and therefore the classification is mainly based on kinematic data, which is deduced from the position. The target...... been terminated. Therefore, an update of the classification results must be made for each measurement of the target. The data for this work are collected throughout the PhD and are both collected from radars and other sensors such as GPS....

  8. Cancer classification based on gene expression using neural networks.

    Science.gov (United States)

    Hu, H P; Niu, Z J; Bai, Y P; Tan, X H

    2015-12-21

    Based on gene expression, we have classified 53 colon cancer patients with UICC II into two groups: relapse and no relapse. Samples were taken from each patient, and gene information was extracted. Of the 53 samples examined, 500 genes were considered proper through analyses by S-Kohonen, BP, and SVM neural networks. Classification accuracy obtained by S-Kohonen neural network reaches 91%, which was more accurate than classification by BP and SVM neural networks. The results show that S-Kohonen neural network is more plausible for classification and has a certain feasibility and validity as compared with BP and SVM neural networks.

  9. Analysis of uncertainty in multi-temporal object-based classification

    Science.gov (United States)

    Löw, Fabian; Knöfel, Patrick; Conrad, Christopher

    2015-07-01

    Agricultural management increasingly uses crop maps based on classification of remotely sensed data. However, classification errors can translate into errors in model outputs, for instance in agricultural production monitoring (yield, water demand) or crop acreage calculation. Hence, knowledge of the spatial variability of the classifier performance is important information for the user, but this is not provided by traditional assessments of accuracy, which are based on the confusion matrix. In this study, classification uncertainty was analyzed based on the support vector machines (SVM) algorithm. SVM was applied to multi-spectral time series data of RapidEye from different agricultural landscapes and years. Entropy was calculated as a measure of classification uncertainty, based on the per-object class membership estimations from the SVM algorithm. Permuting all possible combinations of available images allowed investigating the impact of the image acquisition frequency and timing, respectively, on the classification uncertainty. Results show that multi-temporal datasets decrease classification uncertainty for different crops compared to single datasets, but there was no "one-image-combination-fits-all" solution. The number and acquisition timing of the images for which a decrease in uncertainty could be realized proved to be specific to a given landscape, and for each crop they differed across landscapes. For some crops, an increase in uncertainty was observed when increasing the number of images, even if classification accuracy improved. Random forest regression was employed to investigate the impact of different explanatory variables on the observed spatial pattern of classification uncertainty, which was strongly influenced by factors related to agricultural management and training sample density. Lower uncertainties were revealed for fields close to rivers or irrigation canals.
This study demonstrates that classification uncertainty estimates
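
The entropy measure used above for per-object uncertainty is straightforward to sketch (the membership probabilities here are invented examples):

```python
import math

def entropy(probs):
    """Shannon entropy of per-object class membership probabilities:
    0 for a fully confident assignment, log2(k) for maximal
    uncertainty over k classes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

confident = [0.97, 0.02, 0.01]   # SVM very sure of one class
uncertain = [0.40, 0.35, 0.25]   # memberships nearly uniform
print(entropy(confident) < entropy(uncertain))  # -> True
```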

  10. Fuzzy Aspect Based Opinion Classification System for Mining Tourist Reviews

    Directory of Open Access Journals (Sweden)

    Muhammad Afzaal

    2016-01-01

    Full Text Available Due to the large number of opinions available on websites, tourists are often overwhelmed with information and find it extremely difficult to use the available information to decide which tourist places to visit. A number of opinion mining methods have been proposed in the past to identify and classify an opinion as positive or negative. Recently, aspect based opinion mining has been introduced, which targets the various aspects present in the opinion text. A number of existing aspect based opinion classification methods are available in the literature, but very limited research work has targeted automatic aspect identification and the extraction of implicit, infrequent, and coreferential aspects. Aspect based classification suffers from the presence of irrelevant sentences in a typical user review. Such sentences make the data noisy and degrade the classification accuracy of machine learning algorithms. This paper presents a fuzzy aspect based opinion classification system which efficiently extracts aspects from user opinions and performs highly accurate classification. We conducted experiments on real world datasets to evaluate the effectiveness of our proposed system. Experimental results prove that the proposed system is not only effective in aspect extraction but also improves classification accuracy.

  11. A Syntactic Classification based Web Page Ranking Algorithm

    CERN Document Server

    Mukhopadhyay, Debajyoti; Kim, Young-Chon

    2011-01-01

    Existing search engines sometimes give unsatisfactory search results for lack of any categorization of the results. If there were some means of knowing the user's preference about the search results and ranking pages according to that preference, the results would be more useful and accurate to the user. In the present paper a web page ranking algorithm is proposed based on syntactic classification of web pages. Syntactic classification does not concern itself with the meaning of the content of a web page. The proposed approach mainly consists of three steps: select some properties of web pages based on the user's demand, measure them, and give different weightage to each property during ranking for different types of pages. The existence of syntactic classes is supported by running the fuzzy c-means algorithm and neural network classification on a set of web pages. The change in ranking for different types of pages given the same query string is also demonstrated.

  12. Feature Extraction based Face Recognition, Gender and Age Classification

    Directory of Open Access Journals (Sweden)

    Venugopal K R

    2010-01-01

    Full Text Available A face recognition system with large training sets for personal identification normally attains good accuracy. In this paper, we propose a Feature Extraction based Face Recognition, Gender and Age Classification (FEBFRGAC) algorithm that requires only small training sets and yields good results even with one image per person. The process involves three stages: pre-processing, feature extraction and classification. The geometric features of facial images such as eyes, nose and mouth are located using the Canny edge operator, and face recognition is performed. Based on texture and shape information, gender and age classification is done using the posteriori class probability and an artificial neural network, respectively. It is observed that face recognition accuracy is 100%, while gender and age classification accuracy are around 98% and 94%, respectively.

  13. Analysis of Kernel Approach in Fuzzy-Based Image Classifications

    Directory of Open Access Journals (Sweden)

    Mragank Singhal

    2013-03-01

    Full Text Available This paper presents a framework for the kernel approach in fuzzy-based image classification in remote sensing. The goal of image classification is to separate images according to their visual content into two or more disjoint classes. Fuzzy logic is a relatively young theory. A major advantage of this theory is that it allows the natural description, in linguistic terms, of problems that should be solved, rather than in terms of relationships between precise numerical values. This paper describes how remote sensing data with uncertainty are handled with fuzzy-based classification using the kernel approach for land use/land cover map generation. The introduction of fuzzification using the kernel approach provides the basis for the development of more robust approaches to the remote sensing classification problem. The kernel explicitly defines a similarity measure between two samples and implicitly represents the mapping of the input space to the feature space.

  14. Count data modeling and classification using finite mixtures of distributions.

    Science.gov (United States)

    Bouguila, Nizar

    2011-02-01

    In this paper, we consider the problem of constructing accurate and flexible statistical representations for count data, which we often confront in many areas such as data mining, computer vision, and information retrieval. In particular, we analyze and compare several generative approaches widely used for count data clustering, namely multinomial, multinomial Dirichlet, and multinomial generalized Dirichlet mixture models. Moreover, we propose a clustering approach via a mixture model based on a composition of the Liouville family of distributions, from which we select the Beta-Liouville distribution, and the multinomial. The novel proposed model, which we call multinomial Beta-Liouville mixture, is optimized by deterministic annealing expectation-maximization and minimum description length, and strives to achieve a high accuracy of count data clustering and model selection. An important feature of the multinomial Beta-Liouville mixture is that it has fewer parameters than the recently proposed multinomial generalized Dirichlet mixture. The performance evaluation is conducted through a set of extensive empirical experiments, which concern text and image texture modeling and classification and shape modeling, and highlights the merits of the proposed models and approaches.

  15. Statistical model of the classification of shale in a hydrocyclone

    Energy Technology Data Exchange (ETDEWEB)

    Lopachenok, L.V.; Punin, A.E.; Belyanin, Yu.I.; Proskuryakov, V.A.

    1977-10-01

    The mathematical model for the classification of shale in a hydrocyclone, obtained by experimental and statistical methods, is adequate for a real industrial-scale process, as indicated by the statistical analysis carried out for it. Together with the material-balance relationships, it permits the calculation of the engineering parameters for any classification conditions within the investigated region of the factor space, as well as the search for the optimum conditions for industrial realization of the process.

  16. Tweet-based Target Market Classification Using Ensemble Method

    Directory of Open Access Journals (Sweden)

    Muhammad Adi Khairul Anshary

    2016-09-01

    Full Text Available Target market classification is aimed at focusing marketing activities on the right targets. Classification of target markets can be done through data mining and by utilizing data from social media, e.g. Twitter. The end result of data mining is a set of learning models that can classify new data. Ensemble methods can improve the accuracy of such models and therefore provide better results. In this study, classification of target markets was conducted on a dataset of 3000 tweets from which features were extracted. Classification models were constructed by manipulating the training data using two ensemble methods (bagging and boosting). To investigate the effectiveness of the ensemble methods, this study used the CART (classification and regression tree) algorithm for comparison. Three categories of consumer goods (computers, mobile phones and cameras) and three categories of sentiments (positive, negative and neutral) were classified towards three target-market categories. Machine learning was performed using Weka 3.6.9. The results on the test data showed that the bagging method improved the accuracy of CART by 1.9% (to 85.20%). On the other hand, for sentiment classification, the ensemble methods were not successful in increasing the accuracy of CART. The results of this study may be taken into consideration by companies who approach their customers through social media, especially Twitter.
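
Bagging as used above can be sketched with a hypothetical one-feature "stump" standing in for the CART base learner; the data, labels, and thresholds are invented:

```python
import random
from collections import Counter

def train_stump(data):
    """Hypothetical base learner: a one-feature threshold 'stump'
    standing in for the CART trees used in the study."""
    mid = sum(x for x, _ in data) / len(data)
    above = [y for x, y in data if x >= mid]
    below = [y for x, y in data if x < mid]
    hi = Counter(above).most_common(1)[0][0] if above else data[0][1]
    lo = Counter(below).most_common(1)[0][0] if below else data[0][1]
    return lambda x: hi if x >= mid else lo

def bagging_predict(data, x, n_models=15, seed=0):
    """Train each model on a bootstrap resample, then majority-vote."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_models):
        boot = [rng.choice(data) for _ in data]   # bootstrap resample
        votes[train_stump(boot)(x)] += 1
    return votes.most_common(1)[0][0]

data = [(0.1, "neg"), (0.2, "neg"), (0.3, "neg"),
        (0.8, "pos"), (0.9, "pos"), (1.0, "pos")]
print(bagging_predict(data, 0.85))
```

Boosting differs in that each successive model reweights the training records the previous models got wrong, rather than resampling uniformly.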

  17. Co-occurrence Models in Music Genre Classification

    DEFF Research Database (Denmark)

    Ahrendt, Peter; Goutte, Cyril; Larsen, Jan

    2005-01-01

    Music genre classification has been investigated using many different methods, but most of them build on probabilistic models of feature vectors x\\_r which only represent the short time segment with index r of the song. Here, three different co-occurrence models are proposed which instead consider...... genre data set with a variety of modern music. The basis was a so-called AR feature representation of the music. Besides the benefit of having proper probabilistic models of the whole song, the lowest classification test errors were found using one of the proposed models....

  18. A tool for urban soundscape evaluation applying Support Vector Machines for developing a soundscape classification model.

    Science.gov (United States)

    Torija, Antonio J; Ruiz, Diego P; Ramos-Ridao, Angel F

    2014-06-01

    To ensure appropriate soundscape management in urban environments, urban-planning authorities need a range of tools that enable such a task to be performed. An essential step in managing urban areas from a sound standpoint is the evaluation of the soundscape in the area. It has been widely acknowledged that a subjective and acoustical categorization of a soundscape is the first step in evaluating it, providing a basis for designing or adapting it to match people's expectations as well. Accordingly, this work proposes a model for the automatic classification of urban soundscapes based on underlying acoustical and perceptual criteria, intended as a tool for comprehensive urban soundscape evaluation. Because of the great complexity of the problem, two machine learning techniques, Support Vector Machines (SVM) and Support Vector Machines trained with Sequential Minimal Optimization (SMO), are implemented in developing the classification model. The results indicate that the SMO model outperforms the SVM model in the specific task of soundscape classification, achieving an outstanding performance of 91.3% of instances correctly classified.

  19. Object Based and Pixel Based Classification Using RapidEye Satellite Imagery of ETI-OSA, Lagos, Nigeria

    Directory of Open Access Journals (Sweden)

    Esther Oluwafunmilayo Makinde

    2016-12-01

    Full Text Available Several studies have been carried out to find an appropriate method to classify remote sensing data. Traditional classification approaches are all pixel-based and do not utilize the spatial information within an object, which is an important source of information for image classification. Thus, this study compared pixel-based and object-based classification algorithms using a RapidEye satellite image of Eti-Osa LGA, Lagos. In the object-oriented approach, the image was segmented into homogeneous areas using suitable parameters such as scale parameter, compactness and shape, and classification based on segments was done by a nearest neighbour classifier. In the pixel-based classification, the spectral angle mapper was used to classify the images. The user accuracies for the object-based classification were 98.31% for waterbody, 92.31% for vegetation, 86.67% for bare soil and 90.57% for built up, while the user accuracies for the pixel-based classification were 98.28% for waterbody, 84.06% for vegetation, 86.36% for bare soil and 79.41% for built up. These classification techniques were subjected to accuracy assessment; the overall accuracy of the object-based classification was 94.47%, while the pixel-based classification yielded 86.64%. The results show that the object-based approach gave more accurate and satisfying results.
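
The spectral angle mapper used in the pixel-based step can be sketched as follows (the three-band reference spectra are invented):

```python
import math

def spectral_angle(pixel, reference):
    """Angle (radians) between a pixel spectrum and a class reference
    spectrum; a smaller angle means a better spectral match, and the
    measure is insensitive to overall brightness scaling."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm = (math.sqrt(sum(p * p for p in pixel))
            * math.sqrt(sum(r * r for r in reference)))
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def sam_classify(pixel, references):
    # Assign the class whose reference spectrum makes the smallest angle.
    return min(references, key=lambda c: spectral_angle(pixel, references[c]))

# Hypothetical band reflectances for two classes:
refs = {"waterbody": [0.9, 0.4, 0.1], "vegetation": [0.2, 0.9, 0.6]}
print(sam_classify([0.8, 0.5, 0.2], refs))  # -> waterbody
```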

  20. Classification of Airline Passengers Based on the Latent Class Model

    Institute of Scientific and Technical Information of China (English)

    GU Zhao-jun; WANG Wei; LI Xiao-hong

    2012-01-01

    The latent class model uses categorical latent variables to explain the association between categorical manifest variables, so that the relationship between the manifest variables is estimated through the latent class variables while local independence is maintained. In order to study airline passengers' choice-behavior preferences and to improve the airline's revenue management strategy, a latent class model is built from PNR (passenger name record) data: suitable observed variables are selected, parameterized as probabilities, and fitted with the Mplus software, which is used to solve and evaluate the model, finally yielding the most reasonable classification of airline passengers. Because the method is based on booking data, compared with previous research it fundamentally avoids the risk of response bias.

  1. Land Cover Classification from Full-Waveform LIDAR Data Based on Support Vector Machines

    Science.gov (United States)

    Zhou, M.; Li, C. R.; Ma, L.; Guan, H. C.

    2016-06-01

    In this study, a land cover classification method based on multi-class Support Vector Machines (SVM) is presented to predict the types of land cover in Miyun area. The obtained backscattered full-waveforms were processed following a workflow of waveform pre-processing, waveform decomposition and feature extraction. The extracted features, which consist of distance, intensity, Full Width at Half Maximum (FWHM) and back scattering cross-section, were corrected and used as attributes for training data to generate the SVM prediction model. The SVM prediction model was applied to predict the types of land cover in Miyun area as ground, trees, buildings and farmland. The classification results of these four types of land covers were obtained based on the ground truth information according to the CCD image data of Miyun area. It showed that the proposed classification algorithm achieved an overall classification accuracy of 90.63%. In order to better explain the SVM classification results, the classification results of SVM method were compared with that of Artificial Neural Networks (ANNs) method and it showed that SVM method could achieve better classification results.

  2. LAND COVER CLASSIFICATION FROM FULL-WAVEFORM LIDAR DATA BASED ON SUPPORT VECTOR MACHINES

    Directory of Open Access Journals (Sweden)

    M. Zhou

    2016-06-01

    Full Text Available In this study, a land cover classification method based on multi-class Support Vector Machines (SVM) is presented to predict the types of land cover in Miyun area. The obtained backscattered full-waveforms were processed following a workflow of waveform pre-processing, waveform decomposition and feature extraction. The extracted features, which consist of distance, intensity, Full Width at Half Maximum (FWHM) and back scattering cross-section, were corrected and used as attributes for training data to generate the SVM prediction model. The SVM prediction model was applied to predict the types of land cover in Miyun area as ground, trees, buildings and farmland. The classification results of these four types of land covers were obtained based on the ground truth information according to the CCD image data of Miyun area. It showed that the proposed classification algorithm achieved an overall classification accuracy of 90.63%. In order to better explain the SVM classification results, the classification results of the SVM method were compared with those of the Artificial Neural Networks (ANNs) method, and it showed that the SVM method could achieve better classification results.

  3. Tomato classification based on laser metrology and computer algorithms

    Science.gov (United States)

    Igno Rosario, Otoniel; Muñoz Rodríguez, J. Apolinar; Martínez Hernández, Haydeé P.

    2011-08-01

    An automatic technique for tomato classification based on size and color is presented. The size is determined from surface contouring by laser line scanning, where a Bezier network computes the tomato height from the line position. The tomato color is determined in the CIELCH color space from the red and green components. The tomato size is thus classified into large, medium and small, and the tomato is also classified into six colors associated with its maturity. The performance and accuracy of the classification system are evaluated against methods reported in recent years. The technique is tested and experimental results are presented.
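
A minimal sketch of the size-plus-color decision, with invented thresholds in place of the paper's Bezier-network height estimate and six-color CIELCH scheme:

```python
def classify_tomato(height_mm, hue_deg):
    """Toy rule set: size from the laser-measured height, maturity
    colour from the CIELCH hue angle. All thresholds are illustrative
    only; the paper uses a trained network and six colour classes."""
    if height_mm >= 70:
        size = "large"
    elif height_mm >= 50:
        size = "medium"
    else:
        size = "small"
    # Crude two-class stand-in for the six maturity colours:
    colour = "red" if hue_deg < 40 else "green"
    return size, colour

print(classify_tomato(62, 25))  # -> ('medium', 'red')
```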

  4. PERFORMANCE EVALUATION OF DISTANCE MEASURES IN PROPOSED FUZZY TEXTURE MODEL FOR LAND COVER CLASSIFICATION OF REMOTELY SENSED IMAGE

    Directory of Open Access Journals (Sweden)

    S. Jenicka

    2014-04-01

    Full Text Available Land cover classification is a vital application area in the satellite image processing domain. Texture is a useful feature in land cover classification, and the classification accuracy obtained always depends on the effectiveness of the texture model, distance measure and classification algorithm used. In this work, texture features are extracted using the proposed multivariate descriptor, MFTM/MVAR, which uses the Multivariate Fuzzy Texture Model (MFTM) supplemented with Multivariate Variance (MVAR). The K-Nearest Neighbour (KNN) algorithm is used for classification due to its simplicity coupled with efficiency. The distance measures Log likelihood, Manhattan, Chi-squared, Kullback-Leibler and Bhattacharyya were used, and the experiments were conducted on IRS P6 LISS-IV data. The classified images were evaluated based on the error matrix, classification accuracy and Kappa statistics. From the experiments, it is found that the Log likelihood distance with the MFTM/MVAR descriptor and KNN classifier gives 95.29% classification accuracy.
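
A KNN classifier with pluggable distance measures, as in the experiments above, can be sketched like this (the texture-feature vectors and class names are invented; the Log likelihood and Kullback-Leibler measures are omitted for brevity):

```python
import math
from collections import Counter

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chi_squared(a, b):
    return sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0)

def bhattacharyya(a, b):
    # For histogram-like (non-negative, normalised) feature vectors.
    bc = sum(math.sqrt(x * y) for x, y in zip(a, b))
    return -math.log(bc) if bc > 0 else float("inf")

def knn_classify(sample, train, k=3, dist=manhattan):
    """Majority label among the k training vectors nearest to sample
    under the chosen distance measure."""
    nearest = sorted(train, key=lambda t: dist(sample, t[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical normalised texture-feature vectors:
train = [([0.7, 0.2, 0.1], "urban"), ([0.6, 0.3, 0.1], "urban"),
         ([0.1, 0.3, 0.6], "water"), ([0.2, 0.2, 0.6], "water")]
for d in (manhattan, chi_squared, bhattacharyya):
    print(d.__name__, knn_classify([0.65, 0.25, 0.1], train, k=3, dist=d))
```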

  5. Construction and analysis of tree models for chromosomal classification of diffuse large B-cell lymphomas

    Institute of Scientific and Technical Information of China (English)

    Hui-Yong Jiang; Zhong-Xi Huang; Xue-Feng Zhang; Richard Desper; Tong Zhao

    2007-01-01

    AIM: To construct tree models for the classification of diffuse large B-cell lymphomas (DLBCL) by chromosome copy numbers, to compare them with cDNA microarray classification, and to explore models of the multi-gene, multi-step and multi-pathway processes of DLBCL tumorigenesis. METHODS: Maximum-weight branching and distance-based models were constructed from the comparative genomic hybridization (CGH) data of 123 DLBCL samples using the established methods and software of Desper et al. A maximum likelihood tree model was also used to analyze the data. By comparing with the results reported in the literature, the value of tree models in the classification of DLBCL was elucidated. RESULTS: Both the branching and the distance-based trees classified DLBCL into three groups. We combined the classification methods of the two models and classified DLBCL into three categories according to their characteristics. The first group was marked by +Xq, +Xp, -17p and +13q; the second group by +3q, +18q and +18p; and the third group by -6q and +6p. This chromosomal classification was consistent with the cDNA classification. It indicated that -6q and +3q were two main events in the tumorigenesis of lymphoma. CONCLUSION: Tree models of lymphoma established from CGH data can be used in the classification of DLBCL. These models suggest multi-gene, multi-step and multi-pathway processes of tumorigenesis. Two pathways, -6q preceding +6q and +3q preceding +18q, may be important in understanding the tumorigenesis of DLBCL. The pathway -6q preceding +6q may have a close relationship with the tumorigenesis of non-GCB DLBCL.

  6. A Spectral Signature Shape-Based Algorithm for Landsat Image Classification

    Directory of Open Access Journals (Sweden)

    Yuanyuan Chen

    2016-08-01

    Full Text Available Land-cover datasets are crucial for earth system modeling and human-nature interaction research at local, regional and global scales. They can be obtained from remotely sensed data using image classification methods. However, in image classification, spectral values have received considerable attention in most classification methods, while the shape of the spectral curve has seldom been used because it is difficult to quantify. This study presents a classification method based on the observation that the spectral curve is composed of segments and certain extreme values. The presented method quantifies the spectral curve shape and makes full use of the spectral shape differences among land covers to classify remotely sensed images. Using this method, classification maps from TM (Thematic Mapper) data were obtained with overall accuracies of 0.834 and 0.854 for two respective test areas. The approach presented in this paper, which differs from previous image classification methods that were mostly concerned with spectral "value" similarity characteristics, emphasizes the "shape" similarity characteristics of the spectral curve. Moreover, this study will be helpful for classification research on hyperspectral and multi-temporal images.
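
One simple way to quantify curve "shape", in the spirit of the segment-based idea above though far cruder, is a slope-sign signature; the band values and reference shapes here are invented:

```python
def shape_signature(spectrum):
    """Sign of the slope between adjacent bands: +1 rising, -1 falling,
    0 flat. Captures the curve's 'shape' rather than absolute values."""
    def sign(d):
        return (d > 0) - (d < 0)
    return tuple(sign(b - a) for a, b in zip(spectrum, spectrum[1:]))

def shape_classify(spectrum, references):
    sig = shape_signature(spectrum)
    # Pick the class whose reference signature agrees at most positions.
    def matches(c):
        return sum(s == r for s, r in zip(sig, references[c]))
    return max(references, key=matches)

# Hypothetical band-wise reference shapes (e.g. a green-band peak):
refs = {"vegetation": (1, -1, 1), "water": (-1, -1, -1)}
print(shape_classify([0.2, 0.5, 0.3, 0.6], refs))  # -> vegetation
```

A uniformly scaled copy of a curve yields the same signature, which is exactly the "shape rather than value" property the method exploits.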

  7. Dihedral-Based Segment Identification and Classification of Biopolymers I: Proteins

    Science.gov (United States)

    2013-01-01

    A new structure classification scheme for biopolymers is introduced, which is solely based on main-chain dihedral angles. It is shown that by dividing a biopolymer into segments containing two central residues, a local classification can be performed. The method is referred to as DISICL, short for Dihedral-based Segment Identification and Classification. Compared to other popular secondary structure classification programs, DISICL is more detailed as it offers 18 distinct structural classes, which may be simplified into a classification in terms of seven more general classes. It was designed with an eye to analyzing subtle structural changes as observed in molecular dynamics simulations of biomolecular systems. Here, the DISICL algorithm is used to classify two databases of protein structures, jointly containing more than 10 million segments. The data is compared to two alternative approaches in terms of the amount of classified residues, average occurrence and length of structural elements, and pairwise matches of the classifications by the different programs. In an accompanying paper (Nagy, G.; Oostenbrink, C. Dihedral-based segment identification and classification of biopolymers II: Polynucleotides. J. Chem. Inf. Model. 2013, DOI: 10.1021/ci400542n), the analysis of polynucleotides is described and applied. Overall, DISICL represents a potentially useful tool to analyze biopolymer structures at a high level of detail. PMID:24364820

  8. Dihedral-based segment identification and classification of biopolymers I: proteins.

    Science.gov (United States)

    Nagy, Gabor; Oostenbrink, Chris

    2014-01-27

    A new structure classification scheme for biopolymers is introduced, which is solely based on main-chain dihedral angles. It is shown that by dividing a biopolymer into segments containing two central residues, a local classification can be performed. The method is referred to as DISICL, short for Dihedral-based Segment Identification and Classification. Compared to other popular secondary structure classification programs, DISICL is more detailed as it offers 18 distinct structural classes, which may be simplified into a classification in terms of seven more general classes. It was designed with an eye to analyzing subtle structural changes as observed in molecular dynamics simulations of biomolecular systems. Here, the DISICL algorithm is used to classify two databases of protein structures, jointly containing more than 10 million segments. The data is compared to two alternative approaches in terms of the amount of classified residues, average occurrence and length of structural elements, and pairwise matches of the classifications by the different programs. In an accompanying paper (Nagy, G.; Oostenbrink, C. Dihedral-based segment identification and classification of biopolymers II: Polynucleotides. J. Chem. Inf. Model. 2013, DOI: 10.1021/ci400542n), the analysis of polynucleotides is described and applied. Overall, DISICL represents a potentially useful tool to analyze biopolymer structures at a high level of detail.
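
A dihedral-based segment classification can be sketched with a crude Ramachandran-region lookup; the angle ranges and the two-residue agreement rule below are illustrative only and far coarser than DISICL's 18 classes:

```python
def classify_residue(phi, psi):
    """Crude lookup of typical Ramachandran regions (degrees).
    Illustrative ranges only, not DISICL's actual class definitions."""
    if -100 <= phi <= -30 and -80 <= psi <= -5:
        return "alpha-helix"
    if -180 <= phi <= -45 and 90 <= psi <= 180:
        return "beta-sheet"
    return "other"

def classify_segment(dihedrals):
    """A two-residue segment gets a single label only if both of its
    central residues agree; otherwise it is left 'irregular'."""
    labels = {classify_residue(phi, psi) for phi, psi in dihedrals}
    return labels.pop() if len(labels) == 1 else "irregular"

print(classify_segment([(-60, -45), (-65, -40)]))   # -> alpha-helix
print(classify_segment([(-60, -45), (-120, 130)]))  # -> irregular
```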

  9. Semantic Document Image Classification Based on Valuable Text Pattern

    Directory of Open Access Journals (Sweden)

    Hossein Pourghassem

    2011-01-01

    Full Text Available Knowledge extraction from detected document images is a complex problem in the field of information technology, and it becomes more intricate given that only a negligible percentage of the detected document images are valuable. In this paper, a segmentation-based classification algorithm is used to analyze the document image. In this algorithm, using a two-stage segmentation approach, regions of the image are detected and then classified into document and non-document (pure) regions in a hierarchical classification. Furthermore, a novel definition of value is proposed to classify document images into valuable or invaluable categories. The proposed algorithm is evaluated on a database consisting of document and non-document images collected from the Internet. Experimental results show the efficiency of the proposed algorithm in semantic document image classification, providing an accuracy rate of 98.8% for the valuable versus invaluable document image classification problem.

  10. Indoor scene classification of robot vision based on cloud computing

    Science.gov (United States)

    Hu, Tao; Qi, Yuxiao; Li, Shipeng

    2016-07-01

    For intelligent service robots, indoor scene classification is an important issue. To overcome the weak real-time performance of conventional algorithms, a new method based on cloud computing is proposed for global image features in indoor scene classification. With the MapReduce method, the global PHOG feature of an indoor scene image is extracted in parallel, and the feature vectors are used to train the decision classifier through SVM concurrently. The indoor scene is then classified by the decision classifier. To verify the algorithm's performance, we carried out an experiment with 350 typical indoor scene images from the MIT LabelMe image library. Experimental results show that the proposed algorithm attains better real-time performance: generally, it is 1.4-2.1 times faster than traditional classification methods that rely on single-machine computation, while keeping a stable classification accuracy of about 70%.
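
The parallel feature-extraction ("map") stage can be sketched with a thread pool standing in for a MapReduce cluster; the grey-level histogram below is a toy stand-in for the PHOG descriptor, and the image data are invented flat pixel lists:

```python
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

def toy_global_feature(image):
    """Stand-in for PHOG extraction: a 4-bin grey-level histogram of
    the image (given here as a flat list of 8-bit pixel values)."""
    hist = Counter(v // 64 for v in image)
    total = len(image)
    return [hist.get(b, 0) / total for b in range(4)]

def extract_all(images, workers=4):
    # The 'map' stage: extraction is independent per image, so it
    # parallelises directly (MapReduce in the paper, a thread pool here).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(toy_global_feature, images))

images = [[0, 10, 200, 250], [120, 130, 140, 150]]
print(extract_all(images))  # -> [[0.5, 0.0, 0.0, 0.5], [0.0, 0.25, 0.75, 0.0]]
```

The resulting feature vectors would then be fed to the SVM training step, which can likewise run concurrently over data partitions.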

  11. Classification approach based on association rules mining for unbalanced data

    CERN Document Server

    Ndour, Cheikh

    2012-01-01

    This paper deals with supervised classification when the response variable is binary and its class distribution is unbalanced. In such a situation, it is not possible to build a powerful classifier by using standard methods such as logistic regression, classification trees, discriminant analysis, etc. To overcome this shortcoming of these methods, which provide classifiers with low sensitivity, we tackle the classification problem through an approach based on association rules learning, because this approach has the advantage of allowing the identification of the patterns that are well correlated with the target class. Association rules learning is a well known method in the area of data-mining. It is used when dealing with large databases for unsupervised discovery of local patterns that express hidden relationships between variables. In considering association rules from a supervised learning point of view, a relevant set of weak classifiers is obtained from which one derives a classification rule...

  12. Ensemble polarimetric SAR image classification based on contextual sparse representation

    Science.gov (United States)

    Zhang, Lamei; Wang, Xiao; Zou, Bin; Qiao, Zhijun

    2016-05-01

    Polarimetric SAR image interpretation has become one of the most interesting topics, in which the construction of a reasonable and effective image classification technique is of key importance. Sparse representation represents the data using the most succinct sparse atoms of an over-complete dictionary, and its advantages have also been confirmed in the field of PolSAR classification. However, like any ordinary classifier, it is imperfect in several respects. Ensemble learning is therefore introduced to address this issue: a number of different learners are trained, and their individual outputs are combined to obtain more accurate and reliable results. Accordingly, this paper presents a polarimetric SAR image classification method based on ensemble learning of sparse representations to achieve optimal classification.
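
    As a minimal sketch of the sparse-representation decision rule described above (an illustration only, not the authors' implementation; the greedy pursuit, toy dictionary and sparsity level are all assumptions):

```python
import numpy as np

def omp(D, y, k):
    """Greedy orthogonal matching pursuit: approximate y with k atoms of D."""
    residual = y.astype(float).copy()
    idx = []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))   # most correlated atom
        if j not in idx:
            idx.append(j)
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        residual = y - D[:, idx] @ coef
    x = np.zeros(D.shape[1])
    x[idx] = coef
    return x

def src_classify(D, labels, y, k=2):
    """Assign y to the class whose atoms reconstruct it with least residual."""
    x = omp(D, y, k)
    best, best_res = None, np.inf
    for c in np.unique(labels):
        xc = np.where(labels == c, x, 0.0)   # keep only class-c coefficients
        res = np.linalg.norm(y - D @ xc)
        if res < best_res:
            best, best_res = int(c), res
    return best

# toy dictionary: columns 0-1 belong to class 0, columns 2-3 to class 1
D = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9]])
labels = np.array([0, 0, 1, 1])
```

    An ensemble in the spirit of the paper would train several such classifiers on different dictionaries or feature subsets and combine their votes.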

  13. Classification of types of stuttering symptoms based on brain activity.

    Science.gov (United States)

    Jiang, Jing; Lu, Chunming; Peng, Danling; Zhu, Chaozhe; Howell, Peter

    2012-01-01

    Among the non-fluencies seen in speech, some are more typical (MT) of stuttering speakers, whereas others are less typical (LT) and are common to both stuttering and fluent speakers. No neuroimaging work has evaluated the neural basis for grouping these symptom types. Another long-debated issue is which type (LT, MT) whole-word repetitions (WWR) should be placed in. In this study, a sentence completion task was performed by twenty stuttering patients who were scanned using an event-related design. This task elicited stuttering in these patients. Each stuttered trial from each patient was sorted into the MT or LT types, with WWR put aside. Pattern classification was employed to train a patient-specific single-trial model to automatically classify each trial as MT or LT using the corresponding fMRI data. This model was then validated using test data that were independent of the training data. In a subsequent analysis, the classification model just established was used to determine which type the WWR should be placed in. The results showed that the LT and the MT could be separated with high accuracy based on their brain activity. The brain regions that contributed most to the separation of the types were: the left inferior frontal cortex and bilateral precuneus, both of which showed higher activity in the MT than in the LT; and the left putamen and right cerebellum, which showed the opposite activity pattern. The results also showed that the brain activity for WWR was more similar to that of the LT and fluent speech than to that of the MT. These findings provide a neurological basis for separating the MT and the LT types, and support the widely-used MT/LT symptom grouping scheme. In addition, WWR play a role similar to the LT, and thus should be placed in the LT type.

  14. Classification of types of stuttering symptoms based on brain activity.

    Directory of Open Access Journals (Sweden)

    Jing Jiang

    Full Text Available Among the non-fluencies seen in speech, some are more typical (MT) of stuttering speakers, whereas others are less typical (LT) and are common to both stuttering and fluent speakers. No neuroimaging work has evaluated the neural basis for grouping these symptom types. Another long-debated issue is which type (LT, MT) whole-word repetitions (WWR) should be placed in. In this study, a sentence completion task was performed by twenty stuttering patients who were scanned using an event-related design. This task elicited stuttering in these patients. Each stuttered trial from each patient was sorted into the MT or LT types, with WWR put aside. Pattern classification was employed to train a patient-specific single-trial model to automatically classify each trial as MT or LT using the corresponding fMRI data. This model was then validated using test data that were independent of the training data. In a subsequent analysis, the classification model just established was used to determine which type the WWR should be placed in. The results showed that the LT and the MT could be separated with high accuracy based on their brain activity. The brain regions that contributed most to the separation of the types were: the left inferior frontal cortex and bilateral precuneus, both of which showed higher activity in the MT than in the LT; and the left putamen and right cerebellum, which showed the opposite activity pattern. The results also showed that the brain activity for WWR was more similar to that of the LT and fluent speech than to that of the MT. These findings provide a neurological basis for separating the MT and the LT types, and support the widely-used MT/LT symptom grouping scheme. In addition, WWR play a role similar to the LT, and thus should be placed in the LT type.

  15. Pathological Bases for a Robust Application of Cancer Molecular Classification

    Directory of Open Access Journals (Sweden)

    Salvador J. Diaz-Cano

    2015-04-01

    Full Text Available Any robust classification system depends on its purpose and must refer to accepted standards, its strength relying on predictive values and a careful consideration of known factors that can affect its reliability. In this context, a molecular classification of human cancer must refer to the current gold standard (histological classification) and try to improve it with key prognosticators for metastatic potential, staging and grading. Although organ-specific examples have been published based on proteomics, transcriptomics and genomics evaluations, the most popular approach uses gene expression analysis as a direct correlate of cellular differentiation, which represents the key feature of the histological classification. RNA is a labile molecule that varies significantly according to the preservation protocol, its transcription reflects the adaptation of the tumor cells to the microenvironment, it can be passed between cells through mechanisms of intercellular transfer of genetic information (exosomes), and it is exposed to epigenetic modifications. More robust classifications should be based on stable molecules, represented at the genetic level by DNA, to improve reliability, and their analysis must deal with the concept of intratumoral heterogeneity, which is at the origin of tumor progression and is the byproduct of the selection process during the clonal expansion and progression of neoplasms. The simultaneous analysis of multiple DNA targets and next generation sequencing offer the best practical approach for an analytical genomic classification of tumors.

  16. Dissimilarity-based classification of anatomical tree structures

    DEFF Research Database (Denmark)

    Sørensen, Lauge Emil Borch Laurs; Lo, Pechin Chien Pau; Dirksen, Asger

    2011-01-01

    A novel method for classification of abnormality in anatomical tree structures is presented. A tree is classified based on direct comparisons with other trees in a dissimilarity-based classification scheme. The pair-wise dissimilarity measure between two trees is based on a linear assignment...... by including anatomical features in the branch feature vectors. The proposed approach is applied to classify airway trees in computed tomography images of subjects with and without chronic obstructive pulmonary disease (COPD). Using the wall area percentage (WA%), a common measure of airway abnormality in COPD...

  17. SEMIPARAMETRIC VERSUS PARAMETRIC CLASSIFICATION MODELS - AN APPLICATION TO DIRECT MARKETING

    NARCIS (Netherlands)

    BULT, [No Value

    1993-01-01

    In this paper we are concerned with estimation of a classification model using semiparametric and parametric methods. Benefits and limitations of semiparametric models in general, and of Manski's maximum score method in particular, are discussed. The maximum score method yields consistent estimates

  18. A Bayes fusion method based ensemble classification approach for Brown cloud application

    Directory of Open Access Journals (Sweden)

    M.Krishnaveni

    2014-03-01

    Full Text Available Classification is the recurrent task of determining a target function that maps each attribute set to one of the predefined class labels. Ensemble fusion is a classifier fusion technique that combines multiple classifiers to achieve higher classification accuracy than the individual classifiers. The main objective of this paper is to combine base classifiers using three ensemble fusion methods, namely Decision Template, Dempster-Shafer and Bayes, and to compare the accuracy of each fusion method on the brown cloud dataset. The base classifiers KNN, MLP and SVM were considered in the ensemble classification, each with four different function parameters. The experimental study shows that the Bayes fusion method achieves a better classification accuracy of 95% than the Decision Template (80%) and Dempster-Shafer (85%) methods on a Brown Cloud image dataset.
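
    The Bayes (product-rule) combination of base-classifier posteriors can be sketched as follows (a toy illustration with invented probabilities, not the paper's experimental setup):

```python
import numpy as np

def bayes_fusion(prob_list, priors=None):
    """Naive-Bayes product-rule fusion: multiply the posterior vectors of the
    base classifiers and divide by the class prior raised to (L - 1)."""
    probs = np.vstack(prob_list)                 # (n_classifiers, n_classes)
    n_classes = probs.shape[1]
    if priors is None:
        priors = np.full(n_classes, 1.0 / n_classes)
    fused = np.prod(probs, axis=0) / priors ** (len(prob_list) - 1)
    return fused / fused.sum()                   # renormalize to a distribution

# posteriors for one sample from three base classifiers (toy values)
p_knn = np.array([0.60, 0.40])
p_mlp = np.array([0.70, 0.30])
p_svm = np.array([0.55, 0.45])
fused = bayes_fusion([p_knn, p_mlp, p_svm])
```

    The product rule rewards agreement between the base classifiers, which is why a confident majority dominates the fused posterior.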

  19. Superiority of Classification Tree versus Cluster, Fuzzy and Discriminant Models in a Heartbeat Classification System.

    Directory of Open Access Journals (Sweden)

    Vessela Krasteva

    Full Text Available This study presents a 2-stage heartbeat classifier of supraventricular (SVB) and ventricular (VB) beats. Stage 1 makes a computationally-efficient classification of SVB-beats, using a simple correlation threshold criterion for finding a close match with a predominant normal (reference) beat template. The non-matched beats are next subjected to measurement of 20 basic features, tracking the beat and reference template morphology and RR-variability, for subsequent refined classification into the SVB or VB-class by Stage 2. Four linear classifiers are compared: cluster, fuzzy, linear discriminant analysis (LDA) and classification tree (CT), all subjected to iterative training for selection of the optimal feature space among an extended 210-sized set embodying interactive second-order effects between the 20 independent features. The optimization process minimizes, at equal weight, the false positives in the SVB-class and the false negatives in the VB-class. Training with the European ST-T, AHA and MIT-BIH Supraventricular Arrhythmia databases found the best performance settings of all classification models: Cluster (30 features), Fuzzy (72 features), LDA (142 coefficients), CT (221 decision nodes), with the top-3 best scored features being: normalized current RR-interval, higher/lower frequency content ratio, and beat-to-template correlation. Unbiased test-validation with the MIT-BIH Arrhythmia database rates the classifiers in descending order of their specificity for the SVB-class: CT (99.9%), LDA (99.6%), Cluster (99.5%), Fuzzy (99.4%); sensitivity for ventricular ectopic beats as part of the VB-class (commonly reported in published beat-classification studies): CT (96.7%), Fuzzy (94.4%), LDA (94.2%), Cluster (92.4%); positive predictivity: CT (99.2%), Cluster (93.6%), LDA (93.0%), Fuzzy (92.4%). CT has superior accuracy by 0.3-6.8 percentage points, with the advantage of easy model complexity configuration by pruning the tree, which consists of easily interpretable 'if-then' rules.

  20. Classification of Hydrological time series using Probabilistic Neural Network for River Flow Modeling by RBF Networks

    Science.gov (United States)

    Abghari, H.; van de Giesen, N.; Mahdavi, M.; Salajegheh, A.

    2009-04-01

    Artificial intelligence modeling of nonstationary rainfall-runoff has some restrictions in simulation accuracy due to the complexity and nonlinearity of the training patterns. Preprocessing the training dataset can establish the homogeneity of rainfall-runoff patterns before modeling. In this presentation, a new hybrid model of artificial intelligence in conjunction with clustering is introduced and applied to flow prediction. Simulation of the Nazloochaei river flow in North-West Iran was the case used for development of the PNN-RBF model. The PNN classifies the training dataset into two groups based on Parzen-window theory using unsupervised classification. Subsequently, each data group is used to train and test two RBF networks, and the results are compared to the application of all data to a single RBF network without classification. Results show that classifying rainfall-runoff patterns using the PNN and predicting runoff with RBF networks increases the prediction precision of the networks. Keywords: Probabilistic Neural Network, Radial Basis Function Neural Network, Parzen theory, River Flow Prediction
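
    The cluster-then-specialize idea can be sketched with stand-ins: KMeans in place of the PNN's unsupervised split and RBF-kernel ridge regression in place of the RBF networks (the synthetic data and all parameters below are assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
# synthetic "rainfall-runoff" patterns drawn from two regimes
X_low = rng.normal(0.0, 0.3, (60, 2))
X_high = rng.normal(3.0, 0.3, (60, 2))
X = np.vstack([X_low, X_high])
y = np.r_[X_low.sum(axis=1), 2.0 * X_high.sum(axis=1)]   # regime-dependent response

# stage 1: unsupervised split of the training patterns (KMeans as PNN stand-in)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# stage 2: one RBF-kernel regressor per cluster (stand-in for two RBF networks)
models = {c: KernelRidge(kernel="rbf", gamma=1.0).fit(X[km.labels_ == c],
                                                      y[km.labels_ == c])
          for c in (0, 1)}

def predict(x):
    c = int(km.predict(x.reshape(1, -1))[0])     # route to the matching expert
    return float(models[c].predict(x.reshape(1, -1))[0])
```

    Each expert only ever sees the homogeneous patterns of its own regime, which is the source of the accuracy gain the abstract reports.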

  1. Network traffic classification based on ensemble learning and co-training

    Institute of Scientific and Technical Information of China (English)

    HE HaiTao; LUO XiaoNan; MA FeiTeng; CHE ChunHui; WANG JianMin

    2009-01-01

    Classification of network traffic is an essential step for many network research efforts. However, with the rapid evolution of Internet applications, the effectiveness of port-based or payload-based identification approaches has been greatly diminished in recent years, and many researchers have begun to turn their attention to alternative machine learning based methods. This paper presents a novel machine learning-based classification model, which combines the ensemble learning paradigm with co-training techniques. Compared to previous approaches, most of which employed only a single classifier, multiple classifiers and semi-supervised learning are applied in our method, which mainly helps to overcome three shortcomings: limited flow accuracy rate, weak adaptability and the huge demand for labeled training sets. In this paper, statistical characteristics of IP flows are extracted from packet level traces to establish the feature set, then the classification model is created and tested, and the empirical results prove its feasibility and effectiveness.
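
    A minimal co-training loop of the kind this paper combines with ensembling might look as follows (a sketch under strong assumptions: Gaussian naive Bayes base learners, two synthetic feature views, and invented thresholds, none of which come from the paper):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1, X2, y, n_rounds=5, per_round=10, conf=0.9):
    """Co-training sketch: y holds -1 for unlabeled flows; X1/X2 are two
    feature views. Each round, each view's classifier labels the unlabeled
    flows it is most confident about, enlarging the shared training set."""
    y = y.copy()
    for _ in range(n_rounds):
        labeled = y != -1
        if labeled.all():
            break
        c1 = GaussianNB().fit(X1[labeled], y[labeled])
        c2 = GaussianNB().fit(X2[labeled], y[labeled])
        for clf, X in ((c1, X1), (c2, X2)):
            idx = np.where(~labeled)[0]
            p = clf.predict_proba(X[idx])
            confident = idx[p.max(axis=1) >= conf][:per_round]
            y[confident] = clf.predict(X[confident])
    return y

# two synthetic feature "views" of 40 flows, only 4 of them labeled
rng = np.random.default_rng(0)
truth = np.r_[np.zeros(20, dtype=int), np.ones(20, dtype=int)]
X1 = rng.normal(truth[:, None] * 4.0, 0.3, (40, 2))
X2 = rng.normal(truth[:, None] * 4.0, 0.3, (40, 2))
y0 = np.full(40, -1)
y0[[0, 1, 20, 21]] = truth[[0, 1, 20, 21]]
y_pred = co_train(X1, X2, y0)
```

    This illustrates how semi-supervised learning reduces the demand for labeled training data: a handful of labels bootstraps the rest.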

  2. Dissimilarity-based classification of anatomical tree structures

    DEFF Research Database (Denmark)

    Sørensen, Lauge; Lo, Pechin Chien Pau; Dirksen, Asger

    2011-01-01

    A novel method for classification of abnormality in anatomical tree structures is presented. A tree is classified based on direct comparisons with other trees in a dissimilarity-based classification scheme. The pair-wise dissimilarity measure between two trees is based on a linear assignment...... between the branch feature vectors representing those trees. Hereby, localized information in the branches is collectively used in classification and variations in feature values across the tree are taken into account. An approximate anatomical correspondence between matched branches can be achieved...... by including anatomical features in the branch feature vectors. The proposed approach is applied to classify airway trees in computed tomography images of subjects with and without chronic obstructive pulmonary disease (COPD). Using the wall area percentage (WA%), a common measure of airway abnormality in COPD...

  3. Classification of Gait Types Based on the Duty-factor

    DEFF Research Database (Denmark)

    Fihl, Preben; Moeslund, Thomas B.

    2007-01-01

    This paper deals with classification of human gait types based on the notion that different gait types are in fact different types of locomotion, i.e., running is not simply walking done faster. We present the duty-factor, which is a descriptor based on this notion. The duty-factor is independent...

  4. Conceptualising Business Models: Definitions, Frameworks and Classifications

    Directory of Open Access Journals (Sweden)

    Erwin Fielt

    2013-12-01

    Full Text Available The business model concept is gaining traction in different disciplines but is still criticized for being fuzzy and vague and lacking consensus on its definition and compositional elements. In this paper we set out to advance our understanding of the business model concept by addressing three areas of foundational research: business model definitions, business model elements, and business model archetypes. We define a business model as a representation of the value logic of an organization in terms of how it creates and captures customer value. This abstract and generic definition is made more specific and operational by the compositional elements that need to address the customer, value proposition, organizational architecture (firm and network level) and economics dimensions. Business model archetypes complement the definition and elements by providing a more concrete and empirical understanding of the business model concept. The main contributions of this paper are (1) explicitly including the customer value concept in the business model definition and focussing on value creation, (2) presenting four core dimensions that business model elements need to cover, (3) arguing for flexibility by adapting and extending business model elements to cater for different purposes and contexts (e.g. technology, innovation, strategy), (4) stressing a more systematic approach to business model archetypes by using business model elements for their description, and (5) suggesting to use business model archetype research for the empirical exploration and testing of business model elements and their relationships.

  5. 3D Land Cover Classification Based on Multispectral LIDAR Point Clouds

    Science.gov (United States)

    Zou, Xiaoliang; Zhao, Guihua; Li, Jonathan; Yang, Yuanxi; Fang, Yong

    2016-06-01

    A multispectral Lidar system can emit simultaneous laser pulses at different wavelengths. The reflected multispectral energy is captured through a receiver of the sensor, and the return signal together with the position and orientation information of the sensor is recorded. These recorded data are combined with GNSS/IMU data in further post-processing, forming high density multispectral 3D point clouds. As the first commercial multispectral airborne Lidar sensor, the Optech Titan system is capable of collecting point cloud data from all three channels: at 532 nm visible (green), at 1064 nm near infrared (NIR) and at 1550 nm intermediate infrared (IR). It has become a new source of data for 3D land cover classification. The paper presents an Object Based Image Analysis (OBIA) approach that uses only multispectral Lidar point cloud datasets for 3D land cover classification. The approach consists of three steps. Firstly, multispectral intensity images are segmented into image objects on the basis of multi-resolution segmentation integrating different scale parameters. Secondly, intensity objects are classified into nine categories by using customized classification-index features and a combination of the multispectral reflectance with the vertical distribution of object features. Finally, accuracy assessment is conducted by comparing random reference sample points from Google imagery tiles with the classification results. The classification results show high overall accuracy for most of the land cover types; over 90% overall accuracy is achieved using multispectral Lidar point clouds for 3D land cover classification.

  6. Super pixel density based clustering automatic image classification method

    Science.gov (United States)

    Xu, Mingxing; Zhang, Chuan; Zhang, Tianxu

    2015-12-01

    Image classification is an important means of image segmentation and data mining, and achieving rapid automated image classification has been a focus of research. In this paper, an automatic image classification and outlier identification method based on the density of super-pixel cluster centers is presented. Pixel location coordinates and gray values are used to compute density and distance, from which automatic classification and outlier extraction are achieved. Because working at the pixel level dramatically increases the computational complexity, the image is preprocessed into a small number of super-pixel sub-blocks before the density and distance calculations. A normalized density-distance discrimination rule is then designed to select cluster centers automatically, whereby the image is classified and outliers are identified without supervision. Extensive experiments show that our method requires no human intervention, computes faster than the plain density clustering algorithm, and performs automated classification and outlier extraction effectively.
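
    The density-and-distance rule for picking cluster centers (and spotting outliers) can be sketched in the style of density-peaks clustering; the kernel, cutoff and toy data below are assumptions, not the paper's exact formulation:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def density_peaks(X, dc=1.0):
    """Per-point local density rho and distance delta to the nearest denser
    point; cluster centers score high on rho*delta, outliers on delta alone."""
    d = squareform(pdist(X))
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0   # smooth local density
    delta = np.empty(len(X))
    for i in range(len(X)):
        denser = np.where(rho > rho[i])[0]
        delta[i] = d[i].max() if denser.size == 0 else d[i, denser].min()
    return rho, delta

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.2, (30, 2)),    # first super-pixel blob
               rng.normal(5.0, 0.2, (30, 2))])   # second super-pixel blob
rho, delta = density_peaks(X)
centers = np.argsort(rho * delta)[-2:]           # two strongest center candidates
```

    Points with large delta but small rho are the outlier candidates; the normalized rho*delta score is what makes center selection automatic.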

  7. Nonlinear Time Series Model for Shape Classification Using Neural Networks

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    A complex nonlinear exponential autoregressive (CNEAR) model for invariant feature extraction is developed for recognizing arbitrary shapes on a plane. A neural network is used to calculate the CNEAR coefficients. The coefficients, which constitute the feature set, are proven to be invariant to boundary transformations such as translation, rotation, scale and choice of starting point in tracing the boundary. The feature set is then used as the input to a complex multilayer perceptron (C-MLP) network for learning and classification. Experimental results show that complicated shapes can be accurately recognized even with the low-order model and that the classification method has good fault tolerance when noise is present.

  8. Inductive Model Generation for Text Classification Using a Bipartite Heterogeneous Network

    Institute of Scientific and Technical Information of China (English)

    Rafael Geraldeli Rossi; Alneu de Andrade Lopes; Thiago de Paulo Faleiros; Solange Oliveira Rezende

    2014-01-01

    Algorithms for numeric data classification have been applied to text classification. Usually the vector space model is used to represent text collections. The characteristics of this representation, such as sparsity and high dimensionality, sometimes impair the quality of general-purpose classifiers. Networks can be used to represent text collections, avoiding the high sparsity and allowing relationships among the different objects that compose a text collection to be modeled. Such network-based representations can improve the quality of the classification results. One of the simplest ways to represent textual collections by a network is through a bipartite heterogeneous network, which is composed of objects that represent the documents connected to objects that represent the terms. Heterogeneous bipartite networks do not require computation of similarities or relations among the objects and can be used to model any type of text collection. Due to the advantages of representing text collections through bipartite heterogeneous networks, in this article we present a text classifier which builds a classification model using the structure of a bipartite heterogeneous network. Such an algorithm, referred to as IMBHN (Inductive Model Based on Bipartite Heterogeneous Network), induces a classification model by assigning weights to the objects that represent the terms for each class of the text collection. An empirical evaluation using a large number of text collections from different domains shows that the proposed IMBHN algorithm produces significantly better results than the k-NN, C4.5, SVM, and Naive Bayes algorithms.

  9. Latent Classification Models for Binary Data

    DEFF Research Database (Denmark)

    Langseth, Helge; Nielsen, Thomas Dyhre

    2009-01-01

    One of the simplest, and yet most consistently well-performing, sets of classifiers are the naive Bayes models (a special class of Bayesian network models). However, these models rely on the (naive) assumption that all the attributes used to describe an instance are conditionally independent given...

  10. Credit Risk Evaluation Using a C-Variable Least Squares Support Vector Classification Model

    Science.gov (United States)

    Yu, Lean; Wang, Shouyang; Lai, K. K.

    Credit risk evaluation is one of the most important issues in financial risk management. In this paper, a C-variable least squares support vector classification (C-VLSSVC) model is proposed for credit risk analysis. The main idea of this model is based on the prior knowledge that different classes may have different importance for modeling and that more weight should be given to classes with more importance. The C-VLSSVC model can be constructed by a simple modification of the regularization parameter in LSSVC, whereby more weight is given to the least squares classification errors of important classes than to those of unimportant classes, while keeping the regularized terms in their original form. For illustration purposes, a real-world credit dataset is used to test the effectiveness of the C-VLSSVC model.
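
    The class-weighting idea can be illustrated with a plain regularized least-squares classifier on +/-1 labels, where errors on the important class receive a larger weight (a simplified stand-in for LSSVC; the data, weights and regularization constant are invented):

```python
import numpy as np

def weighted_ls_classifier(X, y, class_weight, lam=0.1):
    """Fit beta minimizing sum_i c_i (x_i' beta - y_i)^2 + lam ||beta||^2,
    where c_i depends on the class of sample i (the C-variable idea)."""
    c = np.array([class_weight[int(t)] for t in y])   # per-sample error weight
    Xb = np.hstack([X, np.ones((len(X), 1))])         # append a bias column
    XtWX = Xb.T @ (c[:, None] * Xb)
    beta = np.linalg.solve(XtWX + lam * np.eye(Xb.shape[1]), Xb.T @ (c * y))
    return beta

def predict(beta, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.sign(Xb @ beta)

# imbalanced toy data: 20 majority (-1) vs 5 minority (+1) samples;
# the minority ("important") class gets five times the error weight
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.3, (20, 1)), rng.normal(2.0, 0.3, (5, 1))])
y = np.r_[-np.ones(20), np.ones(5)]
beta = weighted_ls_classifier(X, y, class_weight={-1: 1.0, 1: 5.0})
```

    Up-weighting the minority class pulls the decision boundary away from it, which is exactly the effect the C-variable regularization parameter aims for.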

  11. Classification of ECG Using Chaotic Models

    Directory of Open Access Journals (Sweden)

    Khandakar Mohammad Ishtiak

    2012-09-01

    Full Text Available Chaotic analysis has been shown to be useful in a variety of medical applications, particularly in cardiology. Chaotic parameters have shown potential in the identification of diseases, especially in the analysis of biomedical signals like the electrocardiogram (ECG). In this work, the underlying chaos in ECG signals has been analyzed using various non-linear techniques. First, the ECG signal is processed through a series of steps to extract the QRS complex. From this extracted feature, the bit-to-bit interval (BBI) and instantaneous heart rate (IHR) have been calculated. Then some nonlinear parameters like standard deviation and coefficient of variation, and nonlinear techniques like the central tendency measure (CTM) and the phase space portrait, have been determined from both the BBI and IHR. The standard MIT-BIH database is used as the reference data, where each ECG record contains 650000 samples. CTM is calculated for both BBI and IHR for each ECG record of the database. A much higher value of CTM for IHR is observed for eleven patients with normal beats, with a mean of 0.7737 and SD of 0.0946. On the contrary, the CTM for IHR of eleven patients with abnormal rhythm shows low values, with a mean of 0.0833 and SD of 0.0748. CTM for BBI of the same eleven normal rhythm records also shows high values, with a mean of 0.6172 and SD of 0.1472, while CTM for BBI of eleven abnormal rhythm records shows low values, with a mean of 0.0478 and SD of 0.0308. The phase space portrait also demonstrates a visible attractor with little dispersion for a healthy person's ECG and a widely dispersed plot in the 2-D plane for an ailing person's ECG. These results indicate that ECG can be classified based on this chaotic modeling, which works on the nonlinear dynamics of the system.
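
    The central tendency measure can be sketched directly: count the fraction of second-difference scatter points that fall within a radius of the origin (the radius and the synthetic series below are assumptions for illustration):

```python
import numpy as np

def ctm(series, r):
    """Central tendency measure: fraction of points of the second-difference
    plot (d[i+1] vs d[i], with d the successive differences) within radius r."""
    d = np.diff(np.asarray(series, dtype=float))
    return float((np.hypot(d[:-1], d[1:]) < r).mean())

# a steady rate hugs the origin (high CTM); an erratic one scatters (low CTM)
rng = np.random.default_rng(0)
steady = 60.0 + 0.5 * rng.standard_normal(500)
erratic = 60.0 + 20.0 * rng.standard_normal(500)
```

    This is why the abstract's normal-rhythm records score high CTM while the arrhythmic ones score low: regular rhythms concentrate their difference plot near the origin.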

  12. Data Stream Classification Based on the Gamma Classifier

    Directory of Open Access Journals (Sweden)

    Abril Valeria Uriarte-Arcia

    2015-01-01

    Full Text Available The ever increasing data generation confronts us with the problem of handling massive amounts of information online. One of the biggest challenges is how to extract valuable information from these massive continuous data streams during a single scan. In a data stream context, data arrive continuously at high speed; therefore the algorithms developed to address this context must be efficient regarding memory and time management and capable of detecting changes over time in the underlying distribution that generated the data. This work describes a novel method for the task of pattern classification over a continuous data stream based on an associative model. The proposed method is based on the Gamma classifier, which is inspired by the Alpha-Beta associative memories; both are supervised pattern recognition models. The proposed method is capable of handling the space and time constraints inherent in data stream scenarios. The Data Streaming Gamma (DS-Gamma) classifier implements a sliding window approach to provide concept drift detection and a forgetting mechanism. In order to test the classifier, several experiments were performed using different data stream scenarios with real and synthetic data streams. The experimental results show that the method exhibits competitive performance when compared to other state-of-the-art algorithms.

  13. Object Based and Pixel Based Classification Using RapidEye Satellite Imagery of ETI-OSA, Lagos, Nigeria

    OpenAIRE

    Esther Oluwafunmilayo Makinde; Ayobami Taofeek Salami; James Bolarinwa Olaleye; Oluwapelumi Comfort Okewusi

    2016-01-01

    Several studies have been carried out to find an appropriate method to classify remote sensing data. Traditional classification approaches are all pixel-based and do not utilize the spatial information within an object, which is an important source of information for image classification. Thus, this study compared pixel based and object based classification algorithms using a RapidEye satellite image of Eti-Osa LGA, Lagos. In the object-oriented approach, the image was segmented to homog...

  14. A method for cloud detection and opacity classification based on ground based sky imagery

    Directory of Open Access Journals (Sweden)

    M. S. Ghonima

    2012-07-01

    Full Text Available Digital images of the sky obtained using a total sky imager (TSI) are classified pixel by pixel into clear sky, optically thin clouds and optically thick clouds. A new classification algorithm was developed that compares the pixel red-blue ratio (RBR) to the RBR of a clear sky library (CSL) generated from images captured on clear days. The difference, rather than the ratio, between pixel RBR and CSL RBR resulted in more accurate cloud classification. High correlation between TSI image RBR and aerosol optical depth (AOD) measured by an AERONET photometer was observed and motivated the addition of a haze correction factor (HCF) to the classification model to account for variations in AOD. Thresholds for clear and thick clouds were chosen based on a training image set and validated with a set of manually annotated images. Misclassifications of clear and thick clouds into the opposite category were less than 1%. Thin clouds were classified with an accuracy of 60%. Accurate cloud detection and opacity classification techniques will improve the accuracy of short-term solar power forecasting.
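
    The RBR-difference rule can be sketched per pixel as follows (the thresholds and the haze-correction handling are illustrative placeholders, not the paper's calibrated values):

```python
import numpy as np

def classify_sky(red, blue, csl_rbr, hcf=0.0, thin_thr=0.035, thick_thr=0.3):
    """Label pixels 0 = clear, 1 = thin cloud, 2 = thick cloud from the
    difference between the pixel red-blue ratio and the clear-sky-library RBR."""
    rbr = red / np.maximum(blue, 1e-6)       # guard against a zero blue channel
    diff = rbr - csl_rbr - hcf               # haze correction shifts the baseline
    out = np.zeros(diff.shape, dtype=np.uint8)
    out[diff >= thin_thr] = 1
    out[diff >= thick_thr] = 2
    return out
```

    Subtracting the clear-sky baseline, rather than dividing by it, is the paper's key refinement: it keeps the decision boundary stable across the sky dome.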

  15. A method for cloud detection and opacity classification based on ground based sky imagery

    Directory of Open Access Journals (Sweden)

    M. S. Ghonima

    2012-11-01

    Full Text Available Digital images of the sky obtained using a total sky imager (TSI) are classified pixel by pixel into clear sky, optically thin clouds and optically thick clouds. A new classification algorithm was developed that compares the pixel red-blue ratio (RBR) to the RBR of a clear sky library (CSL) generated from images captured on clear days. The difference, rather than the ratio, between pixel RBR and CSL RBR resulted in more accurate cloud classification. High correlation between TSI image RBR and aerosol optical depth (AOD) measured by an AERONET photometer was observed and motivated the addition of a haze correction factor (HCF) to the classification model to account for variations in AOD. Thresholds for clear and thick clouds were chosen based on a training image set and validated with a set of manually annotated images. Misclassifications of clear and thick clouds into the opposite category were less than 1%. Thin clouds were classified with an accuracy of 60%. Accurate cloud detection and opacity classification techniques will improve the accuracy of short-term solar power forecasting.

  16. An Adaptive Approach to Schema Classification for Data Warehouse Modeling

    Institute of Scientific and Technical Information of China (English)

    Hong-Ding Wang; Yun-Hai Tong; Shao-Hua Tan; Shi-Wei Tang; Dong-Qing Yang; Guo-Hui Sun

    2007-01-01

    Data warehouse (DW) modeling is a complicated task, involving both knowledge of business processes and familiarity with the structure and behavior of operational information systems. Existing DW modeling techniques suffer from two major drawbacks: the data-driven approach requires high levels of expertise and neglects the requirements of end users, while the demand-driven approach lacks enterprise-wide vision and disregards existing models of the underlying operational systems. To make up for those shortcomings, a method of classifying schema elements for DW modeling is proposed in this paper. We first put forward vector space models for subjects and schema elements, then present an adaptive approach with self-tuning theory to construct context vectors of subjects, and finally classify the source schema elements into the different subjects of the DW automatically. Benefiting from the result of the schema element classification, designers can model and construct a DW more easily.
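    The final step, assigning each schema element to the subject whose context vector it is closest to, can be illustrated with a minimal cosine-similarity sketch; the subject names and vectors are invented for illustration, and the paper's adaptive construction of the context vectors is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def classify_element(element_vec, subject_vecs):
    """Assign a schema element to the subject with the closest context vector."""
    return max(subject_vecs, key=lambda s: cosine(element_vec, subject_vecs[s]))

# Hypothetical subject context vectors over a tiny term space.
subjects = {"sales": [1, 0, 1], "inventory": [0, 1, 1]}
print(classify_element([1, 0, 0], subjects))  # sales
```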

  17. Hyperspectral Image Classification Based on the Weighted Probabilistic Fusion of Multiple Spectral-spatial Features

    Directory of Open Access Journals (Sweden)

    ZHANG Chunsen

    2015-08-01

    Full Text Available A hyperspectral image classification method based on the weighted probabilistic fusion of multiple spectral-spatial features is proposed in this paper. First, the minimum noise fraction (MNF) approach was employed to reduce the dimension of the hyperspectral image and extract its spectral feature; this spectral feature was then combined with the texture feature extracted from the gray-level co-occurrence matrix (GLCM), the multi-scale morphological feature extracted with the OFC operator, and the endmember feature extracted by the sequential maximum angle convex cone (SMACC) method, to form three spectral-spatial features. Afterwards, a support vector machine (SVM) classifier was applied to each spectral-spatial feature separately. Finally, we established the weighted probabilistic fusion model and applied it to fuse the SVM outputs into the final classification result. To verify the proposed method, ROSIS and AVIRIS images were used in our experiments, and the overall accuracy reached 97.65% and 96.62%, respectively. The results indicate that the proposed method not only overcomes the limitations of traditional single-feature hyperspectral image classification, but is also superior to the conventional VS-SVM method and the probabilistic fusion method, effectively improving the classification accuracy of hyperspectral images.
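    The fusion step, combining per-feature classifier posteriors with weights, can be sketched in a few lines; the class labels, probabilities, and weights below are invented, and real inputs would be SVM posterior estimates per pixel.

```python
def fuse(prob_maps, weights):
    """Weighted probabilistic fusion of per-feature class probabilities.

    prob_maps: one dict per spectral-spatial feature set, mapping
    class label -> probability (e.g. an SVM posterior estimate).
    weights: one weight per feature set.
    Returns the class with the highest fused score.
    """
    classes = prob_maps[0].keys()
    fused = {c: sum(w * p[c] for w, p in zip(weights, prob_maps))
             for c in classes}
    return max(fused, key=fused.get)

# Two feature sets disagree; the higher-weighted one wins.
maps = [{"corn": 0.7, "soy": 0.3}, {"corn": 0.4, "soy": 0.6}]
print(fuse(maps, weights=[0.6, 0.4]))  # corn
```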

  18. Accurate crop classification using hierarchical genetic fuzzy rule-based systems

    Science.gov (United States)

    Topaloglou, Charalampos A.; Mylonas, Stelios K.; Stavrakoudis, Dimitris G.; Mastorocostas, Paris A.; Theocharis, John B.

    2014-10-01

    This paper investigates the effectiveness of an advanced classification system for accurate crop classification using very high resolution (VHR) satellite imagery. Specifically, a recently proposed genetic fuzzy rule-based classification system (GFRBCS) is employed, namely, the Hierarchical Rule-based Linguistic Classifier (HiRLiC). HiRLiC's model comprises a small set of simple IF-THEN fuzzy rules, easily interpretable by humans. One of its most important attributes is that its learning algorithm requires minimal user interaction, since the most important learning parameters affecting the classification accuracy are determined by the learning algorithm automatically. HiRLiC is applied in a challenging crop classification task, using a SPOT5 satellite image over an intensively cultivated area in a lake-wetland ecosystem in northern Greece. A rich set of higher-order spectral and textural features is derived from the initial bands of the (pan-sharpened) image, resulting in an input space comprising 119 features. The experimental analysis proves that HiRLiC compares favorably to other interpretable classifiers in the literature, both in terms of structural complexity and classification accuracy. Its testing accuracy was very close to that obtained by complex state-of-the-art classification systems, such as the support vector machine (SVM) and random forest (RF) classifiers. Nevertheless, visual inspection of the derived classification maps shows that HiRLiC has better generalization properties, providing more homogeneous classifications than the competitors. Moreover, its runtime requirements for producing the thematic map were orders of magnitude lower than those of the competitors.

  19. Object-Based Classification of Abandoned Logging Roads under Heavy Canopy Using LiDAR

    Directory of Open Access Journals (Sweden)

    Jason Sherba

    2014-05-01

    Full Text Available LiDAR-derived slope models may be used to detect abandoned logging roads in steep forested terrain. An object-based classification approach to abandoned logging road detection was employed in this study. First, a slope model of the study site in Marin County, California was created from a LiDAR-derived DEM. Multiresolution segmentation was applied to the slope model and road seed objects were iteratively grown into candidate objects. A road classification accuracy of 86% was achieved using this fully automated procedure, and post-processing increased this accuracy to 90%. In order to assess the sensitivity of the road classification to LiDAR ground point spacing, the LiDAR ground point cloud was repeatedly thinned by a fraction of 0.5 and the classification procedure was reapplied. The producer's accuracy of the road classification declined from 79% with a ground point spacing of 0.91 to below 50% with a ground point spacing of 2, indicating the importance of high point density for accurate classification of abandoned logging roads.

  20. Computerized Classification Testing under the Generalized Graded Unfolding Model

    Science.gov (United States)

    Wang, Wen-Chung; Liu, Chen-Wei

    2011-01-01

    The generalized graded unfolding model (GGUM) has been recently developed to describe item responses to Likert items (agree-disagree) in attitude measurement. In this study, the authors (a) developed two item selection methods in computerized classification testing under the GGUM, the current estimate/ability confidence interval method and the cut…

  1. Modeling and evaluating repeatability and reproducibility of ordinal classifications

    NARCIS (Netherlands)

    J. de Mast; W.N. van Wieringen

    2010-01-01

    This paper argues that currently available methods for the assessment of the repeatability and reproducibility of ordinal classifications are not satisfactory. The paper aims to study whether we can modify a class of models from Item Response Theory, well established for the study of the reliability

  2. Fuzzy modeling of farmers' knowledge for land suitability classification

    NARCIS (Netherlands)

    Sicat, R.S.; Carranza, E.J.M.; Nidumolu, U.B.

    2005-01-01

    In a case study, we demonstrate fuzzy modeling of farmers' knowledge (FK) for agricultural land suitability classification using GIS. Capture of FK was through rapid rural participatory approach. The farmer respondents consider, in order of decreasing importance, cropping season, soil color, soil te

  3. Road geometry classification by adaptive shape models

    NARCIS (Netherlands)

    J.M. Álvarez; T. Gevers; F. Diego; A.M. López

    2012-01-01

    Vision-based road detection is important for different applications in transportation, such as autonomous driving, vehicle collision warning, and pedestrian crossing detection. Common approaches to road detection are based on low-level road appearance (e.g., color or texture) and neglect of the scen

  4. Investigation of the Effect of Traffic Parameters on Road Hazard Using Classification Tree Model

    Directory of Open Access Journals (Sweden)

    Md. Mahmud Hasan

    2012-09-01

    Full Text Available This paper presents a method for the identification of hazardous situations on freeways. For this study, an approximately 18 km long section of the Eastern Freeway in Melbourne, Australia was selected as a test bed. Three categories of data, i.e. traffic, weather and accident records, were used for the analysis and modelling. A classification tree-based model was developed to estimate the crash risk probability. In formulating the model, it was found that weather conditions did not have a significant impact on accident occurrence, so the classification tree was built using only two traffic indices: traffic flow and vehicle speed. The formulated classification tree is able to identify possible hazard and non-hazard situations on the freeway. The outcome of the study will aid hazard mitigation strategies.
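    A learned classification tree over the two retained traffic indices reduces to nested threshold rules. The sketch below hand-codes one such rule for illustration; the split values and the shape of the tree are invented, not the study's fitted model.

```python
def hazard(flow_vph, speed_kmh, flow_split=1800, speed_split=60):
    """Toy two-level classification tree over traffic flow (veh/h) and
    vehicle speed (km/h). Split values are illustrative placeholders."""
    if flow_vph > flow_split and speed_kmh > speed_split:
        return "hazard"       # heavy and fast traffic
    return "non-hazard"       # all other leaves of this toy tree

print(hazard(2000, 80))   # hazard
print(hazard(500, 50))    # non-hazard
```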

  5. A Multi-Dimensional Classification Model for Scientific Workflow Characteristics

    Energy Technology Data Exchange (ETDEWEB)

    Ramakrishnan, Lavanya; Plale, Beth

    2010-04-05

    Workflows have been used to model repeatable tasks or operations in manufacturing, business processes, and software. In recent years, workflows are increasingly used for orchestration of science discovery tasks that use distributed resources and web services environments through resource models such as grid and cloud computing. Workflows have disparate requirements and constraints that affect how they might be managed in distributed environments. In this paper, we present a multi-dimensional classification model illustrated by workflow examples obtained through a survey of scientists from different domains, including bioinformatics and biomedicine, weather and ocean modeling, and astronomy, detailing their data and computational requirements. The survey results and classification model contribute to a high-level understanding of scientific workflows.

  6. 3-Layered Bayesian Model Using in Text Classification

    Directory of Open Access Journals (Sweden)

    Chang Jiayu

    2013-01-01

    Full Text Available Naive Bayes is one of the most effective methods among text classification models. However, its computed results often deviate considerably from the true values, owing to attribute relevance and other factors. Starting from the degree of correlation, this study defines a node's degree as well as the relations between nodes, and proposes a 3-layered Bayesian model. Theoretical support for the 3-layered Bayesian model is obtained from the conditional probability recurrence formula. Both the theoretical analysis and the empirical comparison against Naive Bayes show that the model achieves better attribute selection and classification. It can also be generalized to a multi-layer Bayesian model for text classification.
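    The baseline the record builds on is plain Naive Bayes text classification, which can be sketched as follows; the 3-layer node-correlation extension is not shown, and the training documents and labels are invented for illustration.

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (tokens, label) pairs. Returns class priors,
    per-class word counts, and the vocabulary."""
    priors, counts, vocab = Counter(), {}, set()
    for tokens, label in docs:
        priors[label] += 1
        counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return priors, counts, vocab

def predict_nb(tokens, priors, counts, vocab):
    """Pick the label maximizing log P(label) + sum log P(token | label),
    with Laplace smoothing for unseen tokens."""
    total = sum(priors.values())
    best, best_lp = None, -math.inf
    for label, prior in priors.items():
        n = sum(counts[label].values())
        lp = math.log(prior / total)
        for t in tokens:
            lp += math.log((counts[label][t] + 1) / (n + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

docs = [(["cheap", "pills"], "spam"), (["meeting", "agenda"], "ham"),
        (["cheap", "meds"], "spam"), (["project", "agenda"], "ham")]
model = train_nb(docs)
print(predict_nb(["cheap", "agenda", "pills"], *model))  # spam
```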

  7. Vessel-guided airway segmentation based on voxel classification

    DEFF Research Database (Denmark)

    Lo, Pechin Chien Pau; Sporring, Jon; Ashraf, Haseem;

    2008-01-01

    This paper presents a method for improving airway tree segmentation using vessel orientation information. We use the fact that an airway branch is always accompanied by an artery, with both structures having similar orientations. This work is based on a  voxel classification airway segmentation...

  8. Classification and Target Group Selection Based Upon Frequent Patterns

    NARCIS (Netherlands)

    W.H.L.M. Pijls (Wim); R. Potharst (Rob)

    2000-01-01

    textabstractIn this technical report , two new algorithms based upon frequent patterns are proposed. One algorithm is a classification method. The other one is an algorithm for target group selection. In both algorithms, first of all, the collection of frequent patterns in the training set is constr

  9. Pulse frequency classification based on BP neural network

    Institute of Scientific and Technical Information of China (English)

    WANG Rui; WANG Xu; YANG Dan; FU Rong

    2006-01-01

    In Traditional Chinese Medicine (TCM), analysis of the pulse frequency is an important parameter in clinical disease diagnosis. Following the eight major essentials of the pulse, this article identifies pulse types through pulse frequency classification based on back-propagation neural networks (BPNN). The pulse frequency classes include the slow pulse, moderate pulse, rapid pulse, etc. Through research on the feature parameters of pulse frequency analysis, an identification system for pulse frequency features is established. From the pulse signal acquired by the detection system, feature parameters such as period and frequency are extracted and compared with the standard feature values of each pulse type. The results show that the identification rate reaches 92.5% or above.
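    The frequency-to-category step can be sketched as a simple mapping from measured pulse frequency to a rate class; the boundary values below are rough conventional rate ranges used for illustration, not the article's BPNN decision boundaries.

```python
def classify_pulse(freq_hz):
    """Map a measured pulse frequency (Hz) to a TCM pulse-rate category.
    Boundaries are illustrative (roughly <60, 60-90, >90 beats/min)."""
    bpm = freq_hz * 60  # beats per minute
    if bpm < 60:
        return "slow pulse"
    if bpm <= 90:
        return "moderate pulse"
    return "rapid pulse"

print(classify_pulse(1.2))  # 72 bpm -> moderate pulse
```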

  10. Refining personality disorder subtypes and classification using finite mixture modeling.

    Science.gov (United States)

    Yun, Rebecca J; Stern, Barry L; Lenzenweger, Mark F; Tiersky, Lana A

    2013-04-01

    The current Diagnostic and Statistical Manual of Mental Disorders (DSM) diagnostic system for Axis II disorders continues to be characterized by considerable heterogeneity and poor discriminant validity. Such problems impede accurate personality disorder (PD) diagnosis. As a result, alternative assessment tools are often used in conjunction with the DSM. One popular framework is the object relational model developed by Kernberg and his colleagues (J. F. Clarkin, M. F. Lenzenweger, F. Yeomans, K. N. Levy, & O. F. Kernberg, 2007, An object relations model of borderline pathology, Journal of Personality Disorders, Vol. 21, pp. 474-499; O. F. Kernberg, 1984, Severe Personality Disorders, New Haven, CT: Yale University Press; O. F. Kernberg & E. Caligor, 2005, A psychoanalytic theory of personality disorders, in M. F. Lenzenweger & J. F. Clarkin, Eds., Major Theories of Personality Disorder, New York, NY: Guilford Press). Drawing on this model and empirical studies thereof, the current study attempted to clarify Kernberg's (1984) PD taxonomy and identify subtypes within a sample with varying levels of personality pathology using finite mixture modeling. Subjects (N = 141) were recruited to represent a wide range of pathology. The finite mixture modeling results indicated that 3 components were harbored within the variables analyzed. Group 1 was characterized by low levels of antisocial, paranoid, and aggressive features, and Group 2 was characterized by elevated paranoid features. Group 3 revealed the highest levels across the 3 variables. The validity of the obtained solution was then evaluated by reference to a variety of external measures that supported the validity of the identified grouping structure. Findings generally appear congruent with previous research, which argued that a PD taxonomy based on paranoid, aggressive, and antisocial features is a viable supplement to current diagnostic systems. 
Our study suggests that Kernberg's object relational model offers a

  11. Hyperspectral image classification based on volumetric texture and dimensionality reduction

    Science.gov (United States)

    Su, Hongjun; Sheng, Yehua; Du, Peijun; Chen, Chen; Liu, Kui

    2015-06-01

    A novel approach using volumetric texture and reduced-spectral features is presented for hyperspectral image classification. In this approach, the volumetric textural features were extracted by volumetric gray-level co-occurrence matrices (VGLCM). The spectral features were extracted by minimum estimated abundance covariance (MEAC) and linear prediction (LP)-based band selection, and by a semi-supervised k-means (SKM) clustering method with deletion of the worst cluster (SKMd) band-clustering algorithm. Moreover, four feature combination schemes were designed for hyperspectral image classification using spectral and textural features. It has been proven that the proposed method using VGLCM outperforms the gray-level co-occurrence matrices (GLCM) method, and the experimental results indicate that combining spectral information with volumetric textural features leads to improved classification performance in hyperspectral imagery.
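    A plain 2-D GLCM and one derived texture feature can be computed in a few lines; the VGLCM of the paper extends the offsets into the spectral dimension, which this sketch does not show, and the tiny image is invented.

```python
def glcm(img, dx=1, dy=0, levels=4):
    """Gray-level co-occurrence matrix for a single pixel offset (dx, dy)
    over a small quantized image (pixel values in 0..levels-1)."""
    m = [[0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                m[img[y][x]][img[y2][x2]] += 1
    return m

def contrast(m):
    """GLCM contrast feature: sum over (i, j) of (i - j)^2 * p(i, j)."""
    total = sum(sum(row) for row in m)
    n = len(m)
    return sum((i - j) ** 2 * m[i][j]
               for i in range(n) for j in range(n)) / total

img = [[0, 0, 1], [0, 2, 2], [3, 3, 3]]
print(round(contrast(glcm(img)), 3))  # 0.833
```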

  12. A classification of the multiple criteria decision making models

    OpenAIRE

    Guerras Martín, Luis Angel

    1987-01-01

    In this work we present a classification of multiobjective techniques based on the relationship between the main subjects of the decision process: the analyst and the decision maker. These relations, expressed in terms of information flows, have important consequences for decision-making processes in business organizations.

  13. Classification of consumers based on perceptions

    DEFF Research Database (Denmark)

    Høg, Esben; Juhl, Hans Jørn; Poulsen, Carsten Stig

    1999-01-01

    This paper reports some results from a recent Danish study of fish consumption. One purpose of the study was to identify consumer segments according to their perceptions of fish in comparison with other food categories. We present a model, which has the capabilities to determine the number...

  14. Classification of consumers based on perceptions

    DEFF Research Database (Denmark)

    Høg, Esben; Juhl, Hans Jørn; Poulsen, Carsten Stig

    1999-01-01

    This paper reports some results from a recent Danish study of fish consumption. One major purpose of the study was to identify consumer segments according to their perceptions of fish in comparison with other food categories. We present a model which has the capabilities to determine the number...

  15. DPClass: An Effective but Concise Discriminative Patterns-Based Classification Framework

    Science.gov (United States)

    Shang, Jingbo; Tong, Wenzhu; Peng, Jian; Han, Jiawei

    2017-01-01

    Pattern-based classification was originally proposed to improve accuracy using selected frequent patterns, where many efforts were devoted to pruning a huge number of non-discriminative frequent patterns. On the other hand, tree-based models have shown strong abilities on many classification tasks, since they can easily build high-order interactions between different features and can handle numerical, categorical, and high-dimensional features alike. By taking advantage of both modeling methodologies, we propose a natural and effective way to resolve pattern-based classification by adopting discriminative patterns, which are the prefix paths from the root to nodes in tree-based models (e.g., random forest). Moreover, we further compress the number of discriminative patterns by selecting the most effective pattern combinations that fit into a generalized linear model. As a result, our discriminative pattern-based classification framework (DPClass) performs as well as previous state-of-the-art algorithms, provides great interpretability by utilizing only a very limited number of discriminative patterns, and predicts new data extremely fast. More specifically, in our experiments, DPClass gained even better accuracy using only the top-20 discriminative patterns. The framework so generated is very concise and highly explanatory to human experts.
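    The core idea, treating every root-to-node prefix path of a decision tree as a candidate discriminative pattern, can be sketched on a hand-built toy tree; the tree structure and feature names are invented, and the subsequent selection via a generalized linear model is not shown.

```python
def prefix_paths(tree, path=()):
    """Enumerate all root-to-node prefix paths of a decision tree.

    Internal nodes are tuples (feature, threshold, left, right); leaves
    are plain labels. Each prefix path is one candidate discriminative
    pattern in the DPClass sense.
    """
    if not isinstance(tree, tuple):
        return []  # a leaf contributes no further branching conditions
    feat, thr, left, right = tree
    out = []
    for branch, sub in ((f"{feat}<={thr}", left), (f"{feat}>{thr}", right)):
        p = path + (branch,)
        out.append(p)                       # the prefix itself is a pattern
        out.extend(prefix_paths(sub, p))    # plus all deeper prefixes
    return out

# Toy tree: split on x1, then on x2 in the right branch.
tree = ("x1", 0.5, "neg", ("x2", 2.0, "neg", "pos"))
for p in prefix_paths(tree):
    print(" AND ".join(p))
```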

  16. Integrative disease classification based on cross-platform microarray data

    Directory of Open Access Journals (Sweden)

    Huang Haiyan

    2009-01-01

    Full Text Available Abstract Background Disease classification has been an important application of microarray technology. However, most microarray-based classifiers can only handle data generated within the same study, since microarray data generated by different laboratories or with different platforms cannot be compared directly due to systematic variations. This issue has severely limited the practical use of microarray-based disease classification. Results In this study, we tested the feasibility of disease classification by integrating the large amount of heterogeneous microarray datasets from the public microarray repositories. Cross-platform data compatibility is created by deriving expression log-rank ratios within datasets. One may then compare vectors of log-rank ratios across datasets. In addition, we systematically map textual annotations of datasets to concepts in the Unified Medical Language System (UMLS), permitting quantitative analysis of the phenotype "distance" between datasets and automated construction of disease classes. We design a new classification approach named ManiSVM, which integrates Manifold data transformation with SVM learning to exploit the data properties. Using leave-one-dataset-out cross-validation, ManiSVM achieved an overall accuracy of 70.7% (68.6% precision and 76.9% recall), with many disease classes achieving an accuracy higher than 80%. Conclusion Our results not only demonstrated the feasibility of the integrated disease classification approach, but also showed that the classification accuracy increases with the number of homogeneous training datasets. Thus, the power of the integrative approach will increase with the continuous accumulation of microarray data in public repositories. Our study shows that automated disease diagnosis can be an important and promising application of the enormous amount of costly-to-generate, yet freely available, public microarray data.
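    A rank-based within-sample transform of the kind described makes expression vectors comparable across platforms. The sketch below is one plausible reading of "log-rank ratio" (log of each gene's within-sample rank over the median rank); the paper's exact definition may differ, and the values are invented.

```python
import math

def log_rank_ratios(values):
    """One plausible within-sample transform: rank each gene's expression
    inside the sample, then take the log of rank / median rank. The
    result depends only on ordering, not on platform-specific scale."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    med = (len(values) + 1) / 2  # median rank
    return [math.log(r / med) for r in ranks]

# Middle-expressed gene maps to 0; lower/higher genes to -/+ values.
print([round(v, 3) for v in log_rank_ratios([5.0, 1.0, 9.0])])
```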

  17. AN INTELLIGENT CLASSIFICATION MODEL FOR PHISHING EMAIL DETECTION

    Directory of Open Access Journals (Sweden)

    Adwan Yasin

    2016-07-01

    Full Text Available Phishing attacks are one of the trending cyber-attacks, applying socially engineered messages crafted by professional hackers to fool users into revealing their sensitive information; the most popular communication channel for those messages is users' email. This paper presents an intelligent classification model for detecting phishing emails using knowledge discovery, data mining and text processing techniques. It introduces the concept of phishing term weighting, which evaluates the weight of phishing terms in each email. The pre-processing phase is enhanced by applying text stemming and the WordNet ontology to enrich the model with word synonyms. The model applied the knowledge discovery procedures using five popular classification algorithms and achieved a notable enhancement in classification accuracy; 99.1% accuracy was achieved using the Random Forest algorithm and 98.4% using J48, which is, to our knowledge, the highest accuracy rate for an accredited data set. This paper also presents a comparative study with similar proposed classification techniques.
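    The phishing-term-weighting idea can be sketched as a weighted lookup with a synonym map standing in for the WordNet expansion; the terms, weights, and synonym pairs are invented for illustration, and the downstream classifiers are not shown.

```python
def phishing_score(tokens, term_weights, synonyms=None):
    """Score an email by summing the weights of known phishing terms.

    Tokens match directly or through a synonym map (a stand-in for the
    WordNet-based enrichment); terms and weights are hypothetical.
    """
    synonyms = synonyms or {}
    score = 0.0
    for t in tokens:
        t = synonyms.get(t, t)              # normalize synonyms
        score += term_weights.get(t, 0.0)   # 0 for non-phishing terms
    return score

weights = {"verify": 2.0, "account": 1.5, "urgent": 1.0}
syn = {"confirm": "verify"}
email = ["please", "confirm", "your", "account"]
print(phishing_score(email, weights, syn))  # 3.5
```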

  18. Invariance Properties for General Diagnostic Classification Models

    Science.gov (United States)

    Bradshaw, Laine P.; Madison, Matthew J.

    2016-01-01

    In item response theory (IRT), the invariance property states that item parameter estimates are independent of the examinee sample, and examinee ability estimates are independent of the test items. While this property has long been established and understood by the measurement community for IRT models, the same cannot be said for diagnostic…

  19. Hardware Accelerators Targeting a Novel Group Based Packet Classification Algorithm

    Directory of Open Access Journals (Sweden)

    O. Ahmed

    2013-01-01

    Full Text Available Packet classification is a ubiquitous and key building block for many critical network devices. However, it remains one of the main bottlenecks faced when designing fast network devices. In this paper, we propose a novel Group Based Search packet classification Algorithm (GBSA) that is scalable, fast, and efficient. GBSA consumes an average of 0.4 megabytes of memory for a 10 k rule set. The worst-case classification time per packet is 2 microseconds, and the preprocessing speed is 3 M rules/second on a Xeon processor operating at 3.4 GHz. When compared with other state-of-the-art classification techniques, the results showed that GBSA outperforms the competition with respect to speed, memory usage, and processing time. Moreover, GBSA is amenable to implementation in hardware. Three different hardware implementations are also presented in this paper, including an Application Specific Instruction Set Processor (ASIP) implementation and two pure Register-Transfer Level (RTL) implementations based on Impulse-C and Handel-C flows, respectively. Speedups achieved with these hardware accelerators ranged from 9x to 18x compared with a pure software implementation running on a Xeon processor.

  20. Agricultural Land Classification Based on Statistical Analysis of Full Polarimetric SAR Data

    Science.gov (United States)

    Mahdian, M.; Homayouni, S.; Fazel, M. A.; Mohammadimanesh, F.

    2013-09-01

    The discrimination capability of Polarimetric Synthetic Aperture Radar (PolSAR) data makes it a unique source of information with a significant contribution to environmental applications. One of the most important applications of these data is land cover classification of the earth's surface. This data type enables more detailed classification of phenomena by using physical parameters and scattering mechanisms. In this paper, we propose a contextual unsupervised classification approach for full PolSAR data which allows the use of multiple sources of statistical evidence. The Expectation-Maximization (EM) classification algorithm is performed to estimate the land cover classes. The EM algorithm is an iterative algorithm that formalizes the problem of estimating the parameters of a mixture distribution. To represent the statistical properties and integrate contextual information of the associated image data into the analysis process, we used the Markov random field (MRF) modelling technique. This model is developed by formulating the maximum a posteriori decision rule as the minimization of suitable energy functions. To select the optimum distribution that fits the data most efficiently, we used the Mellin transform, a natural analytical tool for studying the distributions of products and quotients of independent random variables. Our proposed classification method is applied to a full polarimetric L-band dataset acquired over an agricultural region in Winnipeg, Canada. We evaluate the classification performance based on the kappa and overall accuracies of the proposed approach, compared with other well-known classic methods.
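    The EM mixture-estimation core can be sketched on a much-reduced problem: a two-component 1-D Gaussian mixture. Real PolSAR data would use Wishart or Mellin-selected densities and add the MRF contextual term, none of which is shown; the data values are invented.

```python
import math

def em_gmm(xs, iters=50):
    """EM for a two-component 1-D Gaussian mixture (a toy stand-in for
    the PolSAR mixture estimation). Returns means, variances, weights."""
    mu = [min(xs), max(xs)]      # crude initialization
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibilities of each component for each point
        resp = []
        for x in xs:
            p = [pi[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate the mixture parameters
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, xs)) / nk, 1e-6)
            pi[k] = nk / len(xs)
    return mu, var, pi

xs = [0.1, 0.2, 0.0, 5.0, 5.2, 4.9]
mu, var, pi = em_gmm(xs)
print(sorted(round(m, 1) for m in mu))  # [0.1, 5.0]
```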

  1. Research on Model of Content Contrast of Standard Documents Based on Text Classification%基于文本分类的标准文献内容比对模型研究

    Institute of Scientific and Technical Information of China (English)

    刘嘉谊; 刘高勇

    2015-01-01

    Based on an analysis of standard document structure and text classification, this paper puts forward a text-classification-based model for content comparison of standard documents, to realize rapid extraction and automatic classification of standard document content, support easy and quick standards comparison by related technical personnel and enterprises, and provide methods and strategies for the sustainable development of standards comparison work.

  2. Classification Model with High Deviation for Intrusion Detection on System Call Traces

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    A new classification model for host intrusion detection, based on unidentified short sequences and the RIPPER algorithm, is proposed. The concepts of the different short sequences on system call traces are strictly defined on the basis of an in-depth analysis of the completeness and correctness of pattern databases. Labels of short sequences are predicted by the learned RIPPER rule set, and the nature of the unidentified short sequences is confirmed by a statistical method. Experimental results indicate that the classification model clearly increases the deviation between attack and normal traces and improves detection capability against both known and unknown attacks.
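    The short-sequence representation of a system-call trace is a sliding window, and an anomaly signal can be derived from the fraction of windows missing from a normal-pattern database. The sketch below illustrates that representation; the traces, window length, and scoring rule are invented, and the RIPPER learning step is not shown.

```python
def short_sequences(trace, k=3):
    """All length-k sliding-window short sequences over a system-call trace."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]

def unidentified(trace, normal_db, k=3):
    """Fraction of a trace's short sequences absent from the normal
    pattern database; a high fraction suggests an attack."""
    seqs = short_sequences(trace, k)
    unknown = sum(1 for s in seqs if s not in normal_db)
    return unknown / len(seqs)

# Build a toy normal database from one benign trace.
normal = set(short_sequences(["open", "read", "write", "close"], 3))
print(unidentified(["open", "read", "exec", "close"], normal))  # 1.0
```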

  3. A novel hybrid classification model of genetic algorithms, modified k-Nearest Neighbor and developed backpropagation neural network.

    Science.gov (United States)

    Salari, Nader; Shohaimi, Shamarina; Najafi, Farid; Nallappan, Meenakshii; Karishnarajah, Isthrinayagy

    2014-01-01

    Among numerous artificial intelligence approaches, k-Nearest Neighbor algorithms, genetic algorithms, and artificial neural networks are considered the most common and effective methods for classification problems in numerous studies. In the present study, the results of the implementation of a novel hybrid feature selection-classification model using the above-mentioned methods are presented. The purpose is to benefit from the synergies obtained from combining these technologies for the development of classification models. Such a combination creates an opportunity to invest in the strength of each algorithm and is an approach to make up for their deficiencies. To develop the proposed model, with the aim of obtaining the best array of features, first, feature ranking techniques such as the Fisher's discriminant ratio and class separability criteria were used to prioritize features. Second, the obtained results, which included arrays of the top-ranked features, were used as the initial population of a genetic algorithm to produce optimum arrays of features. Third, using a modified k-Nearest Neighbor method as well as an improved method of backpropagation neural networks, the classification process was advanced based on the optimum arrays of features selected by the genetic algorithm. The performance of the proposed model was compared with thirteen well-known classification models on seven datasets. Furthermore, statistical analysis was performed using the Friedman test followed by post-hoc tests. The experimental findings indicated that the novel proposed hybrid model resulted in significantly better classification performance compared with all 13 classification methods. Finally, the performance results of the proposed model were benchmarked against the best ones reported as the state-of-the-art classifiers in terms of classification accuracy for the same data sets.
The substantial findings of the comprehensive comparative study revealed that performance of the
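    The first stage, ranking features by Fisher's discriminant ratio before seeding the genetic algorithm, can be sketched as follows; the two-class toy data are invented, and the GA, modified k-NN, and BPNN stages are not shown.

```python
def fisher_ratio(xs_a, xs_b):
    """Fisher's discriminant ratio for one feature across two classes:
    (difference of class means)^2 / (sum of class variances)."""
    def mean(v):
        return sum(v) / len(v)
    def var(v):
        m = mean(v)
        return sum((x - m) ** 2 for x in v) / len(v)
    return (mean(xs_a) - mean(xs_b)) ** 2 / (var(xs_a) + var(xs_b) + 1e-12)

def rank_features(class_a, class_b):
    """Rank feature indices by decreasing Fisher ratio; the top-ranked
    arrays would seed the genetic algorithm's initial population."""
    scores = [fisher_ratio([row[i] for row in class_a],
                           [row[i] for row in class_b])
              for i in range(len(class_a[0]))]
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)

a = [[1.0, 5.0], [1.1, 4.0], [0.9, 6.0]]
b = [[3.0, 5.1], [3.2, 4.2], [2.8, 5.9]]
print(rank_features(a, b))  # feature 0 separates the classes best
```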

  4. Multiple Sclerosis and Employment: A Research Review Based on the International Classification of Function

    Science.gov (United States)

    Frain, Michael P.; Bishop, Malachy; Rumrill, Phillip D., Jr.; Chan, Fong; Tansey, Timothy N.; Strauser, David; Chiu, Chung-Yi

    2015-01-01

    Multiple sclerosis (MS) is an unpredictable, sometimes progressive chronic illness affecting people in the prime of their working lives. This article reviews the effects of MS on employment based on the World Health Organization's International Classification of Functioning, Disability and Health model. Correlations between employment and…

  5. Trace elements based classification on clinkers. Application to Spanish clinkers

    OpenAIRE

    Tamás, F. D.; Abonyi, J.; Puertas, F.

    2001-01-01

    A qualitative identification method to determine the origin (i.e. the manufacturing factory) of Spanish clinkers is described. The classification of clinkers produced in different factories can be based on their trace element content. Approximately fifteen clinker sorts, collected from 11 Spanish cement factories, were analysed to determine their Mg, Sr, Ba, Mn, Ti, Zr, Zn and V content. An expert system formulated as a binary decision tree is designed based on the collected data. The performance of the...

  6. Keypoint Density-Based Region Proposal for Fine-Grained Object Detection and Classification Using Regions with Convolutional Neural Network Features

    Science.gov (United States)

    2015-12-15

    …convolution, activation functions, and pooling. For a model trained on … classes, the output from the classification layer comprises … + 1 … While the strengths of Convolutional Neural Networks (CNNs) enable them to outperform conventional techniques on standard object detection and classification tasks, their …

  7. A markov classification model for metabolic pathways

    Directory of Open Access Journals (Sweden)

    Mamitsuka Hiroshi

    2010-01-01

Full Text Available Abstract Background This paper considers the problem of identifying pathways through metabolic networks that relate to a specific biological response. Our proposed model, HME3M, first identifies frequently traversed network paths using a Markov mixture model. Then, by employing a hierarchical mixture of experts, separate classifiers are built using information specific to each path and combined into an ensemble prediction for the response. Results We compared the performance of HME3M with logistic regression and support vector machines (SVM) for both simulated pathways and two metabolic networks, glycolysis and the pentose phosphate pathway, for Arabidopsis thaliana. We use AltGenExpress microarray data and focus on pathway differences in the developmental stages and stress responses of Arabidopsis. The results clearly show that HME3M outperformed the comparison methods in the presence of increasing network complexity and pathway noise. Furthermore, an analysis of the paths identified by HME3M for each metabolic network confirmed known biological responses of Arabidopsis. Conclusions This paper clearly shows HME3M to be an accurate and robust method for classifying metabolic pathways. HME3M outperforms all comparison methods and is further capable of identifying known biologically active pathways within microarray data.
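
The path-scoring building block of such a Markov mixture model can be sketched as follows. This is a hedged illustration with a made-up four-node network and transition matrix, not the HME3M code: a path's probability under a first-order Markov chain is its start probability times the product of its transition probabilities.

```python
import numpy as np

# Toy transition matrix over 4 network nodes (rows sum to <= 1; node 3 is
# a terminal node). All values are illustrative, not from the paper.
T = np.array([
    [0.0, 0.7, 0.3, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
])
start = np.array([1.0, 0.0, 0.0, 0.0])  # all paths begin at node 0

def path_probability(path):
    """Probability of traversing `path`: start prob times product of transitions."""
    p = start[path[0]]
    for a, b in zip(path, path[1:]):
        p *= T[a, b]
    return p

print(path_probability([0, 1, 3]))  # 0.7 * 0.5 = 0.35
print(path_probability([0, 2, 3]))  # 0.3 * 1.0 = 0.3
```

A mixture model would maintain several such transition matrices and weight them by how often sampled paths fall under each component.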

  8. Classification of Mental Disorders Based on Temperament

    Directory of Open Access Journals (Sweden)

    Nadi Sakhvidi

    2015-08-01

Full Text Available Context Different paradoxical theories are available regarding psychiatric disorders. The current study aimed to establish a more comprehensive overall approach. Evidence Acquisition This basic study examined ancient medical books. “The Canon” by Avicenna and “Comprehensive Textbook of Psychiatry” by Kaplan and Sadock were the most important and most frequently consulted books in this study. Results Four groups of temperaments were identified: high active, high flexible; high active, low flexible; low active, low flexible; and low active, high flexible. When temperament deteriorates, personality disorders and non-psychotic and psychotic psychiatric disorders can develop. Conclusions Temperaments can provide a basis for classifying psychiatric disorders, and psychiatric disorders can be placed on a spectrum based on temperament.

  9. Improving EEG-Based Driver Fatigue Classification Using Sparse-Deep Belief Networks

    Science.gov (United States)

    Chai, Rifai; Ling, Sai Ho; San, Phyo Phyo; Naik, Ganesh R.; Nguyen, Tuan N.; Tran, Yvonne; Craig, Ashley; Nguyen, Hung T.

    2017-01-01

This paper presents an improvement in classification performance for electroencephalography (EEG)-based driver fatigue classification between fatigue and alert states, with data collected from 43 participants. The system employs autoregressive (AR) modeling as the feature extraction algorithm and sparse-deep belief networks (sparse-DBN) as the classification algorithm. Compared to other classifiers, sparse-DBN is a semi-supervised learning method that combines unsupervised learning for modeling features in the pre-training layer with supervised learning for classification in the following layer. The sparsity in sparse-DBN is achieved with a regularization term that penalizes deviation of the expected activation of hidden units from a fixed low level; this prevents the network from overfitting and enables it to learn low-level as well as high-level structures. For comparison, artificial neural network (ANN), Bayesian neural network (BNN), and original deep belief network (DBN) classifiers are used. The classification results show that using the AR feature extractor and DBN classifier, performance improves to a sensitivity of 90.8%, a specificity of 90.4%, an accuracy of 90.6%, and an area under the receiver operating curve (AUROC) of 0.94, compared to the ANN (sensitivity 80.8%, specificity 77.8%, accuracy 79.3%, AUROC 0.83) and BNN classifiers (sensitivity 84.3%, specificity 83%, accuracy 83.6%, AUROC 0.87). Using the sparse-DBN classifier, classification performance improves further, with a sensitivity of 93.9%, a specificity of 92.3%, and an accuracy of 93.1% with an AUROC of 0.96. Overall, the sparse-DBN classifier improves accuracy by 13.8, 9.5, and 2.5% over the ANN, BNN, and DBN classifiers, respectively. PMID:28326009

  10. Cardiac arrhythmia classification based on multiple-lead electrocardiogram signals and a multivariate autoregressive modeling method

    Institute of Scientific and Technical Information of China (English)

    葛丁飞; 李时辉; Krishnan S. M.

    2004-01-01

    Artificial-intelligence analysis of electrocardiogram (ECG) signals is of great benefit to automatic diagnosis in critically ill patients. Multivariate autoregressive (MAR) modeling for the classification of cardiac arrhythmias is introduced. The MAR coefficients, and the K-L transformation of the MAR coefficients, extracted from two-lead ECG signals are used to represent the signals. The ECG data obtained from the MIT-BIH database included normal sinus rhythm (NSR), atrial premature contraction (APC), premature ventricular contraction (PVC), ventricular tachycardia (VT), and ventricular fibrillation (VF), with 300 sample signals per class. Classification was performed using a tree-structured decision process with a quadratic discriminant function (QDF) classifier at each stage. The results showed that a MAR order of 4 was sufficient for classification, and MAR coefficients produced slightly better results than their K-L transformation. A classification accuracy of 97.3% to 98.6% based on MAR coefficients was obtained.
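
The AR-coefficient feature extraction described above can be sketched for the scalar case. The signal, noise level, and generating coefficients below are synthetic, and a plain least-squares fit stands in for the full multivariate (two-lead) estimator; the recovered coefficients are the kind of feature vector a downstream QDF classifier would consume.

```python
import numpy as np

# Generate a synthetic AR(4) signal with known coefficients (illustrative).
rng = np.random.default_rng(0)
true_coefs = np.array([0.5, -0.3, 0.2, -0.1])
x = np.zeros(500)
for t in range(4, 500):
    x[t] = true_coefs @ x[t - 4:t][::-1] + 0.1 * rng.standard_normal()

p = 4  # model order; the study found order 4 sufficient
# Lagged design matrix: the row predicting x[t] holds x[t-1], ..., x[t-p].
X = np.column_stack([x[p - k - 1:len(x) - k - 1] for k in range(p)])
y = x[p:]
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coefs, 2))  # approximately the generating coefficients
```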

  11. QSAR models for oxidation of organic micropollutants in water based on ozone and hydroxyl radical rate constants and their chemical classification

    KAUST Repository

    Sudhakaran, Sairam

    2013-03-01

Ozonation is an oxidation process for the removal of organic micropollutants (OMPs) from water, and the chemical reaction is governed by second-order kinetics. An advanced oxidation process (AOP), wherein hydroxyl radicals (OH radicals) are generated, is more effective in removing a wider range of OMPs from water than direct ozonation. Second-order rate constants (kOH and kO3) are good indices to estimate oxidation efficiency, where higher rate constants indicate more rapid oxidation. In this study, quantitative structure activity relationship (QSAR) models for O3 and AOP processes were developed, and the rate constants kOH and kO3 were predicted from target compound properties. The kO3 and kOH values ranged from 5 × 10⁻⁴ to 10⁵ M⁻¹s⁻¹ and from 0.04 to 18 × 10⁹ M⁻¹s⁻¹, respectively. Several molecular descriptors which potentially influence O3 and OH radical oxidation were identified and studied. The QSAR-defining descriptors were double bond equivalence (DBE), ionisation potential (IP), electron affinity (EA) and the weakly-polar component of solvent accessible surface area (WPSA), and the chemical and statistical significance of these descriptors was discussed. Multiple linear regression was used to build the QSAR models, resulting in high goodness-of-fit, r² (> 0.75). The models were validated by internal and external validation along with residual plots. © 2012 Elsevier Ltd.
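
A multiple-linear-regression QSAR fit of the kind described can be sketched as follows. Descriptor values, coefficients, and rate constants are synthetic stand-ins; DBE, IP, EA, and WPSA appear only as name placeholders for the descriptors named in the abstract.

```python
import numpy as np

# Synthetic "compounds": 40 rows of four descriptor values in [0, 1].
rng = np.random.default_rng(1)
n = 40
descriptors = rng.uniform(0, 1, size=(n, 4))          # [DBE, IP, EA, WPSA]
log_k = descriptors @ np.array([2.0, -1.5, 0.8, 0.5]) + 3.0 \
        + 0.05 * rng.standard_normal(n)               # log10 rate constant

X = np.column_stack([np.ones(n), descriptors])        # prepend intercept column
beta, *_ = np.linalg.lstsq(X, log_k, rcond=None)      # MLR fit
pred = X @ beta
ss_res = np.sum((log_k - pred) ** 2)
ss_tot = np.sum((log_k - log_k.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(round(r2, 3))  # goodness of fit; > 0.75 indicates a usable model
```

External validation, as in the study, would hold out compounds not used in the fit and recompute r² on their predictions.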

  12. Zone-specific logistic regression models improve classification of prostate cancer on multi-parametric MRI

    Energy Technology Data Exchange (ETDEWEB)

    Dikaios, Nikolaos; Halligan, Steve; Taylor, Stuart; Atkinson, David; Punwani, Shonit [University College London, Centre for Medical Imaging, London (United Kingdom); University College London Hospital, Departments of Radiology, London (United Kingdom); Alkalbani, Jokha; Sidhu, Harbir Singh [University College London, Centre for Medical Imaging, London (United Kingdom); Abd-Alazeez, Mohamed; Ahmed, Hashim U.; Emberton, Mark [University College London, Research Department of Urology, Division of Surgery and Interventional Science, London (United Kingdom); Kirkham, Alex [University College London Hospital, Departments of Radiology, London (United Kingdom); Freeman, Alex [University College London Hospital, Department of Histopathology, London (United Kingdom)

    2015-09-15

To assess the interchangeability of zone-specific (peripheral-zone (PZ) and transition-zone (TZ)) multiparametric-MRI (mp-MRI) logistic-regression (LR) models for classification of prostate cancer. Two hundred and thirty-one patients (70 TZ training-cohort; 76 PZ training-cohort; 85 TZ temporal validation-cohort) underwent mp-MRI and transperineal-template-prostate-mapping biopsy. PZ and TZ uni/multi-variate mp-MRI LR-models for classification of significant cancer (any cancer-core-length (CCL) with Gleason > 3 + 3 or any grade with CCL ≥ 4 mm) were derived from the respective cohorts and validated within the same zone by leave-one-out analysis. Inter-zonal performance was tested by applying TZ models to the PZ training-cohort and vice versa. Classification performance of TZ models for TZ cancer was further assessed in the TZ validation-cohort. ROC area-under-curve (ROC-AUC) analysis was used to compare models. The univariate parameters with the best classification performance were the normalized T2 signal (T2nSI) within the TZ (ROC-AUC = 0.77) and the normalized early contrast-enhanced T1 signal (DCE-nSI) within the PZ (ROC-AUC = 0.79). Performance was not significantly improved by bi-variate/tri-variate modelling. PZ models that contained DCE-nSI performed poorly in classification of TZ cancer, and the TZ model based solely on maximum enhancement poorly classified PZ cancer. LR-models dependent on DCE-MRI parameters alone are not interchangeable between prostatic zones; however, models based exclusively on T2 and/or ADC are more robust for inter-zonal application. (orig.)
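
The ROC-AUC figure of merit used above can be computed directly with the rank (Mann-Whitney) formulation: the AUC is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The scores and labels below are invented, standing in for, e.g., a normalized T2 signal used as a univariate classifier.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC = P(score of a random positive > score of a random negative)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # All positive-negative pairwise comparisons; ties count half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(roc_auc(scores, labels))  # 8 of 9 positive-negative pairs ordered correctly
```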

  13. A wrapper-based approach to image segmentation and classification.

    Science.gov (United States)

    Farmer, Michael E; Jain, Anil K

    2005-12-01

The traditional processing flow of segmentation followed by classification in computer vision assumes that the segmentation is able to successfully extract the object of interest from the background image. It is extremely difficult to obtain a reliable segmentation without any prior knowledge about the object that is being extracted from the scene. This is further complicated by the lack of any clearly defined metrics for evaluating the quality of segmentation or for comparing segmentation algorithms. We propose a method of segmentation that addresses both of these issues by using the object classification subsystem as an integral part of the segmentation. This provides contextual information regarding the objects to be segmented, and allows us to use the probability of correct classification as a metric for the quality of the segmentation. We view traditional segmentation as a filter operating on the image that is independent of the classifier, much like the filter methods for feature selection. We propose a new paradigm for segmentation and classification that follows the wrapper methods of feature selection. Our method wraps the segmentation and classification together and uses the classification accuracy as the metric to determine the best segmentation. By using shape as the classification feature, we are able to develop a segmentation algorithm that relaxes the requirement that the object of interest be homogeneous in some low-level image parameter, such as texture, color, or grayscale. This represents an improvement over other segmentation methods that have used classification information only to modify the segmenter parameters, since those algorithms still require an underlying homogeneity in some parameter space.
Rather than considering our method as yet another segmentation algorithm, we propose that our wrapper method can be considered as an image segmentation framework, within which existing image segmentation

  14. Similarity-Based Classification in Partially Labeled Networks

    Science.gov (United States)

    Zhang, Qian-Ming; Shang, Ming-Sheng; Lü, Linyuan

Two main difficulties in the problem of classification in partially labeled networks are the sparsity of the known labeled nodes and the inconsistency of label information. To address these two difficulties, we propose a similarity-based method, whose basic assumption is that two nodes are more likely to belong to the same class if they are more similar. In this paper, we introduce ten similarity indices defined on the network structure. Empirical results on the co-purchase network of political books show that the similarity-based method can, to some extent, overcome these two difficulties and give more accurate classification than the relational-neighbors method, especially when the labeled nodes are sparse. Furthermore, we find that when the information on known labeled nodes is sufficient, indices considering only local information can perform as well as global indices while having much lower computational complexity.
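
One local similarity index of the kind discussed above, the common-neighbours index, can drive label prediction roughly as follows: each unlabeled node takes the label whose labeled nodes are, in aggregate, most similar to it. The small two-community graph and its labels are invented for illustration.

```python
import numpy as np

# Adjacency matrix of an 8-node graph: two 4-cliques {0,1,2,6} and
# {3,4,5,7} joined by the single bridge edge 2-3 (toy data).
A = np.array([
    [0, 1, 1, 0, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
])
labels = {0: "A", 1: "A", 6: "A", 4: "B", 5: "B", 7: "B"}  # 2, 3 unlabeled

sim = A @ A  # sim[i, j] = number of common neighbours of i and j

def predict(node):
    """Vote among labeled nodes, weighted by similarity to `node`."""
    votes = {}
    for j, lab in labels.items():
        votes[lab] = votes.get(lab, 0) + sim[node, j]
    return max(votes, key=votes.get)

print(predict(2), predict(3))  # each bridge node joins its own community
```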

  15. Object-Based Classification and Change Detection of Hokkaido, Japan

    Science.gov (United States)

    Park, J. G.; Harada, I.; Kwak, Y.

    2016-06-01

Topography and geology are factors that characterize the distribution of natural vegetation. Topographic contour is particularly influential on the living conditions of plants, such as soil moisture, sunlight, and windiness. Vegetation associations with similar characteristics are present in locations with similar topographic conditions, unless natural disturbances such as landslides and forest fires, or artificial disturbances such as deforestation and man-made plantation, bring about changes in those conditions. We developed a vegetation map of Japan using an object-based segmentation approach with topographic information (elevation, slope, slope direction) that is closely related to the distribution of vegetation. The results show that object-based classification is more effective than pixel-based classification for producing a vegetation map.

  16. Robust model selection and the statistical classification of languages

    Science.gov (United States)

    García, J. E.; González-López, V. A.; Viola, M. L. L.

    2012-10-01

In this paper we address the problem of model selection for the set of finite-memory stochastic processes with finite alphabet, when the data are contaminated. We consider m independent samples, with more than half of them being realizations of the same stochastic process with law Q, which is the one we want to retrieve. We devise a model selection procedure such that, for a large enough sample size, the selected process is the one with law Q. Our model selection strategy is based on estimating relative entropies to select a subset of samples that are realizations of the same law. Although the procedure is valid for any family of finite-order Markov models, we focus on the family of variable-length Markov chain models, which includes the fixed-order Markov chain model family. We define the asymptotic breakdown point (ABDP) for a model selection procedure, and we derive the ABDP for our procedure. This means that if the proportion of contaminated samples is smaller than the ABDP, then, as the sample size grows, our procedure selects a model for the process with law Q. We also use our procedure in a setting where one sample is formed by the concatenation of sub-samples of two or more stochastic processes, with most of the sub-samples having law Q, and we conducted a simulation study. In the application section we address the question of the statistical classification of languages according to their rhythmic features using speech samples. This is an important open problem in phonology. A persistent difficulty with this problem is that the speech samples correspond to several sentences produced by diverse speakers, corresponding to a mixture of distributions. The usual procedure has been to choose, by listening to the samples, a subset of the original sample which seems to best represent each language. In our application we use the full dataset without any preselection of samples. 
We apply our robust methodology estimating

  17. Classification data mining method based on dynamic RBF neural networks

    Science.gov (United States)

    Zhou, Lijuan; Xu, Min; Zhang, Zhang; Duan, Luping

    2009-04-01

With the wide application of databases and the rapid development of the Internet, the capacity to manufacture and collect data using information technology has grown greatly, and mining useful information or knowledge from large databases or data warehouses has become an urgent problem. Data mining (DM) technology has developed rapidly to meet this need, but DM often faces data that are noisy, disordered, and nonlinear. Fortunately, artificial neural networks (ANN) are well suited to these problems because of their robustness, adaptability, parallel processing, distributed memory, and high error tolerance. This paper gives a detailed discussion of the application of ANN methods in DM, based on an analysis of data mining techniques, and lays particular stress on classification data mining based on RBF neural networks. Pattern classification is an important part of RBF neural network applications. In an on-line environment the training dataset is variable, so batch learning algorithms (e.g., OLS), which generate a great deal of unnecessary retraining, have low efficiency. This paper derives an incremental learning algorithm (ILA) from the gradient descent algorithm to address this bottleneck. ILA can adaptively adjust the parameters of RBF networks, driven by minimizing the error cost, without any redundant retraining. Using the proposed method, an on-line classification system was constructed to solve the IRIS classification problem. Experimental results show that the algorithm has a fast convergence rate and excellent on-line classification performance.
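
An incremental, per-sample gradient update for the output weights of an RBF network, avoiding batch retraining in an on-line setting, can be sketched as follows. This is an illustrative reconstruction, not the paper's algorithm: centres and widths are fixed and known, and the streamed targets are generated from known weights so convergence is visible.

```python
import numpy as np

centres = np.array([0.0, 1.0])   # fixed Gaussian centres (assumed known here)
width = 0.5
weights = np.zeros(2)
lr = 0.5                         # learning rate

def phi(x):
    """Gaussian basis activations for a scalar input x."""
    return np.exp(-((x - centres) ** 2) / (2 * width ** 2))

def update(x, target):
    """One on-line step: nudge weights to reduce squared error on (x, target)."""
    global weights
    h = phi(x)
    err = target - weights @ h
    weights += lr * err * h      # gradient descent on 0.5 * err ** 2

# Stream samples generated by known weights; the estimates converge toward
# them without ever re-running a batch solver such as OLS.
true_w = np.array([0.8, 0.3])
rng = np.random.default_rng(2)
for _ in range(3000):
    x = rng.uniform(-1.0, 2.0)
    update(x, true_w @ phi(x))

print(np.round(weights, 2))  # approximately [0.8, 0.3]
```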

  18. Automatic age and gender classification using supervised appearance model

    Science.gov (United States)

    Bukar, Ali Maina; Ugail, Hassan; Connah, David

    2016-11-01

    Age and gender classification are two important problems that recently gained popularity in the research community, due to their wide range of applications. Research has shown that both age and gender information are encoded in the face shape and texture, hence the active appearance model (AAM), a statistical model that captures shape and texture variations, has been one of the most widely used feature extraction techniques for the aforementioned problems. However, AAM suffers from some drawbacks, especially when used for classification. This is primarily because principal component analysis (PCA), which is at the core of the model, works in an unsupervised manner, i.e., PCA dimensionality reduction does not take into account how the predictor variables relate to the response (class labels). Rather, it explores only the underlying structure of the predictor variables, thus, it is no surprise if PCA discards valuable parts of the data that represent discriminatory features. Toward this end, we propose a supervised appearance model (sAM) that improves on AAM by replacing PCA with partial least-squares regression. This feature extraction technique is then used for the problems of age and gender classification. Our experiments show that sAM has better predictive power than the conventional AAM.

  19. Land Cover and Land Use Classification with TWOPAC: towards Automated Processing for Pixel- and Object-Based Image Classification

    Directory of Open Access Journals (Sweden)

    Stefan Dech

    2012-09-01

Full Text Available We present a novel and innovative automated processing environment for the derivation of land cover (LC) and land use (LU) information. This processing framework, named TWOPAC (TWinned Object and Pixel based Automated classification Chain), enables the standardized, independent, user-friendly, and comparable derivation of LC and LU information, with minimized manual classification labor. TWOPAC allows classification of multi-spectral and multi-temporal remote sensing imagery from different sensor types, and enables not only pixel-based classification but also classification based on object-based characteristics. Classification uses a decision tree (DT) approach, for which the well-known C5.0 code has been implemented, building decision trees based on the concept of information entropy. TWOPAC enables automatic generation of the decision tree classifier from a C5.0-generated ASCII file, as well as fully automatic validation of the classification output via sample-based accuracy assessment. Envisaging the automated generation of standardized land cover products, as well as area-wide classification of large amounts of data in preferably short processing times, standardized interfaces for process control, Web Processing Services (WPS), as introduced by the Open Geospatial Consortium (OGC), are utilized. TWOPAC's functionality to process geospatial raster or vector data via web resources (server, network) enables its use independent of any commercial client or desktop software and allows large-scale data processing on servers. Furthermore, the components of TWOPAC were built from open-source code components and are implemented as a plug-in for Quantum GIS software for easy handling of the classification process from the user's perspective.
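
The information-entropy split criterion underlying C5.0-style decision trees can be sketched as follows. The (value, label) samples are toy data, not TWOPAC output: a split is scored by how much it reduces the Shannon entropy of the class labels.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(samples, threshold):
    """Gain from splitting (value, label) samples at value <= threshold."""
    left = [lab for v, lab in samples if v <= threshold]
    right = [lab for v, lab in samples if v > threshold]
    n = len(samples)
    parent = entropy([lab for _, lab in samples])
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return parent - children

# Toy pixels: one spectral value each, two land-cover classes.
samples = [(0.1, "water"), (0.2, "water"), (0.6, "forest"), (0.9, "forest")]
print(information_gain(samples, 0.4))  # perfect split: gain = 1.0 bit
```

A tree builder evaluates many candidate thresholds per attribute and recurses on the split with the highest gain.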

  20. Rule-Based Classification of Chemical Structures by Scaffold.

    Science.gov (United States)

    Schuffenhauer, Ansgar; Varin, Thibault

    2011-08-01

Databases of small organic molecules usually contain millions of structures, and the screening decks of pharmaceutical companies contain more than a million structures. Nevertheless, chemical substructure searching in these databases can be performed interactively in seconds, so structural classification has not been needed simply to find data for individual chemical substructures. However, a full-deck high-throughput screen also produces activity data for more than a million substances. How can this amount of data be analyzed? Which are the active scaffolds identified by an assay? To answer such questions, systematic classifications of molecules by scaffold are needed. This review describes how molecules can be hierarchically classified by their scaffolds and explains how such classifications can be used to identify active scaffolds in an HTS data set. Once active classes are identified, they need to be visualized in the context of related scaffolds in order to understand SAR; such visualizations are therefore another topic of this review. In addition, scaffold-based diversity measures are discussed, and an outlook is given on the potential impact of structural classifications on a chemically aware semantic web.

  1. Comparison Of Power Quality Disturbances Classification Based On Neural Network

    Directory of Open Access Journals (Sweden)

    Nway Nway Kyaw Win

    2015-07-01

Full Text Available Abstract Power quality disturbances (PQDs) cause serious problems for the reliability, safety, and economy of power system networks. To improve electric power quality, the detection and classification of PQDs must identify the type of transient fault. A software analysis based on the wavelet transform with a multiresolution analysis (MRA) algorithm, combined with probabilistic and multilayer feed-forward neural networks, is presented for the automatic classification of eight types of PQ signals: flicker, harmonics, sag, swell, impulse, fluctuation, notch, and oscillatory transient. The wavelet Db4 is chosen in this system to calculate the detailed energy distributions used as input features for classification, because it performs well in detecting and localizing various types of PQ disturbances. The classifiers identify the disturbance type according to the energy distribution. The results show that the PNN can analyze different power disturbance types efficiently, and that the PNN has better classification accuracy than the MLFF network.

  2. An AERONET-based aerosol classification using the Mahalanobis distance

    Science.gov (United States)

    Hamill, Patrick; Giordano, Marco; Ward, Carolyne; Giles, David; Holben, Brent

    2016-09-01

    We present an aerosol classification based on AERONET aerosol data from 1993 to 2012. We used the AERONET Level 2.0 almucantar aerosol retrieval products to define several reference aerosol clusters which are characteristic of the following general aerosol types: Urban-Industrial, Biomass Burning, Mixed Aerosol, Dust, and Maritime. The classification of a particular aerosol observation as one of these aerosol types is determined by its five-dimensional Mahalanobis distance to each reference cluster. We have calculated the fractional aerosol type distribution at 190 AERONET sites, as well as the monthly variation in aerosol type at those locations. The results are presented on a global map and individually in the supplementary material. Our aerosol typing is based on recognizing that different geographic regions exhibit characteristic aerosol types. To generate reference clusters we only keep data points that lie within a Mahalanobis distance of 2 from the centroid. Our aerosol characterization is based on the AERONET retrieved quantities, therefore it does not include low optical depth values. The analysis is based on "point sources" (the AERONET sites) rather than globally distributed values. The classifications obtained will be useful in interpreting aerosol retrievals from satellite borne instruments.
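
Classification against reference clusters by Mahalanobis distance, as in the aerosol typing above, can be sketched as follows. The cluster means and covariances are synthetic two-dimensional stand-ins for the five AERONET dimensions, and the distance-2 cutoff mirrors the criterion used to build the reference clusters.

```python
import numpy as np

# Illustrative reference clusters: (mean, covariance) per aerosol type.
clusters = {
    "Dust":     (np.array([2.0, 0.1]), np.array([[0.20, 0.0], [0.0, 0.01]])),
    "Maritime": (np.array([0.1, 0.9]), np.array([[0.01, 0.0], [0.0, 0.05]])),
}

def mahalanobis(x, mean, cov):
    """Distance of observation x from a cluster, scaled by its covariance."""
    d = x - mean
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

def classify(x, max_distance=2.0):
    """Assign x to the nearest reference cluster; None if beyond the cutoff."""
    name, dist = min(
        ((n, mahalanobis(x, m, c)) for n, (m, c) in clusters.items()),
        key=lambda t: t[1],
    )
    return name if dist <= max_distance else None

print(classify(np.array([1.9, 0.12])))  # near the Dust centroid
print(classify(np.array([5.0, 5.0])))   # outside every cluster -> None
```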

  3. Classification of Deep Web Databases Based on Model Matching

    Institute of Scientific and Technical Information of China (English)

    郭东伟; 李三义; 张仲明; 刘淼

    2011-01-01

    This paper presents a new method for classifying Deep Web online databases based on model matching. It extracts a feature vector from the Deep Web query interface by automatically analysing the depth of feature terms in the web-page structure. Both frequency and concentration are considered when defining the weights in the vector-space model, and the number of feature terms is added to the traditional vector model as a new component. A database query-interface model is constructed on this basis, and model matching is then used to classify the databases. Experimental results validate the effectiveness of the method.

  4. A Feature Selection Method for Large-Scale Network Traffic Classification Based on Spark

    Directory of Open Access Journals (Sweden)

    Yong Wang

    2016-02-01

Full Text Available Currently, with the rapid increase of data scales in network traffic classification, how to select traffic features efficiently is becoming a big challenge. Although a number of traditional feature selection methods using the Hadoop-MapReduce framework have been proposed, execution times remain unsatisfactory because of the numerous iterative computations involved. To address this issue, an efficient feature selection method for network traffic based on a new parallel computing framework called Spark is proposed in this paper. In our approach, the complete feature set is first preprocessed based on Fisher score, and a sequential forward search strategy is employed to generate candidate subsets. The optimal feature subset is then selected using the continuous iterations of the Spark computing framework. The implementation demonstrates that, while maintaining classification accuracy, our method reduces the time cost of modeling and classification and significantly improves the execution efficiency of feature selection.
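
The Fisher-score preprocessing step described above ranks features by between-class separation relative to within-class spread. A minimal single-machine sketch (the Spark parallelization is out of scope here; the traffic-feature matrix and labels are synthetic placeholders):

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature Fisher score for data matrix X (n_samples, n_features)."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    num = np.zeros(X.shape[1])   # between-class scatter per feature
    den = np.zeros(X.shape[1])   # within-class scatter per feature
    for c in classes:
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / den

rng = np.random.default_rng(3)
y = np.repeat([0, 1], 50)
X = rng.standard_normal((100, 3))
X[y == 1, 0] += 3.0          # feature 0 separates the classes; 1 and 2 are noise
scores = fisher_scores(X, y)
print(scores.argmax())  # feature 0 ranks highest
```

A sequential forward search would then add features in decreasing score order, keeping each one only if it improves validation accuracy.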

  5. Cancer pain: A critical review of mechanism-based classification and physical therapy management in palliative care

    Directory of Open Access Journals (Sweden)

    Senthil P Kumar

    2011-01-01

Full Text Available Mechanism-based classification and physical therapy management of pain is essential to effectively manage painful symptoms in patients attending palliative care. The objective of this review is to provide a detailed review of mechanism-based classification and physical therapy management of patients with cancer pain. Cancer pain can be classified based upon pain symptoms, pain mechanisms, and pain syndromes. Classification based upon mechanisms not only addresses the underlying pathophysiology but also provides an understanding of the patient's symptoms and treatment responses. Existing evidence suggests that five mechanisms - central sensitization, peripheral sensitization, sympathetically maintained pain, nociceptive, and cognitive-affective - operate in patients with cancer pain. A summary of studies showing evidence for physical therapy treatment methods for cancer pain follows, with suggested therapeutic implications. Effective palliative physical therapy care using a mechanism-based classification model should be tailored to suit each patient's findings, using a biopsychosocial model of pain.

  6. Three-Class Mammogram Classification Based on Descriptive CNN Features

    Science.gov (United States)

    Zhang, Qianni; Jadoon, Adeel

    2017-01-01

In this paper, a novel classification technique for a large data set of mammograms using a deep learning method is proposed. The proposed model targets a three-class classification study (normal, malignant, and benign cases). In our model we present two methods, namely, convolutional neural network-discrete wavelet (CNN-DW) and convolutional neural network-curvelet transform (CNN-CT). An augmented data set is generated by using mammogram patches. To enhance the contrast of mammogram images, the data set is filtered by contrast-limited adaptive histogram equalization (CLAHE). In the CNN-DW method, enhanced mammogram images are decomposed into their four subbands by means of the two-dimensional discrete wavelet transform (2D-DWT), while in the second method the discrete curvelet transform (DCT) is used. In both methods, dense scale-invariant feature transform (DSIFT) descriptors are extracted for all subbands. An input data matrix containing these subband features for all mammogram patches is created and processed as input to a convolutional neural network (CNN). A softmax layer and a support vector machine (SVM) layer are used to train the CNN for classification. The proposed methods have been compared with existing methods in terms of accuracy rate, error rate, and various validation assessment measures. CNN-DW and CNN-CT achieve accuracy rates of 81.83% and 83.74%, respectively. Simulation results clearly validate the significance and impact of our proposed model as compared to other well-known existing techniques. PMID:28191461
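
The one-level 2-D wavelet decomposition that produces the four subbands (approximation plus horizontal, vertical, and diagonal detail) used as CNN inputs above can be sketched with a from-scratch Haar transform. This is a hedged illustration, not the authors' pipeline, which uses the Db4 wavelet via a wavelet library.

```python
import numpy as np

def haar_2d(img):
    """Split an even-sized grayscale image into four half-size Haar subbands."""
    a = (img[:, 0::2] + img[:, 1::2]) / 2   # row-wise average (low-pass)
    d = (img[:, 0::2] - img[:, 1::2]) / 2   # row-wise detail (high-pass)
    ll = (a[0::2] + a[1::2]) / 2            # approximation
    lh = (a[0::2] - a[1::2]) / 2            # horizontal detail
    hl = (d[0::2] + d[1::2]) / 2            # vertical detail
    hh = (d[0::2] - d[1::2]) / 2            # diagonal detail
    return ll, lh, hl, hh

img = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 "patch"
ll, lh, hl, hh = haar_2d(img)
print(ll.shape)  # each subband is half the size: (2, 2)
print(hh)        # a constant-gradient image has zero diagonal detail
```

Feature extraction would then be run on each subband independently, exactly as the abstract describes for the DSIFT step.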

  7. Object-Based Classification as an Alternative Approach to the Traditional Pixel-Based Classification to Identify Potential Habitat of the Grasshopper Sparrow

    Science.gov (United States)

    Jobin, Benoît; Labrecque, Sandra; Grenier, Marcelle; Falardeau, Gilles

    2008-01-01

    The traditional method of identifying wildlife habitat distribution over large regions consists of pixel-based classification of satellite images into a suite of habitat classes used to select suitable habitat patches. Object-based classification is a newer method that can achieve the same objective based on the segmentation of the spectral bands of the image, creating polygons that are homogeneous with regard to spatial or spectral characteristics. The segmentation algorithm does not rely solely on the single pixel value, but also on shape, texture, and pixel spatial continuity. Object-based classification is a knowledge-based process in which an interpretation key is developed using ground control points, and objects are assigned to specific classes according to threshold values of determined spectral and/or spatial attributes. We developed a model using the eCognition software to identify suitable habitats for the Grasshopper Sparrow, a rare and declining species found in southwestern Québec. The model was developed in a region with known breeding sites and applied to other images covering adjacent regions where potential breeding habitats may be present. We were successful in locating potential habitats in areas where dairy farming prevailed but failed in an adjacent region covered by a distinct Landsat scene and dominated by annual crops. We discuss the added value of this method, such as the possibility of using the contextual information associated with objects and the ability to eliminate unsuitable areas in the segmentation and land cover classification processes, as well as technical and logistical constraints. A series of recommendations on the use of this method and on conservation issues of Grasshopper Sparrow habitat is also provided.

  8. Multi-robot system learning based on evolutionary classification

    Directory of Open Access Journals (Sweden)

    Manko Sergey

    2016-01-01

    Full Text Available This paper presents a novel machine learning method for the agents of a multi-robot system. The learning process is based on knowledge discovery through continual analysis of robot sensory information. We demonstrate that classification trees and evolutionary forests may form a basis for creating autonomous robots capable of both learning and knowledge exchange with other agents in a multi-robot system. The results of experimental studies confirm the effectiveness of the proposed approach.

  9. Label-Embedding for Attribute-Based Classification

    OpenAIRE

    Akata, Zeynep; Perronnin, Florent; Harchaoui, Zaid; Schmid, Cordelia

    2013-01-01

    International audience; Attributes are an intermediate representation, which enables parameter sharing between classes, a must when training data is scarce. We propose to view attribute-based image classification as a label-embedding problem: each class is embedded in the space of attribute vectors. We introduce a function which measures the compatibility between an image and a label embedding. The parameters of this function are learned on a training set of labeled samples to ensure that, gi...
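The compatibility function described above can be sketched as a bilinear form F(x, y) = θ(x)ᵀ W φ(y), with classification by the highest-scoring class embedding. All dimensions and the random data below are illustrative placeholders, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: D image-feature dims, E attribute dims, C classes.
D, E, C = 8, 5, 3
W = rng.normal(size=(D, E))      # learned bilinear compatibility matrix
phi = rng.normal(size=(C, E))    # per-class attribute embeddings phi(y)
theta_x = rng.normal(size=D)     # feature vector theta(x) of one image

# Compatibility F(x, y) = theta(x)^T W phi(y); classify by the argmax.
scores = theta_x @ W @ phi.T
predicted_class = int(np.argmax(scores))
print(predicted_class)
```

In the actual method, W is learned on labeled samples so that the correct class embedding scores highest; here it is random purely to show the shapes involved.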

  10. Hierarchical Classification of Chinese Documents Based on N-grams

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    We explore techniques of utilizing N-gram information to categorize Chinese text documents hierarchically, so that the classifier can shake off the burden of large dictionaries and complex segmentation processing, and subsequently be domain- and time-independent. A hierarchical Chinese text classifier is implemented. Experimental results show that hierarchically classifying Chinese text documents based on N-grams can achieve satisfactory performance and outperforms traditional Chinese text classifiers.
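A minimal sketch of the dictionary-free feature extraction such a classifier relies on: character n-grams, which sidestep word segmentation entirely. The choice of n and any weighting scheme are not given in the abstract, so n = 2 below is an assumption.

```python
def char_ngrams(text, n=2):
    """Character n-grams of a string; for Chinese text this avoids
    dictionary-based word segmentation entirely."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(char_ngrams("abcd", 2))  # ['ab', 'bc', 'cd']
```

The resulting n-gram counts would then serve as features at each level of the hierarchical classifier.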

  11. Tree-based disease classification using protein data.

    Science.gov (United States)

    Zhu, Hongtu; Yu, Chang-Yung; Zhang, Heping

    2003-09-01

    A reliable and precise classification of diseases is essential for successful diagnosis and treatment. Using mass spectrometry from clinical specimens, scientists may find protein variations among diseases and use this information to improve diagnosis. In this paper, we propose a novel procedure to classify disease status based on protein data from mass spectrometry. Our new tree-based algorithm consists of three steps: projection, selection and classification tree. The projection step aims to project all observations from specimens onto the same bases so that the projected data have fixed coordinates. Thus, for each specimen, we obtain a large vector of 'coefficients' on the same basis. The purpose of the selection step is data reduction, condensing the large vector from the projection step into a much lower-order informative vector. Finally, using these reduced vectors, we apply recursive partitioning to construct an informative classification tree. This method has been successfully applied to protein data provided by the Department of Radiology and Chemistry at Duke University.

  12. Prediction and Classification of Human G-protein Coupled Receptors Based on Support Vector Machines

    Institute of Scientific and Technical Information of China (English)

    Yun-Fei Wang; Huan Chen; Yan-Hong Zhou

    2005-01-01

    A computational system for the prediction and classification of human G-protein coupled receptors (GPCRs) has been developed based on the support vector machine (SVM) method and protein sequence information. The feature vectors used to develop the SVM prediction models consist of statistically significant features selected from single amino acid, dipeptide, and tripeptide compositions of protein sequences. Furthermore, the length distribution difference between GPCRs and non-GPCRs has also been exploited to improve the prediction performance. The testing results with annotated human protein sequences demonstrate that this system can achieve good performance for both prediction and classification of human GPCRs.
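The dipeptide-composition features mentioned above can be sketched as normalized 2-mer frequencies over a protein sequence (the statistical feature-selection step and the single/tripeptide variants, built the same way, are omitted):

```python
from collections import Counter

def dipeptide_composition(seq):
    """Normalized dipeptide (2-mer) frequencies of a protein sequence.
    Single-residue and tripeptide compositions are built analogously."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    total = len(pairs)
    return {p: c / total for p, c in counts.items()}

comp = dipeptide_composition("MKVLAA")   # toy sequence, not from the paper
print(comp["AA"])  # 1 of 5 dipeptides -> 0.2
```

A fixed-length SVM feature vector would enumerate all 400 possible dipeptides, filling zeros for those absent.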

  13. Crime Prevention Model inside Prison Based on Crime Classification

    Institute of Scientific and Technical Information of China (English)

    徐为霞; 王火钦; 王秀丽

    2014-01-01

    The core of prison security lies in the prevention and control of crime inside prison, and such prevention and control must be based on a classification of crimes. Crimes inside prison can be divided into three types: premeditated crimes, situational crimes, and crimes of passion. Correspondingly, there are three prevention and control modes: an active investigation mode for premeditated crime, a screening and prevention mode for situational crime, and a disposal and reconciliation mode for crimes of passion.

  14. Variable Star Signature Classification using Slotted Symbolic Markov Modeling

    CERN Document Server

    Johnston, Kyle B

    2016-01-01

    With the advent of digital astronomy, new benefits and new challenges have been presented to the modern day astronomer. No longer can the astronomer rely on manual processing; instead, the profession as a whole has begun to adopt more advanced computational means. This paper focuses on the construction and application of a novel time-domain signature extraction methodology and the development of a supporting supervised pattern classification algorithm for the identification of variable stars. A methodology for the reduction of stellar variable observations (time-domain data) into a novel feature space representation is introduced. The methodology presented will be referred to as Slotted Symbolic Markov Modeling (SSMM) and has a number of advantages which will be demonstrated to be beneficial, specifically for the supervised classification of stellar variables. It will be shown that the methodology outperformed a baseline standard methodology on a standardized set of stellar light curve data. The performance on ...
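A rough sketch of the symbolize-then-model idea behind SSMM: discretize a light curve's amplitudes into symbols and use the row-normalized transition matrix as a fixed-length feature vector. The slotting of irregularly sampled observations is omitted here, and the equal-width binning is an assumption, not the paper's exact scheme.

```python
import numpy as np

def symbolize(series, n_symbols=3):
    """Map amplitudes to discrete symbols via equal-width binning
    (illustrative; SSMM also time-slots irregular samples first)."""
    s = np.asarray(series, dtype=float)
    edges = np.linspace(s.min(), s.max(), n_symbols + 1)[1:-1]
    return np.digitize(s, edges)

def transition_matrix(symbols, n_symbols=3):
    """Row-normalized first-order transition counts, usable as a
    fixed-length feature vector for a supervised classifier."""
    m = np.zeros((n_symbols, n_symbols))
    for a, b in zip(symbols[:-1], symbols[1:]):
        m[a, b] += 1
    rows = m.sum(axis=1, keepdims=True)
    return np.divide(m, rows, out=np.zeros_like(m), where=rows > 0)

light_curve = [0.1, 0.9, 0.2, 0.8, 0.1, 0.85]   # toy oscillating signal
T = transition_matrix(symbolize(light_curve), 3)
print(T.shape)  # (3, 3)
```

Flattening T gives each star a feature vector of the same length regardless of how many observations it has, which is what makes a standard supervised classifier applicable.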

  15. The DTW-based representation space for seismic pattern classification

    Science.gov (United States)

    Orozco-Alzate, Mauricio; Castro-Cabrera, Paola Alexandra; Bicego, Manuele; Londoño-Bonilla, John Makario

    2015-12-01

    Distinguishing among the different seismic volcanic patterns is still one of the most important and labor-intensive tasks for volcano monitoring. This task could be lightened and made free from subjective bias by using automatic classification techniques. In this context, a core but often overlooked issue is the choice of an appropriate representation of the data to be classified. Recently, it has been suggested that using a relative representation (i.e. proximities, namely dissimilarities on pairs of objects) instead of an absolute one (i.e. features, namely measurements on single objects) is advantageous to exploit the relational information contained in the dissimilarities to derive highly discriminant vector spaces, where any classifier can be used. According to that motivation, this paper investigates the suitability of a dynamic time warping (DTW) dissimilarity-based vector representation for the classification of seismic patterns. Results show the usefulness of such a representation in the seismic pattern classification scenario, including analyses of potential benefits from recent advances in the dissimilarity-based paradigm such as the proper selection of representation sets and the combination of different dissimilarity representations that might be available for the same data.
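The dissimilarity representation described above can be sketched as follows: compute the DTW distance from each object to a representation set of prototype signals, and use the resulting vector as that object's coordinates. The prototypes below are toy data, not seismic records.

```python
def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping distance between
    two 1-D sequences, the dissimilarity used to build the space."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# Each object is represented by its DTW dissimilarities to a chosen
# representation set of prototypes; any vector-space classifier can
# then be trained on these coordinates.
prototypes = [[0, 1, 2], [2, 1, 0]]
signal = [0, 0, 1, 2]
vector = [dtw_distance(signal, p) for p in prototypes]
print(vector)
```

The paper's point is that the choice of the representation set, and combining several such dissimilarity representations, both affect the discriminability of the resulting space.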

  16. Changing Histopathological Diagnostics by Genome-Based Tumor Classification

    Directory of Open Access Journals (Sweden)

    Michael Kloth

    2014-05-01

    Full Text Available Traditionally, tumors are classified by histopathological criteria, i.e., based on their specific morphological appearances. Consequently, current therapeutic decisions in oncology are strongly influenced by histology rather than by underlying molecular or genomic aberrations. However, the increase in information on molecular changes, enabled by the Human Genome Project and the International Cancer Genome Consortium as well as the manifold advances in molecular biology and high-throughput sequencing techniques, inaugurated the integration of genomic information into disease classification. Furthermore, in some cases it became evident that former classifications needed major revision and adaptation. Such adaptations are often required by understanding the pathogenesis of a disease from a specific molecular alteration, using this molecular driver for targeted and highly effective therapies. Altogether, reclassifications should lead to a higher information content of the underlying diagnoses, reflecting their molecular pathogenesis and resulting in optimized and individual therapeutic decisions. The objective of this article is to summarize some particularly important examples of genome-based classification approaches and associated therapeutic concepts. In addition to reviewing disease-specific markers, we focus on potentially therapeutic or predictive markers and the relevance of molecular diagnostics in disease monitoring.

  17. An ellipse detection algorithm based on edge classification

    Science.gov (United States)

    Yu, Liu; Chen, Feng; Huang, Jianming; Wei, Xiangquan

    2015-12-01

    In order to enhance the speed and accuracy of ellipse detection, an ellipse detection algorithm based on edge classification is proposed. Redundant edge points are removed by serializing edges into point sequences and applying a distance constraint between edge points. Effective classification is achieved using the angle between edge points as a criterion, which greatly increases the probability that randomly selected edge points fall on the same ellipse. Ellipse fitting accuracy is significantly improved by the optimization of the RED algorithm, which uses the Euclidean distance from each edge point to the elliptical boundary. Experimental results show that the algorithm detects ellipses well even when edges are noisy or occlude each other, and that it achieves higher detection precision and lower time consumption than the RED algorithm.

  18. Entropy coders for image compression based on binary forward classification

    Science.gov (United States)

    Yoo, Hoon; Jeong, Jechang

    2000-12-01

    Entropy coders, as a noiseless compression method, are widely used as the final compression step for images, and there have been many contributions toward increasing entropy coder performance and reducing entropy coder complexity. In this paper, we propose entropy coders based on binary forward classification (BFC). The BFC requires classification overhead, but there is no change between the amount of input information and the total amount of classified output information, a property we prove in this paper. Using this property, we propose entropy coders consisting of the BFC followed by Golomb-Rice coders (BFC+GR) and the BFC followed by arithmetic coders (BFC+A). The proposed entropy coders introduce negligible additional complexity due to the BFC. Simulation results also show better performance than other entropy coders of similar complexity.
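The Golomb-Rice stage of the BFC+GR coder can be sketched as follows; the BFC itself and the choice of the Rice parameter k (normally adapted to the source statistics) are omitted:

```python
def golomb_rice_encode(n, k):
    """Golomb-Rice code of a non-negative integer with divisor 2**k:
    unary-coded quotient, a '0' terminator, then a k-bit remainder."""
    q = n >> k                    # quotient n // 2**k
    r = n & ((1 << k) - 1)        # remainder n % 2**k
    bits = "1" * q + "0"          # unary part
    if k:
        bits += format(r, "0" + str(k) + "b")  # k-bit binary remainder
    return bits

print(golomb_rice_encode(9, 2))  # 9 = 2*4 + 1 -> '110' + '01' = '11001'
```

Golomb-Rice codes are optimal for geometrically distributed symbols, which is why a classification step that separates sources with different statistics (as the BFC does) can improve the overall rate.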

  19. Discriminative likelihood score weighting based on acoustic-phonetic classification for speaker identification

    Science.gov (United States)

    Suh, Youngjoo; Kim, Hoirin

    2014-12-01

    In this paper, a new discriminative likelihood score weighting technique is proposed for speaker identification. The proposed method employs a discriminative weighting of frame-level log-likelihood scores with acoustic-phonetic classification in the Gaussian mixture model (GMM)-based speaker identification. Experiments performed on the Aurora noise-corrupted TIMIT database showed that the proposed approach provides meaningful performance improvement with an overall relative error reduction of 15.8% over the maximum likelihood-based baseline GMM approach.

  20. Local fractal dimension based approaches for colonic polyp classification.

    Science.gov (United States)

    Häfner, Michael; Tamaki, Toru; Tanaka, Shinji; Uhl, Andreas; Wimmer, Georg; Yoshida, Shigeto

    2015-12-01

    This work introduces texture analysis methods that are based on computing the local fractal dimension (LFD; also called the local density function) and applies them to colonic polyp classification. The methods are tested on 8 HD-endoscopic image databases, where each database is acquired using different imaging modalities (Pentax's i-Scan technology combined with or without staining the mucosa) and on a zoom-endoscopic image database using narrow band imaging. In this paper, we present three novel extensions to an LFD based approach. These extensions additionally extract shape and/or gradient information of the image to enhance the discriminativity of the original approach. To compare the results of the LFD based approaches with the results of other approaches, five state-of-the-art approaches for colonic polyp classification are applied to the employed databases. Experiments show that LFD based approaches are well suited for colonic polyp classification, especially the three proposed extensions. The three proposed extensions are the best performing methods, or at least among the best performing methods, for each of the employed databases. The methods are additionally tested by means of a public texture image database, the UIUCtex database. With this database, the viewpoint invariance of the methods is assessed, an important feature for the employed endoscopic image databases. Results imply that most of the LFD based methods are more viewpoint invariant than the other methods. However, the shape, size and orientation adapted LFD approaches (which are especially designed to enhance viewpoint invariance) are in general not more viewpoint invariant than the other LFD based approaches.
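As a simplified relative of the local fractal dimension used above, the classical global box-counting estimate on a binary image can be sketched as follows (the paper computes the dimension locally, per pixel neighborhood; this global version only illustrates the counting-and-slope idea):

```python
import numpy as np

def box_counting_dimension(binary_img, sizes=(1, 2, 4, 8)):
    """Slope of log N(s) vs log s, where N(s) counts boxes of side s
    containing at least one foreground pixel; negated to give the
    fractal dimension."""
    img = np.asarray(binary_img, dtype=bool)
    counts = []
    for s in sizes:
        h = (img.shape[0] + s - 1) // s
        w = (img.shape[1] + s - 1) // s
        n = 0
        for i in range(h):
            for j in range(w):
                if img[i * s:(i + 1) * s, j * s:(j + 1) * s].any():
                    n += 1
        counts.append(n)
    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -slope

# Sanity check: a filled square should have dimension close to 2.
square = np.ones((16, 16), dtype=bool)
print(round(box_counting_dimension(square), 2))  # 2.0
```

Computing such an estimate within a sliding window around each pixel, on suitably thresholded level sets, is the general direction the LFD-based texture features take.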

  1. Rule based fuzzy logic approach for classification of fibromyalgia syndrome.

    Science.gov (United States)

    Arslan, Evren; Yildiz, Sedat; Albayrak, Yalcin; Koklukaya, Etem

    2016-06-01

    Fibromyalgia syndrome (FMS) is a chronic muscle and skeletal system disease observed generally in women, manifesting itself with widespread pain and impairing the individual's quality of life. FMS diagnosis is made based on the American College of Rheumatology (ACR) criteria. However, recently the employability and sufficiency of the ACR criteria have come under debate. In this context, several evaluation methods, including clinical evaluation methods, were proposed by researchers. Accordingly, ACR had to update the criteria announced back in 1990, in 2010 and 2011. The proposed rule-based fuzzy logic method aims to evaluate FMS from a different angle as well. This method contains a rule base derived from the 1990 ACR criteria and the individual experiences of specialists. The study was conducted using data collected from 60 inpatient and 30 healthy volunteers. Several tests and a physical examination were administered to the participants. The fuzzy logic rule base was structured using the parameters of tender point count, chronic widespread pain period, pain severity, fatigue severity and sleep disturbance level, which were deemed important in FMS diagnosis. It was observed that the fuzzy predictor was generally 95.56% consistent with at least one of the specialists who were not creators of the fuzzy rule base. Thus, in diagnosis classification, where the severity of FMS was classified as well, consistent findings were obtained from the comparison of the interpretations and experiences of specialists and the fuzzy logic approach. The study proposes a rule base which could eliminate the shortcomings of the 1990 ACR criteria during the FMS evaluation process. Furthermore, the proposed method presents a classification of the severity of the disease, which was not available with the ACR criteria. The study was not limited to disease classification alone; at the same time the probability of occurrence and severity was classified. In addition, those who were not suffering from FMS were

  2. A Novel Algorithm of Network Trade Customer Classification Based on Fourier Basis Functions

    Directory of Open Access Journals (Sweden)

    Li Xinwu

    2013-11-01

    Full Text Available Learning algorithms for neural networks are an important research topic in neural network theory and its applications; in particular, learning algorithms for feed-forward neural networks still lack a satisfactory solution because of their defects in calculation speed. This paper presents a new Fourier basis function neural network algorithm and applies it to the classification of network trade customers. First, 21 customer classification indicators are designed based on an analysis of the characteristics and behaviors of network trade customers, including customer characteristic variables and customer behavior variables. Second, Fourier basis functions are used to improve the calculation flow and algorithm structure of the original BP neural network algorithm to speed up its convergence, and a new Fourier basis neural network model is constructed. Finally, the experimental results show that the convergence speed problem can be solved, and the accuracy of the customer classification is ensured, when the new algorithm is used in network trade customer classification in practice.

  3. Dictionary-Based, Clustered Sparse Representation for Hyperspectral Image Classification

    Directory of Open Access Journals (Sweden)

    Zhen-tao Qin

    2015-01-01

    Full Text Available This paper presents a new, dictionary-based method for hyperspectral image classification, which incorporates both the spectral and contextual characteristics of a sample, clustered to obtain a dictionary for each pixel. The resulting pixels display a common sparsity pattern within identical clustered groups. We calculated the image's sparse coefficients using the dictionary approach, which generated the sparse representation features of the remote sensing images. The sparse coefficients are then used to classify the hyperspectral images via a linear SVM. Experiments show that our proposed method of dictionary-based, clustered sparse coefficients can create better representations of hyperspectral images, with greater overall accuracy and a higher Kappa coefficient.

  4. Typology of Digital News Media: Theoretical Bases for their Classification

    Directory of Open Access Journals (Sweden)

    Ramón SALAVERRÍA

    2017-01-01

    Full Text Available Since their beginnings in the 1990s, digital news media have undergone a process of settlement and diversification. As a result, the prolific classification of online media has become increasingly rich and complex. Based on a review of media typologies, this article proposes some theoretical bases for the distinction of online media from previous media and, above all, for the differentiation of the various types of online media among themselves. With that purpose, nine typological criteria are proposed: (1) platform, (2) temporality, (3) topic, (4) reach, (5) ownership, (6) authorship, (7) focus, (8) economic purpose, and (9) dynamism.

  5. Analytical models and system topologies for remote multispectral data acquisition and classification

    Science.gov (United States)

    Huck, F. O.; Park, S. K.; Burcher, E. E.; Kelly, W. L., IV

    1978-01-01

    Simple analytical models are presented of the radiometric and statistical processes that are involved in multispectral data acquisition and classification. Also presented are basic system topologies which combine remote sensing with data classification. These models and topologies offer a preliminary but systematic step towards the use of computer simulations to analyze remote multispectral data acquisition and classification systems.

  6. Classification of inflationary models and constraints on fundamental physics

    CERN Document Server

    Pieroni, Mauro

    2016-01-01

    This work is focused on the study of early time cosmology and in particular on the study of inflation. After an introduction on the standard Big Bang theory, we discuss the physics of CMB and we explain how its observations can be used to set constraints on cosmological models. We introduce inflation and we carry out its simplest realization by presenting the observables and the experimental constraints that can be set on inflationary models. The possibility of observing primordial gravitational waves (GWs) produced during inflation is discussed. We present the reasons to define a classification of inflationary models and introduce the \\beta-function formalism for inflation by explaining why in this framework we can naturally define a set of universality classes for inflationary models. Theoretical motivations to support the formulation of inflation in terms of this formalism are presented. Some generalized models of inflation are introduced and the extension of the \\beta-function formalism for inflation to t...

  7. Classification of body movements based on posturographic data.

    Science.gov (United States)

    Saripalle, Sashi K; Paiva, Gavin C; Cliett, Thomas C; Derakhshani, Reza R; King, Gregory W; Lovelace, Christopher T

    2014-02-01

    The human body, standing on two feet, produces a continuous sway pattern. Intended movements, sensory cues, emotional states, and illnesses can all lead to subtle changes in sway appearing as alterations in ground reaction forces and the body's center of pressure (COP). The purpose of this study is to demonstrate that carefully selected COP parameters and classification methods can differentiate among specific body movements while standing, providing new prospects in camera-free motion identification. Force platform data were collected from participants performing 11 choreographed postural and gestural movements. Twenty-three different displacement- and frequency-based features were extracted from COP time series, and supplied to classification-guided feature extraction modules. For identification of movement type, several linear and nonlinear classifiers were explored; including linear discriminants, nearest neighbor classifiers, and support vector machines. The average classification rates on previously unseen test sets ranged from 67% to 100%. Within the context of this experiment, no single method was able to uniformly outperform the others for all movement types, and therefore a set of movement-specific features and classifiers is recommended.

  8. Risk Classification and Risk-based Safety and Mission Assurance

    Science.gov (United States)

    Leitner, Jesse A.

    2014-01-01

    Recent activities to revamp and emphasize the need to streamline processes and activities for Class D missions across the agency have led to various interpretations of Class D, including the lumping of a variety of low-cost projects into Class D. Sometimes terms such as Class D minus are used. In this presentation, mission risk classifications will be traced to official requirements and definitions as a measure to ensure that projects and programs align with the guidance and requirements that are commensurate for their defined risk posture. As part of this, the full suite of risk classifications, formal and informal, will be defined, followed by an introduction to the new GPR 8705.4 that is currently under review. GPR 8705.4 lays out guidance for the mission success activities performed at Classes A-D for NPR 7120.5 projects as well as for projects not under NPR 7120.5. Furthermore, the trends in stepping from Class A into higher risk posture classifications will be discussed. The talk will conclude with a discussion about risk-based safety and mission assurance at GSFC.

  9. Spectral classification of stars based on LAMOST spectra

    CERN Document Server

    Liu, Chao; Zhang, Bo; Wan, Jun-Chen; Deng, Li-Cai; Hou, Yonghui; Wang, Yuefei; Yang, Ming; Zhang, Yong

    2015-01-01

    In this work, we select high signal-to-noise ratio spectra of stars from the LAMOST data and map their MK classes to the spectral features. The equivalent widths of the prominent spectral lines, playing a similar role to multi-color photometry, form a clean stellar locus well ordered by MK class. The advantage of the stellar locus in line indices is that it gives a natural and continuous classification of stars consistent with either the broadly used MK classes or the stellar astrophysical parameters. We also employ an SVM-based classification algorithm to assign MK classes to the LAMOST stellar spectra. We find that the completeness of the classification is up to 90% for A and G type stars, while it is down to about 50% for OB and K type stars. About 40% of the OB and K type stars are misclassified as A and G type stars, respectively. This is likely because the differences in spectral features between the late B type and early A type stars, or between the late G and early K type stars, are very we...

  10. Spatial uncertainty modeling of fuzzy information in images for pattern classification.

    Directory of Open Access Journals (Sweden)

    Tuan D Pham

    Full Text Available The modeling of the spatial distribution of image properties is important for many pattern recognition problems in science and engineering. Mathematical methods are needed to quantify the variability of this spatial distribution based on which a decision of classification can be made in an optimal sense. However, image properties are often subject to uncertainty due to both incomplete and imprecise information. This paper presents an integrated approach for estimating the spatial uncertainty of vagueness in images using the theory of geostatistics and the calculus of probability measures of fuzzy events. Such a model for the quantification of spatial uncertainty is utilized as a new image feature extraction method, based on which classifiers can be trained to perform the task of pattern recognition. Applications of the proposed algorithm to the classification of various types of image data suggest the usefulness of the proposed uncertainty modeling technique for texture feature extraction.

  11. An Approach for Leukemia Classification Based on Cooperative Game Theory

    Directory of Open Access Journals (Sweden)

    Atefeh Torkaman

    2011-01-01

    Full Text Available Hematological malignancies are the types of cancer that affect blood, bone marrow and lymph nodes. As these tissues are naturally connected through the immune system, a disease affecting one of them will often affect the others as well. The hematological malignancies include leukemia, lymphoma and multiple myeloma. Among them, leukemia is a serious malignancy that starts in blood-forming tissues, especially the bone marrow, where the blood is made. Research shows that leukemia is one of the most common cancers in the world, so emphasis on diagnostic techniques and the best treatments can provide better prognosis and survival for patients. In this paper, an automatic diagnosis recommender system for classifying leukemia based on cooperative game theory is presented. Throughout this research, we analyze flow cytometry data toward the classification of leukemia into eight classes. We work on a real data set of different types of leukemia collected at the Iran Blood Transfusion Organization (IBTO). In total, the data set contains 400 samples taken from human leukemic bone marrow. This study applies cooperative game theory to classification according to the different weights assigned to the markers. The proposed method is versatile, as there are no constraints on what the input or output represent; this means that it can be used to classify a population according to their contributions, and it applies equally to other groups of data. The experimental results show an accuracy rate of 93.12% for classification, compared to 90.16% for a decision tree (C4.5). The results demonstrate that cooperative game theory is very promising for direct use in the classification of leukemia as part of an active medical decision support system for the interpretation of flow cytometry readouts.
    This system could assist clinical hematologists in properly recognizing different kinds of leukemia by preparing suggestions, and this could improve the treatment
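The abstract does not name the solution concept used to weight the markers; assuming the standard Shapley value from cooperative game theory, the weighting could be sketched as below. The two-marker characteristic function is a hypothetical toy, not the paper's data.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values by averaging each player's marginal
    contribution over all join orders (fine for a handful of markers)."""
    phi = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = frozenset()
        for p in order:
            phi[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    return {p: v / len(perms) for p, v in phi.items()}

# Hypothetical 2-marker game: each marker alone classifies 50% / 60%
# of cases correctly, and together 90% (i.e. they act synergistically).
payoffs = {frozenset(): 0.0, frozenset({"m1"}): 0.5,
           frozenset({"m2"}): 0.6, frozenset({"m1", "m2"}): 0.9}
w = shapley_values(["m1", "m2"], lambda s: payoffs[frozenset(s)])
print(w)  # m1 -> 0.4, m2 -> 0.5 (up to float rounding)
```

The resulting weights credit each marker with its average marginal contribution, which is one principled way to turn coalition payoffs into per-marker classification weights.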

  12. Content Based Image Retrieval : Classification Using Neural Networks

    Directory of Open Access Journals (Sweden)

    Shereena V.B

    2014-10-01

    Full Text Available In a content-based image retrieval (CBIR) system, the main issue is to extract the image features that effectively represent the image contents in a database. Such an extraction requires a detailed evaluation of the retrieval performance of image features. This paper presents a review of fundamental aspects of content based image retrieval, including feature extraction of color and texture features. Commonly used color features, including color moments, color histogram and color correlogram, and Gabor texture are compared. The paper reviews the increase in efficiency of image retrieval when the color and texture features are combined. The similarity measures, based on which matches are made and images are retrieved, are also discussed. For effective indexing and fast searching of images based on visual features, neural network based pattern learning can be used to achieve effective classification.
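The color-moment features reviewed above can be sketched as the first three moments (mean, standard deviation, skewness) per channel, giving a nine-dimensional descriptor for an RGB image:

```python
import numpy as np

def color_moments(img):
    """First three color moments (mean, std, skewness) per channel:
    a compact CBIR color descriptor. `img` is an H x W x 3 array."""
    img = np.asarray(img, dtype=float)
    feats = []
    for c in range(img.shape[2]):
        ch = img[:, :, c].ravel()
        mu = ch.mean()
        sigma = ch.std()
        skew = np.cbrt(((ch - mu) ** 3).mean())  # cube root keeps units
        feats.extend([mu, sigma, skew])
    return np.array(feats)

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(8, 8, 3))   # toy random "image"
print(color_moments(img).shape)  # (9,)
```

Retrieval would then rank database images by a similarity measure (e.g. weighted L1 or L2 distance) between such descriptors, optionally concatenated with texture features.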

  14. A minimum spanning forest based classification method for dedicated breast CT images

    Energy Technology Data Exchange (ETDEWEB)

    Pike, Robert [Department of Radiology and Imaging Sciences, Emory University School of Medicine, Atlanta, Georgia 30329 (United States); Sechopoulos, Ioannis [Department of Radiology and Imaging Sciences, Emory University School of Medicine, Atlanta, Georgia 30329 and Winship Cancer Institute of Emory University, Atlanta, Georgia 30322 (United States); Fei, Baowei, E-mail: bfei@emory.edu [Department of Radiology and Imaging Sciences, Emory University School of Medicine, Atlanta, Georgia 30329 (United States); Department of Biomedical Engineering, Emory University and Georgia Institute of Technology, Atlanta, Georgia 30322 (United States); Department of Mathematics and Computer Science, Emory University, Atlanta, Georgia 30322 (United States); Winship Cancer Institute of Emory University, Atlanta, Georgia 30322 (United States)

    2015-11-15

    Purpose: To develop and test an automated algorithm to classify different types of tissue in dedicated breast CT images. Methods: Images of a single breast of five different patients were acquired with a dedicated breast CT clinical prototype. The breast CT images were processed by a multiscale bilateral filter to reduce noise while keeping edge information and were corrected to overcome cupping artifacts. As skin and glandular tissue have similar CT values on breast CT images, morphologic processing is used to identify the skin based on its position information. A support vector machine (SVM) is trained and the resulting model used to create a pixelwise classification map of fat and glandular tissue. By combining the results of the skin mask with the SVM results, the breast tissue is classified as skin, fat, and glandular tissue. This map is then used to identify markers for a minimum spanning forest that is grown to segment the image using spatial and intensity information. To evaluate the authors’ classification method, they use DICE overlap ratios to compare the results of the automated classification to those obtained by manual segmentation on five patient images. Results: Comparison between the automatic and the manual segmentation shows that the minimum spanning forest based classification method was able to successfully classify dedicated breast CT images with average DICE ratios of 96.9%, 89.8%, and 89.5% for fat, glandular, and skin tissue, respectively. Conclusions: A 2D minimum spanning forest based classification method was proposed and evaluated for classifying the fat, skin, and glandular tissue in dedicated breast CT images. The classification method can be used for dense breast tissue quantification, radiation dose assessment, and other applications in breast imaging.
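
    The pixelwise SVM step can be sketched with scikit-learn. The intensity distributions below are invented for illustration, and the authors' full pipeline (bilateral filtering, cupping correction, skin masking and the minimum-spanning-forest refinement) is not reproduced.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 1-D intensity distributions for two tissue types
rng = np.random.default_rng(0)
fat = rng.normal(-100, 15, 400)      # invented intensity statistics
gland = rng.normal(40, 15, 400)
X = np.concatenate([fat, gland]).reshape(-1, 1)
y = np.array([0] * 400 + [1] * 400)  # 0 = fat, 1 = glandular

# Train the SVM and classify the pixels of a new "scene"
svm = SVC(kernel="rbf").fit(X, y)
tissue_map = svm.predict(rng.normal(-100, 15, 50).reshape(-1, 1))
fat_fraction = (tissue_map == 0).mean()
```

    In the paper this map is not the final answer: it only seeds markers from which the minimum spanning forest is grown using spatial and intensity information.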

  15. Risk Classification Model for Design and Build Projects

    Directory of Open Access Journals (Sweden)

    O. E. Ogunsanmi

    2011-07-01

    Full Text Available The purpose of this paper is to investigate whether the various risk sources in Design and Build projects can be classified into three risk groups of cost, time and quality using the discriminant analysis technique. A literature search was undertaken to review issues of risk sources, classification of the identified risks into a risk structure, management of risks and effects of risks, all on Design and Build projects, as well as concepts of discriminant analysis as a statistical technique. This literature review was undertaken through the use of the internet, published papers, journal articles and other published reports on risks in Design and Build projects. A research questionnaire was further designed to collect research information. This research study is a survey that utilized a cross-sectional design to capture the primary data. The data for the survey were collected in Nigeria. In all, 40 questionnaires were sent to various respondents, including Architects, Engineers, Quantity Surveyors and Builders who had used the Design and Build procurement method for their recently completed projects. Responses from the retrieved questionnaires, which measured the impact of risks on Design and Build, were analyzed using the discriminant analysis technique through the SPSS software package to build two discriminant models for classifying risks into cost, time and quality risk groups. Results of the study indicate that time overrun and poor quality are the two factors that discriminate between the cost, time and quality related risk groups; these two discriminant functions explain the variation between the risk groups. All the discriminating variables of cost overrun, time overrun and poor quality demonstrate some relationship with the two discriminant functions. The two discriminant models built can classify risks in Design and Build projects into risk groups of cost, time and quality, with a 72% classification success rate.
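
    Linear discriminant analysis of this kind is available off the shelf. A minimal scikit-learn sketch with invented impact scores (not the survey data) shows why three risk groups yield exactly two discriminant functions:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Invented feature columns: cost overrun, time overrun, poor quality
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([4, 1, 1], 0.5, size=(30, 3)),   # cost-related risks
    rng.normal([1, 4, 1], 0.5, size=(30, 3)),   # time-related risks
    rng.normal([1, 1, 4], 0.5, size=(30, 3)),   # quality-related risks
])
y = np.repeat(["cost", "time", "quality"], 30)

lda = LinearDiscriminantAnalysis().fit(X, y)
# with 3 groups, at most n_classes - 1 = 2 discriminant functions exist
n_functions = lda.scalings_.shape[1]
accuracy = lda.score(X, y)
```

    The two columns of `scalings_` play the role of the paper's two discriminant functions separating the three risk groups.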

  16. Active Build-Model Random Forest Method for Network Traffic Classification

    Directory of Open Access Journals (Sweden)

    Alhamza Munther

    2014-05-01

    Full Text Available Network traffic classification continues to be an interesting subject among numerous networking communities. It benefits several areas, such as network security, network management, anomaly detection, and quality-of-service. In this paper, we propose a supervised machine learning method that efficiently classifies different types of applications using the Active Build-Model Random Forest (ABRF) method. This method constructs a new build model for the original Random Forest (RF) method to decrease processing time. The build model includes only the active trees (i.e., trees with high accuracy), whereas the passive trees are excluded from the forest; excluding them has no negative effect on classification accuracy. Results show that the ABRF method decreases processing time by up to 37.5% compared with the original RF method. Our model has an overall accuracy of 98.66% based on the benchmark dataset considered in this paper.
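
    The "active tree" idea, scoring each tree and keeping only the accurate ones, can be sketched with scikit-learn's RandomForestClassifier. The median pruning threshold and the majority vote below are assumptions for illustration, not the paper's exact ABRF procedure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Score each tree on a validation split; keep only the accurate ones
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
tree_acc = [t.score(X_val, y_val) for t in rf.estimators_]
threshold = np.median(tree_acc)          # assumed pruning criterion
active = [t for t, a in zip(rf.estimators_, tree_acc) if a >= threshold]

def predict_active(trees, X):
    """Majority vote over the retained (active) trees only."""
    votes = np.stack([t.predict(X) for t in trees])
    return (votes.mean(axis=0) >= 0.5).astype(int)

pred = predict_active(active, X_val)
```

    Fewer trees means proportionally fewer predictions per sample, which is where the reported processing-time reduction comes from.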

  17. Using decision models to decompose anxiety-related bias in threat classification.

    Science.gov (United States)

    White, Corey N; Skokin, Kimberly; Carlos, Brandon; Weaver, Alexandria

    2016-03-01

    Individuals with high levels of anxiety show preferential processing of threatening information, and this cognitive bias is thought to be an integral component of anxiety disorders. In threat classification tasks, this bias manifests as high-anxiety participants being more likely to classify stimuli as threatening than their low-anxiety counterparts. However, it is unclear which cognitive mechanisms drive this bias in threat classification. To better understand this phenomenon, threat classification data were analyzed with 2 decision models: a signal detection model and a drift-diffusion model. Signal detection models can dissociate measures of discriminability and bias, and diffusion models can further dissociate bias due to response preparation from bias due to stimulus evaluation. Individuals in the study completed a trait anxiety measure and classified threatening and neutral words based on whether they deemed them threatening. Signal detection analysis showed that high-anxiety participants had a bias driven by a weaker threat criterion than low-anxiety participants, but no differences in discriminability. Drift-diffusion analysis further decomposed the threat bias to show that it is driven by both an expectation bias that the threat response was more likely to be correct, and a stimulus bias driven by a weaker criterion for evaluating the stimuli under consideration. These model-based analyses provide valuable insight and show that multiple cognitive mechanisms underlie differential threat processing in anxiety. Implications for theories of anxiety are discussed.
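
    The signal detection decomposition into discriminability (d') and criterion (c) is a short computation. The hit and false-alarm rates below are hypothetical, chosen so the two groups differ in criterion but not in discriminability, mirroring the pattern the study reports.

```python
from scipy.stats import norm

def sdt_measures(hit_rate, fa_rate):
    """d' (discriminability) and criterion c from hit/false-alarm rates."""
    z_hit, z_fa = norm.ppf(hit_rate), norm.ppf(fa_rate)
    d_prime = z_hit - z_fa
    criterion = -0.5 * (z_hit + z_fa)   # negative = liberal "threat" bias
    return d_prime, criterion

# Hypothetical groups: equal d', but the high-anxiety group responds
# "threat" more readily (lower, i.e. more liberal, criterion).
d_low,  c_low  = sdt_measures(hit_rate=0.80, fa_rate=0.20)
d_high, c_high = sdt_measures(hit_rate=0.90, fa_rate=0.35)
```

    A drift-diffusion fit would further split this bias into a starting-point (expectation) component and a drift-criterion (stimulus evaluation) component, which is the paper's main contribution.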

  18. Application of Bayesian Classification to Content-Based Data Management

    Science.gov (United States)

    Lynnes, Christopher; Berrick, S.; Gopalan, A.; Hua, X.; Shen, S.; Smith, P.; Yang, K-Y.; Wheeler, K.; Curry, C.

    2004-01-01

    The high volume of Earth Observing System data has proven to be challenging to manage for data centers and users alike. At the Goddard Earth Sciences Distributed Active Archive Center (GES DAAC), about 1 TB of new data are archived each day. Distribution to users is also about 1 TB/day. A substantial portion of this distribution is MODIS calibrated radiance data, which has a wide variety of uses. However, much of the data is not useful for a particular user's needs: for example, ocean color users typically need oceanic pixels that are free of cloud and sun-glint. The GES DAAC is using a simple Bayesian classification scheme to rapidly classify each pixel in the scene in order to support several experimental content-based data services for near-real-time MODIS calibrated radiance products (from Direct Readout stations). Content-based subsetting would allow distribution of, say, only clear pixels to the user if desired. Content-based subscriptions would distribute data to users only when they fit the user's usability criteria in their area of interest within the scene. Content-based cache management would retain more useful data on disk for easy online access. The classification may even be exploited in an automated quality assessment of the geolocation product. Though initially to be demonstrated at the GES DAAC, these techniques have applicability in other resource-limited environments, such as spaceborne data systems.
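
    A minimal stand-in for such a Bayesian pixel classifier, using Gaussian naive Bayes on invented three-band radiance statistics (not actual MODIS values):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy stand-in for calibrated radiances: 3-band pixels labelled
# clear-ocean vs cloud (band means and spreads are invented).
rng = np.random.default_rng(1)
ocean = rng.normal([0.02, 0.05, 0.30], 0.02, size=(500, 3))
cloud = rng.normal([0.60, 0.55, 0.50], 0.05, size=(500, 3))
X = np.vstack([ocean, cloud])
y = np.array([0] * 500 + [1] * 500)   # 0 = ocean, 1 = cloud

clf = GaussianNB().fit(X, y)
scene = rng.normal([0.02, 0.05, 0.30], 0.02, size=(100, 3))  # clear pixels
labels = clf.predict(scene)
clear_fraction = (labels == 0).mean()
```

    A content-based subsetting service would then ship only the pixels whose predicted label matches the user's criteria (e.g. clear ocean pixels for ocean-color users).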

  19. Concept Association and Hierarchical Hamming Clustering Model in Text Classification

    Institute of Scientific and Technical Information of China (English)

    Su Gui-yang; Li Jian-hua; Ma Ying-hua; Li Sheng-hong; Yin Zhong-hang

    2004-01-01

    We propose two models in this paper. The concept association model is put forward to obtain the co-occurrence relationships among keywords in documents, and the hierarchical Hamming clustering model is used to reduce the dimensionality of the category feature vector space, which addresses the extremely high dimensionality of the documents' feature space. Experimental results indicate that the concept association model obtains the co-occurrence relations among keywords in documents, which effectively improves the recall of the classification system. The hierarchical Hamming clustering model reduces the dimensionality of the category feature vector efficiently; the size of the reduced vector space is only about 10% of the original dimensionality.

  20. Biterm topic model-based land use classification of moderate-resolution remote sensing images

    Institute of Scientific and Technical Information of China (English)

    邵华; 李杨; 丁远; 刘凤臣

    2016-01-01

    Land Use/Land Cover automatic interpretation based on remote sensing data is a key problem in many relevant fields. Although a large number of image classification algorithms have been developed, most of them can hardly meet application requirements. Probabilistic topic models, represented by the latent Dirichlet allocation (LDA) model, have shown great success in natural language processing and image processing, as they can effectively bridge the gap between low-level features and high-level semantics. In recent years they have also been introduced into the remote sensing image analysis field, although most of the research has focused on high-resolution remote sensing images. Nonetheless, moderate-resolution remote sensing data is one of the main sources for Land Use/Land Cover automatic interpretation. This study analyzed the problems faced by general probabilistic topic models when image spatial resolution is reduced, and, drawing on the ability of the biterm topic model (BTM) to perform inference on word-sparse documents, introduced BTM into the classification of moderate-resolution remote sensing images, proposing the use of spatially adjacent visual word pairs as the model's observed data. Experimental results show that the BTM model outperforms the LDA model in classification, and that using spatially adjacent visual word pairs achieves higher classification accuracy with less observed data than the standard BTM model.

  1. Object-based Dimensionality Reduction in Land Surface Phenology Classification

    Directory of Open Access Journals (Sweden)

    Brian E. Bunker

    2016-11-01

    Full Text Available Unsupervised classification or clustering of multi-decadal land surface phenology provides a spatio-temporal synopsis of natural and agricultural vegetation response to environmental variability and anthropogenic activities. Notwithstanding the detailed temporal information available in calibrated bi-monthly normalized difference vegetation index (NDVI) and comparable time series, typical pre-classification workflows average a pixel’s bi-monthly index within the larger multi-decadal time series. While this process is one practical way to reduce the dimensionality of time series with many hundreds of image epochs, it effectively dampens temporal variation from both intra- and inter-annual observations related to land surface phenology. Through a novel application of object-based segmentation aimed at spatial (not temporal) dimensionality reduction, all 294 image epochs from a Moderate Resolution Imaging Spectroradiometer (MODIS) bi-monthly NDVI time series covering the northern Fertile Crescent were retained (in homogeneous landscape units) as unsupervised classification inputs. Given the inherent challenges of in situ or manual image interpretation of land surface phenology classes, a cluster validation approach based on transformed divergence enabled comparison between traditional and novel techniques. Improved intra-annual contrast was clearly manifest in rain-fed agriculture and inter-annual trajectories showed increased cluster cohesion, reducing the overall number of classes identified in the Fertile Crescent study area from 24 to 10. Given careful segmentation parameters, this spatial dimensionality reduction technique augments the value of unsupervised learning to generate homogeneous land surface phenology units. By combining recent scalable computational approaches to image segmentation, future work can pursue new global land surface phenology products based on the high temporal resolution signatures of vegetation index time series.

  2. Coordinate Descent Based Hierarchical Interactive Lasso Penalized Logistic Regression and Its Application to Classification Problems

    Directory of Open Access Journals (Sweden)

    Jin-Jia Wang

    2014-01-01

    Full Text Available We present the hierarchical interactive lasso penalized logistic regression using the coordinate descent algorithm, based on hierarchy theory and variable interactions. We define the interaction model based on geometric algebra and hierarchical constraint conditions, and then use the coordinate descent algorithm to solve for the coefficients of the hierarchical interactive lasso model. We provide the results of experiments based on UCI datasets, the Madelon dataset from NIPS2003, and daily activities of the elderly. The experimental results show that the variable interactions and hierarchy contribute significantly to the classification. The hierarchical interactive lasso has the advantages of both the lasso and the interactive lasso.
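
    Leaving aside the hierarchy constraint, the core ingredients, lasso-penalized logistic regression over interaction terms, can be sketched with scikit-learn. This is a simplified stand-in, not the authors' coordinate descent solver, and the hierarchy condition is not enforced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
# Augment the 6 main effects with all 15 pairwise interaction terms
X_int = PolynomialFeatures(degree=2, interaction_only=True,
                           include_bias=False).fit_transform(X)

# L1 (lasso) penalty drives some interaction coefficients to zero
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
clf.fit(X_int, y)
n_selected = int((clf.coef_ != 0).sum())
```

    The hierarchical variant additionally requires that an interaction term can only be selected when its parent main effects are, which is what the paper's constraint conditions encode.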

  3. GSM-MRF based classification approach for real-time moving object detection

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    Statistical and contextual information are typically used to detect moving regions in image sequences for a fixed camera. In this paper, we propose a fast and stable linear discriminant approach based on Gaussian Single Model (GSM) and Markov Random Field (MRF). The performance of GSM is analyzed first, and then two main improvements corresponding to the drawbacks of GSM are proposed: the latest filtered data based update scheme of the background model and the linear classification judgment rule based on spatial-temporal feature specified by MRF. Experimental results show that the proposed method runs more rapidly and accurately when compared with other methods.
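
    A single-Gaussian background model with a running update restricted to background pixels can be sketched in NumPy. The learning rate, initial variance and threshold multiplier below are assumptions, and the MRF spatial term of the paper is omitted.

```python
import numpy as np

class GaussianBackground:
    """Per-pixel single-Gaussian background model: pixels whose squared
    deviation exceeds k^2 * variance are flagged as foreground, and only
    background pixels are updated with the latest data (simplified)."""
    def __init__(self, first_frame, alpha=0.05, k=2.5):
        self.mean = first_frame.astype(float)
        self.var = np.full_like(self.mean, 25.0)   # assumed initial variance
        self.alpha, self.k = alpha, k

    def apply(self, frame):
        diff = frame.astype(float) - self.mean
        foreground = diff ** 2 > (self.k ** 2) * self.var
        bg = ~foreground                            # update background only
        self.mean[bg] += self.alpha * diff[bg]
        self.var[bg] += self.alpha * (diff[bg] ** 2 - self.var[bg])
        return foreground

rng = np.random.default_rng(0)
frames = [rng.normal(100, 2, (8, 8)) for _ in range(20)]
model = GaussianBackground(frames[0])
for f in frames[1:]:
    model.apply(f)

moving = frames[-1].copy()
moving[2:4, 2:4] += 60                  # inject a bright moving object
mask = model.apply(moving)
```

    The MRF step in the paper then smooths such a mask using spatial-temporal neighborhood information, suppressing isolated noise pixels.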

  4. A Categorical Framework for Model Classification in the Geosciences

    Science.gov (United States)

    Hauhs, Michael; Trancón y Widemann, Baltasar; Lange, Holger

    2016-04-01

    Models have a mixed record of success in the geosciences. In meteorology, model development and implementation has been among the first and most successful examples of triggering computer technology in science. On the other hand, notorious problems such as the 'equifinality issue' in hydrology lead to a rather mixed reputation of models in other areas. The most successful models in geosciences are applications of dynamic systems theory to non-living systems or phenomena. Thus, we start from the hypothesis that the success of model applications relates to the influence of life on the phenomenon under study. We thus focus on the (formal) representation of life in models. The aim is to investigate whether disappointment in model performance is due to system properties such as heterogeneity and historicity of ecosystems, or rather reflects an abstraction and formalisation problem at a fundamental level. As a formal framework for this investigation, we use category theory as applied in computer science to specify behaviour at an interface. Its methods have been developed for translating and comparing formal structures among different application areas and seems highly suited for a classification of the current "model zoo" in the geosciences. The approach is rather abstract, with a high degree of generality but a low level of expressibility. Here, category theory will be employed to check the consistency of assumptions about life in different models. It will be shown that it is sufficient to distinguish just four logical cases to check for consistency of model content. All four cases can be formalised as variants of coalgebra-algebra homomorphisms. It can be demonstrated that transitions between the four variants affect the relevant observations (time series or spatial maps), the formalisms used (equations, decision trees) and the test criteria of success (prediction, classification) of the resulting model types. We will present examples from hydrology and ecology in

  5. Feature-Opinion Pairs Classification Based on Dependency Relations and Maximum Entropy Model

    Institute of Scientific and Technical Information of China (English)

    张磊; 李珊; 彭舰; 陈黎; 黎红友

    2014-01-01

    In recent years, feature-opinion pair classification of Chinese product reviews has become one of the most important research areas in Web data mining. In this paper, five types of Chinese dependency relationships for product reviews are derived from traditional English dependency grammar. A maximum entropy model is used to predict the relations between opinion words and product features. To train the model, a set of feature symbol combinations was designed based on the Chinese dependency relations. Experimental results show that the recall and F-score of our approach reach 78.68% and 75.36% respectively, clearly superior to Hu's adjacency-based method and Popescu's pattern-based method.

  6. Generalization performance of graph-based semisupervised classification

    Institute of Scientific and Technical Information of China (English)

    2009-01-01

    Semi-supervised learning has been of growing interest over the past few years and many methods have been proposed. Although various algorithms are provided to implement semi-supervised learning, there are still gaps in our understanding of the dependence of generalization error on the numbers of labeled and unlabeled data. In this paper, we consider a graph-based semi-supervised classification algorithm and establish its generalization error bounds. Our results show the close relations between the generalization performance and the structural invariants of the data graph.

  7. Hydrophobicity classification of polymeric materials based on fractal dimension

    Directory of Open Access Journals (Sweden)

    Daniel Thomazini

    2008-12-01

    Full Text Available This study proposes a new method to obtain the hydrophobicity classification (HC) of high voltage polymer insulators. In the proposed method, the HC was analyzed by fractal dimension (fd), and its processing time was evaluated with the goal of application on mobile devices. Texture images were created from spraying solutions produced from mixtures of isopropyl alcohol and distilled water in proportions ranging from 0 to 100% volume of alcohol (%AIA). Based on these solutions, the contact angles of the drops were measured and the textures were used as patterns for fractal dimension calculations.
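
    The fractal dimension itself can be estimated by standard box counting; the paper does not specify its exact estimator, so the following is a generic sketch on a binary texture image.

```python
import numpy as np

def box_counting_dimension(binary_img):
    """Estimate fractal dimension by box counting: the slope of
    log N(s) versus log(1/s) over dyadic box sizes s."""
    n = binary_img.shape[0]                     # assume square, power of 2
    sizes, counts = [], []
    s = n // 2
    while s >= 1:
        # count boxes of side s containing at least one foreground pixel
        count = 0
        for i in range(0, n, s):
            for j in range(0, n, s):
                if binary_img[i:i + s, j:j + s].any():
                    count += 1
        sizes.append(s)
        counts.append(count)
        s //= 2
    slope = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)[0]
    return slope

# Sanity check: a completely filled square has dimension 2
img = np.ones((64, 64), dtype=bool)
fd_square = box_counting_dimension(img)
```

    Droplet textures from different %AIA solutions yield intermediate fd values, which is what the classification is based on.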

  8. Radar Image Texture Classification based on Gabor Filter Bank

    OpenAIRE

    Mbainaibeye Jérôme; Olfa Marrakchi Charfi

    2014-01-01

    The aim of this paper is to design and develop a filter bank for the detection and classification of radar image textures with 4.6 m resolution obtained by airborne Synthetic Aperture Radar. The textures of this kind of image are highly correlated and contain forms with random disposition. The design and development of the filter bank are based on the Gabor filter. We have elaborated a set of filters applied to each texture feature, allowing its identification and enhancement in comparison w...
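
    A small Gabor bank can be built directly from the standard real Gabor formula. The kernel size, wavelength and orientations below are illustrative choices, not the paper's tuned parameters.

```python
import numpy as np

def gabor_kernel(ksize, sigma, theta, lam, gamma=0.5, psi=0.0):
    """Real Gabor kernel: a cosine carrier modulated by a Gaussian
    envelope, oriented at angle theta (standard formulation)."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    return g * np.cos(2 * np.pi * xr / lam + psi)

# Bank of 4 orientations for texture feature extraction
bank = [gabor_kernel(31, sigma=4.0, theta=t, lam=10.0)
        for t in np.linspace(0, np.pi, 4, endpoint=False)]

def texture_features(image, bank):
    """Mean absolute filter response per kernel (FFT convolution)."""
    feats = []
    for k in bank:
        resp = np.real(np.fft.ifft2(np.fft.fft2(image) *
                                    np.fft.fft2(k, s=image.shape)))
        feats.append(np.abs(resp).mean())
    return np.array(feats)

rng = np.random.default_rng(0)
feats = texture_features(rng.normal(size=(64, 64)), bank)
```

    Each texture class then gets a characteristic response vector across the bank, which is what enables its identification and enhancement.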

  9. Likelihood ratio model for classification of forensic evidence

    Energy Technology Data Exchange (ETDEWEB)

    Zadora, G., E-mail: gzadora@ies.krakow.pl [Institute of Forensic Research, Westerplatte 9, 31-033 Krakow (Poland); Neocleous, T., E-mail: tereza@stats.gla.ac.uk [University of Glasgow, Department of Statistics, 15 University Gardens, Glasgow G12 8QW (United Kingdom)

    2009-05-29

    One of the problems in the analysis of forensic evidence such as glass fragments is the determination of their use-type category, e.g., does a glass fragment originate from an unknown window or container? Very small glass fragments arise during various accidents and criminal offences, and can be carried on the clothes, shoes and hair of participants. It is therefore necessary to obtain information on their physicochemical composition in order to solve the classification problem. Scanning Electron Microscopy coupled with an Energy Dispersive X-ray Spectrometer and the Glass Refractive Index Measurement method are routinely used in many forensic institutes for the investigation of glass. A natural form of glass evidence evaluation for forensic purposes is the likelihood ratio, LR = p(E|H{sub 1})/p(E|H{sub 2}). The main aim of this paper was to study the performance of LR models for glass object classification which considered one or two sources of data variability, i.e., between-glass-object variability and/or within-glass-object variability. Within the proposed model a multivariate kernel density approach was adopted for modelling the between-object distribution, and a multivariate normal distribution was adopted for modelling within-object distributions. Moreover, a graphical method of estimating the dependence structure was employed to reduce the highly multivariate problem to several lower-dimensional problems. The analysis showed that the best likelihood model was the one which includes information about both between- and within-object variability, with variables derived from elemental compositions measured by SEM-EDX, and refractive index values determined before (RI{sub b}) and after (RI{sub a}) the annealing process, in the form of dRI = log{sub 10}|RI{sub a} - RI{sub b}|. This model gave better results than the model with only between-object variability considered. 
In addition, when dRI and variables derived from elemental compositions were used, this
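
    In its simplest univariate form, the likelihood ratio reduces to a ratio of two fitted densities. The sketch below uses normal densities with invented refractive index statistics; the paper's actual model uses multivariate kernel densities and two levels of variability.

```python
from scipy.stats import norm

# Hypothetical refractive index statistics for the two hypotheses
# H1 = window glass, H2 = container glass (all values invented)
window_mu, window_sd = 1.5180, 0.0015
container_mu, container_sd = 1.5230, 0.0020

def likelihood_ratio(e):
    """LR = p(E|H1) / p(E|H2): values > 1 support the window hypothesis."""
    return (norm.pdf(e, window_mu, window_sd) /
            norm.pdf(e, container_mu, container_sd))

lr = likelihood_ratio(1.5185)   # a fragment close to the window mean
```

    The same ratio structure carries over when E is a multivariate vector of SEM-EDX compositions plus dRI; only the densities become harder to estimate.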

  10. Development of a definition, classification system, and model for cultural geology

    Science.gov (United States)

    Mitchell, Lloyd W., III

    The concept for this study is based upon a personal interest by the author, an American Indian, in promoting cultural perspectives in undergraduate college teaching and learning environments. Most academicians recognize that merged fields can enhance undergraduate curricula. However, conflict may occur when instructors attempt to merge social science fields such as history or philosophy with geoscience fields such as mining and geomorphology. For example, ideologies of Earth structures derived from scientific methodologies may conflict with historical and spiritual understandings of Earth structures held by American Indians. Specifically, this study addresses the problem of how to combine cultural studies with the geosciences into a new merged academic discipline called cultural geology. This study further attempts to develop the merged field of cultural geology using an approach consisting of three research foci: a definition, a classification system, and a model. Literature reviews were conducted for all three foci. Additionally, to better understand merged fields, a literature review was conducted specifically for academic fields that merged social and physical sciences. Methodologies concentrated on the three research foci: definition, classification system, and model. The definition was derived via a two-step process. The first step, developing keyword hierarchical ranking structures, was followed by creating and analyzing semantic word meaning lists. The classification system was developed by reviewing 102 classification systems and incorporating selected components into a system framework. The cultural geology model was created also utilizing a two-step process. A literature review of scientific models was conducted. Then, the definition and classification system were incorporated into a model felt to reflect the realm of cultural geology. A course syllabus was then developed that incorporated the resulting definition, classification system, and model. 
This

  11. A Method for Data Classification Based on Discernibility Matrix and Discernibility Function

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    The choice of classification method influences classification efficiency. Attribute reduction based on the discernibility matrix and discernibility function in rough set theory can be used for data classification, so we put forward such a method. First, we use the discernibility matrix and discernibility function to delete superfluous attributes from the information system and obtain a necessary attribute set. Second, we delete superfluous attribute values and obtain decision rules. Finally, we classify data by means of these decision rules. Experiments show that data classification using this method is structurally simpler and improves classification efficiency.
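
    The reduction step can be sketched directly: build the discernibility matrix over a toy decision table (values invented) and search for the smallest attribute subset that satisfies the discernibility function.

```python
from itertools import combinations

# Toy decision table: condition attributes a, b, c plus a decision d
table = [
    # a  b  c   d
    (0, 1, 0, 'no'),
    (1, 1, 0, 'yes'),
    (0, 0, 1, 'no'),
    (1, 0, 1, 'yes'),
]
attrs = ['a', 'b', 'c']

def discernibility_matrix(table):
    """Entry (i, j): attributes whose values differ between objects
    i and j that also differ in decision."""
    m = {}
    for i, j in combinations(range(len(table)), 2):
        if table[i][-1] != table[j][-1]:
            m[(i, j)] = {attrs[k] for k in range(len(attrs))
                         if table[i][k] != table[j][k]}
    return m

def is_reduct(subset, matrix):
    """A subset satisfies the discernibility function iff it
    intersects every matrix entry."""
    return all(subset & entry for entry in matrix.values())

matrix = discernibility_matrix(table)
# smallest attribute subset that still discerns all decision pairs
reduct = next(set(c) for r in range(1, len(attrs) + 1)
              for c in combinations(attrs, r)
              if is_reduct(set(c), matrix))
```

    Decision rules are then induced over the reduct only, which is why the resulting classifier is structurally simpler.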

  12. Task Classification Based Energy-Aware Consolidation in Clouds

    Directory of Open Access Journals (Sweden)

    HeeSeok Choi

    2016-01-01

    Full Text Available We consider a cloud data center in which the service provider supplies virtual machines (VMs) on hosts or physical machines (PMs) to its subscribers for computation in an on-demand fashion. For this cloud data center, we propose a task consolidation algorithm based on task classification (i.e., computation-intensive or data-intensive) and resource utilization (e.g., CPU and RAM). Furthermore, we design a VM consolidation algorithm to balance task execution time and energy consumption without violating a predefined service level agreement (SLA). Unlike existing research on VM consolidation or scheduling that applies no threshold or a single threshold, we focus on a double threshold (upper and lower) scheme for VM consolidation. More specifically, when a host operates with resource utilization below the lower threshold, all the VMs on the host are scheduled to be migrated to other hosts and the host is then powered down, while when a host operates with resource utilization above the upper threshold, a VM is migrated to avoid using 100% of resource utilization. Based on experimental performance evaluations with real-world traces, we show that our task classification based energy-aware consolidation algorithm (TCEA) achieves a significant energy reduction without incurring predefined SLA violations.
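
    The double-threshold decision rule itself is compact. A sketch with assumed thresholds of 0.2 and 0.8 (the abstract does not fix the values, and the choice of which VM to migrate is left abstract here):

```python
def consolidation_actions(hosts, lower=0.2, upper=0.8):
    """Double-threshold VM consolidation sketch: hosts below `lower`
    are fully evacuated and powered down; hosts above `upper` shed
    one VM to avoid saturation. `hosts` maps host id -> utilization."""
    actions = {}
    for host, util in hosts.items():
        if util < lower:
            actions[host] = "migrate-all-then-power-down"
        elif util > upper:
            actions[host] = "migrate-one-vm"
        else:
            actions[host] = "keep"
    return actions

plan = consolidation_actions({"h1": 0.10, "h2": 0.55, "h3": 0.93})
```

    Powering down under-utilized hosts is where the energy saving comes from, while the upper threshold protects the SLA by keeping headroom on busy hosts.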

  13. Feature selection gait-based gender classification under different circumstances

    Science.gov (United States)

    Sabir, Azhin; Al-Jawad, Naseer; Jassim, Sabah

    2014-05-01

    This paper proposes gender classification based on human gait features and investigates the problem of two variations, clothing (wearing coats) and carrying a bag, in addition to the normal gait sequence. The feature vectors in the proposed system are constructed after applying the wavelet transform. Three different feature sets are proposed in this method. The first, spatio-temporal distances, deals with the distances between different parts of the human body (such as the feet, knees, hands, height and shoulders) during one gait cycle. The second and third feature sets are constructed from the approximation and non-approximation coefficients of the human body, respectively. To extract these two feature sets, we divided the human body into upper and lower parts based on the golden ratio proportion. In this paper, we adopt a statistical method for constructing the feature vector from the above sets. The dimension of the constructed feature vector is reduced using the Fisher score as a feature selection method to optimize discriminative significance. Finally, k-Nearest Neighbor is applied as the classification method. Experimental results demonstrate that our approach provides a more realistic scenario and relatively better performance compared with existing approaches.
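
    Fisher-score feature selection followed by k-Nearest Neighbor is straightforward to sketch. The synthetic features below are invented, not gait measurements; only the first two columns carry class signal.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def fisher_score(X, y):
    """Per-feature Fisher score: between-class scatter of class means
    over within-class scatter (higher = more discriminative)."""
    scores = np.empty(X.shape[1])
    classes = np.unique(y)
    for j in range(X.shape[1]):
        num, den = 0.0, 0.0
        for c in classes:
            xc = X[y == c, j]
            num += len(xc) * (xc.mean() - X[:, j].mean()) ** 2
            den += len(xc) * xc.var()
        scores[j] = num / den
    return scores

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 6))
X[y == 1, :2] += 2.0                      # informative features 0 and 1

scores = fisher_score(X, y)
top = np.argsort(scores)[::-1][:2]        # keep the 2 best features
knn = KNeighborsClassifier(n_neighbors=5).fit(X[:, top], y)
acc = knn.score(X[:, top], y)
```

    Pruning low-score features before kNN both reduces dimensionality and removes noise dimensions that would otherwise distort the distance metric.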

  14. Vertebral classification using localized pathology-related shape model

    Science.gov (United States)

    Zewail, R.; Elsafi, A.; Durdle, N.

    2008-03-01

    Radiographs of the spine are frequently examined for assessment of vertebral abnormalities. Features like osteophytes (bony growths at vertebral corners) and disc space narrowing are often used as visual evidence of osteoarthritis or degenerative joint disease. These symptoms result in remarkable changes in the shape of the vertebral body. Statistical analysis of anatomical structure has recently gained increased popularity within the medical imaging community, since it has the potential to enhance the automated diagnosis process. In this paper, we present a novel method for computer-assisted vertebral classification using a localized, pathology-related shape model. The new classification scheme is able to assess the condition of multiple vertebrae simultaneously, hence it is possible to directly classify the whole spine anatomy according to the condition of interest (anterior osteophytes). At the core of this method is a new localized shape model that uses concepts of sparsity, dimension reduction, and statistical independence to extract sets of localized modes of deformation specific to each of the vertebrae under investigation. By projecting the shapes onto any specific set of deformation modes (or basis), we obtain low-dimensional features that are most directly related to the pathology of the vertebra of interest. These features are then used as input to a support vector machine classifier to classify the vertebra under investigation as normal or abnormal. Experiments are conducted using contours from digital x-ray images of five vertebrae of the lumbar spine. The accuracy of the classification scheme is assessed using ROC curves. An average specificity of 96.8% is achieved with a sensitivity of 80%.

  15. Histotype-based prognostic classification of gastric cancer

    Institute of Scientific and Technical Information of China (English)

    Anna Maria Chiaravalli; Catherine Klersy; Alessandro Vanoli; Andrea Ferretti; Carlo Capella; Enrico Solcia

    2012-01-01

    AIM: To test the efficiency of a recently proposed histotype-based grading system in a consecutive series of gastric cancers. METHODS: Two hundred advanced gastric cancers operated upon in 1980-1987 and followed for a median of 159 mo were investigated on hematoxylin-eosin-stained sections to identify low-grade [muconodular, well differentiated tubular, diffuse desmoplastic and high lymphoid response (HLR)], high-grade (anaplastic and mucinous invasive) and intermediate-grade (ordinary cohesive, diffuse and mucinous) cancers, in parallel with a previously investigated series of 292 cases. In addition, immunohistochemical analyses for CD8, CD11 and HLA-DR antigens, pancytokeratin and podoplanin, as well as immunohistochemical and molecular tests for microsatellite DNA instability and in situ hybridization for the Epstein-Barr virus (EBV) EBER1 gene, were performed. Patient survival was assessed with death rates per 100 person-years and with Kaplan-Meier or Cox model estimates. RESULTS: Collectively, the four low-grade histotypes accounted for 22% and the two high-grade histotypes for 7% of the consecutive cancers investigated, while the remaining 71% of cases were intermediate-grade cancers, with highly significant, stage-independent survival differences among the three tumor grades (P = 0.004 for grade 1 vs 2 and P = 0.0019 for grade 2 vs grade 3), thus confirming the results in the original series. A combined analysis of 492 cases showed an improved prognostic value of histotype-based grading compared with the Lauren classification. In addition, it allowed better characterization of rare histotypes, particularly the three subsets of prognostically different mucinous neoplasms, of which 10 ordinary mucinous cancers showed stage-inclusive survival worse than that of 20 muconodular (P = 0.037) and better than that of 21 high-grade (P < 0.001) cases. Tumors with high-level microsatellite DNA instability (MSI-H) or EBV infection, together with a third subset negative for both conditions, formed the

  16. Optimization based tumor classification from microarray gene expression data.

    Directory of Open Access Journals (Sweden)

    Onur Dagliyan

    Full Text Available BACKGROUND: An important use of data obtained from microarray measurements is the classification of tumor types with respect to genes that are either up- or down-regulated in specific cancer types. A number of algorithms have been proposed to obtain such classifications. These algorithms usually require parameter optimization to obtain accurate results, depending on the type of data. Additionally, it is highly critical to find an optimal set of markers among those up- or down-regulated genes that can be clinically utilized to build assays for diagnosis or to follow the progression of specific cancer types. In this paper, we employ a mixed integer programming based classification algorithm named the hyper-box enclosure method (HBE) for the classification of some cancer types with a minimal set of predictor genes. This optimization-based method, which is a user-friendly and efficient classifier, may allow clinicians to diagnose and follow the progression of certain cancer types. METHODOLOGY/PRINCIPAL FINDINGS: We apply the HBE algorithm to some well-known data sets, such as leukemia, prostate cancer, diffuse large B-cell lymphoma (DLBCL), and small round blue cell tumors (SRBCT), to find predictor genes that can be utilized for diagnosis and prognosis in a robust manner with high accuracy. Our approach does not require any modification or parameter optimization for each data set. Additionally, the information gain attribute evaluator, relief attribute evaluator and correlation-based feature selection methods are employed for gene selection. The results are compared with those from other studies, and the biological roles of the selected genes in the corresponding cancer types are described. CONCLUSIONS/SIGNIFICANCE: The overall performance of our algorithm was better than that of the other algorithms reported in the literature and the classifiers found in the WEKA data-mining package. Since it does not require parameter optimization and performs consistently at a very high prediction rate on
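
The gene-selection step mentioned above (information gain attribute evaluation) can be sketched on synthetic expression data; binarizing each gene at its median is an assumption made here for illustration, not necessarily the evaluator's exact discretization:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label vector, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(feature, labels):
    """Information gain of the class label given a feature split at its median."""
    mask = feature > np.median(feature)
    h = entropy(labels)
    for part in (labels[mask], labels[~mask]):
        if len(part):
            h -= len(part) / len(labels) * entropy(part)
    return h

rng = np.random.default_rng(1)
n = 60
y = np.array([0] * 30 + [1] * 30)
genes = rng.normal(0, 1, (n, 5))
genes[y == 1, 0] += 3.0   # gene 0 is strongly up-regulated in class 1

gains = np.array([information_gain(genes[:, j], y) for j in range(5)])
best_gene = int(np.argmax(gains))
```

The strongly regulated gene yields an information gain close to one bit, while the noise genes score near zero, so ranking by gain surfaces the informative marker.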

  17. Scene classification of infrared images based on texture feature

    Science.gov (United States)

    Zhang, Xiao; Bai, Tingzhu; Shang, Fei

    2008-12-01

    Scene classification refers to assigning a physical scene to one of a set of predefined categories. Texture features provide an effective approach to classifying scenes. Texture can be considered to be repeating patterns of local variation of pixel intensities, and texture analysis is important in many applications of computer image analysis for classification or segmentation of images based on local spatial variations of intensity. Texture describes the structural information of images, so it provides data for classification beyond the spectrum. Infrared thermal imagers are now used in many different fields. Since infrared images of objects reflect their own thermal radiation, infrared images have some shortcomings: poor contrast between objects and background, blurred edges, heavy noise, and so on. Because of these shortcomings, it is difficult to extract the texture features of infrared images. In this paper we have developed an infrared image texture-feature-based algorithm to classify scenes in infrared images. This paper investigates texture extraction using the Gabor wavelet transform, which has excellent capability for analyzing local frequency and orientation content; Gabor wavelets are chosen for their biological relevance and technical properties. First, after introducing the Gabor wavelet transform and texture analysis methods, texture features are extracted from the infrared images by the Gabor wavelet transform, exploiting the multi-scale property of the Gabor filter. Second, we take the means and standard deviations at different scales and orientations as texture parameters. The last stage is classification of the scene texture parameters with the least squares support vector machine (LS-SVM) algorithm. SVM is based on the principle of structural risk minimization (SRM). Compared with SVM, LS-SVM has overcome the shortcoming of
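
A minimal sketch of the Gabor texture-feature stage, using a hand-built filter bank and the mean/std statistics the abstract describes; the kernel size, sigma, wavelength, and orientations are illustrative choices, not the paper's parameters:

```python
import numpy as np

def gabor_kernel(ksize, sigma, theta, wavelength):
    """Real part of a 2-D Gabor kernel: Gaussian envelope x oriented cosine wave."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

def gabor_features(image, thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Mean and std of the filter response, one pair per orientation."""
    feats = []
    for theta in thetas:
        k = gabor_kernel(15, sigma=3.0, theta=theta, wavelength=8.0)
        # 'valid' convolution via explicit sliding windows (no SciPy needed)
        win = np.lib.stride_tricks.sliding_window_view(image, k.shape)
        resp = (win * k).sum(axis=(-2, -1))
        feats += [resp.mean(), resp.std()]
    return np.array(feats)

# Toy "scene": horizontal stripes, which excite the theta = pi/2 filter
img = np.zeros((48, 48))
img[::8, :] = 1.0
f = gabor_features(img)
```

Because the stripe period matches the filter wavelength, the response variance for the vertically oscillating (theta = pi/2) filter far exceeds that of the theta = 0 filter, which is the kind of orientation-selective statistic fed to the LS-SVM.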

  18. HETEAC: The Aerosol Classification Model for EarthCARE

    Directory of Open Access Journals (Sweden)

    Wandinger Ulla

    2016-01-01

    Full Text Available We introduce the Hybrid End-To-End Aerosol Classification (HETEAC) model for the upcoming EarthCARE mission. The model serves as the common baseline for the development, evaluation, and implementation of EarthCARE algorithms. It shall ensure the consistency of the different aerosol products from the multi-instrument platform as well as facilitate the consistent specification of the broad-band optical properties necessary for the EarthCARE radiative closure efforts. The hybrid approach ensures a theoretical description of aerosol microphysics that is consistent with the optical properties of various aerosol types known from observations. The end-to-end model permits the uniform representation of aerosol types in terms of microphysical, optical and radiative properties.

  19. Optimal Non-Invasive Fault Classification Model for Packaged Ceramic Tile Quality Monitoring Using MMW Imaging

    Science.gov (United States)

    Agarwal, Smriti; Singh, Dharmendra

    2016-04-01

    Millimeter wave (MMW) frequency has emerged as an efficient tool for different stand-off imaging applications. In this paper, we have dealt with a novel MMW imaging application, i.e., non-invasive packaged-goods quality estimation for industrial quality monitoring. An active MMW imaging radar operating at 60 GHz was ingeniously designed for concealed fault estimation. Ceramic tiles covered with commonly used packaging cardboard were used as concealed targets for undercover fault classification. State-of-the-art computer vision feature extraction techniques, viz., the discrete Fourier transform (DFT), wavelet transform (WT), principal component analysis (PCA), gray-level co-occurrence texture (GLCM), and histogram of oriented gradients (HOG), were compared with respect to their capability to generate efficient and differentiable feature vectors for undercover target fault classification. An extensive number of experiments were performed with different ceramic tile fault configurations, viz., vertical crack, horizontal crack, random crack, and diagonal crack, along with non-faulty tiles. Further, an independent algorithm validation was done, demonstrating classification accuracies of 80, 86.67, 73.33, and 93.33 % for the DFT, WT, PCA, GLCM, and HOG feature-based artificial neural network (ANN) classifier models, respectively. The classification results show good capability of the HOG feature extraction technique for non-destructive quality inspection, with an appreciably low false-alarm rate compared to the other techniques. Thereby, a robust and optimal image-feature-based neural network classification model has been proposed for non-invasive, automatic fault monitoring for financially and commercially competent industrial growth.
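
The HOG-style feature idea can be illustrated with a global gradient-orientation histogram; this is a deliberately simplified stand-in (no cell/block normalization) applied to synthetic "crack" images, not the authors' 60 GHz imagery:

```python
import numpy as np

def orientation_histogram(image, bins=9):
    """HOG-style global descriptor: histogram of unsigned gradient
    orientations, weighted by gradient magnitude."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # fold to [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    s = hist.sum()
    return hist / s if s else hist

# Toy "tiles": a vertical crack vs a horizontal crack
tile_v = np.zeros((32, 32)); tile_v[:, 16] = 1.0
tile_h = np.zeros((32, 32)); tile_h[16, :] = 1.0
hv = orientation_histogram(tile_v)
hh = orientation_histogram(tile_h)
```

The vertical crack concentrates its gradient energy in the 0-degree bin and the horizontal crack in the 90-degree bin, so the two fault configurations yield clearly differentiable descriptors of the kind an ANN classifier can separate.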

  20. Web entity extraction based on entity attribute classification

    Science.gov (United States)

    Li, Chuan-Xi; Chen, Peng; Wang, Ru-Jing; Su, Ya-Ru

    2011-12-01

    Large amounts of entity data are continuously published on web pages. Extracting these entities automatically for further application is very significant. Rule-based entity extraction methods yield promising results; however, they are labor-intensive and hard to scale. This paper proposes a web entity extraction method based on entity attribute classification, which avoids manual annotation of samples. First, web pages are segmented into different blocks by the Vision-based Page Segmentation (VIPS) algorithm, and a binary LibSVM classifier is trained to retrieve the candidate blocks that contain the entity contents. Second, the candidate blocks are partitioned into candidate items, LibSVM classifiers are applied for attribute annotation of the items, and the annotation results are then aggregated into an entity. Results show that the proposed method performs well in extracting agricultural supply-and-demand entities from web pages.

  1. Purchase-oriented Classification Model of the Spare Parts of Agricultural Machinery

    Institute of Scientific and Technical Information of China (English)

    2011-01-01

    Based on the classification of spare parts and research results on the demand for spare parts, a three-dimensional classification model of agricultural machinery spare parts is established, which includes an application axis sorted by technical characteristics, a cost axis classified by the ABC method, and a demand axis classified by the demand for agricultural machinery spare parts. These dimension axes represent different factors, and the application of these factors in purchasing is analyzed. The guiding value of each dimension axis for spare-parts purchasing is summarized, and corresponding strategy instructions are put forward. The integrated application of these strategies through the model gives purchasing more realistic operational meaning. The application field of the three-dimensional model of spare parts is discussed, and directions for further research are pointed out.
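
The cost axis (ABC method) is a standard Pareto analysis of annual spend; a sketch with hypothetical cut-offs of 80% and 95% of cumulative cost:

```python
import numpy as np

def abc_classify(annual_cost, a_cut=0.8, b_cut=0.95):
    """Rank parts by annual cost; A = top items up to 80% of cumulative
    cost, B = next items up to 95%, C = the rest (classic ABC analysis)."""
    annual_cost = np.asarray(annual_cost, dtype=float)
    order = np.argsort(annual_cost)[::-1]            # most expensive first
    cum = np.cumsum(annual_cost[order]) / annual_cost.sum()
    labels = np.empty(len(annual_cost), dtype='<U1')
    for rank, idx in enumerate(order):
        labels[idx] = 'A' if cum[rank] <= a_cut else ('B' if cum[rank] <= b_cut else 'C')
    return labels

# Hypothetical annual costs for ten spare-part lines
costs = [5000, 300, 200, 150, 100, 2500, 80, 50, 1500, 20]
labels = abc_classify(costs)
```

On this toy data the two most expensive lines become class A, the next two class B, and the long tail class C, which is the typical few-items/most-cost Pareto pattern the model's cost axis encodes.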

  2. Polarimetric SAR Data for Urban Land Cover Classification Using Finite Mixture Model

    Science.gov (United States)

    Mahdianpari, Masoud; Akbari, Vahid; Mohammadimanesh, Fariba; Alioghli Fazel, Mohammad

    2013-04-01

    Image classification techniques play an important role in the automatic analysis of remote sensing data. This paper demonstrates the potential of polarimetric synthetic aperture radar (PolSAR) for urban land cover mapping using an unsupervised classification approach. Analysis of PolSAR images often shows that non-Gaussian models give a better representation of the scattering vector statistics. Hence, processing algorithms based on non-Gaussian statistics should improve performance compared to complex Gaussian distributions. Several distributions can be used to model SAR image texture with different spatial correlation properties and various degrees of inhomogeneity [1-3]. Statistical properties are widely used for image segmentation and land cover classification of PolSAR data. Pixel-based approaches cluster individual pixels through analysis of their statistical properties. Those methods work well on relatively coarse spatial resolution images, but classification results based on pixelwise analysis exhibit the salt-and-pepper effect of speckle in medium- and high-resolution applications such as urban area monitoring [4]. Therefore, the expected improvement of the classification results is hindered by the increase of textural differences within a class. In such situations, enhancement can be achieved by exploiting the contextual correlation among pixels with Markov random field (MRF) models [4, 5]. The potential of MRF models to retrieve spatial contextual information is desirable for improving the accuracy and reliability of image classification. Unsupervised contextual polarimetric SAR image segmentation is addressed by combining statistical modeling and spatial context within an MRF framework. We employ the stochastic expectation maximization (SEM) algorithm [6] to jointly perform clustering of the data and parameter estimation of the statistical distribution conditioned on each image cluster and the MRF model. This classification method is applied on medium
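
The statistical core of such unsupervised mixture clustering can be sketched with plain EM for a one-dimensional Gaussian mixture; the paper's SEM variant adds a stochastic class-assignment draw between the E- and M-steps, and its distributions are non-Gaussian, so this is only the deterministic skeleton:

```python
import numpy as np

def em_gmm(x, iters=50):
    """Plain EM for a two-component 1-D Gaussian mixture."""
    mu = np.array([x.min(), x.max()], dtype=float)   # crude but deterministic init
    var = np.full(2, x.var())
    pi = np.full(2, 0.5)
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means, and variances
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return mu, var, pi

rng = np.random.default_rng(42)
# Two well-separated "backscatter" populations
x = np.concatenate([rng.normal(-3, 1, 300), rng.normal(3, 1, 300)])
mu, var, pi = em_gmm(x)
```

EM recovers the two component means and balanced weights; SEM would additionally sample hard labels each iteration, which helps avoid poor local optima in the full MRF setting.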

  3. Soft computing based feature selection for environmental sound classification

    NARCIS (Netherlands)

    Shakoor, A.; May, T.M.; Van Schijndel, N.H.

    2010-01-01

    Environmental sound classification has a wide range of applications, like hearing aids, mobile communication devices, portable media players, and auditory protection devices. Sound classification systems typically extract features from the input sound. Using too many features increases complexity unnecessarily.

  4. ECG-based heartbeat classification for arrhythmia detection: A survey.

    Science.gov (United States)

    Luz, Eduardo José da S; Schwartz, William Robson; Cámara-Chávez, Guillermo; Menotti, David

    2016-04-01

    An electrocardiogram (ECG) measures the electric activity of the heart and has been widely used for detecting heart diseases due to its simplicity and non-invasive nature. By analyzing the electrical signal of each heartbeat, i.e., the combination of action impulse waveforms produced by different specialized cardiac tissues found in the heart, it is possible to detect some of its abnormalities. In the last decades, several works were developed to produce automatic ECG-based heartbeat classification methods. In this work, we survey the current state-of-the-art methods of ECG-based automated abnormalities heartbeat classification by presenting the ECG signal preprocessing, the heartbeat segmentation techniques, the feature description methods and the learning algorithms used. In addition, we describe some of the databases used for evaluation of methods indicated by a well-known standard developed by the Association for the Advancement of Medical Instrumentation (AAMI) and described in ANSI/AAMI EC57:1998/(R)2008 (ANSI/AAMI, 2008). Finally, we discuss limitations and drawbacks of the methods in the literature presenting concluding remarks and future challenges, and also we propose an evaluation process workflow to guide authors in future works.

  5. Understanding Acupuncture Based on ZHENG Classification from System Perspective

    Directory of Open Access Journals (Sweden)

    Junwei Fang

    2013-01-01

    Full Text Available Acupuncture is an efficient therapy method that originated in ancient China, and studying it on the basis of ZHENG classification is a systematic way to understand its complexity. The system perspective contributes to understanding the essence of phenomena, and, with the coming of the systems biology era, broader technology platforms such as omics technologies have been established for the objective study of traditional Chinese medicine (TCM). Omics technologies can dynamically determine molecular components at various levels and thereby achieve a systematic understanding of acupuncture by uncovering the relationships among the various responding parts. After reviewing the literature on acupuncture studied by omics approaches, the following points were found. First, with the help of omics approaches, acupuncture was found to treat diseases by regulating the neuroendocrine-immune (NEI) network, changes in which reflect the global effect of acupuncture. Second, the global effect of acupuncture could reflect ZHENG information at certain structural and functional levels, which might reveal the mechanism of meridian and acupoint specificity. Furthermore, based on comprehensive ZHENG classification, omics research could help us understand the action characteristics of acupoints and the molecular mechanisms of their synergistic effect.

  6. Gear Crack Level Classification Based on EMD and EDT

    Directory of Open Access Journals (Sweden)

    Haiping Li

    2015-01-01

    Full Text Available Gears are the most essential parts in rotating machinery, and cracking is one of the damage modes that occurs most frequently in gears. This paper therefore deals with the problem of classifying different crack levels. The proposed method is mainly based on empirical mode decomposition (EMD) and the Euclidean distance technique (EDT). First, the vibration signal acquired by an accelerometer is processed by EMD and intrinsic mode functions (IMFs) are obtained. Then, a correlation-coefficient-based method is proposed to select the sensitive IMFs which contain the main gear fault information, and the energy of these IMFs is chosen as the fault feature after comparison with kurtosis and skewness. Finally, Euclidean distances between the test sample and the trained samples of the four classes are calculated, and on this basis the fault level of the test sample can be classified. The proposed approach is tested and validated through a gearbox experiment, in which four crack levels and three kinds of loads are utilized. The results show that the proposed method has high accuracy in classifying different crack levels and may be adaptable to different conditions.
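
The final Euclidean-distance stage can be sketched as a nearest-template rule on normalized energy features; for brevity, FFT band energies stand in for the EMD/IMF energies used in the paper, and the signals and fault names are synthetic:

```python
import numpy as np

def band_energies(signal, n_bands=4):
    """Stand-in for IMF energies: split the FFT power spectrum into bands
    and use each band's normalized energy as a feature."""
    spec = np.abs(np.fft.rfft(signal)) ** 2
    bands = np.array_split(spec, n_bands)
    e = np.array([b.sum() for b in bands])
    return e / e.sum()

def nearest_class(feature, templates):
    """Euclidean distance technique: pick the class whose trained template
    (feature vector) is closest to the test feature."""
    names = list(templates)
    d = [np.linalg.norm(feature - templates[c]) for c in names]
    return names[int(np.argmin(d))]

rng = np.random.default_rng(3)
t = np.arange(1024) / 1024.0

def vib(fault_freq):
    """Toy vibration signal: one tone plus noise."""
    return np.sin(2 * np.pi * fault_freq * t) + 0.1 * rng.normal(size=t.size)

templates = {
    'small_crack': band_energies(vib(30)),
    'large_crack': band_energies(vib(200)),
}
test_sig = vib(200)
label = nearest_class(band_energies(test_sig), templates)
```

The test signal's energy distribution sits closest to the matching template, so the minimum-distance rule assigns the correct fault level.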

  7. 78 FR 58153 - Prevailing Rate Systems; North American Industry Classification System Based Federal Wage System...

    Science.gov (United States)

    2013-09-23

    ... RIN 3206-AM78 Prevailing Rate Systems; North American Industry Classification System Based Federal... Industry Classification System (NAICS) codes currently used in Federal Wage System wage survey industry... 2007 North American Industry Classification System (NAICS) codes used in Federal Wage System (FWS)...

  8. 78 FR 18252 - Prevailing Rate Systems; North American Industry Classification System Based Federal Wage System...

    Science.gov (United States)

    2013-03-26

    ... Industry Classification System Based Federal Wage System Wage Surveys AGENCY: U. S. Office of Personnel... is issuing a proposed rule that would update the 2007 North American Industry Classification System... North American Industry Classification System (NAICS) codes used in Federal Wage System (FWS)...

  9. [Hyperspectral remote sensing image classification based on radial basis function neural network].

    Science.gov (United States)

    Tan, Kun; Du, Pei-jun

    2008-09-01

    Based on radial basis function neural network (RBFNN) theory and the characteristics of hyperspectral remote sensing data, an effective feature extraction model was designed; the extracted features were connected to the input layer of an RBFNN, and a classifier based on the radial basis function neural network was thus constructed. Experiments were conducted on a 64-band hyperspectral image from the Chinese OMIS II sensor, with Zhongguancun in Beijing as the case study area. Minimum noise fraction (MNF) was conducted, and the first 20 components were extracted for further processing. The original 20-dimensional data extracted by MNF, the 20-dimensional texture transformation data extracted from the first 20 MNF components, and the 20-dimensional principal component analysis data were combined into a 60-dimensional feature vector. For classification by the RBFNN, the training samples comprised less than 6.13% of the whole image. The classifier has a simple structure and fast convergence capacity, and can be easily trained. The classification precision of the radial basis function neural network classifier reaches 69.27%, in contrast with 51.20% for the back propagation neural network (BPNN) and 40.88% for traditional minimum distance classification (MDC), so the RBFNN classifier performs better than the other classifiers. This proves the validity of RBFNN in hyperspectral remote sensing classification.
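
A minimal RBF-network classifier, assuming fixed centers and a least-squares output layer; the abstract does not state the paper's training rule, so this is one common construction on toy "pixel" data:

```python
import numpy as np

def rbf_design(X, centers, sigma):
    """Hidden-layer activations: Gaussian kernel distance to each center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def train_rbf(X, y, centers, sigma):
    """Output weights by least squares against one-hot class targets."""
    H = rbf_design(X, centers, sigma)
    T = np.eye(y.max() + 1)[y]
    W, *_ = np.linalg.lstsq(H, T, rcond=None)
    return W

rng = np.random.default_rng(7)
# Toy 2-D "pixels" drawn from three spectral classes
means = np.array([[0, 0], [4, 0], [0, 4]])
X = np.vstack([rng.normal(m, 0.6, (50, 2)) for m in means])
y = np.repeat([0, 1, 2], 50)

centers = means.astype(float)   # in practice chosen by clustering
W = train_rbf(X, y, centers, sigma=1.5)
pred = rbf_design(X, centers, 1.5).dot(W).argmax(axis=1)
acc = (pred == y).mean()
```

Because the output layer is linear in the hidden activations, training reduces to a single least-squares solve, which is what gives RBF networks their fast, convergence-free training compared with back-propagation.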

  10. The high-density lipoprotein-adjusted SCORE model worsens SCORE-based risk classification in a contemporary population of 30,824 Europeans

    DEFF Research Database (Denmark)

    Mortensen, Martin B; Afzal, Shoaib; Nordestgaard, Børge G;

    2015-01-01

    After …8 years of follow-up, 339 individuals died of CVD. In the SCORE target population (age 40-65; n = 30,824), fewer individuals were at baseline categorized as high risk (≥5% 10-year risk of fatal CVD) using SCORE-HDL compared with SCORE (10 vs. 17% in men, 1 vs. 3% in women). SCORE-HDL did not improve discrimination of future fatal CVD, compared with SCORE, but decreased the detection rate (sensitivity) of the 5% high-risk threshold from 42 to 26%, yielding a negative net reclassification index (NRI) of -12%. Importantly, using SCORE-HDL, the sensitivity was zero among women. Both SCORE and SCORE-HDL overestimated the risk of fatal CVD. In well-calibrated models developed from the CGPS, HDL did not improve discrimination or NRI. Lowering the decision threshold from 5 to 1% led to progressive gain in NRI for both CVD mortality and morbidity. CONCLUSION: SCORE-HDL did not improve discrimination compared

  11. Comparison Effectiveness of Pixel Based Classification and Object Based Classification Using High Resolution Image In Floristic Composition Mapping (Study Case: Gunung Tidar Magelang City)

    Science.gov (United States)

    Ardha Aryaguna, Prama; Danoedoro, Projo

    2016-11-01

    Developments in remote sensing analysis have paralleled developments in technology, especially in sensors and platforms. Many images now offer high spatial and radiometric resolution and therefore carry much more information. Analysis of vegetation objects, such as floristic composition, benefits greatly from these developments. Floristic composition can be interpreted using several methods, such as pixel-based classification and object-based classification. The problem with pixel-based methods on high-spatial-resolution imagery is the salt-and-pepper noise that appears in the classification results. The purpose of this research is to compare the effectiveness of pixel-based classification and object-based classification for vegetation composition mapping on high-resolution Worldview-2 imagery. The results show that pixel-based classification with a 5×5 majority-filter kernel gives the highest accuracy among the classifications tested. The highest accuracy is 73.32%, obtained from Worldview-2 imagery radiometrically corrected to surface reflectance level; however, for per-class accuracy, the object-based method is the best among the methods. From the standpoint of effectiveness, the pixel-based method is more effective than the object-based method for vegetation composition mapping in the Tidar forest.
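
The 5×5 majority-kernel post-processing that gave the best pixel-based result can be sketched directly; border handling here (edge pixels keep their original labels) is an assumption for simplicity:

```python
import numpy as np

def majority_filter(class_map, size=5):
    """Replace each interior pixel's class with the modal class of its
    size x size neighbourhood; suppresses salt-and-pepper misclassification."""
    half = size // 2
    out = class_map.copy()
    n_classes = class_map.max() + 1
    for i in range(half, class_map.shape[0] - half):
        for j in range(half, class_map.shape[1] - half):
            win = class_map[i - half:i + half + 1, j - half:j + half + 1]
            out[i, j] = np.bincount(win.ravel(), minlength=n_classes).argmax()
    return out

# A uniform class-1 patch with two isolated "salt-and-pepper" class-0 pixels
cmap = np.ones((11, 11), dtype=int)
cmap[3, 3] = cmap[7, 8] = 0
smoothed = majority_filter(cmap)
```

The isolated minority pixels are voted away by their neighbourhoods, which is exactly the smoothing effect that lifts pixel-based accuracy on high-resolution imagery.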

  12. Kernel-based machine learning techniques for infrasound signal classification

    Science.gov (United States)

    Tuma, Matthias; Igel, Christian; Mialle, Pierrick

    2014-05-01

    Infrasound monitoring is one of four remote sensing technologies continuously employed by the CTBTO Preparatory Commission. The CTBTO's infrasound network is designed to monitor the Earth for potential evidence of atmospheric or shallow underground nuclear explosions. Upon completion, it will comprise 60 infrasound array stations distributed around the globe, of which 47 were certified in January 2014. Three stages can be identified in CTBTO infrasound data processing: automated processing at the level of single array stations, automated processing at the level of the overall global network, and interactive review by human analysts. At station level, the cross correlation-based PMCC algorithm is used for initial detection of coherent wavefronts. It produces estimates for trace velocity and azimuth of incoming wavefronts, as well as other descriptive features characterizing a signal. Detected arrivals are then categorized into potentially treaty-relevant versus noise-type signals by a rule-based expert system. This corresponds to a binary classification task at the level of station processing. In addition, incoming signals may be grouped according to their travel path in the atmosphere. The present work investigates automatic classification of infrasound arrivals by kernel-based pattern recognition methods. It aims to explore the potential of state-of-the-art machine learning methods vis-a-vis the current rule-based and task-tailored expert system. To this purpose, we first address the compilation of a representative, labeled reference benchmark dataset as a prerequisite for both classifier training and evaluation. Data representation is based on features extracted by the CTBTO's PMCC algorithm. As classifiers, we employ support vector machines (SVMs) in a supervised learning setting. Different SVM kernel functions are used and adapted through different hyperparameter optimization routines. The resulting performance is compared to several baseline classifiers. 
All

  13. Comparison of Cheng's Index-and SSR Marker-based Classification of Asian Cultivated Rice

    Institute of Scientific and Technical Information of China (English)

    WANG Cai-hong; XU Qun; YU Ping; YUAN Xiao-ping; YU Han-yong; WANG Yi-ping; TANG Sheng-xiang

    2013-01-01

    A total of 100 cultivated rice accessions, with a clear isozyme-based classification, were analyzed based on Cheng's index and simple sequence repeat (SSR) markers. The results showed that the isozyme-based classification was in high accordance with the classifications based on Cheng's index and SSR markers. A Mantel test revealed that the Euclidean distance of Cheng's index was significantly correlated with Nei's unbiased genetic distance of the SSR markers (r = 0.466, P ≤ 0.01). According to the model-based grouping and cluster analysis, the Cheng's index- and SSR-based classifications coincided with each other, with goodness of fit of 82.1% and 84.7% in indica and 97.4% and 95.1% in japonica, respectively, showing higher accordance than that within subspecies. Therefore, Cheng's index could be used to classify subspecies, while SSR markers could be more efficient for analyzing subgroups within subspecies.

  14. Hepatic CT Image Query Based on Threshold-based Classification Scheme with Gabor Features

    Institute of Scientific and Technical Information of China (English)

    JIANG Li-jun; LUO Yong-zing; ZHAO Jun; ZHUANG Tian-ge

    2008-01-01

    Hepatic computed tomography (CT) images were analyzed with Gabor functions. A threshold-based classification scheme using Gabor features was then proposed, and retrieval of the hepatic CT images was carried out. In our experiments, a batch of hepatic CT images containing several types of CT findings was used, and the proposed scheme was compared with Zhao's image classification scheme, a support vector machine (SVM) scheme, and a threshold-based scheme.

  15. [Hyperspectral remote sensing image classification based on SVM optimized by clonal selection].

    Science.gov (United States)

    Liu, Qing-Jie; Jing, Lin-Hai; Wang, Meng-Fei; Lin, Qi-Zhong

    2013-03-01

    Model selection for support vector machines (SVM), involving selection of the kernel and margin parameter values, is usually time-consuming; it greatly affects the training efficiency of the SVM model and the final classification accuracy of an SVM hyperspectral remote sensing image classifier. First, based on combinatorial optimization theory and the cross-validation method, an artificial immune clonal selection algorithm is introduced for the optimal selection of the SVM kernel parameter a and margin parameter C (CSSVM), to improve the training efficiency of the SVM model. An experiment classifying AVIRIS data of the Indian Pines site in the USA was then performed to test the novel CSSVM, together with a traditional SVM classifier using the general grid-search cross-validation method (GSSVM) for comparison. Evaluation indexes, including SVM model training time, classification overall accuracy (OA) and the Kappa index, were analyzed quantitatively for both CSSVM and GSSVM. It is demonstrated that the OA of CSSVM on the test samples and the whole image are 85.1% and 81.58%, with differences from GSSVM within 0.08%; the Kappa indexes reach 0.8213 and 0.7728, with differences from GSSVM within 0.001; and the ratio of model training time of CSSVM to GSSVM is between 1/6 and 1/10. Therefore, CSSVM is a fast and accurate algorithm for hyperspectral image classification and is superior to GSSVM.
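
The cross-validation machinery that both CSSVM and GSSVM rely on can be sketched generically; a k-NN classifier stands in for SVM here so the example stays self-contained, and the grid of candidate values is illustrative:

```python
import numpy as np

def knn_predict(Xtr, ytr, Xte, k):
    """Vectorized k-NN majority vote with squared Euclidean distance."""
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(ytr[row]).argmax() for row in idx])

def cv_score(X, y, k, folds=5):
    """Mean accuracy of k-NN over a deterministic 5-fold split."""
    parts = np.array_split(np.arange(len(X)), folds)
    accs = []
    for i in range(folds):
        te = parts[i]
        tr = np.concatenate([parts[j] for j in range(folds) if j != i])
        accs.append((knn_predict(X[tr], y[tr], X[te], k) == y[te]).mean())
    return float(np.mean(accs))

rng = np.random.default_rng(9)
X = np.vstack([rng.normal(0, 1, (60, 4)), rng.normal(1.5, 1, (60, 4))])
y = np.repeat([0, 1], 60)
perm = rng.permutation(len(X))   # shuffle so every fold mixes both classes
X, y = X[perm], y[perm]

grid = [1, 3, 5, 7, 9]                       # candidate hyperparameter values
scores = {k: cv_score(X, y, k) for k in grid}
best_k = max(scores, key=scores.get)
```

Grid search simply evaluates every candidate with this CV loop; the clonal selection algorithm instead evolves a small population of candidate (kernel, margin) pairs, evaluating the same CV objective far fewer times.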

  16. Variable Star Signature Classification using Slotted Symbolic Markov Modeling

    Science.gov (United States)

    Johnston, K. B.; Peter, A. M.

    2017-01-01

    With the advent of digital astronomy, new benefits and new challenges have been presented to the modern-day astronomer. No longer can the astronomer rely on manual processing; instead, the profession as a whole has begun to adopt more advanced computational means. This paper focuses on the construction and application of a novel time-domain signature extraction methodology and the development of a supporting supervised pattern classification algorithm for the identification of variable stars. A methodology for the reduction of stellar variable observations (time-domain data) into a novel feature-space representation is introduced. The methodology, referred to as Slotted Symbolic Markov Modeling (SSMM), has a number of advantages which will be demonstrated to be beneficial, specifically for the supervised classification of stellar variables. It will be shown that the methodology outperforms a baseline standard methodology on a standardized set of stellar light curve data. Performance on a set of data derived from the LINEAR dataset will also be shown.

  17. Classification of knee arthropathy with accelerometer-based vibroarthrography.

    Science.gov (United States)

    Moreira, Dinis; Silva, Joana; Correia, Miguel V; Massada, Marta

    2016-01-01

    One of the most common knee joint disorders is osteoarthritis, which results from the progressive degeneration of cartilage and subchondral bone over time and affects mainly elderly adults. Current evaluation techniques are either complex, expensive, or invasive, or simply fail to detect the small, progressive changes that occur within the knee. Vibroarthrography appeared as a new solution: the mechanical vibratory signals arising from the knee are recorded with only an accelerometer and subsequently analyzed, enabling differentiation between a healthy and an arthritic joint. In this study, a vibration-based classification system was created using a dataset with 92 healthy and 120 arthritic segments of knee joint signals collected from 19 healthy and 20 arthritic volunteers, evaluated with k-nearest neighbors and support vector machine classifiers. The best classification was obtained using the k-nearest neighbors classifier with only 6 time-frequency features, with an overall accuracy of 89.8% and a precision, recall and f-measure of 88.3%, 92.4% and 90.1%, respectively. Preliminary results showed that vibroarthrography can be a promising, non-invasive and low-cost tool for screening purposes. Despite these encouraging results, several upgrades to the data collection process and analysis can still be implemented.

  18. Product Image Classification Based on Fusion Features

    Institute of Scientific and Technical Information of China (English)

    YANG Xiao-hui; LIU Jing-jing; YANG Li-jun

    2015-01-01

    Two key challenges for a product image classification system are classification precision and classification time. In some categories, the classification precision of current techniques remains low. In this paper, we propose a local texture descriptor termed the fan refined local binary pattern, which captures more detailed information by integrating the spatial distribution into the local binary pattern feature. We compare our approach with different methods on a subset of product images from Amazon/eBay and parts of PI100, and experimental results demonstrate that our proposed approach is superior to the existing methods. The highest classification precision is increased by 21% and the average classification time is reduced by two thirds.

  19. Hyperspectral image classification based on spatial and spectral features and sparse representation

    Institute of Scientific and Technical Information of China (English)

    Yang Jing-Hui; Wang Li-Guo; Qian Jin-Xi

    2014-01-01

    To address the low classification accuracy and poor utilization of spatial information in traditional hyperspectral image classification methods, we propose a new hyperspectral image classification method, which is based on Gabor spatial texture features, nonparametric weighted spectral features, and the sparse representation classification method (Gabor–NWSF and SRC), abbreviated GNWSF–SRC. The proposed GNWSF–SRC method first combines the Gabor spatial features and nonparametric weighted spectral features to describe the hyperspectral image, and then applies the sparse representation method. Finally, the classification is obtained by analyzing the reconstruction error. We use the proposed method to process two typical hyperspectral data sets with different percentages of training samples. Theoretical analysis and simulation demonstrate that the proposed method improves the classification accuracy and Kappa coefficient compared with traditional classification methods and achieves better classification performance.

  20. Software for automated classification of probe-based confocal laser endomicroscopy videos of colorectal polyps

    Institute of Scientific and Technical Information of China (English)

    Barbara André; Tom Vercauteren; Anna M Buchner; Murli Krishna; Nicholas Ayache; Michael B Wallace

    2012-01-01

    not a "black box" but an informative tool based on the query-by-example model that produces, as intermediate results, visually similar annotated videos that are directly interpretable by the endoscopist. CONCLUSION: The proposed software for automated classification of pCLE videos of colonic polyps achieves high performance, comparable to that of off-line diagnosis of pCLE videos established by expert endoscopists.

  1. A Method of Soil Salinization Information Extraction with SVM Classification Based on ICA and Texture Features

    Institute of Scientific and Technical Information of China (English)

    ZHANG Fei; TASHPOLAT Tiyip; KUNG Hsiang-te; DING Jian-li; MAMAT.Sawut; VERNER Johnson; HAN Gui-hong; GUI Dong-wei

    2011-01-01

    Classification of salt-affected soils using remotely sensed images is one of the most common applications in remote sensing, and many algorithms have been developed and applied for this purpose in the literature. This study takes the Delta Oasis of the Weigan and Kuqa Rivers as a study area and discusses the prediction of soil salinization from ETM+ Landsat data. It reports a Support Vector Machine (SVM) classification method based on Independent Component Analysis (ICA) and texture features. The paper introduces the fundamental theory of the SVM algorithm and of ICA, and then incorporates ICA and texture features. The classification result is compared qualitatively and quantitatively with ICA-SVM classification, single-data-source SVM classification, maximum likelihood classification (MLC), and neural network classification. The results show that this method can effectively solve the problems of low accuracy and fragmented classification results in single-data-source classification, and that it generalizes well to higher-dimensional input. The overall accuracy is 98.64%, an increase of 10.2% over maximum likelihood classification and of 12.94% over neural network classification, thus achieving good effectiveness. Therefore, the classification method based on SVM incorporating ICA and texture features is well suited to RS image classification and to the monitoring of soil salinization.
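The feature-fusion idea above — concatenating ICA components with texture statistics and feeding them to a linear SVM — can be sketched with a Pegasos-style sub-gradient solver. This is a minimal stand-in, not the study's implementation: the feature vectors are invented, the ICA/texture extraction is assumed to have already happened, and a kernel SVM would normally be used:

```python
def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Pegasos-style sub-gradient descent on the hinge loss.
    y labels must be +1/-1; each row of X is a fused feature vector
    (e.g. ICA components concatenated with texture statistics)."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1 - eta * lam) * wj for wj in w]   # regularisation shrink
            if margin < 1:                            # hinge-loss violation
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```

Fusing features is then just list concatenation per sample before training, e.g. `xi = ica_feats + texture_feats`.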

  2. Radiological classification of renal angiomyolipomas based on 127 tumors

    Directory of Open Access Journals (Sweden)

    Prando Adilson

    2003-01-01

    Full Text Available PURPOSE: To demonstrate the radiological findings of 127 angiomyolipomas (AMLs) and propose a classification based on the radiological evidence of fat. MATERIALS AND METHODS: The imaging findings of 85 consecutive patients with AMLs: isolated (n = 73), multiple without tuberous sclerosis (TS) (n = 4), and multiple with TS (n = 8), were retrospectively reviewed. Eighteen AMLs (14%) presented with hemorrhage. All patients underwent a dedicated helical CT or magnetic resonance study. All hemorrhagic and non-hemorrhagic lesions were grouped together, since our objective was to analyze the presence of detectable fat. Of the 85 patients, 53 were monitored and 32 were treated surgically due to a large perirenal component (n = 13), hemorrhage (n = 11), or the impossibility of adequate preoperative characterization (n = 8). There was no case of renal cell carcinoma (RCC) with a fat component in this group of patients. RESULTS: Based on the presence and amount of detectable fat within the lesion, AMLs were classified into 4 distinct radiological patterns: Pattern I, predominantly fatty (usually less than 2 cm in diameter and intrarenal): 54%; Pattern II, partially fatty (intrarenal or exophytic): 29%; Pattern III, minimally fatty (mostly exophytic and perirenal): 11%; and Pattern IV, without fat (mostly exophytic and perirenal): 6%. CONCLUSIONS: This proposed classification may be useful for understanding the imaging manifestations of AMLs and their differential diagnosis, and for determining when further radiological evaluation is necessary. Small (< 1.5 cm), Pattern I AMLs tend to be intrarenal, homogeneous, and predominantly fatty. As they grow they tend to become partially or completely exophytic and heterogeneous (Patterns II and III). The rare Pattern IV AMLs, however, can be small or large, intrarenal or exophytic, but always present as a homogeneous and hyperdense mass.
Since no renal cell carcinoma was found in our series, from an evidence-based practice, all renal mass with detectable

  3. ANALYZING AVIATION SAFETY REPORTS: FROM TOPIC MODELING TO SCALABLE MULTI-LABEL CLASSIFICATION

    Data.gov (United States)

    National Aeronautics and Space Administration — ANALYZING AVIATION SAFETY REPORTS: FROM TOPIC MODELING TO SCALABLE MULTI-LABEL CLASSIFICATION AMRUDIN AGOVIC*, HANHUAI SHAN, AND ARINDAM BANERJEE Abstract. The...

  4. GIS—Based Red Soil Resources Classification and Evaluation

    Institute of Scientific and Technical Information of China (English)

    HU Yueming; WANG Renchao; et al.

    1999-01-01

    A small-scale red soil resources information system (RSRIS) with applied mathematical models was developed and applied to red soil resources (RSR) classification and evaluation, taking Zhejiang Province, a typical distribution area of red soil, as the study area. Computer-aided overlay was conducted to classify RSR types. The evaluation was carried out using three methods, i.e., index summation, square root of index multiplication, and fuzzy comprehensive assessment, with almost identical results. The result of index summation could represent the basic qualitative condition of RSR, and that of square root of index multiplication reflected the real condition of RSR qualitative rank, while fuzzy comprehensive assessment could satisfactorily handle the relationship between the evaluation factors and the qualitative rank of RSR; it is therefore a feasible method for RSR evaluation.

  5. [Classification models of structure - P-glycoprotein activity of drugs].

    Science.gov (United States)

    Grigorev, V Yu; Solodova, S L; Polianczyk, D E; Raevsky, O A

    2016-01-01

    Thirty-three classification models of the substrate specificity of 177 drugs to P-glycoprotein were created using linear discriminant analysis, random forest, and support vector machine methods. QSAR modeling was carried out using two strategies. The first strategy consisted of a search over all possible combinations of 1-5 descriptors drawn from the 7 most significant molecular descriptors with a clear physico-chemical interpretation. In the second case, a forward selection procedure up to 5 descriptors, starting from the best single descriptor, was applied to a set of 387 DRAGON descriptors. It was found that only one of the 33 models has the necessary statistical parameters. This model was designed by means of linear discriminant analysis on the basis of a single H-bond descriptor (ΣC(ad)). The model has good statistical characteristics, as evidenced by the results of both internal cross-validation and external validation on 44 new chemicals. This confirms the important role of hydrogen bonding in the processes connected with the penetration of chemical compounds through the blood-brain barrier.
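A two-class LDA model on a single descriptor, as in the one model retained above, reduces to a threshold rule: with equal priors and a pooled variance, the decision boundary is the midpoint of the class means. The sketch below uses invented descriptor values, not the paper's ΣC(ad) data:

```python
def lda_threshold_1d(values, labels):
    """Two-class LDA on one descriptor. Under equal priors and the
    equal-variance Gaussian assumption the boundary is simply the
    midpoint of the two class means; returns a classifier function."""
    means = {}
    for c in set(labels):
        vc = [v for v, l in zip(values, labels) if l == c]
        means[c] = sum(vc) / len(vc)
    (c_lo, m_lo), (c_hi, m_hi) = sorted(means.items(), key=lambda kv: kv[1])
    cut = (m_lo + m_hi) / 2
    return lambda x: c_hi if x > cut else c_lo
```

Unequal priors or class variances shift the cut point away from the midpoint; the quadratic (QDA) boundary is the general case.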

  6. One-class classification based on the convex hull for bearing fault detection

    Science.gov (United States)

    Zeng, Ming; Yang, Yu; Luo, Songrong; Cheng, Junsheng

    2016-12-01

    Originating from a nearest point problem, a novel method called one-class classification based on the convex hull (OCCCH) is proposed for one-class classification problems. The basic goal of OCCCH is to find the nearest point to the origin from the reduced convex hull of training samples. A generalized Gilbert algorithm is proposed to solve the nearest point problem. It is a geometric algorithm with high computational efficiency. OCCCH has two different forms, i.e., OCCCH-1 and OCCCH-2. The relationships among OCCCH-1, OCCCH-2 and one-class support vector machine (OCSVM) are investigated theoretically. The classification accuracy and the computational efficiency of the three methods are compared through the experiments conducted on several benchmark datasets. Experimental results show that OCCCH (including OCCCH-1 and OCCCH-2) using the generalized Gilbert algorithm performs more efficiently than OCSVM using the well-known sequential minimal optimization (SMO) algorithm; at the same time, OCCCH-2 can always obtain comparable classification accuracies to OCSVM. Finally, these methods are applied to the monitoring model constructions for bearing fault detection. Compared with OCCCH-2 and OCSVM, OCCCH-1 can significantly decrease the false alarm ratio while detecting the bearing fault successfully.
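The nearest-point problem OCCCH solves can be illustrated with a Gilbert-style iteration: repeatedly move toward the hull vertex that most decreases the objective, with an exact line search along the segment. This is a simplified sketch of the geometric idea only — the generalized Gilbert algorithm in the paper handles the *reduced* convex hull, which is not modeled here:

```python
def nearest_hull_point(points, iters=100):
    """Find the point of conv(points) closest to the origin.
    Each step picks the vertex v minimising <p, v> (steepest feasible
    direction) and line-searches along the segment [p, v]."""
    p = list(points[0])
    for _ in range(iters):
        v = min(points, key=lambda q: sum(pi * qi for pi, qi in zip(p, q)))
        d = [pi - vi for pi, vi in zip(p, v)]
        den = sum(x * x for x in d)
        num = sum(pi * di for pi, di in zip(p, d))
        if den == 0 or num <= 1e-12:
            break  # optimality: no feasible descent direction remains
        gamma = min(num / den, 1.0)          # exact minimiser of ||p + t(v-p)||
        p = [pi - gamma * di for pi, di in zip(p, d)]
    return p
```

For fault detection, the distance from a new sample to the learned hull (here, to the origin after recentring) plays the role of the anomaly score.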

  7. A model for a multi-class classification machine

    Science.gov (United States)

    Rau, Albrecht; Nadal, Jean-Pierre

    1992-06-01

    We consider the properties of multi-class neural networks, where each neuron can be in several different states. The motivations for considering such systems are manifold. In image processing, for example, the different states correspond to the different grey tone levels. Another multi-class classification task implemented on a feed-forward network is the analysis of DNA sequences or the prediction of the secondary structure of proteins from the sequence of amino acids. To investigate the behaviour of such systems, one specific dynamical rule - the “winner-take-all” rule - is studied. Gauge invariances of the model are analysed. For a multi-class perceptron with N Q-state input neurons and a Q′-state output neuron, the maximal number of patterns that can be stored in the large N limit is found to be proportional to N(Q - 1)f(Q′), where f(Q′) is a slowly increasing and bounded function of order 1.

  8. Fines Classification Based on Sensitivity to Pore-Fluid Chemistry

    KAUST Repository

    Jang, Junbong

    2015-12-28

    The 75-μm particle size is used to discriminate between fine and coarse grains. Further analysis of fine grains is typically based on the plasticity chart. Whereas pore-fluid-chemistry-dependent soil response is a salient and distinguishing characteristic of fine grains, pore-fluid chemistry is not addressed in current classification systems. Liquid limits obtained with electrically contrasting pore fluids (deionized water, 2-M NaCl brine, and kerosene) are combined to define the soil "electrical sensitivity." Liquid limit and electrical sensitivity can be effectively used to classify fine grains according to their fluid-soil response into no-, low-, intermediate-, or high-plasticity fine grains of low, intermediate, or high electrical sensitivity. The proposed methodology benefits from the accumulated experience with liquid limit in the field and addresses the needs of a broader range of geotechnical engineering problems. © ASCE.

  9. Improved Collaborative Filtering Recommendation Based on Classification and User Trust

    Institute of Scientific and Technical Information of China (English)

    Xiao-Lin Xu; Guang-Lin Xu

    2016-01-01

    When dealing with the ratings from users, traditional collaborative filtering algorithms do not consider the credibility of rating data, which affects the accuracy of similarity. To address this issue, the paper proposes an improved algorithm based on classification and user trust. It firstly classifies all the ratings by the categories of items. And then, for each category, it evaluates the trustworthy degree of each user on the category and imposes the degree on the ratings of the user. Finally, the algorithm explores the similarities between users, finds the nearest neighbors, and makes recommendations within each category. Simulations show that the improved algorithm outperforms the traditional collaborative filtering algorithms and enhances the accuracy of recommendation.
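The per-category prediction step described above can be sketched as a similarity-and-trust weighted average. This is a minimal stand-in: the cosine similarity is computed over co-rated items only, and the per-user trust scores are given as constants here, whereas deriving them per category is the paper's contribution:

```python
import math

def cosine(u, v):
    """Cosine similarity over co-rated items; u, v map item -> rating."""
    common = set(u) & set(v)
    num = sum(u[i] * v[i] for i in common)
    du = math.sqrt(sum(u[i] ** 2 for i in common))
    dv = math.sqrt(sum(v[i] ** 2 for i in common))
    return num / (du * dv) if du and dv else 0.0

def predict_rating(target, item, neighbours, trust):
    """Predict target's rating of `item` within one category as a
    similarity-and-trust weighted average of the neighbours' ratings."""
    num = den = 0.0
    for user, ratings in neighbours.items():
        if item not in ratings:
            continue
        s = cosine(target, ratings) * trust[user]
        num += s * ratings[item]
        den += abs(s)
    return num / den if den else None
```

Running this once per item category, as the algorithm prescribes, keeps a user's similarity in one category from contaminating recommendations in another.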

  10. The generalization ability of online SVM classification based on Markov sampling.

    Science.gov (United States)

    Xu, Jie; Yan Tang, Yuan; Zou, Bin; Xu, Zongben; Li, Luoqing; Lu, Yang

    2015-03-01

    In this paper, we consider online support vector machine (SVM) classification learning algorithms with uniformly ergodic Markov chain (u.e.M.c.) samples. We establish a bound on the misclassification error of an online SVM classification algorithm with u.e.M.c. samples based on reproducing kernel Hilbert spaces and obtain a satisfactory convergence rate. We also introduce a novel online SVM classification algorithm based on Markov sampling, and present numerical studies of the learning ability of online SVM classification based on Markov sampling on benchmark repositories. The numerical studies show that the learning performance of the online SVM classification algorithm based on Markov sampling is better than that of classical online SVM classification based on random sampling as the size of the training set grows.

  11. Classification of cassava genotypes based on qualitative and quantitative data.

    Science.gov (United States)

    Oliveira, E J; Oliveira Filho, O S; Santos, V S

    2015-02-02

    We evaluated the genetic variation of cassava accessions based on qualitative (binomial and multicategorical) and quantitative traits (continuous). We characterized 95 accessions obtained from the Cassava Germplasm Bank of Embrapa Mandioca e Fruticultura; we evaluated these accessions for 13 continuous, 10 binary, and 25 multicategorical traits. First, we analyzed the accessions based only on quantitative traits; next, we conducted joint analysis (qualitative and quantitative traits) based on the Ward-MLM method, which performs clustering in two stages. According to the pseudo-F, pseudo-t2, and maximum likelihood criteria, we identified five and four groups based on quantitative trait and joint analysis, respectively. The smaller number of groups identified based on joint analysis may be related to the nature of the data. On the other hand, quantitative data are more subject to environmental effects in the phenotype expression; this results in the absence of genetic differences, thereby contributing to greater differentiation among accessions. For most of the accessions, the maximum probability of classification was >0.90, independent of the trait analyzed, indicating a good fit of the clustering method. Differences in clustering according to the type of data implied that analysis of quantitative and qualitative traits in cassava germplasm might explore different genomic regions. On the other hand, when joint analysis was used, the means and ranges of genetic distances were high, indicating that the Ward-MLM method is very useful for clustering genotypes when there are several phenotypic traits, such as in the case of genetic resources and breeding programs.
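The agglomerative step behind Ward-based clustering (the first stage of the Ward-MLM procedure used above) can be sketched directly: repeatedly merge the pair of clusters whose union least increases the total within-cluster variance. This toy version works on plain numeric vectors and invented points, not the mixed qualitative/quantitative cassava traits:

```python
def ward_clusters(points, k):
    """Greedy agglomerative clustering with the Ward criterion:
    merge cost |A||B|/(|A|+|B|) * ||mean_A - mean_B||^2."""
    clusters = [[list(p)] for p in points]

    def centroid(c):
        n = len(c)
        return [sum(p[d] for p in c) / n for d in range(len(c[0]))]

    def ward_cost(a, b):
        ca, cb = centroid(a), centroid(b)
        d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * d2

    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

The Ward-MLM method additionally folds categorical traits into the distance via a maximum-likelihood step, which this sketch does not attempt.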

  12. A comparative study on classification of sleep stage based on EEG signals using feature selection and classification algorithms.

    Science.gov (United States)

    Şen, Baha; Peker, Musa; Çavuşoğlu, Abdullah; Çelebi, Fatih V

    2014-03-01

    Sleep scoring is one of the most important diagnostic methods in psychiatry and neurology. Sleep staging is a time-consuming and difficult task undertaken by sleep experts. This study aims to identify a method which would classify sleep stages automatically, with a high degree of accuracy, and in this manner assist sleep experts. The study consists of three stages: feature extraction from EEG signals, feature selection, and classification of these signals. In the feature extraction stage, 20 attribute algorithms in four categories are used, yielding 41 feature parameters. Feature selection is important for the elimination of irrelevant and redundant features; in this manner prediction accuracy is improved and the computational overhead of classification is reduced. Effective feature selection algorithms such as minimum redundancy maximum relevance (mRMR), fast correlation-based feature selection (FCBF), ReliefF, t-test, and Fisher score are employed at the feature selection stage to select a set of features which best represent the EEG signals. The features obtained are used as input parameters for the classification algorithms. At the classification stage, five different classification algorithms (random forest (RF), feed-forward neural network (FFNN), decision tree (DT), support vector machine (SVM), and radial basis function neural network (RBF)) address the problem. The results obtained from the different classification algorithms are provided so that a comparison can be made between computation times and accuracy rates. Finally, 97.03% classification accuracy is obtained using the proposed method. The results show that the proposed method can underpin the design of a new intelligent assisted sleep scoring system.
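Of the selection criteria listed above, the Fisher score is the simplest to write down: the between-class spread of a feature divided by its within-class spread. A minimal sketch on invented values (not EEG features) follows:

```python
def fisher_score(values, labels):
    """Fisher score of one feature:
    sum_c n_c*(mu_c - mu)^2  /  sum_c n_c*var_c.
    Higher scores mean the feature separates the classes better."""
    mu = sum(values) / len(values)
    num = den = 0.0
    for c in set(labels):
        vc = [v for v, l in zip(values, labels) if l == c]
        mc = sum(vc) / len(vc)
        var = sum((v - mc) ** 2 for v in vc) / len(vc)
        num += len(vc) * (mc - mu) ** 2
        den += len(vc) * var
    return num / den if den else float("inf")
```

Ranking all 41 feature parameters by this score and keeping the top few is one of the selection strategies the study compares.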

  13. Research into Financial Position of Listed Companies following Classification via Extreme Learning Machine Based upon DE Optimization

    Directory of Open Access Journals (Sweden)

    Fu Yu

    2016-01-01

    Full Text Available Using a model of the extreme learning machine based upon DE optimization, this article centers on the optimization of such a model as well as its application to the classification of listed companies' financial positions. Comparison shows that the improved extreme learning machine algorithm based upon DE optimization outperforms the traditional extreme learning machine algorithm. Meanwhile, this article also introduces research thinking concerning the extreme learning machine into the economics classification area, so as to computerize the speedy but effective evaluation of the massive financial statements of listed companies pertaining to different classes.

  14. Organizational information assets classification model and security architecture methodology

    Directory of Open Access Journals (Sweden)

    Mostafa Tamtaji

    2015-12-01

    Full Text Available Today, organizations are exposed to a huge diversity of information and information assets that are produced in different systems such as KMS, financial and accounting systems, official and industrial automation systems, and so on, and protection of this information is necessary. Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released. The several benefits of this model give organizations a strong incentive to implement cloud computing. Maintaining and managing information security is the main challenge in developing and accepting this model. In this paper, first, following the "design science research methodology" and compatible with the "design process in information systems research", a complete categorization of organizational assets, comprising 355 different types of information assets in 7 groups and 3 levels, is presented so that managers are able to plan the corresponding security controls according to the importance of each group. Then, to guide an organization in architecting its information security in a cloud computing environment, an appropriate methodology is presented. The presented cloud computing security architecture, the resulting proposed methodology, and the presented classification model were discussed and verified according to the Delphi method and expert comments.

  15. Multilingual Medical Documents Classification Based on MesH Domain Ontology

    Directory of Open Access Journals (Sweden)

    Elberrichi Zakaria

    2012-03-01

    Full Text Available This article deals with the semantic Web and ontologies. It addresses the issue of the classification of multilingual Web documents based on a domain ontology. The objective is to be able, using a model, to classify documents in different languages. We try to solve this problem using two different approaches. Both approaches have two elementary stages: the creation of the model using machine learning algorithms on a labeled corpus, followed by the classification of documents after detecting their languages and mapping their terms onto the concepts of the language of reference (English). Each, however, deals with multilingualism differently: one supposes the ontology is monolingual, whereas the other considers it multilingual. To show the feasibility and the importance of our work, we applied it to a domain that nowadays attracts a lot of attention from the data mining community: the biomedical domain. The selected documents are from the biomedical benchmark corpus Ohsumed, and the associated ontology is the MeSH thesaurus (Medical Subject Headings). The main idea in our work is a new concept-based document representation, the cornerstone of any good classification. The experimental results show that the proposed ideas are promising.

  16. Multiscale modeling for classification of SAR imagery using hybrid EM algorithm and genetic algorithm

    Institute of Scientific and Technical Information of China (English)

    Xianbin Wen; Hua Zhang; Jianguang Zhang; Xu Jiao; Lei Wang

    2009-01-01

    A novel method that hybridizes the genetic algorithm (GA) and the expectation maximization (EM) algorithm for the classification of synthetic aperture radar (SAR) imagery is proposed, based on the finite Gaussian mixture model (GMM) and the multiscale autoregressive (MAR) model. This algorithm is capable of improving the global optimality and consistency of the classification performance. Experiments on SAR images show that the proposed algorithm significantly outperforms the standard EM method in classification accuracy.
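The EM half of the hybrid can be sketched for the simplest case: a two-component, one-dimensional Gaussian mixture. This is a toy version only — it omits the GA initialisation and the multiscale autoregressive model, and the data are invented, not SAR intensities:

```python
import math

def em_gmm_1d(data, iters=50):
    """Minimal EM fit of a two-component 1-D Gaussian mixture.
    Means are initialised at the data extremes; a small variance floor
    guards against degenerate components."""
    mu = [min(data), max(data)]
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for x in data:
            p = [pi[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: re-estimate weights, means and variances
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, data)) / nk, 1e-6)
    return mu, var, pi
```

In the hybrid scheme the GA's job is to escape the poor local optima that this plain EM loop can get stuck in when the initialisation is bad.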

  17. Automated segmentation of atherosclerotic histology based on pattern classification

    Directory of Open Access Journals (Sweden)

    Arna van Engelen

    2013-01-01

    Full Text Available Background: Histology sections provide accurate information on atherosclerotic plaque composition, and are used in various applications. To our knowledge, no automated systems for plaque component segmentation in histology sections currently exist. Materials and Methods: We perform pixel-wise classification of fibrous, lipid, and necrotic tissue in Elastica Von Gieson-stained histology sections, using features based on color channel intensity and local image texture and structure. We compare an approach where we train on independent data to an approach where we train on one or two sections per specimen in order to segment the remaining sections. We evaluate the results on segmentation accuracy in histology, and we use the obtained histology segmentations to train plaque component classification methods in ex vivo magnetic resonance imaging (MRI) and in vivo MRI and computed tomography (CT). Results: In leave-one-specimen-out experiments on 176 histology slices of 13 plaques, a pixel-wise accuracy of 75.7 ± 6.8% was obtained. This increased to 77.6 ± 6.5% when two manually annotated slices of the specimen to be segmented were used for training. Rank correlations of relative component volumes with manually annotated volumes were high in this situation (P = 0.82-0.98). Using the obtained histology segmentations to train plaque component classification methods in ex vivo MRI and in vivo MRI and CT resulted in similar image segmentations for training on the automated histology segmentations as for training on a fully manual ground truth. The size of the lipid-rich necrotic core was significantly smaller when training on fully automated histology segmentations than when manually annotated histology sections were used. This difference was reduced and not statistically significant when one or two slices per section were manually annotated for histology segmentation. Conclusions: Good histology segmentations can be obtained by automated segmentation

  18. Multimodal Classification of Mild Cognitive Impairment Based on Partial Least Squares.

    Science.gov (United States)

    Wang, Pingyue; Chen, Kewei; Yao, Li; Hu, Bin; Wu, Xia; Zhang, Jiacai; Ye, Qing; Guo, Xiaojuan

    2016-08-10

    In recent years, increasing attention has been given to the identification of the conversion of mild cognitive impairment (MCI) to Alzheimer's disease (AD). Brain neuroimaging techniques have been widely used to support the classification or prediction of MCI. The present study combined magnetic resonance imaging (MRI), 18F-fluorodeoxyglucose PET (FDG-PET), and 18F-florbetapir PET (florbetapir-PET) to discriminate MCI converters (MCI-c, individuals with MCI who convert to AD) from MCI non-converters (MCI-nc, individuals with MCI who have not converted to AD in the follow-up period) based on the partial least squares (PLS) method. Two types of PLS models (informed PLS and agnostic PLS) were built based on 64 MCI-c and 65 MCI-nc from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The results showed that the three-modality informed PLS model achieved better classification accuracy of 81.40%, sensitivity of 79.69%, and specificity of 83.08% compared with the single-modality model, and the three-modality agnostic PLS model also achieved better classification compared with the two-modality model. Moreover, combining the three modalities with clinical test score (ADAS-cog), the agnostic PLS model (independent data: florbetapir-PET; dependent data: FDG-PET and MRI) achieved optimal accuracy of 86.05%, sensitivity of 81.25%, and specificity of 90.77%. In addition, the comparison of PLS, support vector machine (SVM), and random forest (RF) showed greater diagnostic power of PLS. These results suggested that our multimodal PLS model has the potential to discriminate MCI-c from the MCI-nc and may therefore be helpful in the early diagnosis of AD.
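The first PLS component that underlies models like the ones above has a closed form on centred data: the weight vector is proportional to Xᵀy, the direction maximising the covariance between the scores Xw and the response. The sketch below is a single-component toy with invented data, not the multimodal, multi-component PLS of the study:

```python
def pls_first_direction(X, y):
    """First PLS weight vector on column-centred data: w ∝ Xᵀy,
    normalised to unit length. Returns the column means and w."""
    n, p = len(X), len(X[0])
    xm = [sum(r[j] for r in X) / n for j in range(p)]
    ym = sum(y) / n
    w = [sum((X[i][j] - xm[j]) * (y[i] - ym) for i in range(n))
         for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    return xm, [v / norm for v in w]

def pls_classify(xm, w, x):
    """Label by the sign of the centred projection (for +/-1 class codes)."""
    score = sum((xi - mi) * wi for xi, mi, wi in zip(x, xm, w))
    return 1 if score >= 0 else -1
```

Full PLS deflates X and repeats this step to extract further components; using the scores as classifier inputs is what turns the regression machinery into the discriminant used here.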

  19. Entropy-based gene ranking without selection bias for the predictive classification of microarray data

    Directory of Open Access Journals (Sweden)

    Serafini Maria

    2003-11-01

    Full Text Available Abstract Background We describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of classification rules based on the selection of too small gene subsets (an effect known as selection bias, in which the estimated predictive errors are too optimistic due to testing on samples already considered in the feature selection process). Results With E-RFE, we speed up recursive feature elimination (RFE) with SVM classifiers by eliminating chunks of uninteresting genes using an entropy measure of the distribution of SVM weights. An optimal subset of genes is selected according to a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and, within each run, an internal K-fold cross-validation is used for E-RFE ranking. Also, the optimal number of genes can be estimated according to the saturation of Zipf's law profiles. Conclusions Without a decrease in classification accuracy, E-RFE allows a speed-up factor of 100 with respect to standard RFE, while improving on alternative parametric RFE reduction strategies. Thus, a process for gene selection and error estimation is made practical, ensuring control of the selection bias, and providing additional diagnostic indicators of gene importance.
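The RFE loop that E-RFE accelerates can be sketched as: fit a linear model, drop the feature with the smallest absolute weight, refit, repeat. The sketch below substitutes a ridge-stabilised least-squares fit for the SVM and drops one feature at a time, whereas E-RFE drops entropy-sized chunks of SVM weights; the data in the test are invented:

```python
def lstsq_weights(X, y):
    """Ordinary least squares via normal equations (tiny ridge term for
    stability) solved by Gaussian elimination; A = XᵀX is SPD, so no
    row pivoting is needed."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (1e-8 if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    for c in range(p):                       # forward elimination
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * ac for a, ac in zip(A[r], A[c])]
            b[r] -= f * b[c]
    w = [0.0] * p
    for c in reversed(range(p)):             # back substitution
        w[c] = (b[c] - sum(A[c][j] * w[j] for j in range(c + 1, p))) / A[c][c]
    return w

def rfe(X, y, n_keep):
    """Recursive feature elimination: refit on the surviving features,
    drop the one with the smallest |weight|, repeat."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        w = lstsq_weights([[r[i] for i in active] for r in X], y)
        active.pop(min(range(len(w)), key=lambda i: abs(w[i])))
    return active
```

E-RFE's gain comes from replacing the one-at-a-time `pop` with a chunk removal whose size is set by the entropy of the weight distribution.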

  20. Sequence-based classification using discriminatory motif feature selection.

    Directory of Open Access Journals (Sweden)

    Hao Xiong

    Full Text Available Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is

  1. An innovative blazar classification based on radio jet kinematics

    Science.gov (United States)

    Hervet, O.; Boisson, C.; Sol, H.

    2016-07-01

    Context. Blazars are usually classified following their synchrotron peak frequency (νF(ν) scale) as high, intermediate, low frequency peaked BL Lacs (HBLs, IBLs, LBLs), and flat spectrum radio quasars (FSRQs), or, according to their radio morphology at large scale, FR I or FR II. However, the diversity of blazars is such that these classes seem insufficient to chart the specific properties of each source. Aims: We propose to classify a wide sample of blazars following the kinematic features of their radio jets seen in very long baseline interferometry (VLBI). Methods: For this purpose we use public data from the MOJAVE collaboration in which we select a sample of blazars with known redshift and sufficient monitoring to constrain apparent velocities. We selected 161 blazars from a sample of 200 sources. We identify three distinct classes of VLBI jets depending on radio knot kinematics: class I with quasi-stationary knots, class II with knots in relativistic motion from the radio core, and class I/II, intermediate, showing quasi-stationary knots at the jet base and relativistic motions downstream. Results: A notable result is the good overlap of this kinematic classification with the usual spectral classification; class I corresponds to HBLs, class II to FSRQs, and class I/II to IBLs/LBLs. We deepen this study by characterizing the physical parameters of jets from VLBI radio data. Hence we focus on the singular case of the class I/II by the study of the blazar BL Lac itself. Finally we show how the interpretation that radio knots are recollimation shocks is fully appropriate to describe the characteristics of these three classes.
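
    The kinematic distinction between quasi-stationary knots (class I) and knots in relativistic motion (class II) rests on the standard VLBI relation for apparent transverse speed; a small sketch (numeric values in the usage below are illustrative):

    ```python
    import math

    def apparent_speed(beta, theta_deg):
        """Apparent transverse speed (in units of c) of a radio knot moving at
        intrinsic speed beta*c at angle theta to the line of sight:

            beta_app = beta * sin(theta) / (1 - beta * cos(theta))

        Apparent speeds above 1 (superluminal motion) are typical of knots in
        relativistic motion, while quasi-stationary knots show beta_app near 0.
        """
        theta = math.radians(theta_deg)
        return beta * math.sin(theta) / (1.0 - beta * math.cos(theta))
    ```

    For example, a knot with `beta = 0.99` viewed at 10 degrees appears to move at several times the speed of light, which is how MOJAVE-style monitoring constrains the jet kinematics used for the classification.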

  2. Evaluation of Air Quality Zone Classification Methods Based on Ambient Air Concentration Exposure.

    Science.gov (United States)

    Freeman, Brian; McBean, Ed; Gharabaghi, Bahram; Thé, Jesse

    2016-12-14

    Air quality zones are used by regulatory authorities to implement ambient air standards in order to protect human health. Air quality measurements at discrete air monitoring stations are critical tools to determine whether an air quality zone complies with local air quality standards or is noncompliant. This study presents a novel approach for evaluation of air quality zone classification methods by breaking the concentration distribution of a pollutant measured at an air monitoring station into compliance and exceedance probability density functions (PDFs) and then using Monte Carlo analysis with the Central Limit Theorem to estimate long-term exposure. The purpose of this paper is to compare the risk associated with selecting one ambient air classification approach over another by testing the possible exposure an individual living within a zone may face. The chronic daily intake (CDI) is utilized to compare different pollutant exposures over the classification duration of 3 years between two classification methods. Historical data collected from air monitoring stations in Kuwait are used to build representative models of 1-hr NO2 and 8-hr O3 within a zone that meets the compliance requirements of each method. The first method, the "3 Strike" method, is a conservative winner-take-all approach common to most compliance classification methods, while the second, the 99% Rule method, allows for more robust analyses and incorporates long-term trends. A Monte Carlo analysis is used to model the CDI for each pollutant and each method with the zone at a single station and with multiple stations. The model assumes that the zone is already in compliance with air quality standards over the 3 years under the different classification methodologies.
The model shows that while the CDI of the two methods differs by 2.7% over the exposure period for the single station case, the large number of samples taken over the duration period impacts the sensitivity of
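
    A stripped-down version of the Monte Carlo CDI estimate might look like the following (a sketch under stated assumptions: the intake rate and body weight defaults are generic illustrative values, not the study's parameters, and `sample_conc` stands in for drawing from the zone's compliance/exceedance PDF mixture):

    ```python
    import random
    import statistics

    def monte_carlo_cdi(sample_conc, intake_m3_per_day=20.0, body_weight_kg=70.0,
                        n_days=3 * 365, n_runs=200, seed=0):
        """Monte Carlo estimate of chronic daily intake (CDI, mg/kg/day).

        `sample_conc(rng)` draws one daily ambient concentration (mg/m^3);
        by the Central Limit Theorem, the per-run means of many daily draws
        concentrate around the long-term exposure over the 3-year duration.
        """
        rng = random.Random(seed)
        run_means = []
        for _ in range(n_runs):
            mean_c = statistics.fmean(sample_conc(rng) for _ in range(n_days))
            run_means.append(mean_c * intake_m3_per_day / body_weight_kg)
        return statistics.fmean(run_means)
    ```

    Comparing the CDI produced under each zone-classification method, as the abstract describes, then amounts to running this estimator with the concentration mixture each method permits.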

  3. Research and Application of Human Capital Strategic Classification Tool: Human Capital Classification Matrix Based on Biological Natural Attribute

    Directory of Open Access Journals (Sweden)

    Yong Liu

    2014-12-01

    Full Text Available In order to study the causes of weak strategic classification management of the human capital structure in China, we analyze the increasing difficulty that enterprises around the world face in human capital management. To provide strategically sound answers, HR managers need the critical information provided by the right technology processing and analytical tools. In this study, there are different types and levels of human capital in formal organization management, which do not contribute equally to a formal organization. An important guarantee for the sustained and healthy development of a formal or informal organization is lower human capital risk. Resisting this risk depends primarily on the hedging and appreciation forces of human capital value, which in turn largely depend on the strategic value performance of senior managers. Based on an analysis from the perspective of high-level managers, we also discuss the value and configuration principles and methods to be followed in human capital strategic classification based on the Boston Consulting Group (BCG) matrix, and build a Human Capital Classification (HCC) matrix based on biological natural attributes to effectively realize strategic classification of the human capital structure.

  4. Automated classification of mouse pup isolation syllables: from cluster analysis to an Excel-based "mouse pup syllable classification calculator".

    Science.gov (United States)

    Grimsley, Jasmine M S; Gadziola, Marie A; Wenstrup, Jeffrey J

    2012-01-01

    Mouse pups vocalize at high rates when they are cold or isolated from the nest. The proportions of each syllable type produced carry information about disease state and are being used as behavioral markers for the internal state of animals. Manual classifications of these vocalizations identified 10 syllable types based on their spectro-temporal features. However, manual classification of mouse syllables is time consuming and vulnerable to experimenter bias. This study uses an automated cluster analysis to identify acoustically distinct syllable types produced by CBA/CaJ mouse pups, and then compares the results to prior manual classification methods. The cluster analysis identified two syllable types, based on their frequency bands, that have continuous frequency-time structure, and two syllable types featuring abrupt frequency transitions. Although cluster analysis computed fewer syllable types than manual classification, the clusters represented well the probability distributions of the acoustic features within syllables. These probability distributions indicate that some of the manually classified syllable types are not statistically distinct. The characteristics of the four classified clusters were used to generate a Microsoft Excel-based mouse syllable classifier that rapidly categorizes syllables, with over a 90% match, into the syllable types determined by cluster analysis.
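
    The Excel-based calculator essentially encodes a nearest-centroid rule over the acoustic features of the four clusters; a Python sketch (the centroid features and values here are hypothetical placeholders, not the published cluster parameters):

    ```python
    import math

    # Hypothetical centroids (peak frequency in kHz, duration in ms) for four
    # syllable clusters; the paper's actual feature set and values differ.
    CENTROIDS = {
        "low-frequency": (70.0, 30.0),
        "high-frequency": (100.0, 30.0),
        "frequency-step": (85.0, 15.0),
        "complex": (85.0, 50.0),
    }

    def classify_syllable(features, centroids=CENTROIDS):
        """Nearest-centroid rule: assign a syllable's acoustic feature vector
        to the closest cluster centroid (Euclidean distance) -- the kind of
        rule a spreadsheet calculator can encode with worksheet formulas."""
        return min(centroids, key=lambda name: math.dist(features, centroids[name]))
    ```

    Because the rule is just distances to fixed centroids, it ports directly to spreadsheet formulas, which is what makes a rapid Excel classifier feasible.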

  5. Binary classification of dyslipidemia from the waist-to-hip ratio and body mass index: a comparison of linear, logistic, and CART models

    Directory of Open Access Journals (Sweden)

    Paccaud Fred

    2004-04-01

    Full Text Available Abstract Background We sought to improve upon previously published statistical modeling strategies for binary classification of dyslipidemia for general population screening purposes based on the waist-to-hip circumference ratio and body mass index anthropometric measurements. Methods Study subjects were participants in WHO-MONICA population-based surveys conducted in two Swiss regions. Outcome variables were based on the total serum cholesterol to high density lipoprotein cholesterol ratio. The other potential predictor variables were gender, age, current cigarette smoking, and hypertension. The models investigated were: (i) linear regression; (ii) logistic classification; (iii) regression trees; (iv) classification trees ((iii) and (iv) are collectively known as "CART"). Binary classification performance of the region-specific models was externally validated by classifying the subjects from the other region. Results Waist-to-hip circumference ratio and body mass index remained modest predictors of dyslipidemia. Correct classification rates for all models were 60–80%, with marked gender differences. Gender-specific models provided only small gains in classification. The external validations provided assurance about the stability of the models. Conclusions There were no striking differences between either the algebraic (i, ii) vs. non-algebraic (iii, iv), or the regression (i, iii) vs. classification (ii, iv) modeling approaches. Anticipated advantages of the CART vs. simple additive linear and logistic models were less than expected in this particular application with a relatively small set of predictor variables. CART models may be more useful when considering main effects and interactions between larger sets of predictor variables.
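
    Model (ii), logistic classification, reduces to thresholding a logistic score; a sketch with placeholder coefficients (these are illustrative, not the fitted MONICA-survey values):

    ```python
    import math

    def dyslipidemia_probability(whr, bmi, b0=-8.0, b_whr=6.0, b_bmi=0.10):
        """Logistic model of P(dyslipidemia) from the waist-to-hip ratio (whr)
        and body mass index (bmi). Coefficients are hypothetical placeholders."""
        score = b0 + b_whr * whr + b_bmi * bmi
        return 1.0 / (1.0 + math.exp(-score))

    def classify_dyslipidemia(whr, bmi, cutoff=0.5):
        """Binary classification: flag as dyslipidemic when P >= cutoff."""
        return dyslipidemia_probability(whr, bmi) >= cutoff
    ```

    The correct classification rate the abstract reports is then just the fraction of subjects for which this binary decision matches the cholesterol-ratio outcome.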

  6. Support vector machine based classification and mapping of atherosclerotic plaques using fluorescence lifetime imaging (Conference Presentation)

    Science.gov (United States)

    Fatakdawala, Hussain; Gorpas, Dimitris S.; Bec, Julien; Ma, Dinglong M.; Yankelevich, Diego R.; Bishop, John W.; Marcu, Laura

    2016-02-01

    The progression of atherosclerosis in coronary vessels involves distinct pathological changes in the vessel wall. These changes manifest in the formation of a variety of plaque sub-types. The ability to detect and distinguish these plaques, especially thin-cap fibroatheromas (TCFA) may be relevant for guiding percutaneous coronary intervention as well as investigating new therapeutics. In this work we demonstrate the ability of fluorescence lifetime imaging (FLIm) derived parameters (lifetime values from sub-bands 390/40 nm, 452/45 nm and 542/50 nm respectively) for generating classification maps for identifying eight different atherosclerotic plaque sub-types in ex vivo human coronary vessels. The classification was performed using a support vector machine based classifier that was built from data gathered from sixteen coronary vessels in a previous study. This classifier was validated in the current study using an independent set of FLIm data acquired from four additional coronary vessels with a new rotational FLIm system. Classification maps were compared to co-registered histological data. Results show that the classification maps allow identification of the eight different plaque sub-types despite the fact that new data was gathered with a different FLIm system. Regions with diffuse intimal thickening (n=10), fibrotic tissue (n=2) and thick-cap fibroatheroma (n=1) were correctly identified on the classification map. The ability to identify different plaque types using FLIm data alone may serve as a powerful clinical and research tool for studying atherosclerosis in animal models as well as in humans.

  7. A Neural-Network-Based Semi-Automated Geospatial Classification Tool

    Science.gov (United States)

    Hale, R. G.; Herzfeld, U. C.

    2014-12-01

    North America's largest glacier system, the Bering Bagley Glacier System (BBGS) in Alaska, surged in 2011-2013, as shown by rapid mass transfer, elevation change, and heavy crevassing. Little is known about the physics controlling surge glaciers' semi-cyclic patterns; therefore, it is crucial to collect and analyze as much data as possible so that predictive models can be made. In addition, physical signs frozen in ice in the form of crevasses may help serve as a warning for future surges. The BBGS surge provided an opportunity to develop an automated classification tool for crevasse classification based on imagery collected from small aircraft. The classification allows one to link image classification to geophysical processes associated with ice deformation. The tool uses an approach that employs geostatistical functions and a feed-forward perceptron with error back-propagation. The connectionist-geostatistical approach uses directional experimental (discrete) variograms to parameterize images into a form that the Neural Network (NN) can recognize. In an application to perform analysis on airborne videographic data from the surge of the BBGS, an NN was able to distinguish 18 different crevasse classes with 95 percent or higher accuracy, for over 3,000 images. Recognizing that each surge wave results in different crevasse types and that environmental conditions affect the appearance in imagery, we designed the tool's semi-automated pre-training algorithm to be adaptable. The tool can be optimized to specific settings and variables of image analysis (airborne and satellite imagery, different camera types, observation altitude, number and types of classes, and resolution). The generalization of the classification tool brings three important advantages: (1) multiple types of problems in geophysics can be studied, (2) the training process is sufficiently formalized to allow non-experts in neural nets to perform the training process, and (3) the time required to
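
    The geostatistical parameterization step can be illustrated with a one-directional experimental variogram (a minimal sketch; the tool's actual directional, windowed computation over 2-D imagery is more involved):

    ```python
    def experimental_variogram(values, lag):
        """One direction of an experimental (discrete) variogram:
        gamma(h) = mean of 0.5 * (z(i+h) - z(i))**2 over all pairs at lag h.

        Variogram values over several lags, computed along image rows and
        columns, turn an image patch into the fixed-length numeric vector a
        feed-forward network expects as input."""
        pairs = [0.5 * (values[i + lag] - values[i]) ** 2
                 for i in range(len(values) - lag)]
        return sum(pairs) / len(pairs)
    ```

    Crevassed ice produces strongly periodic variograms along the direction perpendicular to the crevasses, which is the structural signal the perceptron learns to discriminate.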

  8. Hyperspectral remote sensing image classification based on decision level fusion

    Institute of Scientific and Technical Information of China (English)

    Peijun Du; Wei Zhang; Junshi Xia

    2011-01-01

    To apply decision level fusion to hyperspectral remote sensing (HRS) image classification, three decision level fusion strategies are experimented on and compared, namely, linear consensus algorithm, improved evidence theory, and the proposed support vector machine (SVM) combiner. To evaluate the effects of the input features on classification performance, four schemes are used to organize input features for member classifiers. In the experiment, by using the operational modular imaging spectrometer (OMIS) II HRS image, the decision level fusion is shown as an effective way for improving the classification accuracy of the HRS image, and the proposed SVM combiner is especially suitable for decision level fusion. The results also indicate that the optimization of input features can improve the classification performance.
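
    Of the three fusion strategies, the linear consensus algorithm is the simplest to sketch (a minimal illustration; the weights default to uniform here, whereas the paper's SVM combiner learns the fusion from the member outputs):

    ```python
    def linear_consensus(member_probs, weights=None):
        """Decision-level fusion by (weighted) averaging of the member
        classifiers' class-probability vectors. `member_probs` holds one
        probability list per member classifier; returns the fused label
        and the fused probability vector."""
        n = len(member_probs)
        if weights is None:
            weights = [1.0 / n] * n
        n_classes = len(member_probs[0])
        fused = [sum(w * probs[c] for w, probs in zip(weights, member_probs))
                 for c in range(n_classes)]
        return fused.index(max(fused)), fused
    ```

    An SVM combiner replaces the fixed averaging rule with a classifier trained on the concatenated member outputs, which is why it can adapt to systematic biases of individual members.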

  9. Text Passage Retrieval Based on Colon Classification: Retrieval Performance.

    Science.gov (United States)

    Shepherd, Michael A.

    1981-01-01

    Reports the results of experiments using colon classification for the analysis, representation, and retrieval of primary information from the full text of documents. Recall, precision, and search length measures indicate colon classification did not perform significantly better than Boolean or simple word occurrence systems. Thirteen references…

  10. Text Classification Retrieval Based on Complex Network and ICA Algorithm

    Directory of Open Access Journals (Sweden)

    Hongxia Li

    2013-08-01

    Full Text Available With the development of computer science and information technology, libraries are developing toward information and network services. The library digitization process converts books into digital information, whose high-quality preservation and management are achieved by computer technology as well as text classification techniques, realizing knowledge appreciation. This paper introduces complex network theory into the text classification process and puts forward the ICA semantic clustering algorithm, which realizes independent component analysis for complex network text classification. Through the ICA clustering algorithm of independent components, clustering extraction of characteristic words for text classification is realized, and the visualization of text retrieval is improved. Finally, we make a comparative analysis of the collocation algorithm and the ICA clustering algorithm through text classification and keyword search experiments, reporting each algorithm's clustering degree and accuracy. Simulation analysis shows that the ICA clustering algorithm improves clustering degree by 1.2% and accuracy by up to 11.1% compared with the baseline, improving the efficiency and accuracy of text classification retrieval and providing a theoretical reference for text retrieval classification of eBooks.

  11. The ARMA model's pole characteristics of Doppler signals from the carotid artery and their classification application

    Institute of Scientific and Technical Information of China (English)

    CHEN Xi; WANG Yuanyuan; ZHANG Yu; WANG Weiqi

    2002-01-01

    In order to diagnose the cerebral infarction, a classification system based on the ARMA model and BP (Back-Propagation) neural network is presented to analyze blood flow Doppler signals from the carotid artery. In this system, an ARMA model is first used to analyze the audio Doppler blood flow signals from the carotid artery. Then several characteristic parameters of the pole's distribution are estimated. After studies of these characteristic parameters' sensitivity to the cerebral infarction diagnosis, a BP neural network using sensitive parameters is established to classify the normal or abnormal state of the cerebral vessel. With 474 cases used to establish the appropriate neural network, and 52 cases used to test the network, the results show that the correct classification rates of both training and testing are over 94%. Thus this system is useful to diagnose the cerebral infarction.
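
    The pole-based characteristic parameters can be illustrated for an AR(2) special case (a sketch; the system's actual ARMA order and parameter set are not given in the abstract):

    ```python
    import cmath

    def ar2_pole_features(a1, a2):
        """Pole features of an AR(2) model x[n] = a1*x[n-1] + a2*x[n-2] + e[n].

        The poles are the roots of z^2 - a1*z - a2. Pole magnitude relates to
        the sharpness of the Doppler spectral peak and pole angle to its
        frequency -- the kind of pole-distribution parameters that can be fed
        to a BP network as classification inputs."""
        disc = cmath.sqrt(a1 * a1 + 4.0 * a2)
        poles = [(a1 + disc) / 2.0, (a1 - disc) / 2.0]
        return [(abs(p), cmath.phase(p)) for p in poles]
    ```

    Higher-order ARMA models yield more pole pairs, but each pole still contributes the same magnitude/angle pair as a candidate feature.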

  12. A spectral-spatial kernel-based method for hyperspectral imagery classification

    Science.gov (United States)

    Li, Li; Ge, Hongwei; Gao, Jianqiang

    2017-02-01

    Spectral-based classification methods have gained increasing attention in hyperspectral imagery classification. Nevertheless, spectral information alone cannot fully represent the inherent spatial distribution of the imagery. In this paper, a spectral-spatial kernel-based method for hyperspectral imagery classification is proposed. Firstly, the spatial feature was extracted by using area median filtering (AMF). Secondly, the result of the AMF was used to construct spatial feature patches according to different window sizes. Finally, using the kernel technique, the spectral feature and the spatial feature were jointly used for the classification through a support vector machine (SVM) formulation. The proposed method is therefore called spectral-spatial kernel-based support vector machine (SSF-SVM). To evaluate the proposed method, experiments are performed on three hyperspectral images. The experimental results show that an improvement is possible with the proposed technique in most of the real-world classification problems.
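
    Joint use of spectral and spatial features through the kernel trick is commonly realised as a weighted composite kernel; a minimal sketch (the weighting form and parameters are illustrative, and the paper's exact kernel construction may differ):

    ```python
    import math

    def rbf_kernel(x, y, gamma=1.0):
        """Gaussian RBF kernel on plain tuples/lists."""
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

    def composite_kernel(x_spec, y_spec, x_spat, y_spat, mu=0.5, gamma=1.0):
        """Weighted spectral-spatial kernel K = mu*K_spec + (1-mu)*K_spat.

        A convex combination of valid kernels is itself a valid kernel, so it
        can be plugged directly into an SVM formulation; mu trades off the
        spectral signature against the median-filtered spatial patch."""
        return (mu * rbf_kernel(x_spec, y_spec, gamma)
                + (1.0 - mu) * rbf_kernel(x_spat, y_spat, gamma))
    ```

    Plugging this kernel into an SVM is what turns a purely spectral classifier into a spectral-spatial one without changing the optimization machinery.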

  13. Region-based geometric active contour for classification using hyperspectral remote sensing images

    Science.gov (United States)

    Yan, Lin

    2011-12-01

    The high spectral resolution of hyperspectral imaging (HSI) systems greatly enhances the capabilities of discrimination, identification and quantification of objects of different materials from remote sensing images, but it also brings challenges to the processing and analysis of HSI data. One issue is the high computation cost and the curse of dimensionality associated with the high dimensions of HSI data. A second issue is how to effectively utilize the information, including spectral and spatial information, embedded in HSI data. Geometric Active Contour (GAC) is a widely used image segmentation method that utilizes the geometric information of objects within images. One category of GAC models, the region-based GAC models (RGAC), has good potential for remote sensing image processing because these models use both spectral and geometric information in images and are robust to initial contour placement. These models have been introduced to target extraction and classification on remote sensing images. However, there are some restrictions on the applications of the RGAC models in remote sensing. First, the heavy involvement of iterative contour evolutions makes GAC applications time-consuming and inconvenient to use. Second, the current RGAC models must be based on a certain distance metric, and the performance of RGAC classifiers is restricted by the performance of the employed distance metrics. According to the key features of the RGAC models analyzed in this dissertation, a classification framework is developed for remote sensing image classifications using the RGAC models. This framework allows the RGAC models to be combined with conventional pixel-based classifiers to promote them to spectral-spatial classifiers and also greatly reduces the iterations of contour evolutions. An extended Chan-Vese (ECV) model is proposed that is able to incorporate the widely used distance metrics in remote sensing image processing. A new type of RGAC model, the edge-oriented RGAC model

  14. View-constrained latent variable model for multi-view facial expression classification

    NARCIS (Netherlands)

    Eleftheriadis, Stefanos; Rudovic, Ognjen; Pantic, Maja

    2014-01-01

    We propose a view-constrained latent variable model for multi-view facial expression classification. In this model, we first learn a discriminative manifold shared by multiple views of facial expressions, followed by the expression classification in the shared manifold. For learning, we use the expr

  15. The Zipf Law revisited: An evolutionary model of emerging classification

    Energy Technology Data Exchange (ETDEWEB)

    Levitin, L.B. [Boston Univ., MA (United States); Schapiro, B. [TINA, Brandenburg (Germany); Perlovsky, L. [NRC, Wakefield, MA (United States)

    1996-12-31

    Zipf's Law is a remarkable rank-frequency relationship observed in linguistics (the frequencies of the use of words are approximately inversely proportional to their ranks in the decreasing frequency order) as well as in the behavior of many complex systems of surprisingly different nature. We suggest an evolutionary model of emerging classification of objects into classes corresponding to concepts and denoted by words. The evolution of the system is derived from two basic assumptions: first, the probability to recognize an object as belonging to a known class is proportional to the number of objects in this class already recognized, and, second, there exists a small probability to observe an object that requires creation of a new class ("mutation" that gives birth to a new "species"). It is shown that the populations of classes in such a system obey the Zipf Law provided that the rate of emergence of new classes is small. The model leads also to the emergence of a second-tier structure of "super-classes" - groups of classes with almost equal populations.
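
    The two evolution rules can be simulated directly (a minimal sketch; `p_new` plays the role of the small mutation probability):

    ```python
    import random

    def evolve_classes(n_objects, p_new=0.01, seed=0):
        """Simulate the model's two rules: with probability p_new a newly
        observed object founds a new class (a "mutation"); otherwise it joins
        an existing class with probability proportional to that class's
        current population (rich-get-richer growth)."""
        rng = random.Random(seed)
        populations = [1]                  # the first object founds the first class
        for _ in range(n_objects - 1):
            if rng.random() < p_new:
                populations.append(1)      # mutation: a new class is born
            else:
                # preferential attachment: pick a class weighted by its size
                r = rng.uniform(0.0, sum(populations))
                acc = 0.0
                for i, count in enumerate(populations):
                    acc += count
                    if r <= acc:
                        populations[i] += 1
                        break
        return sorted(populations, reverse=True)
    ```

    For small `p_new` and large `n_objects`, plotting the sorted populations against their ranks on log-log axes approximates the straight line characteristic of the Zipf Law, consistent with the paper's claim.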

  16. GIS-based landform classification of Bronze Age archaeological sites on Crete Island

    Science.gov (United States)

    Argyriou, Athanasios V.; Teeuw, Richard M.; Sarris, Apostolos

    2017-01-01

    Various physical attributes of the Earth’s surface are factors that influence local topography and indirectly influence human behaviour in terms of habitation locations. The determination of geomorphological setting plays an important role in archaeological landscape research. Several landform types can be distinguished by characteristic geomorphic attributes that portray the landscape surrounding a settlement and influence its ability to sustain a population. Geomorphometric landform information, derived from digital elevation models (DEMs), such as the ASTER Global DEM, can provide useful insights into the processes shaping landscapes. This work examines the influence of landform classification on the settlement locations of Bronze Age (Minoan) Crete, focusing on the districts of Phaistos, Kavousi and Vrokastro. The landform classification was based on the topographic position index (TPI) and deviation from mean elevation (DEV) analysis to highlight slope steepness of various landform classes, characterizing the surrounding landscape environment of the settlements locations. The outcomes indicate no interrelationship between the settlement locations and topography during the Early Minoan period, but a significant interrelationship exists during the later Minoan periods with the presence of more organised societies. The landform classification can provide insights into factors favouring human habitation and can contribute to archaeological predictive modelling. PMID:28222134
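
    The TPI underlying the landform classification is simply the difference between a cell's elevation and the mean elevation of its neighbourhood; a sketch (the window here is square, whereas annulus-shaped neighbourhoods are also common in TPI analysis):

    ```python
    def topographic_position_index(dem, i, j, radius=1):
        """TPI of cell (i, j): its elevation minus the mean elevation of the
        surrounding window. Positive values flag ridges and crests, negative
        values valleys and channels, and near-zero values flats or uniform
        slopes. `dem` is a 2-D list of elevations (e.g. from ASTER GDEM)."""
        neighbours = []
        for di in range(-radius, radius + 1):
            for dj in range(-radius, radius + 1):
                ni, nj = i + di, j + dj
                if (di, dj) != (0, 0) and 0 <= ni < len(dem) and 0 <= nj < len(dem[0]):
                    neighbours.append(dem[ni][nj])
        return dem[i][j] - sum(neighbours) / len(neighbours)
    ```

    Thresholding TPI at two different neighbourhood radii, together with slope, is the usual way such analyses bin cells into the landform classes used to characterize settlement surroundings.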

  17. A Novel Hepatocellular Carcinoma Image Classification Method Based on Voting Ranking Random Forests.

    Science.gov (United States)

    Xia, Bingbing; Jiang, Huiyan; Liu, Huiling; Yi, Dehui

    2015-01-01

    This paper proposed a novel voting ranking random forests (VRRF) method for solving hepatocellular carcinoma (HCC) image classification problem. Firstly, in preprocessing stage, this paper used bilateral filtering for hematoxylin-eosin (HE) pathological images. Next, this paper segmented the bilateral filtering processed image and got three different kinds of images, which include single binary cell image, single minimum exterior rectangle cell image, and single cell image with a size of n⁎n. After that, this paper defined atypia features which include auxiliary circularity, amendment circularity, and cell symmetry. Besides, this paper extracted some shape features, fractal dimension features, and several gray features like Local Binary Patterns (LBP) feature, Gray Level Co-occurrence Matrix (GLCM) feature, and Tamura features. Finally, this paper proposed a HCC image classification model based on random forests and further optimized the model by voting ranking method. The experiment results showed that the proposed features combined with VRRF method have a good performance in HCC image classification problem.

  18. A Novel Hepatocellular Carcinoma Image Classification Method Based on Voting Ranking Random Forests

    Directory of Open Access Journals (Sweden)

    Bingbing Xia

    2016-01-01

    Full Text Available This paper proposed a novel voting ranking random forests (VRRF) method for solving hepatocellular carcinoma (HCC) image classification problem. Firstly, in preprocessing stage, this paper used bilateral filtering for hematoxylin-eosin (HE) pathological images. Next, this paper segmented the bilateral filtering processed image and got three different kinds of images, which include single binary cell image, single minimum exterior rectangle cell image, and single cell image with a size of n⁎n. After that, this paper defined atypia features which include auxiliary circularity, amendment circularity, and cell symmetry. Besides, this paper extracted some shape features, fractal dimension features, and several gray features like Local Binary Patterns (LBP) feature, Gray Level Co-occurrence Matrix (GLCM) feature, and Tamura features. Finally, this paper proposed a HCC image classification model based on random forests and further optimized the model by voting ranking method. The experiment results showed that the proposed features combined with VRRF method have a good performance in HCC image classification problem.

  19. Comprehensive Study on Lexicon-based Ensemble Classification Sentiment Analysis

    Directory of Open Access Journals (Sweden)

    Łukasz Augustyniak

    2015-12-01

    Full Text Available We propose a novel method for counting sentiment orientation that outperforms supervised learning approaches in time and memory complexity and is not statistically significantly different from them in accuracy. Our method consists of a novel approach to generating unigram, bigram and trigram lexicons. The proposed method, called frequentiment, is based on calculating the frequency of features (words) in the document and averaging their impact on the sentiment score as opposed to documents that do not contain these features. Afterwards, we use ensemble classification to improve the overall accuracy of the method. What is important is that the frequentiment-based lexicons with sentiment threshold selection outperform other popular lexicons and some supervised learners, while being 3–5 times faster than the supervised approach. We compare 37 methods (lexicons, ensembles with lexicon's predictions as input and supervised learners) applied to 10 Amazon review data sets and provide the first statistical comparison of the sentiment annotation methods that include ensemble approaches. It is one of the most comprehensive comparisons of domain sentiment analysis in the literature.
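
    The core frequentiment idea for a unigram lexicon can be sketched as follows (a simplified reading of the description above; the paper's exact weighting and threshold selection differ):

    ```python
    def frequentiment_unigrams(docs, scores):
        """Unigram lexicon in the spirit of frequentiment: a word's sentiment
        is the mean score of documents containing it minus the mean score of
        documents lacking it. `docs` are token lists, `scores` are numeric
        sentiment labels (e.g. review ratings centered at zero)."""
        vocab = {word for doc in docs for word in doc}
        lexicon = {}
        for word in sorted(vocab):
            present = [s for doc, s in zip(docs, scores) if word in doc]
            absent = [s for doc, s in zip(docs, scores) if word not in doc]
            if present and absent:  # contrast only words that split the corpus
                lexicon[word] = (sum(present) / len(present)
                                 - sum(absent) / len(absent))
        return lexicon
    ```

    Scoring a new document then reduces to summing lexicon entries for its tokens and thresholding, which is why the approach is so much faster than training a supervised model.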

  20. Object-Based Tree Species Classification in Urban Ecosystems Using LiDAR and Hyperspectral Data

    Directory of Open Access Journals (Sweden)

    Zhongya Zhang

    2016-06-01

    Full Text Available In precision forestry, tree species identification is key to evaluating the role of forest ecosystems in the provision of ecosystem services, such as carbon sequestration and assessing their effects on climate regulation and climate change. In this study, we investigated the effectiveness of tree species classification of urban forests using aerial-based HyMap hyperspectral imagery and light detection and ranging (LiDAR) data. First, we conducted an object-based image analysis (OBIA) to segment individual tree crowns present in LiDAR-derived Canopy Height Models (CHMs). Then, hyperspectral values for individual trees were extracted from HyMap data for band reduction through Minimum Noise Fraction (MNF) transformation which allowed us to reduce the data to 20 significant bands out of 118 bands acquired. Finally, we compared several different classifications using Random Forest (RF) and Multi Class Classifier (MCC) methods. Seven tree species were classified using all 118 bands which resulted in 46.3% overall classification accuracy for RF versus 79.6% for MCC. Using only the 20 optimal bands extracted through MNF, both RF and MCC achieved an increase in overall accuracy to 87.0% and 88.9%, respectively. Thus, the MNF band selection process is a preferable approach for tree species classification when using hyperspectral data. Further, our work also suggests that RF is heavily disadvantaged by the high-dimensionality and noise present in hyperspectral data, while MCC is more robust when handling high-dimensional datasets with small sample sizes. Our overall results indicated that individual tree species identification in urban forests can be accomplished with the fusion of object-based LiDAR segmentation of crowns and hyperspectral characterization.

  1. Time-series classification model based on multiple facial features for real-time mental fatigue monitoring

    Institute of Scientific and Technical Information of China (English)

    陈云华; 张灵; 丁伍洋; 严明玉

    2013-01-01

    In computer vision based fatigue monitoring, there are still some unresolved issues, including low recognition accuracy in yawn detection based on the mouth shape in a single frame, poor adaptability of threshold-based blink analysis, and the inability to monitor the transition stages of fatigue in real time. To address these problems, we propose a new classification model based on multiple facial feature time series for real-time mental fatigue monitoring. First, the mouth opening degree curve and the iris circularity ratio curve are extracted through facial visual feature extraction. From the mouth opening degree curve, a yawn feature time series (the proportion of time during which mouth opening exceeds a given threshold) is constructed and labeled; from the iris circularity ratio curve, an eye blink time (EBT) time series is constructed and labeled. A sliding window is used to partition and annotate the two kinds of time series, and a hidden Markov model (HMM) is built for them. Finally, a time stamp is added to the HMM to adaptively select the initial time point of each series, synchronize the multiple feature time series, and fuse their labeling results. Experimental results show that the proposed model reduces the yawn misjudgment rate, adapts well to the blinking of people of different ages, and achieves real-time monitoring of the transitional stages of mental fatigue.

  2. Tissue Classification

    DEFF Research Database (Denmark)

    Van Leemput, Koen; Puonti, Oula

    2015-01-01

    Computational methods for automatically segmenting magnetic resonance images of the brain have seen tremendous advances in recent years. So-called tissue classification techniques, aimed at extracting the three main brain tissue classes (white matter, gray matter, and cerebrospinal fluid), are now well established. In their simplest form, these methods classify voxels independently based on their intensity alone, although much more sophisticated models are typically used in practice. This article aims to give an overview of often-used computational techniques for brain tissue classification…

  3. Emotional textile image classification based on cross-domain convolutional sparse autoencoders with feature selection

    Science.gov (United States)

    Li, Zuhe; Fan, Yangyu; Liu, Weihua; Yu, Zeqi; Wang, Fengqin

    2017-01-01

    We aim to apply sparse autoencoder-based unsupervised feature learning to emotional semantic analysis for textile images. To tackle the problem of limited training data, we present a cross-domain feature learning scheme for emotional textile image classification using convolutional autoencoders. We further propose a correlation-analysis-based feature selection method for the weights learned by sparse autoencoders to reduce the number of features extracted from large size images. First, we randomly collect image patches on an unlabeled image dataset in the source domain and learn local features with a sparse autoencoder. We then conduct feature selection according to the correlation between different weight vectors corresponding to the autoencoder's hidden units. We finally adopt a convolutional neural network including a pooling layer to obtain global feature activations of textile images in the target domain and send these global feature vectors into logistic regression models for emotional image classification. The cross-domain unsupervised feature learning method achieves 65% to 78% average accuracy in the cross-validation experiments corresponding to eight emotional categories and performs better than conventional methods. Feature selection can reduce the computational cost of global feature extraction by about 50% while improving classification performance.
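The correlation-based selection over hidden-unit weight vectors can be sketched in a few lines. This is an illustrative reduction under stated assumptions: the weight matrix here is random rather than learned by a sparse autoencoder, and the 0.9 threshold is a hypothetical choice.

```python
import numpy as np

def select_decorrelated(weights, threshold=0.9):
    """Greedily keep weight vectors (one per hidden unit) whose absolute
    correlation with every already-kept vector stays below `threshold`."""
    corr = np.abs(np.corrcoef(weights))
    keep = []
    for i in range(weights.shape[0]):
        if all(corr[i, j] < threshold for j in keep):
            keep.append(i)
    return keep

rng = np.random.default_rng(1)
W = rng.normal(size=(10, 64))                # 10 hidden units, 8x8 patch weights
W[5] = W[0] + 0.01 * rng.normal(size=64)     # near-duplicate filter
kept = select_decorrelated(W)
print(kept)
```

The near-duplicate unit 5 correlates almost perfectly with unit 0 and is dropped, which is the mechanism the paper uses to cut the number of convolution filters (and hence global feature extraction cost).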

  4. Multi-Stage Feature Selection Based Intelligent Classifier for Classification of Incipient Stage Fire in Building

    Directory of Open Access Journals (Sweden)

    Allan Melvin Andrew

    2016-01-01

    Full Text Available In this study, an early fire detection algorithm is proposed based on a low-cost array sensing system utilising off-the-shelf gas sensors, dust particle sensors, and ambient sensors such as temperature and humidity sensors. The odour or “smellprint” emanating from various fire sources and building construction materials at the early stage is measured. For this purpose, odour profile data from five common fire sources and three common building construction materials were used to develop the classification model. Normalised features were extracted from the smellprint data before being passed to the prediction classifier; these features represent the odour signals in the time domain. The obtained features undergo the proposed multi-stage feature selection technique and are lastly further reduced by Principal Component Analysis (PCA), a dimension reduction technique. The hybrid PCA-PNN based approach has been applied to different datasets from the in-house developed system and the portable electronic nose unit. Experimental classification results show that the dimension reduction performed by PCA improved the classification accuracy and provided high reliability, regardless of ambient temperature and humidity variation, baseline sensor drift, different gas concentration levels, and exposure to different heating temperature ranges.
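The PNN stage of the PCA-PNN hybrid can be sketched as a Parzen-window classifier: each class scores a test point by the mean Gaussian kernel density over that class's training exemplars. This is a minimal sketch, not the paper's system; the three-dimensional "PCA-reduced odour" vectors and the smoothing width `sigma` are hypothetical.

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma=0.5):
    """Probabilistic neural network: class score = mean Gaussian kernel
    density of a test point over that class's training exemplars."""
    classes = np.unique(y_train)
    scores = []
    for c in classes:
        Xc = X_train[y_train == c]
        d2 = ((X_test[:, None, :] - Xc[None, :, :]) ** 2).sum(-1)
        scores.append(np.exp(-d2 / (2 * sigma ** 2)).mean(axis=1))
    return classes[np.argmax(scores, axis=0)]

rng = np.random.default_rng(0)
# Two synthetic "odour" classes after PCA reduction to 3 components.
X0 = rng.normal(0.0, 0.3, (30, 3))
X1 = rng.normal(2.0, 0.3, (30, 3))
X = np.vstack([X0, X1])
y = np.array([0] * 30 + [1] * 30)
pred = pnn_predict(X, y, np.array([[0.1, 0.0, 0.1], [2.1, 1.9, 2.0]]))
print(pred)
```

In the full pipeline, the PCA projection would be fitted first and both training and test odour profiles projected before this step.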

  5. Intelligent Video Object Classification Scheme using Offline Feature Extraction and Machine Learning based Approach

    Directory of Open Access Journals (Sweden)

    Chandra Mani Sharma

    2012-01-01

    Full Text Available Classification of objects in a video stream is important because of its applications in many emerging areas such as visual surveillance, content-based video retrieval, and indexing. The task is challenging because video data are heavy and highly variable in nature, and processing is required to be in real time. This paper presents a multiclass object classification technique using a machine learning approach. Haar-like features are used for training the classifier. Feature calculation is performed using the Integral Image representation, and the classifier is trained offline using Stage-wise Additive Modeling using a Multiclass Exponential loss function (SAMME). The validity of the method has been verified by implementing a real-time human-car detector. Experimental results show that the proposed method can accurately classify objects in video into their respective classes. The proposed object classifier works well in outdoor environments under moderate lighting conditions and variable scene backgrounds. The technique is compared with other object classification techniques on various performance parameters.

  6. WordNet-based lexical semantic classification for text corpus analysis

    Institute of Scientific and Technical Information of China (English)

    LONG Jun; WANG Lu-da; LI Zu-de; ZHANG Zu-ping; YANG Liu

    2015-01-01

    Many text classifiers depend on statistical term measures for document representation. Such representations ignore the lexical semantic content of terms and the mutual information they carry, leading to classification errors. This work proposes a document representation method, a WordNet-based lexical semantic VSM, to solve this problem. Using WordNet, the method constructs a data structure of semantic-element information to characterize lexical semantic content, and adjusts EM modeling to disambiguate word stems. Then, in the lexical-semantic space of the corpus, a lexical-semantic eigenvector for document representation is built by calculating the weight of each synset and applied to the widely recognized NWKNN algorithm. On the Reuters-21578 corpus and a version of it adjusted by lexical replacement, experimental results show that the lexical-semantic eigenvector outperforms the TF-IDF-based term-statistic eigenvector in F1 measure and dimensionality. The construction of these document representation eigenvectors gives the method broad prospects for classification applications in text corpus analysis.

  7. Defining and evaluating classification algorithm for high-dimensional data based on latent topics.

    Directory of Open Access Journals (Sweden)

    Le Luo

    Full Text Available Automatic text categorization is one of the key techniques in information retrieval and data mining. Classification is usually time-consuming when the training dataset is large and high-dimensional. Many methods have been proposed to solve this problem, but few achieve satisfactory efficiency. In this paper, we present a method that combines the Latent Dirichlet Allocation (LDA) algorithm and the Support Vector Machine (SVM). LDA is first used to generate a reduced-dimensional topic representation as features in the vector space model; it reduces the feature set dramatically while keeping the necessary semantic information. The SVM is then employed to classify the data based on the generated features. We evaluate the algorithm on the 20 Newsgroups and Reuters-21578 datasets. The experimental results show that classification based on the proposed LDA+SVM model achieves high precision, recall, and F1 measure, and does so within a much shorter time frame. Our process improves greatly upon previous work in this field and shows strong potential as a streamlined classification process for a wide range of applications.
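The LDA-then-SVM pipeline maps directly onto scikit-learn components. This is a minimal sketch on a tiny invented two-class corpus standing in for 20 Newsgroups / Reuters-21578; the topic count of 4 is an arbitrary illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import LinearSVC

# Toy two-class corpus (sports vs. legal) standing in for the real datasets.
docs = ["the team won the football match", "striker scored a late goal",
        "the match ended with a penalty goal", "coach praised the football team",
        "the court ruled on the patent case", "judge delayed the trial verdict",
        "lawyers argued the patent claim in court", "the verdict ended the trial"]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

counts = CountVectorizer().fit_transform(docs)
# LDA compresses the bag-of-words representation into a low-dimensional
# topic space, keeping the semantic gist while shrinking the feature count.
lda = LatentDirichletAllocation(n_components=4, random_state=0)
topic_feats = lda.fit_transform(counts)          # shape: (n_docs, 4)
# The SVM then classifies documents from their topic mixtures.
svm = LinearSVC().fit(topic_feats, labels)
print("training accuracy:", svm.score(topic_feats, labels))
```

The efficiency gain the paper reports comes from the SVM operating on a handful of topic features instead of the full vocabulary.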

  8. A texton-based approach for the classification of lung parenchyma in CT images

    DEFF Research Database (Denmark)

    Gangeh, Mehrdad J.; Sørensen, Lauge; Shaker, Saher B.

    2010-01-01

    In this paper, a texton-based classification system based on raw pixel representation along with a support vector machine with radial basis function kernel is proposed for the classification of emphysema in computed tomography images of the lung. The proposed approach is tested on 168 annotated...

  9. Cell-graph mining for breast tissue modeling and classification.

    Science.gov (United States)

    Bilgin, Cagatay; Demir, Cigdem; Nagi, Chandandeep; Yener, Bulent

    2007-01-01

    We consider the problem of automated cancer diagnosis in the context of breast tissues. We present graph-theoretical techniques that identify and compute quantitative metrics for tissue characterization and classification. We segment digital images of histopathological tissue samples using the k-means algorithm. For each segmented image we generate different cell-graphs using the positional coordinates of cells and surrounding matrix components. These cell-graphs have 500-2000 cells (nodes) with 1000-10000 links, depending on the tissue and the type of cell-graph being used. We calculate a set of global metrics from the cell-graphs and use them as the feature set for learning. We compare our technique, hierarchical cell-graphs, with techniques based on the intensity values of images, Delaunay triangulation of the cells, our previous technique for brain tissue images, and the hybrid approach introduced in this paper. Among the compared techniques, the hierarchical-graph approach gives 81.8% accuracy, whereas we obtain 61.0%, 54.1% and 75.9% accuracy with intensity-based features, Delaunay triangulation, and our previous technique, respectively.
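The cell-graph construction can be sketched as follows: link every pair of cell centroids closer than some radius and summarize the graph with a few global metrics. This is a simplified illustration with random centroids and a hypothetical linking radius, not the paper's metric set.

```python
import numpy as np

def cell_graph_metrics(coords, radius):
    """Link every pair of cells closer than `radius` and return simple
    global graph metrics usable as classification features."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adj = (d < radius) & ~np.eye(len(coords), dtype=bool)
    degrees = adj.sum(axis=1)
    return {"n_nodes": len(coords),
            "n_edges": int(adj.sum()) // 2,
            "avg_degree": float(degrees.mean()),
            "isolated_ratio": float((degrees == 0).mean())}

rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(200, 2))   # hypothetical cell centroids
feats = cell_graph_metrics(coords, radius=10.0)
print(feats)
```

A feature vector like this, computed per image, is what the learning stage would consume; real cell-graphs differ mainly in how edges are defined (probabilistic or hierarchical rather than a fixed radius).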

  10. Classification of Polarimetric SAR Image Based on the Subspace Method

    Science.gov (United States)

    Xu, J.; Li, Z.; Tian, B.; Chen, Q.; Zhang, P.

    2013-07-01

    Land cover classification is one of the most significant applications of remote sensing. Compared to optical sensing technologies, synthetic aperture radar (SAR) can penetrate clouds and operate in all weather conditions, so land cover classification from SAR imagery is important in remote sensing. The subspace method is a novel approach for SAR data that reduces data dimensionality by incorporating feature extraction into the classification process. This paper applies the averaged learning subspace method (ALSM) to fully polarimetric SAR imagery for classification. The ALSM algorithm integrates three-component decomposition, eigenvalue/eigenvector decomposition, and textural features derived from the gray-level co-occurrence matrix (GLCM). The study site is located in Dingxing County, Hebei Province, China. We compare the subspace method with the traditional supervised Wishart classification. Experiments on a fully polarimetric Radarsat-2 image show that the proposed method yields higher classification accuracy, making ALSM a feasible alternative for SAR image classification.
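The GLCM textural features mentioned above can be computed with a few lines of NumPy. This is a generic sketch of a gray-level co-occurrence matrix for one pixel offset (not the paper's full feature set); the 4x4 example image is the standard textbook illustration.

```python
import numpy as np

def glcm(img, levels, dx=1, dy=0):
    """Gray-level co-occurrence matrix for one pixel offset (dx, dy),
    normalised so entries sum to 1."""
    P = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(h - dy):
        for x in range(w - dx):
            P[img[y, x], img[y + dy, x + dx]] += 1
    return P / P.sum()

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
P = glcm(img, levels=4)
i, j = np.indices(P.shape)
contrast = float((P * (i - j) ** 2).sum())         # Haralick contrast
homogeneity = float((P / (1 + (i - j) ** 2)).sum())
print(contrast, homogeneity)
```

Per-pixel windows of a SAR intensity channel, quantised to a small number of gray levels, would be passed through this routine to produce the textural inputs to the subspace classifier.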

  11. An approach for mechanical fault classification based on generalized discriminant analysis

    Institute of Scientific and Technical Information of China (English)

    LI Wei-hua; SHI Tie-lin; YANG Shu-zi

    2006-01-01

    To deal with pattern classification of complicated mechanical faults, an approach to multi-fault classification based on generalized discriminant analysis is presented. Compared with linear discriminant analysis (LDA), generalized discriminant analysis (GDA), a nonlinear discriminant analysis method, is more suitable for linearly non-separable classification problems. The connection and difference between Kernel Principal Component Analysis (KPCA) and GDA are discussed: KPCA is good at detecting machine abnormality, while GDA performs well in multi-fault classification based on a collection of historical fault symptoms. When the proposed method is applied to air compressor condition classification and gear fault classification, it shows excellent performance in complicated multi-fault classification.
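Why a kernelised discriminant helps on linearly non-separable fault symptoms can be shown with a small sketch. GDA amounts to LDA performed in a kernel feature space; since scikit-learn has no GDA estimator, the pipeline below approximates it with KernelPCA followed by LDA, on concentric circles as a stand-in for non-separable fault classes.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

# Concentric circles: a linearly non-separable two-class problem.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X, y)          # plain LDA struggles here
kda = make_pipeline(KernelPCA(n_components=10, kernel="rbf", gamma=10.0),
                    LinearDiscriminantAnalysis()).fit(X, y)

print("LDA accuracy:      ", lda.score(X, y))
print("kernelised accuracy:", kda.score(X, y))
```

The RBF kernel maps the circles into a space where the radius becomes a linear direction, so the discriminant in kernel space separates what no linear boundary in the input space can.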

  12. Rapid Occupant Classification System Based on Rough Sets Theory

    Directory of Open Access Journals (Sweden)

    Lin Chen

    2012-09-01

    Full Text Available In an intelligent airbag system, correct classification of the occupant type is a precondition for controlling the airbag release time and inflation strength during accidents, and plays an important role in doing so. In this paper, a novel rapid occupant classification system is proposed in which tens of pressure sensors collect pressure distribution data in real time, and rough sets theory is then applied to extract classification knowledge from the data features. Experiments have been carried out to verify the system's efficiency and effectiveness.
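The core rough-sets operation, computing lower and upper approximations of a decision class under the indiscernibility relation induced by the condition attributes, can be sketched in plain Python. The discretised seat-pressure table below is entirely hypothetical.

```python
from collections import defaultdict

def approximations(objects, condition, decision, target):
    """Lower/upper approximation of the decision class `target` under the
    indiscernibility relation induced by the `condition` attributes."""
    blocks = defaultdict(set)
    for i, obj in enumerate(objects):
        blocks[tuple(obj[a] for a in condition)].add(i)
    positive = {i for i, obj in enumerate(objects) if obj[decision] == target}
    lower, upper = set(), set()
    for block in blocks.values():
        if block <= positive:       # block entirely inside the target class
            lower |= block
        if block & positive:        # block touching the target class
            upper |= block
    return lower, upper

table = [  # hypothetical discretised seat-pressure readings
    {"weight": "low",  "spread": "narrow", "type": "child"},
    {"weight": "low",  "spread": "narrow", "type": "child"},
    {"weight": "low",  "spread": "wide",   "type": "adult"},
    {"weight": "high", "spread": "wide",   "type": "adult"},
    {"weight": "low",  "spread": "wide",   "type": "child"},  # conflicts with row 2
]
low, up = approximations(table, ("weight", "spread"), "type", "child")
print(sorted(low), sorted(up))
```

Rows 2 and 4 are indiscernible but carry different labels, so they fall in the upper approximation only; certain classification rules are extracted from the lower approximation, while the boundary region flags sensor readings that cannot be decided from these attributes alone.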

  13. A NEW SVM BASED EMOTIONAL CLASSIFICATION OF IMAGE

    Institute of Scientific and Technical Information of China (English)

    Wang Weining; Yu Yinglin; Zhang Jianchao

    2005-01-01

    This paper presents how a high-level emotional representation of art paintings can be inferred from perceptual-level features suited to the particular classes (dynamic vs. static classification). The key points are feature selection and classification. Based on the strong relationship between the notable lines of an image and human sensations, a novel feature vector, the Weighted Line Direction-Length Vector (WLDLV), is proposed, which includes both the orientation and length information of lines in an image. Classification is performed by a Support Vector Machine (SVM), and images are classified as dynamic or static. Experimental results demonstrate the effectiveness and superiority of the algorithm.

  14. Toward a use case based classification of mobile health applications.

    Science.gov (United States)

    Yasini, Mobin; Marchand, Guillaume

    2015-01-01

    Smartphones are growing in number, and mobile health applications (apps) are becoming a common way to improve the quality of health and healthcare delivery. Health-related apps are mainly concentrated in the Medical and Health & Fitness categories of the Google and Apple app stores; however, these apps are not easily accessible to users. We decided to develop a system that facilitates access to these apps and increases their visibility and usability. Various use cases for 567 health-related apps in French were identified and listed incrementally. UML modeling was then used to represent these use cases and their relationships with each other and with the potential users of the apps. Thirty-one different use cases were found and regrouped into six major categories: consulting medical reference information, communicating and/or sharing information, fulfilling a contextual need, educational tools, managing professional activities, and health-related management. A classification of this type highlights the real purpose and functionality of these apps and allows users to search for the right app rapidly and find it in a non-ambiguous context.

  15. Four-class music emotion classification based on Gaussian mixture models

    Institute of Scientific and Technical Information of China (English)

    陆阳; 郭滨; 白雪梅

    2015-01-01

    To address the problem that musical emotion is complex and hard to categorize, a method is proposed that builds Gaussian mixture models (GMMs) under a four-class coordinate system to classify music signals. On top of these per-class models, GMMs are additionally built for the two ends of each axis representing an emotional characteristic, turning the system into a two-layer classifier whose decisions are combined by weighting. The results show that with the axis models in place and the weights set to 0.7 and 0.3, the classification achieves its best results, clearly better than building a model directly for each emotion class.
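The per-model likelihood comparison at the heart of a GMM classifier can be sketched with scikit-learn. This is a simplified illustration under stated assumptions: synthetic 2-D features stand in for acoustic features, only one emotion axis with two poles is modelled, and the paper's two-layer 0.7/0.3 weighting is not reproduced.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 2-D stand-ins for acoustic features at the two poles of one
# emotion axis (e.g. high vs. low arousal).
X_hi = rng.normal([2.0, 2.0], 0.5, (100, 2))
X_lo = rng.normal([-2.0, -2.0], 0.5, (100, 2))

# One GMM per pole of the axis.
gmm_hi = GaussianMixture(n_components=2, random_state=0).fit(X_hi)
gmm_lo = GaussianMixture(n_components=2, random_state=0).fit(X_lo)

def classify(x):
    """Assign each feature vector to the pole with higher log-likelihood."""
    return np.where(gmm_hi.score_samples(x) > gmm_lo.score_samples(x),
                    "high", "low")

print(classify(np.array([[2.1, 1.8], [-1.9, -2.2]])))
```

The paper's two-layer variant would weight the axis-model scores (0.7) against per-class model scores (0.3) before taking the final decision.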

  16. Analysis on Design of Kohonen-network System Based on Classification of Complex Signals

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    The key methods used in recent years for the detection and classification of the electroencephalogram (EEG) are introduced. Taking the EEG as an example, a design plan for a Kohonen neural network system based on the detection and classification of complex signals is proposed, and both the network design and the signal processing are analysed, including pre-processing of signals, extraction of signal features, classification of signals, and network topology.
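A minimal Kohonen self-organizing map, the network family the record refers to, can be written in a few dozen lines of NumPy. This is a generic textbook sketch, not the paper's system; the grid size, decay schedules, and synthetic 3-D "EEG feature" clusters are illustrative choices.

```python
import numpy as np

def train_som(data, grid=(4, 4), epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Minimal Kohonen self-organizing map on a 2-D grid of neurons."""
    rng = np.random.default_rng(seed)
    gy, gx = np.meshgrid(range(grid[0]), range(grid[1]), indexing="ij")
    pos = np.stack([gy.ravel(), gx.ravel()], axis=1).astype(float)
    W = rng.normal(size=(grid[0] * grid[1], data.shape[1]))
    n_steps, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            lr = lr0 * (1 - t / n_steps)                 # decaying learning rate
            sigma = sigma0 * (1 - t / n_steps) + 0.5     # shrinking neighbourhood
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))  # best-matching unit
            h = np.exp(-((pos - pos[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
            W += lr * h[:, None] * (x - W)               # pull neighbours toward x
            t += 1
    return W

# Two synthetic feature clusters; the trained map should devote separate
# regions of the grid to each.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 0.2, (50, 3)), rng.normal(3, 0.2, (50, 3))])
W = train_som(data)
bmu_a = np.argmin(((W - data[0]) ** 2).sum(axis=1))
bmu_b = np.argmin(((W - data[-1]) ** 2).sum(axis=1))
print(bmu_a, bmu_b)
```

After training, each input is summarised by its best-matching unit; clusters of units on the grid then serve as the signal classes.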

  17. The discriminative validity of "nociceptive," "peripheral neuropathic," and "central sensitization" as mechanisms-based classifications of musculoskeletal pain.

    LENUS (Irish Health Repository)

    Smart, Keith M

    2012-02-01

    OBJECTIVES: Empirical evidence of discriminative validity is required to justify the use of mechanisms-based classifications of musculoskeletal pain in clinical practice. The purpose of this study was to evaluate the discriminative validity of mechanisms-based classifications of pain by identifying discriminatory clusters of clinical criteria predictive of "nociceptive," "peripheral neuropathic," and "central sensitization" pain in patients with low back (+/- leg) pain disorders. METHODS: This study was a cross-sectional, between-patients design using the extreme-groups method. Four hundred sixty-four patients with low back (+/- leg) pain were assessed using a standardized assessment protocol. After each assessment, patients' pain was assigned a mechanisms-based classification. Clinicians then completed a clinical criteria checklist indicating the presence/absence of various clinical criteria. RESULTS: Multivariate analyses using binary logistic regression with Bayesian model averaging identified a discriminative cluster of 7, 3, and 4 symptoms and signs predictive of a dominance of "nociceptive," "peripheral neuropathic," and "central sensitization" pain, respectively. Each cluster was found to have high levels of classification accuracy (sensitivity, specificity, positive/negative predictive values, positive/negative likelihood ratios). DISCUSSION: By identifying a discriminatory cluster of symptoms and signs predictive of "nociceptive," "peripheral neuropathic," and "central" pain, this study provides some preliminary discriminative validity evidence for mechanisms-based classifications of musculoskeletal pain. Classification system validation requires the accumulation of validity evidence before their use in clinical practice can be recommended. Further studies are required to evaluate the construct and criterion validity of mechanisms-based classifications of musculoskeletal pain.

  18. Trace elements based classification on clinkers. Application to Spanish clinkers

    Directory of Open Access Journals (Sweden)

    Tamás, F. D.

    2001-12-01

    Full Text Available The qualitative identification of the origin (i.e., the manufacturing factory) of Spanish clinkers is described. The classification of clinkers produced in different factories can be based on their trace element content. Approximately fifteen clinker samples, collected from 11 Spanish cement factories, were analysed to determine their Mg, Sr, Ba, Mn, Ti, Zr, Zn and V content. An expert system formulated as a binary decision tree was designed based on the collected data, and the performance of the obtained classifier was measured by ten-fold cross validation. The results show that the proposed method yields an easy-to-use expert system that is able to determine the origin of a clinker from its trace element content.

    [Spanish abstract, translated:] This paper describes the qualitative identification procedure for Spanish clinkers aimed at determining their origin (factory). The classification of the clinkers is based on their trace element content. Fifteen different clinkers from 11 Spanish cement factories were analysed, determining their Mg, Sr, Ba, Mn, Ti, Zr, Zn and V contents. An expert system was designed as a binary decision tree built from the collected data, and the resulting classification was examined by ten-fold cross validation. The results show that the proposed approach yields an easy-to-use expert system capable of determining the origin of a clinker from its trace element content.
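The record above describes a binary decision tree validated by ten-fold cross validation. That workflow can be sketched with scikit-learn; the trace-element "signatures" below are synthetic stand-ins (the real concentrations are not reproduced here), with three hypothetical factories instead of eleven.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Hypothetical trace-element concentrations (Mg, Sr, Ba, Mn, Ti, Zr, Zn, V)
# for three factories, each with its own characteristic signature.
signatures = rng.uniform(10, 100, size=(3, 8))
X = np.vstack([sig + rng.normal(0, 2, (20, 8)) for sig in signatures])
y = np.repeat([0, 1, 2], 20)

# Binary decision tree validated by ten-fold cross validation.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print("10-fold accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```

The fitted tree's split thresholds on individual elements are exactly the kind of human-readable rules the paper's expert system exposes.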

  19. Knowledge-based sea ice classification by polarimetric SAR

    DEFF Research Database (Denmark)

    Skriver, Henning; Dierking, Wolfgang

    2004-01-01

    Polarimetric SAR images acquired at C- and L-band over sea ice in the Greenland Sea, Baltic Sea, and Beaufort Sea have been analysed with respect to their potential for ice type classification. The polarimetric data were gathered by the Danish EMISAR and the US AIRSAR, which both are airborne systems. A hierarchical classification scheme was chosen for sea ice because our knowledge about magnitudes, variations, and dependences of sea ice signatures can be directly considered. The optimal sequence of classification rules and the rules themselves depend on the ice conditions/regimes. The use of the polarimetric phase information improves the classification only in the case of thin ice types but is not necessary for thicker ice (above about 30 cm thickness)…

  20. Toward Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models.

    Science.gov (United States)

    Li, Congcong; Kowdle, Adarsh; Saxena, Ashutosh; Chen, Tsuhan

    2012-07-01

    Scene understanding includes many related subtasks, such as scene categorization, depth estimation, object detection, etc. Each of these subtasks is often notoriously hard, and state-of-the-art classifiers already exist for many of them. These classifiers operate on the same raw image and provide correlated outputs. It is desirable to have an algorithm that can capture such correlation without requiring any changes to the inner workings of any classifier. We propose Feedback Enabled Cascaded Classification Models (FE-CCM), which jointly optimize all the subtasks while requiring only a "black box" interface to the original classifier for each subtask. We use a two-layer cascade of classifiers, which are repeated instantiations of the original ones, with the output of the first layer fed into the second layer as input. Our training method involves a feedback step that allows later classifiers to provide earlier classifiers information about which error modes to focus on. We show that our method significantly improves performance in all the subtasks in the domain of scene understanding, where we consider depth estimation, scene categorization, event categorization, object detection, geometric labeling, and saliency detection. Our method also improves performance in two robotic applications: an object-grasping robot and an object-finding robot.
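The two-layer cascade can be sketched generically with scikit-learn: layer-1 classifiers are trained independently per subtask, and layer-2 instantiations see the raw features plus all layer-1 outputs. This is a simplified sketch, the FE-CCM feedback training step is omitted, and the two correlated synthetic subtasks stand in for real scene-understanding tasks.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Two correlated subtasks over the same input features.
X, yA = make_classification(n_samples=600, n_features=20, random_state=0)
yB = (yA + (np.random.default_rng(0).random(600) < 0.1)) % 2   # noisy copy of task A
X_tr, X_te, yA_tr, yA_te, yB_tr, yB_te = train_test_split(X, yA, yB, random_state=0)

# Layer 1: independent "black box" classifiers, one per subtask.
l1A = LogisticRegression(max_iter=1000).fit(X_tr, yA_tr)
l1B = LogisticRegression(max_iter=1000).fit(X_tr, yB_tr)

def augment(X):
    """Layer-2 input: raw features plus both subtasks' layer-1 outputs."""
    return np.hstack([X, l1A.predict_proba(X), l1B.predict_proba(X)])

# Layer 2: a repeated instantiation of the task-A classifier that can
# exploit the correlated layer-1 outputs.
l2A = LogisticRegression(max_iter=1000).fit(augment(X_tr), yA_tr)
print("layer-1 A: %.3f  layer-2 A: %.3f"
      % (l1A.score(X_te, yA_te), l2A.score(augment(X_te), yA_te)))
```

Only the `predict_proba` interface of each first-layer model is used, which is precisely the "black box" requirement the paper emphasises.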

  1. A Binomial Mixture Model for Classification Performance: A Commentary on Waxman, Chambers, Yntema, and Gelman (1989).

    Science.gov (United States)

    Thomas, Hoben

    1989-01-01

    Individual differences in children's performance on a classification task are modeled by a two component binomial mixture distribution. The model accounts for data well, with variance accounted for ranging from 87 to 95 percent. (RJC)
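A two-component binomial mixture of the kind the commentary discusses can be fitted with a short EM loop. This is a generic textbook sketch, not the commentary's analysis; the item count, component probabilities, and simulated "children" are invented for illustration.

```python
import numpy as np
from math import comb

def em_binomial_mixture(k, n, iters=200):
    """EM for a two-component binomial mixture: each child answers n items;
    component j has success probability p[j] and mixing weight w[j]."""
    p = np.array([0.3, 0.8])                  # initial guesses
    w = np.array([0.5, 0.5])
    C = np.array([comb(n, ki) for ki in k])
    for _ in range(iters):
        like = C[:, None] * p ** k[:, None] * (1 - p) ** (n - k[:, None])
        resp = w * like
        resp /= resp.sum(axis=1, keepdims=True)            # E-step
        w = resp.mean(axis=0)                              # M-step: weights
        p = (resp * k[:, None]).sum(axis=0) / (resp.sum(axis=0) * n)
    return w, p

rng = np.random.default_rng(0)
# 100 simulated children, 12 items: 40% respond near chance (p=0.25),
# 60% have mastered the task (p=0.9).
k = np.concatenate([rng.binomial(12, 0.25, 40), rng.binomial(12, 0.9, 60)])
w, p = em_binomial_mixture(k, 12)
print("weights:", w.round(2), "success probs:", p.round(2))
```

The recovered mixing weights and success probabilities are what let such a model attribute most of the variance in classification scores to two latent performance groups.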

  2. Plant Electrical Signal Classification Based on Waveform Similarity

    Directory of Open Access Journals (Sweden)

    Yang Chen

    2016-10-01

    Full Text Available (1) Background: Plant electrical signals are important physiological traits that reflect plant physiological state. As a kind of phenotypic data, the plant action potential (AP) evoked by external stimuli, e.g., electrical stimulation or environmental stress, may be associated with inhibition of gene expression related to stress tolerance. However, plant AP is a response to environmental changes and is highly variable: it is an aperiodic signal with a refractory period, discontinuity, noise, and artifacts. Consequently, automatically recognizing and classifying plant AP remains challenging. (2) Methods: We therefore propose an AP recognition algorithm based on a dynamic difference threshold to extract all waveforms similar to AP, followed by an incremental template matching algorithm to classify AP and non-AP waveforms. (3) Results: Experimental results indicate that the template matching algorithm achieved a classification rate of 96.0%, superior to backpropagation artificial neural networks (BP-ANNs), support vector machines (SVMs) and a deep learning method. (4) Conclusion: These findings imply that the proposed methods are likely to expand the possibilities for rapidly recognizing and classifying plant action potentials in future databases.
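The similarity core of template matching can be sketched as normalized cross-correlation between a candidate waveform and an AP template. This is a toy illustration only: the waveform shapes, the 0.7 decision threshold, and the template are invented, and the paper's incremental template updating is not shown.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equal-length waveforms."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float((a * b).mean())

t = np.linspace(0, 1, 100)
# Hypothetical AP template: sharp depolarisation followed by a slower dip.
template = np.exp(-((t - 0.3) / 0.05) ** 2) - 0.6 * np.exp(-((t - 0.5) / 0.1) ** 2)

rng = np.random.default_rng(0)
candidate_ap = template + 0.1 * rng.normal(size=100)   # AP-like waveform
noise_burst = rng.normal(size=100)                     # non-AP waveform

def is_ap(w, thr=0.7):
    return ncc(w, template) >= thr

print(is_ap(candidate_ap), is_ap(noise_burst))
```

In an incremental scheme, each confidently matched waveform would also be folded back into the template (e.g. by a running average), letting the matcher adapt to signal drift.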

  3. Radar-Derived Quantitative Precipitation Estimation Based on Precipitation Classification

    Directory of Open Access Journals (Sweden)

    Lili Yang

    2016-01-01

    Full Text Available A method for improving radar-derived quantitative precipitation estimation is proposed. Tropical vertical profiles of reflectivity (VPRs) are first determined from multiple VPRs. Upon identifying a tropical VPR, the event can be further classified as either tropical-stratiform or tropical-convective rainfall by a fuzzy logic (FL) algorithm. Based on the precipitation-type fields, the reflectivity values are converted into rainfall rate using a Z-R relationship. In order to evaluate the performance of this rainfall classification scheme, three experiments were conducted using three months of data and two study cases. In Experiment I, the Weather Surveillance Radar-1988 Doppler (WSR-88D) default Z-R relationship was applied. In Experiment II, the precipitation regime was separated into convective and stratiform rainfall using the FL algorithm, and the corresponding Z-R relationships were used. In Experiment III, the precipitation regime was separated into convective, stratiform, and tropical rainfall, and the corresponding Z-R relationships were applied. The results show that the rainfall rates obtained from all three experiments match closely with the gauge observations, although Experiment II reduced the underestimation seen in Experiment I. Experiment III significantly reduced this underestimation and generated the most accurate radar estimates of rain rate among the three experiments.
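Applying a regime-dependent Z-R relationship is a one-line inversion of Z = aR^b. The sketch below uses commonly cited coefficient pairs (WSR-88D default, Marshall-Palmer, and a tropical relationship); these are illustrative standards, not necessarily the exact coefficients used in the paper's experiments.

```python
import numpy as np

# Commonly used Z-R relationships, Z = a * R^b (Z in mm^6/m^3, R in mm/h).
ZR = {"convective": (300.0, 1.4),    # WSR-88D default
      "stratiform": (200.0, 1.6),    # Marshall-Palmer
      "tropical":   (250.0, 1.2)}    # tropical relationship

def rain_rate(dbz, regime):
    """Convert reflectivity (dBZ) to rain rate using the regime's Z-R law."""
    a, b = ZR[regime]
    z = 10.0 ** (dbz / 10.0)         # dBZ -> linear reflectivity factor
    return (z / a) ** (1.0 / b)

for regime in ZR:
    print(f"{regime:>10}: 40 dBZ -> {rain_rate(40.0, regime):.1f} mm/h")
```

The spread between the three rates at the same reflectivity is exactly why misclassifying tropical rain as convective produces the underestimation the paper's Experiment III corrects.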

  4. A tentative classification of paleoweathering formations based on geomorphological criteria

    Science.gov (United States)

    Battiau-Queney, Yvonne

    1996-05-01

    A geomorphological classification is proposed that emphasizes the usefulness of paleoweathering records in any reconstruction of past landscapes. Four main paleoweathering records are recognized: 1. Paleoweathering formations buried beneath a sedimentary or volcanic cover. Most of them are saprolites, sometimes with preserved overlying soils. Ages range from Archean to late Cenozoic times; 2. Paleoweathering formations trapped in karst: some of them have buried pre-existent karst landforms, others have developed simultaneously with the subjacent karst; 3. Relict paleoweathering formations: although inherited, they belong to the present landscape. Some of them are indurated (duricrusts, silcretes, ferricretes,…); others are not and owe their preservation to a stable morphotectonic environment; 4. Polyphased weathering mantles: weathering has taken place in changing geochemical conditions. After examples of each type are provided, the paper considers the relations between chemical weathering and landform development. The climatic significance of paleoweathering formations is discussed. Some remote morphogenic systems have no present equivalent. It is doubtful that chemical weathering alone might lead to widespread planation surfaces. Moreover, classical theories based on sea-level and rivers as the main factors of erosion are not really adequate to explain the observed landscapes.

  5. Classification of CT brain images based on deep learning networks.

    Science.gov (United States)

    Gao, Xiaohong W; Hui, Rui; Tian, Zengmin

    2017-01-01

    While computerised tomography (CT) may have been the first imaging tool to study the human brain, it has not yet been incorporated into the clinical decision-making process for the diagnosis of Alzheimer's disease (AD). On the other hand, being prevalent, inexpensive and non-invasive, CT does present diagnostic features of AD to a great extent. This study explores the significance and impact of applying burgeoning deep learning techniques to the classification of CT brain images, in particular utilising convolutional neural networks (CNNs), aiming at providing supplementary information for the early diagnosis of Alzheimer's disease. Towards this end, CT images (N = 285) are clustered into three groups: AD, lesion (e.g. tumour) and normal ageing. In addition, considering the large slice thickness of this collection along the depth (z) direction (~3-5 mm), an advanced CNN architecture is established integrating both 2D and 3D CNN networks. The fusion of the two networks is coordinated based on the average of the Softmax scores obtained from both, consolidating 2D images along the spatial axial direction and 3D segmented blocks respectively. As a result, the classification accuracy rates rendered by this elaborated CNN architecture are 85.2%, 80% and 95.3% for the AD, lesion and normal classes respectively, with an average of 87.6%. Additionally, this improved CNN network appears to outperform the 2D-only version of the CNN network as well as a number of state-of-the-art hand-crafted approaches, which deliver accuracy rates (in percent) of 86.3, 85.6 ± 1.10, 86.3 ± 1.04, 85.2 ± 1.60 and 83.1 ± 0.35 for 2D CNN, 2D SIFT, 2D KAZE, 3D SIFT and 3D KAZE respectively. The two major contributions of the paper constitute a new 3-D approach while applying deep learning techniques to extract signature information
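The fusion step, averaging the Softmax scores of the 2D and 3D networks before taking the argmax, reduces to a few lines. The logits below are invented placeholders for the two networks' outputs; only the fusion rule itself reflects the paper's description.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical logits for one scan over the classes (AD, lesion, normal):
logits_2d = np.array([2.1, 0.3, 1.0])   # 2D CNN on axial slices
logits_3d = np.array([0.5, 0.2, 2.0])   # 3D CNN on segmented blocks

# Late fusion: average the two networks' softmax scores, then argmax.
fused = (softmax(logits_2d) + softmax(logits_3d)) / 2
classes = ["AD", "lesion", "normal"]
print(classes[int(np.argmax(fused))], fused.round(3))
```

Because the average of two probability vectors is itself a probability vector, the fused scores remain directly interpretable as class confidences.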

  6. Predictive model of biliocystic communication in liver hydatid cysts using classification and regression tree analysis

    Directory of Open Access Journals (Sweden)

    Souadka Amine

    2010-04-01

    Full Text Available Abstract Background: The incidence of liver hydatid cyst (LHC) rupture ranges from 15% to 40% of all cases, and most ruptures involve the bile duct tree. Patients with biliocystic communication (BCC) have specific clinical and therapeutic aspects. The purpose of this study was to determine which patients with LHC may develop BCC using classification and regression tree (CART) analysis. Methods: A retrospective study of 672 patients with liver hydatid cyst treated at surgery department "A" of Ibn Sina University Hospital, Rabat, Morocco. Fourteen risk factors for BCC occurrence were entered into the CART analysis to build an algorithm that best predicts the occurrence of BCC. Results: The incidence of BCC was 24.5%. High-risk subgroups were patients with jaundice and a thick pericyst (risk 73.2%), and patients with a thick pericyst but no jaundice, aged 36.5 years or younger, with no past history of LHC (risk 40.5%). The developed CART model has a sensitivity of 39.6%, a specificity of 93.3%, a positive predictive value of 65.6%, a negative predictive value of 82.6%, and an overall classification accuracy of 80.1%; the discriminating ability of the model was good (82%). Conclusion: We developed a simple classification tool to identify LHC patients at high risk of BCC during a routine clinic visit (clinical history and examination followed by ultrasonography alone). The predictive factors were based on pericyst aspect, jaundice, age, past history of liver hydatidosis, and morphological Gharbi cyst aspect. We think this classification can be used effectively to direct patients to appropriate medical structures.
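The CART workflow plus the reported performance measures (sensitivity, specificity, PPV, NPV) can be sketched with scikit-learn. The synthetic cohort below is entirely invented for illustration; the binary predictors merely echo the factor names from the abstract, and the outcome rule is hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
n = 672
# Hypothetical binary predictors: jaundice, thick pericyst, age <= 36.5, past LHC.
X = rng.integers(0, 2, size=(n, 4))
# Synthetic outcome loosely tied to jaundice and pericyst (illustration only).
risk = 0.1 + 0.35 * X[:, 0] + 0.25 * X[:, 1]
y = (rng.random(n) < risk).astype(int)           # 1 = biliocystic communication

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
tn, fp, fn, tp = confusion_matrix(y, tree.predict(X)).ravel()
sens, spec = tp / (tp + fn), tn / (tn + fp)
ppv, npv = tp / (tp + fp), tn / (tn + fn)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} PPV={ppv:.2f} NPV={npv:.2f}")
```

Reading the high-risk leaves of such a tree gives exactly the kind of bedside rules the study reports (e.g. jaundice plus thick pericyst).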

  7. IMPROVEMENT OF TCAM-BASED PACKET CLASSIFICATION ALGORITHM

    Institute of Scientific and Technical Information of China (English)

    Xu Zhen; Zhang Jun; Rui Liyang; Sun Jun

    2008-01-01

    The features of Ternary Content Addressable Memories (TCAMs) make them particularly attractive for IP address lookup and packet classification applications in router systems. However, the limitations of TCAMs impede their utilization. In this paper, solutions for decreasing power consumption and avoiding entry expansion in range matching are addressed. Experimental results demonstrate that the proposed techniques significantly improve the performance of TCAMs in IP address lookup and packet classification.

  8. Colored petri nets to model gene mutation and amino acids classification.

    Science.gov (United States)

    Yang, Jinliang; Gao, Rui; Meng, Max Q-H; Tarn, Tzyh-Jong

    2012-05-07

    The genetic code is the triplet code based on three-letter codons, which determines the specific amino acid sequences in protein synthesis. Choosing an appropriate model for processing these codons is a useful way to study genetic processes in molecular biology. As an effective modeling tool for discrete event dynamic systems (DEDS), the colored Petri net (CPN) has been used to model several biological systems, such as metabolic pathways and genetic regulatory networks. According to the genetic code table, CPN is employed here to model the process of genetic information transmission. In this paper, we propose a CPN model of amino acid classification and further present an improved CPN model. Based on the model above, we give another CPN model that classifies the type of gene mutation by comparing the bases of DNA strands and the codons of amino acids along the polypeptide chain. This model is helpful in determining whether a certain gene mutation will change the structure and function of protein molecules. The effectiveness and accuracy of the presented model are illustrated by examples.
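    A toy illustration of the mutation-type classification the CPN model performs (silent/missense/nonsense), using a small fragment of the standard codon table; this is a plain-Python sketch, not a Petri net:

```python
# Minimal fragment of the standard genetic code table, just large
# enough for the three example mutations below.
CODON_TABLE = {
    "GAA": "Glu", "GAG": "Glu", "GAT": "Asp",
    "TGG": "Trp", "TGA": "STOP", "TAA": "STOP",
}

def classify_point_mutation(codon_before, codon_after):
    # Compare the translated amino acids to label the mutation type.
    aa_before = CODON_TABLE[codon_before]
    aa_after = CODON_TABLE[codon_after]
    if aa_after == aa_before:
        return "silent"        # same amino acid, protein unchanged
    if aa_after == "STOP":
        return "nonsense"      # premature stop codon
    return "missense"          # different amino acid

examples = [
    classify_point_mutation("GAA", "GAG"),  # Glu -> Glu
    classify_point_mutation("GAA", "GAT"),  # Glu -> Asp
    classify_point_mutation("TGG", "TGA"),  # Trp -> stop codon
]
```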

  9. Wavelength-Adaptive Dehazing Using Histogram Merging-Based Classification for UAV Images

    Directory of Open Access Journals (Sweden)

    Inhye Yoon

    2015-03-01

    Full Text Available Since incoming light to an unmanned aerial vehicle (UAV) platform can be scattered by haze and dust in the atmosphere, the acquired image loses the original color and brightness of the subject. Enhancement of hazy images is an important task in improving the visibility of various UAV images. This paper presents a spatially-adaptive dehazing algorithm that merges color histograms with consideration of the wavelength-dependent atmospheric turbidity. Based on the wavelength-adaptive hazy image acquisition model, the proposed dehazing algorithm consists of three steps: (i) image segmentation based on geometric classes; (ii) generation of the context-adaptive transmission map; and (iii) intensity transformation for enhancing a hazy UAV image. The major contribution of the research is a novel hazy UAV image degradation model by considering the wavelength of light sources. In addition, the proposed transmission map provides a theoretical basis to differentiate visually important regions from others based on the turbidity and merged classification results.

  10. Wavelength-adaptive dehazing using histogram merging-based classification for UAV images.

    Science.gov (United States)

    Yoon, Inhye; Jeong, Seokhwa; Jeong, Jaeheon; Seo, Doochun; Paik, Joonki

    2015-03-19

    Since incoming light to an unmanned aerial vehicle (UAV) platform can be scattered by haze and dust in the atmosphere, the acquired image loses the original color and brightness of the subject. Enhancement of hazy images is an important task in improving the visibility of various UAV images. This paper presents a spatially-adaptive dehazing algorithm that merges color histograms with consideration of the wavelength-dependent atmospheric turbidity. Based on the wavelength-adaptive hazy image acquisition model, the proposed dehazing algorithm consists of three steps: (i) image segmentation based on geometric classes; (ii) generation of the context-adaptive transmission map; and (iii) intensity transformation for enhancing a hazy UAV image. The major contribution of the research is a novel hazy UAV image degradation model by considering the wavelength of light sources. In addition, the proposed transmission map provides a theoretical basis to differentiate visually important regions from others based on the turbidity and merged classification results.
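    The transmission-map step can be illustrated with the generic atmospheric scattering model I = J·t + A·(1 − t) commonly assumed in dehazing work; the paper's wavelength-adaptive model refines this, and all values below are illustrative:

```python
import numpy as np

# Hazy image formation model commonly assumed in dehazing work:
#   I = J * t + A * (1 - t)
# where I is the observed intensity, J the scene radiance, t the
# per-pixel transmission and A the airlight. Inverting it recovers J.
I = np.array([[0.70, 0.62], [0.55, 0.80]])   # observed hazy intensities
t = np.array([[0.50, 0.40], [0.30, 0.90]])   # transmission map
A = 0.9                                      # estimated airlight

t_clamped = np.maximum(t, 0.1)               # avoid division blow-up
J = (I - A) / t_clamped + A                  # recovered scene radiance
```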

  11. Fuzzy Pattern Classification Based Detection of Faulty Electronic Fuel Control (EFC) Valves Used in Diesel Engines

    Directory of Open Access Journals (Sweden)

    Umut Tugsal

    2014-05-01

    Full Text Available In this paper, we develop mathematical models of a rotary Electronic Fuel Control (EFC) valve used in a Diesel engine, based on dynamic performance test data and system identification methodology, in order to detect faulty EFC valves. The model takes into account the dynamics of the electrical and mechanical portions of the EFC valves. A recursive least squares (RLS) type system identification methodology has been utilized to determine the transfer functions of the different types of EFC valves investigated in this study. Both frequency domain and time domain methods have been utilized for this purpose. Based on the characteristic patterns exhibited by the EFC valves, a fuzzy logic based pattern classification method was utilized to evaluate the residuals and distinguish faulty EFC valves from good ones. The developed methodology has been shown to provide robust diagnostics for a wide range of EFC valves.
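    A minimal sketch of a recursive least squares (RLS) identification loop of the kind the paper relies on, run on a synthetic noise-free first-order model (the true parameters and signals here are hypothetical, not the valve data):

```python
import numpy as np

rng = np.random.default_rng(0)

# True first-order model y[k] = a*y[k-1] + b*u[k-1]; the goal is to
# recover (a, b) online from input/output data alone.
a_true, b_true = 0.8, 0.5
theta = np.zeros(2)            # running estimate of [a, b]
P = np.eye(2) * 1000.0         # large initial covariance ("know nothing")
lam = 1.0                      # forgetting factor (1.0 = ordinary RLS)

u = rng.standard_normal(200)   # excitation input
y_prev = 0.0
for k in range(1, 200):
    phi = np.array([y_prev, u[k - 1]])       # regressor vector
    y = a_true * y_prev + b_true * u[k - 1]  # noise-free measurement
    K = P @ phi / (lam + phi @ P @ phi)      # RLS gain
    theta = theta + K * (y - phi @ theta)    # correct estimate by residual
    P = (P - np.outer(K, phi @ P)) / lam     # shrink covariance
    y_prev = y
```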

  12. Robust Texture Classification via Group-Collaboratively Representation-Based Strategy

    Institute of Scientific and Technical Information of China (English)

    Xiao-Ling Xia; Hang-Hui Huang

    2013-01-01

    In this paper, we present a simple but powerful ensemble for robust texture classification. The proposed method uses a single type of feature descriptor, the scale-invariant feature transform (SIFT), and inherits the spirit of the spatial pyramid matching model (SPM). By partitioning the original texture images in a flexible way, our approach can produce sufficient informative local features, thereby forming a reliable feature pond from which to train a new class-specific dictionary. To take full advantage of this feature pond, we develop a group-collaboratively representation-based strategy (GCRS) for the final classification, solved by the well-known group lasso. We go beyond this and propose a locality-constrained method to speed up the computation, named local constraint-GCRS (LC-GCRS). Experimental results on three public texture datasets demonstrate that the proposed approach achieves competitive outcomes and even outperforms state-of-the-art methods. In particular, most methods cannot work well when only a few samples of each category are available for training, but our approach still achieves very high classification accuracy, e.g. an average accuracy of 92.1% for the Brodatz dataset when only one image is used for training, significantly higher than any other method.

  13. Association Technique based on Classification for Classifying Microcalcification and Mass in Mammogram

    Directory of Open Access Journals (Sweden)

    Herwanto

    2013-01-01

    Full Text Available Currently, mammography is recognized as the most effective imaging modality for breast cancer screening. The challenge in using mammography is how to locate the area that is indeed a solitary geographic abnormality. In mammography screening it is important to define the risk for women who have radiologically negative findings and for those who might develop malignancy later in life. Microcalcification and mass segmentation are used frequently as the first step in mammography screening. The main objective of this paper is to apply an association technique based on a classification algorithm to classify microcalcification and mass in mammograms. The proposed system consists of: (i) a preprocessing phase to enhance the quality of the image, followed by segmentation of the region of interest; (ii) a phase for mining a transactional table; and (iii) a phase for organizing the resulting association rules in a classification model. This paper also illustrates how important the data cleaning phase is in building the data mining process for image classification. The proposed method was evaluated using mammogram data from the Mammographic Image Analysis Society (MIAS). The MIAS data consist of 207 images of normal breasts, 64 benign, and 51 malignant; 85 of the mammograms contain a mass, and 25 contain microcalcification. The features of mean and Gray Level Co-occurrence Matrix homogeneity have proved potential for discriminating microcalcification from mass. The accuracy obtained by this method is 83%.
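    The GLCM homogeneity feature used to discriminate microcalcification from mass can be computed as below; this is a generic sketch for a single horizontal offset, not the paper's exact configuration:

```python
import numpy as np

def glcm_homogeneity(img, levels=4):
    # Gray-level co-occurrence matrix for horizontally adjacent pixels
    # (offset (0, 1)), then homogeneity = sum of p(i,j) / (1 + |i - j|).
    glcm = np.zeros((levels, levels))
    for row in img:
        for a, b in zip(row[:-1], row[1:]):
            glcm[a, b] += 1
    p = glcm / glcm.sum()
    i, j = np.indices(p.shape)
    return float((p / (1.0 + np.abs(i - j))).sum())

# A flat region: every co-occurring pair is identical, so homogeneity = 1.
h_uniform = glcm_homogeneity(np.zeros((4, 4), dtype=int))

# A checkerboard-like texture: every pair differs by one gray level.
h_textured = glcm_homogeneity(np.array([[0, 1, 0, 1],
                                        [1, 0, 1, 0]]))
```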

  14. Forward selection radial basis function networks applied to bacterial classification based on MALDI-TOF-MS.

    Science.gov (United States)

    Zhang, Zhuoyong; Wang, Dan; Harrington, Peter de B; Voorhees, Kent J; Rees, Jon

    2004-06-17

    A forward-selection-improved radial basis function (RBF) network was applied to bacterial classification based on data obtained by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS). The classification of each bacterium cultured at different times was discussed, and the effect of the parameters of the RBF network was investigated. The new method involves forward selection to prevent overfitting, with generalized cross-validation (GCV) used as the model selection criterion (MSC). The original data were compressed using a wavelet transformation to speed up network training and reduce the number of variables of the original MS data. The data were normalized prior to training and testing to constrain the region in which the network is trained, accelerate the training rate, and narrow the range from which the parameters are selected. The one-out-of-n method was used to split the data set of p samples into a training set of size p-1 and a test set of size 1. With the improved method, the classification correctness rates for the five bacteria discussed in the present paper are 87.5%, 69.2%, 80%, 92.3%, and 92.8%, respectively.
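    The leave-one-out ("one-out-of-n") protocol used here for evaluation can be sketched with a simple nearest-centroid stand-in classifier (the RBF network itself is not reproduced, and the toy data are hypothetical):

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, x):
    # Predict the class whose training centroid lies closest to x.
    classes = np.unique(y_train)
    centroids = [X_train[y_train == c].mean(axis=0) for c in classes]
    d = [np.linalg.norm(x - c) for c in centroids]
    return classes[int(np.argmin(d))]

def leave_one_out_accuracy(X, y):
    # One-out-of-n: each sample is held out once; train on the other p-1.
    hits = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        if nearest_centroid_predict(X[mask], y[mask], X[i]) == y[i]:
            hits += 1
    return hits / len(X)

# Two well-separated toy classes; LOO accuracy should be perfect.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y = np.array([0, 0, 0, 1, 1, 1])
acc = leave_one_out_accuracy(X, y)
```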

  15. Efficient multilevel brain tumor segmentation with integrated bayesian model classification.

    Science.gov (United States)

    Corso, J J; Sharon, E; Dube, S; El-Saden, S; Sinha, U; Yuille, A

    2008-05-01

    We present a new method for automatic segmentation of heterogeneous image data that takes a step toward bridging the gap between bottom-up affinity-based segmentation methods and top-down generative model based approaches. The main contribution of the paper is a Bayesian formulation for incorporating soft model assignments into the calculation of affinities, which are conventionally model free. We integrate the resulting model-aware affinities into the multilevel segmentation by weighted aggregation algorithm, and apply the technique to the task of detecting and segmenting brain tumor and edema in multichannel magnetic resonance (MR) volumes. The computationally efficient method runs orders of magnitude faster than current state-of-the-art techniques giving comparable or improved results. Our quantitative results indicate the benefit of incorporating model-aware affinities into the segmentation process for the difficult case of glioblastoma multiforme brain tumor.

  16. Object Classification based Context Management for Identity Management in Internet of Things

    DEFF Research Database (Denmark)

    Mahalle, Parikshit N.; Prasad, Neeli R.; Prasad, Ramjee

    2013-01-01

    , and there is a need for a context-aware access control solution for IdM. Confronting the uncertainty of different types of objects in the IoT is not easy. This paper presents a logical framework for object classification in context-aware IoT, as richer contextual information creates an impact on access control. This paper...... proposes decision-theory-based object classification to provide contextual information and context management. Simulation results show that the proposed object classification is useful for improving network lifetime. Results also motivate object classification in terms of energy consumption...

  17. Speech/Music Classification Enhancement for 3GPP2 SMV Codec Based on Support Vector Machine

    Science.gov (United States)

    Kim, Sang-Kyun; Chang, Joon-Hyuk

    In this letter, we propose a novel approach to speech/music classification based on the support vector machine (SVM) to improve the performance of the 3GPP2 selectable mode vocoder (SMV) codec. We first analyze the features and the classification method used in the real-time speech/music classification algorithm of the SMV, and then apply the SVM for enhanced speech/music classification. For performance evaluation, we compare the proposed algorithm with the traditional algorithm of the SMV. The proposed system is evaluated under various environments and shows better performance than the original method in the SMV.

  18. Signal classification method based on data mining for multi-mode radar

    Institute of Scientific and Technical Information of China (English)

    Qiang Guo; Pulong Nan; Jian Wan

    2016-01-01

    For multi-mode radar working on the modern electronic battlefield, different working states of a single radar are prone to being classified as multiple emitters when traditional classification methods are adopted to process intercepted signals, which has a negative effect on signal classification. A classification method based on spatial data mining is presented to address this challenge. Inspired by the idea of spatial data mining, the classification method applies a nuclear field to depict the distribution information of pulse samples in feature space, and digs out the hidden cluster information by analyzing distribution characteristics. In addition, a membership-degree criterion quantifying the correlation among all classes is established, which ensures the classification accuracy of signal samples. Numerical experiments show that the presented method can effectively prevent different working states of a multi-mode emitter from being classified as several emitters, and achieves higher classification accuracy.

  19. Classification of Noisy Data: An Approach Based on Genetic Algorithms and Voronoi Tessellation

    DEFF Research Database (Denmark)

    Khan, Abdul Rauf; Schiøler, Henrik; Knudsen, Torben;

    2016-01-01

    Classification is one of the major constituents of the data-mining toolkit. The well-known methods for classification are built on either the principle of logic or statistical/mathematical reasoning. In this article we propose: (1) a different strategy, which is based on the partitioning of the information space; and (2) use of the genetic algorithm to solve combinatorial problems for classification. In particular, we implement our methodology to solve complex classification problems and compare the performance of our classifier with other well-known methods (SVM, KNN, and ANN). The results of this study suggest that the proposed methodology is specialized to deal with the classification of highly imbalanced classes with significant overlap.

  20. SAR images classification method based on Dempster-Shafer theory and kernel estimate

    Institute of Scientific and Technical Information of China (English)

    He Chu; Xia Guisong; Sun Hong

    2007-01-01

    To study scene classification in Synthetic Aperture Radar (SAR) images, a novel method based on kernel estimation, Markov context and Dempster-Shafer evidence theory is proposed. Initially, a nonparametric Probability Density Function (PDF) estimation method is introduced to describe the scene of SAR images. Then, under the Markov context, the determinate PDF and the kernel estimate method are adopted respectively to form a primary classification. Next, the primary classification results are fused using evidence theory in an unsupervised way to obtain the scene classification. Finally, a regularization step is applied, in which an iterated maximum-selection approach is introduced to control fragments and correct errors of the classification. Use of the kernel estimate and evidence theory can describe complicated scenes with little prior knowledge and eliminate the ambiguities of the primary classification results. Experimental results on real SAR images illustrate a rather impressive performance.
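    Dempster's rule of combination, used here to fuse the primary classification results, can be sketched as follows; the frame {urban, water} and the mass values are hypothetical:

```python
def dempster_combine(m1, m2):
    # Dempster's rule: multiply the masses of intersecting focal elements
    # and renormalize by 1 - K, where K is the total conflicting mass.
    combined, conflict = {}, 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb            # empty intersection
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Two primary classifiers expressing belief over scene labels
# {urban, water} (hypothetical mass functions for illustration).
m1 = {frozenset({"urban"}): 0.6, frozenset({"urban", "water"}): 0.4}
m2 = {frozenset({"water"}): 0.3, frozenset({"urban", "water"}): 0.7}
fused = dempster_combine(m1, m2)
```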

  1. Classification of PolSAR image based on quotient space theory

    Science.gov (United States)

    An, Zhihui; Yu, Jie; Liu, Xiaomeng; Liu, Limin; Jiao, Shuai; Zhu, Teng; Wang, Shaohua

    2015-12-01

    In order to improve classification accuracy, quotient space theory was applied to the classification of polarimetric SAR (PolSAR) images. Firstly, the Yamaguchi decomposition method is adopted to obtain the polarimetric characteristics of the image. At the same time, the Gray Level Co-occurrence Matrix (GLCM) and Gabor wavelets are used to obtain texture features. Secondly, combining texture features and polarimetric characteristics, a Support Vector Machine (SVM) classifier is used for initial classification to establish different granularity spaces. Finally, according to quotient space granularity synthesis theory, we merge and reason over the different quotient spaces to obtain the comprehensive classification result. The method proposed in this paper is tested with L-band AIRSAR data of San Francisco Bay. The result shows that the comprehensive classification result based on quotient space theory is superior to the classification result of any single granularity space.

  2. Deep-Learning-Based Classification for DTM Extraction from ALS Point Cloud

    Directory of Open Access Journals (Sweden)

    Xiangyun Hu

    2016-09-01

    Full Text Available Airborne laser scanning (ALS) point cloud data are suitable for digital terrain model (DTM) extraction given their high accuracy in elevation. Existing filtering algorithms that eliminate non-ground points mostly depend on terrain feature assumptions or representations; these assumptions result in errors when the scene is complex. This paper proposes a new method for ground point extraction based on deep learning using deep convolutional neural networks (CNN). For every point with spatial context, the neighboring points within a window are extracted and transformed into an image. Then, the classification of a point can be treated as the classification of an image; the point-to-image transformation is carefully crafted by considering the height information in the neighborhood area. After being trained on approximately 17 million labeled ALS points, the deep CNN model can learn how a human operator recognizes a point as a ground point or not. The model performs better than typical existing algorithms in terms of error rate, indicating the significant potential of deep-learning-based methods in feature extraction from a point cloud.

  3. [A cold/heat property classification strategy based on bio-effects of herbal medicines].

    Science.gov (United States)

    Jiang, Miao; Lv, Ai-Ping

    2014-06-01

    The property theory of Chinese herbal medicine (CHM) is regarded as the core and basis of Chinese medical theory; however, the underlying mechanism of the properties of CHMs remains unclear, which poses a barrier to the modernization of Chinese herbal medicine. The properties of CHM are often categorized into cold and heat according to the theory of Chinese medicine, and are essential for guiding the clinical application of CHMs. There is an urgent demand to build a cold/heat property classification model, both to advance the property theory of Chinese herbal medicine and to clarify the controversial properties of some herbs. Based on previous studies of the cold/heat properties of CHM, in this paper we describe a novel strategy for building a cold/heat property classification model based on herbal bio-effects. The interdisciplinary cooperation of systems biology, pharmacological networks and pattern recognition techniques might illuminate the study of cold/heat property theory, provide a scientific model for determining the cold/heat property of herbal medicines, and offer a new strategy for expanding Chinese herbal medicine resources.

  4. Review of Remotely Sensed Imagery Classification Patterns Based on Object-oriented Image Analysis

    Institute of Scientific and Technical Information of China (English)

    LIU Yongxue; LI Manchun; MAO Liang; XU Feifei; HUANG Shuo

    2006-01-01

    With the wide use of high-resolution remotely sensed imagery, the object-oriented remotely sensed information classification pattern has been intensively studied. Starting with the definition of the object-oriented remotely sensed information classification pattern and a literature review of related research progress, this paper sums up four development phases of the object-oriented classification pattern during the past 20 years. Then, we discuss three aspects of the methodology in detail, namely remotely sensed imagery segmentation, feature analysis and feature selection, and classification rule generation, comparing them with per-pixel remotely sensed information classification methods. Finally, this paper presents several points that deserve attention in future studies on the object-oriented RS information classification pattern: 1) developing robust and highly effective image segmentation algorithms for multi-spectral RS imagery; 2) improving the feature set to include edge, spatial-adjacency and temporal characteristics; 3) developing classification rule generation classifiers based on decision trees; 4) presenting evaluation methods for classification results obtained by the object-oriented classification pattern.

  5. Land cover classification using random forest with genetic algorithm-based parameter optimization

    Science.gov (United States)

    Ming, Dongping; Zhou, Tianning; Wang, Min; Tan, Tian

    2016-07-01

    Land cover classification based on remote sensing imagery is an important means to monitor, evaluate, and manage land resources. However, it requires robust classification methods that allow accurate mapping of complex land cover categories. Random forest (RF) is a powerful machine-learning classifier that can be used in land remote sensing. However, two important parameters of RF classification, namely, the number of trees and the number of variables tried at each split, affect classification accuracy. Thus, optimal parameter selection is an inevitable problem in RF-based image classification. This study uses the genetic algorithm (GA) to optimize these two parameters of RF to produce optimal land cover classification accuracy. HJ-1B CCD2 image data are used to classify six different land cover categories in Changping, Beijing, China. Experimental results show that GA-RF can avoid arbitrariness in the selection of parameters. The experiments also compare land cover classification results obtained using the GA-RF method, the traditional RF method (with default parameters), and the support vector machine method. With the GA-RF method, classification accuracies improved by 1.02% and 6.64% over the latter two methods, respectively. The comparison results show that GA-RF is a feasible solution for land cover classification without compromising accuracy or incurring excessive time.
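    A minimal sketch of GA-based parameter search over (number of trees, number of split variables): here the RF cross-validation accuracy is replaced by a smooth surrogate fitness (an assumption made for a self-contained example), but the select-and-mutate loop is the same idea:

```python
import random

random.seed(42)

# Surrogate fitness standing in for RF cross-validation accuracy;
# hypothetically peaks at n_trees = 200, n_vars = 6.
def fitness(n_trees, n_vars):
    return 1.0 - ((n_trees - 200) / 500) ** 2 - ((n_vars - 6) / 20) ** 2

def mutate(ind):
    # Perturb both parameters while keeping them in valid ranges.
    return (max(10, ind[0] + random.randint(-50, 50)),
            max(1, ind[1] + random.randint(-2, 2)))

# Tiny genetic loop: keep the best half, refill the pool with mutants.
pop = [(random.randint(10, 500), random.randint(1, 20)) for _ in range(20)]
for _ in range(30):
    pop.sort(key=lambda ind: fitness(*ind), reverse=True)
    survivors = pop[:10]
    pop = survivors + [mutate(random.choice(survivors)) for _ in range(10)]

best = max(pop, key=lambda ind: fitness(*ind))
```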

  6. Classification and Identification of Over-voltage Based on HHT and SVM

    Institute of Scientific and Technical Information of China (English)

    WANG Jing; YANG Qing; CHEN Lin; SIMA Wenxia

    2012-01-01

    This paper proposes an effective method for over-voltage classification based on the Hilbert-Huang transform (HHT). The Hilbert-Huang transform is composed of empirical mode decomposition (EMD) and the Hilbert transform. Nine kinds of common power system over-voltages are calculated and analyzed by HHT. Based on the instantaneous amplitude spectrum, the Hilbert marginal spectrum and the Hilbert time-frequency spectrum, three kinds of over-voltage characteristic quantities are obtained. A hierarchical classification system is built based on HHT and the support vector machine (SVM). This classification system was tested on 106 field over-voltage signals, and the average classification rate is 94.3%. This research shows that HHT is an effective time-frequency analysis algorithm for over-voltage classification and identification.
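    The Hilbert transform at the core of HHT yields the instantaneous amplitude via the analytic signal; a numpy-only sketch on a synthetic tone (EMD itself is omitted):

```python
import numpy as np

def analytic_signal(x):
    # Analytic signal via the frequency-domain construction behind the
    # Hilbert transform: zero negative frequencies, double positive ones.
    N = len(x)                 # N is assumed even here
    X = np.fft.fft(x)
    h = np.zeros(N)
    h[0] = 1.0
    h[1:N // 2] = 2.0
    h[N // 2] = 1.0
    return np.fft.ifft(X * h)

# A pure tone of amplitude 2 at an exact FFT bin: the instantaneous
# amplitude |analytic signal| should recover the constant envelope.
t = np.arange(1024) / 1024.0
x = 2.0 * np.sin(2 * np.pi * 50 * t)
envelope = np.abs(analytic_signal(x))
```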

  7. Automatic Classification of Epileptic EEG Signals Based on AR Model and Relevance Vector Machine

    Institute of Scientific and Technical Information of China (English)

    韩敏; 孙磊磊; 洪晓军; 韩杰

    2011-01-01

    The automatic classification of epileptic EEG signals is an important research issue. In this paper, a new epileptic EEG signal classification method is proposed based on an autoregressive (AR) model and the relevance vector machine. The AR model is used to extract EEG features, and principal component analysis and linear discriminant analysis are then introduced to reduce the dimensionality of the feature space. To obtain a sparser model with probabilistic outputs, the relevance vector machine is chosen as the classifier. A publicly available database from the epilepsy research centre of the University of Bonn was used to test the proposed method: the highest accuracy obtained is 99.875%, and even when the dimensionality of the feature space is reduced to 1/15 of the original, the classification accuracy still reaches 99.500%. Using the relevance vector machine greatly improves model sparsity; under the same conditions, the number of relevance vectors is only a small fraction (roughly one part in several tens) of the number of support vectors. The proposed method can thus be well applied to the automatic classification of epileptic EEG signals.
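    AR-model feature extraction can be sketched with a Yule-Walker estimate on a synthetic AR(2) process; the coefficients recovered this way form the feature vector fed to the classifier (the process parameters below are illustrative, not EEG-derived):

```python
import numpy as np

def ar_coefficients(x, order):
    # Yule-Walker estimate: solve R a = r for the AR coefficients, where
    # R is the Toeplitz autocorrelation matrix and r holds the first lags.
    x = x - x.mean()
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)]
                  for i in range(order)])
    return np.linalg.solve(R, r[1:])

rng = np.random.default_rng(1)
# Synthetic AR(2) process x[n] = 0.75 x[n-1] - 0.5 x[n-2] + e[n].
e = rng.standard_normal(20000)
x = np.zeros(20000)
for n in range(2, 20000):
    x[n] = 0.75 * x[n - 1] - 0.5 * x[n - 2] + e[n]

a = ar_coefficients(x, order=2)   # recovered [a1, a2] ~ [0.75, -0.5]
```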

  8. A Bayesian Based Search and Classification System for Product Information of Agricultural Logistics Information Technology

    OpenAIRE

    2011-01-01

    Part 1: Decision Support Systems, Intelligent Systems and Artificial Intelligence Applications; International audience; In order to meet the needs of users who search for agricultural products logistics information technology, this paper introduces a search and classification system for agricultural products logistics information technology. Firstly, a dictionary of field concept words was built based on analyzing the characteristics of agricultural products logistics in...

  9. A Kernel-Based Nonlinear Representor with Application to Eigenface Classification

    Institute of Scientific and Technical Information of China (English)

    ZHANG Jing; LIU Ben-yong; TAN Hao

    2004-01-01

    This paper presents a classifier named the kernel-based nonlinear representor (KNR) for optimal representation of pattern features. Adopting a Gaussian kernel, with the kernel width adaptively estimated by a simple technique, it is applied to eigenface classification. Experimental results on the ORL face database show that it improves the classification rate by around 6 percentage points over the Euclidean distance classifier.

  10. Initial steps towards an evidence-based classification system for golfers with a physical impairment

    NARCIS (Netherlands)

    Stoter, Inge K; Hettinga, Florentina J; Altmann, Viola; Eisma, Wim; Arendzen, Hans; Bennett, Tony; van der Woude, Lucas H; Dekker, Rienk

    2015-01-01

    PURPOSE: The present narrative review aims to make a first step towards an evidence-based classification system in handigolf following the International Paralympic Committee (IPC). It intends to create a conceptual framework of classification for handigolf and an agenda for future research. METHOD:

  11. Initial steps towards an evidence-based classification system for golfers with a physical impairment

    NARCIS (Netherlands)

    Stoter, Inge K.; Hettinga, Florentina J.; Altmann, Viola; Eisma, Wim; Arendzen, Hans; Bennett, Tony; van der Woude, Lucas H.; Dekker, Rienk

    2017-01-01

    Purpose: The present narrative review aims to make a first step towards an evidence-based classification system in handigolf following the International Paralympic Committee (IPC). It intends to create a conceptual framework of classification for handigolf and an agenda for future research. Method:

  12. Deep Neural Networks Based Recognition of Plant Diseases by Leaf Image Classification

    Science.gov (United States)

    Sladojevic, Srdjan; Arsenovic, Marko; Culibrk, Dubravko; Stefanovic, Darko

    2016-01-01

    The latest generation of convolutional neural networks (CNNs) has achieved impressive results in the field of image classification. This paper is concerned with a new approach to the development of a plant disease recognition model, based on leaf image classification, using deep convolutional networks. The novel way of training and the methodology used facilitate a quick and easy system implementation in practice. The developed model is able to recognize 13 different types of plant diseases and to distinguish diseased from healthy leaves, with the ability to distinguish plant leaves from their surroundings. To our knowledge, this method for plant disease recognition has been proposed for the first time. All essential steps required for implementing this disease recognition model are fully described throughout the paper, starting from gathering images in order to create a database assessed by agricultural experts. Caffe, a deep learning framework developed by the Berkeley Vision and Learning Center, was used to perform the deep CNN training. The experimental results on the developed model achieved precision between 91% and 98% for separate class tests, with an average of 96.3%. PMID:27418923

  13. Deep Neural Networks Based Recognition of Plant Diseases by Leaf Image Classification

    Directory of Open Access Journals (Sweden)

    Srdjan Sladojevic

    2016-01-01

    Full Text Available The latest generation of convolutional neural networks (CNNs) has achieved impressive results in the field of image classification. This paper is concerned with a new approach to the development of a plant disease recognition model, based on leaf image classification, using deep convolutional networks. The novel way of training and the methodology used facilitate a quick and easy system implementation in practice. The developed model is able to recognize 13 different types of plant diseases and to distinguish diseased from healthy leaves, with the ability to distinguish plant leaves from their surroundings. To our knowledge, this method for plant disease recognition has been proposed for the first time. All essential steps required for implementing this disease recognition model are fully described throughout the paper, starting from gathering images in order to create a database assessed by agricultural experts. Caffe, a deep learning framework developed by the Berkeley Vision and Learning Center, was used to perform the deep CNN training. The experimental results on the developed model achieved precision between 91% and 98% for separate class tests, with an average of 96.3%.

  14. Cross-platform analysis of cancer microarray data improves gene expression based classification of phenotypes

    Directory of Open Access Journals (Sweden)

    Eils Roland

    2005-11-01

    Full Text Available Abstract Background The extensive use of DNA microarray technology in the characterization of the cell transcriptome is leading to an ever-increasing amount of microarray data from cancer studies. Although similar questions for the same type of cancer are addressed in these different studies, a comparative analysis of their results is hampered by the use of heterogeneous microarray platforms and analysis methods. Results In contrast to a meta-analysis approach where results of different studies are combined on an interpretative level, we investigate here how to directly integrate raw microarray data from different studies for the purpose of supervised classification analysis. We use median rank scores and quantile discretization to derive numerically comparable measures of gene expression from different platforms. These transformed data are then used for training of classifiers based on support vector machines. We apply this approach to six publicly available cancer microarray gene expression data sets, which consist of three pairs of studies, each examining the same type of cancer, i.e. breast cancer, prostate cancer or acute myeloid leukemia. For each pair, one study was performed by means of cDNA microarrays and the other by means of oligonucleotide microarrays. In each pair, high classification accuracies (> 85%) were achieved with training and testing on data instances randomly chosen from both data sets in a cross-validation analysis. To exemplify the potential of this cross-platform classification analysis, we use two leukemia microarray data sets to show that important genes with regard to the biology of leukemia are selected in an integrated analysis, which are missed in either single-set analysis. Conclusion Cross-platform classification of multiple cancer microarray data sets yields discriminative gene expression signatures that are found and validated on a large number of microarray samples, generated by different laboratories and
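    The quantile discretization used to make measurements from different platforms comparable can be sketched as follows; the expression values are synthetic, with the second "platform" an affine distortion of the first:

```python
import numpy as np

def quantile_discretize(values, n_bins=4):
    # Replace each value by its quantile bin (0 .. n_bins-1), so only
    # rank information survives and platform-specific scales drop out.
    cuts = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, cuts)

# The same expression profile "measured" on two platforms: the second
# is an affine distortion of the first (synthetic illustration).
cdna = np.array([0.1, 0.4, 0.45, 0.7, 1.2, 2.0, 3.5, 9.0])
oligo = cdna * 137.0 + 5.0

b1 = quantile_discretize(cdna)
b2 = quantile_discretize(oligo)   # identical bins despite different scale
```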

  15. Extreme learning machine-based classification of ADHD using brain structural MRI data.

    Directory of Open Access Journals (Sweden)

    Xiaolong Peng

    Full Text Available BACKGROUND: Effective and accurate diagnosis of attention-deficit/hyperactivity disorder (ADHD) is currently of significant interest. ADHD has been associated with multiple cortical features from structural MRI data. However, most existing learning algorithms for ADHD identification contain obvious defects, such as time-consuming training, parameters selection, etc. The aims of this study were as follows: (1) Propose an ADHD classification model using the extreme learning machine (ELM) algorithm for automatic, efficient and objective clinical ADHD diagnosis. (2) Assess the computational efficiency and the effect of sample size on both ELM and support vector machine (SVM) methods and analyze which brain segments are involved in ADHD. METHODS: High-resolution three-dimensional MR images were acquired from 55 ADHD subjects and 55 healthy controls. Multiple brain measures (cortical thickness, etc.) were calculated using a fully automated procedure in the FreeSurfer software package. In total, 340 cortical features were automatically extracted from 68 brain segments with 5 basic cortical features. F-score and SFS methods were adopted to select the optimal features for ADHD classification. Both ELM and SVM were evaluated for classification accuracy using leave-one-out cross-validation. RESULTS: We achieved ADHD prediction accuracies of 90.18% for ELM using eleven combined features, 84.73% for SVM-Linear and 86.55% for SVM-RBF. Our results show that ELM has better computational efficiency and is more robust as sample size changes than is SVM for ADHD classification. The most pronounced differences between ADHD and healthy subjects were observed in the frontal lobe, temporal lobe, occipital lobe and insula. CONCLUSION: Our ELM-based algorithm for ADHD diagnosis performs considerably better than the traditional SVM algorithm. This result suggests that ELM may be used for the clinical diagnosis of ADHD and the investigation of different brain diseases.
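An extreme learning machine of the kind used here is a single-hidden-layer network whose input weights are random and never trained; only the output weights are solved analytically by least squares, which is what makes training so fast. The sketch below illustrates that idea in NumPy; the hidden-layer size, toy features, and labels are invented for illustration and are not the study's 340 cortical features.

```python
import numpy as np

def elm_train(X, y, n_hidden=32, seed=0):
    """Train a single-hidden-layer ELM: random input weights, analytic output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)        # random feature map (never trained)
    beta = np.linalg.pinv(H) @ y  # least-squares output weights in one shot
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# toy two-class problem (labels +1 / -1), standing in for ADHD vs. control features
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 5))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])

W, b, beta = elm_train(X, y)
acc = np.mean(np.sign(elm_predict(X, W, b, beta)) == y)
```

Because training reduces to one pseudoinverse, there is no iterative optimization and no kernel or regularization parameter to tune, which matches the computational-efficiency argument made in the abstract.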

  16. Material classification and automatic content enrichment of images using supervised learning and knowledge bases

    Science.gov (United States)

    Mallepudi, Sri Abhishikth; Calix, Ricardo A.; Knapp, Gerald M.

    2011-02-01

    In recent years there has been a rapid increase in the size of video and image databases. Effective searching and retrieving of images from these databases is a significant current research area. In particular, there is a growing interest in query capabilities based on semantic image features such as objects, locations, and materials, known as content-based image retrieval. This study investigated mechanisms for identifying materials present in an image. These capabilities provide additional information impacting conditional probabilities about images (e.g. objects made of steel are more likely to be buildings). These capabilities are useful in Building Information Modeling (BIM) and in automatic enrichment of images. I2T methodologies are a way to enrich an image by generating text descriptions based on image analysis. In this work, a learning model is trained to detect certain materials in images. To train the model, an image dataset was constructed containing single material images of bricks, cloth, grass, sand, stones, and wood. For generalization purposes, an additional set of 50 images containing multiple materials (some not used in training) was constructed. Two different supervised learning classification models were investigated: a single multi-class SVM classifier, and multiple binary SVM classifiers (one per material). Image features included Gabor filter parameters for texture, and color histogram data for RGB components. All classification accuracy scores using the SVM-based method were above 85%. The second model helped in gathering more information from the images since it assigned multiple classes to the images. A framework for the I2T methodology is presented.
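The per-material binary-classifier arrangement and the color-histogram features can be illustrated with a toy sketch. A nearest-centroid rule stands in for each binary SVM here, and the images and materials are invented; the paper's Gabor texture features are omitted for brevity.

```python
import numpy as np

def color_histogram(img, bins=8):
    """Concatenated per-channel RGB histograms, each normalized to sum to 1.

    img: (H, W, 3) uint8 array. Returns a (3 * bins,) feature vector.
    """
    feats = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0] for c in range(3)]
    return np.concatenate(feats).astype(float) / (img.shape[0] * img.shape[1])

class BinaryCentroidClassifier:
    """Stand-in for one per-material binary SVM: 'is this material present?'"""
    def fit(self, X, y):
        self.pos = X[y == 1].mean(axis=0)
        self.neg = X[y == 0].mean(axis=0)
        return self
    def predict(self, X):
        d_pos = np.linalg.norm(X - self.pos, axis=1)
        d_neg = np.linalg.norm(X - self.neg, axis=1)
        return (d_pos < d_neg).astype(int)

# synthetic single-material images: "grass" (green) vs. "brick" (red)
green = np.zeros((4, 8, 8, 3), np.uint8); green[..., 1] = 200
red   = np.zeros((4, 8, 8, 3), np.uint8); red[..., 0] = 200
X = np.array([color_histogram(im) for im in np.concatenate([green, red])])
y = np.array([1] * 4 + [0] * 4)

clf = BinaryCentroidClassifier().fit(X, y)
print(clf.predict(X))  # [1 1 1 1 0 0 0 0]
```

Training one such binary classifier per material, as the second model in the study does, lets a multi-material image receive several labels at once, whereas a single multi-class classifier must pick exactly one.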

  17. Simulation of an Energy-Saving Single-Chip Microcomputer Scheduling Model Based on Partition-Thinking Classification

    Institute of Scientific and Technical Information of China (English)

    马宏骞

    2015-01-01

    To lower the total energy consumption of single-chip microcomputer (SCM) process scheduling, this paper proposes an energy-saving scheduling method that fuses a travelling-salesman algorithm with a genetic algorithm under a partition-thinking classification. Using the partition-thinking classification, the total energy consumption of SCM process scheduling is decomposed into three key parts: process-switching energy, transition-adjustment energy, and steady-state operating energy. Taking steady process modes as nodes and transition modes as branches, a directed-graph model of the total scheduling energy consumption is constructed, and the energy optimization of a process sequence is cast as a classical travelling salesman problem. Through layer-by-layer improvement of a multi-objective genetic algorithm combined with the path-optimization principle of the travelling-salesman algorithm, the method finds the best processing parameters under different SCM process schedules and the best production order for multiple processes, thereby lowering the total energy consumption of SCM process scheduling. Experimental results indicate that the proposed model improves the efficiency of SCM process scheduling and reduces scheduling energy consumption.
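The ordering step in this approach reduces to a travelling-salesman search over process sequences, with transition energies as directed edge weights. The toy genetic search below illustrates that reduction; it uses swap mutation with elitist selection as a simplified stand-in for the paper's hybrid multi-objective GA, and all cost numbers are hypothetical.

```python
import random

def tour_cost(order, cost):
    """Total transition energy along a processing order (directed-graph weights)."""
    return sum(cost[a][b] for a, b in zip(order, order[1:]))

def evolve_order(cost, generations=500, pop_size=20, seed=0):
    """Tiny mutation-only genetic search over process orderings.

    Candidates are permutations of process indices, fitness is total
    transition energy, and variation is pairwise-swap mutation with
    elitist survivor selection.
    """
    rng = random.Random(seed)
    n = len(cost)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda o: tour_cost(o, cost))
        survivors = pop[: pop_size // 2]          # keep the cheapest half
        children = []
        for parent in survivors:
            child = parent[:]
            i, j = rng.sample(range(n), 2)
            child[i], child[j] = child[j], child[i]   # swap mutation
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda o: tour_cost(o, cost))

# hypothetical energy to switch the MCU from process a to process b
cost = [[0, 2, 9, 10],
        [1, 0, 6, 4],
        [15, 7, 0, 8],
        [6, 3, 12, 0]]
best = evolve_order(cost)
```

On this four-process example the search returns an ordering at least as cheap as the naive sequence 0→1→2→3 (cost 16), which is the sense in which the directed-graph formulation lowers total scheduling energy.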

  18. Quantitative measurement of retinal ganglion cell populations via histology-based random forest classification.

    Science.gov (United States)

    Hedberg-Buenz, Adam; Christopher, Mark A; Lewis, Carly J; Fernandes, Kimberly A; Dutca, Laura M; Wang, Kai; Scheetz, Todd E; Abràmoff, Michael D; Libby, Richard T; Garvin, Mona K; Anderson, Michael G

    2016-05-01

    The inner surface of the retina contains a complex mixture of neurons, glia, and vasculature, including retinal ganglion cells (RGCs), the final output neurons of the retina and primary neurons that are damaged in several blinding diseases. The goal of the current work was two-fold: to assess the feasibility of using computer-assisted detection of nuclei and random forest classification to automate the quantification of RGCs in hematoxylin/eosin (H&E)-stained retinal whole-mounts; and if possible, to use the approach to examine how nuclear size influences disease susceptibility among RGC populations. To achieve this, data from RetFM-J, a semi-automated ImageJ-based module that detects, counts, and collects quantitative data on nuclei of H&E-stained whole-mounted retinas, were used in conjunction with a manually curated set of images to train a random forest classifier. To test performance, computer-derived outputs were compared to previously published features of several well-characterized mouse models of ophthalmic disease and their controls: normal C57BL/6J mice; Jun-sufficient and Jun-deficient mice subjected to controlled optic nerve crush (CONC); and DBA/2J mice with naturally occurring glaucoma. The result of these efforts was development of RetFM-Class, a command-line-based tool that uses data output from RetFM-J to perform random forest classification of cell type. Comparative testing revealed that manual and automated classifications by RetFM-Class correlated well, with 83.2% classification accuracy for RGCs. Automated characterization of C57BL/6J retinas predicted 54,642 RGCs per normal retina, and identified a 48.3% Jun-dependent loss of cells at 35 days post CONC and a 71.2% loss of RGCs among 16-month-old DBA/2J mice with glaucoma. Output from automated analyses was used to compare nuclear area among large numbers of RGCs from DBA/2J mice (n = 127,361). In aged DBA/2J mice with glaucoma, RetFM-Class detected a decrease in median and mean nucleus size
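The random-forest step here, classifying detected nuclei by type from per-nucleus measurements, can be illustrated with a miniature forest of bootstrap-trained decision stumps with majority voting. This is a sketch of the general technique, not RetFM-Class itself; the nucleus features and labels are invented.

```python
import numpy as np

def fit_stump(X, y):
    """Best single-feature threshold split by training accuracy."""
    best = (0, 0.0, 1, -1.0)  # (feature, threshold, polarity, accuracy)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = (pol * (X[:, f] - t) > 0).astype(int)
                acc = np.mean(pred == y)
                if acc > best[3]:
                    best = (f, t, pol, acc)
    return best[:3]

def stump_predict(stump, X):
    f, t, pol = stump
    return (pol * (X[:, f] - t) > 0).astype(int)

def fit_forest(X, y, n_trees=15, seed=0):
    """Bootstrap-sampled stumps with majority vote: a miniature random forest."""
    rng = np.random.default_rng(seed)
    stumps = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), len(X))   # bootstrap sample of the nuclei
        stumps.append(fit_stump(X[idx], y[idx]))
    return stumps

def forest_predict(stumps, X):
    votes = np.array([stump_predict(s, X) for s in stumps])
    return (votes.mean(axis=0) > 0.5).astype(int)

# hypothetical per-nucleus features: [area, circularity]; label 1 = RGC, 0 = other
X = np.array([[60., .9], [75., .85], [82., .8], [20., .4], [25., .5], [15., .3]])
y = np.array([1, 1, 1, 0, 0, 0])

stumps = fit_forest(X, y)
print(forest_predict(stumps, X))
```

Real random forests use full decision trees and per-split feature subsampling, but the bootstrap-plus-vote structure above is the core of why the classifier is robust to individual noisy nuclei.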

  19. Multi-label literature classification based on the Gene Ontology graph

    Directory of Open Access Journals (Sweden)

    Lu Xinghua

    2008-12-01

    Full Text Available Abstract Background The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. Results In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification meth