WorldWideScience

Sample records for feature selection algorithm

  2. THE FEATURE SUBSET SELECTION ALGORITHM

    Institute of Scientific and Technical Information of China (English)

    Liu Yongguo; Li Xueming; Wu Zhongfu

    2003-01-01

The motivation of data mining is to extract effective information from the huge volumes of data in very large databases. However, such databases generally include redundant and irrelevant attributes, which result in low performance and high computational complexity. Feature Subset Selection (FSS) has therefore become an important issue in the field of data mining. In this letter, an FSS model based on the filter approach is built, using a simulated annealing genetic algorithm. Experimental results show that the convergence and stability of this algorithm are adequately achieved.
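The letter does not give its filter criterion in detail, but the search component it describes can be sketched as simulated annealing over feature bitmasks. The `merit` function below is an illustrative stand-in for a real filter score (the relevant-feature set and weights are invented for the demo):

```python
import math
import random

def simulated_annealing_fss(score, n_features, *, iters=2000, t0=1.0, cooling=0.99, seed=0):
    """Search over feature bitmasks by simulated annealing; `score` is the
    filter criterion to maximise (higher is better)."""
    rng = random.Random(seed)
    mask = [rng.random() < 0.5 for _ in range(n_features)]
    cur = score(mask)
    best, best_mask = cur, mask[:]
    t = t0
    for _ in range(iters):
        j = rng.randrange(n_features)
        mask[j] = not mask[j]                    # flip one feature in or out
        new = score(mask)
        # always accept improvements; accept worse moves with Boltzmann probability
        if new >= cur or rng.random() < math.exp((new - cur) / max(t, 1e-12)):
            cur = new
            if cur > best:
                best, best_mask = cur, mask[:]
        else:
            mask[j] = not mask[j]                # undo the rejected flip
        t *= cooling
    return best_mask, best

# Illustrative filter criterion: reward the two "relevant" features, penalise the rest.
RELEVANT = {0, 2}
def merit(mask):
    chosen = {i for i, on in enumerate(mask) if on}
    return len(chosen & RELEVANT) - 0.5 * len(chosen - RELEVANT)

subset, value = simulated_annealing_fss(merit, 6)
```

On this separable toy landscape the annealer settles on exactly the relevant features; a genuine FSS run would plug in a data-driven merit function instead.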

  3. Feature Selection: Algorithms and Challenges

    Institute of Scientific and Technical Information of China (English)

    Xindong Wu; Yanglan Gan; Hao Wang; Xuegang Hu

    2006-01-01

    Feature selection is an active area in data mining research and development. It consists of efforts and contributions from a wide variety of communities, including statistics, machine learning, and pattern recognition. The diversity, on one hand, equips us with many methods and tools. On the other hand, the profusion of options causes confusion. This paper reviews various feature selection methods and identifies research challenges that are at the forefront of this exciting area.

  4. CBFS: high performance feature selection algorithm based on feature clearness.

    Directory of Open Access Journals (Sweden)

    Minseok Seo

Full Text Available BACKGROUND: The goal of feature selection is to select useful features and simultaneously exclude garbage features from a given dataset for classification purposes, which is expected to reduce processing time and improve classification accuracy. METHODOLOGY: In this study, we devised a new feature selection algorithm (CBFS) based on the clearness of features. Feature clearness expresses the separability among classes in a feature; highly clear features contribute to high classification accuracy. CScore is a measure that scores the clearness of each feature, based on how tightly samples cluster around their class centroids in that feature. We also suggest combining CBFS with other algorithms to improve classification accuracy. CONCLUSIONS/SIGNIFICANCE: The experiments confirm that CBFS outperforms state-of-the-art feature selection algorithms, including FeaLect. CBFS can be applied to microarray gene selection, text categorization, and image classification.
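The abstract does not define CScore exactly; a Fisher-ratio-style stand-in captures the idea of "clearness" (between-class centroid distance relative to within-class spread, per feature). The function name and the toy data are illustrative:

```python
from statistics import mean

def clearness_score(values, labels):
    """Score one feature: distance between class centroids relative to the
    average spread of samples around their own class centroid."""
    classes = sorted(set(labels))
    cents = {c: mean(v for v, l in zip(values, labels) if l == c) for c in classes}
    within = mean(abs(v - cents[l]) for v, l in zip(values, labels))
    between = mean(abs(cents[a] - cents[b])
                   for i, a in enumerate(classes) for b in classes[i + 1:])
    return between / (within + 1e-9)

# Feature A separates the two classes cleanly; feature B mixes them.
labels = ["x", "x", "x", "y", "y", "y"]
feat_a = [0.1, 0.2, 0.1, 0.9, 1.0, 0.8]
feat_b = [0.5, 0.1, 0.9, 0.4, 0.6, 0.2]
score_a = clearness_score(feat_a, labels)
score_b = clearness_score(feat_b, labels)
```

Ranking features by such a score and keeping the top-scoring ones is the filter pattern CBFS-style methods follow.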

  5. A Genetic Algorithm-Based Feature Selection

    Directory of Open Access Journals (Sweden)

    Babatunde Oluleye

    2014-07-01

Full Text Available This article details the exploration and application of a Genetic Algorithm (GA) for feature selection. In particular, a binary GA was used for dimensionality reduction to enhance the performance of the classifiers concerned. In this work, one hundred (100) features were extracted from the set of images in the Flavia dataset (a publicly available dataset). The extracted features are Zernike Moments (ZM), Fourier Descriptors (FD), Legendre Moments (LM), Hu 7 Moments (Hu7M), Texture Properties (TP) and Geometrical Properties (GP). The main contributions of this article are (1) detailed documentation of the GA Toolbox in MATLAB and (2) the development of a GA-based feature selector using a novel fitness function (kNN-based classification error), which enabled the GA to obtain a combinatorial set of features giving rise to optimal accuracy. The results obtained were compared with various feature selectors from the WEKA software and were better in many respects, particularly in terms of classification accuracy.
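The core wrapper idea (a binary GA whose fitness is kNN classification error on the selected features) can be sketched in a few lines. Everything here is a minimal assumption-laden toy: truncation selection, one-point crossover, single bit-flip mutation, 1-NN fitness, and synthetic data where only feature 0 is informative:

```python
import random

def knn_accuracy(train, test, mask):
    """1-NN accuracy on `test`, measuring distance only over the masked features."""
    if not any(mask):
        return 0.0
    def dist(a, b):
        return sum((a[i] - b[i]) ** 2 for i, on in enumerate(mask) if on)
    hits = sum(min(train, key=lambda p: dist(p[0], x))[1] == y for x, y in test)
    return hits / len(test)

def ga_select(train, test, n_feat, *, pop=20, gens=30, seed=1):
    """Binary GA over feature masks; maximising accuracy is equivalent to
    minimising the kNN classification error used as fitness in the article."""
    rng = random.Random(seed)
    popn = [[rng.random() < 0.5 for _ in range(n_feat)] for _ in range(pop)]
    fit = lambda m: knn_accuracy(train, test, m)
    for _ in range(gens):
        popn.sort(key=fit, reverse=True)
        popn = popn[: pop // 2]                  # truncation selection
        elite = list(popn)
        while len(popn) < pop:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_feat)       # one-point crossover
            child = a[:cut] + b[cut:]
            j = rng.randrange(n_feat)            # bit-flip mutation
            child[j] = not child[j]
            popn.append(child)
    return max(popn, key=fit)

# Synthetic data: feature 0 carries the class signal, features 1-2 are noise.
rnd = random.Random(0)
def draw(label):
    return ([label + rnd.gauss(0, 0.1), rnd.random(), rnd.random()], label)

train = [draw(i % 2) for i in range(40)]
test = [draw(i % 2) for i in range(20)]
best = ga_select(train, test, 3)
```

The GA reliably keeps the informative feature switched on; the article's MATLAB GA Toolbox version works on the same principle at a much larger scale (100 features).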

  6. Novel Feature Selection by Differential Evolution Algorithm

    Directory of Open Access Journals (Sweden)

    Ali Ghareaghaji

    2013-11-01

Full Text Available Iris scan biometrics employs the unique characteristics and features of the human iris to verify the identity of an individual. In today's world, where terrorist attacks are on the rise, the deployment of infallible security systems is a must, which makes iris recognition systems indispensable in emerging security authentication. The objective function is minimized using the Differential Evolution (DE) algorithm, where the population vector is encoded using binary-coded decimal to avoid the floating-point optimization problem. An automatic clustering of the possible values of the Lagrangian multiplier provides detailed insight into the selected features during the proposed DE-based optimization process. The classification accuracy of a Support Vector Machine (SVM) is used to measure the performance of the selected features. The proposed algorithm outperforms existing DE-based approaches when tested on the IRIS, Wine, Wisconsin Breast Cancer, Sonar and Ionosphere datasets. The same algorithm, when applied to gait-based people identification using skeleton data points obtained from a Microsoft Kinect sensor, exceeds previously reported accuracies.

  7. Naive Bayes-Guided Bat Algorithm for Feature Selection

    Directory of Open Access Journals (Sweden)

    Ahmed Majid Taha

    2013-01-01

Full Text Available When the amount of data and information is said to double every 20 months or so, feature selection becomes highly important and beneficial. Further improvements in feature selection will positively affect a wide array of applications in fields such as pattern recognition, machine learning, and signal processing. In this work, a bio-inspired method, the Bat Algorithm hybridized with a Naive Bayes classifier (BANB), is presented. The performance of the proposed feature selection algorithm was investigated using twelve benchmark datasets from different domains and was compared with three other well-known feature selection algorithms. The discussion focuses on four perspectives: number of features, classification accuracy, stability, and feature generalization. The results show that BANB significantly outperforms the other algorithms in selecting a smaller number of features, removing irrelevant, redundant, or noisy features while maintaining classification accuracy. BANB also proves more stable than the other methods and is capable of producing more general feature subsets.

  8. FEATURE SELECTION USING GENETIC ALGORITHMS FOR HANDWRITTEN CHARACTER RECOGNITION

    NARCIS (Netherlands)

    Kim, G.; Kim, S.

    2004-01-01

A feature selection method using genetic algorithms, which are a suitable means of selecting an appropriate set of features from feature sets of huge dimension, is proposed. SGA (Simple Genetic Algorithm) and its modified versions are applied to improve both the recognition speed and the recognition accuracy.

  9. Lazy learner text categorization algorithm based on embedded feature selection

    Institute of Scientific and Technical Information of China (English)

    Yan Peng; Zheng Xuefeng; Zhu Jianyong; Xiao Yunhong

    2009-01-01

To avoid the curse of dimensionality, text categorization (TC) algorithms based on machine learning (ML) have to use a feature selection (FS) method to reduce the dimensionality of the feature space. Although widely used, the FS process generally causes information loss and thus has considerable side effects on the overall performance of TC algorithms. On the basis of the sparsity of text vectors, a new TC algorithm based on lazy feature selection (LFS) is presented. As a new type of embedded feature selection approach, the LFS method can greatly reduce the dimension of features without any information loss, improving both the efficiency and the performance of the algorithm. The experiments show that the new algorithm simultaneously achieves much higher performance and efficiency than several classical TC algorithms.
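The key observation, that sparse text vectors let each comparison touch only the terms actually present, with no global pruning and hence no information loss, can be illustrated with a tiny sparse 1-NN classifier. The data and function names are illustrative, not the paper's:

```python
from math import sqrt

def lazy_similarity(doc, other):
    """Cosine similarity over sparse term-weight dicts; only the terms that
    actually occur in a document are ever touched, so no global feature
    pruning (and hence no information loss) is needed."""
    shared = set(doc) & set(other)
    dot = sum(doc[t] * other[t] for t in shared)
    na = sqrt(sum(v * v for v in doc.values()))
    nb = sqrt(sum(v * v for v in other.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(doc, labelled):
    """1-NN: return the label of the most similar training document."""
    return max(labelled, key=lambda pair: lazy_similarity(doc, pair[0]))[1]

train_docs = [
    ({"ball": 2, "goal": 1}, "sport"),
    ({"match": 1, "goal": 2}, "sport"),
    ({"stock": 2, "market": 1}, "finance"),
]
label = classify({"goal": 1, "ball": 1}, train_docs)
```

However large the global vocabulary grows, each similarity computation here costs only as much as the documents' nonzero terms, which is the efficiency argument the abstract makes.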

  10. Feature Selection Criteria for Real Time EKF-SLAM Algorithm

    Directory of Open Access Journals (Sweden)

    Fernando Auat Cheein

    2010-02-01

Full Text Available This paper presents a selection procedure for environment features for the correction stage of a SLAM (Simultaneous Localization and Mapping) algorithm based on an Extended Kalman Filter (EKF). This approach decreases the computational time of the correction stage, which allows for real-time and constant-time implementations of the SLAM. The selection procedure consists in choosing the features to which the SLAM system state covariance is most sensitive. The entire system is implemented on a mobile robot equipped with a laser range sensor. The features extracted from the environment correspond to lines and corners. Experimental results of the real-time SLAM algorithm and an analysis of the processing time consumed by the SLAM with the proposed feature selection procedure are shown. A comparison between the proposed feature selection approach and the classical sequential EKF-SLAM, along with an entropy-based feature selection approach, is also performed.
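The paper's exact sensitivity criterion is not given in the abstract, but one standard way to rank features for an EKF correction follows from the update itself: for a scalar measurement with Jacobian row h and noise variance r, the update P' = P - K S Kᵀ (with K = P hᵀ / S and S = h P hᵀ + r) reduces trace(P) by |P hᵀ|² / S. A sketch under that assumption:

```python
def covariance_reduction(P, h, r):
    """Drop in trace(P) if the EKF were corrected with a feature whose
    measurement Jacobian row is `h` and noise variance is `r` (scalar case):
    trace(P) - trace(P') = |P h^T|^2 / S,  with  S = h P h^T + r."""
    v = [sum(P[i][j] * h[j] for j in range(len(h))) for i in range(len(h))]
    S = sum(h[i] * v[i] for i in range(len(h))) + r
    return sum(x * x for x in v) / S

def select_features(P, candidates, budget):
    """Keep the `budget` features the state covariance is most sensitive to."""
    ranked = sorted(candidates, key=lambda c: covariance_reduction(P, *c), reverse=True)
    return ranked[:budget]

P = [[4.0, 0.0], [0.0, 0.1]]           # large uncertainty in the first state
corner = ([1.0, 0.0], 0.5)             # observes the uncertain state
line   = ([0.0, 1.0], 0.5)             # observes the well-known state
best = select_features(P, [corner, line], budget=1)
```

Correcting with only the top-ranked features is what keeps the correction stage's cost bounded, as the abstract describes.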

  11. Protein fold classification with genetic algorithms and feature selection.

    Science.gov (United States)

    Chen, Peng; Liu, Chunmei; Burge, Legand; Mahmood, Mohammad; Southerland, William; Gloster, Clay

    2009-10-01

    Protein fold classification is a key step to predicting protein tertiary structures. This paper proposes a novel approach based on genetic algorithms and feature selection to classifying protein folds. Our dataset is divided into a training dataset and a test dataset. Each individual for the genetic algorithms represents a selection function of the feature vectors of the training dataset. A support vector machine is applied to each individual to evaluate the fitness value (fold classification rate) of each individual. The aim of the genetic algorithms is to search for the best individual that produces the highest fold classification rate. The best individual is then applied to the feature vectors of the test dataset and a support vector machine is built to classify protein folds based on selected features. Our experimental results on Ding and Dubchak's benchmark dataset of 27-class folds show that our approach achieves an accuracy of 71.28%, which outperforms current state-of-the-art protein fold predictors.

  12. Feature selection for optimized skin tumor recognition using genetic algorithms.

    Science.gov (United States)

    Handels, H; Ross, T; Kreusch, J; Wolff, H H; Pöppl, S J

    1999-07-01

In this paper, a new approach to computer-supported diagnosis of skin tumors in dermatology is presented. High-resolution skin surface profiles are analyzed to automatically recognize malignant melanomas and nevocytic nevi (moles). In the first step, several types of features are extracted by 2D image analysis methods characterizing the structure of skin surface profiles: texture features based on co-occurrence matrices, Fourier features and fractal features. Then, feature selection algorithms are applied to determine suitable feature subsets for the recognition process. Feature selection is described as an optimization problem, and several approaches, including heuristic strategies, greedy and genetic algorithms, are compared. As the quality measure for feature subsets, the classification rate of the nearest-neighbor classifier computed with the leave-one-out method is used. Genetic algorithms show the best results. Finally, neural networks with error back-propagation as the learning paradigm are trained using the selected feature sets. Different network topologies, learning parameters and pruning algorithms are investigated to optimize the classification performance of the neural classifiers. With the optimized recognition system, a classification performance of 97.7% is achieved.
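One of the greedy baselines this study compares against, forward selection scored by leave-one-out 1-NN accuracy, is compact enough to sketch in full. The toy data (feature 0 informative, feature 1 noise) is invented for the demo:

```python
def loo_1nn_accuracy(data, feats):
    """Leave-one-out accuracy of a 1-NN classifier on the chosen features."""
    if not feats:
        return 0.0
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = [p for j, p in enumerate(data) if j != i]
        nearest = min(rest, key=lambda p: sum((p[0][f] - x[f]) ** 2 for f in feats))
        hits += nearest[1] == y
    return hits / len(data)

def greedy_forward(data, n_feat):
    """Repeatedly add the feature that most improves LOO 1-NN accuracy;
    stop when no candidate improves the score."""
    chosen, best = [], 0.0
    while True:
        gains = [(loo_1nn_accuracy(data, chosen + [f]), f)
                 for f in range(n_feat) if f not in chosen]
        acc, f = max(gains)
        if acc <= best:
            return chosen, best
        chosen, best = chosen + [f], acc

# Feature 0 encodes the class; feature 1 is noise.
data = [([0.0, 0.7], 0), ([0.1, 0.2], 0), ([0.2, 0.9], 0),
        ([1.0, 0.1], 1), ([0.9, 0.8], 1), ([1.1, 0.4], 1)]
subset, acc = greedy_forward(data, 2)
```

A GA explores the same subset space but without the greedy method's commitment to early choices, which is why it found better subsets in this study.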

  13. Feature Selection for Image Retrieval based on Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Preeti Kushwaha

    2016-12-01

Full Text Available This paper describes the development and implementation of feature selection for content-based image retrieval (CBIR). We are working on a CBIR system with a new, efficient technique. In this system, we use multi-feature extraction covering colour, texture and shape. Three techniques are used for feature extraction: colour moments, the gray-level co-occurrence matrix, and the edge histogram descriptor. To reduce the curse of dimensionality and find the optimal features from the feature set, feature selection based on a genetic algorithm is used. The features are divided into similar image classes using clustering, performed with the k-means algorithm, for fast retrieval and improved execution time. The experimental results show that feature selection using the GA reduces retrieval time and also increases retrieval precision, thus giving better and faster results compared to a normal image retrieval system. The results also show the precision and recall of the proposed approach compared to the previous approach for each image class. The CBIR system is more efficient and performs better using feature selection based on a genetic algorithm.

  14. Review and Evaluation of Feature Selection Algorithms in Synthetic Problems

    CERN Document Server

    Belanche, L A

    2011-01-01

The main purpose of Feature Subset Selection is to find a reduced subset of attributes from a data set described by a feature set. The task of a feature selection algorithm (FSA) is to provide a computational solution motivated by a certain definition of relevance or by a reliable evaluation measure. In this paper several fundamental algorithms are studied to assess their performance in a controlled experimental scenario. A measure to evaluate FSAs is devised that computes the degree of matching between the output given by an FSA and the known optimal solutions. An extensive experimental study on synthetic problems is carried out to assess the behaviour of the algorithms in terms of solution accuracy and size as a function of the relevance, irrelevance, redundancy and size of the data samples. The controlled experimental conditions facilitate the derivation of better-supported and more meaningful conclusions.
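The paper's actual matching measure is more elaborate, but the idea of scoring an FSA's output against a known optimal subset can be illustrated with a plain Jaccard index (an assumption, not the paper's formula):

```python
def subset_match(selected, optimal):
    """Degree of matching between an FSA's output and the known-optimal
    subset: shared features over the union (Jaccard index), so both missed
    relevant features and selected irrelevant ones lower the score."""
    s, o = set(selected), set(optimal)
    return len(s & o) / len(s | o) if s | o else 1.0

perfect = subset_match([0, 1, 2], [0, 1, 2])
partial = subset_match([0, 1, 5], [0, 1, 2])   # one irrelevant pick, one miss
```

On synthetic problems where the relevant features are known by construction, such a score lets FSAs be compared directly on solution quality rather than only through downstream classifier accuracy.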

  15. Feature selection using genetic algorithms for fetal heart rate analysis.

    Science.gov (United States)

    Xu, Liang; Redman, Christopher W G; Payne, Stephen J; Georgieva, Antoniya

    2014-07-01

The fetal heart rate (FHR) is monitored on a paper strip (cardiotocogram) during labour to assess fetal health. If necessary, clinicians can intervene and assist with a prompt delivery of the baby. Data-driven computerized FHR analysis could help clinicians in the decision-making process. However, selecting the best computerized FHR features that relate to labour outcome is a pressing research problem. The objective of this study is to apply genetic algorithms (GA) as a feature selection method to select the best feature subset from 64 FHR features and to integrate these best features to recognize unfavourable FHR patterns. The GA was trained on 404 cases and tested on 106 cases (both balanced datasets) using three different classifiers. Regularization methods and backward selection were used to optimize the GA. Reasonable classification performance is shown on the testing set for the best feature subset (Cohen's kappa values of 0.45 to 0.49 with the different classifiers). This is, to our knowledge, the first time that a feature selection method for FHR analysis has been developed on a database of this size. This study indicates that different FHR features, when integrated, can show good performance in predicting labour outcome. It also gives the importance of each feature, which will be a valuable reference point for further studies.

  16. Online Feature Selection of Class Imbalance via PA Algorithm

    Institute of Scientific and Technical Information of China (English)

    Chao Han; Yun-Kun Tan; Jin-Hui Zhu; Yong Guo; Jian Chen; Qing-Yao Wu

    2016-01-01

Imbalanced classification techniques have been frequently applied in many machine learning application domains where the number of samples in the majority (or positive) class of a dataset is much larger than that in the minority (or negative) class. Meanwhile, feature selection (FS) is one of the key techniques for high-dimensional classification tasks, greatly improving both classification performance and computational efficiency. However, most studies of feature selection and imbalanced classification are restricted to off-line batch learning, which is not well suited to some practical scenarios. In this paper, we aim to solve the high-dimensional imbalanced classification problem accurately and efficiently with only a small number of active features in an online fashion, and we propose two novel online learning algorithms for this purpose. In our approach, a classifier involving only a small, fixed number of features is constructed to classify a sequence of imbalanced data received in an online manner. We formulate the construction of such an online learner as an optimization problem and solve it iteratively, based on the passive-aggressive (PA) algorithm together with a truncated gradient (TG) method. We evaluate the performance of the proposed algorithms on several real-world datasets, and our experimental results demonstrate their effectiveness in comparison with the baselines.
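The PA-plus-truncation mechanism can be sketched for a binary linear learner on sparse inputs: a standard PA hinge-loss update, followed by zeroing all but the largest-magnitude weights so the model never uses more than a fixed feature budget. This is a simplified stand-in for the paper's algorithms (no imbalance handling, plain PA step, toy stream):

```python
def pa_truncated_step(w, x, y, budget):
    """One passive-aggressive update on sparse input x = {index: value},
    label y in {-1, +1}, followed by truncation to the `budget`
    largest-magnitude weights so the learner keeps few active features."""
    margin = y * sum(w.get(i, 0.0) * v for i, v in x.items())
    loss = max(0.0, 1.0 - margin)                # hinge loss
    if loss > 0.0:
        norm_sq = sum(v * v for v in x.values())
        tau = loss / norm_sq if norm_sq else 0.0 # PA step size
        for i, v in x.items():
            w[i] = w.get(i, 0.0) + tau * y * v
    # truncation: drop everything but the `budget` largest weights
    keep = sorted(w, key=lambda i: abs(w[i]), reverse=True)[:budget]
    return {i: w[i] for i in keep if w[i]}

w = {}
stream = [({0: 1.0, 3: 0.2}, +1), ({1: 1.0, 3: 0.1}, -1), ({0: 0.9}, +1)]
for x, y in stream:
    w = pa_truncated_step(w, x, y, budget=2)
```

After the stream, the weight vector holds at most two features, the weakly informative feature 3 having been truncated away in favour of the stronger signals.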

  17. Improving permafrost distribution modelling using feature selection algorithms

    Science.gov (United States)

    Deluigi, Nicola; Lambiel, Christophe; Kanevski, Mikhail

    2016-04-01

The availability of an increasing number of spatial data on the occurrence of mountain permafrost allows the employment of machine learning (ML) classification algorithms for modelling the distribution of the phenomenon. One of the major problems when dealing with high-dimensional datasets is the number of input features (variables) involved. Applying ML classification algorithms to this large number of variables leads to the risk of overfitting, with the consequence of poor generalization/prediction. For this reason, applying feature selection (FS) techniques helps simplify the set of factors required and improves the knowledge of the adopted features and their relation to the studied phenomenon. Moreover, removing irrelevant or redundant variables from the dataset effectively improves the quality of the ML prediction. This research deals with a comparative analysis of permafrost distribution models supported by FS variable importance assessment. The input dataset (dimension = 20-25, 10 m spatial resolution) was constructed using landcover maps, climate data and DEM-derived variables (altitude, aspect, slope, terrain curvature, solar radiation, etc.). It was completed with permafrost evidence (geophysical and thermal data and rock glacier inventories) that serves as training data. The FS algorithms used indicate which variables appear less statistically important for permafrost presence/absence. Three different algorithms were compared: Information Gain (IG), Correlation-based Feature Selection (CFS) and Random Forest (RF). IG is a filter technique that evaluates the worth of a predictor by measuring the information gain with respect to permafrost presence/absence. CFS, in turn, evaluates the worth of a subset of predictors by considering the individual predictive ability of each variable along with the degree of redundancy between them. Finally, RF is an ML algorithm that performs FS as part of its
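Information Gain, the first of the three criteria compared, is the reduction in label entropy obtained by splitting on a feature. A minimal implementation with invented toy data (discrete "slope" and "aspect" classes against permafrost presence/absence):

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n)
                for c in (labels.count(v) for v in set(labels)))

def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy after splitting
    the samples by the discrete feature's values."""
    n = len(labels)
    split = {}
    for f, y in zip(feature, labels):
        split.setdefault(f, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - remainder

# Toy example: "slope" predicts permafrost perfectly here; "aspect" does not.
permafrost = [1, 1, 0, 0]
slope      = ["steep", "steep", "flat", "flat"]
aspect     = ["N", "S", "N", "S"]
gain_slope = information_gain(slope, permafrost)
gain_aspect = information_gain(aspect, permafrost)
```

Ranking each input variable by such a score, independently of any classifier, is what makes IG a filter technique.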

  18. HYBRID FEATURE SELECTION ALGORITHM FOR INTRUSION DETECTION SYSTEM

    Directory of Open Access Journals (Sweden)

    Seyed Reza Hasani

    2014-01-01

Full Text Available Network security is a serious global concern, and the usefulness of Intrusion Detection Systems (IDS) built with soft computing techniques is growing rapidly in Information Security research. Previous research has recognized that irrelevant and redundant features increase the cost of evaluating known intrusive patterns; an efficient feature selection method reduces the dimension of the data and removes the redundancy and ambiguity caused by unimportant attributes. Feature selection methods are therefore well-known means of overcoming this problem. Various approaches are utilized in intrusion detection, each achieving some improvements. This work enhances the algorithm with the highest Detection Rate (DR), Linear Genetic Programming (LGP), by incorporating the Bees Algorithm to reduce the False Alarm Rate (FAR); a Support Vector Machine (SVM) is additionally one of the best candidate solutions for settling IDS problems. In this study, four sample datasets containing 4000 random records each were extracted randomly from the full dataset for training and testing purposes. Experimental results show that the LGP_BA method improves accuracy and efficiency compared with previous related research, and the feature subset offered by LGP_BA gives a superior representation of the data.

  19. Feature selection for face recognition: a memetic algorithmic approach

    Institute of Scientific and Technical Information of China (English)

    Dinesh KUMAR; Shakti KUMAR; C. S. RAI

    2009-01-01

    The eigenface method that uses principal component analysis (PCA) has been the standard and popular method used in face recognition. This paper presents a PCA-memetic algorithm (PCA-MA) approach for feature selection. PCA has been extended by MAs where the former was used for feature extraction/dimensionality reduction and the latter exploited for feature selection. Simulations were performed over ORL and YaleB face databases using Euclidean norm as the classifier. It was found that as far as the recognition rate is concerned, PCA-MA completely outperforms the eigenface method. We compared the performance of PCA extended with genetic algorithm (PCA-GA) with our proposed PCA-MA method. The results also clearly established the supremacy of the PCA-MA method over the PCA-GA method. We further extended linear discriminant analysis (LDA) and kernel principal component analysis (KPCA) approaches with the MA and observed significant improvement in recognition rate with fewer features. This paper also compares the performance of PCA-MA, LDA-MA and KPCA-MA approaches.

  20. Use of genetic algorithm for the selection of EEG features

    Science.gov (United States)

    Asvestas, P.; Korda, A.; Kostopoulos, S.; Karanasiou, I.; Ouzounoglou, A.; Sidiropoulos, K.; Ventouras, E.; Matsopoulos, G.

    2015-09-01

    Genetic Algorithm (GA) is a popular optimization technique that can detect the global optimum of a multivariable function containing several local optima. GA has been widely used in the field of biomedical informatics, especially in the context of designing decision support systems that classify biomedical signals or images into classes of interest. The aim of this paper is to present a methodology, based on GA, for the selection of the optimal subset of features that can be used for the efficient classification of Event Related Potentials (ERPs), which are recorded during the observation of correct or incorrect actions. In our experiment, ERP recordings were acquired from sixteen (16) healthy volunteers who observed correct or incorrect actions of other subjects. The brain electrical activity was recorded at 47 locations on the scalp. The GA was formulated as a combinatorial optimizer for the selection of the combination of electrodes that maximizes the performance of the Fuzzy C Means (FCM) classification algorithm. In particular, during the evolution of the GA, for each candidate combination of electrodes, the well-known (Σ, Φ, Ω) features were calculated and were evaluated by means of the FCM method. The proposed methodology provided a combination of 8 electrodes, with classification accuracy 93.8%. Thus, GA can be the basis for the selection of features that discriminate ERP recordings of observations of correct or incorrect actions.

  1. Using PSO-Based Hierarchical Feature Selection Algorithm

    Directory of Open Access Journals (Sweden)

    Zhiwei Ji

    2014-01-01

Full Text Available Hepatocellular carcinoma (HCC) is one of the most common malignant tumors. Clinical symptoms attributable to HCC are usually absent, so the best therapeutic opportunities are often missed. Traditional Chinese Medicine (TCM) plays an active role in the diagnosis and treatment of HCC. In this paper, we propose a particle swarm optimization-based hierarchical feature selection (PSOHFS) model to infer potential syndromes for the diagnosis of HCC. Firstly, the hierarchical feature representation is developed as a three-layer tree: the clinical symptoms and the positive score of a patient are the leaf nodes and the root of the tree, respectively, while each syndrome feature on the middle layer is extracted from a group of symptoms. Secondly, an improved PSO-based algorithm is applied in the new, reduced feature space to search for an optimal syndrome subset. Based on the result of feature selection, the causal relationships of symptoms and syndromes are inferred via Bayesian networks. In our experiment, 147 symptoms were aggregated into 27 groups and 27 syndrome features were extracted. The proposed approach discovered 24 syndromes which clearly improved the diagnostic accuracy. Finally, the Bayesian approach was applied to represent the causal relationships at both the symptom and syndrome levels. The results show that our computational model can facilitate the clinical diagnosis of HCC.

  2. Feature Subset Selection by Estimation of Distribution Algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Cantu-Paz, E

    2002-01-17

    This paper describes the application of four evolutionary algorithms to the identification of feature subsets for classification problems. Besides a simple GA, the paper considers three estimation of distribution algorithms (EDAs): a compact GA, an extended compact GA, and the Bayesian Optimization Algorithm. The objective is to determine if the EDAs present advantages over the simple GA in terms of accuracy or speed in this problem. The experiments used a Naive Bayes classifier and public-domain and artificial data sets. In contrast with previous studies, we did not find evidence to support or reject the use of EDAs for this problem.
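Among the EDAs compared, the compact GA is simple enough to sketch: instead of a population of bit-strings, it maintains one probability per bit, samples two candidates, lets them compete, and nudges the probabilities toward the winner. The subset-selection fitness below is an invented toy (two helpful features, the rest harmful):

```python
import random

def compact_ga(fitness, n_bits, *, pop_size=50, iters=2000, seed=3):
    """Compact GA: a probability vector stands in for a whole population.
    Each step samples two bit-strings, compares their fitness, and shifts
    the probabilities toward the winner by 1/pop_size."""
    rng = random.Random(seed)
    p = [0.5] * n_bits
    for _ in range(iters):
        a = [rng.random() < q for q in p]
        b = [rng.random() < q for q in p]
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        for i in range(n_bits):
            if winner[i] != loser[i]:            # update only disagreeing bits
                step = 1.0 / pop_size
                p[i] = min(1.0, p[i] + step) if winner[i] else max(0.0, p[i] - step)
    return [q > 0.5 for q in p]

# Toy subset-selection fitness: features 0 and 3 help, the rest hurt.
def subset_fitness(mask):
    return sum(1 if i in (0, 3) else -1 for i, on in enumerate(mask) if on)

best = compact_ga(subset_fitness, 5)
```

In a feature selection setting, the fitness would instead be a classifier's accuracy on the masked features (a Naive Bayes wrapper in this paper's experiments), with the same probability-vector machinery unchanged.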

  3. A Local Asynchronous Distributed Privacy Preserving Feature Selection Algorithm for Large Peer-to-Peer Networks

    Data.gov (United States)

    National Aeronautics and Space Administration — In this paper we develop a local distributed privacy preserving algorithm for feature selection in a large peer-to-peer environment. Feature selection is often used...

  4. Evaluation of Meta-Heuristic Algorithms for Stable Feature Selection

    Directory of Open Access Journals (Sweden)

    Maysam Toghraee

    2016-07-01

Full Text Available Nowadays, with developments in science and technology tools, the ability to review and store important data has become widely available, and knowledge is needed to search that data for useful results. Data mining is the automatic search of large data sources for patterns and dependencies that are not revealed by simple statistical analysis. The scope of this work is to study the predictive role and usage domain of data mining in medical science and to suggest a framework for creating, assessing and exploiting data mining patterns in this field. Since previous research has found that existing assessment methods cannot specify data discrepancies, we suggest a new approach for assessing data similarities, in order to uncover the relation between variation in the data and the stability of the feature selection. We have therefore chosen meta-heuristic methods, so as to be able to choose the best and most stable algorithms from among a set of algorithms.

  5. Feature selection method based on multi-fractal dimension and harmony search algorithm and its application

    Science.gov (United States)

    Zhang, Chen; Ni, Zhiwei; Ni, Liping; Tang, Na

    2016-10-01

    Feature selection is an important method of data preprocessing in data mining. In this paper, a novel feature selection method based on multi-fractal dimension and harmony search algorithm is proposed. Multi-fractal dimension is adopted as the evaluation criterion of feature subset, which can determine the number of selected features. An improved harmony search algorithm is used as the search strategy to improve the efficiency of feature selection. The performance of the proposed method is compared with that of other feature selection algorithms on UCI data-sets. Besides, the proposed method is also used to predict the daily average concentration of PM2.5 in China. Experimental results show that the proposed method can obtain competitive results in terms of both prediction accuracy and the number of selected features.

  6. Speech Emotion Feature Selection Method Based on Contribution Analysis Algorithm of Neural Network

    Science.gov (United States)

    Wang, Xiaojia; Mao, Qirong; Zhan, Yongzhao

    2008-11-01

There are many emotion features, and if all of them are employed to recognize emotions, redundant features may exist; furthermore, the recognition result is unsatisfactory and the cost of feature extraction is high. In this paper, a method to select speech emotion features based on the contribution analysis algorithm of a neural network (NN) is presented. The emotion features are selected from the 95 extracted features using the contribution analysis algorithm. Cluster analysis is applied to assess the effectiveness of the selected features, and the time of feature extraction is evaluated. Finally, the 24 selected emotion features are used to recognize six speech emotions. The experiments show that this method can improve the recognition rate and reduce the time of feature extraction.

  7. A FEATURE SELECTION ALGORITHM DESIGN AND ITS IMPLEMENTATION IN INTRUSION DETECTION SYSTEM

    Institute of Scientific and Technical Information of China (English)

Yang Xiangrong; Shen Junyi

    2003-01-01

Objective: To present a new feature selection algorithm. Methods: The algorithm is based on rule induction and field knowledge. Results: This algorithm can be applied when capturing the dataflow for detecting network intrusions: only the sub-dataset containing the discriminating features is captured, so the time spent in the subsequent behavior-pattern mining is reduced and the patterns mined are more precise. Conclusion: The experimental results show that the feature subset captured by this algorithm is more informative, and the size of the dataset is reduced significantly.

  8. Using genetic algorithms to select and create features for pattern classification. Technical report

    Energy Technology Data Exchange (ETDEWEB)

    Chang, E.I.; Lippmann, R.P.

    1991-03-11

Genetic algorithms were used to select and create features and to select reference exemplar patterns for machine vision and speech pattern classification tasks. On a 15-feature machine-vision inspection task, it was found that genetic algorithms performed no better than conventional approaches to feature selection but required much more computation. For a speech recognition task, genetic algorithms required no more computation time than traditional approaches but reduced the number of features required by a factor of five (from 153 to 33 features). On a difficult artificial machine-vision task, genetic algorithms were able to create new features (polynomial functions of the original features) that reduced classification error rates from 10 to almost 0 percent. Neural net and nearest-neighbor classifiers were unable to provide such low error rates using only the original features. Genetic algorithms were also used to reduce the number of reference exemplar patterns and to select the value of k for a k-nearest-neighbor classifier. On a 338-training-pattern vowel recognition problem with 10 classes, genetic algorithms simultaneously reduced the number of stored exemplars from 338 to 63 and selected k without significantly decreasing classification accuracy. In all applications, genetic algorithms were easy to apply and found good solutions in many fewer trials than would be required by an exhaustive search. Run times were long but not unreasonable. These results suggest that genetic algorithms may soon be practical for pattern classification problems as faster serial and parallel computers are developed.

  9. Compact cancer biomarkers discovery using a swarm intelligence feature selection algorithm.

    Science.gov (United States)

    Martinez, Emmanuel; Alvarez, Mario Moises; Trevino, Victor

    2010-08-01

    Biomarker discovery is a typical application of functional genomics. Due to the large number of genes studied simultaneously in microarray data, feature selection is a key step. Swarm intelligence has emerged as a solution for the feature selection problem. However, swarm intelligence settings for feature selection often fail to select small feature subsets. We have proposed a swarm intelligence feature selection algorithm based on the initialization and update of only a subset of particles in the swarm. In this study, we tested our algorithm on 11 microarray datasets for brain, leukemia, lung, prostate, and other cancers. We show that the proposed swarm intelligence algorithm successfully increases the classification accuracy and decreases the number of selected features compared to other swarm intelligence methods.
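    The paper's contribution is to initialize and update only a subset of particles; the sketch below shows only the generic binary PSO baseline such methods build on, with a sigmoid transfer function and a toy objective, so every constant and the `score` function are assumptions rather than the authors' settings.

```python
import math
import random

def binary_pso(n_features, fitness, n_particles=15, iters=40, seed=1):
    """Minimal binary PSO over feature masks (illustrative baseline)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    gbest = max(pos, key=fitness)[:]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.4 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.4 * r2 * (gbest[d] - pos[i][d]))
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))  # sigmoid transfer
                pos[i][d] = 1 if rng.random() < prob else 0
            if fitness(pos[i]) > fitness(pbest[i]):
                pbest[i] = pos[i][:]
                if fitness(pbest[i]) > fitness(gbest):
                    gbest = pbest[i][:]
    return gbest

# Toy objective rewarding a small mask that keeps the 3 "relevant" features.
score = lambda m: sum(m[:3]) - 0.25 * sum(m[3:])
best = binary_pso(10, score)
```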

  10. Analysis of Different Feature Selection Criteria Based on a Covariance Convergence Perspective for a SLAM Algorithm

    Directory of Open Access Journals (Sweden)

    Fernando A. Auat Cheein

    2010-12-01

    Full Text Available This paper introduces several non-arbitrary feature selection techniques for a Simultaneous Localization and Mapping (SLAM) algorithm. The feature selection criteria are based on the determination of the most significant features from a SLAM convergence perspective. The SLAM algorithm implemented in this work is a sequential EKF (Extended Kalman filter) SLAM. The feature selection criteria are applied on the correction stage of the SLAM algorithm, restricting it to correct the SLAM algorithm with the most significant features. This restriction also causes a decrement in the processing time of the SLAM. Several experiments with a mobile robot are shown in this work. The experiments concern the map reconstruction and a comparison between the performance of the different proposed techniques. The experiments were carried out in an outdoor environment composed of trees, although the results shown herein are not restricted to a special type of features.

  11. Analysis of different feature selection criteria based on a covariance convergence perspective for a SLAM algorithm.

    Science.gov (United States)

    Auat Cheein, Fernando A; Carelli, Ricardo

    2011-01-01

    This paper introduces several non-arbitrary feature selection techniques for a Simultaneous Localization and Mapping (SLAM) algorithm. The feature selection criteria are based on the determination of the most significant features from a SLAM convergence perspective. The SLAM algorithm implemented in this work is a sequential EKF (Extended Kalman filter) SLAM. The feature selection criteria are applied on the correction stage of the SLAM algorithm, restricting it to correct the SLAM algorithm with the most significant features. This restriction also causes a decrement in the processing time of the SLAM. Several experiments with a mobile robot are shown in this work. The experiments concern the map reconstruction and a comparison between the performance of the different proposed techniques. The experiments were carried out in an outdoor environment composed of trees, although the results shown herein are not restricted to a special type of features.
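    One common way to make such covariance-convergence criteria concrete is to rank each candidate feature by how much an EKF correction with its measurement would shrink the trace of the state covariance; the sketch below illustrates that idea on a hypothetical 2-state system and is not the paper's exact criterion.

```python
import numpy as np

def covariance_gain(P, H, R):
    """Trace reduction of the state covariance P if a feature with
    measurement model H and noise covariance R were used in an EKF
    correction step."""
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    P_new = P - K @ H @ P                  # corrected covariance
    return float(np.trace(P) - np.trace(P_new))

# Hypothetical 2-state system observed through the same model H by a
# precise feature (small R) and a noisy one (large R).
P = np.diag([1.0, 1.0])
H = np.array([[1.0, 0.0]])
precise = covariance_gain(P, H, np.array([[0.01]]))
noisy = covariance_gain(P, H, np.array([[10.0]]))
```

Ranking features by this gain and correcting only with the top-ranked ones mirrors the restriction the paper applies to the correction stage.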

  12. A Rank Aggregation Algorithm for Ensemble of Multiple Feature Selection Techniques in Credit Risk Evaluation

    Directory of Open Access Journals (Sweden)

    Shashi Dahiya

    2016-10-01

    Full Text Available In credit risk evaluation the accuracy of a classifier is very significant for classifying the high-risk loan applicants correctly. Feature selection is one way of improving the accuracy of a classifier. It provides the classifier with important and relevant features for model development. This study uses the ensemble of multiple feature ranking techniques for feature selection of credit data. It uses five individual rank based feature selection methods. It proposes a novel rank aggregation algorithm for combining the ranks of the individual feature selection methods of the ensemble. This algorithm uses the rank order along with the rank score of the features in the ranked list of each feature selection method for rank aggregation. The ensemble of multiple feature selection techniques uses the novel rank aggregation algorithm and selects the relevant features using the 80%, 60%, 40% and 20% thresholds from the top of the aggregated ranked list for building the C4.5, MLP, C4.5 based Bagging and MLP based Bagging models. It was observed that the performance of models using the ensemble of multiple feature selection techniques is better than the performance of the 5 individual rank based feature selection methods. The average performance of all the models was observed as best for the ensemble of feature selection techniques at the 60% threshold. Also, the bagging based models outperformed the individual models most significantly for the 60% threshold. This increase in performance is more significant given that the number of features was reduced by 40% for building the highest performing models. This substantially reduces the data dimensions, and hence the overall data size, for model building. The use of the ensemble of feature selection techniques with the novel aggregation algorithm provided more accurate models which are simpler, faster and easier to interpret.
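    A minimal sketch of combining rank order with rank score is shown below; the cited algorithm's exact weighting is not reproduced here, so the aggregation formula (equal-weight sum of a Borda-style rank term and a normalized score term) and the three hypothetical rankers are illustrative assumptions.

```python
def aggregate_ranks(rankings):
    """Combine several (feature -> score) rankings into one ordered list,
    using both rank position and normalized rank score per method."""
    features = set().union(*rankings)
    agg = {f: 0.0 for f in features}
    for ranking in rankings:
        ordered = sorted(ranking, key=ranking.get, reverse=True)
        top = max(ranking.values())
        for pos, f in enumerate(ordered):
            rank_part = (len(ordered) - pos) / len(ordered)  # rank order
            score_part = ranking[f] / top                    # rank score
            agg[f] += rank_part + score_part
    return sorted(agg, key=agg.get, reverse=True)

# Three hypothetical rankers that mostly agree "income" matters most.
r1 = {"income": 0.9, "age": 0.5, "loans": 0.2}
r2 = {"income": 0.8, "loans": 0.6, "age": 0.3}
r3 = {"age": 0.7, "income": 0.65, "loans": 0.1}
order = aggregate_ranks([r1, r2, r3])
```

Selecting the top 80%, 60%, etc. of `order` corresponds to the thresholding step described in the abstract.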

  13. An Efficient Cost-Sensitive Feature Selection Using Chaos Genetic Algorithm for Class Imbalance Problem

    Directory of Open Access Journals (Sweden)

    Jing Bian

    2016-01-01

    Full Text Available In the era of big data, feature selection is an essential process in machine learning. Although the class imbalance problem has recently attracted a great deal of attention, little effort has been undertaken to develop feature selection techniques. In addition, most applications involving feature selection focus on classification accuracy but not cost, although costs are important. To cope with imbalance problems, we developed a cost-sensitive feature selection algorithm that adds the cost-based evaluation function of a filter feature selection using a chaos genetic algorithm, referred to as CSFSG. The evaluation function considers both feature-acquiring costs (test costs) and misclassification costs in the field of network security, thereby weakening the influence of many instances from the majority of classes in large-scale datasets. The CSFSG algorithm reduces the total cost of feature selection and trades off both factors. The behavior of the CSFSG algorithm is tested on a large-scale dataset of network security, using two kinds of classifiers: C4.5 and k-nearest neighbor (KNN). The results of the experimental research show that the approach is efficient and able to effectively improve classification accuracy and to decrease classification time. In addition, the results of our method are more promising than the results of other cost-sensitive feature selection algorithms.
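    The cost-based evaluation described above can be sketched as a function that adds acquisition (test) costs for the selected features to the expected misclassification cost; all numbers below are illustrative assumptions, not the CSFSG settings.

```python
def total_cost(mask, test_costs, confusion, mis_costs):
    """Cost-sensitive evaluation sketch: acquisition cost of the selected
    features plus misclassification cost weighted by a cost matrix."""
    acquisition = sum(c for m, c in zip(mask, test_costs) if m)
    misclassification = sum(confusion[i][j] * mis_costs[i][j]
                            for i in range(2) for j in range(2))
    return acquisition + misclassification

test_costs = [1.0, 4.0, 0.5]   # hypothetical cost of measuring each feature
mis_costs = [[0, 10], [1, 0]]  # missing the minority class is 10x worse
confusion = [[0, 3], [5, 0]]   # error counts indexed [actual][predicted]
cost = total_cost([1, 0, 1], test_costs, confusion, mis_costs)
```

A search procedure (here a chaos GA in the paper) would minimize this value over candidate masks.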

  14. Examining applying high performance genetic data feature selection and classification algorithms for colon cancer diagnosis.

    Science.gov (United States)

    Al-Rajab, Murad; Lu, Joan; Xu, Qiang

    2017-07-01

    This paper examines the accuracy and efficiency (time complexity) of high performance genetic data feature selection and classification algorithms for colon cancer diagnosis. The need for this research derives from the urgent and increasing need for accurate and efficient algorithms. Colon cancer is a leading cause of death worldwide, hence it is vitally important for the cancer tissues to be expertly identified and classified in a rapid and timely manner, to assure both a fast detection of the disease and to expedite the drug discovery process. In this research, a three-phase approach was proposed and implemented: Phases One and Two examined the feature selection algorithms and classification algorithms employed separately, and Phase Three examined the performance of the combination of these. It was found from Phase One that the Particle Swarm Optimization (PSO) algorithm performed best with the colon dataset as a feature selection (29 genes selected) and from Phase Two that the Support Vector Machine (SVM) algorithm outperformed other classifications, with an accuracy of almost 86%. It was also found from Phase Three that the combined use of PSO and SVM surpassed other algorithms in accuracy and performance, and was faster in terms of time analysis (94%). It is concluded that applying feature selection algorithms prior to classification algorithms results in better accuracy than when the latter are applied alone. This conclusion is important and significant to industry and society. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Using genetic algorithm feature selection in neural classification systems for image pattern recognition

    Directory of Open Access Journals (Sweden)

    Margarita R. Gamarra A.

    2012-09-01

    Full Text Available Pattern recognition performance depends on variations during extraction, selection and classification stages. This paper presents an approach to feature selection by using genetic algorithms with regard to digital image recognition and quality control. Error rate and kappa coefficient were used for evaluating the genetic algorithm approach. Neural networks were used for classification, involving the features selected by the genetic algorithms. The neural network approach was compared to a K-nearest neighbor classifier. The proposed approach performed better than the other methods.

  16. Multi-Objective Feature Subset Selection using Non-dominated Sorting Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    A. Khan

    2015-02-01

    Full Text Available This paper presents an evolutionary algorithm based technique to solve the multi-objective feature subset selection problem. The data used for classification contain a large number of features, called attributes. Some of these attributes are not relevant and need to be eliminated. In the classification procedure, each feature has an effect on the accuracy, cost and learning time of the classifier. So, there is a strong requirement to select a subset of the features before building the classifier. The proposed technique treats feature subset selection as a multi-objective optimization problem. This research uses one of the latest multi-objective genetic algorithms (NSGA-II). The fitness value of a particular feature subset is measured using ID3. The testing accuracy acquired is then assigned as the fitness value. This technique is tested on several datasets taken from the UCI machine learning repository. The experiments demonstrate the feasibility of using NSGA-II for feature subset selection.
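    The core step of NSGA-II is sorting candidate subsets into Pareto fronts. A simplified sketch, assuming every objective is to be maximized (e.g. accuracy and the negated feature count), is:

```python
def non_dominated_sort(points):
    """Split objective vectors into Pareto fronts (NSGA-II's first stage,
    simplified; higher is better in every objective)."""
    dominates = lambda a, b: all(x >= y for x, y in zip(a, b)) and a != b
    remaining = list(points)
    fronts = []
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts

# Hypothetical subsets scored as (accuracy, -number_of_features).
pts = [(0.90, -5), (0.85, -3), (0.90, -3), (0.70, -8), (0.60, -2)]
fronts = non_dominated_sort(pts)
```

NSGA-II additionally applies crowding-distance selection within each front, which is omitted here.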

  17. Selection of individual features of a speech signal using genetic algorithms

    Directory of Open Access Journals (Sweden)

    Kamil Kamiński

    2016-03-01

    Full Text Available The paper presents an automatic speaker recognition system, implemented in the Matlab environment, and demonstrates how to achieve and optimize various elements of the system. The main emphasis was put on feature selection of a speech signal using a genetic algorithm, which takes into account the synergy of features. The results of optimization of selected elements of the classifier have also been shown, including the number of Gaussian distributions used to model each of the voices. In addition, for creating the voice models, a universal voice model has been used. Keywords: biometrics, automatic speaker recognition, genetic algorithms, feature selection

  18. Discharges Classification using Genetic Algorithms and Feature Selection Algorithms on Time and Frequency Domain Data Extracted from Leakage Current Measurements

    Directory of Open Access Journals (Sweden)

    D. Pylarinos

    2013-12-01

    Full Text Available A total of 387 discharge-portraying waveforms recorded on 18 different 150 kV post insulators installed at two different substations in Crete, Greece are considered in this paper. Twenty different features are extracted from each waveform and two feature selection algorithms (t-test and mRMR) are employed. Genetic algorithms are used to classify waveforms into two different classes related to the portrayed discharges. Five different data sets are employed: (1) the original feature vector, (2) time domain features, (3) frequency domain features, (4) t-test selected features, and (5) mRMR selected features. Results are discussed and compared with previous classification implementations on this particular data group.
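    A t-test filter of the kind employed above ranks each feature by the absolute t-statistic between the two classes; the sketch below uses Welch's form on toy data, not the paper's waveform features.

```python
import math

def t_statistic(xs, ys):
    """Welch t-statistic between two classes for one feature; features
    with larger |t| discriminate better (filter-style selection)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    vx = sum((v - mx) ** 2 for v in xs) / (len(xs) - 1)
    vy = sum((v - my) ** 2 for v in ys) / (len(ys) - 1)
    return (mx - my) / math.sqrt(vx / len(xs) + vy / len(ys))

# Toy data: feature A separates the two classes, feature B does not.
class0_A, class1_A = [1.0, 1.2, 0.9, 1.1], [3.0, 3.2, 2.9, 3.1]
class0_B, class1_B = [5.0, 4.0, 6.0, 5.0], [5.1, 4.1, 5.9, 4.9]
tA = abs(t_statistic(class0_A, class1_A))
tB = abs(t_statistic(class0_B, class1_B))
```

Keeping only the features with the largest |t| yields the "t-test selected" data set used in the comparison.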

  19. Feature Selection for Natural Language Call Routing Based on Self-Adaptive Genetic Algorithm

    Science.gov (United States)

    Koromyslova, A.; Semenkina, M.; Sergienko, R.

    2017-02-01

    The text classification problem for natural language call routing was considered in the paper. Seven different term weighting methods were applied. As dimensionality reduction methods, the feature selection based on self-adaptive GA is considered. k-NN, linear SVM and ANN were used as classification algorithms. The tasks of the research are the following: perform research of text classification for natural language call routing with different term weighting methods and classification algorithms and investigate the feature selection method based on self-adaptive GA. The numerical results showed that the most effective term weighting is TRR. The most effective classification algorithm is ANN. Feature selection with self-adaptive GA provides improvement of classification effectiveness and significant dimensionality reduction with all term weighting methods and with all classification algorithms.

  20. CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests.

    Science.gov (United States)

    Ma, Li; Fan, Suohai

    2017-03-14

    The random forests algorithm is a type of classifier with prominent universality, a wide application range, and robustness for avoiding overfitting. But there are still some drawbacks to random forests. Therefore, to improve the performance of random forests, this paper seeks to improve imbalanced data processing, feature selection and parameter optimization. We propose the CURE-SMOTE algorithm for the imbalanced data classification problem. Experiments on imbalanced UCI data reveal that the combination with Clustering Using Representatives (CURE) enhances the original synthetic minority oversampling technique (SMOTE) algorithm effectively compared with the classification results on the original data using random sampling, Borderline-SMOTE1, safe-level SMOTE, C-SMOTE, and k-means-SMOTE. Additionally, the hybrid RF (random forests) algorithm has been proposed for feature selection and parameter optimization, which uses the minimum out of bag (OOB) data error as its objective function. Simulation results on binary and higher-dimensional data indicate that the proposed hybrid RF algorithms, the hybrid genetic-random forests algorithm, the hybrid particle swarm-random forests algorithm and the hybrid fish swarm-random forests algorithm, can achieve the minimum OOB error and show the best generalization ability. The training set produced from the proposed CURE-SMOTE algorithm is closer to the original data distribution because it contains minimal noise. Thus, better classification results are produced from this feasible and effective algorithm. Moreover, the hybrid algorithms' F-value, G-mean, AUC and OOB scores demonstrate that they surpass the performance of the original RF algorithm. Hence, this hybrid algorithm provides a new way to perform feature selection and parameter optimization.
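    The SMOTE step that CURE-SMOTE builds on interpolates synthetic minority samples between a point and one of its k nearest minority neighbours; a plain-SMOTE sketch (without the CURE clustering stage, and with illustrative parameters) is:

```python
import random

def smote(minority, k=2, n_new=4, seed=0):
    """Classic SMOTE interpolation: each synthetic point lies on the
    segment between a minority sample and one of its k nearest minority
    neighbours."""
    rng = random.Random(seed)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p != base),
                            key=lambda p: dist(base, p))[:k]
        nb = rng.choice(neighbours)
        lam = rng.random()  # random position along the segment
        synthetic.append(tuple(b + lam * (n - b) for b, n in zip(base, nb)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority)
```

CURE-SMOTE first clusters the minority class with CURE and removes outliers before this interpolation, which is why its synthetic set contains less noise.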

  1. Human activity recognition based on feature selection in smart home using back-propagation algorithm.

    Science.gov (United States)

    Fang, Hongqing; He, Lei; Si, Hao; Liu, Peng; Xie, Xiaolei

    2014-09-01

    In this paper, the back-propagation (BP) algorithm has been used to train a feed-forward neural network for human activity recognition in smart home environments, and an inter-class distance method for feature selection of observed motion sensor events is discussed and tested. The human activity recognition performance of the neural network using the BP algorithm has then been evaluated and compared with other probabilistic algorithms: the Naïve Bayes (NB) classifier and the Hidden Markov Model (HMM). The results show that different feature datasets yield different activity recognition accuracy. The selection of unsuitable feature datasets increases the computational complexity and degrades the activity recognition accuracy. Furthermore, the neural network using the BP algorithm has relatively better human activity recognition performance than the NB classifier and the HMM.

  2. A New Feature Selection Algorithm Based on the Mean Impact Variance

    Directory of Open Access Journals (Sweden)

    Weidong Cheng

    2014-01-01

    Full Text Available The selection of fewer or more representative features from multidimensional features is important when the artificial neural network (ANN) algorithm is used as a classifier. In this paper, a new feature selection method called the mean impact variance (MIVAR) method is proposed to determine the feature that is more suitable for classification. Moreover, this method is constructed on the basis of the training process of the ANN algorithm. To verify the effectiveness of the proposed method, the MIVAR value is used to rank the multidimensional features of the bearing fault diagnosis. In detail, (1) 70-dimensional all-waveform features are extracted from a rolling bearing vibration signal with four different operating states, (2) the corresponding MIVAR values of all 70-dimensional features are calculated to rank all features, (3) 14 groups of 10-dimensional features are separately generated according to the ranking results and the principal component analysis (PCA) algorithm and a back propagation (BP) network is constructed, and (4) the validity of the ranking result is proven by training this BP network with these seven groups of 10-dimensional features and by comparing the corresponding recognition rates. The results prove that the features with larger MIVAR value can lead to higher recognition rates.
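    The mean-impact idea behind MIVAR can be sketched by perturbing one feature at a time by ±10% and averaging the change in the trained model's output; the linear stand-in model below is an assumption used purely for illustration (MIVAR additionally considers the variance of these impacts, which is omitted here).

```python
def mean_impact(model, samples, j, delta=0.1):
    """Mean impact of feature j: average output shift when feature j is
    scaled by (1 +/- delta), MIV-style."""
    impacts = []
    for x in samples:
        up = list(x); up[j] *= 1 + delta
        down = list(x); down[j] *= 1 - delta
        impacts.append(model(up) - model(down))
    return sum(impacts) / len(impacts)

# Stand-in "trained model": output depends strongly on feature 0,
# weakly on feature 1, and not at all on feature 2.
model = lambda x: 5.0 * x[0] + 0.5 * x[1]
samples = [[1.0, 1.0, 1.0], [2.0, 1.5, 0.5], [0.5, 2.0, 2.0]]
ranking = sorted(range(3), key=lambda j: abs(mean_impact(model, samples, j)),
                 reverse=True)
```

In the paper the model is the trained ANN, and the ranking drives which 10-dimensional feature groups are formed.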

  3. Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets

    Science.gov (United States)

    Aalaei, Shokoufeh; Shahraki, Hadi; Rowhanimanesh, Alireza; Eslami, Saeid

    2016-01-01

    Objective(s): This study addresses feature selection for breast cancer diagnosis. The presented process uses a wrapper approach with GA-based feature selection and a PS-classifier. The results of the experiment show that the proposed model is comparable to the other models on the Wisconsin breast cancer datasets. Materials and Methods: To evaluate the effectiveness of the proposed feature selection method, we employed three different classifiers, an artificial neural network (ANN), a PS-classifier and a genetic algorithm based classifier (GA-classifier), on the Wisconsin breast cancer datasets, which include the Wisconsin breast cancer dataset (WBC), Wisconsin diagnosis breast cancer (WDBC), and Wisconsin prognosis breast cancer (WPBC). Results: For the WBC dataset, it is observed that feature selection improved the accuracy of all classifiers except the ANN, and the best accuracy with feature selection was achieved by the PS-classifier. For WDBC and WPBC, the results show feature selection improved the accuracy of all three classifiers and the best accuracy with feature selection was achieved by the ANN. Specificity and sensitivity also improved after feature selection. Conclusion: The results show that feature selection can improve the accuracy, specificity and sensitivity of classifiers. The result of this study is comparable with the other studies on the Wisconsin breast cancer datasets. PMID:27403253

  4. Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets

    Directory of Open Access Journals (Sweden)

    Shokoufeh Aalaei

    2016-05-01

    Full Text Available Objective(s): This study addresses feature selection for breast cancer diagnosis. The presented process uses a wrapper approach with GA-based feature selection and a PS-classifier. The results of the experiment show that the proposed model is comparable to the other models on the Wisconsin breast cancer datasets. Materials and Methods: To evaluate the effectiveness of the proposed feature selection method, we employed three different classifiers, an artificial neural network (ANN), a PS-classifier and a genetic algorithm based classifier (GA-classifier), on the Wisconsin breast cancer datasets, which include the Wisconsin breast cancer dataset (WBC), Wisconsin diagnosis breast cancer (WDBC), and Wisconsin prognosis breast cancer (WPBC). Results: For the WBC dataset, it is observed that feature selection improved the accuracy of all classifiers except the ANN, and the best accuracy with feature selection was achieved by the PS-classifier. For WDBC and WPBC, the results show feature selection improved the accuracy of all three classifiers and the best accuracy with feature selection was achieved by the ANN. Specificity and sensitivity also improved after feature selection. Conclusion: The results show that feature selection can improve the accuracy, specificity and sensitivity of classifiers. The result of this study is comparable with the other studies on the Wisconsin breast cancer datasets.

  5. Efficient feature selection using a hybrid algorithm for the task of epileptic seizure detection

    Science.gov (United States)

    Lai, Kee Huong; Zainuddin, Zarita; Ong, Pauline

    2014-07-01

    Feature selection is a very important aspect in the field of machine learning. It entails the search for an optimal subset from a very large data set with a high dimensional feature space. Apart from eliminating redundant features and reducing computational cost, a good selection of features also leads to higher prediction and classification accuracy. In this paper, an efficient feature selection technique is introduced for the task of epileptic seizure detection. The raw data are electroencephalography (EEG) signals. Using the discrete wavelet transform, the biomedical signals were decomposed into several sets of wavelet coefficients. To reduce the dimension of these wavelet coefficients, a feature selection method that combines the strengths of both filter and wrapper methods is proposed. Principal component analysis (PCA) is used as part of the filter method. As for the wrapper method, the evolutionary harmony search (HS) algorithm is employed. This metaheuristic method aims at finding the best discriminating set of features from the original data. The obtained features were then used as input for an automated classifier, namely wavelet neural networks (WNNs). The WNNs model was trained to perform a binary classification task, that is, to determine whether a given EEG signal was normal or epileptic. For comparison purposes, different sets of features were also used as input. Simulation results showed that the WNNs that used the features chosen by the hybrid algorithm achieved the highest overall classification accuracy.
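    A hybrid filter-plus-wrapper pipeline of the general shape described above can be sketched as follows; here a simple variance filter stands in for PCA and a greedy forward search stands in for harmony search, so both stages and the toy evaluation function are illustrative substitutions.

```python
def variance(col):
    m = sum(col) / len(col)
    return sum((v - m) ** 2 for v in col) / len(col)

def hybrid_select(X, y, evaluate, var_floor=0.01):
    """Hybrid filter+wrapper sketch: a cheap variance filter drops
    near-constant columns, then a greedy forward wrapper adds features
    while the evaluation score keeps improving."""
    n = len(X[0])
    candidates = [j for j in range(n)
                  if variance([row[j] for row in X]) > var_floor]  # filter
    selected, best = [], evaluate([], X, y)
    improved = True
    while improved:                                                # wrapper
        improved = False
        for j in candidates:
            if j in selected:
                continue
            score = evaluate(selected + [j], X, y)
            if score > best:
                best, selected, improved = score, selected + [j], True
    return sorted(selected)

# Toy wrapper criterion: accuracy of a sign rule on the selected columns.
def evaluate(feats, X, y):
    if not feats:
        return 0.0
    preds = [1 if sum(row[j] for j in feats) > 0 else 0 for row in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

# Feature 0 predicts the label, feature 1 is constant, feature 2 is noise.
X = [[1.0, 0.5, -1.0], [0.8, 0.5, 1.0], [-0.9, 0.5, 1.0], [-1.1, 0.5, -1.0]]
y = [1, 1, 0, 0]
selected = hybrid_select(X, y, evaluate)
```

In the paper, `evaluate` would be the WNN classifier's accuracy and the search stage would be harmony search rather than greedy forward selection.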

  6. Diagnosis of Hepatocellular Carcinoma Spectroscopy Based on the Feature Selection Approach of the Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Shao-qing Wang

    2013-06-01

    Full Text Available This paper aims to study the application of medical imaging technology combined with artificial intelligence technology to improve the diagnostic accuracy rate for hepatocellular carcinoma. A recognition method based on a genetic algorithm (GA) and neural networks is presented. The GA was used to select 20 optimal features from the 401 initial features. A back-propagation neural network (BP) and a probabilistic neural network (PNN) were used to classify the tested samples based on these optimized features, and the results based on the 20 optimal features were compared with those based on all 401 features. The results of the experiment show that the method can improve the recognition rate.

  7. Applications of feature selection. [development of classification algorithms for LANDSAT data

    Science.gov (United States)

    Guseman, L. F., Jr.

    1976-01-01

    The use of satellite-acquired (LANDSAT) multispectral scanner (MSS) data to conduct an inventory of some crop of economic interest such as wheat over a large geographical area is considered in relation to the development of accurate and efficient algorithms for data classification. The dimension of the measurement space and the computational load for a classification algorithm is increased by the use of multitemporal measurements. Feature selection/combination techniques used to reduce the dimensionality of the problem are described.

  8. Genetic Algorithm (GA) in Feature Selection for CRF Based Manipuri Multiword Expression (MWE) Identification

    CERN Document Server

    Nongmeikapam, Kishorjit; 10.5121/ijcsit.2011.350

    2011-01-01

    This paper deals with the identification of Multiword Expressions (MWEs) in Manipuri, a highly agglutinative Indian language. Manipuri is listed in the Eighth Schedule of the Indian Constitution. MWEs play an important role in applications of Natural Language Processing (NLP) like Machine Translation, Part of Speech tagging, Information Retrieval, Question Answering, etc. Feature selection is an important factor in the recognition of Manipuri MWEs using Conditional Random Fields (CRF). The disadvantage of manually selecting and choosing the appropriate features for running the CRF motivates us to consider Genetic Algorithms (GA). Using GA we are able to find the optimal features to run the CRF. We have tried fifty generations in feature selection along with three-fold cross-validation as the fitness function. This model demonstrated a Recall (R) of 64.08%, Precision (P) of 86.84% and F-measure (F) of 73.74%, showing an improvement over CRF-based Manipuri MWE identification without the GA application.

  9. Cost effective approach on feature selection using genetic algorithms and fuzzy logic for diabetes diagnosis

    CERN Document Server

    Ephzibah, E P

    2011-01-01

    A way to enhance the performance of a model that combines genetic algorithms and fuzzy logic for feature selection and classification is proposed. Early diagnosis of any disease at lower cost is preferable, and diabetes is one such disease. Diabetes has become the fourth leading cause of death in developed countries, and there is substantial evidence that it is reaching epidemic proportions in many developing and newly industrialized nations. In medical diagnosis, patterns consist of observable symptoms along with the results of diagnostic tests. These tests have various associated costs and risks. In the automated design of pattern classification, the proposed system solves the feature subset selection problem: the task of identifying and selecting a useful subset of pattern-representing features from a larger set of features. Using a fuzzy rule-based classification system, the proposed approach is shown to improve the classification accuracy.

  10. Hybrid Binary Imperialist Competition Algorithm and Tabu Search Approach for Feature Selection Using Gene Expression Data

    Science.gov (United States)

    Aorigele; Zeng, Weiming; Hong, Xiaomin

    2016-01-01

    Gene expression data composed of thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features from a large number of gene expression data. Lately, many researchers have devoted themselves to feature selection using diverse computational intelligence methods. However, in the process of selecting informative genes, many computational methods face difficulties in selecting small subsets for cancer classification due to the huge number of genes (high dimension) compared to the small number of samples, noisy genes, and irrelevant genes. In this paper, we propose a new hybrid algorithm, HICATS, incorporating the imperialist competition algorithm (ICA), which performs a global search, and tabu search (TS), which conducts a fine-tuned local search. In order to verify the performance of the proposed algorithm HICATS, we tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The performance of our proposed method proved to be superior to that of other related works, including the conventional version of the binary optimization algorithm, in terms of classification accuracy and the number of selected genes. PMID:27579323

  11. Optimal Feature Space Selection in Detecting Epileptic Seizure based on Recurrent Quantification Analysis and Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Saleh Lashkari

    2016-06-01

    Full Text Available Selecting optimal features based on the nature of the phenomenon and on high discriminant ability is very important in data classification problems. Since Recurrence Quantification Analysis (RQA) does not require any assumption about the stationarity or the size of the signal and the noise, it may be useful for epileptic seizure detection. In this study, RQA was used to discriminate ictal EEG from normal EEG, where the optimal features were selected by a combination of a genetic algorithm and a Bayesian classifier. Recurrence plots of one hundred samples in each of the two categories were obtained with five distance norms in this study: Euclidean, Maximum, Minimum, Normalized and Fixed Norm. In order to choose the optimal threshold for each norm, ten thresholds of ε were generated, and then the best feature space was selected by the genetic algorithm in combination with the Bayesian classifier. The results show that the proposed method is capable of discriminating the ictal EEG from the normal EEG, where for the Minimum norm and 0.1 < ε < 1 the accuracy was 100%. In addition, the sensitivity of the proposed framework to the ε and distance norm parameters was low. The optimal feature presented in this study is Trans, which was selected in most feature spaces with high accuracy.

  12. DWFS: A Wrapper Feature Selection Tool Based on a Parallel Genetic Algorithm

    KAUST Repository

    Soufan, Othman

    2015-02-26

    Many scientific problems can be formulated as classification tasks. Data that harbor relevant information are usually described by a large number of features. Frequently, many of these features are irrelevant for the class prediction. The efficient implementation of classification models requires identification of suitable combinations of features. The smaller number of features reduces the problem's dimensionality and may result in higher classification performance. We developed DWFS, a web-based tool that allows for efficient selection of features for a variety of problems. DWFS follows the wrapper paradigm and applies a search strategy based on Genetic Algorithms (GAs). A parallel GA implementation simultaneously examines and evaluates a large number of candidate collections of features. DWFS also integrates various filtering methods that may be applied as a pre-processing step in the feature selection process. Furthermore, weights and parameters in the fitness function of the GA can be adjusted according to the application requirements. Experiments using heterogeneous datasets from different biomedical applications demonstrate that DWFS is fast and leads to a significant reduction of the number of features without sacrificing performance as compared to several widely used existing methods. DWFS can be accessed online at www.cbrc.kaust.edu.sa/dwfs.

  13. An Empirical Study of Wrappers for Feature Subset Selection based on a Parallel Genetic Algorithm: The Multi-Wrapper Model

    KAUST Repository

    Soufan, Othman

    2012-09-01

    Feature selection is the first task of any learning approach applied in the major fields of biomedicine, bioinformatics, robotics, natural language processing and social networking. In the feature subset selection problem, a search methodology with a proper criterion seeks to find the best subset of features describing the data (relevance) and achieving better performance (optimality). Wrapper approaches are feature selection methods that are wrapped around a classification algorithm and use a performance measure to select the best subset of features. We analyze the proper design of the objective function for the wrapper approach and highlight an objective based on several classification algorithms. We compare the wrapper approaches to different feature selection methods based on distance- and information-based criteria. Significant improvement in performance, computational time, and selection of minimally sized feature subsets is achieved by combining different objectives for the wrapper model. In addition, considering various classification methods in the feature selection process could lead to a global solution of desirable characteristics.

  14. An application of locally linear model tree algorithm with combination of feature selection in credit scoring

    Science.gov (United States)

    Siami, Mohammad; Gholamian, Mohammad Reza; Basiri, Javad

    2014-10-01

    Nowadays, credit scoring is one of the most important topics in the banking sector. Credit scoring models have been widely used to facilitate the process of credit assessment. In this paper, an application of the locally linear model tree algorithm (LOLIMOT) was evaluated for predicting customers' credit status. The algorithm was adapted to the credit scoring domain by means of data fusion and feature selection techniques. Two real-world credit data sets - Australian and German - from the UCI machine learning database were selected to demonstrate the performance of our new classifier. The analytical results indicate that the improved LOLIMOT significantly increases the prediction accuracy.

  15. Feature Subset Selection for Hot Method Prediction using Genetic Algorithm wrapped with Support Vector Machines

    Directory of Open Access Journals (Sweden)

    S. Johnson

    2011-01-01

    Full Text Available Problem statement: All compilers have simple profiling-based heuristics to identify and predict program hot methods and to make optimization decisions. The major challenge in profile-based optimization is the problem of overhead. The aim of this work is to perform feature subset selection using a Genetic Algorithm (GA) to improve and refine the machine-learnt static hot method prediction technique and to compare the performance of the new models against the simple heuristics. Approach: The relevant features for training the predictive models are extracted from an initial set of ninety randomly selected static program features, with the help of the GA wrapped around the predictive model using the Support Vector Machine (SVM), a Machine Learning (ML) algorithm. Results: The GA-generated feature subsets, containing thirty and twenty-nine features respectively for the two predictive models, when tested on MiBench predict Long Running Hot Methods (LRHM) and Frequently Called Hot Methods (FCHM) with respective accuracies of 71% and 80%, an increase of 19% and 22%. Further, inlining of the predicted LRHM and FCHM improves program performance by 3% and 5% as against 4% and 6% with the Low Level Virtual Machine (LLVM) default heuristics. When intra-procedural optimizations (IPO) are performed on the predicted hot methods, this system offers a performance improvement of 5% and 4% as against 0% and 3% by the LLVM default heuristics on LRHM and FCHM respectively. However, we observe an improvement of 36% in certain individual programs. Conclusion: Overall, the results indicate that GA-wrapped SVM feature reduction improves hot method prediction accuracy, and that hot method prediction based optimization is potentially useful in selective optimization.

  16. Feature selection for outcome prediction in oesophageal cancer using genetic algorithm and random forest classifier.

    Science.gov (United States)

    Paul, Desbordes; Su, Ruan; Romain, Modzelewski; Sébastien, Vauclin; Pierre, Vera; Isabelle, Gardin

    2016-12-28

    The outcome prediction of patients can greatly help to personalize cancer treatment. A large number of quantitative features (clinical exams, imaging, …) are potentially useful for assessing patient outcome. The challenge is to choose the most predictive subset of features. In this paper, we propose a new feature selection strategy called GARF (genetic algorithm based on random forest), applied to features extracted from positron emission tomography (PET) images and clinical data. The most relevant features, predictive of the therapeutic response or prognostic of patient survival 3 years after the end of treatment, were selected using GARF on a cohort of 65 patients with locally advanced oesophageal cancer eligible for chemo-radiation therapy. The most relevant predictive results were obtained with a subset of 9 features, leading to a random forest misclassification rate of 18±4% and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.823±0.032. The most relevant prognostic results were obtained with 8 features, leading to an error rate of 20±7% and an AUC of 0.750±0.108. Both the predictive and prognostic results show better performance using GARF than using the 4 other studied methods.

  17. Improving Bee Algorithm Based Feature Selection in Intrusion Detection System Using Membrane Computing

    Directory of Open Access Journals (Sweden)

    Kazeem I. Rufai

    2014-03-01

    Full Text Available Despite the great benefits of computers and the internet, fraudulent and malicious individuals constantly attempt to compromise the integrity, confidentiality or availability of electronic information systems. In cyber-security parlance, this is termed ‘intrusion’. This has necessitated the introduction of Intrusion Detection Systems (IDS) to help detect and curb different types of attack. However, given the high volume of data traffic in a network system, the effects of redundant and irrelevant data must be minimized if a high-quality intrusion detection mechanism is desired. Several approaches, especially feature subset selection using the Bee Algorithm (BA), Linear Genetic Programming (LGP), Support Vector Decision Function Ranking (SVDF), Rough, Rough-DPSO, and Multivariate Regression Splines (MARS), have been advanced in the past to measure the dependability and quality of a typical IDS. The observed problem among these approaches is their general performance, which motivated this research work. We propose a new, robust algorithm, called the membrane algorithm, to improve the Bee Algorithm based feature subset selection technique. Membrane computing is a class of parallel computing paradigms. The data used were taken from the KDD-Cup 99 dataset, the accepted standard benchmark for intrusion detection. When the final results were compared to those of the existing approaches, using the three standard IDS measures - Attack Detection, False Alarm and Classification Accuracy Rates - it was discovered that the Bee Algorithm-Membrane Computing (BA-MC) approach is the better technique. Our approach produced a very high attack detection rate of 89.11%, a classification accuracy of 95.60%, and a reasonable decrease in false alarm rate to 0.004. A Receiver Operating Characteristic (ROC) curve was used for results

  18. Multivariate Feature Selection for Predicting Scour-Related Bridge Damage using a Genetic Algorithm

    Science.gov (United States)

    Anderson, I.

    2015-12-01

    Scour and hydraulic damage are the most common causes of bridge failure, reported to be responsible for over 60% of bridge failures nationwide. Scour is a complex process, and is likely an epistatic function of bridge and stream conditions that are both stationary and in dynamic flux. Bridge inspections, conducted regularly on bridges nationwide, rate bridge health assuming a static stream condition and typically do not account for dynamically changing geomorphological adjustments. The Vermont Agency of Natural Resources stream geomorphic assessment data could add value to current bridge inspection and scour design. The 2011 bridge damage from Tropical Storm Irene served as a case study for feature selection to improve bridge scour damage prediction in extreme events. The bridge inspections (with over 200 features on more than 300 damaged and 2,000 non-damaged bridges) and the stream geomorphic assessments (with over 300 features on more than 5,000 stream reaches) constitute "Big Data", and together have the potential to generate large numbers of combined features ("epistatic relationships") that might better predict scour-related bridge damage. The potential combined features pose significant computational challenges for traditional statistical techniques (e.g., multivariate logistic regression). This study uses a genetic algorithm to search the multivariate feature space for epistatic relationships that are indicative of bridge scour damage. The combined features identified could be used to improve bridge scour design, and to better monitor and rate bridge scour vulnerability.

  19. Feature selection for disruption prediction from scratch in JET by using genetic algorithms and probabilistic predictors

    Energy Technology Data Exchange (ETDEWEB)

    Pereira, Augusto, E-mail: augusto.pereira@ciemat.es [Laboratorio Nacional de Fusión, CIEMAT, Madrid (Spain); Vega, Jesús; Moreno, Raúl [Laboratorio Nacional de Fusión, CIEMAT, Madrid (Spain); Dormido-Canto, Sebastián [Dpto. Informática y Automática – UNED, Madrid (Spain); Rattá, Giuseppe A. [Laboratorio Nacional de Fusión, CIEMAT, Madrid (Spain); Pavón, Fernando [Dpto. Informática y Automática – UNED, Madrid (Spain)

    2015-10-15

    Recently, a probabilistic classifier has been developed at JET to be used as a predictor from scratch. It has been applied to a database of 1237 JET ITER-like wall (ILW) discharges (of which 201 disrupted) with good results: a success rate of 94% and a false alarm rate of 4.21%. A combinatorial analysis of 14 features was performed to ensure selection of the best ones in terms of success rate and false alarm rate. All possible combinations of between 2 and 7 features were tested, and 9893 different predictors were analyzed. An important drawback of this analysis was the computation time, estimated at 1731 h (∼2.4 months). Genetic algorithms (GA) are search algorithms that simulate the process of natural selection. In this article, the GA and the Venn predictors are combined with the objective not only of finding good enough features within the 14 available ones but also of reducing the computational time requirements. Five different performance metrics were evaluated as measures of the GA fitness function. The best metric was Informedness, requiring just 6 generations (168 predictors in 29.4 h).

  20. A branch-and-bound feature selection algorithm for U-shaped cost functions

    CERN Document Server

    Ris, Marcelo; Martins, David C

    2008-01-01

    This paper presents the formulation of a combinatorial optimization problem with the following characteristics: (i) the search space is the power set of a finite set structured as a Boolean lattice; (ii) the cost function forms a U-shaped curve when applied to any lattice chain. This formulation applies to feature selection in the context of pattern recognition. The known approaches to this problem are branch-and-bound algorithms and heuristics that partially explore the search space. Branch-and-bound algorithms are equivalent to a full search, while heuristics are not. This paper presents a branch-and-bound algorithm that differs from the known ones by exploiting the lattice structure and the U-shaped chain curves of the search space. The main contribution of this paper is the architecture of this algorithm, which is based on the representation and exploration of the search space through new lattice properties proven here. Several experiments, with well known public data, indicate the superiority of the proposed ...
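The setting can be sketched with a generic branch-and-bound over the subset lattice. This is not the authors' algorithm: the toy U-shaped cost (a synthetic error term that shrinks as hypothetical informative features are added, plus a size penalty) and the simple size-based lower bound are illustrative assumptions; the real algorithm exploits the lattice structure far more aggressively.

```python
INFORMATIVE = {0, 1}   # hypothetical "useful" features for the toy cost
LAM = 0.1              # weight of the subset-size penalty

def est_error(subset):
    # Toy error model: each informative feature in the subset halves
    # the estimated classification error.
    return 0.6 * 0.5 ** len(subset & INFORMATIVE)

def cost(subset):
    # U-shaped along any chain: error falls first, then the size
    # penalty dominates as more features are added.
    return est_error(subset) + LAM * len(subset)

def branch_and_bound(n_feats):
    """Depth-first search over all subsets, pruning any branch whose
    committed size penalty alone already reaches the best cost found
    (valid because the error term is nonnegative and size only grows)."""
    best = [float('inf'), frozenset()]

    def recurse(idx, chosen):
        if LAM * len(chosen) >= best[0]:
            return                        # prune: no superset can win
        if idx == n_feats:
            c = cost(chosen)
            if c < best[0]:
                best[0], best[1] = c, chosen
            return
        recurse(idx + 1, chosen | {idx})  # branch: include feature idx
        recurse(idx + 1, chosen)          # branch: exclude feature idx

    recurse(0, frozenset())
    return best[0], best[1]

best_cost, best_subset = branch_and_bound(6)
```

Along the chain ∅ ⊂ {0} ⊂ {0,1} ⊂ {0,1,2} ⊂ … the toy cost runs 0.6, 0.4, 0.35, 0.45, …, i.e. a U-shape, and the search returns its minimum at {0, 1}.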

  1. A comparative study on classification of sleep stage based on EEG signals using feature selection and classification algorithms.

    Science.gov (United States)

    Şen, Baha; Peker, Musa; Çavuşoğlu, Abdullah; Çelebi, Fatih V

    2014-03-01

    Sleep scoring is one of the most important diagnostic methods in psychiatry and neurology. Sleep staging is a time-consuming and difficult task undertaken by sleep experts. This study aims to identify a method which would classify sleep stages automatically and with a high degree of accuracy and, in this manner, assist sleep experts. This study consists of three stages: feature extraction, feature selection from EEG signals, and classification of these signals. In the feature extraction stage, 20 attribute algorithms in four categories are used, yielding 41 feature parameters. Feature selection is important in the elimination of irrelevant and redundant features; in this manner, prediction accuracy is improved and the computational overhead of classification is reduced. Effective feature selection algorithms, namely minimum redundancy maximum relevance (mRMR); fast correlation-based feature selection (FCBF); ReliefF; the t-test; and the Fisher score, are employed at the feature selection stage to select a set of features which best represent the EEG signals. The features obtained are used as input parameters for the classification algorithms. At the classification stage, five different classification algorithms (random forest (RF); feed-forward neural network (FFNN); decision tree (DT); support vector machine (SVM); and radial basis function neural network (RBF)) classify the problem. The results, obtained from the different classification algorithms, are provided so that a comparison can be made between computation times and accuracy rates. Finally, 97.03% classification accuracy is obtained using the proposed method. The results show the proposed method's potential for designing a new intelligent sleep scoring assistance system.
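Of the filter criteria listed in records like this one, the Fisher score is the simplest to sketch: each feature is ranked by the spread of its class means relative to the within-class variance. A minimal pure-Python version on a hypothetical toy dataset:

```python
def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter of the class means
    divided by the summed within-class scatter (higher means more
    discriminative)."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mu = sum(col) / len(col)                 # overall mean of feature j
        between = within = 0.0
        for c in classes:
            vals = [X[i][j] for i in range(len(X)) if y[i] == c]
            mc = sum(vals) / len(vals)           # class mean
            between += len(vals) * (mc - mu) ** 2
            within += sum((v - mc) ** 2 for v in vals)
        scores.append(between / within if within else 0.0)
    return scores

# Toy data: feature 0 separates the classes; feature 1 oscillates
# independently of the label.
X = [[0.0, 3.0], [0.2, -3.0], [1.0, 3.1], [1.2, -2.9]]
y = [0, 0, 1, 1]
scores = fisher_scores(X, y)
```

Ranking features by `scores` and keeping the top k gives a fast filter-style selection step of the kind the study compares against wrapper methods.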

  2. Comparison of Genetic Algorithm, Particle Swarm Optimization and Biogeography-based Optimization for Feature Selection to Classify Clusters of Microcalcifications

    Science.gov (United States)

    Khehra, Baljit Singh; Pharwaha, Amar Partap Singh

    2016-06-01

    Ductal carcinoma in situ (DCIS) is one type of breast cancer. Clusters of microcalcifications (MCCs) are symptoms of DCIS that are recognized by mammography. Selection of a robust feature vector is the process of selecting an optimal subset of features from a large number of available features in a given problem domain, after feature extraction and before any classification scheme. Feature selection reduces the feature space, which improves the performance of the classifier and decreases the computational burden imposed on the classifier by using many features. Selecting an optimal subset of features from a large number of available features is a difficult search problem: for n features, the total number of possible feature subsets is 2^n, so the problem belongs to the category of NP-hard problems. In this paper, an attempt is made to find the optimal subset of MCC features from all possible subsets using a genetic algorithm (GA), particle swarm optimization (PSO) and biogeography-based optimization (BBO). For simulation, a total of 380 benign and malignant MCC samples have been selected from mammogram images of the DDSM database. A total of 50 features extracted from the benign and malignant MCC samples are used in this study. In these algorithms, the fitness function is the correct classification rate of the classifier. A support vector machine is used as the classifier. From the experimental results, it is observed that the performance of the PSO-based and BBO-based algorithms in selecting an optimal subset of features for classifying MCCs as benign or malignant is better than that of the GA-based algorithm.

  3. 双向自动分支界限特征选择算法%Bidirectional Automated Branch and Bound Algorithm for Feature Selection

    Institute of Scientific and Technical Information of China (English)

    杨胜; 施鹏飞

    2005-01-01

    Feature selection is a process where a minimal feature subset is selected from an original feature set according to a certain measure. In this paper, feature relevancy is defined by an inconsistency rate. A bidirectional automated branch and bound algorithm is presented. It is a new complete search algorithm for feature selection, which performs feature deletion and feature addition in parallel, making it a good fit for feature selection.
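The inconsistency rate used here as the relevancy measure can be computed by grouping samples that agree on the selected (discrete) features and counting minority-class labels within each group. A minimal sketch on hypothetical binary data:

```python
from collections import Counter, defaultdict

def inconsistency_rate(X, y, subset):
    """Fraction of samples that match another sample on the selected
    features but carry a minority class label within that match group."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[tuple(row[j] for j in subset)].append(label)
    # For each group, every label beyond the majority one is inconsistent.
    inconsistent = sum(len(labels) - max(Counter(labels).values())
                       for labels in groups.values())
    return inconsistent / len(X)

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]
r0 = inconsistency_rate(X, y, [0])  # feature 0 determines y: rate 0.0
r1 = inconsistency_rate(X, y, [1])  # feature 1 alone leaves y ambiguous
```

A subset with inconsistency rate 0 preserves all class distinctions, so a search (such as the paper's bidirectional branch and bound) can look for the smallest subset whose rate stays below a tolerance.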

  4. Feature Selection and Classification of Electroencephalographic Signals: An Artificial Neural Network and Genetic Algorithm Based Approach.

    Science.gov (United States)

    Erguzel, Turker Tekin; Ozekes, Serhat; Tan, Oguz; Gultekin, Selahattin

    2015-10-01

    Feature selection is an important step in many pattern recognition systems aiming to overcome the so-called curse of dimensionality. In this study, an optimized classification method was tested in 147 patients with major depressive disorder (MDD) treated with repetitive transcranial magnetic stimulation (rTMS). The performance of the combination of a genetic algorithm (GA) and a back-propagation (BP) neural network (BPNN) was evaluated using 6-channel pre-rTMS electroencephalographic (EEG) patterns of theta and delta frequency bands. The GA was first used to eliminate the redundant and less discriminant features to maximize classification performance. The BPNN was then applied to test the performance of the feature subset. Finally, classification performance using the subset was evaluated using 6-fold cross-validation. Although the slow bands of the frontal electrodes are widely used to collect EEG data for patients with MDD and provide quite satisfactory classification results, the outcomes of the proposed approach indicate noticeably increased overall accuracy of 89.12% and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.904 using the reduced feature set.

  5. QSAR modeling for quinoxaline derivatives using genetic algorithm and simulated annealing based feature selection.

    Science.gov (United States)

    Ghosh, P; Bagchi, M C

    2009-01-01

    With a view to the rational design of selective quinoxaline derivatives, 2D- and 3D-QSAR models have been developed for the prediction of anti-tubercular activities. Successful implementation of a predictive QSAR model largely depends on the selection of a preferred set of molecular descriptors that can signify the chemico-biological interaction. A genetic algorithm (GA) and simulated annealing (SA) are applied as variable selection methods for model development. 2D-QSAR modeling using GA- or SA-based partial least squares (GA-PLS and SA-PLS) methods identified some important topological and electrostatic descriptors as important factors for anti-tubercular activity. A Kohonen network and a counter-propagation artificial neural network (CP-ANN), combined with the GA- and SA-based feature selection methods, have been applied to such QSAR modeling of quinoxaline compounds. Out of a variable pool of 380 molecular descriptors, predictive QSAR models are developed for the training set and validated on the test set compounds, and a comparative study of the relative effectiveness of the linear and non-linear approaches has been carried out. Further analysis using the 3D-QSAR technique identifies two models, obtained by the GA-PLS and SA-PLS methods, for anti-tubercular activity prediction. The influences of the steric and electrostatic field effects generated by the contribution plots are discussed. The results indicate that SA is a very effective variable selection approach for such 3D-QSAR modeling.

  6. Fingerprint Feature Extraction Algorithm

    Directory of Open Access Journals (Sweden)

    Mehala. G

    2014-03-01

    Full Text Available The goal of this paper is to design an efficient Fingerprint Feature Extraction (FFE) algorithm to extract fingerprint features for Automatic Fingerprint Identification Systems (AFIS). The FFE algorithm consists of two major stages: fingerprint image preprocessing and fingerprint image postprocessing. A few of the challenges presented in earlier work are consequently addressed in this paper. The proposed algorithm is able to enhance the fingerprint image and also extract true minutiae.

  7. Fingerprint Feature Extraction Algorithm

    OpenAIRE

    Mehala. G

    2014-01-01

    The goal of this paper is to design an efficient Fingerprint Feature Extraction (FFE) algorithm to extract fingerprint features for Automatic Fingerprint Identification Systems (AFIS). The FFE algorithm consists of two major stages: fingerprint image preprocessing and fingerprint image postprocessing. A few of the challenges presented in earlier work are consequently addressed in this paper. The proposed algorithm is able to enhance the fingerprint image and also extractin...

  8. Multi-Features Encoding and Selecting Based on Genetic Algorithm for Human Action Recognition from Video

    Directory of Open Access Journals (Sweden)

    Chenglong Yu

    2013-05-01

    Full Text Available In this study, we propose encoding multiple local features for recognizing human actions. The multiple local features are obtained from simple feature descriptions of human actions in video. The simple features are two important kinds, optical flow and edges, which represent human perception of video behavior. As video information descriptors, optical flow and edges are very fast to compute, have very low memory requirements, and represent motion information and shape information, respectively. Furthermore, key local multi-features are extracted and encoded by a GA in order to reduce the computational complexity of the algorithm. A multi-SVM classifier is then applied to discriminate the human actions.

  9. Online feature selection with streaming features.

    Science.gov (United States)

    Wu, Xindong; Yu, Kui; Ding, Wei; Wang, Hao; Zhu, Xingquan

    2013-05-01

    We propose a new online feature selection framework for applications with streaming features where the knowledge of the full feature space is unknown in advance. We define streaming features as features that flow in one by one over time whereas the number of training examples remains fixed. This is in contrast with traditional online learning methods that only deal with sequentially added observations, with little attention being paid to streaming features. The critical challenges for Online Streaming Feature Selection (OSFS) include 1) the continuous growth of feature volumes over time, 2) a large feature space, possibly of unknown or infinite size, and 3) the unavailability of the entire feature set before learning starts. In this paper, we present a novel Online Streaming Feature Selection method to select strongly relevant and nonredundant features on the fly. An efficient Fast-OSFS algorithm is proposed to improve feature selection performance. The proposed algorithms are evaluated extensively on high-dimensional datasets and also with a real-world case study on impact crater detection. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.
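The on-the-fly relevance and redundancy filtering can be sketched as follows. This is a toy stand-in, not Fast-OSFS: the real method uses conditional independence tests, whereas this sketch uses plain Pearson correlation with illustrative thresholds.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return cov / (sa * sb) if sa and sb else 0.0

def stream_select(feature_stream, target, rel=0.3, red=0.9):
    """Toy online selector in the spirit of streaming feature selection:
    accept an arriving feature if it is relevant to the target and not
    (nearly) redundant with a feature already kept. The thresholds
    `rel` and `red` are illustrative, not from the paper."""
    selected = {}
    for name, values in feature_stream:
        if abs(pearson(values, target)) < rel:
            continue            # irrelevant: discard on arrival
        if any(abs(pearson(values, kept)) > red
               for kept in selected.values()):
            continue            # redundant with an already-kept feature
        selected[name] = values
    return list(selected)

target = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
stream = [
    ("f1", [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]),  # relevant
    ("f2", [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]),  # duplicate of f1: redundant
    ("f3", [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]),  # uncorrelated with target
]
kept = stream_select(stream, target)
```

Each feature is examined exactly once as it arrives, so the full feature space never needs to be known in advance, matching the streaming setting described above.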

  10. Multi-Stage Feature Selection by Using Genetic Algorithms for Fault Diagnosis in Gearboxes Based on Vibration Signal

    Directory of Open Access Journals (Sweden)

    Mariela Cerrada

    2015-09-01

    Full Text Available There are growing demands for condition-based monitoring of gearboxes, and techniques to improve the reliability, effectiveness and accuracy of fault diagnosis are considered valuable contributions. Feature selection is still an important aspect of machine learning-based diagnosis in order to reach good performance in the diagnosis system. The main aim of this research is to propose a multi-stage feature selection mechanism for selecting the best set of condition parameters in the time, frequency and time-frequency domains, extracted from vibration signals for fault diagnosis purposes in gearboxes. The selection is based on genetic algorithms, proposing at each stage a new subset of the best features with respect to the classifier performance in a supervised environment. The selected features are augmented at each stage and used as input for a neural network classifier in the next step, while a new subset of feature candidates is treated by the selection process. As a result, the inherent exploration and exploitation of the genetic algorithms for finding the best solutions of the selection problem are locally focused. The approach is tested on a dataset from a real test bed with several fault classes under different running conditions of load and velocity. The model performance for diagnosis is over 98%.

  11. Identity Recognition Algorithm Using Improved Gabor Feature Selection of Gait Energy Image

    Science.gov (United States)

    Chao, LIANG; Ling-yao, JIA; Dong-cheng, SHI

    2017-01-01

    This paper describes an effective gait recognition approach based on Gabor features of the gait energy image. Kernel Fisher analysis combined with a kernel matrix is proposed to select dominant features. A nearest neighbor classifier based on whitened cosine distance is used to discriminate different gait patterns. The proposed approach is tested on the CASIA and USF gait databases. The results show that our approach outperforms other state-of-the-art gait recognition approaches in terms of recognition accuracy and robustness.

  12. A review of feature selection algorithms that treat the microarray data redundancy

    Directory of Open Access Journals (Sweden)

    Roxana Pérez Rubido

    2013-12-01

    Full Text Available In recent times, redundancy analysis in attribute selection algorithms for machine learning has become a constant. Studies have shown that prediction percentages after removing redundant attributes are better than when they are kept. Furthermore, by excluding them from the data set, the temporal complexity of the classifier is reduced because it has less data to process. The main aim of this review is to present the different evaluation criteria used to address data redundancy in DNA microarrays. The study applied analysis-synthesis, historical-logical and inductive-deductive methods. We conducted a literature review of articles published since the 1990s which contain attribute selection algorithms that take into account the dependencies between attributes. The article describes, in a general way, their steps, the criteria used in the redundancy analysis, and some of their advantages and disadvantages.

  13. 一个混合特征属性选择算法%A Mixing Algorithm for Feature Attribute Selection

    Institute of Scientific and Technical Information of China (English)

    刘明吉; 王秀峰; 饶一梅

    2000-01-01

    Feature attribute selection is a very interesting problem. With the development of Rough Set (RS) theory in recent years, many researchers and scholars have proposed attribute selection based on RS. But as the number of attributes increases, the efficiency declines rapidly. In this paper, we combine RS theory with a Genetic Algorithm (GA) and propose a mixing heuristic algorithm for attribute selection. The experimental results show that it obtains better results and higher efficiency, especially for problems with a large number of attributes.

  14. Icing Forecasting of High Voltage Transmission Line Using Weighted Least Square Support Vector Machine with Fireworks Algorithm for Feature Selection

    Directory of Open Access Journals (Sweden)

    Tiannan Ma

    2016-12-01

    Full Text Available Accurate forecasting of icing thickness has great significance for ensuring the security and stability of the power grid. In order to improve forecasting accuracy, this paper proposes an icing forecasting system based on the fireworks algorithm and a weighted least square support vector machine (W-LSSVM). The fireworks algorithm is employed to select the proper input features, with the purpose of eliminating redundant influences. The W-LSSVM model is then trained and tested on the historical dataset with the selected features. The capability of the proposed icing forecasting model and framework is tested through simulation experiments using real-world icing data from the monitoring center of the key laboratory of anti-ice disaster, Hunan, South China. The results show that the proposed W-LSSVM-FA method has higher prediction accuracy and may be a promising alternative for icing thickness forecasting.

  15. A classification system based on a new wrapper feature selection algorithm for the diagnosis of primary and secondary polycythemia.

    Science.gov (United States)

    Korfiatis, Vasileios Ch; Asvestas, Pantelis A; Delibasis, Konstantinos K; Matsopoulos, George K

    2013-12-01

    Primary and Secondary Polycythemia are diseases of the bone marrow that affect the blood's composition and prohibit patients from becoming blood donors. Since these diseases may become fatal, their early diagnosis is important. In this paper, a classification system for the diagnosis of Primary and Secondary Polycythemia is proposed. The proposed system classifies input data into three classes, Healthy, Primary Polycythemic (PP) and Secondary Polycythemic (SP), and is implemented using two separate binary classification levels. The first level performs the Healthy/non-Healthy classification and the second level the PP/SP classification. To this end, a novel wrapper feature selection algorithm, called the LM-FM algorithm, is presented in order to maximize the classifier's performance. The algorithm comprises two stages that are applied sequentially: the Local Maximization (LM) stage and the Floating Maximization (FM) stage. The LM stage finds the best possible subset of a fixed predefined size, which is then used as input for the next stage. The FM stage uses a floating-size technique to search for an even better solution by varying the initially provided subset size. The Support Vector Machine (SVM) classifier is then used for the discrimination of the data at each classification level. The proposed classification system is compared with various well-established feature selection techniques such as the Sequential Floating Forward Selection (SFFS) and the Maximum Output Information (MOI) wrapper schemes, and with standalone classification techniques such as the Multilayer Perceptron (MLP) and the SVM classifier. The proposed LM-FM feature selection algorithm combined with the SVM classifier increases the overall performance of the classification system, scoring up to 98.9% overall accuracy at the first classification level and up to 96.6% at the second classification level.
    Moreover, it provides excellent robustness regardless of the size of the input feature
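The SFFS baseline mentioned in this record can be sketched in a few lines: greedy forward additions interleaved with "floating" backward removals, accepted only when removal beats the best subset previously seen at that size. The toy score function and its feature interactions below are illustrative assumptions, not data from the paper.

```python
def sffs(features, score, k):
    """Sequential floating forward selection sketch: forward-add the best
    feature, then conditionally remove features (never the one just
    added) while removal beats the best score recorded for the
    resulting smaller size."""
    selected = []
    best_at_size = {}
    while len(selected) < k:
        # Forward step: add the single best remaining feature.
        newest = max((f for f in features if f not in selected),
                     key=lambda f: score(selected + [f]))
        selected.append(newest)
        size = len(selected)
        best_at_size[size] = max(best_at_size.get(size, float('-inf')),
                                 score(selected))
        # Floating step: drop a feature while that strictly improves on
        # the best subset already seen at the smaller size.
        while len(selected) > 2:
            candidates = [g for g in selected if g != newest]
            worst = max(candidates,
                        key=lambda g: score([h for h in selected if h != g]))
            trial = [h for h in selected if h != worst]
            if score(trial) > best_at_size.get(len(trial), float('-inf')):
                selected = trial
                best_at_size[len(trial)] = score(trial)
            else:
                break
    return selected

# Toy score: x1 and x2 are only strong together; x3 is strong alone;
# x3 and x4 are mutually redundant.
W = {'x1': 0.3, 'x2': 0.3, 'x3': 0.6, 'x4': 0.25}
def toy_score(S):
    s = sum(W[f] for f in S)
    if 'x1' in S and 'x2' in S:
        s += 0.35
    if 'x3' in S and 'x4' in S:
        s -= 0.3
    return s

chosen = sffs(['x1', 'x2', 'x3', 'x4'], toy_score, k=3)
```

On this toy score the floating step fires once (temporarily dropping `x3` in favour of the jointly strong pair `x1`, `x2`), illustrating the backtracking that distinguishes SFFS from plain greedy forward selection.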

  16. Power Quality Disturbances Feature Selection and Recognition Using Optimal Multi-Resolution Fast S-Transform and CART Algorithm

    Directory of Open Access Journals (Sweden)

    Nantian Huang

    2016-11-01

    Full Text Available In order to improve the recognition accuracy and efficiency of power quality disturbances (PQD) in microgrids, a novel PQD feature selection and recognition method based on the optimal multi-resolution fast S-transform (OMFST) and the classification and regression tree (CART) algorithm is proposed. First, the OMFST is carried out according to the frequency-domain characteristics of the disturbance signal, and 67 features are extracted by time-frequency analysis to construct the original feature set. Subsequently, the optimal feature subset is determined and sorted by Gini importance, using an embedded feature selection method based on the Gini index. Finally, the one-standard-error-rule subtree evaluation method is applied for cost-complexity pruning. After pruning, the optimal decision tree (ODT) is obtained for PQD classification. The experiments show that the new method effectively improves classification efficiency and accuracy through the feature selection step. Moreover, the ODT can be constructed automatically according to the classification ability of the features. In different noise environments, the classification accuracy of the new method is higher than that of methods based on the probabilistic neural network, extreme learning machine, and support vector machine.
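The Gini-importance ranking at the heart of such CART-style embedded selection can be illustrated with a minimal impurity-decrease computation; the toy data and split thresholds below are invented for illustration only.

```python
def gini(labels):
    """Gini impurity of a label multiset."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_gain(values, labels, threshold):
    """Impurity decrease for a binary split at `threshold` --
    the quantity CART uses to rank candidate features."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted

# Toy data: feature A separates the classes perfectly, feature B does not.
labels = [0, 0, 1, 1]
feat_a = [1.0, 2.0, 8.0, 9.0]
feat_b = [5.0, 1.0, 5.0, 1.0]
gain_a = gini_gain(feat_a, labels, threshold=5.0)  # perfect split
gain_b = gini_gain(feat_b, labels, threshold=3.0)  # uninformative split
```

A feature's Gini importance in a fitted tree is the sum of such gains over all nodes that split on it.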

  17. Evaluation of feature selection algorithms for classification in temporal lobe epilepsy based on MR images

    Science.gov (United States)

    Lai, Chunren; Guo, Shengwen; Cheng, Lina; Wang, Wensheng; Wu, Kai

    2017-02-01

    It is important to differentiate temporal lobe epilepsy (TLE) patients from healthy people and to localize the abnormal brain regions of TLE patients. Cortical features and their changes can reveal the unique anatomical patterns of brain regions in structural MR images. In this study, structural MR images were acquired from 28 normal controls (NC), 18 left TLE (LTLE), and 21 right TLE (RTLE) patients, and four types of cortical feature, namely cortical thickness (CTh), cortical surface area (CSA), gray matter volume (GMV), and mean curvature (MCu), were explored for discriminative analysis. Three feature selection methods, independent-sample t-test filtering, the sparse-constrained dimensionality reduction model (SCDRM), and support vector machine-recursive feature elimination (SVM-RFE), were investigated to extract dominant regions with significant differences among the compared groups for classification using the SVM classifier. The results showed that SVM-RFE achieved the highest performance (most classifications with more than 92% accuracy), followed by the SCDRM and the t-test. In particular, the cortical surface area and gray matter volume exhibited prominent discriminative ability, and the performance of the SVM improved significantly when the four cortical features were combined. Additionally, the dominant regions with higher classification weights were mainly located in the temporal and frontal lobes, including the inferior temporal, entorhinal cortex, fusiform, parahippocampal cortex, middle frontal, and frontal pole. This demonstrates that the cortical features provide effective information for determining the abnormal anatomical pattern, and that the proposed method has the potential to improve the clinical diagnosis of TLE.
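SVM-RFE eliminates, at each round, the feature with the smallest weight magnitude of a retrained linear model. A minimal sketch of that loop, with an ordinary least-squares fit standing in for the linear SVM (a simplifying assumption to keep the example dependency-light):

```python
import numpy as np

def rfe_rank(X, y, n_keep):
    """Recursive feature elimination: repeatedly refit a linear model and
    drop the feature with the smallest absolute weight until n_keep remain."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        w, *_ = np.linalg.lstsq(X[:, remaining], y, rcond=None)
        remaining.pop(int(np.argmin(np.abs(w))))
    return remaining

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
# Only features 1 and 4 carry signal; the rest are noise.
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=200)
kept = rfe_rank(X, y, n_keep=2)
```

In the actual method, the refit model would be a linear SVM and the elimination order would give each feature (here, each brain region) its classification weight.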

  18. A Systematic Evaluation of Feature Selection and Classification Algorithms Using Simulated and Real miRNA Sequencing Data

    Directory of Open Access Journals (Sweden)

    Sheng Yang

    2015-01-01

    Full Text Available Sequencing is widely used to discover associations between microRNAs (miRNAs) and diseases. However, the negative binomial distribution (NB) and high dimensionality of data obtained using sequencing can lead to low-power results and low reproducibility. Several statistical learning algorithms have been proposed to address sequencing data, and although evaluation of these methods is essential, such studies are relatively rare. The performance of seven feature selection (FS) algorithms, including baySeq, DESeq, edgeR, the rank sum test, lasso, particle swarm optimized decision tree, and random forest (RF), was compared by simulation under different conditions based on the difference of the mean, the dispersion parameter of the NB, and the signal-to-noise ratio. Real data were used to evaluate the performance of RF, logistic regression, and support vector machine. Based on the simulation and real data, we discuss the behaviour of the FS and classification algorithms. The Apriori algorithm identified frequent item sets (mir-133a, mir-133b, mir-183, mir-937, and mir-96) from among the deregulated miRNAs of six datasets from The Cancer Genome Atlas. Taking these findings together and considering computational memory requirements, we propose a strategy that combines edgeR and DESeq for large sample sizes.

  19. A Systematic Evaluation of Feature Selection and Classification Algorithms Using Simulated and Real miRNA Sequencing Data.

    Science.gov (United States)

    Yang, Sheng; Guo, Li; Shao, Fang; Zhao, Yang; Chen, Feng

    2015-01-01

    Sequencing is widely used to discover associations between microRNAs (miRNAs) and diseases. However, the negative binomial distribution (NB) and high dimensionality of data obtained using sequencing can lead to low-power results and low reproducibility. Several statistical learning algorithms have been proposed to address sequencing data, and although evaluation of these methods is essential, such studies are relatively rare. The performance of seven feature selection (FS) algorithms, including baySeq, DESeq, edgeR, the rank sum test, lasso, particle swarm optimized decision tree, and random forest (RF), was compared by simulation under different conditions based on the difference of the mean, the dispersion parameter of the NB, and the signal-to-noise ratio. Real data were used to evaluate the performance of RF, logistic regression, and support vector machine. Based on the simulation and real data, we discuss the behaviour of the FS and classification algorithms. The Apriori algorithm identified frequent item sets (mir-133a, mir-133b, mir-183, mir-937, and mir-96) from among the deregulated miRNAs of six datasets from The Cancer Genome Atlas. Taking these findings together and considering computational memory requirements, we propose a strategy that combines edgeR and DESeq for large sample sizes.

  20. Feature Selection Method Based on Artificial Bee Colony Algorithm and Support Vector Machines for Medical Datasets Classification

    Directory of Open Access Journals (Sweden)

    Mustafa Serter Uzer

    2013-01-01

    Full Text Available This paper offers a hybrid approach that uses the artificial bee colony (ABC) algorithm for feature selection and support vector machines (SVM) for classification. The purpose is to test the effect of eliminating unimportant and obsolete features of the datasets on classification success using the SVM classifier. The approach is applied to the diagnostics of liver diseases and diabetes, which are commonly observed and reduce quality of life. For the diagnosis of these diseases, the hepatitis, liver disorders, and diabetes datasets from the UCI database were used, and the proposed system reached classification accuracies of 94.92%, 74.81%, and 79.29%, respectively, obtained using 10-fold cross-validation. The results show that the performance of the method is highly successful compared with other reported results and seems very promising for pattern recognition applications.

  1. Building an intrusion detection system using a filter-based feature selection algorithm

    NARCIS (Netherlands)

    Ambusaidi, Mohammed A.; He, Xiangjian; Nanda, Priyadarsi; Tan, Zhiyuan

    2016-01-01

    Redundant and irrelevant features in data have caused a long-term problem in network traffic classification. These features not only slow down the process of classification but also prevent a classifier from making accurate decisions, especially when coping with big data. In this paper, we propose a

  2. Gesture Recognition from Data Streams of Human Motion Sensor Using Accelerated PSO Swarm Search Feature Selection Algorithm

    Directory of Open Access Journals (Sweden)

    Simon Fong

    2015-01-01

    Full Text Available Human motion sensing technology is gaining tremendous popularity with practical applications such as video surveillance for security, hand signing, smart homes, and gaming. These applications capture human motions in real time from video sensors; the resulting data patterns are nonstationary and ever-changing. While the hardware technology of such motion sensing devices and their data collection processes have become relatively mature, the computational challenge lies in the real-time analysis of these live feeds. In this paper we argue that traditional data mining methods fall short of accurately analyzing human activity patterns from the sensor data stream. The shortcoming is due to algorithmic designs that are not adaptive to the dynamic changes in gesture motions. The successor of these algorithms, known as data stream mining, is evaluated against traditional data mining through a case study of gesture recognition over motion data using Microsoft Kinect sensors. Three different subjects were asked to read three comic strips and to tell the stories in front of the sensor. The data stream contains coordinates of articulation points and various positions of the parts of the human body corresponding to the actions that the user performs. In particular, a novel feature selection technique using swarm search and accelerated PSO is proposed to enable fast preprocessing for inducing an improved classification model in real time. Superior results are shown in the experiment run on this empirical data stream. The contribution of this paper is a comparative study of traditional and data stream mining algorithms and the incorporation of the novel feature selection technique in a scenario where different gesture patterns are to be recognized from streaming sensor data.
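A binary PSO feature search of the kind used here can be sketched as follows. This is a plain binary PSO, not the accelerated variant from the paper; the inertia and acceleration constants and the toy fitness function are illustrative assumptions.

```python
import math
import random

def bpso_select(n_features, fitness, n_particles=12, iters=40, seed=1):
    """Binary PSO: each particle is a 0/1 mask over features; velocities
    are squashed through a sigmoid into bit-set probabilities."""
    rnd = random.Random(seed)
    pos = [[rnd.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # personal bests
    gbest = max(pbest, key=fitness)[:]             # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rnd.random(), rnd.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.4 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.4 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = 1 if rnd.random() < 1 / (1 + math.exp(-vel[i][d])) else 0
            if fitness(pos[i]) > fitness(pbest[i]):
                pbest[i] = pos[i][:]
        gbest = max(pbest + [gbest], key=fitness)[:]
    return gbest

# Toy fitness: reward bits 0 and 3, penalise subset size.
fit = lambda mask: 2 * (mask[0] + mask[3]) - sum(mask)
best = bpso_select(6, fit)
```

In the real system, `fitness` would be the accuracy of a classifier trained on the masked feature subset, which is what makes the search a wrapper.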

  3. Feature Selection and Effective Classifiers.

    Science.gov (United States)

    Deogun, Jitender S.; Choubey, Suresh K.; Raghavan, Vijay V.; Sever, Hayri

    1998-01-01

    Develops and analyzes four algorithms for feature selection in the context of rough set methodology. Experimental results confirm the expected relationship between the time complexity of these algorithms and the classification accuracy of the resulting upper classifiers. When compared, results of upper classifiers perform better than lower…

  4. Unsupervised Feature Subset Selection

    DEFF Research Database (Denmark)

    Søndberg-Madsen, Nicolaj; Thomsen, C.; Pena, Jose

    2003-01-01

    This paper studies filter and hybrid filter-wrapper feature subset selection for unsupervised learning (data clustering). We constrain the search for the best feature subset by scoring the dependence of every feature on the rest of the features, conjecturing that these scores discriminate some...... irrelevant features. We report experimental results on artificial and real data for unsupervised learning of naive Bayes models. Both the filter and hybrid approaches perform satisfactorily....

  5. Novel Facial Features Segmentation Algorithm

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    An efficient algorithm for facial feature extraction is proposed. The facial features segmented are the two eyes, the nose, and the mouth. The algorithm is based on an improved Gabor wavelet edge detector, a morphological approach to detect the face region and facial feature regions, and an improved T-shape face mask to locate the exact position of the facial features. The experimental results show that the proposed method is robust against facial expression and illumination, and is also effective when the person is wearing glasses.

  6. A hybrid feature selection algorithm integrating an extreme learning machine for landslide susceptibility modeling of Mt. Woomyeon, South Korea

    Science.gov (United States)

    Vasu, Nikhil N.; Lee, Seung-Rae

    2016-06-01

    An ever-increasing trend of extreme rainfall events in South Korea owing to climate change is causing shallow landslides and debris flows in mountains that cover 70% of the total land area of the nation. These catastrophic, gravity-driven processes cost the government several billion KRW (South Korean Won) in losses in addition to fatalities every year. The most common type of landslide observed is the shallow landslide, which occurs at 1-3 m depth, and may mobilize into more catastrophic flow-type landslides. Hence, to predict potential landslide areas, susceptibility maps are developed in a geographical information system (GIS) environment utilizing available morphological, hydrological, geotechnical, and geological data. Landslide susceptibility models were developed using 163 landslide points and an equal number of nonlandslide points in Mt. Woomyeon, Seoul, and 23 landslide conditioning factors. However, because not all of the factors contribute to the determination of the spatial probability for landslide initiation, and a simple filter or wrapper-based approach is not efficient in identifying all of the relevant features, a feedback-loop-based hybrid algorithm was implemented in conjunction with a learning scheme called an extreme learning machine, which is based on a single-layer, feed-forward network. Validation of the constructed susceptibility model was conducted using a testing set of landslide inventory data through a prediction rate curve. The model selected 13 relevant conditioning factors out of the initial 23; and the resulting susceptibility map shows a success rate of 85% and a prediction rate of 89.45%, indicating a good performance, in contrast to the low success and prediction rate of 69.19% and 56.19%, respectively, as obtained using a wrapper technique.
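The extreme learning machine used as the learning scheme above is a single-hidden-layer network whose input weights are random and whose output weights are solved in closed form. A minimal sketch on an invented nonlinear toy task (the hidden size, seeds, and task are assumptions, not the paper's setup):

```python
import numpy as np

def elm_train(X, y, n_hidden=40, seed=0):
    """Extreme learning machine: random input weights, sigmoid hidden
    layer, output weights solved by least squares (pseudo-inverse)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # hidden activations
    beta = np.linalg.pinv(H) @ y             # closed-form output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)    # XOR-like, nonlinear labels
W, b, beta = elm_train(X[:200], y[:200])
acc = np.mean((elm_predict(X[200:], W, b, beta) > 0.5) == (y[200:] > 0.5))
```

Because only `beta` is learned, training reduces to one linear solve, which is why ELMs are attractive inside feedback-loop feature selection where the model is retrained many times.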

  7. Automatic Algorithm Selection for Complex Simulation Problems

    CERN Document Server

    Ewald, Roland

    2012-01-01

    To select the most suitable simulation algorithm for a given task is often difficult. This is due to intricate interactions between model features, implementation details, and runtime environment, which may strongly affect the overall performance. An automated selection of simulation algorithms supports users in setting up simulation experiments without demanding expert knowledge on simulation. Roland Ewald analyzes and discusses existing approaches to solve the algorithm selection problem in the context of simulation. He introduces a framework for automatic simulation algorithm selection and

  8. Partial imputation to improve predictive modelling in insurance risk classification using a hybrid positive selection algorithm and correlation-based feature selection

    CSIR Research Space (South Africa)

    Duma, M

    2013-09-01

    Full Text Available We propose a hybrid missing data imputation technique using positive selection and correlation-based feature selection for insurance data. The hybrid is used to help supervised learning methods improve their classification accuracy and resilience...

  9. Intrusion feature selection algorithm based on Particle Swarm Optimization

    Institute of Scientific and Technical Information of China (English)

    吴庆涛; 曹继邦; 郑瑞娟; 张聚伟

    2013-01-01

    To address the slow processing of intrusion detection algorithms caused by redundant information in high-dimensional intrusion detection datasets, an intrusion feature subset selection algorithm based on Particle Swarm Optimization (PSO) is proposed. By analyzing the correlations between features of network intrusion data, the PSO algorithm searches the full feature space and autonomously selects an effective feature subset, reducing the data dimensionality. Experimental results show that the algorithm effectively removes redundant features, reduces feature selection time, and improves the detection speed of the system while maintaining detection accuracy.

  10. Feature selection in bioinformatics

    Science.gov (United States)

    Wang, Lipo

    2012-06-01

    In bioinformatics, there are often a large number of input features. For example, there are millions of single nucleotide polymorphisms (SNPs) that are genetic variations which determine the difference between any two unrelated individuals. In microarrays, thousands of genes can be profiled in each test. It is important to find out which input features (e.g., SNPs or genes) are useful in classification of a certain group of people or diagnosis of a given disease. In this paper, we investigate some powerful feature selection techniques and apply them to problems in bioinformatics. We are able to identify a very small number of input features sufficient for the tasks at hand, and we demonstrate this with some real-world data.
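A common starting point for such high-dimensional selection is a univariate filter that scores each feature by class separation. A minimal sketch with a Welch-style t statistic (the toy "gene expression" data are invented):

```python
def t_score(values, labels):
    """Absolute two-sample t-like statistic: between-class separation
    divided by pooled spread -- a common univariate filter criterion."""
    g0 = [v for v, y in zip(values, labels) if y == 0]
    g1 = [v for v, y in zip(values, labels) if y == 1]
    mean = lambda xs: sum(xs) / len(xs)
    var = lambda xs: sum((x - mean(xs)) ** 2 for x in xs) / (len(xs) - 1)
    pooled = (var(g0) / len(g0) + var(g1) / len(g1)) ** 0.5
    return abs(mean(g0) - mean(g1)) / (pooled + 1e-12)

# Toy expression data: gene 0 is differential, gene 1 is noise.
labels = [0, 0, 0, 1, 1, 1]
genes = [[1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
         [2.0, 3.0, 2.5, 2.1, 2.9, 2.4]]
ranked = sorted(range(len(genes)),
                key=lambda g: t_score(genes[g], labels), reverse=True)
```

Selecting the top-ranked features (e.g., the top few hundred of a million SNPs) gives the "very small number of input features" a downstream classifier then works with.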

  11. Genetic search feature selection for affective modeling

    DEFF Research Database (Denmark)

    Martínez, Héctor P.; Yannakakis, Georgios N.

    2010-01-01

    Automatic feature selection is a critical step towards the generation of successful computational models of affect. This paper presents a genetic search-based feature selection method which is developed as a global-search algorithm for improving the accuracy of the affective models built...

  12. A new model of flavonoids affinity towards P-glycoprotein: genetic algorithm-support vector machine with features selected by a modified particle swarm optimization algorithm.

    Science.gov (United States)

    Cui, Ying; Chen, Qinggang; Li, Yaxiao; Tang, Ling

    2017-02-01

    Flavonoids exhibit a high affinity for the purified cytosolic NBD (C-terminal nucleotide-binding domain) of P-glycoprotein (P-gp). To explore the affinity of flavonoids for P-gp, quantitative structure-activity relationship (QSAR) models were developed using support vector machines (SVMs). A novel method coupling a modified particle swarm optimization algorithm with a random mutation strategy, and a genetic algorithm coupled with SVM, was proposed to simultaneously optimize the kernel parameters of the SVM and determine the subset of optimized features for the first time. Using DRAGON descriptors to represent the compounds, three subsets (training, prediction, and external validation sets) derived from the dataset were employed to investigate the QSAR. After excluding the outlier, the correlation coefficient (R(2)) of the whole training set (training and prediction) was 0.924, and the R(2) of the external validation set was 0.941. The root-mean-square error (RMSE) of the whole training set was 0.0588; the RMSE of the cross-validation of the external validation set was 0.0443. The mean Q(2) value of leave-many-out cross-validation was 0.824. Together with the results of the randomization analysis and the applicability domain, these findings indicate that the proposed model has good predictive ability and stability.

  13. Embedded Incremental Feature Selection for Reinforcement Learning

    Science.gov (United States)

    2012-05-01

    Classical reinforcement learning techniques become impractical in domains with large complex state spaces. The size of a domain’s state space is...require all the provided features. In this paper we present a feature selection algorithm for reinforcement learning called Incremental Feature

  14. Comparison of Classification Algorithms with Wrapper-Based Feature Selection for Predicting Osteoporosis Outcome Based on Genetic Factors in a Taiwanese Women Population

    Directory of Open Access Journals (Sweden)

    Hsueh-Wei Chang

    2013-01-01

    Full Text Available An essential task in a genomic analysis of a human disease is limiting the number of strongly associated genes when studying susceptibility to the disease. The goal of this study was to compare computational tools with and without feature selection for predicting osteoporosis outcome in Taiwanese women based on genetic factors such as single nucleotide polymorphisms (SNPs. To elucidate relationships between osteoporosis and SNPs in this population, three classification algorithms were applied: multilayer feedforward neural network (MFNN, naive Bayes, and logistic regression. A wrapper-based feature selection method was also used to identify a subset of major SNPs. Experimental results showed that the MFNN model with the wrapper-based approach was the best predictive model for inferring disease susceptibility based on the complex relationship between osteoporosis and SNPs in Taiwanese women. The findings suggest that patients and doctors can use the proposed tool to enhance decision making based on clinical factors such as SNP genotyping data.

  15. Feature Selection in Scientific Applications

    Energy Technology Data Exchange (ETDEWEB)

    Cantu-Paz, E; Newsam, S; Kamath, C

    2004-02-27

    Numerous applications of data mining to scientific data involve the induction of a classification model. In many cases, the collection of data is not performed with this task in mind, and therefore, the data might contain irrelevant or redundant features that affect negatively the accuracy of the induction algorithms. The size and dimensionality of typical scientific data make it difficult to use any available domain information to identify features that discriminate between the classes of interest. Similarly, exploratory data analysis techniques have limitations on the amount and dimensionality of the data that can be effectively processed. In this paper, we describe applications of efficient feature selection methods to data sets from astronomy, plasma physics, and remote sensing. We use variations of recently proposed filter methods as well as traditional wrapper approaches where practical. We discuss the importance of these applications, the general challenges of feature selection in scientific datasets, the strategies for success that were common among our diverse applications, and the lessons learned in solving these problems.

  16. Feature Selection Algorithm for Incomplete Data Based on Information Entropy

    Institute of Scientific and Technical Information of China (English)

    陈圣兵; 王晓峰

    2014-01-01

    Based on an analysis of existing information entropy measures for incomplete data, the concept of incomplete information entropy based on similarity relations (SIIE) is proposed, and several of its properties are proved. A feature selection algorithm for incomplete data is then presented: the SIIE of the incomplete data is calculated directly and taken as the feature selection criterion, and the sequential forward floating search method is employed to address correlation among features. Experiments on UCI datasets demonstrate the accuracy and efficiency of the proposed algorithm.
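An entropy criterion over data with missing entries can be sketched minimally. This computes plain Shannon entropy while skipping missing values; the `MISSING` sentinel is an assumption, and the similarity-relation machinery of the paper is deliberately omitted.

```python
import math

MISSING = None

def entropy(column):
    """Shannon entropy (bits) of a feature column, ignoring missing entries.
    Lower entropy after conditioning on a feature marks it as informative."""
    observed = [v for v in column if v is not MISSING]
    n = len(observed)
    probs = [observed.count(v) / n for v in set(observed)]
    return -sum(p * math.log2(p) for p in probs)

col = ['a', 'a', 'b', 'b', MISSING]
h = entropy(col)  # two equally likely observed values
```

A forward-floating search would then add or drop features based on how such entropy-derived scores change, rather than on classifier accuracy.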

  17. SIFT based algorithm for point feature tracking

    Directory of Open Access Journals (Sweden)

    Adrian BURLACU

    2007-12-01

    Full Text Available In this paper a tracking algorithm for SIFT features in image sequences is developed. For each point feature extracted using the SIFT algorithm, a descriptor is computed using information from its neighborhood. Point features are then tracked throughout the image sequence by minimizing the distance between descriptors. Experimental results, obtained from image sequences that capture the scaling of objects of different geometrical types, reveal the performance of the tracking algorithm.

  18. Identification of Subtype-Specific Prognostic Genes for Early-Stage Lung Adenocarcinoma and Squamous Cell Carcinoma Patients Using an Embedded Feature Selection Algorithm.

    Directory of Open Access Journals (Sweden)

    Suyan Tian

    Full Text Available The existence of fundamental differences between lung adenocarcinoma (AC) and squamous cell carcinoma (SCC) in their underlying mechanisms motivated us to postulate that specific genes might exist that are relevant to the prognosis of each histology subtype. To test this hypothesis, we previously proposed a simple Cox-regression-model-based feature selection algorithm and successfully identified some subtype-specific prognostic genes when applying this method to real-world data. In this article, we continue our effort on the identification of subtype-specific prognostic genes for AC and SCC, and propose a novel embedded feature selection method that extends the Threshold Gradient Descent Regularization (TGDR) algorithm by minimizing a corresponding negative partial likelihood function. Using real-world and simulated datasets, we show that the two proposed methods have comparable performance, whereas the new proposal is superior in terms of model parsimony. Our analysis provides some evidence for the existence of such subtype-specific prognostic genes; further investigation is warranted.

  19. Binary classification of chalcone derivatives with LDA or KNN based on their antileishmanial activity and molecular descriptors selected using the Successive Projections Algorithm feature-selection technique.

    Science.gov (United States)

    Goodarzi, Mohammad; Saeys, Wouter; de Araujo, Mario Cesar Ugulino; Galvão, Roberto Kawakami Harrop; Vander Heyden, Yvan

    2014-01-23

    Chalcones are naturally occurring aromatic ketones, which consist of an α,β-unsaturated carbonyl system joining two aryl rings. These compounds are reported to exhibit several pharmacological activities, including antiparasitic, antibacterial, antifungal, anticancer, immunomodulatory, nitric oxide inhibition and anti-inflammatory effects. In the present work, a Quantitative Structure-Activity Relationship (QSAR) study is carried out to classify chalcone derivatives with respect to their antileishmanial activity (active/inactive) on the basis of molecular descriptors. For this purpose, two techniques to select descriptors are employed, the Successive Projections Algorithm (SPA) and the Genetic Algorithm (GA). The selected descriptors are initially employed to build Linear Discriminant Analysis (LDA) models. An additional investigation is then carried out to determine whether the results can be improved by using a non-parametric classification technique (One Nearest Neighbour, 1NN). In a case study involving 100 chalcone derivatives, the 1NN models were found to provide better rates of correct classification than LDA, both in the training and test sets. The best result was achieved by a SPA-1NN model with six molecular descriptors, which provided correct classification rates of 97% and 84% for the training and test sets, respectively.
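The 1NN rule applied to the selected descriptors is simple enough to state directly: a compound receives the label of its closest training example. A minimal sketch (the toy "descriptor" values and labels are invented):

```python
def one_nn(train_X, train_y, x):
    """1-nearest-neighbour: return the label of the training point closest
    to x (squared Euclidean distance, which preserves the ordering)."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(zip(train_X, train_y), key=lambda ty: dist(ty[0], x))[1]

# Toy descriptor space: the two classes separate cleanly.
X = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
y = ['inactive', 'inactive', 'active', 'active']
label = one_nn(X, y, (4.8, 5.1))
```

Because 1NN makes no parametric assumption about class boundaries, it can outperform LDA when the active/inactive regions are not linearly separable in descriptor space, which matches the finding above.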

  20. Pap Smear Diagnosis Using a Hybrid Intelligent Scheme Focusing on Genetic Algorithm Based Feature Selection and Nearest Neighbor Classification

    DEFF Research Database (Denmark)

    Marinakis, Yannis; Dounias, Georgios; Jantzen, Jan

    2009-01-01

    The term pap-smear refers to samples of human cells stained by the so-called Papanicolaou method. The purpose of the Papanicolaou method is to diagnose pre-cancerous cell changes before they progress to invasive carcinoma. In this paper a metaheuristic algorithm is proposed in order to classify t...... other previously applied intelligent approaches....

  1. Genetic Feature Selection for Texture Classification

    Institute of Scientific and Technical Information of China (English)

    PAN Li; ZHENG Hong; ZHANG Zuxun; ZHANG Jianqing

    2004-01-01

    This paper presents a novel approach to feature subset selection using genetic algorithms. The approach has the ability to accommodate multiple criteria, such as the accuracy and cost of classification, into the process of feature selection, and finds the effective feature subset for texture classification. On the basis of the selected effective feature subset, a method is described to extract objects that are higher than their surroundings, such as trees or forest, from color aerial images. The methodology presented in this paper is illustrated by its application to the problem of tree extraction from aerial images.

  2. Genetic search feature selection for affective modeling

    DEFF Research Database (Denmark)

    Martínez, Héctor P.; Yannakakis, Georgios N.

    2010-01-01

    Automatic feature selection is a critical step towards the generation of successful computational models of affect. This paper presents a genetic search-based feature selection method which is developed as a global-search algorithm for improving the accuracy of the affective models built....... The method is tested and compared against sequential forward feature selection and random search in a dataset derived from a game survey experiment which contains bimodal input features (physiological and gameplay) and expressed pairwise preferences of affect. Results suggest that the proposed method...

  3. Adaptive link selection algorithms for distributed estimation

    Science.gov (United States)

    Xu, Songcen; de Lamare, Rodrigo C.; Poor, H. Vincent

    2015-12-01

    This paper presents adaptive link selection algorithms for distributed estimation and considers their application to wireless sensor networks and smart grids. In particular, exhaustive search-based least mean squares (LMS) / recursive least squares (RLS) link selection algorithms and sparsity-inspired LMS / RLS link selection algorithms that can exploit the topology of networks with poor-quality links are considered. The proposed link selection algorithms are then analyzed in terms of their stability, steady-state, and tracking performance and computational complexity. In comparison with the existing centralized or distributed estimation strategies, the key features of the proposed algorithms are as follows: (1) more accurate estimates and faster convergence speed can be obtained and (2) the network is equipped with the ability of link selection that can circumvent link failures and improve the estimation performance. The performance of the proposed algorithms for distributed estimation is illustrated via simulations in applications of wireless sensor networks and smart grids.
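The LMS update underlying these link selection schemes can be sketched in a few lines; the step size, filter length, and toy system below are illustrative assumptions, not the paper's network setup.

```python
import numpy as np

def lms_estimate(x, d, mu=0.05, n_taps=2):
    """Least-mean-squares adaptive filter: w tracks the unknown linear
    mapping from the input x to the desired signal d."""
    w = np.zeros(n_taps)
    for k in range(n_taps - 1, len(x)):
        u = x[k - n_taps + 1:k + 1][::-1]   # most recent samples first
        e = d[k] - w @ u                    # a-priori estimation error
        w = w + mu * e * u                  # stochastic-gradient update
    return w

rng = np.random.default_rng(42)
x = rng.normal(size=2000)
true_w = np.array([0.8, -0.3])              # unknown system to identify
d = np.convolve(x, true_w)[:len(x)] + 0.01 * rng.normal(size=len(x))
w_hat = lms_estimate(x, d)
```

In the distributed setting, each node runs such an update and the link selection layer decides which neighbours' estimates to combine, skipping poor-quality links.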

  4. New feature selection algorithm based on two-phase filter

    Institute of Scientific and Technical Information of China (English)

    计智伟; 胡珉

    2011-01-01

    Feature selection is an important problem in the pattern recognition and machine learning areas. To address the shortcomings of existing Filter, Wrapper, and traditional two-phase combined methods, a Feature Selection algorithm based on a Two-Phase Filter (FSTPF) is proposed and tested on three internationally recognized datasets and a real-time shield tunneling construction dataset. The experimental results show that FSTPF achieves good dimensionality reduction and improves the classification accuracy of the resulting optimal feature subset.

  5. Feature selection algorithm for diagnostic model of solitary pulmonary nodules

    Institute of Scientific and Technical Information of China (English)

    王晋; 张小龙; 赵涓涓

    2014-01-01

    How to choose an appropriate feature subset is a key unsolved problem in diagnostic models of solitary pulmonary nodules. To improve the efficiency and accuracy of benign/malignant pulmonary nodule diagnosis, a hybrid-model feature subset selection algorithm based on joint mutual information is proposed, which combines the advantages of the filter and wrapper models. First, a filter method is applied to obtain a candidate subset of features highly correlated with the diagnosis. Then, a wrapper method analyzes the redundancy between features of the candidate subset. Finally, an optimal feature subset for solitary pulmonary nodules is obtained. Compared with other filter or hybrid feature selection algorithms based on mutual information, the proposed method performs better not only in the size of the feature subset but also in the sensitivity, specificity, and average classification accuracy of solitary pulmonary nodule diagnosis.
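The filter-then-wrapper scheme this record describes can be sketched in a few lines. This is only an illustrative reading of the general idea, not the authors' implementation: the mutual-information filter, the leave-one-out 1-NN evaluator, the toy data, and all function names below are invented for the sketch.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X; Y) in bits for two discrete sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def loo_1nn_acc(feats, X, y):
    """Leave-one-out 1-NN accuracy with Hamming distance on the chosen features."""
    correct = 0
    for i in range(len(y)):
        dists = [(sum(X[j][i] != X[j][t] for j in feats), y[t])
                 for t in range(len(y)) if t != i]
        correct += min(dists)[1] == y[i]
    return correct / len(y)

def filter_then_wrap(X, y, k_filter, classifier_acc):
    """Filter stage: keep the k features with highest MI to the label.
    Wrapper stage: greedy forward selection scored by the classifier."""
    ranked = sorted(range(len(X)), key=lambda j: mutual_information(X[j], y),
                    reverse=True)
    candidates, selected, best_acc = ranked[:k_filter], [], 0.0
    improved = True
    while improved:
        improved = False
        for j in [c for c in candidates if c not in selected]:
            acc = classifier_acc(selected + [j], X, y)
            if acc > best_acc:
                best_acc, best_j, improved = acc, j, True
        if improved:
            selected.append(best_j)
    return selected, best_acc

# Toy data: feature 0 equals the label, feature 1 is weakly related,
# feature 2 is constant. The filter keeps the top two; the wrapper then
# finds that feature 0 alone suffices.
X = [[0, 0, 1, 1, 0, 1], [0, 1, 0, 1, 0, 1], [1, 1, 1, 1, 1, 1]]
y = [0, 0, 1, 1, 0, 1]
selected, acc = filter_then_wrap(X, y, k_filter=2, classifier_acc=loo_1nn_acc)
```

The wrapper stops as soon as adding any remaining candidate no longer improves the leave-one-out accuracy, which is the usual greedy forward-selection stopping rule.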

  6. Prominent feature selection of microarray data

    Institute of Scientific and Technical Information of China (English)

    Yihui Liu

    2009-01-01

    For the wavelet transform, a set of orthogonal wavelet bases is used to detect the localized changing features contained in microarray data. In this research, we investigate the performance of selected wavelet features based on wavelet detail coefficients at the second and third levels. A genetic algorithm is performed over the wavelet detail coefficients to select the best discriminant features. Experiments are carried out on four microarray datasets to evaluate classification performance. Experimental results prove that wavelet features optimized from detail coefficients efficiently characterize the differences between normal tissues and cancer tissues.

  7. Classification Using Markov Blanket for Feature Selection

    DEFF Research Database (Denmark)

    Zeng, Yifeng; Luo, Jian

    2009-01-01

    Selecting relevant features is in demand when a large data set is of interest in a classification task. It produces a tractable number of features that are sufficient and possibly improve the classification performance. This paper studies a statistical method, the Markov blanket induction algorithm, for filtering features and then applies a classifier using the Markov blanket predictors. The Markov blanket contains a minimal subset of relevant features that yields optimal classification performance. We experimentally demonstrate the improved performance of several classifiers using Markov blanket induction as a feature selection method. In addition, we point out an important assumption behind the Markov blanket induction algorithm and show its effect on the classification performance.

  8. The Short-Term Power Load Forecasting Based on Sperm Whale Algorithm and Wavelet Least Square Support Vector Machine with DWT-IR for Feature Selection

    Directory of Open Access Journals (Sweden)

    Jin-peng Liu

    2017-07-01

    Short-term power load forecasting is an important basis for the operation of an integrated energy system, and forecasting accuracy directly affects the economy of system operation. To improve forecasting accuracy, this paper proposes a load forecasting system based on the wavelet least square support vector machine and the sperm whale algorithm. Firstly, discrete wavelet transform and an inconsistency rate model (DWT-IR) are used to select the optimal features, which aims to reduce the redundancy of input vectors. Secondly, the kernel function of the least square support vector machine (LSSVM) is replaced by a wavelet kernel function to improve the nonlinear mapping ability of LSSVM. Lastly, the parameters of W-LSSVM are optimized by the sperm whale algorithm, and the short-term load forecasting method W-LSSVM-SWA is established. Example verification results show that the proposed model outperforms other alternative methods and has strong effectiveness and feasibility in short-term power load forecasting.

  9. Feature selection for surface electromyography signal using cultural algorithm

    Institute of Scientific and Technical Information of China (English)

    许璇; 谢洪波; 黄虎; 杨瑞凯

    2012-01-01

    To improve the classification accuracy of a surface electromyography (sEMG)-based prosthesis, this paper proposes a new feature selection method based on the cultural algorithm (CA) and tests its classification performance with linear discriminant analysis (LDA). Surface differential electrodes were used to acquire four-channel EMG signals from muscles of the human upper limb. Ten healthy subjects participated in an experiment classifying the sEMG signals of eight hand motions. Test results show that the algorithm achieves good classification results; compared with the standard genetic algorithm (GA), it yields a smaller feature dimension and higher classification accuracy.

  10. The Importance of Feature Selection in Classification

    Directory of Open Access Journals (Sweden)

    Mrs.K. Moni Sushma Deep

    2014-01-01

    Feature selection is an important technique for classification: it reduces the dimensionality of the feature space and removes redundant, irrelevant, or noisy data. In this paper, features are selected based on ranking methods: (1) Information Gain (IG) attribute evaluation, (2) Gain Ratio (GR) attribute evaluation, and (3) Symmetrical Uncertainty (SU) attribute evaluation. The features derived from these three methods are evaluated using the supervised learning algorithms K-Nearest Neighbor and Naïve Bayes. The measures used for the classifiers are True Positive rate, False Positive rate, and Accuracy, and the algorithms are compared on these experimental results. Two datasets, Pima and Wine, were taken from the UCI Repository.
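Of the three rankers, information gain is the simplest to state: IG(X) = H(Y) − H(Y|X). A minimal sketch of IG-based ranking follows; the function names and the toy data are ours, invented for illustration, not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy H(Y) in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature, labels):
    """IG(X) = H(Y) - H(Y | X) for one discrete feature column."""
    n = len(labels)
    groups = defaultdict(list)
    for x, y in zip(feature, labels):
        groups[x].append(y)
    h_cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - h_cond

def rank_features(X, y):
    """Feature indices sorted by decreasing information gain."""
    return sorted(range(len(X)), key=lambda j: info_gain(X[j], y), reverse=True)

# Feature 1 determines the label exactly (IG = H(Y) = 1 bit);
# feature 0 is constant and carries no information (IG = 0).
X = [[0, 0, 0, 0], [0, 0, 1, 1]]
y = [0, 0, 1, 1]
order = rank_features(X, y)
```

Gain Ratio and Symmetrical Uncertainty are normalized variants of the same quantity (dividing by H(X) or by H(X) + H(Y), respectively), so they can be built on the same `entropy` helper.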

  11. ECG Signal Feature Selection for Emotion Recognition

    Directory of Open Access Journals (Sweden)

    Lichen Xun

    2013-01-01

    This paper studies the selection of ECG-based features for emotion recognition. In the feature selection process, we start from an existing feature selection algorithm and also pay special attention to some intuitive values on the ECG waveform. Using ANOVA and heuristic search, we picked out features that distinguish the two emotions joy and pleasure, and then combined this with a pathological analysis of ECG signals from the viewpoint of medical experts to discuss the logical correspondence between the ECG waveform and emotion discrimination. In experiments, the method picked out only five features and reached an accuracy of 92% in recognizing joy and pleasure.

  12. AN ADVANCED SCALE INVARIANT FEATURE TRANSFORM ALGORITHM FOR FACE RECOGNITION

    OpenAIRE

    Mohammad Mohsen Ahmadinejad; Elizabeth Sherly

    2016-01-01

    In computer vision, the scale-invariant feature transform (SIFT) algorithm is widely used to describe and detect local features in images due to its excellent performance. For face recognition, however, the implementation of SIFT is complicated by false key-points detected in irrelevant portions of the face image, such as hair style and other background details. This paper proposes a face recognition algorithm that improves recognition accuracy by selecting relevant SIFT key-points only th...

  13. Feature subset selection based on relevance

    Science.gov (United States)

    Wang, Hui; Bell, David; Murtagh, Fionn

    In this paper an axiomatic characterisation of feature subset selection is presented. Two axioms are presented: the sufficiency axiom (preservation of learning information) and the necessity axiom (minimising encoding length). The sufficiency axiom concerns the existing dataset and is derived from the following understanding: any selected feature subset should be able to describe the training dataset without losing information, i.e., it is consistent with the training dataset. The necessity axiom concerns predictability and is derived from Occam's razor, which states that the simplest among different alternatives is preferred for prediction. The two axioms are then restated in terms of relevance in a concise form: maximising both the r(X; Y) and r(Y; X) relevance. Based on the relevance characterisation, four feature subset selection algorithms are presented and analysed: one is exhaustive and the remaining three are heuristic. Experimentation is also presented and the results are encouraging. Comparison is also made with some well-known feature subset selection algorithms, in particular with the built-in feature selection mechanism in C4.5.

  14. Discriminative feature selection for visual tracking

    Science.gov (United States)

    Ma, Junkai; Luo, Haibo; Zhou, Wei; Song, Yingchao; Hui, Bin; Chang, Zheng

    2017-06-01

    Visual tracking plays an important role in computer vision tasks. The robustness of a tracking algorithm is a challenge, especially in complex scenarios such as cluttered backgrounds, illumination variation, and appearance changes. As an important component of a tracking algorithm, the appropriateness of the feature is closely related to tracking precision. In this paper, an online discriminative feature selection method is proposed to provide the tracker with the most discriminative feature. Firstly, a feature pool is built containing different kinds of image information, such as gradient, gray value, and edges; when every frame is processed during tracking, all of these features are extracted. Secondly, the features are ranked according to their discrimination between target and background, and the highest-scored feature is chosen to represent the candidate image patch. Then, after obtaining the tracking result, the target model is updated to adapt to appearance variation. Experiments show that our method is robust when compared with other state-of-the-art algorithms.

  15. Medical Image Feature, Extraction, Selection And Classification

    Directory of Open Access Journals (Sweden)

    M.VASANTHA,

    2010-06-01

    Breast cancer is the most common type of cancer found in women; it is the most frequent form of cancer, and one in 22 women in India is likely to suffer from it. This paper proposes an image classifier to classify mammogram images into normal, benign, and malignant. In total, 26 features, including histogram intensity features and GLCM features, are extracted from each mammogram image. A hybrid approach to feature selection is proposed that reduces the features by 75%. Decision tree algorithms are applied to mammography classification using these reduced features. Experimental results have been obtained for a data set of 113 images of different types taken from MIAS. This classification technique has not been attempted before, and it reveals the potential of data mining in medical treatment.

  16. A Fast Algorithm of Cartographic Sounding Selection

    Institute of Scientific and Technical Information of China (English)

    SUI Haigang; HUA Li; ZHAO Haitao; ZHANG Yongli

    2005-01-01

    An effective strategy and framework that adequately integrate automated and manual processes for fast cartographic sounding selection is presented. Important submarine topographic features are extracted for selecting important soundings, and an improved "influence circle" algorithm is introduced for sounding selection. For automatic configuration of the soundings distribution pattern, a special algorithm considering multiple factors is employed, and a semi-automatic method for resolving ambiguous conflicts is described. On the basis of these algorithms and strategies, a system named HGIS for fast cartographic sounding selection was developed and applied in the Chinese Marine Safety Administration Bureau (CMSAB). Application experiments show that the system is effective and reliable. Finally, some conclusions and future work are given.

  17. Feature dimensionality reduction for myoelectric pattern recognition: a comparison study of feature selection and feature projection methods.

    Science.gov (United States)

    Liu, Jie

    2014-12-01

    This study investigates the effect of feature dimensionality reduction strategies on the classification of surface electromyography (EMG) signals toward developing a practical myoelectric control system. Two dimensionality reduction strategies, feature selection and feature projection, were each tested on two EMG feature sets. A feature selection based myoelectric pattern recognition system was introduced to select features by eliminating the redundant features of EMG recordings instead of directly choosing a subset of EMG channels. The Markov random field (MRF) method and a forward orthogonal search algorithm were employed to evaluate the contribution of each individual feature to the classification. Our results from 15 healthy subjects indicate that, with a feature selection analysis, independent of the type of feature set, high overall accuracies can be achieved across all subjects in classifying seven different forearm motions with a small number of top-ranked original EMG features obtained from the forearm muscles (average overall classification accuracy >95% with 12 selected EMG features). Compared to various feature dimensionality reduction techniques in myoelectric pattern recognition, the proposed filter-based feature selection approach is independent of the type of classification algorithms and features, and can effectively reduce redundant information not only across different channels but also across different features within the same channel. This may enable robust EMG feature dimensionality reduction without needing to change the ongoing, practical use of classification algorithms, an important step toward clinical utility.

  18. Feature selection for portfolio optimization

    DEFF Research Database (Denmark)

    Bjerring, Thomas Trier; Ross, Omri; Weissensteiner, Alex

    2016-01-01

    Most portfolio selection rules based on the sample mean and covariance matrix perform poorly out-of-sample. Moreover, there is a growing body of evidence that such optimization rules are not able to beat simple rules of thumb, such as 1/N. Parameter uncertainty has been identified as one major reason for these findings. A strand of literature addresses this problem by improving the parameter estimation and/or by relying on more robust portfolio selection methods. Independent of the chosen portfolio selection rule, we propose using feature selection first in order to reduce the asset menu. While most of the diversification benefits are preserved, the parameter estimation problem is alleviated. We conduct out-of-sample back-tests to show that in most cases different well-established portfolio selection rules applied on the reduced asset universe are able to improve alpha relative...

  20. Facial expression feature selection method based on neighborhood rough set theory and quantum genetic algorithm

    Institute of Scientific and Technical Information of China (English)

    冯林; 李聪; 沈莉

    2013-01-01

    Facial expression feature selection is one of the hot issues in the field of facial expression recognition. A novel facial expression feature selection method, named feature selection based on neighborhood rough set theory and quantum genetic algorithm (FSNRSTQGA), is proposed. First, an evaluation criterion for the optimal expression feature subset is established based on neighborhood rough set theory and used as the fitness function. Then, combined with the quantum genetic algorithm evolutionary strategy, a facial expression feature selection approach is proposed. Simulation results on the Cohn-Kanade expression dataset illustrate that the FSNRSTQGA method is effective.

  1. A New Approach of Feature Selection for Text Categorization

    Institute of Scientific and Technical Information of China (English)

    CUI Zifeng; XU Baowen; ZHANG Weifeng; XU Junling

    2006-01-01

    This paper proposes a new approach to feature selection for text categorization based on an independence measure between features. A fundamental hypothesis widely used in probabilistic models for text categorization (TC), that occurrences of terms in documents are independent of each other, is discussed; this basic hypothesis is incomplete with respect to the independence of the feature set. From the viewpoint of feature selection, a new independence measure between features is designed, from which a feature selection algorithm is given to obtain a feature subset. The selected subset is highly relevant to the category and strongly independent between features, satisfying the basic hypothesis to the maximum degree. Compared with traditional feature selection methods in TC, which take only relevance into account, the performance of the feature subset selected by our method is superior in experiments on the benchmark 20 Newsgroups dataset.

  2. High Dimensional Data Clustering Using Fast Cluster Based Feature Selection

    Directory of Open Access Journals (Sweden)

    Karthikeyan.P

    2014-03-01

    Feature selection involves identifying a subset of the most useful features that produces results compatible with the original entire set of features. A feature selection algorithm may be evaluated from both the efficiency and effectiveness points of view. While efficiency concerns the time required to find a subset of features, effectiveness is related to the quality of that subset. Based on these criteria, a fast clustering-based feature selection algorithm (FAST) is proposed and experimentally evaluated in this paper. The FAST algorithm works in two steps. In the first step, features are divided into clusters using graph-theoretic clustering methods. In the second step, the most representative feature that is strongly related to target classes is selected from each cluster to form a subset of features. Features in different clusters are relatively independent, so the clustering-based strategy of FAST has a high probability of producing a subset of useful and independent features. To ensure the efficiency of FAST, we adopt the efficient minimum spanning tree (MST) clustering method using Kruskal's algorithm. The efficiency and effectiveness of the FAST algorithm are evaluated through an empirical study.
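The MST step of FAST can be sketched with a small Kruskal implementation. Here edge weights stand in for feature-feature dissimilarity (FAST derives them from symmetric uncertainty; the weights, the cut threshold, and the function names below are invented for illustration), and cutting heavy MST edges yields the feature clusters.

```python
from collections import defaultdict

def kruskal_mst(n, edges):
    """Minimum spanning tree by Kruskal's algorithm with union-find.
    edges: list of (weight, u, v) tuples over nodes 0..n-1."""
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a
    mst = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            mst.append((w, u, v))
    return mst

def cluster_features(n, edges, threshold):
    """FAST-style clustering: build the MST over features, drop MST edges
    heavier than the threshold, and return the connected components."""
    kept = [e for e in kruskal_mst(n, edges) if e[0] <= threshold]
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for _, u, v in kept:
        parent[find(u)] = find(v)
    comps = defaultdict(list)
    for i in range(n):
        comps[find(i)].append(i)
    return list(comps.values())

# Four features: 0-1 and 2-3 are near-duplicates (small dissimilarity),
# while the two pairs are unrelated (large dissimilarity).
edges = [(0.1, 0, 1), (0.2, 2, 3), (0.9, 1, 2), (0.95, 0, 2),
         (0.96, 1, 3), (0.97, 0, 3)]
clusters = cluster_features(4, edges, threshold=0.5)
```

FAST would then keep one representative per cluster, the feature most relevant to the target class, which is the step this sketch omits.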

  3. Detecting Local Manifold Structure for Unsupervised Feature Selection

    Institute of Scientific and Technical Information of China (English)

    FENG Ding-Cheng; CHEN Feng; XU Wen-Li

    2014-01-01

    Unsupervised feature selection is fundamental in statistical pattern recognition, and has drawn persistent attention in the past several decades. Recently, much work has shown that feature selection can be formulated as nonlinear dimensionality reduction with discrete constraints. This line of research emphasizes utilizing the manifold learning techniques, where feature selection and learning can be studied based on the manifold assumption in data distribution. Many existing feature selection methods such as Laplacian score, SPEC (spectrum decomposition of graph Laplacian), TR (trace ratio) criterion, MSFS (multi-cluster feature selection) and EVSC (eigenvalue sensitive criterion) apply the basic properties of graph Laplacian, and select the optimal feature subsets which best preserve the manifold structure defined on the graph Laplacian. In this paper, we propose a new feature selection perspective from locally linear embedding (LLE), which is another popular manifold learning method. The main difficulty of using LLE for feature selection is that its optimization involves quadratic programming and eigenvalue decomposition, both of which are continuous procedures and different from discrete feature selection. We prove that the LLE objective can be decomposed with respect to data dimensionalities in the subset selection problem, which also facilitates constructing better coordinates from data using the principal component analysis (PCA) technique. Based on these results, we propose a novel unsupervised feature selection algorithm, called locally linear selection (LLS), to select a feature subset representing the underlying data manifold. The local relationship among samples is computed from the LLE formulation, which is then used to estimate the contribution of each individual feature to the underlying manifold structure. These contributions, represented as LLS scores, are ranked and selected as the candidate solution to feature selection. We further develop a

  4. SELECTED FEATURES OF POLISH FARMERS

    Directory of Open Access Journals (Sweden)

    Grzegorz Spychalski

    2013-12-01

    The paper presents results of research carried out among farm owners in the Wielkopolskie voivodeship referring to selected features of social capital. The author identifies and estimates the impact of socio-professional factors on social capital quality and draws statistical conclusions. The result is a list of economic policy measures facilitating rural area development in this respect. The level of education, civic activity, and the tendency toward collective activity are the main determinants of social capital quality in Polish rural areas.

  5. Feature Selection Method Based on Neighborhood Relationships: Applications in EEG Signal Identification and Chinese Character Recognition

    Directory of Open Access Journals (Sweden)

    Yu-Xiang Zhao

    2016-06-01

    In this study, a new feature selection algorithm, the neighborhood-relationship feature selection (NRFS) algorithm, is proposed for identifying rat electroencephalogram signals and recognizing Chinese characters. In these two applications, dependent relationships exist among the feature vectors and their neighboring feature vectors, and the proposed NRFS algorithm is designed to exploit this property. Under the NRFS algorithm, unselected feature vectors have a high priority of being added into the feature subset if their neighboring feature vectors have been selected; conversely, selected feature vectors have a high priority of being eliminated if their neighboring feature vectors are not selected. In the experiments conducted in this study, the NRFS algorithm was compared with two other feature selection algorithms. The experimental results indicate that the NRFS algorithm can extract the crucial frequency bands for identifying rat vigilance states and the crucial character regions for recognizing Chinese characters.
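The neighborhood priority rule can be sketched as an iterative relabeling over ordered feature vectors (e.g., adjacent frequency bands). This is our toy reading of the rule, not the published algorithm: the scores, the neighbor bonus, the threshold, and the function name are all invented for illustration.

```python
def nrfs_refine(base_scores, init_mask, neighbor_bonus=0.3, threshold=0.5,
                max_iters=10):
    """NRFS-style refinement sketch: a feature's effective score is its base
    relevance plus a bonus proportional to the fraction of its (adjacent)
    neighbors that are currently selected; features are added or dropped as
    the effective score crosses the threshold."""
    n = len(base_scores)
    mask = list(init_mask)
    for _ in range(max_iters):
        changed = False
        for i in range(n):
            nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
            support = sum(mask[j] for j in nbrs) / len(nbrs)
            want = base_scores[i] + neighbor_bonus * support >= threshold
            if want != mask[i]:
                mask[i], changed = want, True
        if not changed:
            break
    return mask

# Feature 1 is weak on its own but sits between two selected neighbors,
# so it is pulled in; the isolated weak feature 3 stays out.
mask = nrfs_refine([0.6, 0.4, 0.6, 0.1], [True, False, True, False])
```

The fixed-point loop converges quickly here because each sweep only moves features whose neighborhood support has changed.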

  8. Feature Selection Algorithm Based on IMGA and MKSVM for Intrusion Detection

    Institute of Scientific and Technical Information of China (English)

    井小沛; 汪厚祥; 聂凯; 罗志伟

    2012-01-01

    The data processed by intrusion detection systems are characterized by large volume and high feature dimensionality, which reduce detection speed and efficiency. To improve the detection speed and accuracy of intrusion detection systems, feature selection is applied to intrusion detection. Firstly, an efficient feature-subset search procedure based on immune memory and a genetic algorithm (IMGA) is proposed. Then, a wrapper feature-evaluation method based on the support vector machine (SVM) is studied; to counter the reduced evaluation ability caused by unbalanced datasets, the kernel function is modified using a conformal transformation grounded in Riemannian geometry, reconstructing a modified kernel SVM (MKSVM) with better classification generalization. Simulation experiments show that the proposed feature selection algorithm not only improves the selection of important features but also has better feature selection ability on unbalanced data. Furthermore, an intrusion detection system built with this feature selection algorithm performs better than one without feature selection.

  9. An ensemble approach for feature selection of Cyber Attack Dataset

    CERN Document Server

    Singh, Shailendra

    2009-01-01

    Feature selection is an indispensable preprocessing step when mining huge datasets and can significantly improve overall system performance. In this paper we therefore focus on a hybrid approach to feature selection, which proceeds in two phases. The filter phase selects the features with the highest information gain and guides the initialization of the search process for the wrapper phase, whose output is the final feature subset. The final feature subsets are passed through a K-nearest neighbor classifier for the classification of attacks. The effectiveness of this algorithm is demonstrated on the DARPA KDDCUP99 cyber attack dataset.

  10. Effective Feature Selection for 5G IM Applications Traffic Classification

    Directory of Open Access Journals (Sweden)

    Muhammad Shafiq

    2017-01-01

    Recently, machine learning (ML) algorithms have been widely applied to Internet traffic classification. However, due to inappropriate feature selection, ML-based classifiers are prone to misclassifying Internet flows, as such traffic occupies the majority of traffic flows. To address this problem, a novel feature selection metric named weighted mutual information (WMI) is proposed. We develop a hybrid feature selection algorithm named WMI_ACC, which filters most of the features with the WMI metric and then uses a wrapper method to select features for ML classifiers with an accuracy (ACC) metric. We evaluate our approach using five ML classifiers on two traces captured from different network environments. Furthermore, we apply the Wilcoxon pairwise statistical test to the results of our proposed algorithm to find the robust features among the selected set. Experimental results show that our algorithm gives promising results in terms of classification accuracy, recall, and precision, achieving flow accuracy of 99%.

  11. NEW FEATURE SELECTION METHOD IN MACHINE FAULT DIAGNOSIS

    Institute of Scientific and Technical Information of China (English)

    Wang Xinfeng; Qiu Jing; Liu Guanjun

    2005-01-01

    To address the deficiencies of the filter and wrapper feature selection methods, a new method combining the two is proposed. First, the method filters the original features to form a feature subset that meets a classification correctness rate; then a wrapper feature selection method selects the optimal feature subset. The genetic algorithm (GA), a successful technique for solving optimization problems, is applied to the problem of optimal feature selection. In data simulation and in an experiment on bearing fault feature selection, the composite method saves several times the computing time of the wrapper method while maintaining its classification accuracy. The method thus possesses excellent optimization properties, saves selection time, and offers high accuracy and high efficiency.
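A GA wrapper of the kind this record describes can be sketched as a generic bitmask GA. This is not the authors' implementation: the fitness function below, which rewards two "relevant" features and penalizes the rest, is an invented stand-in for a real classifier accuracy, and all parameter values are arbitrary.

```python
import random

def ga_select(num_feats, fitness, pop_size=20, gens=30, p_mut=0.1, seed=1):
    """GA over feature bitmasks: tournament selection, one-point
    crossover, bit-flip mutation, and elitism."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(num_feats)]
           for _ in range(pop_size)]
    def tournament():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b
    for _ in range(gens):
        nxt = [max(pop, key=fitness)]  # elitism: carry over the best mask
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, num_feats)
            child = [b ^ (rng.random() < p_mut) for b in p1[:cut] + p2[cut:]]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

def fitness(mask):
    """Stand-in for wrapper accuracy: features 0 and 2 are 'relevant',
    every other selected feature costs 0.5."""
    return mask[0] + mask[2] - 0.5 * sum(mask[1:2] + mask[3:])

best = ga_select(6, fitness)
```

In the composite method of the record, the GA would only search over the subset that survived the filter stage, which is what keeps the wrapper's cost down.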

  12. Adaptive feature selection for hyperspectral data analysis

    Science.gov (United States)

    Korycinski, Donna; Crawford, Melba M.; Barnes, J. Wesley

    2004-02-01

    Hyperspectral data can potentially provide greatly improved capability for discrimination between many land cover types, but new methods are required to process these data and extract the required information. Data sets are extremely large, and the data are not well distributed across these high dimensional spaces. The increased number and resolution of spectral bands, many of which are highly correlated, are problematic for supervised statistical classification techniques when the number of training samples is small relative to the dimension of the input vector. Selection of the most relevant subset of features is one means of mitigating these effects. A new algorithm based on the tabu search metaheuristic optimization technique was developed to perform subset feature selection and implemented within a binary hierarchical tree framework. Results obtained using the new approach were compared to those from a common greedy selection technique and to a Fisher discriminant based feature extraction method, both of which were implemented in the same binary hierarchical tree classification scheme. The tabu search based method generally yielded higher classification accuracies with lower variability than these other methods in experiments using hyperspectral data acquired by the EO-1 Hyperion sensor over the Okavango Delta of Botswana.
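A minimal tabu search over feature masks, in the spirit of the method above. `score` is a stand-in for classification accuracy within the binary hierarchical tree; the single-bit-flip neighbourhood, tabu tenure, and aspiration rule are generic textbook choices rather than the authors' implementation.

```python
import random

def tabu_select(n_features, score, iters=30, tabu_len=3, seed=0):
    """Tabu search over 0/1 feature masks: each move flips one bit,
    recently flipped bits are tabu, and an aspiration rule admits a
    tabu move when it beats the best mask found so far."""
    rng = random.Random(seed)
    cur = tuple(rng.randint(0, 1) for _ in range(n_features))
    best, tabu = cur, []
    for _ in range(iters):
        moves = []
        for i in range(n_features):
            cand = cur[:i] + (1 - cur[i],) + cur[i + 1:]
            if i not in tabu or score(cand) > score(best):
                moves.append((score(cand), i, cand))
        if not moves:
            break
        s, i, cur = max(moves)
        tabu = (tabu + [i])[-tabu_len:]
        if s > score(best):
            best = cur
    return best
```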

  13. A Hybrid Feature Subset Selection using Metrics and Forward Selection

    Directory of Open Access Journals (Sweden)

    K. Fathima Bibi

    2015-04-01

    Full Text Available The aim of this study is to design a feature subset selection technique that speeds up the feature selection (FS) process in high dimensional datasets with reduced computational cost and high efficiency. FS has become the focus of much research in decision support system areas where data with a tremendous number of variables are analyzed. Filters and wrappers are the proposed techniques for the feature subset selection process. Filters use an association-based approach, while wrappers adopt classification algorithms to identify important features. The filter method lacks the ability to minimize generalization error, while the wrapper method demands heavy computational resources. To overcome these difficulties, a hybrid approach is proposed combining both filters and wrappers: a filter stage using a combination of ranker search methods, and a wrapper stage that improves the learning accuracy and reduces both the memory requirements and the running time. The UCI machine learning repository was chosen for the experiments. The classification accuracy resulting from our approach proves to be higher.
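The wrapper half of such a hybrid, sequential forward selection, can be sketched as follows; `evaluate` is a stand-in for the learner's cross-validated accuracy on a candidate subset.

```python
def forward_select(features, evaluate, max_k):
    """Greedy sequential forward selection: start from the empty set and
    repeatedly add the feature that most improves `evaluate(subset)`,
    stopping when no addition helps or `max_k` features are chosen."""
    selected, best_score = [], float("-inf")
    while len(selected) < max_k:
        cands = [(evaluate(selected + [f]), f)
                 for f in features if f not in selected]
        if not cands:
            break
        score, f = max(cands)
        if score <= best_score:
            break
        selected.append(f)
        best_score = score
    return selected
```

The filter stage would first rank features and hand only the top-ranked ones to this loop, which is what keeps the wrapper cost manageable.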

  14. Filter selection using genetic algorithms

    Science.gov (United States)

    Patel, Devesh

    1996-03-01

    Convolution operators act as matched filters for certain types of variations found in images and have been extensively used in the analysis of images. However, filtering through a bank of N filters generates N filtered images, consequently increasing the amount of data considerably. Moreover, not all these filters have the same discriminatory capabilities for the individual images, thus making the task of any classifier difficult. In this paper, we use genetic algorithms to select a subset of relevant filters. Genetic algorithms represent a class of adaptive search techniques where the processes are similar to natural selection of biological evolution. The steady state model (GENITOR) has been used in this paper. The reduction of filters improves the performance of the classifier (which in this paper is the multi-layer perceptron neural network) and furthermore reduces the computational requirement. In this study we use the Laws filters which were proposed for the analysis of texture images. Our aim is to recognize the different textures on the images using the reduced filter set.

  15. Ensemble feature selection algorithm based on Markov blanket and mutual information

    Institute of Scientific and Technical Information of China (English)

    姚旭; 王晓丹; 张玉玺; 权文

    2012-01-01

    To resolve the poor performance of classifiers owing to irrelevant and redundant features, a feature selection algorithm based on an approximate Markov blanket and dynamic mutual information is proposed and then applied to ensemble learning, yielding an ensemble feature selection algorithm. In the ensemble algorithm, base classifiers are trained using Bagging combined with the proposed feature selection method, and base-classifier diversity is introduced for selective ensemble. Finally, the weighted voting method is utilized to fuse the recognition results of the selected base classifiers. To validate the algorithm, experiments are carried out on public UCI data sets with a support vector machine (SVM) as the classifier, and the results are compared with a single SVM, the classical Bagging ensemble (Bagging-SVM), and the feature-Bagging ensemble (AB-SVM). Experimental results show that the proposed algorithm achieves higher classification accuracy.
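An approximate-Markov-blanket redundancy test of the kind used here can be sketched in the style of FCBF (Yu and Liu): a feature is dropped when some stronger feature is more informative about it than it is about the class. This is a simplified stand-in for the paper's dynamic-mutual-information criterion.

```python
import math
from collections import Counter

def mi(xs, ys):
    """Empirical mutual information (bits) of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def markov_blanket_filter(features, labels):
    """Rank features by MI with the class, then keep a feature only if no
    stronger kept feature g satisfies MI(g, f) >= MI(f, labels)."""
    ranked = sorted(features, key=lambda f: mi(features[f], labels),
                    reverse=True)
    kept = []
    for f in ranked:
        if all(mi(features[g], features[f]) < mi(features[f], labels)
               for g in kept):
            kept.append(f)
    return kept
```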

  16. A New Evolutionary-Incremental Framework for Feature Selection

    Directory of Open Access Journals (Sweden)

    Mohamad-Hoseyn Sigari

    2014-01-01

    Full Text Available Feature selection is an NP-hard problem from the viewpoint of algorithm design, and it is one of the main open problems in pattern recognition. In this paper, we propose a new evolutionary-incremental framework for feature selection. The proposed framework can be applied to an ordinary evolutionary algorithm (EA) such as the genetic algorithm (GA) or invasive weed optimization (IWO). This framework proposes some generic modifications to ordinary EAs to make them compatible with variable-length solutions. In this framework, the solutions of the early generations are short; the length of solutions may then be increased gradually through the generations. In addition, our evolutionary-incremental framework deploys two new operators, called the addition and deletion operators, which change the length of solutions randomly. To evaluate the proposed framework, we use it for feature selection in a face recognition application, applying our feature selection method to a robust face recognition algorithm based on the extraction of Gabor coefficients. Experimental results show that our proposed evolutionary-incremental framework can efficiently select a small number of features from the existing thousands. Comparison of the proposed methods with previous methods shows that our framework is comprehensive, robust, and well-defined for application to many EAs for feature selection.
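The addition and deletion operators for variable-length solutions can be sketched as below; solutions are lists of feature indices, and the single-feature grow/shrink behaviour is the only property taken from the description above.

```python
import random

def addition(sol, n_features, rng):
    """Lengthen a variable-length solution by one unused feature index."""
    unused = [i for i in range(n_features) if i not in sol]
    return sorted(sol + [rng.choice(unused)]) if unused else list(sol)

def deletion(sol, rng):
    """Shorten a solution by one randomly chosen feature index
    (solutions are kept non-empty)."""
    if len(sol) <= 1:
        return list(sol)
    out = list(sol)
    out.remove(rng.choice(out))
    return out
```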

  17. Rough set-based feature selection method

    Institute of Scientific and Technical Information of China (English)

    ZHAN Yanmei; ZENG Xiangyang; SUN Jincai

    2005-01-01

    A new feature selection method is proposed in this paper based on the discern matrix in rough set theory. The main idea of this method is that the most effective feature, when used for classification, can distinguish the greatest number of samples belonging to different classes. Experiments are performed using this method to select relevant features for artificial datasets and real-world datasets. Results show that the proposed selection method can correctly select all the relevant features of the artificial datasets while drastically reducing the number of features. In addition, when this method is used to select classification features for real-world underwater targets, the number of classification features after selection drops to 20% of the original feature set, and the classification accuracy increases by about 6% using the dataset after feature selection.
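The core idea, scoring a feature by how many differently-classed sample pairs it can discern, can be sketched directly from the discernibility matrix; this toy version returns only the single best feature rather than a full reduct.

```python
from itertools import combinations

def best_discerning_feature(samples, labels):
    """Count, per feature, the differently-labelled sample pairs it
    distinguishes (the discern-matrix idea) and return the index of
    the highest-scoring feature."""
    n_feat = len(samples[0])
    counts = [0] * n_feat
    for i, j in combinations(range(len(samples)), 2):
        if labels[i] != labels[j]:
            for k in range(n_feat):
                if samples[i][k] != samples[j][k]:
                    counts[k] += 1
    return max(range(n_feat), key=counts.__getitem__)
```

A full reduct would repeat this greedily, discounting pairs already discerned by chosen features.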

  18. Feature selection based on fractal dimension and multi-objective genetic algorithm

    Institute of Scientific and Technical Information of China (English)

    吴曼; 张公让; 刘恒

    2015-01-01

    In a text categorization system, the quality of the features often greatly affects the design and performance of the classifier. A feature subset selection algorithm is presented based on fractal dimension and the elitist fast non-dominated sorting genetic algorithm (NSGA-II). In the algorithm, fractal dimension is used as an evaluation mechanism, and NSGA-II treats the feature subset selection problem as a multi-objective optimization problem. To analyze the validity of the results, an SVM classifier is used to test the Fudan University Corpus. The experimental results show that this method has good performance: it can effectively remove invalid features and improve classification accuracy.

  19. Intrusion Detection Feature Selection Based on Improved Quantum Genetic Algorithm

    Institute of Scientific and Technical Information of China (English)

    刘晙; 狄文辉

    2011-01-01

    The characteristics of intrusion detection input data are analyzed, along with the problem of the high dimensionality of the data involved in detection. Given these characteristics, feature selection is treated as an optimization problem and a quantum genetic algorithm is used to select features, making full use of its global search and parallel processing capabilities to eliminate redundant attributes, reduce the scale of the problem, improve data classification quality, and speed up data processing. Experimental results on the KDD CUP 1999 data set show that, compared with the genetic algorithm and the particle swarm algorithm, this method can streamline features more effectively and improve classification quality.

  20. Feature subset selection algorithms for incomplete decision systems based on neighborhood rough sets

    Institute of Scientific and Technical Information of China (English)

    谢娟英; 李楠; 乔子芮

    2011-01-01

    To reduce the high time complexity of attribute reduction algorithms for incomplete decision systems, a forward sequential feature selection algorithm for incomplete decision systems is proposed, based on the principle that the classification ability of a decision system remains unchanged as long as its positive region is unchanged. Starting from an empty reduct, the algorithm sorts the attributes in descending order of their effect on the positive region when added to the reduct, then uses sequential forward search to add the current best feature to the reduct and so determine the optimal feature subset. The algorithm is extended, via neighborhood rough sets, to incomplete decision systems with real-valued and mixed attributes, yielding a forward sequential feature selection algorithm for such systems based on neighborhood rough sets. In addition, the fast attribute reduction algorithm for incomplete decision systems based on tolerance relations is generalized to incomplete decision systems with real-valued and mixed attributes, yielding a backward feature selection algorithm applicable to them. Theoretical analysis and experiments on data sets from the University of California Irvine machine learning repository jointly show that the proposed forward feature selection algorithm based on neighborhood rough sets effectively reduces the time complexity of feature selection for incomplete decision systems, obtaining the attribute reduction subset (i.e., the feature subset) in less time while preserving the recognition ability of the system. A drawback of the forward algorithm, however, is that the selection process may fail to proceed if the first, most important feature (attribute) cannot be selected, in which case the feature selection process cannot be completed.
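The forward scheme above — grow the reduct with whichever attribute most enlarges the positive region — can be sketched for real-valued data with a simple max-metric neighbourhood; the radius and metric are illustrative assumptions, not the paper's exact neighbourhood definition.

```python
def positive_region(samples, labels, feats, radius):
    """Indices whose neighbourhood (max metric over `feats`, threshold
    `radius`) is pure in its labels: the neighbourhood positive region."""
    def near(a, b):
        return all(abs(a[f] - b[f]) <= radius for f in feats)
    pos = []
    for i, x in enumerate(samples):
        if all(labels[j] == labels[i]
               for j, y in enumerate(samples) if near(x, y)):
            pos.append(i)
    return pos

def forward_reduct(samples, labels, radius):
    """Greedy forward selection: repeatedly add the attribute that most
    enlarges the positive region, until it stops growing."""
    n_feat = len(samples[0])
    chosen, size = [], -1
    while True:
        gains = [(len(positive_region(samples, labels, chosen + [f], radius)), f)
                 for f in range(n_feat) if f not in chosen]
        if not gains:
            break
        best_size, f = max(gains)
        if best_size <= size:
            break
        chosen.append(f)
        size = best_size
    return chosen
```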

  1. A Study on Feature Selection Techniques in Educational Data Mining

    CERN Document Server

    Ramaswami, M

    2009-01-01

    Educational data mining (EDM) is a new and growing research area in which data mining concepts are used in the educational field for the purpose of extracting useful information on the behaviour of students in the learning process. In EDM, feature selection is performed to generate subsets of candidate variables. As feature selection influences the predictive accuracy of any performance model, it is essential to study in detail the effectiveness of student performance models in connection with feature selection techniques. Accordingly, the present study not only investigates the most relevant subset of features with minimum cardinality for achieving high predictive performance, by adopting various filter-based feature selection techniques from data mining, but also evaluates the goodness of subsets with different cardinalities and the quality of six filter-based feature selection algorithms in terms of F-measure and Receiver Operating Characteristic (ROC) values, generat...

  2. Unsupervised Feature Selection for Latent Dirichlet Allocation

    Institute of Scientific and Technical Information of China (English)

    Xu Weiran; Du Gang; Chen Guang; Guo Jun; Yang Jie

    2011-01-01

    As a generative model, the Latent Dirichlet Allocation (LDA) model focuses on how to generate the data and lacks optimization of the topics' discrimination capability. This paper aims to improve that discrimination capability through unsupervised feature selection. Theoretical analysis shows that the discrimination capability of a topic is limited by the discrimination capability of its representative words. The discrimination capability of a word is approximated by the word's information gain for the topics, which is used to distinguish between "general words" and "special words" in LDA topics. We therefore add a constraint to the LDA objective function so that the "general words" occur only in "general topics" rather than in "special topics", and present a heuristic algorithm to obtain the solution. Experiments show that this method not only improves the information gain of the topics, but also makes the topics easier for humans to understand.

  3. Optimized Image Steganalysis through Feature Selection using MBEGA

    CERN Document Server

    Geetha, S

    2010-01-01

    Feature-based steganalysis, an emerging branch of information forensics, aims at identifying the presence of a covert communication by employing the statistical features of the cover and stego images as clues/evidence. Due to the large volumes of security audit data as well as the complex and dynamic properties of steganogram behaviour, optimizing the performance of steganalysers becomes an important open problem. This paper focuses on fine-tuning the performance of six promising steganalysers in this field through feature selection. We propose to employ the Markov Blanket-Embedded Genetic Algorithm (MBEGA) for the stego-sensitive feature selection process. In particular, the embedded Markov blanket based memetic operators add or delete features (or genes) from a genetic algorithm (GA) solution so as to quickly improve the solution and fine-tune the search. Empirical results suggest that MBEGA is effective and efficient in eliminating irrelevant and redundant features based on both Markov blanket and predictive pow...

  4. A New Heuristic for Feature Selection by Consistent Biclustering

    CERN Document Server

    Mucherino, Antonio

    2010-01-01

    Given a set of data, biclustering aims at finding simultaneous partitions of its samples and of the features used to represent the samples into biclusters. Consistent biclusterings make it possible to obtain correct classifications of the samples from the known classification of the features, and vice versa, and they are very useful for performing supervised classification. The problem of finding consistent biclusterings can be seen as a feature selection problem, in which the features that are not relevant for classification purposes are removed from the set of data, while the total number of features is maximized in order to preserve information. This feature selection problem can be formulated as a linear fractional 0-1 optimization problem. We propose a reformulation of this problem as a bilevel optimization problem, and we present a heuristic algorithm for an efficient solution of the reformulated problem. Computational experiments show that the presented algorithm is able to find better solutions with re...

  5. Modeling Suspicious Email Detection using Enhanced Feature Selection

    OpenAIRE

    2013-01-01

    The paper presents a suspicious email detection model which incorporates enhanced feature selection. In the paper we propose the use of feature selection strategies along with classification techniques for terrorist email detection. The presented model focuses on the evaluation of machine learning algorithms such as decision tree (ID3), logistic regression, naive Bayes (NB), and support vector machine (SVM) for detecting emails containing suspicious content. In the literature, various algo...

  6. Ensemble feature selection integrating elitist roles and quantum game model

    Institute of Scientific and Technical Information of China (English)

    Weiping Ding; Jiandong Wang; Zhijin Guan; Quan Shi

    2015-01-01

    To accelerate the selection process of feature subsets in the rough set theory (RST), an ensemble elitist roles based quantum game (EERQG) algorithm is proposed for feature selection. Firstly, the multilevel elitist roles based dynamics equilibrium strategy is established, and both immigration and emigration of elitists are able to be self-adaptive to balance between exploration and exploitation for feature selection. Secondly, the utility matrix of trust margins is introduced to the model of multilevel elitist roles to enhance various elitist roles’ performance of searching the optimal feature subsets, and the win-win utility solutions for feature selection can be attained. Meanwhile, a novel ensemble quantum game strategy is designed as an intriguing exhibiting structure to perfect the dynamics equilibrium of multilevel elitist roles. Finally, the ensemble manner of multilevel elitist roles is employed to achieve the global minimal feature subset, which will greatly improve the feasibility and effectiveness. Experiment results show the proposed EERQG algorithm has superiority compared to the existing feature selection algorithms.

  7. A Feature Subset Selection Algorithm Based on Neighborhood Rough Set for Incremental Updating Datasets

    Institute of Scientific and Technical Information of China (English)

    李楠; 谢娟英

    2011-01-01

    A feature subset selection algorithm based on neighborhood rough set theory is presented for datasets that are updated incrementally with new samples. It is well known that added samples can change the attribute reduction of a dataset. A thorough analysis is made of the changes in the positive region brought about by a newly added sample, and the selective updating of the feature subset (attribute reduction) is discussed for each case. Selectively updating the original attribute reduction of the dataset avoids unnecessary operations and reduces the complexity of the feature subset selection algorithm: only in the worst case does the whole dataset need to be re-reduced. Finally, a real example is given and analyzed to demonstrate the algorithm, showing that analyzing the new samples first and then selectively reducing the new dataset effectively avoids repeated operations and yields the optimal attribute reduction subset of the new dataset.

  8. Application of genetic algorithm-kernel partial least square as a novel non-linear feature selection method: partitioning of drug molecules.

    Science.gov (United States)

    Noorizadeh, H; Sobhan Ardakani, S; Ahmadi, T; Mortazavi, S S; Noorizadeh, M

    2013-02-01

    Genetic algorithm (GA), partial least squares (PLS), and kernel PLS (KPLS) techniques were used to investigate the correlation between immobilized liposome chromatography partitioning (log Ks) and descriptors for 65 drug compounds. The models were validated using leave-group-out cross-validation (LGO-CV). The results indicate that GA-KPLS can be used as an alternative modelling tool for quantitative structure-property relationship (QSPR) studies.

  9. Fast point cloud registration algorithm using multiscale angle features

    Science.gov (United States)

    Lu, Jun; Guo, Congling; Fang, Ying; Xia, Guihua; Wang, Wanjia; Elahi, Ahsan

    2017-05-01

    To fulfill the demands of rapid, real-time three-dimensional optical measurement, a fast point cloud registration algorithm using multiscale axis angle features is proposed. The key point is selected based on the mean value of the scalar projections, onto the normal of the estimated point, of the vectors from the estimated point to the points in its neighborhood. This method requires little computation and has good discriminating ability. A rotation-invariant feature is proposed using angle information calculated from multiscale coordinate axes. The feature descriptor of a key point is computed using the cosines of the angles between corresponding coordinate axes. With this method, the surface information around key points is obtained fully along the three axis directions and is easy to recognize. The similarity of descriptors is employed to quickly determine the initial correspondences. The rigid spatial distance invariance and a clustering selection method are used to make the correspondences more accurate and evenly distributed. Finally, the rotation matrix and translation vector are determined using singular value decomposition. Experimental results show that the proposed algorithm has high precision, fast matching speed, and good antinoise capability.

  10. Particle swarm optimization and genetic algorithm as feature selection techniques for the QSAR modeling of imidazo[1,5-a]pyrido[3,2-e]pyrazines, inhibitors of phosphodiesterase 10A.

    Science.gov (United States)

    Goodarzi, Mohammad; Saeys, Wouter; Deeb, Omar; Pieters, Sigrid; Vander Heyden, Yvan

    2013-12-01

    Quantitative structure-activity relationship (QSAR) modeling was performed for imidazo[1,5-a]pyrido[3,2-e]pyrazines, which constitute a class of phosphodiesterase 10A inhibitors. Particle swarm optimization (PSO) and genetic algorithm (GA) were used as feature selection techniques to find the most reliable molecular descriptors from a large pool. Modeling of the relationship between the selected descriptors and the pIC50 activity data was achieved by linear [multiple linear regression (MLR)] and non-linear [locally weighted regression (LWR) based on both Euclidean (E) and Mahalanobis (M) distances] methods. In addition, a stepwise MLR model was built using only a limited number of quantum chemical descriptors, selected because of their correlation with the pIC50 . The model was not found interesting. It was concluded that the LWR model, based on the Euclidean distance, applied on the descriptors selected by PSO has the best prediction ability. However, some other models behaved similarly. The root-mean-squared errors of prediction (RMSEP) for the test sets obtained by PSO/MLR, GA/MLR, PSO/LWRE, PSO/LWRM, GA/LWRE, and GA/LWRM models were 0.333, 0.394, 0.313, 0.333, 0.421, and 0.424, respectively. The PSO-selected descriptors resulted in the best prediction models, both linear and non-linear.
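Binary PSO of the Kennedy-Eberhart kind used for such descriptor selection can be sketched as follows; `fitness` stands in for a model-quality score (e.g., a cross-validated fit), and the swarm parameters are generic defaults rather than the authors' settings.

```python
import math
import random

def binary_pso(n_bits, fitness, n_particles=15, iters=40, seed=0):
    """Toy binary PSO: velocities are updated toward personal and global
    bests, squashed through a sigmoid, and used as per-bit probabilities
    of setting the bit to 1."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    gbest = max(pbest, key=fitness)[:]
    for _ in range(iters):
        for p in range(n_particles):
            for d in range(n_bits):
                r1, r2 = rng.random(), rng.random()
                vel[p][d] += 2 * r1 * (pbest[p][d] - pos[p][d]) \
                             + 2 * r2 * (gbest[d] - pos[p][d])
                vel[p][d] = max(-4.0, min(4.0, vel[p][d]))  # clamp velocity
                prob = 1.0 / (1.0 + math.exp(-vel[p][d]))
                pos[p][d] = 1 if rng.random() < prob else 0
            if fitness(pos[p]) > fitness(pbest[p]):
                pbest[p] = pos[p][:]
        gbest = max(pbest, key=fitness)[:]
    return gbest
```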

  11. Simultaneous Channel and Feature Selection of Fused EEG Features Based on Sparse Group Lasso

    Directory of Open Access Journals (Sweden)

    Jin-Jia Wang

    2015-01-01

    Full Text Available Feature extraction and classification of EEG signals are core parts of brain computer interfaces (BCIs. Due to the high dimension of the EEG feature vector, an effective feature selection algorithm has become an integral part of research studies. In this paper, we present a new method based on a wrapped Sparse Group Lasso for channel and feature selection of fused EEG signals. The high-dimensional fused features are firstly obtained, which include the power spectrum, time-domain statistics, AR model, and the wavelet coefficient features extracted from the preprocessed EEG signals. The wrapped channel and feature selection method is then applied, which uses the logistical regression model with Sparse Group Lasso penalized function. The model is fitted on the training data, and parameter estimation is obtained by modified blockwise coordinate descent and coordinate gradient descent method. The best parameters and feature subset are selected by using a 10-fold cross-validation. Finally, the test data is classified using the trained model. Compared with existing channel and feature selection methods, results show that the proposed method is more suitable, more stable, and faster for high-dimensional feature fusion. It can simultaneously achieve channel and feature selection with a lower error rate. The test accuracy on the data used from international BCI Competition IV reached 84.72%.
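At the heart of fitting the Sparse Group Lasso penalty λ1‖w‖1 + λ2 Σg‖w_g‖2 sits its proximal operator, which the blockwise coordinate descent applies group by group. A minimal sketch follows, omitting the usual √|g| group weights and the full descent loop.

```python
import math

def prox_sparse_group_lasso(v, groups, lam1, lam2):
    """Proximal operator of lam1*||w||_1 + lam2*sum_g ||w_g||_2:
    coordinatewise soft-thresholding, then groupwise shrinkage.
    (The sqrt(|g|) group weights of the original SGL are omitted.)"""
    soft = [math.copysign(max(abs(x) - lam1, 0.0), x) for x in v]
    out = [0.0] * len(v)
    for g in groups:  # each g is a list of coordinate indices
        norm = math.sqrt(sum(soft[i] ** 2 for i in g))
        scale = max(1.0 - lam2 / norm, 0.0) if norm > 0 else 0.0
        for i in g:
            out[i] = scale * soft[i]
    return out
```

Groups whose shrunken norm falls below λ2 are zeroed entirely, which is what deselects whole channels while the λ1 term deselects individual features.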

  13. Feature Selection for Speech Emotion Recognition Based on Ant Colony Optimization Algorithm

    Institute of Scientific and Technical Information of China (English)

    杨鸿章

    2013-01-01

    Emotional feature extraction is the key to accurate speech emotion recognition. Traditional methods use either a single feature or a simple combination of features: a single feature cannot fully reflect changes in speech emotion, while simple combinations introduce substantial redundancy among the features, which affects recognition accuracy. To improve the speech emotion recognition rate, a speech emotion recognition method based on the ant colony optimization algorithm is proposed. The weighted combination of the KNN classification accuracy and the dimension of the selected feature subset forms the fitness function, and the ant colony optimization algorithm, which provides good global search capability and multiple sub-optimal solutions, is used to find the optimal speech feature subset and eliminate feature redundancy; a local refinement search scheme is designed to exclude redundant features and improve the convergence rate. The performance of the method was tested on a Chinese emotional speech database and the Danish Emotional Speech database. The simulation results show that the proposed method not only eliminates redundant and useless features and reduces the feature dimension, but also improves the speech emotion recognition rate, and is therefore an effective method for speech emotion recognition.
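A toy ant-colony-style selector in the spirit of the method above: each ant samples a feature mask from per-feature pheromone levels, and the best mask found reinforces the pheromone. `fitness` stands in for the weighted accuracy/dimension objective; the simplified update rule below is an assumption, not the paper's deposit scheme.

```python
import random

def aco_select(n_features, fitness, n_ants=10, iters=30, rho=0.3, seed=0):
    """Toy ant-colony feature selection: the pheromone on each feature is
    its probability of inclusion; after every iteration it evaporates and
    is reinforced toward the best mask found so far."""
    rng = random.Random(seed)
    tau = [0.5] * n_features
    best, best_fit = None, float("-inf")
    for _ in range(iters):
        ants = [tuple(1 if rng.random() < tau[i] else 0
                      for i in range(n_features))
                for _ in range(n_ants)]
        it_best = max(ants, key=fitness)
        if fitness(it_best) > best_fit:
            best, best_fit = it_best, fitness(it_best)
        tau = [(1 - rho) * t + rho * b for t, b in zip(tau, best)]
    return best
```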

  14. [Electroencephalogram Feature Selection Based on Correlation Coefficient Analysis].

    Science.gov (United States)

    Zhou, Jinzhi; Tang, Xiaofang

    2015-08-01

    In order to improve classification accuracy with a small amount of motor imagery training data in the development of brain-computer interface (BCI) systems, we propose an analysis method that automatically selects the characteristic parameters based on correlation coefficient analysis. Using the five subjects' data of dataset IVa from the 2005 BCI Competition, we applied short-time Fourier transform (STFT) and correlation coefficient calculation to reduce the dimensionality of the raw electroencephalogram, then performed feature extraction based on common spatial patterns (CSP) and classification by linear discriminant analysis (LDA). Simulation results showed that the average classification accuracy could be improved by using the correlation coefficient feature selection method compared with not using it. Compared with a support vector machine (SVM) feature optimization algorithm, correlation coefficient analysis leads to better parameter selection and improves the classification accuracy.
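The correlation-coefficient screening step can be sketched as a plain Pearson ranking of feature columns against the labels (assuming numeric labels and non-constant columns); CSP and LDA would then operate on the retained channels.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences
    (assumes neither sequence is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def top_by_correlation(features, labels, k):
    """Rank feature columns by |r| against the labels and keep the top k."""
    ranked = sorted(features,
                    key=lambda f: abs(pearson(features[f], labels)),
                    reverse=True)
    return ranked[:k]
```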

  15. Evaluation of feature detection algorithms for structure from motion

    CSIR Research Space (South Africa)

    Govender, N

    2009-11-01

    Full Text Available such as Harris corner detectors and feature descriptors such as SIFT (Scale Invariant Feature Transform) and SURF (Speeded Up Robust Features) given a set of input images. This paper implements state-of-the art feature detection algorithms and evaluates...

  16. Features Selection for Skin Micro-Image Symptomatic Recognition

    Institute of Scientific and Technical Information of China (English)

    HU Yue-li; CAO Jia-lin; ZHAO Qian; FENG Xu

    2004-01-01

    Automatic recognition of skin micro-image symptoms is important in skin diagnosis and treatment, and feature selection serves to improve the classification performance of skin micro-image symptom recognition. This paper proposes a hybrid approach based on the support vector machine (SVM) technique and a genetic algorithm (GA) to select an optimum feature subset from the feature group extracted from the skin micro-images. An adaptive GA is introduced to maintain the convergence rate. With the proposed method, the average cross-validation accuracy is increased from 88.25% using all features to 96.92% using only the selected features provided by a classifier for classification of 5 classes of skin symptoms. The experimental results are satisfactory.

  18. Epidemic features affecting the performance of outbreak detection algorithms

    Directory of Open Access Journals (Sweden)

    Kuang Jie

    2012-06-01

    Full Text Available Abstract Background Outbreak detection algorithms play an important role in effective automated surveillance. Although many algorithms have been designed to improve the performance of outbreak detection, few published studies have examined how the epidemic features of infectious diseases impact the detection performance of algorithms. This study compared the performance of three outbreak detection algorithms stratified by epidemic features of infectious disease and examined the relationship between epidemic features and the performance of outbreak detection algorithms. Methods Exponentially weighted moving average (EWMA, cumulative sum (CUSUM and moving percentile method (MPM algorithms were applied. We inserted simulated outbreaks into notifiable infectious disease data in the China Infectious Disease Automated-alert and Response System (CIDARS, and compared the performance of the three algorithms with optimized parameters at a fixed false alarm rate of 5%, classified by epidemic features of infectious disease. Multiple linear regression was adopted to analyse the relationship of the algorithms’ sensitivity and timeliness with the epidemic features of infectious diseases. Results The MPM had better detection performance than EWMA and CUSUM across all simulated outbreaks, with or without stratification by epidemic features (incubation period, baseline counts and outbreak magnitude. The epidemic features were associated with both sensitivity and timeliness. Compared with long incubation, short incubation had lower probability (β* = −0.13, P  Conclusions The results of this study suggest that the MPM is a preferable algorithm for outbreak detection, and differences in detection performance across epidemic features should be considered in automated surveillance practice.

  19. Feature Selection for Audio Surveillance in Urban Environment

    Directory of Open Access Journals (Sweden)

    KIKTOVA Eva

    2014-05-01

    Full Text Available This paper presents the work leading to an acoustic event detection system designed to recognize two types of acoustic events (shot and breaking glass in an urban environment. For this purpose, extensive front-end processing was performed for an effective parametric representation of the input sound. MFCC features, features computed during their extraction (MELSPEC and FBANK, MPEG-7 audio descriptors, and other temporal and spectral characteristics were extracted. High-dimensional feature sets were created and then reduced by mutual-information-based selection algorithms. A hidden Markov model based classifier was applied and evaluated using the Viterbi decoding algorithm. In this way very effective feature sets were identified, and the less important features were found as well.
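
    The mutual-information ranking used above to reduce the feature sets can be illustrated with a small stand-alone sketch. The discretized feature values and class labels below are invented for the example:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    # I(X;Y) in bits from two parallel lists of discrete values.
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

# Toy: feature A tracks the event class perfectly, feature B is independent.
labels = ['shot', 'glass'] * 50
feat_a = ['hi' if l == 'shot' else 'lo' for l in labels]     # informative
feat_b = ['hi' if i % 4 < 2 else 'lo' for i in range(100)]   # uninformative
print(mutual_information(feat_a, labels), mutual_information(feat_b, labels))
```

    A selector of this kind keeps the top-scoring features (here A) and drops those carrying no information about the class (here B).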

  20. Evaluation of Feature Selection Approaches for Urdu Text Categorization

    Directory of Open Access Journals (Sweden)

    Tehseen Zia

    2015-05-01

    Full Text Available Efficient feature selection is an important phase in designing an effective text categorization system. Various feature selection methods have been proposed for selecting distinct feature sets, and it is often essential to evaluate which method is more effective for a given task and what feature-set size is an effective model selection choice. The aim of this paper is to answer these questions for designing an Urdu text categorization system. Five widely used feature selection methods were examined using six well-known classification algorithms: naive Bayes (NB), k-nearest neighbor (KNN), support vector machines (SVM) with linear, polynomial and radial basis kernels, and decision tree (i.e. J48). The study was conducted over two test collections: the EMILLE collection and a naive collection. We observed that three feature selection methods, i.e. information gain, chi-square statistics, and symmetrical uncertainty, performed uniformly in most if not all cases. Moreover, we found that no single feature selection method is best for all classifiers. While gain ratio outperformed the others for naive Bayes and J48, information gain showed top performance for KNN and for SVM with polynomial and radial basis kernels. Overall, linear SVM with any of the information gain, chi-square statistics or symmetrical uncertainty methods turned out to be the first choice across other combinations of classifiers and feature selection methods on the moderately sized naive collection. On the other hand, naive Bayes with any of the feature selection methods showed its advantage on the small EMILLE corpus.
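
    The chi-square statistic, one of the feature selection methods compared above, scores a term from its term/class contingency table. The document counts below are invented:

```python
def chi_square(n11, n10, n01, n00):
    # n11: in-class docs containing the term, n10: out-of-class docs containing it,
    # n01: in-class docs without it,          n00: out-of-class docs without it.
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return num / den if den else 0.0

# A term concentrated in one class scores high; a uniformly spread term scores 0.
print(chi_square(40, 5, 10, 45))   # discriminative term
print(chi_square(25, 25, 25, 25))  # uninformative term
```

    Ranking every (term, class) pair by this score and keeping the top k terms yields the selected feature set whose size is then tuned, as the study does for Urdu.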

  1. Selective Audiovisual Semantic Integration Enabled by Feature-Selective Attention.

    Science.gov (United States)

    Li, Yuanqing; Long, Jinyi; Huang, Biao; Yu, Tianyou; Wu, Wei; Li, Peijun; Fang, Fang; Sun, Pei

    2016-01-13

    An audiovisual object may contain multiple semantic features, such as the gender and emotional features of the speaker. Feature-selective attention and audiovisual semantic integration are two brain functions involved in the recognition of audiovisual objects. Humans often selectively attend to one or several features while ignoring the other features of an audiovisual object. Meanwhile, the human brain integrates semantic information from the visual and auditory modalities. However, how these two brain functions correlate with each other remains to be elucidated. In this functional magnetic resonance imaging (fMRI) study, we explored the neural mechanism by which feature-selective attention modulates audiovisual semantic integration. During the fMRI experiment, the subjects were presented with visual-only, auditory-only, or audiovisual dynamic facial stimuli and performed several feature-selective attention tasks. Our results revealed that a distribution of areas, including heteromodal areas and brain areas encoding attended features, may be involved in audiovisual semantic integration. Through feature-selective attention, the human brain may selectively integrate audiovisual semantic information from attended features by enhancing functional connectivity and thus regulating information flows from heteromodal areas to brain areas encoding the attended features.

  2. Stable Feature Selection for Biomarker Discovery

    CERN Document Server

    He, Zengyou

    2010-01-01

    Feature selection techniques have long been the workhorse in biomarker discovery applications. Surprisingly, the stability of feature selection with respect to sampling variations has long been under-considered; only recently has this issue received more attention. In this article, we review existing stable feature selection methods for biomarker discovery using a generic hierarchical framework. We have two objectives: (1) providing an overview of this new yet fast-growing topic for convenient reference; (2) categorizing existing methods under an expandable framework for future research and development.
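
    A common way to quantify the stability discussed above is the mean pairwise similarity (here Jaccard) of the feature subsets a selector returns across resamples of the data. The selector outputs below are invented:

```python
import random

random.seed(0)

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def stability(selected_sets):
    # Mean pairwise Jaccard similarity between the selected subsets.
    pairs = [(i, j) for i in range(len(selected_sets))
             for j in range(i + 1, len(selected_sets))]
    return sum(jaccard(selected_sets[i], selected_sets[j])
               for i, j in pairs) / len(pairs)

# A stable selector keeps picking the same biomarkers across resamples...
stable = [{0, 1, 2} for _ in range(10)]
# ...an unstable one picks a fresh random subset each time.
unstable = [set(random.sample(range(100), 3)) for _ in range(10)]
print(stability(stable), stability(unstable))
```

    A stability score near 1 indicates reproducible biomarker lists; a score near 0 suggests the selected "biomarkers" are artifacts of the particular sample.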

  3. Feature selection using feature dissimilarity measure and density-based clustering: Application to biological data

    Indian Academy of Sciences (India)

    Debarka Sengupta; Indranil Aich; Sanghamitra Bandyopadhyay

    2015-10-01

    Reduction of dimensionality has emerged as a routine process in modelling complex biological systems. A large number of feature selection techniques have been reported in the literature to improve model performance in terms of accuracy and speed. In the present article an unsupervised feature selection technique is proposed, using maximum information compression index as the dissimilarity measure and the well-known density-based cluster identification technique DBSCAN for identifying the largest natural group of dissimilar features. The algorithm is fast and less sensitive to the user-supplied parameters. Moreover, the method automatically determines the required number of features and identifies them. We used the proposed method for reducing dimensionality of a number of benchmark data sets of varying sizes. Its performance was also extensively compared with some other well-known feature selection methods.
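
    The maximum information compression index used above as the dissimilarity measure is the smaller eigenvalue of the 2x2 covariance matrix of a feature pair: it is zero exactly when one feature is a linear function of the other. A minimal sketch on synthetic features (omitting the DBSCAN clustering step):

```python
import math
import random

random.seed(2)

def mici(x, y):
    # Maximal information compression index: the smaller eigenvalue of the
    # 2x2 covariance matrix of (x, y); 0 iff the features are linearly dependent.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    tr, det = vx + vy, vx * vy - cov * cov
    return (tr - math.sqrt(tr * tr - 4 * det)) / 2

f1 = [random.gauss(0, 1) for _ in range(200)]
f2 = [2 * a + 1 for a in f1]                   # redundant linear copy of f1
f3 = [random.gauss(0, 1) for _ in range(200)]  # independent feature
print(mici(f1, f2), mici(f1, f3))
```

    In the paper's scheme, pairwise MICI values like these feed a density-based clustering (DBSCAN) that groups redundant features, from which representatives are kept.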

  4. Hadoop neural network for parallel and distributed feature selection.

    Science.gov (United States)

    Hodge, Victoria J; O'Keefe, Simon; Austin, Jim

    2016-06-01

    In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative-memory (binary) neural network which is highly amenable to parallel and distributed processing and fits the Hadoop paradigm. There are many feature selectors described in the literature, all with various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks which can then be processed in parallel; multiple feature selectors can also be processed simultaneously (in parallel), allowing them to be compared. We identify commonalities among the five feature selectors. All can be processed in the framework using a single representation, and the overall processing can be greatly reduced by processing the common aspects of the feature selectors only once and propagating them across all five selectors as necessary. This allows the best feature selector, and the actual features to select, to be identified for large and high-dimensional data sets by exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  5. Selecting materialized views using random algorithm

    Science.gov (United States)

    Zhou, Lijuan; Hao, Zhongxiao; Liu, Chi

    2007-04-01

    The data warehouse is a repository of information collected from multiple, possibly heterogeneous, autonomous distributed databases. The information stored at the data warehouse is in the form of views, referred to as materialized views. The selection of materialized views is one of the most important decisions in designing a data warehouse: materialized views are stored in the data warehouse to implement on-line analytical processing queries efficiently, and the first issue for the user to consider is query response time. In this paper, we therefore develop algorithms to select a set of views to materialize in a data warehouse so as to minimize the total view maintenance cost under the constraint of a given query response time; we call this the query-cost view-selection problem. First, the cost graph and cost model of the query-cost view-selection problem are presented. Second, methods for selecting materialized views using randomized algorithms are presented. The genetic algorithm is applied to the materialized view selection problem, but as the genetic process develops, legal solutions become more and more difficult to produce, so many solutions are eliminated and the time to produce solutions lengthens. Therefore, an improved algorithm is presented in this paper, which combines simulated annealing with the genetic algorithm to solve the query-cost view-selection problem. Finally, simulation experiments were conducted to test the effectiveness and efficiency of our algorithms. The experiments show that the given methods provide near-optimal solutions in limited time and work well in practical cases. Randomized algorithms will become invaluable tools for data warehouse evolution.
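
    The simulated-annealing half of the combined approach can be sketched on a toy instance of the query-cost view-selection problem: minimize maintenance cost while keeping total query cost under a budget. The per-view costs, query savings and the limit below are invented numbers, not the paper's cost model:

```python
import math
import random

random.seed(3)

# Hypothetical per-view costs: materialising a view adds maintenance cost
# but reduces total query response time by its saving.
MAINT = [4, 7, 2, 9, 3, 6]
QUERY_SAVING = [10, 18, 5, 22, 8, 15]
BASE_QUERY_COST = 100
QUERY_LIMIT = 60   # constraint: total query cost must not exceed this

def query_cost(sel):
    return BASE_QUERY_COST - sum(s for s, m in zip(QUERY_SAVING, sel) if m)

def maint_cost(sel):
    return sum(c for c, m in zip(MAINT, sel) if m)

def anneal(steps=5000, t0=10.0):
    cur = [1] * len(MAINT)          # start with every view materialised
    best = cur[:]
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-9      # linear cooling schedule
        cand = cur[:]
        cand[random.randrange(len(cand))] ^= 1  # flip one view in/out
        if query_cost(cand) > QUERY_LIMIT:      # reject infeasible neighbours
            continue
        d = maint_cost(cand) - maint_cost(cur)
        if d < 0 or random.random() < math.exp(-d / t):  # Metropolis acceptance
            cur = cand
            if maint_cost(cur) < maint_cost(best):
                best = cur[:]
    return best

best = anneal()
print(best, maint_cost(best), query_cost(best))
```

    The paper's hybrid additionally uses genetic crossover over a population; annealing then repairs the stagnation that pure GA shows on this constrained search space.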

  6. Iterative selection algorithm for service composition in distributed environments

    Institute of Scientific and Technical Information of China (English)

    SU Sen; LI Fei; YANG FangChun

    2008-01-01

    In service-oriented architecture (SOA), service composition is a promising way to create new services. However, some technical challenges are hindering the application of service composition. One of the greatest challenges for a composite service provider is to select a set of services to instantiate a composite service with end-to-end quality of service (QoS) assurance across different autonomous networks and business regions. This paper presents an iterative service selection algorithm for quality-driven service composition. The algorithm runs on a peer-to-peer (P2P) service execution environment, distributed intelligent service execution (DISE), which provides a scalable QoS registry, dynamic service selection and service execution services. The most significant feature of our iterative service selection algorithm is that it can work with a centralized QoS registry as well as across decentralized ones. Network status is an optional factor in our QoS model and selection algorithm. The algorithm iteratively selects services following the service execution order, so it can be applied either before service execution or at service run-time without any modification. We tested our algorithm with a series of experiments on DISE. Experimental results illustrate its excellent selection quality and outstanding performance.
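
    The iterative, execution-order selection idea can be sketched as a budgeted greedy pass: at each step pick, among the candidates for that task, the best-quality service whose latency still fits the remaining end-to-end budget. Service names, latencies and quality scores are invented, and this is a sketch of the general idea, not the DISE algorithm itself:

```python
# Candidate services per composition step: (service, latency_ms, quality).
candidates = [
    [("a1", 20, 0.9), ("a2", 10, 0.7)],
    [("b1", 30, 0.95), ("b2", 15, 0.8)],
    [("c1", 25, 0.9), ("c2", 40, 0.99)],
]
BUDGET_MS = 70   # end-to-end latency constraint

def iterative_select(tasks, budget):
    plan, used = [], 0
    for i, options in enumerate(tasks):
        # Latency that must be reserved for the remaining tasks
        # (sum of their cheapest options).
        reserve = sum(min(o[1] for o in rest) for rest in tasks[i + 1:])
        feasible = [o for o in options if used + o[1] + reserve <= budget]
        if not feasible:
            return None   # no instantiation meets the QoS constraint
        choice = max(feasible, key=lambda o: o[2])   # best quality that fits
        plan.append(choice[0])
        used += choice[1]
    return plan

print(iterative_select(candidates, BUDGET_MS))
```

    Because selection follows execution order and only consults the remaining budget, the same pass works before execution or at run-time, mirroring the property claimed above.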

  7. Feature selection for anomaly–based network intrusion detection using cluster validity indices

    CSIR Research Space (South Africa)

    Naidoo, T

    2015-09-01

    Full Text Available A feature selection algorithm that is novel in the context of anomaly–based network intrusion detection is proposed in this paper. The distinguishing factor of the proposed feature selection algorithm is its complete lack of dependency on labelled...

  8. An Improved Prototype Pattern Selection Algorithm

    Institute of Scientific and Technical Information of China (English)

    Fang Xiuduan; Liu Binhan; Wang Weizhi

    2002-01-01

    The selection of prototype patterns plays a decisive part in the performance of synergetic neural networks. Among the existing prototype pattern selection schemes, the learning algorithm based on information superposition presented by Wang[1] is the most efficient. However, it has a degree parameter that greatly affects the training process and must be determined. To overcome this drawback, an improved algorithm is presented and discussed here. This approach uses a genetic algorithm, a stochastic global search method, to find the global optimum of the unknown parameter in a small search space; therefore, it converges fairly fast. The experimental results also demonstrate its effectiveness.

  9. Surface Defect Target Identification on Copper Strip Based on Adaptive Genetic Algorithm and Feature Saliency

    Directory of Open Access Journals (Sweden)

    Xuewu Zhang

    2013-01-01

    Full Text Available To enhance the stability and robustness of a visual inspection system (VIS), a new surface defect target identification method for copper strip, based on an adaptive genetic algorithm (AGA) and feature saliency, is proposed. First, the gray-level co-occurrence matrix (GLCM) and Hu invariant moments are used for feature extraction. Then, the adaptive genetic algorithm, which is used for feature selection, is evaluated and discussed. In the AGA, total error rates and false alarm rates are integrated to calculate the fitness value, and the probabilities of crossover and mutation are adjusted dynamically according to the fitness value. Finally, the selected features are optimized according to feature saliency and fed into a support vector machine (SVM). For comparison, we conduct experiments using the selected optimal feature subsequence (OFS) and the total feature sequence (TFS) separately. The experimental results demonstrate that the proposed method guarantees correct classification rates while lowering false alarm rates.

  10. Novel and efficient tag SNPs selection algorithms.

    Science.gov (United States)

    Chen, Wen-Pei; Hung, Che-Lun; Tsai, Suh-Jen Jane; Lin, Yaw-Ling

    2014-01-01

    SNPs are the most abundant form of genetic variation among species, and association studies between complex diseases and SNPs or haplotypes have received great attention. However, these studies are restricted by the cost of genotyping all SNPs; thus, it is necessary to find smaller subsets, or tag SNPs, representing the rest of the SNPs. The existing tag SNP selection algorithms are notoriously time-consuming. Here, an efficient algorithm for tag SNP selection is presented and applied to analyze the HapMap YRI data. The experimental results show that the proposed algorithm achieves better performance than the existing tag SNP selection algorithms; in most cases, it is at least ten times faster than the existing methods. In many cases, when the redundant ratio of the block is high, the proposed algorithm can even be thousands of times faster than the previously known methods. Tools and web services for haplotype block analysis, integrated with the Hadoop MapReduce framework, are also developed using the proposed algorithm as the computation kernel.
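
    A common formulation of tag SNP selection is a minimum set cover: choose the fewest SNPs that together distinguish every pair of haplotypes in a block. This greedy sketch on an invented 4-haplotype block illustrates the problem, not the authors' faster algorithm:

```python
from itertools import combinations

# Hypothetical haplotype block: rows = haplotypes, columns = SNPs (alleles 0/1).
haplotypes = [
    (0, 0, 1, 0),
    (0, 1, 1, 1),
    (1, 0, 0, 0),
    (1, 1, 0, 1),
]

def distinguished(snp):
    # Pairs of haplotypes that differ at this SNP.
    return {(i, j) for i, j in combinations(range(len(haplotypes)), 2)
            if haplotypes[i][snp] != haplotypes[j][snp]}

def greedy_tag_snps():
    need = set(combinations(range(len(haplotypes)), 2))
    tags = []
    while need:
        # Pick the SNP that distinguishes the most still-ambiguous pairs.
        snp = max(range(len(haplotypes[0])),
                  key=lambda s: len(distinguished(s) & need))
        tags.append(snp)
        need -= distinguished(snp)
    return tags

tags = greedy_tag_snps()
print(tags)
```

    Here SNPs 0 and 1 suffice: every haplotype has a unique allele pattern on them, so the remaining SNPs are redundant and need not be genotyped.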

  11. Feature Engineering for Drug Name Recognition in Biomedical Texts: Feature Conjunction and Feature Selection

    Directory of Open Access Journals (Sweden)

    Shengyu Liu

    2015-01-01

    Full Text Available Drug name recognition (DNR) is a critical step for drug information extraction. Machine learning-based methods have been widely used for DNR with various types of features such as part-of-speech, word shape, and dictionary features. Features used in current machine learning-based methods are usually singleton features, possibly because combining singleton features into conjunction features causes a feature explosion and introduces a large number of noisy features. However, singleton features, which capture only one linguistic characteristic of a word, are not sufficient to describe the information needed for DNR when multiple characteristics should be considered. In this study, we explore feature conjunction and feature selection for DNR, which have not previously been reported. We intuitively select 8 types of singleton features and combine them into conjunction features in two ways. Then, chi-square, mutual information, and information gain are used to mine effective features. Experimental results show that feature conjunction and feature selection can improve the performance of the DNR system with a moderate number of features, and our DNR system significantly outperforms the best system in the DDIExtraction 2013 challenge.

  12. Feature engineering for drug name recognition in biomedical texts: feature conjunction and feature selection.

    Science.gov (United States)

    Liu, Shengyu; Tang, Buzhou; Chen, Qingcai; Wang, Xiaolong; Fan, Xiaoming

    2015-01-01

    Drug name recognition (DNR) is a critical step for drug information extraction. Machine learning-based methods have been widely used for DNR with various types of features such as part-of-speech, word shape, and dictionary features. Features used in current machine learning-based methods are usually singleton features, possibly because combining singleton features into conjunction features causes a feature explosion and introduces a large number of noisy features. However, singleton features, which capture only one linguistic characteristic of a word, are not sufficient to describe the information needed for DNR when multiple characteristics should be considered. In this study, we explore feature conjunction and feature selection for DNR, which have not previously been reported. We intuitively select 8 types of singleton features and combine them into conjunction features in two ways. Then, chi-square, mutual information, and information gain are used to mine effective features. Experimental results show that feature conjunction and feature selection can improve the performance of the DNR system with a moderate number of features, and our DNR system significantly outperforms the best system in the DDIExtraction 2013 challenge.
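
    The benefit of conjunction features can be illustrated with information gain on a toy token table. The word-shape and POS values, and the drug/non-drug labels, are invented for the example:

```python
import math
from collections import Counter

# Hypothetical tokens with two singleton features (word shape, POS) and a
# label saying whether the token is part of a drug name.
tokens = [
    {"shape": "Xxxx", "pos": "NN", "drug": True},
    {"shape": "Xxxx", "pos": "NN", "drug": True},
    {"shape": "xxxx", "pos": "NN", "drug": False},
    {"shape": "Xxxx", "pos": "VB", "drug": False},
    {"shape": "xxxx", "pos": "VB", "drug": False},
] * 20

def conjoin(tok):
    # Two singleton features plus their conjunction.
    return {f"shape={tok['shape']}", f"pos={tok['pos']}",
            f"shape={tok['shape']}&pos={tok['pos']}"}

def entropy(items):
    counts = Counter(t["drug"] for t in items)
    return -sum(c / len(items) * math.log2(c / len(items))
                for c in counts.values())

def info_gain(feature):
    with_f = [t for t in tokens if feature in conjoin(t)]
    without = [t for t in tokens if feature not in conjoin(t)]
    gain = entropy(tokens)
    for part in (with_f, without):
        if part:
            gain -= len(part) / len(tokens) * entropy(part)
    return gain

for f in ("shape=Xxxx", "pos=NN", "shape=Xxxx&pos=NN"):
    print(f, round(info_gain(f), 3))
```

    Neither singleton feature identifies the drug tokens alone, but their conjunction does, so it receives a strictly higher information gain and survives selection.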

  13. Online Feature Extraction Algorithms for Data Streams

    Science.gov (United States)

    Ozawa, Seiichi

    Along with the development of network technology and high-performance small devices such as surveillance cameras and smart phones, various kinds of multimodal information (texts, images, sound, etc.) are captured in real time and shared among systems through networks. Such information is given to a system as a stream of data. In a person identification system based on face recognition, for example, image frames of a face are captured by a video camera and given to the system for identification purposes. These face images are considered a stream of data. Therefore, to identify a person more accurately under realistic conditions, a high-performance feature extraction method for streaming data, one that can autonomously adapt to changes in the data distribution, is needed. In this review paper, we discuss recent trends in online feature extraction for streaming data. A variety of feature extraction methods for streaming data have been proposed recently; due to space limitations, we focus here on incremental principal component analysis.

  14. Novel Automatic Filter-Class Feature Selection for Machine Learning Regression

    DEFF Research Database (Denmark)

    Wollsen, Morten Gill; Hallam, John; Jørgensen, Bo Nørregaard

    2017-01-01

    With the increased focus on the application of Big Data in all sectors of society, the performance of machine learning becomes essential. Efficient machine learning depends on efficient feature selection algorithms. Filter feature selection algorithms are model-free and therefore very fast, but require...... model in the feature selection process. PCA is often used in the machine learning literature and can be considered the default feature selection method. RDESF outperformed PCA in both experiments in both prediction error and computational speed. RDESF is a new step into filter-based automatic feature...

  15. Novel Automatic Filter-Class Feature Selection for Machine Learning Regression

    DEFF Research Database (Denmark)

    Wollsen, Morten Gill; Hallam, John; Jørgensen, Bo Nørregaard

    2016-01-01

    With the increased focus on the application of Big Data in all sectors of society, the performance of machine learning becomes essential. Efficient machine learning depends on efficient feature selection algorithms. Filter feature selection algorithms are model-free and therefore very fast, but require...... model in the feature selection process. PCA is often used in the machine learning literature and can be considered the default feature selection method. RDESF outperformed PCA in both experiments in both prediction error and computational speed. RDESF is a new step into filter-based automatic feature...

  16. Supervised Feature Subset Selection based on Modified Fuzzy Relative Information Measure for classifier Cart

    Directory of Open Access Journals (Sweden)

    K.SAROJINI,

    2010-06-01

    Full Text Available Feature subset selection is an essential task in data mining. This paper presents a new method for supervised feature subset selection based on a Modified Fuzzy Relative Information Measure (MFRIM). First, a discretization algorithm is applied to discretize numeric features and construct the membership functions of each fuzzy set of a feature. Then the proposed MFRIM is applied to select the feature subset, focusing on boundary samples. The proposed method can select a feature subset with a minimum number of features that is relevant for obtaining higher average classification accuracy on a dataset. Experimental results on UCI datasets show that the proposed algorithm is effective and efficient, selecting subsets with fewer features and higher average classification accuracy than the consistency-based feature subset selection method.

  17. DYNAMIC FEATURE SELECTION FOR WEB USER IDENTIFICATION ON LINGUISTIC AND STYLISTIC FEATURES OF ONLINE TEXTS

    Directory of Open Access Journals (Sweden)

    A. A. Vorobeva

    2017-01-01

    Full Text Available The paper deals with the identification and authentication of web users participating in Internet information processes, based on features of their online texts. In digital forensics, web user identification based on various linguistic features can be used to discover the identity of individuals, criminals or terrorists using the Internet to commit cybercrimes. The Internet can be used as a tool in different types of cybercrime (fraud and identity theft, harassment and anonymous threats, terrorist or extremist statements, distribution of illegal content and information warfare). Linguistic identification of web users is a kind of biometric identification; it can be used to narrow down the suspects, identify a criminal and prosecute him. The feature set includes various linguistic and stylistic features extracted from online texts. We propose dynamic feature selection for each web user identification task. Selection is based on calculating the Manhattan distance to the k nearest neighbors (the Relief-f algorithm). This approach improves identification accuracy and minimizes the number of features. Experiments were carried out on several datasets with different levels of class imbalance. The results showed that feature relevance varies across different sets of web users (the probable authors of a given text); selecting features for each set of web users improves identification accuracy by 4% on average, which is approximately 1% higher than with a static feature set. The proposed approach is most effective for a small number of training samples (messages per user).
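
    The Relief-style weighting described above (distance to the nearest hit and nearest miss) can be sketched as follows. The two-feature "authorship" data are synthetic, and this is basic two-class Relief with Manhattan distance, not the full Relief-f variant:

```python
import random

random.seed(4)

def relief(X, y, n_iter=100):
    # Reward features that differ on the nearest miss (other class) and
    # agree on the nearest hit (same class), using Manhattan distance.
    n_feat = len(X[0])
    w = [0.0] * n_feat
    for _ in range(n_iter):
        i = random.randrange(len(X))
        def nearest(same):
            cands = [j for j in range(len(X))
                     if j != i and (y[j] == y[i]) == same]
            return min(cands, key=lambda j: sum(abs(a - b)
                                                for a, b in zip(X[i], X[j])))
        hit, miss = X[nearest(True)], X[nearest(False)]
        for f in range(n_feat):
            w[f] += abs(X[i][f] - miss[f]) - abs(X[i][f] - hit[f])
    return w

# Toy data: feature 0 separates the two "authors", feature 1 is noise.
X = [[0 + random.gauss(0, 0.1), random.random()] for _ in range(30)] + \
    [[1 + random.gauss(0, 0.1), random.random()] for _ in range(30)]
y = [0] * 30 + [1] * 30
w = relief(X, y)
print(w)
```

    Features whose weight exceeds a threshold are kept; recomputing the weights per identification task gives the dynamic selection the paper advocates.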

  18. Feature selection applied to ultrasound carotid images segmentation.

    Science.gov (United States)

    Rosati, Samanta; Molinari, Filippo; Balestra, Gabriella

    2011-01-01

    The automated tracing of the carotid layers on ultrasound images is complicated by noise and by the varying morphology and pathology of the carotid artery. In this study we benchmarked four methods for feature selection on a set of variables extracted from ultrasound carotid images. The main goal was to select the parameters containing the highest amount of information useful for classifying pixels into the carotid regions they belong to. Six classes of pixels were identified: lumen, lumen-intima interface, intima-media complex, media-adventitia interface, adventitia and adventitia far boundary. The performances of the QuickReduct Algorithm (QRA), the Entropy-Based Algorithm (EBR), the Improved QuickReduct Algorithm (IQRA) and a Genetic Algorithm (GA) were compared using Artificial Neural Networks (ANNs). All methods returned subsets with a high dependency degree, even though the average classification accuracy was about 50%. Among all classes, the best results were obtained for the lumen. Overall, the four feature selection methods assessed in this study return comparable results. Despite the need for improved accuracy, this study could be useful for building a pre-classifier stage to optimize segmentation performance in automated carotid ultrasound segmentation.

  19. Hyperspectral image classification based on NMF Features Selection Method

    Science.gov (United States)

    Abe, Bolanle T.; Jordaan, J. A.

    2013-12-01

    Hyperspectral instruments are capable of collecting hundreds of images, corresponding to wavelength channels, for the same area on the earth's surface. Due to the huge number of features (bands) in hyperspectral imagery, land cover classification procedures are computationally expensive and pose a problem known as the curse of dimensionality. In addition, high correlation among contiguous bands increases the redundancy within the bands. Hence, dimension reduction of hyperspectral data is crucial for obtaining good classification accuracy. This paper presents a new feature selection technique: a Non-negative Matrix Factorization (NMF) algorithm is proposed to obtain reduced, relevant features in the input domain of each class label, with the aim of reducing classification error and the dimensionality of the classification task. The Indian Pines dataset from Northwest Indiana is used to evaluate the performance of the proposed method through feature selection and classification experiments. The Waikato Environment for Knowledge Analysis (WEKA) data mining framework is selected as the tool to implement the classification using Support Vector Machines and a Neural Network. The selected feature subsets are subjected to land cover classification to investigate the performance of the classifiers and how the feature-set size affects classification accuracy. The results obtained show that the performances of the classifiers are significant. The study makes a positive contribution to the problems of hyperspectral imagery by exploring NMF, SVMs and NNs to improve classification accuracy. The classifiers' performances are valuable for decision makers weighing tradeoffs between method accuracy and method complexity.
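
    NMF itself factorizes a non-negative data matrix V into low-rank non-negative factors W and H. A minimal multiplicative-update sketch (Lee-Seung style) on an invented rank-2 "pixels x bands" matrix, not the paper's WEKA pipeline:

```python
import random

random.seed(5)

def nmf(V, k, iters=800):
    # Multiplicative-update NMF: V ≈ W H with all entries kept non-negative.
    n, m = len(V), len(V[0])
    W = [[random.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[random.random() + 0.1 for _ in range(m)] for _ in range(k)]
    def matmul(A, B):
        return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]
    def T(A):
        return [list(r) for r in zip(*A)]
    for _ in range(iters):
        WH = matmul(W, H)
        WtV, WtWH = matmul(T(W), V), matmul(T(W), WH)
        H = [[H[a][j] * WtV[a][j] / (WtWH[a][j] + 1e-9)
              for j in range(m)] for a in range(k)]
        WH = matmul(W, H)
        VHt, WHHt = matmul(V, T(H)), matmul(WH, T(H))
        W = [[W[i][a] * VHt[i][a] / (WHHt[i][a] + 1e-9)
              for a in range(k)] for i in range(n)]
    return W, H

# Rank-2 toy "spectra": 4 pixels x 5 bands built from two non-negative sources.
V = [[1, 2, 0, 0, 1], [2, 4, 0, 0, 2], [0, 0, 3, 3, 0], [1, 2, 3, 3, 1]]
W, H = nmf(V, 2)
WH = [[sum(W[i][t] * H[t][j] for t in range(2)) for j in range(5)]
      for i in range(4)]
err = sum((V[i][j] - WH[i][j]) ** 2 for i in range(4) for j in range(5))
print(round(err, 4))
```

    The rows of H act as basis spectra; for feature selection, bands with large weight in a class's basis can be kept while near-zero bands are discarded.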

  20. SIFT Feature Matching Algorithm with Local Shape Context

    Directory of Open Access Journals (Sweden)

    Gu Lichuan

    2013-05-01

    Full Text Available SIFT (Scale-Invariant Feature Transform) is one of the most effective local features, invariant to scale, rotation and illumination, and is widely used in image matching. However, many mismatches arise when an image contains many similar regions. In this study, an improved SIFT feature matching algorithm with local shape context is put forward. The feature vectors are computed by assigning a dominant orientation to each feature point based on an elliptical neighboring region together with the local shape context, and the feature vectors are then matched using the Euclidean distance and the χ2 distance. The experiments indicate that the improved algorithm reduces the mismatch probability, achieves good affine invariance and greatly improves matching results.
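
    Euclidean-distance SIFT matching conventionally rejects ambiguous matches in repetitive regions with a nearest/second-nearest ratio test, which is the failure mode the shape-context extension above targets. A sketch with 4-D toy descriptors in place of 128-D SIFT vectors:

```python
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match(desc1, desc2, ratio=0.8):
    # Ratio test: accept a match only if the nearest descriptor is clearly
    # closer than the second nearest, suppressing matches in similar regions.
    matches = []
    for i, d in enumerate(desc1):
        dists = sorted((euclid(d, e), j) for j, e in enumerate(desc2))
        (d1, j), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((i, j))
    return matches

# desc2[0] matches desc1[0]; desc2[1] and desc2[2] are near-twins of desc1[1]
# (a repetitive region), so that ambiguous match is rejected.
desc1 = [[0, 0, 1, 1], [5, 5, 0, 0]]
desc2 = [[0.1, 0, 1, 1], [5, 5.1, 0, 0], [5.1, 5, 0, 0], [9, 9, 9, 9]]
print(match(desc1, desc2))
```

    The paper's improvement adds a local shape-context term (compared with a χ2 distance) so that such ambiguous points can still be matched correctly rather than merely discarded.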

  1. Comparison of two feature selection algorithms oriented to raw cotton ripeness discrimination

    Institute of Scientific and Technical Information of China (English)

    王玲; 刘德营; 姬长英

    2013-01-01

    To discriminate the ripeness of field cotton quickly and accurately, 15 shape structure features were extracted from cotton images, and the execution efficiency and classification accuracy of two feature subset selection algorithms were compared using 10-fold cross-validation: wrapper-based exhaustive search with wrapper-based stopping (WE-W) and filter-based heuristic search with wrapper-based stopping (FH-W). Taking the error rate of a Bayes classifier on the validation set (WE-W) and the class-separability measure on the training set (FH-W) as the evaluation functions, the optimal l-feature subset (l = 1, 2, ..., 15) was searched on the training set exhaustively (WE-W) or heuristically (FH-W), stopping when the average error rate of the Bayes classifier on the validation set reached a minimum (WE-W and FH-W). Experimental results show that at l = 3 the WE-W and FH-W algorithms achieved average recognition rates on the prediction set of 85.39% and 85.28%, respectively. This indicates that the FH-W algorithm, with its higher execution efficiency and good classification performance, can serve as a reference for practical applications.

  2. Efficient Cluster Head Selection Algorithm for MANET

    Directory of Open Access Journals (Sweden)

    Khalid Hussain

    2013-01-01

    Full Text Available In mobile ad hoc networks (MANET), cluster head selection is considered a major challenge. In wireless sensor networks the LEACH protocol can be used to select the cluster head on the basis of energy, but this remains an open issue in mobile ad hoc networks, especially when nodes are itinerant. In this paper we propose an efficient cluster head selection algorithm (ECHSA) for selecting the cluster head efficiently in mobile ad hoc networks. We evaluate the proposed algorithm through simulation in OMNeT++ as well as on a test bed, and the results match our expectations. For further evaluation we also compare the proposed protocol with several other protocols, such as LEACH-C; the results show an improvement.

  3. Anisotropic selection in cellular genetic algorithms

    CERN Document Server

    Simoncini, David; Collard, Philippe; Clergue, Manuel

    2008-01-01

    In this paper we introduce a new selection scheme in cellular genetic algorithms (cGAs). Anisotropic Selection (AS) promotes diversity and allows accurate control of the selective pressure. First we compare this new scheme with the classical rectangular-grid-shape solutions with respect to selective pressure: we can obtain the same takeover time with the two techniques, although the spreading of the best individual differs. We then give experimental results that show to what extent AS promotes the emergence of niches that support low coupling and high cohesion. Finally, using a cGA with anisotropic selection on a Quadratic Assignment Problem, we show the existence of an optimal anisotropy value for which the best average performance is observed. Further work will focus on the selective-pressure self-adjustment ability provided by this new selection scheme.

  4. Bidirectional scale-invariant feature transform feature matching algorithms based on priority k-d tree search

    National Research Council Canada - National Science Library

    Liu, XiangShao; Zhou, Shangbo; Li, Hua; Li, Kun

    2016-01-01

    In this article, a bidirectional feature matching algorithm and two extended algorithms based on the priority k-d tree search are presented for the image registration using scale-invariant feature transform features...

  5. An Improved Particle Swarm Optimization for Feature Selection

    Institute of Scientific and Technical Information of China (English)

    Yuanning Liu; Gang Wang; Huiling Chen; Hao Dong; Xiaodong Zhu; Sujing Wang

    2011-01-01

    Particle Swarm Optimization (PSO) is a popular bio-inspired algorithm, based on the social behavior associated with bird flocking, for optimization problems. To maintain the diversity of swarms, a few studies of multi-swarm strategies have been reported. However, the competition among swarms, and the reservation or destruction of a swarm, has not been considered further. In this paper, we formulate four rules by introducing a survival-of-the-fittest mechanism, which simulates the competition among the swarms. Based on this mechanism, we design a modified Multi-Swarm PSO (MSPSO) to solve discrete problems, which consists of a number of sub-swarms and a multi-swarm scheduler that can monitor and control each sub-swarm using the rules. To further address feature selection problems, we propose an Improved Feature Selection (IFS) method by integrating MSPSO and Support Vector Machines (SVM) with the F-score method. The IFS method aims to achieve higher generalization capability by performing kernel parameter optimization and feature selection simultaneously. The performance of the proposed method is compared with that of the standard PSO based, Genetic Algorithm (GA) based, and grid search based methods on 10 benchmark datasets taken from the UCI machine learning and StatLog databases. The numerical results and statistical analysis show that the proposed IFS method performs significantly better than the other three methods in terms of prediction accuracy with a smaller subset of features.
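
    The F-score criterion that IFS combines with MSPSO and the SVM can be computed per feature as below; the toy class samples are made up:

```python
def f_score(pos, neg):
    """F-score of one feature: between-class scatter over within-class scatter,
    a common filter ranking criterion for two-class feature selection."""
    all_vals = pos + neg
    m = sum(all_vals) / len(all_vals)
    mp, mn = sum(pos) / len(pos), sum(neg) / len(neg)
    num = (mp - m) ** 2 + (mn - m) ** 2
    den = (sum((v - mp) ** 2 for v in pos) / (len(pos) - 1)
           + sum((v - mn) ** 2 for v in neg) / (len(neg) - 1))
    return num / den

# Feature A separates the two classes, feature B does not.
a = f_score([2.0, 2.1, 1.9], [0.1, 0.0, 0.2])
b = f_score([1.0, 0.4, 1.3], [0.9, 1.2, 0.5])
print(a > b)  # True: feature A gets the much higher F-score
```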

  6. Peripheral Nonlinear Time Spectrum Features Algorithm for Large Vocabulary Mandarin Automatic Speech Recognition

    Institute of Scientific and Technical Information of China (English)

    Fadhil H. T. Al-dulaimy; WANG Zuoying

    2005-01-01

    This work describes an improved feature extractor algorithm to extract the peripheral features of a point x(ti,fj) using a nonlinear algorithm to compute the nonlinear time spectrum (NL-TS) pattern. The algorithm observes n×n neighborhoods of the point in all directions, and then incorporates the peripheral features using the Mel frequency cepstrum components (MFCCs)-based feature extractor of the Tsinghua electronic engineering speech processing (THEESP) Mandarin automatic speech recognition (MASR) system as replacements of the dynamic features with different feature combinations. In this algorithm, the orthogonal bases are extracted directly from the speech data using the discrete cosine transformation (DCT) with 3×3 blocks on an NL-TS pattern as the peripheral features. The new primal bases are then selected and simplified in the form of an operator in the time direction and an operator in the frequency direction. The algorithm achieves a 23.29% relative error rate improvement in comparison with the standard MFCC feature set and the dynamic features in tests using THEESP with the duration distribution-based hidden Markov model (DDBHMM) MASR system.

  7. An Optimal SVM with Feature Selection Using Multiobjective PSO

    Directory of Open Access Journals (Sweden)

    Iman Behravan

    2016-01-01

    Full Text Available Support vector machine is a classifier based on the structural risk minimization principle. The performance of the SVM depends on different parameters, such as the penalty factor C and the kernel factor σ. Choosing an appropriate kernel function can also improve the recognition score and lower the amount of computation. Furthermore, selecting the useful features among several features in a dataset not only increases the performance of the SVM, but also reduces the computational time and complexity. So this is an optimization problem which can be solved by a heuristic algorithm. In some cases, besides the recognition score, the reliability of the classifier's output is important, so a multiobjective optimization algorithm is needed. In this paper we use the MOPSO algorithm to optimize the parameters of the SVM, choose an appropriate kernel function, and select the best feature subset simultaneously, in order to optimize the recognition score and the reliability of the SVM concurrently. Nine different datasets, from the UCI machine learning repository, are used to evaluate the power and effectiveness of the proposed method (MOPSO-SVM). The results of the proposed method are compared to those achieved by a single SVM and by RBF and MLP neural networks.

  8. Face Recognition Algorithms Based on Transformed Shape Features

    Directory of Open Access Journals (Sweden)

    Sambhunath Biswas

    2012-05-01

    Full Text Available Human face recognition is, indeed, a challenging task, especially under illumination and pose variations. In the present paper we examine the effectiveness of two simple algorithms using coiflet packet and Radon transforms to recognize human faces from some databases of still gray level images, under illumination and pose variations. Both algorithms convert 2-D gray level training face images into their respective depth maps or physical shapes, which are subsequently transformed by coiflet packet and Radon transforms to compute energy for feature extraction. Experiments show that such transformed shape features are robust to illumination and pose variations. With the features extracted, training classes are optimally separated through linear discriminant analysis (LDA), while classification for test face images is made through a k-NN classifier based on the L1 norm and Mahalanobis distance measures. The proposed algorithms are then tested on face images that differ in illumination, expression or pose separately, obtained from three databases, namely, the ORL, Yale and Essex-Grimace databases. The results are compared with two different existing algorithms, and performance using Daubechies wavelets is also examined. It is seen that the proposed coiflet packet and Radon transform based algorithms perform significantly well, especially under different illumination conditions and pose variation. The comparison shows that the proposed algorithms are superior.

  9. Video segmentation using multiple features based on EM algorithm

    Institute of Scientific and Technical Information of China (English)

    张风超; 杨杰; 刘尔琦

    2004-01-01

    Object-based video segmentation is an important issue for many multimedia applications. A video segmentation method based on the EM algorithm is proposed. We consider video segmentation as an unsupervised classification problem and apply the EM algorithm to obtain the maximum-likelihood estimates of the Gaussian model parameters for model-based segmentation. We simultaneously combine multiple features (motion, color) within a maximum likelihood framework to obtain accurate segmentation results. We also use the temporal consistency among video frames to improve the speed of the EM algorithm. Experimental results on typical MPEG-4 sequences and real scene sequences show that our method achieves attractive accuracy and robustness.
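
    A minimal 1-D, two-component EM iteration illustrates the model-based segmentation idea; real video segmentation would use multi-dimensional motion and color features per pixel, but the E-step/M-step structure is the same:

```python
import math, random

# Each value stands for one pixel feature (e.g. motion magnitude); the two
# Gaussians play the role of background and foreground models.
random.seed(0)
data = [random.gauss(0.0, 0.5) for _ in range(200)] + \
       [random.gauss(5.0, 0.5) for _ in range(200)]

mu = [min(data), max(data)]            # crude initialisation from the data range
sigma, pi = [1.0, 1.0], [0.5, 0.5]

def pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

for _ in range(30):
    # E-step: responsibility of each Gaussian component for each pixel
    resp = []
    for x in data:
        w = [pi[k] * pdf(x, mu[k], sigma[k]) for k in range(2)]
        t = sum(w)
        resp.append([wk / t for wk in w])
    # M-step: maximum-likelihood re-estimation of means, variances, weights
    for k in range(2):
        nk = sum(r[k] for r in resp)
        mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
        sigma[k] = max(1e-6, math.sqrt(sum(r[k] * (x - mu[k]) ** 2
                                           for r, x in zip(resp, data)) / nk))
        pi[k] = nk / len(data)

print(sorted(round(m, 1) for m in mu))  # close to the true means 0.0 and 5.0
```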

  10. Coevolution of active vision and feature selection.

    Science.gov (United States)

    Floreano, Dario; Kato, Toshifumi; Marocco, Davide; Sauser, Eric

    2004-03-01

    We show that complex visual tasks, such as position- and size-invariant shape recognition and navigation in the environment, can be tackled with simple architectures generated by a coevolutionary process of active vision and feature selection. Behavioral machines equipped with primitive vision systems and direct pathways between visual and motor neurons are evolved while they freely interact with their environments. We describe the application of this methodology in three sets of experiments, namely, shape discrimination, car driving, and robot navigation. We show that these systems develop sensitivity to a number of oriented, retinotopic visual features (oriented edges, corners, height) and a behavioral repertoire to locate, bring, and keep these features in sensitive regions of the vision system, resembling strategies observed in simple insects.

  11. Unsupervised Feature Selection Based on the Morisita Index

    Science.gov (United States)

    Golay, Jean; Kanevski, Mikhail

    2016-04-01

    Recent breakthroughs in technology have radically improved our ability to collect and store data. As a consequence, the size of datasets has been increasing rapidly, both in terms of the number of variables (or features) and the number of instances. Since the mechanism of many phenomena is not well known, too many variables are sampled. Many of them are redundant and contribute to the emergence of three major challenges in data mining: (1) the complexity of result interpretation, (2) the necessity to develop new methods and tools for data processing, (3) the possible reduction in the accuracy of learning algorithms because of the curse of dimensionality. This research deals with a new algorithm for selecting the smallest subset of features conveying all the information of a dataset (i.e. an algorithm for removing redundant features). It is a new version of the Fractal Dimensionality Reduction (FDR) algorithm [1] and it relies on two ideas: (a) In general, data lie on non-linear manifolds of much lower dimension than that of the spaces where they are embedded. (b) The situation described in (a) is partly due to redundant variables, since they do not contribute to increasing the dimension of manifolds, called the Intrinsic Dimension (ID). The suggested algorithm implements these ideas by selecting only the variables influencing the data ID. Unlike the FDR algorithm, it resorts to a recently introduced ID estimator [2] based on the Morisita index of clustering and to a sequential forward search strategy. Consequently, in addition to its ability to capture non-linear dependences, it can deal with large datasets and its implementation is straightforward in any programming environment. Many real world case studies are considered. They are related to environmental pollution and renewable resources. References [1] C. Traina Jr., A.J.M. Traina, L. Wu, C. Faloutsos, Fast feature selection using fractal dimension, in: Proceedings of the XV Brazilian Symposium on Databases, SBBD, pp. 158
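
    A generic sequential forward search of the kind the algorithm relies on can be sketched as follows. The additive `info` criterion is a hypothetical stand-in for the Morisita-based intrinsic dimension score, chosen so that redundant features contribute nothing:

```python
def forward_search(n_features, criterion):
    """Sequential forward search: grow the subset one feature at a time,
    keeping the feature that most improves the criterion, and stop once
    no remaining feature adds information."""
    selected, remaining = [], set(range(n_features))
    best = criterion(selected)
    while remaining:
        cand = max(remaining, key=lambda j: criterion(selected + [j]))
        gain = criterion(selected + [cand]) - best
        if gain <= 0:                 # redundant features add no information
            break
        selected.append(cand)
        remaining.remove(cand)
        best += gain
    return selected

# Hypothetical criterion: features 0 and 3 carry information, while 1 and 2
# are redundant copies whose marginal contribution is zero.
info = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.7}
score = lambda subset: sum(info[j] for j in subset)
print(forward_search(4, score))  # [0, 3]
```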

  12. A review of channel selection algorithms for EEG signal processing

    Science.gov (United States)

    Alotaiby, Turky; El-Samie, Fathi E. Abd; Alshebeili, Saleh A.; Ahmad, Ishtiaq

    2015-12-01

    Digital processing of electroencephalography (EEG) signals has now been popularly used in a wide variety of applications such as seizure detection/prediction, motor imagery classification, mental task classification, emotion classification, sleep state classification, and drug effects diagnosis. With the large number of EEG channels acquired, it has become apparent that efficient channel selection algorithms are needed, with varying importance from one application to another. The main purpose of the channel selection process is threefold: (i) to reduce the computational complexity of any processing task performed on EEG signals by selecting the relevant channels and hence extracting the features of major importance, (ii) to reduce the amount of overfitting that may arise due to the utilization of unnecessary channels, for the purpose of improving the performance, and (iii) to reduce the setup time in some applications. Signal processing tools such as time-domain analysis, power spectral estimation, and the wavelet transform have been used for feature extraction and hence for channel selection in most channel selection algorithms. In addition, different evaluation approaches such as filtering, wrapper, embedded, hybrid, and human-based techniques have been widely used for the evaluation of the selected subset of channels. In this paper, we survey the recent developments in the field of EEG channel selection methods along with their applications and classify these methods according to the evaluation approach.
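
    A filter-style channel ranking can be sketched as below. Scoring channels by raw signal variance is a crude stand-in for the spectral and wavelet features the survey discusses, and the channel names and data are made up:

```python
def select_channels(channels, k):
    """Rank EEG channels by signal power (variance) and keep the top k,
    a simple filter-type criterion applied before feature extraction."""
    def power(signal):
        m = sum(signal) / len(signal)
        return sum((v - m) ** 2 for v in signal) / len(signal)
    ranked = sorted(channels, key=lambda name: power(channels[name]), reverse=True)
    return ranked[:k]

eeg = {"C3": [0.0, 2.0, -2.0, 2.0], "Cz": [0.1, 0.2, 0.1, 0.2],
       "C4": [0.0, 1.0, -1.0, 1.0]}
print(select_channels(eeg, 2))  # ['C3', 'C4']: the two high-power channels
```

    Wrapper and embedded approaches would instead score candidate channel subsets by the downstream classifier's performance, at a higher computational cost.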

  13. A Meta-Heuristic Regression-Based Feature Selection for Predictive Analytics

    Directory of Open Access Journals (Sweden)

    Bharat Singh

    2014-11-01

    Full Text Available Selecting an optimal feature subset from a very large number of features is an NP-complete problem. Because conventional optimization techniques are unable to tackle large-scale feature selection problems, meta-heuristic algorithms are widely used. In this paper, we propose a particle swarm optimization technique that utilizes regression techniques for feature selection. We then use the selected features to classify the data. Classification accuracy is used as the criterion to evaluate classifier performance, and classification is accomplished through the use of k-nearest neighbour (KNN) and Bayesian techniques. Various high dimensional data sets are used to evaluate the usefulness of the proposed approach. Results show that our approach outperforms other conventional feature selection algorithms.
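
    The k-nearest-neighbour evaluator used on the selected features can be sketched as follows; the regression-guided PSO itself is not reproduced, and the training points are invented:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Minimal KNN: majority vote over the k training samples closest to the
    query in (squared) Euclidean distance over the selected features."""
    nearest = sorted(train, key=lambda s: sum((a - b) ** 2
                                              for a, b in zip(s[0], query)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [([0.0, 0.1], "a"), ([0.2, 0.0], "a"), ([0.1, 0.2], "a"),
         ([1.0, 1.1], "b"), ([0.9, 1.0], "b"), ([1.1, 0.9], "b")]
print(knn_predict(train, [0.95, 1.05]))  # "b"
```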

  14. Application of Random Forest Algorithm in Important Feature Selection from EMG Signal

    Institute of Scientific and Technical Information of China (English)

    张洪强; 刘光远; 赖祥伟

    2013-01-01

    How to find, among high-dimensional features, the features that play a key role is a persistent difficulty in EMG-based emotion recognition. This paper uses the random forest algorithm and its feature evaluation criterion to compute the contribution of 126 initial EMG signal features to the classification of different emotion patterns. According to each feature's importance, the features with larger contributions are preferentially combined and used for emotion classification. Experimental data verify the effectiveness of the method.
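
    The impurity-decrease idea behind random forest feature importance can be illustrated on a single split; a real forest averages this score over many randomized trees, and the two-feature dataset here is made up:

```python
# Score each feature by how much its best threshold split reduces Gini
# impurity: a single tree's impurity-decrease importance.
samples = [([0.2, 7.1], 0), ([0.3, 2.9], 0), ([0.1, 5.5], 0),
           ([0.9, 6.8], 1), ([0.8, 3.1], 1), ([1.0, 5.0], 1)]

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1 - p1) ** 2

def importance(j):
    parent = gini([y for _, y in samples])
    best = 0.0
    for t in sorted({x[j] for x, _ in samples}):
        left = [y for x, y in samples if x[j] <= t]
        right = [y for x, y in samples if x[j] > t]
        n = len(samples)
        drop = parent - (len(left) / n * gini(left) + len(right) / n * gini(right))
        best = max(best, drop)
    return best

print([round(importance(j), 2) for j in range(2)])  # [0.5, 0.1]: feature 0 splits cleanly
```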

  15. Magnetic Field Feature Extraction and Selection for Indoor Location Estimation

    Directory of Open Access Journals (Sweden)

    Carlos E. Galván-Tejada

    2014-06-01

    Full Text Available User indoor positioning has been under constant improvement especially with the availability of new sensors integrated into the modern mobile devices, which allows us to exploit not only infrastructures made for everyday use, such as WiFi, but also natural infrastructure, as is the case of natural magnetic field. In this paper we present an extension and improvement of our current indoor localization model based on the feature extraction of 46 magnetic field signal features. The extension adds a feature selection phase to our methodology, which is performed through Genetic Algorithm (GA with the aim of optimizing the fitness of our current model. In addition, we present an evaluation of the final model in two different scenarios: home and office building. The results indicate that performing a feature selection process allows us to reduce the number of signal features of the model from 46 to 5 regardless the scenario and room location distribution. Further, we verified that reducing the number of features increases the probability of our estimator correctly detecting the user’s location (sensitivity and its capacity to detect false positives (specificity in both scenarios.
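
    A toy genetic algorithm over feature bit-strings, in the spirit of the GA selection phase above, might look as follows. The per-feature utilities, the fitness function, and the GA settings are invented for illustration, not the paper's setup:

```python
import random

random.seed(1)
INFO = [0.9, 0.1, 0.8, 0.05, 0.7]   # hypothetical per-feature utility

def fitness(bits):
    # Reward informative features, penalise subset size (mimics the goal of
    # shrinking 46 signal features down to a handful).
    return sum(i * b for i, b in zip(INFO, bits)) - 0.2 * sum(bits)

def evolve(pop_size=20, generations=40):
    pop = [[random.randint(0, 1) for _ in INFO] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # elitist truncation selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(INFO))
            child = a[:cut] + b[cut:]           # one-point crossover
            if random.random() < 0.1:           # bit-flip mutation
                j = random.randrange(len(INFO))
                child[j] ^= 1
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

print(evolve())
```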

  16. Ad Hoc Access Gateway Selection Algorithm

    Science.gov (United States)

    Jie, Liu

    With the continuous development of mobile communication technology, ad hoc access networks have become a hot research topic. Ad hoc access network nodes can be used to expand the capacity and multi-hop communication range of a mobile communication system, even in adjacent service areas, and to improve edge data rates. For mobile nodes in an ad hoc network to reach the Internet, communication with peer nodes on the Internet must go through a gateway. The key issues in ad hoc access networks are therefore gateway discovery, gateway selection in the multi-gateway case, and handover between different gateways. Considering both the mobile node and the gateway, this paper proposes an improved gateway selection algorithm based on the average number of hops, the average access time, and the stability of routes. The algorithm mainly aims to improve the access time of ad hoc nodes and the continuity of communication between gateways, and thus the quality of communication across the network.

  17. Artificial immune system based on adaptive clonal selection for feature selection and parameters optimisation of support vector machines

    Science.gov (United States)

    Sadat Hashemipour, Maryam; Soleimani, Seyed Ali

    2016-01-01

    The artificial immune system (AIS) algorithm based on the clonal selection method can be defined as a soft computing method, inspired by the theoretical immune system, for solving science and engineering problems. The support vector machine (SVM) is a popular pattern classification method with many diverse applications. Kernel parameter setting in the SVM training procedure, along with feature selection, significantly impacts the classification accuracy. In this study, an AIS based on the Adaptive Clonal Selection (AISACS) algorithm has been used to optimise the SVM parameters and the feature subset selection without degrading the SVM classification accuracy. Several public datasets of the University of California Irvine machine learning (UCI) repository are employed to calculate the classification accuracy rate in order to evaluate the AISACS approach; it was then compared with the grid search algorithm and a Genetic Algorithm (GA) approach. The experimental results show that the feature reduction rate and running time of the AISACS approach are better than those of the GA approach.

  18. A linear-time algorithm for Euclidean feature transform sets

    NARCIS (Netherlands)

    Hesselink, Wim H.

    2007-01-01

    The Euclidean distance transform of a binary image is the function that assigns to every pixel the Euclidean distance to the background. The Euclidean feature transform is the function that assigns to every pixel the set of background pixels with this distance. We present an algorithm to compute the
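
    The two transforms can be illustrated with a brute-force (quadratic-time) computation on a tiny binary image; Hesselink's contribution is a linear-time algorithm, which this sketch does not reproduce:

```python
image = [[1, 1, 0],
         [1, 1, 1],
         [0, 1, 1]]   # 1 = object pixel, 0 = background pixel

background = [(r, c) for r in range(3) for c in range(3) if image[r][c] == 0]

def feature_transform(p):
    """Distance to the nearest background pixel, plus the set of background
    pixels attaining that distance (the Euclidean feature transform set)."""
    dists = {b: ((p[0] - b[0]) ** 2 + (p[1] - b[1]) ** 2) ** 0.5
             for b in background}
    d = min(dists.values())
    return d, {b for b, v in dists.items() if v == d}

d, feats = feature_transform((1, 1))
print(d, sorted(feats))  # the centre pixel has TWO equidistant background pixels
```

    The example shows why the feature transform returns a set: ties are common, and keeping all nearest background pixels preserves information the scalar distance transform discards.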

  19. Feature-extraction algorithms for the PANDA electromagnetic calorimeter

    NARCIS (Netherlands)

    Kavatsyuk, M.; Guliyev, E.; Lemmens, P. J. J.; Loehner, H.; Poelman, T. P.; Tambave, G.; Yu, B

    2009-01-01

    The feature-extraction algorithms are discussed which have been developed for the digital front-end electronics of the electromagnetic calorimeter of the PANDA detector at the future FAIR facility. Performance parameters have been derived in test measurements with cosmic rays, particle and photon

  1. Gradient Algorithm on Stiefel Manifold and Application in Feature Extraction

    Directory of Open Access Journals (Sweden)

    Zhang Jian-jun

    2013-09-01

    Full Text Available To improve the computational efficiency of system feature extraction, reduce the occupied memory space, and simplify the program design, a modified gradient descent method on the Stiefel manifold is proposed, based on the geometric optimization framework on the Riemannian manifold. Different geodesic calculation formulas are used for different scenarios. A polynomial is also used to approximate the geodesic equations. The Qin Jiushao-Horner polynomial algorithm and the strategies of line-searching and changing the iteration step size are also adopted. The gradient descent algorithm on the Stiefel manifold applied to Principal Component Analysis (PCA) is discussed in detail as an example of system feature extraction. Theoretical analysis and simulation experiments show that the new method achieves superior performance in both convergence rate and calculation efficiency while ensuring the unitary column orthogonality. In addition, it is easier to implement in software or hardware.
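
    For the simplest Stiefel manifold St(n, 1), the unit sphere, a gradient step followed by a retraction (renormalisation) already recovers the leading principal component. This is a simplification of the paper's geodesic and polynomial updates, shown only to make the manifold idea concrete:

```python
# Gradient ascent on the Rayleigh quotient x^T A x over the unit sphere:
# take a Euclidean gradient step, then retract back onto the manifold by
# renormalising. Converges to the top eigenvector (the first PCA direction).
A = [[4.0, 1.0], [1.0, 2.0]]          # covariance-like symmetric matrix

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

x = [1.0, 0.0]
for _ in range(100):
    g = matvec(A, x)                   # gradient of x^T A x (up to a factor 2)
    x = [xi + 0.1 * gi for xi, gi in zip(x, g)]
    norm = sum(v * v for v in x) ** 0.5
    x = [v / norm for v in x]          # retraction back to the unit sphere

rayleigh = sum(xi * yi for xi, yi in zip(x, matvec(A, x)))
print(round(rayleigh, 3))  # 4.414, the largest eigenvalue 3 + sqrt(2)
```

    On St(n, p) with p > 1 the retraction must restore column orthonormality (e.g. via a QR decomposition), which is where the geodesic formulas of the paper come in.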

  2. Cluster analysis based on dimensional information with applications to feature selection and classification

    Science.gov (United States)

    Eigen, D. J.; Fromm, F. R.; Northouse, R. A.

    1974-01-01

    A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.
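
    The histogram-interval question the abstract mentions can be illustrated with Sturges' rule, one standard heuristic; the paper proposes its own heuristic, which is not reproduced here:

```python
import math

def sturges_bins(n_samples):
    # Sturges' rule: a common heuristic for the number of histogram intervals.
    return math.ceil(math.log2(n_samples)) + 1

def histogram(values, k):
    """Frequency distribution histogram over k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0        # guard against all-equal values
    counts = [0] * k
    for v in values:
        counts[min(int((v - lo) / width), k - 1)] += 1
    return counts

data = [0.5, 1.1, 1.2, 1.3, 2.8, 2.9, 3.0, 3.1]
k = sturges_bins(len(data))             # 8 samples -> 4 intervals
print(k, histogram(data, k))            # 4 [2, 2, 0, 4]
```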

  4. Probabilistic Route Selection Algorithm for IP Traceback

    Science.gov (United States)

    Yim, Hong-Bin; Jung, Jae-Il

    DoS (Denial of Service) or DDoS (Distributed DoS) attacks are a major threat and among the most difficult attacks to defend against. Moreover, it is very difficult to find the real origin of attackers because DoS/DDoS attackers use spoofed IP addresses. To solve this problem, we propose a probabilistic route selection traceback algorithm, namely PRST, to trace the attacker's real origin. This algorithm uses two types of packets: an agent packet and a reply agent packet. The agent packet is used to find the attacker's real origin, and the reply agent packet is used to notify the victim that the agent packet has reached the edge router of the attacker. After an attack occurs, the victim generates the agent packet and sends it to the victim's edge router. The attacker's edge router, upon receiving the agent packet, generates the reply agent packet and sends it to the victim. The agent packet and the reply agent packet are forwarded by routers according to a probabilistic packet forwarding table (PPFT). The PRST algorithm runs on the distributed routers, and the PPFT is stored and managed by the routers. We validate the PRST algorithm with a mathematical approach based on the Poisson distribution.

  5. Variable selection in near-infrared spectroscopy: Benchmarking of feature selection methods on biodiesel data

    Energy Technology Data Exchange (ETDEWEB)

    Balabin, Roman M., E-mail: balabin@org.chem.ethz.ch [Department of Chemistry and Applied Biosciences, ETH Zurich, 8093 Zurich (Switzerland); Smirnov, Sergey V. [Unimilk Joint Stock Co., 143421 Moscow Region (Russian Federation)

    2011-04-29

    During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields, from petroleum to biomedical sectors. The NIR spectrum (above 4000 cm⁻¹) of a sample is typically measured by modern instruments at a few hundred wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. The results of other spectroscopic

  6. Matrix Multiplication Algorithm Selection with Support Vector Machines

    Science.gov (United States)

    2015-05-01

    This work applies support vector machines to select among candidate matrix multiplication algorithms (such as industry-standard linear algebra algorithms and their communication-avoiding counterparts), achieving up to a 26% performance improvement over selecting a single algorithm in advance. The authors conclude that using weighted SVM for algorithm selection is superior to using unweighted SVM.

  7. Optimal Features Subset Selection and Classification for Iris Recognition

    Directory of Open Access Journals (Sweden)

    Roy Kaushik

    2008-01-01

    Full Text Available Abstract The selection of the optimal feature subset and the classification have become important issues in the field of iris recognition. We propose a feature selection scheme based on the multiobjective genetic algorithm (MOGA) to improve the recognition accuracy, and an asymmetrical support vector machine for the classification of iris patterns. We also suggest a segmentation scheme based on collarette area localization. The deterministic feature sequence is extracted from the iris images using the 1D log-Gabor wavelet technique, and the extracted feature sequence is used to train the support vector machine (SVM). The MOGA is applied to optimize the feature sequence and to increase the overall performance based on the matching accuracy of the SVM. The parameters of the SVM are optimized to improve the overall generalization performance, and the traditional SVM is modified into an asymmetrical SVM to treat the false accept and false reject cases differently and to handle the unbalanced data of a specific class with respect to the other classes. Our experimental results indicate that the performance of the SVM as a classifier is better than the performance of classifiers based on the feedforward neural network, the k-nearest neighbor, and the Hamming and Mahalanobis distances. The proposed technique is computationally effective, with recognition rates of 99.81% and 96.43% on the CASIA and ICE datasets, respectively.

  9. Moment feature based fast feature extraction algorithm for moving object detection using aerial images.

    Directory of Open Access Journals (Sweden)

    A F M Saifuddin Saif

    Full Text Available Fast and computationally less complex feature extraction for moving object detection using aerial images from unmanned aerial vehicles (UAVs) remains an elusive goal in the field of computer vision research. The types of features used in current studies concerning moving object detection are typically chosen based on improving the detection rate rather than on providing fast and computationally less complex feature extraction methods. Because moving object detection using aerial images from UAVs involves motion as seen from a certain altitude, effective and fast feature extraction is a vital issue for optimum detection performance. This research proposes a two-layer bucket approach based on a new feature extraction algorithm referred to as the moment-based feature extraction algorithm (MFEA). Because a moment represents the coherent intensity of pixels and motion estimation is a motion pixel intensity measurement, this research used this relation to develop the proposed algorithm. The experimental results reveal the successful performance of the proposed MFEA algorithm and the proposed methodology.
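
    The raw image moments that the MFEA builds on are simple sums of pixel intensities; a sketch of the zeroth and first moments and the intensity centroid (the toy image is made up):

```python
# Raw moment M_pq = sum over pixels of x^p * y^q * I(x, y): it aggregates
# pixel intensities, so it summarises the coherent intensity of a region.
image = [[0, 0, 0, 0],
         [0, 9, 9, 0],
         [0, 9, 9, 0],
         [0, 0, 0, 0]]

def raw_moment(img, p, q):
    return sum((x ** p) * (y ** q) * img[y][x]
               for y in range(len(img)) for x in range(len(img[0])))

m00 = raw_moment(image, 0, 0)        # total intensity mass
cx = raw_moment(image, 1, 0) / m00   # intensity centroid, x
cy = raw_moment(image, 0, 1) / m00   # intensity centroid, y
print(cx, cy)  # 1.5 1.5: the centre of the bright 2x2 block
```

    Tracking how such moments change between frames gives a cheap proxy for motion, which is the relation the abstract exploits.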

  10. Optimal band selection for high dimensional remote sensing data using genetic algorithm

    Science.gov (United States)

    Zhang, Xianfeng; Sun, Quan; Li, Jonathan

    2009-06-01

    A 'fused' method may not be suitable for reducing the dimensionality of data, and a band/feature selection method needs to be used to select an optimal subset of the original data bands. This study examined the efficiency of a genetic algorithm (GA) in band selection for remote sensing classification. A GA-based algorithm for band selection was designed in which a Bhattacharyya distance index, indicating the separability between the classes of interest, is used as the fitness function. A binary string chromosome is designed in which each gene location has a value of 1 if the corresponding band is included or 0 if it is not. The algorithm was implemented in the MATLAB programming environment, and a band selection task for lithologic classification in the Chocolate Mountain area (California) was used to test the proposed algorithm. The proposed feature selection algorithm can be useful in multi-source remote sensing data preprocessing, especially in hyperspectral dimensionality reduction.
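The GA design described above, a binary band mask evolved under a Bhattacharyya-distance fitness, can be sketched compactly. The sketch below is illustrative, not the authors' MATLAB implementation: it assumes a diagonal-covariance Bhattacharyya distance, a small per-band cost so the GA prefers compact subsets, and a basic tournament-selection GA; all names and parameters are choices made here.

```python
import numpy as np

rng = np.random.default_rng(0)

def bhattacharyya(X1, X2):
    # diagonal-covariance Bhattacharyya distance between two classes
    m1, m2 = X1.mean(0), X2.mean(0)
    v1, v2 = X1.var(0) + 1e-9, X2.var(0) + 1e-9
    v = (v1 + v2) / 2
    return (0.125 * ((m1 - m2) ** 2 / v).sum()
            + 0.5 * np.log(v / np.sqrt(v1 * v2)).sum())

def fitness(X1, X2, mask, penalty=0.05):
    # separability of the selected bands minus a small cost per band
    if not mask.any():
        return -np.inf
    return bhattacharyya(X1[:, mask], X2[:, mask]) - penalty * mask.sum()

def ga_band_select(X1, X2, n_bands, pop=30, gens=40, pmut=0.05):
    # chromosomes are binary masks over the bands (1 = band included)
    P = rng.random((pop, n_bands)) < 0.5
    for _ in range(gens):
        fit = np.array([fitness(X1, X2, c) for c in P])
        i, j = rng.integers(0, pop, (2, pop))          # tournament selection
        parents = P[np.where(fit[i] > fit[j], i, j)]
        children = parents.copy()
        for k in range(pop // 2):                      # one-point crossover
            cut = rng.integers(1, n_bands)
            children[2 * k, cut:] = parents[2 * k + 1, cut:]
            children[2 * k + 1, cut:] = parents[2 * k, cut:]
        children ^= rng.random((pop, n_bands)) < pmut  # bit-flip mutation
        P = children
    fit = np.array([fitness(X1, X2, c) for c in P])
    return P[fit.argmax()]
```

On synthetic two-class data in which only the first few bands separate the classes, the returned mask concentrates on those informative bands.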

  11. Feature extraction and classification algorithms for high dimensional data

    Science.gov (United States)

    Lee, Chulhee; Landgrebe, David

    1993-01-01

    Feature extraction and classification algorithms for high dimensional data are investigated. Developments with regard to sensors for Earth observation are moving in the direction of providing much higher dimensional multispectral imagery than is now possible. In analyzing such high dimensional data, processing time becomes an important factor. With large increases in dimensionality and the number of classes, processing time will increase significantly. To address this problem, a multistage classification scheme is proposed which reduces the processing time substantially by eliminating unlikely classes from further consideration at each stage. Several truncation criteria are developed and the relationship between thresholds and the error caused by the truncation is investigated. Next an approach to feature extraction for classification is proposed based directly on the decision boundaries. It is shown that all the features needed for classification can be extracted from decision boundaries. A characteristic of the proposed method arises by noting that only a portion of the decision boundary is effective in discriminating between classes, and the concept of the effective decision boundary is introduced. The proposed feature extraction algorithm has several desirable properties: it predicts the minimum number of features necessary to achieve the same classification accuracy as in the original space for a given pattern recognition problem; and it finds the necessary feature vectors. The proposed algorithm does not deteriorate under the circumstances of equal means or equal covariances as some previous algorithms do. In addition, the decision boundary feature extraction algorithm can be used both for parametric and non-parametric classifiers. Finally, some problems encountered in analyzing high dimensional data are studied and possible solutions are proposed. 
First, the increased importance of second-order statistics in analyzing high dimensional data is recognized.

  12. Feature Based Correspondence: A Comparative Study on Image Matching Algorithms

    Directory of Open Access Journals (Sweden)

    Munim Tanvir

    2016-03-01

    Full Text Available Image matching and recognition are the crux of computer vision and play a major part in everyday life. From industrial robots to surveillance cameras, from autonomous vehicles to medical imaging, and from missile guidance to space exploration vehicles, computer vision, and hence image matching, is embedded in our lives. This communication presents a comparative study of the prevalent matching algorithms, addressing their restrictions and providing a criterion to define the level of efficiency likely to be expected from an algorithm. The study includes the feature detection and matching techniques used by these prevalent algorithms to allow a deeper insight. The chief aim of the study is to deliver a source of comprehensive reference for researchers involved in image matching, regardless of specific applications.

  13. The positioning algorithm based on feature variance of billet character

    Science.gov (United States)

    Yi, Jiansong; Hong, Hanyu; Shi, Yu; Chen, Hongyang

    2015-12-01

    In the process of steel billet recognition on the production line, the key problem is how to determine the position of the billet in complex scenes. To solve this problem, this paper presents a positioning algorithm based on the feature variance of billet characters. Using a largest intra-cluster variance recursive method based on multilevel filtering, the billet characters are segmented completely from the complex scene. Since there are three rows of characters on each steel billet, we can determine whether the connected regions that satisfy the feature variance condition lie on a straight line, and thereby accurately locate the steel billet. The experimental results demonstrate that the proposed method is competitive with other methods in positioning the characters and also reduces the running time. The algorithm can provide a better basis for character recognition.
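The "largest intra-cluster variance recursive method" above belongs to the family of Otsu-style threshold selection. As an illustrative single-level sketch (not the authors' multilevel-filtering pipeline), a threshold that maximizes the between-class variance can be computed directly from the grayscale histogram:

```python
import numpy as np

def otsu_threshold(gray):
    # gray: uint8 image; returns the level maximizing between-class variance
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                 # class-0 probability up to each level
    mu = np.cumsum(p * np.arange(256))   # cumulative mean up to each level
    mu_t = mu[-1]                        # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b[np.isnan(sigma_b)] = 0       # undefined where a class is empty
    return int(np.argmax(sigma_b))
```

On a bimodal image the returned threshold falls between the two modes; a recursive multilevel variant would reapply this within each resulting region.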

  14. Exploitation of Intra-Spectral Band Correlation for Rapid Feature Selection, and Target Identification in Hyperspectral Imagery

    Science.gov (United States)

    2009-03-01

    entitled “Improved Feature Extraction, Feature Selection, and Identification Techniques that Create a Fast Unsupervised Hyperspectral Target Detection...” ...target or non-target classifications. Integration of this type of autonomous target detection algorithm along with hyperspectral imaging sensors

  15. Fingerprint Recognition: Enhancement, Feature Extraction and Automatic Evaluation of Algorithms

    OpenAIRE

    Turroni, Francesco

    2012-01-01

    The identification of people by measuring some traits of individual anatomy or physiology has led to a specific research area called biometric recognition. This thesis is focused on improving fingerprint recognition systems considering three important problems: fingerprint enhancement, fingerprint orientation extraction and automatic evaluation of fingerprint algorithms. An effective extraction of salient fingerprint features depends on the quality of the input fingerprint. If the fingerp...

  16. Making Trillion Correlations Feasible in Feature Grouping and Selection.

    Science.gov (United States)

    Zhai, Yiteng; Ong, Yew-Soon; Tsang, Ivor W

    2016-12-01

    Today, modern databases with "Big Dimensionality" are a growing trend. Existing approaches that require the calculation of pairwise feature correlations in their algorithmic designs have scored miserably on such databases, since computing the full correlation matrix (i.e., square of the dimensionality in size) is computationally very intensive (i.e., a million features would translate to a trillion correlations). This poses a notable challenge that has received much less attention in machine learning and data mining research. Thus, this paper presents a study to fill in this gap. Our findings on several established databases with big dimensionality, across a wide spectrum of domains, indicate that an extremely small portion of the feature pairs contributes significantly to the underlying interactions and that there exist feature groups that are highly correlated. Inspired by these observations, we introduce a novel learning approach that exploits the presence of sparse correlations for the efficient identification of informative and correlated feature groups from big dimensional data, which translates to a reduction in complexity from O(m²n) to O(m log m + Ka·mn), where Ka << m. A filtering strategy, designed to filter out the large number of non-contributing correlations that could otherwise confuse the classifier while identifying the correlated and informative feature groups, forms one of the highlights of our approach. We also demonstrate the proposed method on one-class learning, where a notable speedup can be observed when solving one-class problems on big dimensional data. Further, to identify robust informative features with minimal sampling bias, our feature selection strategy embeds V-fold cross validation in the learning model, so as to seek features that exhibit stable or consistent accuracy across multiple data folds. Extensive empirical studies on both synthetic and several real-world datasets comprising up to 30 million

  17. Chaotic genetic algorithm for gene selection and classification problems.

    Science.gov (United States)

    Chuang, Li-Yeh; Yang, Cheng-San; Li, Jung-Chike; Yang, Cheng-Hong

    2009-10-01

    Pattern recognition techniques suffer from a well-known curse, the dimensionality problem. The microarray data classification problem is a classical complex pattern recognition problem. Selecting relevant genes from microarray data poses a formidable challenge to researchers due to the high dimensionality of the features, the multiclass categories involved, and the usually small sample size. The goal of feature (gene) selection is to select those subsets of differentially expressed genes that are potentially relevant for distinguishing the sample classes. In this paper, information gain and a chaotic genetic algorithm are proposed for the selection of relevant genes, and a K-nearest neighbor with the leave-one-out cross-validation method serves as the classifier. The chaotic genetic algorithm is modified by using a chaotic mutation operator to increase population diversity. The enhanced population diversity expands the GA's search ability. The proposed approach is tested on 10 microarray data sets from the literature. The experimental results show that the proposed method not only effectively reduced the number of gene expression levels, but also achieved lower classification error rates than other methods.
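Chaotic mutation operators of the kind mentioned above are typically driven by a logistic map: a deterministic sequence that nonetheless covers (0, 1) erratically, replacing the uniform random draws of ordinary bit-flip mutation. A minimal sketch, assuming a bit-string population (the abstract does not give the operator's exact form):

```python
import numpy as np

def chaotic_sequence(n, x0=0.7, r=4.0):
    # logistic map x_{t+1} = r * x_t * (1 - x_t); chaotic in (0, 1) for r = 4
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = r * x * (1 - x)
        xs[i] = x
    return xs

def chaotic_mutation(population, rate=0.02, x0=0.7):
    # flip each bit where the chaotic value falls below the mutation rate
    flat = population.ravel()
    mask = chaotic_sequence(flat.size, x0) < rate
    return (flat ^ mask).reshape(population.shape)
```

Because the map is deterministic, the same seed x0 reproduces the same mutation pattern, while r = 4 keeps the visited values well spread, which is the diversity-boosting property the paper exploits.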

  18. Feature Selection in Classification of Eye Movements Using Electrooculography for Activity Recognition

    Directory of Open Access Journals (Sweden)

    S. Mala

    2014-01-01

    Full Text Available Activity recognition is needed in many applications, for example, surveillance systems, patient monitoring, and human-computer interfaces. Feature selection plays an important role in activity recognition, data mining, and machine learning. For selecting a subset of features, an efficient evolutionary algorithm, Differential Evolution (DE), a very efficient optimizer, is used for finding informative features from eye movements recorded using electrooculography (EOG). Many researchers use EOG signals in human-computer interaction with various computational intelligence methods to analyze eye movements. The proposed system involves analysis of EOG signals using clearness based features, minimum redundancy maximum relevance features, and Differential Evolution based features. This work concentrates on the DE-based feature selection algorithm in order to improve classification for faultless activity recognition.
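DE evolves continuous vectors, so applying it to feature selection usually means thresholding each genome into a binary mask and scoring the mask with a classifier or filter criterion. A compact DE/rand/1/bin sketch under those assumptions (the paper's exact encoding and fitness are not given in the abstract):

```python
import numpy as np

rng = np.random.default_rng(1)

def de_feature_select(score, dim, pop=20, gens=50, F=0.5, CR=0.9):
    """score(mask) -> higher is better; mask is a boolean feature subset."""
    X = rng.random((pop, dim))               # continuous genomes in [0, 1]
    to_mask = lambda x: x > 0.5              # threshold into a feature subset
    fit = np.array([score(to_mask(x)) for x in X])
    for _ in range(gens):
        for i in range(pop):
            a, b, c = rng.choice([j for j in range(pop) if j != i], 3,
                                 replace=False)
            v = X[a] + F * (X[b] - X[c])     # DE/rand/1 mutation
            cross = rng.random(dim) < CR     # binomial crossover
            cross[rng.integers(dim)] = True  # guarantee one gene from v
            u = np.where(cross, v, X[i]).clip(0, 1)
            fu = score(to_mask(u))
            if fu >= fit[i]:                 # greedy one-to-one selection
                X[i], fit[i] = u, fu
    return to_mask(X[fit.argmax()])
```

With a toy score that rewards two informative features and mildly penalizes extras, the returned mask reliably includes the informative ones.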

  19. Feature selection and survival modeling in The Cancer Genome Atlas

    Directory of Open Access Journals (Sweden)

    Kim H

    2013-09-01

    Full Text Available Hyunsoo Kim,1 Markus Bredel2 1Department of Pathology, The University of Alabama at Birmingham, Birmingham, AL, USA; 2Department of Radiation Oncology, and Comprehensive Cancer Center, The University of Alabama at Birmingham, Birmingham, AL, USA Purpose: Personalized medicine is predicated on the concept of identifying subgroups of a common disease for better treatment. Identifying biomarkers that predict disease subtypes has been a major focus of biomedical science. In the era of genome-wide profiling, there is controversy as to the optimal number of genes to use as input to a feature selection algorithm for survival modeling. Patients and methods: The expression profiles and outcomes of 544 patients were retrieved from The Cancer Genome Atlas. We compared four different survival prediction methods: (1) the 1-nearest neighbor (1-NN) survival prediction method; (2) a random patient selection method and a Cox-based regression method with nested cross-validation; (3) least absolute shrinkage and selection operator (LASSO) optimization using whole-genome gene expression profiles; and (4) the same method using gene expression profiles of cancer pathway genes. Results: The 1-NN method performed better than the random patient selection method in terms of survival prediction, although it does not include a feature selection step. The Cox-based regression method with LASSO optimization using whole-genome gene expression data demonstrated higher survival prediction power than the 1-NN method, but was outperformed by the same method using gene expression profiles of cancer pathway genes alone. Conclusion: The 1-NN survival prediction method may require more patients for better performance, even when omitting censored data. Using preexisting biological knowledge for survival prediction is reasonable as a means to understand the biological system of a cancer, unless the analysis goal is to identify completely unknown genes relevant to cancer biology. Keywords: brain, feature selection

  20. Genetic Algorithms and Classification Trees in Feature Discovery: Diabetes and the NHANES database

    Energy Technology Data Exchange (ETDEWEB)

    Heredia-Langner, Alejandro; Jarman, Kristin H.; Amidan, Brett G.; Pounds, Joel G.

    2013-09-01

    This paper presents a feature selection methodology that can be applied to datasets containing a mixture of continuous and categorical variables. Using a Genetic Algorithm (GA), this method explores a dataset and selects a small set of features relevant for the prediction of a binary (1/0) response. Binary classification trees and an objective function based on conditional probabilities are used to measure the fitness of a given subset of features. The method is applied to health data in order to find factors useful for the prediction of diabetes. Results show that our algorithm is capable of narrowing down the set of predictors to around 8 factors that can be validated using reputable medical and public health resources.

  1. Generating Feature Spaces for Linear Algorithms with Regularized Sparse Kernel Slow Feature Analysis

    NARCIS (Netherlands)

    Böhmer, W.; Grünewälder, S.; Nickisch, H.; Obermayer, K.

    2013-01-01

    Without non-linear basis functions many problems can not be solved by linear algorithms. This article proposes a method to automatically construct such basis functions with slow feature analysis (SFA). Non-linear optimization of this unsupervised learning method generates an orthogonal basis on the

  2. Highly accurate SVM model with automatic feature selection for word sense disambiguation

    Institute of Scientific and Technical Information of China (English)

    王浩; 陈贵林; 吴连献

    2004-01-01

    A novel algorithm for word sense disambiguation (WSD) based on an SVM model improved with automatic feature selection is introduced. This learning method employs rich contextual features to predict the proper senses of specific words. Experimental results show that this algorithm achieves excellent performance on the data set released for the SENSEVAL-2 competition. We present the results obtained and discuss the transplantation of this algorithm to other languages such as Chinese. Experimental results on a Chinese corpus show that our algorithm achieves an accuracy of 70.0% even with small training data.

  3. Feature selection with the image grand tour

    Science.gov (United States)

    Marchette, David J.; Solka, Jeffrey L.

    2000-08-01

    The grand tour is a method for visualizing high dimensional data by presenting the user with a set of projections and the projected data. This idea was extended to multispectral images by viewing each pixel as a multidimensional value, and viewing the projections of the grand tour as an image. The user then looks for projections which provide a useful interpretation of the image, for example, separating targets from clutter. We discuss a modification of this which allows the user to select convolution kernels which provide useful discriminant ability, both in an unsupervised manner as in the image grand tour, or in a supervised manner using training data. This approach is extended to other window-based features. For example, one can define a generalization of the median filter as a linear combination of the order statistics within a window. Thus the median filter is that projection containing zeros everywhere except for the middle value, which contains a one. Using the convolution grand tour one can select projections on these order statistics to obtain new nonlinear filters.
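The generalized median filter described above, a linear combination of the order statistics within a window, can be made concrete. A brute-force sketch (window handling and names are illustrative choices): the weight vector picks out, or blends, positions in the sorted window, and the median filter is the one-hot weight on the middle order statistic.

```python
import numpy as np

def order_statistic_filter(img, weights):
    # weights: length w*w vector applied to the sorted values in each window
    w = int(len(weights) ** 0.5)
    pad = w // 2
    padded = np.pad(img, pad, mode="edge")   # replicate borders
    H, W = img.shape
    out = np.empty((H, W), dtype=float)
    for i in range(H):
        for j in range(W):
            window = padded[i:i + w, j:j + w].ravel()
            out[i, j] = np.dot(np.sort(window), weights)
    return out

# the median filter: a one-hot weight on the middle order statistic
median_w = np.zeros(9)
median_w[4] = 1.0
```

Other weight vectors give the min, max, or smoothed rank filters, and projecting onto candidate weight vectors in a grand-tour fashion is exactly the selection task the abstract describes.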

  4. Feature Selection for Chemical Sensor Arrays Using Mutual Information

    Science.gov (United States)

    Wang, X. Rosalind; Lizier, Joseph T.; Nowotny, Thomas; Berna, Amalia Z.; Prokopenko, Mikhail; Trowell, Stephen C.

    2014-01-01

    We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set, and established best features and upper bounds on classification performance. We selected feature sets that exhibit the maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study using a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features that achieve classification performance approaching the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian Networks gave the best performance. Finally, we compared the observed classification performances with the performance of classifiers using randomly selected features. We found that the selected features consistently outperformed randomly selected features for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near optimal features for chemical sensor arrays. PMID:24595058
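A filter of this kind scores each sensor feature by its mutual information with the chemical identity and keeps the top-scoring features. A histogram-based sketch, where the binning choice and function names are assumptions rather than the paper's exact estimator:

```python
import numpy as np

def mutual_information(x, y, bins=8):
    # MI (in nats) between a continuous feature x and discrete labels y
    xd = np.digitize(x, np.histogram_bin_edges(x, bins)[1:-1])
    labels = np.unique(y)
    joint = np.zeros((bins, len(labels)))
    for c, label in enumerate(labels):
        joint[:, c] = np.bincount(xd[y == label], minlength=bins)
    p = joint / joint.sum()
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

def select_top_k(X, y, k):
    # rank features by MI with the class label and keep the best k
    scores = [mutual_information(X[:, j], y) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:k]
```

This marginal ranking ignores feature redundancy, which is one reason a filter can fall short of the exhaustive wrapper search the paper compares against, while remaining far cheaper to compute.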

  5. ALGORITHM OF SELECTION EFFECTIVE SOLUTIONS FOR REPROFILING OF INDUSTRIAL BUILDINGS

    Directory of Open Access Journals (Sweden)

    MENEJLJUK A. I.

    2016-08-01

    Full Text Available Raising of the problem. Industrial enterprises built during the Soviet period no longer comply with today's requirements; significant technical progress, economic reform, and the transition to market principles of performance evaluation make it necessary to change their purpose and functionality. The technical condition of many industrial buildings in Ukraine allows them to be exploited for decades. Redesigning manufacturing facilities not only reduces the cost of construction, but also yields new facilities within the city. Despite the large number of industrial buildings that have lost their effectiveness and relevance, and the significant investor interest in these objects, the scope of redevelopment in construction remains unexplored. Analysis of research on the topic. The problem of reconstruction of industrial buildings is considered by Topchy D. [3] and Travin V. [9], as well as in the work of other scientists. However, regulatory documents contain no rules, and there are no systematic studies, for improving the organization of building reconstruction during repurposing. The purpose of this work is the development of an algorithm for selecting effective organizational decisions at the planning stage of a reprofiling project for industrial buildings. The proposed algorithm allows one to select an effective organizational and technological solution for the re-profiling of industrial buildings, taking into account the features of the building, its location, the state of its structures, and existing restrictions. The most effective organizational solution allows the reprofiling project of an industrial building to be realized in the shortest possible time and with the lowest possible use of material resources, taking into account the available features and restrictions. Conclusion. Each object has a number of unique features that must be considered when choosing an effective reprofiling variant. The developed algorithm for selecting

  6. Genetic Particle Swarm Optimization-Based Feature Selection for Very-High-Resolution Remotely Sensed Imagery Object Change Detection.

    Science.gov (United States)

    Chen, Qiang; Chen, Yunhao; Jiang, Weiguo

    2016-07-30

    In the field of multiple-feature Object-Based Change Detection (OBCD) for very-high-resolution remotely sensed images, image objects have abundant features, and feature selection affects the precision and efficiency of OBCD. Through object-based image analysis, this paper proposes a Genetic Particle Swarm Optimization (GPSO)-based feature selection algorithm to solve the optimization problem of feature selection in multiple-feature OBCD. We select the Ratio of Mean to Variance (RMV) as the fitness function of GPSO and apply the proposed algorithm to an object-based hybrid multivariate alteration detection model. Two experiments on Worldview-2/3 images confirm that GPSO can significantly improve the speed of convergence and effectively avoid the problem of premature convergence, relative to other feature selection algorithms. According to the accuracy evaluation of OBCD, GPSO attains higher overall accuracy (84.17% and 83.59%) and Kappa coefficients (0.6771 and 0.6314) than the other algorithms. Moreover, the sensitivity analysis results show that the proposed algorithm is not easily influenced by the initial parameters, but the number of features to be selected and the size of the particle swarm do affect the algorithm. The comparison experiment results reveal that RMV is more suitable than other functions as the fitness function of the GPSO-based feature selection algorithm.

  7. Condition monitoring of face milling tool using K-star algorithm and histogram features of vibration signal

    Directory of Open Access Journals (Sweden)

    C.K. Madhusudana

    2016-09-01

    Full Text Available This paper deals with fault diagnosis of a face milling tool based on a machine learning approach using histogram features and the K-star algorithm. Vibration signals of the milling tool under healthy and different fault conditions are acquired during machining of steel alloy 42CrMo4. Histogram features are extracted from the acquired signals. A decision tree is used to select the salient features out of all the extracted features, and these selected features are used as input to the classifier. The K-star algorithm is used as the classifier, and the output of the model is utilised to study and classify the different conditions of the face milling tool. Based on the experimental results, the K-star algorithm provided better classification accuracy, in the range of 94% to 96%, with histogram features, and is acceptable for fault diagnosis.
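Histogram features of a vibration signal are simply the normalized bin counts of its amplitude distribution. A minimal sketch, where the bin count, amplitude range, and the variance-ratio ranking (a simple stand-in for the paper's decision-tree-based salient-feature selection) are all assumptions:

```python
import numpy as np

def histogram_features(sig, bins=10, amp_range=(-1.0, 1.0)):
    # normalized bin counts of the signal amplitude form the feature vector
    counts, _ = np.histogram(sig, bins=bins, range=amp_range)
    return counts / len(sig)

def variance_ratio(F, y):
    # between-class / within-class variance per feature: higher = more salient
    classes = np.unique(y)
    means = np.array([F[y == c].mean(0) for c in classes])
    var_w = np.array([F[y == c].var(0) for c in classes]).mean(0) + 1e-12
    return ((means - F.mean(0)) ** 2).mean(0) / var_w
```

Signals from healthy and faulty conditions yield different amplitude histograms, and ranking the bins by this ratio picks out the ones that discriminate the conditions before they are passed to a classifier such as K-star.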

  8. NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms.

    Science.gov (United States)

    Ruyssinck, Joeri; Huynh-Thu, Vân Anh; Geurts, Pierre; Dhaene, Tom; Demeester, Piet; Saeys, Yvan

    2014-01-01

    One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.
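GENIE3's decomposition, one regression per target gene with all other genes as predictors and importance scores taken as edge evidence, can be sketched with any regressor that yields per-predictor scores. In this illustrative simplification, ridge-regression coefficient magnitudes stand in for the tree-ensemble importances of the actual algorithm:

```python
import numpy as np

def network_scores(E, alpha=1.0):
    """E: (samples, genes) expression matrix. Returns a (genes, genes) matrix
    S where S[i, j] is the putative influence of gene i on gene j. Ridge
    coefficient magnitudes replace GENIE3's tree-ensemble importances."""
    n, g = E.shape
    E = (E - E.mean(0)) / (E.std(0) + 1e-12)   # standardize each gene
    S = np.zeros((g, g))
    for j in range(g):                         # one regression per target gene
        preds = [i for i in range(g) if i != j]
        X, y = E[:, preds], E[:, j]
        w = np.linalg.solve(X.T @ X + alpha * np.eye(g - 1), X.T @ y)
        S[preds, j] = np.abs(w)
    return S
```

The ensemble variants discussed in the paper would wrap such a scorer in subsampling and average the resulting rankings; this sketch shows only the shared decomposition.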

  9. NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms.

    Directory of Open Access Journals (Sweden)

    Joeri Ruyssinck

    Full Text Available One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.

  10. Clonal Selection Based Memetic Algorithm for Job Shop Scheduling Problems

    Institute of Scientific and Technical Information of China (English)

    Jin-hui Yang; Liang Sun; Heow Pueh Lee; Yun Qian; Yan-chun Liang

    2008-01-01

    A clonal selection based memetic algorithm is proposed for solving job shop scheduling problems in this paper. In the proposed algorithm, the clonal selection and local search mechanisms are designed to enhance exploration and exploitation. In the clonal selection mechanism, clonal selection, hypermutation, and receptor editing theories are used to construct an evolutionary search mechanism for exploration. In the local search mechanism, a simulated annealing local search algorithm based on Nowicki and Smutnicki's neighborhood is used to exploit local optima. The proposed algorithm is examined using some well-known benchmark problems. Numerical results validate the effectiveness of the proposed algorithm.

  11. Feature Selection for Wheat Yield Prediction

    Science.gov (United States)

    Ruß, Georg; Kruse, Rudolf

    Carrying out effective and sustainable agriculture has become an important issue in recent years. Agricultural production has to keep up with an ever-increasing population by taking advantage of a field’s heterogeneity. Nowadays, modern technology such as the global positioning system (GPS) and a multitude of developed sensors enable farmers to better measure their fields’ heterogeneities. For this small-scale, precise treatment the term precision agriculture has been coined. However, the large amounts of data that are (literally) harvested during the growing season have to be analysed. In particular, the farmer is interested in knowing whether a newly developed heterogeneity sensor is potentially advantageous or not. Since the sensor data are readily available, this issue should be seen from an artificial intelligence perspective, where it can be treated as a feature selection problem. The additional task of yield prediction can be treated as a multi-dimensional regression problem. This article aims to present an approach towards solving these two practically important problems using artificial intelligence and data mining ideas and methodologies.

  12. Information Theory for Gabor Feature Selection for Face Recognition

    Directory of Open Access Journals (Sweden)

    Shen Linlin

    2006-01-01

    Full Text Available A discriminative and robust feature, the kernel enhanced informative Gabor feature, is proposed in this paper for face recognition. Mutual information is applied to select a set of informative and nonredundant Gabor features, which are then further enhanced by kernel methods for recognition. Compared with one of the top performing methods in the 2004 Face Verification Competition (FVC2004), our methods demonstrate a clear advantage over existing methods in accuracy, computation efficiency, and memory cost. The proposed method has been fully tested on the FERET database using the FERET evaluation protocol. Significant improvements on three of the test data sets are observed. Compared with the classical Gabor wavelet-based approaches using a huge number of features, our method requires less than 4 milliseconds to retrieve a few hundred features. Due to the substantially reduced feature dimension, only 4 seconds are required to recognize 200 face images. The paper also unifies different Gabor filter definitions and proposes a training sample generation algorithm to reduce the effects caused by an unbalanced number of samples available in different classes.

  13. Information Theory for Gabor Feature Selection for Face Recognition

    Science.gov (United States)

    Shen, Linlin; Bai, Li

    2006-12-01

    A discriminative and robust feature—kernel enhanced informative Gabor feature—is proposed in this paper for face recognition. Mutual information is applied to select a set of informative and nonredundant Gabor features, which are then further enhanced by kernel methods for recognition. Compared with one of the top performing methods in the 2004 Face Verification Competition (FVC2004), our methods demonstrate a clear advantage over existing methods in accuracy, computation efficiency, and memory cost. The proposed method has been fully tested on the FERET database using the FERET evaluation protocol. Significant improvements on three of the test data sets are observed. Compared with the classical Gabor wavelet-based approaches using a huge number of features, our method requires less than 4 milliseconds to retrieve a few hundred features. Due to the substantially reduced feature dimension, only 4 seconds are required to recognize 200 face images. The paper also unifies different Gabor filter definitions and proposes a training sample generation algorithm to reduce the effects caused by the unbalanced numbers of samples available in different classes.
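
    The mutual-information selection step described in the two records above can be sketched concretely. The following is an illustrative greedy selector, not the authors' implementation: each candidate feature is scored by its histogram-estimated mutual information with the class labels, penalised by its average mutual information with the features already chosen, in the spirit of max-relevance/min-redundancy. The bin count and the simple redundancy penalty are assumptions made for the sketch.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of the mutual information between feature x and labels y."""
    edges = np.histogram_bin_edges(x, bins=bins)[1:-1]
    xd = np.digitize(x, edges)                      # discretise x into `bins` cells
    labels = {c: i for i, c in enumerate(np.unique(y))}
    joint = np.zeros((bins, len(labels)))
    for xi, yi in zip(xd, y):
        joint[xi, labels[yi]] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def select_informative(X, y, k):
    """Greedy pick: maximise relevance to y, penalise redundancy with chosen features."""
    relevance = [mutual_information(X[:, j], y) for j in range(X.shape[1])]
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        scores = {j: relevance[j]
                     - np.mean([mutual_information(X[:, j], X[:, s]) for s in selected])
                  for j in range(X.shape[1]) if j not in selected}
        selected.append(max(scores, key=scores.get))
    return selected
```

    On synthetic data with one strongly informative feature, the selector picks that feature first and then adds the least redundant of the remaining candidates.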

  14. Recursive Feature Selection with Significant Variables of Support Vectors

    Directory of Open Access Journals (Sweden)

    Chen-An Tsai

    2012-01-01

    Full Text Available The development of DNA microarrays lets researchers screen thousands of genes simultaneously and helps determine high- and low-expression level genes in normal and disease tissues. Selecting relevant genes for cancer classification is an important issue. Most gene selection methods use univariate ranking criteria and arbitrarily choose a threshold to select genes. However, the parameter setting may not be compatible with the selected classification algorithms. In this paper, we propose a new gene selection method (SVM-t) based on the use of t-statistics embedded in a support vector machine. We compared its performance with two similar SVM-based methods: SVM recursive feature elimination (SVMRFE) and recursive support vector machine (RSVM). The three methods were compared based on extensive simulation experiments and analyses of two published microarray datasets. In the simulation experiments, we found that the proposed method is more robust in selecting informative genes than SVMRFE and RSVM and capable of attaining good classification performance when the variations of informative and noninformative genes are different. In the analysis of the two microarray datasets, the proposed method yields better performance in identifying fewer genes with good prediction accuracy, compared to SVMRFE and RSVM.
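
    A minimal sketch of the SVM-RFE baseline that this record compares against (not the proposed SVM-t method) can be written as follows: a linear SVM is trained by hinge-loss subgradient descent, and each round the feature with the smallest weight magnitude is eliminated. The learning rate, regularisation constant and epoch count are illustrative assumptions.

```python
import numpy as np

def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=200):
    """Crude linear SVM: subgradient descent on the regularised hinge loss; y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        viol = y * (X @ w) < 1                      # margin-violating samples
        grad = lam * w
        if viol.any():
            grad = grad - (y[viol, None] * X[viol]).mean(axis=0)
        w -= lr * grad
    return w

def svm_rfe(X, y, n_keep):
    """Recursively eliminate the feature with the smallest trained weight magnitude."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        w = train_linear_svm(X[:, remaining], y)
        remaining.pop(int(np.argmin(np.abs(w))))
    return remaining
```

    With one shifted (informative) feature among noise, the recursion discards the noise features first and retains the informative one.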

  15. The Calibration Algorithm of a 3D Color Measurement System based on the Line Feature

    Directory of Open Access Journals (Sweden)

    Ganhua Li

    2009-10-01

    Full Text Available This paper describes a novel three-dimensional color measurement system. After three kinds of geometrical features were analysed, line features were selected. A calibration board with a right-angled triangular outline was designed to improve the calibration precision. Two algorithms are presented for this system: one calibrates the two-dimensional laser range finder (2D LRF), while the other calibrates the 2D LRF against the color camera. The resulting parameters were obtained by solving the constraint equations built from the corresponding data between the 2D LRF and the other two sensors. 3D color reconstruction experiments on real data demonstrate the effectiveness and efficiency of the system and the algorithms.

  16. Selecting Optimal Feature Set in High-Dimensional Data by Swarm Search

    Directory of Open Access Journals (Sweden)

    Simon Fong

    2013-01-01

    Full Text Available Selecting the right set of features from data of high dimensionality for inducing an accurate classification model is a tough computational challenge. It is almost an NP-hard problem, as the number of feature combinations escalates exponentially with the number of features. Unfortunately, in data mining, as well as in other engineering applications and bioinformatics, some data are described by a long array of features. Many feature subset selection algorithms have been proposed in the past, but not all of them are effective. Since it takes seemingly forever to try every possible combination of features by brute force, stochastic optimization may be a solution. In this paper, we propose a new feature selection scheme called Swarm Search to find an optimal feature set by using metaheuristics. The advantage of Swarm Search is its flexibility in integrating any classifier into its fitness function and plugging in any metaheuristic algorithm to facilitate heuristic search. Simulation experiments are carried out by testing Swarm Search over some high-dimensional datasets, with different classification algorithms and various metaheuristic algorithms. The comparative experiment results show that Swarm Search is able to attain relatively low error rates in classification without shrinking the size of the feature subset to its minimum.
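
    The wrapper principle behind Swarm Search, a metaheuristic proposing candidate feature masks that a plugged-in classifier scores, can be illustrated with a deliberately simplified binary swarm and a nearest-centroid fitness function. The particle update rule below is a toy stand-in rather than any specific metaheuristic from the paper.

```python
import numpy as np

def fitness(mask, X, y):
    """Wrapper fitness: training accuracy of a nearest-centroid classifier on masked features."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    centroids = {c: Xs[y == c].mean(axis=0) for c in np.unique(y)}
    preds = [min(centroids, key=lambda c: np.linalg.norm(row - centroids[c])) for row in Xs]
    return float(np.mean(np.array(preds) == y))

def swarm_search(X, y, n_particles=10, iters=20, seed=0):
    """Minimal binary swarm: particles copy bits from the global best and randomly explore."""
    rng = np.random.default_rng(seed)
    swarm = rng.random((n_particles, X.shape[1])) < 0.5
    best = max(swarm, key=lambda m: fitness(m, X, y)).copy()
    best_fit = fitness(best, X, y)
    for _ in range(iters):
        for p in swarm:
            pull = rng.random(X.shape[1]) < 0.5     # attraction toward the global best
            p[pull] = best[pull]
            flip = rng.random(X.shape[1]) < 0.1     # random exploration
            p[flip] = ~p[flip]
            f = fitness(p, X, y)
            if f > best_fit:
                best, best_fit = p.copy(), f
    return best, best_fit
```

    Any classifier could replace the nearest-centroid scorer inside `fitness`, which is the flexibility the record highlights.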

  17. Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach

    Directory of Open Access Journals (Sweden)

    Daniel Peralta

    2015-01-01

    Full Text Available Nowadays, many disciplines have to deal with big datasets that additionally involve a high number of features. Feature selection methods aim at eliminating noisy, redundant, or irrelevant features that may deteriorate the classification performance. However, traditional methods lack enough scalability to cope with datasets of millions of instances and extract successful results in a delimited time. This paper presents a feature selection algorithm based on evolutionary computation that uses the MapReduce paradigm to obtain subsets of features from big datasets. The algorithm decomposes the original dataset into blocks of instances to learn from them in the map phase; then, the reduce phase merges the obtained partial results into a final vector of feature weights, which allows a flexible application of the feature selection procedure using a threshold to determine the selected subset of features. The feature selection method is evaluated by using three well-known classifiers (SVM, Logistic Regression, and Naive Bayes) implemented within the Spark framework to address big data problems. In the experiments, datasets of up to 67 million instances and up to 2000 attributes have been managed, showing that this is a suitable framework to perform evolutionary feature selection, improving both the classification accuracy and its runtime when dealing with big data problems.
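
    The map/reduce decomposition the record describes can be sketched in miniature: each block of instances is mapped to a partial feature-weight vector, the reduce step merges the partial vectors, and a threshold on the merged weights selects the final subset. The per-block scorer below is a simple class-mean-difference stand-in for the paper's evolutionary search, and the threshold value is an assumption.

```python
import numpy as np
from functools import reduce

def map_block(block):
    """Map-phase stand-in: score each feature on one block of instances.
    (The paper evolves feature weights with evolutionary computation per block;
    a class-mean-difference scorer plays that role here.)"""
    X, y = block
    return np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))

def reduce_weights(w1, w2):
    """Reduce phase: merge partial weight vectors by summing."""
    return w1 + w2

def mapreduce_feature_selection(X, y, n_blocks=4, threshold=0.5, seed=0):
    """Split into blocks, map to partial weights, reduce, then threshold."""
    idx = np.random.default_rng(seed).permutation(len(y))
    blocks = [(X[idx[i::n_blocks]], y[idx[i::n_blocks]]) for i in range(n_blocks)]
    weights = reduce(reduce_weights, (map_block(b) for b in blocks)) / n_blocks
    return weights >= threshold * weights.max()
```

    The threshold on the merged weight vector is what gives the procedure its flexibility: the same weights support subsets of different sizes.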

  18. A simulation of remote sensor systems and data processing algorithms for spectral feature classification

    Science.gov (United States)

    Arduini, R. F.; Aherron, R. M.; Samms, R. W.

    1984-01-01

    A computational model of the deterministic and stochastic processes involved in multispectral remote sensing was designed to evaluate the performance of sensor systems and data processing algorithms for spectral feature classification. Accuracy in distinguishing between categories of surfaces or between specific types is developed as a means to compare sensor systems and data processing algorithms. The model allows studies to be made of the effects of variability of the atmosphere and of surface reflectance, as well as the effects of channel selection and sensor noise. Examples of these effects are shown.

  19. Optimal Genetic View Selection Algorithm for Data Warehouse

    Institute of Scientific and Technical Information of China (English)

    Wang Ziqiang; Feng Boqin

    2005-01-01

    To efficiently solve the materialized view selection problem, a genetic algorithm for selecting a set of views to be materialized is proposed, so as to achieve both good query performance and low view maintenance cost under a storage space constraint. First, a pre-processing algorithm based on the maximum benefit per unit space is used to generate initial solutions. Then, the initial solutions are improved by a genetic algorithm incorporating a mixture of optimal strategies. Furthermore, infeasible solutions generated during the evolution process are repaired by a loss function. The experimental results show that the proposed algorithm outperforms the heuristic algorithm and the canonical genetic algorithm in finding optimal solutions.

  20. Improving the performance of the Ripper in insurance risk classification : A comparative study using feature selection

    CERN Document Server

    Duma, Mlungisi; Marwala, Tshilidzi

    2011-01-01

    The Ripper algorithm is designed to generate rule sets for large datasets with many features. However, it has been shown that the algorithm struggles with classification performance in the presence of missing data: it struggles to classify instances as the quality of the data deteriorates with increasing missing data. In this paper, a feature selection technique is used to help improve the classification performance of the Ripper model. Principal component analysis and evidence automatic relevance determination techniques are used to improve the performance, and a comparison is done to see which technique helps the algorithm improve the most. Training datasets with completely observable data were used to construct the model, and testing datasets with missing values were used for measuring accuracy. The results showed that principal component analysis is the better feature selection technique for improving the Ripper's classification performance.

  1. Efficient Generation and Selection of Combined Features for Improved Classification

    KAUST Repository

    Shono, Ahmad N.

    2014-05-01

    This study contributes a methodology and associated toolkit developed to allow users to experiment with the use of combined features in classification problems. Methods are provided for efficiently generating combined features from an original feature set, for efficiently selecting the most discriminating of these generated combined features, and for efficiently performing a preliminary comparison of the classification results when using the original features exclusively against the results when using the selected combined features. The potential benefit of considering combined features in classification problems is demonstrated by applying the developed methodology and toolkit to three sample data sets where the discovery of combined features containing new discriminating information led to improved classification results.

  2. Biometric hashing for handwriting: entropy-based feature selection and semantic fusion

    Science.gov (United States)

    Scheidat, Tobias; Vielhauer, Claus

    2008-02-01

    Some biometric algorithms suffer from the problem of using a great number of features extracted from the raw data. This often results in feature vectors of high dimensionality and thus high computational complexity. However, in many cases subsets of these features contribute little or nothing to the correct classification of biometric algorithms. The process of choosing more discriminative features from a given set is commonly referred to as feature selection. In this paper we present a study on feature selection for an existing biometric hash generation algorithm for the handwriting modality, based on the strategy of entropy analysis of the single components of biometric hash vectors, in order to identify and suppress elements carrying little information. To evaluate the impact of our feature selection scheme on the authentication performance of our biometric algorithm, we present an experimental study based on data from 86 users. Besides discussing common biometric error rates such as Equal Error Rates, we suggest a novel measurement to determine the reproduction rate probability for biometric hashes. Our experiments show that, while the feature set size may be significantly reduced by 45% using our scheme, there are only marginal changes both in the results of a verification process and in the reproducibility of biometric hashes. Since multi-biometrics is a recent topic, we additionally carry out a first study on pairwise multi-semantic fusion based on reduced hashes and analyze it by the introduced reproducibility measure.
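
    The entropy-analysis strategy the record describes, identifying hash-vector components that carry little information across users, can be sketched as follows. The keep ratio mirrors the reported 45% reduction, but the scoring and pruning details are assumptions, not the authors' exact procedure.

```python
import numpy as np

def component_entropies(hashes):
    """Shannon entropy (bits) of each biometric-hash component across users."""
    ents = []
    for col in hashes.T:
        _, counts = np.unique(col, return_counts=True)
        p = counts / counts.sum()
        ents.append(float(-(p * np.log2(p)).sum()))
    return np.array(ents)

def prune_low_entropy(hashes, keep_ratio=0.55):
    """Keep only the highest-entropy components (constant components score zero)."""
    ents = component_entropies(hashes)
    k = max(1, int(round(keep_ratio * hashes.shape[1])))
    keep = np.argsort(ents)[::-1][:k]
    return np.sort(keep)
```

    A component that takes the same value for every user has zero entropy and is pruned first; a component that varies freely across users scores highest.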

  3. Feature selection and classification methodology for the detection of knee-joint disorders.

    Science.gov (United States)

    Nalband, Saif; Sundar, Aditya; Prince, A Amalin; Agarwal, Anita

    2016-04-01

    Vibroarthrographic (VAG) signals emitted from a knee joint with a disorder provide an early diagnostic tool. The nonstationary and nonlinear nature of the VAG signal makes feature extraction an important aspect. In this work, we investigate VAG signals by proposing a wavelet-based decomposition. The VAG signals are decomposed into sub-band signals of different frequencies. Nonlinear features such as recurrence quantification analysis (RQA), approximate entropy (ApEn) and sample entropy (SampEn) are extracted as features of the VAG signal. A total of twenty-four features form a vector to characterize a VAG signal. Two feature selection (FS) techniques, the Apriori algorithm and a genetic algorithm (GA), select six and four features, respectively, as the most significant. Least squares support vector machines (LS-SVM) and random forests are proposed as classifiers to evaluate the performance of the FS techniques. Results indicate that classification accuracy was higher with the features selected by the FS algorithms. LS-SVM using the Apriori algorithm gives the highest accuracy of 94.31% with a false discovery rate (FDR) of 0.0892. The proposed work also provided better classification accuracy than previous studies, which reported an accuracy of 88%. This work can enhance the performance of existing technology for accurately distinguishing normal and abnormal VAG signals, and the proposed methodology could provide an effective non-invasive diagnostic tool for knee joint disorders.

  4. A Hybrid Intelligent Algorithm for Optimal Birandom Portfolio Selection Problems

    Directory of Open Access Journals (Sweden)

    Qi Li

    2014-01-01

    Full Text Available Birandom portfolio selection problems have been well developed and widely applied in recent years. To solve these problems better, this paper designs a new hybrid intelligent algorithm which combines the improved LGMS-FOA algorithm with birandom simulation. Since all the existing algorithms solving these problems are based on genetic algorithm and birandom simulation, some comparisons between the new hybrid intelligent algorithm and the existing algorithms are given in terms of numerical experiments, which demonstrate that the new hybrid intelligent algorithm is more effective and precise when the numbers of the objective function computations are the same.

  5. Parameters selection in gene selection using Gaussian kernel support vector machines by genetic algorithm

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    In microarray-based cancer classification, gene selection is an important issue owing to the large number of variables, the small number of samples, and the non-linearity of the problem. It is difficult to get satisfying results using conventional linear statistical methods. Recursive feature elimination based on support vector machines (SVM RFE) is an effective algorithm in which gene selection and cancer classification are integrated into a consistent framework. In this paper, we propose a new method to select the parameters of this algorithm implemented with Gaussian-kernel SVMs: instead of the common practice of selecting the apparently best parameters, a genetic algorithm is used to search for a pair of optimal parameters. Fast implementation issues for this method are also discussed for pragmatic reasons. The proposed method was tested on two representative datasets, hereditary breast cancer and acute leukaemia. The experimental results indicate that the proposed method performs well in selecting genes and achieves high classification accuracies with these genes.
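
    A toy version of the genetic search over the Gaussian-kernel parameter pair (C, sigma) might look like the following. The fitness function is a synthetic bowl standing in for cross-validation error, since training a real SVM would not fit in a short sketch; the log-space crossover and mutation operators are likewise illustrative assumptions.

```python
import math
import random

def cv_error(C, sigma):
    """Synthetic stand-in for the cross-validation error of a Gaussian-kernel SVM,
    with an optimum near C = 10, sigma = 0.5 (a real run would train the SVM here)."""
    return (math.log10(C) - 1.0) ** 2 + (math.log10(sigma) + 0.3) ** 2

def ga_search(pop_size=20, generations=30, seed=0):
    """Toy genetic algorithm over the (C, sigma) pair, working in log space."""
    rng = random.Random(seed)
    pop = [(10 ** rng.uniform(-2, 3), 10 ** rng.uniform(-3, 2))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: cv_error(*p))
        survivors = pop[:pop_size // 2]                          # elitist truncation
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            C = (a[0] * b[0]) ** 0.5 * 10 ** rng.gauss(0, 0.1)   # geometric crossover
            s = (a[1] * b[1]) ** 0.5 * 10 ** rng.gauss(0, 0.1)   # plus log-space mutation
            children.append((C, s))
        pop = survivors + children
    return min(pop, key=lambda p: cv_error(*p))
```

    Searching in log space matters here because sensible values of C and sigma span several orders of magnitude.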

  6. A Feature Selection Approach Based on Interclass and Intraclass Relative Contributions of Terms.

    Science.gov (United States)

    Zhou, Hongfang; Guo, Jie; Wang, Yinghui; Zhao, Minghua

    2016-01-01

    Feature selection plays a critical role in text categorization. During feature selection, high-frequency terms and the interclass and intraclass relative contributions of terms all have significant effects on classification results. So in this paper we put forward a feature selection approach, IIRCT, based on the interclass and intraclass relative contributions of terms. In our proposed algorithm, three critical factors, namely term frequency and the interclass and intraclass relative contributions of terms, are all considered synthetically. Finally, experiments are conducted with a kNN classifier. The corresponding results on the 20 NewsGroup and SougouCS corpora show that the IIRCT algorithm achieves better performance than the DF, t-Test, and CMFS algorithms.

  7. Principal Feature Analysis: A Multivariate Feature Selection Method for fMRI Data

    Directory of Open Access Journals (Sweden)

    Lijun Wang

    2013-01-01

    Full Text Available Brain decoding with functional magnetic resonance imaging (fMRI) requires analysis of complex, multivariate data. Multivoxel pattern analysis (MVPA) has been widely used in recent years. MVPA treats the activation of multiple voxels from fMRI data as a pattern and decodes brain states using pattern classification methods. Feature selection is a critical procedure of MVPA because it decides which features will be included in the classification analysis of fMRI data, thereby improving the performance of the classifier. Features can be selected by limiting the analysis to specific anatomical regions or by computing univariate (voxel-wise) or multivariate statistics. However, these methods either discard some informative features or select features with redundant information. This paper introduces principal feature analysis as a novel multivariate feature selection method for fMRI data processing. This multivariate approach aims to remove features with redundant information, thereby selecting fewer features while retaining the most information.

  8. Generalized Discriminant Analysis algorithm for feature reduction in Cyber Attack Detection System

    Directory of Open Access Journals (Sweden)

    Shailendra Singh

    2009-10-01

    Full Text Available Generalized Discriminant Analysis (GDA) provides an extremely powerful approach to extracting non-linear features. The network traffic data provided for the design of intrusion detection systems are always large and contain ineffective information, so we need to remove the worthless information from the original high-dimensional database. To improve the generalization ability, we usually generate a small set of features from the original input variables by feature extraction. The conventional Linear Discriminant Analysis (LDA) feature reduction technique has its limitations: it is not suitable for non-linear datasets. Thus we propose an efficient algorithm based on the GDA feature reduction technique, a novel approach in the area of cyber attack detection. This not only reduces the number of input features but also increases classification accuracy and reduces the training and testing time of the classifiers by selecting the most discriminating features. We use Artificial Neural Network (ANN) and C4.5 classifiers to compare the performance of the proposed technique. The results indicate the superiority of the algorithm.

  9. Image Recommendation Algorithm Using Feature-Based Collaborative Filtering

    Science.gov (United States)

    Kim, Deok-Hwan

    As the multimedia contents market continues its rapid expansion, the amount of image contents used in mobile phone services, digital libraries, and catalog service is increasing remarkably. In spite of this rapid growth, users experience high levels of frustration when searching for the desired image. Even though new images are profitable to the service providers, traditional collaborative filtering methods cannot recommend them. To solve this problem, in this paper, we propose feature-based collaborative filtering (FBCF) method to reflect the user's most recent preference by representing his purchase sequence in the visual feature space. The proposed approach represents the images that have been purchased in the past as the feature clusters in the multi-dimensional feature space and then selects neighbors by using an inter-cluster distance function between their feature clusters. Various experiments using real image data demonstrate that the proposed approach provides a higher quality recommendation and better performance than do typical collaborative filtering and content-based filtering techniques.

  10. Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources.

    Science.gov (United States)

    Yu, Sheng; Liao, Katherine P; Shaw, Stanley Y; Gainer, Vivian S; Churchill, Susanne E; Szolovits, Peter; Murphy, Shawn N; Kohane, Isaac S; Cai, Tianxi

    2015-09-01

    Analysis of narrative (text) data from electronic health records (EHRs) can improve population-scale phenotyping for clinical and genetic research. Currently, selection of text features for phenotyping algorithms is slow and laborious, requiring extensive and iterative involvement by domain experts. This paper introduces a method to develop phenotyping algorithms in an unbiased manner by automatically extracting and selecting informative features, which can be comparable to expert-curated ones in classification accuracy. Comprehensive medical concepts were collected from publicly available knowledge sources in an automated, unbiased fashion. Natural language processing (NLP) revealed the occurrence patterns of these concepts in EHR narrative notes, which enabled selection of informative features for phenotype classification. When combined with additional codified features, a penalized logistic regression model was trained to classify the target phenotype. The authors applied this method to develop algorithms to identify rheumatoid arthritis (RA) patients and, among them, coronary artery disease (CAD) cases from a large multi-institutional EHR. The area under the receiver operating characteristic curves (AUC) for classifying RA and CAD using models trained with automated features were 0.951 and 0.929, respectively, compared to AUCs of 0.938 and 0.929 for models trained with expert-curated features. Models trained with NLP text features selected through an unbiased, automated procedure achieved comparable or slightly higher accuracy than those trained with expert-curated features. The majority of the selected model features were interpretable. The proposed automated feature extraction method, generating highly accurate phenotyping algorithms with improved efficiency, is a significant step toward high-throughput phenotyping. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.
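
    The modelling step the record mentions, a penalized logistic regression over the selected features, can be sketched generically. The L2 penalty and plain gradient descent below are assumptions for illustration; the abstract does not specify the penalty or optimizer.

```python
import numpy as np

def penalized_logistic(X, y, lam=0.1, lr=0.1, epochs=500):
    """L2-penalized logistic regression fitted by batch gradient descent; y in {0, 1}."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))          # predicted probabilities
        grad = X.T @ (p - y) / len(y) + lam * w     # NLL gradient plus ridge penalty
        w -= lr * grad
    return w
```

    The penalty shrinks the weights of uninformative features toward zero, which is what lets such a model cope with a large automatically extracted feature set.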

  11. Performance Evaluation of Content Based Image Retrieval on Feature Optimization and Selection Using Swarm Intelligence

    Directory of Open Access Journals (Sweden)

    Kirti Jain

    2016-03-01

    Full Text Available The diversity and applicability of swarm intelligence are increasing every day in the fields of science and engineering. Swarm intelligence provides a dynamic approach to feature optimization. We have used swarm intelligence for the process of feature optimization and feature selection in content-based image retrieval. The performance of content-based image retrieval is limited by precision and recall, whose values depend on the retrieval capacity of the image. The basic raw image content has visual features such as color, texture, shape and size. The partial feature extraction technique is based on a geometric invariant function. Three swarm intelligence algorithms were used for the optimization of features: ant colony optimization, particle swarm optimization (PSO), and the glowworm swarm optimization algorithm. The Corel image dataset and MATLAB were used for evaluating performance.

  12. A curriculum-based approach for feature selection

    Science.gov (United States)

    Kalavala, Deepthi; Bhagvati, Chakravarthy

    2017-06-01

    Curriculum learning is a learning technique in which a classifier learns from easy samples first and then from increasingly difficult samples. Along similar lines, a curriculum-based feature selection framework is proposed for identifying the most useful features in a dataset. Given a dataset, easy and difficult samples are first identified. In general, the number of easy samples is assumed to be larger than the number of difficult samples. Then, feature selection is done in two stages. In the first stage, a fast feature selection method that produces feature scores is used. The feature scores are then updated incrementally with the set of difficult samples. Existing feature selection methods are not incremental in nature; the entire dataset needs to be used in feature selection. The use of curriculum learning is expected to decrease the time needed for feature selection, with classification accuracy comparable to existing methods. Curriculum learning also allows incremental refinements in feature selection as new training samples become available. Our experiments on a number of standard datasets demonstrate that feature selection is indeed faster without sacrificing classification accuracy.
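
    The two-stage scheme described above, scoring features quickly on the easy samples and then refining the scores incrementally with the difficult ones, might be sketched as follows. The class-mean-difference filter stands in for the unspecified fast scorer, and the weighted update is an assumed form of the incremental step.

```python
import numpy as np

def feature_scores(X, y):
    """Fast filter score: absolute difference of per-class feature means; y in {0, 1}."""
    return np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))

def curriculum_select(X_easy, y_easy, X_hard, y_hard, k, alpha=0.5):
    """Stage 1: score on the easy samples. Stage 2: blend in scores from the hard ones."""
    scores = feature_scores(X_easy, y_easy)
    scores = (1 - alpha) * scores + alpha * feature_scores(X_hard, y_hard)
    return list(np.argsort(scores)[::-1][:k])
```

    Because the second stage only blends new scores into existing ones, additional difficult samples can be folded in later without rescoring the whole dataset.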

  13. Genetic Bee Colony (GBC) algorithm: A new gene selection method for microarray cancer classification.

    Science.gov (United States)

    Alshamlan, Hala M; Badr, Ghada H; Alohali, Yousef A

    2015-06-01

    Naturally inspired evolutionary algorithms have proven effective for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, the Genetic Bee Colony (GBC) algorithm. The proposed algorithm combines the use of a Genetic Algorithm (GA) with the Artificial Bee Colony (ABC) algorithm, with the goal of integrating the advantages of both. The proposed algorithm is applied to microarray gene expression profiles in order to select the most predictive and informative genes for cancer classification. To test the accuracy of the proposed algorithm, extensive experiments were conducted. Three binary microarray datasets are used: colon, leukemia, and lung. In addition, three multi-class microarray datasets are used: SRBCT, lymphoma, and leukemia. Results of the GBC algorithm are compared with our recently proposed technique, mRMR combined with the Artificial Bee Colony algorithm (mRMR-ABC). We also compared combinations of mRMR with GA (mRMR-GA) and with Particle Swarm Optimization (mRMR-PSO). In addition, we compared the GBC algorithm with other related algorithms recently published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance, achieving the highest classification accuracy along with the lowest average number of selected genes. This proves that the GBC algorithm is a promising approach for solving the gene selection problem in both binary and multi-class cancer classification.

  14. A feature selection method based on multiple kernel learning with expression profiles of different types.

    Science.gov (United States)

    Du, Wei; Cao, Zhongbo; Song, Tianci; Li, Ying; Liang, Yanchun

    2017-01-01

    With the development of high-throughput technology, researchers can acquire large amounts of expression data of different types from several public databases. Because most of these datasets have small numbers of samples and hundreds or thousands of features, how to extract informative features effectively and robustly using feature selection techniques is a challenging and crucial task. So far, many feature selection approaches have been proposed and applied to analyse expression data of different types. However, most of these methods are limited to measuring performance on a single type of expression data, by classification accuracy or error rate. In this article, we propose a hybrid feature selection method based on Multiple Kernel Learning (MKL) and evaluate its performance on expression datasets of different types. First, the relevance between features and classifying samples is measured using the optimizing function of MKL. In this step, an iterative gradient descent process optimizes both the parameters of the Support Vector Machine (SVM) and the kernel confidence. Then, a set of relevant features is selected by sorting the optimizing function of each feature. Furthermore, we apply an embedded forward selection scheme to detect compact feature subsets from the relevant feature set. We compare not only classification accuracy with other methods, but also the stability, similarity and consistency of different algorithms. The proposed method has a satisfactory capability of feature selection for analysing expression datasets of different types under different performance measurements.

  15. Unbiased Feature Selection in Learning Random Forests for High-Dimensional Data

    Directory of Open Access Journals (Sweden)

    Thanh-Tung Nguyen

    2015-01-01

    Full Text Available Random forests (RFs) have been widely used as a powerful classification method. However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting. This gives RFs poor accuracy when working with high-dimensional data. Besides that, RFs are biased in the feature selection process, favoring multivalued features. Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features in learning RFs for high-dimensional data. We first remove the uninformative features using p-value assessment, and the subset of unbiased features is then selected based on some statistical measures. This feature subset is then partitioned into two subsets. A feature weighting sampling technique is used to sample features from these two subsets for building trees. This approach enables one to generate more accurate trees while reducing dimensionality and the amount of data needed for learning RFs. An extensive set of experiments has been conducted on 47 high-dimensional real-world datasets, including image datasets. The experimental results show that RFs with the proposed approach outperform existing random forests in both accuracy and AUC.

  16. Unbiased feature selection in learning random forests for high-dimensional data.

    Science.gov (United States)

    Nguyen, Thanh-Tung; Huang, Joshua Zhexue; Nguyen, Thuy Thi

    2015-01-01

    Random forests (RFs) have been widely used as a powerful classification method. However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting. This gives RFs poor accuracy when working with high-dimensional data. Besides that, RFs are biased in the feature selection process, favoring multivalued features. Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features in learning RFs for high-dimensional data. We first remove the uninformative features using p-value assessment, and the subset of unbiased features is then selected based on some statistical measures. This feature subset is then partitioned into two subsets. A feature weighting sampling technique is used to sample features from these two subsets for building trees. This approach enables one to generate more accurate trees while reducing dimensionality and the amount of data needed for learning RFs. An extensive set of experiments has been conducted on 47 high-dimensional real-world datasets, including image datasets. The experimental results show that RFs with the proposed approach outperform existing random forests in both accuracy and AUC.
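
    The first xRF step, removing uninformative features by p-value assessment, can be illustrated per feature with a Welch t statistic and a normal-approximation two-sided p-value. The abstract does not name the exact test, so this choice is an assumption.

```python
import numpy as np
from math import erfc, sqrt

def two_sample_p(x0, x1):
    """Welch t statistic with a normal-approximation two-sided p-value."""
    se = sqrt(x0.var(ddof=1) / len(x0) + x1.var(ddof=1) / len(x1))
    z = abs(x0.mean() - x1.mean()) / se
    return erfc(z / sqrt(2))                        # = 2 * (1 - Phi(z))

def filter_uninformative(X, y, alpha=0.05):
    """Keep only features whose class-difference p-value falls below alpha."""
    return [j for j in range(X.shape[1])
            if two_sample_p(X[y == 0, j], X[y == 1, j]) < alpha]
```

    The normal approximation is reasonable for the large per-class sample sizes typical of this setting; an exact t distribution would be preferable for small samples.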

  17. Feature selection with neighborhood entropy-based cooperative game theory.

    Science.gov (United States)

    Zeng, Kai; She, Kun; Niu, Xinzheng

    2014-01-01

    Feature selection plays an important role in machine learning and data mining. In recent years, various feature measurements have been proposed to select significant features from high-dimensional datasets. However, most traditional feature selection methods ignore features that have strong classification ability as a group but are weak as individuals. To deal with this problem, we redefine the redundancy, interdependence, and independence of features by using neighborhood entropy. Then the neighborhood entropy-based feature contribution is proposed under the framework of cooperative game theory. The evaluative criteria of features can be formalized as the product of this contribution and other classical feature measures. Finally, the proposed method is tested on several UCI datasets. The results show that the neighborhood entropy-based cooperative game theory model (NECGT) yields better performance than classical ones.

  18. Clone Selection Algorithm with Niching Strategy for Computer Immune System

    Institute of Scientific and Technical Information of China (English)

    张雅静; 侯朝桢; 薛阳

    2004-01-01

    A clone selection algorithm for a computer immune system is presented. Clone selection principles of the biological immune system are applied to the domain of computer virus detection. Building on the negative selection algorithm proposed by Stephanie Forrest, the mutation operator of genetic algorithms is combined with a niching strategy from biology; the number of detectors is decreased effectively and the ability of self-nonself discrimination is improved. Simulation experiments show that the algorithm is simple, practical and well suited to discriminating long files.

  19. Optimal Feature Extraction for Discriminating Raman Spectra of Different Skin Samples using Statistical Methods and Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Zohreh Dehghani Bidgoli

    2011-06-01

    Full Text Available Introduction: Raman spectroscopy, a spectroscopic technique based on inelastic scattering of monochromatic light, can provide valuable information about molecular vibrations, so this technique can be used to study molecular changes in a sample. Material and Methods: In this research, 153 Raman spectra were obtained from normal and dried skin samples. Baseline and electrical noise were eliminated in the preprocessing stage, with subsequent normalization of the Raman spectra. Then, using statistical analysis and a genetic algorithm, optimal features for discrimination between these two classes were sought. In the statistical analysis, the T test, Bhattacharyya distance and entropy between the two classes were calculated as criteria for choosing optimal features. Since the T test discriminated the two classes best, it was used for selecting the best features. A genetic algorithm was also used for selecting optimal features; finally, using these selected features and classifiers such as LDA, KNN, SVM and a neural network, the two classes were discriminated. Results: Comparing the classifier results under the various feature selection strategies, the best results were obtained by combining the genetic algorithm for feature selection with SVM for classification. Using this combination, we could discriminate normal and dried skin samples with an accuracy of 90%, sensitivity of 89% and specificity of 91%. Discussion and Conclusion: According to the obtained results, we can conclude that the genetic algorithm demonstrates better performance than statistical analysis in selecting discriminating features of Raman spectra. In addition, the results illustrate the potential of Raman spectroscopy in studying the effects of different materials on skin and skin diseases related to skin dehydration.

  20. Possibility Study of Scale Invariant Feature Transform (SIFT) Algorithm Application to Spine Magnetic Resonance Imaging.

    Science.gov (United States)

    Lee, Dong-Hoon; Lee, Do-Wan; Han, Bong-Soo

    2016-01-01

    The purpose of this study is to apply the scale invariant feature transform (SIFT) algorithm to stitch cervical-thoracic-lumbar (C-T-L) spine magnetic resonance (MR) images and provide a view of the entire spine in a single image. All MR images were acquired with a fast spin echo (FSE) pulse sequence using two MR scanners (1.5 T and 3.0 T). The stitching procedures for each part of the spine MR image were implemented on a graphic user interface (GUI) configuration. The stitching process was performed in two modes: manual point-to-point (mPTP) selection, in which the user specifies corresponding matching points, and automated point-to-point (aPTP) selection, performed by the SIFT algorithm. The images stitched using the SIFT algorithm showed finely registered results, and the quantitative measurements indicated small errors compared with the stitching algorithms commercially mounted in MRI systems. Our study presented a preliminary validation of applying the SIFT algorithm to spine MR images, and the results indicated that the proposed approach can perform well for the improvement of diagnosis. We believe that our approach can be helpful for clinical application and can extend to image stitching in other medical imaging modalities.

  1. A HYBRID FILTER AND WRAPPER FEATURE SELECTION APPROACH FOR DETECTING CONTAMINATION IN DRINKING WATER MANAGEMENT SYSTEM

    Directory of Open Access Journals (Sweden)

    S. VISALAKSHI

    2017-07-01

    Full Text Available Feature selection is an important task in predictive models which helps to identify the irrelevant features in a high-dimensional dataset. For this water contamination detection dataset, a standard wrapper algorithm alone cannot be applied because of the complexity. To overcome this computational complexity and make the process lighter, a filter-wrapper based algorithm has been proposed. In this work, reducing the feature space is a significant component of water contamination detection. The main findings are as follows: (1) The main goal is speeding up the feature selection process, so the proposed filter-based feature pre-selection is applied and guarantees that useful data are unlikely to be discarded in the initial stage, which is discussed briefly in this paper. (2) The resulting features are again filtered by a genetic algorithm coded with the Support Vector Machine method, which helps narrow down the subset of features with high accuracy and decreases the expense. Experimental results show that the proposed methods trim down redundant features effectively and achieve better classification accuracy.
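The second, wrapper stage (a genetic algorithm scored by an SVM) can be sketched roughly as below. To keep the sketch dependency-free, a nearest-centroid training accuracy stands in for the SVM fitness; `ga_select`, the truncation selection, and the per-feature penalty are illustrative choices, not the paper's configuration:

```python
import numpy as np

def fitness(mask, X, y):
    """Stand-in wrapper fitness: accuracy of a nearest-centroid rule on the
    selected features, minus a small per-feature penalty. The paper scores
    subsets with an SVM; nearest-centroid keeps this sketch self-contained."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask.astype(bool)]
    c0, c1 = Xs[y == 0].mean(0), Xs[y == 1].mean(0)
    pred = np.linalg.norm(Xs - c1, axis=1) < np.linalg.norm(Xs - c0, axis=1)
    return (pred == y).mean() - 0.01 * mask.sum()

def ga_select(X, y, pop=20, gens=40, seed=1):
    """Tiny genetic algorithm over bit-mask feature subsets."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    P = rng.integers(0, 2, size=(pop, n))
    for _ in range(gens):
        f = np.array([fitness(m, X, y) for m in P])
        parents = P[np.argsort(f)[-pop // 2:]]                 # truncation selection
        kids = parents[rng.integers(0, len(parents), pop - len(parents))].copy()
        flip = rng.random(kids.shape) < 1.0 / n                # bit-flip mutation
        kids[flip] = 1 - kids[flip]
        P = np.vstack([parents, kids])
    f = np.array([fitness(m, X, y) for m in P])
    return P[np.argmax(f)]
```

On synthetic data where only a couple of columns carry class information, the returned mask concentrates on those columns because the penalty discourages carrying noise features.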

  2. Enhancement of Selection, Bubble and Insertion Sorting Algorithm

    Directory of Open Access Journals (Sweden)

    Muhammad Farooq Umar

    2014-07-01

    Full Text Available In everyday life there are large amounts of data to arrange, because sorting removes ambiguities and makes data analysis and data processing easy and efficient at little cost and effort. In this study a set of improved sorting algorithms is proposed which gives better performance and design ideas. Five new sorting algorithms (Bi-directional Selection Sort, Bi-directional Bubble Sort, MIDBidirectional Selection Sort, MIDBidirectional Bubble Sort and Linear Insertion Sort) are presented. Bi-directional Selection Sort and MIDBidirectional Selection Sort are enhancements of basic selection sort, while Bidirectional Bubble Sort and MIDBidirectional Bubble Sort are enhancements of basic bubble sort, changing the selection and swapping mechanism of the data being sorted. The enhanced sorting algorithms reduce the iterations by half and by a quarter, respectively; asymptotically, the complexities of these algorithms are reduced to O(n²/2) and O(n²/4) from O(n²). Linear Insertion Sort is an enhancement of insertion sort that changes the design of the algorithm (converting two loops into one), so asymptotically this algorithm is converted from quadratic to linear time complexity. These sorting algorithms are described using C. The proposed algorithms are analyzed using asymptotic analysis as well as machine running time, and compared with their basic sorting algorithms. In this study we also discuss how the performance and complexity can be improved by optimizing the code and design.
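A minimal sketch of the bi-directional idea, shown here in Python rather than the C the authors used: each pass locates both the minimum and the maximum of the unsorted range and places them at the two ends, so the outer loop runs roughly n/2 times:

```python
def bidirectional_selection_sort(a):
    """Selection sort that places both the minimum and the maximum on each
    pass, roughly halving the number of outer iterations."""
    a = list(a)
    lo, hi = 0, len(a) - 1
    while lo < hi:
        i_min, i_max = lo, lo
        for i in range(lo, hi + 1):
            if a[i] < a[i_min]:
                i_min = i
            if a[i] > a[i_max]:
                i_max = i
        a[lo], a[i_min] = a[i_min], a[lo]
        # if the maximum was sitting at lo, it has just moved to i_min
        if i_max == lo:
            i_max = i_min
        a[hi], a[i_max] = a[i_max], a[hi]
        lo += 1
        hi -= 1
    return a
```

The `i_max == lo` check is the classic pitfall of min-max sorting: the first swap can displace the maximum, so its index must be updated before the second swap.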

  3. Harnessing the Power of GPUs to Speed Up Feature Selection for Outlier Detection

    Institute of Scientific and Technical Information of China (English)

    Fatemeh Azmandian; Ayse Yilmazer; Jennifer G Dy; Javed A Aslam; David R Kaeli

    2014-01-01

    Acquiring a set of features that emphasize the differences between normal data points and outliers can drastically facilitate the task of identifying outliers. In our work, we present a novel non-parametric evaluation criterion for filter-based feature selection which has an eye towards the final goal of outlier detection. The proposed method seeks the subset of features that represent the inherent characteristics of the normal dataset while forcing outliers to stand out, making them more easily distinguished by outlier detection algorithms. Experimental results on real datasets show the advantage of our feature selection algorithm compared with popular and state-of-the-art methods. We also show that the proposed algorithm is able to overcome the small sample space problem and perform well on highly imbalanced datasets. Furthermore, due to the highly parallelizable nature of the feature selection, we implement the algorithm on a graphics processing unit (GPU) to gain significant speedup over the serial version. The benefits of the GPU implementation are two-fold, as its performance scales very well in terms of the number of features, as well as the number of data points.

  4. Geochemical dynamics in selected Yellowstone hydrothermal features

    Science.gov (United States)

    Druschel, G.; Kamyshny, A.; Findlay, A.; Nuzzio, D.

    2010-12-01

    Yellowstone National Park has a wide diversity of thermal features, including springs with a range of pH conditions that significantly impact sulfur speciation. We have utilized a combination of voltammetric and spectroscopic techniques to characterize the intermediate sulfur chemistry of the Cinder Pool, Evening Primrose, Ojo Caliente, Frying Pan, Azure, and Dragon thermal springs. These measurements have additionally demonstrated the geochemical dynamics inherent in these systems; significant variability in chemical speciation occurs in many of these thermal features due to changes in gas supply rates, fluid discharge rates, and thermal differences that occur on timescales of seconds. These geochemical dynamics may significantly impact how microorganisms interact with the sulfur forms in these systems.

  5. A New Feature Selection Method for Text Clustering

    Institute of Scientific and Technical Information of China (English)

    XU Junling; XU Baowen; ZHANG Weifeng; CUI Zifeng; ZHANG Wei

    2007-01-01

    Feature selection methods have been successfully applied to text categorization but seldom to text clustering, due to the unavailability of class label information. In this paper, a new feature selection method for text clustering based on expectation maximization and cluster validity is proposed. It applies a supervised feature selection method to the intermediate clustering result generated during iterative clustering in order to do feature selection for text clustering; meanwhile, the Davies-Bouldin index is used to evaluate the intermediate feature subsets indirectly. Feature subsets are then selected according to the curve of the Davies-Bouldin index. Experiments are carried out on several popular datasets and the results show the advantages of the proposed method.
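The cluster-validity measure used above can be computed directly. Below is a plain implementation of the Davies-Bouldin index (lower is better), so a feature subset whose intermediate clustering scores lower would be preferred; scikit-learn also ships it as `sklearn.metrics.davies_bouldin_score`:

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index: average, over clusters, of the worst
    (scatter_i + scatter_j) / centroid_distance_ij ratio. Lower values
    mean tighter, better-separated clusters."""
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    scatter = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                        for k, c in zip(ks, cents)])
    db = 0.0
    for i in range(len(ks)):
        ratios = [(scatter[i] + scatter[j]) / np.linalg.norm(cents[i] - cents[j])
                  for j in range(len(ks)) if j != i]
        db += max(ratios)
    return db / len(ks)
```

Two well-separated clusters score far lower than the same points nearly superimposed, which is exactly the signal the paper reads off the index curve.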

  6. Clonal Selection Algorithm Based Iterative Learning Control with Random Disturbance

    Directory of Open Access Journals (Sweden)

    Yuanyuan Ju

    2013-01-01

    Full Text Available An improved clonal selection algorithm is proposed as a method to solve optimization problems in iterative learning control, and a clonal selection algorithm based optimal iterative learning control algorithm with random disturbance is presented. In the algorithm, the size of the search space is decreased and the convergence speed is increased at the same time. In addition, a model modifying device is used in the algorithm to cope with uncertainty in the plant model. Simulations show that the convergence speed is satisfactory regardless of whether or not the model of the nonlinear plant is precise. The simulation tests verify that the controlled system with random disturbance can reach stability using the improved iterative learning control law but not the traditional control law.

  7. Aptamers overview: selection, features and applications.

    Science.gov (United States)

    Hernandez, Luiza I; Machado, Isabel; Schafer, Thomas; Hernandez, Frank J

    2015-01-01

    Aptamer technology has been around for a quarter of a century and the field has matured enough to start seeing real applications, especially in the medical field. Since their discovery, aptamers have rapidly emerged as key players in many fields, such as diagnostics, drug discovery, food science, drug delivery and therapeutics. Because of their synthetic nature, aptamers are evolving at an exponential rate, gaining from the newest advances in chemistry, nanotechnology, biology and medicine. This review is meant to give an overview of the aptamer field, covering general aspects of aptamer identification and applications as well as highlighting certain features that contribute to their quick deployment in the biomedical field.

  8. Bayesian feature selection to estimate customer survival

    OpenAIRE

    Figini, Silvia; Giudici, Paolo; Brooks, S P

    2006-01-01

    We consider the problem of estimating the lifetime value of customers when a large number of features are present in the data. In order to measure lifetime value we use survival analysis models to estimate customer tenure. In such a context, a number of classical modelling challenges arise. We will show how our proposed Bayesian methods perform, and compare them with classical churn models on a real case study. More specifically, based on data from a media service company, our aim will be to p...

  9. A Heuristic Algorithm for Core Selection in Multicast Routing

    Institute of Scientific and Technical Information of China (English)

    Manas Ranjan Kabat; Manoj Kumar Patel; Chita Ranjan Tripathy

    2011-01-01

    With the development of network multimedia technology, more and more real-time multimedia applications need to transmit information using multicast. The basis of multicast data transmission is to construct a multicast tree. The main problem concerning the construction of a shared multicast tree is selection of a root of the shared tree, or the core point. In this paper, we propose a heuristic algorithm for core selection in multicast routing. The proposed algorithm selects the core point by considering both delay and inter-destination delay variation. The simulation results show that the proposed algorithm performs better than the existing algorithms in terms of delay variation subject to the end-to-end delay bound. The mathematical time complexity and the execution time of the proposed algorithm are comparable to those of the existing algorithms.
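A simplified version of such a core selection rule can be sketched as follows. The paper's actual heuristic is not reproduced here; `select_core`, the max-minus-min variation measure, and the exhaustive scan over candidate cores are assumptions for illustration:

```python
import heapq

def dijkstra(adj, src):
    """Shortest-path delays from src in a weighted graph given as
    {node: [(neighbor, delay), ...]}."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def select_core(adj, destinations, delay_bound):
    """Among nodes whose delay to every destination meets the end-to-end
    bound, pick the one minimizing inter-destination delay variation
    (here simply max delay minus min delay)."""
    best, best_var = None, float("inf")
    for core in adj:
        dist = dijkstra(adj, core)
        if any(d not in dist or dist[d] > delay_bound for d in destinations):
            continue
        delays = [dist[d] for d in destinations]
        var = max(delays) - min(delays)
        if var < best_var:
            best, best_var = core, var
    return best
```

On a small hypothetical topology, the node sitting symmetrically between the destinations wins because its delay variation is zero.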

  10. A new ensemble feature selection and its application to pattern classification

    Institute of Scientific and Technical Information of China (English)

    Dongbo ZHANG; Yaonan WANG

    2009-01-01

    A neural network ensemble based on rough set reducts is proposed to decrease the computational complexity of the conventional ensemble feature selection algorithm. First, a dynamic reduction technique combining a genetic algorithm with a resampling method is adopted to obtain reducts with good generalization ability. Second, multiple BP neural networks based on different reducts are built as base classifiers. According to the idea of selective ensemble, the neural network ensemble with the best generalization ability can be found by search strategies. Finally, classification based on the neural network ensemble is implemented by combining the predictions of the component networks with voting. The method has been verified in experiments on remote sensing image and five UCI dataset classifications. Compared with conventional ensemble feature selection algorithms, it costs less time, has lower computational complexity, and achieves satisfactory classification accuracy.

  11. Cost-Sensitive Feature Selection of Numeric Data with Measurement Errors

    Directory of Open Access Journals (Sweden)

    Hong Zhao

    2013-01-01

    Full Text Available Feature selection is an essential process in data mining applications since it reduces a model’s complexity. However, feature selection with various types of costs is still a new research topic. In this paper, we study the cost-sensitive feature selection problem of numeric data with measurement errors. The major contributions of this paper are fourfold. First, a new data model is built to address test costs and misclassification costs as well as error boundaries. It is distinguished from the existing models mainly by the error boundaries. Second, a covering-based rough set model with normal distribution measurement errors is constructed. With this model, coverings are constructed from data rather than assigned by users. Third, a new cost-sensitive feature selection problem is defined on this model. It is more realistic than the existing feature selection problems. Fourth, both backtracking and heuristic algorithms are proposed to deal with the new problem. Experimental results show the efficiency of the pruning techniques for the backtracking algorithm and the effectiveness of the heuristic algorithm. This study is a step toward realistic applications of cost-sensitive learning.

  12. Neural Gen Feature Selection for Supervised Learning Classifier

    Directory of Open Access Journals (Sweden)

    Mohammed Hasan Abdulameer

    2014-04-01

    Full Text Available Face recognition has received significant attention, especially during the past few years. Many face recognition techniques have been developed, such as PSO-SVM and LDA-SVM. However, inefficient features in face recognition may lead to inadequate recognition results. Hence, a new face recognition system based on a Genetic Algorithm and the FFBNN technique is proposed. Our proposed face recognition system initially performs feature extraction, and the resulting optimal features are promoted to the recognition process. In the feature extraction, the optimal features are extracted from the face image database by a Genetic Algorithm (GA) with the FFBNN, and the computed optimal features are given to the FFBNN technique to carry out the training and testing process. The optimal features from the feature database are fed to the FFBNN to accomplish the training process. The well trained FFBNN with the optimal features provides the recognition result. The optimal features in the FFBNN obtained by the GA efficiently perform the face recognition process. The human face dataset called YALE is utilized to analyze the performance of our proposed GA-FFBNN technique, which is also compared with standard SVM and PSO-SVM techniques.

  13. A Features Selection for Crops Classification

    Science.gov (United States)

    Zhao, Lei; Chen, Erxue; Li, Zengyuan; Li, Lan; Gu, Xinzhi

    2016-08-01

    Polarization orientation angle (POA) is a major parameter of electromagnetic waves. This angle shifts due to azimuth slopes, which affects the radiometric quality of PolSAR data. Under the assumption of a reflection-symmetrical medium, the POA shift value (POAs) can be estimated by the Circular Polarization Method (CPM). The shift angle can then be used to compensate PolSAR data or to extract DEM information. However, it is less effective when using high-frequency SAR (L-, C-band) in forest areas. The main reason is that the polarization orientation angle shift of a forest area is influenced not only by topography but also by the forest canopy. The former is interference that should be removed, while the latter is polarization feature information that should be retained. ALOS2 PALSAR2 L-band full polarimetric SAR data were used in this study. Based on the Circular Polarization and DEM-based methods, we analyzed the variation of the POA shift value and developed POA shift estimation and compensation of PolSAR data in forest areas.

  14. ADAPTIVE SELECTION OF AUXILIARY OBJECTIVES IN MULTIOBJECTIVE EVOLUTIONARY ALGORITHMS

    Directory of Open Access Journals (Sweden)

    I. A. Petrova

    2016-05-01

    Full Text Available Subject of Research. We propose to modify the EA+RL method, which increases the efficiency of evolutionary algorithms by means of auxiliary objectives. The proposed modification is compared to the existing objective selection methods on the example of the travelling salesman problem. Method. In the EA+RL method a reinforcement learning algorithm is used to select an objective, either the target objective or one of the auxiliary objectives, at each iteration of the single-objective evolutionary algorithm. The proposed modification of the EA+RL method adapts this approach for use with a multiobjective evolutionary algorithm. As opposed to the EA+RL method, in this modification one of the auxiliary objectives is selected by reinforcement learning and optimized together with the target objective at each step of the multiobjective evolutionary algorithm. Main Results. The proposed modification of the EA+RL method was compared to the existing objective selection methods on the example of the travelling salesman problem. In the EA+RL method and its proposed modification, reinforcement learning algorithms for stationary and non-stationary environments were used. The proposed modification applied with reinforcement learning for non-stationary environments outperformed the considered objective selection algorithms on most problem instances. Practical Significance. The proposed approach increases the efficiency of evolutionary algorithms, which may be used for solving discrete NP-hard optimization problems, in particular combinatorial path search problems and scheduling problems.
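The underlying EA+RL idea, a reinforcement learner choosing which objective the evolutionary algorithm optimizes next, can be illustrated on a toy problem. Everything here (a OneMax target, a leading-ones auxiliary objective, epsilon-greedy selection over running reward averages) is a simplified stand-in, not the authors' algorithm:

```python
import random

def ea_rl_onemax(n=30, iters=2000, eps=0.1, seed=3):
    """(1+1) EA where an epsilon-greedy bandit picks the acceptance
    objective each iteration; the reward is the realized change in the
    target fitness, so helpful objectives accumulate higher estimates."""
    rng = random.Random(seed)
    target = lambda x: sum(x)                                   # objective 0: OneMax
    leading = lambda x: next((i for i, b in enumerate(x) if b == 0), n)  # toy auxiliary
    objectives = [target, leading]
    q = [0.0, 0.0]                # running average reward per objective
    counts = [0, 0]
    x = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(iters):
        # epsilon-greedy objective selection
        k = rng.randrange(2) if rng.random() < eps else q.index(max(q))
        y = [b ^ (rng.random() < 1.0 / n) for b in x]           # standard bit-flip mutation
        if objectives[k](y) >= objectives[k](x):                # accept under chosen objective
            reward = target(y) - target(x)
            x = y
        else:
            reward = 0
        counts[k] += 1
        q[k] += (reward - q[k]) / counts[k]
    return x, q

x, q = ea_rl_onemax()
print(sum(x), q)
```

Because only improvements to the target earn positive reward, the learner gravitates toward whichever objective actually advances the target fitness.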

  15. Feature Fusion Algorithm for Multimodal Emotion Recognition from Speech and Facial Expression Signal

    Directory of Open Access Journals (Sweden)

    Han Zhiyan

    2016-01-01

    Full Text Available In order to overcome the limitations of single-mode emotion recognition, this paper describes a novel multimodal emotion recognition algorithm that takes the speech signal and the facial expression signal as its research subjects. First, the speech signal features and facial expression signal features are fused, sample sets are obtained by sampling with replacement, and classifiers are then built with a BP neural network (BPNN). Second, the difference between two classifiers is measured by a double error difference selection strategy. Finally, the final recognition result is obtained by the majority voting rule. Experiments show that the method improves the accuracy of emotion recognition by giving full play to the advantages of decision-level fusion and feature-level fusion, bringing the whole fusion process closer to human emotion recognition, with a recognition rate of 90.4%.
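The majority voting rule used for the final decision-level fusion is simple enough to state directly; this helper is an illustrative sketch, not the paper's code:

```python
from collections import Counter

def majority_vote(*per_classifier_predictions):
    """Decision-level fusion: for each sample, emit the label predicted by
    the majority of classifiers (ties broken by first occurrence)."""
    return [Counter(sample).most_common(1)[0][0]
            for sample in zip(*per_classifier_predictions)]
```

For example, fusing three classifiers' per-sample labels simply keeps whichever label at least two of them agree on.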

  16. A Study on Radio Access Technology Selection Algorithms

    CERN Document Server

    Wu, Leijia

    2012-01-01

    This book discusses the basic idea of Common Radio Resource Management (CRRM), especially the Radio Access Technology selection part of CRRM. It introduces two interaction functions (the information reporting function and the RRM decision support function) and four interaction degrees (from low to very high) of CRRM. Four possible CRRM topologies (CRRM server, integrated CRRM, hierarchical CRRM, and CRRM in user terminals) are described. The book presents different Radio Access Technology selection algorithms, including single-criterion and multiple-criteria based algorithms, and compares them. Finally, the book analyses the advantages and disadvantages of the different selection algorithms.

  17. Sensor-Based Vibration Signal Feature Extraction Using an Improved Composite Dictionary Matching Pursuit Algorithm

    Directory of Open Access Journals (Sweden)

    Lingli Cui

    2014-09-01

    Full Text Available This paper presents a new method for a composite dictionary matching pursuit algorithm, which is applied to vibration sensor signal feature extraction and fault diagnosis of a gearbox. Three advantages are highlighted in the new method. First, the composite dictionary in the algorithm has been changed from multi-atom matching to single-atom matching. Compared to non-composite dictionary single-atom matching, the original composite dictionary multi-atom matching pursuit (CD-MaMP algorithm can achieve noise reduction in the reconstruction stage, but it cannot dramatically reduce the computational cost and improve the efficiency in the decomposition stage. Therefore, the optimized composite dictionary single-atom matching algorithm (CD-SaMP is proposed. Second, the termination condition of iteration based on the attenuation coefficient is put forward to improve the sparsity and efficiency of the algorithm, which adjusts the parameters of the termination condition constantly in the process of decomposition to avoid noise. Third, composite dictionaries are enriched with the modulation dictionary, which is one of the important structural characteristics of gear fault signals. Meanwhile, the termination condition of iteration settings, sub-feature dictionary selections and operation efficiency between CD-MaMP and CD-SaMP are discussed, aiming at gear simulation vibration signals with noise. The simulation sensor-based vibration signal results show that the termination condition of iteration based on the attenuation coefficient enhances decomposition sparsity greatly and achieves a good effect of noise reduction. Furthermore, the modulation dictionary achieves a better matching effect compared to the Fourier dictionary, and CD-SaMP has a great advantage of sparsity and efficiency compared with the CD-MaMP. The sensor-based vibration signals measured from practical engineering gearbox analyses have further shown that the CD-SaMP decomposition and

  18. Sensor-based vibration signal feature extraction using an improved composite dictionary matching pursuit algorithm.

    Science.gov (United States)

    Cui, Lingli; Wu, Na; Wang, Wenjing; Kang, Chenhui

    2014-09-09

    This paper presents a new method for a composite dictionary matching pursuit algorithm, which is applied to vibration sensor signal feature extraction and fault diagnosis of a gearbox. Three advantages are highlighted in the new method. First, the composite dictionary in the algorithm has been changed from multi-atom matching to single-atom matching. Compared to non-composite dictionary single-atom matching, the original composite dictionary multi-atom matching pursuit (CD-MaMP) algorithm can achieve noise reduction in the reconstruction stage, but it cannot dramatically reduce the computational cost and improve the efficiency in the decomposition stage. Therefore, the optimized composite dictionary single-atom matching algorithm (CD-SaMP) is proposed. Second, the termination condition of iteration based on the attenuation coefficient is put forward to improve the sparsity and efficiency of the algorithm, which adjusts the parameters of the termination condition constantly in the process of decomposition to avoid noise. Third, composite dictionaries are enriched with the modulation dictionary, which is one of the important structural characteristics of gear fault signals. Meanwhile, the termination condition of iteration settings, sub-feature dictionary selections and operation efficiency between CD-MaMP and CD-SaMP are discussed, aiming at gear simulation vibration signals with noise. The simulation sensor-based vibration signal results show that the termination condition of iteration based on the attenuation coefficient enhances decomposition sparsity greatly and achieves a good effect of noise reduction. Furthermore, the modulation dictionary achieves a better matching effect compared to the Fourier dictionary, and CD-SaMP has a great advantage of sparsity and efficiency compared with the CD-MaMP. 
The sensor-based vibration signals measured from practical engineering gearbox analyses have further shown that the CD-SaMP decomposition and reconstruction algorithm
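The single-atom matching pursuit loop with an attenuation-based stopping rule can be sketched as follows. The dictionary contents, the `atten` threshold, and the exact form of the stopping test are assumptions for illustration; the paper's composite and modulation dictionaries are not reproduced here:

```python
import numpy as np

def matching_pursuit(signal, dictionary, max_iter=50, atten=1e-3):
    """Greedy single-atom matching pursuit. `dictionary` holds unit-norm
    atoms as columns; iteration stops when the relative drop in residual
    energy (our stand-in for the attenuation criterion) becomes small."""
    residual = signal.astype(float).copy()
    coeffs = np.zeros(dictionary.shape[1])
    prev = residual @ residual
    for _ in range(max_iter):
        corr = dictionary.T @ residual          # correlate residual with every atom
        k = np.argmax(np.abs(corr))             # best-matching single atom
        coeffs[k] += corr[k]
        residual -= corr[k] * dictionary[:, k]
        energy = residual @ residual
        if prev - energy <= atten * prev:       # attenuation-based termination
            break
        prev = energy
    return coeffs, residual
```

With an orthonormal dictionary the loop recovers the exact coefficients in as many iterations as there are active atoms, which is the sparsity behavior the stopping rule is meant to preserve under noise.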

  19. A new algorithm for estimating gillnet selectivity

    Institute of Scientific and Technical Information of China (English)

    唐衍力; 黄六一; 葛长字; 梁振林; 孙鹏

    2010-01-01

    The estimation of gear selectivity is a critical issue in fishery stock assessment and management.Several methods have been developed for estimating gillnet selectivity,but they all have their limitations,such as inappropriate objective function in data fitting,lack of unique estimates due to the difficulty in finding global minima in minimization,biased estimates due to outliers,and estimations of selectivity being influenced by the predetermined selectivity functions.In this study,we develop a new algorit...

  20. Adaptive feature selection using v-shaped binary particle swarm optimization

    Science.gov (United States)

    Dong, Hongbin; Zhou, Xiurong

    2017-01-01

    Feature selection is an important preprocessing method in machine learning and data mining. This process can be used not only to reduce the amount of data to be analyzed but also to build models with stronger interpretability based on fewer features. Traditional feature selection methods evaluate the dependency and redundancy of features separately, which leads to a lack of measurement of their combined effect. Moreover, a greedy search considers only the optimization of the current round and thus cannot be a global search. To evaluate the combined effect of different subsets in the entire feature space, an adaptive feature selection method based on V-shaped binary particle swarm optimization is proposed. In this method, the fitness function is constructed using the correlation information entropy. Feature subsets are regarded as individuals in a population, and the feature space is searched using V-shaped binary particle swarm optimization. The above procedure overcomes the hard constraint on the number of features, enables the combined evaluation of each subset as a whole, and improves the search ability of conventional binary particle swarm optimization. The proposed algorithm is an adaptive method with respect to the number of feature subsets. The experimental results show the advantages of optimizing the feature subsets using the V-shaped transfer function and confirm the effectiveness and efficiency of the feature subsets obtained under different classifiers. PMID:28358850
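The V-shaped transfer function at the heart of the method maps a particle's velocity to a probability of flipping the corresponding bit, rather than setting the bit's value directly as an S-shaped sigmoid would. A minimal sketch, using |tanh(v)| as one common member of the V-shaped family (the paper may use a different member):

```python
import numpy as np

def v_transfer(v):
    """V-shaped transfer function: |tanh(v)| in [0, 1). Large |velocity|
    means the bit is likely to flip; zero velocity leaves it unchanged."""
    return np.abs(np.tanh(v))

def flip_update(bits, velocity, rng):
    """Binary PSO position update under a V-shaped transfer function."""
    flips = rng.random(bits.shape) < v_transfer(velocity)
    return np.where(flips, 1 - bits, bits)
```

This flip semantics is what lets the swarm keep a good feature subset stable (small velocities change nothing) while still escaping it when velocities grow.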

  1. Research on a randomized real-valued negative selection algorithm

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    A real-valued negative selection algorithm with a good mathematical foundation is presented to solve some of the drawbacks of previous approaches. Specifically, it can produce a good estimate of the optimal number of detectors needed to cover the non-self space, and the maximization of the non-self coverage is done through an optimization algorithm with proven convergence properties. Experiments are performed to validate the assumptions made while designing the algorithm and to evaluate its performance.
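The generate-and-censor core of real-valued negative selection can be sketched as follows; the fixed candidate count and detector radius are illustrative guesses, whereas the paper estimates the optimal number of detectors and the coverage analytically:

```python
import numpy as np

def generate_detectors(self_samples, n_candidates=500, radius=0.15, seed=0):
    """Censoring step of real-valued negative selection in the unit
    hypercube: keep random candidate detectors lying at least `radius`
    away from every self sample."""
    rng = np.random.default_rng(seed)
    cands = rng.random((n_candidates, self_samples.shape[1]))
    dists = np.linalg.norm(cands[:, None, :] - self_samples[None, :, :], axis=2)
    return cands[dists.min(axis=1) >= radius]

def is_nonself(x, detectors, radius=0.15):
    """A point is flagged non-self if any detector covers it."""
    return bool(np.linalg.norm(detectors - x, axis=1).min() < radius)
```

By construction, no surviving detector covers a self sample, so self points are never flagged; non-self coverage then depends on how many detectors survive, which is what the paper optimizes.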

  2. Robust Feature Selection from Microarray Data Based on Cooperative Game Theory and Qualitative Mutual Information.

    Science.gov (United States)

    Mortazavi, Atiyeh; Moattar, Mohammad Hossein

    2016-01-01

    High dimensionality of microarray data sets may lead to low efficiency and overfitting. In this paper, a multiphase cooperative game theoretic feature selection approach is proposed for microarray data classification. In the first phase, owing to the high dimension of microarray data sets, the features are reduced using one of two filter-based feature selection methods, namely mutual information and Fisher ratio. In the second phase, the Shapley index is used to evaluate the power of each feature. The main innovation of the proposed approach is to employ Qualitative Mutual Information (QMI) for this purpose. Qualitative Mutual Information makes the selected features more stable, which helps to deal with the problems of data imbalance and scarcity. In the third phase, a forward selection scheme is applied which uses a scoring function to weight each feature. The performance of the proposed method is compared with other popular feature selection algorithms such as Fisher ratio, minimum redundancy maximum relevance, and previous work on cooperative game based feature selection. The average classification accuracy on eleven microarray data sets shows that the proposed method improves both average accuracy and average stability compared to other approaches.

  3. Robust Feature Selection from Microarray Data Based on Cooperative Game Theory and Qualitative Mutual Information

    Directory of Open Access Journals (Sweden)

    Atiyeh Mortazavi

    2016-01-01

    Full Text Available High dimensionality of microarray data sets may lead to low efficiency and overfitting. In this paper, a multiphase cooperative game theoretic feature selection approach is proposed for microarray data classification. In the first phase, owing to the high dimension of microarray data sets, the features are reduced using one of two filter-based feature selection methods, namely mutual information and Fisher ratio. In the second phase, the Shapley index is used to evaluate the power of each feature. The main innovation of the proposed approach is to employ Qualitative Mutual Information (QMI) for this purpose. Qualitative Mutual Information makes the selected features more stable, which helps to deal with the problems of data imbalance and scarcity. In the third phase, a forward selection scheme is applied which uses a scoring function to weight each feature. The performance of the proposed method is compared with other popular feature selection algorithms such as Fisher ratio, minimum redundancy maximum relevance, and previous work on cooperative game based feature selection. The average classification accuracy on eleven microarray data sets shows that the proposed method improves both average accuracy and average stability compared to other approaches.

  4. Machine Learning Feature Selection for Tuning Memory Page Swapping

    Science.gov (United States)

    2013-09-01

    erroneous and generally results in useful pages being paged out too early, only to be paged back in shortly thereafter. [1] The first in/first out (FIFO) ...the tail of the queue are selected. This algorithm has been shown to have significant shortcomings. When using a FIFO PRA, it is possible to encounter a...page which was just paged out. FIFO is therefore a sub-optimal page replacement algorithm. Least recently used (LRU) is incredibly simple in concept

  5. Diagnosis of Chronic Kidney Disease Based on Support Vector Machine by Feature Selection Methods.

    Science.gov (United States)

    Polat, Huseyin; Danaei Mehr, Homay; Cetin, Aydin

    2017-04-01

    As Chronic Kidney Disease progresses slowly, early detection and effective treatment are the only ways to reduce the mortality rate. Machine learning techniques are gaining significance in medical diagnosis because of their classification ability with high accuracy rates. The accuracy of classification algorithms depends on the use of correct feature selection algorithms to reduce the dimension of datasets. In this study, the Support Vector Machine classification algorithm was used to diagnose Chronic Kidney Disease. To diagnose the disease, two essential types of feature selection methods, namely wrapper and filter approaches, were chosen to reduce the dimension of the Chronic Kidney Disease dataset. In the wrapper approach, the classifier subset evaluator with the greedy stepwise search engine and the wrapper subset evaluator with the Best First search engine were used. In the filter approach, the correlation feature selection subset evaluator with the greedy stepwise search engine and the filtered subset evaluator with the Best First search engine were used. The results showed that the Support Vector Machine classifier using the filtered subset evaluator with the Best First search engine feature selection method has a higher accuracy rate (98.5%) in the diagnosis of Chronic Kidney Disease compared to the other selected methods.
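
    A filter approach like those above scores each feature independently of the classifier and keeps only the top-ranked ones; a minimal Python sketch (the Pearson-correlation score and the toy data are illustrative, not the evaluators used in the paper):

```python
import math

def pearson(xs, ys):
    # sample Pearson correlation between a feature column and the label
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def filter_select(X, y, k):
    # rank features by |correlation with the label| and keep the top k
    n_feat = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_feat)]
    return sorted(sorted(range(n_feat), key=lambda j: -scores[j])[:k])

# toy data: feature 0 tracks the label, feature 1 is constant noise
X = [[0, 5], [1, 5], [0, 5], [1, 5]]
y = [0, 1, 0, 1]
```

    The reduced columns would then be fed to the classifier (an SVM in the paper); a wrapper approach instead re-trains the classifier to score each candidate subset.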

  6. GAIN RATIO BASED FEATURE SELECTION METHOD FOR PRIVACY PRESERVATION

    Directory of Open Access Journals (Sweden)

    R. Praveena Priyadarsini

    2011-04-01

    Full Text Available Privacy preservation is a step in data mining that tries to safeguard sensitive information from unsanctioned disclosure, thereby protecting individual data records and their privacy. There are various privacy preservation techniques such as k-anonymity, l-diversity, t-closeness and data perturbation. In this paper the k-anonymity privacy protection technique is applied to high dimensional datasets like adult and census. Since both data sets are high dimensional, a feature subset selection method, Gain Ratio, is applied: the attributes of the datasets are ranked and low-ranking attributes are filtered out to form new reduced data subsets. The k-anonymization privacy preservation technique is then applied to the reduced datasets. The accuracy of the privacy-preserved reduced datasets and the original datasets is compared on two functionalities of data mining, namely classification and clustering, using the naïve Bayesian and k-means algorithms respectively. Experimental results show that classification and clustering accuracy are comparatively the same for reduced k-anonymized datasets and the original data sets.
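
    Gain Ratio normalizes information gain by the attribute's split information, so many-valued attributes are not unfairly favoured in the ranking; a minimal stdlib-only sketch:

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a label sequence, in bits
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    # information gain of the attribute divided by its split information
    n = len(labels)
    cond, split = 0.0, 0.0
    for v, cnt in Counter(values).items():
        subset = [l for f, l in zip(values, labels) if f == v]
        p = cnt / n
        cond += p * entropy(subset)          # conditional entropy term
        split -= p * math.log2(p)            # split information term
    info_gain = entropy(labels) - cond
    return info_gain / split if split else 0.0
```

    A perfectly predictive binary attribute gets a gain ratio of 1.0, while a constant attribute (zero split information) gets 0.0; attributes are ranked by this value and the lowest-ranked are dropped before anonymization.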

  7. Particle swarm optimization algorithm for partner selection in virtual enterprise

    Institute of Scientific and Technical Information of China (English)

    Qiang Zhao; Xinhui Zhang; Renbin Xiao

    2008-01-01

    Partner selection is a fundamental problem in the formation and success of a virtual enterprise. The partner selection problem with precedence and due date constraints is the basis of various extensions and is studied in this paper. A nonlinear integer programming model for the partner selection problem is established. The problem is shown to be NP-complete by reduction from the knapsack problem, so no polynomial-time algorithm is known. To solve it efficiently, a particle swarm optimization (PSO) algorithm is adopted, and several mechanisms, including an initialization expansion mechanism, a variance mechanism and a local searching mechanism, have been developed to improve the performance of the proposed PSO algorithm. A set of experiments has been conducted using real examples and numerical simulation, and has shown that the PSO algorithm is an effective and efficient way to solve partner selection problems with precedence and due date constraints.

  8. Genetic Fuzzy System (GFS) based wavelet co-occurrence feature selection in mammogram classification for breast cancer diagnosis

    Directory of Open Access Journals (Sweden)

    Meenakshi M. Pawar

    2016-09-01

    Full Text Available Breast cancer is a significant health problem diagnosed mostly in women worldwide. Therefore, early detection of breast cancer is performed with the help of digital mammography, which can reduce the mortality rate. This paper presents a wrapper based feature selection approach for wavelet co-occurrence features (WCF) using a Genetic Fuzzy System (GFS) in the mammogram classification problem. The performance of the GFS algorithm is demonstrated on the mini-MIAS database. WCF features are obtained from the detail wavelet coefficients at each level of decomposition of the mammogram image. At the first level of decomposition, 18 features are applied to the GFS algorithm, which selects 5 features with an average classification success rate of 39.64%. Subsequently, at the second level it selects 9 features from 36 features and the classification success rate is improved to 56.75%. For the third level, 16 features are selected from 54 features and the average success rate is improved to 64.98%. Lastly, at the fourth level 72 features are applied to the GFS, which selects 16 features and thereby increases the average success rate to 89.47%. Hence, the GFS algorithm is an effective way of obtaining an optimal set of features for breast cancer diagnosis.

  9. Improving Image steganalysis performance using a graph-based feature selection method

    Directory of Open Access Journals (Sweden)

    Amir Nouri

    2016-05-01

    Full Text Available Steganalysis is the skill of discovering the use of steganography algorithms within an image with little or no information regarding the steganography algorithm or its parameters. The high dimensionality of image data combined with a small number of samples has presented a difficult challenge for the steganalysis task. Several methods have been presented to improve steganalysis performance through feature selection. Feature selection, also known as variable selection, is one of the fundamental problems in the fields of machine learning, pattern recognition and statistics. The aim of feature selection is to reduce the dimensionality of image data in order to enhance the accuracy of the steganalysis task. In this paper, we propose a new graph-based blind steganalysis method for detecting stego images among cover images in JPEG images, using a feature selection technique based on community detection. The experimental results show that the proposed approach is easy to employ for steganalysis purposes. Moreover, the performance of the proposed method is better than several recent and well-known feature selection-based image steganalysis methods.

  10. Feature Selection for Neural Network Based Stock Prediction

    Science.gov (United States)

    Sugunnasil, Prompong; Somhom, Samerkae

    We propose a new methodology of feature selection for stock movement prediction. The methodology is based upon finding those features which minimize the correlation relation function. We first generate all combinations of features and evaluate each of them using our evaluation function. We search through the generated set with a hill climbing approach. A self-organizing map based stock prediction model is utilized as the prediction method. We conduct the experiment on data sets of the Microsoft Corporation, General Electric Co. and Ford Motor Co. The results show that our feature selection method can improve the efficiency of the neural network based stock prediction.
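
    The subset search described above can be sketched as bit-flip hill climbing: start from a random feature mask and keep any single-feature change that improves the evaluation function. The toy score below is an illustrative stand-in for the correlation-based evaluation:

```python
import random

def hill_climb(score, n_features, seed=0):
    # greedy bit-flip hill climbing over feature subsets: flip one bit at a
    # time and keep any flip that improves the subset score
    rng = random.Random(seed)
    mask = [rng.randint(0, 1) for _ in range(n_features)]
    cur = score(mask)
    improved = True
    while improved:
        improved = False
        for j in range(n_features):
            mask[j] ^= 1          # try flipping feature j
            s = score(mask)
            if s > cur:
                cur, improved = s, True
            else:
                mask[j] ^= 1      # revert: the flip did not help
    return mask, cur

# toy score: reward features 0 and 2, penalise everything else
def score(mask):
    return sum((1 if j in (0, 2) else -1) * b for j, b in enumerate(mask))

mask, best = hill_climb(score, n_features=4)
```

    Hill climbing only guarantees a local optimum of the evaluation function; for this separable toy score every local optimum is global, which is not true of real correlation-based scores.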

  11. Immune Algorithm for Selecting Optimum Services in Web Services Composition

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    For the problem of dynamic optimization in Web services composition, this paper presents a novel approach for selecting optimum Web services, based on the longest path method on a weighted multistage graph. We propose and implement an Immune Algorithm for global optimization to construct composed Web services. Experimental results illustrate that the algorithm has a powerful search capability and can greatly improve the efficiency and accuracy of service selection.
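
    The longest-path computation on a weighted multistage graph can be done stage by stage with dynamic programming; a small Python sketch with a toy service graph (node names and edge weights are illustrative assumptions):

```python
def longest_path(stages, w):
    # DP over a weighted multistage graph: `stages` lists candidate nodes
    # per stage; w[(u, v)] is the weight (e.g. a QoS score) of edge u -> v
    best = {u: (0, [u]) for u in stages[0]}
    for cur, nxt in zip(stages, stages[1:]):
        nbest = {}
        for v in nxt:
            cands = [(best[u][0] + w[(u, v)], best[u][1] + [v])
                     for u in cur if u in best and (u, v) in w]
            if cands:
                nbest[v] = max(cands, key=lambda t: t[0])
        best = nbest
    # best entry over the final stage: (total weight, chosen path)
    return max(best.values(), key=lambda t: t[0])

# toy graph: each stage holds candidate services for one task
stages = [['s'], ['a', 'b'], ['t']]
w = {('s', 'a'): 1, ('s', 'b'): 2, ('a', 't'): 5, ('b', 't'): 1}
score, path = longest_path(stages, w)
```

    Because the graph is layered, this DP is linear in the number of edges; the Immune Algorithm in the paper addresses the harder global optimization that arises when service choices interact.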

  12. Ant-cuckoo colony optimization for feature selection in digital mammogram.

    Science.gov (United States)

    Jona, J B; Nagaveni, N

    2014-01-15

    Digital mammography is the only effective screening method to detect breast cancer. Gray Level Co-occurrence Matrix (GLCM) textural features are extracted from the mammogram. Not all of these features are essential for classifying the mammogram; therefore, identifying the relevant features is the aim of this work. Feature selection improves the classification rate and accuracy of any classifier. In this study, a new hybrid metaheuristic named Ant-Cuckoo Colony Optimization, a hybrid of Ant Colony Optimization (ACO) and Cuckoo Search (CS), is proposed for feature selection in digital mammograms. ACO is a good metaheuristic optimization technique, but its drawback is that the ants walk through paths where the pheromone density is high, which makes the whole process slow; hence CS is employed to carry out the local search of ACO. A Support Vector Machine (SVM) classifier with a Radial Basis Function (RBF) kernel is used along with the ACO to classify normal mammograms from abnormal ones. Experiments are conducted on the mini-MIAS database. The performance of the new hybrid algorithm is compared with the ACO and PSO algorithms. The results show that the hybrid Ant-Cuckoo Colony Optimization algorithm is more accurate than the other techniques.

  13. Eigenvalue-weighting and feature selection for computer-aided polyp detection in CT colonography

    Science.gov (United States)

    Zhu, Hongbin; Wang, Su; Fan, Yi; Lu, Hongbing; Liang, Zhengrong

    2010-03-01

    With the development of computer-aided polyp detection towards virtual colonoscopy screening, the trade-off between detection sensitivity and specificity has gained increasing attention. An optimum detection, with the lowest number of false positives and the highest true positive rate, is desirable and involves interdisciplinary knowledge, such as feature extraction, feature selection and machine learning. Toward that goal, various geometrical and textural features, associated with each suspicious polyp candidate, have been individually extracted and stacked together as a feature vector. However, directly inputting these high-dimensional feature vectors into a learning machine, e.g., a neural network, for polyp detection may introduce redundant information due to feature correlation and induce the curse of dimensionality. In this paper, we explored an indispensable building block of computer-aided polyp detection, i.e., principal component analysis (PCA)-weighted feature selection for a neural network classifier of true and false positives. The major concepts proposed in this paper include (1) the use of PCA to reduce feature correlation, (2) a scheme that adaptively weights each principal component (PC) by its associated eigenvalue, and (3) the selection of feature combinations via a genetic algorithm. As such, the eigenvalue is also taken as part of the characterizing feature, and the necessary number of features can be exposed to mitigate the curse of dimensionality. Trained and tested with a radial basis neural network, the proposed computer-aided polyp detection achieved 95% sensitivity at a cost of 2.99 false positives per polyp on average.

  14. VHDL implementation of feature-extraction algorithm for the PANDA electromagnetic calorimeter

    Science.gov (United States)

    Guliyev, E.; Kavatsyuk, M.; Lemmens, P. J. J.; Tambave, G.; Löhner, H.; Panda Collaboration

    2012-02-01

    A simple, efficient, and robust feature-extraction algorithm, developed for the digital front-end electronics of the electromagnetic calorimeter of the PANDA spectrometer at FAIR, Darmstadt, is implemented in VHDL for a commercial 16 bit 100 MHz sampling ADC. The source code is available as an open-source project and is adaptable for other projects and sampling ADCs. Best performance with different types of signal sources can be achieved through flexible parameter selection. The on-line data processing in the FPGA makes it possible to construct an almost dead-time-free data acquisition system, which is successfully evaluated as a first step towards building a complete trigger-less readout chain. Prototype setups are studied to determine the dead time of the implemented algorithm, the rate of false triggering, timing performance, and event correlations.

  15. Selective attention to temporal features on nested time scales.

    Science.gov (United States)

    Henry, Molly J; Herrmann, Björn; Obleser, Jonas

    2015-02-01

    Meaningful auditory stimuli such as speech and music often vary simultaneously along multiple time scales. Thus, listeners must selectively attend to, and selectively ignore, separate but intertwined temporal features. The current study aimed to identify and characterize the neural network specifically involved in this feature-selective attention to time. We used a novel paradigm where listeners judged either the duration or modulation rate of auditory stimuli, and in which the stimulation, working memory demands, response requirements, and task difficulty were held constant. A first analysis identified all brain regions where individual brain activation patterns were correlated with individual behavioral performance patterns, which thus supported temporal judgments generically. A second analysis then isolated those brain regions that specifically regulated selective attention to temporal features: Neural responses in a bilateral fronto-parietal network including insular cortex and basal ganglia decreased with degree of change of the attended temporal feature. Critically, response patterns in these regions were inverted when the task required selectively ignoring this feature. The results demonstrate how the neural analysis of complex acoustic stimuli with multiple temporal features depends on a fronto-parietal network that simultaneously regulates the selective gain for attended and ignored temporal features.

  16. Feature-selective attention in healthy old age: a selective decline in selective attention?

    Science.gov (United States)

    Quigley, Cliodhna; Müller, Matthias M

    2014-02-12

    Deficient selection against irrelevant information has been proposed to underlie age-related cognitive decline. We recently reported evidence for maintained early sensory selection when older and younger adults used spatial selective attention to perform a challenging task. Here we explored age-related differences when spatial selection is not possible and feature-selective attention must be deployed. We additionally compared the integrity of feedforward processing by exploiting the well established phenomenon of suppression of visual cortical responses attributable to interstimulus competition. Electroencephalogram was measured while older and younger human adults responded to brief occurrences of coherent motion in an attended stimulus composed of randomly moving, orientation-defined, flickering bars. Attention was directed to horizontal or vertical bars by a pretrial cue, after which two orthogonally oriented, overlapping stimuli or a single stimulus were presented. Horizontal and vertical bars flickered at different frequencies and thereby elicited separable steady-state visual-evoked potentials, which were used to examine the effect of feature-based selection and the competitive influence of a second stimulus on ongoing visual processing. Age differences were found in feature-selective attentional modulation of visual responses: older adults did not show consistent modulation of magnitude or phase. In contrast, the suppressive effect of a second stimulus was robust and comparable in magnitude across age groups, suggesting that bottom-up processing of the current stimuli is essentially unchanged in healthy old age. Thus, it seems that visual processing per se is unchanged, but top-down attentional control is compromised in older adults when space cannot be used to guide selection.

  17. A Novel Feature Selection Strategy for Enhanced Biomedical Event Extraction Using the Turku System

    Directory of Open Access Journals (Sweden)

    Jingbo Xia

    2014-01-01

    Full Text Available Feature selection is of paramount importance for text-mining classifiers with high-dimensional features. The Turku Event Extraction System (TEES) is the best performing tool in the GENIA BioNLP 2009/2011 shared tasks, and it relies heavily on high-dimensional features. This paper describes research which, based on an implementation of an accumulated effect evaluation (AEE) algorithm applying the greedy search strategy, analyses the contribution of every single feature class in TEES with a view to identifying important features and modifying the feature set accordingly. With the updated feature set, a new system is obtained with enhanced performance, which achieves an increased F-score of 53.27% up from 51.21% for Task 1 under strict evaluation criteria and 57.24% according to the approximate span and recursive criterion.

  18. Gene expression network reconstruction by convex feature selection when incorporating genetic perturbations.

    Directory of Open Access Journals (Sweden)

    Benjamin A Logsdon

    Full Text Available Cellular gene expression measurements contain regulatory information that can be used to discover novel network relationships. Here, we present a new algorithm for network reconstruction powered by the adaptive lasso, a theoretically and empirically well-behaved method for selecting the regulatory features of a network. Any algorithms designed for network discovery that make use of directed probabilistic graphs require perturbations, produced by either experiments or naturally occurring genetic variation, to successfully infer unique regulatory relationships from gene expression data. Our approach makes use of appropriately selected cis-expression Quantitative Trait Loci (cis-eQTL), which provide a sufficient set of independent perturbations for maximum network resolution. We compare the performance of our network reconstruction algorithm to four other approaches: the PC-algorithm, QTLnet, the QDG algorithm, and the NEO algorithm, all of which have been used to reconstruct directed networks among phenotypes leveraging QTL. We show that the adaptive lasso can outperform these algorithms for networks of ten genes and ten cis-eQTL, and is competitive with the QDG algorithm for networks with thirty genes and thirty cis-eQTL, with rich topologies and hundreds of samples. Using this novel approach, we identify unique sets of directed relationships in Saccharomyces cerevisiae when analyzing genome-wide gene expression data for an intercross between a wild strain and a lab strain. We recover novel putative network relationships between a tyrosine biosynthesis gene (TYR1), and genes involved in endocytosis (RCY1), the spindle checkpoint (BUB2), sulfonate catabolism (JLP1), and cell-cell communication (PRM7). Our algorithm provides a synthesis of feature selection methods and graphical model theory that has the potential to reveal new directed regulatory relationships from the analysis of population level genetic and gene expression data.

  19. A fingerprint feature extraction algorithm based on curvature of Bezier curve

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    Fingerprint feature extraction is a key step in fingerprint identification. A novel feature extraction algorithm is proposed in this paper, which describes fingerprint features with the bending information of fingerprint ridges. In the algorithm, ridges in a specific region of the fingerprint image are traced first, and these ridges are then fit with Bezier curves. Finally, the point of maximal curvature on each Bezier curve is defined as a feature point. Experimental results demonstrate that these feature points characterize the bending trend of fingerprint ridges effectively and are robust to noise; in addition, the extraction precision of this algorithm is better than that of conventional approaches.
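
    The maximal-curvature feature point can be found numerically: evaluate the Bezier curve with de Casteljau's algorithm, estimate curvature by finite differences, and scan the parameter interval. A sketch (the step sizes and sample control points are illustrative, not the paper's settings):

```python
def bezier_point(ctrl, t):
    # de Casteljau evaluation of a 2-D Bezier curve at parameter t
    pts = list(ctrl)
    while len(pts) > 1:
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]

def curvature(ctrl, t, h=1e-3):
    # numerical curvature |x'y'' - y'x''| / (x'^2 + y'^2)^(3/2)
    x0, y0 = bezier_point(ctrl, t - h)
    x1, y1 = bezier_point(ctrl, t)
    x2, y2 = bezier_point(ctrl, t + h)
    dx, dy = (x2 - x0) / (2 * h), (y2 - y0) / (2 * h)
    ddx, ddy = (x2 - 2 * x1 + x0) / h ** 2, (y2 - 2 * y1 + y0) / h ** 2
    denom = (dx * dx + dy * dy) ** 1.5
    return abs(dx * ddy - dy * ddx) / denom if denom else 0.0

def max_curvature_point(ctrl, steps=200):
    # scan the open interval (0, 1) and return the point of maximal bending
    t = max((i / steps for i in range(1, steps)),
            key=lambda t: curvature(ctrl, t))
    return bezier_point(ctrl, t)

# symmetric quadratic "ridge": maximal bending at the apex (1, 1)
x, y = max_curvature_point([(0, 0), (1, 2), (2, 0)])
```

    For a fitted ridge, the control points come from the least-squares Bezier fit of the traced ridge pixels; the apex of the toy parabola above plays the role of the extracted feature point.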

  20. Feature selection for domain knowledge representation through multitask learning

    CSIR Research Space (South Africa)

    Rosman, Benjamin S

    2014-10-01

    Full Text Available Feature selection for domain knowledge representation through multitask learning. Benjamin Rosman, Mobile Intelligent Autonomous Systems, CSIR, South Africa, BRosman@csir.co.za. Representation learning is a difficult and important problem...

  1. AlPOs Synthetic Factor Analysis Based on Maximum Weight and Minimum Redundancy Feature Selection

    Directory of Open Access Journals (Sweden)

    Yinghua Lv

    2013-11-01

    Full Text Available The relationship between synthetic factors and the resulting structures is critical for rational synthesis of zeolites and related microporous materials. In this paper, we develop a new feature selection method for synthetic factor analysis of (6,12)-ring-containing microporous aluminophosphates (AlPOs). The proposed method is based on a maximum weight and minimum redundancy criterion. With the proposed method, we can select the feature subset in which the features are most relevant to the synthetic structure while the redundancy among these selected features is minimal. Based on the database of AlPO synthesis, we use (6,12)-ring-containing AlPOs as the target class and incorporate 21 synthetic factors including gel composition, solvent and organic template to predict the formation of (6,12)-ring-containing microporous aluminophosphates (AlPOs). From these 21 features, 12 selected features are deemed the optimal features to distinguish (6,12)-ring-containing AlPOs from other AlPOs without such rings. The prediction model achieves a classification accuracy rate of 91.12% using the optimal feature subset. Comprehensive experiments demonstrate the effectiveness of the proposed algorithm, and a deep analysis is given of the synthetic factors selected by the proposed method.
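
    A maximum-weight / minimum-redundancy criterion can be sketched as a greedy selection: at each step pick the feature whose relevance (weight) minus its mean redundancy with the already-chosen features is largest. The scoring below is an illustrative form of such a criterion, not the paper's exact formula:

```python
def mwmr_select(relevance, redundancy, k):
    # relevance[j]: weight of feature j w.r.t. the target structure
    # redundancy[i][j]: pairwise redundancy between features i and j
    n = len(relevance)
    selected, candidates = [], set(range(n))
    while len(selected) < k and candidates:
        def score(j):
            if not selected:
                return relevance[j]
            red = sum(redundancy[j][s] for s in selected) / len(selected)
            return relevance[j] - red
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# toy case: features 0 and 1 are strong but redundant; 2 is weak but novel
relevance = [0.9, 0.85, 0.2]
redundancy = [[0.0, 0.8, 0.0],
              [0.8, 0.0, 0.0],
              [0.0, 0.0, 0.0]]
picked = mwmr_select(relevance, redundancy, k=2)
```

    Note how the redundant runner-up (feature 1) is skipped in favour of the weaker but complementary feature 2, which is exactly the behaviour the criterion is designed to produce.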

  2. Electrocardiogram Based Identification using a New Effective Intelligent Selection of Fused Features

    Science.gov (United States)

    Abbaspour, Hamidreza; Razavi, Seyyed Mohammad; Mehrshad, Nasser

    2015-01-01

    Over the years, the feasibility of using the Electrocardiogram (ECG) signal for human identification has been investigated, and several methods have been suggested. In this research, a new effective intelligent method for selecting features from ECG signals is proposed. This method is developed in such a way that it is able to select the important features that are necessary for identification through analysis of the ECG signals. For this purpose, after ECG signal preprocessing, its characterizing features were extracted and then compressed using the cosine transform. The features most effective for identification, among the characterizing features, are selected using a combination of a genetic algorithm and artificial neural networks. The proposed method was tested on three public ECG databases, namely the MIT-BIH Arrhythmia Database, the MIT-BIH Normal Sinus Rhythm Database and the European ST-T Database, in order to evaluate the proposed subject identification method on normal ECG signals as well as ECG signals with arrhythmias. Identification rates of 99.89%, 99.84% and 99.99% are obtained for these databases, respectively. The proposed algorithm exhibits remarkable identification accuracy not only with normal ECG signals, but also in the presence of various arrhythmias. Simulation results showed that the proposed method, despite the low number of selected features, performs well in the identification task. PMID:25709939

  3. An ant colony optimization based feature selection for web page classification.

    Science.gov (United States)

    Saraç, Esra; Özel, Selma Ayşe

    2014-01-01

    The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used to improve runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods.

  4. An Ant Colony Optimization Based Feature Selection for Web Page Classification

    Directory of Open Access Journals (Sweden)

    Esra Saraç

    2014-01-01

    Full Text Available The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines’ performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used to improve runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods.

  5. Feature selection for appearance-based vehicle tracking in geospatial video

    Science.gov (United States)

    Poostchi, Mahdieh; Bunyak, Filiz; Palaniappan, Kannappan; Seetharaman, Guna

    2013-05-01

    Current video tracking systems often employ a rich set of intensity, edge, texture, shape and object level features combined with descriptors for appearance modeling. This approach increases tracker robustness but is computationally expensive for realtime applications and localization accuracy can be adversely affected by including distracting features in the feature fusion or object classification processes. This paper explores offline feature subset selection using a filter-based evaluation approach for video tracking to reduce the dimensionality of the feature space and to discover relevant representative lower dimensional subspaces for online tracking. We compare the performance of the exhaustive FOCUS algorithm to the sequential heuristic SFFS, SFS and RELIEF feature selection methods. Experiments show that using offline feature selection reduces computational complexity, improves feature fusion and is expected to translate into better online tracking performance. Overall SFFS and SFS perform very well, close to the optimum determined by FOCUS, but RELIEF does not work as well for feature selection in the context of appearance-based object tracking.
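
    Of the sequential heuristics compared above, SFS is the simplest: greedily add the feature that most improves the subset score until the budget is reached (SFFS additionally interleaves conditional backward steps, omitted here). A sketch with an illustrative toy score:

```python
def sfs(score, n_features, k):
    # sequential forward selection: grow the subset one feature at a time,
    # always adding the feature that maximises the subset score
    selected, remaining = [], set(range(n_features))
    while len(selected) < k and remaining:
        best = max(remaining, key=lambda j: score(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# toy score: features 1 and 3 are informative, with a small size penalty
def toy_score(subset):
    return len(set(subset) & {1, 3}) - 0.01 * len(subset)

chosen = sfs(toy_score, n_features=5, k=2)
```

    In the filter-based setting of the paper, `score` would be a cheap subset-quality measure evaluated offline, so no tracker needs to be re-run per candidate subset.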

  6. A DYNAMIC FEATURE SELECTION METHOD FOR DOCUMENT RANKING WITH RELEVANCE FEEDBACK APPROACH

    Directory of Open Access Journals (Sweden)

    K. Latha

    2010-07-01

    Full Text Available Ranking search results is essential for information retrieval and Web search. Search engines need to not only return highly relevant results, but also be fast to satisfy users. As a result, not all available features can be used for ranking, and in fact only a small percentage of these features can be used. Thus, it is crucial to have a feature selection mechanism that can find a subset of features that both meets latency requirements and achieves high relevance. In this paper we describe a 0/1 knapsack procedure for automatically selecting features to use within a Generalization model for Document Ranking. We propose an approach to relevance feedback using the Expectation Maximization method and evaluate the algorithm on the TREC collection for describing classes of feedback textual information retrieval features. Experimental results, evaluated on the standard TREC-9 part of the OHSUMED collection, show that our feature selection algorithm produces models that are either significantly more effective than, or equally effective as, models such as the Markov Random Field model, Correlation Coefficient and Count Difference methods.
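
    A 0/1 knapsack procedure of this kind maps naturally onto the classic dynamic program: each feature has a gain (relevance) and a cost (latency), and the budget caps the total cost. A sketch with illustrative numbers:

```python
def knapsack_select(gains, costs, budget):
    # classic 0/1 knapsack DP: maximise total gain subject to a cost budget
    n = len(gains)
    dp = [[0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        g, c = gains[i - 1], costs[i - 1]
        for b in range(budget + 1):
            dp[i][b] = dp[i - 1][b]                  # skip feature i-1
            if c <= b and dp[i - 1][b - c] + g > dp[i][b]:
                dp[i][b] = dp[i - 1][b - c] + g      # take feature i-1
    # backtrack to recover which features were chosen
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if dp[i][b] != dp[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    return sorted(chosen), dp[n][budget]

# three candidate features: (gain, cost) = (6,1), (10,2), (12,3); budget 5
features, value = knapsack_select([6, 10, 12], [1, 2, 3], budget=5)
```

    The DP runs in O(n x budget) time, which is what makes the approach practical when latency budgets are small integers (e.g. milliseconds per query).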

  7. Comparative Study on Feature Selection and Fusion Schemes for Emotion Recognition from Speech

    Directory of Open Access Journals (Sweden)

    Santiago Planet

    2012-09-01

    Full Text Available The automatic analysis of speech to detect affective states may improve the way users interact with electronic devices. However, analysis at the acoustic level alone may not be enough to determine the emotion of a user in a realistic scenario. In this paper we analyzed the spontaneous speech recordings of the FAU Aibo Corpus at the acoustic and linguistic levels to extract two sets of features. The acoustic set was reduced by a greedy procedure selecting the most relevant features to optimize the learning stage. We compared two versions of this greedy selection algorithm, performing the search for relevant features forwards and backwards. We experimented with three classification approaches: Naïve-Bayes, a support vector machine and a logistic model tree, and two fusion schemes: decision-level fusion, merging the hard decisions of the acoustic and linguistic classifiers by means of a decision tree; and feature-level fusion, concatenating both sets of features before the learning stage. Despite the low performance achieved with the linguistic data, a dramatic improvement was obtained after combining them with the acoustic information, improving the results achieved by the acoustic modality on its own. The results achieved by the classifiers using the parameters merged at the feature level outperformed the classification results of the decision-level fusion scheme, despite the simplicity of the former. Moreover, the extremely reduced set of acoustic features obtained by the greedy forward search selection algorithm improved the results provided by the full set.
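    The greedy forward search described above can be sketched generically: start from the empty set and repeatedly add the feature that most improves a score, stopping when nothing helps. The feature names and additive scoring function below are illustrative assumptions, not the FAU Aibo acoustic set; a real score_fn would run the learning stage on the candidate subset.

```python
def forward_selection(candidates, score_fn):
    """Greedy forward search: repeatedly add the candidate feature that
    most improves the score; stop when no addition improves it."""
    selected = []
    best = score_fn(selected)
    remaining = list(candidates)
    while remaining:
        scored = [(score_fn(selected + [f]), f) for f in remaining]
        top_score, top_f = max(scored)
        if top_score <= best:
            break  # no candidate improves the current subset
        selected.append(top_f)
        remaining.remove(top_f)
        best = top_score
    return selected, best

# Hypothetical per-feature relevance; the additive score is purely
# illustrative of the search mechanics
relevance = {"pitch": 0.5, "energy": 0.4, "mfcc": 0.3, "noise": -0.2}
score = lambda subset: sum(relevance[f] for f in subset)
selected, best = forward_selection(relevance, score)
```

    The backward variant works symmetrically, starting from the full set and removing the least useful feature at each step.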

  8. Tuning target selection algorithms to improve galaxy redshift estimates

    CERN Document Server

    Hoyle, Ben; Rau, Markus Michael; Seitz, Stella; Weller, Jochen

    2015-01-01

    We showcase machine learning (ML) inspired target selection algorithms to determine which of all potential targets should be selected first for spectroscopic follow up. Efficient target selection can improve the ML redshift uncertainties as calculated on an independent sample, while requiring fewer targets to be observed. We compare the ML targeting algorithms with the Sloan Digital Sky Survey (SDSS) target order, and with a random targeting algorithm. The ML inspired algorithms are constructed iteratively by estimating which of the remaining target galaxies will be most difficult for the machine learning methods to accurately estimate redshifts using the previously observed data. This is performed by predicting the expected redshift error and redshift offset (or bias) of all of the remaining target galaxies. We find that the predicted values of bias and error are accurate to better than 10-30% of the true values, even with only limited training sample sizes. We construct a hypothetical follow-up survey and fi...

  9. TOPSIS Based Multi-Criteria Decision Making of Feature Selection Techniques for Network Traffic Dataset

    Directory of Open Access Journals (Sweden)

    Raman Singh

    2014-01-01

    Full Text Available Intrusion detection systems (IDS) have to process millions of packets with many features, which delays the detection of anomalies. Sampling and feature selection may be used to reduce computation time and hence minimize intrusion detection time. This paper aims to recommend feature selection algorithms on the basis of the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS). TOPSIS is used to suggest one or more choices among alternatives that have many attributes. A total of ten feature selection techniques were used for the analysis of the KDD network dataset. Three classifiers, namely Naïve Bayes, J48 and PART, were considered for this experiment using the Weka data mining tool. Ranking of the techniques using TOPSIS was calculated using MATLAB. Of these techniques, Filtered Subset Evaluation was found suitable for intrusion detection in terms of very low computational time with acceptable accuracy.
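    A minimal TOPSIS sketch: vector-normalize the decision matrix, weight it, and score each alternative by its relative closeness to the ideal solution. The alternatives here stand for hypothetical feature selection techniques rated on accuracy (benefit criterion) and runtime (cost criterion); all numbers and weights are illustrative assumptions, not the paper's data.

```python
import math

def topsis(matrix, weights, benefit):
    """Rank alternatives with TOPSIS: vector-normalize the decision
    matrix, weight it, then score each alternative by its relative
    closeness to the ideal solution."""
    n = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_pos, d_neg = math.dist(row, ideal), math.dist(row, anti)
        scores.append(d_neg / (d_pos + d_neg))  # closeness in [0, 1]
    return scores

# Three hypothetical feature-selection techniques rated on accuracy
# (benefit) and runtime in seconds (cost); weights are assumptions
matrix = [[0.92, 120], [0.90, 15], [0.85, 10]]
scores = topsis(matrix, weights=[0.6, 0.4], benefit=[True, False])
```

    With these numbers the second alternative wins: it trades a little accuracy for a large runtime gain, mirroring the paper's preference for low computational time with acceptable accuracy.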

  10. Feature Selection for Better Identification of Subtypes of Guillain-Barré Syndrome

    Directory of Open Access Journals (Sweden)

    José Hernández-Torruco

    2014-01-01

    Full Text Available Guillain-Barré syndrome (GBS) is a neurological disorder which has not been explored using clustering algorithms. Clustering algorithms perform more efficiently when they work only with relevant features. In this work, we applied correlation-based feature selection (CFS), chi-squared, information gain, symmetrical uncertainty, and consistency filter methods to select the most relevant features from a 156-feature real dataset. This dataset contains clinical, serological, and nerve conduction test data obtained from GBS patients. The most relevant feature subsets, determined with each filter method, were used to identify the four subtypes of GBS present in the dataset. We used the partition around medoids (PAM) clustering algorithm to form four clusters, corresponding to the GBS subtypes, and used the purity of each cluster as the evaluation measure. After experimentation, symmetrical uncertainty and information gain determined a feature subset of seven variables. These variables, assembled into a dataset, were used as input to PAM and reached a purity of 0.7984. This result leads to a first characterization of this syndrome using computational techniques.

  11. A Hybrid Swarm Intelligence Algorithm for Intrusion Detection Using Significant Features

    Directory of Open Access Journals (Sweden)

    P. Amudha

    2015-01-01

    Full Text Available Intrusion detection has become a main part of network security due to the huge number of attacks that affect computers. This is due to the extensive growth of internet connectivity and accessibility to information systems worldwide. To deal with this problem, this paper proposes a hybrid algorithm that integrates a Modified Artificial Bee Colony (MABC) with Enhanced Particle Swarm Optimization (EPSO) to address the intrusion detection problem. The algorithms are combined to find better optimization results, and the classification accuracies are obtained by the 10-fold cross-validation method. The purpose of this paper is to select the most relevant features that can represent the pattern of the network traffic and to test their effect on the success of the proposed hybrid classification algorithm. To investigate the performance of the proposed method, the KDDCup’99 intrusion detection benchmark dataset from the UCI Machine Learning repository is used. The performance of the proposed method is compared with the other machine learning algorithms and found to be significantly different.

  12. Evolutionary Algorithm Based Feature Optimization for Multi-Channel EEG Classification

    Science.gov (United States)

    Wang, Yubo; Veluvolu, Kalyana C.

    2017-01-01

    Most BCI systems that rely on EEG signals employ Fourier-based methods for time-frequency decomposition for feature extraction. The band-limited multiple Fourier linear combiner is well suited for such band-limited signals due to its real-time applicability. Despite the improved performance of these techniques in two-channel settings, their application to multiple-channel EEG is not straightforward and remains challenging. As more channels become available, a spatial filter is required to eliminate the noise and preserve the useful information. Moreover, multiple-channel EEG also adds high dimensionality to the frequency feature space, so feature selection is required to stabilize the performance of the classifier. In this paper, we develop a new method based on an Evolutionary Algorithm (EA) to solve these two problems simultaneously. The real-valued EA encodes both the spatial filter estimates and the feature selection into its solution and optimizes them with respect to the classification error. Three Fourier-based designs are tested in this paper. Our results show that the combination of a Fourier-based method with the covariance matrix adaptation evolution strategy (CMA-ES) has the best overall performance. PMID:28203141

  13. Simultaneous Spectral-Spatial Feature Selection and Extraction for Hyperspectral Images.

    Science.gov (United States)

    Zhang, Lefei; Zhang, Qian; Du, Bo; Huang, Xin; Tang, Yuan Yan; Tao, Dacheng

    2016-09-12

    In hyperspectral remote sensing data mining, it is important to take into account both spectral and spatial information, such as the spectral signature, texture features, and morphological properties, to improve performance, e.g., the image classification accuracy. From a feature representation point of view, a natural approach to handle this situation is to concatenate the spectral and spatial features into a single but high-dimensional vector and then apply a certain dimension reduction technique directly to that concatenated vector before feeding it into the subsequent classifier. However, multiple features from various domains have different physical meanings and statistical properties, and such concatenation does not efficiently explore the complementary properties among the different features, which should help boost the feature discriminability. Furthermore, it is also difficult to interpret the transformed results of the concatenated vector. Consequently, finding a physically meaningful consensus low-dimensional feature representation of the original multiple features is still a challenging task. In order to address these issues, we propose a novel feature learning framework, i.e., the simultaneous spectral-spatial feature selection and extraction algorithm, for hyperspectral image spectral-spatial feature representation and classification. Specifically, the proposed method learns a latent low-dimensional subspace by projecting the spectral-spatial features into a common feature space, where the complementary information is effectively exploited and, simultaneously, only the most significant original features are transformed. Encouraging experimental results on three publicly available hyperspectral remote sensing datasets confirm that our proposed method is effective and efficient.

  14. Multi-task GLOH feature selection for human age estimation

    CERN Document Server

    Liang, Yixiong; Xu, Ying; Xiang, Yao; Zou, Beiji

    2011-01-01

    In this paper, we propose a novel age estimation method based on the GLOH feature descriptor and multi-task learning (MTL). The GLOH feature descriptor, one of the state-of-the-art feature descriptors, is used to capture the age-related local and spatial information of a face image. As the extracted GLOH features are often redundant, MTL is designed to select the most informative feature bins for the age estimation problem, while the corresponding weights are determined by ridge regression. This approach greatly reduces the feature dimensionality, which can not only improve performance but also decrease the computational burden. Experiments on the publicly available FG-NET database show that the proposed method can achieve comparable performance over previous approaches while using far fewer features.

  15. Accurate Image Retrieval Algorithm Based on Color and Texture Feature

    Directory of Open Access Journals (Sweden)

    Chunlai Yan

    2013-06-01

    Full Text Available Content-Based Image Retrieval (CBIR) is one of the most active hot spots in the current research field of multimedia retrieval. Based on the description and extraction of visual content (features) of an image, CBIR aims to find images that contain specified content (features) in the image database. In this paper, several key technologies of CBIR, e.g. the extraction of the color and texture features of the image, as well as the similarity measures, are investigated. On the basis of the theoretical research, an image retrieval system based on color and texture features is designed. In this system, the Weighted Color Feature based on HSV space is adopted as the color feature vector; four features of the Co-occurrence Matrix, namely Energy, Entropy, Inertia Quadrature and Correlation, are used to construct texture vectors; and the Euclidean distance is employed as the similarity measure. Experimental results show that this CBIR system is efficient in image retrieval.
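    The Euclidean similarity measure used by such a system can be sketched directly over feature vectors; the small "weighted HSV color-feature" vectors below are hypothetical, purely to show the comparison step.

```python
import math

def euclidean_distance(v1, v2):
    """Euclidean distance between two feature vectors; smaller values
    mean the images are considered more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

# Hypothetical weighted HSV color-feature vectors: the query is much
# closer to image A than to image B
query   = [0.20, 0.50, 0.30]
image_a = [0.25, 0.45, 0.30]
image_b = [0.70, 0.10, 0.20]
```

    In a full system the same distance would be computed over the concatenated color and texture vectors, and the database images returned in order of increasing distance.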

  16. Advances in feature selection methods for hyperspectral image processing in food industry applications: a review.

    Science.gov (United States)

    Dai, Qiong; Cheng, Jun-Hu; Sun, Da-Wen; Zeng, Xin-An

    2015-01-01

    There is increased interest in the applications of hyperspectral imaging (HSI) for assessing food quality, safety, and authenticity. HSI provides an abundance of spatial and spectral information from foods by combining both spectroscopy and imaging, resulting in hundreds of contiguous wavebands for each spatial position of food samples, a situation also known as the curse of dimensionality. It is desirable to employ feature selection algorithms to decrease the computational burden and increase predictive accuracy, which is especially relevant for the development of online applications. Recently, a variety of feature selection algorithms have been proposed that can be categorized into three groups based on the search strategy, namely complete search, heuristic search and random search. This review introduces the fundamentals of each algorithm, illustrates its applications in hyperspectral data analysis in the food field, and discusses the advantages and disadvantages of these algorithms. It is hoped that this review will provide a guideline for feature selection and data processing in the future development of hyperspectral imaging techniques for foods.

  17. Improving Naive Bayes with Online Feature Selection for Quick Adaptation to Evolving Feature Usefulness

    Energy Technology Data Exchange (ETDEWEB)

    Pon, R K; Cardenas, A F; Buttler, D J

    2007-09-19

    The definition of what makes an article interesting varies from user to user and continually evolves even for a single user. As a result, for news recommendation systems, useless document features cannot be determined a priori and all features are usually considered for interestingness classification. Consequently, the presence of currently useless features degrades classification performance [1], particularly over the initial set of news articles being classified. The initial set of documents is critical for a user when considering which particular news recommendation system to adopt. To address these problems, we introduce an improved version of the naive Bayes classifier with online feature selection. We use correlation to determine the utility of each feature and take advantage of the conditional independence assumption used by naive Bayes for online feature selection and classification. The augmented naive Bayes classifier performs 28% better than the traditional naive Bayes classifier in recommending news articles from the Yahoo! RSS feeds.
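    The correlation-based online feature-utility idea can be sketched with a running Pearson correlation between a feature indicator and the interestingness label, updated one article at a time. This is a sketch of the general technique, not the authors' exact implementation.

```python
class OnlineCorrelation:
    """Running Pearson correlation between a binary feature indicator
    and the class label, updated one document at a time; a sketch of
    using correlation as an online feature-utility score."""

    def __init__(self):
        self.n = self.sx = self.sy = 0.0
        self.sxx = self.syy = self.sxy = 0.0

    def update(self, x, y):
        # Accumulate the sufficient statistics for Pearson correlation
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def value(self):
        cov = self.n * self.sxy - self.sx * self.sy
        var = (self.n * self.sxx - self.sx ** 2) * (self.n * self.syy - self.sy ** 2)
        return cov / var ** 0.5 if var > 0 else 0.0

# A feature that appears exactly in the interesting articles: utility -> 1
corr = OnlineCorrelation()
for x, y in [(1, 1), (0, 0), (1, 1), (0, 0), (1, 1)]:
    corr.update(x, y)
```

    One such accumulator per candidate feature lets the recommender re-rank feature usefulness after every labeled article, without re-scanning past documents.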

  18. Dominant Local Binary Pattern Based Face Feature Selection and Detection

    Directory of Open Access Journals (Sweden)

    Kavitha.T

    2010-04-01

    Full Text Available Face detection plays a major role in biometrics. Feature selection is a problem of formidable complexity. This paper proposes a novel approach to extract face features for face detection. The LBP features can be extracted faster in a single scan through the raw image and lie in a lower dimensional space, whilst still retaining facial information efficiently. The LBP features are robust to low-resolution images. The dominant local binary pattern (DLBP) is used to extract features accurately. A number of trainable methods are emerging in empirical practice due to their effectiveness. The proposed method is a trainable system for selecting face features from over-complete dictionaries of image measurements. After the feature selection procedure is completed, an SVM classifier is used for face detection. The main advantage of this proposal is that it is trained on a very small training set. The classifier is used to increase the selection accuracy. This is not only advantageous in facilitating the data-gathering stage but, more importantly, in limiting the training time. The CBCL frontal faces dataset is used for training and validation.

  19. Dualheap Selection Algorithm: Efficient, Inherently Parallel and Somewhat Mysterious

    CERN Document Server

    Sepesi, Greg

    2007-01-01

    An inherently parallel algorithm is proposed that efficiently performs selection: finding the K-th largest member of a set of N members. Selection is a common component of many more complex algorithms and therefore is a widely studied problem. Not much is new in the proposed dualheap selection algorithm: the heap data structure is from J.W.J. Williams, the bottom-up heap construction is from R.W. Floyd, and the concept of a two-heap data structure is from J.W.J. Williams and D.E. Knuth. The algorithm's novelty is limited to a few relatively minor implementation twists: 1) the two heaps are oriented with their roots at the partition values rather than at the minimum and maximum values, 2) the coding of one of the heaps (the heap of smaller values) employs negative indexing, and 3) the exchange phase of the algorithm is similar to a bottom-up heap construction, but navigates the heap with a post-order tree traversal. When run on a single processor, the dualheap selection algorithm's performance is competitive wit...
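    The dualheap algorithm itself is not reproduced here; as a point of reference, the selection problem it addresses (finding the K-th largest of N members) can be sketched with a single bounded min-heap in O(N log K).

```python
import heapq

def kth_largest(items, k):
    """Find the K-th largest member by keeping a min-heap of the K
    largest values seen so far: O(N log K) time, O(K) space."""
    heap = []
    for x in items:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # evict the smallest of the top K
    return heap[0]

result = kth_largest([7, 2, 9, 4, 11, 5], 3)  # the 3rd largest is 7
```

    The dualheap approach instead partitions all N members around the K-th value using two opposed heaps, which is what makes it amenable to parallel execution.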

  20. Tuning target selection algorithms to improve galaxy redshift estimates

    Science.gov (United States)

    Hoyle, Ben; Paech, Kerstin; Rau, Markus Michael; Seitz, Stella; Weller, Jochen

    2016-06-01

    We showcase machine learning (ML) inspired target selection algorithms to determine which of all potential targets should be selected first for spectroscopic follow-up. Efficient target selection can improve the ML redshift uncertainties as calculated on an independent sample, while requiring fewer targets to be observed. We compare seven different ML targeting algorithms with the Sloan Digital Sky Survey (SDSS) target order, and with a random targeting algorithm. The ML inspired algorithms are constructed iteratively by estimating which of the remaining target galaxies will be most difficult for the ML methods to accurately estimate redshifts using the previously observed data. This is performed by predicting the expected redshift error and redshift offset (or bias) of all of the remaining target galaxies. We find that the predicted values of bias and error are accurate to better than 10-30 per cent of the true values, even with only limited training sample sizes. We construct a hypothetical follow-up survey and find that some of the ML targeting algorithms are able to obtain the same redshift predictive power with 2-3 times less observing time, as compared to that of the SDSS, or random, target selection algorithms. The reduction in the required follow-up resources could allow for a change to the follow-up strategy, for example by obtaining deeper spectroscopy, which could improve ML redshift estimates for deeper test data.

  1. Simulation of a Fogged-Image Background Suppression Algorithm Guided by Optimized Feature Selection

    Institute of Scientific and Technical Information of China (English)

    彭其华

    2015-01-01

    Dense-fog background suppression of fogged images is an effective technique for improving imaging resolution and image quality in foggy environments. In dense fog, the edge fog points of an image suffer from dark-channel interference, and defogging performance is poor. A fogged-image background suppression algorithm guided by optimized features is proposed. The image is first smoothed with a light filter, and a dark-channel feature-enhancement method is then used to suppress the fogged background and improve the suppression algorithm, increasing the resolving power of the image. Simulation results show that the method fits the fog points of the image smoothly, reduces the normalized least-square error of the image, and improves defogging performance.

  2. A multiple-feature and multiple-kernel scene segmentation algorithm for humanoid robot.

    Science.gov (United States)

    Liu, Zhi; Xu, Shuqiong; Zhang, Yun; Chen, Chun Lung Philip

    2014-11-01

    This technical correspondence presents a multiple-feature and multiple-kernel support vector machine (MFMK-SVM) methodology to achieve more reliable and robust segmentation performance for a humanoid robot. The pixel-wise intensity, gradient, and C1 SMF features are extracted via the local homogeneity model and Gabor filter, and used as inputs of the MFMK-SVM model. This provides multiple features of the samples for easier implementation and efficient computation of the MFMK-SVM model. A new clustering method, called the feature validity-interval type-2 fuzzy C-means (FV-IT2FCM) clustering algorithm, is proposed that integrates a type-2 fuzzy criterion in the clustering optimization process to improve the robustness and reliability of clustering results through iterative optimization. Furthermore, the clustering validity is employed to select the training samples for the learning of the MFMK-SVM model. The MFMK-SVM scene segmentation method is able to take full advantage of the multiple features of the scene image and the ability of multiple kernels. Experiments on the BSDS dataset and real natural scene images demonstrate the superior performance of our proposed method.

  3. The Effect of Feature Selection on Phish Website Detection

    Directory of Open Access Journals (Sweden)

    Hiba Zuhair

    2015-10-01

    Full Text Available Recently, limited anti-phishing campaigns have given phishers more possibilities to bypass detection through their advanced deceptions. Moreover, failure to devise appropriate classification techniques to effectively identify these deceptions has degraded the detection of phishing websites. Consequently, exploiting features that are as new, few, predictive, and effective as possible has emerged as a key challenge to keep detection resilient. Some prior works have investigated and applied selected methods to develop their own classification techniques. However, no study has reached general agreement on which feature selection method could best assist in enhancing classification performance. Hence, this study empirically examined these methods and their effects on classification performance. Furthermore, it recommends criteria for assessing their outcomes and offers contributions to the problem at hand. Hybrid features, low- and high-dimensional datasets, different feature selection methods, and classification models were examined in this study. As a result, the findings showed notably improved detection precision with low latency, as well as noteworthy gains in robustness and prediction capability. Although selecting an ideal feature subset was a challenging task, the findings of this study provide the most advantageous feature subset possible for robust selection and effective classification in the phishing detection domain.

  4. Selecting Optimal Subset of Features for Student Performance Model

    Directory of Open Access Journals (Sweden)

    Hany M. Harb

    2012-09-01

    Full Text Available Educational data mining (EDM) is a new and growing research area in which data mining concepts are applied in the educational field to extract useful information about student behavior in the learning process. Classification methods like decision trees, rule mining, and Bayesian networks can be applied to educational data for predicting student behavior, such as performance in an examination. This prediction may help in student evaluation. As feature selection influences the predictive accuracy of any performance model, it is essential to study carefully the effectiveness of a student performance model in connection with feature selection techniques. The main objective of this work is to achieve high predictive performance by adopting various feature selection techniques to increase predictive accuracy with the fewest features. The outcomes show a reduction in computational time and construction cost in both the training and classification phases of the student performance model.

  5. [Research on non-rigid medical image registration algorithm based on SIFT feature extraction].

    Science.gov (United States)

    Wang, Anna; Lu, Dan; Wang, Zhe; Fang, Zhizhen

    2010-08-01

    For non-rigid registration of medical images, this paper gives a practical feature-point matching algorithm: an image registration algorithm based on the Scale-Invariant Feature Transform (SIFT). The algorithm makes use of the invariance of image features to translation, rotation and affine transformation in scale space to extract the image feature points. A bidirectional matching algorithm is chosen to establish the matching relations between the images, so the accuracy of image registration is improved. On this basis, an affine transform is chosen to complete the non-rigid registration, and a normalized mutual information measure and PSO optimization algorithm are also chosen to optimize the registration process. The experimental results show that the method can achieve better registration results than the method based on mutual information alone.

  6. A hybrid feature selection method using multiclass SVM for diagnosis of erythemato-squamous disease

    Science.gov (United States)

    Maryam, Setiawan, Noor Akhmad; Wahyunggoro, Oyas

    2017-08-01

    The diagnosis of erythemato-squamous disease is a complex problem, and the disease is difficult to detect in dermatology. Besides that, it is a major cause of skin cancer. Data mining implementation in the medical field helps experts diagnose precisely, accurately, and inexpensively. In this research, we use data mining techniques to develop a diagnosis model based on multiclass SVM with a novel hybrid feature selection method to diagnose erythemato-squamous disease. Our hybrid feature selection method, named ChiGA (Chi Square and Genetic Algorithm), uses the advantages of filter and wrapper methods to select the optimal feature subset from the original features. Chi square is used as the filter method to remove redundant features, and GA as the wrapper method to select the ideal feature subset, with SVM used as the classifier. Experiments were performed with 10-fold cross-validation on the erythemato-squamous disease dataset taken from the University of California Irvine (UCI) machine learning database. The experimental results show that the proposed model based on multiclass SVM with Chi Square and GA can give an optimum feature subset: 18 optimum features with 99.18% accuracy.
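    The chi-square filter step of such a hybrid method can be sketched for categorical features: compute the chi-square statistic between each feature and the class label, and keep the highest-scoring features. The toy feature and label vectors below are illustrative assumptions, not the UCI dataset.

```python
from collections import Counter

def chi_square(feature, labels):
    """Chi-square statistic between a categorical feature and the class
    label; larger values indicate stronger dependence (filter score)."""
    n = len(labels)
    f_counts, l_counts = Counter(feature), Counter(labels)
    joint = Counter(zip(feature, labels))
    stat = 0.0
    for f in f_counts:
        for l in l_counts:
            expected = f_counts[f] * l_counts[l] / n  # under independence
            stat += (joint[(f, l)] - expected) ** 2 / expected
    return stat

# A feature perfectly aligned with the class versus an uninformative one
labels    = [0, 0, 1, 1]
good_feat = [0, 0, 1, 1]   # chi-square = 4.0 here
bad_feat  = [0, 1, 0, 1]   # chi-square = 0.0
```

    The GA wrapper stage would then search over subsets of the surviving features, scoring each candidate subset by the cross-validated accuracy of the SVM.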

  7. Feature Extraction and Selection Strategies for Automated Target Recognition

    Science.gov (United States)

    Greene, W. Nicholas; Zhang, Yuhan; Lu, Thomas T.; Chao, Tien-Hsin

    2010-01-01

    Several feature extraction and selection methods for an existing automatic target recognition (ATR) system using JPL's Grayscale Optical Correlator (GOC) and Optimal Trade-Off Maximum Average Correlation Height (OT-MACH) filter were tested using MATLAB. The ATR system is composed of three stages: a cursory region-of-interest (ROI) search using the GOC and OT-MACH filter, a feature extraction and selection stage, and a final classification stage. Feature extraction and selection concern transforming potential target data into more useful forms as well as selecting important subsets of that data which may aid in detection and classification. The strategies tested were built around two popular extraction methods: Principal Component Analysis (PCA) and Independent Component Analysis (ICA). Performance was measured based on the classification accuracy and free-response receiver operating characteristic (FROC) output of a support vector machine (SVM) and a neural net (NN) classifier.

  9. Less is more: Avoiding the LIBS dimensionality curse through judicious feature selection for explosive detection

    Science.gov (United States)

    Kumar Myakalwar, Ashwin; Spegazzini, Nicolas; Zhang, Chi; Kumar Anubham, Siva; Dasari, Ramachandra R.; Barman, Ishan; Kumar Gundawar, Manoj

    2015-08-01

    Despite its intrinsic advantages, translation of laser-induced breakdown spectroscopy for material identification has often been impeded by the lack of robustness of the developed classification models, often due to the presence of spurious correlations. While a number of classifiers exhibiting high discriminatory power have been reported, efforts to establish the subset of relevant spectral features that enable a fundamental interpretation of the segmentation capability and avoid the ‘curse of dimensionality’ have been lacking. Using LIBS data acquired from a set of secondary explosives, we investigate judicious feature selection approaches and architect two different chemometric classifiers, based on feature selection through prerequisite knowledge of the sample composition and through a genetic algorithm, respectively. While the full spectral input results in a classification rate of ca. 92%, selection of only the carbon-to-hydrogen spectral window results in near-identical performance. Importantly, the genetic algorithm-derived classifier shows a statistically significant improvement to ca. 94% accuracy for prospective classification, even though the number of features used is an order of magnitude smaller. Our findings demonstrate the impact of rigorous feature selection in LIBS and also hint at the feasibility of using a discrete filter-based detector, thereby enabling a cheaper and more compact system amenable to field operations.
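    A genetic algorithm over binary feature masks, of the kind used for the second classifier, can be sketched as follows. The fitness function, population size, and operators below are illustrative assumptions, not the paper's configuration; a real fitness would be the cross-validated classification rate on the LIBS spectra.

```python
import random

def ga_feature_selection(fitness, n_features, pop_size=20, generations=40, seed=0):
    """Tiny genetic algorithm over binary feature masks: tournament
    selection, one-point crossover and bit-flip mutation (a sketch)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        def tournament():
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.1:              # occasional bit-flip mutation
                i = rng.randrange(n_features)
                child[i] ^= 1
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Toy fitness: reward three "informative" features, penalize subset size,
# so small discriminative subsets win (values are assumptions)
informative = {0, 3, 5}
fitness = lambda mask: sum(mask[i] for i in informative) - 0.2 * sum(mask)
best = ga_feature_selection(fitness, n_features=8)
```

    The size penalty mirrors the paper's finding that an order-of-magnitude smaller feature set can match or beat the full-spectrum classifier.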

  10. Mutual information-based feature selection for radiomics

    Science.gov (United States)

    Oubel, Estanislao; Beaumont, Hubert; Iannessi, Antoine

    2016-03-01

    Background The extraction and analysis of image features (radiomics) is a promising field in the precision medicine era, with applications to prognosis, prediction, and quantification of response to treatment. In this work, we present a mutual information-based method for quantifying the reproducibility of features, a necessary qualification step before their inclusion in big data systems. Materials and Methods Ten patients with Non-Small Cell Lung Cancer (NSCLC) lesions were followed over time (7 time points on average) with Computed Tomography (CT). Five observers segmented lesions using a semi-automatic method, and 27 features describing shape and intensity distribution were extracted. Inter-observer reproducibility was assessed by computing the multi-information (MI) of feature changes over time, and the variability of global extrema. Results The highest MI values were obtained for volume-based features (VBF). The lesion mass (M), surface-to-volume ratio (SVR) and volume (V) presented statistically significantly higher values of MI than the rest of the features. Within the same VBF group, SVR also showed the lowest variability of extrema. The correlation coefficient (CC) of feature values was unable to differentiate between features. Conclusions MI made it possible to discriminate three features (M, SVR, and V) from the rest in a statistically significant manner. This result is consistent with the order obtained when sorting features by increasing values of extrema variability. MI is a promising alternative for selecting features to be considered as surrogate biomarkers in a precision medicine context.
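    The multi-information used above generalizes pairwise mutual information to several observers. As a minimal illustration of the underlying quantity, discrete mutual information between two variables can be estimated from paired samples; this is a sketch, not the paper's reproducibility pipeline.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Discrete mutual information I(X;Y) in bits, estimated from
    paired samples via the empirical joint and marginal counts."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

# A variable shares all of its information with itself: I(X;X) = H(X)
x = [0, 0, 1, 1]
mi_self = mutual_information(x, x)  # 1 bit for a fair binary variable
```

    High mutual information between the feature changes reported by different observers indicates that the feature is reproducible across segmentations.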

  11. Tournament screening cum EBIC for feature selection with high-dimensional feature spaces

    Institute of Scientific and Technical Information of China (English)

    2009-01-01

    The feature selection problem characterized by a relatively small sample size and an extremely high-dimensional feature space is common in many areas of contemporary statistics. The high dimensionality of the feature space causes serious difficulties: (i) the sample correlations between features become high even if the features are stochastically independent; (ii) the computation becomes intractable. These difficulties make conventional approaches either inapplicable or inefficient. The reduction of the dimensionality of the feature space, followed by low-dimensional approaches, appears to be the only feasible way to tackle the problem. Along this line, we develop in this article a tournament screening cum EBIC approach for feature selection with a high-dimensional feature space. The procedure of tournament screening mimics that of a tournament. It is shown theoretically that tournament screening has the sure screening property, a necessary property which should be satisfied by any valid screening procedure. It is demonstrated by numerical studies that the tournament screening cum EBIC approach enjoys desirable properties, such as a higher positive selection rate and a lower false discovery rate than other approaches.
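
    The tournament metaphor can be sketched as follows: features compete in small groups, each group's winner advances, and rounds repeat until the survivor pool is small enough for a low-dimensional criterion such as EBIC. This is a toy sketch, not the paper's procedure; the scoring function here is a stand-in for a marginal-utility measure.

```python
import random

def tournament_screen(features, score, keep, group_size=4, seed=0):
    """Shuffle features into small groups, keep each group's best-scoring
    member, and repeat until at most `keep` survive. Because the top
    feature wins every group it enters, it always survives -- a toy
    analogue of the sure screening property."""
    rng = random.Random(seed)
    survivors = list(features)
    while len(survivors) > keep:
        rng.shuffle(survivors)
        groups = [survivors[i:i + group_size]
                  for i in range(0, len(survivors), group_size)]
        survivors = [max(g, key=score) for g in groups]
    return survivors

# Toy score: feature j has relevance j (higher is better).
finalists = tournament_screen(range(100), score=lambda j: j, keep=10)
```

    The surviving `finalists` would then be handed to a low-dimensional selection criterion.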

  12. Feature selection versus feature compression in the building of calibration models from FTIR-spectrophotometry datasets.

    Science.gov (United States)

    Vergara, Alexander; Llobet, Eduard

    2012-01-15

    Undoubtedly, FTIR spectrophotometry has become a standard in the chemical industry for monitoring, on the fly, the concentrations of reagents and by-products. However, representing chemical samples by FTIR spectra, which are characterized by hundreds if not thousands of variables, poses its own particular challenges, because the spectra must be analyzed in a high-dimensional feature space where many features are likely to be highly correlated and many others are surely affected by noise. Therefore, identifying a subset of features that preserves classifier/regressor performance seems imperative prior to any attempt to build an appropriate pattern recognition method. In this context, we investigate the benefit of utilizing two different dimensionality reduction methods, namely the minimum Redundancy-Maximum Relevance (mRMR) feature selection scheme and a new self-organized map (SOM) based feature compression, coupled to regression methods, to quantitatively analyze two-component liquid samples using FTIR spectrophotometry. Since these methods make it possible to select a small subset of relevant features from FTIR spectra while preserving the statistical characteristics of the target variable being analyzed, we claim that expressing the FTIR spectra by this dimensionality-reduced set of features may be beneficial. We demonstrate the utility of these feature selection schemes in quantifying the distinct analytes within their binary mixtures using an FTIR spectrophotometer.
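
    A minimal greedy mRMR loop can be sketched as below. Note the hedge: absolute Pearson correlation is used here as a stand-in for both relevance (feature vs. target) and redundancy (feature vs. already-selected features); the original mRMR criterion uses mutual information, and all data below are invented.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def mrmr(X, y, k):
    """Greedy mRMR: at each step pick the feature maximizing relevance
    to the target minus mean redundancy with the selected features.
    X is a list of feature columns, y the target column."""
    relevance = [abs(pearson(col, y)) for col in X]
    selected = [max(range(len(X)), key=lambda j: relevance[j])]
    while len(selected) < k:
        def criterion(j):
            redundancy = sum(abs(pearson(X[j], X[s]))
                             for s in selected) / len(selected)
            return relevance[j] - redundancy
        rest = [j for j in range(len(X)) if j not in selected]
        selected.append(max(rest, key=criterion))
    return selected

# Two copies of the same signal plus one complementary signal:
s1 = [1, 1, -1, -1]
s2 = [1, -1, 1, -1]
y = [2, 0, 0, -2]                 # y = s1 + s2
print(mrmr([s1, s1, s2], y, 2))   # -> [0, 2]: the duplicate of s1 is skipped
```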

  13. An Improved Local Community Detection Algorithm Using Selection Probability

    Directory of Open Access Journals (Sweden)

    Shixiong Xia

    2014-01-01

    Full Text Available In order to find the structure of local communities more effectively, we propose an improved local community detection algorithm, ILCDSP, which improves the node selection strategy by setting a selection probability for every candidate node. ILCDSP assigns each candidate node a selection probability equal to its degree. With this strategy, the proposed algorithm can detect local communities effectively, since it can ensure the best search direction and avoid locally optimal solutions. Experimental results on both synthetic and real networks demonstrate that the quality of the local communities detected by our algorithm is significantly superior to that of the state-of-the-art methods.
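
    The degree-proportional node choice described above is a standard roulette-wheel draw. A minimal sketch, assuming candidates are weighted purely by degree (the node names and degrees below are invented):

```python
import random

def pick_candidate(degrees, rng):
    """Roulette-wheel draw: return one node with probability
    proportional to its degree."""
    nodes = list(degrees)
    total = sum(degrees[n] for n in nodes)
    r = rng.uniform(0, total)
    acc = 0.0
    for node in nodes:
        acc += degrees[node]
        if r <= acc:
            return node
    return nodes[-1]

rng = random.Random(1)
degrees = {"a": 1, "b": 2, "c": 7}
counts = {n: 0 for n in degrees}
for _ in range(10000):
    counts[pick_candidate(degrees, rng)] += 1
# "c" is drawn roughly 70% of the time.
```

    Python's built-in `random.choices(population, weights=...)` performs the same weighted draw in one call.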

  14. Hybridization of Evolutionary Mechanisms for Feature Subset Selection in Unsupervised Learning

    Science.gov (United States)

    Torres, Dolores; Ponce-de-León, Eunice; Torres, Aurora; Ochoa, Alberto; Díaz, Elva

    Feature subset selection for unsupervised learning is a very important topic in artificial intelligence because it is the basis for saving computational resources. In this implementation we use the typical testors methodology in order to incorporate an importance index for each variable. This paper presents the general framework and the way two hybridized meta-heuristics work on this NP-complete problem. The evolutionary mechanisms are based on the Univariate Marginal Distribution Algorithm (UMDA) and the Genetic Algorithm (GA). Both the GA and the UMDA (an Estimation of Distribution Algorithm, EDA) use a very fast operator implemented for finding typical testors on a very large dataset, and both algorithms have a local search mechanism for improving time and fitness. Experiments show that the EDA is faster than the GA because it has better exploitation performance; nevertheless, the GA's solutions are more consistent.

  15. Comparative Analysis of PSO and GA in Geom-Statistical Character Features Selection for Online Character Recognition

    Directory of Open Access Journals (Sweden)

    Fenwa O.D

    2015-08-01

    Full Text Available Online handwriting recognition attracts special interest today due to the increased usage of hand-held devices, and it has become a difficult problem because of the high variability and ambiguity in the character shapes written by individuals. One major problem encountered by researchers in developing character recognition systems is the selection of efficient (optimal) features. In this paper, a feature extraction technique for online character recognition systems was developed using a hybrid of geometrical and statistical (Geom-statistical) features. Through the integration of geometrical and statistical features, insights were gained into new character properties, since these types of features are considered complementary. Several optimization techniques have been used in the literature for feature selection in character recognition, such as the Ant Colony Optimization algorithm (ACO), the Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and Simulated Annealing, but a comparative analysis of GA and PSO in online character recognition has not been carried out. In this paper, the performance of GA and PSO in optimizing the Geom-statistical features for online character recognition was compared, using Modified Optical Backpropagation (MOBP) as the classifier. The system was simulated in Matlab 7.10a. The results show that PSO is a well-accepted optimization algorithm for selecting optimal features, as it outperforms the GA in terms of the number of features selected, training time, and recognition accuracy.

  16. Selective processing of multiple features in the human brain: effects of feature type and salience.

    Science.gov (United States)

    McGinnis, E Menton; Keil, Andreas

    2011-02-09

    Identifying targets in a stream of items at a given constant spatial location relies on selection of aspects such as color, shape, or texture. Such attended (target) features of a stimulus elicit a negative-going event-related brain potential (ERP), termed Selection Negativity (SN), which has been used as an index of selective feature processing. In two experiments, participants viewed a series of Gabor patches in which targets were defined as a specific combination of color, orientation, and shape. Distracters were composed of different combinations of color, orientation, and shape of the target stimulus. This design allows comparisons of items with and without specific target features. Consistent with previous ERP research, SN deflections extended between 160-300 ms. Data from the subsequent P3 component (300-450 ms post-stimulus) were also examined, and were regarded as an index of target processing. In Experiment A, predominant effects of target color on SN and P3 amplitudes were found, along with smaller ERP differences in response to variations of orientation and shape. Manipulating color to be less salient while enhancing the saliency of the orientation of the Gabor patch (Experiment B) led to delayed color selection and enhanced orientation selection. Topographical analyses suggested that the location of SN on the scalp reliably varies with the nature of the to-be-attended feature. No interference of non-target features on the SN was observed. These results suggest that target feature selection operates by means of electrocortical facilitation of feature-specific sensory processes, and that selective electrocortical facilitation is more effective when stimulus saliency is heightened.

  17. Selective processing of multiple features in the human brain: effects of feature type and salience.

    Directory of Open Access Journals (Sweden)

    E Menton McGinnis

    Full Text Available Identifying targets in a stream of items at a given constant spatial location relies on selection of aspects such as color, shape, or texture. Such attended (target) features of a stimulus elicit a negative-going event-related brain potential (ERP), termed Selection Negativity (SN), which has been used as an index of selective feature processing. In two experiments, participants viewed a series of Gabor patches in which targets were defined as a specific combination of color, orientation, and shape. Distracters were composed of different combinations of color, orientation, and shape of the target stimulus. This design allows comparisons of items with and without specific target features. Consistent with previous ERP research, SN deflections extended between 160-300 ms. Data from the subsequent P3 component (300-450 ms post-stimulus) were also examined, and were regarded as an index of target processing. In Experiment A, predominant effects of target color on SN and P3 amplitudes were found, along with smaller ERP differences in response to variations of orientation and shape. Manipulating color to be less salient while enhancing the saliency of the orientation of the Gabor patch (Experiment B) led to delayed color selection and enhanced orientation selection. Topographical analyses suggested that the location of SN on the scalp reliably varies with the nature of the to-be-attended feature. No interference of non-target features on the SN was observed. These results suggest that target feature selection operates by means of electrocortical facilitation of feature-specific sensory processes, and that selective electrocortical facilitation is more effective when stimulus saliency is heightened.

  18. Welding Diagnostics by Means of Particle Swarm Optimization and Feature Selection

    Directory of Open Access Journals (Sweden)

    J. Mirapeix

    2012-01-01

    Full Text Available In a previous contribution, a welding diagnostics approach based on plasma optical spectroscopy was presented. It consisted of the employment of optimization algorithms and synthetic spectra to obtain the participation profiles of the species participating in the plasma. A modification of the model is discussed here: on the one hand the controlled random search algorithm has been substituted by a particle swarm optimization implementation. On the other hand a feature selection stage has been included to determine those spectral windows where the optimization process will take place. Both experimental and field tests will be shown to illustrate the performance of the solution that improves the results of the previous work.

  19. EEG artifact elimination by extraction of ICA-component features using image processing algorithms.

    Science.gov (United States)

    Radüntz, T; Scouten, J; Hochmuth, O; Meffert, B

    2015-03-30

    Artifact rejection is a central issue when dealing with electroencephalogram recordings. Although independent component analysis (ICA) separates the data into linearly independent components (ICs), the classification of these components as artifact or EEG signal still requires visual inspection by experts. In this paper, we achieve automated artifact elimination using linear discriminant analysis (LDA) for the classification of feature vectors extracted from ICA components via image processing algorithms. We compare the performance of this automated classifier to visual classification by experts and identify range filtering as a feature extraction method with great potential for automated IC artifact recognition (accuracy rate 88%). We obtain almost the same level of recognition performance for geometric features and local binary pattern (LBP) features. Compared to existing automated solutions, the proposed method has two main advantages: first, it does not depend on direct recording of artifact signals, which would then, e.g., have to be subtracted from the contaminated EEG; second, it is not limited to a specific number or type of artifact. In summary, the present method is an automatic, reliable, real-time-capable, and practical tool that reduces the time-intensive manual selection of ICs for artifact removal. The results are very promising despite the relatively small channel resolution of 25 electrodes.

  20. Informative Feature Selection for Object Recognition via Sparse PCA

    Science.gov (United States)

    2011-04-07

    ...the BMW database [17] are used for training. For each image pair in SfM, SURF features are deemed informative if the consensus of the corresponding... observe that the first two sparse PVs are sufficient for selecting informative features that lie on the foreground objects in the BMW database (as... (BMW) database [17]. The database consists of multiple-view images of 20 landmark buildings on the Berkeley campus. For each building, wide-baseline...

  1. Feature selection and multi-kernel learning for sparse representation on a manifold.

    Science.gov (United States)

    Wang, Jim Jing-Yan; Bensmail, Halima; Gao, Xin

    2014-03-01

    Sparse representation has been widely studied as a part-based data representation method and applied in many scientific and engineering fields, such as bioinformatics and medical imaging. It seeks to represent a data sample as a sparse linear combination of some basic items in a dictionary. Gao et al. (2013) recently proposed Laplacian sparse coding by regularizing the sparse codes with an affinity graph. However, due to the noisy features and nonlinear distribution of the data samples, the affinity graph constructed directly from the original feature space is not necessarily a reliable reflection of the intrinsic manifold of the data samples. To overcome this problem, we integrate feature selection and multiple kernel learning into the sparse coding on the manifold. To this end, unified objectives are defined for feature selection, multiple kernel learning, sparse coding, and graph regularization. By optimizing the objective functions iteratively, we develop novel data representation algorithms with feature selection and multiple kernel learning respectively. Experimental results on two challenging tasks, N-linked glycosylation prediction and mammogram retrieval, demonstrate that the proposed algorithms outperform the traditional sparse coding methods.

  2. Feature selection and multi-kernel learning for sparse representation on a manifold

    KAUST Repository

    Wang, Jim Jing-Yan

    2014-03-01

    Sparse representation has been widely studied as a part-based data representation method and applied in many scientific and engineering fields, such as bioinformatics and medical imaging. It seeks to represent a data sample as a sparse linear combination of some basic items in a dictionary. Gao et al. (2013) recently proposed Laplacian sparse coding by regularizing the sparse codes with an affinity graph. However, due to the noisy features and nonlinear distribution of the data samples, the affinity graph constructed directly from the original feature space is not necessarily a reliable reflection of the intrinsic manifold of the data samples. To overcome this problem, we integrate feature selection and multiple kernel learning into the sparse coding on the manifold. To this end, unified objectives are defined for feature selection, multiple kernel learning, sparse coding, and graph regularization. By optimizing the objective functions iteratively, we develop novel data representation algorithms with feature selection and multiple kernel learning respectively. Experimental results on two challenging tasks, N-linked glycosylation prediction and mammogram retrieval, demonstrate that the proposed algorithms outperform the traditional sparse coding methods. © 2013 Elsevier Ltd.

  3. Selecting Features of Single Lead ECG Signal for Automatic Sleep Stages Classification using Correlation-based Feature Subset Selection

    Directory of Open Access Journals (Sweden)

    Ary Noviyanto

    2011-09-01

    Full Text Available Knowing our sleep quality helps us maximize our daily performance. The ECG signal has the potential to determine sleep stages, so that sleep quality can be measured. The data used in this research are single-lead ECG signals from the MIT-BIH Polysomnographic Database. The ECG features can be derived from the RR interval, EDR information, and the raw ECG signal. Correlation-based Feature Subset Selection (CFS) is used to choose the features that are significant for determining sleep stages. These features were evaluated using four classifiers with different characteristics (Bayesian network, multilayer perceptron, IB1, and random forest). Performance evaluations with the Bayesian network, IB1, and random forest show that CFS performs excellently: it can reduce the number of features significantly with only a small decrease in accuracy. The best classification result in this research was the combination of the feature set derived from the raw ECG signal and the random forest classifier.
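
    CFS ranks feature subsets by a merit score that rewards feature-class correlation and penalizes inter-feature correlation; the standard formula (Hall, 1999) is merit = k·r̄_cf / sqrt(k + k(k-1)·r̄_ff), where k is the subset size, r̄_cf the mean feature-class correlation, and r̄_ff the mean feature-feature correlation. A small illustrative computation (the numbers are invented):

```python
import math

def cfs_merit(k, avg_feat_class_corr, avg_feat_feat_corr):
    """CFS merit of a k-feature subset: higher feature-class correlation
    helps, inter-feature correlation hurts."""
    return (k * avg_feat_class_corr) / math.sqrt(
        k + k * (k - 1) * avg_feat_feat_corr)

# A small, non-redundant subset can outscore a larger, redundant one:
small = cfs_merit(3, 0.5, 0.2)    # few features, weakly inter-correlated
large = cfs_merit(10, 0.5, 0.8)   # many features, strongly inter-correlated
```

    In practice a search strategy (e.g. best-first) explores subsets and keeps the one with the highest merit, which is why CFS can shrink the feature set with little loss of accuracy.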

  4. Fast Macroblock Mode Selection Algorithm for Multiview Video Coding

    Directory of Open Access Journals (Sweden)

    Qionghai Dai

    2009-01-01

    Full Text Available Multiview video coding (MVC) plays an important role in three-dimensional video applications. The Joint Video Team developed the Joint Multiview Video Model (JMVM), in which a full-search algorithm is employed in macroblock mode selection to provide the best rate-distortion performance for MVC. However, this results in a considerable increase in encoding complexity. We propose a hybrid fast macroblock mode selection algorithm based on an analysis of the full-search algorithm of JMVM. For non-anchor frames of the base view, the proposed algorithm stops the macroblock mode search early by means of three dynamic thresholds. When non-anchor frames of the other views are being encoded, the macroblock modes can be predicted from the frames of the neighboring views, due to the strong correlations between macroblock modes. Experimental results show that the proposed hybrid fast macroblock mode selection algorithm speeds up encoding by 2.37 to 9.97 times without noticeable quality degradation compared with JMVM.

  5. Algorithms for selecting informative marker panels for population assignment.

    Science.gov (United States)

    Rosenberg, Noah A

    2005-11-01

    Given a set of potential source populations, genotypes of an individual of unknown origin at a collection of markers can be used to predict the correct source population of the individual. For improved efficiency, informative markers can be chosen from a larger set of markers to maximize the accuracy of this prediction. However, selecting the loci that are individually most informative does not necessarily produce the optimal panel. Here, using genotypes from eight species--carp, cat, chicken, dog, fly, grayling, human, and maize--this univariate accumulation procedure is compared to new multivariate "greedy" and "maximin" algorithms for choosing marker panels. The procedures generally suggest similar panels, although the greedy method often recommends inclusion of loci that are not chosen by the other algorithms. In seven of the eight species, when applied to five or more markers, all methods achieve at least 94% assignment accuracy on simulated individuals, with one species--dog--producing this level of accuracy with only three markers, and the eighth species--human--requiring approximately 13-16 markers. The new algorithms produce substantial improvements over use of randomly selected markers; where differences among the methods are noticeable, the greedy algorithm leads to slightly higher probabilities of correct assignment. Although none of the approaches necessarily chooses the panel with optimal performance, the algorithms all likely select panels with performance near enough to the maximum that they all are suitable for practical use.

  6. Effective user selection algorithm for quantized precoding in massive MIMO

    Directory of Open Access Journals (Sweden)

    Nayan fang

    2015-02-01

    Full Text Available The downlink of a multi-user massive MIMO wireless system is considered, where a base station equipped with a large number of antennas simultaneously serves multiple users. In this paper, an effective user selection algorithm is proposed for quantized precoding in massive MIMO systems. The algorithm aims at minimizing the correlation of precoders among users by relaxing the optimization problem to a convex one and solving it using the primal Newton barrier method. The complexity of the proposed algorithm is relatively low, and the performance shown by the numerical results is close to that of the exhaustive search method. The advantage of the proposed algorithm becomes increasingly apparent as the number of transmit antennas grows.

  7. Cancer Feature Selection and Classification Using a Binary Quantum-Behaved Particle Swarm Optimization and Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Maolong Xi

    2016-01-01

    Full Text Available This paper focuses on feature gene selection for cancer classification, which employs an optimization algorithm to select a subset of the genes. We propose a binary quantum-behaved particle swarm optimization (BQPSO) for cancer feature gene selection, coupled with a support vector machine (SVM) for cancer classification. First, the proposed BQPSO algorithm is described, which is a discretized version of the original QPSO for binary 0-1 optimization problems. Then, we present the principle and procedure for cancer feature gene selection and cancer classification based on BQPSO and SVM with leave-one-out cross validation (LOOCV). Finally, the BQPSO coupling SVM (BQPSO/SVM), binary PSO coupling SVM (BPSO/SVM), and genetic algorithm coupling SVM (GA/SVM) are tested for feature gene selection and cancer classification on five microarray data sets, namely, Leukemia, Prostate, Colon, Lung, and Lymphoma. The experimental results show that BQPSO/SVM has significant advantages in accuracy, robustness, and the number of feature genes selected compared with the other two algorithms.
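
    A generic sigmoid-based binary PSO sketch is shown below. To be clear about assumptions: this is the classic discrete PSO of Kennedy and Eberhart, not the quantum-behaved BQPSO evaluated in the paper, and the toy fitness (matching a hidden "informative gene" mask) stands in for the SVM/LOOCV accuracy; all parameter values are illustrative.

```python
import math
import random

def binary_pso(n_bits, fitness, n_particles=10, iters=60, seed=0):
    """Classic binary PSO: velocities pass through a sigmoid to give
    per-bit probabilities of being 1. Maximizes `fitness`."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    V = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    gbest = max(pbest, key=fitness)[:]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_bits):
                r1, r2 = rng.random(), rng.random()
                V[i][d] = (0.7 * V[i][d]
                           + 1.5 * r1 * (pbest[i][d] - X[i][d])
                           + 1.5 * r2 * (gbest[d] - X[i][d]))
                prob = 1.0 / (1.0 + math.exp(-V[i][d]))
                X[i][d] = 1 if rng.random() < prob else 0
            if fitness(X[i]) > fitness(pbest[i]):
                pbest[i] = X[i][:]
        gbest = max(pbest, key=fitness)[:]
    return gbest

# Toy objective: recover a hidden "informative gene" mask.
target = [1, 0, 1, 1, 0, 0, 1, 0]
match = lambda bits: sum(b == t for b, t in zip(bits, target))
best = binary_pso(len(target), match)
```

    In a gene-selection setting, each bit marks a gene as in or out of the subset, and the fitness would be classifier accuracy (optionally penalized by subset size).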

  8. Cancer Feature Selection and Classification Using a Binary Quantum-Behaved Particle Swarm Optimization and Support Vector Machine.

    Science.gov (United States)

    Xi, Maolong; Sun, Jun; Liu, Li; Fan, Fangyun; Wu, Xiaojun

    2016-01-01

    This paper focuses on the feature gene selection for cancer classification, which employs an optimization algorithm to select a subset of the genes. We propose a binary quantum-behaved particle swarm optimization (BQPSO) for cancer feature gene selection, coupling support vector machine (SVM) for cancer classification. First, the proposed BQPSO algorithm is described, which is a discretized version of original QPSO for binary 0-1 optimization problems. Then, we present the principle and procedure for cancer feature gene selection and cancer classification based on BQPSO and SVM with leave-one-out cross validation (LOOCV). Finally, the BQPSO coupling SVM (BQPSO/SVM), binary PSO coupling SVM (BPSO/SVM), and genetic algorithm coupling SVM (GA/SVM) are tested for feature gene selection and cancer classification on five microarray data sets, namely, Leukemia, Prostate, Colon, Lung, and Lymphoma. The experimental results show that BQPSO/SVM has significant advantages in accuracy, robustness, and the number of feature genes selected compared with the other two algorithms.

  9. Multiobjective immune algorithm with nondominated neighbor-based selection.

    Science.gov (United States)

    Gong, Maoguo; Jiao, Licheng; Du, Haifeng; Bo, Liefeng

    2008-01-01

    The Nondominated Neighbor Immune Algorithm (NNIA) is proposed for multiobjective optimization; it uses a novel nondominated-neighbor-based selection technique, an immune-inspired operator, two heuristic search operators, and elitism. The unique selection technique of NNIA selects only a minority of isolated nondominated individuals in the population. The selected individuals are then cloned proportionally to their crowding-distance values before heuristic search. By using nondominated-neighbor-based selection and proportional cloning, NNIA pays more attention to the less-crowded regions of the current trade-off front. We compare NNIA with NSGA-II, SPEA2, PESA-II, and MISA in solving five DTLZ problems, five ZDT problems, and three low-dimensional problems. The statistical analysis, based on three performance metrics (the coverage of two sets, the convergence metric, and the spacing), shows that the unique selection method is effective and that NNIA is an effective algorithm for solving multiobjective optimization problems. The empirical study of NNIA's scalability with respect to the number of objectives shows that the new algorithm scales well as the number of objectives increases.

  10. Feature selection gait-based gender classification under different circumstances

    Science.gov (United States)

    Sabir, Azhin; Al-Jawad, Naseer; Jassim, Sabah

    2014-05-01

    This paper proposes gender classification based on human gait features and investigates two variations in addition to the normal gait sequence: clothing (wearing coats) and carrying a bag. The feature vectors in the proposed system are constructed after applying the wavelet transform. Three different feature sets are proposed in this method. The first, spatio-temporal distance, deals with the distances between different parts of the human body (such as the feet, knees, hands, height, and shoulders) during one gait cycle. The second and third feature sets are constructed from the approximation and non-approximation coefficients of the human body, respectively. To extract these two feature sets, we divided the human body into upper and lower parts based on the golden-ratio proportion. In this paper, we adopted a statistical method for constructing the feature vector from the above sets. The dimension of the constructed feature vector is reduced using the Fisher score as a feature selection method, to optimize its discriminating significance. Finally, k-Nearest Neighbor is applied as the classification method. Experimental results demonstrate that our approach provides a more realistic scenario and relatively better performance compared with existing approaches.
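
    The Fisher score used for the selection step ranks each feature by between-class separation relative to within-class spread. A minimal sketch with invented two-class data (the "stride" and "noise" columns are illustrative, not from the paper):

```python
def fisher_score(values, labels):
    """Fisher score of one feature: weighted between-class variance of
    the class means divided by the summed within-class variance."""
    classes = set(labels)
    n = len(values)
    mean = sum(values) / n
    num = den = 0.0
    for c in classes:
        vc = [v for v, l in zip(values, labels) if l == c]
        mc = sum(vc) / len(vc)
        num += len(vc) * (mc - mean) ** 2
        den += sum((v - mc) ** 2 for v in vc)
    return num / den if den else float("inf")

labels = ["m", "m", "m", "f", "f", "f"]
stride = [60, 62, 61, 50, 51, 49]   # well separated between classes
noise  = [5, 9, 2, 8, 3, 7]         # uninformative
# Rank features by score and keep the top-k for the k-NN classifier.
```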

  11. Spatial selection of features within perceived and remembered objects

    Directory of Open Access Journals (Sweden)

    Duncan E Astle

    2009-04-01

    Full Text Available Our representation of the visual world can be modulated by spatially specific attentional biases that depend flexibly on task goals. We compared searching for task-relevant features in perceived versus remembered objects. When searching perceptual input, selected task-relevant and suppressed task-irrelevant features elicited contrasting spatiotopic ERP effects, despite them being perceptually identical. This was also true when participants searched a memory array, suggesting that memory had retained the spatial organisation of the original perceptual input and that this representation could be modulated in a spatially specific fashion. However, task-relevant selection and task-irrelevant suppression effects were of the opposite polarity when searching remembered compared to perceived objects. We suggest that this surprising result stems from the nature of feature- and object-based representations when stored in visual short-term memory. When stored, features are integrated into objects, meaning that the spatially specific selection mechanisms must operate upon objects rather than specific feature-level representations.

  12. Technical Evaluation Report 27: Educational Wikis: Features and selection criteria

    Directory of Open Access Journals (Sweden)

    Jim Rudolph

    2004-04-01

    Full Text Available This report discusses the educational uses of the ‘wiki,’ an increasingly popular approach to online community development. Wikis are defined and compared with ‘blogging’ methods; characteristics of major wiki engines are described; and wiki features and selection criteria are examined.

  13. Variance Ranklets : Orientation-selective rank features for contrast modulations

    NARCIS (Netherlands)

    Azzopardi, George; Smeraldi, Fabrizio

    2009-01-01

    We introduce a novel type of orientation–selective rank features that are sensitive to contrast modulations (second–order stimuli). Variance Ranklets are designed in close analogy with the standard Ranklets, but use the Siegel–Tukey statistics for dispersion instead of the Wilcoxon statistics.

  14. Emotion of Physiological Signals Classification Based on TS Feature Selection

    Institute of Scientific and Technical Information of China (English)

    Wang Yujing; Mo Jianlin

    2015-01-01

    This paper proposes TS-MLP, a method for emotion recognition from physiological signals. It recognizes emotion by using Tabu search to select features of the emotional physiological signals and a multilayer perceptron to classify the emotion. Simulation shows that it achieves good emotion classification performance.

  15. Auditory-model based robust feature selection for speech recognition.

    Science.gov (United States)

    Koniaris, Christos; Kuropatwinski, Marcin; Kleijn, W Bastiaan

    2010-02-01

    It is shown that robust dimension-reduction of a feature set for speech recognition can be based on a model of the human auditory system. Whereas conventional methods optimize classification performance, the proposed method exploits knowledge implicit in the auditory periphery, inheriting its robustness. Features are selected to maximize the similarity of the Euclidean geometry of the feature domain and the perceptual domain. Recognition experiments using mel-frequency cepstral coefficients (MFCCs) confirm the effectiveness of the approach, which does not require labeled training data. For noisy data the method outperforms commonly used discriminant-analysis based dimension-reduction methods that rely on labeling. The results indicate that selecting MFCCs in their natural order results in subsets with good performance.

  16. Feature selection for high-dimensional integrated data

    KAUST Repository

    Zheng, Charles

    2012-04-26

    Motivated by the problem of identifying correlations between genes or features of two related biological systems, we propose a model of feature selection in which only a subset of the predictors Xt is dependent on the multidimensional variate Y, and the remainder of the predictors constitute a “noise set” Xu independent of Y. Using Monte Carlo simulations, we investigated the relative performance of two methods, thresholding and singular-value decomposition, in combination with stochastic optimization to determine “empirical bounds” on the small-sample accuracy of an asymptotic approximation. We demonstrate the utility of the thresholding and SVD feature selection methods with respect to a recent infant intestinal gene expression and metagenomics dataset.

  17. Ship Targets Discrimination Algorithm in SAR Images Based on Hu Moment Feature and Texture Feature

    Directory of Open Access Journals (Sweden)

    Liu Lei

    2016-01-01

    Full Text Available To discriminate ship targets in SAR images, this paper proposes a method based on the combination of Hu moment features and texture features. First, the 7 Hu moment features are extracted; a gray-level co-occurrence matrix is then used to extract the mean, variance, uniformity, energy, entropy, inertia moment, correlation, and difference features. Finally, a k-nearest-neighbour classifier is used to analyze the 15-dimensional feature vectors. The experimental results show that the proposed method performs well.
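    The GLCM statistics named in this record can be illustrated in plain NumPy. This is a toy sketch with a single horizontal offset and only a few of the listed statistics, not the authors' implementation:

    ```python
    import numpy as np

    def glcm(img, levels):
        """Gray-level co-occurrence matrix for the horizontal (0, 1) offset."""
        m = np.zeros((levels, levels))
        for a, b in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
            m[a, b] += 1
        return m / m.sum()  # normalize to joint probabilities

    def glcm_features(p):
        i, j = np.indices(p.shape)
        return {
            "energy": float((p ** 2).sum()),
            "entropy": float(-(p[p > 0] * np.log2(p[p > 0])).sum()),
            "contrast": float((p * (i - j) ** 2).sum()),  # a.k.a. inertia moment
            "homogeneity": float((p / (1.0 + np.abs(i - j))).sum()),
        }

    img = np.array([[0, 0, 1, 1],
                    [0, 0, 1, 1],
                    [0, 2, 2, 2],
                    [2, 2, 3, 3]], dtype=int)
    feats = glcm_features(glcm(img, levels=4))
    ```

    A full feature vector as in the paper would stack these GLCM statistics with the 7 Hu moments before classification.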

  18. Context-dependent feature selection for landmine detection with ground-penetrating radar

    Science.gov (United States)

    Ratto, Christopher R.; Torrione, Peter A.; Collins, Leslie M.

    2009-05-01

    We present a novel method for improving landmine detection with ground-penetrating radar (GPR) by utilizing a priori knowledge of environmental conditions to facilitate algorithm training. The goal of Context-Dependent Feature Selection (CDFS) is to mitigate performance degradation caused by environmental factors. CDFS operates on GPR data by first identifying its environmental context, and then fuses the decisions of several classifiers trained on context-dependent subsets of features. CDFS was evaluated on GPR data collected at several distinct sites under a variety of weather conditions. Results show that using prior environmental knowledge in this fashion has the potential to improve landmine detection.

  19. A harmony search algorithm for clustering with feature selection

    Directory of Open Access Journals (Sweden)

    Carlos Cobos

    2010-01-01

    Full Text Available This article presents a new clustering algorithm called IHSK, capable of performing feature selection with linear complexity. The algorithm is inspired by a combination of the harmony search and K-means algorithms. For feature selection, it uses the concept of variability together with a heuristic method that penalizes dimensions with a low probability of contributing to the current solution. The algorithm was tested on synthetic and real datasets, obtaining promising results.
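    A binary harmony search over feature subsets, in the spirit of (but not identical to) the IHSK combination described above, might look like this; the objective function and all parameters are illustrative assumptions:

    ```python
    import random

    def harmony_search_fs(n_features, evaluate, memory_size=10, iters=100,
                          hmcr=0.9, par=0.3, seed=0):
        """Binary harmony search for feature selection.

        Each 'harmony' is a bit vector of selected features. New harmonies
        are built bit by bit: with probability `hmcr` the bit is drawn from
        memory (and flipped with probability `par` as pitch adjustment),
        otherwise it is chosen at random.
        """
        rng = random.Random(seed)
        memory = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(memory_size)]
        scores = [evaluate(h) for h in memory]
        for _ in range(iters):
            new = []
            for j in range(n_features):
                if rng.random() < hmcr:
                    bit = memory[rng.randrange(memory_size)][j]
                    if rng.random() < par:
                        bit ^= 1
                else:
                    bit = rng.randint(0, 1)
                new.append(bit)
            s = evaluate(new)
            worst = min(range(memory_size), key=scores.__getitem__)
            if s > scores[worst]:  # replace the worst harmony in memory
                memory[worst], scores[worst] = new, s
        best = max(range(memory_size), key=scores.__getitem__)
        return memory[best], scores[best]

    # Toy objective rewarding features 0 and 3, penalizing subset size.
    score = lambda bits: 2 * (bits[0] + bits[3]) - 0.5 * sum(bits)
    subset, s = harmony_search_fs(8, score)
    ```

    IHSK would additionally interleave K-means clustering and the variability-based penalty; here a plain scoring function stands in for both.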

  20. A Frequent Pattern Mining Algorithm for Feature Extraction of Customer Reviews

    Directory of Open Access Journals (Sweden)

    Seyed Hamid Ghorashi

    2012-07-01

    Full Text Available Online shoppers often have different opinions about the same product. They look for the product features that are consistent with their goals, and a feature that is interesting to one shopper may make no impression on another. Unfortunately, identifying a target product with particular features is a tough task that is not achievable with the functionality provided by common websites. In this paper, we present a frequent pattern mining algorithm that mines a collection of reviews and extracts product features. Our experimental results indicate that the algorithm outperforms the older pattern mining techniques used by previous researchers.
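    A minimal Apriori-style frequent-itemset miner over review "transactions" illustrates the general idea; the paper's specific algorithm is not described in this record, so this is a generic sketch with invented example data:

    ```python
    def frequent_itemsets(transactions, min_support):
        """Apriori-style frequent itemset mining.

        `transactions` is a list of sets of candidate feature words; returns
        every itemset appearing in at least `min_support` transactions.
        """
        items = {i for t in transactions for i in t}
        frequent = {}
        k_sets = [frozenset([i]) for i in sorted(items)]
        k = 1
        while k_sets:
            counts = {c: sum(c <= t for t in transactions) for c in k_sets}
            survivors = {c: n for c, n in counts.items() if n >= min_support}
            frequent.update(survivors)
            # candidate generation: unions of surviving sets of size k + 1
            k += 1
            k_sets = list({a | b for a in survivors for b in survivors
                           if len(a | b) == k})
        return frequent

    # Each review is reduced to the set of product-feature words it mentions.
    reviews = [
        {"battery", "screen", "price"},
        {"battery", "screen"},
        {"battery", "price"},
        {"screen", "camera"},
    ]
    freq = frequent_itemsets(reviews, min_support=2)
    ```

    Frequent itemsets like `{battery, screen}` would then be post-processed into candidate product features.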

  1. On the selection of optimal feature region set for robust digital image watermarking.

    Science.gov (United States)

    Tsai, Jen-Sheng; Huang, Win-Bin; Kuo, Yau-Hwang

    2011-03-01

    A novel feature region selection method for robust digital image watermarking is proposed in this paper. This method aims to select a nonoverlapping feature region set, which has the greatest robustness against various attacks and can preserve image quality as much as possible after watermarking. It first performs a simulated attacking procedure using some predefined attacks to evaluate the robustness of every candidate feature region. According to the evaluation results, it then adopts a track-with-pruning procedure to search for a minimal primary feature set which can resist the most predefined attacks. In order to enhance its resistance to undefined attacks under the constraint of preserving image quality, the primary feature set is then extended by adding some auxiliary feature regions. This work is formulated as a multidimensional knapsack problem and solved by a genetic-algorithm-based approach. The experimental results for StirMark attacks on some benchmark images support our expectation that the primary feature set can resist all the predefined attacks and its extension can enhance the robustness against undefined attacks. Compared with some well-known feature-based methods, the proposed method exhibits better performance in robust digital watermarking.

  2. Regular Network Class Features Enhancement Using an Evolutionary Synthesis Algorithm

    Directory of Open Access Journals (Sweden)

    O. G. Monahov

    2014-01-01

    Full Text Available This paper investigates a solution of the optimization problem concerning the construction of diameter-optimal regular networks (graphs). Regular networks are of practical interest as graph-theoretical models of reliable communication networks of parallel supercomputer systems, and as a basis of the small-world structure in optical and neural networks. It presents a new class of parametrically described regular networks: hypercirculant networks (graphs). An approach that uses evolutionary algorithms for the automatic generation of parametric descriptions of optimal hypercirculant networks is developed. Synthesis of optimal hypercirculant networks is based on optimal circulant networks with smaller node degree. To construct an optimal hypercirculant network, a circulant network template is taken from the known optimal families of circulant networks with the desired number of nodes and smaller node degree. Thus, a generating set of the circulant network is used as a generating subset of the hypercirculant network, and the missing generators are synthesized by means of the evolutionary algorithm, which minimizes the diameter (average diameter) of the networks. A comparative analysis of the structural characteristics of hypercirculant, toroidal, and circulant networks is conducted. The advantage of hypercirculant networks in such structural characteristics as diameter, average diameter, and bisection width, at comparable costs in the number of nodes and the number of connections, is demonstrated. Notable is the advantage of hypercirculant networks of dimension three over four-dimensional tori: the optimization of hypercirculant networks of dimension three is more efficient than the introduction of an additional dimension for the corresponding toroidal structures. The paper also notes the better structural parameters of hypercirculant networks in comparison with previously proposed iBT-networks.

  3. GPU-ASIFT : A Fast Fully Affine-Invariant Feature Extraction Algorithm

    NARCIS (Netherlands)

    Codreanu, Valeriu; Dong, Feng; Liu, Baoquan; Roerdink, Jos B.T.M.; Williams, David; Yang, Po; Yasar, Burhan

    2013-01-01

    This paper presents a method that takes advantage of powerful graphics hardware to obtain fully affine-invariant image feature detection and matching. The chosen approach is the accurate, but also very computationally expensive, ASIFT algorithm. We have created a CUDA version of this algorithm that

  4. Development of a Fingerprint Gender Classification Algorithm Using Fingerprint Global Features

    OpenAIRE

    S. F. Abdullah; A.F.N.A. Rahman; Z.A.Abas; W.H.M Saad

    2016-01-01

    In the forensic world, the process of identifying and computing fingerprint features is complex and time-consuming when done manually with a fingerprint laboratory magnifying glass. This study is meant to enhance the manual forensic method by proposing a new algorithm for fingerprint global feature extraction for gender classification. The results show that the new algorithm gives acceptably high readings, with a classification rate above 70%, when compared to the manual method...

  5. Computational Intelligence Modeling of the Macromolecules Release from PLGA Microspheres-Focus on Feature Selection.

    Directory of Open Access Journals (Sweden)

    Hossam M Zawbaa

    Full Text Available Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, morphology of microspheres, release conditions, etc. Selecting a subset of relevant properties for PLGA is a challenging machine learning task, as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem with the aim of minimizing the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms: antlion optimization, a binary version of antlion optimization, grey wolf optimization, and social spider optimization, are used to select the optimal feature set for predicting the dissolution profile of PLGA. Besides these, the LASSO algorithm is also used for comparison. Selection of crucial variables is performed under the assumption that both predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find the minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling with various tools, such as Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression trees, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with the results reported by Szlȩk. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, and the number of selected input features is smaller, nine versus eleven.

  6. Modeling neuron selectivity over simple midlevel features for image classification.

    Science.gov (United States)

    Shu Kong; Zhuolin Jiang; Qiang Yang

    2015-08-01

    We now know that good mid-level features can greatly enhance the performance of image classification, but how to efficiently learn image features is still an open question. In this paper, we present an efficient unsupervised midlevel feature learning approach (MidFea), which only involves simple operations, such as k-means clustering, convolution, pooling, vector quantization, and random projection. We show that these simple features can also achieve good performance in traditional classification tasks. To further boost performance, we model the neuron selectivity (NS) principle by building an additional layer over the midlevel features prior to the classifier. The NS-layer learns category-specific neurons in a supervised manner, with both bottom-up inference and top-down analysis, and thus supports fast inference for a query image. Through extensive experiments, we demonstrate that this higher-level NS-layer notably improves classification accuracy with our simple MidFea, achieving comparable performance for face recognition, gender classification, age estimation, and object categorization. In particular, our approach runs faster in inference by an order of magnitude than sparse-coding-based feature learning methods. In conclusion, we argue that not only do carefully learned features (MidFea) bring improved performance, but a sophisticated mechanism (NS-layer) at a higher level boosts the performance further.

  7. A self region based real-valued negative selection algorithm

    Institute of Scientific and Technical Information of China (English)

    ZHANG Feng-bin; WANG Da-wei; WANG Sheng-wen

    2008-01-01

    Point-wise negative selection algorithms, which generate their detector sets based on points of self data, have low training efficiency and detection rates. To solve this problem, a self-region based real-valued negative selection algorithm is presented. In this new approach, the continuous self region is defined by the collection of self data; partial training takes place at the training stage according to both the radius of the self region and the cosine distance between the center of gravity of the self region and a detector candidate; and variable detectors in the self region are deployed. The algorithm is tested using a triangular self region in the 2-D complement space and the KDD CUP 1999 data set. Results show that more information can be provided when the training self points are used together as a whole, and that, compared with the point-wise negative selection algorithm, the new approach can significantly improve the training efficiency and detection rate of the system.
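    The core of a real-valued negative selection algorithm, generating detectors outside a self region, can be sketched as follows. This is a simplified illustration in the unit square with fixed radii; the paper's variable detectors and cosine-distance training criterion are not reproduced:

    ```python
    import math
    import random

    def train_detectors(self_points, n_detectors, self_radius, seed=0):
        """Real-valued negative selection in the unit square.

        Random candidate detectors are kept only if they fall outside the
        self region, i.e. farther than `self_radius` from every self sample.
        """
        rng = random.Random(seed)
        detectors = []
        while len(detectors) < n_detectors:
            d = (rng.random(), rng.random())
            if all(math.dist(d, s) > self_radius for s in self_points):
                detectors.append(d)
        return detectors

    def is_anomalous(x, detectors, detect_radius):
        """A sample is flagged as non-self if any detector covers it."""
        return any(math.dist(x, d) <= detect_radius for d in detectors)

    # Self data clustered in one corner of the unit square.
    self_pts = [(0.1 * i, 0.1 * j) for i in range(3) for j in range(3)]
    dets = train_detectors(self_pts, n_detectors=100, self_radius=0.15)
    ```

    Training on the self region as a whole, as the paper proposes, replaces the per-point distance test with a region-level criterion.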

  8. Naturally selecting solutions: the use of genetic algorithms in bioinformatics.

    Science.gov (United States)

    Manning, Timmy; Sleator, Roy D; Walsh, Paul

    2013-01-01

    For decades, computer scientists have looked to nature for biologically inspired solutions to computational problems; ranging from robotic control to scheduling optimization. Paradoxically, as we move deeper into the post-genomics era, the reverse is occurring, as biologists and bioinformaticians look to computational techniques, to solve a variety of biological problems. One of the most common biologically inspired techniques are genetic algorithms (GAs), which take the Darwinian concept of natural selection as the driving force behind systems for solving real world problems, including those in the bioinformatics domain. Herein, we provide an overview of genetic algorithms and survey some of the most recent applications of this approach to bioinformatics based problems.
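    A minimal genetic algorithm of the kind surveyed above can be sketched on the classic OneMax toy problem; the operators shown (tournament selection, one-point crossover, per-bit mutation) are generic textbook choices, and all parameters are illustrative:

    ```python
    import random

    def genetic_algorithm(fitness, n_bits, pop_size=30, generations=60,
                          p_mut=0.02, seed=0):
        """A minimal generational GA: tournament selection, one-point
        crossover, and independent per-bit mutation."""
        rng = random.Random(seed)
        pop = [[rng.randint(0, 1) for _ in range(n_bits)]
               for _ in range(pop_size)]

        def tournament():
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b

        for _ in range(generations):
            nxt = []
            while len(nxt) < pop_size:
                p1, p2 = tournament(), tournament()
                cut = rng.randrange(1, n_bits)       # one-point crossover
                child = p1[:cut] + p2[cut:]
                child = [bit ^ (rng.random() < p_mut) for bit in child]
                nxt.append(child)
            pop = nxt
        return max(pop, key=fitness)

    # OneMax: fitness is simply the number of 1-bits in the chromosome.
    best = genetic_algorithm(sum, n_bits=20)
    ```

    In bioinformatics applications, the bit vector would instead encode, e.g., a gene subset or an alignment choice, and `fitness` a domain-specific score.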

  9. Core Business Selection Based on Ant Colony Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Yu Lan

    2014-01-01

    Full Text Available Core business is the most important business of an enterprise with diversified operations. In this paper, we first introduce the definition and characteristics of core business and then describe the ant colony clustering algorithm. To test the effectiveness of the proposed method, Tianjin Port Logistics Development Co., Ltd. is selected as the research object. Based on the current state of the company's development, its core business can be identified by the ant colony clustering algorithm. The results indicate that the proposed method is an effective way to determine a company's core business.

  10. A Heuristic Algorithm for optimizing Page Selection Instructions

    CERN Document Server

    Li, Qing'an; Chen, Yong; Wu, Wei; Xu, Wenwen

    2010-01-01

    Page switching is a technique that increases the memory in microcontrollers without extending the address buses. This technique is widely used in the design of 8-bit MCUs. In this paper, we present an algorithm to reduce the overhead of page switching. To pursue small code size, we place the emphasis on allocating functions to suitable pages with a heuristic algorithm, thereby enabling cost-effective placement of page selection instructions. Our experimental results show that the optimization achieved a code-size reduction of 13.2 percent.
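    The idea of allocating functions to pages so that page-select instructions are rarely needed can be conveyed with a greedy heuristic. This is an assumption-laden illustration, not the paper's algorithm; function sizes and call counts are invented:

    ```python
    def allocate_pages(funcs, calls, page_size, n_pages):
        """Greedy placement of functions into pages to reduce page switches.

        `funcs` maps name -> code size; `calls` maps (caller, callee) ->
        call count. Functions are placed highest-call-traffic first, each
        into the page (with room) holding the functions it calls most,
        since only cross-page calls need page selection instructions.
        """
        traffic = {f: 0 for f in funcs}
        for (a, b), n in calls.items():
            traffic[a] += n
            traffic[b] += n
        pages = [dict() for _ in range(n_pages)]  # page index -> {func: size}
        placement = {}
        for f in sorted(funcs, key=traffic.get, reverse=True):
            def affinity(p):
                return sum(n for (a, b), n in calls.items()
                           if (a == f and b in pages[p])
                           or (b == f and a in pages[p]))
            fits = [p for p in range(n_pages)
                    if sum(pages[p].values()) + funcs[f] <= page_size]
            best = max(fits, key=lambda p: (affinity(p),
                                            -sum(pages[p].values())))
            pages[best][f] = funcs[f]
            placement[f] = best
        return placement

    funcs = {"main": 40, "isr": 20, "log": 30, "fmt": 30}
    calls = {("main", "log"): 10, ("log", "fmt"): 50, ("main", "isr"): 1}
    place = allocate_pages(funcs, calls, page_size=64, n_pages=2)
    ```

    Here the heavily coupled `log`/`fmt` pair lands on one page, so only the rare `main`-to-`log` calls pay for a page-select instruction.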

  11. Economic indicators selection for crime rates forecasting using cooperative feature selection

    Science.gov (United States)

    Alwee, Razana; Shamsuddin, Siti Mariyam Hj; Salleh Sallehuddin, Roselina

    2013-04-01

    Feature selection in a multivariate forecasting model is very important to ensure that the model is accurate. The purpose of this study is to apply the Cooperative Feature Selection method for feature selection. The features are economic indicators that will be used in a crime rate forecasting model. Cooperative Feature Selection combines grey relational analysis and an artificial neural network to establish a cooperative model that can rank and select the significant economic indicators. Grey relational analysis is used to select the best data series to represent each economic indicator and to rank the economic indicators according to their importance to the crime rate. After that, the artificial neural network is used to select the significant economic indicators for forecasting the crime rates. In this study, we used the economic indicators of unemployment rate, consumer price index, gross domestic product and consumer sentiment index, as well as property crime and violent crime rates for the United States. A Levenberg-Marquardt neural network is used in this study. From our experiments, we found that the consumer price index is an important economic indicator that has a significant influence on the violent crime rate, while for the property crime rate, the gross domestic product, unemployment rate and consumer price index are the influential economic indicators. Cooperative Feature Selection is also found to produce smaller errors than Multiple Linear Regression in forecasting property and violent crime rates.
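    The grey relational analysis step used for ranking indicators can be sketched as follows; this is a generic textbook formulation with invented data, not the study's actual series:

    ```python
    def grey_relational_grades(reference, series, rho=0.5):
        """Grey relational analysis: rank candidate indicator series by
        their relational grade to a reference series (e.g. the crime rate).

        All series are min-max normalized first (assumed non-constant);
        the grade is the mean of the pointwise grey relational coefficients
        with distinguishing coefficient `rho`.
        """
        def norm(xs):
            lo, hi = min(xs), max(xs)
            return [(x - lo) / (hi - lo) for x in xs]

        ref = norm(reference)
        grades = {}
        for name, xs in series.items():
            d = [abs(r - x) for r, x in zip(ref, norm(xs))]
            dmin, dmax = min(d), max(d)
            coeffs = [(dmin + rho * dmax) / (di + rho * dmax) for di in d]
            grades[name] = sum(coeffs) / len(coeffs)
        return grades

    crime = [5.0, 6.0, 7.5, 9.0]
    indicators = {
        "unemployment": [4.0, 5.0, 6.4, 8.0],   # moves with crime
        "sentiment":    [9.0, 2.0, 8.0, 1.0],   # unrelated
    }
    g = grey_relational_grades(crime, indicators)
    ```

    Indicators with the highest grades would then be passed to the neural-network stage of the cooperative model.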

  12. Reducing Sweeping Frequencies in Microwave NDT Employing Machine Learning Feature Selection

    Directory of Open Access Journals (Sweden)

    Abdelniser Moomen

    2016-04-01

    Full Text Available Nondestructive Testing (NDT) assessment of materials’ health condition is useful for classifying healthy from unhealthy structures or detecting flaws in metallic or dielectric structures. Performing structural health testing for coated/uncoated metallic or dielectric materials with the same testing equipment requires a testing method that can work on both metallics and dielectrics, such as microwave testing. Reducing the complexity and expense associated with current diagnostic practices of microwave NDT of structural health requires an effective and intelligent approach based on the feature selection and classification techniques of machine learning. Current microwave NDT methods are in general based on measuring variation in the S-matrix over the entire operating frequency ranges of the sensors. For instance, assessing the health of metallic structures using a microwave sensor depends on the reflection and/or transmission coefficient measurements as a function of the sweeping frequencies of the operating band. The aim of this work is to reduce the sweeping frequencies using machine learning feature selection techniques. By treating sweeping frequencies as features, the number of top important features can be identified, and then only the most influential features (frequencies) are considered when building the microwave NDT equipment. The proposed method of reducing sweeping frequencies was validated experimentally using a waveguide sensor and a metallic plate with different cracks. Among the investigated feature selection techniques are information gain, gain ratio, relief, and chi-squared. The effectiveness of the selected features was validated through performance evaluations of various classification models, namely Nearest Neighbor, Neural Networks, Random Forest, and Support Vector Machine. Results showed good crack classification accuracy rates after employing the feature selection algorithms.

  13. Reducing Sweeping Frequencies in Microwave NDT Employing Machine Learning Feature Selection.

    Science.gov (United States)

    Moomen, Abdelniser; Ali, Abdulbaset; Ramahi, Omar M

    2016-04-19

    Nondestructive Testing (NDT) assessment of materials' health condition is useful for classifying healthy from unhealthy structures or detecting flaws in metallic or dielectric structures. Performing structural health testing for coated/uncoated metallic or dielectric materials with the same testing equipment requires a testing method that can work on both metallics and dielectrics, such as microwave testing. Reducing the complexity and expense associated with current diagnostic practices of microwave NDT of structural health requires an effective and intelligent approach based on the feature selection and classification techniques of machine learning. Current microwave NDT methods are in general based on measuring variation in the S-matrix over the entire operating frequency ranges of the sensors. For instance, assessing the health of metallic structures using a microwave sensor depends on the reflection and/or transmission coefficient measurements as a function of the sweeping frequencies of the operating band. The aim of this work is to reduce the sweeping frequencies using machine learning feature selection techniques. By treating sweeping frequencies as features, the number of top important features can be identified, and then only the most influential features (frequencies) are considered when building the microwave NDT equipment. The proposed method of reducing sweeping frequencies was validated experimentally using a waveguide sensor and a metallic plate with different cracks. Among the investigated feature selection techniques are information gain, gain ratio, relief, and chi-squared. The effectiveness of the selected features was validated through performance evaluations of various classification models, namely Nearest Neighbor, Neural Networks, Random Forest, and Support Vector Machine. Results showed good crack classification accuracy rates after employing the feature selection algorithms.
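    Information gain, one of the feature selection techniques investigated in these two records, can be sketched for a single sweeping frequency treated as a feature; the threshold and measurement data below are invented for illustration:

    ```python
    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy of a label list, in bits."""
        n = len(labels)
        return -sum(c / n * math.log2(c / n)
                    for c in Counter(labels).values())

    def information_gain(values, labels, threshold):
        """Information gain of splitting a continuous feature (e.g. the
        reflection coefficient at one sweeping frequency) at `threshold`."""
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        n = len(labels)
        cond = (len(left) / n) * entropy(left) \
             + (len(right) / n) * entropy(right)
        return entropy(labels) - cond

    labels = ["crack", "crack", "healthy", "healthy"]
    freq_a = [0.1, 0.2, 0.8, 0.9]   # separates the classes cleanly
    freq_b = [0.5, 0.9, 0.4, 0.8]   # mixed, uninformative
    gain_a = information_gain(freq_a, labels, threshold=0.5)
    gain_b = information_gain(freq_b, labels, threshold=0.6)
    ```

    Ranking frequencies by such gains is what lets the method keep only the most influential sweeping frequencies.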

  14. Effective feature selection of clinical and genetic to predict warfarin dose using artificial neural network

    Directory of Open Access Journals (Sweden)

    Mohammad Karim Sohrabi

    2016-03-01

    Full Text Available Background: Warfarin is one of the most common oral anticoagulants; its role is to prevent clots. The dose of this medicine is very important, because changes can be dangerous for patients, and dosing is difficult for physicians because both increases and decreases in warfarin use are hazardous. Identifying the clinical and genetic features involved in determining the dose could make it possible to predict it using data mining techniques. The aim of this paper is to provide a convenient way to select the clinical and genetic features for determining the warfarin dose using artificial neural networks (ANN) and to evaluate it in order to predict patients' doses. Methods: This experimental study was conducted from April to May 2014 on 552 patients at Tehran Heart Center Hospital (THC) who were candidates for warfarin anticoagulant therapy within the international normalized ratio (INR) therapeutic target. Factors affecting the dose, including clinical and genetic characteristics, were extracted, and different methods of feature selection based on the genetic algorithm and particle swarm optimization (PSO), with a neural network evaluation function, were performed in MATLAB (MathWorks, MA, USA). Results: Among the algorithms used, the particle swarm optimization algorithm was the most accurate: the mean square error (MSE), root mean square error (RMSE) and mean absolute error (MAE) were 0.0262, 0.1621 and 0.1164, respectively. Conclusion: In this article, the most important characteristics were identified using feature selection methods, and the stable dose was predicted with artificial neural networks. The output is acceptable, and with fewer features it is possible to predict the warfarin dose accurately. Since the prescribed dose is important for patients, the obtained model can be used as a decision support system.

  15. Context-dependent feature selection using unsupervised contexts applied to GPR-based landmine detection

    Science.gov (United States)

    Ratto, Christopher R.; Torrione, Peter A.; Collins, Leslie M.

    2010-04-01

    Context-dependent classification techniques applied to landmine detection with ground-penetrating radar (GPR) have demonstrated substantial performance improvements over conventional classification algorithms. Context-dependent algorithms compute a decision statistic by integrating over uncertainty in the unknown, but probabilistically inferable, context of the observation. When applied to GPR, contexts may be defined by differences in electromagnetic properties of the subsurface environment, which are due to discrepancies in soil composition, moisture levels, and surface texture. Context-dependent Feature Selection (CDFS) is a technique developed for selecting a unique subset of features for classifying landmines from clutter in different environmental contexts. In past work, context definitions were assumed to be soil moisture conditions which were known during training. However, knowledge of environmental conditions could be difficult to obtain in the field. In this paper, we utilize an unsupervised learning algorithm for defining contexts which are unknown a priori. Our method performs unsupervised context identification based on similarities in physics-based and statistical features that characterize the subsurface environment of the raw GPR data. Results indicate that utilizing this contextual information improves classification performance, and provides performance improvements over non-context-dependent approaches. Implications for on-line context identification will be suggested as a possible avenue for future work.

  16. Cuckoo search optimisation for feature selection in cancer classification: a new approach.

    Science.gov (United States)

    Gunavathi, C; Premalatha, K

    2015-01-01

    Cuckoo Search (CS) optimisation algorithm is used for feature selection in cancer classification using microarray gene expression data. Since the gene expression data has thousands of genes and a small number of samples, feature selection methods can be used for the selection of informative genes to improve the classification accuracy. Initially, the genes are ranked based on T-statistics, Signal-to-Noise Ratio (SNR) and F-statistics values. The CS is used to find the informative genes from the top-m ranked genes. The classification accuracy of the k-Nearest Neighbour (kNN) technique is used as the fitness function for CS. The proposed method is evaluated on ten different cancer gene expression datasets. The results show that the CS gives 100% average accuracy for the DLBCL Harvard, Lung Michigan, Ovarian Cancer, AML-ALL and Lung Harvard2 datasets, and that it outperforms the existing techniques on the DLBCL outcome and prostate datasets.
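    The T-statistic pre-ranking step that feeds the cuckoo search can be sketched as follows; this is a generic two-sample T-statistic with toy expression data, and the CS stage itself is omitted:

    ```python
    import math

    def t_statistic(xs, ys):
        """Two-sample (Welch-style) T-statistic used to pre-rank genes
        before a search stage selects from the top-m."""
        nx, ny = len(xs), len(ys)
        mx, my = sum(xs) / nx, sum(ys) / ny
        vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
        vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
        return (mx - my) / math.sqrt(vx / nx + vy / ny)

    def rank_genes(expr, labels, top_m):
        """expr maps gene -> expression values aligned with binary labels;
        returns the top_m genes by absolute T-statistic."""
        scores = {}
        for gene, vals in expr.items():
            xs = [v for v, l in zip(vals, labels) if l == 1]
            ys = [v for v, l in zip(vals, labels) if l == 0]
            scores[gene] = abs(t_statistic(xs, ys))
        return sorted(scores, key=scores.get, reverse=True)[:top_m]

    labels = [1, 1, 1, 0, 0, 0]
    expr = {
        "g1": [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],   # strongly differential
        "g2": [2.0, 3.0, 2.5, 2.4, 2.9, 2.1],   # weakly differential
    }
    top = rank_genes(expr, labels, top_m=1)
    ```

    The CS stage would then search subsets of these top-m genes, scoring each subset by kNN classification accuracy.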

  17. Identification and Analysis of Driver Missense Mutations Using Rotation Forest with Feature Selection

    Directory of Open Access Journals (Sweden)

    Xiuquan Du

    2014-01-01

    Full Text Available Identifying cancer-associated mutations (driver mutations) is critical for understanding the cellular function of the cancer genome, which leads to activation of oncogenes or inactivation of tumor suppressor genes. Many approaches have been proposed which use supervised machine learning techniques for prediction, with features obtained from several databases. However, we often do not know which features are important for driver mutation prediction. In this study, we propose a novel feature selection method (called DX) over a set of 126 candidate features. In order to obtain the best performance, the rotation forest algorithm was adopted to perform the experiments. On the training dataset, collected from the COSMIC and Swiss-Prot databases, we are able to obtain high prediction performance, with 88.03% accuracy, 93.9% precision, and 81.35% recall, when the 11 top-ranked features are used. Comparison with various other techniques on the TP53, EGFR, and Cosmic2plus datasets shows the generality of our method.

  18. Feature Selection Applying Statistical and Neurofuzzy Methods to EEG-Based BCI.

    Science.gov (United States)

    Martinez-Leon, Juan-Antonio; Cano-Izquierdo, Jose-Manuel; Ibarrola, Julio

    2015-01-01

    This paper presents an investigation aimed at drastically reducing the processing burden required by motor imagery brain-computer interface (BCI) systems based on electroencephalography (EEG). In this research, the focus has moved from the channel to the feature paradigm, and a 96% reduction in the number of features required in the process has been achieved while maintaining and even improving the classification success rate. This way, it is possible to build cheaper, quicker, and more portable BCI systems. The data set used was provided within the framework of BCI Competition III, which makes it possible to compare the presented results with the classification accuracy achieved in the contest. Furthermore, a new three-step methodology has been developed which includes a feature discriminant character calculation stage; a score, order, and selection phase; and a final feature selection step. For the first stage, both statistical methods and fuzzy criteria are used. The fuzzy criteria are based on the S-dFasArt classification algorithm, which has shown excellent performance in previous papers undertaking the BCI multiclass motor imagery problem. The score, order, and selection stage is used to sort the features according to their discriminant nature. Finally, both order selection and Group Method of Data Handling (GMDH) approaches are used to choose the most discriminant ones.

  19. Texture feature selection with relevance learning to classify interstitial lung disease patterns

    Science.gov (United States)

    Huber, Markus B.; Bunte, Kerstin; Nagarajan, Mahesh B.; Biehl, Michael; Ray, Lawrence A.; Wismueller, Axel

    2011-03-01

    The Generalized Matrix Learning Vector Quantization (GMLVQ) is used to estimate the relevance of texture features in their ability to classify interstitial lung disease patterns in high-resolution computed tomography (HRCT) images. After a stochastic gradient descent, the GMLVQ algorithm provides a discriminative distance measure of relevance factors, which can account for pairwise correlations between different texture features and their importance for the classification of healthy and diseased patterns. Texture features were extracted from gray-level co-occurrence matrices (GLCMs), and were ranked and selected according to their relevance obtained by GMLVQ and, for comparison, by a mutual information (MI) criterion. A k-nearest-neighbor (kNN) classifier and a Support Vector Machine with a radial basis function kernel (SVMrbf) were optimized in a 10-fold cross-validation for different texture feature sets. In our experiment with real-world data, the feature sets selected by the GMLVQ approach had significantly better classification performance than the feature sets selected by MI ranking.

  20. Energy Optimized Link Selection Algorithm for Mobile Cloud Computing

    Directory of Open Access Journals (Sweden)

    K.Ravindranath

    2015-03-01

    Full Text Available Mobile cloud computing is a revolutionary distributed computing research area comprising three domains: cloud computing, wireless networks and mobile computing. It aims to improve the computational capabilities of mobile devices while minimizing their energy consumption. Heavy computations can be offloaded to the cloud to decrease the energy consumption of the mobile device, although in some mobile cloud applications using the cloud has proven less energy efficient than conventional computation on the local device. Despite mobile cloud computing being a promising idea, mobile phones still face several problems, such as limited storage, short battery life and so on. One of the most important concerns for mobile devices is low energy consumption. Different network links have different uplink and downlink bandwidths for task and data transmission between the mobile device and the cloud. In this paper, a novel optimal link selection algorithm is proposed to minimize mobile energy consumption. In the first phase, all available networks are scanned and their signal strengths are calculated. The calculated signals, along with the network locations, are given as input to the optimal link selection algorithm. After the algorithm executes, an optimal network link is selected.
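    A simple energy-based link selection rule conveys the idea; the energy model, signal-strength floor, and all figures below are assumptions for illustration, not the paper's formulation:

    ```python
    def select_link(links, data_bits):
        """Pick the network link minimizing transfer energy.

        Energy is modeled (as an assumption) as
            energy = data_bits / bandwidth_bps * tx_power_w,
        i.e. transmit time multiplied by radio power, and links below a
        signal-strength floor are ignored as unusable.
        """
        usable = {name: p for name, p in links.items()
                  if p["rssi_dbm"] > -85}

        def energy(p):
            return data_bits / p["bandwidth_bps"] * p["tx_power_w"]

        best = min(usable, key=lambda n: energy(usable[n]))
        return best, energy(usable[best])

    links = {
        "wifi": {"bandwidth_bps": 20e6, "tx_power_w": 0.8, "rssi_dbm": -60},
        "lte":  {"bandwidth_bps": 10e6, "tx_power_w": 1.2, "rssi_dbm": -75},
        "weak": {"bandwidth_bps": 50e6, "tx_power_w": 0.5, "rssi_dbm": -95},
    }
    link, joules = select_link(links, data_bits=8e6)
    ```

    A fuller model would also account for reception energy and the offloaded computation's savings, but the selection structure stays the same.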

  1. Textural feature selection for enhanced detection of stationary humans in through-the-wall radar imagery

    Science.gov (United States)

    Chaddad, A.; Ahmad, F.; Amin, M. G.; Sevigny, P.; DiFilippo, D.

    2014-05-01

    Feature-based methods have been recently considered in the literature for the detection of stationary human targets in through-the-wall radar imagery. Specifically, textural features, such as contrast, correlation, energy, entropy, and homogeneity, have been extracted from gray-level co-occurrence matrices (GLCMs) to aid in discriminating the true targets from multipath ghosts and clutter that closely mimic the target in size and intensity. In this paper, we address the task of feature selection to identify the relevant subset of features in the GLCM domain, while discarding those that are either redundant or confusing, thereby improving the performance of the feature-based scheme in distinguishing between targets and ghosts/clutter. We apply a decision tree algorithm to find the optimal combination of co-occurrence based textural features for the problem at hand. We employ a K-Nearest Neighbor classifier to evaluate the performance of the optimal textural feature based scheme in terms of its target and ghost/clutter discrimination capability, and use real data collected with the vehicle-borne multi-channel through-the-wall radar imaging system by Defence Research and Development Canada. For the specific data analyzed, it is shown that the identified dominant features yield a higher classification accuracy, with fewer false alarms and missed detections, compared to the full GLCM based feature set.
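The GLCM-derived features named above (contrast, energy, homogeneity, correlation) can be computed from scratch; this sketch uses a single horizontal offset and a small synthetic image rather than radar imagery, and the quantization to a few gray levels is an illustrative choice.

```python
import numpy as np

def glcm(img, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for pixel offset (dx, dy)."""
    p = np.zeros((levels, levels))
    h, w = img.shape
    for i in range(h - dy):
        for j in range(w - dx):
            p[img[i, j], img[i + dy, j + dx]] += 1
    return p / p.sum()

def glcm_features(p):
    """Textural features of a normalized co-occurrence matrix p."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * p).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * p).sum())
    feats = {
        "contrast": ((i - j) ** 2 * p).sum(),
        "energy": (p ** 2).sum(),
        "homogeneity": (p / (1.0 + np.abs(i - j))).sum(),
    }
    if sd_i > 0 and sd_j > 0:  # correlation is undefined for a constant image
        feats["correlation"] = (((i - mu_i) * (j - mu_j) * p).sum()) / (sd_i * sd_j)
    return feats

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [2, 2, 3, 3],
                [2, 2, 3, 3]])
print(glcm_features(glcm(img, levels=4)))
```

Libraries such as scikit-image offer equivalent routines; the point here is only to make the feature definitions concrete before selection is applied to them.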

  2. Dynamical transitions in the evolution of learning algorithms by selection

    CERN Document Server

    Neirotti, J P; Neirotti, Juan Pablo; Caticha, Nestor

    2002-01-01

    We study the evolution of artificial learning systems by means of selection. Genetic programming is used to generate a sequence of populations of algorithms which can be used by neural networks for supervised learning of a rule that generates examples. Rather than concentrating on final results, which would be the natural aim when designing good learning algorithms, we study the evolution process and pay particular attention to the temporal order of appearance of functional structures responsible for the improvements in the learning process, as measured by the generalization capabilities of the resulting algorithms. The effect of such appearances can be described as dynamical phase transitions. The concepts of phenotypic and genotypic entropy, which describe the distribution of fitness in the population and the distribution of symbols, respectively, are used to monitor the dynamics. In different runs the phase transitions might be present or not, with the system finding out good solutions, or ...
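The genotypic entropy mentioned above, describing the distribution of symbols across a population of programs, reduces to a Shannon entropy over symbol frequencies. The toy "populations" below are invented purely to illustrate the quantity; they are not from the paper.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Shannon entropy (bits) of the empirical symbol distribution."""
    counts = Counter(symbols)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Early population: programs draw on many operators almost uniformly.
early = list("+-*/sincos" * 10)
# Late population: selection has concentrated on a few useful symbols.
late = list("++++****--" * 10)

print(shannon_entropy(early) > shannon_entropy(late))  # → True
```

A drop in this entropy over generations is one signature of the population converging on a dominant functional structure.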

  3. Feature Extraction and Selection From the Perspective of Explosive Detection

    Energy Technology Data Exchange (ETDEWEB)

    Sengupta, S K

    2009-09-01

    ) digitized 3-dimensional attenuation images with a voxel resolution of the order of one quarter of a millimeter. In the task of feature extraction and subsequent selection of an appropriate subset thereof, several important factors need to be considered. Foremost among them are: (1) the definition of the sampling unit from which the features will be extracted for the purpose of detection/identification of the explosives; (2) the choice of features (given the sampling unit) to be extracted that can be used to signal the existence/identity of the explosive; (3) the robustness of the computed features under different inspection conditions (to attain robustness, invariance under the transformations of translation, scaling, rotation and change of orientation is highly desirable); and (4) the computational costs of feature extraction, selection and their use in explosive detection/identification. In the search for extractable features, we have done a thorough literature survey with the above factors in mind and compiled a list of features that could possibly help us in meeting our objective. We are assuming that features will be based on sampling units that are single CT slices of the target. This may change, however, in which case appropriate modifications will be made to the feature extraction process. We indicate below some of the major types of features in 2- or 3-dimensional images that have been used in the literature on the application of pattern recognition (PR) techniques to image understanding and are possibly pertinent to our study. In the following paragraph, we briefly indicate the motivation that guided us in the choice of these features, and identify the nature of the constraints. The principal feature types derivable from an image will be discussed in section 2. Once the features are extracted, one must select a subset of this feature set that will retain the most useful information and remove any redundant and irrelevant information that may have a detrimental effect

  4. Feature Selection for Generator Excitation Neurocontroller Development Using Filter Technique

    Directory of Open Access Journals (Sweden)

    Abdul Ghani Abro

    2011-09-01

    Full Text Available Essentially, the motive behind using a control system is to generate a suitable control signal for yielding the desired response of a physical process. Control of the synchronous generator has always remained very critical in power system operation and control. For certain well-known reasons, power generators are normally operated well below their steady-state stability limit. This raises the demand for efficient and fast controllers. Artificial intelligence has been reported to give revolutionary outcomes in the field of control engineering. The Artificial Neural Network (ANN), a branch of artificial intelligence, has been used for nonlinear and adaptive control, utilizing its inherent observability. The overall performance of a neurocontroller also depends on its input features. Selecting optimum features to train a neurocontroller optimally is therefore very critical. Both the quality and the size of the data are of equal importance for better performance. In this work a filter technique is employed to select independent factors for ANN training.
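Filter techniques like the one employed above score each candidate input independently of the controller itself. A common filter criterion is the Fisher score, sketched here on synthetic two-class data; the score choice and the data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fisher_score(X, y):
    """Per-feature Fisher score for binary-labeled data:
    (between-class mean separation)^2 / (sum of within-class variances)."""
    a, b = X[y == 0], X[y == 1]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0)
    return num / den

rng = np.random.default_rng(1)
y = np.repeat([0, 1], 200)
X = np.column_stack([
    rng.normal(loc=y, scale=0.5),   # relevant: mean shifts with the class
    rng.normal(size=400),           # irrelevant noise
    rng.normal(loc=y, scale=3.0),   # weakly relevant: shift buried in variance
])
scores = fisher_score(X, y)
keep = np.argsort(scores)[::-1][:1]  # keep the single best feature
print(keep)  # → [0]
```

The same pattern extends to any per-feature criterion (correlation, mutual information, etc.): rank once, then train the neurocontroller on the retained columns.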

  5. Feature Selection Strategies for Classifying High Dimensional Astronomical Data Sets

    CERN Document Server

    Donalek, Ciro; Djorgovski, S G; Mahabal, Ashish A; Graham, Matthew J; Fuchs, Thomas J; Turmon, Michael J; Philip, N Sajeeth; Yang, Michael Ting-Chang; Longo, Giuseppe

    2013-01-01

    The amount of collected data in many scientific fields is increasing, all of them requiring a common task: extracting knowledge from massive, multi-parametric data sets as rapidly and efficiently as possible. This is especially true in astronomy, where synoptic sky surveys are enabling new research frontiers in time-domain astronomy and posing several new object classification challenges in multi-dimensional spaces; given the high number of parameters available for each object, feature selection is quickly becoming a crucial task in analyzing astronomical data sets. Using data sets extracted from the ongoing Catalina Real-Time Transient Survey (CRTS) and the Kepler Mission, we illustrate a variety of feature selection strategies used to identify the subsets that give the most information, and the results achieved by applying these techniques to three major astronomical problems.

  6. Acute Exercise Modulates Feature-selective Responses in Human Cortex.

    Science.gov (United States)

    Bullock, Tom; Elliott, James C; Serences, John T; Giesbrecht, Barry

    2017-04-01

    An organism's current behavioral state influences ongoing brain activity. Nonhuman mammalian and invertebrate brains exhibit large increases in the gain of feature-selective neural responses in sensory cortex during locomotion, suggesting that the visual system becomes more sensitive when actively exploring the environment. This raises the possibility that human vision is also more sensitive during active movement. To investigate this possibility, we used an inverted encoding model technique to estimate feature-selective neural response profiles from EEG data acquired from participants performing an orientation discrimination task. Participants (n = 18) fixated at the center of a flickering (15 Hz) circular grating presented at one of nine different orientations and monitored for a brief shift in orientation that occurred on every trial. Participants completed the task while seated on a stationary exercise bike at rest and during low- and high-intensity cycling. We found evidence for inverted-U effects, such that the peak of the reconstructed feature-selective tuning profiles was highest during low-intensity exercise compared with those estimated during rest and high-intensity exercise. When modeled, these effects were driven by changes in the gain of the tuning curve and in the profile bandwidth during low-intensity exercise relative to rest. Thus, despite profound differences in visual pathways across species, these data show that sensitivity in human visual cortex is also enhanced during locomotive behavior. Our results reveal the nature of exercise-induced gain on feature-selective coding in human sensory cortex and provide valuable evidence linking the neural mechanisms of behavioral state across species.

  7. A simulation to analyze feature selection methods utilizing gene ontology for gene expression classification.

    Science.gov (United States)

    Gillies, Christopher E; Siadat, Mohammad-Reza; Patel, Nilesh V; Wilson, George D

    2013-12-01

    Gene expression profile classification is a pivotal research domain assisting in the transformation from traditional to personalized medicine. A major challenge associated with gene expression data classification is the small number of samples relative to the large number of genes. To address this problem, researchers have devised various feature selection algorithms to reduce the number of genes. Recent studies have been experimenting with the use of semantic similarity between genes in Gene Ontology (GO) as a method to improve feature selection. While there are a few studies that discuss how to use GO for feature selection, there is no simulation study that addresses when to use GO-based feature selection. To investigate this, we developed a novel simulation, which generates binary class datasets where the differentially expressed genes between the two classes have some underlying relationship in GO. This allows us to investigate the effects of various factors such as the relative connectedness of the underlying genes in GO, the mean magnitude of separation between differentially expressed genes (denoted by δ), and the number of training samples. Our simulation results suggest that the connectedness in GO of the differentially expressed genes for a biological condition is the primary factor for determining the efficacy of GO-based feature selection. In particular, as the connectedness of differentially expressed genes increases, the classification accuracy improvement increases. To quantify this notion of connectedness, we defined a measure called the Biological Condition Annotation Level, BCAL(G), where G is a graph of differentially expressed genes. Our main conclusions with respect to GO-based feature selection are the following: (1) it increases classification accuracy when BCAL(G) ≥ 0.696; (2) it decreases classification accuracy when BCAL(G) ≤ 0.389; (3) it provides marginal accuracy improvement when 0.389 < BCAL(G) < 0.696 and the number of genes in a biological condition increases beyond 50 and

  8. Multiple ant colony algorithm method for selecting tag SNPs.

    Science.gov (United States)

    Liao, Bo; Li, Xiong; Zhu, Wen; Li, Renfa; Wang, Shulin

    2012-10-01

    The search for associations between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes has recently received great attention. Finding a set of tag SNPs for haplotyping in a great number of samples is an important step towards reducing the cost of association studies. Therefore, it is essential to select tag SNPs with more efficient algorithms. In this paper, we model the problem of selecting tag SNPs as a MINIMUM TEST SET problem and use a multiple ant colony algorithm (MACA) to search for a smaller set of tag SNPs for haplotyping. Experimental results on various datasets show that the running time of our method is less than that of GTagger and MLR, and that MACA can find the most representative SNPs for haplotyping, so that MACA is more stable and the number of tag SNPs is also smaller than with other evolutionary methods (like GTagger and NSGA-II). Our software is available upon request to the corresponding author.
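The MINIMUM TEST SET formulation above asks for the fewest SNP columns that still tell every pair of haplotypes apart. MACA itself is an ant colony method not reproduced here; as a hedged illustration of the formulation only, this is the classic greedy set-cover baseline on invented haplotype strings.

```python
from itertools import combinations

def greedy_tag_snps(haplotypes):
    """Greedy MINIMUM TEST SET: repeatedly pick the SNP (column) that
    distinguishes the most still-indistinguishable haplotype pairs.
    Assumes all haplotypes are pairwise distinct."""
    pairs = set(combinations(range(len(haplotypes)), 2))
    chosen = []
    while pairs:
        best = max(
            range(len(haplotypes[0])),
            key=lambda s: sum(haplotypes[a][s] != haplotypes[b][s] for a, b in pairs),
        )
        chosen.append(best)
        # Keep only pairs this SNP could not separate.
        pairs = {(a, b) for a, b in pairs if haplotypes[a][best] == haplotypes[b][best]}
    return chosen

haps = ["00110", "01011", "10010", "11101"]
tags = greedy_tag_snps(haps)
print(tags)  # → [0, 1]
```

Metaheuristics such as MACA aim to beat this greedy baseline on large instances, since greedy set cover is only an approximation of the true minimum.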

  9. Effective automated feature construction and selection for classification of biological sequences.

    Directory of Open Access Journals (Sweden)

    Uday Kamath

    Full Text Available Many open problems in bioinformatics involve elucidating underlying functional signals in biological sequences. DNA sequences, in particular, are characterized by rich architectures in which functional signals are increasingly found to combine local and distal interactions at the nucleotide level. Problems of interest include the detection of regulatory regions, splice sites, exons, hypersensitive sites, and more. These problems naturally lend themselves to formulation as classification problems in machine learning. When classification is based on features extracted from the sequences under investigation, success is critically dependent on the chosen set of features. We present an algorithmic framework (EFFECT) for automated detection of functional signals in biological sequences. We focus here on classification problems involving DNA sequences, which state-of-the-art work in machine learning shows to be challenging and to involve complex combinations of local and distal features. EFFECT uses a two-stage process to first construct a set of candidate sequence-based features and then select a most effective subset for the classification task at hand. Both stages make heavy use of evolutionary algorithms to efficiently guide the search towards informative features capable of discriminating between sequences that contain a particular functional signal and those that do not. To demonstrate its generality, EFFECT is applied to three separate problems of importance in DNA research: the recognition of hypersensitive sites, splice sites, and ALU sites. Comparisons with state-of-the-art algorithms show that the framework is both general and powerful. In addition, a detailed analysis of the constructed features shows that they contain valuable biological information about DNA architecture, allowing biologists and other researchers to directly inspect the features and potentially use the insights obtained to assist wet-laboratory studies on retention or modification

  10. Processing of Feature Selectivity in Cortical Networks with Specific Connectivity.

    Directory of Open Access Journals (Sweden)

    Sadra Sadeh

    Full Text Available Although non-specific at the onset of eye opening, networks in rodent visual cortex attain a non-random structure after eye opening, with a specific bias for connections between neurons of similar preferred orientations. As orientation selectivity is already present at eye opening, it remains unclear how this specificity in network wiring contributes to feature selectivity. Using large-scale inhibition-dominated spiking networks as a model, we show that feature-specific connectivity leads to a linear amplification of feedforward tuning, consistent with recent electrophysiological single-neuron recordings in rodent neocortex. Our results show that optimal amplification is achieved at an intermediate regime of specific connectivity. In this configuration a moderate increase of pairwise correlations is observed, consistent with recent experimental findings. Furthermore, we observed that feature-specific connectivity leads to the emergence of orientation-selective reverberating activity, and entails pattern completion in network responses. Our theoretical analysis provides a mechanistic understanding of subnetworks' responses to visual stimuli, and casts light on the regime of operation of sensory cortices in the presence of specific connectivity.

  11. STATISTICAL PROBABILITY BASED ALGORITHM FOR EXTRACTING FEATURE POINTS IN 2-DIMENSIONAL IMAGE

    Institute of Scientific and Technical Information of China (English)

    Guan Yepeng; Gu Weikang; Ye Xiuqing; Liu Jilin

    2004-01-01

    An algorithm for automatically extracting feature points is developed after the area of feature points in a 2-dimensional (2D) image is located by probability theory, correlation methods and a criterion for abnormality. Feature points in a 2D image can be extracted in our approach simply by statistically calculating the standard deviation of gray levels within sampled pixel areas. While extracting feature points, the approach avoids the limitation of having to confirm the threshold by trial and error according to a priori information about the processed image. The proposed algorithm is shown to be valid and reliable by extracting feature points on actual natural images with abundant and weak texture, respectively, including multiple objects with complex backgrounds. It can meet the demand of extracting feature points of 2D images automatically in machine vision systems.
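The core statistic in the record above, the standard deviation of gray levels within a sampled pixel area, can be sketched directly. The window size and the threshold rule (a multiple of the mean local deviation) are illustrative assumptions, not the paper's calibration.

```python
import numpy as np

def feature_point_mask(img, win=3, k=2.0):
    """Mark pixels whose local gray-level standard deviation exceeds
    k times the mean local deviation (a stand-in threshold rule)."""
    h, w = img.shape
    r = win // 2
    std_map = np.zeros((h, w))
    for i in range(r, h - r):
        for j in range(r, w - r):
            std_map[i, j] = img[i - r:i + r + 1, j - r:j + r + 1].std()
    return std_map > k * std_map.mean()

img = np.full((12, 12), 100.0)          # flat background: zero local deviation
img[5:8, 5:8] = [[100, 200, 100],       # small textured patch
                 [200, 100, 200],
                 [100, 200, 100]]
mask = feature_point_mask(img)
print(mask.sum(), "candidate feature points, all near the patch")
```

Because the threshold adapts to the image's own deviation statistics, no hand-tuned absolute cutoff is needed, which mirrors the abstract's claim of avoiding trial-and-error thresholding.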

  12. Self-adaptive MOEA feature selection for classification of bankruptcy prediction data.

    Science.gov (United States)

    Gaspar-Cunha, A; Recio, G; Costa, L; Estébanez, C

    2014-01-01

    Bankruptcy prediction is a vast area of finance and accounting whose importance lies in its relevance for creditors and investors in evaluating the likelihood of a company going bankrupt. As companies become complex, they develop sophisticated schemes to hide their real situation. In turn, estimating the credit risks associated with counterparts or predicting bankruptcy becomes harder. Evolutionary algorithms have been shown to be an excellent tool to deal with complex problems in finance and economics where a large number of irrelevant features are involved. This paper provides a methodology for feature selection in the classification of bankruptcy data sets using an evolutionary multiobjective approach that simultaneously minimises the number of features and maximises the classifier quality measure (e.g., accuracy). The proposed methodology makes use of self-adaptation by applying the feature selection algorithm while simultaneously optimising the parameters of the classifier used. The methodology was applied to four different sets of data. The obtained results showed the utility of self-adaptation of the classifier.

  13. Self-Adaptive MOEA Feature Selection for Classification of Bankruptcy Prediction Data

    Directory of Open Access Journals (Sweden)

    A. Gaspar-Cunha

    2014-01-01

    Full Text Available Bankruptcy prediction is a vast area of finance and accounting whose importance lies in its relevance for creditors and investors in evaluating the likelihood of a company going bankrupt. As companies become complex, they develop sophisticated schemes to hide their real situation. In turn, estimating the credit risks associated with counterparts or predicting bankruptcy becomes harder. Evolutionary algorithms have been shown to be an excellent tool to deal with complex problems in finance and economics where a large number of irrelevant features are involved. This paper provides a methodology for feature selection in the classification of bankruptcy data sets using an evolutionary multiobjective approach that simultaneously minimises the number of features and maximises the classifier quality measure (e.g., accuracy). The proposed methodology makes use of self-adaptation by applying the feature selection algorithm while simultaneously optimising the parameters of the classifier used. The methodology was applied to four different sets of data. The obtained results showed the utility of self-adaptation of the classifier.
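The two objectives above, fewer features and higher classifier quality, define a Pareto front. A full self-adaptive MOEA is beyond a few lines, but the non-domination filter at its core can be sketched; the subset evaluation below is a toy nearest-centroid training error on synthetic data, purely illustrative.

```python
import numpy as np

def dominates(q, p):
    """q dominates p when it is no worse in both objectives and better in one."""
    return q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])

def pareto_front(points):
    """Non-dominated (n_features, error) pairs: lower is better in both."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def subset_error(X, y, mask):
    """Toy classifier quality: nearest-centroid training error on the subset."""
    if not mask.any():
        return 1.0
    Xs = X[:, mask]
    c0, c1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    pred = (((Xs - c1) ** 2).sum(axis=1) < ((Xs - c0) ** 2).sum(axis=1)).astype(int)
    return float((pred != y).mean())

rng = np.random.default_rng(2)
y = np.repeat([0, 1], 100)
X = np.column_stack([rng.normal(loc=y), rng.normal(loc=y), rng.normal(size=200)])

# Evaluate random feature subsets on both objectives, then keep the front.
masks = [rng.integers(0, 2, 3).astype(bool) for _ in range(20)]
evaluated = [(int(m.sum()), subset_error(X, y, m)) for m in masks]
front = pareto_front(evaluated)
print(sorted(set(front)))
```

An MOEA such as NSGA-II replaces the random subset generation with crossover, mutation and front-based survival, and the self-adaptive variant above additionally evolves the classifier parameters alongside the feature masks.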

  14. A fuzzy based feature selection from independent component subspace for machine learning classification of microarray data

    Directory of Open Access Journals (Sweden)

    Rabia Aziz

    2016-06-01

    Full Text Available Feature (gene) selection and classification of microarray data are two of the most interesting machine learning challenges. In the present work two existing feature selection/extraction algorithms, namely independent component analysis (ICA) and fuzzy backward feature elimination (FBFE), are used in a new selection/extraction combination. The main objective of this paper is to select the independent components of the DNA microarray data using FBFE to improve the performance of support vector machine (SVM) and Naïve Bayes (NB) classifiers, while keeping the computational expense affordable. To show the validity of the proposed method, it is applied to reduce the number of genes for five DNA microarray datasets, namely colon cancer, acute leukemia, prostate cancer, lung cancer II, and high-grade glioma. These datasets are then classified using the SVM and NB classifiers. Experimental results on these five microarray datasets demonstrate that the genes selected by the proposed approach effectively improve the performance of the SVM and NB classifiers in terms of classification accuracy. We compare our proposed method with principal component analysis (PCA) as a standard extraction algorithm and find that the proposed method obtains better classification accuracy, using the SVM and NB classifiers, with a smaller number of selected genes than PCA. For each dataset, the curve of average error rate versus number of genes shows the number of genes required for the highest accuracy with our proposed method for both classifiers. ROC curves show the best subset of genes for both classifiers on the different datasets with the proposed method.
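The backward elimination half of the pipeline above can be sketched in isolation. This omits the ICA step and the fuzzy weighting, and scores subsets with a minimal Gaussian naive Bayes on synthetic data, so it is a simplified stand-in rather than FBFE as published.

```python
import numpy as np

def nb_accuracy(X, y, feats):
    """Training accuracy of a minimal Gaussian naive Bayes on selected columns."""
    Xs = X[:, feats]
    logp = np.zeros((len(y), 2))
    for c in (0, 1):
        mu = Xs[y == c].mean(axis=0)
        var = Xs[y == c].var(axis=0) + 1e-9
        logp[:, c] = (-0.5 * (np.log(2 * np.pi * var) + (Xs - mu) ** 2 / var)).sum(axis=1)
    return float((logp.argmax(axis=1) == y).mean())

def backward_eliminate(X, y, n_keep):
    """Drop, one at a time, the feature whose removal hurts accuracy least."""
    feats = list(range(X.shape[1]))
    while len(feats) > n_keep:
        worst = max(feats, key=lambda f: nb_accuracy(X, y, [g for g in feats if g != f]))
        feats.remove(worst)
    return feats

rng = np.random.default_rng(3)
y = np.repeat([0, 1], 100)
X = np.column_stack([rng.normal(loc=2 * y),   # informative component
                     rng.normal(size=200),    # noise
                     rng.normal(size=200)])   # noise
print(backward_eliminate(X, y, n_keep=1))  # the informative component survives
```

In the paper's setting, the columns of X would be independent components estimated by ICA rather than raw features, and held-out accuracy rather than training accuracy would guide the elimination.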

  15. Core Business Selection Based on Ant Colony Clustering Algorithm

    OpenAIRE

    Yu Lan; Yan Bo; Yao Baozhen

    2014-01-01

    Core business is the most important business to an enterprise with diversified businesses. In this paper, we first introduce the definition and characteristics of the core business and then describe the ant colony clustering algorithm. In order to test the effectiveness of the proposed method, Tianjin Port Logistics Development Co., Ltd. is selected as the research object. Based on the current situation of the development of the company, the core business of the company can be acquired by ant c...

  16. Local anesthesia selection algorithm in patients with concomitant somatic diseases.

    Science.gov (United States)

    Anisimova, E N; Sokhov, S T; Letunova, N Yu; Orekhova, I V; Gromovik, M V; Erilin, E A; Ryazantsev, N A

    2016-01-01

    The paper presents basic principles of local anesthesia selection in patients with concomitant somatic diseases. These principles are: history taking; analysis of drug interactions with local anesthetic and sedation agents; determination of the functional status of the patient; correction of patient anxiety; and dental care with monitoring of hemodynamic parameters. It was found that adhering to this algorithm helps prevent urgent conditions in patients in outpatient dentistry.

  17. WEB SERVICE SELECTION ALGORITHM BASED ON PRINCIPAL COMPONENT ANALYSIS

    Institute of Scientific and Technical Information of China (English)

    Kang Guosheng; Liu Jianxun; Tang Mingdong; Cao Buqing

    2013-01-01

    Existing Web service selection approaches usually assume that users have provided their preferences in a quantitative form. However, due to the subjectivity and vagueness of preferences, it may be impractical for users to specify quantitative and exact preferences. Moreover, because Quality of Service (QoS) attributes are often interrelated, existing Web service selection approaches that compute the overall QoS of a Web service as a weighted sum of QoS attribute values may produce inaccurate results, since they do not take correlations among QoS attributes into account. To resolve these problems, a Web service selection framework considering the user's preference priority is proposed, which incorporates a searching mechanism with QoS range setting to identify services satisfying the user's QoS constraints. With the identified service candidates, and based on the idea of Principal Component Analysis (PCA), a Web service selection algorithm named PCA-WSS (Web Service Selection based on PCA) is proposed, which can eliminate the correlations among QoS attributes and compute the overall QoS of Web services accurately. After computing the overall QoS for each service, the algorithm ranks the Web service candidates by their overall QoS and recommends the services with the top QoS values to users. Finally, the effectiveness and feasibility of the approach are validated by experiments: the Web services selected by our approach receive higher average evaluations from users than others, and the time cost of the PCA-WSS algorithm is not acutely affected by the number of service candidates.
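The idea of scoring services through principal components rather than a direct weighted sum can be sketched as follows. This is not the paper's exact PCA-WSS algorithm: it assumes every attribute is a benefit attribute (larger is better), orients components accordingly, and weights them by explained variance, all of which are illustrative choices.

```python
import numpy as np

def pca_qos_scores(Q):
    """Overall QoS per service (row): project standardized attributes onto
    principal components and weight each component by its explained variance.
    Assumes larger raw values are better for every attribute."""
    Z = (Q - Q.mean(axis=0)) / Q.std(axis=0)
    cov = np.cov(Z, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)             # eigh returns ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]
    # Orient each component so its loadings sum positive ("benefit-positive").
    vecs = vecs * np.sign(vecs.sum(axis=0, keepdims=True) + 1e-12)
    weights = vals / vals.sum()
    return (Z @ vecs) @ weights

# Rows: candidate services; columns: throughput, availability, reliability.
Q = np.array([
    [9.0, 0.99, 0.95],   # strong on every attribute
    [5.0, 0.90, 0.80],
    [2.0, 0.85, 0.70],   # weak on every attribute
])
scores = pca_qos_scores(Q)
print(np.argsort(scores)[::-1])  # ranking, best service first
```

Because correlated attributes collapse onto shared components, a service is not double-rewarded for two attributes that carry essentially the same information, which is the motivation the abstract gives for moving beyond plain weighted summation.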

  18. Feature Selection by Merging Sequential Bidirectional Search into Relevance Vector Machine in Condition Monitoring

    Institute of Scientific and Technical Information of China (English)

    ZHANG Kui; DONG Yu; BALL Andrew

    2015-01-01

    For more accurate fault detection and diagnosis, there is an increasing trend to use a large number of sensors and to collect data at high frequency. This inevitably produces large-scale data and causes difficulties in fault classification. Indeed, classification methods become simply intractable when applied to high-dimensional condition monitoring data. To solve this problem, engineers have to resort to complicated feature extraction methods to reduce the dimensionality of the data. However, the features transformed by such methods cannot be understood by engineers, owing to a loss of the original engineering meaning. In this paper, another form of dimensionality reduction technique (feature selection methods) is employed to identify machinery condition, based only on frequency spectrum data. Feature selection methods are usually divided into three main types: filter, wrapper and embedded methods. Most studies have focused on the first two types, whilst the development and application of embedded feature selection methods are very limited. This paper attempts to explore a novel embedded method, formed by merging a sequential bidirectional search algorithm into the tuning of the scale parameters of a kernel function in the relevance vector machine. To demonstrate the potential for applying the method to machinery fault diagnosis, the method is applied to rolling bearing experimental data. The results obtained by using the method are consistent with the theoretical interpretation, proving that this algorithm has important engineering significance in revealing the correlation between faults and relevant frequency features. The proposed method is a theoretical extension of the relevance vector machine, and provides an effective solution for detecting fault-related frequency components with high efficiency.

  19. Feature selection by merging sequential bidirectional search into relevance vector machine in condition monitoring

    Science.gov (United States)

    Zhang, Kui; Dong, Yu; Ball, Andrew

    2015-11-01

    For more accurate fault detection and diagnosis, there is an increasing trend to use a large number of sensors and to collect data at high frequency. This inevitably produces large-scale data and causes difficulties in fault classification. Indeed, classification methods become simply intractable when applied to high-dimensional condition monitoring data. To solve this problem, engineers have to resort to complicated feature extraction methods to reduce the dimensionality of the data. However, the features transformed by such methods cannot be understood by engineers, owing to a loss of the original engineering meaning. In this paper, another form of dimensionality reduction technique (feature selection methods) is employed to identify machinery condition, based only on frequency spectrum data. Feature selection methods are usually divided into three main types: filter, wrapper and embedded methods. Most studies have focused on the first two types, whilst the development and application of embedded feature selection methods are very limited. This paper attempts to explore a novel embedded method, formed by merging a sequential bidirectional search algorithm into the tuning of the scale parameters of a kernel function in the relevance vector machine. To demonstrate the potential for applying the method to machinery fault diagnosis, the method is applied to rolling bearing experimental data. The results obtained by using the method are consistent with the theoretical interpretation, proving that this algorithm has important engineering significance in revealing the correlation between faults and relevant frequency features. The proposed method is a theoretical extension of the relevance vector machine, and provides an effective solution for detecting fault-related frequency components with high efficiency.

  20. Automated EEG artifact elimination by applying machine learning algorithms to ICA-based features

    Science.gov (United States)

    Radüntz, Thea; Scouten, Jon; Hochmuth, Olaf; Meffert, Beate

    2017-08-01

    Objective. Biological and non-biological artifacts cause severe problems when dealing with electroencephalogram (EEG) recordings. Independent component analysis (ICA) is a widely used method for eliminating various artifacts from recordings. However, evaluating and classifying the calculated independent components (ICs) as artifact or EEG is not fully automated at present. Approach. In this study, we propose a new approach for automated artifact elimination, which applies machine learning algorithms to ICA-based features. Main results. We compared the performance of our classifiers with the visual classification results given by experts. The best result, with an accuracy rate of 95%, was achieved using features obtained by range filtering of the topoplots and IC power spectra combined with an artificial neural network. Significance. Compared with existing automated solutions, our proposed method is not limited to specific types of artifacts, electrode configurations, or numbers of EEG channels. The main advantage of the proposed method is that it provides an automatic, reliable, real-time capable, and practical tool that avoids the need for the time-consuming manual selection of ICs during artifact removal.

  1. Feature selection and multi-kernel learning for adaptive graph regularized nonnegative matrix factorization

    KAUST Repository

    Wang, Jim Jing-Yan

    2014-09-20

    Nonnegative matrix factorization (NMF), a popular part-based representation technique, does not capture the intrinsic local geometric structure of the data space. Graph regularized NMF (GNMF) was recently proposed to avoid this limitation by regularizing NMF with a nearest neighbor graph constructed from the input data set. However, GNMF has two main bottlenecks. First, using the original feature space directly to construct the graph is not necessarily optimal because of the noisy and irrelevant features and nonlinear distributions of data samples. Second, one possible way to handle the nonlinear distribution of data samples is by kernel embedding. However, it is often difficult to choose the most suitable kernel. To solve these bottlenecks, we propose two novel graph-regularized NMF methods, AGNMFFS and AGNMFMK, by introducing feature selection and multiple-kernel learning to the graph regularized NMF, respectively. Instead of using a fixed graph as in GNMF, the two proposed methods learn the nearest neighbor graph that is adaptive to the selected features and learned multiple kernels, respectively. For each method, we propose a unified objective function to conduct feature selection/multi-kernel learning, NMF and adaptive graph regularization simultaneously. We further develop two iterative algorithms to solve the two optimization problems. Experimental results on two challenging pattern classification tasks demonstrate that the proposed methods significantly outperform state-of-the-art data representation methods.

  2. Research into a Feature Selection Method for Hyperspectral Imagery Using PSO and SVM

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    Classification and recognition of hyperspectral remote sensing images differ from those of conventional multi-spectral remote sensing images. We propose a novel feature selection and classification method for hyperspectral images that combines the global optimization ability of the particle swarm optimization (PSO) algorithm with the superior classification performance of a support vector machine (SVM). The global search performance of PSO is improved by using a chaotic optimization search technique. A granularity-based grid search strategy is used to optimize the SVM model parameters. Parameter optimization and classification of the SVM are addressed using the training data corresponding to the feature subset. The false classification rate is adopted as the fitness function. Tests of feature selection and classification are carried out on a hyperspectral data set, and classification performance is compared among feature extraction methods commonly used today. Results indicate that this hybrid method has higher classification accuracy and can effectively extract optimal bands, providing a feasible approach for feature selection and classification of hyperspectral image data.
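    The paper's PSO operates on band subsets with an SVM-based fitness. A generic binary PSO core of that kind, with a toy fitness standing in for the SVM error rate, might look like the following (all names and constants are illustrative):

```python
import numpy as np

def binary_pso(fitness, n_bits, n_particles=20, n_iter=50, seed=0):
    """Minimise fitness(mask) over 0/1 masks with a basic binary PSO.
    Velocities are squashed by a sigmoid into per-bit probabilities of a 1."""
    rng = np.random.default_rng(seed)
    pos = rng.integers(0, 2, size=(n_particles, n_bits))
    vel = rng.normal(scale=0.1, size=(n_particles, n_bits))
    pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
    g_idx = pbest_f.argmin()
    gbest, gbest_f = pbest[g_idx].copy(), pbest_f[g_idx]
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, n_bits))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        # Sample each bit as 1 with probability sigmoid(velocity).
        pos = (rng.random((n_particles, n_bits)) < 1.0 / (1.0 + np.exp(-vel))).astype(int)
        f = np.array([fitness(p) for p in pos])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = pos[improved], f[improved]
        if f.min() < gbest_f:
            gbest, gbest_f = pos[f.argmin()].copy(), f.min()
    return gbest, gbest_f
```

    In the paper's setting, `fitness` would train and evaluate an SVM on the bands selected by the mask; here any callable that maps a 0/1 mask to a cost works.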

  3. Suitable features selection for monitoring thermal condition of electrical equipment using infrared thermography

    Science.gov (United States)

    Huda, A. S. N.; Taib, S.

    2013-11-01

    Monitoring the thermal condition of electrical equipment is necessary for maintaining the reliability of an electrical system. The degradation of electrical equipment can cause excessive overheating, which can lead to the eventual failure of the equipment. Additionally, equipment failure incurs high maintenance costs and manpower and can also be catastrophic, causing injuries or even deaths. Therefore, recognizing equipment conditions as normal or defective is an essential step towards maintaining the reliability and stability of the system. This study introduces infrared thermography based condition monitoring of electrical equipment. Manual analysis of thermal images for detecting defects and classifying the status of equipment takes a lot of time and effort and can also lead to incorrect diagnoses. An intelligent system that classifies the equipment automatically could help to overcome these problems. This paper discusses an intelligent classification system for equipment conditions using neural networks. Three sets of features, namely first-order histogram based statistical features, grey level co-occurrence matrix features, and component based intensity features, are extracted by image analysis and used as input data for the neural networks. The multilayered perceptron networks are trained using four different training algorithms, namely resilient backpropagation, Bayesian regularization, Levenberg-Marquardt, and scaled conjugate gradient. The experimental results show that the component based intensity features perform better than the other two feature sets. Finally, after selecting the best features, the multilayered perceptron network trained using the Levenberg-Marquardt algorithm achieved the best results in classifying the conditions of electrical equipment.

  4. A soft computing based approach using modified selection strategy for feature reduction of medical systems.

    Science.gov (United States)

    Zuhtuogullari, Kursat; Allahverdi, Novruz; Arikan, Nihat

    2013-01-01

    Systems with high-dimensional input spaces require long processing times and large amounts of memory. Most attribute selection algorithms suffer from limits on input dimensionality and from information storage problems. These problems are eliminated by the developed feature reduction software, which uses a new modified selection mechanism that adds middle-region solution candidates. The hybrid system software is constructed for reducing the input attributes of systems with large numbers of input variables. The software also supports the roulette wheel selection mechanism, and linear order crossover is used as the recombination operator. In genetic algorithm based soft computing methods, locking onto local solutions is a further problem, which the developed software also eliminates. Faster and more effective results are obtained in the test procedures. Twelve input variables of the urological system have been reduced to reducts (reduced input attribute sets) with seven, six, and five elements. The obtained results show that the developed software with modified selection has advantages in memory allocation, execution time, classification accuracy, sensitivity, and specificity when compared with other reduction algorithms on the urological test data.
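    The roulette wheel selection mechanism the software supports can be sketched in a few lines of plain Python (illustrative only, not the authors' implementation):

```python
import random

def roulette_wheel_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    pick = rng.uniform(0, total)
    acc = 0.0
    for ind, fit in zip(population, fitnesses):
        acc += fit
        if pick <= acc:
            return ind
    return population[-1]  # guard against floating-point round-off
```

    Fitter individuals occupy a larger arc of the "wheel" and are therefore drawn more often, while weaker individuals still retain a nonzero chance of selection.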

  5. A Soft Computing Based Approach Using Modified Selection Strategy for Feature Reduction of Medical Systems

    Directory of Open Access Journals (Sweden)

    Kursat Zuhtuogullari

    2013-01-01

    Full Text Available Systems with high-dimensional input spaces require long processing times and large amounts of memory. Most attribute selection algorithms suffer from limits on input dimensionality and from information storage problems. These problems are eliminated by the developed feature reduction software, which uses a new modified selection mechanism that adds middle-region solution candidates. The hybrid system software is constructed for reducing the input attributes of systems with large numbers of input variables. The software also supports the roulette wheel selection mechanism, and linear order crossover is used as the recombination operator. In genetic algorithm based soft computing methods, locking onto local solutions is a further problem, which the developed software also eliminates. Faster and more effective results are obtained in the test procedures. Twelve input variables of the urological system have been reduced to reducts (reduced input attribute sets) with seven, six, and five elements. The obtained results show that the developed software with modified selection has advantages in memory allocation, execution time, classification accuracy, sensitivity, and specificity when compared with other reduction algorithms on the urological test data.

  6. Inference for feature selection using the Lasso with high-dimensional data

    DEFF Research Database (Denmark)

    Brink-Jensen, Kasper; Ekstrøm, Claus Thorn

    2014-01-01

    Penalized regression models such as the Lasso have proved useful for variable selection in many fields, especially in situations with high-dimensional data where the number of predictors far exceeds the number of observations. These methods identify and rank variables of importance but do not generally provide any inference for the selected variables. Thus, the variables selected might be the "most important" but need not be significant. We propose a significance test for the selection found by the Lasso. We introduce a procedure that computes inference and p-values for features chosen by the Lasso. This method rephrases the null hypothesis and uses a randomization approach which ensures that the error rate is controlled even for small samples. We demonstrate the ability of the algorithm to compute p-values of the expected magnitude with simulated data using a multitude of scenarios...
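    The inference procedure builds on a fitted Lasso. The underlying coordinate-descent fit it starts from can be sketched as follows (a bare-bones version; `lasso_cd` and the fixed iteration count are illustrative):

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j's contribution removed.
            r = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r / n
            # Soft-thresholding: small correlations are shrunk exactly to zero.
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b
```

    The soft-thresholding step is what makes the Lasso a selection method: coefficients whose partial correlation falls below `lam` are set exactly to zero, and the paper's randomization test then asks which surviving coefficients are actually significant.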

  7. How can selection of biologically inspired features improve the performance of a robust object recognition model?

    Directory of Open Access Journals (Sweden)

    Masoud Ghodrati

    Full Text Available Humans can effectively and swiftly recognize objects in complex natural scenes. This outstanding ability has motivated many computational object recognition models, most of which try to emulate the behavior of this remarkable system. The human visual system recognizes objects hierarchically in several processing stages, along which a set of features of increasing complexity is extracted by different parts of the visual system. Elementary features like bars and edges are processed in earlier levels of the visual pathway, and more complex features are detected further along it. An important question in the field of visual processing is which features of an object are selected and represented by the visual cortex. To address this issue, we extended a biologically motivated hierarchical model for different object recognition tasks. In this model, a set of object parts, named patches, is extracted in the intermediate stages. These object parts are used in the training procedure of the model and play an important role in object recognition. These patches are selected indiscriminately from different positions of an image, which can lead to the extraction of non-discriminating patches that eventually reduce performance. In the proposed model we used an evolutionary algorithm to select a set of informative patches. Our results indicate that these patches are more informative than the usual random patches. We demonstrate the strength of the proposed model on a range of object recognition tasks, where it outperforms the original model. The experiments show that the selected features are generally particular parts of the target images. Our results suggest that selected features which are parts of target objects provide an efficient set for robust object recognition.

  8. How can selection of biologically inspired features improve the performance of a robust object recognition model?

    Science.gov (United States)

    Ghodrati, Masoud; Khaligh-Razavi, Seyed-Mahdi; Ebrahimpour, Reza; Rajaei, Karim; Pooyan, Mohammad

    2012-01-01

    Humans can effectively and swiftly recognize objects in complex natural scenes. This outstanding ability has motivated many computational object recognition models, most of which try to emulate the behavior of this remarkable system. The human visual system recognizes objects hierarchically in several processing stages, along which a set of features of increasing complexity is extracted by different parts of the visual system. Elementary features like bars and edges are processed in earlier levels of the visual pathway, and more complex features are detected further along it. An important question in the field of visual processing is which features of an object are selected and represented by the visual cortex. To address this issue, we extended a biologically motivated hierarchical model for different object recognition tasks. In this model, a set of object parts, named patches, is extracted in the intermediate stages. These object parts are used in the training procedure of the model and play an important role in object recognition. These patches are selected indiscriminately from different positions of an image, which can lead to the extraction of non-discriminating patches that eventually reduce performance. In the proposed model we used an evolutionary algorithm to select a set of informative patches. Our results indicate that these patches are more informative than the usual random patches. We demonstrate the strength of the proposed model on a range of object recognition tasks, where it outperforms the original model. The experiments show that the selected features are generally particular parts of the target images. Our results suggest that selected features which are parts of target objects provide an efficient set for robust object recognition.

  9. Algorithms of control parameters selection for automation of FDM 3D printing process

    Directory of Open Access Journals (Sweden)

    Kogut Paweł

    2017-01-01

    Full Text Available The paper presents algorithms for selecting control parameters of the Fused Deposition Modelling (FDM) technology in an open printing environment with the 3DGence ONE printer. The following parameters were distinguished: model mesh density, material flow speed, cooling performance, retraction, and printing speed. In principle these parameters are independent of the printing system, but in practice they depend to a certain degree on the features of the selected printing equipment. This is the first step towards automation of the 3D printing process in FDM technology.

  10. Strain gage selection in loads equations using a genetic algorithm

    Science.gov (United States)

    1994-01-01

    Traditionally, structural loads are measured using strain gages. A loads calibration test must be done before loads can be accurately measured. In one measurement method, a series of point loads is applied to the structure, and loads equations are derived via the least squares curve fitting algorithm using the strain gage responses to the applied point loads. However, many research structures are highly instrumented with strain gages, and the number and selection of gages used in a loads equation can be problematic. This paper presents an improved technique using a genetic algorithm to choose the strain gages used in the loads equations. Also presented are a comparison of the genetic algorithm performance with the current T-value technique and a variant known as the Best Step-down technique. Examples are shown using aerospace vehicle wings of high and low aspect ratio. In addition, a significant limitation in the current methods is revealed. The genetic algorithm arrived at a comparable or superior set of gages with significantly less human effort, and could be applied in instances when the current methods could not.
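    Whatever the search strategy (genetic algorithm, T-value, or Best Step-down), each candidate gage set is scored by how well its least-squares loads equation reproduces the applied calibration loads. That scoring step, wrapped here in a brute-force search for illustration (the GA replaces the exhaustive loop, not the fit), might look like:

```python
import itertools
import numpy as np

def best_gage_subset(R, loads, k):
    """Pick the k gage columns of response matrix R whose least-squares
    loads equation best reproduces the applied calibration loads."""
    best_err, best_cols = np.inf, None
    for cols in itertools.combinations(range(R.shape[1]), k):
        A = R[:, cols]
        coef, *_ = np.linalg.lstsq(A, loads, rcond=None)
        err = np.linalg.norm(A @ coef - loads)
        if err < best_err:
            best_err, best_cols = err, cols
    return best_cols, best_err
```

    Exhaustive search is only feasible for small gage counts; with hundreds of gages the combinations explode, which is exactly the situation where the paper's genetic search pays off.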

  11. An ACO Algorithm for Effective Cluster Head Selection

    CERN Document Server

    Sampath, Amritha; Thampi, Sabu M; 10.4304/jait.2.1.50-56

    2011-01-01

    This paper presents an effective algorithm for selecting cluster heads in mobile ad hoc networks using ant colony optimization. A cluster in an ad hoc network consists of a cluster head and cluster members which are one hop away from the cluster head. The cluster head allocates resources to its cluster members. Clustering in a MANET is done to reduce the communication overhead and thereby increase network performance. A MANET can contain many clusters. This paper presents an algorithm which combines the four main clustering schemes: ID-based, connectivity-based, probability-based, and weighted clustering. An ant colony optimization based approach is used to minimize the number of clusters in the MANET, which can also be viewed as a minimum dominating set problem in graph theory. The algorithm considers various parameters like the number of nodes, the transmission range, etc. Experimental results show that the proposed algorithm is an effective methodology for finding out t...

  12. A convergent mean shift algorithm to select targets for LAMOST

    Institute of Scientific and Technical Information of China (English)

    Guang-Wei Li; Gang Zhao

    2009-01-01

    This paper first shows that the Mean Shift Algorithm used by the Observation Control System (OCS) Research Group of the University of Science and Technology of China in Survey Strategy System 2.10 (SSS2.10) to select targets for the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) is not convergent in theory. By carefully studying the mathematical formulation of the Mean Shift Algorithm, we find that it tries to find a point where some objective function achieves its maximum value; the Mean Shift Vector can be regarded as an ascent direction for the objective function. If we regard the objective function as a numerical description of the imaging quality of all targets covered by the focal panel, then the Mean Shift Algorithm finds the place where the imaging quality is best, so the problem of selecting targets is equivalent to finding that place. In addition, we give some effective heuristics to improve computational speed and propose an effective method to assign point sources to the respective fibers. As a result, our program runs fast, costing only several seconds to generate an observation.
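    The mean-shift update under discussion moves a point to the kernel-weighted mean of its neighbours, which ascends the kernel density estimate. A minimal Gaussian-kernel version (illustrative, not the SSS2.10 code) is:

```python
import numpy as np

def mean_shift(points, start, bandwidth=1.0, n_iter=100, tol=1e-6):
    """Iterate the mean-shift update from `start` until the shift vanishes.
    Each step moves to the Gaussian-weighted mean of the points, an ascent
    direction for the kernel density estimate."""
    x = np.asarray(start, dtype=float)
    for _ in range(n_iter):
        d2 = ((points - x) ** 2).sum(axis=1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))
        x_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x
```

    With a Gaussian kernel this iteration is provably convergent to a density mode; the paper's point is that the variant used in SSS2.10 lacks that guarantee.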

  13. Automatic feature learning using multichannel ROI based on deep structured algorithms for computerized lung cancer diagnosis.

    Science.gov (United States)

    Sun, Wenqing; Zheng, Bin; Qian, Wei

    2017-04-13

    This study aimed to analyze the ability of automatically generated features from deep structured algorithms in lung nodule CT image diagnosis, and to compare their performance with traditional computer aided diagnosis (CADx) systems using hand-crafted features. All 1018 cases were acquired from the Lung Image Database Consortium (LIDC) public lung cancer database. The nodules were segmented according to four radiologists' markings, and 13,668 samples were generated by rotating every slice of the nodule images. Three multichannel ROI based deep structured algorithms were designed and implemented in this study: a convolutional neural network (CNN), a deep belief network (DBN), and a stacked denoising autoencoder (SDAE). For comparison, we also implemented a CADx system using hand-crafted features including density, texture, and morphological features. The performance of every scheme was evaluated using 10-fold cross-validation and the area under the receiver operating characteristic curve (AUC) as the assessment index. The highest observed AUC was 0.899±0.018, achieved by CNN, significantly higher than that of the traditional CADx (AUC=0.848±0.026). The result from DBN was also slightly higher than CADx, while that from SDAE was slightly lower. By visualizing the automatically generated features, we found some meaningful detectors, such as curvy stroke detectors, in the deep structured schemes. The study results showed that deep structured algorithms with automatically generated features can achieve desirable performance in lung nodule diagnosis. With well-tuned parameters and a large enough dataset, deep learning algorithms can outperform currently popular CADx. We believe deep learning algorithms with a similar data preprocessing procedure can be used in other medical image analysis areas as well. Copyright © 2017. Published by Elsevier Ltd.

  14. Parameter Selection for Ant Colony Algorithm Based on Bacterial Foraging Algorithm

    Directory of Open Access Journals (Sweden)

    Peng Li

    2016-01-01

    Full Text Available The optimal performance of the ant colony algorithm (ACA) mainly depends on suitable parameters; therefore, parameter selection for ACA is important. We propose a parameter selection method for ACA based on the bacterial foraging algorithm (BFA), considering the effects of coupling between different parameters. Firstly, parameters for ACA are mapped into a multidimensional space, using a chemotactic operator to ensure that each parameter group approaches the optimal value, speeding up the convergence for each parameter set. Secondly, the operation speed for optimizing the entire parameter set is accelerated using a reproduction operator. Finally, the elimination-dispersal operator is used to strengthen the global optimization of the parameters, which avoids falling into a local optimal solution. In order to validate the effectiveness of this method, the results were compared with those using a genetic algorithm (GA) and particle swarm optimization (PSO), and simulations were conducted using different grid maps for robot path planning. The results indicated that parameter selection for ACA based on BFA was the superior method, able to determine the best parameter combination rapidly, accurately, and effectively.

  15. Access Network Selection Based on Fuzzy Logic and Genetic Algorithms

    Directory of Open Access Journals (Sweden)

    Mohammed Alkhawlani

    2008-01-01

    Full Text Available In the next generation of heterogeneous wireless networks (HWNs), a large number of different radio access technologies (RATs) will be integrated into a common network. In this type of network, selecting the most optimal and promising access network (AN) is an important consideration for overall network stability, resource utilization, user satisfaction, and quality of service (QoS) provisioning. This paper proposes a general scheme to solve the access network selection (ANS) problem in the HWN. The proposed scheme has been used to present and design a general multicriteria software assistant (SA) that can consider the user, operator, and/or QoS viewpoints. Combined fuzzy logic (FL) and genetic algorithms (GAs) have been used to give the proposed scheme the required scalability, flexibility, and simplicity. The simulation results show that the proposed scheme and SA have better and more robust performance than random-based selection.

  16. An artificial bee colony algorithm for uncertain portfolio selection.

    Science.gov (United States)

    Chen, Wei

    2014-01-01

    Portfolio selection is an important issue for researchers and practitioners. In this paper, under the assumption that security returns are given by experts' evaluations rather than historical data, we discuss the portfolio adjusting problem which takes transaction costs and diversification degree of portfolio into consideration. Uncertain variables are employed to describe the security returns. In the proposed mean-variance-entropy model, the uncertain mean value of the return is used to measure investment return, the uncertain variance of the return is used to measure investment risk, and the entropy is used to measure diversification degree of portfolio. In order to solve the proposed model, a modified artificial bee colony (ABC) algorithm is designed. Finally, a numerical example is given to illustrate the modelling idea and the effectiveness of the proposed algorithm.
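    The trade-off in the mean-variance-entropy model can be illustrated with a toy scoring function. Note the simplifications: crisp numbers and independent returns stand in for the paper's uncertain variables, and all names and coefficients below are illustrative:

```python
import numpy as np

def mve_objective(w, mu, var, risk_aversion=1.0, entropy_weight=0.1):
    """Toy mean-variance-entropy score for portfolio weights w (summing to 1):
    expected return, minus a variance penalty, plus a diversification bonus."""
    ret = w @ mu
    risk = w ** 2 @ var  # independent-return approximation of portfolio variance
    ent = -(w[w > 0] * np.log(w[w > 0])).sum()  # Shannon entropy of the weights
    return ret - risk_aversion * risk + entropy_weight * ent
```

    With identical expected returns and variances, the entropy term makes a uniform portfolio score higher than a concentrated one, which is exactly the diversification pressure the paper's ABC search exploits.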

  17. Automatic Peak Selection by a Benjamini-Hochberg-Based Algorithm

    KAUST Repository

    Abbas, Ahmed

    2013-01-07

    A common issue in bioinformatics is that computational methods often generate a large number of predictions sorted according to certain confidence scores. A key problem is then determining how many predictions must be selected to include most of the true predictions while maintaining reasonably high precision. In nuclear magnetic resonance (NMR)-based protein structure determination, for instance, computational peak picking methods are becoming more and more common, although expert-knowledge remains the method of choice to determine how many peaks among thousands of candidate peaks should be taken into consideration to capture the true peaks. Here, we propose a Benjamini-Hochberg (B-H)-based approach that automatically selects the number of peaks. We formulate the peak selection problem as a multiple testing problem. Given a candidate peak list sorted by either volumes or intensities, we first convert the peaks into p-values and then apply the B-H-based algorithm to automatically select the number of peaks. The proposed approach is tested on the state-of-the-art peak picking methods, including WaVPeak [1] and PICKY [2]. Compared with the traditional fixed number-based approach, our approach returns significantly more true peaks. For instance, by combining WaVPeak or PICKY with the proposed method, the missing peak rates are on average reduced by 20% and 26%, respectively, in a benchmark set of 32 spectra extracted from eight proteins. The consensus of the B-H-selected peaks from both WaVPeak and PICKY achieves 88% recall and 83% precision, which significantly outperforms each individual method and the consensus method without using the B-H algorithm. The proposed method can be used as a standard procedure for any peak picking method and straightforwardly applied to some other prediction selection problems in bioinformatics. The source code, documentation and example data of the proposed method is available at http://sfb.kaust.edu.sa/pages/software.aspx. © 2013
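    The B-H step itself is compact: sort the candidate p-values, compare each to its rank-scaled threshold, and keep everything up to the largest rank that passes. A plain-Python sketch (illustrative, not the released code):

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Return indices of hypotheses accepted by the B-H procedure at FDR level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value passes its threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])
```

    Everything up to rank k is kept even if some intermediate p-value misses its own threshold; this step-up behaviour is what controls the false discovery rate rather than the per-test error.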

  18. Automatic peak selection by a Benjamini-Hochberg-based algorithm.

    Directory of Open Access Journals (Sweden)

    Ahmed Abbas

    Full Text Available A common issue in bioinformatics is that computational methods often generate a large number of predictions sorted according to certain confidence scores. A key problem is then determining how many predictions must be selected to include most of the true predictions while maintaining reasonably high precision. In nuclear magnetic resonance (NMR)-based protein structure determination, for instance, computational peak picking methods are becoming more and more common, although expert knowledge remains the method of choice to determine how many peaks among thousands of candidate peaks should be taken into consideration to capture the true peaks. Here, we propose a Benjamini-Hochberg (B-H)-based approach that automatically selects the number of peaks. We formulate the peak selection problem as a multiple testing problem. Given a candidate peak list sorted by either volumes or intensities, we first convert the peaks into p-values and then apply the B-H-based algorithm to automatically select the number of peaks. The proposed approach is tested on the state-of-the-art peak picking methods, including WaVPeak [1] and PICKY [2]. Compared with the traditional fixed number-based approach, our approach returns significantly more true peaks. For instance, by combining WaVPeak or PICKY with the proposed method, the missing peak rates are on average reduced by 20% and 26%, respectively, in a benchmark set of 32 spectra extracted from eight proteins. The consensus of the B-H-selected peaks from both WaVPeak and PICKY achieves 88% recall and 83% precision, which significantly outperforms each individual method and the consensus method without using the B-H algorithm. The proposed method can be used as a standard procedure for any peak picking method and straightforwardly applied to some other prediction selection problems in bioinformatics.
The source code, documentation and example data of the proposed method is available at http://sfb.kaust.edu.sa/pages/software.aspx.

  19. Feature-Selective Attentional Modulations in Human Frontoparietal Cortex.

    Science.gov (United States)

    Ester, Edward F; Sutterer, David W; Serences, John T; Awh, Edward

    2016-08-03

    Control over visual selection has long been framed in terms of a dichotomy between "source" and "site," where top-down feedback signals originating in frontoparietal cortical areas modulate or bias sensory processing in posterior visual areas. This distinction is motivated in part by observations that frontoparietal cortical areas encode task-level variables (e.g., what stimulus is currently relevant or what motor outputs are appropriate), while posterior sensory areas encode continuous or analog feature representations. Here, we present evidence that challenges this distinction. We used fMRI, a roving searchlight analysis, and an inverted encoding model to examine representations of an elementary feature property (orientation) across the entire human cortical sheet while participants attended either the orientation or luminance of a peripheral grating. Orientation-selective representations were present in a multitude of visual, parietal, and prefrontal cortical areas, including portions of the medial occipital cortex, the lateral parietal cortex, and the superior precentral sulcus (thought to contain the human homolog of the macaque frontal eye fields). Additionally, representations in many, but not all, of these regions were stronger when participants were instructed to attend orientation relative to luminance. Collectively, these findings challenge models that posit a strict segregation between sources and sites of attentional control on the basis of representational properties by demonstrating that simple feature values are encoded by cortical regions throughout the visual processing hierarchy, and that representations in many of these areas are modulated by attention. Influential models of visual attention posit a distinction between top-down control and bottom-up sensory processing networks. 
These models are motivated in part by demonstrations showing that frontoparietal cortical areas associated with top-down control represent abstract or categorical stimulus

  20. An improved algorithm for information hiding based on features of Arabic text: A Unicode approach

    Directory of Open Access Journals (Sweden)

    A.A. Mohamed

    2014-07-01

    Full Text Available Steganography is the practice of hiding secret information in a cover medium so that other individuals fail to realize its existence. Due to the lack of data redundancy in text files in comparison with other carrier files, text steganography is a difficult problem to solve. In this paper, we propose a new, promising steganographic algorithm for Arabic text based on features of the Arabic script. The focus is on a more secure algorithm and a high carrier capacity. Our extensive experiments with the proposed algorithm show a high embedding capacity ratio for the carrier media. In addition, our algorithm can resist traditional attacking methods since it keeps the changes in the carrier text to a minimum.

  1. An image-tracking algorithm based on object center distance-weighting and image feature recognition

    Institute of Scientific and Technical Information of China (English)

    JIANG Shuhong; WANG Qin; ZHANG Jianqiu; HU Bo

    2007-01-01

    A real-time image-tracking algorithm is proposed, which gives small weights to pixels farther from the object center and uses the quantized image gray scales as a template. It identifies the target's location by the mean-shift iteration method and determines the target's scale by using image feature recognition, improving the kernel-based algorithm in tracking scale-changing targets. A decimation method is proposed to track large-sized targets, and real-time experimental results verify the effectiveness of the proposed algorithm.

  2. Medical Image Fusion Algorithm Based on Nonlinear Approximation of Contourlet Transform and Regional Features

    Directory of Open Access Journals (Sweden)

    Hui Huang

    2017-01-01

    Full Text Available Considering the pros and cons of the contourlet transform and the characteristics of multimodality medical imaging, we propose a novel image fusion algorithm that combines the nonlinear approximation of the contourlet transform with image regional features. The most important coefficient bands of the contourlet sparse matrix are retained by nonlinear approximation. Low-frequency and high-frequency regional features are also elaborated to fuse the medical images. The results strongly suggest that the proposed algorithm can improve the visual quality of fused medical images, including image denoising and enhancement.

  3. Characterization of computer network events through simultaneous feature selection and clustering of intrusion alerts

    Science.gov (United States)

    Chen, Siyue; Leung, Henry; Dondo, Maxwell

    2014-05-01

    As computer network security threats increase, many organizations implement multiple Network Intrusion Detection Systems (NIDS) to maximize the likelihood of intrusion detection and provide a comprehensive understanding of intrusion activities. However, NIDS trigger a massive number of alerts on a daily basis. This can be overwhelming for computer network security analysts since it is a slow and tedious process to manually analyse each alert produced. Thus, automated and intelligent clustering of alerts is important to reveal the structural correlation of events by grouping alerts with common features. As the nature of computer network attacks, and therefore alerts, is not known in advance, unsupervised alert clustering is a promising approach to achieve this goal. We propose a joint optimization technique for feature selection and clustering to aggregate similar alerts and to reduce the number of alerts that analysts have to handle individually. More precisely, each identified feature is assigned a binary value, which reflects the feature's saliency. This value is treated as a hidden variable and incorporated into a likelihood function for clustering. Since computing the optimal solution of the likelihood function directly is analytically intractable, we use the Expectation-Maximisation (EM) algorithm to iteratively update the hidden variable and use it to maximize the expected likelihood. Our empirical results, using a labelled Defense Advanced Research Projects Agency (DARPA) 2000 reference dataset, show that the proposed method gives better results than the EM clustering without feature selection in terms of the clustering accuracy.
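    The E/M alternation at the core of this scheme is easiest to see without the feature-saliency variable. A bare two-component one-dimensional Gaussian mixture EM (illustrative only; the paper's model adds the per-feature saliency term) runs:

```python
import numpy as np

def em_gmm_1d(x, n_iter=100, seed=0):
    """EM for a two-component 1-D Gaussian mixture: the same E/M alternation
    the alert-clustering scheme builds on, minus the feature-saliency variable."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=2, replace=False)  # initialise means at two data points
    sigma = np.full(2, x.std())
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means, and standard deviations.
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sigma
```

    In the paper, the hidden variable set is enlarged with a binary saliency indicator per feature, but the update pattern (compute expectations, then maximize the expected likelihood) is the same.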

  4. Selection of clinical features for pattern recognition applied to gait analysis.

    Science.gov (United States)

    Altilio, Rosa; Paoloni, Marco; Panella, Massimo

    2017-04-01

    This paper deals with the opportunity of extracting useful information from medical data retrieved directly from a stereophotogrammetric system applied to gait analysis. A feature selection method that exhaustively evaluates all possible combinations of the gait parameters is presented, in order to find the best subset able to classify between diseased and healthy subjects. This procedure is used to estimate the performance of widely used classification algorithms, whose effectiveness has been ascertained in many real-world problems, in terms of both the number of selected features and the classification accuracy on well-known benchmarks. Specifically, the support vector machine, naive Bayes, and K-nearest-neighbor classifiers obtain the lowest classification error, with an accuracy greater than 97%. For the considered classification problem, the whole set of features proves redundant and can be significantly pruned. Namely, groups of only 3 or 5 features are able to preserve high accuracy when the aim is to check the anomaly of a gait. The step length and the swing speed are the most informative features for gait analysis, but cadence and stride may also add useful information for movement evaluation.
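    An exhaustive wrapper search of this kind can be sketched in a few lines. The leave-one-out 1-NN scorer and the helper names below are illustrative assumptions, not the paper's implementation:

```python
import itertools
import numpy as np

def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    d = ((X[:, None, :] - X[None]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)          # never match a sample to itself
    return float((y[d.argmin(1)] == y).mean())

def best_feature_subset(X, y, max_size=None):
    """Exhaustively score every feature combination (a wrapper search, as
    in the paper) and return the best subset with its accuracy."""
    n_feat = X.shape[1]
    max_size = max_size or n_feat
    best = ((), 0.0)
    for r in range(1, max_size + 1):
        for subset in itertools.combinations(range(n_feat), r):
            acc = loo_1nn_accuracy(X[:, subset], y)
            if acc > best[1]:
                best = (subset, acc)
    return best
```

    The search is exact but exponential in the number of features, which is exactly why the paper restricts it to a modest gait-parameter set.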

  5. Pattern Classification Using an Olfactory Model with PCA Feature Selection in Electronic Noses: Study and Application

    Directory of Open Access Journals (Sweden)

    Junbao Zheng

    2012-03-01

    Full Text Available Biologically-inspired models and algorithms are considered promising sensor array signal processing methods for electronic noses. Feature selection is one of the most important issues for developing robust pattern recognition models in machine learning. This paper describes an investigation into the classification performance of a bionic olfactory model as the dimension of the input feature vector (outer factor) and the number of its parallel channels (inner factor) increase. The principal component analysis technique was applied for feature selection and dimension reduction. Two data sets, three classes of wine derived from different cultivars and five classes of green tea derived from five different provinces of China, were used for the experiments. In the former case the results showed that the average correct classification rate increased as more principal components were added to the feature vector. In the latter case the results showed that sufficient parallel channels should be reserved in the model to avoid pattern space crowding. We conclude that 6-8 channels of the model, with principal component feature vectors retaining at least 90% cumulative variance, are adequate for a classification task of 3-5 pattern classes, considering the trade-off between time consumption and classification rate.
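    The PCA step described above, keeping just enough components to reach a cumulative-variance target, can be sketched as follows (the function name and interface are illustrative):

```python
import numpy as np

def pca_reduce(X, var_keep=0.90):
    """Project X onto the fewest principal components whose cumulative
    explained variance reaches `var_keep` (at least 90%, as the paper
    recommends for its feature vectors)."""
    Xc = X - X.mean(axis=0)
    # SVD of the centred data: rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(S ** 2) / (S ** 2).sum()
    k = int(np.searchsorted(ratio, var_keep) + 1)
    return Xc @ Vt[:k].T, k
```

    The returned score matrix is what would feed the olfactory model's parallel channels in place of the raw sensor responses.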

  6. A novel feature ranking algorithm for biometric recognition with PPG signals.

    Science.gov (United States)

    Reşit Kavsaoğlu, A; Polat, Kemal; Recep Bozkurt, M

    2014-06-01

    This study describes the application of the photoplethysmography (PPG) signal and the time-domain features acquired from its first and second derivatives for biometric identification. For this purpose, a total of 40 features was extracted and a feature-ranking algorithm is proposed. The proposed algorithm calculates the contribution of each feature to biometric recognition and orders the features from greatest to smallest contribution. The contributions are quantified using the Euclidean distance and absolute distance formulas. The efficiency of the proposed algorithm is demonstrated by applying the k-NN (k-nearest neighbor) classifier to the ranked features. In the experiments, 15-period PPG signals recorded at two different times from each of thirty healthy subjects were acquired with a PPG data acquisition card. The PPG signals recorded first were evaluated as the 1st configuration, the PPG signals recorded later at a different time as the 2nd configuration, and the combination of both as the 3rd configuration. For the k-NN classifier model built with the proposed algorithm, identification rates of 90.44% for the 1st configuration, 94.44% for the 2nd configuration, and 87.22% for the 3rd configuration were attained. The results show that both the proposed algorithm and the PPG-based biometric identification model built on it are very promising for contactless person recognition.
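    A distance-based feature ranking of this flavour can be sketched as below. This is a hedged illustration only: the scoring rule (absolute centroid-to-centroid distance over within-class spread) and the function name are our assumptions, and the paper's exact contribution measure may differ:

```python
import numpy as np

def rank_features(X, y):
    """Score each feature by the mean absolute distance between class
    centroids, normalised by the average within-class spread, then order
    the features from greatest to smallest contribution."""
    classes = np.unique(y)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        means = np.array([X[y == c, j].mean() for c in classes])
        spread = np.mean([X[y == c, j].std() for c in classes]) + 1e-12
        # average absolute centroid-to-centroid distance for this feature
        scores[j] = np.abs(means[:, None] - means[None, :]).mean() / spread
    order = np.argsort(-scores)          # greatest contribution first
    return order, scores
```

    Feeding the top-ranked features to a k-NN classifier, as the paper does, then turns the ranking into an identification pipeline.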

  7. Adaptive Equalizer Using Selective Partial Update Algorithm and Selective Regressor Affine Projection Algorithm over Shallow Water Acoustic Channels

    Directory of Open Access Journals (Sweden)

    Masoumeh Soflaei

    2014-01-01

    Full Text Available One of the most important problems of reliable communication in shallow water channels is intersymbol interference (ISI), which is due to scattering from the surface and reflection from the bottom. Using adaptive equalizers in the receiver is one of the best-known ways of overcoming this problem. In this paper, we apply the family of selective regressor affine projection algorithms (SR-APA) and the family of selective partial update APA (SPU-APA), which have low computational complexity, an important factor influencing adaptive equalizer performance. We use experimental data from the Strait of Hormuz to examine the efficiency of the proposed methods over a shallow water channel. We observe that the steady-state mean square error (MSE) of SR-APA and SPU-APA decreases by 5.8 dB and 5.5 dB, respectively, in comparison with the least mean square (LMS) algorithm. The SPU-APA and SR-APA families also have better convergence speed than the LMS-type algorithm.
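    The SPU-APA and SR-APA update rules are too involved for a short sketch, but the LMS baseline they are benchmarked against is compact. The following is a generic textbook LMS equalizer (the function name and parameters are illustrative, not taken from the paper):

```python
import numpy as np

def lms_equalizer(x, d, taps=8, mu=0.01):
    """Baseline LMS adaptive equalizer: the tap vector w is adapted by
    stochastic gradient descent so that the filter output tracks the
    desired symbol sequence d from the received samples x."""
    w = np.zeros(taps)
    err = np.zeros(len(x))
    for n in range(taps - 1, len(x)):
        u = x[n - taps + 1:n + 1][::-1]   # most recent samples first
        e = d[n] - w @ u                  # a-priori estimation error
        w += mu * e * u                   # LMS weight update
        err[n] = e
    return w, err
```

    On a toy two-tap channel the error power drops by well over an order of magnitude once the taps converge; the APA variants in the paper trade extra per-update computation for faster convergence than this baseline.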

  8. Text Feature Weighting For Summarization Of Document Bahasa Indonesia Using Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Aristoteles.

    2012-05-01

    Full Text Available This paper aims to perform text feature weighting for summarization of documents in Bahasa Indonesia using a genetic algorithm. There are eleven text features: sentence position (f1), positive keywords in sentence (f2), negative keywords in sentence (f3), sentence centrality (f4), sentence resemblance to the title (f5), sentence inclusion of named entities (f6), sentence inclusion of numerical data (f7), sentence relative length (f8), bushy path of the node (f9), summation of similarities for each node (f10), and latent semantic feature (f11). We investigate the effect of the first ten sentence features on the summarization task. Then, we use the latent semantic feature to increase the accuracy. All feature score functions are used to train a genetic algorithm model to obtain a suitable combination of feature weights. Text summarization is evaluated with the F-measure, which is directly related to the compression rate. The results show that adding f11 increases the F-measure by 3.26% and 1.55% for compression ratios of 10% and 30%, respectively. On the other hand, it decreases the F-measure by 0.58% for a compression ratio of 20%. Analysis of the text feature weights shows that using only f2, f4, f5, and f11 delivers performance similar to using all eleven features.
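    Evolving a vector of feature weights with a GA can be sketched as below. This is a minimal real-coded GA under our own illustrative design choices (tournament selection, uniform crossover, Gaussian mutation, elitism); in the paper the `fitness` callback would be the F-measure of the summaries produced with weights `w`:

```python
import numpy as np

def ga_optimize_weights(fitness, n_weights, pop=30, gens=40, seed=0):
    """Minimal real-coded GA: evolve a population of feature-weight
    vectors in [0, 1] to maximise fitness(w)."""
    rng = np.random.default_rng(seed)
    P = rng.random((pop, n_weights))
    for _ in range(gens):
        f = np.array([fitness(w) for w in P])
        # binary tournament selection
        i, j = rng.integers(0, pop, (2, pop))
        parents = P[np.where(f[i] > f[j], i, j)]
        # uniform crossover with a shuffled set of mates
        mask = rng.random((pop, n_weights)) < 0.5
        children = np.where(mask, parents, parents[rng.permutation(pop)])
        # Gaussian mutation on ~20% of the genes, clipped back to [0, 1]
        mutate = rng.random(children.shape) < 0.2
        children = np.clip(children + mutate * rng.normal(0, 0.1, children.shape), 0, 1)
        children[0] = P[f.argmax()]       # elitism: carry over the best
        P = children
    f = np.array([fitness(w) for w in P])
    return P[f.argmax()], float(f.max())
```

    Any scalar scoring function works; the GA never needs gradients, which is what makes it suitable for a summarization F-measure objective.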

  9. The production route selection algorithm in virtual manufacturing networks

    Science.gov (United States)

    Krenczyk, D.; Skolud, B.; Olender, M.

    2017-08-01

    The increasing requirements and competition in the global market challenge companies' profitability in production and supply chain management. This situation became the basis for the construction of virtual organizations, which are created in response to temporary needs. The problem of production flow planning in virtual manufacturing networks is considered. In the paper, an algorithm is proposed for selecting a production route from the set of admissible routes that meets the technology and resource requirements, under the criterion of minimum cost.

  10. Spectrum Feature Retrieval and Comparison of Remote Sensing Images Using Improved ISODATA Algorithm

    Institute of Scientific and Technical Information of China (English)

    刘磊; 敬忠良; 肖刚

    2004-01-01

    Due to the large quantities of data and the high correlation among the spectra of remote sensing images, the K-L transformation is used to eliminate this correlation. An improved ISODATA (Iterative Self-Organizing Data Analysis Technique A) algorithm is used to extract the spectrum features of the images. The computation is greatly reduced and dynamic parameters are realized. The comparison of features between two images is carried out, and good results are achieved in simulation.

  11. An efficient stochastic approach for flow in porous media via sparse polynomial chaos expansion constructed by feature selection

    Science.gov (United States)

    Meng, Jin; Li, Heng

    2017-07-01

    An efficient method for uncertainty quantification of flow in porous media is studied in this paper, where the response surface of a sparse polynomial chaos expansion (PCE) is constructed with the aid of a feature selection method. The number of basis functions in a PCE grows exponentially as the random dimensionality increases, which makes the computational cost unaffordable in high-dimensional problems. In this study, a feature selection method is introduced to select the major stochastic features of the PCE by running a limited number of simulations, and the resultant PCE is termed a sparse PCE. Specifically, the least absolute shrinkage and selection operator modified least angle regression algorithm (LASSO-LAR) is applied for feature selection, and the selected features are assessed by cross-validation (CV). In addition, inherited samples are utilized to make the algorithm self-adaptive. We test the performance of the sparse PCE for uncertainty quantification of flow in heterogeneous media with different spatial variability. The statistical moments and probability density function of the output random field are accurately estimated by the sparse PCE, while the computational effort is greatly reduced compared to the Monte Carlo method.
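    The selection mechanism at the heart of this approach can be illustrated with plain coordinate-descent LASSO, used here as a simple stand-in for the paper's LASSO-LAR step (the function name and the solver choice are our assumptions): the L1 penalty shrinks most coefficients exactly to zero, so the surviving columns of the basis matrix are the "selected" PCE features.

```python
import numpy as np

def lasso_cd(Phi, y, lam, iters=200):
    """Coordinate-descent LASSO: minimise 0.5*||y - Phi @ beta||^2
    + lam*||beta||_1 by cycling soft-threshold updates over coordinates."""
    n, p = Phi.shape
    beta = np.zeros(p)
    col_sq = (Phi ** 2).sum(axis=0)
    for _ in range(iters):
        for j in range(p):
            # partial residual excluding feature j
            r = y - Phi @ beta + Phi[:, j] * beta[j]
            rho = Phi[:, j] @ r
            # soft-thresholding update
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return beta
```

    With only two active basis functions in a synthetic model, the fit recovers their coefficients (up to the usual LASSO shrinkage bias) and zeroes out the rest, which is exactly the sparsification effect exploited to keep the PCE tractable.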

  12. Feature selection for anomaly–based network intrusion detection using cluster validity indices

    CSIR Research Space (South Africa)

    Naidoo, Tyrone

    2015-09-01

    Full Text Available data, which is rarely available in operational networks. It uses normalized cluster validity indices as an objective function that is optimized over the search space of candidate feature subsets via a genetic algorithm. Feature sets produced...

  13. Feature Selection and Blind Source Separation in an EEG-Based Brain-Computer Interface

    Directory of Open Access Journals (Sweden)

    Michael H. Thaut

    2005-11-01

    Full Text Available Most EEG-based BCI systems make use of well-studied patterns of brain activity. However, those systems involve tasks that indirectly map to simple binary commands such as “yes” or “no” or require many weeks of biofeedback training. We hypothesized that signal processing and machine learning methods can be used to discriminate EEG in a direct “yes”/“no” BCI from a single session. Blind source separation (BSS) and spectral transformations of the EEG produced a 180-dimensional feature space. We used a modified genetic algorithm (GA) wrapped around a support vector machine (SVM) classifier to search the space of feature subsets. The GA-based search found feature subsets that outperform full feature sets and random feature subsets. Also, BSS transformations of the EEG outperformed the original time series, particularly in conjunction with a subset search of both spaces. The results suggest that BSS and feature selection can be used to improve the performance of even a “direct,” single-session BCI.

  14. Feature selection using angle modulated simulated Kalman filter for peak classification of EEG signals.

    Science.gov (United States)

    Adam, Asrul; Ibrahim, Zuwairie; Mokhtar, Norrima; Shapiai, Mohd Ibrahim; Mubin, Marizan; Saad, Ismail

    2016-01-01

    In existing research on peak classification of electroencephalogram (EEG) signals, the existing models, such as the Dumpala, Acir, Liu, and Dingle peak models, employ different sets of features. However, these models may not offer good performance for various applications, and their performance is found to be problem dependent. Therefore, the objective of this study is to combine all the associated features from the existing models before selecting the best combination of features. A new optimization algorithm, namely the angle modulated simulated Kalman filter (AMSKF), is employed as the feature selector. In addition, the neural network random weight method is utilized in the proposed AMSKF technique as a classifier. In the conducted experiment, 11,781 peak candidate samples are employed for validation. The samples are collected from three different peak event-related EEG signals of 30 healthy subjects: (1) single eye blink, (2) double eye blink, and (3) eye movement signals. The experimental results show that the proposed AMSKF feature selector is able to find the best combination of features and performs on par with existing related studies of epileptic EEG event classification.

  15. Materials Selection Criteria for Nuclear Power Applications: A Decision Algorithm

    Science.gov (United States)

    Rodríguez-Prieto, Álvaro; Camacho, Ana María; Sebastián, Miguel Ángel

    2016-02-01

    An innovative methodology based on stringency levels is proposed in this paper and improves the current selection method for structural materials used in demanding industrial applications. This paper describes a new approach for quantifying the stringency of materials requirements based on a novel deterministic algorithm to prevent potential failures. We have applied the new methodology to different standardized specifications used in pressure vessels design, such as SA-533 Grade B Cl.1, SA-508 Cl.3 (issued by the American Society of Mechanical Engineers), DIN 20MnMoNi55 (issued by the German Institute of Standardization) and 16MND5 (issued by the French Nuclear Commission) specifications and determine the influence of design code selection. This study is based on key scientific publications on the influence of chemical composition on the mechanical behavior of materials, which were not considered when the technological requirements were established in the aforementioned specifications. For this purpose, a new method to quantify the efficacy of each standard has been developed using a deterministic algorithm. The process of assigning relative weights was performed by consulting a panel of experts in materials selection for reactor pressure vessels to provide a more objective methodology; thus, the resulting mathematical calculations for quantitative analysis are greatly simplified. The final results show that steel DIN 20MnMoNi55 is the best material option. Additionally, more recently developed materials such as DIN 20MnMoNi55, 16MND5 and SA-508 Cl.3 exhibit mechanical requirements more stringent than SA-533 Grade B Cl.1. The methodology presented in this paper can be used as a decision tool in selection of materials for a wide range of applications.

  16. Applying a Locally Linear Embedding Algorithm for Feature Extraction and Visualization of MI-EEG

    Directory of Open Access Journals (Sweden)

    Mingai Li

    2016-01-01

    Full Text Available A robotic-assisted rehabilitation system based on a Brain-Computer Interface (BCI) is an applicable solution for stroke survivors with a poorly functioning hemiparetic arm. The key technique for such a rehabilitation system is the feature extraction of Motor Imagery Electroencephalography (MI-EEG), a nonlinear, time-varying, and nonstationary signal with remarkable time-frequency characteristics. Although a few studies have explored the nonlinear nature from the perspective of manifold learning, they hardly take full account of both the time-frequency features and the nonlinear nature. In this paper, a novel feature extraction method is proposed based on the Locally Linear Embedding (LLE) algorithm and the discrete wavelet transform (DWT). Multiscale, multiresolution analysis of the MI-EEG is implemented by the DWT. LLE is applied to the approximation components to extract the nonlinear features, and the statistics of the detail components are calculated to obtain the time-frequency features. The two feature sets are then combined serially. A backpropagation neural network, optimized by a genetic algorithm, is employed as a classifier to evaluate the effectiveness of the proposed method. The experimental results of 10-fold cross validation on a public BCI Competition dataset show that the nonlinear features visually display an obvious clustering distribution and that the fused features improve classification accuracy and stability. This paper thus successfully applies manifold learning in BCI.

  17. [Feature extraction for breast cancer data based on geometric algebra theory and feature selection using differential evolution].

    Science.gov (United States)

    Li, Jing; Hong, Wenxue

    2014-12-01

    Feature extraction and feature selection are important issues in pattern recognition. Based on the geometric algebra representation of vectors, a new feature extraction method using the blade coefficients of geometric algebra was proposed in this study. At the same time, an improved differential evolution (DE) feature selection method was proposed to address the resulting high-dimensionality issue. Simple linear discriminant analysis was used as the classifier. The 10-fold cross-validation (10-CV) classification accuracy on a public breast cancer biomedical dataset was more than 96%, superior to that of the original features and of traditional feature extraction methods.

  18. An efficient fractal image coding algorithm using unified feature and DCT

    Energy Technology Data Exchange (ETDEWEB)

    Zhou Yiming [Department of Automation, Tsinghua University, Beijing 100084 (China)], E-mail: zhouym02@mails.tsinghua.edu.cn; Zhang Chao; Zhang Zengke [Department of Automation, Tsinghua University, Beijing 100084 (China)

    2009-02-28

    Fractal image compression is a promising technique to improve the efficiency of image storage and transmission with a high compression ratio; however, the huge time consumption of fractal image coding is a great obstacle to practical applications. To improve fractal image coding, efficient algorithms using a special unified feature and a DCT coder are proposed in this paper. Firstly, based on a necessary condition of the best-matching search rule in fractal image coding, a fast algorithm using a special unified feature (UFC) is presented; it considerably reduces the search space and excludes most inappropriate matching subblocks before the best-matching search. Secondly, on the basis of the UFC algorithm, a DCT coder is incorporated to construct a hybrid fractal image coding algorithm (DUFC) that improves the quality of the reconstructed image. Experimental results show that the proposed algorithms obtain good reconstructed image quality and need much less time than the baseline fractal coding algorithm.

  19. Multiobjective optimization using an immunodominance and clonal selection inspired algorithm

    Institute of Scientific and Technical Information of China (English)

    GONG MaoGuo; JIAO LiCheng; MA WenPing; DU HaiFeng

    2008-01-01

    Based on the mechanisms of immunodominance and clonal selection theory, we propose a new multiobjective optimization algorithm, the immune dominance clonal multiobjective algorithm (IDCMA). IDCMA is unique in that the fitness values of currently dominated individuals are assigned as the values of a custom distance measure, termed Ab-Ab affinity, between the dominated individuals and one of the nondominated individuals found so far. According to the values of Ab-Ab affinity, all dominated individuals (antibodies) are divided into two kinds, subdominant antibodies and cryptic antibodies. Moreover, local search applies only to the subdominant antibodies, while the cryptic antibodies are redundant and have no function during local search, but they can become subdominant (active) antibodies during the subsequent evolution. Furthermore, a new immune operation, clonal proliferation, is provided to enhance local search. Using the clonal proliferation operation, IDCMA reproduces individuals and selects their improved matured progenies after local search, so single individuals can exploit their surrounding space effectively and the newcomers yield a broader exploration of the search space. The performance comparison of IDCMA with MISA, NSGA-II, SPEA, PAES, NSGA, VEGA, NPGA, and HLGA in solving six well-known multiobjective function optimization problems and nine multiobjective 0/1 knapsack problems shows that IDCMA performs well in converging to approximate Pareto-optimal fronts with a good distribution.

  20. PROSODIC FEATURE BASED TEXT DEPENDENT SPEAKER RECOGNITION USING MACHINE LEARNING ALGORITHMS

    Directory of Open Access Journals (Sweden)

    Sunil Agrawal

    2010-10-01

    Full Text Available Most of us are aware that the voices of different individuals do not sound alike. The ability to recognize a person solely from his or her voice is known as speaker recognition. Speaker recognition can not only assist in building better access control systems and security apparatus but can also be a useful tool in many other areas, such as forensic speech analysis. The choice of features plays an important role in the performance of a machine learning (ML) algorithm. Here we propose prosodic-feature-based text-dependent speaker recognition, where the prosodic features are extracted through linear predictive coding. Formants are efficient parameters for characterizing a speaker’s voice. The formants are combined with their corresponding amplitudes, the fundamental frequency, the duration of the speech utterance, and the energy of the windowed section. This feature vector is input to ML algorithms for recognition. We investigate the performance of four ML algorithms, namely MLP, RBFN, the C4.5 decision tree, and BayesNet. Among these, the C4.5 decision tree performs consistently. MLP performs better for gender recognition, and the experimental results show that RBFN gives better performance as the population size increases.

  1. VHDL implementation of feature-extraction algorithm for the PANDA electromagnetic calorimeter

    NARCIS (Netherlands)

    Guliyev, E.; Kavatsyuk, M.; Lemmens, P. J. J.; Tambave, G.; Löhner, H.

    2012-01-01

    A simple, efficient, and robust feature-extraction algorithm, developed for the digital front-end electronics of the electromagnetic calorimeter of the PANDA spectrometer at FAIR, Darmstadt, is implemented in VHDL for a commercial 16 bit 100 MHz sampling ADC. The source-code is available as an open-

  2. VHDL Implementation of Feature-Extraction Algorithm for the PANDA Electromagnetic Calorimeter

    NARCIS (Netherlands)

    Kavatsyuk, M.; Guliyev, E.; Lemmens, P. J. J.; Löhner, H.; Tambave, G.

    2010-01-01

    The feature-extraction algorithm, developed for the digital front-end electronics of the electromagnetic calorimeter of the PANDA detector at the future FAIR facility, is implemented in VHDL for a commercial 16 bit 100 MHz sampling ADC. The use of modified firmware with the running on-line data-proc

  3. Influence of Topological Features on Spatially-Structured Evolutionary Algorithms Dynamics

    CERN Document Server

    DeFelice, Matteo; Panzieri, Stefano

    2012-01-01

    In recent decades, complex networks theory has significantly influenced other disciplines in the modeling of both static and dynamic aspects of systems observed in nature. This work aims to investigate the effects of a network's topological features on the dynamics of an evolutionary algorithm, considering in particular the ability to find a large number of optima on multi-modal problems. We introduce a novel spatially-structured evolutionary algorithm and apply it to two combinatorial problems: ONEMAX and the multi-modal NMAX. Considering three different network models, we investigate the relationships between their features, the algorithm's convergence, and its ability to find multiple optima (for the multi-modal problem). For a deeper analysis, we investigate the introduction of weighted graphs with time-varying weights. The results show that networks with a large Average Path Length lead to a higher number of optima and consequently slow exploration dynamics (i.e. low First Hitting Time). Further...

  4. Feature Extraction of Localized Scattering Centers Using the Modified TLS-Prony Algorithm and Its Applications

    Institute of Scientific and Technical Information of China (English)

    王军

    2002-01-01

    This paper presents an all-parametric model of a radar target in the optical region, in which the localized scattering centers' frequency- and aspect-angle-dependent scattering levels, together with their distance and azimuth locations, are modeled as feature vectors, and the traditional TLS-Prony algorithm is modified to extract these feature vectors. The analysis of the Cramér-Rao bound shows that the modified algorithm not only relaxes the high signal-to-noise ratio (SNR) threshold of the traditional TLS-Prony algorithm, but is also suited to extracting large damping coefficients and to high-resolution estimation of closely spaced poles. Finally, an illustrative example is presented to verify its practicability in applications. The experimental results show that the developed method can not only recognize two airplane-like targets of similar shape at low SNR, but can also compress the original radar data with high fidelity.

  5. A Pattern Recognition Feature Optimization Tool Using the Visual Empirical Region of Influence Algorithm

    Energy Technology Data Exchange (ETDEWEB)

    MARTINEZ, RUBEL F.

    2002-06-01

    This document is the second in a series describing graphical user interface tools developed to control the Visual Empirical Region of Influence (VERI) algorithm. In this paper we describe a user interface designed to optimize the VERI algorithm's results. The optimization mode uses a brute-force method, searching through the combinations of features in a data set for those that produce the best pattern recognition results. With a small number of features in a data set, an exact solution can be determined. However, the number of possible combinations increases exponentially with the number of features, so an alternative means of finding a solution must be used. We developed and implemented a technique for finding solutions in data sets with both small and large numbers of features. This document illustrates step-by-step examples of how to use the interface and how to interpret the results. It is written in two parts: Part I deals with using the interface to find the best combination from all possible sets of features; Part II describes how to use the tool to find a good solution in data sets with a large number of features. The VERI Optimization Interface Tool was written in the Tcl/Tk Graphical User Interface (GUI) programming language, version 8.1. Although the Tcl/Tk packages are designed to run on multiple computer platforms, we have concentrated our efforts on developing a user interface for the ubiquitous DOS environment. The VERI algorithms are compiled, executable programs. The optimization interface executes the VERI algorithm in Leave-One-Out mode using the Euclidean metric. For a thorough description of the type of data analysis we perform, and for a general pattern recognition tutorial, refer to our website at: http://www.sandia.gov/imrl/XVisionScience/Xusers.htm.

  6. Image registration algorithm using Mexican hat function-based operator and grouped feature matching strategy.

    Directory of Open Access Journals (Sweden)

    Feng Jin

    Full Text Available Feature detection and matching are crucial for robust and reliable image registration. Although many methods have been developed, they commonly focus on only one class of image features, and methods that combine two or more classes of features remain novel and significant. In this work, methods for feature detection and matching are proposed. A Mexican hat function-based operator is used for image feature detection, covering both local area detection and feature point detection. For local area detection, we use the Mexican hat operator for image filtering, and the zero-crossing points are then extracted and merged into area borders. For feature point detection, the Mexican hat operator is applied in scale space to obtain the key points. After feature detection, image registration is achieved using the two classes of image features: the feature points are grouped according to a standardized region corresponding to the local area, and precise registration is eventually achieved with the grouped points. An image transformation matrix is estimated from the feature points in a region, and the best one is chosen through competition among a set of candidate transformation matrices. This strategy has been named Grouped Sample Consensus (GCS); it can also remove outliers effectively. The experimental results show that the proposed algorithm has high registration accuracy and low computational cost.

  7. An Algorithm of Image Contrast Enhancement Based on Pixels Neighborhood’s Local Feature

    Directory of Open Access Journals (Sweden)

    Chen Yan

    2013-12-01

    Full Text Available In this study, we propose an image contrast enhancement algorithm based on the local features of pixel neighborhoods, designed to acquire image edge information, remove ray-imaging noise, and overcome edge blurring and other defects. Using the local variance and a complexity function of each pixel's neighborhood, the method extracts edge features and applies contrast enhancement in varying degrees to neighborhoods with different characteristics, thereby achieving local feature enhancement. The simulation shows that the method not only enhances the contrast of the entire image but also effectively preserves image edge information and improves image quality.

  8. Automatic Correction Algorithm of Hydrology Feature Attributes in National Geographic Census

    Science.gov (United States)

    Li, C.; Guo, P.; Liu, X.

    2017-09-01

    A subset of the attributes of hydrologic feature data in the national geographic census is unclear; the current solution to this problem has been manual filling, which is inefficient and liable to mistakes. This paper therefore proposes an automatic correction algorithm for hydrologic feature attributes. Based on an analysis of the structural characteristics and topological relations, we put forward three basic principles of correction: network proximity, structural robustness, and topological ductility. Based on the WJ-III map workstation, we realize the automatic correction of hydrologic features. Finally, practical data are used to validate the method. The results show that our method is highly reasonable and efficient.

  9. An Organelle Correlation-Guided Feature Selection Approach for Classifying Multi-Label Subcellular Bio-images.

    Science.gov (United States)

    Shao, Wei; Liu, Mingxia; Xu, Ying-Ying; Shen, Hong-Bin; Zhang, Daoqiang

    2017-03-03

    Nowadays, with the advances in microscopic imaging, accurate classification of bioimage-based protein subcellular location patterns has attracted ever more attention. One of the basic challenges is how to select the useful feature components, among thousands of potential features, to describe the images. This is not an easy task, especially considering the high ratio of multi-location proteins. Existing feature selection methods seldom take the correlation among different cellular compartments into consideration, and thus may miss some features that are co-important for several subcellular locations. To deal with this problem, we make use of the important structural correlation among different cellular compartments and propose an organelle structural correlation regularized feature selection method, CSF (Common-Sets of Features), in this paper. We formulate the multi-label classification problem by adopting a group-sparsity regularizer to select common subsets of relevant features across different cellular compartments. In addition, we add a cell structural correlation regularized Laplacian term, which utilizes prior biological structural information to capture the intrinsic dependency among different cellular compartments. CSF provides a new feature selection strategy for multi-label bio-image subcellular pattern classification, and the experimental results show its superiority when compared with several existing algorithms.

  10. Microcanonical Annealing and Threshold Accepting for Parameter Determination and Feature Selection of Support Vector Machines

    Directory of Open Access Journals (Sweden)

    Seyyid Ahmed Medjahed

    2016-12-01

    Full Text Available Support vector machine (SVM) is a popular classification technique with many diverse applications. Parameter determination and feature selection significantly influence the classification accuracy rate and the quality of the SVM model. This paper proposes two novel approaches, based on Microcanonical Annealing (MA-SVM) and Threshold Accepting (TA-SVM), to determine the optimal parameter values and the relevant feature subset without reducing SVM classification accuracy. In order to evaluate the performance of MA-SVM and TA-SVM, several public datasets are employed to compute the classification accuracy rate. The proposed approaches were tested in the context of medical diagnosis, including DNA microarray datasets used for cancer diagnosis. The results obtained by the MA-SVM and TA-SVM algorithms are superior, performing well even on the DNA microarray datasets, which are characterized by large numbers of features. Therefore, the MA-SVM and TA-SVM approaches are well suited for parameter determination and feature selection in SVM.
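
    Threshold Accepting replaces simulated annealing's probabilistic acceptance with a deterministic rule: any candidate that worsens the objective by less than a shrinking threshold is accepted. As a minimal illustration only (not the paper's MA-SVM/TA-SVM procedure), the sketch below applies that rule to a toy two-parameter surface standing in for an SVM error landscape; the objective, step size, and cooling schedule are all assumptions.

```python
import random

def threshold_accepting(objective, init, neighbor, threshold=1.0, decay=0.95,
                        iters=500, seed=0):
    """Threshold Accepting: accept any move that worsens the objective
    by less than the current threshold; shrink the threshold each step."""
    rng = random.Random(seed)
    current, best = init, init
    f_cur, f_best = objective(init), objective(init)
    for _ in range(iters):
        cand = neighbor(current, rng)
        f_cand = objective(cand)
        if f_cand - f_cur < threshold:      # deterministic acceptance rule
            current, f_cur = cand, f_cand
            if f_cand < f_best:
                best, f_best = cand, f_cand
        threshold *= decay                  # cooling schedule
    return best, f_best

# Toy stand-in for the SVM error surface: minimum at (C, gamma) = (2, 0.5)
obj = lambda p: (p[0] - 2.0) ** 2 + (p[1] - 0.5) ** 2
step = lambda p, rng: (p[0] + rng.uniform(-0.2, 0.2), p[1] + rng.uniform(-0.2, 0.2))
best, f = threshold_accepting(obj, (0.0, 0.0), step)
```

    In a real TA-SVM setting the objective would be a cross-validated SVM error, and the candidate vector would also encode the feature subset.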

  11. An Enhanced Grey Wolf Optimization Based Feature Selection Wrapped Kernel Extreme Learning Machine for Medical Diagnosis

    Science.gov (United States)

    Li, Qiang; Zhao, Xuehua; Cai, ZhenNao; Tong, Changfei; Liu, Wenbin; Tian, Xin

    2017-01-01

    In this study, a new predictive framework is proposed by integrating an improved grey wolf optimization (IGWO) and a kernel extreme learning machine (KELM), termed IGWO-KELM, for medical diagnosis. The proposed IGWO feature selection approach is used to find the optimal feature subset for medical data. In this approach, a genetic algorithm (GA) is first adopted to generate diversified initial positions, and then grey wolf optimization (GWO) is used to update the current positions of the population in the discrete search space, thus obtaining the optimal feature subset for classification with KELM. The proposed approach is compared against the original GA and GWO on two common disease diagnosis problems in terms of a set of performance metrics, including classification accuracy, sensitivity, specificity, precision, G-mean, F-measure, and the size of the selected feature subset. The simulation results demonstrate the superiority of the proposed method over the two competitive counterparts. PMID:28246543
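
    In grey wolf optimization, the three best solutions so far (the alpha, beta, and delta wolves) jointly steer the rest of the pack, with an exploration coefficient that shrinks over the run. The paper's IGWO is a GA-seeded discrete variant wrapped around KELM; the sketch below shows only the standard continuous GWO position update on a toy sphere function, with population size, iteration budget, and bounds chosen arbitrarily.

```python
import random

def gwo(objective, dim, n_wolves=12, iters=100, lo=-5.0, hi=5.0, seed=1):
    """Minimal continuous grey wolf optimizer: each wolf moves toward the
    average of positions suggested by the three best wolves (alpha, beta, delta)."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    for t in range(iters):
        wolves.sort(key=objective)
        alpha, beta, delta = wolves[0], wolves[1], wolves[2]
        a = 2.0 * (1 - t / iters)           # exploration factor shrinks to 0
        for i in range(3, n_wolves):        # leaders are kept (elitism)
            new = []
            for d in range(dim):
                xs = []
                for leader in (alpha, beta, delta):
                    r1, r2 = rng.random(), rng.random()
                    A, C = a * (2 * r1 - 1), 2 * r2
                    xs.append(leader[d] - A * abs(C * leader[d] - wolves[i][d]))
                new.append(max(lo, min(hi, sum(xs) / 3.0)))
            wolves[i] = new
    return min(wolves, key=objective)

# Toy objective: sphere function, minimum at the origin
best = gwo(lambda x: sum(v * v for v in x), dim=3)
```

    A feature-selection variant would binarize each coordinate (keep/drop a feature) and score wolves by classifier accuracy instead of the sphere function.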

  12. Feature selection and classification of multiparametric medical images using bagging and SVM

    Science.gov (United States)

    Fan, Yong; Resnick, Susan M.; Davatzikos, Christos

    2008-03-01

    This paper presents a framework for brain classification based on multi-parametric medical images. The method exploits multi-parametric imaging to provide a set of discriminative features for classifier construction, using a regional feature extraction method that accounts for joint correlations among different image parameters; in the experiments herein, MRI and PET images of the brain are used. Support vector machine classifiers are then trained on the most discriminative features selected from the feature set. To facilitate robust classification and optimal selection of the parameters involved, in view of the well-known "curse of dimensionality", base classifiers are constructed in a bagging (bootstrap aggregating) framework to build an ensemble classifier, and the classification parameters of these base classifiers are optimized by maximizing the area under the ROC (receiver operating characteristic) curve estimated from their prediction performance on left-out samples of the bootstrap sampling. This classification system is tested on a sex classification problem, where it yields over 90% classification accuracy for unseen subjects. The proposed method is also compared with other commonly used classification algorithms, with favorable results. These results illustrate that methods built upon information jointly extracted from multi-parametric images have the potential to perform individual classification with high sensitivity and specificity.
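
    The core of the bagging scheme described above is (i) training base classifiers on bootstrap resamples and (ii) scoring predictions by the area under the ROC curve. A minimal sketch, with one-feature threshold stumps standing in for the paper's SVM base classifiers and AUC computed via the rank-sum statistic (the data below are invented):

```python
import random

def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) statistic."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bagged_stumps(x, y, n_models=25, seed=0):
    """Bagging with one-feature threshold stumps; each stump's threshold is
    the midpoint between the class means of its bootstrap sample."""
    rng = random.Random(seed)
    thresholds = []
    n = len(x)
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]          # bootstrap sample
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        m1 = sum(v for v, t in zip(xs, ys) if t == 1) / max(1, sum(ys))
        m0 = sum(v for v, t in zip(xs, ys) if t == 0) / max(1, len(ys) - sum(ys))
        thresholds.append((m0 + m1) / 2.0)
    # ensemble score = fraction of stumps voting "positive"
    return lambda v: sum(v > t for t in thresholds) / n_models

x = [0.1, 0.4, 0.35, 0.8, 0.9, 1.1, 0.2, 1.0]
y = [0, 0, 0, 1, 1, 1, 0, 1]
model = bagged_stumps(x, y)
scores = [model(v) for v in x]
```

    In the paper's setting, the AUC on the out-of-bag (left-out) samples would drive the choice of classifier parameters.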

  13. Ischemia episode detection in ECG using kernel density estimation, support vector machine and feature selection

    Directory of Open Access Journals (Sweden)

    Park Jinho

    2012-06-01

    Full Text Available Abstract Background Myocardial ischemia can develop into more serious diseases. Detecting the ischemic syndrome in the electrocardiogram (ECG) early, accurately, and automatically can prevent it from developing into a catastrophic disease. To this end, we propose a new method, which employs wavelets and simple feature selection. Methods For training and testing, the European ST-T database is used, which comprises 367 ischemic ST episodes in 90 records. We first remove baseline wandering and detect the time positions of QRS complexes by a method based on the discrete wavelet transform. Next, for each heart beat, we extract three features which can be used to differentiate ST episodes from normal: (1) the area between the QRS offset and T-peak points, (2) the normalized and signed sum from the QRS offset to the effective zero voltage point, and (3) the slope from the QRS onset to offset point. We average the feature values over five successive beats to reduce the effects of outliers. Finally we apply classifiers to those features. Results We evaluated the algorithm with kernel density estimation (KDE) and support vector machine (SVM) classifiers. Sensitivity and specificity for KDE were 0.939 and 0.912, respectively; the KDE classifier detects 349 of the 367 ischemic ST episodes. Sensitivity and specificity for SVM were 0.941 and 0.923, respectively; the SVM classifier detects 355 ischemic ST episodes. Conclusions We proposed a new method for detecting ischemia in ECG. It comprises signal processing techniques for removing baseline wandering and detecting the time positions of QRS complexes by the discrete wavelet transform, and explicit feature extraction from the morphology of ECG waveforms. It was shown that the selected features were sufficient to discriminate ischemic ST episodes from normal ones. We also showed how the proposed KDE classifier can automatically select kernel bandwidths, meaning that the algorithm does not require any numerical
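
    A KDE classifier of the kind evaluated above labels a sample by comparing class-conditional densities estimated with Gaussian kernels. A minimal one-dimensional sketch (the feature values and bandwidth below are illustrative, not the paper's ST-segment features or its automatic bandwidth selection):

```python
import math

def gaussian_kde(samples, bandwidth):
    """One-dimensional Gaussian kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return lambda x: norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                                for s in samples)

def kde_classifier(class0, class1, bandwidth=0.3):
    """Label a point by whichever class-conditional density is higher,
    mirroring the likelihood comparison used by KDE-based detectors."""
    d0, d1 = gaussian_kde(class0, bandwidth), gaussian_kde(class1, bandwidth)
    return lambda x: 1 if d1(x) > d0(x) else 0

normal = [0.0, 0.1, -0.2, 0.05, 0.15]      # invented feature values
ischemic = [1.0, 1.2, 0.9, 1.1, 0.95]
classify = kde_classifier(normal, ischemic)
```

    With multi-dimensional features the same comparison applies, using a product of per-feature kernels or a multivariate kernel.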

  14. Tracking features in retinal images of adaptive optics confocal scanning laser ophthalmoscope using KLT-SIFT algorithm.

    Science.gov (United States)

    Li, Hao; Lu, Jing; Shi, Guohua; Zhang, Yudong

    2010-06-28

    With the use of adaptive optics (AO), high-resolution microscopic imaging of the living human retina at the single-cell level has been achieved. In an adaptive optics confocal scanning laser ophthalmoscope (AOSLO) system with a small field size (about 1 degree, 280 μm), the motion of the eye severely affects the stabilization of the real-time video and results in significant distortions of the retina images. In this paper, the Scale-Invariant Feature Transform (SIFT) is used to extract stable point features from the retina images, and the Kanade-Lucas-Tomasi (KLT) algorithm is applied to track them. With the tracked features, the image distortion in each frame is removed by a second-order polynomial transformation, and 10 successive frames are co-added to enhance image quality. Features of special interest in an image can also be selected manually and tracked by KLT; as an example, a point on a cone is selected manually and the cone is tracked from frame to frame.

  15. Unbiased Prediction and Feature Selection in High-Dimensional Survival Regression

    Science.gov (United States)

    Laimighofer, Michael; Krumsiek, Jan; Theis, Fabian J.

    2016-01-01

    Abstract With widespread availability of omics profiling techniques, the analysis and interpretation of high-dimensional omics data, for example, for biomarkers, is becoming an increasingly important part of clinical medicine because such datasets constitute a promising resource for predicting survival outcomes. However, early experience has shown that biomarkers often generalize poorly. Thus, it is crucial that models are not overfitted and give accurate results with new data. In addition, reliable detection of multivariate biomarkers with high predictive power (feature selection) is of particular interest in clinical settings. We present an approach that addresses both aspects in high-dimensional survival models. Within a nested cross-validation (CV), we fit a survival model, evaluate a dataset in an unbiased fashion, and select features with the best predictive power by applying a weighted combination of CV runs. We evaluate our approach using simulated toy data, as well as three breast cancer datasets, to predict the survival of breast cancer patients after treatment. In all datasets, we achieve more reliable estimation of predictive power for unseen cases and better predictive performance compared to the standard CoxLasso model. Taken together, we present a comprehensive and flexible framework for survival models, including performance estimation, final feature selection, and final model construction. The proposed algorithm is implemented in an open source R package (SurvRank) available on CRAN. PMID:26894327
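
    The key point of the framework above is that hyperparameters are chosen on inner folds only, so the outer-fold score remains an unbiased estimate for unseen cases. A minimal sketch of nested cross-validation, using a simple decision-threshold "model" on synthetic one-dimensional data rather than a survival model (the data and grid are invented):

```python
import random

def kfold(items, k, seed=0):
    """Split items into k train/validation folds."""
    items = list(items)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

def accuracy(x, y, idx, thr):
    return sum((x[i] > thr) == (y[i] == 1) for i in idx) / len(idx)

def nested_cv(x, y, grid, k=3):
    """Nested cross-validation: the threshold is tuned on inner folds only,
    so the outer-fold score is an unbiased estimate for unseen data."""
    outer = []
    for tr, te in kfold(range(len(x)), k):
        # inner loop: pick the hyperparameter using training data only
        best = max(grid, key=lambda t: sum(accuracy(x, y, va, t)
                                           for _, va in kfold(tr, k, seed=1)))
        outer.append(accuracy(x, y, te, best))   # evaluate once on held-out fold
    return sum(outer) / len(outer)

x = [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4, 0.15, 1.15, 0.25, 1.25]
y = [0,   0,   0,   0,   1,   1,   1,   1,   0,    1,    0,    1]
est = nested_cv(x, y, grid=[0.05, 0.5, 0.75, 1.0, 1.35])
```

    In SurvRank's setting the inner loop would fit a survival model and rank features; the structure of the two loops is the same.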

  16. Selective Marketing for Retailers to promote Stock using improved Ant Colony Algorithm

    Directory of Open Access Journals (Sweden)

    S.SURIYA

    2013-10-01

    Full Text Available Data mining is a knowledge discovery process which deals with analysing large stores of data in order to identify the relevant data. It is a powerful tool to uncover relationships within the data. Association rule mining is an important data mining model for mining frequent items in huge repositories of data. It frames association rules with the help of minimum support and confidence values, which in turn paves the way to identify the occurrence of frequent item sets. Frequent pattern mining starts from the analysis of customers' buying habits, from which various associations between the different items that customers purchase are identified. With the help of such associations retailers perform selective marketing to promote their business. Biologically inspired algorithms have processes observed in nature as their origin. The best feature of the ant colony algorithm, a bio-inspired algorithm based on the behaviour of natural ant colonies, is its parallel search over the problem data and previously obtained results. Dynamic memory management is done by the pheromone updating operation: during each cycle, solutions are constructed by evaluating the transition probability through pheromone level modification. An improved pheromone updating rule is used to find all the frequent items. The proposed approach was tested using MATLAB along with the WEKA toolkit. The experimental results show that the stigmergic communication of the improved ant colony algorithm helps to mine the frequent items faster and more effectively than existing algorithms.
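
    The pheromone mechanics described above can be sketched in a few lines: ants choose items with probability proportional to pheromone times a heuristic (here, item support), and each cycle evaporates pheromone before depositing reinforcement on frequent picks. This is a toy loop, not the paper's improved updating rule; the support values, threshold, and parameters are invented.

```python
import random

def aco_select(items, support, n_ants=20, cycles=15, rho=0.3, q=1.0, seed=0):
    """Toy ant-colony loop: ants pick items with probability proportional to
    pheromone x heuristic (item support); pheromone evaporates each cycle and
    is reinforced on items whose support clears the minimum-support threshold."""
    rng = random.Random(seed)
    tau = {i: 1.0 for i in items}
    minsup = 0.5
    for _ in range(cycles):
        deposits = {i: 0.0 for i in items}
        for _ in range(n_ants):
            weights = [tau[i] * support[i] for i in items]
            total = sum(weights)
            r, acc, choice = rng.random() * total, 0.0, items[-1]
            for i, w in zip(items, weights):     # roulette-wheel transition
                acc += w
                if acc >= r:
                    choice = i
                    break
            if support[choice] >= minsup:        # reward frequent picks
                deposits[choice] += q * support[choice]
        for i in items:
            tau[i] = (1 - rho) * tau[i] + deposits[i]   # evaporation + deposit
    return tau

support = {"bread": 0.8, "milk": 0.7, "caviar": 0.1, "kale": 0.2}
tau = aco_select(list(support), support)
```

    After a few cycles the pheromone concentrates on the frequent items, which is what makes subsequent ant walks converge on them quickly.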

  17. Classification Features of US Images Liver Extracted with Co-occurrence Matrix Using the Nearest Neighbor Algorithm

    Science.gov (United States)

    Moldovanu, Simona; Bibicu, Dorin; Moraru, Luminita; Nicolae, Mariana Carmen

    2011-12-01

    The co-occurrence matrix has been applied successfully for echographic image characterization because it contains information about the spatial distribution of grey-scale levels in an image. The paper deals with the analysis of pixels in selected regions of interest of US images of the liver. The useful information obtained refers to texture features such as entropy, contrast, dissimilarity and correlation extracted with the co-occurrence matrix. The analyzed US images were grouped into two distinct sets: healthy liver and steatosis (fatty) liver. These two sets of echographic liver images build a database that includes only histologically confirmed cases: 10 images of healthy liver and 10 images of steatosis liver. The healthy subjects are used to compute the four textural indices and also serve as the control dataset. We chose to study this disease because steatosis is the abnormal retention of lipids in cells. The texture features are statistical measures and can be used to characterize the irregularity of tissues. The goal is to classify the extracted information using the Nearest Neighbor classification algorithm. The K-NN algorithm is a powerful tool for classifying texture features, grouping a training set built from healthy liver on the one hand and a holdout set built from the texture features of steatosis liver on the other. The results could be used to quantify the texture information and allow a clear discrimination between healthy and steatosis liver.
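
    The co-occurrence features named above (entropy, contrast, etc.) are derived from a normalized grey-level co-occurrence matrix. A minimal sketch for one pixel offset, computed on two tiny synthetic textures (a flat patch and a checkerboard; real GLCM pipelines average several offsets and quantize to more grey levels):

```python
import math

def glcm(image, levels, dx=1, dy=0):
    """Normalized grey-level co-occurrence matrix for one pixel offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]

def contrast(p):
    """Weighted squared grey-level difference of co-occurring pairs."""
    return sum(p[i][j] * (i - j) ** 2
               for i in range(len(p)) for j in range(len(p)))

def entropy(p):
    """Shannon entropy of the co-occurrence distribution, in bits."""
    return -sum(v * math.log2(v) for row in p for v in row if v > 0)

uniform = [[1, 1], [1, 1]]                       # flat texture
checker = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]      # maximally alternating texture
```

    The flat patch yields zero contrast and entropy, while the checkerboard pairs always differ, giving high values of both; these are the kinds of indices fed to the K-NN classifier above.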

  18. Online Learning of Hierarchical Pitman-Yor Process Mixture of Generalized Dirichlet Distributions With Feature Selection.

    Science.gov (United States)

    Fan, Wentao; Sallay, Hassen; Bouguila, Nizar

    2016-06-09

    In this paper, a novel statistical generative model based on the hierarchical Pitman-Yor process and generalized Dirichlet (GD) distributions is presented. The proposed model allows us to perform joint clustering and feature selection thanks to the interesting properties of the GD distribution. We develop an online variational inference algorithm for the resulting model, formulated in terms of the minimization of a Kullback-Leibler divergence, that tackles the problem of learning from high-dimensional examples. This variational Bayes formulation allows simultaneously estimating the parameters, determining the model's complexity, and selecting the relevant features for the clustering structure. Moreover, the proposed online learning algorithm allows data instances to be processed sequentially, which is critical for large-scale and real-time applications. Experiments on challenging applications, namely scene recognition and video segmentation, where our approach is viewed as an unsupervised technique for visual learning in high-dimensional spaces, show that the proposed approach is suitable and promising.

  19. An Optimal Remanufacturing Centre Selection Algorithm for Reverse Logistics Alliance

    Directory of Open Access Journals (Sweden)

    Uzma Hameed

    2013-07-01

    Full Text Available Reverse logistics has been an emerging field in both academic and applied research over the last two decades because of increasing consumer awareness, legislative initiatives and the profits associated with the reuse of products or components. The costs associated with reverse logistics are usually high and need to be minimized. The current study focuses on the formation of an alliance for cost reduction in reverse logistics. Remanufacturing, refurbishing, repair, cannibalization and reuse are the processes which add value to the reverse logistics system and are capable of converting it into a profitable venture. Used products contribute a cheaper source of the components and spares required to remanufacture a product, because the labor and material costs are lower than those of manufacturing new parts or products. When a defective part is removed from a product or assembly, the product can be restored to its original state of functionality: instead of purchasing a new product, the same one can be restored by a repair/remanufacture centre simply by replacing the defective part with a new part or spare. Furthermore, for manufacturers to reduce investments in reverse logistics, the formation of alliances and the sharing of remanufacturing facilities can lead to more profitability. In this study the focus is on the formation of a remanufacturing alliance, and an algorithm has been formulated for the selection of the optimal remanufacturing centre for the reverse logistics alliance. A case company has been selected from the emerging Chinese electronics manufacturing industry, and the case has been solved using the company's data set with the help of the formulated algorithm.

  20. Comparisons of feature extraction algorithm based on unmanned aerial vehicle image

    Science.gov (United States)

    Xi, Wenfei; Shi, Zhengtao; Li, Dongsheng

    2017-07-01

    Feature point extraction technology has become a research hotspot in photogrammetry and computer vision. The commonly used point feature extraction operators are the SIFT operator, Forstner operator, Harris operator and Moravec operator, etc. Owing to its high spatial resolution, unmanned aerial vehicle (UAV) imagery differs from traditional aerial imagery. Based on these characteristics of UAV imagery, this paper uses the operators listed above to extract feature points from building images, grassland images, shrubbery images, and vegetable greenhouse images. Through practical case analysis, the performance, advantages, disadvantages and adaptability of each algorithm are compared and analyzed with respect to speed and accuracy. Finally, suggestions on how to apply the different algorithms in diverse environments are proposed.

  1. Satellite Imagery Cadastral Features Extractions using Image Processing Algorithms: A Viable Option for Cadastral Science

    Directory of Open Access Journals (Sweden)

    Usman Babawuro

    2012-07-01

    Full Text Available Satellite images are used for feature extraction among other functions. They are used to extract linear features, like roads, etc. These linear feature extractions are important operations in computer vision, which has varied applications in photogrammetric, hydrographic, cartographic and remote sensing tasks. The extraction of linear features or boundaries defining the extents of land and land cover features is equally important in Cadastral Surveying, which is the cornerstone of any Cadastral System. A two-dimensional cadastral plan is a model which represents both the cadastral and geometrical information of a two-dimensional labeled image. This paper aims at using and widening the concepts of high resolution Satellite imagery data for extracting representations of cadastral boundaries using image processing algorithms, hence minimizing human intervention. The Satellite imagery is first rectified, establishing it in the correct orientation and spatial location for further analysis. We then employ the widely available Satellite imagery to extract the relevant cadastral features using computer vision and image processing algorithms. We evaluate the potential of using high resolution Satellite imagery to achieve the Cadastral goals of boundary detection and extraction of farmlands using image processing algorithms. This method proves effective as it minimizes the human errors associated with the manual Cadastral surveying method, hence providing another perspective on achieving cadastral goals as emphasized by the UN cadastral vision. Finally, as Cadastral science continues to look to the future, this research aims at analysis of, and insight into, the characteristics and potential role of computer vision algorithms using high resolution satellite imagery for a better digital Cadastre that would support improved socio-economic development.

  2. Study of Machine-Learning Classifier and Feature Set Selection for Intent Classification of Korean Tweets about Food Safety

    Directory of Open Access Journals (Sweden)

    Yeom, Ha-Neul

    2014-09-01

    Full Text Available In recent years, several studies have proposed making use of the Twitter micro-blogging service to track various trends in online media and discussion. In this study, we specifically examine the use of Twitter to track discussions of food safety in the Korean language. Given the irregularity of keyword use in most tweets, we focus on machine-learning classifiers and feature set selection to classify the collected tweets. We build classifier models using the Naive Bayes, Naive Bayes Multinomial, Support Vector Machine, and Decision Tree algorithms, all of which show good performance. To select an optimum feature set, we construct a basic feature set as a standard for performance comparison, against which further test feature sets can be evaluated. Experiments show that precision and F-measure performance are best when using a Naive Bayes Multinomial classifier model with a test feature set defined by extracting Substantive, Predicate, Modifier, and Interjection parts of speech.

  3. Digital watermarking algorithm based on scale-invariant feature regions in non-subsampled contourlet transform domain

    Institute of Scientific and Technical Information of China (English)

    Jian Zhao; Na Zhang; Jian Jia; Huanwei Wang

    2015-01-01

    Addressing the need for robust digital watermarks in the copyright protection field, a new digital watermarking algorithm in the non-subsampled contourlet transform (NSCT) domain is proposed. The largest-energy sub-band after NSCT is selected for watermark embedding. The watermark is embedded into scale-invariant feature transform (SIFT) regions. During embedding, the initial region is divided into cirque sub-regions of equal area, and each watermark bit is embedded into one sub-region. Extensive simulation results and comparisons show that the algorithm achieves a good trade-off between invisibility, robustness and capacity, preserving image quality while effectively resisting common image processing, geometric and combination attacks, with high normalized similarity almost always achieved.

  4. Mutual information-based feature selection for low-cost BCIs based on motor imagery.

    Science.gov (United States)

    Schiatti, L; Faes, L; Tessadori, J; Barresi, G; Mattos, L

    2016-08-01

    In the present study a feature selection algorithm based on mutual information (MI) was applied to electroencephalographic (EEG) data acquired during three different motor imagery tasks from two datasets: Dataset I from BCI Competition IV, including full-scalp recordings from four subjects, and new data recorded from three subjects using the popular low-cost Emotiv EPOC EEG headset. The aim was to evaluate the optimal channels and band-power (BP) features for motor imagery task discrimination, in order to assess the feasibility of a portable low-cost motor imagery based Brain-Computer Interface (BCI) system. The minimal subset of features most relevant to the task description and least redundant with each other was determined, and the corresponding classification accuracy was assessed offline employing a linear support vector machine (SVM) in a 10-fold cross-validation scheme. The analysis was performed: (a) on the original full Dataset I from BCI Competition IV, (b) on a restricted channel set from Dataset I corresponding to the available Emotiv EPOC electrode locations, and (c) on data recorded with the EPOC system. Results from (a) showed that an offline classification accuracy above 80% can be reached using only 5 features. Limiting the analysis to the EPOC channels caused a decrease in classification accuracy, although it still remained above chance level, both for data from (b) and (c). A top accuracy of 70% was achieved using 2 optimal features. These results encourage further research towards the development of portable low-cost motor imagery-based BCI systems.
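
    Mutual-information feature ranking of the kind used above scores each (discretized) feature by its MI with the class labels and keeps the top scorers; the redundancy term that penalizes correlated features is omitted here for brevity. A minimal sketch on invented band-power bins:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information between two discrete sequences, in bits."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def rank_features(features, labels):
    """Rank (discretized) features by relevance to the class labels."""
    scored = [(mutual_information(col, labels), name)
              for name, col in features.items()]
    return [name for _, name in sorted(scored, reverse=True)]

labels   = [0, 0, 0, 0, 1, 1, 1, 1]           # two motor imagery classes
features = {
    "mu_band":  [0, 0, 0, 1, 1, 1, 1, 1],     # informative band-power bin
    "noise_ch": [0, 1, 0, 1, 1, 0, 1, 0],     # irrelevant channel
}
order = rank_features(features, labels)
```

    Continuous band-power values would first be binned (e.g. by quantiles) before the counts above are taken.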

  5. Classifying human voices by using hybrid SFX time-series preprocessing and ensemble feature selection.

    Science.gov (United States)

    Fong, Simon; Lan, Kun; Wong, Raymond

    2013-01-01

    Voice biometrics is a physiological characteristic: each person's voice is unique. Owing to this uniqueness, voice classification has found useful applications in classifying speakers' gender, mother tongue or ethnicity (accent), and emotional state, as well as in identity verification, verbal command control, and so forth. In this paper, we adopt a new preprocessing method named Statistical Feature Extraction (SFX) for extracting important features for training a classification model, based on piecewise transformation treating an audio waveform as a time series. Using SFX we can faithfully remodel the statistical characteristics of the time series; together with spectral analysis, a substantial number of features are extracted in combination. An ensemble is utilized in selecting only the influential features to be used in classification model induction. We focus on comparing the effects of various popular data mining algorithms on multiple datasets. Our experiment consists of classification tests over four typical categories of human voice data, namely Female and Male, Emotional Speech, Speaker Identification, and Language Recognition. The experiments yield encouraging results supporting the fact that heuristically choosing significant features from both the time and frequency domains indeed produces better performance in voice classification than traditional signal processing techniques alone, like wavelets and LPC-to-CC.

  6. A feature matching and fusion-based positive obstacle detection algorithm for field autonomous land vehicles

    Directory of Open Access Journals (Sweden)

    Tao Wu

    2017-03-01

    Full Text Available Positive obstacles can damage field robots travelling in the field, and the field autonomous land vehicle is a typical field robot. This article presents a feature matching and fusion-based algorithm to detect obstacles using LiDARs for field autonomous land vehicles. There are three main contributions: (1) A novel setup method for a compact LiDAR is introduced; this method improves the LiDAR data density and reduces the blind region of the LiDAR sensor. (2) A mathematical model is deduced under this new setup method, and the ideal scan line is generated using the deduced model. (3) Based on the proposed mathematical model, a feature matching and fusion (FMAF) based algorithm is presented, which is employed to detect obstacles. Experimental results show that the performance of the proposed algorithm is robust and stable, and the computing time is reduced by two orders of magnitude compared with other existing algorithms. This algorithm has been successfully applied to our autonomous land vehicle, which won the champion's title in the Chinese "Overcome Danger 2014" ground unmanned vehicle challenge.

  7. Brake fault diagnosis using Clonal Selection Classification Algorithm (CSCA) – A statistical learning approach

    Directory of Open Access Journals (Sweden)

    R. Jegadeeshwaran

    2015-03-01

    Full Text Available In an automobile, the brake system is an essential part responsible for control of the vehicle. Any failure in the brake system affects the vehicle's motion and can have catastrophic effects on vehicle and passenger safety, so the brake system plays a vital role in an automobile and its condition monitoring is essential. Vibration-based condition monitoring using machine learning techniques is gaining momentum, and this study is one such attempt to perform condition monitoring of a hydraulic brake system through vibration analysis. In this research, the performance of a Clonal Selection Classification Algorithm (CSCA) for brake fault diagnosis is reported. A hydraulic brake system test rig was fabricated, and under good and faulty conditions of the brake system the vibration signals were acquired using a piezoelectric transducer. Statistical parameters were extracted from the vibration signal, and the best feature set was identified for classification using an attribute evaluator. The selected features were then classified using the CSCA, and the classification accuracy of this artificial intelligence technique was compared with other machine learning approaches and discussed. The Clonal Selection Classification Algorithm performs better and gives the maximum classification accuracy (96%) for the fault diagnosis of a hydraulic brake system.

  8. Intelligent feature selection techniques for pattern classification of Lamb wave signals

    Energy Technology Data Exchange (ETDEWEB)

    Hinders, Mark K.; Miller, Corey A. [College of William and Mary, Department of Applied Science, Williamsburg, Virginia 23187-8795 (United States)

    2014-02-18

    Lamb wave interaction with flaws is a complex, three-dimensional phenomenon, which often frustrates signal interpretation schemes based on mode arrival time shifts predicted by dispersion curves. As flaw severity increases, scattering and mode conversion effects will often dominate the time-domain signals, obscuring available information about flaws because multiple modes may arrive on top of each other. Even for idealized flaw geometries the scattering and mode conversion behavior of Lamb waves is very complex. Here, multi-mode Lamb waves in a metal plate are propagated across a rectangular flat-bottom hole in a sequence of pitch-catch measurements corresponding to the double crosshole tomography geometry. The flaw is sequentially deepened, with the Lamb wave measurements repeated at each flaw depth. Lamb wave tomography reconstructions are used to identify which waveforms have interacted with the flaw and thereby carry information about its depth. Multiple features are extracted from each of the Lamb wave signals using wavelets, and these are then fed to statistical pattern classification algorithms that identify flaw severity. To achieve the highest classification accuracy an optimal feature space is required, but it is never known a priori which features are going to be best. For structural health monitoring we make use of the fact that physical flaws, such as corrosion, will only grow over time. This allows us to identify feature vectors which are topologically well behaved, by requiring that sequential classes "line up" in feature vector space. An intelligent feature selection routine is illustrated that identifies favorable class distributions in multi-dimensional feature spaces using computational homology theory. Betti numbers and formal classification accuracies are calculated for each feature space subset to establish a correlation between the topology of the class distribution and the corresponding classification accuracy.

  9. Determination of Selection Method in Genetic Algorithm for Land Suitability

    Directory of Open Access Journals (Sweden)

    Irfianti Asti Dwi

    2016-01-01

    Full Text Available The Genetic Algorithm is one alternative solution in the fields of optimization modeling, automatic programming and machine learning. The purpose of the study was to compare several selection methods in the Genetic Algorithm for land suitability; the contribution of this research is applying the best method to develop region-based horticultural commodities. The testing is done by comparing three selection methods: Roulette Wheel, Tournament Selection and Stochastic Universal Sampling. The location parameters used in the first test scenario include Temperature = 27°C, Rainfall = 1200 mm, Humidity = 30%, Cluster fruit = 4, Crossover Probability (Pc) = 0.6, Mutation Probability (Pm) = 0.2 and Epoch = 10. The second test includes the location parameters Temperature = 30°C, Rainfall = 2000 mm, Humidity = 35%, Cluster fruit = 5, Crossover Probability (Pc) = 0.7, Mutation Probability (Pm) = 0.3 and Epoch = 10. The conclusion of this study is that the Roulette Wheel is the best method because it produces a more stable and higher fitness value than the other two methods.
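
    Two of the three selection methods compared above can be sketched directly: roulette wheel selection draws individuals with probability proportional to fitness, while tournament selection returns the fittest of a small random sample, which applies stronger selection pressure. (Stochastic Universal Sampling is roulette wheel selection with equally spaced pointers, omitted here.) The population and fitness function below are toy assumptions:

```python
import random

def roulette_wheel(pop, fitness, rng):
    """Select one individual with probability proportional to its fitness
    (assumes non-negative fitness values)."""
    total = sum(fitness(p) for p in pop)
    r, acc = rng.random() * total, 0.0
    for p in pop:
        acc += fitness(p)
        if acc >= r:
            return p
    return pop[-1]

def tournament(pop, fitness, rng, size=3):
    """Pick the fittest of `size` randomly drawn individuals."""
    return max(rng.sample(pop, size), key=fitness)

rng = random.Random(42)
pop = [1, 2, 3, 4, 5, 6, 7, 8]      # toy individuals; fitness = value
fit = lambda p: float(p)
roulette_picks = [roulette_wheel(pop, fit, rng) for _ in range(2000)]
tourney_picks = [tournament(pop, fit, rng) for _ in range(2000)]
```

    On this toy population, tournament selection yields a higher mean fitness among selected individuals than roulette wheel selection, illustrating its stronger selection pressure; which pressure is "best" depends on the problem, as the study's comparison shows.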

  10. Hybrid genetic algorithm approach for selective harmonic control

    Energy Technology Data Exchange (ETDEWEB)

    Dahidah, Mohamed S.A. [Faculty of Engineering, Multimedia University, 63100, Jalan Multimedia-Cyberjaya, Selangor (Malaysia); Agelidis, Vassilios G. [School of Electrical and Information Engineering, The University of Sydney, NSW (Australia); Rao, Machavaram V. [Faculty of Engineering and Technology, Multimedia University, 75450, Jalan Ayer Keroh Lama-Melaka (Malaysia)

    2008-02-15

    The paper presents an optimal solution for a selective harmonic elimination pulse width modulated (SHE-PWM) technique suitable for a high power inverter used in constant frequency utility applications. The main challenge in solving the associated non-linear equations, which are transcendental in nature and therefore have multiple solutions, is convergence; an initial point selected considerably close to the exact solution is therefore required. The paper discusses an efficient hybrid real coded genetic algorithm (HRCGA) that significantly reduces the computational burden, resulting in fast convergence. An objective function describing a measure of the effectiveness of eliminating selected orders of harmonics while controlling the fundamental, namely a weighted total harmonic distortion (WTHD), is derived, and a comparison of different operating points is reported. It is observed that the method was able to find the optimal solution for a modulation index higher than unity. The theoretical considerations reported in this paper are verified through simulation and experimentally on a low power laboratory prototype. (author)
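The transcendental SHE equations can be written down explicitly for a quarter-wave-symmetric waveform. The sketch below uses the common textbook Fourier form of the harmonic amplitudes (the paper's exact formulation, WTHD weighting, and HRCGA coding are not reproduced here; the angles shown are arbitrary, not a solved set) to evaluate the kind of objective a genetic algorithm would minimize:

```python
import math

def harmonic_amplitude(k, angles):
    """k-th odd harmonic of a quarter-wave-symmetric SHE-PWM waveform.

    Common Fourier form: b_k = (4 / (k*pi)) * sum_i (-1)**(i+1) * cos(k * a_i),
    with switching angles a_1 < a_2 < ... < a_N in (0, pi/2).
    """
    return (4.0 / (k * math.pi)) * sum(
        (-1) ** i * math.cos(k * a) for i, a in enumerate(angles)
    )

def she_objective(angles, m_index, eliminate=(5, 7)):
    """GA fitness: track the fundamental, drive selected harmonics to zero."""
    err = (harmonic_amplitude(1, angles) - m_index) ** 2
    err += sum(harmonic_amplitude(k, angles) ** 2 for k in eliminate)
    return err

angles = [0.2, 0.5, 0.9]  # arbitrary illustrative angles, not a solved set
print(harmonic_amplitude(1, angles), she_objective(angles, m_index=0.8))
```

Because `cos(k * a_i)` couples all angles nonlinearly, the objective has many local minima, which is why the paper emphasizes good initial points and hybridizes the GA for fast convergence.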

  11. New evolutions in TRNSYS : a selection of version 16 features

    Energy Technology Data Exchange (ETDEWEB)

    Bradley, D. [Thermal Energy System Specialists, Madison, WI (United States); Kummert, M. [Wisconsin Univ., Madison, WI (United States). Solar Energy Laboratory

    2005-07-01

    TRNSYS is a transient energy simulation package that has undergone continuous improvement since its development in 1975. TRNSYS was initially developed for the simulation of solar thermal processes, but has since expanded into a total energy modeling package. It models each component of an energy system as an individual black box component. Simulating a system involves connecting the inputs and outputs of the components to one another. If certain models are missing, they are quickly developed and added to the package by the international group of developers and users, which includes the Solar Energy Laboratory at the University of Wisconsin in Madison, United States, the Centre Scientifique et Technique du Batiment in Nice, France, and Transsolar Energietechnik GmbH in Stuttgart, Germany. This paper presented some of the issues faced by the users in updating the TRNSYS simulation tool to meet the challenges posed by new technologies and to make use of better algorithms and updated computing resources. In particular, it focused on adding new component models to the program, on increasing the ease of use of the program, and on continuing the trend of moving TRNSYS from an academic research tool to a manageable commercial tool. A subset of the features added to the sixteenth version of the simulation package in November 2004 was presented. These include modeling the energy transfer between a conditioned building and the surrounding ground, implementing ASHRAE's effective Heat Flow method into TRNSYS, and implementing combined thermal/air flow simulations using a software link between TRNSYS and COMIS or CONTAM for the air flow simulation. A brief description of the hydrogen system components which model hydrogen power systems was also included, along with graphical interface enhancements and a description of simulation engine modifications such as starting time and the drop-in dynamic link libraries (DLL). 12 refs., 5 figs.

  12. Frequency selective surface structure optimized by genetic algorithm

    Institute of Scientific and Technical Information of China (English)

    Lu Jun; Wang Jian-Bo; Sun Guan-Cheng

    2009-01-01

    Frequency selective surface (FSS) is a two-dimensional periodic structure which has prominent band-pass or band-stop characteristics when interacting with electromagnetic waves. In this paper, the thickness, the dielectric constant, the element graph and the arrangement periodicity of an FSS medium are investigated by a Genetic Algorithm (GA) when an electromagnetic wave is incident on the FSS at a wide angle, and an optimized FSS structure and its transmission characteristics are obtained. The results show that the optimized structure has better stability with respect to the incident angle of the electromagnetic wave and preserves the stability of the centre frequency even at an incident angle as large as 80°, thereby laying the foundation for the application of FSS to curved surfaces at wide angles.

  13. Noise reduction in selective computational ghost imaging using genetic algorithm

    Science.gov (United States)

    Zafari, Mohammad; Ahmadi-Kandjani, Sohrab; Kheradmand, Reza

    2017-03-01

    Recently, we presented a selective computational ghost imaging (SCGI) method as an advanced technique for enhancing the security level of encrypted ghost images. In this paper, we propose a modified method to improve the quality of ghost images reconstructed by the SCGI technique. The method is based on background subtraction using a genetic algorithm (GA), which eliminates background noise and gives background-free ghost images. Analysis of the universal image quality index using experimental data proves the advantage of this modification. In particular, the calculated value of the image quality index for modified SCGI over 4225 realizations shows an 11-fold improvement with respect to the SCGI technique. The improvement is 20-fold in comparison with the conventional CGI technique.

  14. Regularized F-Measure Maximization for Feature Selection and Classification

    Directory of Open Access Journals (Sweden)

    Zhenqiu Liu

    2009-01-01

    benchmark, methylation, and high-dimensional microarray data show that the performance of the proposed algorithm is better than or equivalent to other popular classifiers in limited experiments.

  15. A Retrieval Algorithm of Sheet Metal Parts Based on Relationships of Features

    Institute of Scientific and Technical Information of China (English)

    WANG Dawei; YAN Guangrong; LEI Yi; ZHANG Jiaying

    2012-01-01

    With the rapid increase in the number of three-dimensional (3D) models each year, quickly and easily finding the desired part has become a big challenge for enterprises. Meanwhile, many methods and algorithms have been proposed for part retrieval. However, most of the existing methods are designed for mechanical parts and do not work well for sheet metal part retrieval. An approach to feature-based retrieval of sheet metal parts is presented. Firstly, the features frequently used in sheet metal part design are chosen as the "key words" in retrieval. Based on those features, a relative position model is built to express the different relationships of the features in 3D space. Secondly, a description method for the model is studied. With this description method, the relative position of features in sheet metal parts can be expressed by four location description matrices. Thirdly, based on the relative position model and the location description matrices, the equivalent definition of relationships between two feature groups is given, which is the basis for calculating the similarity of two sheet metal parts. Next, the formula of the retrieval algorithm for sheet metal parts is given. Finally, a prototype system is developed to test and verify the effectiveness of the suggested retrieval method. Experiments verify that the new method is able to meet the requirements of searching sheet metal parts and shows potential for practical application.

  16. A Novel Semi-blind Watermarking Algorithm Based on Fractal Dimension and Image Feature

    Institute of Scientific and Technical Information of China (English)

    NIRongrong; RUANQiuqi

    2004-01-01

    This paper presents a novel semi-blind watermarking algorithm based on fractal dimension and image features. An original image is divided into blocks of fixed size. Following the idea of second generation watermarking [1], the image is analyzed using fractal dimension to obtain its feature blocks containing edges and textures, which are used in the later embedding process and to form a feature label. The watermark, which is the fusion of the feature label and a binary copyright symbol, not only represents the copyright symbol but also reflects the feature of the image. An Arnold iteration transform is employed to increase the security of the watermark. Then, the DCT (Discrete cosine transform) is applied to the feature blocks. The secure watermark, adaptive to the individual image, is embedded into the relations between middle-frequency coefficients and the corresponding DC coefficients. The detection and extraction procedure is a semi-blind one, which uses the watermark rather than the original image. Only those who have the original watermark and the key can detect and extract the right watermark. This makes the approach authentic and gives it a high security level. Experimental results show that this algorithm achieves good perceptual invisibility, adaptability and security, and that it is robust against cropping, scribbling, low- or high-pass filtering, added noise and JPEG compression.

  17. Feature selection of seismic waveforms for long period event detection at Cotopaxi Volcano

    Science.gov (United States)

    Lara-Cueva, R. A.; Benítez, D. S.; Carrera, E. V.; Ruiz, M.; Rojo-Álvarez, J. L.

    2016-04-01

    Volcano Early Warning Systems (VEWS) have become a research topic in order to prevent the loss of human lives and material damage. In this setting, event detection criteria based on classification using machine learning techniques have proven useful, and a number of systems have been proposed in the literature. However, to the best of our knowledge, no comprehensive and principled study has been conducted to compare the influence of the many different sets of possible features that have been used as input spaces in previous works. We present an automatic recognition system of volcano seismicity, considering feature extraction, event classification, and subsequent event detection, in order to reduce the processing time as a first step towards a highly reliable automatic detection system in real time. We compiled and extracted a comprehensive set of temporal, moving average, spectral, and scale-domain features for separating long period seismic events from background noise. We benchmarked two usual kinds of feature selection techniques, namely filter (mutual information and statistical dependence) and embedded (cross-validation and pruning), each of them using suitable classification algorithms such as k Nearest Neighbors (k-NN) and Decision Trees (DT). We applied this approach to the seismicity presented at Cotopaxi Volcano in Ecuador during 2009 and 2010. The best results were obtained by using a 15 s segmentation window, a feature matrix in the frequency domain, and a DT classifier, yielding 99% detection accuracy and sensitivity. Selected features and their interpretation were consistent among different input spaces, in simple terms of amplitude and spectral content. Our study provides the framework for an event detection system with high accuracy and reduced computational requirements.
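A filter technique of the mutual-information kind benchmarked here can be sketched with a simple histogram estimator: each candidate feature is scored by how much information it carries about the event/noise label, independently of any classifier. The bin count and toy data below are illustrative choices, not the paper's setup:

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of I(X; Y) in bits between a feature and labels."""
    joint, _, _ = np.histogram2d(x, y, bins=(bins, len(np.unique(y))))
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0  # avoid log(0) on empty cells
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

# Toy example: feature 0 tracks the class, feature 1 is pure noise.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 200)
informative = y + rng.normal(0, 0.3, size=400)
noise = rng.normal(0, 1.0, size=400)
scores = [mutual_information(f, y) for f in (informative, noise)]
print(scores)  # the informative feature scores markedly higher
```

Ranking features by such a score and keeping the top few is what makes filter methods fast: no classifier has to be retrained per candidate subset, unlike the embedded techniques also benchmarked in the study.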

  18. Microarray data classification using the spectral-feature-based TLS ensemble algorithm.

    Science.gov (United States)

    Sun, Zhan-Li; Wang, Han; Lau, Wai-Shing; Seet, Gerald; Wang, Danwei; Lam, Kin-Man

    2014-09-01

    The reliable and accurate identification of cancer categories is crucial to a successful diagnosis and a proper treatment of the disease. In most existing work, samples of gene expression data are treated as one-dimensional signals, and are analyzed by means of some statistical signal processing techniques or intelligent computation algorithms. In this paper, from an image-processing viewpoint, a spectral-feature-based Tikhonov-regularized least-squares (TLS) ensemble algorithm is proposed for cancer classification using gene expression data. In the TLS model, a test sample is represented as a linear combination of the atoms of a dictionary. Two types of dictionaries, namely singular value decomposition (SVD)-based eigenassays and independent component analysis (ICA)-based eigenassays, are proposed for the TLS model, and both are extracted via a two-stage approach. The proposed algorithm is inspired by our finding that, among these eigenassays, the categories of some of the testing samples can be assigned correctly by using the TLS models formed from some of the spectral features, but not for those formed from the original samples only. In order to retain the positive characteristics of these spectral features in making correct category assignments, a strategy of classifier committee learning (CCL) is designed to combine the results obtained from the different spectral features. Experimental results on standard databases demonstrate the feasibility and effectiveness of the proposed method.

  19. Simple mineral mapping algorithm based on multitype spectral diagnostic absorption features: a case study at Cuprite, Nevada

    Science.gov (United States)

    Wei, Jing; Ming, Yanfang; Jia, Qiang; Yang, Dongxu

    2017-04-01

    Hyperspectral remote sensing has been widely used in mineral identification using the particularly useful short-wave infrared (SWIR) wavelengths (1.0 to 2.5 μm). Current mineral mapping methods are easily limited by the sensor's radiometric sensitivity and atmospheric effects. Therefore, a simple mineral mapping algorithm (SMMA) based on the combined application of multitype diagnostic SWIR absorption features for hyperspectral data is proposed. A total of nine absorption features are calculated, respectively, from the airborne visible/infrared imaging spectrometer (AVIRIS) data, the Hyperion hyperspectral data, and the ground reference spectra collected from the United States Geological Survey (USGS) spectral library. Based on spectral analysis and statistics, a mineral mapping decision-tree model for the Cuprite mining district in Nevada, USA, is constructed. Then, the SMMA algorithm is used to perform mineral mapping experiments. The mineral map from the USGS (USGS map) in the Cuprite area is selected for validation purposes. Results showed that the SMMA algorithm is able to identify most minerals with high coincidence with the USGS map. Compared with the Hyperion data (overall accuracy = 74.54%), the AVIRIS data showed better overall mineral mapping results (overall accuracy = 94.82%), owing to its higher signal-to-noise ratio and spatial resolution.

  20. Feature Extraction and Selection Scheme for Intelligent Engine Fault Diagnosis Based on 2DNMF, Mutual Information, and NSGA-II

    Directory of Open Access Journals (Sweden)

    Peng-yuan Liu

    2016-01-01

    Full Text Available A novel feature extraction and selection scheme is presented for intelligent engine fault diagnosis by utilizing two-dimensional nonnegative matrix factorization (2DNMF), mutual information, and the nondominated sorting genetic algorithm II (NSGA-II). Experiments are conducted on an engine test rig, in which eight different engine operating conditions, including one normal condition and seven fault conditions, are simulated, to evaluate the presented feature extraction and selection scheme. In the feature extraction phase, the S transform technique is first utilized to convert the engine vibration signals to the time-frequency domain, which can provide richer information on engine operating conditions. Then a novel feature extraction technique, named two-dimensional nonnegative matrix factorization, is employed for characterizing the time-frequency representations. In the feature selection phase, a hybrid filter and wrapper scheme based on mutual information and NSGA-II is utilized to acquire a compact feature subset for engine fault diagnosis. Experimental results with three different classifiers demonstrate that the proposed feature extraction and selection scheme achieves very satisfying classification performance with fewer features for engine fault diagnosis.
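The NSGA-II component rests on non-dominated sorting: candidate feature subsets are ranked into Pareto fronts over competing objectives such as classification error and subset size. A compact sketch (a quadratic-time version rather than the fast sort of the original NSGA-II; the objective values are illustrative):

```python
def dominates(p, q):
    """p dominates q if p is no worse in every objective and strictly
    better in at least one (minimization in all objectives)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def non_dominated_sort(points):
    """Partition objective vectors into successive Pareto fronts, NSGA-II style."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        front = [
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        ]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

# Objectives: (classification error, number of selected features) -- minimize both.
points = [(0.10, 5), (0.12, 3), (0.10, 3), (0.20, 8), (0.05, 9)]
print(non_dominated_sort(points))
```

The first front contains the trade-off solutions no other subset beats on both objectives at once; NSGA-II breeds preferentially from earlier fronts.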

  1. Road network selection for small-scale maps using an improved centrality-based algorithm

    Directory of Open Access Journals (Sweden)

    Roy Weiss

    2014-12-01

    Full Text Available The road network is one of the key feature classes in topographic maps and databases. In the task of deriving road networks for products at smaller scales, road network selection forms a prerequisite for all other generalization operators, and is thus a fundamental operation in the overall process of topographic map and database production. The objective of this work was to develop an algorithm for automated road network selection from a large-scale (1:10,000 to a small-scale database (1:200,000. The project was pursued in collaboration with swisstopo, the national mapping agency of Switzerland, with generic mapping requirements in mind. Preliminary experiments suggested that a selection algorithm based on betweenness centrality performed best for this purpose, yet also exposed problems. The main contribution of this paper thus consists of four extensions that address deficiencies of the basic centrality-based algorithm and lead to a significant improvement of the results. The first two extensions improve the formation of strokes concatenating the road segments, which is crucial since strokes provide the foundation upon which the network centrality measure is computed. Thus, the first extension ensures that roundabouts are detected and collapsed, avoiding interruptions of strokes by roundabouts, while the second introduces additional semantics in the process of stroke formation, allowing longer and more plausible strokes to be built. The third extension detects areas of high road density (i.e., urban areas) using density-based clustering and then locally increases the threshold of the centrality measure used to select road segments, such that more thinning takes place in those areas. Finally, since the basic algorithm tends to create dead-ends, which are not tolerated in small-scale maps, the fourth extension reconnects these dead-ends to the main network, searching for the best path in the main heading of the dead-end.
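The betweenness centrality underlying the basic selection algorithm can be computed with Brandes' algorithm on an unweighted stroke graph; strokes carrying many shortest paths score high and survive the thinning. A minimal sketch on a toy path graph (node names are illustrative, and real road networks would use weighted strokes rather than single nodes):

```python
from collections import deque

def betweenness(adj):
    """Brandes' betweenness centrality for an unweighted, undirected graph
    given as {node: [neighbors]}."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}; sigma[s] = 1   # shortest-path counts
        dist = {v: -1 for v in adj}; dist[s] = 0
        queue = deque([s])
        while queue:                                # BFS from source s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                                # back-propagate dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2 for v, c in bc.items()}        # undirected: halve double count

# A tiny path "road": the middle node carries the most shortest paths.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
print(betweenness(adj))
```

Selection then amounts to keeping strokes whose centrality exceeds a threshold, which the paper's third extension raises locally in dense urban areas.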

  2. A wrapper-based approach for feature selection and classification of major depressive disorder-bipolar disorders.

    Science.gov (United States)

    Tekin Erguzel, Turker; Tas, Cumhur; Cebi, Merve

    2015-09-01

    Feature selection (FS) and classification are consecutive artificial intelligence (AI) methods used in data analysis, pattern classification, data mining and medical informatics. Besides promising studies applying AI methods to health informatics, working with more informative features is crucial in order to contribute to early diagnosis. Being one of the prevalent psychiatric disorders, depressive episodes of bipolar disorder (BD) are often misdiagnosed as major depressive disorder (MDD), leading to suboptimal therapy and poor outcomes. Therefore, discriminating MDD and BD at earlier stages of illness could help facilitate efficient and specific treatment. In this study, a nature-inspired and novel FS algorithm based on standard Ant Colony Optimization (ACO), called improved ACO (IACO), was used to reduce the number of features by removing irrelevant and redundant data. The selected features were then fed into a support vector machine (SVM), a powerful mathematical tool for data classification, regression, function estimation and modeling processes, in order to classify MDD and BD subjects. The proposed method used coherence, a promising quantitative electroencephalography (EEG) biomarker, with values calculated from the alpha, theta and delta frequency bands. The noteworthy performance of the novel IACO-SVM approach showed that it is possible to discriminate 46 BD and 55 MDD subjects using 22 of 48 features with 80.19% overall classification accuracy. The performance of the IACO algorithm was also compared to that of standard ACO, genetic algorithm (GA) and particle swarm optimization (PSO) algorithms in terms of classification accuracy and number of selected features. In order to provide an almost unbiased estimate of classification error, the validation process was performed using a nested cross-validation (CV) procedure.
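The nested cross-validation procedure mentioned at the end can be sketched as two loops: inner folds choose the hyperparameters, outer folds estimate error, so no test fold ever influences a tuning decision. The toy 1-D data and the k-NN model below are illustrative stand-ins for the IACO-SVM pipeline:

```python
import random
import statistics

def knn_predict(train, k, x):
    """Majority vote among the k nearest training points (1-D toy data)."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = [label for _, label in neighbors]
    return max(set(votes), key=votes.count)

def kfold(data, folds, rng):
    data = data[:]
    rng.shuffle(data)
    return [data[i::folds] for i in range(folds)]

def accuracy(train, test, k):
    return sum(knn_predict(train, k, x) == y for x, y in test) / len(test)

def nested_cv(data, ks=(1, 3, 5), outer=5, inner=3, seed=0):
    """Outer folds estimate error; inner folds pick k. Almost unbiased,
    since hyperparameter choice never sees the outer test fold."""
    rng = random.Random(seed)
    outer_folds = kfold(data, outer, rng)
    scores = []
    for i, test in enumerate(outer_folds):
        train = [p for j, f in enumerate(outer_folds) if j != i for p in f]
        inner_folds = kfold(train, inner, rng)
        def inner_score(k):
            return statistics.mean(
                accuracy([p for j, f in enumerate(inner_folds) if j != i2 for p in f], f2, k)
                for i2, f2 in enumerate(inner_folds)
            )
        best_k = max(ks, key=inner_score)
        scores.append(accuracy(train, test, best_k))
    return statistics.mean(scores)

# Toy 1-D data: two well-separated classes.
rng = random.Random(1)
data = [(rng.gauss(0, 1), 0) for _ in range(60)] + [(rng.gauss(5, 1), 1) for _ in range(60)]
print(round(nested_cv(data), 2))
```

Reporting the mean of the outer-fold scores, rather than the inner-fold tuning scores, is what avoids the optimistic bias of selecting hyperparameters on the same data used for the error estimate.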

  3. Localization of neural efficiency of the mathematically gifted brain through a feature subset selection method.

    Science.gov (United States)

    Zhang, Li; Gan, John Q; Wang, Haixian

    2015-10-01

    Based on the neural efficiency hypothesis and task-induced EEG gamma-band response (GBR), this study investigated the brain regions where neural resources could be most efficiently recruited by math-gifted adolescents in response to varying cognitive demands. In this experiment, various GBR-based mental states were generated with three factors (level of mathematical ability, task complexity, and short-term learning) modulating the level of neural activation. A feature subset selection method based on the sequential forward floating search algorithm was used to identify an "optimal" combination of EEG channel locations, where the corresponding GBR feature subset could obtain the highest accuracy in discriminating pairwise mental states influenced by each experiment factor. The integrative results from multi-factor selections suggest that the right-lateral fronto-parietal system is highly involved in the neural efficiency of the math-gifted brain, primarily including the bilateral superior frontal, right inferior frontal, right-lateral central and right temporal regions. By means of the localization method based on single-trial classification of mental states, new GBR features and EEG channel-based brain regions related to mathematical giftedness were identified, which could be useful for improving the brain function of children/adolescents in mathematical learning through brain-computer interface systems.
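The sequential forward floating search used here alternates greedy additions with conditional backtracking removals, escaping traps that a purely forward search falls into. A schematic sketch (the subset-scoring function is a made-up toy with hand-picked merits and a redundancy penalty, not a GBR classifier):

```python
from itertools import combinations

def sffs(features, score, target_size):
    """Sequential forward floating search (Pudil-style sketch): greedily add
    the best feature, then keep removing features while removal beats the
    best subset previously seen at that smaller size."""
    selected = []
    best_at_size = {}
    while len(selected) < target_size:
        # Forward step: add the feature that helps most.
        add = max((f for f in features if f not in selected),
                  key=lambda f: score(selected + [f]))
        selected.append(add)
        best_at_size[len(selected)] = score(selected)
        # Floating step: conditional backward removals.
        while len(selected) > 2:
            drop = max(selected, key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != drop]
            if score(reduced) > best_at_size.get(len(reduced), float("-inf")):
                selected = reduced
                best_at_size[len(selected)] = score(selected)
            else:
                break
    return selected

# Toy score: individual merits minus a penalty for redundant pairs
# (all values are invented for illustration).
merit = {"A": 3.0, "B": 2.8, "C": 2.5, "D": 1.0}
redundant = {frozenset({"A", "B"}): 2.5}  # A and B largely duplicate each other
def score(subset):
    pair_penalty = sum(redundant.get(frozenset(p), 0.0) for p in combinations(subset, 2))
    return sum(merit[f] for f in subset) - pair_penalty

print(sffs(list(merit), score, target_size=3))
```

In the study, `features` would be EEG channels and `score` the pairwise mental-state classification accuracy of the corresponding GBR feature subset.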

  4. A multi-feature tracking algorithm enabling adaptation to context variations

    CERN Document Server

    Chau, Duc Phu; Thonnat, Monique

    2011-01-01

    We propose in this paper a tracking algorithm which is able to adapt itself to different scene contexts. A feature pool is used to compute the matching score between two detected objects. This feature pool includes 2D and 3D displacement distances, 2D sizes, color histogram, histogram of oriented gradients (HOG), color covariance and dominant color. An offline learning process is proposed to search for useful features and to estimate their weights for each context. In the online tracking process, a temporal window is defined to establish the links between the detected objects. This makes it possible to find the object trajectories even if the objects are misdetected in some frames. A trajectory filter is proposed to remove noisy trajectories. Experiments on different contexts are presented. The proposed tracker has been tested in videos belon