WorldWideScience

Sample records for based feature selection

  1. A Genetic Algorithm-Based Feature Selection

    Directory of Open Access Journals (Sweden)

    Babatunde Oluleye

    2014-07-01

    This article details the exploration and application of a Genetic Algorithm (GA) for feature selection. In particular, a binary GA was used for dimensionality reduction to enhance the performance of the classifiers concerned. In this work, one hundred (100) features were extracted from a set of images found in the Flavia dataset (a publicly available dataset). The extracted features are Zernike Moments (ZM), Fourier Descriptors (FD), Legendre Moments (LM), Hu 7 Moments (Hu7M), Texture Properties (TP) and Geometrical Properties (GP). The main contributions of this article are (1) detailed documentation of the GA Toolbox in MATLAB and (2) the development of a GA-based feature selector using a novel fitness function (kNN-based classification error), which enabled the GA to obtain a combinatorial set of features giving rise to optimal accuracy. The results were compared with various feature selectors from the WEKA software and were, in many respects, better than those of the WEKA feature selectors in terms of classification accuracy.
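
    The idea of a binary GA driven by a kNN classification error can be sketched as follows. This is a minimal illustration in plain Python, not the MATLAB GA Toolbox code described in the article: the function names, the tournament selection, the one-point crossover, the toy data and all parameter values are assumptions chosen for brevity.

```python
import random

def knn_error(X, y, mask, k=3):
    """Leave-one-out k-NN error using only the features whose mask bit is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 1.0  # an empty feature subset cannot classify anything
    errors = 0
    for i, xi in enumerate(X):
        neighbours = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in cols), y[o])
            for o, xo in enumerate(X) if o != i
        )
        votes = [label for _, label in neighbours[:k]]
        predicted = max(sorted(set(votes)), key=votes.count)
        errors += predicted != y[i]
    return errors / len(X)

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Binary GA: each chromosome is a 0/1 mask over the features."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def fitness(mask):  # lower is better (classification error)
        key = tuple(mask)
        if key not in cache:
            cache[key] = knn_error(X, y, mask)
        return cache[key]

    # Start from random masks plus the all-features mask as a baseline.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    best = min(pop, key=fitness)
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: the best mask always survives
        while len(new_pop) < pop_size:
            p1 = min(rng.sample(pop, 2), key=fitness)  # tournament selection
            p2 = min(rng.sample(pop, 2), key=fitness)
            cut = rng.randrange(1, n)                  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]
            new_pop.append(child)
        pop = new_pop
        best = min(pop, key=fitness)
    return best, fitness(best)

# Hypothetical toy data: feature 0 tracks the class, features 1-2 are noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(20)]
X = [[label + data_rng.gauss(0, 0.1), data_rng.gauss(0, 1), data_rng.gauss(0, 1)]
     for label in y]
mask, err = ga_select(X, y)
print(mask, err)
```

    Because the all-features mask is in the initial population and the best mask is carried over unchanged, the returned error can never be worse than the error obtained with every feature enabled.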

  2. Feature subset selection based on relevance

    Science.gov (United States)

    Wang, Hui; Bell, David; Murtagh, Fionn

    In this paper an axiomatic characterisation of feature subset selection is presented. Two axioms are presented: the sufficiency axiom (preservation of learning information) and the necessity axiom (minimising encoding length). The sufficiency axiom concerns the existing dataset and is derived from the following understanding: any selected feature subset should be able to describe the training dataset without losing information, i.e. it is consistent with the training dataset. The necessity axiom concerns predictability and is derived from Occam's razor, which states that the simplest among different alternatives is preferred for prediction. The two axioms are then restated in terms of relevance in a concise form: maximising both the r(X; Y) and r(Y; X) relevance. Based on the relevance characterisation, four feature subset selection algorithms are presented and analysed: one is exhaustive and the remaining three are heuristic. Experimentation is also presented and the results are encouraging. Comparison is also made with some well-known feature subset selection algorithms, in particular with the built-in feature selection mechanism in C4.5.

  3. Feature selection with neighborhood entropy-based cooperative game theory.

    Science.gov (United States)

    Zeng, Kai; She, Kun; Niu, Xinzheng

    2014-01-01

    Feature selection plays an important role in machine learning and data mining. In recent years, various feature measurements have been proposed to select significant features from high-dimensional datasets. However, most traditional feature selection methods ignore features that have strong classification ability as a group but are weak as individuals. To deal with this problem, we redefine the redundancy, interdependence, and independence of features by using neighborhood entropy. Then the neighborhood entropy-based feature contribution is proposed under the framework of cooperative game theory. The evaluation criterion for a feature can be formalized as the product of its contribution and other classical feature measures. Finally, the proposed method is tested on several UCI datasets. The results show that the neighborhood entropy-based cooperative game theory model (NECGT) yields better performance than classical ones.

  4. Feature Selection for Neural Network Based Stock Prediction

    Science.gov (United States)

    Sugunnasil, Prompong; Somhom, Samerkae

    We propose a new methodology of feature selection for stock movement prediction. The methodology is based upon finding those features which minimize the correlation relation function. We first produce all combinations of features and evaluate each of them using our evaluation function. We then search through the generated set with a hill-climbing approach. A self-organizing map based stock prediction model is used as the prediction method. We conduct the experiment on data sets of the Microsoft Corporation, General Electric Co. and Ford Motor Co. The results show that our feature selection method can improve the efficiency of neural network based stock prediction.

  5. Feature Selection for Image Retrieval based on Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Preeti Kushwaha

    2016-12-01

    This paper describes the development and implementation of feature selection for content-based image retrieval (CBIR). We are working on a CBIR system with a new, efficient technique. The system uses multi-feature extraction covering colour, texture and shape. Three techniques are used for feature extraction: colour moments, the gray-level co-occurrence matrix and the edge histogram descriptor. To reduce the curse of dimensionality and find the best features in the feature set, feature selection based on a genetic algorithm (GA) is used. The selected features are divided into classes of similar images by clustering, which speeds up retrieval and improves execution time. Clustering is done with the k-means algorithm. The experimental results show that feature selection using the GA reduces retrieval time and increases retrieval precision, thus giving better and faster results than a normal image retrieval system. The results also show the precision and recall of the proposed approach compared with a previous approach for each image class. The CBIR system is more efficient and performs better when using feature selection based on the genetic algorithm.

  6. Lazy learner text categorization algorithm based on embedded feature selection

    Institute of Scientific and Technical Information of China (English)

    Yan Peng; Zheng Xuefeng; Zhu Jianyong; Xiao Yunhong

    2009-01-01

    To avoid the curse of dimensionality, text categorization (TC) algorithms based on machine learning (ML) have to use a feature selection (FS) method to reduce the dimensionality of the feature space. Although widely used, the FS process generally causes information loss and therefore has considerable side effects on the overall performance of TC algorithms. On the basis of the sparsity of text vectors, a new TC algorithm based on lazy feature selection (LFS) is presented. As a new type of embedded feature selection approach, the LFS method can greatly reduce the dimension of the feature space without any information loss, which can greatly improve both the efficiency and the performance of algorithms. The experiments show that the new algorithm simultaneously achieves much higher performance and efficiency than some other classical TC algorithms.

  7. [Electroencephalogram Feature Selection Based on Correlation Coefficient Analysis].

    Science.gov (United States)

    Zhou, Jinzhi; Tang, Xiaofang

    2015-08-01

    In order to improve classification accuracy with a small amount of motor imagery training data in the development of brain-computer interface (BCI) systems, we proposed an analysis method that automatically selects the characteristic parameters based on correlation coefficient analysis. Using the five sample data of dataset IVa from the 2005 BCI Competition, we applied the short-time Fourier transform (STFT) and correlation coefficient calculation to reduce the dimension of the raw electroencephalogram, then performed feature extraction based on the common spatial pattern (CSP) and classification by linear discriminant analysis (LDA). Simulation results showed that the average classification accuracy was higher with the correlation coefficient feature selection method than without it. Compared with a support vector machine (SVM) feature optimization algorithm, correlation coefficient analysis leads to a better selection of parameters and thus improves classification accuracy.
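
    A generic correlation-coefficient filter can be sketched as below. The abstract does not specify how the coefficients are computed on the STFT output, so this sketch simply ranks each feature by the absolute Pearson correlation with the class label; the function names and the toy data are assumptions, not the authors' pipeline.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (v - mean_b) for x, v in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant sequence carries no linear information
    return cov / math.sqrt(var_a * var_b)

def top_k_by_correlation(X, y, k):
    """Return the indices of the k features most correlated with y, plus all scores."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores

# Hypothetical toy data: only feature 0 varies with the label.
X = [[1, 5, 0], [2, 3, 1], [3, 6, 0], [4, 2, 1]]
y = [0, 0, 1, 1]
selected, scores = top_k_by_correlation(X, y, k=1)
print(selected, scores)
```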

  8. Simultaneous Channel and Feature Selection of Fused EEG Features Based on Sparse Group Lasso

    Directory of Open Access Journals (Sweden)

    Jin-Jia Wang

    2015-01-01

    Feature extraction and classification of EEG signals are core parts of brain computer interfaces (BCIs). Due to the high dimension of the EEG feature vector, an effective feature selection algorithm has become an integral part of research studies. In this paper, we present a new method based on a wrapped Sparse Group Lasso for channel and feature selection of fused EEG signals. The high-dimensional fused features are first obtained; they include the power spectrum, time-domain statistics, AR model, and wavelet coefficient features extracted from the preprocessed EEG signals. The wrapped channel and feature selection method is then applied, which uses a logistic regression model with a Sparse Group Lasso penalty function. The model is fitted on the training data, and parameter estimation is obtained by modified blockwise coordinate descent and coordinate gradient descent methods. The best parameters and feature subset are selected using 10-fold cross-validation. Finally, the test data are classified using the trained model. Compared with existing channel and feature selection methods, results show that the proposed method is more suitable, more stable, and faster for high-dimensional feature fusion. It can simultaneously achieve channel and feature selection with a lower error rate. The test accuracy on the data from the international BCI Competition IV reached 84.72%.
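
    For orientation, a common textbook form of the sparse group lasso objective for logistic regression is shown below. The abstract does not give the exact weighting, so the symbols are assumptions: \ell(\beta) is the logistic log-likelihood over n training trials, \beta^{(g)} collects the coefficients of group g (for example, all features of one EEG channel), p_g is the size of that group, and \lambda_1, \lambda_2 are the two regularization parameters.

```latex
\min_{\beta}\; -\frac{1}{n}\,\ell(\beta)
  \;+\; \lambda_1 \sum_{g=1}^{G} \sqrt{p_g}\,\bigl\lVert \beta^{(g)} \bigr\rVert_2
  \;+\; \lambda_2 \,\lVert \beta \rVert_1
```

    The group term can set a whole channel to zero, while the \ell_1 term can drop individual features inside a channel that is kept, which is what allows channel and feature selection to happen at the same time.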

  9. Simultaneous channel and feature selection of fused EEG features based on Sparse Group Lasso.

    Science.gov (United States)

    Wang, Jin-Jia; Xue, Fang; Li, Hui

    2015-01-01

    Feature extraction and classification of EEG signals are core parts of brain computer interfaces (BCIs). Due to the high dimension of the EEG feature vector, an effective feature selection algorithm has become an integral part of research studies. In this paper, we present a new method based on a wrapped Sparse Group Lasso for channel and feature selection of fused EEG signals. The high-dimensional fused features are first obtained; they include the power spectrum, time-domain statistics, AR model, and wavelet coefficient features extracted from the preprocessed EEG signals. The wrapped channel and feature selection method is then applied, which uses a logistic regression model with a Sparse Group Lasso penalty function. The model is fitted on the training data, and parameter estimation is obtained by modified blockwise coordinate descent and coordinate gradient descent methods. The best parameters and feature subset are selected using 10-fold cross-validation. Finally, the test data are classified using the trained model. Compared with existing channel and feature selection methods, results show that the proposed method is more suitable, more stable, and faster for high-dimensional feature fusion. It can simultaneously achieve channel and feature selection with a lower error rate. The test accuracy on the data from the international BCI Competition IV reached 84.72%.

  10. Feature selection gait-based gender classification under different circumstances

    Science.gov (United States)

    Sabir, Azhin; Al-Jawad, Naseer; Jassim, Sabah

    2014-05-01

    This paper proposes a gender classification method based on human gait features and investigates two variations in addition to the normal gait sequence: clothing (wearing coats) and carrying a bag. The feature vectors in the proposed system are constructed after applying a wavelet transform. Three different feature sets are proposed. The first is the spatio-temporal distance, which deals with the distances between different parts of the human body (such as feet, knees, hands, height and shoulders) during one gait cycle. The second and third feature sets are constructed from the approximation and non-approximation coefficients of the human body, respectively. To extract these two feature sets, we divided the human body into upper and lower parts based on the golden ratio proportion. We adopted a statistical method for constructing the feature vector from the above sets. The dimension of the constructed feature vector is reduced using the Fisher score as a feature selection method to optimize its discriminating significance. Finally, k-Nearest Neighbor is applied as the classification method. Experimental results demonstrate that our approach covers a more realistic scenario and performs relatively better than existing approaches.
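
    The Fisher score used for dimensionality reduction can be illustrated with a short sketch. It uses the usual definition (between-class scatter of a feature divided by its within-class scatter); the function name and the toy data are assumptions, and the paper's wavelet-based gait features are not reproduced here.

```python
def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter / within-class scatter."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = within = 0.0
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            mean_c = sum(values) / len(values)
            var_c = sum((v - mean_c) ** 2 for v in values) / len(values)
            between += len(values) * (mean_c - overall_mean) ** 2
            within += len(values) * var_c
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfectly separating or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores

# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 10.0], [2.0, 20.0], [5.0, 10.0], [6.0, 20.0]]
y = [0, 0, 1, 1]
scores = fisher_scores(X, y)
print(scores)
```

    Features are then ranked by score and the highest-scoring ones are kept before the k-Nearest Neighbor classifier is applied.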

  11. Using PSO-Based Hierarchical Feature Selection Algorithm

    Directory of Open Access Journals (Sweden)

    Zhiwei Ji

    2014-01-01

    Hepatocellular carcinoma (HCC) is one of the most common malignant tumors. Clinical symptoms attributable to HCC are usually absent, so the best therapeutic opportunities are often missed. Traditional Chinese Medicine (TCM) plays an active role in the diagnosis and treatment of HCC. In this paper, we proposed a particle swarm optimization-based hierarchical feature selection (PSOHFS) model to infer potential syndromes for the diagnosis of HCC. Firstly, the hierarchical feature representation is developed as a three-layer tree. The clinical symptoms and the positive score of the patient are the leaf nodes and the root of the tree, respectively, while each syndrome feature on the middle layer is extracted from a group of symptoms. Secondly, an improved PSO-based algorithm is applied in the new reduced feature space to search for an optimal syndrome subset. Based on the result of feature selection, the causal relationships of symptoms and syndromes are inferred via Bayesian networks. In our experiment, 147 symptoms were aggregated into 27 groups and 27 syndrome features were extracted. The proposed approach discovered 24 syndromes, which clearly improved the diagnosis accuracy. Finally, the Bayesian approach was applied to represent the causal relationships at both the symptom and syndrome levels. The results show that our computational model can facilitate the clinical diagnosis of HCC.

  12. Feature selection using feature dissimilarity measure and density-based clustering: Application to biological data

    Indian Academy of Sciences (India)

    Debarka Sengupta; Indranil Aich; Sanghamitra Bandyopadhyay

    2015-10-01

    Reduction of dimensionality has emerged as a routine process in modelling complex biological systems. A large number of feature selection techniques have been reported in the literature to improve model performance in terms of accuracy and speed. In the present article an unsupervised feature selection technique is proposed, using maximum information compression index as the dissimilarity measure and the well-known density-based cluster identification technique DBSCAN for identifying the largest natural group of dissimilar features. The algorithm is fast and less sensitive to the user-supplied parameters. Moreover, the method automatically determines the required number of features and identifies them. We used the proposed method for reducing dimensionality of a number of benchmark data sets of varying sizes. Its performance was also extensively compared with some other well-known feature selection methods.

  13. Unsupervised Feature Selection Based on the Morisita Index

    Science.gov (United States)

    Golay, Jean; Kanevski, Mikhail

    2016-04-01

    Recent breakthroughs in technology have radically improved our ability to collect and store data. As a consequence, the size of datasets has been increasing rapidly both in terms of number of variables (or features) and number of instances. Since the mechanism of many phenomena is not well known, too many variables are sampled. A lot of them are redundant and contribute to the emergence of three major challenges in data mining: (1) the complexity of result interpretation, (2) the necessity to develop new methods and tools for data processing, (3) the possible reduction in the accuracy of learning algorithms because of the curse of dimensionality. This research deals with a new algorithm for selecting the smallest subset of features conveying all the information of a dataset (i.e. an algorithm for removing redundant features). It is a new version of the Fractal Dimensionality Reduction (FDR) algorithm [1] and it relies on two ideas: (a) In general, data lie on non-linear manifolds of much lower dimension than that of the spaces where they are embedded. (b) The situation described in (a) is partly due to redundant variables, since they do not contribute to increasing the dimension of manifolds, called Intrinsic Dimension (ID). The suggested algorithm implements these ideas by selecting only the variables influencing the data ID. Unlike the FDR algorithm, it resorts to a recently introduced ID estimator [2] based on the Morisita index of clustering and to a sequential forward search strategy. Consequently, in addition to its ability to capture non-linear dependences, it can deal with large datasets and its implementation is straightforward in any programming environment. Many real world case studies are considered. They are related to environmental pollution and renewable resources. References [1] C. Traina Jr., A.J.M. Traina, L. Wu, C. Faloutsos, Fast feature selection using fractal dimension, in: Proceedings of the XV Brazilian Symposium on Databases, SBBD, pp. 158

  14. Dominant Local Binary Pattern Based Face Feature Selection and Detection

    Directory of Open Access Journals (Sweden)

    Kavitha.T

    2010-04-01

    Face detection plays a major role in biometrics. Feature selection is a problem of formidable complexity. This paper proposes a novel approach to extracting face features for face detection. The LBP features can be extracted quickly in a single scan through the raw image and lie in a lower-dimensional space, whilst still retaining facial information efficiently. The LBP features are robust to low-resolution images. The dominant local binary pattern (DLBP) is used to extract features accurately. A number of trainable methods are emerging in empirical practice due to their effectiveness. The proposed method is a trainable system for selecting face features from over-complete dictionaries of image measurements. After the feature selection procedure is completed, the SVM classifier is used for face detection. The main advantage of this proposal is that it is trained on a very small training set. The classifier is used to increase the selection accuracy. This is advantageous not only for facilitating the data-gathering stage but, more importantly, for limiting the training time. The CBCL frontal faces dataset is used for training and validation.

  15. Emotion of Physiological Signals Classification Based on TS Feature Selection

    Institute of Scientific and Technical Information of China (English)

    Wang Yujing; Mo Jianlin

    2015-01-01

    This paper proposes a TS-MLP method for emotion recognition from physiological signals. It recognizes emotion by using Tabu search (TS) to select features of the emotion's physiological signals and a multilayer perceptron (MLP) to classify the emotion. Simulation shows that it achieves good emotion classification performance.

  16. Linear feature selection in texture analysis - A PLS based method

    DEFF Research Database (Denmark)

    Marques, Joselene; Igel, Christian; Lillholm, Martin

    2013-01-01

    We present a texture analysis methodology that combined uncommitted machine-learning techniques and partial least square (PLS) in a fully automatic framework. Our approach introduces a robust PLS-based dimensionality reduction (DR) step to specifically address outliers and high-dimensional featur...

  17. Sequence-based classification using discriminatory motif feature selection.

    Directory of Open Access Journals (Sweden)

    Hao Xiong

    Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is

  18. [Feature extraction for breast cancer data based on geometric algebra theory and feature selection using differential evolution].

    Science.gov (United States)

    Li, Jing; Hong, Wenxue

    2014-12-01

    Feature extraction and feature selection are important issues in pattern recognition. Based on the geometric algebra representation of vectors, a new feature extraction method using the blade coefficients of geometric algebra was proposed in this study. At the same time, an improved differential evolution (DE) feature selection method was proposed to deal with the resulting increase in dimensionality. Simple linear discriminant analysis was used as the classifier. The 10-fold cross-validation (10 CV) classification accuracy on a public breast cancer biomedical dataset was more than 96%, which was superior to that obtained with the original features and with a traditional feature extraction method.

  19. Supervised Feature Subset Selection based on Modified Fuzzy Relative Information Measure for classifier Cart

    Directory of Open Access Journals (Sweden)

    K.SAROJINI,

    2010-06-01

    Feature subset selection is an essential task in data mining. This paper presents a new method for supervised feature subset selection based on the Modified Fuzzy Relative Information Measure (MFRIM). First, a discretization algorithm is applied to discretize numeric features in order to construct the membership functions of each fuzzy set of a feature. Then the proposed MFRIM is applied to select the feature subset, focusing on boundary samples. The proposed method can select a feature subset with a minimum number of features that are relevant for obtaining higher average classification accuracy on the datasets. The experimental results with UCI datasets show that the proposed algorithm is effective and efficient in selecting a subset with a minimum number of features, achieving higher average classification accuracy than the consistency-based feature subset selection method.

  20. A bidirectional feature selection method based on mutual information and redundancy-synergy coefficient

    Institute of Scientific and Technical Information of China (English)

    YANG Sheng; ZHANG Zhi; SHI Peng-fei

    2006-01-01

    Feature subset selection is a fundamental problem of data mining. The mutual information of a feature subset measures how much information about the class feature the subset contains. A hashing mechanism is proposed to calculate the mutual information of a feature subset. Feature relevancy is defined by mutual information. The redundancy-synergy coefficient, a novel measure of redundancy and synergy among features in describing the class feature, is defined. In terms of the information maximization rule, a bidirectional heuristic feature subset selection method based on mutual information and the redundancy-synergy coefficient is presented. This study's experiments show the good performance of the new method.
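
    How a hash table helps when computing the mutual information between a feature subset and the class can be sketched as follows. The abstract does not describe its hashing mechanism, so this sketch assumes the simplest reading: each row's values on the subset are joined into a tuple and counted in a hash-based `Counter`. The function name and the XOR toy data are assumptions.

```python
import math
from collections import Counter

def subset_mutual_information(X, y, subset):
    """I(S; C) in bits, for discrete features S = subset and class labels C."""
    n = len(X)
    # Hash each row's joint value on the subset (tuples are hashable keys).
    keys = [tuple(row[j] for j in subset) for row in X]
    joint = Counter(zip(keys, y))   # counts of (subset value, class)
    count_s = Counter(keys)         # counts of subset values
    count_c = Counter(y)            # counts of classes
    mi = 0.0
    for (s, c), count in joint.items():
        mi += (count / n) * math.log2(count * n / (count_s[s] * count_c[c]))
    return mi

# Hypothetical toy data: the class is the XOR of the two features.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]
mi_first = subset_mutual_information(X, y, [0])
mi_second = subset_mutual_information(X, y, [1])
mi_both = subset_mutual_information(X, y, [0, 1])
print(mi_first, mi_second, mi_both)
```

    In the XOR example each feature alone carries no information about the class, while the pair carries one full bit. This is the kind of synergy that a measure evaluated only on individual features would miss.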

  1. GAIN RATIO BASED FEATURE SELECTION METHOD FOR PRIVACY PRESERVATION

    Directory of Open Access Journals (Sweden)

    R. Praveena Priyadarsini

    2011-04-01

    Privacy preservation is a step in data mining that tries to safeguard sensitive information from unsanctioned disclosure, thereby protecting individual data records and their privacy. There are various privacy preservation techniques, such as k-anonymity, l-diversity, t-closeness and data perturbation. In this paper the k-anonymity privacy protection technique is applied to high-dimensional datasets such as adult and census. Since both data sets are high dimensional, a feature subset selection method, Gain Ratio, is applied: the attributes of the datasets are ranked and low-ranking attributes are filtered out to form new reduced data subsets. The k-anonymization privacy preservation technique is then applied to the reduced datasets. The privacy-preserved reduced datasets and the original datasets are compared for accuracy on two data mining functionalities, classification and clustering, using the naïve Bayesian and k-means algorithms respectively. Experimental results show that classification and clustering accuracy are comparable for the reduced k-anonymized datasets and the original data sets.
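
    The Gain Ratio ranking step can be sketched for discrete attributes as below, using the standard definition (information gain divided by the split information of the attribute). The function names and the toy data are assumptions; the k-anonymization step itself is not shown.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy in bits of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(column, labels):
    """Information gain of the attribute divided by its split information."""
    n = len(column)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(column)
    return info_gain / split_info if split_info > 0 else 0.0

def rank_by_gain_ratio(X, y):
    """Return attribute indices sorted from highest to lowest gain ratio."""
    scores = [gain_ratio([row[j] for row in X], y) for j in range(len(X[0]))]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores

# Hypothetical toy data: attribute 0 determines the class, attribute 1 does not.
X = [["a", "x"], ["a", "y"], ["b", "x"], ["b", "y"]]
y = [0, 0, 1, 1]
order, scores = rank_by_gain_ratio(X, y)
print(order, scores)
```

    Low-ranking attributes at the end of `order` would be the ones filtered out before anonymization.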

  2. Speech Emotion Feature Selection Method Based on Contribution Analysis Algorithm of Neural Network

    Science.gov (United States)

    Wang, Xiaojia; Mao, Qirong; Zhan, Yongzhao

    2008-11-01

    There are many emotion features. If all of these features are employed to recognize emotions, redundant features may be present. Furthermore, the recognition result is unsatisfactory and the cost of feature extraction is high. In this paper, a method for selecting speech emotion features based on the contribution analysis algorithm of a neural network (NN) is presented. The emotion features are selected from the 95 extracted features by using the contribution analysis algorithm of the NN. Cluster analysis is applied to analyze the effectiveness of the selected features, and the time of feature extraction is evaluated. Finally, the 24 selected emotion features are used to recognize six speech emotions. The experiments show that this method can improve the recognition rate and the time of feature extraction.

  3. Soft computing based feature selection for environmental sound classification

    NARCIS (Netherlands)

    Shakoor, A.; May, T.M.; Van Schijndel, N.H.

    2010-01-01

    Environmental sound classification has a wide range of applications, like hearing aids, mobile communication devices, portable media players, and auditory protection devices. Sound classification systems typically extract features from the input sound. Using too many features increases complexity unne

  4. Feature selection method based on multi-fractal dimension and harmony search algorithm and its application

    Science.gov (United States)

    Zhang, Chen; Ni, Zhiwei; Ni, Liping; Tang, Na

    2016-10-01

    Feature selection is an important method of data preprocessing in data mining. In this paper, a novel feature selection method based on multi-fractal dimension and harmony search algorithm is proposed. Multi-fractal dimension is adopted as the evaluation criterion of feature subset, which can determine the number of selected features. An improved harmony search algorithm is used as the search strategy to improve the efficiency of feature selection. The performance of the proposed method is compared with that of other feature selection algorithms on UCI data-sets. Besides, the proposed method is also used to predict the daily average concentration of PM2.5 in China. Experimental results show that the proposed method can obtain competitive results in terms of both prediction accuracy and the number of selected features.

  5. Feature Selection Based on the SVM Weight Vector for Classification of Dementia.

    Science.gov (United States)

    Bron, Esther E; Smits, Marion; Niessen, Wiro J; Klein, Stefan

    2015-09-01

    Computer-aided diagnosis of dementia using a support vector machine (SVM) can be improved with feature selection. The relevance of individual features can be quantified from the SVM weights as a significance map (p-map). Although these p-maps previously showed clusters of relevant voxels in dementia-related brain regions, they have not yet been used for feature selection. Therefore, we introduce two novel feature selection methods based on p-maps using a direct approach (filter) and an iterative approach (wrapper). To evaluate these p-map feature selection methods, we compared them with methods based on the SVM weight vector directly, t-statistics, and expert knowledge. We used MRI data from the Alzheimer's disease neuroimaging initiative classifying Alzheimer's disease (AD) patients, mild cognitive impairment (MCI) patients who converted to AD (MCIc), MCI patients who did not convert to AD (MCInc), and cognitively normal controls (CN). Features for each voxel were derived from gray matter morphometry. Feature selection based on the SVM weights gave better results than t-statistics and expert knowledge. The p-map methods performed slightly better than those using the weight vector. The wrapper method scored better than the filter method. Recursive feature elimination based on the p-map improved most for AD-CN: the area under the receiver-operating-characteristic curve (AUC) significantly increased from 90.3% without feature selection to 92.0% when selecting 1.5%-3% of the features. This feature selection method also improved the other classifications: AD-MCI 0.1% improvement in AUC (not significant), MCI-CN 0.7%, and MCIc-MCInc 0.1% (not significant). Although the performance improvement due to feature selection was limited, the methods based on the p-map generally had the best performance, and were therefore better in estimating the relevance of individual features.

  6. A Feature Selection Method for Large-Scale Network Traffic Classification Based on Spark

    Directory of Open Access Journals (Sweden)

    Yong Wang

    2016-02-01

    With the rapid increase of data scale in network traffic classification, how to select traffic features efficiently is becoming a big challenge. Although a number of traditional feature selection methods using the Hadoop-MapReduce framework have been proposed, their execution time is still unsatisfactory because of the numerous iterative computations during processing. To address this issue, an efficient feature selection method for network traffic based on a new parallel computing framework called Spark is proposed in this paper. In our approach, the complete feature set is first preprocessed based on the Fisher score, and a sequential forward search strategy is employed to generate subsets. The optimal feature subset is then selected using the continuous iterations of the Spark computing framework. The implementation demonstrates that, while maintaining classification accuracy, our method reduces the time cost of modeling and classification and significantly improves the execution efficiency of feature selection.
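
    The sequential forward search strategy can be sketched on a single machine as follows. This is not the Spark implementation from the paper: the function names, the leave-one-out 1-nearest-neighbour accuracy used as the subset evaluator, and the toy data are assumptions made to keep the example self-contained.

```python
def sequential_forward_search(candidates, evaluate, max_features=None):
    """Greedily add the feature that most improves evaluate(subset)."""
    selected, best_score = [], float("-inf")
    remaining = list(candidates)
    while remaining and (max_features is None or len(selected) < max_features):
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no candidate improves the current subset
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score

def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i, xi in enumerate(X):
        nearest = min(
            (o for o in range(len(X)) if o != i),
            key=lambda o: sum((xi[j] - X[o][j]) ** 2 for j in features),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)

# Hypothetical toy data: feature 0 is informative, feature 1 is misleading.
X = [[0.0, 3.0], [0.1, 1.0], [1.0, 3.1], [1.1, 1.1]]
y = [0, 0, 1, 1]
selected, score = sequential_forward_search(
    [0, 1], lambda subset: loo_1nn_accuracy(X, y, subset)
)
print(selected, score)
```

    In a distributed setting, the evaluation of the candidate subsets inside each iteration is the natural part to parallelize, since the candidates are independent of one another.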

  7. Feature selection based on mutual information and redundancy-synergy coefficient

    Institute of Scientific and Technical Information of China (English)

    杨胜; 顾钧

    2004-01-01

    Mutual information is an important information measure for feature subsets. In this paper, a hashing mechanism is proposed to calculate the mutual information on a feature subset. The redundancy-synergy coefficient, a novel measure of the redundancy and synergy of features in expressing the class feature, is defined by mutual information. The information maximization rule was applied to derive a heuristic feature subset selection method based on mutual information and the redundancy-synergy coefficient. Our experimental results showed the good performance of the new feature selection method.
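
One plausible reading of "hashing" here is to use the joint value of a feature subset as a hash-table key and count occurrences. The sketch below does that with `collections.Counter` for discrete features; it is an assumption about the mechanism, not the paper's algorithm, and it does not compute the paper's redundancy-synergy coefficient. The XOR toy data is hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(rows, labels, subset):
    """I(S; Y) in bits, where S is the joint value of the features in `subset`."""
    n = len(rows)
    # Each joint value of the subset becomes a hashable tuple key.
    keys = [tuple(row[j] for j in subset) for row in rows]
    count_s = Counter(keys)
    count_y = Counter(labels)
    count_sy = Counter(zip(keys, labels))
    mi = 0.0
    for (s, y), c in count_sy.items():
        # p(s, y) * log2( p(s, y) / (p(s) * p(y)) )
        mi += (c / n) * log2(c * n / (count_s[s] * count_y[y]))
    return mi

# Toy data: the label is the XOR of features 0 and 1; feature 2 is constant.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
labels = [0, 1, 1, 0]
mi_single = mutual_information(rows, labels, [0])    # one feature alone
mi_pair = mutual_information(rows, labels, [0, 1])   # the two features jointly
```

On this data a single feature carries no information about the label while the pair carries one full bit, which is the kind of synergy that a per-feature score cannot see and a subset-level measure can.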

  8. Fast scenario-based design space exploration using feature selection

    NARCIS (Netherlands)

    van Stralen, P.; Pimentel, A.; Mühl, G.; Richling, J.; Herkersdorf, A.

    2012-01-01

    This paper presents a novel approach to efficiently perform early system level design space exploration (DSE) of MultiProcessor System-on-Chip (MPSoC) based embedded systems. By modeling dynamic multi-application workloads using application scenarios, optimal designs can be quickly identified using

  9. X-ray image enhancement via determinant based feature selection.

    Science.gov (United States)

    Tappenden, R; Hegarty, J; Broughton, R; Butler, A; Coope, I; Renaud, P

    2013-12-01

    Previous work has investigated the feasibility of using Eigenimage-based enhancement tools to highlight abnormalities on chest X-rays (Butler et al in J Med Imaging Radiat Oncol 52:244-253, 2008). While promising, this approach has been limited by computational restrictions of standard clinical workstations, and uncertainty regarding what constitutes an adequate sample size. This paper suggests an alternative mathematical model to the above referenced singular value decomposition method, which can significantly reduce both the required sample size and the time needed to perform analysis. Using this approach images can be efficiently separated into normal and abnormal parts, with the potential for rapid highlighting of pathology.

  10. Mutual information-based feature selection for low-cost BCIs based on motor imagery.

    Science.gov (United States)

    Schiatti, L; Faes, L; Tessadori, J; Barresi, G; Mattos, L

    2016-08-01

    In the present study a feature selection algorithm based on mutual information (MI) was applied to electro-encephalographic (EEG) data acquired during three different motor imagery tasks from two datasets: Dataset I from BCI Competition IV, including full scalp recordings from four subjects, and new data recorded from three subjects using the popular low-cost Emotiv EPOC EEG headset. The aim was to evaluate optimal channels and band-power (BP) features for discriminating motor imagery tasks, in order to assess the feasibility of a portable low-cost motor imagery based Brain-Computer Interface (BCI) system. The minimal subset of features most relevant to task description and least redundant with each other was determined, and the corresponding classification accuracy was assessed offline employing a linear support vector machine (SVM) in a 10-fold cross validation scheme. The analysis was performed: (a) on the original full Dataset I from BCI Competition IV, (b) on a restricted channel set from Dataset I corresponding to the available Emotiv EPOC electrode locations, and (c) on data recorded with the EPOC system. Results from (a) showed that an offline classification accuracy above 80% can be reached using only 5 features. Limiting the analysis to EPOC channels caused a decrease of classification accuracy, although it still remained above chance level, both for data from (b) and (c). A top accuracy of 70% was achieved using 2 optimal features. These results encourage further research towards the development of portable low-cost motor imagery-based BCI systems.

  11. Regression-Based Feature Selection on Large Scale Human Activity Recognition

    Directory of Open Access Journals (Sweden)

    Hussein Mazaar

    2016-02-01

    In this paper, we present an approach for regression-based feature selection in human activity recognition. Because the features in human activity recognition are high dimensional, the model may over-fit and fail to learn its parameters well. Moreover, many features are redundant or irrelevant. The goal is to select important discriminating features to recognize the human activities in videos. The R-squared regression criterion can identify the best features based on the ability of a feature to explain the variations in the target class. The features are significantly reduced, by nearly 99.33%, resulting in better classification accuracy. A Support Vector Machine with a linear kernel is used to classify the activities. The experiments are conducted on the UCF50 dataset. The results show that the proposed model significantly outperforms state-of-the-art methods.
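
As a rough illustration of an R-squared criterion, the sketch below scores each feature by the R² of a one-variable least-squares fit against the target, which equals the squared Pearson correlation. Treating the class label as a numeric binary target, and the names `r_squared` and `rank_by_r_squared`, are assumptions for this example rather than details taken from the paper.

```python
def r_squared(x, y):
    """R^2 of the simple linear regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature (or target) explains nothing
    return sxy * sxy / (sxx * syy)

def rank_by_r_squared(rows, target, top_k):
    """Return the indices of the top_k features by R^2, plus all scores."""
    n_features = len(rows[0])
    scores = [r_squared([r[j] for r in rows], target) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: -scores[j])
    return order[:top_k], scores

# Toy data: feature 0 copies the target, feature 1 is constant,
# feature 2 is only partly correlated with the target.
rows = [(0, 5, 1), (0, 5, 2), (1, 5, 2), (1, 5, 3)]
target = [0, 0, 1, 1]
top, scores = rank_by_r_squared(rows, target, top_k=2)
```

A per-feature score like this ignores redundancy between features, so two highly correlated features would both rank well; that is a limitation of the sketch, not a statement about the published method.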

  12. Robust Feature Selection from Microarray Data Based on Cooperative Game Theory and Qualitative Mutual Information

    Directory of Open Access Journals (Sweden)

    Atiyeh Mortazavi

    2016-01-01

    High dimensionality of microarray data sets may lead to low efficiency and overfitting. In this paper, a multiphase cooperative game theoretic feature selection approach is proposed for microarray data classification. In the first phase, due to high dimension of microarray data sets, the features are reduced using one of the two filter-based feature selection methods, namely, mutual information and Fisher ratio. In the second phase, Shapley index is used to evaluate the power of each feature. The main innovation of the proposed approach is to employ Qualitative Mutual Information (QMI) for this purpose. The idea of Qualitative Mutual Information causes the selected features to have more stability and this stability helps to deal with the problem of data imbalance and scarcity. In the third phase, a forward selection scheme is applied which uses a scoring function to weight each feature. The performance of the proposed method is compared with other popular feature selection algorithms such as Fisher ratio, minimum redundancy maximum relevance, and previous works on cooperative game based feature selection. The average classification accuracy on eleven microarray data sets shows that the proposed method improves both average accuracy and average stability compared to other approaches.

  13. Robust Feature Selection from Microarray Data Based on Cooperative Game Theory and Qualitative Mutual Information.

    Science.gov (United States)

    Mortazavi, Atiyeh; Moattar, Mohammad Hossein

    2016-01-01

    High dimensionality of microarray data sets may lead to low efficiency and overfitting. In this paper, a multiphase cooperative game theoretic feature selection approach is proposed for microarray data classification. In the first phase, due to high dimension of microarray data sets, the features are reduced using one of the two filter-based feature selection methods, namely, mutual information and Fisher ratio. In the second phase, Shapley index is used to evaluate the power of each feature. The main innovation of the proposed approach is to employ Qualitative Mutual Information (QMI) for this purpose. The idea of Qualitative Mutual Information causes the selected features to have more stability and this stability helps to deal with the problem of data imbalance and scarcity. In the third phase, a forward selection scheme is applied which uses a scoring function to weight each feature. The performance of the proposed method is compared with other popular feature selection algorithms such as Fisher ratio, minimum redundancy maximum relevance, and previous works on cooperative game based feature selection. The average classification accuracy on eleven microarray data sets shows that the proposed method improves both average accuracy and average stability compared to other approaches.
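
The second phase relies on a Shapley index to rate each feature. A minimal sketch of the exact Shapley value, averaged over all feature orderings, is given below. The coalition value function `toy_value` is a hypothetical stand-in; in the paper the worth of a coalition is based on Qualitative Mutual Information, which is not reproduced here. Enumerating permutations is only feasible for a handful of features.

```python
from itertools import permutations

def shapley_values(n_features, value):
    """Exact Shapley value of each feature for the coalition function `value`."""
    totals = [0.0] * n_features
    count = 0
    for order in permutations(range(n_features)):
        coalition = frozenset()
        for j in order:
            with_j = coalition | {j}
            # Marginal contribution of feature j when it joins the coalition.
            totals[j] += value(with_j) - value(coalition)
            coalition = with_j
        count += 1
    return [t / count for t in totals]

def toy_value(coalition):
    # Hypothetical worth: features 0 and 1 are redundant copies of one
    # useful signal, and feature 2 is irrelevant.
    return 1.0 if (0 in coalition or 1 in coalition) else 0.0

phi = shapley_values(3, toy_value)
```

The two redundant features split the credit equally and the irrelevant feature receives none, which shows why a Shapley-style score can be used to down-weight redundant features.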

  14. Different Cortical Mechanisms for Spatial vs. Feature-Based Attentional Selection in Visual Working Memory

    Science.gov (United States)

    Heuer, Anna; Schubö, Anna; Crawford, J. D.

    2016-01-01

    The limited capacity of visual working memory (VWM) necessitates attentional mechanisms that selectively update and maintain only the most task-relevant content. Psychophysical experiments have shown that the retroactive selection of memory content can be based on visual properties such as location or shape, but the neural basis for such differential selection is unknown. For example, it is not known if there are different cortical modules specialized for spatial vs. feature-based mnemonic attention, in the same way that has been demonstrated for attention to perceptual input. Here, we used transcranial magnetic stimulation (TMS) to identify areas in human parietal and occipital cortex involved in the selection of objects from memory based on cues to their location (spatial information) or their shape (featural information). We found that TMS over the supramarginal gyrus (SMG) selectively facilitated spatial selection, whereas TMS over the lateral occipital cortex (LO) selectively enhanced feature-based selection for remembered objects in the contralateral visual field. Thus, different cortical regions are responsible for spatial vs. feature-based selection of working memory representations. Since the same regions are involved in terms of attention to external events, these new findings indicate overlapping mechanisms for attentional control over perceptual input and mnemonic representations. PMID:27582701

  15. Different cortical mechanisms for spatial vs. feature-based attentional selection in visual working memory

    Directory of Open Access Journals (Sweden)

    Anna Heuer

    2016-08-01

    The limited capacity of visual working memory necessitates attentional mechanisms that selectively update and maintain only the most task-relevant content. Psychophysical experiments have shown that the retroactive selection of memory content can be based on visual properties such as location or shape, but the neural basis for such differential selection is unknown. For example, it is not known if there are different cortical modules specialized for spatial versus feature-based mnemonic attention, in the same way that has been demonstrated for attention to perceptual input. Here, we used transcranial magnetic stimulation (TMS) to identify areas in human parietal and occipital cortex involved in the selection of objects from memory based on cues to their location (spatial information) or their shape (featural information). We found that TMS over the supramarginal gyrus (SMG) selectively facilitated spatial selection, whereas TMS over the lateral occipital cortex selectively enhanced feature-based selection for remembered objects in the contralateral visual field. Thus, different cortical regions are responsible for spatial vs. feature-based selection of working memory representations. Since the same regions are involved in attention to external events, these new findings indicate overlapping mechanisms for attentional control over perceptual input and mnemonic representations.

  16. Analysis of different feature selection criteria based on a covariance convergence perspective for a SLAM algorithm.

    Science.gov (United States)

    Auat Cheein, Fernando A; Carelli, Ricardo

    2011-01-01

    This paper introduces several non-arbitrary feature selection techniques for a Simultaneous Localization and Mapping (SLAM) algorithm. The feature selection criteria are based on the determination of the most significant features from a SLAM convergence perspective. The SLAM algorithm implemented in this work is a sequential EKF (Extended Kalman filter) SLAM. The feature selection criteria are applied in the correction stage of the SLAM algorithm, restricting the correction to the most significant features. This restriction also reduces the processing time of the SLAM. Several experiments with a mobile robot are shown in this work. The experiments concern the map reconstruction and a comparison of the performance of the different proposed techniques. The experiments were carried out in an outdoor environment composed of trees, although the results shown herein are not restricted to a special type of feature.

  17. Analysis of Different Feature Selection Criteria Based on a Covariance Convergence Perspective for a SLAM Algorithm

    Directory of Open Access Journals (Sweden)

    Fernando A. Auat Cheein

    2010-12-01

    This paper introduces several non-arbitrary feature selection techniques for a Simultaneous Localization and Mapping (SLAM) algorithm. The feature selection criteria are based on the determination of the most significant features from a SLAM convergence perspective. The SLAM algorithm implemented in this work is a sequential EKF (Extended Kalman filter) SLAM. The feature selection criteria are applied in the correction stage of the SLAM algorithm, restricting the correction to the most significant features. This restriction also reduces the processing time of the SLAM. Several experiments with a mobile robot are shown in this work. The experiments concern the map reconstruction and a comparison of the performance of the different proposed techniques. The experiments were carried out in an outdoor environment composed of trees, although the results shown herein are not restricted to a special type of feature.

  18. Emotional textile image classification based on cross-domain convolutional sparse autoencoders with feature selection

    Science.gov (United States)

    Li, Zuhe; Fan, Yangyu; Liu, Weihua; Yu, Zeqi; Wang, Fengqin

    2017-01-01

    We aim to apply sparse autoencoder-based unsupervised feature learning to emotional semantic analysis for textile images. To tackle the problem of limited training data, we present a cross-domain feature learning scheme for emotional textile image classification using convolutional autoencoders. We further propose a correlation-analysis-based feature selection method for the weights learned by sparse autoencoders to reduce the number of features extracted from large size images. First, we randomly collect image patches on an unlabeled image dataset in the source domain and learn local features with a sparse autoencoder. We then conduct feature selection according to the correlation between different weight vectors corresponding to the autoencoder's hidden units. We finally adopt a convolutional neural network including a pooling layer to obtain global feature activations of textile images in the target domain and send these global feature vectors into logistic regression models for emotional image classification. The cross-domain unsupervised feature learning method achieves 65% to 78% average accuracy in the cross-validation experiments corresponding to eight emotional categories and performs better than conventional methods. Feature selection can reduce the computational cost of global feature extraction by about 50% while improving classification performance.

  19. An Enhancement of Bayesian Inference Network for Ligand-Based Virtual Screening using Features Selection

    Directory of Open Access Journals (Sweden)

    Ali Ahmed

    2011-01-01

    Problem statement: Similarity based Virtual Screening (VS) deals with a large amount of data containing irrelevant and/or redundant fragments or features. The recent use of Bayesian networks as an alternative to existing tools for similarity based VS has received noticeable attention from researchers in the field of chemoinformatics. Approach: To this end, different models of Bayesian network have been developed. In this study, we enhance the Bayesian Inference Network (BIN) using a subset of selected molecular features. Results: In this approach, a few features were filtered from the molecular fingerprint features based on a feature selection approach. Conclusion: Simulated virtual screening experiments with MDL Drug Data Report (MDDR) data sets showed that the proposed method provides simple ways of enhancing the cost effectiveness of ligand-based virtual screening searches, especially for higher diversity data sets.

  20. AlPOs Synthetic Factor Analysis Based on Maximum Weight and Minimum Redundancy Feature Selection

    Directory of Open Access Journals (Sweden)

    Yinghua Lv

    2013-11-01

    The relationship between synthetic factors and the resulting structures is critical for rational synthesis of zeolites and related microporous materials. In this paper, we develop a new feature selection method for synthetic factor analysis of (6,12)-ring-containing microporous aluminophosphates (AlPOs). The proposed method is based on a maximum weight and minimum redundancy criterion. With the proposed method, we can select the feature subset in which the features are most relevant to the synthetic structure while the redundancy among these selected features is minimal. Based on the database of AlPO synthesis, we use (6,12)-ring-containing AlPOs as the target class and incorporate 21 synthetic factors including gel composition, solvent and organic template to predict the formation of (6,12)-ring-containing microporous aluminophosphates (AlPOs). From these 21 features, 12 selected features are deemed as the optimized features to distinguish (6,12)-ring-containing AlPOs from other AlPOs without such rings. The prediction model achieves a classification accuracy rate of 91.12% using the optimal feature subset. Comprehensive experiments demonstrate the effectiveness of the proposed algorithm, and deep analysis is given for the synthetic factors selected by the proposed method.

  1. Performance Evaluation of Content Based Image Retrieval on Feature Optimization and Selection Using Swarm Intelligence

    Directory of Open Access Journals (Sweden)

    Kirti Jain

    2016-03-01

    The diversity and applicability of swarm intelligence are increasing every day in the fields of science and engineering. Swarm intelligence supports dynamic feature optimization. We have used swarm intelligence for feature optimization and feature selection in content-based image retrieval. The performance of content-based image retrieval is limited by precision and recall, whose values depend on the retrieval capacity of the image. The basic raw image content has visual features such as color, texture, shape and size. The partial feature extraction technique is based on a geometric invariant function. Three swarm intelligence algorithms were used for the optimization of features: ant colony optimization, particle swarm optimization (PSO), and the glowworm optimization algorithm. The Coral image dataset and MATLAB software were used for evaluating performance.

  2. Feature subset selection based on mahalanobis distance: a statistical rough set method

    Institute of Scientific and Technical Information of China (English)

    Sun Liang; Han Chongzhao

    2008-01-01

    In order to select effective feature subsets for pattern classification, a novel statistical rough set method is presented based on generalized attribute reduction. Unlike classical reduction approaches, the objects in the universe of discourse are signs of training sample sets, and the values of attributes are taken as statistical parameters. The binary relation and discernibility matrix for the reduction are induced by a distance function. Furthermore, based on the monotonicity of the distance function defined by the Mahalanobis distance, the effective feature subsets are obtained as generalized attribute reducts. Experimental results show that the classification performance can be improved by using the selected feature subsets.

  3. Entropy based unsupervised Feature Selection in digital mammogram image using rough set theory.

    Science.gov (United States)

    Velayutham, C; Thangavel, K

    2012-01-01

    Feature Selection (FS) is a process that attempts to select the most informative features. In supervised FS methods, various feature subsets are evaluated using an evaluation function or metric to select only those features that are related to the decision classes of the data under consideration. However, for many data mining applications, decision class labels are often unknown or incomplete, which indicates the significance of unsupervised FS, where decision class labels are not provided. The problem is that not all features are important: some may be redundant, and others may be irrelevant and noisy. In this paper, a novel unsupervised FS method for mammogram images, using rough set-based entropy measures, is proposed. A typical mammogram image processing system generally consists of mammogram image acquisition, image pre-processing, segmentation, and extraction of features from the segmented mammogram image. The proposed method is used to select features from the data set. It is compared with existing rough set-based supervised FS methods, and the classification performance recorded for both demonstrates the efficiency of the method.

  4. Biometric hashing for handwriting: entropy-based feature selection and semantic fusion

    Science.gov (United States)

    Scheidat, Tobias; Vielhauer, Claus

    2008-02-01

    Some biometric algorithms suffer from the problem of using a great number of features extracted from the raw data. This often results in feature vectors of high dimensionality and thus high computational complexity. However, in many cases subsets of features contribute nothing, or only little, to the correct classification of biometric algorithms. The process of choosing more discriminative features from a given set is commonly referred to as feature selection. In this paper we present a study on feature selection for an existing biometric hash generation algorithm for the handwriting modality, which is based on the strategy of entropy analysis of single components of biometric hash vectors, in order to identify and suppress elements carrying little information. To evaluate the impact of our feature selection scheme on the authentication performance of our biometric algorithm, we present an experimental study based on data of 86 users. Besides discussing common biometric error rates such as Equal Error Rates, we suggest a novel measurement to determine the reproduction rate probability for biometric hashes. Our experiments show that, while the feature set size may be significantly reduced by 45% using our scheme, there are only marginal changes both in the results of a verification process and in the reproducibility of biometric hashes. Since multi-biometrics is a recent topic, we additionally carry out a first study on a pairwise multi-semantic fusion based on reduced hashes and analyze it by the introduced reproducibility measure.
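
The core idea, measuring the entropy of each hash-vector component and suppressing those that carry little information, can be sketched as follows. The threshold `min_entropy` and the toy hash vectors are assumptions for illustration; the paper's actual hash algorithm and selection rule are not reproduced.

```python
from collections import Counter
from math import log2

def component_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of one component."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def select_components(hash_vectors, min_entropy):
    """Keep the component indices whose entropy reaches `min_entropy`."""
    n_components = len(hash_vectors[0])
    entropies = [
        component_entropy([v[j] for v in hash_vectors]) for j in range(n_components)
    ]
    keep = [j for j, h in enumerate(entropies) if h >= min_entropy]
    return keep, entropies

# Toy hash vectors from four hypothetical users: component 0 is identical
# for everyone, component 1 is unique per user, component 2 takes two values.
hash_vectors = [(7, 1, 0), (7, 2, 0), (7, 3, 1), (7, 4, 1)]
keep, entropies = select_components(hash_vectors, min_entropy=0.5)
```

A component that is the same for every user has zero entropy and cannot help distinguish users, so dropping it shortens the vector without loss; where exactly to set the threshold is a tuning decision that this sketch does not address.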

  5. Improving Image steganalysis performance using a graph-based feature selection method

    Directory of Open Access Journals (Sweden)

    Amir Nouri

    2016-05-01

    Steganalysis is the skill of discovering the use of steganography algorithms within an image with little or no information regarding the steganography algorithm and/or its parameters. The high dimensionality of image data combined with a small number of samples presents a difficult challenge for the steganalysis task. Several methods have been presented to improve steganalysis performance by feature selection. Feature selection, also known as variable selection, is one of the fundamental problems in the fields of machine learning, pattern recognition and statistics. The aim of feature selection is to reduce the dimensionality of image data in order to enhance the accuracy of the steganalysis task. In this paper, we propose a new graph-based blind steganalysis method for distinguishing stego images from cover images in JPEG images, using a feature selection technique based on community detection. The experimental results show that the proposed approach is easy to employ for steganalysis purposes. Moreover, the performance of the proposed method is better than that of several recent and well-known feature selection-based image steganalysis methods.

  6. Artificial immune system based on adaptive clonal selection for feature selection and parameters optimisation of support vector machines

    Science.gov (United States)

    Sadat Hashemipour, Maryam; Soleimani, Seyed Ali

    2016-01-01

    The artificial immune system (AIS) algorithm based on the clonal selection method can be defined as a soft computing method inspired by the theoretical immune system in order to solve science and engineering problems. The support vector machine (SVM) is a popular pattern classification method with many diverse applications. Kernel parameter setting in the SVM training procedure, along with feature selection, significantly impacts the classification accuracy rate. In this study, an AIS based on the Adaptive Clonal Selection (AISACS) algorithm has been used to optimise the SVM parameters and feature subset selection without degrading the SVM classification accuracy. Several public datasets from the University of California Irvine (UCI) machine learning repository are employed to calculate the classification accuracy rate in order to evaluate the AISACS approach, which was then compared with the grid search algorithm and the Genetic Algorithm (GA) approach. The experimental results show that the feature reduction rate and running time of the AISACS approach are better than those of the GA approach.

  7. Feature Selection based on Machine Learning in MRIs for Hippocampal Segmentation

    CERN Document Server

    Tangaro, Sabina; Brescia, Massimo; Cavuoti, Stefano; Chincarini, Andrea; Errico, Rosangela; Inglese, Paolo; Longo, Giuseppe; Maglietta, Rosalia; Tateo, Andrea; Riccio, Giuseppe; Bellotti, Roberto

    2015-01-01

    Neurodegenerative diseases are frequently associated with structural changes in the brain. Magnetic Resonance Imaging (MRI) scans can show these variations and therefore be used as a supportive feature for a number of neurodegenerative diseases. The hippocampus has been known to be a biomarker for Alzheimer disease and other neurological and psychiatric diseases. However, it requires accurate, robust and reproducible delineation of hippocampal structures. Fully automatic methods usually follow a voxel-based approach, in which a number of local features are calculated for each voxel. In this paper we compared four different techniques for feature selection from a set of 315 features extracted for each voxel: (i) a filter method based on the Kolmogorov-Smirnov test; two wrapper methods, respectively, (ii) Sequential Forward Selection and (iii) Sequential Backward Elimination; and (iv) an embedded method based on the Random Forest Classifier on a set of 10 T1-weighted brain MRIs and tested on an independent set of 25 subjects...

  8. Genetic Algorithm (GA) in Feature Selection for CRF Based Manipuri Multiword Expression (MWE) Identification

    CERN Document Server

    Nongmeikapam, Kishorjit; 10.5121/ijcsit.2011.350

    2011-01-01

    This paper deals with the identification of Multiword Expressions (MWEs) in Manipuri, a highly agglutinative Indian language. Manipuri is listed in the Eighth Schedule of the Indian Constitution. MWEs play an important role in applications of Natural Language Processing (NLP) such as Machine Translation, Part of Speech tagging, Information Retrieval, and Question Answering. Feature selection is an important factor in the recognition of Manipuri MWEs using a Conditional Random Field (CRF). The disadvantage of manually selecting the appropriate features for running the CRF motivates the use of a Genetic Algorithm (GA). Using the GA we are able to find the optimal features to run the CRF. We ran fifty generations of feature selection, with three-fold cross validation as the fitness function. This model achieved a Recall (R) of 64.08%, Precision (P) of 86.84% and F-measure (F) of 73.74%, showing an improvement over CRF-based Manipuri MWE identification without the GA.

  9. Feature Selection for Natural Language Call Routing Based on Self-Adaptive Genetic Algorithm

    Science.gov (United States)

    Koromyslova, A.; Semenkina, M.; Sergienko, R.

    2017-02-01

    The text classification problem for natural language call routing was considered in the paper. Seven different term weighting methods were applied. Feature selection based on a self-adaptive GA is considered as the dimensionality reduction method. k-NN, linear SVM and ANN were used as classification algorithms. The tasks of the research are the following: to study text classification for natural language call routing with different term weighting methods and classification algorithms, and to investigate the feature selection method based on the self-adaptive GA. The numerical results showed that the most effective term weighting is TRR. The most effective classification algorithm is ANN. Feature selection with the self-adaptive GA improves classification effectiveness and provides significant dimensionality reduction with all term weighting methods and all classification algorithms.

  10. DWFS: A Wrapper Feature Selection Tool Based on a Parallel Genetic Algorithm

    KAUST Repository

    Soufan, Othman

    2015-02-26

    Many scientific problems can be formulated as classification tasks. Data that harbor relevant information are usually described by a large number of features. Frequently, many of these features are irrelevant for the class prediction. The efficient implementation of classification models requires identification of suitable combinations of features. The smaller number of features reduces the problem's dimensionality and may result in higher classification performance. We developed DWFS, a web-based tool that allows for efficient selection of features for a variety of problems. DWFS follows the wrapper paradigm and applies a search strategy based on Genetic Algorithms (GAs). A parallel GA implementation simultaneously examines and evaluates a large number of candidate collections of features. DWFS also integrates various filtering methods that may be applied as a pre-processing step in the feature selection process. Furthermore, weights and parameters in the fitness function of the GA can be adjusted according to the application requirements. Experiments using heterogeneous datasets from different biomedical applications demonstrate that DWFS is fast and leads to a significant reduction of the number of features without sacrificing performance as compared to several widely used existing methods. DWFS can be accessed online at www.cbrc.kaust.edu.sa/dwfs.
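
A GA-based wrapper search over feature masks can be sketched in a few lines. This is a sequential, generic binary GA, not the DWFS implementation: the function `ga_select`, its parameters, and `toy_fitness` are hypothetical. In a real wrapper the fitness would be a cross-validated classifier score on the masked features, and that evaluation is the part a parallel implementation would distribute.

```python
import random

def ga_select(n_features, fitness, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Search binary feature masks with elitism, tournaments, and one-point crossover."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]  # elitism: the best mask survives unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            a = max(rng.sample(pop, 3), key=fitness)
            b = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability p_mut.
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return best, history

RELEVANT = {0, 3, 5}  # hypothetical ground truth for the toy fitness

def toy_fitness(mask):
    # Reward selecting relevant features, penalise the size of the subset.
    return sum(mask[j] for j in RELEVANT) - 0.1 * sum(mask)

best_mask, history = ga_select(8, toy_fitness)
```

The size penalty in the toy fitness plays the role of the adjustable weights mentioned in the abstract: raising it favours smaller subsets at the cost of possibly dropping useful features.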

  11. Attentional spreading to task-irrelevant object features: experimental support and a 3-step model of attention for object-based selection and feature-based processing modulation.

    Science.gov (United States)

    Wegener, Detlef; Galashan, Fingal Orlando; Aurich, Maike Kathrin; Kreiter, Andreas Kurt

    2014-01-01

    Directing attention to a specific feature of an object has been linked to different forms of attentional modulation. Object-based attention theory is founded on the finding that even task-irrelevant features at the selected object are subject to attentional modulation, while feature-based attention theory proposes a global processing benefit for the selected feature even at other objects. Most studies investigated either the one or the other form of attention, leaving open the possibility that both object- and feature-specific attentional effects do occur at the same time and may just represent two sides of a single attention system. We here investigate this issue by testing attentional spreading within and across objects, using reaction time (RT) measurements to changes of attended and unattended features on both attended and unattended objects. We asked subjects to report color and speed changes occurring on one of two overlapping random dot patterns (RDPs), presented at the center of gaze. The key property of the stimulation was that only one of the features (e.g., motion direction) was unique for each object, whereas the other feature (e.g., color) was shared by both. The results of two experiments show that co-selection of unattended features even occurs when those features have no means for selecting the object. At the same time, they demonstrate that this processing benefit is not restricted to the selected object but spreads to the task-irrelevant one. We conceptualize these findings by a 3-step model of attention that assumes a task-dependent top-down gain, object-specific feature selection based on task- and binding characteristics, and a global feature-specific processing enhancement. The model allows for the unification of a vast amount of experimental results into a single model, and makes various experimentally testable predictions for the interaction of object- and feature-specific processes.

  12. Attentional spreading to task-irrelevant object features: Experimental support and a 3-step model of attention for object-based selection and feature-based processing modulation

    Directory of Open Access Journals (Sweden)

    Detlef Wegener

    2014-06-01

    Full Text Available Directing attention to a specific feature of an object has been linked to different forms of attentional modulation. Object-based attention theory founds on the finding that even task-irrelevant features at the selected object are subject to attentional modulation, while feature-based attention theory proposes a global processing benefit for the selected feature even at other objects. Most studies investigated either the one or the other form of attention, leaving open the possibility that both object- and feature-specific attentional effects do occur at the same time and may just represent two sides of a single attention system. We here investigate this issue by testing attentional spreading within and across objects, using reaction time measurements to changes of attended and unattended features on both attended and unattended objects. We asked subjects to report color and speed changes occurring on one of two overlapping random dot patterns, presented at the center of gaze. The key property of the stimulation was that only one of the features (e.g. motion direction) was unique for each object, whereas the other feature (e.g. color) was shared by both. The results of two experiments show that co-selection of unattended features even occurs when those features have no means for selecting the object. At the same time, they demonstrate that this processing benefit is not restricted to the selected object but spreads to the task-irrelevant one. We conceptualize these findings by a 3-step model of attention that assumes a task-dependent top-down gain, object-specific feature selection based on task- and binding characteristics, and a global feature-specific processing enhancement. The model allows for the unification of a vast amount of experimental results into a single model, and makes various experimentally testable predictions for the interaction of object- and feature-specific processes.

  13. An Ant Colony Optimization Based Feature Selection for Web Page Classification

    Directory of Open Access Journals (Sweden)

    Esra Saraç

    2014-01-01

    Full Text Available The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines’ performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used to improve runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods.

  14. An ant colony optimization based feature selection for web page classification.

    Science.gov (United States)

    Saraç, Esra; Özel, Selma Ayşe

    2014-01-01

    The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used to improve runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods.
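The abstract above names chi square as one of the baseline feature selection methods. As a rough illustration only, the following sketch computes the standard chi-square statistic for one term and one class from a 2x2 contingency table of document counts. The function name and argument layout are assumptions made for this example and are not taken from the paper.

```python
def chi_square_score(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a (term, class) pair.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # A row or column of the table is empty, so the term carries
        # no usable evidence for this class.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator
```

Terms would then be ranked by this score and the top-ranked ones kept; how many to keep is a tuning choice the abstract does not specify.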

  15. Feature-based and spatial attentional selection in visual working memory.

    Science.gov (United States)

    Heuer, Anna; Schubö, Anna

    2016-05-01

    The contents of visual working memory (VWM) can be modulated by spatial cues presented during the maintenance interval ("retrocues"). Here, we examined whether attentional selection of representations in VWM can also be based on features. In addition, we investigated whether the mechanisms of feature-based and spatial attention in VWM differ with respect to parallel access to noncontiguous locations. In two experiments, we tested the efficacy of valid retrocues relying on different kinds of information. Specifically, participants were presented with a typical spatial retrocue pointing to two locations, a symbolic spatial retrocue (numbers mapping onto two locations), and two feature-based retrocues: a color retrocue (a blob of the same color as two of the items) and a shape retrocue (an outline of the shape of two of the items). The two cued items were presented at either contiguous or noncontiguous locations. Overall retrocueing benefits, as compared to a neutral condition, were observed for all retrocue types. Whereas feature-based retrocues yielded benefits for cued items presented at both contiguous and noncontiguous locations, spatial retrocues were only effective when the cued items had been presented at contiguous locations. These findings demonstrate that attentional selection and updating in VWM can operate on different kinds of information, allowing for a flexible and efficient use of this limited system. The observation that the representations of items presented at noncontiguous locations could only be reliably selected with feature-based retrocues suggests that feature-based and spatial attentional selection in VWM rely on different mechanisms, as has been shown for attentional orienting in the external world.

  16. Feature Selection Applying Statistical and Neurofuzzy Methods to EEG-Based BCI.

    Science.gov (United States)

    Martinez-Leon, Juan-Antonio; Cano-Izquierdo, Jose-Manuel; Ibarrola, Julio

    2015-01-01

    This paper presents an investigation aimed at drastically reducing the processing burden required by motor imagery brain-computer interface (BCI) systems based on electroencephalography (EEG). In this research, the focus has moved from the channel to the feature paradigm, and a 96% reduction in the number of features required in the process has been achieved while maintaining and even improving the classification success rate. This way, it is possible to build cheaper, quicker, and more portable BCI systems. The data set used was provided within the framework of BCI Competition III, which allows the presented results to be compared with the classification accuracy achieved in the contest. Furthermore, a new three-step methodology has been developed which includes a feature discriminant character calculation stage; a score, order, and selection phase; and a final feature selection step. For the first stage, both statistical methods and fuzzy criteria are used. The fuzzy criteria are based on the S-dFasArt classification algorithm, which has shown excellent performance in previous papers addressing the BCI multiclass motor imagery problem. The score, order, and selection stage is used to sort the features according to their discriminant nature. Finally, both order selection and Group Method Data Handling (GMDH) approaches are used to choose the most discriminant ones.

  17. A Feature Selection Approach Based on Interclass and Intraclass Relative Contributions of Terms.

    Science.gov (United States)

    Zhou, Hongfang; Guo, Jie; Wang, Yinghui; Zhao, Minghua

    2016-01-01

    Feature selection plays a critical role in text categorization. During feature selection, high-frequency terms and the interclass and intraclass relative contributions of terms all have significant effects on classification results. We therefore put forward a feature selection approach, IIRCT, based on the interclass and intraclass relative contributions of terms. The proposed algorithm jointly considers three critical factors: term frequency, the interclass relative contribution of terms, and the intraclass relative contribution of terms. Experiments are conducted with a kNN classifier, and the results on the 20 NewsGroup and SougouCS corpora show that the IIRCT algorithm achieves better performance than the DF, t-Test, and CMFS algorithms.

  18. Cluster analysis based on dimensional information with applications to feature selection and classification

    Science.gov (United States)

    Eigen, D. J.; Fromm, F. R.; Northouse, R. A.

    1974-01-01

    A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.

  19. Analysis and Selection of Features for Gesture Recognition Based on a Micro Wearable Device

    Directory of Open Access Journals (Sweden)

    Yinghui Zhou

    2012-01-01

    Full Text Available More and more researchers are concerned with designing a health-supporting system for elders that is lightweight, does not disturb the user, and has low computing complexity. In this paper, we introduce a micro wearable device based on a tri-axis accelerometer, which can detect acceleration changes of the human body depending on where the device is placed. Considering the flexibility of the human finger, we put the device on a finger to detect finger gestures. Twelve kinds of one-stroke finger gestures are defined according to the sensing characteristics of the accelerometer. Features are a paramount factor in the recognition task. Gesture features in both the time domain and the frequency domain are described, since features directly determine the recognition accuracy. The feature generation method and selection process are analyzed in detail to obtain the optimal feature subset from the candidate feature set. Experimental results indicate that the feature subset achieves a satisfactory classification accuracy of 90.08% using 12 features, considering both the recognition accuracy and the dimension of the feature set.

  20. An Empirical Study of Wrappers for Feature Subset Selection based on a Parallel Genetic Algorithm: The Multi-Wrapper Model

    KAUST Repository

    Soufan, Othman

    2012-09-01

    Feature selection is the first task of any learning approach that is applied in major fields of biomedical, bioinformatics, robotics, natural language processing and social networking. In feature subset selection problem, a search methodology with a proper criterion seeks to find the best subset of features describing data (relevance) and achieving better performance (optimality). Wrapper approaches are feature selection methods which are wrapped around a classification algorithm and use a performance measure to select the best subset of features. We analyze the proper design of the objective function for the wrapper approach and highlight an objective based on several classification algorithms. We compare the wrapper approaches to different feature selection methods based on distance and information based criteria. Significant improvement in performance, computational time, and selection of minimally sized feature subsets is achieved by combining different objectives for the wrapper model. In addition, considering various classification methods in the feature selection process could lead to a global solution of desirable characteristics.
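A wrapper, as described above, scores candidate feature subsets by the performance of a classifier trained on them. The sketch below illustrates the general idea with greedy forward search and leave-one-out 1-nearest-neighbour accuracy as the performance measure. Both choices, and all function names, are assumptions made for illustration; the multi-wrapper model of the thesis uses a parallel genetic algorithm and several classifiers, which are not reproduced here.

```python
from typing import List, Sequence, Tuple


def loo_1nn_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                     features: List[int]) -> float:
    """Leave-one-out accuracy of a 1-NN classifier restricted to `features`."""
    if not features:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_j, best_dist = -1, float("inf")
        for j in range(len(X)):
            if i == j:
                continue
            # Squared Euclidean distance over the chosen columns only.
            dist = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if dist < best_dist:
                best_j, best_dist = j, dist
        if y[best_j] == y[i]:
            correct += 1
    return correct / len(X)


def forward_wrapper_selection(X: Sequence[Sequence[float]], y: Sequence[int],
                              max_features: int) -> Tuple[List[int], float]:
    """Greedily add the feature that most improves the wrapped classifier."""
    selected: List[int] = []
    best_score = 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(loo_1nn_accuracy(X, y, selected + [f]), f) for f in remaining]
        # Ties are broken in favour of the lowest feature index.
        score, feature = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:
            break  # no candidate improves the objective, so stop early
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score
```

Stopping as soon as no candidate improves the score keeps the subset small, which mirrors the goal of minimally sized feature subsets mentioned in the abstract, although greedy search can miss subsets that a population-based search would find.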

  1. Feature subset selection based on mahalanobis distance: a statistical rough set method

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    In order to select effective feature subsets for pattern classification, a novel statistics rough set method is presented based on generalized attribute reduction. Unlike classical reduction approaches, the objects in universe of discourse are signs of training sample sets and values of attributes are taken as statistical parameters. The binary relation and discernibility matrix for the reduction are induced by distance function. Furthermore, based on the monotony of the distance function defined by Mahalan...

  2. Feature selection in bioinformatics

    Science.gov (United States)

    Wang, Lipo

    2012-06-01

    In bioinformatics, there are often a large number of input features. For example, there are millions of single nucleotide polymorphisms (SNPs) that are genetic variations which determine the difference between any two unrelated individuals. In microarrays, thousands of genes can be profiled in each test. It is important to find out which input features (e.g., SNPs or genes) are useful in classification of a certain group of people or diagnosis of a given disease. In this paper, we investigate some powerful feature selection techniques and apply them to problems in bioinformatics. We are able to identify a very small number of input features sufficient for tasks at hand and we demonstrate this with some real-world data.

  3. Feature Selection Method Based on Neighborhood Relationships: Applications in EEG Signal Identification and Chinese Character Recognition

    Science.gov (United States)

    Zhao, Yu-Xiang; Chou, Chien-Hsing

    2016-01-01

    In this study, a new feature selection algorithm, the neighborhood-relationship feature selection (NRFS) algorithm, is proposed for identifying rat electroencephalogram signals and recognizing Chinese characters. In these two applications, dependent relationships exist among the feature vectors and their neighboring feature vectors. Therefore, the proposed NRFS algorithm was designed for solving this problem. By applying the NRFS algorithm, unselected feature vectors have a high priority of being added into the feature subset if the neighboring feature vectors have been selected. In addition, selected feature vectors have a high priority of being eliminated if the neighboring feature vectors are not selected. In the experiments conducted in this study, the NRFS algorithm was compared with two feature algorithms. The experimental results indicated that the NRFS algorithm can extract the crucial frequency bands for identifying rat vigilance states and identifying crucial character regions for recognizing Chinese characters. PMID:27314346

  4. Online feature selection with streaming features.

    Science.gov (United States)

    Wu, Xindong; Yu, Kui; Ding, Wei; Wang, Hao; Zhu, Xingquan

    2013-05-01

    We propose a new online feature selection framework for applications with streaming features where the knowledge of the full feature space is unknown in advance. We define streaming features as features that flow in one by one over time whereas the number of training examples remains fixed. This is in contrast with traditional online learning methods that only deal with sequentially added observations, with little attention being paid to streaming features. The critical challenges for Online Streaming Feature Selection (OSFS) include 1) the continuous growth of feature volumes over time, 2) a large feature space, possibly of unknown or infinite size, and 3) the unavailability of the entire feature set before learning starts. In the paper, we present a novel Online Streaming Feature Selection method to select strongly relevant and nonredundant features on the fly. An efficient Fast-OSFS algorithm is proposed to improve feature selection performance. The proposed algorithms are evaluated extensively on high-dimensional datasets and also with a real-world case study on impact crater detection. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.

  5. Feature Selection and Blind Source Separation in an EEG-Based Brain-Computer Interface

    Directory of Open Access Journals (Sweden)

    Michael H. Thaut

    2005-11-01

    Full Text Available Most EEG-based BCI systems make use of well-studied patterns of brain activity. However, those systems involve tasks that indirectly map to simple binary commands such as “yes” or “no” or require many weeks of biofeedback training. We hypothesized that signal processing and machine learning methods can be used to discriminate EEG in a direct “yes”/“no” BCI from a single session. Blind source separation (BSS) and spectral transformations of the EEG produced a 180-dimensional feature space. We used a modified genetic algorithm (GA) wrapped around a support vector machine (SVM) classifier to search the space of feature subsets. The GA-based search found feature subsets that outperform full feature sets and random feature subsets. Also, BSS transformations of the EEG outperformed the original time series, particularly in conjunction with a subset search of both spaces. The results suggest that BSS and feature selection can be used to improve the performance of even a “direct,” single-session BCI.
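The search over feature subsets described above can be sketched as a binary-mask genetic algorithm whose fitness is supplied by the caller. In the study the fitness would come from the wrapped SVM; here it is an arbitrary function so that the example stays self-contained. The operators used (tournament selection, one-point crossover, bit-flip mutation, elitism) and all parameter names are assumptions for illustration, since the abstract only says the GA was "modified".

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def ga_feature_search(n_features: int,
                      fitness: Callable[[Mask], float],
                      pop_size: int = 20,
                      generations: int = 30,
                      mutation_rate: float = 0.05,
                      seed: int = 0) -> Tuple[Mask, float]:
    """Search binary feature masks; needs n_features >= 2 and pop_size >= 2."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_gen = [best[:]]  # elitism: the best mask always survives
        while len(next_gen) < pop_size:
            # Tournament selection: the fitter of two random individuals.
            parent_a = max(rng.sample(population, 2), key=fitness)
            parent_b = max(rng.sample(population, 2), key=fitness)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            # Bit-flip mutation with a small per-bit probability.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=fitness)
    return best, fitness(best)
```

In a wrapper setting, `fitness` would typically be the cross-validated accuracy of the classifier trained on the columns where the mask is 1, which makes each evaluation expensive; caching scores per mask is a common way to reduce that cost.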

  6. A kernel-based multivariate feature selection method for microarray data classification.

    Directory of Open Access Journals (Sweden)

    Shiquan Sun

    Full Text Available High dimensionality and small sample sizes, and their inherent risk of overfitting, pose great challenges for constructing efficient classifiers in microarray data classification. Therefore a feature selection technique should be applied prior to data classification to enhance prediction performance. In general, filter methods can be considered as a principal or auxiliary selection mechanism because of their simplicity, scalability, and low computational complexity. However, a series of trivial examples show that filter methods result in less accurate performance because they ignore the dependencies among features. Although a few publications have devoted their attention to revealing the relationships among features by multivariate-based methods, these methods describe relationships among features only by linear methods, and simple linear combination relationships restrict the improvement in performance. In this paper, we used a kernel method to discover inherent nonlinear correlations among features as well as between feature and target. Moreover, the number of orthogonal components was determined by kernel Fisher's linear discriminant analysis (FLDA) in a self-adaptive manner rather than by manual parameter settings. In order to reveal the effectiveness of our method we performed several experiments and compared the results between our method and other competitive multivariate-based feature selectors. In our comparison, we used two classifiers (support vector machine, k-nearest neighbor) on two groups of datasets, namely two-class and multi-class datasets. Experimental results demonstrate that the performance of our method is better than others, especially on three hard-to-classify datasets, namely Wang's Breast Cancer, Gordon's Lung Adenocarcinoma and Pomeroy's Medulloblastoma.

  7. Selection of Entropy Based Features for Automatic Analysis of Essential Tremor

    Directory of Open Access Journals (Sweden)

    Karmele López-de-Ipiña

    2016-05-01

    Full Text Available Biomedical systems produce biosignals that arise from interaction mechanisms. In a general form, those mechanisms occur across multiple scales, both spatial and temporal, and contain linear and non-linear information. In this framework, entropy measures are good candidates to provide useful evidence about disorder in the system, lack of information in time-series and/or irregularity of the signals. The most common movement disorder is essential tremor (ET), which occurs 20 times more often than Parkinson’s disease. Interestingly, about 50%–70% of the cases of ET have a genetic origin. One of the most used standard tests for clinical diagnosis of ET is Archimedes’ spiral drawing. This work focuses on the selection of non-linear biomarkers from such drawings and handwriting, and it is part of a wider cross study on the diagnosis of essential tremor, where our piece of research presents the selection of entropy features for early ET diagnosis. Classic entropy features are compared with features based on permutation entropy. An automatic analysis system based on several machine learning paradigms is applied, while automatic feature selection is implemented by means of an ANOVA (analysis of variance) test. The obtained results for early detection are promising and appear applicable to real environments.

  8. Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines

    Directory of Open Access Journals (Sweden)

    Liao Li

    2010-10-01

    Full Text Available Background: Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. Results: In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM) where interacting residues in domains are explicitly modeled according to the three dimensional structural information available at the Protein Data Bank (PDB). Features about the domains are extracted first as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors, and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. Conclusions: We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines. 
Datasets and source code are freely available on

  9. Data Visualization and Feature Selection Methods in Gel-based Proteomics

    DEFF Research Database (Denmark)

    Silva, Tomé Santos; Richard, Nadege; Dias, Jorge P.;

    2014-01-01

    Despite the increasing popularity of gel-free proteomic strategies, two-dimensional gel electrophoresis (2DE) is still the most widely used approach in top-down proteomic studies, for all sorts of biological models. In order to achieve meaningful biological insight using 2DE approaches, importance......-based proteomics, summarizing the current state of research within this field. Particular focus is given on discussing the usefulness of available multivariate analysis tools both for data visualization and feature selection purposes. Visual examples are given using a real gel-based proteomic dataset as basis....

  10. An Enhanced Grey Wolf Optimization Based Feature Selection Wrapped Kernel Extreme Learning Machine for Medical Diagnosis

    Science.gov (United States)

    Li, Qiang; Zhao, Xuehua; Cai, ZhenNao; Tong, Changfei; Liu, Wenbin; Tian, Xin

    2017-01-01

    In this study, a new predictive framework is proposed by integrating an improved grey wolf optimization (IGWO) and kernel extreme learning machine (KELM), termed as IGWO-KELM, for medical diagnosis. The proposed IGWO feature selection approach is used for the purpose of finding the optimal feature subset for medical data. In the proposed approach, genetic algorithm (GA) was firstly adopted to generate the diversified initial positions, and then grey wolf optimization (GWO) was used to update the current positions of population in the discrete searching space, thus getting the optimal feature subset for the better classification purpose based on KELM. The proposed approach is compared against the original GA and GWO on the two common disease diagnosis problems in terms of a set of performance metrics, including classification accuracy, sensitivity, specificity, precision, G-mean, F-measure, and the size of selected features. The simulation results have proven the superiority of the proposed method over the other two competitive counterparts. PMID:28246543
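The performance metrics listed above can all be derived from the four counts of a binary confusion matrix. The sketch below uses the commonly cited definitions (G-mean as the geometric mean of sensitivity and specificity, F-measure as the harmonic mean of precision and sensitivity). Whether the paper uses exactly these variants is an assumption, and the function name is chosen for this example.

```python
from math import sqrt
from typing import Dict


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def diagnosis_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Common binary classification metrics from confusion-matrix counts."""
    sensitivity = _ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "g_mean": sqrt(sensitivity * specificity),
        "f_measure": _ratio(2 * precision * sensitivity,
                            precision + sensitivity),
    }
```

G-mean and F-measure are useful alongside accuracy on imbalanced medical data, because a classifier that labels every case as the majority class can still score a high accuracy while its G-mean drops to zero.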

  11. GalNAc-transferase specificity prediction based on feature selection method.

    Science.gov (United States)

    Lu, Lin; Niu, Bing; Zhao, Jun; Liu, Liang; Lu, Wen-Cong; Liu, Xiao-Jun; Li, Yi-Xue; Cai, Yu-Dong

    2009-02-01

    GalNAc-transferase can catalyze the biosynthesis of O-linked oligosaccharides. The specificity of GalNAc-transferase is composed of nine amino acid residues denoted by R4, R3, R2, R1, R0, R1', R2', R3', R4'. To predict whether the reducing monosaccharide will be covalently linked to the central residue R0(Ser or Thr), a new method based on feature selection has been proposed in our work. 277 nonapeptides from reference [Chou KC. A sequence-coupled vector-projection model for predicting the specificity of GalNAc-transferase. Protein Sci 1995;4:1365-83] are chosen for training set. Each nonapeptide is represented by hundreds of amino acid properties collected by Amino Acid Index database (http://www.genome.jp/aaindex) and transformed into a numeric vector with 4554 features. The Maximum Relevance Minimum Redundancy (mRMR) method combining with Incremental Feature Selection (IFS) and Feature Forward Selection (FFS) are then applied for feature selection. Nearest Neighbor Algorithm (NNA) is used to build prediction models. The optimal model contains 54 features and its correct rate tested by Jackknife cross-validation test reaches 91.34%. Final feature analysis indicates that amino acid residues at position R3' play the most important role in the recognition of GalNAc-transferase specificity, which were confirmed by the experiments [Elhammer AP, Poorman RA, Brown E, Maggiora LL, Hoogerheide JG, Kezdy FJ. The specificity of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase as inferred from a database of in vivo substrates and from the in vitro glycosylation of proteins and peptides. J Biol Chem 1993;268:10029-38; O'Connell BC, Hagen FK, Tabak LA. The influence of flanking sequence on the O-glycosylation of threonine in vitro. J Biol Chem 1992;267:25010-8; Yoshida A, Suzuki M, Ikenaga H, Takeuchi M. Discovery of the shortest sequence motif for high level mucin-type O-glycosylation. J Biol Chem 1997;272:16884-8]. Our method can be used as a tool for predicting O
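The Maximum Relevance Minimum Redundancy idea mentioned above can be illustrated with a small greedy ranking over discrete features. This sketch scores each candidate by its mutual information with the labels minus its mean mutual information with the features already chosen. The difference form of the criterion, the restriction to discrete values, and the function names are assumptions for illustration; the study applies mRMR to 4554 amino acid property features and then tunes the subset size with IFS and FFS, which is not shown here.

```python
from collections import Counter
from math import log2
from typing import Hashable, List, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Mutual information in bits between two discrete sequences."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((count_x[x] / n) * (count_y[y] / n)))
               for (x, y), c in count_xy.items())


def mrmr_select(columns: Sequence[Sequence[Hashable]],
                labels: Sequence[Hashable], k: int) -> List[int]:
    """Greedily pick k column indices by relevance minus mean redundancy."""
    selected: List[int] = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(f: int) -> float:
            relevance = mutual_information(columns[f], labels)
            if not selected:
                return relevance
            redundancy = sum(mutual_information(columns[f], columns[s])
                             for s in selected) / len(selected)
            return relevance - redundancy
        # Ties are broken in favour of the lowest column index.
        best = max(remaining, key=lambda f: (score(f), -f))
        selected.append(best)
        remaining.remove(best)
    return selected
```

In the toy case where two columns are exact copies of each other, the second copy is fully redundant once the first is chosen, so a complementary column is preferred even though all three are equally relevant on their own.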

  12. Feature and Score Fusion Based Multiple Classifier Selection for Iris Recognition

    Directory of Open Access Journals (Sweden)

    Md. Rabiul Islam

    2014-01-01

    Full Text Available The aim of this work is to propose a new feature and score fusion based iris recognition approach where voting method on Multiple Classifier Selection technique has been applied. Four Discrete Hidden Markov Model classifiers output, that is, left iris based unimodal system, right iris based unimodal system, left-right iris feature fusion based multimodal system, and left-right iris likelihood ratio score fusion based multimodal system, is combined using voting method to achieve the final recognition result. CASIA-IrisV4 database has been used to measure the performance of the proposed system with various dimensions. Experimental results show the versatility of the proposed system of four different classifiers with various dimensions. Finally, recognition accuracy of the proposed system has been compared with existing N hamming distance score fusion approach proposed by Ma et al., log-likelihood ratio score fusion approach proposed by Schmid et al., and single level feature fusion approach proposed by Hollingsworth et al.
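The final combination step described above, in which the outputs of four classifiers are merged by voting, can be sketched as a plain majority vote over predicted identities. The tie-breaking rule (prefer the label that appears first in the list of votes) and the function name are assumptions for this example; the abstract does not say how ties are resolved.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label among the classifier outputs."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    first_seen = {}
    for position, label in enumerate(predictions):
        first_seen.setdefault(label, position)
    # Highest count wins; on a tie, the label seen earliest wins.
    return max(counts, key=lambda label: (counts[label], -first_seen[label]))
```

With this rule, ordering the four systems from most to least trusted would let the most trusted system decide ties, which is one possible design choice among several.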

  13. Context-dependent feature selection using unsupervised contexts applied to GPR-based landmine detection

    Science.gov (United States)

    Ratto, Christopher R.; Torrione, Peter A.; Collins, Leslie M.

    2010-04-01

    Context-dependent classification techniques applied to landmine detection with ground-penetrating radar (GPR) have demonstrated substantial performance improvements over conventional classification algorithms. Context-dependent algorithms compute a decision statistic by integrating over uncertainty in the unknown, but probabilistically inferable, context of the observation. When applied to GPR, contexts may be defined by differences in electromagnetic properties of the subsurface environment, which are due to discrepancies in soil composition, moisture levels, and surface texture. Context-dependent Feature Selection (CDFS) is a technique developed for selecting a unique subset of features for classifying landmines from clutter in different environmental contexts. In past work, context definitions were assumed to be soil moisture conditions which were known during training. However, knowledge of environmental conditions could be difficult to obtain in the field. In this paper, we utilize an unsupervised learning algorithm for defining contexts which are unknown a priori. Our method performs unsupervised context identification based on similarities in physics-based and statistical features that characterize the subsurface environment of the raw GPR data. Results indicate that utilizing this contextual information improves classification performance, and provides performance improvements over non-context-dependent approaches. Implications for on-line context identification will be suggested as a possible avenue for future work.

  14. Continuous wavelet transform-based feature selection applied to near-infrared spectral diagnosis of cancer.

    Science.gov (United States)

    Chen, Hui; Lin, Zan; Mo, Lin; Wu, Hegang; Wu, Tong; Tan, Chao

    2015-01-01

    A spectrum is inherently local in nature since it can be thought of as a signal composed of various frequency components. The wavelet transform (WT) is a powerful tool that partitions a signal into components with different frequencies. Its multi-resolution property makes the WT a very effective and natural tool for analyzing spectrum-like signals. In this study, a continuous wavelet transform (CWT)-based variable selection procedure was proposed to search for a set of informative wavelet coefficients for constructing a near-infrared (NIR) spectral diagnosis model of cancer. The CWT provided a fine multi-resolution feature space for selecting the best predictors. A measure of discriminating power (DP) was defined to evaluate the coefficients. Partial least squares-discriminant analysis (PLS-DA) was used as the classification algorithm. A NIR spectral dataset associated with cancer diagnosis was used for the experiment. The optimal results obtained correspond to the db2 wavelet. The results revealed that, while having better performance on the training set, the optimal PLS-DA model using only 40 wavelet coefficients in 10 scales achieved the same performance on the test set as the one using all the variables in the original space: an overall accuracy of 93.8%, sensitivity of 92.5% and specificity of 96.3%. This confirms that CWT-based feature selection coupled with PLS-DA is feasible and effective for constructing diagnostic models of cancer by NIR spectroscopy.

  15. QSAR modeling for quinoxaline derivatives using genetic algorithm and simulated annealing based feature selection.

    Science.gov (United States)

    Ghosh, P; Bagchi, M C

    2009-01-01

    With a view to the rational design of selective quinoxaline derivatives, 2D and 3D-QSAR models have been developed for the prediction of anti-tubercular activities. Successful implementation of a predictive QSAR model largely depends on the selection of a preferred set of molecular descriptors that can signify the chemico-biological interaction. Genetic algorithm (GA) and simulated annealing (SA) are applied as variable selection methods for model development. 2D-QSAR modeling using GA or SA based partial least squares (GA-PLS and SA-PLS) methods identified some topological and electrostatic descriptors as important factors for tubercular activity. Kohonen network and counter propagation artificial neural network (CP-ANN) models using GA and SA based feature selection methods have also been applied to QSAR modeling of quinoxaline compounds. From a variable pool of 380 molecular descriptors, predictive QSAR models are developed for the training set and validated on the test set compounds, and the relative effectiveness of linear and non-linear approaches is compared. Further analysis using the 3D-QSAR technique identifies two models, obtained by the GA-PLS and SA-PLS methods, for anti-tubercular activity prediction. The influences of steric and electrostatic field effects generated by the contribution plots are discussed. The results indicate that SA is a very effective variable selection approach for such 3D-QSAR modeling.

  16. Multi-Stage Feature Selection Based Intelligent Classifier for Classification of Incipient Stage Fire in Building

    Directory of Open Access Journals (Sweden)

    Allan Melvin Andrew

    2016-01-01

    Full Text Available In this study, an early fire detection algorithm is proposed based on a low-cost array sensing system that uses off-the-shelf gas sensors, dust particle sensors and ambient sensors such as temperature and humidity sensors. The odour or “smellprint” emanating from various fire sources and building construction materials at an early stage is measured. For this purpose, odour profile data from five common fire sources and three common building construction materials were used to develop the classification model. Normalised feature extraction was performed on the smellprint data before it was passed to the prediction classifier. These features represent the odour signals in the time domain. The obtained features undergo the proposed multi-stage feature selection technique and are lastly further reduced by Principal Component Analysis (PCA), a dimension reduction technique. The hybrid PCA-PNN based approach has been applied to different datasets from the in-house developed system and the portable electronic nose unit. Experimental classification results show that the dimension reduction performed by PCA improved the classification accuracy and provided high reliability, regardless of ambient temperature and humidity variation, baseline sensor drift, different gas concentration levels and exposure to different heating temperature ranges.

  17. Comparison of Genetic Algorithm, Particle Swarm Optimization and Biogeography-based Optimization for Feature Selection to Classify Clusters of Microcalcifications

    Science.gov (United States)

    Khehra, Baljit Singh; Pharwaha, Amar Partap Singh

    2016-06-01

    Ductal carcinoma in situ (DCIS) is one type of breast cancer. Clusters of microcalcifications (MCCs) are symptoms of DCIS that are recognized by mammography. Feature selection is the process of selecting an optimal subset of features from a large number of available features in a given problem domain, after feature extraction and before any classification scheme. Feature selection reduces the feature space, which improves the performance of the classifier and decreases the computational burden imposed by using many features. Selecting an optimal subset from a large number of available features is a difficult search problem. For n features, the total number of possible feature subsets is 2^n. Thus, the optimal feature subset selection problem belongs to the category of NP-hard problems. In this paper, an attempt is made to find the optimal subset of MCC features from all possible subsets using genetic algorithm (GA), particle swarm optimization (PSO) and biogeography-based optimization (BBO). For simulation, a total of 380 benign and malignant MCC samples were selected from mammogram images of the DDSM database. A total of 50 features extracted from the benign and malignant MCC samples are used in this study. In these algorithms, the fitness function is the correct classification rate of the classifier. A support vector machine is used as the classifier. The experimental results show that the PSO-based and BBO-based algorithms select feature subsets that classify MCCs as benign or malignant better than the GA-based algorithm does.
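
    The 2^n growth of the search space, and the idea of scoring a binary feature mask with a fitness function, can be illustrated with a small sketch. This is not the authors' GA, PSO or BBO code: `count_subsets`, `best_subset` and `toy_fitness` are hypothetical names, and `toy_fitness` is only a stand-in for the classifier's correct classification rate.

```python
from itertools import product


def count_subsets(n_features: int) -> int:
    """Number of feature subsets (including the empty one) for n features."""
    return 2 ** n_features


def best_subset(n_features, fitness):
    """Score every non-empty binary mask; only feasible for small n."""
    best_mask, best_score = None, float("-inf")
    for mask in product((0, 1), repeat=n_features):
        if not any(mask):
            continue  # the empty subset cannot be classified on
        score = fitness(mask)
        if score > best_score:
            best_mask, best_score = mask, score
    return best_mask, best_score


def toy_fitness(mask):
    """Hypothetical fitness: features 0 and 2 help, each selected feature costs a little."""
    return 0.5 + 0.2 * mask[0] + 0.2 * mask[2] - 0.01 * sum(mask)


if __name__ == "__main__":
    print(count_subsets(50))            # size of the search space for 50 features
    print(best_subset(4, toy_fitness))  # exhaustive search is fine for 4 features
```

    With 50 features, exhaustive enumeration is out of reach, which is why the study turns to metaheuristics that sample this space of masks instead of listing it.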

  18. Genetic Particle Swarm Optimization-Based Feature Selection for Very-High-Resolution Remotely Sensed Imagery Object Change Detection.

    Science.gov (United States)

    Chen, Qiang; Chen, Yunhao; Jiang, Weiguo

    2016-07-30

    In the field of multiple-feature Object-Based Change Detection (OBCD) for very-high-resolution remotely sensed images, image objects have abundant features, and feature selection affects the precision and efficiency of OBCD. Through object-based image analysis, this paper proposes a Genetic Particle Swarm Optimization (GPSO)-based feature selection algorithm to solve the optimization problem of feature selection in multiple-feature OBCD. We select the Ratio of Mean to Variance (RMV) as the fitness function of GPSO, and apply the proposed algorithm to the object-based hybrid multivariate alternative detection model. Two experimental cases on Worldview-2/3 images confirm that GPSO can significantly improve the speed of convergence, and effectively avoid the problem of premature convergence, relative to other feature selection algorithms. According to the accuracy evaluation of OBCD, GPSO is superior to the other algorithms in overall accuracy (84.17% and 83.59%) and Kappa coefficient (0.6771 and 0.6314). Moreover, the sensitivity analysis results show that the proposed algorithm is not easily influenced by the initial parameters, but that the number of features to be selected and the size of the particle swarm do affect the algorithm. The comparison experiments reveal that RMV is more suitable than other functions as the fitness function of the GPSO-based feature selection algorithm.

  19. Kernel-based Joint Feature Selection and Max-Margin Classification for Early Diagnosis of Parkinson’s Disease

    Science.gov (United States)

    Adeli, Ehsan; Wu, Guorong; Saghafi, Behrouz; An, Le; Shi, Feng; Shen, Dinggang

    2017-01-01

    Feature selection methods usually select the most compact and relevant set of features based on their contribution to a linear regression model. Thus, these features might not be the best for a non-linear classifier. This is especially crucial for tasks in which the performance is heavily dependent on the feature selection technique, such as the diagnosis of neurodegenerative diseases. Parkinson’s disease (PD) is one of the most common neurodegenerative disorders; it progresses slowly while dramatically affecting quality of life. In this paper, we use multi-modal neuroimaging data to diagnose PD by investigating the brain regions known to be affected at the early stages. We propose a joint kernel-based feature selection and classification framework. Unlike conventional feature selection techniques that select features based on their performance in the original input feature space, we select features that best benefit the classification scheme in the kernel space. We further propose kernel functions specifically designed for our non-negative feature types. We use MRI and SPECT data of 538 subjects from the PPMI database, and obtain a diagnosis accuracy of 97.5%, which outperforms all baseline and state-of-the-art methods. PMID:28120883

  1. Research on Methods for Discovering and Selecting Cloud Infrastructure Services Based on Feature Modeling

    Directory of Open Access Journals (Sweden)

    Huamin Zhu

    2016-01-01

    Full Text Available Nowadays more and more cloud infrastructure service providers are providing large numbers of service instances which combine diversified resources, such as computing, storage, and network. However, for cloud infrastructure services, the lack of a description standard and the insufficient research on systematic discovery and selection methods make it difficult for users to discover and choose services. First, considering the highly configurable properties of a cloud infrastructure service, the feature model method is used to describe such a service. Second, based on the description of the cloud infrastructure service, a systematic discovery and selection method for cloud infrastructure services is proposed. The automatic analysis techniques of the feature model are introduced to verify the model’s validity and to perform the matching of the service and demand models. Finally, we determine the critical decision metrics and their corresponding measurement methods for cloud infrastructure services, where the subjective and objective weighting results are combined to determine the weights of the decision metrics. The best matching instances from various providers are then ranked by their comprehensive evaluations. Experimental results show that the proposed methods can effectively improve the accuracy and efficiency of cloud infrastructure service discovery and selection.

  2. A soft computing based approach using modified selection strategy for feature reduction of medical systems.

    Science.gov (United States)

    Zuhtuogullari, Kursat; Allahverdi, Novruz; Arikan, Nihat

    2013-01-01

    Systems with high-dimensional input spaces require long processing times and heavy memory usage. Most attribute selection algorithms suffer from limits on input dimensions and from information storage problems. These problems are eliminated by the developed feature reduction software, which uses a new modified selection mechanism that adds solution candidates from the middle region. The hybrid system software is constructed for reducing the input attributes of systems with a large number of input variables. The designed software also supports the roulette wheel selection mechanism. Linear order crossover is used as the recombination operator. In genetic algorithm based soft computing methods, becoming trapped in local solutions is also a problem, which the developed software eliminates. Faster and more effective results are obtained in the test procedures. Twelve input variables of the urological system have been reduced to reducts (reduced input attributes) with seven, six, and five elements. The results obtained on the urological test data show that the developed software with modified selection has advantages in memory allocation, execution time, classification accuracy, sensitivity, and specificity when compared with the other reduction algorithms.
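
    Roulette wheel selection, which the abstract mentions as a supported mechanism, picks an individual with probability proportional to its fitness. A minimal sketch follows; it is a textbook version and not the software described above, `roulette_wheel_select` is a hypothetical name, and fitness values are assumed to be non-negative.

```python
import random


def roulette_wheel_select(population, fitnesses, rng=random):
    """Return one individual, chosen with probability proportional to fitness.

    Assumes every fitness value is non-negative and at least one is positive.
    """
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("at least one fitness value must be positive")
    pick = rng.random() * total  # a point on the wheel, in [0, total)
    cumulative = 0.0
    for individual, fit in zip(population, fitnesses):
        cumulative += fit
        if pick < cumulative:
            return individual
    return population[-1]  # guard against floating-point round-off


if __name__ == "__main__":
    rng = random.Random(0)
    masks = ["110000", "101010", "000111"]  # hypothetical attribute masks
    print(roulette_wheel_select(masks, [0.2, 0.5, 0.3], rng))
```

    Because selection pressure depends only on relative fitness, a single very fit candidate can dominate the wheel; modified selection schemes such as the one described in the abstract are one way to counter that.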

  4. AN ANN APPROACH FOR NETWORK INTRUSION DETECTION USING ENTROPY BASED FEATURE SELECTION

    Directory of Open Access Journals (Sweden)

    Ashalata Panigrahi

    2015-06-01

    Full Text Available With the increase in Internet users, the number of malicious users is also growing day by day, posing a serious problem in distinguishing between normal and abnormal behavior of users in the network. This has led to the research area of intrusion detection, which essentially analyzes the network traffic and tries to determine normal and abnormal patterns of behavior. In this paper, we have analyzed the standard NSL-KDD intrusion dataset using some neural network based techniques for predicting possible intrusions. Four effective classification methods, namely Radial Basis Function Network, Self-Organizing Map, Sequential Minimal Optimization, and Projective Adaptive Resonance Theory, have been applied. In order to enhance the performance of the classifiers, three entropy based feature selection methods have been applied as data preprocessing. The performance of different combinations of classifiers and attribute reduction methods has also been compared.
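
    The abstract does not name its three entropy based methods. As an illustration only, the sketch below computes information gain, one widely used entropy based criterion for ranking a discrete attribute against class labels. The function names and the tiny example records are hypothetical and are not taken from NSL-KDD.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


if __name__ == "__main__":
    labels = ["normal", "normal", "attack", "attack"]
    protocol = ["tcp", "tcp", "udp", "udp"]  # separates the classes perfectly
    flag = ["SF", "S0", "SF", "S0"]          # carries no class information
    print(information_gain(protocol, labels), information_gain(flag, labels))
```

    Attributes would then be ranked by this score and the lowest-ranked ones dropped before training the classifiers.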

  5. A New Feature Selection Algorithm Based on the Mean Impact Variance

    Directory of Open Access Journals (Sweden)

    Weidong Cheng

    2014-01-01

    Full Text Available The selection of fewer or more representative features from multidimensional features is important when the artificial neural network (ANN) algorithm is used as a classifier. In this paper, a new feature selection method called the mean impact variance (MIVAR) method is proposed to determine the feature that is more suitable for classification. Moreover, this method is constructed on the basis of the training process of the ANN algorithm. To verify the effectiveness of the proposed method, the MIVAR value is used to rank the multidimensional features of the bearing fault diagnosis. In detail, (1) 70-dimensional all waveform features are extracted from a rolling bearing vibration signal with four different operating states, (2) the corresponding MIVAR values of all 70-dimensional features are calculated to rank all features, (3) 14 groups of 10-dimensional features are separately generated according to the ranking results and the principal component analysis (PCA) algorithm and a back propagation (BP) network is constructed, and (4) the validity of the ranking result is proven by training this BP network with these seven groups of 10-dimensional features and by comparing the corresponding recognition rates. The results prove that the features with larger MIVAR value can lead to higher recognition rates.

  6. Genetic Fuzzy System (GFS) based wavelet co-occurrence feature selection in mammogram classification for breast cancer diagnosis

    Directory of Open Access Journals (Sweden)

    Meenakshi M. Pawar

    2016-09-01

    Full Text Available Breast cancer is a significant health problem, diagnosed mostly in women worldwide. Early detection of breast cancer with the help of digital mammography can reduce the mortality rate. This paper presents a wrapper based feature selection approach for wavelet co-occurrence features (WCF) using a Genetic Fuzzy System (GFS) in the mammogram classification problem. The performance of the GFS algorithm is demonstrated on the mini-MIAS database. WCF features are obtained from the detail wavelet coefficients at each level of decomposition of the mammogram image. At the first level of decomposition, 18 features are applied to the GFS algorithm, which selects 5 features with an average classification success rate of 39.64%. At the second level, it selects 9 features from 36 features and the classification success rate improves to 56.75%. At the third level, 16 features are selected from 54 features and the average success rate improves to 64.98%. Lastly, at the fourth level, 72 features are applied to GFS, which selects 16 features, increasing the average success rate to 89.47%. Hence, the GFS algorithm is an effective way of obtaining an optimal feature set in breast cancer diagnosis.

  7. Feature selection for portfolio optimization

    DEFF Research Database (Denmark)

    Bjerring, Thomas Trier; Ross, Omri; Weissensteiner, Alex

    2016-01-01

    Most portfolio selection rules based on the sample mean and covariance matrix perform poorly out-of-sample. Moreover, there is a growing body of evidence that such optimization rules are not able to beat simple rules of thumb, such as 1/N. Parameter uncertainty has been identified as one major reason for these findings. A strand of literature addresses this problem by improving the parameter estimation and/or by relying on more robust portfolio selection methods. Independent of the chosen portfolio selection rule, we propose using feature selection first in order to reduce the asset menu. While most of the diversification benefits are preserved, the parameter estimation problem is alleviated. We conduct out-of-sample back-tests to show that in most cases different well-established portfolio selection rules applied on the reduced asset universe are able to improve alpha relative...

  9. TOPSIS Based Multi-Criteria Decision Making of Feature Selection Techniques for Network Traffic Dataset

    Directory of Open Access Journals (Sweden)

    Raman Singh

    2014-01-01

    Full Text Available Intrusion detection systems (IDS) have to process millions of packets with many features, which delays the detection of anomalies. Sampling and feature selection may be used to reduce computation time and hence minimize intrusion detection time. This paper aims to recommend feature selection algorithms on the basis of the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS). TOPSIS is used to suggest one or more choice(s) among alternatives that have many attributes. A total of ten feature selection techniques have been used for the analysis of the KDD network dataset. Three classifiers, namely Naïve Bayes, J48 and PART, have been considered for this experiment using the Weka data mining tool. The ranking of the techniques using TOPSIS has been calculated in MATLAB. Of these techniques, Filtered Subset Evaluation has been found suitable for intrusion detection, offering very low computational time with acceptable accuracy.
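
    The TOPSIS ranking step can be sketched as follows. This is a generic textbook formulation (vector normalisation, weighted distances to the ideal and anti-ideal points), not the paper's MATLAB code; the `topsis` function name, the weights and the two-criterion example values are assumptions made for illustration.

```python
import math


def topsis(matrix, weights, benefit):
    """Closeness score in [0, 1] for each alternative (row); higher is better.

    matrix  -- rows are alternatives, columns are criteria (no all-zero column)
    weights -- one weight per criterion
    benefit -- True where larger is better, False where smaller is better
    Assumes at least two alternatives that differ on some criterion.
    """
    n_crit = len(weights)
    # Vector-normalise each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, ideal)   # distance to the ideal solution
        d_worst = math.dist(row, anti)   # distance to the anti-ideal solution
        scores.append(d_worst / (d_best + d_worst))
    return scores


if __name__ == "__main__":
    # Hypothetical (accuracy, computation time in seconds) for three techniques.
    techniques = [[0.95, 12.0], [0.93, 2.0], [0.80, 1.5]]
    print(topsis(techniques, weights=[0.5, 0.5], benefit=[True, False]))
```

    Accuracy is treated here as a benefit criterion and computation time as a cost criterion, which mirrors the trade-off the abstract reports for Filtered Subset Evaluation.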

  10. Improving Bee Algorithm Based Feature Selection in Intrusion Detection System Using Membrane Computing

    Directory of Open Access Journals (Sweden)

    Kazeem I. Rufai

    2014-03-01

    Full Text Available Despite the great benefits accruing from the advent of computers and the internet, fraudulent and mischievous individuals constantly attempt to compromise the integrity, confidentiality or availability of electronic information systems. In cyber-security parlance, this is termed ‘intrusion’. This has necessitated the introduction of Intrusion Detection Systems (IDS) to help detect and curb different types of attack. However, given the high volume of data traffic involved in a network system, the effects of redundant and irrelevant data should be minimized if a high-quality intrusion detection mechanism is genuinely desired. Several approaches, especially feature subset selection using Bee Algorithm (BA), Linear Genetic Programming (LGP), Support Vector Decision Function Ranking (SVDF), Rough, Rough-DPSO, and Multivariate Regression Splines (MARS), have been advanced in the past to measure the dependability and quality of a typical IDS. The observed problem among these approaches has to do with their general performance, which motivated this research work. We propose a new but robust algorithm, called the membrane algorithm, to improve the Bee Algorithm based feature subset selection technique. The membrane computing paradigm is a class of parallel computing devices. Data were taken from the KDD-Cup 99 dataset, which is the accepted standard benchmark for intrusion detection. When the final results were compared to those of the existing approaches, using the three standard IDS measurements (Attack Detection, False Alarm and Classification Accuracy Rates), the Bee Algorithm-Membrane Computing (BA-MC) approach was found to be a better technique. This is because our approach produced a very high attack detection rate of 89.11%, a classification accuracy of 95.60% and a reasonable decrease in false alarm rate to 0.004. Receiver Operating Characteristic (ROC) curve was used for results

  11. Electrocardiogram Based Identification using a New Effective Intelligent Selection of Fused Features

    Science.gov (United States)

    Abbaspour, Hamidreza; Razavi, Seyyed Mohammad; Mehrshad, Nasser

    2015-01-01

    Over the years, the feasibility of using the Electrocardiogram (ECG) signal for human identification has been investigated, and some methods have been suggested. In this research, a new effective intelligent method for selecting features from ECG signals is proposed. The method is designed to select the features that are necessary for identification through analysis of the ECG signals. For this purpose, after ECG signal preprocessing, its characterizing features were extracted and then compressed using the cosine transform. The features most effective for identification are then selected from the characterizing features using a combination of the genetic algorithm and artificial neural networks. The proposed method was tested on three public ECG databases, namely the MIT-BIH Arrhythmias Database, the MIT-BIH Normal Sinus Rhythm Database and the European ST-T Database, in order to evaluate the proposed subject identification method on normal ECG signals as well as ECG signals with arrhythmias. Identification rates of 99.89%, 99.84% and 99.99% are obtained for these databases, respectively. The proposed algorithm exhibits remarkable identification accuracies not only with normal ECG signals, but also in the presence of various arrhythmias. Simulation results showed that the proposed method performs well in the identification task despite the low number of selected features. PMID:25709939

  12. SVM-RFE Based Feature Selection and Taguchi Parameters Optimization for Multiclass SVM Classifier

    Science.gov (United States)

    Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W. M.; Li, R. K.; Jiang, Bo-Ru

    2014-01-01

    Recently, the support vector machine (SVM) has shown excellent performance in classification and prediction and is widely used in disease diagnosis and medical assistance. However, SVM only functions well on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for the Dermatology and Zoo databases. The Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, the Taguchi method was combined with the SVM classifier in order to optimize parameters C and γ and increase classification accuracy for multiclass classification. The experimental results show that the classification accuracy can exceed 95% after SVM-RFE feature selection and Taguchi parameter optimization for the Dermatology and Zoo databases. PMID:25295306

  14. A Permutation Importance-Based Feature Selection Method for Short-Term Electricity Load Forecasting Using Random Forest

    Directory of Open Access Journals (Sweden)

    Nantian Huang

    2016-09-01

    Full Text Available The prediction accuracy of short-term load forecast (STLF) depends on prediction model choice and feature selection result. In this paper, a novel random forest (RF)-based feature selection method for STLF is proposed. First, 243 related features were extracted from historical load data and the time information of prediction points to form the original feature set. Subsequently, the original feature set was used to train an RF as the original model. After the training process, the prediction error of the original model on the test set was recorded and the permutation importance (PI) value of each feature was obtained. Then, an improved sequential backward search method was used to select the optimal forecasting feature subset based on the PI value of each feature. Finally, the optimal forecasting feature subset was used to train a new RF model as the final prediction model. Experiments showed that the prediction accuracy of RF trained by the optimal forecasting feature subset was higher than that of the original model and comparative models based on support vector regression and artificial neural network.
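
    The permutation importance idea used in this entry can be sketched in a model-agnostic way: shuffle one feature column, re-score the model, and record how much the error grows. The sketch below is not the paper's RF procedure. The function names, the use of mean absolute error, and the toy `predict` function are assumptions chosen to keep the example self-contained.

```python
import random


def mean_abs_error(predict, X, y):
    """Average absolute difference between predictions and targets."""
    return sum(abs(predict(row) - target) for row, target in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in error when each feature column is shuffled.

    X is a list of rows (lists); predict maps one row to a number.
    """
    rng = random.Random(seed)
    baseline = mean_abs_error(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mean_abs_error(predict, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


if __name__ == "__main__":
    # Hypothetical "model": the load depends on feature 0 only.
    def predict(row):
        return 2.0 * row[0]

    X = [[float(i), float((7 * i) % 5)] for i in range(10)]
    y = [predict(row) for row in X]
    print(permutation_importance(predict, X, y))
```

    A backward search such as the one in the abstract would then drop the features with the smallest importance first and retrain, keeping the subset with the lowest test error.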

  15. Diagnosis of Chronic Kidney Disease Based on Support Vector Machine by Feature Selection Methods.

    Science.gov (United States)

    Polat, Huseyin; Danaei Mehr, Homay; Cetin, Aydin

    2017-04-01

    As Chronic Kidney Disease progresses slowly, early detection and effective treatment are the only way to reduce the mortality rate. Machine learning techniques are gaining significance in medical diagnosis because of their ability to classify with high accuracy. The accuracy of classification algorithms depends on the use of suitable feature selection algorithms to reduce the dimension of datasets. In this study, the Support Vector Machine classification algorithm was used to diagnose Chronic Kidney Disease. Two essential types of feature selection methods, namely wrapper and filter approaches, were chosen to reduce the dimension of the Chronic Kidney Disease dataset. In the wrapper approach, a classifier subset evaluator with the greedy stepwise search engine and a wrapper subset evaluator with the Best First search engine were used. In the filter approach, a correlation feature selection subset evaluator with the greedy stepwise search engine and a filtered subset evaluator with the Best First search engine were used. The results showed that the Support Vector Machine classifier using the filtered subset evaluator with the Best First search engine has a higher accuracy rate (98.5%) in the diagnosis of Chronic Kidney Disease than the other selected methods.

  16. A Circular Polarizer with Beamforming Feature Based on Frequency Selective Surfaces

    Science.gov (United States)

    Yin, Jia Yuan; Wan, Xiang; Ren, Jian; Cui, Tie Jun

    2017-01-01

    We propose a circular polarizer with beamforming features based on frequency selective surface (FSS), in which a modified anchor-shaped unit cell is used to reach the circular polarizer function. The beamforming characteristic is realized by a particular design of the unit-phase distribution, which is obtained by varying the scale of the unit cell. Instead of using plane waves, a horn antenna is designed to feed the phase-variant FSS. The proposed two-layer FSS is fabricated and measured to verify the design. The measured results show that the proposed structure can convert the linearly polarized waves to circularly polarized waves. Compared with the feeding horn antenna, the transmitted beam of the FSS-added horn is 14.43° broader in one direction, while 3.77° narrower in the orthogonal direction. To our best knowledge, this is the first time to realize circular polarizer with beamforming as the extra function based on FSS, which is promising in satellite and communication systems for potential applications due to its simple design and good performance. PMID:28128345

  18. Human activity recognition based on feature selection in smart home using back-propagation algorithm.

    Science.gov (United States)

    Fang, Hongqing; He, Lei; Si, Hao; Liu, Peng; Xie, Xiaolei

    2014-09-01

    In this paper, the back-propagation (BP) algorithm has been used to train a feed-forward neural network for human activity recognition in smart home environments, and an inter-class distance method for feature selection of observed motion sensor events is discussed and tested. The human activity recognition performance of the neural network using the BP algorithm has then been evaluated and compared with other probabilistic algorithms: the Naïve Bayes (NB) classifier and the Hidden Markov Model (HMM). The results show that different feature datasets yield different activity recognition accuracy. The selection of unsuitable feature datasets increases the computational complexity and degrades the activity recognition accuracy. Furthermore, the neural network using the BP algorithm has relatively better human activity recognition performance than the NB classifier and HMM.

  19. Multi-Stage Feature Selection by Using Genetic Algorithms for Fault Diagnosis in Gearboxes Based on Vibration Signal

    Directory of Open Access Journals (Sweden)

    Mariela Cerrada

    2015-09-01

    Full Text Available There are growing demands for condition-based monitoring of gearboxes, and techniques to improve the reliability, effectiveness and accuracy for fault diagnosis are considered valuable contributions. Feature selection is still an important aspect in machine learning-based diagnosis in order to reach good performance in the diagnosis system. The main aim of this research is to propose a multi-stage feature selection mechanism for selecting the best set of condition parameters on the time, frequency and time-frequency domains, which are extracted from vibration signals for fault diagnosis purposes in gearboxes. The selection is based on genetic algorithms, proposing in each stage a new subset of the best features regarding the classifier performance in a supervised environment. The selected features are augmented at each stage and used as input for a neural network classifier in the next step, while a new subset of feature candidates is treated by the selection process. As a result, the inherent exploration and exploitation of the genetic algorithms for finding the best solutions of the selection problem are locally focused. The approach is tested on a dataset from a real test bed with several fault classes under different running conditions of load and velocity. The model performance for diagnosis is over 98%.
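
    As a rough illustration of genetic-algorithm feature selection, the sketch below evolves binary masks in which a 1 keeps a feature and a 0 drops it. It is a single-stage simplification, not the multi-stage scheme of the abstract. The operators (tournament selection, one-point crossover, bit-flip mutation, elitism), all parameter values, and the `toy_fitness` function are assumptions chosen for the example; in practice the fitness would be the classifier performance on the masked feature set.

```python
import random
from typing import Callable, List, Tuple

Mask = Tuple[int, ...]


def ga_select(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Mask:
    """Evolve binary feature masks and return the best one found."""
    rng = random.Random(seed)
    pop: List[Mask] = [
        tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(pop_size)
    ]
    best = max(pop, key=fitness)
    for _ in range(generations):
        next_pop: List[Mask] = [best]  # elitism: carry the best mask forward
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = tuple(b ^ 1 if rng.random() < mutation_rate else b for b in child)
            next_pop.append(child)
        pop = next_pop
        best = max(pop, key=fitness)
    return best


# Hypothetical fitness: reward three "relevant" features, penalise extras.
RELEVANT = {1, 4, 6}


def toy_fitness(mask: Mask) -> float:
    hits = sum(1 for i, b in enumerate(mask) if b and i in RELEVANT)
    extras = sum(1 for i, b in enumerate(mask) if b and i not in RELEVANT)
    return hits - 0.2 * extras


best_mask = ga_select(8, toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

    A multi-stage variant would presumably rerun such a search on a fresh pool of candidate features at each stage while keeping the features already selected.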

  20. [Oil atomic spectrometric feature selection by Parzen window based vague sets theory].

    Science.gov (United States)

    Xu, Chao; Zhang, Pei-Lin; Ren, Guo-Quan; Zhang, Xiao-Dong; Yang, Yu-Dong

    2011-02-01

    The large quantity and ambiguity of oil atomic spectrometric information greatly affect the efficiency and accuracy of fault diagnosis. A novel method for choosing fewer, more effective spectrometric features is presented. Based on a gearbox test bed, we simulated the normal wear state and two typical faults to acquire the lubricant samples. The three wear states are regarded as three vague sets, and spectrometric feature values are vague values on vague sets. Based on the similarity between vague values, mean vague sensibility (MVS) is defined to describe the degree of sensitivity of a spectrometric feature to the wear state. However, the membership degrees of vague sets depend greatly on human experience. The probability density distribution of the spectrometric data of the three wear states was therefore estimated with a Parzen window. Combined with the Bayesian formula, the range of vague set membership was calculated. Experimental results verify that the proposed method is effective in choosing highly fault-sensitive features from a large number of spectrometric features.
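
    The combination of a Parzen window density estimate with Bayes' formula can be sketched as below. The Gaussian kernel, the bandwidth `h`, the priors proportional to sample counts, and the sample values are all assumptions for illustration. The sketch yields posterior class probabilities for one feature value and does not reproduce the paper's vague-set membership ranges.

```python
import math
from typing import Dict, List


def parzen_density(x: float, samples: List[float], h: float) -> float:
    """Gaussian-kernel Parzen window estimate of the density at x."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


def posteriors(x: float, data: Dict[str, List[float]], h: float) -> Dict[str, float]:
    """Bayes' rule with priors proportional to the class sample counts."""
    total = sum(len(v) for v in data.values())
    joint = {c: (len(v) / total) * parzen_density(x, v, h) for c, v in data.items()}
    evidence = sum(joint.values())
    return {c: j / evidence for c, j in joint.items()}


# Invented readings of one spectrometric feature under three wear states.
wear_samples = {
    "normal": [1.0, 1.2, 0.9],
    "fault_a": [3.0, 3.3, 2.8],
    "fault_b": [5.1, 4.9, 5.3],
}
post = posteriors(3.1, wear_samples, h=0.5)
print(post)
```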

  1. Feature selection for wearable smartphone-based human activity recognition with able bodied, elderly, and stroke patients.

    Directory of Open Access Journals (Sweden)

    Nicole A Capela

    Full Text Available Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations.

  2. Feature selection for wearable smartphone-based human activity recognition with able bodied, elderly, and stroke patients.

    Science.gov (United States)

    Capela, Nicole A; Lemaire, Edward D; Baddour, Natalie

    2015-01-01

    Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations.

  3. A comparative study on classification of sleep stage based on EEG signals using feature selection and classification algorithms.

    Science.gov (United States)

    Şen, Baha; Peker, Musa; Çavuşoğlu, Abdullah; Çelebi, Fatih V

    2014-03-01

    Sleep scoring is one of the most important diagnostic methods in psychiatry and neurology. Sleep staging is a time consuming and difficult task undertaken by sleep experts. This study aims to identify a method which would classify sleep stages automatically and with a high degree of accuracy and, in this manner, will assist sleep experts. This study consists of three stages: feature extraction, feature selection from EEG signals, and classification of these signals. In the feature extraction stage, 20 attribute algorithms in four categories are used, and 41 feature parameters were obtained from these algorithms. Feature selection is important in the elimination of irrelevant and redundant features; in this manner, prediction accuracy is improved and computational overhead in classification is reduced. Effective feature selection algorithms such as minimum redundancy maximum relevance (mRMR); fast correlation based feature selection (FCBF); ReliefF; t-test; and Fisher score algorithms are preferred at the feature selection stage in selecting a set of features which best represent EEG signals. The features obtained are used as input parameters for the classification algorithms. At the classification stage, five different classification algorithms (random forest (RF); feed-forward neural network (FFNN); decision tree (DT); support vector machine (SVM); and radial basis function neural network (RBF)) classify the problem. The results, obtained from different classification algorithms, are provided so that a comparison can be made between computation times and accuracy rates. Finally, a classification accuracy of 97.03% is obtained using the proposed method. The results show that the proposed method can be used to design a new intelligent assistance sleep scoring system.
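
    Of the filter criteria listed above, the Fisher score is the simplest to write down: for one feature it is the between-class scatter of the class means divided by the within-class scatter. The sketch below uses one common formulation, which may differ in detail from the one used in the study. The labels and feature values are invented.

```python
from statistics import fmean, pvariance
from typing import Dict, List, Sequence


def fisher_score(values: Sequence[float], labels: Sequence[str]) -> float:
    """Between-class scatter over within-class scatter for a single feature."""
    groups: Dict[str, List[float]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    overall = fmean(values)
    between = sum(len(g) * (fmean(g) - overall) ** 2 for g in groups.values())
    within = sum(len(g) * pvariance(g) for g in groups.values())
    return between / within if within > 0 else float("inf")


# Invented data: feature_a separates the two stages, feature_b does not.
labels = ["wake", "wake", "wake", "rem", "rem", "rem"]
feature_a = [1.0, 1.1, 0.9, 3.0, 3.1, 2.9]
feature_b = [1.0, 3.0, 2.0, 2.0, 1.0, 3.0]

score_a = fisher_score(feature_a, labels)
score_b = fisher_score(feature_b, labels)
print(score_a, score_b)
```

    Ranking all features by this score and keeping the top few gives a classifier-independent filter.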

  4. A fuzzy based feature selection from independent component subspace for machine learning classification of microarray data

    Directory of Open Access Journals (Sweden)

    Rabia Aziz

    2016-06-01

    Full Text Available Feature (gene) selection and classification of microarray data are the two most interesting machine learning challenges. In the present work, two existing feature selection/extraction algorithms, namely independent component analysis (ICA) and fuzzy backward feature elimination (FBFE), are used in a new combination of selection/extraction. The main objective of this paper is to select the independent components of the DNA microarray data using FBFE to improve the performance of support vector machine (SVM) and Naïve Bayes (NB) classifiers, while keeping the computational expense affordable. To show the validity of the proposed method, it is applied to reduce the number of genes for five DNA microarray datasets, namely colon cancer, acute leukemia, prostate cancer, lung cancer II, and high-grade glioma. These datasets are then classified using SVM and NB classifiers. Experimental results on these five microarray datasets demonstrate that genes selected by the proposed approach effectively improve the performance of SVM and NB classifiers in terms of classification accuracy. We compare our proposed method with principal component analysis (PCA) as a standard extraction algorithm and find that the proposed method can obtain better classification accuracy, using SVM and NB classifiers with a smaller number of selected genes than PCA. The curve between the average error rate and number of genes for each dataset indicates the number of genes required for the highest accuracy with our proposed method for both classifiers. ROC analysis shows the best subset of genes for both classifiers on the different datasets with the proposed method.

  5. BLProt: Prediction of bioluminescent proteins based on support vector machine and relieff feature selection

    KAUST Repository

    Kandaswamy, Krishna Kumar

    2011-08-17

    Background: Bioluminescence is a process in which light is emitted by a living organism. Most creatures that emit light are sea creatures, but some insects, plants, fungi, etc., also emit light. The biotechnological application of bioluminescence has become routine and is considered essential for many medical and general technological advances. Identification of bioluminescent proteins is more challenging due to their poor similarity in sequence. So far, no specific method has been reported to identify bioluminescent proteins from primary sequence. Results: In this paper, we propose a novel predictive method that uses a Support Vector Machine (SVM) and physicochemical properties to predict bioluminescent proteins. BLProt was trained using a dataset consisting of 300 bioluminescent proteins and 300 non-bioluminescent proteins, and evaluated by an independent set of 141 bioluminescent proteins and 18202 non-bioluminescent proteins. To identify the most prominent features, we carried out feature selection with three different filter approaches, ReliefF, infogain, and mRMR. We selected five different feature subsets by decreasing the number of features, and the performance of each feature subset was evaluated. Conclusion: BLProt achieves 80% accuracy from training (5 fold cross-validations) and 80.06% accuracy from testing. The performance of BLProt was compared with BLAST and HMM. High prediction accuracy and successful prediction of hypothetical proteins suggests that BLProt can be a useful approach to identify bioluminescent proteins from sequence information, irrespective of their sequence similarity. 2011 Kandaswamy et al; licensee BioMed Central Ltd.

  6. Empirical Validation of Objective Functions in Feature Selection Based on Acceleration Motion Segmentation Data

    Directory of Open Access Journals (Sweden)

    Jong Gwan Lim

    2015-01-01

    Full Text Available A recent change in evaluation criteria from accuracy alone to a trade-off with time delay has inspired multivariate energy-based approaches in motion segmentation using acceleration. The essence of multivariate approaches lies in the construction of highly dimensional energy and requires feature subset selection in machine learning. Because they are fast, filter methods are preferred; however, their poorer estimates are one of the main concerns. This paper aims at empirical validation of three objective functions for filter approaches, Fisher discriminant ratio, multiple correlation (MC), and mutual information (MI), through two subsequent experiments. With respect to the 63 possible subsets of 6 variables for acceleration motion segmentation, the three functions in addition to a theoretical measure are compared with two wrappers, k-nearest neighbor and Bayes classifiers, in general statistics and strongly relevant variable identification by social network analysis. Then four kinds of newly proposed multivariate energy are compared with a conventional univariate approach in terms of accuracy and time delay. Finally, it appears that MC and MI are acceptable enough to match the estimate of the two wrappers, and multivariate approaches are justified with our analytic procedures.
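
    The figure of 63 subsets follows from counting the non-empty subsets of 6 variables, 2^6 - 1 = 63. A short enumeration makes this concrete; the variable names are placeholders, since the abstract does not name the six variables.

```python
from itertools import combinations

# Placeholder names for the six acceleration variables.
variables = ["v1", "v2", "v3", "v4", "v5", "v6"]

# Every non-empty subset, grouped by size k = 1..6.
subsets = [
    combo
    for k in range(1, len(variables) + 1)
    for combo in combinations(variables, k)
]
sizes = [sum(1 for s in subsets if len(s) == k) for k in range(1, 7)]

print(len(subsets))  # 63
print(sizes)         # [6, 15, 20, 15, 6, 1]
```

    With so few subsets, an objective function can be evaluated exhaustively on every candidate, which is what makes a full comparison between filters and wrappers feasible here.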

  7. A Flexible Mechanism of Rule Selection Enables Rapid Feature-Based Reinforcement Learning.

    Science.gov (United States)

    Balcarras, Matthew; Womelsdorf, Thilo

    2016-01-01

    Learning in a new environment is influenced by prior learning and experience. Correctly applying a rule that maps a context to stimuli, actions, and outcomes enables faster learning and better outcomes compared to relying on strategies for learning that are ignorant of task structure. However, it is often difficult to know when and how to apply learned rules in new contexts. In our study we explored how subjects employ different strategies for learning the relationship between stimulus features and positive outcomes in a probabilistic task context. We test the hypothesis that task naive subjects will show enhanced learning of feature specific reward associations by switching to the use of an abstract rule that associates stimuli by feature type and restricts selections to that dimension. To test this hypothesis we designed a decision making task where subjects receive probabilistic feedback following choices between pairs of stimuli. In the task, trials are grouped in two contexts by blocks, where in one type of block there is no unique relationship between a specific feature dimension (stimulus shape or color) and positive outcomes, and following an un-cued transition, alternating blocks have outcomes that are linked to either stimulus shape or color. Two-thirds of subjects (n = 22/32) exhibited behavior that was best fit by a hierarchical feature-rule model. Supporting the prediction of the model mechanism these subjects showed significantly enhanced performance in feature-reward blocks, and rapidly switched their choice strategy to using abstract feature rules when reward contingencies changed. Choice behavior of other subjects (n = 10/32) was fit by a range of alternative reinforcement learning models representing strategies that do not benefit from applying previously learned rules. In summary, these results show that untrained subjects are capable of flexibly shifting between behavioral rules by leveraging simple model-free reinforcement learning and context
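
    The simple model-free component mentioned above can be illustrated with a delta-rule learner that keeps one value per stimulus feature. This is a generic sketch, not the hierarchical feature-rule model fitted in the study. The learning rate `alpha`, the softmax inverse temperature `beta`, the deterministic rewards, and the stimuli are assumptions made for the example.

```python
import math
from typing import Dict, Iterable


def update_values(
    values: Dict[str, float],
    chosen_features: Iterable[str],
    reward: float,
    alpha: float = 0.3,
) -> None:
    """Delta rule: move each chosen feature's value toward the obtained reward."""
    for f in chosen_features:
        values[f] += alpha * (reward - values[f])


def choice_prob(
    values: Dict[str, float],
    option_a: Iterable[str],
    option_b: Iterable[str],
    beta: float = 5.0,
) -> float:
    """Softmax probability of choosing option A over option B."""
    va = sum(values[f] for f in option_a)
    vb = sum(values[f] for f in option_b)
    return 1.0 / (1.0 + math.exp(-beta * (va - vb)))


# Toy block in which reward is linked to colour only (red pays, blue does not).
values = {"red": 0.0, "blue": 0.0, "circle": 0.0, "square": 0.0}
trials = [
    (("red", "circle"), 1.0),
    (("blue", "circle"), 0.0),
    (("red", "square"), 1.0),
    (("blue", "square"), 0.0),
]
for _ in range(10):
    for features, reward in trials:
        update_values(values, features, reward)

p_red = choice_prob(values, ("red", "circle"), ("blue", "circle"))
print(values, p_red)
```

    After training, the colour values separate while the shape values hover in between. A rule-based learner could exploit this by restricting updates and choices to the colour dimension.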

  8. A flexible mechanism of rule selection enables rapid feature-based reinforcement learning

    Directory of Open Access Journals (Sweden)

    Matthew Balcarras

    2016-03-01

    Full Text Available Learning in a new environment is influenced by prior learning and experience. Correctly applying a rule that maps a context to stimuli, actions, and outcomes enables faster learning and better outcomes compared to relying on strategies for learning that are ignorant of task structure. However, it is often difficult to know when and how to apply learned rules in new contexts. In our study we explored how subjects employ different strategies for learning the relationship between stimulus features and positive outcomes in a probabilistic task context. We test the hypothesis that task naive subjects will show enhanced learning of feature specific reward associations by switching to the use of an abstract rule that associates stimuli by feature type and restricts selections to that dimension. To test this hypothesis we designed a decision making task where subjects receive probabilistic feedback following choices between pairs of stimuli. In the task, trials are grouped in two contexts by blocks, where in one type of block there is no unique relationship between a specific feature dimension (stimulus shape or colour) and positive outcomes, and following an un-cued transition, alternating blocks have outcomes that are linked to either stimulus shape or colour. Two-thirds of subjects (n=22/32) exhibited behaviour that was best fit by a hierarchical feature-rule model. Supporting the prediction of the model mechanism these subjects showed significantly enhanced performance in feature-reward blocks, and rapidly switched their choice strategy to using abstract feature rules when reward contingencies changed. Choice behaviour of other subjects (n=10/32) was fit by a range of alternative reinforcement learning models representing strategies that do not benefit from applying previously learned rules. In summary, these results show that untrained subjects are capable of flexibly shifting between behavioural rules by leveraging simple model-free reinforcement

  9. THE FEATURE SUBSET SELECTION ALGORITHM

    Institute of Scientific and Technical Information of China (English)

    Liu Yongguo; Li Xueming; et al.

    2003-01-01

    The motivation of data mining is how to extract effective information from huge data in very large database. However, some redundant and irrelevant attributes, which result in low performance and high computing complexity, are included in the very large database in general. So, Feature Subset Selection (FSS) becomes one important issue in the field of data mining. In this letter, an FSS model based on the filter approach is built, which uses the simulated annealing genetic algorithm. Experimental results show that convergence and stability of this algorithm are adequately achieved.

  10. THE FEATURE SUBSET SELECTION ALGORITHM

    Institute of Scientific and Technical Information of China (English)

    Liu Yongguo; Li Xueming; Wu Zhongfu

    2003-01-01

    The motivation of data mining is how to extract effective information from huge data in very large database. However, some redundant and irrelevant attributes, which result in low performance and high computing complexity, are included in the very large database in general. So, Feature Subset Selection (FSS) becomes one important issue in the field of data mining. In this letter, an FSS model based on the filter approach is built, which uses the simulated annealing genetic algorithm. Experimental results show that convergence and stability of this algorithm are adequately achieved.

  11. Feature Extraction and Selection Scheme for Intelligent Engine Fault Diagnosis Based on 2DNMF, Mutual Information, and NSGA-II

    Directory of Open Access Journals (Sweden)

    Peng-yuan Liu

    2016-01-01

    Full Text Available A novel feature extraction and selection scheme is presented for intelligent engine fault diagnosis by utilizing two-dimensional nonnegative matrix factorization (2DNMF), mutual information, and nondominated sorting genetic algorithms II (NSGA-II). Experiments are conducted on an engine test rig, in which eight different engine operating conditions including one normal condition and seven fault conditions are simulated, to evaluate the presented feature extraction and selection scheme. In the phase of feature extraction, the S transform technique is firstly utilized to convert the engine vibration signals to the time-frequency domain, which can provide richer information on engine operating conditions. Then a novel feature extraction technique, named two-dimensional nonnegative matrix factorization, is employed for characterizing the time-frequency representations. In the feature selection phase, a hybrid filter and wrapper scheme based on mutual information and NSGA-II is utilized to acquire a compact feature subset for engine fault diagnosis. Experimental results obtained with three different classifiers have demonstrated that the proposed feature extraction and selection scheme can achieve a very satisfying classification performance with fewer features for engine fault diagnosis.
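
    NSGA-II ranks candidate solutions by nondominated sorting. For feature selection, two natural objectives to minimise are the classification error and the number of features, and the sketch below extracts only the first nondominated (Pareto) front for that pair. The choice of objectives and the candidate values are assumptions for illustration; the full algorithm also involves further fronts, crowding distance, and genetic operators.

```python
from typing import List, Tuple

Point = Tuple[float, int]  # (classification error, number of features)


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse in both objectives and better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Invented (error, feature count) pairs for five candidate subsets.
candidates = [(0.10, 12), (0.08, 20), (0.15, 5), (0.12, 12), (0.08, 25)]
front = pareto_front(candidates)
print(front)  # [(0.1, 12), (0.08, 20), (0.15, 5)]
```

    Here (0.12, 12) is dropped because (0.10, 12) has a lower error with the same size, and (0.08, 25) is dropped because (0.08, 20) matches its error with fewer features.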

  12. Evaluation of feature selection algorithms for classification in temporal lobe epilepsy based on MR images

    Science.gov (United States)

    Lai, Chunren; Guo, Shengwen; Cheng, Lina; Wang, Wensheng; Wu, Kai

    2017-02-01

    It is very important to differentiate temporal lobe epilepsy (TLE) patients from healthy people and to localize the abnormal brain regions of TLE patients. Cortical features and changes can reveal the unique anatomical patterns of brain regions from structural MR images. In this study, structural MR images from 28 normal controls (NC), 18 left TLE (LTLE), and 21 right TLE (RTLE) were acquired, and four types of cortical feature, namely cortical thickness (CTh), cortical surface area (CSA), gray matter volume (GMV), and mean curvature (MCu), were explored for discriminative analysis. Three feature selection methods, the independent sample t-test filtering, the sparse-constrained dimensionality reduction model (SCDRM), and the support vector machine-recursive feature elimination (SVM-RFE), were investigated to extract dominant regions with significant differences among the compared groups for classification using the SVM classifier. The results showed that the SVM-RFE achieved the highest performance (most classifications with more than 92% accuracy), followed by the SCDRM and the t-test. In particular, the surface area and gray matter volume exhibited prominent discriminative ability, and the performance of the SVM was improved significantly when the four cortical features were combined. Additionally, the dominant regions with higher classification weights were mainly located in the temporal and frontal lobes, including the inferior temporal, entorhinal cortex, fusiform, parahippocampal cortex, middle frontal and frontal pole. It was demonstrated that the cortical features provided effective information to determine the abnormal anatomical pattern and that the proposed method has the potential to improve the clinical diagnosis of TLE.

  13. Feature Selection and Classification of Electroencephalographic Signals: An Artificial Neural Network and Genetic Algorithm Based Approach.

    Science.gov (United States)

    Erguzel, Turker Tekin; Ozekes, Serhat; Tan, Oguz; Gultekin, Selahattin

    2015-10-01

    Feature selection is an important step in many pattern recognition systems aiming to overcome the so-called curse of dimensionality. In this study, an optimized classification method was tested in 147 patients with major depressive disorder (MDD) treated with repetitive transcranial magnetic stimulation (rTMS). The performance of the combination of a genetic algorithm (GA) and a back-propagation (BP) neural network (BPNN) was evaluated using 6-channel pre-rTMS electroencephalographic (EEG) patterns of theta and delta frequency bands. The GA was first used to eliminate the redundant and less discriminant features to maximize classification performance. The BPNN was then applied to test the performance of the feature subset. Finally, classification performance using the subset was evaluated using 6-fold cross-validation. Although the slow bands of the frontal electrodes are widely used to collect EEG data for patients with MDD and provide quite satisfactory classification results, the outcomes of the proposed approach indicate noticeably increased overall accuracy of 89.12% and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.904 using the reduced feature set.
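
    The 6-fold cross-validation on 147 patients implies folds of unequal size, since 147 = 6 × 24 + 3. A minimal index splitter shows one way this works out. How the study actually assigned patients to folds is not stated, so the contiguous, unshuffled split below is an assumption.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; the first n_samples % k folds get one extra sample."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


fold_sizes = [len(test) for _, test in k_fold_indices(147, 6)]
print(fold_sizes)  # [25, 25, 25, 24, 24, 24]
```

    Each patient appears in exactly one test fold, so every sample is used once for evaluation and five times for training.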

  14. Genetic search feature selection for affective modeling

    DEFF Research Database (Denmark)

    Martínez, Héctor P.; Yannakakis, Georgios N.

    2010-01-01

    Automatic feature selection is a critical step towards the generation of successful computational models of affect. This paper presents a genetic search-based feature selection method which is developed as a global-search algorithm for improving the accuracy of the affective models built...

  15. Feature Selection: Algorithms and Challenges

    Institute of Scientific and Technical Information of China (English)

    Xindong Wu; Yanglan Gan; Hao Wang; Xuegang Hu

    2006-01-01

    Feature selection is an active area in data mining research and development. It consists of efforts and contributions from a wide variety of communities, including statistics, machine learning, and pattern recognition. The diversity, on one hand, equips us with many methods and tools. On the other hand, the profusion of options causes confusion. This paper reviews various feature selection methods and identifies research challenges that are at the forefront of this exciting area.

  16. Feature Selection By KDDA For SVM-Based MultiView Face Recognition

    CERN Document Server

    Valiollahzadeh, Seyyed Majid; Nazari, Mohammad

    2008-01-01

    Applications such as face recognition (FR) that deal with high-dimensional data need a mapping technique that introduces a representation of low-dimensional features with enhanced discriminatory power and a proper classifier, able to classify those complex features. Most traditional Linear Discriminant Analysis methods suffer from the disadvantage that their optimality criteria are not directly related to the classification ability of the obtained feature representation. Moreover, their classification accuracy is affected by the "small sample size" problem which is often encountered in FR tasks. In this short paper, we combine a nonlinear kernel-based mapping of data called KDDA with a Support Vector Machine classifier to deal with both of these shortcomings in an efficient and cost-effective manner. The proposed method is compared, in terms of classification accuracy, to other commonly used FR methods on the UMIST face database. Results indicate that the performance of the proposed method is overall superior to those of trad...

  17. A Study for the Feature Selection to Identify GIEMSA-Stained Human Chromosomes Based on Artificial Neural Network

    Science.gov (United States)

    2007-11-02

    neural network (ANN) has been adopted for the human chromosome classification. It is important to select optimum features for training neural network...Many studies for computer-based chromosome analysis have shown that it is possible to classify chromosomes into 24 subgroups. In addition, artificial

  18. Prediction of antimicrobial peptides based on sequence alignment and feature selection methods.

    Directory of Open Access Journals (Sweden)

    Ping Wang

    Full Text Available Antimicrobial peptides (AMPs) represent a class of natural peptides that form a part of the innate immune system, and this kind of 'nature's antibiotics' is quite promising for solving the problem of increasing antibiotic resistance. In view of this, it is highly desired to develop an effective computational method for accurately predicting novel AMPs because it can provide us with more candidates and useful insights for drug design. In this study, a new method for predicting AMPs was implemented by integrating the sequence alignment method and the feature selection method. It was observed that the overall jackknife success rate by the new predictor on a newly constructed benchmark dataset was over 80.23%, and the Matthews correlation coefficient is 0.73, indicating a good prediction. Moreover, it is indicated by an in-depth feature analysis that the results are quite consistent with the previously known knowledge that some amino acids are preferential in AMPs and that these amino acids do play an important role for the antimicrobial activity. For the convenience of most experimental scientists who want to use the prediction method without the interest to follow the mathematical details, a user-friendly web-server is provided at http://amp.biosino.org/.
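
    The Matthews correlation coefficient quoted above is computed from the four cells of a binary confusion matrix. The function below implements the standard formula. The counts in the example are invented and are not the study's results.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Invented confusion-matrix counts for illustration.
mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
print(round(mcc, 3))  # 0.751
```

    Unlike plain accuracy, the coefficient ranges from -1 to 1 and stays informative when the two classes are of very different sizes.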

  19. Vinegar Classification Based on Feature Extraction and Selection From Tin Oxide Gas Sensor Array Data

    Directory of Open Access Journals (Sweden)

    Huang Xingyi

    2003-03-01

    Full Text Available Devices based on tin oxide gas sensor arrays are often cited in publications dealing with food products. However, when a tin oxide gas sensor array is used to analyze and identify different gases, the most important and difficult task is to obtain useful parameters from the sensors and to optimize them so that the array can identify the gas rapidly and accurately, and no convenient method for this has been available. For this reason, we developed a device in which the gas sensor array reacts with the gas from vinegar. The parameters of the sensor response were extracted after recording the data of the whole response process. To determine whether a feature parameter is optimal, this paper proposes a new method called “distinguish index” (DI). This ensures that the feature parameter is useful in the later pattern recognition process. Principal component analysis (PCA) and an artificial neural network (ANN) were used to combine the optimum feature parameters. Good separation among the gases from different vinegars is obtained using principal component analysis. The recognition probability of the ANN is 98%. The new method can also be applied to other pattern recognition problems.

  20. Graph-based unsupervised feature selection and multiview clustering for microarray data

    Indian Academy of Sciences (India)

    Tripti Swarnkar; Pabitra Mitra

    2015-10-01

    A challenge in bioinformatics is to analyse volumes of gene expression data generated through microarray experiments and obtain useful information. Consequently, most microarray studies demand complex data analysis to infer biologically meaningful information from such high-throughput data. Selection of informative genes is an important data analysis step to identify a set of genes which can further help in finding the biological information embedded in microarray data, and thus assists in diagnosis, prognosis and treatment of the disease. In this article we present an unsupervised feature selection technique which attempts to address the goal of explorative data analysis, unfolding the multi-faceted nature of data. It focuses on extracting multiple clustering views considering the diversity of each view from high-dimensional data. We evaluated our technique on benchmark data sets and the experimental results indicate the potential and effectiveness of the proposed model in comparison to the traditional single view clustering models, as well as other existing methods used in the literature for the studied datasets.

  1. Mean shift texture surface detection based on WT and COM feature image selection

    Institute of Scientific and Technical Information of China (English)

    HAN Yan-fang; SHI Peng-fei

    2006-01-01

    Mean shift is a widely used clustering algorithm in image segmentation. However, the segmentation results are not as good as expected on textured surfaces because of the influence of the textures. Therefore, an approach based on wavelet transform (WT), co-occurrence matrix (COM) and mean shift is proposed in this paper. First, WT and COM are employed to extract the optimal resolution approximation of the original image as the feature image. Then, mean shift is applied to the feature image to obtain better detection results. Finally, experiments show that this approach is effective.
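
    The mean shift step on its own can be sketched in a few lines. This is a minimal one-dimensional version with a flat kernel, assuming the feature image has already been reduced to a list of scalar feature values; the WT and COM feature extraction described above is not reproduced, and the function names are illustrative.

```python
def mean_shift_1d(points, bandwidth, max_iter=100, tol=1e-6):
    """Move each point to the mean of its neighbours until it stops moving."""
    modes = []
    for p in points:
        x = float(p)
        for _ in range(max_iter):
            # flat kernel: every sample within one bandwidth counts equally
            window = [q for q in points if abs(q - x) <= bandwidth]
            new_x = sum(window) / len(window)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)
    return modes


def label_modes(modes, bandwidth):
    """Give the same label to converged positions within half a bandwidth."""
    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if abs(m - c) <= bandwidth / 2:
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels
```

    For an image, the same update would run over feature vectors per pixel rather than scalars, with the absolute difference replaced by a vector distance.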

  2. Restricted Bipartite Graphs Based Target Detection for Hyperspectral Image Classification with GFA-LFDA Multi Feature Selection

    Directory of Open Access Journals (Sweden)

    T. Karthikeyan

    2015-06-01

    Full Text Available Hyperspectral imaging has recently become one of the most active research areas in remote sensing. Hyperspectral imagery possesses more spectral information than multispectral imagery because the number of spectral bands is in the hundreds rather than in the tens. However, the high dimensionality of hyperspectral images (HSI) causes redundancy in the spatial-spectral feature domain, existing approaches consider only spectral and spatial features, and the classifier must perform well even when training HSI images are limited. Unless suitable algorithms are developed, target detection or classification of hyperspectral image data becomes difficult. It is therefore essential to consider different features and to achieve an exact target detection rate in order to improve the classification rate. To overcome this problem, this study presents a novel classification framework for hyperspectral data. The proposed system uses a graph-based representation, Restricted Bipartite Graphs (RBG), for exact detection of the class values. Before that, the features of the HSI images are selected using the Gaussian Firefly Algorithm (GFA) for multiple feature selection, and Local Fisher’s Discriminant Analysis (LFDA) based feature projection is performed in the raw spectral-spatial feature space for effective dimensionality reduction. RBG is then used to represent the reduced features as a graph in order to solve the exact target class matching problem in hyperspectral imagery. Classification is performed using the Hybrid Genetic Fuzzy Neural Network (HGFNN), in which a genetic algorithm optimizes the weights of the fuzzifier and the defuzzifier for labeled and unlabeled data samples. Experimental results show that the proposed GFA-LFDA-RBG-HGFNN method achieves higher classification accuracy and fewer misclassifications than traditional methods.

  3. Feature Selection and Effective Classifiers.

    Science.gov (United States)

    Deogun, Jitender S.; Choubey, Suresh K.; Raghavan, Vijay V.; Sever, Hayri

    1998-01-01

    Develops and analyzes four algorithms for feature selection in the context of rough set methodology. Experimental results confirm the expected relationship between the time complexity of these algorithms and the classification accuracy of the resulting upper classifiers. When compared, results of upper classifiers perform better than lower…

  4. Prominent feature selection of microarray data

    Institute of Scientific and Technical Information of China (English)

    Yihui Liu

    2009-01-01

    For wavelet transform, a set of orthogonal wavelet bases aims to detect the localized changing features contained in microarray data. In this research, we investigate the performance of the selected wavelet features based on wavelet detail coefficients at the second level and the third level. The genetic algorithm is performed to optimize wavelet detail coefficients to select the best discriminant features. Experiments are carried out on four microarray datasets to evaluate the performance of classification. Experimental results prove that wavelet features optimized from detail coefficients efficiently characterize the differences between normal tissues and cancer tissues.

  5. Feature Engineering for Drug Name Recognition in Biomedical Texts: Feature Conjunction and Feature Selection

    Directory of Open Access Journals (Sweden)

    Shengyu Liu

    2015-01-01

    Full Text Available Drug name recognition (DNR) is a critical step for drug information extraction. Machine learning-based methods have been widely used for DNR with various types of features such as part-of-speech, word shape, and dictionary features. Features used in current machine learning-based methods are usually singleton features, which may be because combining singleton features into conjunction features produces an explosion of features and a large number of noisy features. However, singleton features that can only capture one linguistic characteristic of a word are not sufficient to describe the information for DNR when multiple characteristics should be considered. In this study, we explore feature conjunction and feature selection for DNR, which have never been reported. We intuitively select 8 types of singleton features and combine them into conjunction features in two ways. Then, Chi-square, mutual information, and information gain are used to mine effective features. Experimental results show that feature conjunction and feature selection can improve the performance of the DNR system with a moderate number of features and our DNR system significantly outperforms the best system in the DDIExtraction 2013 challenge.
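
    A rough sketch of the two ideas in this abstract, under stated assumptions: each token is represented as a set of singleton feature strings, conjunction features are formed as all unordered pairs (the paper's own two combination schemes are not described in the abstract), and only the Chi-square criterion is shown. Feature names such as `cap` and `suffix=ine` are invented for illustration.

```python
from itertools import combinations


def conjoin(features):
    """Return the singleton features plus all pairwise conjunction features."""
    names = sorted(features)
    pairs = {f"{a}&{b}" for a, b in combinations(names, 2)}
    return set(features) | pairs


def chi_square(examples, labels, feature, positive):
    """Chi-square statistic of the 2x2 table: feature present/absent vs. class."""
    n = len(examples)
    a = sum(1 for f, y in zip(examples, labels) if feature in f and y == positive)
    b = sum(1 for f, y in zip(examples, labels) if feature in f and y != positive)
    c = sum(1 for f, y in zip(examples, labels) if feature not in f and y == positive)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom
```

    Ranking every singleton and conjunction feature by this statistic and keeping the top-scoring ones is one way to limit the feature explosion that conjunctions cause.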

  6. Feature engineering for drug name recognition in biomedical texts: feature conjunction and feature selection.

    Science.gov (United States)

    Liu, Shengyu; Tang, Buzhou; Chen, Qingcai; Wang, Xiaolong; Fan, Xiaoming

    2015-01-01

    Drug name recognition (DNR) is a critical step for drug information extraction. Machine learning-based methods have been widely used for DNR with various types of features such as part-of-speech, word shape, and dictionary features. Features used in current machine learning-based methods are usually singleton features, which may be because combining singleton features into conjunction features produces an explosion of features and a large number of noisy features. However, singleton features that can only capture one linguistic characteristic of a word are not sufficient to describe the information for DNR when multiple characteristics should be considered. In this study, we explore feature conjunction and feature selection for DNR, which have never been reported. We intuitively select 8 types of singleton features and combine them into conjunction features in two ways. Then, Chi-square, mutual information, and information gain are used to mine effective features. Experimental results show that feature conjunction and feature selection can improve the performance of the DNR system with a moderate number of features and our DNR system significantly outperforms the best system in the DDIExtraction 2013 challenge.

  7. Multi-Features Encoding and Selecting Based on Genetic Algorithm for Human Action Recognition from Video

    Directory of Open Access Journals (Sweden)

    Chenglong Yu

    2013-05-01

    Full Text Available In this study, we propose an encoding of multiple local features for recognizing human actions. The multiple local features are obtained from a simple feature description of human actions in video. The simple features are two important kinds of feature, optical flow and edge, which represent human perception of behavior in video. As video information descriptors, optical flow and edge are very fast to compute and require very little memory, and they represent motion information and shape information, respectively. Furthermore, key local multi-features are extracted and encoded by a genetic algorithm (GA) in order to reduce the computational complexity of the algorithm. Finally, a Multi-SVM classifier is applied to discriminate the human actions.

  8. A wrapper-based approach for feature selection and classification of major depressive disorder-bipolar disorders.

    Science.gov (United States)

    Tekin Erguzel, Turker; Tas, Cumhur; Cebi, Merve

    2015-09-01

    Feature selection (FS) and classification are consecutive artificial intelligence (AI) methods used in data analysis, pattern classification, data mining and medical informatics. Besides promising studies in the application of AI methods to health informatics, working with more informative features is crucial in order to contribute to early diagnosis. Depressive episodes of bipolar disorder (BD), one of the prevalent psychiatric disorders, are often misdiagnosed as major depressive disorder (MDD), leading to suboptimal therapy and poor outcomes. Therefore, discriminating MDD and BD at earlier stages of illness could help to facilitate efficient and specific treatment. In this study, a novel nature-inspired FS algorithm based on standard Ant Colony Optimization (ACO), called improved ACO (IACO), was used to reduce the number of features by removing irrelevant and redundant data. The selected features were then fed into a support vector machine (SVM), a powerful mathematical tool for data classification, regression, function estimation and modeling processes, in order to classify MDD and BD subjects. The proposed method used coherence values, a promising quantitative electroencephalography (EEG) biomarker, calculated from the alpha, theta and delta frequency bands. The noteworthy performance of the novel IACO-SVM approach showed that it is possible to discriminate 46 BD and 55 MDD subjects using 22 of 48 features with 80.19% overall classification accuracy. The performance of the IACO algorithm was also compared to that of standard ACO, genetic algorithm (GA) and particle swarm optimization (PSO) algorithms in terms of classification accuracy and number of selected features. In order to provide an almost unbiased estimate of classification error, the validation process was performed using a nested cross-validation (CV) procedure.
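
    The nested cross-validation mentioned at the end of this abstract can be sketched as index bookkeeping. This is a generic outline, not the authors' pipeline: `fit_score` is a hypothetical callback that trains on one index list and returns a score on another, and `candidates` stands for whatever is being chosen in the inner loop (for example, feature subsets proposed by a wrapper).

```python
def k_fold_indices(indices, k):
    """Split a list of indices into k nearly equal, disjoint folds."""
    return [indices[i::k] for i in range(k)]


def nested_cv(n_samples, outer_k, inner_k, candidates, fit_score):
    """Return one held-out score per outer fold.

    fit_score(train_idx, test_idx, candidate) -> float is supplied by the caller.
    """
    all_idx = list(range(n_samples))
    outer_scores = []
    for outer_test in k_fold_indices(all_idx, outer_k):
        held_out = set(outer_test)
        outer_train = [i for i in all_idx if i not in held_out]

        def inner_score(candidate):
            # the inner loop sees only outer_train, never the outer test fold
            total = 0.0
            for inner_val in k_fold_indices(outer_train, inner_k):
                val = set(inner_val)
                inner_train = [i for i in outer_train if i not in val]
                total += fit_score(inner_train, inner_val, candidate)
            return total / inner_k

        best = max(candidates, key=inner_score)
        outer_scores.append(fit_score(outer_train, outer_test, best))
    return outer_scores
```

    Because the candidate is selected without touching the outer test fold, the averaged outer scores are not inflated by the selection step, which is the reason the abstract calls the estimate almost unbiased.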

  9. Building an intrusion detection system using a filter-based feature selection algorithm

    NARCIS (Netherlands)

    Ambusaidi, Mohammed A.; He, Xiangjian; Nanda, Priyadarsi; Tan, Zhiyuan

    2016-01-01

    Redundant and irrelevant features in data have caused a long-term problem in network traffic classification. These features not only slow down the process of classification but also prevent a classifier from making accurate decisions, especially when coping with big data. In this paper, we propose a

  10. ECG Signal Feature Selection for Emotion Recognition

    Directory of Open Access Journals (Sweden)

    Lichen Xun

    2013-01-01

    Full Text Available This paper studies the selection of ECG-based features for emotion recognition. In the feature selection process, we start from existing feature selection algorithms and also pay special attention to some intuitive values of the ECG waveform. Using ANOVA and heuristic search, we picked out the features that distinguish the two emotions joy and pleasure, and then combined this with a pathological analysis of ECG signals from the viewpoint of medical experts to discuss the logical correspondence between ECG waveform and emotion discrimination. In experiments, the method in this paper selected only five features and reached an accuracy of 92% in the recognition of joy and pleasure.
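
    The ANOVA part of this selection can be illustrated with a one-way F statistic computed per feature. This sketch assumes each feature is a numeric column and that at least two emotion classes are present; the heuristic search step is not shown, and any feature names passed in are the caller's own.

```python
def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across the classes."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    if ss_within == 0:
        return float("inf")  # classes are perfectly separated by this feature
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(columns, labels):
    """Sort feature names by decreasing F statistic."""
    return sorted(columns, key=lambda name: anova_f(columns[name], labels), reverse=True)
```

    A feature whose class means differ strongly relative to its within-class spread gets a large F value and moves to the front of the ranking.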

  11. Feature Selection in Scientific Applications

    Energy Technology Data Exchange (ETDEWEB)

    Cantu-Paz, E; Newsam, S; Kamath, C

    2004-02-27

    Numerous applications of data mining to scientific data involve the induction of a classification model. In many cases, the collection of data is not performed with this task in mind, and therefore, the data might contain irrelevant or redundant features that affect negatively the accuracy of the induction algorithms. The size and dimensionality of typical scientific data make it difficult to use any available domain information to identify features that discriminate between the classes of interest. Similarly, exploratory data analysis techniques have limitations on the amount and dimensionality of the data that can be effectively processed. In this paper, we describe applications of efficient feature selection methods to data sets from astronomy, plasma physics, and remote sensing. We use variations of recently proposed filter methods as well as traditional wrapper approaches where practical. We discuss the importance of these applications, the general challenges of feature selection in scientific datasets, the strategies for success that were common among our diverse applications, and the lessons learned in solving these problems.

  12. A novel computer-aided diagnosis system for breast MRI based on feature selection and ensemble learning.

    Science.gov (United States)

    Lu, Wei; Li, Zhe; Chu, Jinghui

    2017-03-06

    Breast cancer is a common cancer among women. With the development of modern medical science and information technology, medical imaging techniques have an increasingly important role in the early detection and diagnosis of breast cancer. In this paper, we propose an automated computer-aided diagnosis (CADx) framework for magnetic resonance imaging (MRI). The scheme consists of an ensemble of several machine learning-based techniques, including ensemble under-sampling (EUS) for imbalanced data processing, the Relief algorithm for feature selection, the subspace method for providing data diversity, and Adaboost for improving the performance of base classifiers. We extracted morphological, various texture, and Gabor features. To clarify the feature subsets' physical meaning, subspaces are built by combining morphological features with each kind of texture or Gabor feature. We tested our proposal using a manually segmented Region of Interest (ROI) data set, which contains 438 images of malignant tumors and 1898 images of normal tissues or benign tumors. Our proposal achieves an area under the ROC curve (AUC) value of 0.9617, which outperforms most other state-of-the-art breast MRI CADx systems. Compared with other methods, our proposal significantly reduces the false-positive classification rate.
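
    Of the components listed above, the Relief algorithm is compact enough to sketch. The version below is a simplified, deterministic variant that visits every sample once instead of sampling at random, and it assumes numeric features with at least two samples per class; it is not the configuration used in the paper.

```python
def relief(X, y):
    """Relief feature weights: reward features that differ on the nearest
    sample of another class and agree on the nearest sample of the same class."""
    n, d = len(X), len(X[0])
    # feature ranges used to scale differences into [0, 1]
    spans = [max(r[j] for r in X) - min(r[j] for r in X) or 1.0 for j in range(d)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        same = [k for k in range(n) if k != i and y[k] == y[i]]
        other = [k for k in range(n) if y[k] != y[i]]
        hit = min(same, key=lambda k: dist(X[i], X[k]))
        miss = min(other, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            weights[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n
    return weights
```

    Features with clearly positive weights are kept; weights near zero or below suggest a feature that does not help separate the classes.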

  13. Optimal features selection based on circular Gabor filters and RSE in texture segmentation

    Science.gov (United States)

    Wang, Qiong; Liu, Jian; Tian, Jinwen

    2007-12-01

    This paper designs circular Gabor filters that incorporate human visual characteristics, and introduces the concept of mutual information entropy in rough sets to evaluate the effect on clustering of the features extracted from different filters, so that redundant features are removed. Experimental results indicate that the proposed algorithm outperforms conventional approaches in terms of both objective measurements and visual evaluation in texture segmentation.

  14. Identifying Metabolite and Protein Biomarkers in Unstable Angina In-patients by Feature Selection Based Data Mining Method

    Institute of Scientific and Technical Information of China (English)

    SHI Cheng-he; YANG Yi; WANG Wei; ZHAO Hui-hui; HOU Na; CHEN Jian-xin; SHI Qi; XU Xue-gong; WANG Juan; ZHENG Cheng-long; ZHAO Ling-yan

    2011-01-01

    Unstable angina (UA) is the most dangerous type of coronary heart disease (CHD), causing increasing mortality and morbidity worldwide. Identification of biomarkers for UA at the level of proteomics and metabolomics is a better avenue for understanding its inner mechanism. A feature selection based data mining method is well suited to identifying biomarkers of UA. In this study, we carried out a clinical epidemiology study to collect plasma from UA in-patients and controls. Proteomics and metabolomics data were obtained via two-dimensional difference gel electrophoresis and gas chromatography techniques. We present a novel computational strategy to select as few biomarkers as possible for UA in the two groups of data. Firstly, a decision tree was used to select biomarkers for UA and 3-fold cross validation was used to evaluate computational performances for the three methods. Alternatively, we combined the independent t test and a classification based data mining method as well as the backward elimination technique to select as few protein and metabolite biomarkers as possible with the best classification performance. By this method, we selected 6 proteins and 5 metabolites for UA. The novel method presented here provides better insight into the pathology of a disease.

  15. SELECTED FEATURES OF POLISH FARMERS

    Directory of Open Access Journals (Sweden)

    Grzegorz Spychalski

    2013-12-01

    Full Text Available The paper presents results of research carried out among farm owners in Wielkopolskie voivodeship on selected features of social capital. The author identifies and estimates the impact of some socio-professional factors on social capital quality and derives statistical conclusions. The result is a list of economic policy measures facilitating rural development in this respect. The level of education, civic activity and tendency for collective activity are the main conditions of social capital quality in Polish rural areas.

  16. Automated Feature Selection for Experience-Based Adaptive Re-planning

    Science.gov (United States)

    2013-03-01

    MDP) [10]. An MDP is represented as a 4-tuple (S,A,T,R), where: S is a set of states, A is a set of actions, T is a transition function, and R is the...reward function. In a factored MDP, the set of states S is determined by a set of state variables (features). In the planning system, an existing...capabilities include: types of buildings and structures to build, upgrades to research, and number and compositions of units to create in order to achieve
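
    The 4-tuple and the factored state space can be written down directly. The sketch below is illustrative only: the signatures chosen for T and R, and variable names such as `has_barracks` and `workers`, are assumptions and do not come from the report.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class MDP:
    states: FrozenSet       # S
    actions: FrozenSet      # A
    transition: Callable    # T(s, a) -> {next_state: probability} (assumed form)
    reward: Callable        # R(s, a) -> float (assumed form)


def factored_states(variables):
    """Enumerate S as every assignment of values to the state variables (features)."""
    names = sorted(variables)
    return frozenset(
        tuple(zip(names, values))
        for values in product(*(variables[name] for name in names))
    )


# Hypothetical state variables for a small build-order planning problem.
variables = {"has_barracks": [False, True], "workers": [0, 1, 2]}
states = factored_states(variables)
```

    Because S is the product of the variable domains, dropping a state variable through feature selection shrinks the state space multiplicatively, which is why the choice of features matters for planning cost.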

  17. Trains Trouble Shooting Based on Wavelet Analysis and Joint Selection Feature Classifier

    Directory of Open Access Journals (Sweden)

    Bo Yu

    2014-02-01

    Full Text Available According to the running status of urban trains, this paper examines the running status of the constraint, air spring and lateral damper components and the vertical acceleration vibration signals of the vehicle body. Combined with the characteristics of urban train operation, we build an optimized train operation adjustment model and put forward a corresponding estimation method for the train state, the wavelet packet energy moment. First, we analyze the characteristics of the vertical vibration of the body, perform wavelet packet decomposition of the signals under different conditions and speeds, and reconstruct the band signals with larger energy. We then introduce the hybrid ideas of the particle swarm algorithm, establish a fault diagnosis model, and use an improved particle swarm algorithm to solve this model, giving the specific solution steps. Next, the wavelet packet energy moment of each band is calculated as a feature. Changes in the wavelet packet energy moments of different frequency bands reflect changes in the train operation state. Finally, the wavelet packet energy moments of the different frequency bands are assembled into a feature vector and passed to support vector machines for fault identification.

  18. On the frequency-selective features of gold nanorods-based columnar thin film metamaterial absorber

    Science.gov (United States)

    Ghasemi, Masih; Choudhury, P. K.; Baqir, M. A.; Mohamed, M. A.; Zain, A. R. M.; Majlis, B. Y.

    2016-09-01

    Metamaterials have been of great interest owing to multifarious technological applications. Among various applications of scientific need, the perfect absorber kind of property of metamaterials remains prudent. Within the context, this investigation describes the filtering/absorber applications of metasurfaces comprised of columnar nanorods of gold having circular and elliptical cross-sections. The spectral features of such absorbers are investigated in terms of absorptivity in the visible to infrared (IR) regimes. The results indicate almost perfect absorption at certain wavelengths in the IR span. Also, multiple absorption peaks would determine the filtering characteristics of the structures under consideration. It has been found that the absorber having circular nanorods exhibits better performance than the one with elliptical nanorods in terms of the magnitude/smoothness of absorption peaks in the entire electromagnetic spectral region of interest; with elliptical nanorods, the absorption spectra show too many flickers in the IR wavelength range.

  19. Intrusion Detection In Mobile Ad Hoc Networks Using GA Based Feature Selection

    CERN Document Server

    Nallusamy, R; Duraiswamy, K

    2009-01-01

    Mobile ad hoc networking (MANET) has become an exciting and important technology in recent years because of the rapid proliferation of wireless devices. MANETs are highly vulnerable to attacks due to the open medium, dynamically changing network topology and lack of a centralized monitoring point. It is important to search for new architectures and mechanisms to protect wireless networks and mobile computing applications. An IDS analyzes network activities by means of audit data and uses patterns of well-known attacks or a normal profile to detect potential attacks. There are two methods of analysis: misuse detection and anomaly detection. Misuse detection is not effective against unknown attacks and, therefore, the anomaly detection method is used. In this approach, the audit data are collected from each mobile node after simulating the attack and compared with the normal behavior of the system. If there is any deviation from normal behavior, the event is considered an attack. Some of the features of collected audi...

  20. Arc-Welding Spectroscopic Monitoring based on Feature Selection and Neural Networks

    Directory of Open Access Journals (Sweden)

    Jose M. Lopez- Higuera

    2008-10-01

    Full Text Available A new spectral processing technique designed for application in the on-line detection and classification of arc-welding defects is presented in this paper. A noninvasive fiber sensor embedded within a TIG torch collects the plasma radiation originated during the welding process. The spectral information is then processed in two consecutive stages. A compression algorithm is first applied to the data, allowing real-time analysis. The selected spectral bands are then used to feed a classification algorithm, which will be demonstrated to provide an efficient weld defect detection and classification. The results obtained with the proposed technique are compared to a similar processing scheme presented in previous works, giving rise to an improvement in the performance of the monitoring system.

  1. The Importance of Feature Selection in Classification

    Directory of Open Access Journals (Sweden)

    Mrs.K. Moni Sushma Deep

    2014-01-01

    Full Text Available Feature selection is an important technique in classification: it reduces the dimensionality of the feature space and removes redundant, irrelevant, or noisy data. In this paper the features are selected using three ranking methods: (1) Information Gain (IG) attribute evaluation, (2) Gain Ratio (GR) attribute evaluation, and (3) Symmetrical Uncertainty (SU) attribute evaluation. The paper evaluates the features derived from these three methods using the supervised learning algorithms K-Nearest Neighbor and Naïve Bayes. The measures used for the classifiers are True Positive, False Positive, and Accuracy, and the experimental results are compared between the algorithms. We used two data sets, Pima and Wine, from the UCI Repository database.
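
    The three ranking criteria have short textbook definitions in terms of entropy, sketched below. The sketch assumes the attribute values are already discrete (continuous attributes such as those in Pima and Wine would need to be discretized first) and is not a reimplementation of any particular toolkit's evaluators.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a list of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def info_gain(feature, labels):
    """IG = H(class) - H(class | feature)."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for f, y in zip(feature, labels) if f == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """GR = IG / H(feature); penalises attributes with many distinct values."""
    split = entropy(feature)
    return 0.0 if split == 0 else info_gain(feature, labels) / split


def symmetrical_uncertainty(feature, labels):
    """SU = 2 * IG / (H(feature) + H(class)), which lies between 0 and 1."""
    total = entropy(feature) + entropy(labels)
    return 0.0 if total == 0 else 2.0 * info_gain(feature, labels) / total
```

    Each function returns one score per attribute; sorting attributes by a score and keeping the top ones gives the ranked subsets that the classifiers are then evaluated on.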

  2. Wavelength Selection of Hyperspectral LIDAR Based on Feature Weighting for Estimation of Leaf Nitrogen Content in Rice

    Science.gov (United States)

    Du, Lin; Shi, Shuo; Gong, Wei; Yang, Jian; Sun, Jia; Mao, Feiyue

    2016-06-01

    Hyperspectral LiDAR (HSL) is a novel tool in the field of active remote sensing, which has been widely used in many domains because of its ability to acquire spectral information. In the precise monitoring of nitrogen in green plants in particular, HSL plays an indispensable role. The existing HSL system used for nitrogen status monitoring has a multi-channel detector, which can improve the spectral resolution and receiving range, but may also result in data redundancy, difficulty in system integration and high cost. Thus, it is necessary and urgent to pick out the nitrogen-sensitive feature wavelengths within the spectral range. The present study, aiming to solve this problem, assigns a feature weighting to each centre wavelength of the HSL system by using matrix coefficient analysis and a divergence threshold. The feature weighting is a criterion for amending the centre wavelengths of the detector to accommodate different purposes, especially the estimation of leaf nitrogen content (LNC) in rice. In this way, the wavelengths highly correlated with the LNC can be ranked in descending order and used to estimate rice LNC sequentially. In this paper, an HSL system based on wide-spectrum emission and a 32-channel detector is used to collect the reflectance spectra of rice leaves. The spectra collected by HSL cover a range of 538 nm - 910 nm with a resolution of 12 nm. These 32 wavelengths are strongly absorbed by chlorophyll in green plants within this range. The relationship between the rice LNC and reflectance-based spectra is modeled using partial least squares (PLS) and support vector machines (SVMs) based on calibration and validation datasets, respectively. The results indicate that I) the wavelength selection method of HSL based on feature weighting is effective in choosing the nitrogen-sensitive wavelengths, and can also be readily co-adapted with the hardware of the HSL system. II) The chosen wavelength has a high correlation with rice LNC which can be

  3. Clustering based gene expression feature selection method: A computational approach to enrich the classifier efficiency of differentially expressed genes

    KAUST Repository

    Abusamra, Heba

    2016-07-20

    The inherent high-dimension, low-sample-size nature of gene expression data makes the classification task more challenging. Therefore, feature (gene) selection becomes an apparent need. Selecting meaningful and relevant genes for a classifier not only decreases the computational time and cost, but also improves the classification performance. Among the different feature selection approaches, however, most suffer from several problems such as lack of robustness, validation issues, etc. Here, we present a new feature selection technique that takes advantage of clustering both samples and genes. Materials and methods: We used a leukemia gene expression dataset [1]. The effectiveness of the selected features was evaluated by four different classification methods: support vector machines, k-nearest neighbor, random forest, and linear discriminant analysis. The method evaluates the importance and relevance of each gene cluster by summing the expression levels of the genes belonging to the cluster. A gene cluster is considered important if it satisfies conditions that depend on thresholds and percentage; otherwise it is eliminated. Results: Initial analysis identified 7120 differentially expressed genes of leukemia (Fig. 15a); after applying our feature selection methodology we ended up with 1117 specific genes discriminating two classes of leukemia (Fig. 15b). Further applying the same method with more stringent higher positive and lower negative threshold conditions reduced the number to 58 genes, which were tested to evaluate the effectiveness of the method (Fig. 15c). The results of the four classification methods are summarized in Table 11. Conclusions: The feature selection method gave good results with minimum classification error. Our heat-map result shows a distinct pattern of refined genes discriminating between two classes of leukemia.

  4. A New Approach of Feature Selection for Text Categorization

    Institute of Scientific and Technical Information of China (English)

    CUI Zifeng; XU Baowen; ZHANG Weifeng; XU Junling

    2006-01-01

    This paper proposes a new approach to feature selection for text categorization based on an independence measure between features. A fundamental hypothesis, widely used in probabilistic models for text categorization (TC), that the occurrences of terms in documents are independent of each other is discussed. However, this basic hypothesis is incomplete with respect to the independence of the feature set. From the viewpoint of feature selection, a new independence measure between features is designed, from which a feature selection algorithm is derived to obtain a feature subset. The selected subset is highly relevant to the category and strongly independent among its features, and so satisfies the basic hypothesis to the greatest degree. Compared with other traditional feature selection methods in TC (which take only relevance into account), the feature subset selected by our method performs better in experiments on the 20 Newsgroups benchmark dataset.

  5. A Filter Feature Selection Method Based on MFA Score and Redundancy Excluding and It's Application to Tumor Gene Expression Data Analysis.

    Science.gov (United States)

    Li, Jiangeng; Su, Lei; Pang, Zenan

    2015-12-01

    Feature selection techniques have been widely applied to tumor gene expression data analysis in recent years. A filter feature selection method named marginal Fisher analysis score (MFA score) which is based on graph embedding has been proposed, and it has been widely used mainly because it is superior to Fisher score. Considering the heavy redundancy in gene expression data, we proposed a new filter feature selection technique in this paper. It is named MFA score+ and is based on MFA score and redundancy excluding. We applied it to an artificial dataset and eight tumor gene expression datasets to select important features and then used support vector machine as the classifier to classify the samples. Compared with MFA score, t test and Fisher score, it achieved higher classification accuracy.
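
    The Fisher score that this record uses as a baseline can be sketched in a few lines. The block below is only an illustration of the standard Fisher score (between-class scatter divided by within-class scatter, per feature); it is not the MFA score or MFA score+ of the paper, and the function name and toy data are assumptions made for the example.

```python
from collections import defaultdict


def fisher_scores(X, y):
    """Return one Fisher score per feature column of X (a list of rows)."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = 0.0  # class-size-weighted spread of the class means
        within = 0.0   # summed squared deviations inside each class
        for rows in by_class.values():
            values = [row[j] for row in rows]
            mean = sum(values) / len(values)
            between += len(values) * (mean - overall_mean) ** 2
            within += sum((v - mean) ** 2 for v in values)
        scores.append(between / within if within > 0 else float("inf"))
    return scores


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 1.0], [3.0, 5.2], [3.2, 0.8]]
y = [0, 0, 1, 1]
scores = fisher_scores(X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking)
```

    A filter of this kind scores each feature on its own, which is why the record adds a separate redundancy-excluding step for highly correlated genes.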

  6. Adaptive feature selection for hyperspectral data analysis

    Science.gov (United States)

    Korycinski, Donna; Crawford, Melba M.; Barnes, J. Wesley

    2004-02-01

    Hyperspectral data can potentially provide greatly improved capability for discrimination between many land cover types, but new methods are required to process these data and extract the required information. Data sets are extremely large, and the data are not well distributed across these high dimensional spaces. The increased number and resolution of spectral bands, many of which are highly correlated, is problematic for supervised statistical classification techniques when the number of training samples is small relative to the dimension of the input vector. Selection of the most relevant subset of features is one means of mitigating these effects. A new algorithm based on the tabu search metaheuristic optimization technique was developed to perform subset feature selection and implemented within a binary hierarchical tree framework. Results obtained using the new approach were compared to those from a common greedy selection technique and to a Fisher discriminant based feature extraction method, both of which were implemented in the same binary hierarchical tree classification scheme. The tabu search based method generally yielded higher classification accuracies with lower variability than these other methods in experiments using hyperspectral data acquired by the EO-1 Hyperion sensor over the Okavango Delta of Botswana.

  7. Detecting Local Manifold Structure for Unsupervised Feature Selection

    Institute of Scientific and Technical Information of China (English)

    FENG Ding-Cheng; CHEN Feng; XU Wen-Li

    2014-01-01

    Unsupervised feature selection is fundamental in statistical pattern recognition, and has drawn persistent attention in the past several decades. Recently, much work has shown that feature selection can be formulated as nonlinear dimensionality reduction with discrete constraints. This line of research emphasizes utilizing the manifold learning techniques, where feature selection and learning can be studied based on the manifold assumption in data distribution. Many existing feature selection methods such as Laplacian score, SPEC (spectrum decomposition of graph Laplacian), TR (trace ratio) criterion, MSFS (multi-cluster feature selection) and EVSC (eigenvalue sensitive criterion) apply the basic properties of graph Laplacian, and select the optimal feature subsets which best preserve the manifold structure defined on the graph Laplacian. In this paper, we propose a new feature selection perspective from locally linear embedding (LLE), which is another popular manifold learning method. The main difficulty of using LLE for feature selection is that its optimization involves quadratic programming and eigenvalue decomposition, both of which are continuous procedures and different from discrete feature selection. We prove that the LLE objective can be decomposed with respect to data dimensionalities in the subset selection problem, which also facilitates constructing better coordinates from data using the principal component analysis (PCA) technique. Based on these results, we propose a novel unsupervised feature selection algorithm, called locally linear selection (LLS), to select a feature subset representing the underlying data manifold. The local relationship among samples is computed from the LLE formulation, which is then used to estimate the contribution of each individual feature to the underlying manifold structure. These contributions, represented as LLS scores, are ranked and selected as the candidate solution to feature selection. We further develop a

  8. Filter-based feature selection and support vector machine for false positive reduction in computer-aided mass detection in mammograms

    Science.gov (United States)

    Nguyen, V. D.; Nguyen, D. T.; Nguyen, T. D.; Phan, V. A.; Truong, Q. D.

    2015-02-01

    In this paper, a method for reducing false positives in computer-aided mass detection in screening mammograms is proposed. A set of 32 features, including First Order Statistics (FOS) features, Gray-Level Co-occurrence Matrix (GLCM) features, Block Difference Inverse Probability (BDIP) features, and Block Variation of Local Correlation coefficients (BVLC) features, is extracted from detected Regions-Of-Interest (ROIs). An optimal subset of 8 features is selected from the full feature set by means of a filter-based Sequential Backward Selection (SBS). Then, a Support Vector Machine (SVM) is used to classify the ROIs into mass regions or normal regions. The method's performance is evaluated using the area under the Receiver Operating Characteristic (ROC) curve (AUC or AZ). On a dataset consisting of about 2700 ROIs detected from the mini-MIAS database of mammograms, the proposed method achieves AZ=0.938.
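
    Sequential Backward Selection, as named in this record, starts from the full feature set and repeatedly drops the feature whose removal leaves the best-scoring subset. The sketch below shows only that generic loop. The abstract does not state the filter criterion, so the `criterion` function, the relevance values, and the redundancy penalty are all hypothetical.

```python
def sequential_backward_selection(n_features, k, criterion):
    """Shrink the feature set {0..n_features-1} to k features.

    `criterion` maps a list of feature indices to a score (higher is better).
    """
    selected = list(range(n_features))
    while len(selected) > k:
        # Try dropping each remaining feature; keep the best reduced subset.
        candidates = [[f for f in selected if f != drop] for drop in selected]
        selected = max(candidates, key=criterion)
    return selected


# Hypothetical filter criterion: summed relevance, minus a penalty when the
# two redundant features 0 and 3 are kept together.
relevance = [0.9, 0.1, 0.8, 0.85]


def toy_criterion(subset):
    score = sum(relevance[f] for f in subset)
    if 0 in subset and 3 in subset:
        score -= 0.8
    return score


subset = sequential_backward_selection(4, 2, toy_criterion)
print(subset)
```

    In the setting of the record the same loop would run from 32 features down to 8; each pass evaluates one candidate subset per remaining feature, so the cost grows roughly quadratically with the number of features.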

  9. Feature dimensionality reduction for myoelectric pattern recognition: a comparison study of feature selection and feature projection methods.

    Science.gov (United States)

    Liu, Jie

    2014-12-01

    This study investigates the effect of the feature dimensionality reduction strategies on the classification of surface electromyography (EMG) signals toward developing a practical myoelectric control system. Two dimensionality reduction strategies, feature selection and feature projection, were tested on both EMG feature sets, respectively. A feature selection based myoelectric pattern recognition system was introduced to select the features by eliminating the redundant features of EMG recordings instead of directly choosing a subset of EMG channels. The Markov random field (MRF) method and a forward orthogonal search algorithm were employed to evaluate the contribution of each individual feature to the classification, respectively. Our results from 15 healthy subjects indicate that, with a feature selection analysis, independent of the type of feature set, across all subjects high overall accuracies can be achieved in classification of seven different forearm motions with a small number of top ranked original EMG features obtained from the forearm muscles (average overall classification accuracy >95% with 12 selected EMG features). Compared to various feature dimensionality reduction techniques in myoelectric pattern recognition, the proposed filter-based feature selection approach is independent of the type of classification algorithms and features, which can effectively reduce the redundant information not only across different channels, but also across different features in the same channel. This may enable robust EMG feature dimensionality reduction without needing to change ongoing, practical use of classification algorithms, an important step toward clinical utility.

  10. Novel Feature Selection by Differential Evolution Algorithm

    Directory of Open Access Journals (Sweden)

    Ali Ghareaghaji

    2013-11-01

    Iris scan biometrics employs the unique characteristics and features of the human iris in order to verify the identity of an individual. In today's world, where terrorist attacks are on the rise, employment of infallible security systems is a must. This makes iris recognition systems unavoidable in emerging security. For authentication, the objective function is minimized using the Differential Evolutionary (DE) Algorithm, where the population vector is encoded using Binary Encoded Decimal to avoid the float number optimization problem. An automatic clustering of the possible values of the Lagrangian multiplier provides a detailed insight into the selected features during the proposed DE based optimization process. The classification accuracy of a Support Vector Machine (SVM) is used to measure the performance of the selected features. The proposed algorithm outperforms the existing DE based approaches when tested on the IRIS, Wine, Wisconsin Breast Cancer, Sonar and Ionosphere datasets. The same algorithm, when applied to gait based people identification using skeleton data points obtained from the Microsoft Kinect sensor, exceeds the previously reported accuracies.

  11. Feature Selection Based on Support Vector Machine

    Institute of Scientific and Technical Information of China (English)

    葛敏敏; 范丽亚

    2011-01-01

    This paper studies a feature selection method based on the support vector machine, the feature weight method. Experiments with two data sets taken from the UCI machine learning repository show that the feature weight method is superior to the F-score method and the support vector machine in classification performance.

  12. A classification system based on a new wrapper feature selection algorithm for the diagnosis of primary and secondary polycythemia.

    Science.gov (United States)

    Korfiatis, Vasileios Ch; Asvestas, Pantelis A; Delibasis, Konstantinos K; Matsopoulos, George K

    2013-12-01

    Primary and Secondary Polycythemia are diseases of the bone marrow that affect the blood's composition and prohibit patients from becoming blood donors. Since these diseases may become fatal, their early diagnosis is important. In this paper, a classification system for the diagnosis of Primary and Secondary Polycythemia is proposed. The proposed system classifies input data into three classes; Healthy, Primary Polycythemic (PP) and Secondary Polycythemic (SP) and is implemented using two separate binary classification levels. The first level performs the Healthy/non-Healthy classification and the second level the PP/SP classification. To this end, a novel wrapper feature selection algorithm, called the LM-FM algorithm, is presented in order to maximize the classifier's performance. The algorithm is comprised of two stages that are applied sequentially: the Local Maximization (LM) stage and the Floating Maximization (FM) stage. The LM stage finds the best possible subset of a fixed predefined size, which is then used as an input for the next stage. The FM stage uses a floating size technique to search for an even better solution by varying the initially provided subset size. Then, the Support Vector Machine (SVM) classifier is used for the discrimination of the data at each classification level. The proposed classification system is compared with various well-established feature selection techniques such as the Sequential Floating Forward Selection (SFFS) and the Maximum Output Information (MOI) wrapper schemes, and with standalone classification techniques such as the Multilayer Perceptron (MLP) and SVM classifier. The proposed LM-FM feature selection algorithm combined with the SVM classifier increases the overall performance of the classification system, scoring up to 98.9% overall accuracy at the first classification level and up to 96.6% at the second classification level. 
    Moreover, it provides excellent robustness regardless of the size of the input feature

  13. A Hybrid Feature Subset Selection using Metrics and Forward Selection

    Directory of Open Access Journals (Sweden)

    K. Fathima Bibi

    2015-04-01

    The aim of this study is to design a feature subset selection technique that speeds up the Feature Selection (FS) process in high-dimensional datasets with reduced computational cost and high efficiency. FS has become the focus of much research in decision support systems, where data with a very large number of variables are analyzed. Filters and wrappers are the techniques proposed for the feature subset selection process. Filters use an association-based approach, whereas wrappers adopt classification algorithms to identify important features. The filter method lacks the ability to minimize the simplification error, while the wrapper method places a heavy burden on computational resources. To overcome these difficulties, a hybrid approach combining both filters and wrappers is proposed. The filter stage uses a permutation of ranker search methods, and the wrapper stage improves learning accuracy while reducing memory requirements and running time. Datasets from the UCI machine learning repository were used to test the approach. The classification accuracy obtained with our approach proves to be higher.
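
    A filter-then-wrapper hybrid of the kind this record describes can be sketched as two stages: a cheap ranking that prunes the candidate pool, followed by greedy forward selection driven by a classifier-based score. The abstract gives no details of the rankers or the classifier, so everything concrete below (the `filter_scores` values, the `keep` cut-off, and the `evaluate` function standing in for cross-validated accuracy) is an assumption for illustration.

```python
def hybrid_select(filter_scores, keep, evaluate):
    """Filter stage keeps the `keep` top-ranked features; wrapper stage then
    adds features greedily while `evaluate(subset)` keeps improving."""
    ranked = sorted(range(len(filter_scores)),
                    key=lambda j: filter_scores[j], reverse=True)
    pool = ranked[:keep]

    selected, best = [], float("-inf")
    while pool:
        # Score every one-feature extension of the current subset.
        score, feature = max((evaluate(selected + [f]), f) for f in pool)
        if score <= best:
            break  # no candidate improves the wrapper score
        selected.append(feature)
        pool.remove(feature)
        best = score
    return selected, best


# Hypothetical filter scores and a stand-in for classifier accuracy:
# features 0 and 3 help, every extra feature costs a small penalty.
filter_scores = [0.9, 0.2, 0.7, 0.6, 0.1]


def toy_accuracy(subset):
    return 0.5 * (0 in subset) + 0.3 * (3 in subset) - 0.01 * len(subset)


selected, best = hybrid_select(filter_scores, keep=3, evaluate=toy_accuracy)
print(selected, round(best, 2))
```

    The filter stage bounds how many subsets the wrapper has to evaluate, which is where the saving in running time comes from in this kind of design.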

  14. Fish recognition based on the combination between robust feature selection, image segmentation and geometrical parameter techniques using Artificial Neural Network and Decision Tree

    CERN Document Server

    Alsmadi, Mutasem Khalil Sari; Noah, Shahrul Azman; Almarashdah, Ibrahim

    2009-01-01

    We present in this paper a novel fish classification methodology based on a combination of robust feature selection, image segmentation and geometrical parameter techniques using an Artificial Neural Network and a Decision Tree. Unlike existing works on fish classification, which propose descriptors without analyzing their individual impact on the whole classification task and without combining feature selection, image segmentation and geometrical parameters, we propose a general set of extracted features using robust feature selection, image segmentation and geometrical parameters, together with their corresponding weights, to be used as a priori information by the classifier. In this sense, instead of studying techniques for improving the classifier structure itself, we treat it as a black box and focus our research on determining which input information yields robust fish discrimination. The main contribution of this paper is enhanced recognition and classification of fishes...

  15. Feature Selection Based on Adaptive Fuzzy Membership Functions

    Institute of Scientific and Technical Information of China (English)

    谢衍涛; 桑农; 张天序

    2006-01-01

    Neuro-fuzzy (NF) networks are adaptive fuzzy inference systems (FIS) and have been applied to feature selection by some researchers. However, their number of rules grows exponentially as the data dimension increases. On the other hand, feature selection algorithms based on artificial neural networks (ANN) usually require normalization of the input data, which will probably change some characteristics of the original data that are important for classification. To overcome these problems, this paper combines the fuzzification layer of the neuro-fuzzy system with the multi-layer perceptron (MLP) to form a new artificial neural network. Furthermore, a fuzzification strategy and a feature measurement based on membership space are proposed for feature selection. Finally, experiments with both natural and artificial data are carried out for comparison with other methods, and the results confirm the validity of the algorithm.

  16. Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection

    Directory of Open Access Journals (Sweden)

    Xin Ma

    2015-01-01

    The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, the accuracy of prediction is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features have important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved the best performance (86.62% accuracy and 0.737 Matthews correlation coefficient). High prediction accuracy and successful prediction performance suggest that our method can be a useful approach to identify RNA-binding proteins from sequence information.
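
    The greedy idea behind mRMR can be illustrated with a short sketch: pick the most relevant feature first, then repeatedly add the feature whose relevance to the target most exceeds its average redundancy with the features already chosen. mRMR is normally defined with mutual information; the block below substitutes the absolute Pearson correlation as a simple stand-in, which is an assumption of this example and not the method of the record. The data and function names are hypothetical, and the incremental feature selection (IFS) step with random forests is not shown.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0


def mrmr(columns, target, k):
    """Greedy mRMR-style ranking; returns k feature indices in pick order."""
    relevance = [abs(pearson(col, target)) for col in columns]
    selected = [max(range(len(columns)), key=lambda j: relevance[j])]
    while len(selected) < k:
        def score(j):
            # Relevance minus mean redundancy with already selected features.
            redundancy = sum(abs(pearson(columns[j], columns[s]))
                             for s in selected) / len(selected)
            return relevance[j] - redundancy
        remaining = [j for j in range(len(columns)) if j not in selected]
        selected.append(max(remaining, key=score))
    return selected


# Hypothetical data: feature 1 is almost a copy of feature 0, while
# feature 2 is weaker but carries different information.
target = [0, 0, 0, 1, 1, 1]
columns = [
    [0.1, 0.2, 0.1, 0.9, 1.0, 0.8],
    [0.1, 0.2, 0.2, 0.9, 1.0, 0.8],
    [0, 1, 0, 1, 0, 1],
]
picked = mrmr(columns, target, 2)
print(picked)
```

    A ranking by relevance alone would pick features 0 and 1 here; the redundancy term is what moves the near-duplicate feature down the list.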

  17. Genetic Feature Selection for Texture Classification

    Institute of Scientific and Technical Information of China (English)

    PAN Li; ZHENG Hong; ZHANG Zuxun; ZHANG Jianqing

    2004-01-01

    This paper presents a novel approach to feature subset selection using genetic algorithms. This approach can accommodate multiple criteria, such as the accuracy and cost of classification, in the feature selection process, and it finds an effective feature subset for texture classification. On the basis of the selected feature subset, a method is described to extract objects that are higher than their surroundings, such as trees or forest, from color aerial images. The methodology presented in this paper is illustrated by its application to the problem of tree extraction from aerial images.

  18. Embedded Incremental Feature Selection for Reinforcement Learning

    Science.gov (United States)

    2012-05-01

    Classical reinforcement learning techniques become impractical in domains with large complex state spaces. The size of a domain’s state space is...require all the provided features. In this paper we present a feature selection algorithm for reinforcement learning called Incremental Feature

  19. NEW FEATURE SELECTION METHOD IN MACHINE FAULT DIAGNOSIS

    Institute of Scientific and Technical Information of China (English)

    Wang Xinfeng; Qiu Jing; Liu Guanjun

    2005-01-01

    To address the deficiencies of filter and wrapper feature selection methods, a new method that combines the two is proposed. The method first filters the original features to form a feature subset that meets the required classification accuracy, and then applies a wrapper feature selection method to select the optimal feature subset. The genetic algorithm (GA), a successful technique for solving optimization problems, is applied to the optimal feature selection problem. In data simulation and in an experiment on bearing fault feature selection, the composite method takes several times less computing time than the wrapper method while maintaining classification accuracy. The method therefore has good optimization properties, saves selection time, and offers high accuracy and high efficiency.

  20. Feature Selection Method Based on Artificial Bee Colony Algorithm and Support Vector Machines for Medical Datasets Classification

    Directory of Open Access Journals (Sweden)

    Mustafa Serter Uzer

    2013-01-01

    This paper offers a hybrid approach that uses the artificial bee colony (ABC) algorithm for feature selection and support vector machines (SVM) for classification. The purpose of this paper is to test the effect of eliminating the unimportant and obsolete features of the datasets on the success of the classification, using the SVM classifier. The approach is developed for the diagnosis of liver diseases and diabetes, which are commonly observed and reduce the quality of life. For the diagnosis of these diseases, the hepatitis, liver disorders and diabetes datasets from the UCI database were used, and the proposed system reached classification accuracies of 94.92%, 74.81%, and 79.29%, respectively. For these datasets, the classification accuracies were obtained using 10-fold cross-validation. The results show that the method performs very well compared to other reported results and seems very promising for pattern recognition applications.

  1. Selective Audiovisual Semantic Integration Enabled by Feature-Selective Attention.

    Science.gov (United States)

    Li, Yuanqing; Long, Jinyi; Huang, Biao; Yu, Tianyou; Wu, Wei; Li, Peijun; Fang, Fang; Sun, Pei

    2016-01-13

    An audiovisual object may contain multiple semantic features, such as the gender and emotional features of the speaker. Feature-selective attention and audiovisual semantic integration are two brain functions involved in the recognition of audiovisual objects. Humans often selectively attend to one or several features while ignoring the other features of an audiovisual object. Meanwhile, the human brain integrates semantic information from the visual and auditory modalities. However, how these two brain functions correlate with each other remains to be elucidated. In this functional magnetic resonance imaging (fMRI) study, we explored the neural mechanism by which feature-selective attention modulates audiovisual semantic integration. During the fMRI experiment, the subjects were presented with visual-only, auditory-only, or audiovisual dynamical facial stimuli and performed several feature-selective attention tasks. Our results revealed that a distribution of areas, including heteromodal areas and brain areas encoding attended features, may be involved in audiovisual semantic integration. Through feature-selective attention, the human brain may selectively integrate audiovisual semantic information from attended features by enhancing functional connectivity and thus regulating information flows from heteromodal areas to brain areas encoding the attended features.

  2. Robust prediction of B-factor profile from sequence using two-stage SVR based on random forest feature selection.

    Science.gov (United States)

    Pan, Xiao-Yong; Shen, Hong-Bin

    2009-01-01

    The B-factor, which measures the uncertainty in the position of an atom within a crystal structure, is highly correlated with protein internal motion. Although the rapid progress of structural biology in recent years has made more accurate protein structures available than ever, the avalanche of new protein sequences emerging in the post-genomic era keeps widening the gap between known protein sequences and known protein structures. Automated methods that predict the B-factor profile directly from amino acid sequences are urgently needed so that the sequences can be used promptly for basic research. In this article, we propose a novel approach, called PredBF, to predict the real value of the B-factor. We first extract both global and local features from the protein sequences as well as their evolutionary information; random forest feature selection is then applied to rank their importance, and the most important features are fed to a two-stage support vector regression (SVR) for prediction, where the initial outputs predicted by the first SVR are fed to the second-layer SVR for final refinement. Our results reveal that a systematic analysis of the importance of different features gives deep insight into their different contributions and is necessary for developing effective B-factor prediction tools. The two-layer SVR prediction model designed in this study further enhances the robustness of B-factor profile prediction. As a web server, PredBF is freely available at: http://www.csbio.sjtu.edu.cn/bioinf/PredBF for academic use.

  3. Stable Feature Selection for Biomarker Discovery

    CERN Document Server

    He, Zengyou

    2010-01-01

    Feature selection techniques have long been the workhorse of biomarker discovery applications. Surprisingly, the stability of feature selection with respect to sampling variations has long been under-considered, and only recently has this issue received more attention. In this article, we review existing stable feature selection methods for biomarker discovery using a generic hierarchical framework. We have two objectives: (1) providing an overview of this new yet fast-growing topic for convenient reference; (2) categorizing existing methods under an expandable framework for future research and development.

  4. A Genetic-Based Feature Selection Approach in the Identification of Left/Right Hand Motor Imagery for a Brain-Computer Interface

    Science.gov (United States)

    Yaacoub, Charles; Mhanna, Georges; Rihana, Sandy

    2017-01-01

    Electroencephalography is a non-invasive measure of the brain electrical activity generated by millions of neurons. Feature extraction in electroencephalography analysis is a core issue that may lead to accurate brain mental state classification. This paper presents a new feature selection method that improves left/right hand movement identification of a motor imagery brain-computer interface, based on genetic algorithms and artificial neural networks used as classifiers. Raw electroencephalography signals are first preprocessed using appropriate filtering. Feature extraction is carried out afterwards, based on spectral and temporal signal components, and thus a feature vector is constructed. As various features might be inaccurate and mislead the classifier, thus degrading the overall system performance, the proposed approach identifies a subset of features from a large feature space, such that the classifier error rate is reduced. Experimental results show that the proposed method is able to reduce the number of features to as low as 0.5% (i.e., the number of ignored features can reach 99.5%) while improving the accuracy, sensitivity, specificity, and precision of the classifier. PMID:28124985

  5. Comparison of Algorithms for the Small Sample Size Problem%Feature Selection Based on Support Vector Machine

    Institute of Scientific and Technical Information of China (English)

    张先荣; 范丽亚

    2011-01-01

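    This paper studies a feature selection method based on the support vector machine feature weight. Experiments with two kinds of data taken from the UCI machine learning repository show that the feature weight method is superior to the F-score method and SVM on%Combining the ideas of uncorrelated linear discriminant analysis (ULDA) and null-space linear discriminant analysis (NLDA), six algorithms for handling the small sample size problem are proposed, and experiments demonstrate the classification effectiveness of these six algorithms.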

  6. Feature Selection Based on Modified Support Vector Machines

    Institute of Scientific and Technical Information of China (English)

    陈振洲; 邹丽珊

    2007-01-01

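    Based on a careful analysis of the idea of feature selection, this paper embeds the feature selection process into the learning machine and proposes a feature selection algorithm based on modified support vector machines (Feature selection via Modified Support Vector Machines). The method performs feature selection by ranking the feature weights, which allows the feature selection process and the learning process to be unified. Experiments show that, compared with other methods, the method achieves relatively good results.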

  7. A New Feature Selection Method for Text Clustering

    Institute of Scientific and Technical Information of China (English)

    XU Junling; XU Baowen; ZHANG Weifeng; CUI Zifeng; ZHANG Wei

    2007-01-01

    Feature selection methods have been successfully applied to text categorization but seldom applied to text clustering due to the unavailability of class label information. In this paper, a new feature selection method for text clustering based on expectation maximization and cluster validity is proposed. It applies a supervised feature selection method to the intermediate clustering results generated during iterative clustering in order to select features for text clustering; meanwhile, the Davies-Bouldin index is used to evaluate the intermediate feature subsets indirectly. Feature subsets are then selected according to the curve of the Davies-Bouldin index. Experiments are carried out on several popular datasets, and the results show the advantages of the proposed method.
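
    The Davies-Bouldin index mentioned in this record scores a clustering without class labels: for each cluster it takes the worst ratio of combined within-cluster scatter to the distance between centroids, and averages those ratios, so lower values mean tighter and better separated clusters. The sketch below computes the standard index with Euclidean distance; how the record applies it to intermediate feature subsets is not shown, and the toy points are an assumption. It needs at least two clusters with distinct centroids.

```python
from math import dist


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a labelled set of points (lower is better)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    # Centroid and mean point-to-centroid distance (scatter) per cluster.
    centroids = {label: tuple(sum(coord) / len(members) for coord in zip(*members))
                 for label, members in clusters.items()}
    scatter = {label: sum(dist(p, centroids[label]) for p in members) / len(members)
               for label, members in clusters.items()}

    total = 0.0
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster.
        total += max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                     for j in clusters if j != i)
    return total / len(clusters)


# Hypothetical toy data: the same four points under two different labelings.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
db_good = davies_bouldin(points, [0, 0, 1, 1])  # compact, well separated
db_bad = davies_bouldin(points, [0, 1, 0, 1])   # stretched, overlapping
print(db_good, db_bad)
```

    Because the index needs only the points and their cluster assignments, it can be recomputed for each candidate feature subset and plotted as a curve, which matches the selection rule described in the record.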

  8. Classification Using Markov Blanket for Feature Selection

    DEFF Research Database (Denmark)

    Zeng, Yifeng; Luo, Jian

    2009-01-01

    Selecting relevant features is in demand when a large data set is of interest in a classification task. It produces a tractable number of features that are sufficient and possibly improve the classification performance. This paper studies a statistical method of Markov blanket induction algorithm for filtering features and then applies a classifier using the Markov blanket predictors. The Markov blanket contains a minimal subset of relevant features that yields optimal classification performance. We experimentally demonstrate the improved performance of several classifiers using a Markov blanket induction as a feature selection method. In addition, we point out an important assumption behind the Markov blanket induction algorithm and show its effect on the classification performance.

  9. Medical Image Feature, Extraction, Selection And Classification

    Directory of Open Access Journals (Sweden)

    M.VASANTHA,

    2010-06-01

    Breast cancer is the most common type of cancer found in women; one in 22 women in India is likely to suffer from breast cancer. This paper proposes an image classifier to classify mammogram images. A mammogram image is classified as normal, benign or malignant. In total, 26 features, including histogram intensity features and GLCM features, are extracted from each mammogram image. A hybrid approach to feature selection is proposed in this paper, which reduces the number of features by 75%. Decision tree algorithms are applied to mammography classification using these reduced features. Experimental results have been obtained for a data set of 113 images of different types taken from MIAS. This technique of classification has not been attempted before and it reveals the potential of data mining in medical treatment.

  10. Feature Selection Criteria for Real Time EKF-SLAM Algorithm

    Directory of Open Access Journals (Sweden)

    Fernando Auat Cheein

    2010-02-01

    This paper presents a selection procedure for environment features for the correction stage of a SLAM (Simultaneous Localization and Mapping) algorithm based on an Extended Kalman Filter (EKF). This approach decreases the computational time of the correction stage, which allows for real-time and constant-time implementations of the SLAM. The selection procedure consists of choosing the features to which the SLAM system state covariance is most sensitive. The entire system is implemented on a mobile robot equipped with a laser range sensor. The features extracted from the environment correspond to lines and corners. Experimental results of the real-time SLAM algorithm and an analysis of the processing time consumed by the SLAM with the proposed feature selection procedure are shown. A comparison between the proposed feature selection approach and the classical sequential EKF-SLAM, along with an entropy feature selection approach, is also performed.

  11. Integrated Clustering and Feature Selection Scheme for Text Documents.

    Directory of Open Access Journals (Sweden)

    M. Thangamani

    2010-01-01

    Problem statement: Text documents are unstructured databases that contain raw data collections. Clustering techniques are used to group text documents by similarity. Approach: Feature selection techniques were used to improve the efficiency and accuracy of the clustering process. Feature selection was done by eliminating redundant and irrelevant items from the text document contents. Statistical methods were used in the text clustering and feature selection algorithm. In the term-based text clustering and feature selection method, the cube size is very high and the accuracy is low. A semantic clustering and feature selection method was proposed to improve the clustering and feature selection mechanism using the semantic relations of the text documents. The proposed system was designed to identify the semantic relations using an ontology, which was used to represent term and concept relationships. Results: The synonym, meronym and hypernym relationships were represented in the ontology. The concept weights were estimated with reference to the ontology and used for the clustering process. The system was implemented in two ways: term clustering with feature selection and semantic clustering with feature selection. Conclusion: The performance analysis was carried out with the term clustering and semantic clustering methods, analyzing the accuracy and efficiency factors.

  12. Intrusion feature selection algorithm based on Particle Swarm Optimization

    Institute of Scientific and Technical Information of China (English)

    吴庆涛; 曹继邦; 郑瑞娟; 张聚伟

    2013-01-01

    Intrusion feature selection can effectively improve the correctness and detection rate of an intrusion detection system. To address the slow processing speed of intrusion detection algorithms caused by redundant information in high-dimensional intrusion detection data sets, an intrusion feature subset selection algorithm based on Particle Swarm Optimization (PSO) is proposed. By analyzing the correlation between the features of network intrusion data, the PSO algorithm searches the whole feature space and independently chooses an effective feature subset, reducing the data dimension. The experimental results show that the algorithm can effectively remove redundant features and reduce the time of feature selection, improving detection speed while maintaining detection accuracy.

  13. Optimized Image Steganalysis through Feature Selection using MBEGA

    CERN Document Server

    Geetha, S

    2010-01-01

    Feature-based steganalysis, an emerging branch of information forensics, aims at identifying the presence of covert communication by employing the statistical features of the cover and stego images as clues/evidence. Due to the large volumes of security audit data as well as the complex and dynamic properties of steganogram behaviours, optimizing the performance of steganalysers becomes an important open problem. This paper focuses on fine-tuning the performance of six promising steganalysers in this field through feature selection. We propose to employ the Markov Blanket-Embedded Genetic Algorithm (MBEGA) for the stego-sensitive feature selection process. In particular, the embedded Markov blanket based memetic operators add or delete features (or genes) from a genetic algorithm (GA) solution so as to quickly improve the solution and fine-tune the search. Empirical results suggest that MBEGA is effective and efficient in eliminating irrelevant and redundant features based on both Markov blanket and predictive pow...

  14. Features Based Text Similarity Detection

    CERN Document Server

    Kent, Chow Kok

    2010-01-01

    As the Internet helps us cross cultural borders by providing different information, the issue of plagiarism is bound to arise. As a result, plagiarism detection becomes more demanding in overcoming this issue. Different plagiarism detection tools have been developed based on various detection techniques. Nowadays, the fingerprint matching technique plays an important role in those detection tools. However, in handling articles with large content, the fingerprint matching technique has some weaknesses, especially in space and time consumption. In this paper, we propose a new approach to detect plagiarism which integrates the use of the fingerprint matching technique with four key features to assist in the detection process. These proposed features are capable of choosing the main point or key sentence in the articles to be compared. The selected sentences then undergo the fingerprint matching process in order to detect the similarity between the sentences. Hence, time and space usage for the comparison process is r...

  15. Hadoop neural network for parallel and distributed feature selection.

    Science.gov (United States)

    Hodge, Victoria J; O'Keefe, Simon; Austin, Jim

    2016-06-01

    In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature, all of which have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel), allowing multiple feature selectors to be compared. We identify commonalities among the five feature selectors. All can be processed in the framework using a single representation, and the overall processing can also be greatly reduced by processing the common aspects of the feature selectors only once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector, and the actual features to select, to be identified for large and high-dimensional data sets by exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop.

  16. Joint Feature Selection Method Based on Relevance and Redundancy

    Institute of Scientific and Technical Information of China (English)

    周城; 葛斌; 唐九阳; 肖卫东

    2012-01-01

    Based on a comparative study of four feature selection methods, namely document frequency (DF), which is unrelated to class information, and information gain (IG), mutual information (MI) and the chi-square statistic (CHI), which are related to class information, we analyze the disadvantages of combining these two kinds of methods directly and propose a joint feature selection method based on relevance and redundancy that combines DF with one of IG, MI and CHI. This approach aims to eliminate redundant features, retain features useful for classification, and consequently improve the accuracy of text sentiment classification. The experimental results show that the proposed method not only improves performance but also effectively reduces the feature dimension.
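
    The two kinds of score compared in this abstract can be sketched in a few lines. The block below is only an illustration under stated assumptions: the toy corpus, the function names and the `naive_combination` helper are hypothetical, and the helper shows the direct "DF filter, then CHI ranking" combination that the abstract describes as the baseline, not the authors' joint method, whose details the abstract does not give.

```python
from collections import Counter


def document_frequency(docs):
    """Count the documents each term occurs in (class labels are ignored)."""
    df = Counter()
    for tokens, _label in docs:
        df.update(set(tokens))
    return df


def chi_square(docs, term, label):
    """2x2 chi-square statistic between term presence and class membership."""
    a = b = c = d = 0
    for tokens, doc_label in docs:
        has_term = term in tokens
        in_class = doc_label == label
        if has_term and in_class:
            a += 1          # term present, document in class
        elif has_term:
            b += 1          # term present, document outside class
        elif in_class:
            c += 1          # term absent, document in class
        else:
            d += 1          # term absent, document outside class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def naive_combination(docs, label, min_df, top_k):
    """Hypothetical baseline: drop rare terms by DF, then rank the rest by CHI."""
    df = document_frequency(docs)
    candidates = [t for t, count in df.items() if count >= min_df]
    ranked = sorted(candidates, key=lambda t: (-chi_square(docs, t, label), t))
    return ranked[:top_k]


# Toy sentiment corpus (made up for this sketch).
docs = [
    (["good", "movie", "great", "superb"], "pos"),
    (["great", "plot", "good"], "pos"),
    (["bad", "movie", "boring"], "neg"),
    (["boring", "plot", "bad"], "neg"),
]
selected = naive_combination(docs, "pos", min_df=2, top_k=4)
```

    On this toy corpus the DF filter discards the rare term "superb", and the CHI ranking keeps the four sentiment-bearing words while "movie" and "plot", which occur equally in both classes, score zero. The chi-square statistic is two-sided, so terms that indicate the opposite class also rank highly.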

  17. Coevolution of active vision and feature selection.

    Science.gov (United States)

    Floreano, Dario; Kato, Toshifumi; Marocco, Davide; Sauser, Eric

    2004-03-01

    We show that complex visual tasks, such as position- and size-invariant shape recognition and navigation in the environment, can be tackled with simple architectures generated by a coevolutionary process of active vision and feature selection. Behavioral machines equipped with primitive vision systems and direct pathways between visual and motor neurons are evolved while they freely interact with their environments. We describe the application of this methodology in three sets of experiments, namely, shape discrimination, car driving, and robot navigation. We show that these systems develop sensitivity to a number of oriented, retinotopic, visual-feature-oriented edges, corners, height, and a behavioral repertoire to locate, bring, and keep these features in sensitive regions of the vision system, resembling strategies observed in simple insects.

  18. Features Selection for Skin Micro-Image Symptomatic Recognition

    Institute of Scientific and Technical Information of China (English)

    HU Yue-li; CAO Jia-lin; ZHAO Qian; FENG Xu

    2004-01-01

    Automatic recognition of skin micro-image symptoms is important in skin diagnosis and treatment. Feature selection is used to improve the classification performance for skin micro-image symptoms. This paper proposes a hybrid approach based on the support vector machine (SVM) technique and genetic algorithm (GA) to select an optimum feature subset from the feature group extracted from the skin micro-images. An adaptive GA is introduced for maintaining the convergence rate. With the proposed method, the average cross validation accuracy is increased from 88.25% using all features to 96.92% using only selected features provided by a classifier for classification of 5 classes of skin symptoms. The experimental results are satisfactory.

  20. Ensemble feature selection integrating elitist roles and quantum game model

    Institute of Scientific and Technical Information of China (English)

    Weiping Ding; Jiandong Wang; Zhijin Guan; Quan Shi

    2015-01-01

    To accelerate the selection process of feature subsets in the rough set theory (RST), an ensemble elitist roles based quantum game (EERQG) algorithm is proposed for feature selection. Firstly, the multilevel elitist roles based dynamics equilibrium strategy is established, and both immigration and emigration of elitists are able to be self-adaptive to balance between exploration and exploitation for feature selection. Secondly, the utility matrix of trust margins is introduced to the model of multilevel elitist roles to enhance various elitist roles’ performance of searching the optimal feature subsets, and the win-win utility solutions for feature selection can be attained. Meanwhile, a novel ensemble quantum game strategy is designed as an intriguing exhibiting structure to perfect the dynamics equilibrium of multilevel elitist roles. Finally, the ensemble manner of multilevel elitist roles is employed to achieve the global minimal feature subset, which will greatly improve the feasibility and effectiveness. Experiment results show the proposed EERQG algorithm has superiority compared to the existing feature selection algorithms.

  1. Protein fold classification with genetic algorithms and feature selection.

    Science.gov (United States)

    Chen, Peng; Liu, Chunmei; Burge, Legand; Mahmood, Mohammad; Southerland, William; Gloster, Clay

    2009-10-01

    Protein fold classification is a key step to predicting protein tertiary structures. This paper proposes a novel approach based on genetic algorithms and feature selection to classifying protein folds. Our dataset is divided into a training dataset and a test dataset. Each individual for the genetic algorithms represents a selection function of the feature vectors of the training dataset. A support vector machine is applied to each individual to evaluate the fitness value (fold classification rate) of each individual. The aim of the genetic algorithms is to search for the best individual that produces the highest fold classification rate. The best individual is then applied to the feature vectors of the test dataset and a support vector machine is built to classify protein folds based on selected features. Our experimental results on Ding and Dubchak's benchmark dataset of 27-class folds show that our approach achieves an accuracy of 71.28%, which outperforms current state-of-the-art protein fold predictors.

  2. DYNAMIC FEATURE SELECTION FOR WEB USER IDENTIFICATION ON LINGUISTIC AND STYLISTIC FEATURES OF ONLINE TEXTS

    Directory of Open Access Journals (Sweden)

    A. A. Vorobeva

    2017-01-01

    Full Text Available The paper deals with identification and authentication of web users participating in Internet information processes, based on features of online texts. In digital forensics, web user identification based on various linguistic features can be used to discover the identity of individuals, criminals or terrorists using the Internet to commit cybercrimes. The Internet can be used as a tool in different types of cybercrime (fraud and identity theft, harassment and anonymous threats, terrorist or extremist statements, distribution of illegal content and information warfare). Linguistic identification of web users is a kind of biometric identification; it can be used to narrow down the suspects, identify a criminal and prosecute him. The feature set includes various linguistic and stylistic features extracted from online texts. We propose dynamic feature selection for each web user identification task. Selection is based on calculating the Manhattan distance to k-nearest neighbors (Relief-f algorithm). This approach improves the identification accuracy and minimizes the number of features. Experiments were carried out on several datasets with different levels of class imbalance. Experimental results showed that feature relevance varies between different sets of web users (probable authors of some text); feature selection for each set of web users improves identification accuracy by 4% on average, which is approximately 1% higher than with a static set of features. The proposed approach is most effective for a small number of training samples (messages) per user.
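
    The Relief-style weighting mentioned in this abstract can be sketched as follows. This is a simplified two-class version written for illustration only: the function names, the toy data and the assumption that features are already scaled to [0, 1] are not from the paper, and the full Relief-F algorithm additionally weights misses from several classes by their prior probabilities.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def relief_weights(X, y, k=1):
    """Simplified two-class Relief with k nearest hits and misses.

    A feature gains weight when it differs on the nearest samples of the
    other class (misses) and loses weight when it differs on the nearest
    samples of the same class (hits). Features are assumed to lie in [0, 1].
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for i in range(n):
        others = [j for j in range(n) if j != i]
        by_distance = lambda j: manhattan(X[i], X[j])
        hits = sorted((j for j in others if y[j] == y[i]), key=by_distance)[:k]
        misses = sorted((j for j in others if y[j] != y[i]), key=by_distance)[:k]
        for f in range(d):
            w[f] -= sum(abs(X[i][f] - X[j][f]) for j in hits) / (n * k)
            w[f] += sum(abs(X[i][f] - X[j][f]) for j in misses) / (n * k)
    return w


# Toy data: feature 0 separates the two authors, feature 1 is noise.
X = [[0.0, 0.2], [0.1, 0.9], [0.9, 0.1], [1.0, 0.8]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y, k=1)
ranking = sorted(range(len(weights)), key=lambda f: -weights[f])
```

    Features with the largest weights would be kept for a given set of candidate authors; rerunning the weighting for each candidate set is one way to read the "dynamic" selection described above.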

  3. Feature selection for data classification based on PLS supervised feature extraction and false nearest neighbors

    Institute of Scientific and Technical Information of China (English)

    颜克胜; 李太福; 魏正元; 苏盈盈; 姚立忠

    2012-01-01

    Multicollinearity, redundant features and noise in high-dimensional data often lead to low recognition accuracy and high time and space overhead in classifiers. A feature selection method based on partial least squares (PLS) and false nearest neighbors (FNN) is proposed. First, PLS is employed to extract the principal components of the high-dimensional data, overcoming the multicollinearity among the original features and yielding an independent principal component space that carries supervision information. Then, an FNN-based similarity measure is established by calculating the correlation in this space before and after each feature is selected, which gives a ranking of the original features by their ability to explain the dependent variable. Finally, the features with weak explanatory ability are removed in turn to construct various classification models, and the recognition rate of a Support Vector Machine (SVM) is used as the evaluation criterion to find the classification model that has the highest recognition rate with the fewest features; the features of that model form the best feature subset. A series of experiments on different data models have been conducted. The simulation results show that this method is able to select the best feature subset, consistent with the nature of the classification features of the data set. The research therefore provides a new approach to feature selection for data classification.

  4. New feature selection algorithm based on two-phase filter

    Institute of Scientific and Technical Information of China (English)

    计智伟; 胡珉

    2011-01-01

    Feature selection is an important problem in pattern recognition and machine learning. To address the shortcomings of current Filter and Wrapper methods and of traditional two-phase combined methods, this paper proposes a Feature Selection algorithm based on Two-Phase Filter (FSTPF) and tests it on three internationally recognized datasets and a real-time shield tunneling construction dataset. The experimental results show that FSTPF reduces dimensionality effectively and improves the classification accuracy of the optimized feature subset.

  5. A Feature Selection Method Based on Maximal Marginal Relevance

    Institute of Scientific and Technical Information of China (English)

    刘赫; 张相洪; 刘大有; 李燕军; 尹立军

    2012-01-01

    With the rapid growth of textual information on the Internet, text categorization has become one of the key research directions in data mining. Text categorization is a supervised learning process, defined as automatically assigning free text to one or more predefined categories. At present, text categorization is necessary for managing textual information and has been applied in many fields. However, text categorization has two characteristics: high dimensionality of the feature space and a high level of feature redundancy. To address these two characteristics, the χ² statistic is used to deal with the high dimensionality of the feature space, and information novelty is used to deal with the high level of feature redundancy. According to the definition of maximal marginal relevance, a feature selection method based on maximal marginal relevance is proposed, which can reduce redundancy between features in the process of feature selection. Furthermore, experiments are carried out on two text data sets, namely Reuters-21578 Top10 and OHSCAL. The results indicate that the feature selection method based on maximal marginal relevance is more efficient than the χ² statistic and information gain. Moreover, it can improve the performance of three different categorizers, namely naive Bayes, Rocchio and kNN.
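
    The general maximal-marginal-relevance idea behind this record can be sketched as a greedy loop that trades relevance against redundancy with the features already chosen. The sketch is an illustration under assumptions: `mmr_select`, the trade-off parameter `lam`, and the toy relevance and similarity values are hypothetical, whereas in the paper relevance comes from the χ² statistic and redundancy from an information-novelty measure that the abstract does not spell out.

```python
def mmr_select(relevance, similarity, k, lam=0.5):
    """Greedy maximal-marginal-relevance selection of k features.

    relevance:  dict mapping feature -> relevance score (higher is better)
    similarity: function (feature, feature) -> redundancy in [0, 1]
    lam:        weight on relevance; (1 - lam) is the weight on redundancy
    """
    selected = []
    candidates = set(relevance)
    while candidates and len(selected) < k:
        def score(f):
            # Redundancy is the similarity to the closest selected feature.
            redundancy = max((similarity(f, s) for s in selected), default=0.0)
            return lam * relevance[f] - (1 - lam) * redundancy
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy scores (made up): "stock" and "shares" are both relevant but redundant.
relevance = {"stock": 0.9, "shares": 0.85, "wheat": 0.6}
pair_similarity = {frozenset(("stock", "shares")): 0.95}


def similarity(a, b):
    return pair_similarity.get(frozenset((a, b)), 0.1)


with_redundancy_penalty = mmr_select(relevance, similarity, k=2, lam=0.5)
relevance_only = mmr_select(relevance, similarity, k=2, lam=1.0)
```

    With `lam=1.0` the loop reduces to plain relevance ranking and picks the two near-duplicate terms; with `lam=0.5` the second pick switches to the less relevant but non-redundant term.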

  6. Feature Selection for Classificatory Analysis Based on Information-theoretic Criteria

    Institute of Scientific and Technical Information of China (English)

    黄金杰; 吕宁; 李双全; 蔡云泽

    2008-01-01

    Feature selection aims to reduce the dimensionality of patterns for classificatory analysis by selecting the most informative instead of irrelevant and/or redundant features. In this study, two novel information-theoretic measures for feature ranking are presented: one is an improved formula to estimate the conditional mutual information between the candidate feature fi and the target class C given the subset of selected features S, i.e., I(C; fi|S), under the assumption that information of features is distributed uniformly; the other is a mutual information (MI) based constructive criterion that is able to capture both irrelevant and redundant input features under arbitrary distributions of information of features. With these two measures, two new feature selection algorithms, called the quadratic MI-based feature selection (QMIFS) approach and the MI-based constructive criterion (MICC) approach, respectively, are proposed, in which no parameters like β in Battiti's MIFS and Kwak and Choi's MIFS-U methods need to be preset. Thus, the intractable problem of how to choose an appropriate value for β to do the tradeoff between the relevance to the target classes and the redundancy with the already-selected features is avoided completely. Experimental results demonstrate the good performances of QMIFS and MICC on both synthetic and benchmark data sets.
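
    The quantity I(C; fi|S) that this abstract ranks features by can be computed directly from counts when all variables are discrete. The block below is a plain plug-in (frequency-count) estimate written for illustration; it is not the improved estimation formula proposed in the paper, and the function name and toy samples are assumptions.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(c, f, s):
    """Plug-in estimate of I(C; F | S) in bits from paired discrete samples.

    c, f, s are equal-length sequences; s may hold tuples when the
    conditioning set contains several already-selected features.
    """
    n = len(c)
    n_cfs = Counter(zip(c, f, s))
    n_cs = Counter(zip(c, s))
    n_fs = Counter(zip(f, s))
    n_s = Counter(s)
    total = 0.0
    for (ci, fi, si), count in n_cfs.items():
        # p(c,f,s) * log2( p(s) p(c,f,s) / (p(c,s) p(f,s)) ), in raw counts.
        ratio = count * n_s[si] / (n_cs[(ci, si)] * n_fs[(fi, si)])
        total += (count / n) * log2(ratio)
    return total


# Toy case 1: C = F xor S, so F is informative only once S is known.
f = [0, 0, 1, 1]
s = [0, 1, 0, 1]
c = [0, 1, 1, 0]
xor_gain = conditional_mutual_information(c, f, s)

# Toy case 2: F duplicates S, so it adds nothing once S is selected.
dup = [0, 0, 1, 1]
redundant_gain = conditional_mutual_information(dup, dup, dup)
```

    A candidate whose conditional mutual information is close to zero is either irrelevant or redundant given S, which is why criteria of this kind need no separate relevance/redundancy weight such as β.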

  7. Multi-task GLOH feature selection for human age estimation

    CERN Document Server

    Liang, Yixiong; Xu, Ying; Xiang, Yao; Zou, Beiji

    2011-01-01

    In this paper, we propose a novel age estimation method based on the GLOH feature descriptor and multi-task learning (MTL). The GLOH feature descriptor, one of the state-of-the-art feature descriptors, is used to capture the age-related local and spatial information of the face image. As the extracted GLOH features are often redundant, MTL is designed to select the most informative feature bins for the age estimation problem, while the corresponding weights are determined by ridge regression. This approach largely reduces the dimensions of the feature, which can not only improve performance but also decrease the computational burden. Experiments on the publicly available FG-NET database show that the proposed method can achieve comparable performance over previous approaches while using much fewer features.

  8. An improved chaotic fruit fly optimization based on a mutation strategy for simultaneous feature selection and parameter optimization for SVM and its applications.

    Science.gov (United States)

    Ye, Fei; Lou, Xin Yuan; Sun, Lin Fu

    2017-01-01

    This paper proposes a new support vector machine (SVM) optimization scheme based on an improved chaotic fruit fly optimization algorithm (FOA) with a mutation strategy to simultaneously perform parameter tuning for the SVM and feature selection. In the improved FOA, the chaotic particle initializes the fruit fly swarm location and replaces the expression of distance for the fruit fly to find the food source. In addition, the proposed mutation strategy uses two distinct generative mechanisms for new food sources at the osphresis phase, allowing the algorithm procedure to search for the optimal solution in both the whole solution space and within the local solution space containing the fruit fly swarm location. In an evaluation based on a group of ten benchmark problems, the proposed algorithm's performance is compared with that of other well-known algorithms, and the results support the superiority of the proposed algorithm. Moreover, this algorithm is successfully applied in an SVM to perform both parameter tuning for the SVM and feature selection to solve real-world classification problems. This method is called chaotic fruit fly optimization algorithm (CIFOA)-SVM and has been shown to be a more robust and effective optimization method than other well-known methods, particularly in terms of solving the medical diagnosis problem and the credit card problem.

  9. Feature Extraction and Selection Strategies for Automated Target Recognition

    Science.gov (United States)

    Greene, W. Nicholas; Zhang, Yuhan; Lu, Thomas T.; Chao, Tien-Hsin

    2010-01-01

    Several feature extraction and selection methods for an existing automatic target recognition (ATR) system using JPL's Grayscale Optical Correlator (GOC) and Optimal Trade-Off Maximum Average Correlation Height (OT-MACH) filter were tested using MATLAB. The ATR system is composed of three stages: a cursory region-of-interest (ROI) search using the GOC and OT-MACH filter, a feature extraction and selection stage, and a final classification stage. Feature extraction and selection concerns transforming potential target data into more useful forms as well as selecting important subsets of that data which may aid in detection and classification. The strategies tested were built around two popular extraction methods: Principal Component Analysis (PCA) and Independent Component Analysis (ICA). Performance was measured based on the classification accuracy and free-response receiver operating characteristic (FROC) output of a support vector machine (SVM) and a neural net (NN) classifier.

  10. Network Intrusion Detection Based on Features Selecting and Samples Selecting

    Institute of Scientific and Technical Information of China (English)

    马世欢; 胡彬

    2015-01-01

    To obtain better network intrusion detection results, and to address the problems of feature selection and sample selection in network intrusion detection, this paper proposes a network intrusion detection model based on feature selection and sample selection. First, the features of network intrusions are extracted and normalized. Second, kernel principal component analysis is used to select intrusion features, and the samples are selected. Finally, an extreme learning machine is used to build the network intrusion detection classifier, and simulation experiments are carried out with the KDD Cup99 data set. The simulation results show that the proposed model achieves good network intrusion detection results: the detection rate is above 95%, and the efficiency of intrusion detection can meet the requirements of network security protection.

  11. WE-AB-204-04: Feature Selection and Clustering Optimization for Pseudo-CT Generation in MR-Based Attenuation Correction and Radiation Therapy Planning

    Energy Technology Data Exchange (ETDEWEB)

    Kuo, J; Su, K [Case Center for Imaging Research, Case Western Reserve University, Cleveland, Ohio (United States); Department of Radiology, University Hospitals Case Medical Center, Case Western Reserve University, Cleveland, Ohio (United States); Hu, L; Traughber, M [Philips Healthcare, Cleveland, Ohio (United States); Pereira, G; Traughber, B [Department of Radiation Oncology, University Hospitals Seidman Cancer Center, Case Western Reserve University, Cleveland, Ohio (United States); Herrmann, K [Department of Radiology, University Hospitals Case Medical Center, Case Western Reserve University, Cleveland, Ohio (United States); Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, Ohio (United States); Muzic, R [Case Center for Imaging Research, Case Western Reserve University, Cleveland, Ohio (United States); Department of Radiology, University Hospitals Case Medical Center, Case Western Reserve University, Cleveland, Ohio (United States); Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH (United States)

    2015-06-15

    Purpose: Accurate and robust photon attenuation derived from MR is essential for PET/MR and MR-based radiation treatment planning applications. Although the fuzzy C-means (FCM) algorithm has been applied for pseudo-CT generation, the input feature combination and the number of clusters have not been optimized. This study aims to optimize both for clinically practical pseudo-CT generation. Methods: Nine volunteers were recruited. A 190-second, single-acquisition UTE-mDixon with 25% (angular) sampling and 3D radial readout was performed to acquire three primitive MR features at TEs of 0.1, 1.5, and 2.8 ms: the free-induction-decay (FID), the first and the second echo images. Three derived images, Dixon-fat and Dixon-water generated by two-point Dixon water/fat separation, and R2* (1/T2*) map, were also created. To identify informative inputs for generating a pseudo-CT image volume, all 63 combinations, choosing one to six of the feature images, were used as inputs to FCM for pseudo-CT generation. Further, the number of clusters was varied from four to seven to find the optimal approach. Mean prediction deviation (MPD), mean absolute prediction deviation (MAPD), and correlation coefficient (R) of different combinations were compared for feature selection. Results: Among the 63 feature combinations, the four that resulted in the best MAPD and R were further compared along with the set containing all six features. The results suggested that R2* and Dixon-water are the most informative features. Further, including FID also improved the performance of pseudo-CT generation. Consequently, the set containing FID, Dixon-water, and R2* resulted in the most accurate, robust pseudo-CT when the number of clusters equals five (5C). The clusters were interpreted as air, fat, bone, brain, and fluid. The six-cluster result additionally included bone marrow. Conclusion: The results suggested that FID, Dixon-water, and R2* are the most important features. The findings can be used to
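
    The figure of 63 input combinations follows from choosing one to six of the six feature images, that is, every non-empty subset (2^6 - 1 = 63). A short enumeration makes the search space concrete. The labels `echo1` and `echo2` are shorthand introduced here for the first and second echo images, and pairing every subset with every cluster count is an assumption for illustration; the abstract does not say that the full grid was evaluated.

```python
from itertools import combinations

# The six feature images named in the abstract (echo labels are shorthand).
features = ["FID", "echo1", "echo2", "Dixon-fat", "Dixon-water", "R2*"]

# Every non-empty subset: choose 1, 2, ..., 6 of the six images.
subsets = [
    combo
    for size in range(1, len(features) + 1)
    for combo in combinations(features, size)
]

# Cluster counts from four to seven, as varied in the study.
cluster_counts = range(4, 8)

# Hypothetical full grid of (feature subset, number of clusters) settings.
grid = [(combo, n_clusters) for combo in subsets for n_clusters in cluster_counts]
```

    Each grid entry would be one fuzzy C-means run whose pseudo-CT is then scored with MPD, MAPD and R; the set reported as best corresponds to the entry `(("FID", "Dixon-water", "R2*"), 5)`.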

  12. Effective feature selection for image steganalysis using extreme learning machine

    Science.gov (United States)

    Feng, Guorui; Zhang, Haiyan; Zhang, Xinpeng

    2014-11-01

    Image steganography delivers secret data by slight modifications of the cover. To detect these data, steganalysis tries to create some features to embody the discrepancy between the cover and steganographic images. Therefore, the urgent problem is how to design an effective classification architecture for given feature vectors extracted from the images. We propose an approach to automatically select effective features based on the well-known JPEG steganographic methods. This approach, referred to as extreme learning machine revisited feature selection (ELM-RFS), can tune input weights in terms of the importance of input features. This idea is derived from cross-validation learning and one-dimensional (1-D) search. While updating input weights, we seek the energy decreasing direction using the leave-one-out (LOO) selection. Furthermore, we optimize the 1-D energy function instead of directly discarding the least significant feature. Since recent Liu features can achieve considerably lower detection errors than previous JPEG steganalysis features, the experimental results demonstrate that the new approach results in less classification error than other classifiers such as SVM, Kodovsky ensemble classifier, direct ELM-LOO learning, kernel ELM, and conventional ELM on Liu features. Furthermore, ELM-RFS achieves a performance similar to that of a deep Boltzmann machine using less training time.

  13. Feature-selective attention in healthy old age: a selective decline in selective attention?

    Science.gov (United States)

    Quigley, Cliodhna; Müller, Matthias M

    2014-02-12

    Deficient selection against irrelevant information has been proposed to underlie age-related cognitive decline. We recently reported evidence for maintained early sensory selection when older and younger adults used spatial selective attention to perform a challenging task. Here we explored age-related differences when spatial selection is not possible and feature-selective attention must be deployed. We additionally compared the integrity of feedforward processing by exploiting the well established phenomenon of suppression of visual cortical responses attributable to interstimulus competition. Electroencephalogram was measured while older and younger human adults responded to brief occurrences of coherent motion in an attended stimulus composed of randomly moving, orientation-defined, flickering bars. Attention was directed to horizontal or vertical bars by a pretrial cue, after which two orthogonally oriented, overlapping stimuli or a single stimulus were presented. Horizontal and vertical bars flickered at different frequencies and thereby elicited separable steady-state visual-evoked potentials, which were used to examine the effect of feature-based selection and the competitive influence of a second stimulus on ongoing visual processing. Age differences were found in feature-selective attentional modulation of visual responses: older adults did not show consistent modulation of magnitude or phase. In contrast, the suppressive effect of a second stimulus was robust and comparable in magnitude across age groups, suggesting that bottom-up processing of the current stimuli is essentially unchanged in healthy old age. Thus, it seems that visual processing per se is unchanged, but top-down attentional control is compromised in older adults when space cannot be used to guide selection.

  14. Feature Selection Algorithm for Incomplete Data Based on Information Entropy

    Institute of Scientific and Technical Information of China (English)

    陈圣兵; 王晓峰

    2014-01-01

    Building on an analysis of existing incomplete information entropy measures, the concept of incomplete information entropy based on similarity relations (SIIE) is proposed, and some properties of SIIE are discussed. A feature selection algorithm for incomplete data is presented. In this algorithm, the SIIE of incomplete data is calculated directly and taken as the criterion for feature selection. The sequential forward floating search method is then employed to address the problem of correlation among features. Experiments on UCI datasets are carried out, and the results indicate the accuracy and efficiency of the proposed algorithm.

  15. Feature selection versus feature compression in the building of calibration models from FTIR-spectrophotometry datasets.

    Science.gov (United States)

    Vergara, Alexander; Llobet, Eduard

    2012-01-15

    FTIR spectrophotometry has become a standard in the chemical industry for on-the-fly monitoring of the concentrations of reagents and by-products. However, representing chemical samples by FTIR spectra, which are characterized by hundreds if not thousands of variables, poses particular challenges: the spectra must be analyzed in a high-dimensional feature space in which many features are likely to be highly correlated and many others are affected by noise. Identifying a subset of features that preserves the classifier/regressor performance therefore seems imperative prior to any attempt to build an appropriate pattern recognition method. In this context, we investigate the benefit of two different dimensionality reduction methods, namely the minimum Redundancy-Maximum Relevance (mRMR) feature selection scheme and a new self-organized map (SOM) based feature compression, coupled to regression methods to quantitatively analyze two-component liquid samples using FTIR spectrophotometry. Since these methods make it possible to select a small subset of relevant features from FTIR spectra while preserving the statistical characteristics of the target variable being analyzed, we claim that expressing the FTIR spectra by these dimensionality-reduced sets of features may be beneficial. We demonstrate the utility of these novel feature selection schemes in quantifying the distinct analytes within their binary mixtures using an FTIR spectrophotometer.
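
    The greedy loop behind an mRMR-style selection can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function name `mrmr_select`, the difference-form criterion (relevance minus mean redundancy with the features already chosen), and the toy relevance and redundancy values are all assumptions. In practice both quantities would be estimated from the spectra, for example with mutual information.

```python
def mrmr_select(relevance, redundancy, k):
    """Greedy mRMR-style selection (hypothetical helper).

    relevance[i]     -- relevance of feature i to the target (assumed given)
    redundancy[i][j] -- redundancy between features i and j (assumed given)
    k                -- number of features to select
    """
    n = len(relevance)
    # Start with the single most relevant feature.
    selected = [max(range(n), key=lambda i: relevance[i])]
    while len(selected) < k:
        remaining = [i for i in range(n) if i not in selected]

        def criterion(i):
            # Relevance penalised by mean redundancy with the chosen features.
            mean_red = sum(redundancy[i][j] for j in selected) / len(selected)
            return relevance[i] - mean_red

        selected.append(max(remaining, key=criterion))
    return selected


# Toy values (made up): features 0 and 1 are both relevant but nearly redundant.
relevance = [0.90, 0.85, 0.40]
redundancy = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
selected = mrmr_select(relevance, redundancy, k=2)
print(selected)
```

    With these toy values the second pick is the less relevant but non-redundant feature 2 rather than feature 1, which is the behaviour the redundancy penalty is meant to produce.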

  16. Confidence-Based Feature Acquisition

    Science.gov (United States)

    Wagstaff, Kiri L.; desJardins, Marie; MacGlashan, James

    2010-01-01

    Confidence-based Feature Acquisition (CFA) is a novel, supervised learning method for acquiring missing feature values when there is missing data at both training (learning) and test (deployment) time. To train a machine learning classifier, data is encoded with a series of input features describing each item. In some applications, the training data may have missing values for some of the features, which can be acquired at a given cost. A relevant JPL example is that of the Mars rover exploration in which the features are obtained from a variety of different instruments, with different power consumption and integration time costs. The challenge is to decide which features will lead to increased classification performance and are therefore worth acquiring (paying the cost). To solve this problem, CFA, which is made up of two algorithms (CFA-train and CFA-predict), has been designed to greedily minimize total acquisition cost (during training and testing) while aiming for a specific accuracy level (specified as a confidence threshold). With this method, it is assumed that there is a nonempty subset of features that are free; that is, every instance in the data set includes these features initially for zero cost. It is also assumed that the feature acquisition (FA) cost associated with each feature is known in advance, and that the FA cost for a given feature is the same for all instances. Finally, CFA requires that the base-level classifiers produce not only a classification, but also a confidence (or posterior probability).

  17. Unsupervised Feature Selection for Latent Dirichlet Allocation

    Institute of Scientific and Technical Information of China (English)

    Xu Weiran; Du Gang; Chen Guang; Guo Jun; Yang Jie

    2011-01-01

    As a generative model, Latent Dirichlet Allocation (LDA) focuses on how to generate data and lacks optimization of the topics' discrimination capability. This paper aims to improve the discrimination capability through unsupervised feature selection. Theoretical analysis shows that the discrimination capability of a topic is limited by the discrimination capability of its representative words. The discrimination capability of a word is approximated by the Information Gain of the word for topics, which is used to distinguish between “general words” and “special words” in LDA topics. Therefore, we add a constraint to the LDA objective function so that “general words” occur only in “general topics” rather than in “special topics”. A heuristic algorithm is then presented to obtain the solution. Experiments show that this method not only improves the information gain of topics but also makes the topics easier for humans to understand.

  18. Bayesian feature selection to estimate customer survival

    OpenAIRE

    Figini, Silvia; Giudici, Paolo; Brooks, S P

    2006-01-01

    We consider the problem of estimating the lifetime value of customers when a large number of features are present in the data. In order to measure lifetime value we use survival analysis models to estimate customer tenure. In such a context, a number of classical modelling challenges arise. We will show how our proposed Bayesian methods perform, and compare them with classical churn models on a real case study. More specifically, based on data from a media service company, our aim will be to p...

  19. Binary classification of chalcone derivatives with LDA or KNN based on their antileishmanial activity and molecular descriptors selected using the Successive Projections Algorithm feature-selection technique.

    Science.gov (United States)

    Goodarzi, Mohammad; Saeys, Wouter; de Araujo, Mario Cesar Ugulino; Galvão, Roberto Kawakami Harrop; Vander Heyden, Yvan

    2014-01-23

    Chalcones are naturally occurring aromatic ketones, which consist of an α,β-unsaturated carbonyl system joining two aryl rings. These compounds are reported to exhibit several pharmacological activities, including antiparasitic, antibacterial, antifungal, anticancer, immunomodulatory, nitric oxide inhibition and anti-inflammatory effects. In the present work, a Quantitative Structure-Activity Relationship (QSAR) study is carried out to classify chalcone derivatives with respect to their antileishmanial activity (active/inactive) on the basis of molecular descriptors. For this purpose, two techniques to select descriptors are employed, the Successive Projections Algorithm (SPA) and the Genetic Algorithm (GA). The selected descriptors are initially employed to build Linear Discriminant Analysis (LDA) models. An additional investigation is then carried out to determine whether the results can be improved by using a non-parametric classification technique (One Nearest Neighbour, 1NN). In a case study involving 100 chalcone derivatives, the 1NN models were found to provide better rates of correct classification than LDA, both in the training and test sets. The best result was achieved by a SPA-1NN model with six molecular descriptors, which provided correct classification rates of 97% and 84% for the training and test sets, respectively.

  20. Feature Selection for Chemical Sensor Arrays Using Mutual Information

    Science.gov (United States)

    Wang, X. Rosalind; Lizier, Joseph T.; Nowotny, Thomas; Berna, Amalia Z.; Prokopenko, Mikhail; Trowell, Stephen C.

    2014-01-01

    We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set, and established best features and upper bounds on classification performance. We selected feature sets that exhibit the maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study using a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features that achieve classification performance approaching the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian Networks gave the best performance. Finally, we compared the observed classification performances with the performance of classifiers using randomly selected features. We found that the selected features consistently outperformed randomly selected features for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near optimal features for chemical sensor arrays. PMID:24595058
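
    A mutual information filter of the kind described above can be sketched for discrete features as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function names, the plug-in (count-based) estimate of mutual information, and the toy sensor readings are all made up for illustration, and real sensor responses would first need to be discretised or handled with a continuous estimator.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def subset_mi(columns, subset, labels):
    """MI between the joint value of a feature subset and the class labels."""
    joint = list(zip(*(columns[i] for i in subset)))
    return mutual_information(joint, labels)


def best_subset(columns, labels, size):
    """Exhaustively pick the subset of the given size with maximal MI (toy scale only)."""
    return max(combinations(range(len(columns)), size),
               key=lambda s: subset_mi(columns, s, labels))


# Toy data (made up): four samples, two chemical classes, three discretised features.
labels = ["a", "a", "b", "b"]
columns = [
    [0, 0, 1, 1],  # identifies the class exactly
    [0, 1, 0, 1],  # independent of the class
    [0, 0, 0, 1],  # partially informative
]
scores = [mutual_information(col, labels) for col in columns]
order = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
print(order, best_subset(columns, labels, size=1))
```

    The filter never trains a classifier, which is where the computational saving over a wrapper search comes from; the exhaustive `best_subset` helper is only practical for small feature counts.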

  1. An Improved Particle Swarm Optimization for Feature Selection

    Institute of Scientific and Technical Information of China (English)

    Yuanning Liu; Gang Wang; Huiling Chen; Hao Dong; Xiaodong Zhu; Sujing Wang

    2011-01-01

    Particle Swarm Optimization (PSO) is a popular bionic algorithm for optimization problems, based on the social behavior associated with bird flocking. To maintain the diversity of swarms, a few studies of multi-swarm strategies have been reported. However, the competition among swarms, that is, the reservation or destruction of a swarm, has not been considered further. In this paper, we formulate four rules by introducing a survival-of-the-fittest mechanism, which simulates the competition among the swarms. Based on this mechanism, we design a modified Multi-Swarm PSO (MSPSO) to solve discrete problems, which consists of a number of sub-swarms and a multi-swarm scheduler that can monitor and control each sub-swarm using the rules. To further address feature selection problems, we propose an Improved Feature Selection (IFS) method that integrates MSPSO and Support Vector Machines (SVM) with the F-score method. The IFS method aims to achieve higher generalization capability by performing kernel parameter optimization and feature selection simultaneously. The performance of the proposed method is compared with that of the standard PSO-based, Genetic Algorithm (GA) based and grid search based methods on 10 benchmark datasets taken from the UCI machine learning and StatLog databases. The numerical results and statistical analysis show that the proposed IFS method performs significantly better than the other three methods in terms of prediction accuracy with a smaller subset of features.
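
    The F-score mentioned above can be illustrated with a short sketch. It assumes that the abstract refers to the two-class discriminative F-score commonly paired with SVMs (between-class separation of the means divided by the sum of the within-class sample variances); the abstract itself does not give the formula, and the function name and toy values below are hypothetical.

```python
def f_score(pos, neg):
    """Two-class F-score of one feature (assumed definition, see lead-in).

    pos, neg -- values of the feature in the positive and negative class,
                each with at least two samples.
    """
    values = pos + neg
    mean_all = sum(values) / len(values)
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    # Sample variances within each class (n - 1 in the denominator).
    var_pos = sum((v - mean_pos) ** 2 for v in pos) / (len(pos) - 1)
    var_neg = sum((v - mean_neg) ** 2 for v in neg) / (len(neg) - 1)
    denom = var_pos + var_neg
    if denom == 0.0:
        return 0.0  # constant feature in both classes: treated as uninformative here
    return ((mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2) / denom


# Toy values (made up): the first feature separates the classes, the second does not.
separating = f_score([1.0, 2.0, 3.0], [7.0, 8.0, 9.0])
overlapping = f_score([1.0, 5.0, 9.0], [2.0, 5.0, 8.0])
print(separating, overlapping)
```

    A score like this evaluates each feature in isolation, which is one reason it is typically combined with a search procedure that evaluates whole subsets.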

  2. Feature-Based Classification of Networks

    CERN Document Server

    Barnett, Ian; Kuijjer, Marieke L; Mucha, Peter J; Onnela, Jukka-Pekka

    2016-01-01

    Network representations of systems from various scientific and societal domains are neither completely random nor fully regular, but instead appear to contain recurring structural building blocks. These features tend to be shared by networks belonging to the same broad class, such as the class of social networks or the class of biological networks. At a finer scale of classification within each such class, networks describing more similar systems tend to have more similar features. This occurs presumably because networks representing similar purposes or constructions would be expected to be generated by a shared set of domain specific mechanisms, and it should therefore be possible to classify these networks into categories based on their features at various structural levels. Here we describe and demonstrate a new, hybrid approach that combines manual selection of features of potential interest with existing automated classification methods. In particular, selecting well-known and well-studied features that ...

  3. Optimal Features Subset Selection and Classification for Iris Recognition

    Directory of Open Access Journals (Sweden)

    Roy Kaushik

    2008-01-01

    Full Text Available The selection of the optimal features subset and the classification have become an important issue in the field of iris recognition. We propose a feature selection scheme based on the multiobjectives genetic algorithm (MOGA) to improve the recognition accuracy and asymmetrical support vector machine for the classification of iris patterns. We also suggest a segmentation scheme based on the collarette area localization. The deterministic feature sequence is extracted from the iris images using the 1D log-Gabor wavelet technique, and the extracted feature sequence is used to train the support vector machine (SVM). The MOGA is applied to optimize the features sequence and to increase the overall performance based on the matching accuracy of the SVM. The parameters of SVM are optimized to improve the overall generalization performance, and the traditional SVM is modified to an asymmetrical SVM to treat the false accept and false reject cases differently and to handle the unbalanced data of a specific class with respect to the other classes. Our experimental results indicate that the performance of SVM as a classifier is better than the performance of the classifiers based on the feedforward neural network, the k-nearest neighbor, and the Hamming and the Mahalanobis distances. The proposed technique is computationally effective with recognition rates of 99.81% and 96.43% on CASIA and ICE datasets, respectively.

  4. Optimal Features Subset Selection and Classification for Iris Recognition

    Directory of Open Access Journals (Sweden)

    Prabir Bhattacharya

    2008-06-01

    Full Text Available The selection of the optimal features subset and the classification have become an important issue in the field of iris recognition. We propose a feature selection scheme based on the multiobjectives genetic algorithm (MOGA) to improve the recognition accuracy and asymmetrical support vector machine for the classification of iris patterns. We also suggest a segmentation scheme based on the collarette area localization. The deterministic feature sequence is extracted from the iris images using the 1D log-Gabor wavelet technique, and the extracted feature sequence is used to train the support vector machine (SVM). The MOGA is applied to optimize the features sequence and to increase the overall performance based on the matching accuracy of the SVM. The parameters of SVM are optimized to improve the overall generalization performance, and the traditional SVM is modified to an asymmetrical SVM to treat the false accept and false reject cases differently and to handle the unbalanced data of a specific class with respect to the other classes. Our experimental results indicate that the performance of SVM as a classifier is better than the performance of the classifiers based on the feedforward neural network, the k-nearest neighbor, and the Hamming and the Mahalanobis distances. The proposed technique is computationally effective with recognition rates of 99.81% and 96.43% on CASIA and ICE datasets, respectively.

  5. Linked Social Media Data Based Semi-Supervised Feature Selection Method

    Institute of Scientific and Technical Information of China (English)

    王亦兵; 潘志松; 吴君青; 贾波; 胡谷雨

    2014-01-01

    The social media network produces massive amounts of high-dimensional, unlabeled data, which poses tremendous challenges for data processing. Meanwhile, the link graph information between data samples cannot be effectively used by existing pattern recognition algorithms. A semi-supervised feature selection method based on link relations (SSLFS) is proposed, which mines the link graph of the social media network and combines it with a small amount of supervised information. Through spectral analysis and a sparsity constraint, SSLFS selects feature subsets that preserve the local manifold and sparsity characteristics of the original data. The experimental results on the Flickr dataset show that the feature subset obtained by SSLFS yields significantly better classification performance than those obtained by other feature selection methods.

  6. Towards literature-based feature selection for diagnostic classification: A meta-analysis of resting-state fMRI in depression

    Directory of Open Access Journals (Sweden)

    Benedikt Sundermann

    2014-09-01

    Full Text Available Information derived from functional magnetic resonance imaging (fMRI) during wakeful rest has been introduced as a candidate diagnostic biomarker in unipolar major depressive disorder (MDD). Multiple reports of resting state fMRI in MDD describe group effects. Such prior knowledge can be adopted to pre-select potentially discriminating features for diagnostic classification models with the aim to improve diagnostic accuracy. The purpose of this analysis was to consolidate spatial information about alterations of spontaneous brain activity in MDD, primarily to serve as feature selection for multivariate pattern analysis techniques (MVPA). Thirty-two studies were included in the final analyses. Coordinates extracted from the original reports were assigned to two categories based on directionality of findings. Meta-analyses were calculated using the non-additive activation likelihood estimation approach with coordinates organized by subject group to account for non-independent samples. Converging evidence revealed a distributed pattern of brain regions with increased or decreased spontaneous activity in MDD. The most distinct finding was hyperactivity/hyperconnectivity presumably reflecting the interaction of cortical midline structures (posterior default mode network components including the precuneus and neighboring posterior cingulate cortices associated with self-referential processing and the subgenual anterior cingulate and neighboring medial frontal cortices) with lateral prefrontal areas related to externally-directed cognition. Other areas of hyperactivity/hyperconnectivity include the left lateral parietal cortex, right hippocampus and right cerebellum, whereas hypoactivity/hypoconnectivity was observed mainly in the left temporal cortex, the insula, precuneus, superior frontal gyrus, lentiform nucleus and thalamus. Results are made available in two different data formats to be used as spatial hypotheses in future studies, particularly for diagnostic

  7. Feature Selection for Wheat Yield Prediction

    Science.gov (United States)

    Ruß, Georg; Kruse, Rudolf

    Carrying out effective and sustainable agriculture has become an important issue in recent years. Agricultural production has to keep up with an ever-increasing population by taking advantage of a field’s heterogeneity. Nowadays, modern technology such as the global positioning system (GPS) and a multitude of developed sensors enable farmers to better measure their fields’ heterogeneities. For this small-scale, precise treatment the term precision agriculture has been coined. However, the large amounts of data that are (literally) harvested during the growing season have to be analysed. In particular, the farmer is interested in knowing whether a newly developed heterogeneity sensor is potentially advantageous or not. Since the sensor data are readily available, this issue should be seen from an artificial intelligence perspective. There it can be treated as a feature selection problem. The additional task of yield prediction can be treated as a multi-dimensional regression problem. This article aims to present an approach towards solving these two practically important problems using artificial intelligence and data mining ideas and methodologies.

  8. Correlation Feature Selection and Mutual Information Theory Based Quantitative Research on Meteorological Impact Factors of Module Temperature for Solar Photovoltaic Systems

    Directory of Open Access Journals (Sweden)

    Yujing Sun

    2016-12-01

    Full Text Available The module temperature is the most important parameter influencing the output power of solar photovoltaic (PV) systems, aside from solar irradiance. In this paper, we focus on interdisciplinary research that combines correlation analysis, mutual information (MI) and heat transfer theory, and that aims to determine the correlative relations between different meteorological impact factors (MIFs) and PV module temperature from both qualitative and quantitative aspects. The identification and confirmation of the primary MIFs of PV module temperature are investigated as the first step of this research, from the perspective of the physical meaning and mathematical analysis of the electrical performance and thermal characteristics of PV modules, based on the PV effect and heat transfer theory. Furthermore, the quantitative description of the influence of the MIFs on PV module temperature is mathematically formulated as several indexes using correlation-based feature selection (CFS) and MI theory to explore the specific impact degrees under four different typical weather statuses named general weather classes (GWCs). Case studies for the proposed methods were conducted using actual measurement data from a 500 kW grid-connected solar PV plant in China. The results not only verified the knowledge about the main MIFs of PV module temperature but, more importantly, also provided the specific ratios of the quantitative impact degrees of these three MIFs through CFS- and MI-based measures under the four different GWCs.
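
    The idea behind correlation-based feature selection can be sketched with the subset merit that is commonly used for CFS: a subset scores well when its features correlate strongly with the target and weakly with each other. The sketch below assumes that standard merit formula and Pearson correlation; the abstract does not state which exact indexes the authors derive, and the function names and series are hypothetical, synthetic values rather than plant measurements.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_a * var_b)


def cfs_merit(features, target):
    """CFS merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff) (assumed standard form)."""
    k = len(features)
    # Mean absolute feature-target correlation.
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    # Mean absolute feature-feature correlation (zero for a single feature).
    pairs = list(combinations(features, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


# Synthetic series (made up): f1 tracks the target, f3 duplicates f1, f2 is noisier.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
f1 = [2.0, 4.0, 6.0, 8.0, 10.0]
f2 = [1.0, 3.0, 2.0, 5.0, 4.0]
f3 = list(f1)
print(cfs_merit([f1], target), cfs_merit([f1, f3], target), cfs_merit([f1, f2], target))
```

    In this toy case, adding the duplicate `f3` leaves the merit unchanged, which shows how the denominator discounts redundant features.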

  9. Spatial selection of features within perceived and remembered objects

    Directory of Open Access Journals (Sweden)

    Duncan E Astle

    2009-04-01

    Full Text Available Our representation of the visual world can be modulated by spatially specific attentional biases that depend flexibly on task goals. We compared searching for task-relevant features in perceived versus remembered objects. When searching perceptual input, selected task-relevant and suppressed task-irrelevant features elicited contrasting spatiotopic ERP effects, despite them being perceptually identical. This was also true when participants searched a memory array, suggesting that memory had retained the spatial organisation of the original perceptual input and that this representation could be modulated in a spatially specific fashion. However, task-relevant selection and task-irrelevant suppression effects were of the opposite polarity when searching remembered compared to perceived objects. We suggest that this surprising result stems from the nature of feature- and object-based representations when stored in visual short-term memory. When stored, features are integrated into objects, meaning that the spatially specific selection mechanisms must operate upon objects rather than specific feature-level representations.

  10. A Features Selection for Crops Classification

    Science.gov (United States)

    Zhao, Lei; Chen, Erxue; Li, Zengyuan; Li, Lan; Gu, Xinzhi

    2016-08-01

    Polarization orientation angle (POA) is a major parameter of an electromagnetic wave. This angle is shifted by azimuth slopes, which affects the radiometric quality of PolSAR data. Under the assumption of a reflection-symmetric medium, the polarization orientation angle shift (POAs) can be estimated by the Circular Polarization Method (CPM). The shift angle can then be used to compensate PolSAR data or to extract DEM information. However, the method is less effective when high-frequency SAR (L-, C-band) is used in forest areas. The main reason is that the polarization orientation angle shift in forest areas is influenced not only by topography but also by the forest canopy. The influence of the former is interference information that should be removed, whereas the influence of the latter is polarization feature information that needs to be retained. ALOS2 PALSAR2 L-band full polarimetric SAR data were used in this study. Based on the Circular Polarization and DEM-based methods, we analyzed the variation of the polarization orientation angle shift and developed polarization orientation shift estimation and compensation for PolSAR data in forest areas.

  11. Naive Bayes-Guided Bat Algorithm for Feature Selection

    Directory of Open Access Journals (Sweden)

    Ahmed Majid Taha

    2013-01-01

    Full Text Available With the amount of data and information said to double every 20 months or so, feature selection has become highly important and beneficial. Further improvements in feature selection will positively affect a wide array of applications in fields such as pattern recognition, machine learning, or signal processing. A bio-inspired method called the Bat Algorithm, hybridized with a Naive Bayes classifier (BANB), is presented in this work. The performance of the proposed feature selection algorithm was investigated using twelve benchmark datasets from different domains and was compared to three other well-known feature selection algorithms. The discussion focused on four perspectives: number of features, classification accuracy, stability, and feature generalization. The results showed that BANB significantly outperformed the other algorithms in selecting a lower number of features, hence removing irrelevant, redundant, or noisy features while maintaining the classification accuracy. BANB also proved to be more stable than the other methods and is capable of producing more general feature subsets.

  12. Efficient Generation and Selection of Combined Features for Improved Classification

    KAUST Repository

    Shono, Ahmad N.

    2014-05-01

    This study contributes a methodology and associated toolkit developed to allow users to experiment with the use of combined features in classification problems. Methods are provided for efficiently generating combined features from an original feature set, for efficiently selecting the most discriminating of these generated combined features, and for efficiently performing a preliminary comparison of the classification results when using the original features exclusively against the results when using the selected combined features. The potential benefit of considering combined features in classification problems is demonstrated by applying the developed methodology and toolkit to three sample data sets where the discovery of combined features containing new discriminating information led to improved classification results.

  13. Principal Feature Analysis: A Multivariate Feature Selection Method for fMRI Data

    Directory of Open Access Journals (Sweden)

    Lijun Wang

    2013-01-01

    Full Text Available Brain decoding with functional magnetic resonance imaging (fMRI) requires analysis of complex, multivariate data. Multivoxel pattern analysis (MVPA) has been widely used in recent years. MVPA treats the activation of multiple voxels from fMRI data as a pattern and decodes brain states using pattern classification methods. Feature selection is a critical procedure of MVPA because it decides which features will be included in the classification analysis of fMRI data, thereby improving the performance of the classifier. Features can be selected by limiting the analysis to specific anatomical regions or by computing univariate (voxel-wise) or multivariate statistics. However, these methods either discard some informative features or select features with redundant information. This paper introduces the principal feature analysis as a novel multivariate feature selection method for fMRI data processing. This multivariate approach aims to remove features with redundant information, thereby selecting fewer features, while retaining the most information.

  14. Feature selection and survival modeling in The Cancer Genome Atlas

    Directory of Open Access Journals (Sweden)

    Kim H

    2013-09-01

    Full Text Available Hyunsoo Kim,1 Markus Bredel2 1Department of Pathology, The University of Alabama at Birmingham, Birmingham, AL, USA; 2Department of Radiation Oncology, and Comprehensive Cancer Center, The University of Alabama at Birmingham, Birmingham, AL, USA Purpose: Personalized medicine is predicated on the concept of identifying subgroups of a common disease for better treatment. Identifying biomarkers that predict disease subtypes has been a major focus of biomedical science. In the era of genome-wide profiling, there is controversy as to the optimal number of genes as an input of a feature selection algorithm for survival modeling. Patients and methods: The expression profiles and outcomes of 544 patients were retrieved from The Cancer Genome Atlas. We compared four different survival prediction methods: (1) 1-nearest neighbor (1-NN) survival prediction method; (2) random patient selection method and a Cox-based regression method with nested cross-validation; (3) least absolute shrinkage and selection operator (LASSO) optimization using whole-genome gene expression profiles; or (4) gene expression profiles of cancer pathway genes. Results: The 1-NN method performed better than the random patient selection method in terms of survival predictions, although it does not include a feature selection step. The Cox-based regression method with LASSO optimization using whole-genome gene expression data demonstrated higher survival prediction power than the 1-NN method, but was outperformed by the same method when using gene expression profiles of cancer pathway genes alone. Conclusion: The 1-NN survival prediction method may require more patients for better performance, even when omitting censored data. Using preexisting biological knowledge for survival prediction is reasonable as a means to understand the biological system of a cancer, unless the analysis goal is to identify completely unknown genes relevant to cancer biology. Keywords: brain, feature selection

  15. Online Feature Selection of Class Imbalance via PA Algorithm

    Institute of Scientific and Technical Information of China (English)

    Chao Han; Yun-Kun Tan; Jin-Hui Zhu; Yong Guo; Jian Chen; Qing-Yao Wu

    2016-01-01

    Imbalance classification techniques have been frequently applied in many machine learning application domains where the number of samples in the majority (or positive) class of a dataset is much larger than that in the minority (or negative) class. Meanwhile, feature selection (FS) is one of the key techniques for high-dimensional classification tasks, as it greatly improves classification performance and computational efficiency. However, most studies of feature selection and imbalance classification are restricted to off-line batch learning, which is not well adapted to some practical scenarios. In this paper, we aim to solve the high-dimensional imbalanced classification problem accurately and efficiently with only a small number of active features in an online fashion, and we propose two novel online learning algorithms for this purpose. In our approach, a classifier that involves only a small and fixed number of features is constructed to classify a sequence of imbalanced data received in an online manner. We formulate the construction of such an online learner as an optimization problem and use an iterative approach to solve the problem based on the passive-aggressive (PA) algorithm as well as a truncated gradient (TG) method. We evaluate the performance of the proposed algorithms on several real-world datasets, and our experimental results demonstrate the effectiveness of the proposed algorithms in comparison with the baselines.
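
    The combination of a passive-aggressive step with a fixed feature budget can be sketched as follows. This is a simplified illustration under stated assumptions and not the algorithms proposed in the paper: it uses the basic PA update for the hinge loss followed by hard truncation to the `budget` largest-magnitude weights, it does not include any class-imbalance weighting, and the function name and data stream are hypothetical.

```python
def pa_truncated_update(w, x, y, budget):
    """One online step: basic PA update, then keep only `budget` nonzero weights.

    w      -- current weight vector (list of floats)
    x      -- feature vector of the incoming example
    y      -- label in {-1, +1}
    budget -- maximum number of active (nonzero) features
    """
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)  # hinge loss on the incoming example
    norm_sq = sum(xi * xi for xi in x)
    if loss > 0.0 and norm_sq > 0.0:
        tau = loss / norm_sq  # smallest step that reaches margin 1
        w = [wi + tau * y * xi for wi, xi in zip(w, x)]
    # Hard truncation: zero out everything except the largest-magnitude weights.
    keep = set(sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)[:budget])
    return [wi if i in keep else 0.0 for i, wi in enumerate(w)]


# Toy stream (made up): the sign of feature 0 determines the label.
stream = [
    ([1.0, 0.2, 0.0], 1),
    ([-1.0, 0.1, 0.3], -1),
    ([1.0, -0.3, 0.1], 1),
]
w = [0.0, 0.0, 0.0]
for x, y in stream:
    w = pa_truncated_update(w, x, y, budget=1)
print(w)
```

    Truncating after every step keeps the model size fixed throughout the stream, at the cost of discarding small weights that might have grown later.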

  16. Unsupervised Feature Selection Based on Locality Preserving Projection and Sparse Representation

    Institute of Scientific and Technical Information of China (English)

    简彩仁; 陈晓云

    2015-01-01

    Traditional filter-based feature selection methods compute a score for each feature independently, selecting features from a purely statistical or geometric perspective, and therefore ignore the correlation between features. To solve this problem, an unsupervised feature selection method based on locality preserving projection and sparse representation is proposed. The method selects features by constraining the feature weights to be nonnegative and sparse. Experimental results on 4 gene expression datasets and 2 image datasets show that the method is effective.

  17. Clustering-based Improved K-means Text Feature Selection

    Institute of Scientific and Technical Information of China (English)

    刘海峰; 刘守生; 张学仁

    2011-01-01

    Text feature dimensionality reduction is a core technology in automatic text categorization. K-means is a commonly used partition-based method. Because the algorithm is overly sensitive to the initial cluster centers and to isolated points, an improved K-means algorithm is proposed for text feature selection. Optimizing how the initial cluster centers are chosen and eliminating isolated points improves the clustering of text features. Subsequent text classification experiments show that the improved K-means algorithm has good feature selection ability and yields efficient text classification.

  18. An ensemble approach for feature selection of Cyber Attack Dataset

    CERN Document Server

    Singh, Shailendra

    2009-01-01

    Feature selection is an indispensable preprocessing step when mining huge datasets and can significantly improve overall system performance. In this paper we therefore focus on a hybrid approach to feature selection. The method has two phases. The filter phase selects the features with the highest information gain and guides the initialization of the search process for the wrapper phase, whose output is the final feature subset. The final feature subsets are passed to the k-nearest neighbor classifier for classification of attacks. The effectiveness of this algorithm is demonstrated on the DARPA KDDCUP99 cyber attack dataset.
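
A two-phase filter and wrapper scheme of this kind can be sketched roughly as below. The abstract does not specify the wrapper search, so this sketch assumes a greedy forward search seeded by the information gain ranking, scored by leave-one-out k-NN accuracy with Hamming distance on discrete features. The function names and the `n_filter` parameter are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def loo_knn_accuracy(X, y, features, k=1):
    """Leave-one-out accuracy of k-NN using only `features` (Hamming distance)."""
    correct = 0
    for i in range(len(X)):
        dists = sorted((sum(X[i][f] != X[j][f] for f in features), j)
                       for j in range(len(X)) if j != i)
        votes = Counter(y[j] for _, j in dists[:k])
        correct += votes.most_common(1)[0][0] == y[i]
    return correct / len(X)


def hybrid_select(X, y, n_filter, k=1):
    """Filter phase: keep the n_filter highest-gain features.
    Wrapper phase: add a candidate only if it improves k-NN accuracy."""
    n_features = len(X[0])
    gains = [information_gain([row[f] for row in X], y) for f in range(n_features)]
    candidates = sorted(range(n_features), key=lambda f: gains[f], reverse=True)[:n_filter]
    selected, best = [], 0.0
    for f in candidates:
        acc = loo_knn_accuracy(X, y, selected + [f], k)
        if acc > best:
            selected, best = selected + [f], acc
    return selected, best


X = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1], [0, 0, 1], [1, 1, 1]]
y = [0, 0, 1, 1, 0, 1]
selected, accuracy = hybrid_select(X, y, n_filter=2)
```

In this toy data the first feature equals the label, so the filter ranks it first and the wrapper stops after adding it.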

  19. A Rank Aggregation Algorithm for Ensemble of Multiple Feature Selection Techniques in Credit Risk Evaluation

    Directory of Open Access Journals (Sweden)

    Shashi Dahiya

    2016-10-01

    Full Text Available In credit risk evaluation, the accuracy of a classifier is critical for correctly classifying high-risk loan applicants. Feature selection is one way of improving the accuracy of a classifier: it provides the classifier with important and relevant features for model development. This study uses an ensemble of multiple feature ranking techniques for feature selection on credit data. It uses five individual rank-based feature selection methods and proposes a novel rank aggregation algorithm for combining their ranks. The algorithm uses the rank order along with the rank score of the features in the ranked list of each feature selection method. The ensemble selects the relevant features using the 80%, 60%, 40% and 20% thresholds from the top of the aggregated ranked list for building the C4.5, MLP, C4.5-based Bagging and MLP-based Bagging models. The models built with the ensemble performed better than those built with the 5 individual rank-based feature selection methods. The average performance of all the models was best for the ensemble at the 60% threshold. Also, the bagging-based models outperformed the individual models most significantly at the 60% threshold. This increase in performance is all the more notable because the number of features was reduced by 40% for building the highest-performing models, which reduces the data dimensions and hence the overall data size for model building. The ensemble of feature selection techniques with the novel aggregation algorithm provided more accurate models that are simpler, faster and easier to interpret.
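
The abstract does not give the aggregation formula, so the following is only a guess at the general shape of such a scheme, not the paper's algorithm. It assumes each method yields a score per feature, adds a Borda-style rank-order term to a min-max normalised rank-score term, sums over methods, and then keeps a top fraction of the aggregated list. `aggregate_ranks` and `top_fraction` are hypothetical names.

```python
def aggregate_ranks(score_lists):
    """Combine several feature rankings into one ordered list of features.

    score_lists: one dict per selection method, mapping feature -> score
    (higher means more relevant). All dicts must cover the same features.
    """
    features = list(score_lists[0])
    total = {f: 0.0 for f in features}
    for scores in score_lists:
        ordered = sorted(features, key=lambda f: scores[f], reverse=True)
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0                     # avoid division by zero
        n = len(ordered)
        for position, f in enumerate(ordered):
            rank_order = (n - position) / n         # Borda-style term in (0, 1]
            rank_score = (scores[f] - lo) / span    # normalised score in [0, 1]
            total[f] += rank_order + rank_score     # assumed combination rule
    return sorted(features, key=lambda f: total[f], reverse=True)


def top_fraction(ranked, fraction):
    """Keep the top `fraction` of the aggregated list (at least one feature)."""
    return ranked[:max(1, round(len(ranked) * fraction))]


method_1 = {"a": 0.9, "b": 0.5, "c": 0.1, "d": 0.3, "e": 0.7}
method_2 = {"a": 0.8, "b": 0.6, "c": 0.2, "d": 0.1, "e": 0.4}
ranked = aggregate_ranks([method_1, method_2])
kept = top_fraction(ranked, 0.6)
```

Thresholds such as 80%, 60%, 40% and 20% then correspond to different `fraction` values applied to the same aggregated list.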

  20. HYBRID FEATURE SELECTION ALGORITHM FOR INTRUSION DETECTION SYSTEM

    Directory of Open Access Journals (Sweden)

    Seyed Reza Hasani

    2014-01-01

    Full Text Available Network security is a serious global concern. The use of Intrusion Detection Systems (IDS) built with soft computing techniques is increasing rapidly in information security research. Previous research has recognized irrelevant and redundant features as a cause of increased processing time when evaluating known intrusive patterns. An efficient feature selection method reduces the dimension of the data as well as the redundancy and ambiguity caused by unimportant attributes, so feature selection is a well-known way to overcome this problem. Various approaches have been used in intrusion detection, each achieving some improvement. This work enhances the algorithm with the highest Detection Rate (DR), Linear Genetic Programming (LGP), by incorporating the Bees Algorithm to reduce the False Alarm Rate (FAR). Finally, the Support Vector Machine (SVM) is one of the best candidate solutions for IDS problems. In this study, four sample datasets containing 4000 random records were drawn randomly from the full dataset for training and testing purposes. Experimental results show that the LGP_BA method improves accuracy and efficiency compared with previous related research, and that the feature subcategory offered by LGP_BA gives a superior representation of the data.

  1. Geochemical dynamics in selected Yellowstone hydrothermal features

    Science.gov (United States)

    Druschel, G.; Kamyshny, A.; Findlay, A.; Nuzzio, D.

    2010-12-01

    Yellowstone National Park has a wide diversity of thermal features, and includes springs with a range of pH conditions that significantly impact sulfur speciation. We have utilized a combination of voltammetric and spectroscopic techniques to characterize the intermediate sulfur chemistry of Cinder Pool, Evening Primrose, Ojo Caliente, Frying Pan, Azure, and Dragon thermal springs. These measurements additionally have demonstrated the geochemical dynamics inherent in these systems; significant variability in chemical speciation occurs in many of these thermal features due to changes in gas supply rates, fluid discharge rates, and thermal differences that occur on time scales of seconds. The dynamics of the geochemical settings shown may significantly impact how microorganisms interact with the sulfur forms in these systems.

  2. Use of genetic algorithm for the selection of EEG features

    Science.gov (United States)

    Asvestas, P.; Korda, A.; Kostopoulos, S.; Karanasiou, I.; Ouzounoglou, A.; Sidiropoulos, K.; Ventouras, E.; Matsopoulos, G.

    2015-09-01

    Genetic Algorithm (GA) is a popular optimization technique that can detect the global optimum of a multivariable function containing several local optima. GA has been widely used in the field of biomedical informatics, especially in the context of designing decision support systems that classify biomedical signals or images into classes of interest. The aim of this paper is to present a methodology, based on GA, for the selection of the optimal subset of features that can be used for the efficient classification of Event Related Potentials (ERPs), which are recorded during the observation of correct or incorrect actions. In our experiment, ERP recordings were acquired from sixteen (16) healthy volunteers who observed correct or incorrect actions of other subjects. The brain electrical activity was recorded at 47 locations on the scalp. The GA was formulated as a combinatorial optimizer for the selection of the combination of electrodes that maximizes the performance of the Fuzzy C Means (FCM) classification algorithm. In particular, during the evolution of the GA, for each candidate combination of electrodes, the well-known (Σ, Φ, Ω) features were calculated and were evaluated by means of the FCM method. The proposed methodology provided a combination of 8 electrodes, with classification accuracy 93.8%. Thus, GA can be the basis for the selection of features that discriminate ERP recordings of observations of correct or incorrect actions.
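
A GA used as a combinatorial optimizer over subsets can be sketched with a binary mask per candidate, one bit per electrode or feature. The sketch below assumes tournament selection, one-point crossover, bit-flip mutation and elitism, none of which are stated in the abstract. The `fitness` callable is a placeholder for whatever classifier score is being maximized (for example the FCM classification performance), and all parameter names are hypothetical.

```python
import random


def ga_select(fitness, n_features, pop_size=20, generations=30,
              p_cross=0.8, p_mut=0.05, seed=0):
    """Search for a binary mask that maximizes `fitness(mask)`.

    Returns the best mask found and the best fitness seen at each generation.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(pop, key=fitness)
        history.append(fitness(elite))
        next_pop = [elite[:]]                       # elitism: carry the best mask over
        while len(next_pop) < pop_size:
            # Binary tournament selection of two parents.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            child = p1[:]
            if n_features > 1 and rng.random() < p_cross:
                cut = rng.randint(1, n_features - 1)    # one-point crossover
                child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


# Toy fitness: agreement with a hidden "ideal" subset (stand-in for classifier accuracy).
target = [1, 0, 1, 1, 0, 0, 1, 0]
best_mask, best_history = ga_select(
    lambda mask: sum(a == b for a, b in zip(mask, target)), n_features=8)
```

Because the best mask is always copied into the next generation, the recorded best fitness never decreases. In practice the fitness would wrap a cross-validated classifier, which dominates the running time.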

  3. Recursive Feature Selection with Significant Variables of Support Vectors

    Directory of Open Access Journals (Sweden)

    Chen-An Tsai

    2012-01-01

    Full Text Available The development of DNA microarrays allows researchers to screen thousands of genes simultaneously and helps determine high- and low-expression level genes in normal and disease tissues. Selecting relevant genes for cancer classification is an important issue. Most gene selection methods use univariate ranking criteria and choose an arbitrary threshold to select genes. However, this parameter setting may not be compatible with the selected classification algorithms. In this paper, we propose a new gene selection method (SVM-t) based on the use of t-statistics embedded in a support vector machine. We compared its performance to two similar SVM-based methods: SVM recursive feature elimination (SVMRFE) and recursive support vector machine (RSVM). The three methods were compared based on extensive simulation experiments and analyses of two published microarray datasets. In the simulation experiments, we found that the proposed method is more robust in selecting informative genes than SVMRFE and RSVM and is capable of attaining good classification performance when the variations of informative and noninformative genes are different. In the analysis of the two microarray datasets, the proposed method yields better performance in identifying fewer genes with good prediction accuracy, compared to SVMRFE and RSVM.
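
For context, the univariate ranking baseline that the abstract contrasts with SVM-t can be sketched as a per-gene two-sample t-statistic. This is not the SVM-t method itself, which computes its statistics inside the SVM. The sketch assumes Welch's unequal-variance form, binary labels 0 and 1, and at least two samples per class. `welch_t` and `rank_genes` are hypothetical names.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t-statistic; each group needs at least two values."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_genes(X, y):
    """Rank genes (columns of X) by |t| between classes y == 1 and y == 0."""
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        pos = [row[g] for row, label in zip(X, y) if label == 1]
        neg = [row[g] for row, label in zip(X, y) if label == 0]
        scores.append(abs(welch_t(pos, neg)))
    order = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return order, scores


# Rows are samples and columns are genes; gene 0 separates the classes, gene 1 does not.
X = [[5.0, 1.0], [6.0, 2.0], [7.0, 1.5], [1.0, 1.2], [2.0, 1.8], [3.0, 1.4]]
y = [1, 1, 1, 0, 0, 0]
order, scores = rank_genes(X, y)
```

A threshold on `scores`, or a fixed number of top genes, then has to be chosen separately, which is the arbitrary step the abstract criticizes.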

  4. Object-based land-use classification based on a hybrid feature selection method combining Relief F and PSO

    Institute of Scientific and Technical Information of China (English)

    肖艳; 姜琦刚; 王斌; 李远华; 刘舒; 崔璨

    2016-01-01

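    To address the excessively high feature dimensionality in object-based land-use classification, a hybrid feature selection method combining Relief F and particle swarm optimization (PSO) is proposed: Relief F first acts as a feature pre-selector to filter out features with low relevance, and PSO is then used as the search algorithm, with the classification accuracy of a support vector machine (SVM) as the evaluation function, to select the optimal feature subset from the remaining features. Taking part of Changchun City, Jilin Province, as the study area and Landsat 8 remote sensing imagery as the data source, the image was first segmented at multiple scales; spectral, texture, shape and spatial-relationship features of the image objects were then extracted; the proposed hybrid method selected the optimal feature subset; and finally an SVM classifier was used for land-use classification of the study area. The overall classification accuracy and Kappa coefficient were 85.88% and 0.8036, respectively. Compared with land-use classification results based on four other feature selection methods, the hybrid Relief F and PSO method achieved the highest classification accuracy with the fewest features and can be used effectively for object-based land-use classification. In recent years, object-based methods have been increasingly used for the land-use classification of remote sensing data. However, the availability of numerous features in object-based image analysis complicates the selection of optimal features. In this study, a hybrid feature selection method that combines a filter approach and a wrapper approach was proposed. In the filter approach, the Relief F algorithm was employed to select features that had higher relevance to the land-use classes. The wrapper approach used the particle swarm optimization (PSO) algorithm as a search method and the classification accuracy of a support vector machine (SVM) as an evaluator to search for an optimal feature subset from the selected features. The objective of this research was to examine the effectiveness of the proposed feature selection method on object-based classification. The study site was located in the southeastern part of Changchun City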

  5. A Computer-Aided Diagnosis System for Dynamic Contrast-Enhanced MR Images Based on Level Set Segmentation and ReliefF Feature Selection

    Directory of Open Access Journals (Sweden)

    Zhiyong Pang

    2015-01-01

    Full Text Available This study established a fully automated computer-aided diagnosis (CAD) system for the classification of malignant and benign masses via breast magnetic resonance imaging (BMRI). A breast segmentation method consisting of a preprocessing step to identify the air-breast interfacing boundary and curve fitting for chest wall line (CWL) segmentation was included in the proposed CAD system. The Chan-Vese (CV) model level set (LS) segmentation method was adopted to segment breast mass and demonstrated sufficiently good segmentation performance. The support vector machine (SVM) classifier with ReliefF feature selection was used to merge the extracted morphological and texture features into a classification score. The accuracy, sensitivity, and specificity measurements for the leave-half-case-out resampling method were 92.3%, 98.2%, and 76.2%, respectively. For the leave-one-case-out resampling method, the measurements were 90.0%, 98.7%, and 73.8%, respectively.

  6. Aptamers overview: selection, features and applications.

    Science.gov (United States)

    Hernandez, Luiza I; Machado, Isabel; Schafer, Thomas; Hernandez, Frank J

    2015-01-01

    Aptamer technology has been around for a quarter of a century and the field has matured enough to start seeing real applications, especially in the medical field. Since their discovery, aptamers have rapidly emerged as key players in many fields, such as diagnostics, drug discovery, food science, drug delivery and therapeutics. Because of their synthetic nature, aptamers are evolving at an exponential rate, gaining from the newest advances in chemistry, nanotechnology, biology and medicine. This review is meant to give an overview of the aptamer field, by including general aspects of aptamer identification and applications as well as highlighting certain features that contribute to their quick deployment in the biomedical field.

  7. Comparative Study of Triangulation based and Feature based Image Morphing

    Directory of Open Access Journals (Sweden)

    Ms. Bhumika G. Bhatt

    2012-01-01

    Full Text Available Image morphing is one of the most powerful digital image processing techniques and is used to enhance many multimedia projects, presentations, education and computer-based training. It is also used in the medical imaging field to recover features not visible in images by establishing correspondence of features among successive pairs of scanned images. This paper discusses what morphing is and the implementation of the triangulation-based morphing technique and feature-based image morphing. It analyzes both morphing techniques in terms of different attributes such as computational complexity, visual quality of the morph obtained, and the complexity involved in the selection of features.

  8. Novel Automatic Filter-Class Feature Selection for Machine Learning Regression

    DEFF Research Database (Denmark)

    Wollsen, Morten Gill; Hallam, John; Jørgensen, Bo Nørregaard

    2016-01-01

    With the increased focus on application of Big Data in all sectors of society, the performance of machine learning becomes essential. Efficient machine learning depends on efficient feature selection algorithms. Filter feature selection algorithms are model-free and therefore very fast, but require...... model in the feature selection process. PCA is often used in machine learning literature and can be considered the default feature selection method. RDESF outperformed PCA in both experiments in both prediction error and computational speed. RDESF is a new step into filter-based automatic feature...

  9. Novel Automatic Filter-Class Feature Selection for Machine Learning Regression

    DEFF Research Database (Denmark)

    Wollsen, Morten Gill; Hallam, John; Jørgensen, Bo Nørregaard

    2017-01-01

    With the increased focus on application of Big Data in all sectors of society, the performance of machine learning becomes essential. Efficient machine learning depends on efficient feature selection algorithms. Filter feature selection algorithms are model-free and therefore very fast, but require...... model in the feature selection process. PCA is often used in machine learning literature and can be considered the default feature selection method. RDESF outperformed PCA in both experiments in both prediction error and computational speed. RDESF is a new step into filter-based automatic feature...

  10. Individual discriminative face recognition models based on subsets of features

    DEFF Research Database (Denmark)

    Clemmensen, Line Katrine Harder; Gomez, David Delgado; Ersbøll, Bjarne Kjær

    2007-01-01

    of the face recognition problem. The elastic net model is able to select a subset of features with low computational effort compared to other state-of-the-art feature selection methods. Furthermore, the fact that the number of features usually is larger than the number of images in the data base makes feature...... selection techniques such as forward selection or lasso regression become inadequate. In the experimental section, the performance of the elastic net model is compared with geometrical and color based algorithms widely used in face recognition such as Procrustes nearest neighbor, Eigenfaces, or Fisher...

  11. Feature Extraction and Selection From the Perspective of Explosive Detection

    Energy Technology Data Exchange (ETDEWEB)

    Sengupta, S K

    2009-09-01

    ) digitized 3-dimensional attenuation images with a voxel resolution of the order of one quarter of a millimeter. In the task of feature extraction and subsequent selection of an appropriate subset thereof, several important factors need to be considered. Foremost among them are: (1) Definition of the sampling unit from which the features will be extracted for the purpose of detection/identification of the explosives. (2) The choice of features (given the sampling unit) to be extracted that can be used to signal the existence/identity of the explosive. (3) Robustness of the computed features under different inspection conditions. To attain robustness, invariance under the transformations of translation, scaling, rotation and change of orientation is highly desirable. (4) The computational costs in the process of feature extraction, selection and their use in explosive detection/identification. In the search for extractable features, we have done a thorough literature survey with the above factors in mind and come up with a list of features that could possibly help us in meeting our objective. We are assuming that features will be based on sampling units that are single CT slices of the target. This may, however, change, in which case appropriate modifications should be made to the feature extraction process. We indicate below some of the major types of features in 2- or 3-dimensional images that have been used in the literature on application of pattern recognition (PR) techniques in image understanding and are possibly pertinent to our study. In the following paragraph, we briefly indicate the motivation that guided us in the choice of these features, and identify the nature of the constraints. The principal feature types derivable from an image will be discussed in section 2.
    Once the features are extracted, one must select a subset of this feature set that will retain the most useful information and remove any redundant and irrelevant information that may have a detrimental effect

  12. Vinegar classification based on feature extraction and selection from headspace solid-phase microextraction/gas chromatography volatile analyses: a feasibility study.

    Science.gov (United States)

    Pizarro, C; Esteban-Díez, I; Sáenz-González, C; González-Sáiz, J M

    2008-02-04

    Headspace solid-phase microextraction (HS-SPME) coupled with gas chromatography (GC) and multivariate data analysis were applied to classify different vinegar types (white and red, balsamic, sherry and cider vinegars) on the basis of their volatile composition. The collected chromatographic signals were analysed using the stepwise linear discriminant analysis (SLDA) method, thus simultaneously performing feature selection and classification. Several options, more or less restrictive according to the final number of considered categories, were explored in order to identify the one that afforded highest discrimination ability. The simplicity and effectiveness of the classification methodology proposed in the present study (all the samples were correctly classified and predicted by cross-validation) are promising and encourage the feasibility of using a similar strategy to evaluate the quality and origin of vinegar samples in a reliable, fast, reproducible and cost-efficient way in routine applications. The high quality results obtained were even more remarkable considering the reduced number of discriminant variables finally selected by the stepwise procedure. The use of only 14 peaks enabled differentiation between cider, balsamic, sherry and wine vinegars, whereas only 3 variables were selected to discriminate between red (RW) and white wine (WW) vinegars. The subsequent identification by gas chromatography-mass spectrometry (GC-MS) of the volatile compounds associated with the discriminant peaks selected in the classification process served to interpret their chemical significance.

  13. Evaluation of Feature Selection Approaches for Urdu Text Categorization

    Directory of Open Access Journals (Sweden)

    Tehseen Zia

    2015-05-01

    Full Text Available Efficient feature selection is an important phase of designing an effective text categorization system. Various feature selection methods have been proposed for selecting dissimilar feature sets. It is often essential to evaluate which method is more effective for a given task and what size of feature set is an effective model selection choice. The aim of this paper is to answer these questions for designing an Urdu text categorization system. Five widely used feature selection methods were examined using six well-known classification algorithms: naive Bayes (NB), k-nearest neighbor (KNN), support vector machines (SVM) with linear, polynomial and radial basis kernels, and decision tree (i.e. J48). The study was conducted over two test collections: the EMILLE collection and a naive collection. We have observed that three feature selection methods, i.e. information gain, Chi statistics, and symmetrical uncertainty, performed uniformly in most of the cases, if not all. Moreover, we have found that no single feature selection method is best for all classifiers. While gain ratio outperformed the others for naive Bayes and J48, information gain showed top performance for KNN and SVM with polynomial and radial basis kernels. Overall, linear SVM with any of information gain, Chi statistics or symmetrical uncertainty turned out to be the first choice among the combinations of classifiers and feature selection methods on the moderate-size naive collection. On the other hand, naive Bayes with any of the feature selection methods showed its advantage on the small-sized EMILLE corpus.

  14. Improving permafrost distribution modelling using feature selection algorithms

    Science.gov (United States)

    Deluigi, Nicola; Lambiel, Christophe; Kanevski, Mikhail

    2016-04-01

    The availability of an increasing number of spatial data on the occurrence of mountain permafrost allows the employment of machine learning (ML) classification algorithms for modelling the distribution of the phenomenon. One of the major problems when dealing with high-dimensional datasets is the number of input features (variables) involved. Application of ML classification algorithms to this large number of variables leads to the risk of overfitting, with the consequence of poor generalization/prediction. For this reason, applying feature selection (FS) techniques helps reduce the number of factors required and improves the knowledge on adopted features and their relation with the studied phenomenon. Moreover, taking away irrelevant or redundant variables from the dataset effectively improves the quality of the ML prediction. This research deals with a comparative analysis of permafrost distribution models supported by FS variable importance assessment. The input dataset (dimension = 20-25, 10 m spatial resolution) was constructed using landcover maps, climate data and DEM derived variables (altitude, aspect, slope, terrain curvature, solar radiation, etc.). It was completed with permafrost evidence (geophysical and thermal data and rock glacier inventories) that serves as training permafrost data. The FS algorithms used indicated which variables appeared less statistically important for permafrost presence/absence. Three different algorithms were compared: Information Gain (IG), Correlation-based Feature Selection (CFS) and Random Forest (RF). IG is a filter technique that evaluates the worth of a predictor by measuring the information gain with respect to the permafrost presence/absence. Conversely, CFS is a wrapper technique that evaluates the worth of a subset of predictors by considering the individual predictive ability of each variable along with the degree of redundancy between them. Finally, RF is a ML algorithm that performs FS as part of its

  15. Facial expression feature selection method based on neighborhood rough set theory and quantum genetic algorithm

    Institute of Scientific and Technical Information of China (English)

    冯林; 李聪; 沈莉

    2013-01-01

    Facial expression feature selection is one of the hot issues in the field of facial expression recognition. A novel facial expression feature selection method, feature selection based on neighborhood rough set theory and quantum genetic algorithm (FSNRSTQGA), is proposed. First, an evaluation criterion for the optimal expression feature subset is established based on neighborhood rough set theory and used as the fitness function. Then, using the evolutionary strategy of the quantum genetic algorithm, an approach to facial expression feature selection is proposed. Simulation results on the Cohn-Kanade expression dataset show that the FSNRSTQGA method is effective.

  16. Feature selection applied to ultrasound carotid images segmentation.

    Science.gov (United States)

    Rosati, Samanta; Molinari, Filippo; Balestra, Gabriella

    2011-01-01

    The automated tracing of the carotid layers on ultrasound images is complicated by noise, different morphology and pathology of the carotid artery. In this study we benchmarked four methods for feature selection on a set of variables extracted from ultrasound carotid images. The main goal was to select those parameters containing the highest amount of information useful to classify the pixels in the carotid regions they belong to. Six different classes of pixels were identified: lumen, lumen-intima interface, intima-media complex, media-adventitia interface, adventitia and adventitia far boundary. The performances of QuickReduct Algorithm (QRA), Entropy-Based Algorithm (EBR), Improved QuickReduct Algorithm (IQRA) and Genetic Algorithm (GA) were compared using Artificial Neural Networks (ANNs). All methods returned subsets with a high dependency degree, even if the average classification accuracy was about 50%. Among all classes, the best results were obtained for the lumen. Overall, the four methods for feature selection assessed in this study return comparable results. Despite the need for accuracy improvement, this study could be useful to build a pre-classifier stage for the optimization of segmentation performance in ultrasound automated carotid segmentation.

  17. Information Theory for Gabor Feature Selection for Face Recognition

    Science.gov (United States)

    Shen, Linlin; Bai, Li

    2006-12-01

    A discriminative and robust feature, the kernel enhanced informative Gabor feature, is proposed in this paper for face recognition. Mutual information is applied to select a set of informative and nonredundant Gabor features, which are then further enhanced by kernel methods for recognition. Compared with one of the top performing methods in the 2004 Face Verification Competition (FVC2004), our methods demonstrate a clear advantage over existing methods in accuracy, computation efficiency, and memory cost. The proposed method has been fully tested on the FERET database using the FERET evaluation protocol. Significant improvements on three of the test data sets are observed. Compared with the classical Gabor wavelet-based approaches using a huge number of features, our method requires less than 4 milliseconds to retrieve a few hundred features. Due to the substantially reduced feature dimension, only 4 seconds are required to recognize 200 face images. The paper also unified different Gabor filter definitions and proposed a training sample generation algorithm to reduce the effects caused by unbalanced numbers of samples available in different classes.
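
Selecting features that are both informative and nonredundant with mutual information can be sketched with a greedy criterion: relevance to the label minus the largest mutual information with any already selected feature. The abstract does not state the exact criterion used in the paper, so this scoring rule is an assumption, as are the discrete-valued features and the names `mutual_information` and `select_features`.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (bits) between two equally long discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log2(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())


def select_features(columns, labels, k):
    """Greedily pick k columns with high relevance and low redundancy."""
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    while len(selected) < min(k, len(columns)):
        def score(f):
            # Redundancy: strongest dependence on any feature chosen so far.
            redundancy = max((mutual_information(columns[f], columns[s])
                              for s in selected), default=0.0)
            return relevance[f] - redundancy
        best = max((f for f in range(len(columns)) if f not in selected), key=score)
        selected.append(best)
    return selected


a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
a_copy = list(a)                                # fully redundant with `a`
labels = [x + y for x, y in zip(a, b)]          # label depends on both a and b
chosen = select_features([a, a_copy, b], labels, k=2)
```

Here `a_copy` is as relevant as `a` but adds nothing new, so the second pick is `b`. Continuous responses such as Gabor magnitudes would first need to be discretized or handled with a density estimate.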

  18. Information Theory for Gabor Feature Selection for Face Recognition

    Directory of Open Access Journals (Sweden)

    Shen Linlin

    2006-01-01

    Full Text Available A discriminative and robust feature, the kernel enhanced informative Gabor feature, is proposed in this paper for face recognition. Mutual information is applied to select a set of informative and nonredundant Gabor features, which are then further enhanced by kernel methods for recognition. Compared with one of the top performing methods in the 2004 Face Verification Competition (FVC2004), our methods demonstrate a clear advantage over existing methods in accuracy, computation efficiency, and memory cost. The proposed method has been fully tested on the FERET database using the FERET evaluation protocol. Significant improvements on three of the test data sets are observed. Compared with the classical Gabor wavelet-based approaches using a huge number of features, our method requires less than 4 milliseconds to retrieve a few hundred features. Due to the substantially reduced feature dimension, only 4 seconds are required to recognize 200 face images. The paper also unified different Gabor filter definitions and proposed a training sample generation algorithm to reduce the effects caused by unbalanced numbers of samples available in different classes.

  19. Topological Features Based Entity Disambiguation

    Institute of Scientific and Technical Information of China (English)

    Chen-Chen Sun; De-Rong Shen; Tie-Zheng Nie; Ge Yu

    2016-01-01

    This work proposes an unsupervised entity disambiguation solution based on topological features. Most existing studies leverage semantic information to resolve ambiguous references. However, semantic information is not always accessible because of privacy, or is too expensive to access. We consider the problem in a setting where only relationships between references are available. A structure similarity algorithm via random walk with restarts is proposed to measure the similarity of references. Disambiguation is treated as a clustering problem, and a family of graph-walk-based clustering algorithms is applied to group ambiguous references. We evaluate our solution extensively on two real datasets and show its advantage over two state-of-the-art approaches in accuracy.
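
Random walk with restarts on a relationship graph can be sketched as a simple power iteration. The version below is the generic textbook form, not the paper's structure similarity algorithm; the adjacency-list representation, the restart probability of 0.15 and the handling of dangling nodes are assumptions.

```python
def random_walk_with_restart(adj, start, restart=0.15, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walk that restarts at `start`.

    adj: dict mapping each node to a list of its neighbours.
    Higher scores mean a node is structurally closer to `start`.
    """
    nodes = list(adj)
    p = {v: 1.0 if v == start else 0.0 for v in nodes}
    for _ in range(max_iter):
        nxt = {v: (restart if v == start else 0.0) for v in nodes}
        for u in nodes:
            if not adj[u]:
                # Dangling node: send its probability mass back to the start.
                nxt[start] += (1 - restart) * p[u]
                continue
            share = (1 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        if sum(abs(nxt[v] - p[v]) for v in nodes) < tol:
            return nxt
        p = nxt
    return p


# References a, b and c are mutually linked; d and e form a separate component.
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"], "d": ["e"], "e": ["d"]}
scores = random_walk_with_restart(graph, "a")
```

Running the walk from each reference gives a score vector per reference, and comparing those vectors is one plausible way to obtain pairwise similarities for the clustering step.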

  20. Feature-based telescope scheduler

    Science.gov (United States)

    Naghib, Elahesadat; Vanderbei, Robert J.; Stubbs, Christopher

    2016-07-01

    The Feature-based Scheduler offers a sequencing strategy for ground-based telescopes. The scheduler is designed in the framework of a Markovian Decision Process (MDP) and consists of a sub-linear online controller and an offline supervisory control-optimizer. The online control law is computed at the moment of decision for the next visit, and the supervisory optimizer trains the controller with simulation data. The choice of the Differential Evolution (DE) optimizer and the introduction of a reduced state space of the telescope system yield an efficient and parallelizable optimization algorithm. In this study, we applied the proposed scheduler to the problem of the Large Synoptic Survey Telescope (LSST). Preliminary results for a simplified model of LSST are promising in terms of both optimality and computational cost.

  1. Feature selection using genetic algorithms for fetal heart rate analysis.

    Science.gov (United States)

    Xu, Liang; Redman, Christopher W G; Payne, Stephen J; Georgieva, Antoniya

    2014-07-01

    The fetal heart rate (FHR) is monitored on a paper strip (cardiotocogram) during labour to assess fetal health. If necessary, clinicians can intervene and assist with a prompt delivery of the baby. Data-driven computerized FHR analysis could help clinicians in the decision-making process. However, selecting the best computerized FHR features that relate to labour outcome is a pressing research problem. The objective of this study is to apply genetic algorithms (GA) as a feature selection method to select the best feature subset from 64 FHR features and to integrate these best features to recognize unfavourable FHR patterns. The GA was trained on 404 cases and tested on 106 cases (both balanced datasets) using three classifiers, respectively. Regularization methods and backward selection were used to optimize the GA. Reasonable classification performance is shown on the testing set for the best feature subset (Cohen's kappa values of 0.45 to 0.49 using different classifiers). This is, to our knowledge, the first time that a feature selection method for FHR analysis has been developed on a database of this size. This study indicates that different FHR features, when integrated, can show good performance in predicting labour outcome. It also gives the importance of each feature, which will be a valuable reference point for further studies.
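
    A binary genetic algorithm for feature selection can be sketched as below. This is a generic illustration under stated assumptions, not the procedure used in the study: the function `ga_select`, its parameter defaults, and the toy fitness function are hypothetical. In practice the fitness would be a classifier's validation score on the masked feature set.

```python
import random


def ga_select(n_features, fitness, pop_size=20, generations=40,
              p_mut=0.05, seed=0):
    """Binary GA: each individual is a 0/1 mask over the features.

    Requires n_features >= 2 (one-point crossover needs a cut position).
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]

    def tournament():
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        best = max(pop, key=fitness)
        children = [best[:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            mum, dad = tournament(), tournament()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)


# Hypothetical fitness: reward three "useful" features, penalise subset size.
USEFUL = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(mask[i] for i in USEFUL)
    return hits - 0.1 * sum(mask)


mask = ga_select(8, toy_fitness)
```

    The size penalty in the toy fitness plays the role that regularization plays in a real wrapper: without it, nothing discourages the search from keeping every feature.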

  2. A fast approach for detection of erythemato-squamous diseases based on extreme learning machine with maximum relevance minimum redundancy feature selection

    Science.gov (United States)

    Liu, Tong; Hu, Liang; Ma, Chao; Wang, Zhi-Yan; Chen, Hui-Ling

    2015-04-01

    In this paper, a novel hybrid method, which integrates an effective filter maximum relevance minimum redundancy (MRMR) and a fast classifier extreme learning machine (ELM), has been introduced for diagnosing erythemato-squamous (ES) diseases. In the proposed method, MRMR is employed as a feature selection tool for dimensionality reduction in order to further improve the diagnostic accuracy of the ELM classifier. The impact of the type of activation functions, the number of hidden neurons and the size of the feature subsets on the performance of ELM have been investigated in detail. The effectiveness of the proposed method has been rigorously evaluated against the ES disease dataset, a benchmark dataset, from UCI machine learning database in terms of classification accuracy. Experimental results have demonstrated that our method has achieved the best classification accuracy of 98.89% and an average accuracy of 98.55% via 10-fold cross-validation technique. The proposed method might serve as a new candidate of powerful methods for diagnosing ES diseases.
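
    The maximum relevance minimum redundancy idea can be sketched as a greedy filter. The code below is a simplified illustration for discrete features using the difference form (relevance minus mean redundancy); the function names and the toy data are assumptions, and the paper's exact MRMR variant is not specified here.

```python
from collections import Counter
from math import log


def mutual_info(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(c / n * log(c * n / (cx[a] * cy[b]))
               for (a, b), c in cxy.items())


def mrmr(columns, target, k):
    """Greedily pick k feature names; requires k <= len(columns).

    columns maps a feature name to its list of discrete values.
    """
    relevance = {f: mutual_info(col, target) for f, col in columns.items()}
    selected = []

    def score(f):
        if not selected:
            return relevance[f]
        redundancy = sum(mutual_info(columns[f], columns[s])
                         for s in selected) / len(selected)
        return relevance[f] - redundancy

    while len(selected) < k:
        candidates = [f for f in columns if f not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Hypothetical toy data: f2_copy duplicates f1, f3 is weakly relevant
# but independent of f1.
y = [0, 0, 0, 1, 1, 1, 1, 1]
cols = {
    "f1": [0, 0, 0, 0, 1, 1, 1, 1],
    "f2_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "f3": [0, 0, 1, 1, 0, 0, 1, 1],
}
chosen = mrmr(cols, y, 2)
```

    On this toy data the duplicate feature is skipped in favour of the weaker but non-redundant one, which is the behaviour that distinguishes MRMR from ranking by relevance alone.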

  3. Compact cancer biomarkers discovery using a swarm intelligence feature selection algorithm.

    Science.gov (United States)

    Martinez, Emmanuel; Alvarez, Mario Moises; Trevino, Victor

    2010-08-01

    Biomarker discovery is a typical application of functional genomics. Due to the large number of genes studied simultaneously in microarray data, feature selection is a key step. Swarm intelligence has emerged as a solution for the feature selection problem. However, swarm intelligence settings for feature selection fail to select small feature subsets. We have proposed a swarm intelligence feature selection algorithm based on the initialization and update of only a subset of particles in the swarm. In this study, we tested our algorithm on 11 microarray datasets for brain, leukemia, lung, prostate, and others. We show that the proposed swarm intelligence algorithm successfully increases the classification accuracy and decreases the number of selected features compared to other swarm intelligence methods.

  4. Selective attention to temporal features on nested time scales.

    Science.gov (United States)

    Henry, Molly J; Herrmann, Björn; Obleser, Jonas

    2015-02-01

    Meaningful auditory stimuli such as speech and music often vary simultaneously along multiple time scales. Thus, listeners must selectively attend to, and selectively ignore, separate but intertwined temporal features. The current study aimed to identify and characterize the neural network specifically involved in this feature-selective attention to time. We used a novel paradigm where listeners judged either the duration or modulation rate of auditory stimuli, and in which the stimulation, working memory demands, response requirements, and task difficulty were held constant. A first analysis identified all brain regions where individual brain activation patterns were correlated with individual behavioral performance patterns, which thus supported temporal judgments generically. A second analysis then isolated those brain regions that specifically regulated selective attention to temporal features: Neural responses in a bilateral fronto-parietal network including insular cortex and basal ganglia decreased with degree of change of the attended temporal feature. Critically, response patterns in these regions were inverted when the task required selectively ignoring this feature. The results demonstrate how the neural analysis of complex acoustic stimuli with multiple temporal features depends on a fronto-parietal network that simultaneously regulates the selective gain for attended and ignored temporal features.

  5. Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets

    Directory of Open Access Journals (Sweden)

    Shokoufeh Aalaei

    2016-05-01

    Full Text Available Objective(s: This study addresses feature selection for breast cancer diagnosis. The present process uses a wrapper approach using GA-based on feature selection and PS-classifier. The results of experiment show that the proposed model is comparable to the other models on Wisconsin breast cancer datasets. Materials and Methods: To evaluate effectiveness of proposed feature selection method, we employed three different classifiers artificial neural network (ANN and PS-classifier and genetic algorithm based classifier (GA-classifier on Wisconsin breast cancer datasets include Wisconsin breast cancer dataset (WBC, Wisconsin diagnosis breast cancer (WDBC, and Wisconsin prognosis breast cancer (WPBC. Results: For WBC dataset, it is observed that feature selection improved the accuracy of all classifiers expect of ANN and the best accuracy with feature selection achieved by PS-classifier. For WDBC and WPBC, results show feature selection improved accuracy of all three classifiers and the best accuracy with feature selection achieved by ANN. Also specificity and sensitivity improved after feature selection. Conclusion: The results show that feature selection can improve accuracy, specificity and sensitivity of classifiers. Result of this study is comparable with the other studies on Wisconsin breast cancer datasets.

  6. Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets

    Science.gov (United States)

    Aalaei, Shokoufeh; Shahraki, Hadi; Rowhanimanesh, Alireza; Eslami, Saeid

    2016-01-01

    Objective(s): This study addresses feature selection for breast cancer diagnosis. The process uses a wrapper approach based on GA feature selection and a PS-classifier. The experimental results show that the proposed model is comparable to other models on the Wisconsin breast cancer datasets. Materials and Methods: To evaluate the effectiveness of the proposed feature selection method, we employed three different classifiers, an artificial neural network (ANN), a PS-classifier, and a genetic algorithm based classifier (GA-classifier), on the Wisconsin breast cancer datasets, which include the Wisconsin breast cancer dataset (WBC), Wisconsin diagnosis breast cancer (WDBC), and Wisconsin prognosis breast cancer (WPBC). Results: For the WBC dataset, feature selection improved the accuracy of all classifiers except ANN, and the best accuracy with feature selection was achieved by the PS-classifier. For WDBC and WPBC, feature selection improved the accuracy of all three classifiers, and the best accuracy with feature selection was achieved by ANN. Specificity and sensitivity also improved after feature selection. Conclusion: The results show that feature selection can improve the accuracy, specificity, and sensitivity of classifiers. The results of this study are comparable with other studies on the Wisconsin breast cancer datasets. PMID:27403253

  7. A Sparse-Feature-Based Face Detector

    Institute of Scientific and Technical Information of China (English)

    LU Xiaofeng; ZHENG Nanning; ZHENG Songfeng

    2003-01-01

    Local features and global features are two important kinds of statistical features used to distinguish faces from nonfaces. Both are special cases of sparse features. A final classifier can be considered as a combination of a set of selected weak classifiers, each of which uses a sparse feature to classify samples. Motivated by this idea, we construct an overcomplete set of weak classifiers using the LPSVM (linear proximal support vector machine) algorithm, then select some of them using the AdaBoost algorithm and combine the selected weak classifiers to form a strong classifier. During feature extraction and selection, our method can minimize the classification error directly, whereas most previous works cannot. The main difference from other methods is that the local features are learned from the training set instead of being arbitrarily defined. We applied our method to face detection; the test results show that the method performs well.

  8. A New Heuristic for Feature Selection by Consistent Biclustering

    CERN Document Server

    Mucherino, Antonio

    2010-01-01

    Given a set of data, biclustering aims at finding simultaneous partitions in biclusters of its samples and of the features which are used for representing the samples. Consistent biclusterings allow to obtain correct classifications of the samples from the known classification of the features, and vice versa, and they are very useful for performing supervised classifications. The problem of finding consistent biclusterings can be seen as a feature selection problem, where the features that are not relevant for classification purposes are removed from the set of data, while the total number of features is maximized in order to preserve information. This feature selection problem can be formulated as a linear fractional 0-1 optimization problem. We propose a reformulation of this problem as a bilevel optimization problem, and we present a heuristic algorithm for an efficient solution of the reformulated problem. Computational experiments show that the presented algorithm is able to find better solutions with re...

  9. Modeling Suspicious Email Detection using Enhanced Feature Selection

    OpenAIRE

    2013-01-01

    The paper presents a suspicious email detection model which incorporates enhanced feature selection. In the paper we proposed the use of feature selection strategies along with classification technique for terrorists email detection. The presented model focuses on the evaluation of machine learning algorithms such as decision tree (ID3), logistic regression, Naïve Bayes (NB), and Support Vector Machine (SVM) for detecting emails containing suspicious content. In the literature, various algo...

  10. Investigation on the isoform selectivity of novel kinesin-like protein 1 (KIF11) inhibitor using chemical feature based pharmacophore, molecular docking, and quantum mechanical studies.

    Science.gov (United States)

    Karunagaran, Subramanian; Subhashchandrabose, Subramaniyan; Lee, Keun Woo; Meganathan, Chandrasekaran

    2016-04-01

    Kinesin-like protein (KIF11) is a molecular motor protein that is essential in mitosis. Removal of KIF11 prevents centrosome migration and causes cell arrest in mitosis. KIF11 defects are linked to microcephaly, lymphedema, or mental retardation. The human KIF11 protein has been actively studied for its role in mitosis and its potential as a therapeutic target for cancer treatment. Pharmacophore modeling, molecular docking, and density functional theory approaches were employed to reveal the structural, chemical, and electronic features essential for the development of small molecule inhibitors of KIF11. We developed chemical feature based pharmacophore models using Discovery Studio v 2.5 (DS). The best hypothesis (Hypo1), consisting of four chemical features (two hydrogen bond acceptors, one hydrophobic, and one ring aromatic), exhibited a high correlation coefficient of 0.9521, a cost difference of 70.63, and a low RMS value of 0.9475. Hypo1 was cross-validated by the Cat Scramble method, a test set, and a decoy set to prove its robustness, statistical significance, and predictability, respectively. The well-validated Hypo1 was used as a 3D query to perform virtual screening. The hits obtained from the virtual screening were subjected to various scrupulous drug-like filters such as Lipinski's rule of five and ADMET properties. Finally, six hit compounds were identified based on their molecular interactions and electronic properties. Our final lead compound could serve as a powerful tool for the discovery of potent inhibitor as KIF11 agonists.
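
    The Lipinski rule-of-five filter applied to the screening hits can be illustrated with a short check. This sketch uses the standard published thresholds; the function names, the tolerance of one violation, and the descriptor values in the usage lines are hypothetical and are not taken from the study.

```python
def lipinski_violations(mol_weight, log_p, h_donors, h_acceptors):
    """Count how many of Lipinski's four criteria a compound violates."""
    rules = [
        mol_weight > 500,   # molecular weight above 500 Da
        log_p > 5,          # octanol-water partition coefficient above 5
        h_donors > 5,       # more than 5 hydrogen bond donors
        h_acceptors > 10,   # more than 10 hydrogen bond acceptors
    ]
    return sum(rules)


def is_drug_like(mol_weight, log_p, h_donors, h_acceptors, max_violations=1):
    """Common convention (an assumption here): allow at most one violation."""
    return lipinski_violations(
        mol_weight, log_p, h_donors, h_acceptors) <= max_violations


# Hypothetical descriptor values for two made-up compounds.
small = lipinski_violations(350.4, 2.1, 2, 5)
bulky = lipinski_violations(720.0, 6.3, 2, 5)
```

    In a real pipeline the four descriptors would come from a cheminformatics toolkit rather than being typed in by hand.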

  11. Feature-based attention resolves depth ambiguity.

    Science.gov (United States)

    Yu, D; Levinthal, B; Franconeri, S L

    2016-09-07

    Perceiving the world around us requires that we resolve ambiguity. This process is often studied in the lab using ambiguous figures whose structures can be interpreted in multiple ways. One class of figures contains ambiguity in its depth relations, such that either of two surfaces could be seen as being the "front" of an object. Previous research suggests that selectively attending to a given location on such objects can bias the perception of that region as the front. This study asks whether selectively attending to a distributed feature can also bias that region toward the front. Participants viewed a structure-from-motion display of a rotating cylinder that could be perceived as rotating clockwise or counterclockwise (as imagined viewing from the top), depending on whether a set of red or green moving dots were seen as being in the front. A secondary task encouraged observers to globally attend to either red or green. Results from both Experiment 1 and 2 showed that the dots on the cylinder that shared the attended feature, and its corresponding surface, were more likely to be seen as being in the front, as measured by participants' clockwise versus counterclockwise percept reports. Feature-based attention, like location-based attention, is capable of biasing competition among potential interpretations of figures with ambiguous structure in depth.

  12. Improving Naive Bayes with Online Feature Selection for Quick Adaptation to Evolving Feature Usefulness

    Energy Technology Data Exchange (ETDEWEB)

    Pon, R K; Cardenas, A F; Buttler, D J

    2007-09-19

    The definition of what makes an article interesting varies from user to user and continually evolves even for a single user. As a result, for news recommendation systems, useless document features cannot be determined a priori and all features are usually considered for interestingness classification. Consequently, the presence of currently useless features degrades classification performance [1], particularly over the initial set of news articles being classified. The initial set of documents is critical for a user when considering which particular news recommendation system to adopt. To address these problems, we introduce an improved version of the naive Bayes classifier with online feature selection. We use correlation to determine the utility of each feature and take advantage of the conditional independence assumption used by naive Bayes for online feature selection and classification. The augmented naive Bayes classifier performs 28% better than the traditional naive Bayes classifier in recommending news articles from the Yahoo! RSS feeds.

  13. The Effect of Feature Selection on Phish Website Detection

    Directory of Open Access Journals (Sweden)

    Hiba Zuhair

    2015-10-01

    Full Text Available Recently, limited anti-phishing campaigns have given phishers more opportunities to bypass defenses through advanced deceptions. Moreover, failure to devise appropriate classification techniques to effectively identify these deceptions has degraded the detection of phishing websites. Consequently, exploiting features that are as new, few, predictive, and effective as possible has emerged as a key challenge in keeping detection resilient. Some prior works investigated and applied certain selected methods to develop their own classification techniques. However, no study has generally agreed on which feature selection method could best be employed to enhance classification performance. Hence, this study empirically examined these methods and their effects on classification performance. Furthermore, it recommends some criteria to assess their outcomes and contributes to the problem at hand. Hybrid features, low and high dimensional datasets, different feature selection methods, and classification models were examined in this study. The findings showed notably improved detection precision with low latency, as well as noteworthy gains in robustness and prediction susceptibilities. Although selecting an ideal feature subset was a challenging task, the findings of this study provided the most advantageous feature subset possible for robust selection and effective classification in the phishing detection domain.

  14. Selecting Optimal Subset of Features for Student Performance Model

    Directory of Open Access Journals (Sweden)

    Hany M. Harb

    2012-09-01

    Full Text Available Educational data mining (EDM) is a new and growing research area in which data mining concepts are used in the educational field to extract useful information on student behavior in the learning process. Classification methods such as decision trees, rule mining, and Bayesian networks can be applied to educational data to predict student behavior, such as performance in an examination. This prediction may help in student evaluation. As feature selection influences the predictive accuracy of any performance model, it is essential to study in detail the effectiveness of the student performance model in connection with feature selection techniques. The main objective of this work is to achieve high predictive performance by adopting various feature selection techniques to increase the predictive accuracy with the least number of features. The outcomes show a reduction in computational time and constructional cost in both the training and classification phases of the student performance model.

  15. Feature Selection Algorithm Based on IMGA and MKSVM for Intrusion Detection

    Institute of Scientific and Technical Information of China (English)

    井小沛; 汪厚祥; 聂凯; 罗志伟

    2012-01-01

    The data processed by an intrusion detection system are large in volume and high in feature dimension, which reduces the processing speed and detection efficiency of detection algorithms. To improve the detection speed and detection rate of intrusion detection systems, feature selection is applied to intrusion detection. First, an efficient feature subset generation strategy based on immune memory and a genetic algorithm (IMGA) is proposed. Then, a wrapper feature subset evaluation method based on the support vector machine (SVM) is studied. To counter the drop in evaluation ability that unbalanced datasets may cause, the kernel function is modified with a conformal transformation based on Riemannian geometry, giving a Modified Kernel SVM (MKSVM) with better classification generalization. Simulation results show that the proposed feature selection algorithm not only improves the selection of important features but also has better feature selection ability on unbalanced datasets. The experiments further indicate that an intrusion detection system built on this method performs better than one without feature selection.

  16. Feature-based Image Sequence Compression Coding

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    A novel compressing method for video teleconference applications is presented. Semantic-based coding based on human image feature is realized, where human features are adopted as parameters. Model-based coding and the concept of vector coding are combined with the work on image feature extraction to obtain the result.

  17. A Network Intrusion Detection Model Based on Data Mining and Feature Selection Schemes

    Institute of Scientific and Technical Information of China (English)

    康世瑜

    2011-01-01

    This paper proposes an efficient intrusion detection model based on SVM feature selection and the C4.5 data mining algorithm. By training on attack data after feature extraction, the model can effectively identify several types of intrusion and improve detection speed. Tests on the classic KDD 1999 intrusion detection dataset show that the data mining model learns attack patterns efficiently and detects network attacks accurately and effectively using the selected features.

  18. Unbiased Feature Selection in Learning Random Forests for High-Dimensional Data

    Directory of Open Access Journals (Sweden)

    Thanh-Tung Nguyen

    2015-01-01

    Full Text Available Random forests (RFs) have been widely used as a powerful classification method. However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting. This gives RFs poor accuracy when working with high-dimensional data. In addition, RFs are biased in the feature selection process, where multivalued features are favored. Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features in learning RFs for high-dimensional data. We first remove the uninformative features using p-value assessment, and the subset of unbiased features is then selected based on some statistical measures. This feature subset is then partitioned into two subsets. A feature weighting sampling technique is used to sample features from these two subsets for building trees. This approach enables one to generate more accurate trees, while allowing one to reduce dimensionality and the amount of data needed for learning RFs. An extensive set of experiments has been conducted on 47 high-dimensional real-world datasets including image datasets. The experimental results have shown that RFs with the proposed approach outperformed the existing random forests in increasing the accuracy and the AUC measures.

  19. System Entropy and Its Application in Feature Selection

    Institute of Scientific and Technical Information of China (English)

    ZHAO Jun; WU Zhong-fu; LI Hua

    2004-01-01

    Feature selection is always an important issue in research on data mining technologies. However, the problem of optimal feature selection is NP-hard. Therefore, heuristic approaches are more practical for actual learning systems. Usually, that kind of algorithm selects features with the help of a heuristic metric compactum to measure the relative importance of features in a learning system. Here a new notion of 'system entropy' is described in terms of rough set theory, and some of its algebraic characteristics are studied. After its intrinsic value bias is effectively counteracted, the system entropy is applied in BSE, a new heuristic algorithm for feature selection. BSE is efficient: its time complexity is lower than that of analogous algorithms. BSE is also effective: it can produce the optimal results in the mini-feature biased sense from a variety of learning systems. In addition, BSE is tolerant of and flexible to inconsistency in a learning system, and is consequently able to handle data noise in the learning system elegantly.

  20. Tournament screening cum EBIC for feature selection with high-dimensional feature spaces

    Institute of Scientific and Technical Information of China (English)

    2009-01-01

    Feature selection characterized by a relatively small sample size and an extremely high-dimensional feature space is common in many areas of contemporary statistics. The high dimensionality of the feature space causes serious difficulties: (i) the sample correlations between features become high even if the features are stochastically independent; (ii) the computation becomes intractable. These difficulties make conventional approaches either inapplicable or inefficient. Reducing the dimensionality of the feature space and then applying low-dimensional approaches appears to be the only feasible way to tackle the problem. Along this line, we develop in this article a tournament screening cum EBIC approach for feature selection with a high-dimensional feature space. The procedure of tournament screening mimics that of a tournament. It is shown theoretically that tournament screening has the sure screening property, a necessary property which should be satisfied by any valid screening procedure. It is demonstrated by numerical studies that the tournament screening cum EBIC approach enjoys desirable properties such as a higher positive selection rate and a lower false discovery rate than other approaches.
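
    The EBIC criterion used in the second stage can be written down compactly. The sketch below assumes a Gaussian linear model, for which -2 log-likelihood reduces to n log(RSS/n) up to a constant; the function name `ebic`, the default gamma, and the numbers in the usage lines are illustrative assumptions. The tournament screening stage itself is not reproduced here.

```python
from math import comb, log


def ebic(rss, n, k, p, gamma=1.0):
    """Extended BIC for a linear model with k of p candidate features.

    rss   : residual sum of squares of the fitted submodel
    n     : sample size
    gamma : weight of the model-space penalty (gamma = 0 gives ordinary BIC)
    """
    fit = n * log(rss / n)                     # -2 log-likelihood, up to a constant
    size_penalty = k * log(n)                  # the usual BIC term
    space_penalty = 2 * gamma * log(comb(p, k))  # counts models of size k
    return fit + size_penalty + space_penalty


# Hypothetical numbers: 100 samples, 1000 candidate features, 3 selected.
bic_value = ebic(50.0, 100, 3, 1000, gamma=0.0)
ebic_value = ebic(50.0, 100, 3, 1000, gamma=1.0)
```

    The extra term grows with the number of candidate models of the same size, which is what makes the criterion more conservative than BIC when p is much larger than n.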

  1. Selective processing of multiple features in the human brain: effects of feature type and salience.

    Science.gov (United States)

    McGinnis, E Menton; Keil, Andreas

    2011-02-09

    Identifying targets in a stream of items at a given constant spatial location relies on selection of aspects such as color, shape, or texture. Such attended (target) features of a stimulus elicit a negative-going event-related brain potential (ERP), termed Selection Negativity (SN), which has been used as an index of selective feature processing. In two experiments, participants viewed a series of Gabor patches in which targets were defined as a specific combination of color, orientation, and shape. Distracters were composed of different combinations of color, orientation, and shape of the target stimulus. This design allows comparisons of items with and without specific target features. Consistent with previous ERP research, SN deflections extended between 160-300 ms. Data from the subsequent P3 component (300-450 ms post-stimulus) were also examined, and were regarded as an index of target processing. In Experiment A, predominant effects of target color on SN and P3 amplitudes were found, along with smaller ERP differences in response to variations of orientation and shape. Manipulating color to be less salient while enhancing the saliency of the orientation of the Gabor patch (Experiment B) led to delayed color selection and enhanced orientation selection. Topographical analyses suggested that the location of SN on the scalp reliably varies with the nature of the to-be-attended feature. No interference of non-target features on the SN was observed. These results suggest that target feature selection operates by means of electrocortical facilitation of feature-specific sensory processes, and that selective electrocortical facilitation is more effective when stimulus saliency is heightened.

  2. Selective processing of multiple features in the human brain: effects of feature type and salience.

    Directory of Open Access Journals (Sweden)

    E Menton McGinnis

    Full Text Available Identifying targets in a stream of items at a given constant spatial location relies on selection of aspects such as color, shape, or texture. Such attended (target) features of a stimulus elicit a negative-going event-related brain potential (ERP), termed Selection Negativity (SN), which has been used as an index of selective feature processing. In two experiments, participants viewed a series of Gabor patches in which targets were defined as a specific combination of color, orientation, and shape. Distracters were composed of different combinations of color, orientation, and shape of the target stimulus. This design allows comparisons of items with and without specific target features. Consistent with previous ERP research, SN deflections extended between 160-300 ms. Data from the subsequent P3 component (300-450 ms post-stimulus) were also examined, and were regarded as an index of target processing. In Experiment A, predominant effects of target color on SN and P3 amplitudes were found, along with smaller ERP differences in response to variations of orientation and shape. Manipulating color to be less salient while enhancing the saliency of the orientation of the Gabor patch (Experiment B) led to delayed color selection and enhanced orientation selection. Topographical analyses suggested that the location of SN on the scalp reliably varies with the nature of the to-be-attended feature. No interference of non-target features on the SN was observed. These results suggest that target feature selection operates by means of electrocortical facilitation of feature-specific sensory processes, and that selective electrocortical facilitation is more effective when stimulus saliency is heightened.

  3. Online Distributed Security Feature Selection Based on Big Data in Power System Operation

    Institute of Scientific and Technical Information of China (English)

    黄天恩; 孙宏斌; 郭庆来; 温柏坚; 郭文鑫

    2016-01-01

    The latest developments and existing problems of power system security feature selection in a big data environment are briefly introduced. An online distributed security feature selection method is proposed. The method is based on grouping power system features by correlation and is adapted to the big data of power system operation; it can discover the critical features for power system security online. First, a power system security feature selection method for a single compute node is discussed. Then a distributed method based on feature grouping is proposed. Because the feature grouping has an important influence on the selection results, a strategy for grouping power system features by correlation is put forward, so that the correlation of features within the same group is as large as possible while the correlation of features in different groups is as small as possible. Case studies on the IEEE 9-bus system and the Guangdong provincial power system verify the practicality and effectiveness of the method, which can quickly find the weak spots in power system operation and help operators accurately grasp the critical features for power system security. Compared with traditional methods, this method performs well in both computational accuracy and speed.

  4. A Feature Selection Method Based on Correlation in Cancer Recognition

    Institute of Scientific and Technical Information of China (English)

    彭湘华

    2013-01-01

    Gene expression profiles are high-dimensional, small-sample, and noisy, with many redundant genes. A gene feature selection method based on Correlation-based Feature Selection (CFS) and Stratified Sampling (SS), called CFS-SS, is proposed. First, the variable-to-variable and variable-to-observation measures are calculated. A heuristic search over the variable space then selects an informative gene subset, with the subset weight computed from these measures. Through regression, a subset of distinguishing genes is obtained. Finally, a stratified sampling strategy is applied to obtain the most informative gene subset. K-fold cross-validation experiments on three datasets (Leukemia, Colon, and Prostate) show that CFS-SS selects valuable gene subsets from these datasets and achieves good classification performance across different classifiers.
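
    The abstract does not give the subset weight, so the following is only a sketch under an assumption: it uses the merit heuristic commonly associated with CFS, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-to-class correlation and r_ff the mean feature-to-feature correlation. The function name and inputs are hypothetical, and the stratified sampling stage is not shown.

```python
import math
from typing import Sequence


def cfs_merit(feature_class_corr: Sequence[float],
              feature_feature_corr: Sequence[float]) -> float:
    """Merit of a feature subset (assumed CFS heuristic, not taken from the paper).

    feature_class_corr: one correlation per feature in the subset.
    feature_feature_corr: one correlation per pair of features in the subset
    (empty when the subset has a single feature).
    """
    k = len(feature_class_corr)
    r_cf = sum(feature_class_corr) / k
    r_ff = (sum(feature_feature_corr) / len(feature_feature_corr)
            if feature_feature_corr else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Adding a fully redundant gene does not raise the merit;
# adding an uncorrelated, equally predictive gene does.
single = cfs_merit([0.8], [])
redundant_pair = cfs_merit([0.8, 0.8], [1.0])
independent_pair = cfs_merit([0.8, 0.8], [0.0])
```

    The design point is that the denominator penalizes redundancy, so a search guided by this score prefers genes that are predictive of the class but weakly correlated with each other.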

  5. Feature gene selection for Chinese hamster classification based on support vector machine

    Institute of Scientific and Technical Information of China (English)

    杨俊丽; 刘田福

    2011-01-01

    Given the high dimensionality and small sample size of Chinese hamster gene expression profiles, a feature gene selection method for Chinese hamster classification based on the Support Vector Machine (SVM) is proposed. The method uses an improved Fisher discriminant ratio (FDR) gene scoring criterion to remove genes irrelevant to the classification. A new distance, composed of a spatial distance and a functional distance, is proposed as the similarity measure for removing redundant genes. An SVM is used as the classifier to validate the classification performance of the selected feature genes. The experimental results show that the method effectively removes irrelevant and redundant genes, and that the selected feature genes form the smallest gene set that classifies Chinese hamsters correctly.
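
    As a rough illustration of scoring genes with a Fisher-type criterion, the sketch below computes the standard two-class Fisher discriminant ratio, (mean_a - mean_b)^2 / (var_a + var_b). This is an assumption: the abstract refers to an improved FDR criterion whose details are not given, and the function name is hypothetical.

```python
from statistics import mean, pvariance
from typing import Sequence


def fisher_score(class_a: Sequence[float], class_b: Sequence[float]) -> float:
    """Standard two-class Fisher ratio for one gene (not the paper's improved variant)."""
    gap = (mean(class_a) - mean(class_b)) ** 2
    spread = pvariance(class_a) + pvariance(class_b)
    if spread == 0:
        # Both classes are constant: perfectly separable or identical.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


# Made-up expression values: the first gene separates the classes, the second overlaps.
separating = fisher_score([1.0, 3.0], [5.0, 7.0])
overlapping = fisher_score([1.0, 3.0], [2.0, 4.0])
```

    Genes with a low score would be treated as irrelevant to the classification and removed before the redundancy step.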

  6. Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach

    Directory of Open Access Journals (Sweden)

    Daniel Peralta

    2015-01-01

    Nowadays, many disciplines have to deal with big datasets that additionally involve a high number of features. Feature selection methods aim at eliminating noisy, redundant, or irrelevant features that may deteriorate the classification performance. However, traditional methods lack the scalability to cope with datasets of millions of instances and to produce successful results in a limited time. This paper presents a feature selection algorithm based on evolutionary computation that uses the MapReduce paradigm to obtain subsets of features from big datasets. The algorithm decomposes the original dataset into blocks of instances and learns from them in the map phase; the reduce phase then merges the partial results into a final vector of feature weights, which allows a flexible application of the feature selection procedure using a threshold to determine the selected subset of features. The feature selection method is evaluated using three well-known classifiers (SVM, Logistic Regression, and Naive Bayes) implemented within the Spark framework to address big data problems. In the experiments, datasets of up to 67 million instances and up to 2000 attributes have been managed, showing that this is a suitable framework for evolutionary feature selection, improving both the classification accuracy and the runtime when dealing with big data problems.
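
    The map, reduce, and threshold steps described in this abstract can be sketched in a single process as follows. This is a minimal sketch, not the authors' Spark implementation: `select_on_block` is a hypothetical stand-in for the evolutionary search run on each block (here a simple class-mean gap test), and the round-robin split, the averaging in the reduce step, and all names are assumptions.

```python
from typing import List, Sequence, Tuple

Row = Sequence[float]


def split_blocks(rows: List[Row], labels: List[int],
                 n_blocks: int) -> List[Tuple[List[Row], List[int]]]:
    """Decompose the dataset into disjoint blocks of instances (round-robin)."""
    blocks: List[Tuple[List[Row], List[int]]] = [([], []) for _ in range(n_blocks)]
    for i, (row, y) in enumerate(zip(rows, labels)):
        blocks[i % n_blocks][0].append(row)
        blocks[i % n_blocks][1].append(y)
    return blocks


def select_on_block(rows: List[Row], labels: List[int],
                    margin: float = 1.0) -> List[int]:
    """Map phase stand-in (hypothetical): one 0/1 flag per feature.

    A feature is flagged when its class means differ by more than `margin`.
    The paper runs an evolutionary search here instead.
    """
    flags = []
    for j in range(len(rows[0])):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        if not pos or not neg:
            flags.append(0)
            continue
        gap = abs(sum(pos) / len(pos) - sum(neg) / len(neg))
        flags.append(1 if gap > margin else 0)
    return flags


def reduce_weights(partials: List[List[int]]) -> List[float]:
    """Reduce phase: average the per-block flags into feature weights in [0, 1]."""
    return [sum(col) / len(partials) for col in zip(*partials)]


def apply_threshold(weights: List[float], threshold: float) -> List[int]:
    """Return the indices of features whose weight reaches the threshold."""
    return [j for j, w in enumerate(weights) if w >= threshold]


# Made-up data: feature 0 tracks the label, feature 1 is noise.
rows = [[0.0, 5.0], [0.1, 4.9], [3.0, 5.1], [3.2, 5.0],
        [0.2, 5.2], [0.0, 5.0], [2.9, 4.8], [3.1, 5.1]]
labels = [0, 0, 1, 1, 0, 0, 1, 1]
partials = [select_on_block(r, y) for r, y in split_blocks(rows, labels, 2)]
weights = reduce_weights(partials)
selected = apply_threshold(weights, 0.5)
```

    Keeping a weight vector rather than a fixed subset is what makes the threshold adjustable after the expensive map phase has finished.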

  7. Structure damage detection based on random forest recursive feature elimination

    Science.gov (United States)

    Zhou, Qifeng; Zhou, Hao; Zhou, Qingqing; Yang, Fan; Luo, Linkai

    2014-05-01

    Feature extraction is a key preliminary step in structural damage detection. In this paper, a structural damage detection method based on wavelet packet decomposition (WPD) and random forest recursive feature elimination (RF-RFE) is proposed. To obtain the most effective feature subset and improve the identification accuracy, a two-stage feature selection method is adopted after WPD. First, the damage features are sorted according to the original random forest variable importance analysis. Second, RF-RFE eliminates the least important feature and reorders the feature list each time, producing a new feature importance sequence. Finally, the k-nearest neighbor (KNN) algorithm, as a benchmark classifier, is used to evaluate the extracted feature subset. A four-storey steel shear building model is chosen as an example for method verification. The experimental results show that the smaller feature set obtained with the proposed method achieves higher identification accuracy and reduces the detection time.
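
    The elimination loop in the second stage can be sketched as below. This is an illustration only: `importance_fn` is a hypothetical callable standing in for random forest variable importance computed on the current subset, and `toy_importance` with its numbers is invented to show why re-scoring after each removal can change the order.

```python
from typing import Callable, Dict, List, Sequence


def rfe_ranking(features: Sequence[str],
                importance_fn: Callable[[List[str]], Dict[str, float]]) -> List[str]:
    """Drop the least important feature one at a time, re-scoring after each drop.

    Returns the features ordered from most to least important, where a feature
    eliminated later ranks higher.
    """
    remaining = list(features)
    eliminated: List[str] = []
    while len(remaining) > 1:
        scores = importance_fn(remaining)  # re-score on the current subset
        worst = min(remaining, key=lambda f: scores[f])
        remaining.remove(worst)
        eliminated.append(worst)
    return remaining + eliminated[::-1]


def toy_importance(subset: List[str]) -> Dict[str, float]:
    """Hypothetical scorer: e2 is nearly redundant while e1 is present."""
    base = {"e1": 0.50, "e2": 0.45, "e3": 0.10, "e4": 0.30}
    scores = {f: base[f] for f in subset}
    if "e1" in subset and "e2" in subset:
        scores["e2"] *= 0.1
    return scores


ranking = rfe_ranking(["e1", "e2", "e3", "e4"], toy_importance)
```

    In this toy case a single ranking by the base scores would place e2 second, while the recursive procedure eliminates it first because its importance collapses in the presence of e1.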

  8. Informative Feature Selection for Object Recognition via Sparse PCA

    Science.gov (United States)

    2011-04-07

    the BMW database [17] are used for training. For each image pair in SfM, SURF features are deemed informative if the consensus of the corresponding...observe that the first two sparse PVs are sufficient for selecting informative features that lie on the foreground objects in the BMW database (as... BMW ) database [17]. The database consists of multiple-view images of 20 landmark buildings on the Berkeley campus. For each building, wide-baseline

  9. Effect of feature-selective attention on neuronal responses in macaque area MT.

    Science.gov (United States)

    Chen, X; Hoffmann, K-P; Albright, T D; Thiele, A

    2012-03-01

    Attention influences visual processing in striate and extrastriate cortex, which has been extensively studied for spatial-, object-, and feature-based attention. Most studies exploring neural signatures of feature-based attention have trained animals to attend to an object identified by a certain feature and ignore objects/displays identified by a different feature. Little is known about the effects of feature-selective attention, where subjects attend to one stimulus feature domain (e.g., color) of an object while features from different domains (e.g., direction of motion) of the same object are ignored. To study this type of feature-selective attention in area MT in the middle temporal sulcus, we trained macaque monkeys to either attend to and report the direction of motion of a moving sine wave grating (a feature for which MT neurons display strong selectivity) or attend to and report its color (a feature for which MT neurons have very limited selectivity). We hypothesized that neurons would upregulate their firing rate during attend-direction conditions compared with attend-color conditions. We found that feature-selective attention significantly affected 22% of MT neurons. Contrary to our hypothesis, these neurons did not necessarily increase firing rate when animals attended to direction of motion but fell into one of two classes. In one class, attention to color increased the gain of stimulus-induced responses compared with attend-direction conditions. The other class displayed the opposite effects. Feature-selective activity modulations occurred earlier in neurons modulated by attention to color compared with neurons modulated by attention to motion direction. Thus feature-selective attention influences neuronal processing in macaque area MT but often exhibited a mismatch between the preferred stimulus dimension (direction of motion) and the preferred attention dimension (attention to color).

  10. Return of feature-based cost modeling

    Science.gov (United States)

    Creese, Robert C.; Patrawala, Taher B.

    1998-10-01

    Feature based cost modeling is thought of as a relatively new approach to cost modeling, but it underwent considerable development in the 1950s. Boeing published considerable work in the 1950s on the cost of various casting processes (sand casting, die casting, investment casting and permanent mold casting) as a function of a single casting feature, casting volume. Additional approaches to feature based cost modeling have since been made, and this work reviews previous works and proposes an integrated model for feature based cost modeling.

  11. Adaptive feature selection using v-shaped binary particle swarm optimization

    Science.gov (United States)

    Dong, Hongbin; Zhou, Xiurong

    2017-01-01

    Feature selection is an important preprocessing method in machine learning and data mining. This process can be used not only to reduce the amount of data to be analyzed but also to build models with stronger interpretability based on fewer features. Traditional feature selection methods evaluate the dependency and redundancy of features separately, which leads to a lack of measurement of their combined effect. Moreover, a greedy search considers only the optimization of the current round and thus cannot be a global search. To evaluate the combined effect of different subsets in the entire feature space, an adaptive feature selection method based on V-shaped binary particle swarm optimization is proposed. In this method, the fitness function is constructed using the correlation information entropy. Feature subsets are regarded as individuals in a population, and the feature space is searched using V-shaped binary particle swarm optimization. The above procedure overcomes the hard constraint on the number of features, enables the combined evaluation of each subset as a whole, and improves the search ability of conventional binary particle swarm optimization. The proposed algorithm is an adaptive method with respect to the number of feature subsets. The experimental results show the advantages of optimizing the feature subsets using the V-shaped transfer function and confirm the effectiveness and efficiency of the feature subsets obtained under different classifiers. PMID:28358850
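
    The abstract does not state which V-shaped transfer function is used, so the sketch below assumes one common choice, |tanh(v)|, together with the usual V-shaped rule that a bit is flipped (rather than set) with that probability. Function names are hypothetical, and the velocity update and the entropy-based fitness are not shown.

```python
import math
import random
from typing import List


def v_shaped_transfer(velocity: float) -> float:
    """Map a velocity to a flip probability with |tanh(v)| (assumed V-shaped function)."""
    return abs(math.tanh(velocity))


def update_position(bits: List[int], velocities: List[float],
                    rng: random.Random) -> List[int]:
    """Flip each feature bit with probability T(v); otherwise keep the current bit."""
    new_bits = []
    for bit, v in zip(bits, velocities):
        if rng.random() < v_shaped_transfer(v):
            new_bits.append(1 - bit)   # large |v|: the feature's state changes
        else:
            new_bits.append(bit)       # small |v|: the feature's state is kept
    return new_bits


rng = random.Random(0)
subset = [1, 0, 1, 0]                       # 1 = feature selected
unchanged = update_position(subset, [0.0, 0.0, 0.0, 0.0], rng)
flipped = update_position(subset, [50.0, -50.0, 50.0, -50.0], rng)
```

    Unlike an S-shaped (sigmoid) rule, a zero velocity here leaves the particle where it is, and the sign of the velocity does not matter, only its magnitude.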

  12. Feature Selection Based on the Measurement of Correlation Information Entropy

    Institute of Scientific and Technical Information of China (English)

    董红斌; 滕旭阳; 杨雪

    2016-01-01

    Feature selection aims to select a smaller feature subset from the original feature set. The subset can provide approximate or better performance in data mining and machine learning. Because the physical meaning of the features is not transformed, fewer features give the data stronger interpretability. Traditional information-theoretic methods tend to measure feature relevance and redundancy separately and ignore the combined effect of the whole feature subset. In this paper, correlation information entropy, a technique from data fusion, is applied to feature selection. Based on this method, we measure the degree of independence and redundancy among features. A correlation matrix is then constructed from the mutual information between features and their class labels and the combination of feature pairs. In addition, to account for the multivariable correlation of the different features in a subset, the eigenvalues of the correlation matrix are calculated. On this basis, a feature sorting algorithm and an adaptive feature subset selection algorithm combined with the parameter are proposed. Experimental results show the effectiveness and efficiency of the proposed algorithms on classification tasks.

  13. A robust and accurate method for feature selection and prioritization from multi-class OMICs data.

    Directory of Open Access Journals (Sweden)

    Vittorio Fortino

    Selecting relevant features is a common task in most OMICs data analysis, where the aim is to identify a small set of key features to be used as biomarkers. To this end, two alternative but equally valid methods are mainly available, namely the univariate (filter) or the multivariate (wrapper) approach. The stability of the selected lists of features is an often neglected but very important requirement. If the same features are selected in multiple independent iterations, they are more likely to be reliable biomarkers. In this study, we developed and evaluated the performance of a novel method for feature selection and prioritization, aiming at generating robust and stable sets of features with high predictive power. The proposed method uses fuzzy logic for a first unbiased feature selection and a Random Forest built from conditional inference trees to prioritize the candidate discriminant features. Analyzing several multi-class gene expression microarray data sets, we demonstrate that our technique provides equal or better classification performance and greater stability compared to other Random Forest-based feature selection methods.
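
    One simple way to quantify whether the same features recur across independent iterations is sketched below. The abstract does not say which stability measure the authors use, so the mean pairwise Jaccard similarity and the per-feature selection frequency here are assumptions, and the gene names are made up.

```python
from itertools import combinations
from typing import Dict, Iterable, List


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity of two selected-feature lists."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def selection_stability(runs: List[List[str]]) -> float:
    """Mean pairwise Jaccard similarity over all runs (needs at least two runs)."""
    pairs = list(combinations(runs, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def selection_frequency(runs: List[List[str]]) -> Dict[str, float]:
    """Fraction of runs in which each feature was selected."""
    counts: Dict[str, int] = {}
    for run in runs:
        for feature in set(run):
            counts[feature] = counts.get(feature, 0) + 1
    return {feature: c / len(runs) for feature, c in counts.items()}


# Three hypothetical selection runs on resampled data.
runs = [["g1", "g2", "g3"], ["g1", "g2", "g4"], ["g1", "g2", "g3"]]
stability = selection_stability(runs)
frequency = selection_frequency(runs)
```

    Features with a frequency near 1.0 are the ones a stability-oriented method would prioritize as candidate biomarkers.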

  14. Feature subset selection algorithms for incomplete decision systems based on neighborhood rough sets

    Institute of Scientific and Technical Information of China (English)

    谢娟英; 李楠; 乔子芮

    2011-01-01

    New feature subset selection algorithms are presented to reduce the heavy computational load of existing attribute reduction algorithms for incomplete decision systems. First, a forward sequential feature selection algorithm for incomplete decision systems is proposed, based on the principle that the classification ability of a decision system does not change as long as its positive region is unchanged. The algorithm starts from an empty reduct, ranks the attributes in descending order of their effect on the positive region when added to the reduct, and uses sequential forward search to add the current best feature to the reduct until the best feature subset is determined. The algorithm is then extended to real-valued and mixed-type incomplete decision systems using neighborhood rough set theory. In addition, the fast attribute reduction algorithm for incomplete decision systems based on the tolerance relation is generalized to real-valued and mixed attributes, yielding a backward feature selection algorithm for such systems. Theoretical analysis and experiments on datasets from the University of California Irvine machine learning repository both show that the proposed forward algorithm based on neighborhood rough sets effectively reduces the time complexity of feature selection for incomplete decision systems, obtaining the attribute reduct (the feature subset) in less time while preserving the discernibility of the system. A drawback of the forward algorithm is that it may be unable to select the first, most important feature, in which case the selection process cannot proceed and is left incomplete.
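
    The forward sequential search on the positive region can be sketched as follows. This is a deliberately simplified illustration: it assumes a complete decision table with discrete attributes, so it uses plain equivalence classes rather than the tolerance relation or neighborhoods needed for incomplete, real-valued, or mixed data. All names are hypothetical.

```python
from collections import defaultdict
from typing import List, Sequence


def positive_region_size(rows: Sequence[Sequence[int]], decisions: Sequence[int],
                         attrs: List[int]) -> int:
    """Count objects whose equivalence class under `attrs` has a single decision."""
    seen_decisions = defaultdict(set)
    class_sizes = defaultdict(int)
    for row, d in zip(rows, decisions):
        key = tuple(row[a] for a in attrs)
        seen_decisions[key].add(d)
        class_sizes[key] += 1
    return sum(class_sizes[k] for k, ds in seen_decisions.items() if len(ds) == 1)


def forward_select(rows: Sequence[Sequence[int]], decisions: Sequence[int],
                   n_attrs: int) -> List[int]:
    """Start from an empty reduct and add the attribute that most enlarges the positive region."""
    target = positive_region_size(rows, decisions, list(range(n_attrs)))
    reduct: List[int] = []
    current = positive_region_size(rows, decisions, reduct)
    while current < target:
        candidates = [a for a in range(n_attrs) if a not in reduct]
        best = max(candidates,
                   key=lambda a: positive_region_size(rows, decisions, reduct + [a]))
        best_size = positive_region_size(rows, decisions, reduct + [best])
        if best_size <= current:
            break  # no single attribute helps: the search stalls
        reduct.append(best)
        current = best_size
    return reduct


# Two attributes are needed here: attribute 0 alone resolves only half the objects.
table = [(0, 0), (0, 1), (1, 0), (1, 1)]
reduct = forward_select(table, [0, 0, 1, 2], 2)

# XOR-style decisions: no single attribute enlarges the positive region.
stalled = forward_select(table, [0, 1, 1, 0], 2)
```

    The XOR-style table illustrates the drawback noted in the abstract: when no first attribute improves the positive region on its own, the forward search cannot get started.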

  15. An Efficient Cost-Sensitive Feature Selection Using Chaos Genetic Algorithm for Class Imbalance Problem

    Directory of Open Access Journals (Sweden)

    Jing Bian

    2016-01-01

    In the era of big data, feature selection is an essential process in machine learning. Although the class imbalance problem has recently attracted a great deal of attention, little effort has been devoted to developing feature selection techniques for it. In addition, most applications involving feature selection focus on classification accuracy but not cost, although costs are important. To cope with imbalance problems, we developed a cost-sensitive feature selection algorithm, referred to as CSFSG, that adds a cost-based evaluation function to a filter feature selection method using a chaos genetic algorithm. The evaluation function considers both feature-acquiring costs (test costs) and misclassification costs in the field of network security, thereby weakening the influence of the many instances from the majority classes in large-scale datasets. The CSFSG algorithm reduces the total cost of feature selection and trades off both factors. The behavior of the CSFSG algorithm is tested on a large-scale network security dataset using two kinds of classifiers: C4.5 and k-nearest neighbor (KNN). The experimental results show that the approach is efficient, improving classification accuracy and decreasing classification time. In addition, the results of our method are more promising than those of other cost-sensitive feature selection algorithms.

  16. Variance Ranklets : Orientation-selective rank features for contrast modulations

    NARCIS (Netherlands)

    Azzopardi, George; Smeraldi, Fabrizio

    2009-01-01

    We introduce a novel type of orientation–selective rank features that are sensitive to contrast modulations (second–order stimuli). Variance Ranklets are designed in close analogy with the standard Ranklets, but use the Siegel–Tukey statistics for dispersion instead of the Wilcoxon statistics. Their

  17. Technical Evaluation Report 27: Educational Wikis: Features and selection criteria

    Directory of Open Access Journals (Sweden)

    Jim Rudolph

    2004-04-01

    This report discusses the educational uses of the ‘wiki,’ an increasingly popular approach to online community development. Wikis are defined and compared with ‘blogging’ methods; characteristics of major wiki engines are described; and wiki features and selection criteria are examined.

  18. Feature selection for surface electromyography signal based on ant colony optimization

    Institute of Scientific and Technical Information of China (English)

    黄虎; 谢洪波

    2012-01-01

    Objective: To improve the classification performance of surface electromyography (sEMG)-based prostheses and reduce the dimensionality of the features extracted from sEMG signals, a modified ant colony optimization (ACO) algorithm was employed to select the best feature subset. Methods: The mutual information between features and target classes was used as the heuristic function, the best feature subset was selected by ACO, and a trained artificial neural network was used to verify the classification performance. Results: Ten healthy subjects participated in an experiment on classifying hand and wrist motions from sEMG signals. Compared with feature subsets based on principal component analysis (PCA), the ACO-reduced feature subsets not only improved the classification accuracy but also greatly reduced the number of features relative to the original feature set, which in turn simplified the structure of the classifier and reduced the computational cost. Conclusions: The proposed method shows great potential for real-time applications such as sEMG-based prosthesis control.

  19. Review and Evaluation of Feature Selection Algorithms in Synthetic Problems

    CERN Document Server

    Belanche, L A

    2011-01-01

    The main purpose of Feature Subset Selection is to find a reduced subset of attributes from a data set described by a feature set. The task of a feature selection algorithm (FSA) is to provide a computational solution motivated by a certain definition of relevance or by a reliable evaluation measure. In this paper several fundamental algorithms are studied to assess their performance in a controlled experimental scenario. A measure to evaluate FSAs is devised that computes the degree of matching between the output given by an FSA and the known optimal solutions. An extensive experimental study on synthetic problems is carried out to assess the behaviour of the algorithms in terms of solution accuracy and size as a function of the relevance, irrelevance, redundancy and size of the data samples. The controlled experimental conditions facilitate the derivation of better-supported and meaningful conclusions.

  20. Feature selection for high-dimensional integrated data

    KAUST Repository

    Zheng, Charles

    2012-04-26

    Motivated by the problem of identifying correlations between genes or features of two related biological systems, we propose a model of feature selection in which only a subset of the predictors Xt are dependent on the multidimensional variate Y, and the remainder of the predictors constitute a “noise set” Xu independent of Y. Using Monte Carlo simulations, we investigated the relative performance of two methods, thresholding and singular-value decomposition, in combination with stochastic optimization to determine “empirical bounds” on the small-sample accuracy of an asymptotic approximation. We demonstrate the utility of the thresholding and SVD feature selection methods with respect to a recent infant intestinal gene expression and metagenomics dataset.

  1. Neural Gen Feature Selection for Supervised Learning Classifier

    Directory of Open Access Journals (Sweden)

    Mohammed Hasan Abdulameer

    2014-04-01

    Face recognition has received significant attention in recent years. Many face recognition techniques have been developed, such as PSO-SVM and LDA-SVM. However, inefficient features may lead to inadequate recognition results. Hence, a new face recognition system based on a Genetic Algorithm (GA) and the FFBNN technique is proposed. The system first performs feature extraction and then passes the optimal features to the recognition process. In feature extraction, the optimal features are extracted from the face image database by the GA together with the FFBNN, and the computed optimal features are fed to the FFBNN for training and testing. The well-trained FFBNN with the optimal features provides the recognition result. The YALE human face dataset is used to analyze the performance of the proposed GA-FFBNN technique, which is also compared with standard SVM and PSO-SVM techniques.

  2. Highly comparative, feature-based time-series classification

    CERN Document Server

    Fulcher, Ben D

    2014-01-01

    A highly comparative, feature-based approach to time series classification is introduced that uses an extensive database of algorithms to extract thousands of interpretable features from time series. These features are derived from across the scientific time-series analysis literature, and include summaries of time series in terms of their correlation structure, distribution, entropy, stationarity, scaling properties, and fits to a range of time-series models. After computing thousands of features for each time series in a training set, those that are most informative of the class structure are selected using greedy forward feature selection with a linear classifier. The resulting feature-based classifiers automatically learn the differences between classes using a reduced number of time-series properties, and circumvent the need to calculate distances between time series. Representing time series in this way results in orders of magnitude of dimensionality reduction, allowing the method to perform well on ve...

  3. Stable feature selection for clinical prediction: exploiting ICD tree structure using Tree-Lasso.

    Science.gov (United States)

    Kamkar, Iman; Gupta, Sunil Kumar; Phung, Dinh; Venkatesh, Svetha

    2015-02-01

    Modern healthcare is being reshaped by growing Electronic Medical Records (EMR). Recently, these records have been shown to be of great value for building clinical prediction models. In EMR data, patients' diseases and hospital interventions are captured through a set of diagnosis and procedure codes. These codes are usually represented in a tree form (e.g. ICD-10 tree) and the codes within a tree branch may be highly correlated. These codes can be used as features to build a prediction model, and an appropriate feature selection can inform a clinician about important risk factors for a disease. Traditional feature selection methods (e.g. Information Gain, T-test, etc.) consider each variable independently and usually end up with a long feature list. Recently, Lasso and related l1-penalty based feature selection methods have become popular due to their joint feature selection property. However, Lasso is known to select one feature out of many correlated features at random. This prevents clinicians from arriving at a stable feature set, which is crucial for the clinical decision making process. In this paper, we solve this problem by using a recently proposed Tree-Lasso model. Since the stability behavior of Tree-Lasso is not well understood, we study the stability behavior of Tree-Lasso and compare it with other feature selection methods. Using a synthetic and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso based feature selection is significantly more stable than Lasso and comparable to other methods, e.g. Information Gain, ReliefF and T-test. We further show that, using different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than other methods. Our result has implications in identifying stable risk factors for many healthcare problems and therefore can

  4. Ensemble feature selection algorithm based on Markov blanket and mutual information

    Institute of Scientific and Technical Information of China (English)

    姚旭; 王晓丹; 张玉玺; 权文

    2012-01-01

    To address the poor classifier performance caused by irrelevant and redundant features, a feature selection algorithm based on the approximate Markov blanket and dynamic mutual information is proposed and then built into an ensemble feature selection algorithm. In the ensemble algorithm, base classifiers are trained using Bagging together with the proposed feature selection algorithm, and base classifier diversity is used for selective ensemble. Finally, weighted voting fuses the recognition results of the selected base classifiers. To verify the method, experiments are carried out on UCI public datasets with the support vector machine (SVM) as the classifier, and the results are compared with single-SVM, Bagging-SVM, and AB-SVM. The experimental results suggest that the proposed algorithm achieves higher classification accuracy.
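
    The final fusion step, weighted voting over the selected base classifiers, can be sketched as below. How the weights are obtained is not stated in the abstract; the sketch assumes each base classifier simply carries a non-negative weight (for example a validation accuracy), and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def weighted_vote(predictions: Sequence[Hashable], weights: Sequence[float]) -> Hashable:
    """Return the label with the largest total weight among the base classifiers."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Three hypothetical base classifiers: one strong vote for "a", two weak votes for "b".
fused = weighted_vote(["a", "b", "b"], [0.9, 0.3, 0.4])
```

    With equal weights this reduces to plain majority voting; unequal weights let a reliable base classifier outvote several weak ones.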

  5. Economic indicators selection for crime rates forecasting using cooperative feature selection

    Science.gov (United States)

    Alwee, Razana; Shamsuddin, Siti Mariyam Hj; Salleh Sallehuddin, Roselina

    2013-04-01

    Feature selection in a multivariate forecasting model is very important to ensure that the model is accurate. The purpose of this study is to apply the Cooperative Feature Selection method for feature selection. The features are economic indicators that will be used in a crime rate forecasting model. Cooperative Feature Selection combines grey relational analysis and an artificial neural network to establish a cooperative model that can rank and select the significant economic indicators. Grey relational analysis is used to select the best data series to represent each economic indicator and also to rank the economic indicators according to their importance to the crime rate. After that, the artificial neural network is used to select the significant economic indicators for forecasting the crime rates. In this study, we used the economic indicators of unemployment rate, consumer price index, gross domestic product and consumer sentiment index, as well as property crime and violent crime rates for the United States. A Levenberg-Marquardt neural network is used in this study. From our experiments, we found that the consumer price index is an important economic indicator that has a significant influence on the violent crime rate. For the property crime rate, the gross domestic product, unemployment rate and consumer price index are the influential economic indicators. Cooperative Feature Selection is also found to produce smaller errors than Multiple Linear Regression in forecasting property and violent crime rates.
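
    The ranking step can be illustrated with the usual grey relational grade: the coefficient at each time point is (d_min + rho * d_max) / (d + rho * d_max), and the grade is the mean coefficient. This is a sketch under assumptions: the series are taken to be already normalized to a common scale, rho = 0.5 is the customary distinguishing coefficient rather than a value from the paper, and the numbers are made up.

```python
from typing import Dict, Sequence


def grey_relational_grades(reference: Sequence[float],
                           candidates: Dict[str, Sequence[float]],
                           rho: float = 0.5) -> Dict[str, float]:
    """Grey relational grade of each candidate series against the reference series."""
    deltas = {name: [abs(r - x) for r, x in zip(reference, series)]
              for name, series in candidates.items()}
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:
        return {name: 1.0 for name in deltas}  # every series equals the reference
    grades = {}
    for name, ds in deltas.items():
        coefficients = [(d_min + rho * d_max) / (d + rho * d_max) for d in ds]
        grades[name] = sum(coefficients) / len(coefficients)
    return grades


# Made-up normalized series: one indicator tracks the crime rate, the other moves against it.
crime_rate = [0.0, 0.5, 1.0]
grades = grey_relational_grades(crime_rate, {
    "indicator_a": [0.0, 0.5, 1.0],
    "indicator_b": [1.0, 0.5, 0.0],
})
ranked = sorted(grades, key=grades.get, reverse=True)
```

    Indicators would then be ranked by grade, and the higher-ranked ones passed on to the neural network stage.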

  6. Recognition of drosophila courtship behavior based on feature selection ensemble learning

    Institute of Scientific and Technical Information of China (English)

    谢元澄; 梁敬东; 王书平; 余倩倩; 李飞

    2011-01-01

    A machine learning classification algorithm was designed to recognize drosophila courtship behaviors automatically. After image normalization, the texture and geometry features of drosophila courtship images were obtained by extracting local binary pattern (LBP) and Walsh features. A strong classifier was constructed to detect courtship behaviors through feature selection based ensemble learning. Standard 10-fold cross validation was used to evaluate the proposed method. The experimental results showed that the proposed method performed better than traditional image segmentation methods, and that the fast selective ensemble based on feature selection was more efficient than the traditional ensemble method. Recognizing an insect's complex texture with feature selection ensemble learning is therefore feasible. Recognizing drosophila ethograms automatically through machine learning could enable large-scale behavioral screening, which should facilitate investigations of the genes and neural circuits controlling courtship and aggression.
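
    For readers unfamiliar with LBP features, the basic 3x3 operator can be sketched as below. The abstract does not specify which LBP variant is used, so the neighborhood, the bit order, and the >= comparison are assumptions, and the tiny image is made up. Walsh features are not shown.

```python
from typing import List, Sequence

# Clockwise neighbor offsets starting at the top-left pixel (an arbitrary but fixed order).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: Sequence[Sequence[int]], r: int, c: int) -> int:
    """8-bit local binary pattern of the pixel at (r, c), which must not lie on the border."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image: Sequence[Sequence[int]]) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels, used as a texture feature."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


# A bright top row above a darker region: only the three top neighbors set their bits.
patch = [[9, 9, 9],
         [1, 5, 1],
         [1, 1, 1]]
code = lbp_code(patch, 1, 1)
hist = lbp_histogram(patch)
```

    The histogram, rather than the individual codes, is what would typically be fed to feature selection and the ensemble classifier.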

  7. A Sentiment Feature Selection Method Based on Rough Set and Information Gain

    Institute of Scientific and Technical Information of China (English)

    蒲国林

    2016-01-01

    To improve the accuracy of sentiment feature extraction and lay a solid foundation for high-performance sentiment analysis, a sentiment feature selection method that combines Rough Set and Information Gain is proposed. The method first uses the Information Gain criterion to select a feature subset that is highly relevant to the class attribute. It then uses Rough Set to eliminate highly redundant features, yielding the final feature subset. Experimental results on several datasets show that the method raises accuracy by 4 to 9 percentage points over several classical methods. It is an effective feature selection method and is of clear value for improving the overall performance of sentiment analysis.
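
    The first stage, ranking features by information gain, can be sketched with the textbook definition IG(Y; X) = H(Y) - H(Y | X) for a discrete feature. This is an illustration only: the feature values and labels are made up, the names are hypothetical, and the rough set redundancy step is not shown.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> float:
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


# Made-up word-presence features for four reviews.
labels = ["pos", "pos", "neg", "neg"]
gain_informative = information_gain([1, 1, 0, 0], labels)   # word appears only in positive reviews
gain_uninformative = information_gain([1, 0, 1, 0], labels)  # word appears equally in both classes
```

    Features would be sorted by gain and the top-ranked ones kept as the relevant subset before redundancy is removed.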

  8. Making Trillion Correlations Feasible in Feature Grouping and Selection.

    Science.gov (United States)

    Zhai, Yiteng; Ong, Yew-Soon; Tsang, Ivor W

    2016-12-01

    Today, modern databases with "Big Dimensionality" are experiencing a growing trend. Existing approaches that require the calculation of pairwise feature correlations in their algorithmic designs have scored miserably on such databases, since computing the full correlation matrix (i.e., square of dimensionality in size) is computationally very intensive (i.e., a million features would translate to a trillion correlations). This poses a notable challenge that has received much less attention in the field of machine learning and data mining research. Thus, this paper presents a study to fill this gap. Our findings on several established databases with big dimensionality across a wide spectrum of domains have indicated that an extremely small portion of the feature pairs contributes significantly to the underlying interactions and that there exist feature groups that are highly correlated. Inspired by these intriguing observations, we introduce a novel learning approach that exploits the presence of sparse correlations for the efficient identification of informative and correlated feature groups from big dimensional data, which translates to a reduction in complexity from O(m^2 n) to O(m log m + Ka mn), where Ka strategy, designed to filter out the large number of non-contributing correlations that could otherwise confuse the classifier while identifying the correlated and informative feature groups, forms one of the highlights of our approach. We also demonstrated the proposed method on one-class learning, where notable speedup can be observed when solving one-class problems on big dimensional data. Further, to identify robust informative features with minimal sampling bias, our feature selection strategy embeds V-fold cross validation in the learning model, so as to seek features that exhibit stable or consistent performance accuracy on multiple data folds. Extensive empirical studies on both synthetic and several real-world datasets comprising up to 30 million

  9. A FEATURE SELECTION ALGORITHM DESIGN AND ITS IMPLEMENTATION IN INTRUSION DETECTION SYSTEM

    Institute of Scientific and Technical Information of China (English)

    杨向荣; 沈钧毅

    2003-01-01

    Objective: To present a new feature selection algorithm. Methods: The algorithm is based on rule induction and field knowledge. Results: The algorithm can be applied when capturing data flow for network intrusion detection, so that only the sub-dataset containing the discriminating features is captured. The time spent on the subsequent mining of behavior patterns is thereby reduced, and the mined patterns are more precise. Conclusion: The experimental results show that the feature subset captured by this algorithm is more informative and that the quantity of data is reduced significantly.

  10. A Hybrid method of face detection based on Feature Extraction using PIFR and Feature Optimization using TLBO

    Directory of Open Access Journals (Sweden)

    Kapil Verma

    2016-01-01

    Full Text Available In this paper we propose a face detection method based on feature selection and feature optimization. Current research in biometric security uses feature optimization to improve face detection techniques. A face basically has three types of features: skin color, texture, and shape and size. The most important of these are skin color and texture. This detection technique uses the texture features of the face image. Texture is extracted from the face image with a partial feature extraction function, which is among the most promising functions for shape feature analysis. Multi-objective TLBO is used for feature selection and feature optimization. TLBO is a population-based search technique; two constraint functions are defined for the selection and optimization process. The proposed face detection algorithm is based on this feature selection and feature optimization process. Images from a face database are first passed through the partial feature extractor function, and this transform function yields the texture features of each face image. To evaluate performance, the proposed algorithm was implemented in MATLAB 7.8.0, using face images provided by the Google face image database. Hit and miss ratios are used for numerical analysis of the results. Our empirical evaluation shows better prediction results in comparison with the PIFR method of face detection.

  11. Distance-based features in pattern classification

    Directory of Open Access Journals (Sweden)

    Lin Wei-Yang

    2011-01-01

    Full Text Available Abstract In data mining and pattern classification, feature extraction and representation are a very important step, since the extracted features have a direct and significant impact on classification accuracy. In the literature, a number of novel feature extraction and representation methods have been proposed. However, many of them focus only on specific domain problems. In this article, we introduce a novel distance-based feature extraction method for various pattern classification problems. Specifically, two distances are extracted, which are based on (1) the distance between the data and its intra-cluster center and (2) the distance between the data and its extra-cluster centers. Experiments are conducted on ten datasets containing different numbers of classes, samples, and dimensions. The experimental results using naïve Bayes, k-NN, and SVM classifiers show that concatenating the original features provided by the datasets with the distance-based features can improve classification accuracy, except on image-related datasets. In particular, the distance-based features are suitable for datasets with smaller numbers of classes, smaller numbers of samples, and lower feature dimensionality. Moreover, two datasets with similar characteristics are further used to validate this finding. The result is consistent with that of the first experiment: adding the distance-based features can improve classification performance.
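
    The two distances described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that cluster centers are plain mean vectors of already-labelled groups and that the distances to the other centers are summed into one value. Both choices, and all function names, are assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroids(points, labels):
    """Mean vector of each cluster, keyed by cluster label."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: [sum(col) / len(col) for col in zip(*pts)]
            for lab, pts in groups.items()}

def add_distance_features(points, labels):
    """Append two distance-based features to every sample:
    (1) distance to the center of its own cluster,
    (2) summed distance to all other cluster centers (an assumption)."""
    centers = centroids(points, labels)
    augmented = []
    for p, lab in zip(points, labels):
        intra = euclidean(p, centers[lab])
        extra = sum(euclidean(p, c) for k, c in centers.items() if k != lab)
        augmented.append(list(p) + [intra, extra])
    return augmented

points = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]]
labels = [0, 0, 1, 1]
print(add_distance_features(points, labels)[0])  # [0.0, 0.0, 1.0, 11.0]
```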

  12. A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data

    KAUST Repository

    Abusamra, Heba

    2013-05-01

    Microarray technology has enriched the study of gene expression in such a way that scientists are now able to measure the expression levels of thousands of genes in a single experiment. Microarray gene expression data have gained great importance in recent years due to their role in disease diagnosis and prognosis, which helps in choosing the appropriate treatment plan for patients. This technology has ushered in a new era in molecular classification, but interpreting gene expression data remains a difficult problem and an active research area because of their inherent “high dimensional low sample size” nature. Such problems pose great challenges to existing classification methods. Thus, effective feature selection techniques are often needed to help correctly classify different tumor types, and consequently lead to a better understanding of genetic signatures and improved treatment strategies. This thesis aims at a comparative study of state-of-the-art feature selection methods, classification methods, and combinations of them, based on gene expression data. We compared the efficiency of three different classification methods (support vector machines, k-nearest neighbor and random forest) and eight different feature selection methods (information gain, twoing rule, sum minority, max minority, gini index, sum of variances, t-statistics, and one-dimension support vector machine). Five-fold cross validation was used to evaluate the classification performance. Two publicly available gene expression data sets of glioma were used for this study. Different experiments were carried out to compare the performance of the classification methods with and without feature selection. Results revealed the important role of feature selection in classifying gene expression data. By performing feature selection, the classification accuracy can be significantly boosted using a small number of genes. The relationship of features selected in

  13. Feature Selection Strategies for Classifying High Dimensional Astronomical Data Sets

    CERN Document Server

    Donalek, Ciro; Djorgovski, S G; Mahabal, Ashish A; Graham, Matthew J; Fuchs, Thomas J; Turmon, Michael J; Philip, N Sajeeth; Yang, Michael Ting-Chang; Longo, Giuseppe

    2013-01-01

    The amount of collected data in many scientific fields is increasing, all of them requiring a common task: extracting knowledge from massive, multi-parametric data sets as rapidly and efficiently as possible. This is especially true in astronomy, where synoptic sky surveys are enabling new research frontiers in time domain astronomy and posing several new object classification challenges in multi-dimensional spaces; given the high number of parameters available for each object, feature selection is quickly becoming a crucial task in analyzing astronomical data sets. Using data sets extracted from the ongoing Catalina Real-Time Transient Surveys (CRTS) and the Kepler Mission, we illustrate a variety of feature selection strategies used to identify the subsets that give the most information, and the results achieved by applying these techniques to three major astronomical problems.

  14. Feature Selection for Generator Excitation Neurocontroller Development Using Filter Technique

    Directory of Open Access Journals (Sweden)

    Abdul Ghani Abro

    2011-09-01

    Full Text Available Essentially, the motive behind using a control system is to generate a suitable control signal that yields the desired response of a physical process. Control of the synchronous generator has always remained very critical in power system operation and control. For certain well-known reasons, power generators are normally operated well below their steady-state stability limit. This raises demand for efficient and fast controllers. Artificial intelligence has been reported to give revolutionary outcomes in the field of control engineering. The Artificial Neural Network (ANN), a branch of artificial intelligence, has been used for nonlinear and adaptive control, utilizing its inherent observability. The overall performance of a neurocontroller also depends on its input features. Selecting optimum features to train a neurocontroller optimally is very critical. Both the quality and the size of the data are equally important for better performance. In this work, a filter technique is employed to select independent factors for ANN training.

  15. Acute Exercise Modulates Feature-selective Responses in Human Cortex.

    Science.gov (United States)

    Bullock, Tom; Elliott, James C; Serences, John T; Giesbrecht, Barry

    2017-04-01

    An organism's current behavioral state influences ongoing brain activity. Nonhuman mammalian and invertebrate brains exhibit large increases in the gain of feature-selective neural responses in sensory cortex during locomotion, suggesting that the visual system becomes more sensitive when actively exploring the environment. This raises the possibility that human vision is also more sensitive during active movement. To investigate this possibility, we used an inverted encoding model technique to estimate feature-selective neural response profiles from EEG data acquired from participants performing an orientation discrimination task. Participants (n = 18) fixated at the center of a flickering (15 Hz) circular grating presented at one of nine different orientations and monitored for a brief shift in orientation that occurred on every trial. Participants completed the task while seated on a stationary exercise bike at rest and during low- and high-intensity cycling. We found evidence for inverted-U effects; such that the peak of the reconstructed feature-selective tuning profiles was highest during low-intensity exercise compared with those estimated during rest and high-intensity exercise. When modeled, these effects were driven by changes in the gain of the tuning curve and in the profile bandwidth during low-intensity exercise relative to rest. Thus, despite profound differences in visual pathways across species, these data show that sensitivity in human visual cortex is also enhanced during locomotive behavior. Our results reveal the nature of exercise-induced gain on feature-selective coding in human sensory cortex and provide valuable evidence linking the neural mechanisms of behavior state across species.

  16. Feature selection for face recognition: a memetic algorithmic approach

    Institute of Scientific and Technical Information of China (English)

    Dinesh KUMAR; Shakti KUMAR; C. S. RAI

    2009-01-01

    The eigenface method that uses principal component analysis (PCA) has been the standard and popular method used in face recognition. This paper presents a PCA-memetic algorithm (PCA-MA) approach for feature selection. PCA has been extended by MAs where the former was used for feature extraction/dimensionality reduction and the latter exploited for feature selection. Simulations were performed over ORL and YaleB face databases using Euclidean norm as the classifier. It was found that as far as the recognition rate is concerned, PCA-MA completely outperforms the eigenface method. We compared the performance of PCA extended with genetic algorithm (PCA-GA) with our proposed PCA-MA method. The results also clearly established the supremacy of the PCA-MA method over the PCA-GA method. We further extended linear discriminant analysis (LDA) and kernel principal component analysis (KPCA) approaches with the MA and observed significant improvement in recognition rate with fewer features. This paper also compares the performance of PCA-MA, LDA-MA and KPCA-MA approaches.

  17. Multifinger Feature Level Fusion Based Fingerprint Identification

    Directory of Open Access Journals (Sweden)

    Praveen N

    2012-12-01

    Full Text Available Fingerprint based authentication systems are one of the cost-effective biometric authentication techniques employed for personal identification. As the database population increases, fast identification/recognition algorithms with high accuracy are required. Accuracy can be increased using multimodal evidence collected from multiple biometric traits. In this work, consecutive fingerprint images are taken, global singularities are located using directional field strength, and their local orientation vector is formulated with respect to the base line of the finger. Feature-level fusion is carried out and a 32-element feature template is obtained. A matching score is formulated for identification, and 100% accuracy was obtained for a database of 300 persons. The polygonal feature vector helps to reduce the size of the feature database from the present 70-100 minutiae features to just 32 features, and a lower matching threshold can be fixed compared to single-finger based identification.

  18. Multi-Stage Recognition of Speech Emotion Using Sequential Forward Feature Selection

    Directory of Open Access Journals (Sweden)

    Liogienė Tatjana

    2016-07-01

    Full Text Available Intensive research on speech emotion recognition has introduced a huge collection of speech emotion features. Large feature sets complicate the speech emotion recognition task. Among various feature selection and transformation techniques for one-stage classification, multiple classifier systems were proposed. The main idea of multiple classifiers is to arrange the emotion classification process in stages. Besides parallel and serial cases, the hierarchical arrangement of multi-stage classification is most widely used for speech emotion recognition. In this paper, we present a sequential-forward-feature-selection-based multi-stage classification scheme. The Sequential Forward Selection (SFS) and Sequential Floating Forward Selection (SFFS) techniques were employed for every stage of the multi-stage classification scheme. Experimental testing of the proposed scheme was performed using German and Lithuanian emotional speech datasets. Sequential-feature-selection-based multi-stage classification outperformed the single-stage scheme by 12–42 % for different emotion sets. The multi-stage scheme has shown higher robustness to the growth of the emotion set. The decrease in recognition rate as the emotion set grows was 10–20 % lower for the multi-stage scheme than for the single-stage case. Differences between SFS and SFFS for feature selection were negligible.
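
    Sequential Forward Selection, as named in this abstract, greedily adds the feature that most improves a subset score. The sketch below is a generic illustration of that greedy loop and not the paper's pipeline: the `score` callback and the toy scoring function are hypothetical stand-ins for, e.g., cross-validated classifier accuracy, and the floating variant (SFFS) is not shown.

```python
def sequential_forward_selection(n_features, score, k):
    """Greedily grow a feature subset until it holds k features.

    score(subset) is any user-supplied quality measure (higher is better)."""
    selected = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        # Try each unused feature and keep the one giving the best score.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

def toy_score(subset):
    """Hypothetical score: feature 2 is strong, feature 1 is a weaker
    redundant version of it, feature 0 adds independent information."""
    s = set(subset)
    gain = 0.0
    if 2 in s:
        gain += 0.5
    elif 1 in s:
        gain += 0.4
    if 0 in s:
        gain += 0.3
    return gain - 0.01 * len(s)  # small penalty per selected feature

print(sequential_forward_selection(4, toy_score, 2))  # [2, 0]
```

    The redundant feature 1 is never picked once feature 2 is in the subset, which is the behavior that makes wrapper-style selection attractive for large, overlapping feature sets.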

  19. Survival Prediction and Feature Selection in Patients with Breast Cancer Using Support Vector Regression

    Directory of Open Access Journals (Sweden)

    Shahrbanoo Goli

    2016-01-01

    Full Text Available The Support Vector Regression (SVR) model has been broadly used for response prediction. However, few researchers have used SVR for survival analysis. In this study, a new SVR model is proposed, and SVR with different kernels and the traditional Cox model are trained. The models are compared based on different performance measures. We also select the best subset of features using three feature selection methods: a combination of SVR and statistical tests, univariate feature selection based on the concordance index, and recursive feature elimination. The evaluations are performed using available medical datasets and also a Breast Cancer (BC) dataset consisting of 573 patients who visited the Oncology Clinic of Hamadan province in Iran. Results show that, for the BC dataset, survival time can be predicted more accurately by linear SVR than by nonlinear SVR. Based on the three feature selection methods, metastasis status, progesterone receptor status, and human epidermal growth factor receptor 2 status are the features most strongly associated with survival. Also, according to the obtained results, the performance of linear and nonlinear kernels is comparable. The proposed SVR model performs similarly to or slightly better than other models. Also, SVR performs similarly to or better than Cox when all features are included in the model.
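
    The concordance index mentioned above measures how well a risk score orders survival times. A minimal sketch of Harrell's C-index for right-censored data follows; it is a textbook formulation, not code from the study, and the variable names are illustrative. For univariate feature selection, one would pass a single feature's values as `scores`.

```python
def concordance_index(times, events, scores):
    """Harrell's C-index for right-censored survival data.

    times  : observed survival or censoring times
    events : 1 if the event was observed, 0 if the subject was censored
    scores : predicted risk; a higher score means shorter expected survival
    A pair (i, j) is comparable when subject i has the shorter time and
    its event was actually observed."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0   # risk ordering agrees with outcome
                elif scores[i] == scores[j]:
                    concordant += 0.5   # ties count as half
    return concordant / comparable if comparable else float("nan")

times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
print(concordance_index(times, events, [4, 3, 2, 1]))  # 1.0
print(concordance_index(times, events, [1, 2, 3, 4]))  # 0.0
```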

  20. Discriminative multi-task feature selection for multi-modality classification of Alzheimer's disease.

    Science.gov (United States)

    Ye, Tingting; Zu, Chen; Jie, Biao; Shen, Dinggang; Zhang, Daoqiang

    2016-09-01

    Recently, multi-task based feature selection methods have been used in multi-modality based classification of Alzheimer's disease (AD) and its prodromal stage, i.e., mild cognitive impairment (MCI). However, in traditional multi-task feature selection methods, some useful discriminative information among subjects is usually not well mined for further improving the subsequent classification performance. Accordingly, in this paper, we propose a discriminative multi-task feature selection method to select the most discriminative features for multi-modality based classification of AD/MCI. Specifically, for each modality, we train a linear regression model using the corresponding modality of data, and further enforce the group-sparsity regularization on weights of those regression models for joint selection of common features across multiple modalities. Furthermore, we propose a discriminative regularization term based on the intra-class and inter-class Laplacian matrices to better use the discriminative information among subjects. To evaluate our proposed method, we perform extensive experiments on 202 subjects, including 51 AD patients, 99 MCI patients, and 52 healthy controls (HC), from the baseline MRI and FDG-PET image data of the Alzheimer's Disease Neuroimaging Initiative (ADNI). The experimental results show that our proposed method not only improves the classification performance, but also has potential to discover the disease-related biomarkers useful for diagnosis of disease, along with the comparison to several state-of-the-art methods for multi-modality based AD/MCI classification.
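
    Group-sparsity regularization of the kind described here is commonly written as an L2,1 norm on a weight matrix: a feature is kept or discarded jointly across all tasks. The sketch below only illustrates that norm and the resulting joint selection. It assumes a weight matrix `W` with one row per feature and one column per modality, with features aligned across modalities, and does not implement the regression or the discriminative Laplacian term.

```python
import math

def row_norm(row):
    """Euclidean norm of one feature's weights across all modalities."""
    return math.sqrt(sum(w * w for w in row))

def l21_norm(W):
    """L2,1 norm: sum over features (rows) of the per-row Euclidean norm."""
    return sum(row_norm(row) for row in W)

def selected_features(W, tol=1e-8):
    """Indices of features whose weight row is non-zero in some modality."""
    return [i for i, row in enumerate(W) if row_norm(row) > tol]

# Hypothetical weights: rows are features, columns are two modalities.
W = [[3.0, 4.0],   # feature 0: used by both modalities
     [0.0, 0.0],   # feature 1: discarded jointly
     [0.0, 2.0]]   # feature 2: used by the second modality only
print(l21_norm(W))           # 7.0
print(selected_features(W))  # [0, 2]
```

    Penalizing this norm pushes entire rows to zero, which is why the same features tend to be selected for every modality.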

  1. SIFT based algorithm for point feature tracking

    Directory of Open Access Journals (Sweden)

    Adrian BURLACU

    2007-12-01

    Full Text Available In this paper a tracking algorithm for SIFT features in image sequences is developed. For each point feature extracted using the SIFT algorithm, a descriptor is computed using information from its neighborhood. Point features are then tracked throughout image sequences by an algorithm that minimizes the distance between two descriptors. Experimental results, obtained from image sequences that capture the scaling of objects of different geometrical types, show the performance of the tracking algorithm.
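
    Tracking by minimizing descriptor distance can be sketched as nearest-neighbor matching between the descriptors of two consecutive frames. The code below is an illustration under assumptions: it uses Euclidean distance, toy 2-D descriptors instead of real 128-D SIFT descriptors, and a hypothetical `max_dist` rejection threshold that the abstract does not mention.

```python
import math

def match_descriptors(prev, curr, max_dist=float("inf")):
    """For each descriptor in the previous frame, return the index of the
    closest descriptor in the current frame, or None when there is no
    candidate within max_dist."""
    if not curr:
        return [None] * len(prev)
    matches = []
    for d in prev:
        dists = [math.dist(d, c) for c in curr]
        j = min(range(len(curr)), key=dists.__getitem__)
        matches.append(j if dists[j] <= max_dist else None)
    return matches

prev_frame = [[0.0, 0.0], [5.0, 5.0]]
curr_frame = [[5.0, 4.0], [0.0, 1.0]]
print(match_descriptors(prev_frame, curr_frame))                # [1, 0]
print(match_descriptors(prev_frame, curr_frame, max_dist=0.5))  # [None, None]
```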

  2. Identity Recognition Algorithm Using Improved Gabor Feature Selection of Gait Energy Image

    Science.gov (United States)

    Chao, LIANG; Ling-yao, JIA; Dong-cheng, SHI

    2017-01-01

    This paper describes an effective gait recognition approach based on Gabor features of the gait energy image. Kernel Fisher analysis combined with a kernel matrix is proposed to select dominant features. A nearest neighbor classifier based on whitened cosine distance is used to discriminate between different gait patterns. The proposed approach is tested on the CASIA and USF gait databases. The results show that our approach outperforms other state-of-the-art gait recognition approaches in terms of recognition accuracy and robustness.

  3. Linear feature detection based on ridgelet

    Institute of Scientific and Technical Information of China (English)

    HOU; Biao; (侯彪); LIU; Fang; (刘芳); JIAO; Licheng; (焦李成)

    2003-01-01

    Linear feature detection is very important in image processing. The detection efficiency directly affects the performance of pattern recognition and pattern classification. Based on the idea of the ridgelet, this paper presents a new discrete localized ridgelet transform and a new method for detecting linear features in anisotropic images. Experimental results prove the efficiency of the proposed method.

  4. Feature Vectors Selection for Fresh Peach Pest Detection Based on Hyperspectral Imaging

    Institute of Scientific and Technical Information of China (English)

    王帅帅

    2015-01-01

    This paper uses threshold segmentation and principal component analysis to process hyperspectral images and obtain a segmentation of the pest regions. Two characteristic wavelengths are then selected as spectral features, four texture parameters are extracted as texture features, and the two are combined into four sets of feature vectors. A BP neural network is used to detect fresh peach pests. The results show that the optimal feature vector combines the spectral reflectance at the 667 nm and 746 nm bands with the texture features of energy, contrast, entropy and correlation in the 270° direction. The correct recognition rate for the peaches was 100%.

  5. Feature Selection Combined with Neural Network Structure Optimization for HIV-1 Protease Cleavage Site Prediction

    Directory of Open Access Journals (Sweden)

    Hui Liu

    2015-01-01

    Full Text Available Understanding the specificity of HIV-1 protease is crucial for designing HIV-1 protease inhibitors. In this paper, a new feature selection method combined with neural network structure optimization is proposed to analyze the specificity of HIV-1 protease and to find the positions in an octapeptide that determine its cleavability. Two kinds of newly proposed features based on the Amino Acid Index database, plus traditional orthogonal encoding features, are used in this paper, taking both physicochemical and sequence information into consideration. The feature selection results show that p2, p1, p1′, and p2′ are the most important positions. Two feature fusion methods are used in this paper, combination fusion and decision fusion, aiming to obtain a comprehensive feature representation and improve prediction performance. Decision fusion of the subsets obtained after feature selection achieves excellent prediction performance, which shows that feature selection combined with decision fusion is an effective and useful method for HIV-1 protease cleavage site prediction. The results and analysis in this paper can provide useful guidance and help in designing HIV-1 protease inhibitors in the future.
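
    Orthogonal encoding of an octapeptide represents each residue as a 20-dimensional one-hot vector, giving 8 x 20 = 160 binary features. The sketch below illustrates that encoding and how the four positions named in the abstract could be sliced out. The residue ordering, the function names and the example peptide are illustrative assumptions, and the Amino Acid Index features are not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (order assumed)
POSITIONS = ["p4", "p3", "p2", "p1", "p1'", "p2'", "p3'", "p4'"]

def orthogonal_encode(peptide):
    """One-hot encode an octapeptide into a flat 160-element 0/1 vector."""
    if len(peptide) != 8:
        raise ValueError("expected an octapeptide (8 residues)")
    vec = []
    for residue in peptide:
        one_hot = [0] * len(AMINO_ACIDS)
        one_hot[AMINO_ACIDS.index(residue)] = 1  # ValueError if non-standard
        vec.extend(one_hot)
    return vec

def keep_positions(vec, names):
    """Keep only the 20-element blocks belonging to the named positions."""
    out = []
    for name in names:
        start = POSITIONS.index(name) * len(AMINO_ACIDS)
        out.extend(vec[start:start + len(AMINO_ACIDS)])
    return out

encoded = orthogonal_encode("SQNYPIVQ")  # illustrative octapeptide
core = keep_positions(encoded, ["p2", "p1", "p1'", "p2'"])
print(len(encoded), sum(encoded))  # 160 8
print(len(core), sum(core))        # 80 4
```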

  6. An Adaptive Feature and Weight Selection Method Based on Gabor Image for Face Recognition

    Institute of Scientific and Technical Information of China (English)

    刘中华; 殷俊; 金忠

    2011-01-01

    To overcome the negative effect of factors such as illumination and expression on face recognition, an adaptive feature and weight selection method based on Gabor images is proposed. First, every output image of the Gabor wavelet transform is regarded as an independent sample, and the transform results of different face images at the same scale and direction are regrouped, giving 40 independent new feature matrices. To enhance robustness to facial expression and illumination variations, the contribution of each new feature matrix is computed adaptively by the proposed adaptive weight method. Second, the discrete cosine transform is applied to each feature matrix, and the coefficients with the most power to discriminate between classes are selected by discrimination power analysis to construct feature vectors. Finally, linear discriminant analysis features are extracted to carry out the recognition task. Experiments on face databases demonstrate the effectiveness of the proposed method.

  7. A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data of Glioma

    KAUST Repository

    Abusamra, Heba

    2013-11-01

    Microarray gene expression data have gained great importance in recent years due to their role in disease diagnosis and prognosis, which helps in choosing the appropriate treatment plan for patients. This technology has ushered in a new era in molecular classification. Interpreting gene expression data remains a difficult problem and an active research area because of their inherent “high dimensional low sample size” nature. Such problems pose great challenges to existing classification methods. Thus, effective feature selection techniques are often needed to help correctly classify different tumor types, and consequently lead to a better understanding of genetic signatures and improved treatment strategies. This paper aims at a comparative study of state-of-the-art feature selection methods, classification methods, and combinations of them, based on gene expression data. We compared the efficiency of three different classification methods (support vector machines, k-nearest neighbor and random forest) and eight different feature selection methods (information gain, twoing rule, sum minority, max minority, gini index, sum of variances, t-statistics, and one-dimension support vector machine). Five-fold cross validation was used to evaluate the classification performance. Two publicly available gene expression data sets of glioma were used in the experiments. Results revealed the important role of feature selection in classifying gene expression data. By performing feature selection, the classification accuracy can be significantly boosted using a small number of genes. The relationship between the features selected by different feature selection methods is investigated, and the features most frequently selected in each fold among all methods are evaluated for both datasets.
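
    Of the eight selection criteria listed, information gain is the simplest to illustrate: it is the reduction in class-label entropy obtained by splitting the samples on a feature. The sketch below assumes the expression values have already been discretized (for example into "lo" and "hi"); that discretization step, the function names and the toy data are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

tumor_type = [0, 0, 1, 1]
gene_a = ["lo", "lo", "hi", "hi"]  # separates the classes perfectly
gene_b = ["lo", "hi", "lo", "hi"]  # carries no class information
print(information_gain(gene_a, tumor_type))  # 1.0
print(information_gain(gene_b, tumor_type))  # 0.0
```

    Ranking genes by this value and keeping the top few is one way a filter method reduces thousands of genes to a small subset before classification.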

  8. Efficient sparse kernel feature extraction based on partial least squares.

    Science.gov (United States)

    Dhanjal, Charanpal; Gunn, Steve R; Shawe-Taylor, John

    2009-08-01

    The presence of irrelevant features in training data is a significant obstacle for many machine learning tasks. One approach to this problem is to extract appropriate features and, often, one selects a feature extraction method based on the inference algorithm. Here, we formalize a general framework for feature extraction, based on Partial Least Squares, in which one can select a user-defined criterion to compute projection directions. The framework draws together a number of existing results and provides additional insights into several popular feature extraction methods. Two new sparse kernel feature extraction methods are derived under the framework, called Sparse Maximal Alignment (SMA) and Sparse Maximal Covariance (SMC), respectively. Key advantages of these approaches include simple implementation and a training time which scales linearly in the number of examples. Furthermore, one can project a new test example using only k kernel evaluations, where k is the output dimensionality. Computational results on several real-world data sets show that SMA and SMC extract features which are as predictive as those found using other popular feature extraction methods. Additionally, on large text retrieval and face detection data sets, they produce features which match the performance of the original ones in conjunction with a Support Vector Machine.

  9. Automatic White Balance Method Based on Self-adaptive Feature Selection

    Institute of Scientific and Technical Information of China (English)

    印蔚蔚

    2015-01-01

    This paper presents an automatic white balance method based on self-adaptive feature selection. The method uses three kinds of neutral-color features: the features of a modified grey-world method, the features of the white-patch method, and the features of linear color clustering. These features are complementary and computationally efficient. Test results show that the proposed algorithm outperforms traditional assumption-based methods and achieves reasonable color compensation over a wide range of scenes.
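
    For orientation, the classic grey-world assumption that one of the three features builds on can be sketched in a few lines: scale each channel so that the three channel means become equal. This is the plain textbook grey-world method, not the modified version or the adaptive feature selection described in the abstract; function names and the toy pixels are illustrative.

```python
def grey_world_gains(pixels):
    """pixels: list of (r, g, b) tuples. Returns per-channel gains that
    move each channel mean to the common average of the three means."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    grey = sum(means) / 3
    # Leave a channel untouched if it is entirely zero.
    return [grey / m if m else 1.0 for m in means]

def apply_gains(pixels, gains, max_value=255):
    """Scale every pixel by the channel gains and clip to max_value."""
    return [tuple(min(max_value, round(v * g)) for v, g in zip(p, gains))
            for p in pixels]

# A flat image with a red cast.
image = [(100, 50, 50), (100, 50, 50)]
gains = grey_world_gains(image)
print(apply_gains(image, gains))  # [(67, 67, 67), (67, 67, 67)]
```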

  10. Text feature selection method based on hybrid clone quantum genetic strategy

    Institute of Scientific and Technical Information of China (English)

    符保龙

    2012-01-01

    Introducing vector reduction rate and classification accuracy as metrics, encoding the genetic algorithm with qubits, and incorporating a clone operator, this paper proposes a text feature selection method based on a hybrid clone quantum genetic strategy. Experimental results show that the method effectively reduces the dimensionality of text feature vectors and that the extracted feature subset effectively improves the accuracy of text classification.

  11. Computer-aided Selection System for Cutting Tools and Parameters Based on Machining Features

    Institute of Scientific and Technical Information of China (English)

    郝传海; 刘战强; 任小平; 万熠

    2012-01-01

    Cutting tool manufacturers are facing increasing demands to supply a comprehensive advice service on the selection of appropriate tools and cutting parameters for a wide variety of part materials and machining features. Selecting the appropriate cutting tools and machining parameters is also the central element of process planning. However, attention has mainly been paid to the part materials only, which causes mismatches between workpieces and tools. This study describes the development of a computer-aided selection system for cutting tools and cutting parameters based on machining features (FTCPS), which is designed to cover different component shapes, including turning, milling, drilling and boring operation features. The system uses the kinematic link between machined surface features, with a simple icon-based interface for inputting data records, and a relational database combined with a data-driven method and rule-based decision logic to select cutting tool geometry and machining parameters for a range of machining operations. The system also uses a mathematical model to calculate processing conditions (machining time in a single path, cutting power, maximum harshness, etc.), after which process planning is completed. The selection of turning tools and turning parameters is used as an example to show how the system is realized. FTCPS will help designers and manufacturing planners to select the optimal set of cutting tools and cutting conditions.

  12. Cost effective approach on feature selection using genetic algorithms and fuzzy logic for diabetes diagnosis

    CERN Document Server

    Ephzibah, E P

    2011-01-01

    A way to enhance the performance of a model that combines genetic algorithms and fuzzy logic for feature selection and classification is proposed. Early diagnosis of any disease with less cost is preferable. Diabetes is one such disease. Diabetes has become the fourth leading cause of death in developed countries and there is substantial evidence that it is reaching epidemic proportions in many developing and newly industrialized nations. In medical diagnosis, patterns consist of observable symptoms along with the results of diagnostic tests. These tests have various associated costs and risks. In the automated design of pattern classification, the proposed system solves the feature subset selection problem. It is a task of identifying and selecting a useful subset of pattern-representing features from a larger set of features. Using fuzzy rule-based classification system, the proposed system proves to improve the classification accuracy.

  13. Textural feature selection for enhanced detection of stationary humans in through-the-wall radar imagery

    Science.gov (United States)

    Chaddad, A.; Ahmad, F.; Amin, M. G.; Sevigny, P.; DiFilippo, D.

    2014-05-01

    Feature-based methods have recently been considered in the literature for detection of stationary human targets in through-the-wall radar imagery. Specifically, textural features, such as contrast, correlation, energy, entropy, and homogeneity, have been extracted from gray-level co-occurrence matrices (GLCMs) to aid in discriminating the true targets from multipath ghosts and clutter that closely mimic the target in size and intensity. In this paper, we address the task of feature selection to identify the relevant subset of features in the GLCM domain, while discarding those that are either redundant or confusing, thereby improving the ability of the feature-based scheme to distinguish between targets and ghosts/clutter. We apply a Decision Tree algorithm to find the optimal combination of co-occurrence-based textural features for the problem at hand. We employ a K-Nearest Neighbor classifier to evaluate the performance of the optimal textural-feature-based scheme in terms of its target and ghost/clutter discrimination capability, and we use real data collected with the vehicle-borne multi-channel through-the-wall radar imaging system of Defence Research and Development Canada. For the specific data analyzed, it is shown that the identified dominant features yield a higher classification accuracy, with a lower number of false alarms and missed detections, compared to the full GLCM-based feature set.
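
    As a rough illustration of the textural features named above, the sketch below builds a normalized, symmetric GLCM for one pixel offset and derives contrast, energy, homogeneity, and entropy from it. The function names are hypothetical, correlation is omitted for brevity, and the homogeneity form 1 / (1 + |i - j|) is one common convention rather than the one confirmed by the paper.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric co-occurrence matrix for the offset (dx, dy)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1  # count the pair in both directions
                counts[j][i] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Texture descriptors computed from a normalized GLCM p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }


flat = texture_features(glcm([[1, 1], [1, 1]], levels=2))
checker = texture_features(glcm([[0, 1], [1, 0]], levels=2))
```

    A flat patch gives zero contrast and entropy, while the checkerboard patch gives higher contrast; a selector such as a decision tree would then rank features like these.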

  14. A Novel Feature Extraction Method with Feature Selection to Identify Golgi-Resident Protein Types from Imbalanced Data.

    Science.gov (United States)

    Yang, Runtao; Zhang, Chengjin; Gao, Rui; Zhang, Lina

    2016-02-06

    The Golgi Apparatus (GA) is a major collection and dispatch station for numerous proteins destined for secretion, plasma membranes, and lysosomes. The dysfunction of GA proteins can result in neurodegenerative diseases. Therefore, accurate identification of protein sub-Golgi localizations may assist in drug development and in understanding the mechanisms of the GA involved in various cellular processes. In this paper, a new computational method is proposed for distinguishing cis-Golgi proteins from trans-Golgi proteins. Based on the concept of Common Spatial Patterns (CSP), a novel feature extraction technique is developed to extract evolutionary information from protein sequences. To deal with the imbalanced benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is adopted. A feature selection method called Random Forest-Recursive Feature Elimination (RF-RFE) is employed to search for the optimal features among the CSP-based features and the g-gap dipeptide composition. Based on the optimal features, a Random Forest (RF) module is used to distinguish cis-Golgi proteins from trans-Golgi proteins. In jackknife cross-validation, the proposed method achieves a promising performance with a sensitivity of 0.889, a specificity of 0.880, an accuracy of 0.885, and a Matthews Correlation Coefficient (MCC) of 0.765, which remarkably outperforms previous methods. Moreover, when tested on a common independent dataset, our method also achieves a significantly improved performance. These results highlight the promising performance of the proposed method in identifying Golgi-resident protein types. Furthermore, the CSP-based feature extraction method may provide guidelines for protein function prediction.
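
    The four metrics reported above all follow from a binary confusion matrix. The sketch below shows the standard definitions; the counts used in the example are made up for illustration and are not taken from the paper.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # MCC is conventionally set to 0 when any marginal count is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical counts: 10 positives (8 found), 10 negatives (9 found).
example = binary_metrics(tp=8, tn=9, fp=1, fn=2)
```

    Unlike accuracy, MCC stays informative on imbalanced data, which is why it is often reported alongside sensitivity and specificity.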

  15. Ontology Based Feature Driven Development Life Cycle

    Directory of Open Access Journals (Sweden)

    Farheen Siddiqui

    2012-01-01

    The upcoming technology support for the semantic web promises fresh directions for the software engineering community. The semantic web also has its roots in knowledge engineering, which prompts software engineers to look for applications of ontologies throughout the software engineering lifecycle. The internal components of a semantic web application are "lightweight" and may be held to lower quality standards than the externally visible modules; in fact, the internal components are generated from the external (ontological) component. For this reason, agile development approaches such as feature driven development (FDD) are suitable for developing an application's internal components. As yet, no particular procedure describes the role of ontology in FDD processes. We therefore propose an ontology-based feature driven development for semantic web applications that can be used from application model development through feature design and implementation. Features are precisely defined in the OWL-based domain model, and the transition from the OWL-based domain model to the feature list is defined directly in transformation rules. In addition, the ontology-based overall model can be easily validated with automated tools. The advantages of ontology-based feature driven development are also discussed.

  16. Feature Extraction for Breast Cancer Data Based on Geometric Algebra Theory and Feature Selection Using Differential Evolution

    Institute of Scientific and Technical Information of China (English)

    李静; 洪文学

    2014-01-01

    Feature extraction and feature selection are important issues in pattern recognition. Based on the geometric algebra representation of vectors, a new feature extraction method using the blade coefficients of geometric algebra is proposed in this study. The method raises the feature dimension, so an improved differential evolution (DE) feature selection method is also proposed to address this problem. Simple linear discriminant analysis is used as the classifier. In 10-fold cross-validation (10 CV) on a public breast cancer biomedical dataset, the classification accuracy exceeded 96%, which is superior to the performance obtained with the original features and with traditional feature extraction methods.

  17. A Text Feature Selection Algorithm Based on Analysing the Relationship Between Words

    Institute of Scientific and Technical Information of China (English)

    吴双; 张文生; 徐海瑞

    2012-01-01

    Traditional feature selection methods usually apply a feature evaluation function to pick, from the original word set, the features with the greatest power to discriminate between document categories. These methods rest on a vector space model that treats each word as an independent semantic unit, so they ignore the associations between words and struggle to highlight the key features of the text. To address these shortcomings, a new text feature selection algorithm based on the relationship between words is presented. The algorithm considers the words that play a key role in representing the text, uses an association rule mining algorithm to discover associations between words, and filters the strong association rules by correlation analysis, producing a feature space that is closely related to the category attribute. Experiments indicate that this method represents the semantic content of documents better and classifies better than traditional algorithms.
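
    A minimal sketch of mining word-to-word association rules is given below. It assumes the usual support and confidence definitions over documents treated as word sets, and it uses lift as a stand-in for the correlation analysis step; the thresholds, function name, and sample documents are hypothetical.

```python
from itertools import permutations


def word_rules(docs, min_support=0.5, min_confidence=0.8):
    """Return rules (a, b, support, confidence, lift) meaning 'a implies b'."""
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    support = {w: sum(w in d for d in doc_sets) / n for w in vocab}
    rules = []
    for a, b in permutations(vocab, 2):
        both = sum(a in d and b in d for d in doc_sets) / n
        confidence = both / support[a]
        if both >= min_support and confidence >= min_confidence:
            # lift > 1 suggests a and b co-occur more often than by chance
            lift = both / (support[a] * support[b])
            rules.append((a, b, both, confidence, lift))
    return rules


docs = [
    ["stock", "market", "rise"],
    ["stock", "market", "fall"],
    ["football", "match"],
    ["stock", "market"],
]
rules = word_rules(docs)
```

    Here only the pair "stock" and "market" passes both thresholds, so the two words could be treated as one associated feature instead of two independent ones.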

  18. Feature Selection of Speech Emotional Recognition Based on Ant Colony Optimization Algorithm

    Institute of Scientific and Technical Information of China (English)

    杨鸿章

    2013-01-01

    Emotional feature extraction is key to accurate speech emotion recognition. Traditional methods use either a single feature or a simple combination of features: a single feature cannot fully reflect changes in speech emotion, while a simple combination introduces many redundant features that degrade recognition. Because speech emotion information is high-dimensional and redundant, this paper puts forward a speech emotion recognition model that selects features using an ant colony optimization algorithm in order to improve recognition accuracy. The fitness function is a weighted combination of the KNN classification accuracy and the dimension of the selected feature subset, and the ant colony optimization algorithm provides good global search capability and multiple sub-optimal solutions. A local refinement search scheme is designed to exclude redundant features and improve the convergence rate. The method was tested on a Chinese emotional speech database and a Danish emotional speech database. The simulation results show that the proposed method not only eliminates redundant and useless features, reducing the feature dimension, but also improves the speech emotion recognition rate, making it an effective speech emotion recognition method.
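
    The weighted objective described above can be written in a few lines. The exact form and the weight used in the paper are not given in the abstract, so the linear combination and the default weight below are assumptions.

```python
def subset_fitness(accuracy, n_selected, n_total, weight=0.9):
    """Weighted sum of classifier accuracy and a reward for small subsets.

    weight close to 1 favours accuracy; the remainder rewards dropping features.
    """
    compactness = 1 - n_selected / n_total
    return weight * accuracy + (1 - weight) * compactness


# Same accuracy, fewer features: the smaller subset scores higher.
small = subset_fitness(accuracy=0.8, n_selected=10, n_total=100)
large = subset_fitness(accuracy=0.8, n_selected=60, n_total=100)
```

    Each ant would build a candidate subset and be scored with a function of this kind, with the accuracy supplied by a KNN classifier.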

  19. Development of in Silico Models for Predicting P-Glycoprotein Inhibitors Based on a Two-Step Approach for Feature Selection and Its Application to Chinese Herbal Medicine Screening.

    Science.gov (United States)

    Yang, Ming; Chen, Jialei; Shi, Xiufeng; Xu, Liwen; Xi, Zhijun; You, Lisha; An, Rui; Wang, Xinhong

    2015-10-01

    P-glycoprotein (P-gp) is regarded as an important factor in determining the ADMET (absorption, distribution, metabolism, elimination, and toxicity) characteristics of drugs and drug candidates. Successful prediction of P-gp inhibitors can thus lead to an improved understanding of the mechanisms underlying both changes in the pharmacokinetics of drugs and drug-drug interactions. There has therefore been considerable interest in the in silico modeling of P-gp inhibitors in recent years. Because a large number of molecular descriptors are used to characterize structurally diverse molecules, efficient feature selection methods are required to extract the most informative predictors. In this work, we constructed an extensive data set of 2428 molecules, comprising 1518 P-gp inhibitors and 910 P-gp noninhibitors, from multiple resources. Importantly, a two-step feature selection approach based on a genetic algorithm and a greedy forward-searching algorithm was employed to select the minimum set of the most informative descriptors that contribute to the prediction of P-gp inhibitors. To determine the best machine learning algorithm, 18 classifiers coupled with the feature selection method were compared. The three best-performing models (flexible discriminant analysis, support vector machine, and random forest) and their ensemble model, using only 3, 9, 7, and 14 descriptors, respectively, achieve an overall accuracy of 83.2%-86.7% for the training set containing 1040 compounds, an overall accuracy of 82.3%-85.5% for the test set containing 1039 compounds, and a prediction accuracy of 77.4%-79.9% for the external validation set containing 349 compounds. The models were further extensively validated on the DrugBank database (1890 compounds). The proposed models are competitive with, and in some cases better than, other published models in terms of prediction accuracy and minimum number of descriptors. Applicability domain then was addressed
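
    The greedy forward-searching step mentioned above can be sketched as follows. This is a generic forward selection loop, not the authors' implementation; the descriptor names, weights, and the toy scoring function are hypothetical and stand in for a cross-validated model score.

```python
def forward_select(candidates, score, min_gain=1e-9):
    """Add the descriptor that most improves the score until nothing helps."""
    selected, best = [], score([])
    remaining = list(candidates)
    while remaining:
        trials = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(trials, key=lambda t: t[0])
        if top_score - best <= min_gain:
            break  # no remaining descriptor gives a real improvement
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


# Hypothetical descriptor weights; each extra descriptor costs 0.05.
WEIGHTS = {"logP": 0.5, "mol_weight": 0.3, "h_bond_donors": 0.0}


def toy_score(features):
    return sum(WEIGHTS[f] for f in features) - 0.05 * len(features)


chosen, final_score = forward_select(WEIGHTS, toy_score)
```

    In a two-step scheme, a genetic algorithm would first narrow the descriptor pool and a loop like this one would then pick the smallest useful set from it.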

  20. A Feature Subset Selection Algorithm Based on Neighborhood Rough Set for Incremental Updating Datasets

    Institute of Scientific and Technical Information of China (English)

    李楠; 谢娟英

    2011-01-01

    A feature subset selection algorithm based on neighborhood rough set theory is presented for datasets with continuous attributes that are updated incrementally as new samples arrive. Adding samples can change the optimal attribute reduct of a dataset. The paper analyses how a newly added sample affects the positive region and, case by case, selectively updates the feature subset (attribute reduct) of the original dataset to obtain the optimal reduct of the enlarged dataset. This selective updating of the original reduct avoids repeated operations and lowers the complexity of the feature subset selection algorithm; only in the worst case must the whole dataset be reduced again. Finally, a real example is given to demonstrate the algorithm; it shows that analysing the new sample first and then selectively reducing the new dataset avoids repeated operations and yields the optimal attribute reduct of the new dataset.
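
    The positive region that the abstract refers to can be illustrated with a small sketch. It assumes the common neighborhood rough set definitions: a sample's neighborhood is every sample within distance delta on the chosen features, and the sample belongs to the positive region when its whole neighborhood shares its class. The Chebyshev distance, the delta value, and the data are illustrative choices.

```python
def neighborhood(i, X, features, delta):
    """Indices of samples within delta of sample i on the chosen features."""
    def dist(a, b):
        return max(abs(a[f] - b[f]) for f in features)  # Chebyshev distance
    return [j for j in range(len(X)) if dist(X[i], X[j]) <= delta]


def positive_region(X, y, features, delta):
    """Samples whose entire neighborhood carries the same class label."""
    return [i for i in range(len(X))
            if all(y[j] == y[i] for j in neighborhood(i, X, features, delta))]


def dependency(X, y, features, delta):
    """Fraction of samples in the positive region (quality of the subset)."""
    return len(positive_region(X, y, features, delta)) / len(X)


X = [[0.10, 0.5], [0.15, 0.9], [0.80, 0.5], [0.85, 0.1]]
y = [0, 0, 1, 1]
dep_first = dependency(X, y, features=[0], delta=0.1)
dep_second = dependency(X, y, features=[1], delta=0.1)
```

    Feature 0 alone separates the classes here, so its dependency is 1. An incremental method would check whether a newly added sample changes this positive region before deciding to recompute the reduct.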

  1. Feature selection and classification methodology for the detection of knee-joint disorders.

    Science.gov (United States)

    Nalband, Saif; Sundar, Aditya; Prince, A Amalin; Agarwal, Anita

    2016-04-01

    Vibroarthrographic (VAG) signals emitted from the knee joint provide an early diagnostic tool for knee-joint disorders. The nonstationary and nonlinear nature of the VAG signal makes feature extraction an important step. In this work, we investigate VAG signals using a wavelet-based decomposition, in which the signals are decomposed into sub-band signals of different frequencies. Nonlinear features such as recurrence quantification analysis (RQA), approximate entropy (ApEn), and sample entropy (SampEn) are extracted from the VAG signal. A total of twenty-four features form a vector that characterizes a VAG signal. Two feature selection (FS) techniques, the apriori algorithm and a genetic algorithm (GA), select six and four features, respectively, as the most significant. Least squares support vector machines (LS-SVM) and random forest are proposed as classifiers to evaluate the performance of the FS techniques. Results indicate that classification accuracy was higher with features selected by the FS algorithms. LS-SVM using the apriori algorithm gives the highest accuracy, 94.31%, with a false discovery rate (FDR) of 0.0892. The proposed work also provides better classification accuracy than previous studies, which reported an accuracy of 88%. This work can enhance the performance of existing technology for accurately distinguishing normal from abnormal VAG signals, and the proposed methodology could provide an effective non-invasive diagnostic tool for knee-joint disorders.
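
    Of the nonlinear features listed above, sample entropy is the easiest to show compactly. The sketch follows the usual definition (template length m, tolerance r, self-matches excluded); here r is an absolute tolerance, whereas practical code often sets it to a fraction of the signal's standard deviation. The parameter defaults are assumptions, not the paper's settings.

```python
import math


def sample_entropy(x, m=2, r=0.2):
    """SampEn = -ln(A / B), with B and A the matching pairs of length m and m + 1."""
    n = len(x)

    def count_matches(length):
        # Use the same number of templates (n - m) for both lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                gap = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if gap <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no matches are found
    return -math.log(a / b)


constant = sample_entropy([1.0] * 10)
periodic = sample_entropy([0.0, 1.0] * 10, m=2, r=0.1)
```

    Perfectly regular signals give a value of zero; irregular signals, such as those expected from abnormal joints, give larger values.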

  2. Ear Recognition Based on Gabor Features and KFDA

    Directory of Open Access Journals (Sweden)

    Li Yuan

    2014-01-01

    We propose an ear recognition system based on 2D ear images which includes three stages: ear enrollment, feature extraction, and ear recognition. Ear enrollment includes ear detection and ear normalization. The ear detection approach, based on an improved Adaboost algorithm, detects the ear under complex backgrounds in two steps: offline cascaded classifier training and online ear detection. An Active Shape Model is then applied to segment the ear and normalize all ear images to the same size. Because of its excellent characteristics in spatial local feature extraction and orientation selection, Gabor-filter-based ear feature extraction is presented in this paper. Kernel Fisher Discriminant Analysis (KFDA) is then applied for dimension reduction of the high-dimensional Gabor features. Finally, a distance-based classifier is applied for ear recognition. Experimental results of ear recognition on two datasets (the USTB and UND datasets) and the performance of the ear authentication system show the feasibility and effectiveness of the proposed approach.

  3. Ear recognition based on Gabor features and KFDA.

    Science.gov (United States)

    Yuan, Li; Mu, Zhichun

    2014-01-01

    We propose an ear recognition system based on 2D ear images which includes three stages: ear enrollment, feature extraction, and ear recognition. Ear enrollment includes ear detection and ear normalization. The ear detection approach, based on an improved Adaboost algorithm, detects the ear under complex backgrounds in two steps: offline cascaded classifier training and online ear detection. An Active Shape Model is then applied to segment the ear and normalize all ear images to the same size. Because of its excellent characteristics in spatial local feature extraction and orientation selection, Gabor-filter-based ear feature extraction is presented in this paper. Kernel Fisher Discriminant Analysis (KFDA) is then applied for dimension reduction of the high-dimensional Gabor features. Finally, a distance-based classifier is applied for ear recognition. Experimental results of ear recognition on two datasets (the USTB and UND datasets) and the performance of the ear authentication system show the feasibility and effectiveness of the proposed approach.

  4. Robust feature-based object tracking

    Science.gov (United States)

    Han, Bing; Roberts, William; Wu, Dapeng; Li, Jian

    2007-04-01

    Object tracking is an important component of many computer vision systems. It is widely used in video surveillance, robotics, 3D image reconstruction, medical imaging, and human computer interface. In this paper, we focus on unsupervised object tracking, i.e., without prior knowledge about the object to be tracked. To address this problem, we take a feature-based approach, i.e., using feature points (or landmark points) to represent objects. Feature-based object tracking consists of feature extraction and feature correspondence. Feature correspondence is particularly challenging since a feature point in one image may have many similar points in another image, resulting in ambiguity in feature correspondence. To resolve the ambiguity, algorithms, which use exhaustive search and correlation over a large neighborhood, have been proposed. However, these algorithms incur high computational complexity, which is not suitable for real-time tracking. In contrast, Tomasi and Kanade's tracking algorithm only searches corresponding points in a small candidate set, which significantly reduces computational complexity; but the algorithm may lose track of feature points in a long image sequence. To mitigate the limitations of the aforementioned algorithms, this paper proposes an efficient and robust feature-based tracking algorithm. The key idea of our algorithm is as below. For a given target feature point in one frame, we first find a corresponding point in the next frame, which minimizes the sum-of-squared-difference (SSD) between the two points; then we test whether the corresponding point gives large value under the so-called Harris criterion. If not, we further identify a candidate set of feature points in a small neighborhood of the target point; then find a corresponding point from the candidate set, which minimizes the SSD between the two points. The algorithm may output no corresponding point due to disappearance of the target point. 
Our algorithm is capable of tracking
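
    The SSD matching step in the abstract above can be sketched on tiny grayscale frames. The code only covers the candidate-set search by sum of squared differences; the Harris criterion test is left out, and the frame contents, patch size, and function names are illustrative assumptions.

```python
def patch(img, r, c, half=1):
    """Square window of side 2 * half + 1 centred on (r, c)."""
    return [row[c - half:c + half + 1] for row in img[r - half:r + half + 1]]


def ssd(patch_a, patch_b):
    """Sum of squared differences between two equally sized patches."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(patch_a, patch_b)
               for a, b in zip(row_a, row_b))


def best_match(prev, nxt, point, candidates, half=1):
    """Candidate in the next frame whose patch is closest to the target patch."""
    target = patch(prev, point[0], point[1], half)
    return min(candidates, key=lambda p: ssd(target, patch(nxt, p[0], p[1], half)))


def frame_with_dot(r, c, size=5):
    """Blank frame with a single bright pixel (toy stand-in for a feature)."""
    img = [[0] * size for _ in range(size)]
    img[r][c] = 9
    return img


prev_frame = frame_with_dot(2, 2)
next_frame = frame_with_dot(2, 3)  # the feature moved one pixel to the right
match = best_match(prev_frame, next_frame, (2, 2), [(2, 2), (2, 3), (1, 2)])
```

    Restricting the search to a small candidate set is what keeps the cost low compared with an exhaustive correlation search over a large neighborhood.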

  5. Texture feature based liver lesion classification

    Science.gov (United States)

    Doron, Yeela; Mayer-Wolf, Nitzan; Diamant, Idit; Greenspan, Hayit

    2014-03-01

    Liver lesion classification is a difficult clinical task. Computerized analysis can support clinical workflow by enabling more objective and reproducible evaluation. In this paper, we evaluate the contribution of several types of texture features for a computer-aided diagnostic (CAD) system which automatically classifies liver lesions from CT images. Based on the assumption that liver lesions of various classes differ in their texture characteristics, a variety of texture features were examined as lesion descriptors. Although texture features are often used for this task, there is currently a lack of detailed research comparing different texture features, or their combinations, on a given dataset. In this work we investigated the performance of Gray Level Co-occurrence Matrix (GLCM), Local Binary Patterns (LBP), Gabor, gray level intensity values, and Gabor-based LBP (GLBP) features, where the features are obtained from a given lesion's region of interest (ROI). For the classification module, SVM and KNN classifiers were examined. Using a single type of texture feature, the best result, 91% accuracy, was obtained with Gabor filtering and SVM classification. Combining Gabor, LBP, and intensity features improved the results to a final accuracy of 97%.
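
    As a small illustration of one of the descriptors compared above, the sketch below computes the basic 8-neighbor Local Binary Pattern code and a histogram of codes over a region. The neighbor ordering and the "greater than or equal" threshold are one common convention; the paper's exact LBP variant is not stated in the abstract.

```python
# Clockwise neighbor offsets starting at the top-left pixel (assumed ordering).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """8-bit code: bit k is set when neighbor k is at least as bright as the centre."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels of a region."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


roi = [[1, 2, 3],
       [8, 5, 4],
       [7, 6, 5]]
code = lbp_code(roi, 1, 1)
```

    The histogram of codes over a lesion's region of interest is the texture descriptor that would be passed to a classifier such as SVM or KNN.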

  6. A new ensemble feature selection and its application to pattern classification

    Institute of Scientific and Technical Information of China (English)

    Dongbo ZHANG; Yaonan WANG

    2009-01-01

    A neural network ensemble based on rough set reducts is proposed to decrease the computational complexity of conventional ensemble feature selection algorithms. First, a dynamic reduction technique combining a genetic algorithm with a resampling method is adopted to obtain reducts with good generalization ability. Second, multiple BP neural networks based on different reducts are built as base classifiers. Following the idea of selective ensembles, search strategies are used to find the neural network ensemble with the best generalization ability. Finally, classification is performed by combining the predictions of the component networks through voting. The method has been verified in classification experiments on a remote sensing image and five UCI datasets. Compared with conventional ensemble feature selection algorithms, it takes less time and has lower computational complexity, and its classification accuracy is satisfactory.
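
    The final voting step can be sketched independently of the networks themselves. The function below takes one list of predicted labels per base classifier and returns the majority label for each sample; the label values are placeholders, and ties are resolved in favour of the label seen first.

```python
from collections import Counter


def majority_vote(predictions):
    """predictions: one equally long label list per base classifier."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Three hypothetical base classifiers voting on three samples.
combined = majority_vote([
    ["water", "forest", "water"],
    ["water", "water", "forest"],
    ["forest", "forest", "forest"],
])
```

    Because each base network is trained on a different reduct, their errors tend to differ, which is what makes voting worthwhile.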

  7. Reducing Sweeping Frequencies in Microwave NDT Employing Machine Learning Feature Selection.

    Science.gov (United States)

    Moomen, Abdelniser; Ali, Abdulbaset; Ramahi, Omar M

    2016-04-19

    Nondestructive Testing (NDT) assessment of materials' health condition is useful for distinguishing healthy from unhealthy structures or detecting flaws in metallic or dielectric structures. Performing structural health testing for coated/uncoated metallic or dielectric materials with the same testing equipment requires a testing method that works on both metals and dielectrics, such as microwave testing. Reducing the complexity and expense associated with current diagnostic practices in microwave NDT of structural health requires an effective and intelligent approach based on the feature selection and classification techniques of machine learning. Current microwave NDT methods are in general based on measuring variation in the S-matrix over the entire operating frequency range of the sensors. For instance, assessing the health of metallic structures using a microwave sensor depends on measurements of the reflection and/or transmission coefficients as a function of the sweeping frequencies of the operating band. The aim of this work is to reduce the number of sweeping frequencies using machine learning feature selection techniques. By treating sweeping frequencies as features, the most important features can be identified, and then only the most influential features (frequencies) are considered when building the microwave NDT equipment. The proposed method of reducing sweeping frequencies was validated experimentally using a waveguide sensor and a metallic plate with different cracks. The feature selection techniques investigated include information gain, gain ratio, relief, and chi-squared. The effectiveness of the selected features was validated through performance evaluations of various classification models, namely Nearest Neighbor, Neural Networks, Random Forest, and Support Vector Machine. Results showed good crack classification accuracy rates after employing feature selection algorithms.
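
    Treating each sweeping frequency as a feature, information gain ranks frequencies by how much they reduce uncertainty about the class. The sketch below assumes the measurements have already been discretized into levels; the frequency labels, levels, and class names are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus their entropy after splitting on the feature."""
    n = len(labels)
    conditional = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def rank_features(features, labels):
    """Feature names sorted from most to least informative."""
    return sorted(features, key=lambda name: information_gain(features[name], labels),
                  reverse=True)


labels = ["crack", "crack", "ok", "ok"]
features = {
    "8.2 GHz": ["hi", "lo", "hi", "lo"],   # uninformative in this toy data
    "9.0 GHz": ["hi", "hi", "lo", "lo"],   # separates the classes perfectly
}
ranking = rank_features(features, labels)
```

    Only the top-ranked frequencies would then need to be swept by the hardware.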

  8. Reducing Sweeping Frequencies in Microwave NDT Employing Machine Learning Feature Selection

    Directory of Open Access Journals (Sweden)

    Abdelniser Moomen

    2016-04-01

    Nondestructive Testing (NDT) assessment of materials' health condition is useful for distinguishing healthy from unhealthy structures or detecting flaws in metallic or dielectric structures. Performing structural health testing for coated/uncoated metallic or dielectric materials with the same testing equipment requires a testing method that works on both metals and dielectrics, such as microwave testing. Reducing the complexity and expense associated with current diagnostic practices in microwave NDT of structural health requires an effective and intelligent approach based on the feature selection and classification techniques of machine learning. Current microwave NDT methods are in general based on measuring variation in the S-matrix over the entire operating frequency range of the sensors. For instance, assessing the health of metallic structures using a microwave sensor depends on measurements of the reflection and/or transmission coefficients as a function of the sweeping frequencies of the operating band. The aim of this work is to reduce the number of sweeping frequencies using machine learning feature selection techniques. By treating sweeping frequencies as features, the most important features can be identified, and then only the most influential features (frequencies) are considered when building the microwave NDT equipment. The proposed method of reducing sweeping frequencies was validated experimentally using a waveguide sensor and a metallic plate with different cracks. The feature selection techniques investigated include information gain, gain ratio, relief, and chi-squared. The effectiveness of the selected features was validated through performance evaluations of various classification models, namely Nearest Neighbor, Neural Networks, Random Forest, and Support Vector Machine. Results showed good crack classification accuracy rates after employing feature selection algorithms.

  9. On the selection of optimal feature region set for robust digital image watermarking.

    Science.gov (United States)

    Tsai, Jen-Sheng; Huang, Win-Bin; Kuo, Yau-Hwang

    2011-03-01

    A novel feature region selection method for robust digital image watermarking is proposed in this paper. The method aims to select a nonoverlapping feature region set that has the greatest robustness against various attacks and preserves image quality as much as possible after watermarking. It first performs a simulated attacking procedure using some predefined attacks to evaluate the robustness of every candidate feature region. According to the evaluation results, it then adopts a track-with-pruning procedure to search for a minimal primary feature set that can resist the most predefined attacks. To enhance resistance to undefined attacks under the constraint of preserving image quality, the primary feature set is then extended by adding some auxiliary feature regions. This task is formulated as a multidimensional knapsack problem and solved by a genetic-algorithm-based approach. The experimental results for StirMark attacks on some benchmark images support our expectation that the primary feature set can resist all the predefined attacks and that its extension can enhance the robustness against undefined attacks. Compared with some well-known feature-based methods, the proposed method exhibits better performance in robust digital watermarking.

  10. Motor Imagery Recognition Based on Support Vector Feature Selection Method

    Institute of Scientific and Technical Information of China (English)

    綦宏志; 明东; 万柏坤; 任超世; 刘志朋; 殷涛

    2012-01-01

    This paper introduces a support vector feature selection method to improve the recognition of motor imagery in brain-computer interfaces, where satisfactory results are usually hard to achieve because the feature dimension is high and the training data are limited. Support vector feature selection measures the contribution of each feature to classification by perturbing the cost function of the SVM. It then constructs a feature ranking criterion, recursively ranks all features, and finally selects the optimal feature group. EEG signals evoked by imagined left versus right upper limb movements in 14 subjects are analyzed in this paper. In total, 246 features of 6 types are extracted and then optimized by support vector recursive feature selection. Classification with a support vector machine shows that the optimized feature group improves accuracy significantly. This study indicates that the support vector feature selection method can reduce interference from ineffective features and improve classifier efficiency, and that it is well suited to brain-computer interface task recognition with high feature dimension.

  11. Variable selection in near-infrared spectroscopy: Benchmarking of feature selection methods on biodiesel data

    Energy Technology Data Exchange (ETDEWEB)

    Balabin, Roman M., E-mail: balabin@org.chem.ethz.ch [Department of Chemistry and Applied Biosciences, ETH Zurich, 8093 Zurich (Switzerland); Smirnov, Sergey V. [Unimilk Joint Stock Co., 143421 Moscow Region (Russian Federation)

    2011-04-29

    During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields, from the petroleum to the biomedical sectors. The NIR spectrum (above 4000 cm⁻¹) of a sample is typically measured by modern instruments at a few hundred wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. The results of other spectroscopic

  12. Electronic image stabilization system based on global feature tracking

    Institute of Scientific and Technical Information of China (English)

    Zhu Juanjuan; Guo Baolong

    2008-01-01

    A new robust electronic image stabilization system is presented, which involves global motion estimation based on feature-point tracking and motion compensation based on Kalman filtering. First, global motion is estimated from the local motions of selected feature points. To handle local moving objects and inevitable mismatches, a matching validation step based on the stable relative distances within the point set is proposed, thus maintaining high accuracy and robustness. Next, the global motion parameters are accumulated and corrected by Kalman filtering. The experimental results illustrate that the proposed system effectively stabilizes translational, rotational, and zooming jitter and is robust to local motions.
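
    The Kalman filtering stage can be illustrated with a scalar filter that smooths an accumulated motion parameter, for example the horizontal offset per frame. This is a simplified sketch with a constant-position model; the noise variances and the sample trajectory are assumptions, not values from the paper.

```python
def kalman_smooth(measurements, process_var=1e-3, meas_var=1.0):
    """Scalar Kalman filter with a constant-position motion model."""
    estimate, error = measurements[0], 1.0
    smoothed = [estimate]
    for z in measurements[1:]:
        error += process_var               # predict: uncertainty grows slightly
        gain = error / (error + meas_var)  # how much to trust the new measurement
        estimate += gain * (z - estimate)  # update toward the measurement
        error *= 1 - gain
        smoothed.append(estimate)
    return smoothed


# Hypothetical accumulated horizontal offsets (pixels) with frame-to-frame jitter.
raw = [0.0, 1.0, -1.0, 1.0, -1.0]
smooth = kalman_smooth(raw)
# The per-frame correction is the difference between smoothed and raw motion.
corrections = [s - z for s, z in zip(smooth, raw)]
```

    A small process variance makes the filter follow slow intentional camera motion while suppressing fast jitter.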

  13. Feature Selection, Flaring Size and Time-to-Flare Prediction Using Support Vector Regression, and Automated Prediction of Flaring Behavior Based on Spatio-Temporal Measures Using Hidden Markov Models

    Science.gov (United States)

    Al-Ghraibah, Amani

    Solar flares release stored magnetic energy in the form of radiation and can have significant detrimental effects on earth including damage to technological infrastructure. Recent work has considered methods to predict future flare activity on the basis of quantitative measures of the solar magnetic field. Accurate advanced warning of solar flare occurrence is an area of increasing concern and much research is ongoing in this area. Our previous work [111] utilized standard pattern recognition and classification techniques to determine (classify) whether a region is expected to flare within a predictive time window, using a Relevance Vector Machine (RVM) classification method. We extracted 38 features describing the complexity of the photospheric magnetic field; the resulting classification metrics provide the baseline against which we compare our new work. We find a true positive rate (TPR) of 0.8, true negative rate (TNR) of 0.7, and true skill score (TSS) of 0.49. This dissertation addresses three basic topics. The first topic is an extension of our previous work [111], where we consider a feature selection method to determine an appropriate feature subset with cross validation classification based on a histogram analysis of selected features. Classification using the top five features resulting from this analysis yields better classification accuracies across a large unbalanced dataset. In particular, the feature subsets provide better discrimination of the many regions that flare, where we find a TPR of 0.85, a TNR of 0.65 (slightly lower than in our previous work), and a TSS of 0.5 (an improvement over our previous work). In the second topic, we study the prediction of solar flare size and time-to-flare using support vector regression (SVR). When we consider flaring regions only, we find an average error in estimating flare size of approximately half a GOES class. When we additionally consider non-flaring regions, we find an increased average

  14. Cost-Sensitive Feature Selection of Numeric Data with Measurement Errors

    Directory of Open Access Journals (Sweden)

    Hong Zhao

    2013-01-01

    Full Text Available Feature selection is an essential process in data mining applications since it reduces a model’s complexity. However, feature selection with various types of costs is still a new research topic. In this paper, we study the cost-sensitive feature selection problem of numeric data with measurement errors. The major contributions of this paper are fourfold. First, a new data model is built to address test costs and misclassification costs as well as error boundaries. It is distinguished from the existing models mainly by the error boundaries. Second, a covering-based rough set model with normally distributed measurement errors is constructed. With this model, coverings are constructed from data rather than assigned by users. Third, a new cost-sensitive feature selection problem is defined on this model. It is more realistic than the existing feature selection problems. Fourth, both backtracking and heuristic algorithms are proposed to deal with the new problem. Experimental results show the efficiency of the pruning techniques for the backtracking algorithm and the effectiveness of the heuristic algorithm. This study is a step toward realistic applications of cost-sensitive learning.

  15. Feature selection based on fractal dimension and multi-objective genetic algorithm

    Institute of Scientific and Technical Information of China (English)

    吴曼; 张公让; 刘恒

    2015-01-01

    In text categorization systems, the quality of the features often greatly affects the design and performance of the classifier. A feature subset selection method is presented that is based on fractal dimension and the elitist fast non-dominated sorting genetic algorithm (NSGA-II). In this method, fractal dimension is used as the evaluation mechanism for feature selection, and NSGA-II treats the feature subset selection problem as a multi-objective optimization problem. To analyze the validity of the results, an SVM classifier is tested on the Fudan University Corpus. The experimental results show that the method performs well: it effectively removes invalid features and improves classification accuracy.

  16. Statistical feature extraction based iris recognition system

    Indian Academy of Sciences (India)

    ATUL BANSAL; RAVINDER AGARWAL; R K SHARMA

    2016-05-01

    Iris recognition systems have been proposed by numerous researchers using different feature extraction techniques for accurate and reliable biometric authentication. In this paper, a statistical feature extraction technique based on correlation between adjacent pixels has been proposed and implemented. A Hamming distance based metric has been used for matching. Performance of the proposed iris recognition system (IRS) has been measured by recording false acceptance rate (FAR) and false rejection rate (FRR) at different thresholds in the distance metric. System performance has been evaluated by computing statistical features along two directions, namely, the radial direction of the circular iris region and the angular direction extending from pupil to sclera. Experiments have also been conducted to study the effect of the number of statistical parameters on FAR and FRR. Results obtained from the experiments based on different sets of statistical features of iris images show that there is a significant improvement in equal error rate (EER) when the number of statistical parameters for feature extraction is increased from three to six. Further, it has also been found that increasing radial/angular resolution, with normalization in place, improves EER for the proposed iris recognition system.
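
    As a minimal sketch of the matching and evaluation steps named above, the following assumes iris templates are equal-length binary codes and that a comparison is accepted when its normalized Hamming distance does not exceed a threshold. The function names and the acceptance rule are illustrative assumptions, not the authors' implementation.

```python
def hamming_distance(code_a, code_b):
    """Fraction of differing bits between two equal-length binary codes."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    return sum(a != b for a, b in zip(code_a, code_b)) / len(code_a)


def far_frr(genuine_distances, impostor_distances, threshold):
    """Return (FAR, FRR) when a pair is accepted if distance <= threshold.

    genuine_distances: distances between codes of the same eye.
    impostor_distances: distances between codes of different eyes.
    """
    false_accepts = sum(d <= threshold for d in impostor_distances)
    false_rejects = sum(d > threshold for d in genuine_distances)
    return (false_accepts / len(impostor_distances),
            false_rejects / len(genuine_distances))
```

    Sweeping the threshold and locating the point where FAR and FRR coincide gives an estimate of the equal error rate (EER) mentioned in the abstract.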

  17. Intrinsic feature-based pose measurement for imaging motion compensation

    Science.gov (United States)

    Baba, Justin S.; Goddard, Jr., James Samuel

    2014-08-19

    Systems and methods for generating motion corrected tomographic images are provided. A method includes obtaining first images of a region of interest (ROI) to be imaged and associated with a first time, where the first images are associated with different positions and orientations with respect to the ROI. The method also includes defining an active region in each of the first images and selecting intrinsic features in each of the first images based on the active region. Second, the method includes identifying a portion of the intrinsic features temporally and spatially matching intrinsic features in corresponding ones of second images of the ROI associated with a second time prior to the first time and computing three-dimensional (3D) coordinates for the portion of the intrinsic features. Finally, the method includes computing a relative pose for the first images based on the 3D coordinates.

  18. Multi Feature Content Based Image Retrieval

    Directory of Open Access Journals (Sweden)

    Rajshree S. Dubey

    2010-09-01

    Full Text Available A number of methods exist for image mining. This paper covers four techniques, i.e., Color Histogram, Color Moment, Texture, and Edge Histogram Descriptor. The nature of an image is fundamentally based on human perception of the image, whereas machine interpretation is based on the contours and surfaces of the image. Image mining is a very challenging task because it involves pattern recognition, which is a very important tool for machine vision systems. A combination of four feature extraction methods, namely Color Histogram, Color Moment, Texture, and Edge Histogram Descriptor, is used, with provision to add new features in the future for better retrieval efficiency. The Euclidean distances for each of the four features are calculated, summed, and averaged. The user interface is provided in MATLAB. The image properties analyzed in this work are obtained using computer vision and image processing algorithms. For color, the histograms of the images are computed; for texture, co-occurrence-matrix-based entropy, energy, etc. are calculated; and for edge density, the Edge Histogram Descriptor (EHD) is found. For retrieval, the averages over the four techniques are computed and the resulting image is retrieved.
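
    The averaging of per-feature Euclidean distances described above could look roughly like the sketch below. It assumes each image is already represented as a mapping from feature name to a numeric vector; the feature extraction itself, the dictionary layout, and the function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def combined_distance(query, candidate):
    """Average the per-feature Euclidean distances between two images.

    Both arguments map a feature name (e.g. "color_histogram") to a vector.
    """
    return sum(euclidean(query[name], candidate[name]) for name in query) / len(query)


def retrieve(query, database, top_k=1):
    """Return the names of the top_k database images closest to the query."""
    ranked = sorted(database, key=lambda name: combined_distance(query, database[name]))
    return ranked[:top_k]
```

    A plain average weights all features equally; in practice the vectors would likely need to be normalized first so that no single feature dominates the sum.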

  19. Spatiotemporal Features for Asynchronous Event-based Data

    Directory of Open Access Journals (Sweden)

    Xavier Lagorce

    2015-02-01

    Full Text Available Bio-inspired asynchronous event-based vision sensors are currently introducing a paradigm shift in visual information processing. These new sensors rely on a stimulus-driven principle of light acquisition similar to biological retinas. They are event-driven and fully asynchronous, thereby reducing redundancy and encoding exact times of input signal changes, leading to a very precise temporal resolution. Approaches for higher-level computer vision often rely on the reliable detection of features in visual frames, but similar definitions of features for the novel dynamic and event-based visual input representation of silicon retinas have so far been lacking. This article addresses the problem of learning and recognizing features for event-based vision sensors, which capture properties of truly spatiotemporal volumes of sparse visual event information. A novel computational architecture for learning and encoding spatiotemporal features is introduced based on a set of predictive recurrent reservoir networks, competing via winner-take-all selection. Features are learned in an unsupervised manner from real-world input recorded with event-based vision sensors. It is shown that the networks in the architecture learn distinct and task-specific dynamic visual features, and can predict their trajectories over time.

  20. Harnessing the Power of GPUs to Speed Up Feature Selection for Outlier Detection

    Institute of Scientific and Technical Information of China (English)

    Fatemeh Azmandian; Ayse Yilmazer; Jennifer G Dy; Javed A Aslam; David R Kaeli

    2014-01-01

    Acquiring a set of features that emphasize the differences between normal data points and outliers can drastically facilitate the task of identifying outliers. In our work, we present a novel non-parametric evaluation criterion for filter-based feature selection which has an eye towards the final goal of outlier detection. The proposed method seeks the subset of features that represent the inherent characteristics of the normal dataset while forcing outliers to stand out, making them more easily distinguished by outlier detection algorithms. Experimental results on real datasets show the advantage of our feature selection algorithm compared with popular and state-of-the-art methods. We also show that the proposed algorithm is able to overcome the small sample space problem and perform well on highly imbalanced datasets. Furthermore, due to the highly parallelizable nature of the feature selection, we implement the algorithm on a graphics processing unit (GPU) to gain significant speedup over the serial version. The benefits of the GPU implementation are two-fold, as its performance scales very well in terms of the number of features, as well as the number of data points.

  1. A Local Asynchronous Distributed Privacy Preserving Feature Selection Algorithm for Large Peer-to-Peer Networks

    Data.gov (United States)

    National Aeronautics and Space Administration — In this paper we develop a local distributed privacy preserving algorithm for feature selection in a large peer-to-peer environment. Feature selection is often used...

  2. Early Visual Cortex Dynamics during Top-Down Modulated Shifts of Feature-Selective Attention.

    Science.gov (United States)

    Müller, Matthias M; Trautmann, Mireille; Keitel, Christian

    2016-04-01

    Shifting attention from one color to another color or from color to another feature dimension such as shape or orientation is imperative when searching for a certain object in a cluttered scene. Most attention models that emphasize feature-based selection implicitly assume that all shifts in feature-selective attention underlie identical temporal dynamics. Here, we recorded time courses of behavioral data and steady-state visual evoked potentials (SSVEPs), an objective electrophysiological measure of neural dynamics in early visual cortex to investigate temporal dynamics when participants shifted attention from color or orientation toward color or orientation, respectively. SSVEPs were elicited by four random dot kinematograms that flickered at different frequencies. Each random dot kinematogram was composed of dashes that uniquely combined two features from the dimensions color (red or blue) and orientation (slash or backslash). Participants were cued to attend to one feature (such as color or orientation) and respond to coherent motion targets of the to-be-attended feature. We found that shifts toward color occurred earlier after the shifting cue compared with shifts toward orientation, regardless of the original feature (i.e., color or orientation). This was paralleled in SSVEP amplitude modulations as well as in the time course of behavioral data. Overall, our results suggest different neural dynamics during shifts of attention from color and orientation and the respective shifting destinations, namely, either toward color or toward orientation.

  3. Partial fingerprint matching based on SIFT Features

    Directory of Open Access Journals (Sweden)

    Ms. S. Malathi

    2010-07-01

    Full Text Available Fingerprints are used extensively for person identification in a number of commercial, civil, and forensic applications. Current fingerprint matching technology is quite mature for matching full prints, but matching partial fingerprints still needs considerable improvement. Most current fingerprint identification systems utilize features based on minutiae points and ridge patterns. The major challenges in partial fingerprint matching are the absence of sufficient minutiae features and of other structures such as core and delta. As a result, this technology has difficulty handling incomplete prints and often discards any partial fingerprints obtained. Recent research has begun to delve into the problems of latent or partial fingerprints. In this paper we present a novel partial fingerprint matching scheme based on SIFT (Scale Invariant Feature Transform) features, in which matching is achieved using a modified point matching process. Using the Neurotechnology database, we demonstrate that the proposed method exhibits improved performance when matching full prints against partial prints.

  4. Selecting Testlet Features With Predictive Value for the Testlet Effect

    Directory of Open Access Journals (Sweden)

    Muirne C. S. Paap

    2015-04-01

    Full Text Available High-stakes tests often consist of sets of questions (i.e., items) grouped around a common stimulus. Such groupings of items are often called testlets. A basic assumption of item response theory (IRT), the mathematical model commonly used in the analysis of test data, is that individual items are independent of one another. The potential dependency among items within a testlet is often ignored in practice. In this study, a technique called tree-based regression (TBR) was applied to identify key features of stimuli that could properly predict the dependence structure of testlet data for the Analytical Reasoning section of a high-stakes test. Relevant features identified included Percentage of “If” Clauses, Number of Entities, Theme/Topic, and Predicate Propositional Density; the testlet effect was smallest for stimuli that contained 31% or fewer “if” clauses, contained 9.8% or fewer verbs, and had Media or Animals as the main theme. This study illustrates the merits of TBR in the analysis of test data.

  5. BUILDING ROBUST APPEARANCE MODELS USING ON-LINE FEATURE SELECTION

    Energy Technology Data Exchange (ETDEWEB)

    PORTER, REID B. [Los Alamos National Laboratory; LOVELAND, ROHAN [Los Alamos National Laboratory; ROSTEN, ED [Los Alamos National Laboratory

    2007-01-29

    In many tracking applications, adapting the target appearance model over time can improve performance. This approach is most popular in high frame rate video applications where latent variables related to the object's appearance (e.g., orientation and pose) vary slowly from one frame to the next. In these cases the appearance model and the tracking system are tightly integrated, and latent variables are often included as part of the tracking system's dynamic model. In this paper we describe our efforts to track cars in low frame rate data (1 frame/second) acquired from a highly unstable airborne platform. Due to the low frame rate and poor image quality, the appearance of a particular vehicle varies greatly from one frame to the next. This leads us to a different problem: how can we build the best appearance model from all instances of a vehicle we have seen so far? The best appearance model should maximize the future performance of the tracking system, and maximize the chances of reacquiring the vehicle once it leaves the field of view. We propose an online feature selection approach to this problem and investigate the performance and computational trade-offs with a real-world dataset.

  6. Gender Recognition Based on Sift Features

    CERN Document Server

    Yousefi, Sahar

    2011-01-01

    This paper proposes a robust approach for face detection and gender classification in color images. Previous research on gender recognition assumes a computationally expensive and time-consuming pre-processing step for alignment, in which face images are aligned so that facial landmarks such as the eyes, nose, lips, and chin are placed at uniform locations in the image. In this paper, a novel technique based on mathematical analysis is presented in three stages that eliminates the alignment step. First, a new color-based face detection method is presented, with better results and more robustness in complex backgrounds. Next, features that are invariant to affine transformations are extracted from each face using the scale invariant feature transform (SIFT) method. To evaluate the performance of the proposed algorithm, experiments have been conducted using an SVM classifier on a database of 500 face images from distinct people with an equal ratio of males and females.

  7. Convolutional neural network features based change detection in satellite images

    Science.gov (United States)

    Mohammed El Amin, Arabi; Liu, Qingjie; Wang, Yunhong

    2016-07-01

    With the widespread use of high resolution remote sensing (HRRS) satellite images, a great deal of research effort has been devoted to the change detection (CD) problem. An effective feature selection method can significantly boost the final result. While it has proven difficult to hand-design features that effectively capture high- and mid-level representations, recent developments in machine learning (deep learning) avoid this problem by learning hierarchical representations in an unsupervised manner directly from data without human intervention. In this letter, we propose approaching the change detection problem from a feature learning perspective. A novel change detection method for HR satellite images based on deep Convolutional Neural Network (CNN) features is proposed. The main idea is to produce a change detection map directly from two images using a pretrained CNN. This method avoids the limited performance of hand-crafted features. First, CNN features are extracted through different convolutional layers. Then, the features are concatenated after a normalization step, resulting in a unique higher-dimensional feature map. Finally, a change map is computed using pixel-wise Euclidean distance. Our method has been validated on real bitemporal HRRS satellite images through qualitative and quantitative analyses. The results obtained confirm the interest of the proposed method.
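
    The final step described above, a pixel-wise Euclidean distance between two feature maps, can be sketched as follows. The sketch assumes the CNN features for both dates have already been extracted, normalized, and concatenated into H x W x C nested lists; the thresholding helper is an added illustration and is not described in the abstract.

```python
import math


def change_map(features_t1, features_t2):
    """Pixel-wise Euclidean distance between two H x W x C feature maps."""
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(pix1, pix2)))
         for pix1, pix2 in zip(row1, row2)]
        for row1, row2 in zip(features_t1, features_t2)
    ]


def threshold_map(distances, threshold):
    """Binary change mask: 1 where the distance exceeds the threshold.

    The threshold is a hypothetical parameter chosen by the user.
    """
    return [[int(d > threshold) for d in row] for row in distances]
```

    With array libraries the same distance is a one-line norm over the channel axis; the nested-list form is used here only to keep the sketch dependency-free.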

  8. Estimating stellar atmospheric parameters based on Lasso features

    Science.gov (United States)

    Liu, Chuan-Xing; Zhang, Pei-Ai; Lu, Yu

    2014-04-01

    With the rapid development of large scale sky surveys like the Sloan Digital Sky Survey (SDSS), GAIA and LAMOST (Guoshoujing telescope), stellar spectra can be obtained on an ever-increasing scale. Therefore, it is necessary to estimate stellar atmospheric parameters such as Teff, log g and [Fe/H] automatically to achieve the scientific goals and make full use of the potential value of these observations. Feature selection plays a key role in the automatic measurement of atmospheric parameters. We propose to use the least absolute shrinkage and selection operator (Lasso) algorithm to select features from stellar spectra. Feature selection can reduce redundancy in spectra, alleviate the influence of noise, improve calculation speed and enhance the robustness of the estimation system. Based on the extracted features, stellar atmospheric parameters are estimated by the support vector regression model. Three typical schemes are evaluated on spectral data from both the ELODIE library and SDSS. Experimental results show the potential performance to a certain degree. In addition, results show that our method is stable when applied to different spectra.

  9. Dermoscopy analysis of RGB-images based on comparative features

    Science.gov (United States)

    Myakinin, Oleg O.; Zakharov, Valery P.; Bratchenko, Ivan A.; Artemyev, Dmitry N.; Neretin, Evgeny Y.; Kozlov, Sergey V.

    2015-09-01

    In this paper, we propose an algorithm for color and texture analysis of dermoscopic images of human skin based on Haar wavelets, Local Binary Patterns (LBP) and Histogram Analysis. This approach is a modification of the "7-point checklist" clinical method. As such, it is an "absolute" diagnostic method, because it uses only features extracted from the tumor's ROI (Region of Interest), which can be selected manually and/or using a special algorithm. We propose additional features extracted from the same image for comparative analysis of tumor and healthy skin. We used Euclidean distance, cosine similarity, and the Tanimoto coefficient as comparison metrics between color and texture features extracted separately from the tumor's and the healthy skin's ROIs. A classifier for separating melanoma images from other tumors has been built with the SVM (Support Vector Machine) algorithm. Classification errors with and without comparative features between skin and tumor have been analyzed. A significant increase in recognition quality with comparative features has been demonstrated. Moreover, we analyzed two modes (manual and automatic) of ROI selection on tumor and healthy skin areas. We reached a sensitivity of 91% using comparative features, in contrast with 77% using only the "absolute" method. The specificity was unchanged (94%) in both cases.
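
    The three comparison metrics named above can be written down directly. The sketch below assumes the tumor and healthy-skin ROIs have each been reduced to a non-zero numeric feature vector of the same length, and uses the real-valued form of the Tanimoto coefficient; the function names and the way the three values are bundled are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def tanimoto_coefficient(u, v):
    # Real-valued Tanimoto: u.v / (|u|^2 + |v|^2 - u.v)
    uv = dot(u, v)
    return uv / (dot(u, u) + dot(v, v) - uv)


def comparative_features(tumor, skin):
    """Bundle the three tumor-versus-skin comparisons into one feature list."""
    return [
        euclidean_distance(tumor, skin),
        cosine_similarity(tumor, skin),
        tanimoto_coefficient(tumor, skin),
    ]
```

    Values like these would then be appended to the "absolute" tumor features before training the classifier.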

  10. FEATURE EXTRACTION FOR EMG BASED PROSTHESES CONTROL

    Directory of Open Access Journals (Sweden)

    R. Aishwarya

    2013-01-01

    Full Text Available The control of a prosthetic limb would be more effective if it were based on Surface Electromyogram (SEMG) signals from remnant muscles. The analysis of SEMG signals depends on a number of factors, such as amplitude as well as time- and frequency-domain properties. Time series analysis using an Auto Regressive (AR) model and Mean frequency, which is tolerant to white Gaussian noise, are used as feature extraction techniques. The EMG Histogram is used as another feature vector and was seen to give more distinct classification. The work was done with an SEMG dataset obtained from the NINAPRO DATABASE, a resource for the bio-robotics community. Eight classes of hand movements (hand open, hand close, wrist extension, wrist flexion, pointing index, ulnar deviation, thumbs up, and thumb opposite to little finger) are taken into consideration and feature vectors are extracted. The feature vectors can be given to an artificial neural network for further classification in controlling the prosthetic arm, which is not dealt with in this paper.

  11. Feature selection method of zero resistance insulator infrared thermal image based on Logistic regression analysis

    Institute of Scientific and Technical Information of China (English)

    张彦; 李庆峰; 龚磊; 姚建刚

    2013-01-01

    Zero resistance insulators are one of the main causes of ground short-circuit faults in transmission and distribution networks. The key to detecting zero resistance insulators by infrared thermal imaging is obtaining optimal and appropriate infrared thermal image features. An infrared thermal image feature selection method based on Logistic regression analysis is proposed. The infrared thermal image is denoised with median filtering and wavelet adaptive diffusion, and its contrast is enhanced by gray stretching. The image is segmented into a binary image with a two-dimensional maximum entropy threshold. After binary image filling, the minimum enclosing rectangle of the insulator string region is cropped automatically to obtain a rectangular image of the insulator string area, and 13 texture features of the rectangular area are extracted through the gray-gradient co-occurrence matrix. The 14 characteristic parameters consisting of pollution class and the texture features are screened by Logistic regression analysis, which shows that 7 of them have a significant influence on the classification result. The experiments show that the method is simple to implement and can effectively screen feature parameters and remove redundant data.

  12. Feature selection and multi-kernel learning for sparse representation on a manifold.

    Science.gov (United States)

    Wang, Jim Jing-Yan; Bensmail, Halima; Gao, Xin

    2014-03-01

    Sparse representation has been widely studied as a part-based data representation method and applied in many scientific and engineering fields, such as bioinformatics and medical imaging. It seeks to represent a data sample as a sparse linear combination of some basic items in a dictionary. Gao et al. (2013) recently proposed Laplacian sparse coding by regularizing the sparse codes with an affinity graph. However, due to the noisy features and nonlinear distribution of the data samples, the affinity graph constructed directly from the original feature space is not necessarily a reliable reflection of the intrinsic manifold of the data samples. To overcome this problem, we integrate feature selection and multiple kernel learning into the sparse coding on the manifold. To this end, unified objectives are defined for feature selection, multiple kernel learning, sparse coding, and graph regularization. By optimizing the objective functions iteratively, we develop novel data representation algorithms with feature selection and multiple kernel learning respectively. Experimental results on two challenging tasks, N-linked glycosylation prediction and mammogram retrieval, demonstrate that the proposed algorithms outperform the traditional sparse coding methods.

  13. Research into a Feature Selection Method for Hyperspectral Imagery Using PSO and SVM

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    Classification and recognition of hyperspectral remote sensing images is not the same as that of conventional multi-spectral remote sensing images. We propose a novel feature selection and classification method for hyperspectral images by combining the global optimization ability of the particle swarm optimization (PSO) algorithm and the superior classification performance of a support vector machine (SVM). The global optimal search performance of PSO is improved by using a chaotic optimization search technique. A granularity-based grid search strategy is used to optimize the SVM model parameters. Parameter optimization and classification of the SVM are addressed using the training data corresponding to the feature subset. The false classification rate is adopted as the fitness function. Tests of feature selection and classification are carried out on a hyperspectral data set. Classification performance is also compared among different feature extraction methods commonly used today. Results indicate that this hybrid method has a higher classification accuracy and can effectively extract optimal bands. A feasible approach is provided for feature selection and classification of hyperspectral image data.

  14. Feature selection and multi-kernel learning for adaptive graph regularized nonnegative matrix factorization

    KAUST Repository

    Wang, Jim Jing-Yan

    2014-09-20

    Nonnegative matrix factorization (NMF), a popular part-based representation technique, does not capture the intrinsic local geometric structure of the data space. Graph regularized NMF (GNMF) was recently proposed to avoid this limitation by regularizing NMF with a nearest neighbor graph constructed from the input data set. However, GNMF has two main bottlenecks. First, using the original feature space directly to construct the graph is not necessarily optimal because of the noisy and irrelevant features and nonlinear distributions of data samples. Second, one possible way to handle the nonlinear distribution of data samples is by kernel embedding. However, it is often difficult to choose the most suitable kernel. To solve these bottlenecks, we propose two novel graph-regularized NMF methods, AGNMFFS and AGNMFMK, by introducing feature selection and multiple-kernel learning to the graph regularized NMF, respectively. Instead of using a fixed graph as in GNMF, the two proposed methods learn the nearest neighbor graph that is adaptive to the selected features and learned multiple kernels, respectively. For each method, we propose a unified objective function to conduct feature selection/multi-kernel learning, NMF and adaptive graph regularization simultaneously. We further develop two iterative algorithms to solve the two optimization problems. Experimental results on two challenging pattern classification tasks demonstrate that the proposed methods significantly outperform state-of-the-art data representation methods.

  15. Feature selection and multi-kernel learning for sparse representation on a manifold

    KAUST Repository

    Wang, Jim Jing-Yan

    2014-03-01

    Sparse representation has been widely studied as a part-based data representation method and applied in many scientific and engineering fields, such as bioinformatics and medical imaging. It seeks to represent a data sample as a sparse linear combination of some basic items in a dictionary. Gao et al. (2013) recently proposed Laplacian sparse coding by regularizing the sparse codes with an affinity graph. However, due to the noisy features and nonlinear distribution of the data samples, the affinity graph constructed directly from the original feature space is not necessarily a reliable reflection of the intrinsic manifold of the data samples. To overcome this problem, we integrate feature selection and multiple kernel learning into the sparse coding on the manifold. To this end, unified objectives are defined for feature selection, multiple kernel learning, sparse coding, and graph regularization. By optimizing the objective functions iteratively, we develop novel data representation algorithms with feature selection and multiple kernel learning respectively. Experimental results on two challenging tasks, N-linked glycosylation prediction and mammogram retrieval, demonstrate that the proposed algorithms outperform the traditional sparse coding methods. © 2013 Elsevier Ltd.

  16. On the use of feature selection to improve the detection of sea oil spills in SAR images

    Science.gov (United States)

    Mera, David; Bolon-Canedo, Veronica; Cotos, J. M.; Alonso-Betanzos, Amparo

    2017-03-01

    Fast and effective oil spill detection systems are crucial to ensure a proper response to environmental emergencies caused by hydrocarbon pollution on the ocean's surface. Typically, these systems uncover not only oil spills, but also a high number of look-alikes. Feature extraction is a critical and computationally intensive phase where each detected dark spot is independently examined. Traditionally, detection systems use an arbitrary set of features to discriminate between oil spills and look-alike phenomena. However, Feature Selection (FS) methods based on Machine Learning (ML) have proved to be very useful in real domains for enhancing the generalization capabilities of the classifiers, while discarding the existing irrelevant features. In this work, we present a generic and systematic approach, based on FS methods, for choosing a concise and relevant set of features to improve oil spill detection systems. We have compared five FS methods: Correlation-based feature selection (CFS), Consistency-based filter, Information Gain, ReliefF and Recursive Feature Elimination for Support Vector Machine (SVM-RFE). They were applied to a 141-input vector composed of features from a collection of outstanding studies. Selected features were validated via a Support Vector Machine (SVM) classifier and the results were compared with previous works. Test experiments revealed that the classifier trained with the 6-input feature vector proposed by SVM-RFE achieved the best accuracy and Cohen's kappa coefficient (87.1% and 74.06% respectively). This is a smaller feature combination with similar or even better classification accuracy than previous works. This finding makes it possible to speed up the feature extraction phase without reducing the classifier accuracy. Experiments also confirmed the significance of the geometrical features since 75.0% of the different features selected by the applied FS methods as well as 66.67% of the proposed 6-input feature vector belong to
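
    The abstract above reports both accuracy and Cohen's kappa. As a reminder of how the two relate, here is a minimal sketch of kappa computed from a confusion matrix. The function name and the example matrix are illustrative assumptions, not data from the study.

```python
def cohens_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: true, columns: predicted)."""
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    k = len(confusion)
    # Observed agreement is plain accuracy: the diagonal mass.
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(k)
    ) / (total * total)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical two-class example (oil spill vs. look-alike), made-up counts.
example = [[40, 10], [5, 45]]
kappa = cohens_kappa(example)  # accuracy is 0.85, chance agreement is 0.5
```

    Kappa discounts the agreement expected by chance, which is why it is lower than the raw accuracy; the study quotes it as a percentage.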

  17. Feature Selection in Detection of Adverse Drug Reactions from the Health Improvement Network (THIN) Database

    Directory of Open Access Journals (Sweden)

    Yihui Liu

    2015-02-01

    Full Text Available Adverse drug reactions (ADRs) are a widely recognized public health concern. ADRs are one of the most common reasons for withdrawing drugs from the market. Prescription event monitoring (PEM) is an important approach to detecting adverse drug reactions. The main problem with this method is how to automatically extract the medical events or side effects from high-throughput medical events, which are collected from day-to-day clinical practice. In this study we propose a novel concept of a feature matrix to detect ADRs. The feature matrix, which is extracted from big medical data from The Health Improvement Network (THIN) database, is created to characterize the medical events for the patients who take drugs. The feature matrix provides a foundation for handling the irregular and big medical data. Feature selection methods are then applied to the feature matrix to detect the significant features. Finally, the ADRs can be located based on the significant features. The experiments are carried out on three drugs: Atorvastatin, Alendronate, and Metoclopramide. Major side effects for each drug are detected and better performance is achieved compared to other computerized methods. Because the detected ADRs are based on computerized methods, further investigation is needed.

  18. Cuckoo search optimisation for feature selection in cancer classification: a new approach.

    Science.gov (United States)

    Gunavathi, C; Premalatha, K

    2015-01-01

    The Cuckoo Search (CS) optimisation algorithm is used for feature selection in cancer classification using microarray gene expression data. Since gene expression data has thousands of genes and a small number of samples, feature selection methods can be used to select informative genes and improve the classification accuracy. Initially, the genes are ranked based on T-statistics, Signal-to-Noise Ratio (SNR) and F-statistics values. The CS is used to find the informative genes from the top-m ranked genes. The classification accuracy of the k-Nearest Neighbour (kNN) technique is used as the fitness function for CS. The proposed method is tested and analysed on ten different cancer gene expression datasets. The results show that the CS gives 100% average accuracy for the DLBCL Harvard, Lung Michigan, Ovarian Cancer, AML-ALL and Lung Harvard2 datasets and that it outperforms the existing techniques on the DLBCL outcome and prostate datasets.
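
    The ranking step described above can be sketched as follows. This assumes the commonly used two-class signal-to-noise ratio, (mean0 - mean1) / (std0 + std1); the abstract does not give the exact formula, and the function names and toy data are hypothetical.

```python
from statistics import mean, stdev


def snr_score(values, labels):
    """Signal-to-noise ratio of one gene between classes labelled 0 and 1.

    Needs at least two samples per class so the sample standard deviation exists.
    """
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    spread = stdev(a) + stdev(b)
    if spread == 0:
        raise ValueError("gene is constant within both classes")
    return (mean(a) - mean(b)) / spread


def top_m_genes(expression, labels, m):
    """Return the m gene names with the largest absolute SNR.

    `expression` maps gene name -> list of expression values, one per sample.
    """
    ranked = sorted(
        expression,
        key=lambda gene: abs(snr_score(expression[gene], labels)),
        reverse=True,
    )
    return ranked[:m]


# Toy data: three genes measured on six samples (three per class).
toy_labels = [0, 0, 0, 1, 1, 1]
toy_expression = {
    "geneA": [1, 2, 3, 7, 8, 9],
    "geneB": [5, 6, 5, 6, 5, 6],
    "geneC": [1, 3, 5, 4, 6, 8],
}
shortlist = top_m_genes(toy_expression, toy_labels, 2)
```

    In the setting of the abstract, a shortlist like this would then be handed to the search algorithm, which looks for a good subset within it.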

  19. Feature Learning Based Random Walk for Liver Segmentation

    Science.gov (United States)

    Zheng, Yongchang; Ai, Danni; Zhang, Pan; Gao, Yefei; Xia, Likun; Du, Shunda; Sang, Xinting; Yang, Jian

    2016-01-01

    Liver segmentation is a significant processing technique for computer-assisted diagnosis. This method has attracted considerable attention and achieved effective results. However, liver segmentation using computed tomography (CT) images remains a challenging task because of the low contrast between the liver and adjacent organs. This paper proposes a feature-learning-based random walk method for liver segmentation using CT images. Four texture features were extracted and then classified to determine the classification probability corresponding to the test images. Seed points on the original test image were automatically selected and further used in the random walk (RW) algorithm to achieve results comparable to previous segmentation methods. PMID:27846217

  20. Selective Remote Sensing Image Fusion Method Based on the Local Feature of Contourlet Coefficients

    Institute of Scientific and Technical Information of China (English)

    朱康; 贺新光

    2012-01-01

    In order to markedly improve the spatial resolution of the fused multispectral images while preserving the original multispectral characteristics as much as possible, a selective remote sensing image fusion method is proposed based on the local features of contourlet coefficients. Because the low-frequency and high-frequency parts obtained from the contourlet transform serve different purposes when fusing multispectral and panchromatic images, a moving window neighborhood template is first used to calculate, region by region, different local features of the contourlet coefficient matrices for the approximation components and for the detail components of each direction in each layer. Then the approximation and detail components are fused selectively in the contourlet coefficient domain by applying different fusion rules based on appropriate criteria. The fused image, with high resolution and multispectral characteristics, is obtained by the inverse contourlet transform and the inverse intensity-hue-saturation (IHS) transform. Fusion experiments using Landsat TM multispectral and SPOT panchromatic images show that the proposed algorithm markedly improves spatial resolution while preserving the spectral characteristics of the original images well, and that it outperforms traditional fusion methods.

  1. Feature Selection for Better Identification of Subtypes of Guillain-Barré Syndrome

    Directory of Open Access Journals (Sweden)

    José Hernández-Torruco

    2014-01-01

    Full Text Available Guillain-Barré syndrome (GBS) is a neurological disorder which has not been explored using clustering algorithms. Clustering algorithms perform more efficiently when they work only with relevant features. In this work, we applied correlation-based feature selection (CFS), chi-squared, information gain, symmetrical uncertainty, and consistency filter methods to select the most relevant features from a 156-feature real dataset. This dataset contains clinical, serological, and nerve conduction test data obtained from GBS patients. The most relevant feature subsets, determined with each filter method, were used to identify four subtypes of GBS present in the dataset. We used the partitions around medoids (PAM) clustering algorithm to form four clusters, corresponding to the GBS subtypes. We used the purity of each cluster as the evaluation measure. After experimentation, symmetrical uncertainty and information gain determined a feature subset of seven variables. These variables, assembled into a dataset, were used as input to PAM and reached a purity of 0.7984. This result leads to a first characterization of this syndrome using computational techniques.
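
    The purity measure used above can be sketched as follows. This assumes the usual definition, in which each cluster is credited with its most frequent true label and the credits are summed over all samples; the abstract does not spell out the exact variant, and the function name and toy labels are hypothetical.

```python
from collections import Counter


def cluster_purity(cluster_ids, true_labels):
    """Overall purity: fraction of samples that carry their cluster's majority label."""
    if not true_labels or len(cluster_ids) != len(true_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    # Count the true labels that fall into each cluster.
    per_cluster = {}
    for cluster, label in zip(cluster_ids, true_labels):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(true_labels)


# Toy example: two clusters, seven samples, two subtypes "a" and "b".
toy_clusters = [0, 0, 0, 1, 1, 1, 1]
toy_subtypes = ["a", "a", "b", "b", "b", "b", "a"]
purity = cluster_purity(toy_clusters, toy_subtypes)  # (2 + 3) / 7
```

    Purity needs the true subtype labels, so it serves as an external check on the clustering rather than something the clustering algorithm itself optimizes.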

  2. Feature selection for outcome prediction in oesophageal cancer using genetic algorithm and random forest classifier.

    Science.gov (United States)

    Paul, Desbordes; Su, Ruan; Romain, Modzelewski; Sébastien, Vauclin; Pierre, Vera; Isabelle, Gardin

    2016-12-28

    Predicting patient outcome can greatly help to personalize cancer treatment. A large number of quantitative features (clinical exams, imaging, …) are potentially useful to assess the patient outcome. The challenge is to choose the most predictive subset of features. In this paper, we propose a new feature selection strategy called GARF (genetic algorithm based on random forest), applied to features extracted from positron emission tomography (PET) images and clinical data. The most relevant features, predictive of the therapeutic response or prognostic of patient survival 3 years after the end of treatment, were selected using GARF on a cohort of 65 patients with locally advanced oesophageal cancer eligible for chemo-radiation therapy. The most relevant predictive results were obtained with a subset of 9 features leading to a random forest misclassification rate of 18±4% and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.823±0.032. The most relevant prognostic results were obtained with 8 features leading to an error rate of 20±7% and an AUC of 0.750±0.108. Both predictive and prognostic results show better performance using GARF than using 4 other studied methods.

  3. Feature Subset Selection for Hot Method Prediction using Genetic Algorithm wrapped with Support Vector Machines

    Directory of Open Access Journals (Sweden)

    S. Johnson

    2011-01-01

    Full Text Available Problem statement: All compilers have simple profiling-based heuristics to identify and predict program hot methods and also to make optimization decisions. The major challenge in profile-based optimization is addressing the problem of overhead. The aim of this work is to perform feature subset selection using Genetic Algorithms (GA) to improve and refine the machine-learnt static hot method predictive technique and to compare the performance of the new models against the simple heuristics. Approach: The relevant features for training the predictive models are extracted from an initial set of ninety randomly selected static program features, with the help of the GA wrapped with the predictive model using the Support Vector Machine (SVM), a Machine Learning (ML) algorithm. Results: The GA-generated feature subsets, containing thirty and twenty-nine features respectively for the two predictive models, when tested on MiBench predict Long Running Hot Methods (LRHM) and frequently called hot methods (FCHM) with respective accuracies of 71% and 80%, an increase of 19% and 22%. Further, inlining of the predicted LRHM and FCHM improves the program performance by 3% and 5%, as against 4% and 6% with the Low Level Virtual Machine (LLVM) default heuristics. When intra-procedural optimizations (IPO) are performed on the predicted hot methods, this system offers a performance improvement of 5% and 4%, as against 0% and 3% by the LLVM default heuristics on LRHM and FCHM respectively. However, we observe an improvement of 36% in certain individual programs. Conclusion: Overall, the results indicate that feature reduction derived from the GA wrapped with SVM improves the hot method prediction accuracy and that the technique of hot-method-prediction-based optimization is potentially useful in selective optimization.
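
    A wrapper of the kind described above can be sketched as a small binary genetic algorithm. This is an illustrative sketch only: the operators (tournament selection, one-point crossover, bit-flip mutation, elitism) and all parameter values are assumptions, and `fitness` stands in for whatever the wrapped classifier reports (for example, cross-validated SVM accuracy).

```python
import random


def ga_select(n_features, fitness, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Search for a 0/1 feature mask (a list) that maximises `fitness(mask)`."""
    if n_features < 2:
        raise ValueError("need at least two features for one-point crossover")
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament(population):
        # Binary tournament: draw two individuals, keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each position.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


def toy_fitness(mask, relevant=(0, 3, 5)):
    """Hypothetical fitness: reward relevant features, penalise subset size."""
    return sum(mask[i] for i in relevant) - 0.1 * sum(mask)


best_mask = ga_select(8, toy_fitness, seed=1)
```

    In a real wrapper each fitness call trains and evaluates a classifier, so caching the fitness of masks that have already been seen is usually worthwhile.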

  4. Feature selection and classification of multiparametric medical images using bagging and SVM

    Science.gov (United States)

    Fan, Yong; Resnick, Susan M.; Davatzikos, Christos

    2008-03-01

    This paper presents a framework for brain classification based on multi-parametric medical images. This method takes advantage of multi-parametric imaging to provide a set of discriminative features for classifier construction by using a regional feature extraction method which takes into account joint correlations among different image parameters; in the experiments herein, MRI and PET images of the brain are used. Support vector machine classifiers are then trained based on the most discriminative features selected from the feature set. To facilitate robust classification and optimal selection of parameters involved in classification, in view of the well-known "curse of dimensionality", base classifiers are constructed in a bagging (bootstrap aggregating) framework for building an ensemble classifier and the classification parameters of these base classifiers are optimized by means of maximizing the area under the ROC (receiver operating characteristic) curve estimated from their prediction performance on left-out samples of bootstrap sampling. This classification system is tested on a sex classification problem, where it yields over 90% classification rates for unseen subjects. The proposed classification method is also compared with other commonly used classification algorithms, with favorable results. These results illustrate that the methods built upon information jointly extracted from multi-parametric images have the potential to perform individual classification with high sensitivity and specificity.
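
    The two ingredients named above, bootstrap sampling with left-out samples and the area under the ROC curve, can be sketched as follows. The function names are hypothetical, and the AUC here uses the rank-based (Mann-Whitney) form rather than whatever estimator the authors used.

```python
import random


def bootstrap_splits(n_samples, n_rounds, seed=0):
    """Yield (in_bag, out_of_bag) index lists for each bagging round."""
    rng = random.Random(seed)
    for _ in range(n_rounds):
        # Sample n indices with replacement; indices never drawn are left out.
        in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
        drawn = set(in_bag)
        out_of_bag = [i for i in range(n_samples) if i not in drawn]
        yield in_bag, out_of_bag


def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


splits = list(bootstrap_splits(n_samples=10, n_rounds=3, seed=42))
toy_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])  # 3 of 4 pairs ordered correctly
```

    Each base classifier would be trained on its in-bag indices and scored on the out-of-bag ones, and the classification parameters would be chosen to maximise that out-of-bag AUC.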

  5. Highly accurate SVM model with automatic feature selection for word sense disambiguation

    Institute of Scientific and Technical Information of China (English)

    王浩; 陈贵林; 吴连献

    2004-01-01

    A novel algorithm for word sense disambiguation (WSD) that is based on an SVM model improved with automatic feature selection is introduced. This learning method employs rich contextual features to predict the proper senses for specific words. Experimental results show that this algorithm can achieve excellent performance on the data set released during the SENSEVAL-2 competition. We present the results obtained and discuss the transfer of this algorithm to other languages such as Chinese. Experimental results on a Chinese corpus show that our algorithm achieves an accuracy of 70.0% even with small training data.

  6. Automatic suitable-matching area selection method based on multi-feature fusion

    Institute of Scientific and Technical Information of China (English)

    罗海波; 常铮; 余新荣; 丁庆海

    2011-01-01

    Tracking targets that locally lack texture is a difficult and much-studied problem in the field of ground imaging guidance. Since automatic suitable-matching area selection is an effective way to solve this problem, an algorithm for automatic suitable-matching area selection based on multi-feature fusion is proposed. Firstly, the edge density, the average edge strength, the edge direction dispersion degree and the space distance are integrated to form a suitable-matching measure function. Then, the suitable-matching credibility of each point in the image is calculated with this function. Lastly, by applying an adaptive selection strategy to the suitable-matching areas, three suitable-matching areas with high credibility are segmented as target templates for matching and tracking. Experimental results show that the suitable-matching areas segmented by the proposed algorithm achieve higher tracking precision than areas chosen by human experience. The proposed algorithm can be widely used in ground imaging-guided tracking of locally textureless targets and in scene matching task planning.

  7. Feature Selection for Bayesian Evaluation of Trauma Death Risk

    CERN Document Server

    Jakaite, L

    2008-01-01

    In the last year more than 70,000 people have been brought to UK hospitals with serious injuries. Each time, a clinician has to urgently take a patient through a screening procedure to make a reliable decision on the trauma treatment. Typically, such a procedure comprises around 20 tests; however, the condition of a trauma patient remains very difficult to test properly. What happens if these tests are ambiguously interpreted, and information about the severity of the injury turns out to be misleading? A mistake in a decision can be fatal: using a mild treatment can put a patient at risk of dying from posttraumatic shock, while overtreatment can also cause death. How can we reduce the risk of death caused by unreliable decisions? It has been shown that probabilistic reasoning, based on the Bayesian methodology of averaging over decision models, allows clinicians to evaluate the uncertainty in decision making. Based on this methodology, in this paper we aim at selecting the most important screeni...

  8. Localization of neural efficiency of the mathematically gifted brain through a feature subset selection method.

    Science.gov (United States)

    Zhang, Li; Gan, John Q; Wang, Haixian

    2015-10-01

    Based on the neural efficiency hypothesis and task-induced EEG gamma-band response (GBR), this study investigated the brain regions where neural resource could be most efficiently recruited by the math-gifted adolescents in response to varying cognitive demands. In this experiment, various GBR-based mental states were generated with three factors (level of mathematical ability, task complexity, and short-term learning) modulating the level of neural activation. A feature subset selection method based on the sequential forward floating search algorithm was used to identify an "optimal" combination of EEG channel locations, where the corresponding GBR feature subset could obtain the highest accuracy in discriminating pairwise mental states influenced by each experiment factor. The integrative results from multi-factor selections suggest that the right-lateral fronto-parietal system is highly involved in neural efficiency of the math-gifted brain, primarily including the bilateral superior frontal, right inferior frontal, right-lateral central and right temporal regions. By means of the localization method based on single-trial classification of mental states, new GBR features and EEG channel-based brain regions related to mathematical giftedness were identified, which could be useful for the brain function improvement of children/adolescents in mathematical learning through brain-computer interface systems.
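
    The sequential forward floating search mentioned above can be sketched as follows. This is a generic textbook-style version, not the authors' implementation; `score` stands in for the classification accuracy obtained with a given set of channels, and the toy score table is invented so that the floating step has something to correct.

```python
def sffs(features, score, k):
    """Sequential forward floating search up to subsets of size k.

    Returns a dict mapping subset size -> (best score seen, subset as a list).
    """
    if not 1 <= k <= len(features):
        raise ValueError("k must be between 1 and the number of features")
    selected, best = [], {}
    while len(selected) < k:
        # Forward step: add the single feature that helps most.
        candidates = [f for f in features if f not in selected]
        add = max(candidates, key=lambda f: score(selected + [f]))
        selected = selected + [add]
        s = score(selected)
        if len(selected) not in best or s > best[len(selected)][0]:
            best[len(selected)] = (s, list(selected))
        # Floating step: drop features while that beats the best smaller subset.
        while len(selected) > 2:
            drop = max(selected, key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != drop]
            r = score(reduced)
            if r <= best[len(reduced)][0]:
                break
            selected = reduced
            best[len(reduced)] = (r, list(reduced))
    return best


# Toy scores: "a" is the best single feature, but "b" and "c" work best together.
TOY_SCORES = {
    frozenset("a"): 0.6, frozenset("b"): 0.5, frozenset("c"): 0.5, frozenset("d"): 0.1,
    frozenset("ab"): 0.7, frozenset("ac"): 0.7, frozenset("bc"): 0.9,
    frozenset("abc"): 0.92,
}


def toy_score(subset):
    return TOY_SCORES.get(frozenset(subset), 0.0)


result = sffs(["a", "b", "c", "d"], toy_score, k=3)
```

    In this toy run a plain forward search would stay with the pair containing "a", whereas the floating step revisits size two after reaching size three and finds the better pair.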

  9. EMG feature assessment for myoelectric pattern recognition and channel selection: a study with incomplete spinal cord injury.

    Science.gov (United States)

    Liu, Jie; Li, Xiaoyan; Li, Guanglin; Zhou, Ping

    2014-07-01

    Myoelectric pattern recognition with a large number of electromyogram (EMG) channels provides an approach to assessing motor control information available from the recorded muscles. In order to develop a practical myoelectric control system, a feature-dependent channel reduction method was developed in this study to determine a small number of EMG channels for myoelectric pattern recognition analysis. The method selects appropriate raw EMG features for classification of different movements, using the minimum Redundancy Maximum Relevance (mRMR) and the Markov random field (MRF) methods, respectively, to rank a large number of EMG features. A k-nearest neighbor (KNN) classifier was used to evaluate the performance of the selected features in terms of classification accuracy. The method was tested using 57-channel surface EMG signals recorded from forearm and hand muscles of individuals with incomplete spinal cord injury (SCI). Our results demonstrate that appropriate selection of a small number of raw EMG features from different recording channels resulted in classification accuracies similar to those achieved by using all the EMG channels or features. Compared with the conventional sequential forward selection (SFS) method, the feature-dependent method does not require repeated classifier implementation. It can effectively reduce redundant information not only across different channels, but also across different features in the same channel. Such hybrid feature-channel selection from a large number of EMG recording channels can reduce the computational cost of implementing a control system based on myoelectric pattern recognition.

  10. On a Variational Model for Selective Image Segmentation of Features with Infinite Perimeter

    Institute of Scientific and Technical Information of China (English)

    Lavdie RADA; Ke CHEN

    2013-01-01

    Variational models provide reliable formulation for segmentation of features and their boundaries in an image, following the seminal work of Mumford-Shah (1989, Commun. Pure Appl. Math.) on dividing a general surface into piecewise smooth sub-surfaces. A central idea of models based on this work is to minimize the length of the features' boundaries (i.e., the $\mathcal{H}^1$ Hausdorff measure). However, there exist problems with irregular and oscillatory object boundaries, where minimizing such a length is not appropriate, as noted by Barchiesi et al. (2010, SIAM J. Multiscale Model. Simul.), who proposed to minimize the $\mathcal{L}^2$ Lebesgue measure of the γ-neighborhood of the boundaries. This paper presents a dual level set selective segmentation model based on Barchiesi et al. (2010) to automatically select a local feature instead of all global features. Our model uses two level set functions: a global level set which segments all boundaries, and a local level set which evolves and finds the boundary of the object closest to the geometric constraints. Using real-life images with oscillatory boundaries, we show qualitative results demonstrating the effectiveness of the proposed method.

  11. Joint feature-sample selection and robust diagnosis of Parkinson's disease from MRI data.

    Science.gov (United States)

    Adeli, Ehsan; Shi, Feng; An, Le; Wee, Chong-Yaw; Wu, Guorong; Wang, Tao; Shen, Dinggang

    2016-11-01

    Parkinson's disease (PD) is an overwhelming neurodegenerative disorder caused by deterioration of a neurotransmitter, known as dopamine. Lack of this chemical messenger impairs several brain regions and yields various motor and non-motor symptoms. Incidence of PD is predicted to double in the next two decades, which urges more research to focus on its early diagnosis and treatment. In this paper, we propose an approach to diagnose PD using magnetic resonance imaging (MRI) data. Specifically, we first introduce a joint feature-sample selection (JFSS) method for selecting an optimal subset of samples and features, to learn a reliable diagnosis model. The proposed JFSS model effectively discards poor samples and irrelevant features. As a result, the selected features play an important role in PD characterization, which will help identify the most relevant and critical imaging biomarkers for PD. Then, a robust classification framework is proposed to simultaneously de-noise the selected subset of features and samples, and learn a classification model. Our model can also de-noise testing samples based on the cleaned training data. Unlike many previous works that perform de-noising in an unsupervised manner, we perform supervised de-noising for both training and testing data, thus boosting the diagnostic accuracy. Experimental results on both synthetic and publicly available PD datasets show promising results. To evaluate the proposed method, we use the popular Parkinson's progression markers initiative (PPMI) database. Our results indicate that the proposed method can differentiate between PD and normal control (NC), and outperforms the competing methods by a relatively large margin. It is noteworthy to mention that our proposed framework can also be used for diagnosis of other brain disorders. To show this, we have also conducted experiments on the widely-used ADNI database. 
The obtained results indicate that our proposed method can identify the imaging biomarkers and

  12. Feature selection by merging sequential bidirectional search into relevance vector machine in condition monitoring

    Science.gov (United States)

    Zhang, Kui; Dong, Yu; Ball, Andrew

    2015-11-01

    For more accurate fault detection and diagnosis, there is an increasing trend to use a large number of sensors and to collect data at high frequency. This inevitably produces large-scale data and causes difficulties in fault classification. In fact, the classification methods are simply intractable when applied to high-dimensional condition monitoring data. In order to solve the problem, engineers have to resort to complicated feature extraction methods to reduce the dimensionality of the data. However, the features transformed by these methods cannot be understood by the engineers because the original engineering meaning is lost. In this paper, another form of dimensionality reduction technique (feature selection methods) is employed to identify machinery condition, based only on frequency spectrum data. Feature selection methods are usually divided into three main types: filter, wrapper and embedded methods. Most studies focus on the first two types, whilst the development and application of embedded feature selection methods are very limited. This paper attempts to explore a novel embedded method. The method is formed by merging a sequential bidirectional search algorithm into the tuning of scale parameters within a kernel function in the relevance vector machine. To demonstrate the potential for applying the method to machinery fault diagnosis, the method is applied to rolling bearing experimental data. The results obtained by using the method are consistent with the theoretical interpretation, indicating that this algorithm has important engineering significance in revealing the correlation between the faults and relevant frequency features. The proposed method is a theoretical extension of the relevance vector machine, and provides an effective solution for detecting the fault-related frequency components with high efficiency.

  13. Feature Selection by Merging Sequential Bidirectional Search into Relevance Vector Machine in Condition Monitoring

    Institute of Scientific and Technical Information of China (English)

    ZHANG Kui; DONG Yu; BALL Andrew

    2015-01-01

    For more accurate fault detection and diagnosis, there is an increasing trend to use a large number of sensors and to collect data at high frequency. This inevitably produces large-scale data and causes difficulties in fault classification. In fact, the classification methods are simply intractable when applied to high-dimensional condition monitoring data. In order to solve the problem, engineers have to resort to complicated feature extraction methods to reduce the dimensionality of the data. However, the features transformed by these methods cannot be understood by the engineers because the original engineering meaning is lost. In this paper, another form of dimensionality reduction technique (feature selection methods) is employed to identify machinery condition, based only on frequency spectrum data. Feature selection methods are usually divided into three main types: filter, wrapper and embedded methods. Most studies focus on the first two types, whilst the development and application of embedded feature selection methods are very limited. This paper attempts to explore a novel embedded method. The method is formed by merging a sequential bidirectional search algorithm into the tuning of scale parameters within a kernel function in the relevance vector machine. To demonstrate the potential for applying the method to machinery fault diagnosis, the method is applied to rolling bearing experimental data. The results obtained by using the method are consistent with the theoretical interpretation, indicating that this algorithm has important engineering significance in revealing the correlation between the faults and relevant frequency features. The proposed method is a theoretical extension of the relevance vector machine, and provides an effective solution for detecting the fault-related frequency components with high efficiency.

  14. Image Recommendation Algorithm Using Feature-Based Collaborative Filtering

    Science.gov (United States)

    Kim, Deok-Hwan

    As the multimedia content market continues its rapid expansion, the amount of image content used in mobile phone services, digital libraries, and catalog services is increasing remarkably. In spite of this rapid growth, users experience high levels of frustration when searching for the desired image. Even though new images are profitable to the service providers, traditional collaborative filtering methods cannot recommend them. To solve this problem, in this paper, we propose a feature-based collaborative filtering (FBCF) method that reflects the user's most recent preference by representing his purchase sequence in the visual feature space. The proposed approach represents the images that have been purchased in the past as feature clusters in the multi-dimensional feature space and then selects neighbors by using an inter-cluster distance function between their feature clusters. Various experiments using real image data demonstrate that the proposed approach provides higher quality recommendations and better performance than typical collaborative filtering and content-based filtering techniques.

  15. Feature selection and validated predictive performance in the domain of Legionella pneumophila: A comparative study

    NARCIS (Netherlands)

    T. van der Ploeg (Tjeerd); E.W. Steyerberg (Ewout)

    2016-01-01

    Background: Genetic comparisons of clinical and environmental Legionella strains form an essential part of outbreak investigations. DNA microarrays often comprise many DNA markers (features). Feature selection and the development of prediction models are particularly challenging in this

  16. Anomaly Intrusion Behavior Detection Based on Fuzzy Clustering and Features Selection

    Institute of Scientific and Technical Information of China (English)

    唐成华; 刘鹏程; 汤申生; 谢逸

    2015-01-01

    The behaviors of network attack connections are changeable and complex. Typical behavior mining methods, which rely on traditional clustering, are not suited to constructing an anomaly intrusion detection model. In view of the characteristics of network attacks, an anomaly intrusion detection model based on fuzzy clustering and feature selection is proposed. Firstly, the sensitivity of the fuzzy C-means (FCM) clustering results to the initial cluster centers is reduced using a hierarchical clustering algorithm, the tendency of FCM to fall into local optima during iteration is overcome using the global search ability of a genetic algorithm, and the two are combined into an AGFCM algorithm. Secondly, the feature attributes of the network attack connection data set are ranked using the information gain algorithm, and the Youden index is used to prune the feature attributes and so determine how many are kept. Lastly, the anomaly intrusion detection model is built using the low-dimensional feature attribute set and the improved FCM clustering algorithm. Experimental results show that the model detects the vast majority of network attack types well, offering a feasible way to address issues such as the false alarm rate and detection rate of anomaly intrusion detection models.
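
    The Youden index used above is J = sensitivity + specificity - 1. The sketch below shows one plausible way it could decide how many ranked features to keep, by picking the count whose detection results maximise J. The abstract does not describe the exact procedure, so the selection rule, function names and counts here are assumptions.

```python
def youden_index(tp, fn, tn, fp):
    """Youden's J statistic from confusion counts."""
    sensitivity = tp / (tp + fn)  # detection rate on attack connections
    specificity = tn / (tn + fp)  # 1 - false alarm rate on normal connections
    return sensitivity + specificity - 1.0


def best_feature_count(results):
    """Pick the feature count with the highest J.

    `results` maps a candidate feature count -> (tp, fn, tn, fp).
    """
    return max(results, key=lambda count: youden_index(*results[count]))


# Hypothetical detection results for the top 5, 10 and 20 ranked features.
toy_results = {
    5: (70, 30, 90, 10),
    10: (90, 10, 80, 20),
    20: (95, 5, 60, 40),
}
chosen = best_feature_count(toy_results)
```

    J ranges from -1 to 1 and weights missed attacks and false alarms equally, which suits a setting where both the detection rate and the false alarm rate matter.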

  17. UNLABELED SELECTED SAMPLES IN FEATURE EXTRACTION FOR CLASSIFICATION OF HYPERSPECTRAL IMAGES WITH LIMITED TRAINING SAMPLES

    Directory of Open Access Journals (Sweden)

    A. Kianisarkaleh

    2015-12-01

    Feature extraction plays a key role in hyperspectral image classification. By using unlabeled samples, which are often available in unlimited numbers, unsupervised and semisupervised feature extraction methods perform better when only a limited number of training samples exists. This paper illustrates the importance of selecting appropriate unlabeled samples for use in feature extraction methods. It also proposes a new method for selecting unlabeled samples using spectral and spatial information. The proposed method has four parts: PCA, prior classification, posterior classification and sample selection. Once a hyperspectral image has passed through these parts, the selected unlabeled samples can be used in arbitrary feature extraction methods. The effectiveness of the proposed unlabeled selected samples in unsupervised and semisupervised feature extraction is demonstrated using two real hyperspectral datasets. Results show that, by selecting appropriate unlabeled samples, the proposed method can improve the performance of feature extraction methods and increase classification accuracy.

  18. Multi scale feature based matched filter processing

    Institute of Scientific and Technical Information of China (English)

    LI Jun; HOU Chaohuan

    2004-01-01

    Using the extreme difference in self-similarity and kurtosis, at large scale levels of the wavelet transform approximation, between PTFM (Pulse Trains of Frequency Modulated) signals and their reverberation, a feature-based matched filter method using the classify-before-detect paradigm is proposed to improve detection performance in reverberation and multipath environments. Processing of lake-trial data showed that the processing gain of the proposed method is about 10 dB higher than that of the matched filter. In multipath environments, the detection performance of the matched filter degrades badly, while that of the proposed method improves. This shows that the method is much more robust to multipath effects.

  19. Improved AAG based recognization of machining feature

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    The information lost through feature interaction is restored by using auxiliary faces (AF) and virtual links (VL). The delta volume of the interacting features, represented by a concave attachable connected graph (CACG), can be decomposed into several isolated features, each represented by a complete concave adjacency graph (CCAG). The sketchy type of a feature can be recognized by using the CCAG as a hint; the exact type of the feature is obtained by deleting the auxiliary faces from the isolated feature. A united machining feature (UMF) is used to represent features that can be machined in the same machining process. This is important for rationalizing process plans and reducing machining time. An example is given to demonstrate the effectiveness of the method.

  20. Non-parametric Bayesian mixture of sparse regressions with application towards feature selection for statistical downscaling

    Directory of Open Access Journals (Sweden)

    D. Das

    2014-04-01

    Climate projections simulated by Global Climate Models (GCMs) are often used for assessing the impacts of climate change. However, the relatively coarse resolution of GCM outputs often precludes their application to accurately assessing the effects of climate change on finer regional-scale phenomena. Downscaling of climate variables from coarser to finer regional scales using statistical methods is often performed for regional climate projections. Statistical downscaling (SD) is based on the understanding that the regional climate is influenced by two factors: the large-scale climatic state and the regional or local features. A transfer function approach to SD involves learning a regression model that relates these features (predictors) to a climatic variable of interest (predictand) based on past observations. However, a single regression model is often not sufficient to describe the complex dynamic relationships between the predictors and the predictand. We focus on the covariate selection part of the transfer function approach and propose a nonparametric Bayesian mixture of sparse regression models based on the Dirichlet Process (DP), for simultaneous clustering and discovery of covariates within the clusters while automatically finding the number of clusters. Sparse linear models are parsimonious and hence relatively more generalizable than non-sparse alternatives, and lend themselves to domain-relevant interpretation. Applications to synthetic data demonstrate the value of the new approach, and preliminary results related to feature selection for statistical downscaling show that our method can lead to new insights.

  1. Biosensor method and system based on feature vector extraction

    Science.gov (United States)

    Greenbaum, Elias [Knoxville, TN; Rodriguez, Jr., Miguel; Qi, Hairong [Knoxville, TN; Wang, Xiaoling [San Jose, CA

    2012-04-17

    A method of biosensor-based detection of toxins comprises the steps of providing at least one time-dependent control signal generated by a biosensor in a gas or liquid medium, and obtaining a time-dependent biosensor signal from the biosensor in the gas or liquid medium to be monitored or analyzed for the presence of one or more toxins selected from chemical, biological or radiological agents. The time-dependent biosensor signal is processed to obtain a plurality of feature vectors using at least one of amplitude statistics and a time-frequency analysis. At least one parameter relating to toxicity of the gas or liquid medium is then determined from the feature vectors based on reference to the control signal.

  2. An Organelle Correlation-Guided Feature Selection Approach for Classifying Multi-Label Subcellular Bio-images.

    Science.gov (United States)

    Shao, Wei; Liu, Mingxia; Xu, Ying-Ying; Shen, Hong-Bin; Zhang, Daoqiang

    2017-03-03

    With the advances in microscopic imaging, accurate classification of bioimage-based protein subcellular location patterns has attracted as much attention as ever. One of the basic challenges is how to select the useful feature components from among thousands of potential features to describe the images. This is not an easy task, especially given the high ratio of multi-location proteins. Existing feature selection methods seldom take the correlation among different cellular compartments into consideration, and thus may miss some features that are co-important for several subcellular locations. To deal with this problem, we make use of the important structural correlation among different cellular compartments and propose an organelle structural correlation regularized feature selection method, CSF (Common-Sets of Features), in this paper. We formulate the multi-label classification problem by adopting a group-sparsity regularizer to select common subsets of relevant features from different cellular compartments. In addition, we add a cell structural correlation regularized Laplacian term, which utilizes prior biological structural information to capture the intrinsic dependency among different cellular compartments. CSF provides a new feature selection strategy for multi-label bio-image subcellular pattern classification, and the experimental results show its superiority when compared with several existing algorithms.

  3. Advances in feature selection methods for hyperspectral image processing in food industry applications: a review.

    Science.gov (United States)

    Dai, Qiong; Cheng, Jun-Hu; Sun, Da-Wen; Zeng, Xin-An

    2015-01-01

    There is increasing interest in the applications of hyperspectral imaging (HSI) for assessing food quality, safety, and authenticity. HSI provides an abundance of spatial and spectral information from foods by combining spectroscopy and imaging, resulting in hundreds of contiguous wavebands for each spatial position of a food sample, which gives rise to the curse of dimensionality. It is desirable to employ feature selection algorithms to decrease the computational burden and increase prediction accuracy, which is especially relevant in the development of online applications. Recently, a variety of feature selection algorithms have been proposed that can be categorized into three groups based on the search strategy, namely complete search, heuristic search and random search. This review introduces the fundamentals of each algorithm, illustrates its applications in hyperspectral data analysis in the food field, and discusses the advantages and disadvantages of these algorithms. It is hoped that this review will provide a guideline for feature selection and data processing in the future development of hyperspectral imaging techniques for foods.

  4. Feature and Model Selection in Feedforward Neural Networks

    Science.gov (United States)

    1994-06-01

    output of middle node j - M is the number of feature inputs - wij is the weight from input node i to middle node j - e0 is the input layer bias term, and is...is the updated weight from input i to middle node j - (wtuj) is the old weight from input i to middle node j - q7 is the step size - 62 = (d

  5. Intelligent feature selection techniques for pattern classification of Lamb wave signals

    Energy Technology Data Exchange (ETDEWEB)

    Hinders, Mark K.; Miller, Corey A. [College of William and Mary, Department of Applied Science, Williamsburg, Virginia 23187-8795 (United States)

    2014-02-18

    Lamb wave interaction with flaws is a complex, three-dimensional phenomenon, which often frustrates signal interpretation schemes based on mode arrival time shifts predicted by dispersion curves. As the flaw severity increases, scattering and mode conversion effects will often dominate the time-domain signals, obscuring available information about flaws because multiple modes may arrive on top of each other. Even for idealized flaw geometries the scattering and mode conversion behavior of Lamb waves is very complex. Here, multi-mode Lamb waves in a metal plate are propagated across a rectangular flat-bottom hole in a sequence of pitch-catch measurements corresponding to the double crosshole tomography geometry. The flaw is sequentially deepened, with the Lamb wave measurements repeated at each flaw depth. Lamb wave tomography reconstructions are used to identify which waveforms have interacted with the flaw and thereby carry information about its depth. Multiple features are extracted from each of the Lamb wave signals using wavelets, which are then fed to statistical pattern classification algorithms that identify flaw severity. In order to achieve the highest classification accuracy, an optimal feature space is required but it’s never known a priori which features are going to be best. For structural health monitoring we make use of the fact that physical flaws, such as corrosion, will only increase over time. This allows us to identify feature vectors which are topologically well-behaved by requiring that sequential classes “line up” in feature vector space. An intelligent feature selection routine is illustrated that identifies favorable class distributions in multi-dimensional feature spaces using computational homology theory. Betti numbers and formal classification accuracies are calculated for each feature space subset to establish a correlation between the topology of the class distribution and the corresponding classification accuracy.

  6. Enhancing the Performance of LibSVM Classifier by Kernel F-Score Feature Selection

    Science.gov (United States)

    Sarojini, Balakrishnan; Ramaraj, Narayanasamy; Nickolas, Savarimuthu

    Medical data mining is the search for relationships and patterns within medical datasets that could provide useful knowledge for effective clinical decisions. Including irrelevant, redundant and noisy features in the process model results in poor predictive accuracy. Much research in data mining has gone into improving the predictive accuracy of classifiers by applying feature selection techniques. Feature selection is valuable in medical data mining because it allows a disease to be diagnosed, as part of patient care, with a minimum number of significant features. The objective of this work is to show that selecting the more significant features improves the performance of the classifier. We empirically evaluate the classification effectiveness of the LibSVM classifier on the reduced feature subset of a diabetes dataset. The evaluations suggest that the selected feature subset improves the predictive accuracy of the classifier and reduces false negatives and false positives.

  7. Effective automated feature construction and selection for classification of biological sequences.

    Directory of Open Access Journals (Sweden)

    Uday Kamath

    Many open problems in bioinformatics involve elucidating underlying functional signals in biological sequences. DNA sequences, in particular, are characterized by rich architectures in which functional signals are increasingly found to combine local and distal interactions at the nucleotide level. Problems of interest include detection of regulatory regions, splice sites, exons, hypersensitive sites, and more. These problems naturally lend themselves to formulation as classification problems in machine learning. When classification is based on features extracted from the sequences under investigation, success is critically dependent on the chosen set of features. We present an algorithmic framework (EFFECT) for automated detection of functional signals in biological sequences. We focus here on classification problems involving DNA sequences, which state-of-the-art work in machine learning shows to be challenging and to involve complex combinations of local and distal features. EFFECT uses a two-stage process to first construct a set of candidate sequence-based features and then select a most effective subset for the classification task at hand. Both stages make heavy use of evolutionary algorithms to efficiently guide the search towards informative features capable of discriminating between sequences that contain a particular functional signal and those that do not. To demonstrate its generality, EFFECT is applied to three separate problems of importance in DNA research: the recognition of hypersensitive sites, splice sites, and ALU sites. Comparisons with state-of-the-art algorithms show that the framework is both general and powerful. In addition, a detailed analysis of the constructed features shows that they contain valuable biological information about DNA architecture, allowing biologists and other researchers to directly inspect the features and potentially use the insights obtained to assist wet-laboratory studies on retainment or modification

  8. Deep sparse multi-task learning for feature selection in Alzheimer's disease diagnosis.

    Science.gov (United States)

    Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang

    2016-06-01

    Recently, neuroimaging-based Alzheimer's disease (AD) or mild cognitive impairment (MCI) diagnosis has attracted researchers in the field, due to the increasing prevalence of the diseases. Unfortunately, the high-dimensional nature of neuroimaging data, combined with the small number of samples available, makes it challenging to build a robust computer-aided diagnosis system. Machine learning techniques have been considered a useful tool in this respect and, among various methods, sparse regression has shown its validity in the literature. However, to the best of our knowledge, the existing sparse regression methods mostly try to select features based on the optimal regression coefficients in one step. We argue that since the training feature vectors are composed of both informative and uninformative or less informative features, the resulting optimal regression coefficients are inevitably affected by the uninformative or less informative features. To this end, we first propose a novel deep architecture to recursively discard uninformative features by performing sparse multi-task learning in a hierarchical fashion. We further hypothesize that the optimal regression coefficients reflect the relative importance of features in representing the target response variables. In this regard, we use the optimal regression coefficients learned in one hierarchy as feature weighting factors in the following hierarchy, and formulate a weighted sparse multi-task learning method. Lastly, we also take into account the distributional characteristics of samples per class and use clustering-induced subclass label vectors as target response values in our sparse regression model. In our experiments on the ADNI cohort, we performed both binary and multi-class classification tasks in AD/MCI diagnosis and showed the superiority of the proposed method by comparing with the state-of-the-art methods.

  9. RESEARCH OF INTEGRATED FEATURE SELECTION METHOD BASED ON THE EARLY DIAGNOSIS OF ALZHEIMER'S DISEASE

    Institute of Scientific and Technical Information of China (English)

    曹元磊; 胡斌; 高翔

    2016-01-01

    Alzheimer's disease seriously affects human life and is difficult to cure. Diagnosing mild cognitive impairment (MCI), its early stage, is therefore the key to delaying progression and to treatment. Magnetic resonance imaging (MRI) provides important image data for diagnosing brain diseases. Analysing MRI and applying a classification algorithm to separate MCI patients from normal controls is an important approach, and feature selection is an essential step for improving classification accuracy. We propose an integrated feature selection method that combines mutual information and the Pearson correlation coefficient; it not only examines the correlation between each feature and the class labels but also ensures minimum redundancy among the selected features. Compared with support vector machine classification using the mutual information method alone and the max-relevance and min-redundancy (mRMR) method, the proposed method achieves higher classification accuracy, which illustrates its advantages.
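
    The idea of pairing mutual information (relevance to the class label) with the Pearson correlation coefficient (redundancy among chosen features) can be sketched as a greedy selector. This is only an illustration under stated assumptions: the abstract does not give the authors' scoring formula, so the "relevance minus mean absolute correlation" score, the function names and the toy data below are all hypothetical, and features are taken to be discrete for simplicity.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(features, labels, k):
    """Greedily pick k features by relevance minus redundancy (assumed score)."""
    relevance = {
        name: mutual_information(col, labels) for name, col in features.items()
    }
    selected = []
    while len(selected) < min(k, len(features)):
        best_name, best_score = None, float("-inf")
        for name in features:
            if name in selected:
                continue
            # Redundancy: mean |Pearson r| against features already chosen.
            redundancy = (
                sum(abs(pearson(features[name], features[s])) for s in selected)
                / len(selected)
                if selected
                else 0.0
            )
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected


# Hypothetical toy data: "a_dup" duplicates "a"; "b" is weaker but less redundant.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 0],
    "a_dup": [0, 0, 0, 0, 1, 1, 1, 0],
    "b": [1, 0, 0, 0, 0, 1, 1, 1],
}
chosen = select_features(features, labels, 2)
```

    On this toy data the selector takes `a` first and then prefers `b` over the exact duplicate `a_dup`, which is the behaviour the redundancy term is meant to produce.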

  10. A combinational feature selection and ensemble neural network method for classification of gene expression data

    Directory of Open Access Journals (Sweden)

    Jiang Tianzi

    2004-09-01

    Background: Microarray experiments are becoming a powerful tool for clinical diagnosis, as they have the potential to discover gene expression patterns that are characteristic of a particular disease. To date, this problem has received most attention in the context of cancer research, especially tumor classification. Various feature selection methods and classifier design strategies have been used and compared. However, most published articles on tumor classification have applied a certain technique to a certain dataset, and only recently have several researchers compared these techniques on several public datasets. It has been verified that differently selected features reflect different aspects of the dataset, and that some selected features yield better solutions on certain problems. At the same time, given a large amount of microarray data and little prior knowledge, it is difficult to find the intrinsic characteristics using traditional methods. In this paper, we introduce a combinational feature selection method in conjunction with ensemble neural networks to improve the accuracy and robustness of sample classification. Results: We validate our new method on several recent publicly available datasets, both by the predictive accuracy on testing samples and through cross-validation. Compared with the best performance of other current methods, remarkably improved results are obtained using our new strategy on a wide range of different datasets. Conclusions: We conclude that our method can extract more information from microarray data to give more accurate classification, and can also help to extract the latent marker genes of the diseases for better diagnosis and treatment.

  11. Intrusion Detection Feature Selection Based on Improved Quantum Genetic Algorithm

    Institute of Scientific and Technical Information of China (English)

    刘晙; 狄文辉

    2011-01-01

    Intrusion detection requires analysing the features of the input data beforehand, and the data involved are high-dimensional. Based on the characteristics of intrusion detection, feature selection is treated as an optimization problem and solved with a quantum genetic algorithm, making full use of its parallel processing and global search capabilities to improve classification quality, reduce the problem scale, eliminate redundant attributes and speed up data processing. Experiments on the KDD CUP 1999 dataset show that, compared with the genetic algorithm and the particle swarm algorithm, this method streamlines the features more effectively and improves classification quality.

  12. Feature selection of seismic waveforms for long period event detection at Cotopaxi Volcano

    Science.gov (United States)

    Lara-Cueva, R. A.; Benítez, D. S.; Carrera, E. V.; Ruiz, M.; Rojo-Álvarez, J. L.

    2016-04-01

    Volcano Early Warning Systems (VEWS) have become a research topic in order to protect human lives and reduce material losses. In this setting, event detection criteria based on classification using machine learning techniques have proven useful, and a number of systems have been proposed in the literature. However, to the best of our knowledge, no comprehensive and principled study has been conducted to compare the influence of the many different sets of possible features that have been used as input spaces in previous works. We present an automatic recognition system for volcano seismicity that considers feature extraction, event classification, and subsequent event detection, in order to reduce the processing time as a first step towards a high-reliability automatic detection system in real time. We compiled and extracted a comprehensive set of temporal, moving average, spectral, and scale-domain features for separating long period seismic events from background noise. We benchmarked two usual kinds of feature selection techniques, namely filter (mutual information and statistical dependence) and embedded (cross-validation and pruning), each with a suitable classification algorithm such as k Nearest Neighbors (k-NN) and Decision Trees (DT). We applied this approach to the seismicity recorded at Cotopaxi Volcano in Ecuador during 2009 and 2010. The best results were obtained by using a 15 s segmentation window, a feature matrix in the frequency domain, and a DT classifier, yielding 99% detection accuracy and sensitivity. Selected features and their interpretation were consistent among different input spaces, in simple terms of amplitude and spectral content. Our study provides the framework for an event detection system with high accuracy and reduced computational requirements.

  13. The fate of task-irrelevant visual motion: perceptual load versus feature-based attention.

    Science.gov (United States)

    Taya, Shuichiro; Adams, Wendy J; Graf, Erich W; Lavie, Nilli

    2009-11-18

    We tested contrasting predictions derived from perceptual load theory and from recent feature-based selection accounts. Observers viewed moving, colored stimuli and performed low or high load tasks associated with one stimulus feature, either color or motion. The resultant motion aftereffect (MAE) was used to evaluate attentional allocation. We found that task-irrelevant visual features received less attention than co-localized task-relevant features of the same objects. Moreover, when color and motion features were co-localized yet perceived to belong to two distinct surfaces, feature-based selection was further increased at the expense of object-based co-selection. Load theory predicts that the MAE for task-irrelevant motion would be reduced with a higher load color task. However, this was not seen for co-localized features; perceptual load only modulated the MAE for task-irrelevant motion when this was spatially separated from the attended color location. Our results suggest that perceptual load effects are mediated by spatial selection and do not generalize to the feature domain. Feature-based selection operates to suppress processing of task-irrelevant, co-localized features, irrespective of perceptual load.

  14. Evaluating EMG Feature and Classifier Selection for Application to Partial-Hand Prosthesis Control

    Science.gov (United States)

    Adewuyi, Adenike A.; Hargrove, Levi J.; Kuiken, Todd A.

    2016-01-01

    Pattern recognition-based myoelectric control of upper-limb prostheses has the potential to restore control of multiple degrees of freedom. Though this control method has been extensively studied in individuals with higher-level amputations, few studies have investigated its effectiveness for individuals with partial-hand amputations. Most partial-hand amputees retain a functional wrist and the ability of pattern recognition-based methods to correctly classify hand motions from different wrist positions is not well studied. In this study, focusing on partial-hand amputees, we evaluate (1) the performance of non-linear and linear pattern recognition algorithms and (2) the performance of optimal EMG feature subsets for classification of four hand motion classes in different wrist positions for 16 non-amputees and 4 amputees. Our results show that linear discriminant analysis and linear and non-linear artificial neural networks perform significantly better than the quadratic discriminant analysis for both non-amputees and partial-hand amputees. For amputees, including information from multiple wrist positions significantly decreased error (p < 0.001) but no further significant decrease in error occurred when more than 4, 2, or 3 positions were included for the extrinsic (p = 0.07), intrinsic (p = 0.06), or combined extrinsic and intrinsic muscle EMG (p = 0.08), respectively. Finally, we found that a feature set determined by selecting optimal features from each channel outperformed the commonly used time domain (p < 0.001) and time domain/autoregressive feature sets (p < 0.01). This method can be used as a screening filter to select the features from each channel that provide the best classification of hand postures across different wrist positions. PMID:27807418

  15. Evaluating EMG Feature and Classifier Selection for Application to Partial-Hand Prosthesis Control.

    Science.gov (United States)

    Adewuyi, Adenike A; Hargrove, Levi J; Kuiken, Todd A

    2016-01-01

    Pattern recognition-based myoelectric control of upper-limb prostheses has the potential to restore control of multiple degrees of freedom. Though this control method has been extensively studied in individuals with higher-level amputations, few studies have investigated its effectiveness for individuals with partial-hand amputations. Most partial-hand amputees retain a functional wrist and the ability of pattern recognition-based methods to correctly classify hand motions from different wrist positions is not well studied. In this study, focusing on partial-hand amputees, we evaluate (1) the performance of non-linear and linear pattern recognition algorithms and (2) the performance of optimal EMG feature subsets for classification of four hand motion classes in different wrist positions for 16 non-amputees and 4 amputees. Our results show that linear discriminant analysis and linear and non-linear artificial neural networks perform significantly better than the quadratic discriminant analysis for both non-amputees and partial-hand amputees. For amputees, including information from multiple wrist positions significantly decreased error (p < 0.001) but no further significant decrease in error occurred when more than 4, 2, or 3 positions were included for the extrinsic (p = 0.07), intrinsic (p = 0.06), or combined extrinsic and intrinsic muscle EMG (p = 0.08), respectively. Finally, we found that a feature set determined by selecting optimal features from each channel outperformed the commonly used time domain (p < 0.001) and time domain/autoregressive feature sets (p < 0.01). This method can be used as a screening filter to select the features from each channel that provide the best classification of hand postures across different wrist positions.

  16. Evaluating EMG Feature and Classifier Selection for Application to Partial-Hand Prosthesis Control

    Directory of Open Access Journals (Sweden)

    Adenike A. Adewuyi

    2016-10-01

    Pattern recognition-based myoelectric control of upper limb prostheses has the potential to restore control of multiple degrees of freedom. Though this control method has been extensively studied in individuals with higher-level amputations, few studies have investigated its effectiveness for individuals with partial-hand amputations. Most partial-hand amputees retain a functional wrist and the ability of pattern recognition-based methods to correctly classify hand motions from different wrist positions is not well studied. In this study, focusing on partial-hand amputees, we evaluate (1) the performance of non-linear and linear pattern recognition algorithms and (2) the performance of optimal EMG feature subsets for classification of four hand motion classes in different wrist positions for 16 non-amputees and 4 amputees. Our results show that linear discriminant analysis and linear and non-linear artificial neural networks perform significantly better than the quadratic discriminant analysis for both non-amputees and partial-hand amputees. For amputees, including information from multiple wrist positions significantly decreased error (p < 0.001) but no further significant decrease in error occurred when more than 4, 2, or 3 positions were included for the extrinsic (p = 0.07), intrinsic (p = 0.06), or combined extrinsic and intrinsic muscle EMG (p = 0.08), respectively. Finally, we found that a feature set determined by selecting optimal features from each channel outperformed the commonly used time domain (p < 0.001) and time domain/autoregressive feature sets (p < 0.01). This method can be used as a screening filter to select the features from each channel that provide the best classification of hand postures across different wrist positions.

  17. Feature Subset Selection by Estimation of Distribution Algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Cantu-Paz, E

    2002-01-17

    This paper describes the application of four evolutionary algorithms to the identification of feature subsets for classification problems. Besides a simple GA, the paper considers three estimation of distribution algorithms (EDAs): a compact GA, an extended compact GA, and the Bayesian Optimization Algorithm. The objective is to determine if the EDAs present advantages over the simple GA in terms of accuracy or speed in this problem. The experiments used a Naive Bayes classifier and public-domain and artificial data sets. In contrast with previous studies, we did not find evidence to support or reject the use of EDAs for this problem.

  18. Identification of bacteriophage virion proteins by the ANOVA feature selection and analysis.

    Science.gov (United States)

    Ding, Hui; Feng, Peng-Mian; Chen, Wei; Lin, Hao

    2014-08-01

    The bacteriophage virion proteins play extremely important roles in the fate of host bacterial cells. Accurate identification of bacteriophage virion proteins is very important for understanding their functions and clarifying the lysis mechanism of bacterial cells. In this study, a new sequence-based method was developed to identify phage virion proteins. In the new method, the protein sequences were initially formulated by the g-gap dipeptide compositions. Subsequently, the analysis of variance (ANOVA) with incremental feature selection (IFS) was used to search for the optimal feature set. It was observed that, in jackknife cross-validation, the optimal feature set including 160 optimized features can produce the maximum accuracy of 85.02%. By performing feature analysis, we found that the correlation between two amino acids with one gap was more important than other correlations for phage virion protein prediction and that some of the 1-gap dipeptides were important and mainly contributed to the virion protein prediction. This analysis will provide novel insights into the function of phage virion proteins. On the basis of the proposed method, an online web-server, PVPred, was established and can be freely accessed from the website (http://lin.uestc.edu.cn/server/PVPred). We believe that the PVPred will become a powerful tool to study phage virion proteins and to guide the related experimental validations.
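
    The ANOVA-plus-IFS procedure described above can be sketched roughly as follows: rank features by their one-way ANOVA F statistic, then grow the subset one ranked feature at a time and keep the best-scoring prefix. This is a minimal sketch under stated assumptions, not the authors' implementation; the function names, the toy data and the `evaluate` callback (which would be a jackknife accuracy in the study's setting) are hypothetical.

```python
def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    ss_within = sum((v - sum(g) / len(g)) ** 2
                    for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def rank_features(rows, labels):
    """Return feature indices sorted by decreasing F statistic."""
    n_features = len(rows[0])
    scores = [anova_f([r[j] for r in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

def incremental_feature_selection(order, evaluate):
    """Grow the subset one ranked feature at a time; keep the best prefix."""
    best_k = max(range(1, len(order) + 1), key=lambda k: evaluate(order[:k]))
    return order[:best_k]

# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
rows = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.2, 4.5]]
labels = [0, 0, 1, 1]
order = rank_features(rows, labels)
print(order)
```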

  19. Local Feature based Gender Independent Bangla ASR

    Directory of Open Access Journals (Sweden)

    Bulbul Ahamed

    2012-11-01

    Full Text Available This paper presents an automatic speech recognition (ASR) system for Bangla (widely known as Bengali) that suppresses speaker gender effects using local features extracted from the input speech. Speaker-specific characteristics play an important role in the performance of Bangla ASR. The gender factor has an adverse effect on the classifier when the speech to be recognized comes from the opposite gender, for example when a classifier trained on male speech is tested on female speech, or vice versa. To obtain a robust ASR system in practice, it is necessary to build a system that is independent of speaker gender. In this paper, we propose a gender-independent technique for ASR that focuses on the gender factor. The proposed method trains the classifier with both genders, male and female, and evaluates the classifier on male and female speech. For the experiments, we designed a medium-sized Bangla speech corpus covering both male and female speakers. The proposed system shows a significant improvement in word correct rates, word accuracies and sentence correct rates in comparison with the method that suffers from gender effects. Moreover, it provides the highest recognition performance with fewer mixture components in the hidden Markov models (HMMs).

  20. Fuzzy rough sets, and a granular neural network for unsupervised feature selection.

    Science.gov (United States)

    Ganivada, Avatharam; Ray, Shubhra Sankar; Pal, Sankar K

    2013-12-01

    A granular neural network for identifying salient features of data, based on the concepts of fuzzy set and a newly defined fuzzy rough set, is proposed. The formation of the network mainly involves an input vector, initial connection weights and a target value. Each feature of the data is normalized between 0 and 1 and used to develop granulation structures by a user defined α-value. The input vector and the target value of the network are defined using granulation structures, based on the concept of fuzzy sets. The same granulation structures are also presented to a decision system. The decision system helps in extracting the domain knowledge about data in the form of dependency factors, using the notion of new fuzzy rough set. These dependency factors are assigned as the initial connection weights of the proposed network. It is then trained using minimization of a novel feature evaluation index in an unsupervised manner. The effectiveness of the proposed network, in evaluating selected features, is demonstrated on several real-life datasets. The results of FRGNN are found to be statistically more significant than related methods in 28 instances of 40 instances, i.e., 70% of instances, using the paired t-test.

  1. Face Recognition Based on Facial Features

    Directory of Open Access Journals (Sweden)

    Muhammad Sharif

    2012-08-01

    Full Text Available Over the last decade, several different methods have been proposed and developed for face recognition, one of the most stimulating areas of image processing. Face recognition processes have various applications in security systems and crime investigation systems. The study comprises three phases: face detection, facial feature extraction and face recognition. The first phase is the face detection process, in which the region of interest, i.e., the feature region, is extracted. The second phase is feature extraction, in which the facial features, i.e., eyes, nose and lips, are extracted from the detected face area. The last phase is face recognition, which uses the extracted left eye for recognition by combining Eigenfeatures and Fisherfeatures.

  2. Multi-Feature Segmentation and Cluster based Approach for Product Feature Categorization

    Directory of Open Access Journals (Sweden)

    Bharat Singh

    2016-03-01

    Full Text Available In recent times, the web has become a valuable source of online consumer reviews; however, the number of reviews is growing at high speed. It is infeasible for a user to read all reviews to make a well-informed decision, because people can describe the same feature with different words or phrases. To produce a useful summary, domain synonym words and phrases need to be grouped into the same feature group. We focus on the feature-based opinion mining problem, and this paper mainly studies feature-based product categorization from the large number of user-generated reviews available on different websites. First, a multi-feature segmentation method is proposed, which segments multi-feature review sentences into single-feature units. Second, a part-of-speech dictionary and context information are used to identify irrelevant features, sentiment words are used to identify the polarity of each feature, and finally an unsupervised clustering-based product feature categorization method is proposed. Clustering is an unsupervised machine learning approach that groups features with a high degree of similarity into the same cluster. The proposed approach provides satisfactory results and can achieve 100% average precision for the clustering-based product feature categorization task. This approach is applicable to different products.

  3. Ischemia episode detection in ECG using kernel density estimation, support vector machine and feature selection

    Directory of Open Access Journals (Sweden)

    Park Jinho

    2012-06-01

    Full Text Available Background: Myocardial ischemia can develop into more serious diseases. Detecting the ischemic syndrome in the electrocardiogram (ECG) early, more accurately and automatically can prevent it from developing into a catastrophic disease. To this end, we propose a new method, which employs wavelets and simple feature selection. Methods: For training and testing, the European ST-T database is used, which comprises 367 ischemic ST episodes in 90 records. We first remove baseline wandering, and detect time positions of QRS complexes by a method based on the discrete wavelet transform. Next, for each heart beat, we extract three features which can be used for differentiating ST episodes from normal: (1) the area between QRS offset and T-peak points, (2) the normalized and signed sum from QRS offset to effective zero voltage point, and (3) the slope from QRS onset to offset point. We average the feature values over five successive beats to reduce the effects of outliers. Finally we apply classifiers to those features. Results: We evaluated the algorithm by kernel density estimation (KDE) and support vector machine (SVM) methods. Sensitivity and specificity for KDE were 0.939 and 0.912, respectively. The KDE classifier detects 349 ischemic ST episodes out of total 367 ST episodes. Sensitivity and specificity of SVM were 0.941 and 0.923, respectively. The SVM classifier detects 355 ischemic ST episodes. Conclusions: We proposed a new method for detecting ischemia in ECG. It contains signal processing techniques of removing baseline wandering and detecting time positions of QRS complexes by discrete wavelet transform, and feature extraction from morphology of ECG waveforms explicitly. It was shown that the number of selected features was sufficient to discriminate ischemic ST episodes from the normal ones. We also showed how the proposed KDE classifier can automatically select kernel bandwidths, meaning that the algorithm does not require any numerical
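
    The step of averaging features over five successive beats could look roughly like the sketch below. The abstract does not say whether the windows overlap; a sliding (overlapping) window is assumed here, and the function name and toy values are hypothetical.

```python
def average_over_beats(features, window=5):
    """Sliding mean of per-beat feature vectors over `window` successive beats."""
    averaged = []
    for end in range(window, len(features) + 1):
        block = features[end - window:end]
        # zip(*block) regroups the block column by column (one column per feature).
        averaged.append([sum(col) / window for col in zip(*block)])
    return averaged

# Six beats with a single hypothetical feature; the fifth beat is an outlier.
beats = [[1.0], [2.0], [3.0], [4.0], [100.0], [6.0]]
smoothed = average_over_beats(beats)
print(smoothed)
```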

  4. Innovations in individual feature history management - The significance of feature-based temporal model

    Science.gov (United States)

    Choi, J.; Seong, J.C.; Kim, B.; Usery, E.L.

    2008-01-01

    A feature relies on three dimensions (space, theme, and time) for its representation. Even though spatiotemporal models have been proposed, they have principally focused on the spatial changes of a feature. In this paper, a feature-based temporal model is proposed to represent the changes of both space and theme independently. The proposed model modifies the ISO's temporal schema and adds a new explicit temporal relationship structure that stores temporal topological relationships with the ISO's temporal primitives of a feature in order to keep track of feature history. The explicit temporal relationship can enhance query performance on feature history by removing topological comparison during the query process. Further, a prototype system has been developed to test the proposed feature-based temporal model by querying land parcel history in Athens, Georgia. The result of the temporal query on individual feature history shows the efficiency of the explicit temporal relationship structure. © Springer Science+Business Media, LLC 2007.

  5. Welding Diagnostics by Means of Particle Swarm Optimization and Feature Selection

    Directory of Open Access Journals (Sweden)

    J. Mirapeix

    2012-01-01

    Full Text Available In a previous contribution, a welding diagnostics approach based on plasma optical spectroscopy was presented. It consisted of the employment of optimization algorithms and synthetic spectra to obtain the participation profiles of the species participating in the plasma. A modification of the model is discussed here: on the one hand the controlled random search algorithm has been substituted by a particle swarm optimization implementation. On the other hand a feature selection stage has been included to determine those spectral windows where the optimization process will take place. Both experimental and field tests will be shown to illustrate the performance of the solution that improves the results of the previous work.

  6. Fingerprint Feature Extraction Based on Macroscopic Curvature

    Institute of Scientific and Technical Information of China (English)

    Zhang Xiong; He Gui-ming; Zhang Yun

    2003-01-01

    In the Automatic Fingerprint Identification System (AFIS), extracting fingerprint features is very important. The local curvature of fingerprint ridges is irregular, which makes it difficult to extract fingerprint curve features that effectively describe a fingerprint. This article proposes a novel algorithm that uses information from a few nearby fingerprint ridges to extract a new characteristic describing the curvature of the fingerprint. Experimental results show that the algorithm is feasible and that the characteristics it extracts clearly show the inner macroscopic curve properties of the fingerprint. The results also show that this kind of characteristic is robust to noise and pollution.

  7. Fingerprint Feature Extraction Based on Macroscopic Curvature

    Institute of Scientific and Technical Information of China (English)

    Zhang Xiong; He Gui-Ming; et al.

    2003-01-01

    In the Automatic Fingerprint Identification System (AFIS), extracting fingerprint features is very important. The local curvature of fingerprint ridges is irregular, which makes it difficult to extract fingerprint curve features that effectively describe a fingerprint. This article proposes a novel algorithm that uses information from a few nearby fingerprint ridges to extract a new characteristic describing the curvature of the fingerprint. Experimental results show that the algorithm is feasible and that the characteristics it extracts clearly show the inner macroscopic curve properties of the fingerprint. The results also show that this kind of characteristic is robust to noise and pollution.

  8. Channel Selection and Feature Projection for Cognitive Load Estimation Using Ambulatory EEG

    Directory of Open Access Journals (Sweden)

    Tian Lan

    2007-01-01

    Full Text Available We present an ambulatory cognitive state classification system to assess the subject's mental load based on EEG measurements. The ambulatory cognitive state estimator is utilized in the context of a real-time augmented cognition (AugCog) system that aims to enhance the cognitive performance of a human user through computer-mediated assistance based on assessments of cognitive states using physiological signals including, but not limited to, EEG. This paper focuses particularly on the offline channel selection and feature projection phases of the design and aims to present mutual-information-based techniques that use a simple sample estimator for this quantity. Analyses conducted on data collected from 3 subjects performing 2 tasks (n-back/Larson) at 2 difficulty levels (low/high) demonstrate that the proposed mutual-information-based dimensionality reduction scheme can achieve up to 94% cognitive load estimation accuracy.
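
    To make the idea of a sample-based mutual information estimate concrete, the sketch below uses a plug-in (frequency-count) estimator on discretised values and scores one channel against the class labels. The abstract does not specify which estimator was used, so this particular estimator, the function name and the toy data are assumptions for illustration only.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum of p(x, y) * log2( p(x, y) / (p(x) p(y)) ) over observed pairs.
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

# Hypothetical discretised channel features and load labels (0 = low, 1 = high).
load = [0, 0, 1, 1]
informative_channel = [0, 0, 1, 1]     # mirrors the label exactly
uninformative_channel = [0, 1, 0, 1]   # statistically independent of the label
print(mutual_information(informative_channel, load))
print(mutual_information(uninformative_channel, load))
```

    Channels can then be ranked by this score and the highest-scoring ones retained.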

  9. Clustering Based Feature Learning on Variable Stars

    CERN Document Server

    Mackenzie, Cristóbal; Protopapas, Pavlos

    2016-01-01

    The success of automatic classification of variable stars strongly depends on the lightcurve representation. Usually, lightcurves are represented as a vector of many statistical descriptors designed by astronomers called features. These descriptors commonly demand significant computational power to calculate, require substantial research effort to develop and do not guarantee good performance on the final classification task. Today, lightcurve representation is not entirely automatic; algorithms that extract lightcurve features are designed by humans and must be manually tuned up for every survey. The vast amounts of data that will be generated in future surveys like LSST mean astronomers must develop analysis pipelines that are both scalable and automated. Recently, substantial efforts have been made in the machine learning community to develop methods that prescind from expert-designed and manually tuned features for features that are automatically learned from data. In this work we present what is, to our ...

  10. Power Quality Disturbances Feature Selection and Recognition Using Optimal Multi-Resolution Fast S-Transform and CART Algorithm

    Directory of Open Access Journals (Sweden)

    Nantian Huang

    2016-11-01

    Full Text Available In order to improve the recognition accuracy and efficiency of power quality disturbances (PQD) in microgrids, a novel PQD feature selection and recognition method based on optimal multi-resolution fast S-transform (OMFST) and the classification and regression tree (CART) algorithm is proposed. Firstly, OMFST is carried out according to the frequency domain characteristic of the disturbance signal, and 67 features are extracted by time-frequency analysis to construct the original feature set. Subsequently, the optimal feature subset is determined by Gini importance and sorted according to an embedded feature selection method based on the Gini index. Finally, the one-standard-error rule for subtree evaluation is applied for cost complexity pruning. After pruning, the optimal decision tree (ODT) is obtained for PQD classification. The experiments show that the new method can effectively improve the classification efficiency and accuracy with the feature selection step. Simultaneously, the ODT can be constructed automatically according to the ability of feature classification. In different noise environments, the classification accuracy of the new method is higher than that of methods based on probabilistic neural network, extreme learning machine, and support vector machine.
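
    For readers unfamiliar with the Gini index that CART uses, the sketch below computes the Gini impurity of a label set and the impurity decrease achieved by one threshold split. Summing such decreases over all splits on a feature is the usual basis of Gini importance. The function names, class labels and feature values are hypothetical, and the paper's OMFST features are not reproduced.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_decrease(values, labels, threshold):
    """Impurity decrease from splitting one feature at `threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0                 # a split that separates nothing gains nothing
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

# Hypothetical disturbance classes and one hypothetical amplitude feature.
classes = ["sag", "sag", "swell", "swell"]
amplitude = [0.1, 0.2, 0.8, 0.9]
print(gini(classes), gini_decrease(amplitude, classes, 0.5))
```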

  11. COMPUTATIONALLY INEXPENSIVE SEQUENTIAL FORWARD FLOATING SELECTION FOR ACQUIRING SIGNIFICANT FEATURES FOR AUTHORSHIP INVARIANCENESS IN WRITER IDENTIFICATION

    OpenAIRE

    Satrya Fajri Pratama; Azah Kamilah Muda; Yun-Huoy Choo; Noor Azilah Muda

    2011-01-01

    Handwriting is individualistic. The uniqueness of the shape and style of handwriting can be used to identify the significant features for authenticating the author of a piece of writing. Acquiring these significant features is an important research topic in the Writer Identification domain, where the goal is to find the unique features of an individual, also known as the Individuality of Handwriting. This paper proposes an improved Sequential Forward Floating Selection method besides the exploration of significant features for...

  12. Surface Defect Target Identification on Copper Strip Based on Adaptive Genetic Algorithm and Feature Saliency

    Directory of Open Access Journals (Sweden)

    Xuewu Zhang

    2013-01-01

    Full Text Available To enhance the stability and robustness of visual inspection system (VIS), a new surface defect target identification method for copper strip based on adaptive genetic algorithm (AGA) and feature saliency is proposed. First, the study uses gray level cooccurrence matrix (GLCM) and Hu invariant moments for feature extraction. Then, adaptive genetic algorithm, which is used for feature selection, is evaluated and discussed. In AGA, total error rates and false alarm rates are integrated to calculate the fitness value, and the probability of crossover and mutation is adjusted dynamically according to the fitness value. At last, the selected features are optimized in accordance with feature saliency and are inputted into a support vector machine (SVM). Furthermore, for comparison, we conduct experiments using the selected optimal feature subsequence (OFS) and the total feature sequence (TFS) separately. The experimental results demonstrate that the proposed method can guarantee the correct rates of classification and can lower the false alarm rates.

  13. Prediction of protein modification sites of pyrrolidone carboxylic acid using mRMR feature selection and analysis.

    Directory of Open Access Journals (Sweden)

    Lu-Lu Zheng

    Full Text Available Pyrrolidone carboxylic acid (PCA) is formed during a common post-translational modification (PTM) of extracellular and multi-pass membrane proteins. In this study, we developed a new predictor to predict the modification sites of PCA based on maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS). We incorporated 727 features that belonged to 7 kinds of protein properties to predict the modification sites, including sequence conservation, residual disorder, amino acid factor, secondary structure and solvent accessibility, gain/loss of amino acid during evolution, propensity of amino acid to be conserved at protein-protein interface and protein surface, and deviation of side chain carbon atom number. Among these 727 features, 244 features were selected by mRMR and IFS as the optimized features for the prediction, with which the prediction model achieved a maximum MCC of 0.7812. Feature analysis showed that all feature types contributed to the modification process. Further site-specific feature analysis showed that the features derived from PCA's surrounding sites contributed more to the determination of PCA sites than other sites. The detailed feature analysis in this paper might provide important clues for understanding the mechanism of the PCA formation and guide relevant experimental validations.

  14. Analysis of the GRNs Inference by Using Tsallis Entropy and a Feature Selection Approach

    Science.gov (United States)

    Lopes, Fabrício M.; de Oliveira, Evaldo A.; Cesar, Roberto M.

    An important problem in the bioinformatics field is to understand how genes are regulated and interact through gene networks. This knowledge can be helpful for many applications, such as disease treatment design and drug creation. For this reason, it is very important to uncover the functional relationship among genes and then to construct the gene regulatory network (GRN) from temporal expression data. However, this task usually involves data with a large number of variables and a small number of observations. In this way, there is a strong motivation to use pattern recognition and dimensionality reduction approaches. In particular, feature selection is especially important in order to select the most important predictor genes that can explain some phenomena associated with the target genes. This work presents a first study of the sensitivity of entropy methods to the functional form of the entropy, applied to the problem of topology recovery of GRNs. The generalized entropy proposed by Tsallis is used to study this sensitivity. The inference process is based on a feature selection approach, which is applied to simulated temporal expression data generated by an artificial gene network (AGN) model. The inferred GRNs are validated in terms of global network measures. Some interesting conclusions can be drawn from the experimental results, as reported for the first time in the present paper.
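
    For reference, the Tsallis entropy of a discrete distribution is S_q = (1 - sum_i p_i^q) / (q - 1), which recovers the Shannon entropy (in nats) as q approaches 1. The sketch below evaluates it for a given entropic index q; the function name and the example distribution are illustrative assumptions, and the constant k is taken as 1.

```python
from math import log

def tsallis_entropy(probs, q):
    """Tsallis entropy S_q = (1 - sum_i p_i^q) / (q - 1), with k = 1."""
    if abs(q - 1.0) < 1e-12:
        # Limit q -> 1: Shannon entropy in nats.
        return -sum(p * log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)

# Hypothetical two-state expression distribution of a gene.
uniform = [0.5, 0.5]
print(tsallis_entropy(uniform, 2.0))   # q = 2
print(tsallis_entropy(uniform, 1.0))   # Shannon limit
```

    Varying q changes how strongly rare and frequent states are weighted, which is the functional-form dependence the study examines.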

  15. Evaluation of Meta-Heuristic Algorithms for Stable Feature Selection

    Directory of Open Access Journals (Sweden)

    Maysam Toghraee

    2016-07-01

    Full Text Available Nowadays, with the development of science and technology and of technological tools, it has become possible to store and review important data. Knowledge is needed to search these data and reach useful results. Data mining is the automatic search of large data sources to find patterns and dependencies that simple statistical analysis does not reveal. The aim is to study the predictive role and the application domain of data mining in medical science and to suggest a framework for creating, assessing and exploiting data mining patterns in this field. Since previous research has found that existing assessment methods cannot be used to specify data discrepancies, we suggest a new approach for assessing data similarities in order to find the relation between variation in the data and stability in selection. Therefore we have chosen meta-heuristic methods in order to choose the best and most stable algorithms among a set of algorithms.

  16. Pupil size reflects the focus of feature-based attention.

    Science.gov (United States)

    Binda, Paola; Pereverzeva, Maria; Murray, Scott O

    2014-12-15

    We measured pupil size in adult human subjects while they selectively attended to one of two surfaces, bright and dark, defined by coherently moving dots. The two surfaces were presented at the same location; therefore, subjects could select the cued surface only on the basis of its features. With no luminance change in the stimulus, we find that pupil size was smaller when the bright surface was attended and larger when the dark surface was attended: an effect of feature-based (or surface-based) attention. With the same surfaces at nonoverlapping locations, we find a similar effect of spatial attention. The pupil size modulation cannot be accounted for by differences in eye position and by other variables known to affect pupil size such as task difficulty, accommodation, or the mere anticipation (imagery) of bright/dark stimuli. We conclude that pupil size reflects not just luminance or cognitive state, but the interaction between the two: it reflects which luminance level in the visual scene is relevant for the task at hand.

  17. SOFT COMPUTING BASED MEDICAL IMAGE RETRIEVAL USING SHAPE AND TEXTURE FEATURES

    Directory of Open Access Journals (Sweden)

    M. Mary Helta Daisy

    2014-01-01

    Full Text Available Image retrieval is a challenging and important research area with applications such as digital libraries and medical image databases. Content-based image retrieval is useful for retrieving images from a database based on a feature vector generated from the image features. In this study, we present image retrieval based on the genetic algorithm. Shape features and morphology-based texture features are extracted from the images in the database and from the query image. Chromosomes are then generated based on the distance values obtained from the difference between the feature vectors of the database images and the query image. Genetic operators such as crossover and mutation are applied to the selected chromosomes. After that, the best chromosome is selected and the images most similar to the query image are displayed. The proposed method shows better retrieval results.

  18. Object learning improves feature extraction but does not improve feature selection.

    Directory of Open Access Journals (Sweden)

    Linus Holm

    Full Text Available A single glance at your crowded desk is enough to locate your favorite cup. But finding an unfamiliar object requires more effort. This superiority in recognition performance for learned objects has at least two possible sources. For familiar objects observers might: (1) select more informative image locations upon which to fixate their eyes, or (2) extract more information from a given eye fixation. To test these possibilities, we had observers localize fragmented objects embedded in dense displays of random contour fragments. Eight participants searched for objects in 600 images while their eye movements were recorded in three daily sessions. Performance improved as subjects trained with the objects: The number of fixations required to find an object decreased by 64% across the 3 sessions. An ideal observer model that included measures of fragment confusability was used to calculate the information available from a single fixation. Comparing human performance to the model suggested that across sessions information extraction at each eye fixation increased markedly, by an amount roughly equal to the extra information that would be extracted following a 100% increase in functional field of view. Selection of fixation locations, on the other hand, did not improve with practice.

  19. Face Recognition Based on Nonlinear Feature Approach

    Directory of Open Access Journals (Sweden)

    Eimad E.A. Abusham

    2008-01-01

    Full Text Available Feature extraction techniques are widely used to reduce the complexity of high-dimensional data. Nonlinear feature extraction via Locally Linear Embedding (LLE) has attracted much attention due to its high performance. In this paper, we propose a novel approach for face recognition that addresses the challenging task of recognition by integrating the nonlinear dimensionality reduction method Locally Linear Embedding with Local Fisher Discriminant Analysis (LFDA) to improve the discriminating power of the extracted features by maximizing between-class separation while preserving within-class local structure. Extensive experimentation performed on the CMU-PIE database indicates that the proposed methodology outperforms benchmark methods such as Principal Component Analysis (PCA) and Fisher Discrimination Analysis (FDA). The results showed that a recognition rate of 95% could be obtained using our proposed method.

  20. Palmprint Based Verification System Using SURF Features

    Science.gov (United States)

    Srinivas, Badrinath G.; Gupta, Phalguni

    This paper describes the design and development of a prototype of robust biometric system for verification. The system uses features extracted using Speeded Up Robust Features (SURF) operator of human hand. The hand image for features is acquired using a low cost scanner. The palmprint region extracted is robust to hand translation and rotation on the scanner. The system is tested on IITK database of 200 images and PolyU database of 7751 images. The system is found to be robust with respect to translation and rotation. It has FAR 0.02%, FRR 0.01% and accuracy of 99.98% and can be a suitable system for civilian applications and high-security environments.
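
    The false acceptance rate (FAR) and false rejection rate (FRR) quoted above are computed from genuine and impostor matching scores at a decision threshold. The sketch below shows the standard calculation, assuming that higher scores mean a better match; the scores and the threshold are made up for illustration and are not taken from the IITK or PolyU experiments.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """FAR and FRR (as fractions) when scores >= threshold are accepted."""
    false_accepts = sum(s >= threshold for s in impostor_scores)
    false_rejects = sum(s < threshold for s in genuine_scores)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))

# Hypothetical matching scores for same-palm and different-palm comparisons.
genuine = [0.9, 0.8, 0.4, 0.95]
impostor = [0.1, 0.2, 0.7, 0.3, 0.05]
far, frr = far_frr(genuine, impostor, threshold=0.5)
print(far, frr)
```

    Raising the threshold lowers FAR at the cost of a higher FRR, so reported figures always correspond to one chosen operating point.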

  1. Deep Feature Selection: Theory and Application to Identify Enhancers and Promoters.

    Science.gov (United States)

    Li, Yifeng; Chen, Chih-Yu; Wasserman, Wyeth W

    2016-05-01

    Sparse linear models approximate target variable(s) by a sparse linear combination of input variables. Since they are simple, fast, and able to select features, they are widely used in classification and regression. Essentially they are shallow feed-forward neural networks that have three limitations: (1) incompatibility to model nonlinearity of features, (2) inability to learn high-level features, and (3) unnatural extensions to select features in a multiclass case. Deep neural networks are models structured by multiple hidden layers with nonlinear activation functions. Compared with linear models, they have two distinctive strengths: the capability to (1) model complex systems with nonlinear structures and (2) learn high-level representation of features. Deep learning has been applied in many large and complex systems where deep models significantly outperform shallow ones. However, feature selection at the input level, which is very helpful to understand the nature of a complex system, is still not well studied. In genome research, the cis-regulatory elements in noncoding DNA sequences play a key role in the expression of genes. Since the activity of regulatory elements involves highly interactive factors, a deep tool is strongly needed to discover informative features. In order to address the above limitations of shallow and deep models for selecting features of a complex system, we propose a deep feature selection (DFS) model that (1) takes advantage of deep structures to model nonlinearity and (2) conveniently selects a subset of features right at the input level for multiclass data. Simulation experiments convince us that this model is able to correctly identify both linear and nonlinear features. We applied this model to the identification of active enhancers and promoters by integrating multiple sources of genomic information. Results show that our model outperforms elastic net in terms of size of discriminative feature subset and classification accuracy.

  2. Feature Extraction with Ordered Mean Values for Content Based Image Classification

    Directory of Open Access Journals (Sweden)

    Sudeep Thepade

    2014-01-01

    Full Text Available Categorization of images into meaningful classes by efficient extraction of feature vectors from image datasets has been dependent on feature selection techniques. Traditionally, feature vector extraction has been carried out using different methods of image binarization done with selection of global, local, or mean threshold. This paper has proposed a novel technique for feature extraction based on ordered mean values. The proposed technique was combined with feature extraction using discrete sine transform (DST for better classification results using multitechnique fusion. The novel methodology was compared to the traditional techniques used for feature extraction for content based image classification. Three benchmark datasets, namely, Wang dataset, Oliva and Torralba (OT-Scene dataset, and Caltech dataset, were used for evaluation purpose. Performance measure after evaluation has evidently revealed the superiority of the proposed fusion technique with ordered mean values and discrete sine transform over the popular approaches of single view feature extraction methodologies for classification.

  3. Ship Targets Discrimination Algorithm in SAR Images Based on Hu Moment Feature and Texture Feature

    Directory of Open Access Journals (Sweden)

    Liu Lei

    2016-01-01

    Full Text Available To discriminate ship targets in SAR images, this paper proposes a method based on a combination of Hu moment features and texture features. First, 7 Hu moment features are extracted; a gray level co-occurrence matrix is then used to extract the features of mean, variance, uniformity, energy, entropy, inertia moment, correlation and differences. Finally, a k-neighbour classifier is used to analyze the resulting 15-dimensional feature vectors. The experimental results show that the proposed method is effective.

  4. Cancer Feature Selection and Classification Using a Binary Quantum-Behaved Particle Swarm Optimization and Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Maolong Xi

    2016-01-01

    Full Text Available This paper focuses on the feature gene selection for cancer classification, which employs an optimization algorithm to select a subset of the genes. We propose a binary quantum-behaved particle swarm optimization (BQPSO) for cancer feature gene selection, coupling support vector machine (SVM) for cancer classification. First, the proposed BQPSO algorithm is described, which is a discretized version of original QPSO for binary 0-1 optimization problems. Then, we present the principle and procedure for cancer feature gene selection and cancer classification based on BQPSO and SVM with leave-one-out cross validation (LOOCV). Finally, the BQPSO coupling SVM (BQPSO/SVM), binary PSO coupling SVM (BPSO/SVM), and genetic algorithm coupling SVM (GA/SVM) are tested for feature gene selection and cancer classification on five microarray data sets, namely, Leukemia, Prostate, Colon, Lung, and Lymphoma. The experimental results show that BQPSO/SVM has significant advantages in accuracy, robustness, and the number of feature genes selected compared with the other two algorithms.

  5. Cancer Feature Selection and Classification Using a Binary Quantum-Behaved Particle Swarm Optimization and Support Vector Machine.

    Science.gov (United States)

    Xi, Maolong; Sun, Jun; Liu, Li; Fan, Fangyun; Wu, Xiaojun

    2016-01-01

    This paper focuses on the feature gene selection for cancer classification, which employs an optimization algorithm to select a subset of the genes. We propose a binary quantum-behaved particle swarm optimization (BQPSO) for cancer feature gene selection, coupling support vector machine (SVM) for cancer classification. First, the proposed BQPSO algorithm is described, which is a discretized version of original QPSO for binary 0-1 optimization problems. Then, we present the principle and procedure for cancer feature gene selection and cancer classification based on BQPSO and SVM with leave-one-out cross validation (LOOCV). Finally, the BQPSO coupling SVM (BQPSO/SVM), binary PSO coupling SVM (BPSO/SVM), and genetic algorithm coupling SVM (GA/SVM) are tested for feature gene selection and cancer classification on five microarray data sets, namely, Leukemia, Prostate, Colon, Lung, and Lymphoma. The experimental results show that BQPSO/SVM has significant advantages in accuracy, robustness, and the number of feature genes selected compared with the other two algorithms.

  6. Sparse coding based feature representation method for remote sensing images

    Science.gov (United States)

    Oguslu, Ender

    In this dissertation, we study sparse coding based feature representation method for the classification of multispectral and hyperspectral images (HSI). The existing feature representation systems based on the sparse signal model are computationally expensive, requiring to solve a convex optimization problem to learn a dictionary. A sparse coding feature representation framework for the classification of HSI is presented that alleviates the complexity of sparse coding through sub-band construction, dictionary learning, and encoding steps. In the framework, we construct the dictionary based upon the extracted sub-bands from the spectral representation of a pixel. In the encoding step, we utilize a soft threshold function to obtain sparse feature representations for HSI. Experimental results showed that a randomly selected dictionary could be as effective as a dictionary learned from optimization. The new representation usually has a very high dimensionality requiring a lot of computational resources. In addition, the spatial information of the HSI data has not been included in the representation. Thus, we modify the framework by incorporating the spatial information of the HSI pixels and reducing the dimension of the new sparse representations. The enhanced model, called sparse coding based dense feature representation (SC-DFR), is integrated with a linear support vector machine (SVM) and a composite kernels SVM (CKSVM) classifiers to discriminate different types of land cover. We evaluated the proposed algorithm on three well known HSI datasets and compared our method to four recently developed classification methods: SVM, CKSVM, simultaneous orthogonal matching pursuit (SOMP) and image fusion and recursive filtering (IFRF). The results from the experiments showed that the proposed method can achieve better overall and average classification accuracies with a much more compact representation leading to more efficient sparse models for HSI classification. 
To further
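
The encoding step described above applies a soft threshold function to obtain sparse feature representations. The following Python sketch illustrates that general idea only; the function names, the use of inner products with dictionary atoms as the input to the threshold, and the value of `lam` are illustrative assumptions, not details taken from the dissertation.

```python
import math


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; anything within [-lam, lam] becomes 0."""
    return math.copysign(max(abs(value) - lam, 0.0), value)


def encode(sample, dictionary, lam):
    """Hypothetical encoder: soft-threshold the inner product with each atom."""
    responses = [sum(a * x for a, x in zip(atom, sample)) for atom in dictionary]
    return [soft_threshold(r, lam) for r in responses]


# Toy example (made-up numbers): a 3-band "pixel" and a dictionary of three atoms.
pixel = [1.0, 0.5, 0.0]
atoms = [[1.0, 0.0, 0.0], [0.0, 0.2, 0.0], [-1.0, 0.0, 0.0]]
code = encode(pixel, atoms, lam=0.25)
```

Weak responses (here the second atom) are set to zero, which is what makes the resulting code sparse without solving an optimization problem per sample.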

  7. Corporate Features and Faith-Based Academies

    Science.gov (United States)

    Green, Elizabeth

    2009-01-01

    This article forms an introductory exploration into the relationship between corporate features and religious values in Academies sponsored by a Christian foundation. This is a theme which arose from research comprising the ethnography of a City Technology College (CTC) with a Christian ethos. The Christian foundation which sponsors the CTC also…

  8. Surface characterization based upon significant topographic features

    Energy Technology Data Exchange (ETDEWEB)

    Blanc, J; Grime, D; Blateyron, F, E-mail: fblateyron@digitalsurf.fr [Digital Surf, 16 rue Lavoisier, F-25000 Besancon (France)

    2011-08-19

    Watershed segmentation and Wolf pruning, as defined in ISO 25178-2, allow the detection of significant features on surfaces and their characterization in terms of dimension, area, volume, curvature, shape or morphology. These new tools provide a robust way to specify functional surfaces.

  9. A Hierarchical Feature and Sample Selection Framework and Its Application for Alzheimer’s Disease Diagnosis

    Science.gov (United States)

    An, Le; Adeli, Ehsan; Liu, Mingxia; Zhang, Jun; Lee, Seong-Whan; Shen, Dinggang

    2017-01-01

    Classification is one of the most important tasks in machine learning. Due to feature redundancy or outliers in samples, using all available data for training a classifier may be suboptimal. For example, the Alzheimer’s disease (AD) is correlated with certain brain regions or single nucleotide polymorphisms (SNPs), and identification of relevant features is critical for computer-aided diagnosis. Many existing methods first select features from structural magnetic resonance imaging (MRI) or SNPs and then use those features to build the classifier. However, with the presence of many redundant features, the most discriminative features are difficult to be identified in a single step. Thus, we formulate a hierarchical feature and sample selection framework to gradually select informative features and discard ambiguous samples in multiple steps for improved classifier learning. To positively guide the data manifold preservation process, we utilize both labeled and unlabeled data during training, making our method semi-supervised. For validation, we conduct experiments on AD diagnosis by selecting mutually informative features from both MRI and SNP, and using the most discriminative samples for training. The superior classification results demonstrate the effectiveness of our approach, as compared with the rivals. PMID:28358032

  10. Grammar-based feature generation for time-series prediction

    CERN Document Server

    De Silva, Anthony Mihirana

    2015-01-01

    This book proposes a novel approach for time-series prediction using machine learning techniques with automatic feature generation. Application of machine learning techniques to predict time-series continues to attract considerable attention due to the difficulty of the prediction problems compounded by the non-linear and non-stationary nature of the real world time-series. The performance of machine learning techniques, among other things, depends on suitable engineering of features. This book proposes a systematic way for generating suitable features using context-free grammar. A number of feature selection criteria are investigated and a hybrid feature generation and selection algorithm using grammatical evolution is proposed. The book contains graphical illustrations to explain the feature generation process. The proposed approaches are demonstrated by predicting the closing price of major stock market indices, peak electricity load and net hourly foreign exchange client trade volume. The proposed method ...

  11. Straight line feature based image distortion correction

    Institute of Scientific and Technical Information of China (English)

    Zhang Haofeng; Zhao Chunxia; Lu Jianfeng; Tang Zhenmin; Yang Jingyu

    2008-01-01

    An image distortion correction method is proposed, which uses straight line features. Many parallel lines in different directions were extracted from different images and then used to optimize the distortion parameters by nonlinear least squares. A step-by-step strategy was added to the optimization procedure. The 3D world coordinates do not need to be known, and the method is easy to implement. The experimental results show its high accuracy.

  12. Geometrically Invariant Watermarking Scheme Based on Local Feature Points

    Directory of Open Access Journals (Sweden)

    Jing Li

    2012-06-01

    Full Text Available Based on local invariant feature points and the cross ratio principle, this paper presents a feature-point-based image watermarking scheme. It is robust to geometric attacks and some signal processing operations. It extracts local invariant feature points from the image using an improved scale invariant feature transform algorithm. Using these points as vertices, it constructs quadrilaterals that serve as local feature regions. The watermark is inserted into these local feature regions repeatedly. To obtain stable local regions, it adjusts the number and distribution of extracted feature points. In every chosen local feature region it decides the locations at which to embed watermark bits based on the cross ratio of four collinear points, since the cross ratio is invariant to projective transformation. Watermark bits are embedded by quantization modulation, in which the quantization step value is computed from the given PSNR. Experimental results show that the proposed method is robust against more geometric attacks, including compound geometric attacks.
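
The scheme relies on the fact that the cross ratio of four collinear points is unchanged by a projective transformation. The short Python sketch below checks that property numerically with exact rational arithmetic; the points, the matrix `H`, and the function names are arbitrary choices made for illustration and are not part of the published scheme.

```python
from fractions import Fraction


def cross_ratio(a, b, c, d):
    """Cross ratio (a, b; c, d) of four collinear points given by a 1-D coordinate."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


def apply_homography(h, point):
    """Map a 2-D point through a 3x3 projective transformation h."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Four collinear points on the line y = 2x + 1 (exact rational arithmetic).
points = [(Fraction(i), Fraction(2 * i + 1)) for i in range(4)]
# An arbitrary invertible homography, chosen only for illustration.
H = [[2, 1, 3], [0, 1, 1], [1, 0, 4]]
warped = [apply_homography(H, p) for p in points]

# Neither line is vertical, so the x-coordinates parameterise the points.
before = cross_ratio(*[p[0] for p in points])
after = cross_ratio(*[p[0] for p in warped])
```

Both `before` and `after` evaluate to 4/3, so a position defined by a cross ratio can be located again after the image has been projectively distorted.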

  13. Document image retrieval based on multi-density features

    Institute of Scientific and Technical Information of China (English)

    HU Zhilan; LIN Xinggang; YAN Hong

    2007-01-01

    The development of document image databases is becoming a challenge for document image retrieval techniques. Traditional layout-reconstructed-based methods rely on high quality document images as well as an optical character recognition (OCR) precision, and can only deal with several widely used languages. The complexity of document layouts greatly hinders layout analysis-based approaches. This paper describes a multi-density feature based algorithm for binary document images, which is independent of OCR or layout analyses. The text area was extracted after preprocessing such as skew correction and marginal noise removal. Then the aspect ratio and multi-density features were extracted from the text area to select the best candidates from the document image database. Experimental results show that this approach is simple with loss rates less than 3% and can efficiently analyze images with different resolutions and different input systems. The system is also robust to noise due to its notes and complex layouts, etc.

  14. Fuzzy-Rough Feature Selection With Π-Membership Function For Mammogram Classification

    CERN Document Server

    Thangavel, K

    2012-01-01

    Breast cancer is the second leading cause of death among women, and it is diagnosed with the help of mammograms. Oncologists often fail to identify microcalcifications at an early stage by visual inspection of the mammogram. To improve the performance of breast cancer screening, many researchers have proposed Computer Aided Diagnosis using image processing. In this study mammograms are preprocessed and features are extracted, then the abnormality is identified through classification. If all the extracted features are used, most of the cases are misidentified. Hence a feature selection procedure is sought. In this paper, Fuzzy-Rough feature selection with a π membership function is proposed. The selected features are used to classify the abnormalities with the help of the Ant-Miner and Weka tools. The experimental analysis shows that the proposed method improves mammogram classification accuracy.

  15. Hybridization of Evolutionary Mechanisms for Feature Subset Selection in Unsupervised Learning

    Science.gov (United States)

    Torres, Dolores; Ponce-de-León, Eunice; Torres, Aurora; Ochoa, Alberto; Díaz, Elva

    Feature subset selection for unsupervised learning is a very important topic in artificial intelligence because it is the basis for saving computational resources. In this implementation we use a typical testor’s methodology in order to incorporate an importance index for each variable. This paper presents the general framework and the way two hybridized meta-heuristics work on this NP-complete problem. The evolutionary mechanisms are based on the Univariate Marginal Distribution Algorithm (UMDA) and the Genetic Algorithm (GA). GA and UMDA - Estimation of Distribution Algorithm (EDA) use a very useful rapid operator implemented for finding typical testors on a very large dataset, and both algorithms also have a local search mechanism for improving time and fitness. Experiments show that EDA is faster than GA because it has a better exploitation performance; nevertheless, GA's solutions are more consistent.

  16. Ensemble based system for whole-slide prostate cancer probability mapping using color texture features.

    LENUS (Irish Health Repository)

    DiFranco, Matthew D

    2011-01-01

    We present a tile-based approach for producing clinically relevant probability maps of prostatic carcinoma in histological sections from radical prostatectomy. Our methodology incorporates ensemble learning for feature selection and classification on expert-annotated images. Random forest feature selection performed over varying training sets provides a subset of generalized CIEL*a*b* co-occurrence texture features, while sample selection strategies with minimal constraints reduce training data requirements to achieve reliable results. Ensembles of classifiers are built using expert-annotated tiles from training images, and scores for the probability of cancer presence are calculated from the responses of each classifier in the ensemble. Spatial filtering of tile-based texture features prior to classification results in increased heat-map coherence as well as AUC values of 95% using ensembles of either random forests or support vector machines. Our approach is designed for adaptation to different imaging modalities, image features, and histological decision domains.

  17. Evaluation of Feature Selection Methods for Predictive Modeling Using Neural Networks in Credits Scoring

    Directory of Open Access Journals (Sweden)

    Raghavendra B. K

    2010-11-01

    Full Text Available A credit-risk evaluation decision involves processing huge volumes of raw data, and hence requires powerful data mining tools. Several techniques that were developed in machine learning have been used for financial credit-risk evaluation decisions. Data mining is the process of finding patterns and relations in large databases. Neural networks are one of the popular tools for building predictive models in data mining. The major drawback of neural networks is the curse of dimensionality, which calls for an optimal feature subset. Feature selection is an important topic of research in data mining. Feature selection is the problem of choosing a small subset of features that ideally is necessary and sufficient to describe the target concept. In this research an attempt has been made to investigate a preprocessing framework for feature selection in credit scoring using neural networks. Feature selection techniques such as best first search and info gain have been evaluated for their effectiveness in classifying the risk groups on publicly available data sets. In particular, German, Australian, and Japanese credit rating data sets have been used for evaluation. The results have been conclusive about the effectiveness of feature selection for neural networks and validate the hypothesis of the research.

  18. Double feature selection and cluster analyses in mining of microarray data from cotton

    Directory of Open Access Journals (Sweden)

    Wilkins Thea A

    2008-06-01

    Full Text Available Abstract Background: Cotton fiber is a single-celled seed trichome of major biological and economic importance. In recent years, genomic approaches such as microarray-based expression profiling were used to study fiber growth and development to understand the developmental mechanisms of fiber at the molecular level. The vast volume of microarray expression data generated requires a sophisticated means of data mining in order to extract novel information that addresses fundamental questions of biological interest. One of the ways to approach microarray data mining is to increase the number of dimensions/levels to the analysis, such as comparing independent studies from different genotypes. However, adding dimensions also creates a challenge in finding novel ways for analyzing multi-dimensional microarray data. Results: Mining of independent microarray studies from Pima and Upland (TM1) cotton using double feature selection and cluster analyses identified species-specific and stage-specific gene transcripts that argue in favor of discrete genetic mechanisms that govern developmental programming of cotton fiber morphogenesis in these two cultivated species. Double feature selection analysis identified the highest number of differentially expressed genes that distinguish the fiber transcriptomes of developing Pima and TM1 fibers. These results were based on the finding that differences in fibers harvested between 17 and 24 day post-anthesis (dpa) represent the greatest expressional distance between the two species. This powerful selection method identified a subset of genes expressed during primary (PCW) and secondary (SCW) cell wall biogenesis in Pima fibers that exhibits an expression pattern that is generally reversed in TM1 at the same developmental stage. Cluster and functional analyses revealed that this subset of genes are primarily regulated during the transition stage that overlaps the termination of PCW and onset of SCW biogenesis, suggesting

  19. Kernel based visual tracking with scale invariant features

    Institute of Scientific and Technical Information of China (English)

    Risheng Han; Zhongliang Jing; Yuanxiang Li

    2008-01-01

    Kernel based tracking has two disadvantages: the tracking window size cannot be adjusted efficiently, and the kernel based color distribution may not have enough ability to discriminate the object from a cluttered background. To boost the features' discriminating ability, both scale invariant features and kernel based color distribution features are used as descriptors of the tracked object. The proposed algorithm can keep tracking objects of varying scales even when the surrounding background is similar to the object's appearance.

  20. Comparison of texture features based on Gabor filters

    NARCIS (Netherlands)

    Grigorescu, Simona E.; Petkov, Nicolai; Kruizinga, Peter

    2002-01-01

    Texture features that are based on the local power spectrum obtained by a bank of Gabor filters are compared. The features differ in the type of nonlinear post-processing which is applied to the local power spectrum. The following features are considered: Gabor energy, complex moments, and grating c

  1. Study on Isomerous CAD Model Exchange Based on Feature

    Institute of Scientific and Technical Information of China (English)

    SHAO Xiaodong; CHEN Feng; XU Chenguang

    2006-01-01

    A model-exchange method based on feature between isomerous CAD systems is put forward in this paper. In this method, CAD model information is accessed at both feature and geometry levels and converted according to standard feature operation. The feature information including feature tree, dimensions and constraints, which will be lost in traditional data conversion, as well as geometry are converted completely from source CAD system to destination one. So the transferred model can be edited through feature operation, which cannot be implemented by general model-exchange interface.

  2. Selecting Optimal Feature Set in High-Dimensional Data by Swarm Search

    Directory of Open Access Journals (Sweden)

    Simon Fong

    2013-01-01

    Full Text Available Selecting the right set of features from data of high dimensionality for inducing an accurate classification model is a tough computational challenge. It is almost an NP-hard problem as the combinations of features escalate exponentially as the number of features increases. Unfortunately in data mining, as well as other engineering applications and bioinformatics, some data are described by a long array of features. Many feature subset selection algorithms have been proposed in the past, but not all of them are effective. Since it takes seemingly forever to use brute force in exhaustively trying every possible combination of features, stochastic optimization may be a solution. In this paper, we propose a new feature selection scheme called Swarm Search to find an optimal feature set by using metaheuristics. The advantage of Swarm Search is its flexibility in integrating any classifier into its fitness function and plugging in any metaheuristic algorithm to facilitate heuristic search. Simulation experiments are carried out by testing the Swarm Search over some high-dimensional datasets, with different classification algorithms and various metaheuristic algorithms. The comparative experiment results show that Swarm Search is able to attain relatively low error rates in classification without shrinking the size of the feature subset to its minimum.

  3. CONSTRUCTION AND MODIFICATION OF FLEXIBLE FEATURE-BASED MODELS

    Institute of Scientific and Technical Information of China (English)

    1999-01-01

    A new approach is proposed to generate flexible feature-based models (FFBM), which can be modified dynamically. A BRep/CSFG/FRG hybrid scheme is used to describe FFBM, in which BRep explicitly defines the model, the CSFG (constructive solid-feature geometry) tree records the feature-based modelling procedure and the FRG (feature relation graph) reflects different kinds of relationships among features. Topological operators with local retrievability are designed to implement feature addition, which is traced in detail by a topological operation list (TOL). As a result, FFBM can be modified directly in the system database. Chain reactions of related features and variable topologies are supported in design modification, after which the product information attached to features will not be lost. Further, a feature can be modified as rapidly as it was added.

  4. EEG signal features extraction based on fractal dimension.

    Science.gov (United States)

    Finotello, Francesca; Scarpa, Fabio; Zanon, Mattia

    2015-01-01

    The spread of electroencephalography (EEG) in countless applications has fostered the development of new techniques for extracting synthetic and informative features from EEG signals. However, the definition of an effective feature set depends on the specific problem to be addressed and is currently an active field of research. In this work, we investigated the application of features based on fractal dimension to a problem of sleep identification from EEG data. We demonstrated that features based on fractal dimension, including two novel indices defined in this work, add valuable information to standard EEG features and significantly improve sleep identification performance.
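
The abstract above does not say which fractal dimension estimator was used. As an illustration of the kind of feature involved, the Python sketch below computes the Katz fractal dimension, one common estimator for waveforms; choosing Katz here is an assumption, and the two novel indices mentioned in the abstract are not reproduced.

```python
import math


def katz_fd(signal):
    """Katz fractal dimension of a 1-D signal treated as the planar curve (i, x_i)."""
    pts = list(enumerate(signal))
    steps = len(pts) - 1
    # Total curve length: sum of distances between consecutive points.
    length = sum(math.dist(pts[i], pts[i + 1]) for i in range(steps))
    # Planar extent: largest distance from the first point.
    extent = max(math.dist(pts[0], p) for p in pts[1:])
    return math.log10(steps) / (math.log10(steps) + math.log10(extent / length))


straight = katz_fd([float(i) for i in range(9)])    # straight line
zigzag = katz_fd([float(i % 2) for i in range(9)])  # oscillating signal
```

A straight line gives a dimension of about 1, while the oscillating toy signal gives a larger value (about 1.2), which is why such measures can serve as a scalar summary of signal irregularity.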

  5. Robust speech features representation based on computational auditory model

    Institute of Scientific and Technical Information of China (English)

    LU Xugang; JIA Chuan; DANG Jianwu

    2004-01-01

    A speech signal processing and feature extraction method based on a computational auditory model is proposed. The computational model is based on psychological and physiological knowledge and digital signal processing methods. For each stage of the hearing perception system, there is a corresponding computational model to simulate its function. Based on this model, speech features are extracted. In each stage, features at different levels are extracted. Further processing of the primary auditory spectrum based on lateral inhibition is proposed to extract more robust speech features. All these features can be regarded as the internal representations of speech stimulation in the hearing system. Robust speech recognition experiments are conducted to test the robustness of the features. Results show that the representations based on the proposed computational auditory model are robust representations for speech signals.

  6. Accurate Image Retrieval Algorithm Based on Color and Texture Feature

    Directory of Open Access Journals (Sweden)

    Chunlai Yan

    2013-06-01

    Full Text Available Content-Based Image Retrieval (CBIR) is one of the most active hot spots in the current research field of multimedia retrieval. According to the description and extraction of visual content (feature) of the image, CBIR aims to find images that contain specified content (feature) in the image database. In this paper, several key technologies of CBIR, e.g. the extraction of the color and texture features of the image, as well as the similarity measures, are investigated. On the basis of the theoretical research, an image retrieval system based on color and texture features is designed. In this system, the Weighted Color Feature based on HSV space is adopted as a color feature vector; four features of the Co-occurrence Matrix, namely Energy, Entropy, Inertia Quadrature and Correlation, are used to construct texture vectors; and the Euclidean distance is employed as the similarity measure. Experimental results show that this CBIR system is efficient in image retrieval.
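
The texture vector described above is built from four co-occurrence matrix statistics and compared with the Euclidean distance. The Python sketch below shows one conventional way to compute such statistics; the horizontal pixel offset, the base-2 logarithm, the reading of "Inertia Quadrature" as the usual inertia (contrast) statistic, and all function names are assumptions, since the abstract does not give the paper's exact definitions.

```python
import math


def glcm(image, levels):
    """Normalised co-occurrence matrix for horizontally adjacent pixel pairs."""
    counts = [[0] * levels for _ in range(levels)]
    for row in image:
        for left, right in zip(row, row[1:]):
            counts[left][right] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in r] for r in counts]


def texture_features(p):
    """Energy, entropy, inertia and correlation of a normalised GLCM p.

    Correlation is undefined (division by zero) for a constant image.
    """
    cells = [(i, j, v) for i, r in enumerate(p) for j, v in enumerate(r)]
    energy = sum(v * v for _, _, v in cells)
    entropy = -sum(v * math.log2(v) for _, _, v in cells if v > 0)
    inertia = sum((i - j) ** 2 * v for i, j, v in cells)
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return energy, entropy, inertia, cov / (sd_i * sd_j)


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Tiny 2-level image: the top row is all 0s, the bottom row all 1s.
features = texture_features(glcm([[0, 0, 0], [1, 1, 1]], levels=2))
```

In a retrieval setting, a vector like `features` would be computed for the query and for every database image, and the images with the smallest `euclidean` distance to the query would be returned first.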

  7. A Dynamic Feature-Based Method for Hybrid Blurred/Multiple Object Detection in Manufacturing Processes

    Directory of Open Access Journals (Sweden)

    Tsun-Kuo Lin

    2016-01-01

    Full Text Available Vision-based inspection has been applied for quality control and product sorting in manufacturing processes. Blurred or multiple objects are common causes of poor performance in conventional vision-based inspection systems. Detecting hybrid blurred/multiple objects has long been a challenge in manufacturing. For example, single-feature-based algorithms might fail to exactly extract features when concurrently detecting hybrid blurred/multiple objects. Therefore, to resolve this problem, this study proposes a novel vision-based inspection algorithm that entails selecting a dynamic feature-based method on the basis of a multiclassifier of support vector machines (SVMs) for inspecting hybrid blurred/multiple object images. The proposed algorithm dynamically selects suitable inspection schemes for classifying the hybrid images. The inspection schemes include discrete wavelet transform, spherical wavelet transform, moment invariants, and edge-feature-descriptor-based classification methods. The classification methods for single and multiple objects are adaptive region growing- (ARG-) based and local adaptive region growing- (LARG-) based learning approaches, respectively. The experimental results demonstrate that the proposed algorithm can dynamically select suitable inspection schemes by applying a selection algorithm, which uses SVMs for classifying hybrid blurred/multiple object samples. Moreover, the method applies suitable feature-based schemes on the basis of the classification results for employing the ARG/LARG-based method to inspect the hybrid objects. The method improves on conventional methods for inspecting hybrid blurred/multiple objects and achieves high recognition rates for such objects in manufacturing processes.

  8. Computing Dialogue Acts from Features with Transformation-Based Learning

    CERN Document Server

    Samuel, K B; Vijay-Shanker, K; Samuel, Ken; Carberry, Sandra

    1998-01-01

    To interpret natural language at the discourse level, it is very useful to accurately recognize dialogue acts, such as SUGGEST, in identifying speaker intentions. Our research explores the utility of a machine learning method called Transformation-Based Learning (TBL) in computing dialogue acts, because TBL has a number of advantages over alternative approaches for this application. We have identified some extensions to TBL that are necessary in order to address the limitations of the original algorithm and the particular demands of discourse processing. We use a Monte Carlo strategy to increase the applicability of the TBL method, and we select features of utterances that can be used as input to improve the performance of TBL. Our system is currently being tested on the VerbMobil corpora of spoken dialogues, producing promising preliminary results.

  9. Segmentation-Based PolSAR Image Classification Using Visual Features: RHLBP and Color Features

    Directory of Open Access Journals (Sweden)

    Jian Cheng

    2015-05-01

    Full Text Available A segmentation-based fully-polarimetric synthetic aperture radar (PolSAR) image classification method that incorporates texture features and color features is designed and implemented. This method is based on the framework that conjunctively uses statistical region merging (SRM) for segmentation and support vector machine (SVM) for classification. In the segmentation step, we propose an improved local binary pattern (LBP) operator named the regional homogeneity local binary pattern (RHLBP) to guarantee the regional homogeneity in PolSAR images. In the classification step, the color features extracted from false color images are applied to improve the classification accuracy. The RHLBP operator and color features can provide discriminative information to separate those pixels and regions with similar polarimetric features, which are from different classes. Extensive experimental comparison results with conventional methods on L-band PolSAR data demonstrate the effectiveness of our proposed method for PolSAR image classification.

  10. A Multistage Feature Selection Model for Document Classification Using Information Gain and Rough Set

    Directory of Open Access Journals (Sweden)

    Mrs. Leena. H. Patil

    2014-11-01

    Full Text Available The number of documents is increasing rapidly; therefore, organizing them in digitized form makes text categorization a challenging issue. A major issue for text categorization is its large number of features. Most of the features are noisy, irrelevant and redundant, which may mislead the classifier. Hence, it is important to reduce the dimensionality of the data to get a smaller subset that provides the most information gain. Feature selection techniques reduce the dimensionality of the feature space. They also improve the overall accuracy and performance. Hence, feature selection is considered an efficient technique for overcoming the issues of text categorization. Therefore, we propose a multistage feature selection model to improve the overall accuracy and performance of classification. In the first stage, document preprocessing is performed. Second, each term within the documents is ranked according to its importance for classification using information gain. Third, the rough set technique is applied to the highly ranked terms and feature reduction is carried out. Finally, document classification is performed on the core features using Naive Bayes and KNN classifiers. Experiments are carried out on three UCI datasets: Reuters 21578, Classic 04 and Newsgroup 20. Results show the better accuracy and performance of the proposed model.
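
The second stage above ranks terms by information gain. A minimal Python sketch of that ranking criterion for a binary "term present / absent" feature is given below; the toy labels, the term vectors, and the function names are invented for illustration, and the rough set reduction stage is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(term_present, labels):
    """Reduction in class entropy obtained by splitting on term presence."""
    total = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        subset = [y for t, y in zip(term_present, labels) if t == value]
        if subset:
            gain -= len(subset) / total * entropy(subset)
    return gain


# Four toy documents in two classes (made-up data).
labels = ["sport", "sport", "finance", "finance"]
ig_informative = information_gain([True, True, False, False], labels)
ig_irrelevant = information_gain([True, False, True, False], labels)
```

A term that occurs only in one class receives the maximum gain of 1 bit here, while a term spread evenly across both classes receives 0, so sorting terms by this score puts the most class-discriminative terms first.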

  11. YamiPred: A novel evolutionary method for predicting pre-miRNAs and selecting relevant features

    KAUST Repository

    Kleftogiannis, Dimitrios A.

    2015-01-23

    MicroRNAs (miRNAs) are small non-coding RNAs, which play a significant role in gene regulation. Predicting miRNA genes is a challenging bioinformatics problem and existing experimental and computational methods fail to deal with it effectively. We developed YamiPred, an embedded classification method that combines the efficiency and robustness of Support Vector Machines (SVM) with Genetic Algorithms (GA) for feature selection and parameter optimization. YamiPred was tested on a new and realistic human dataset and was compared with state-of-the-art computational intelligence approaches and the prevalent SVM-based tools for miRNA prediction. Experimental results indicate that YamiPred outperforms existing approaches in terms of accuracy and of the geometric mean of sensitivity and specificity. The embedded feature selection component selects a compact feature subset that contributes to the performance optimization. Further experimentation with this minimal feature subset achieved very high classification performance and revealed the minimum number of samples required for developing a robust predictor. YamiPred also confirmed the important role of commonly used features such as entropy and enthalpy, and uncovered the significance of newly introduced features, such as %A-U aggregate nucleotide frequency and positional entropy. The best model trained on human data successfully predicted pre-miRNAs in other organisms, including the category of viruses.

  12. Feature Selection Has a Large Impact on One-Class Classification Accuracy for MicroRNAs in Plants

    Directory of Open Access Journals (Sweden)

    Malik Yousef

    2016-01-01

    Full Text Available MicroRNAs (miRNAs) are short RNA sequences involved in posttranscriptional gene regulation. Their experimental analysis is complicated and, therefore, needs to be supplemented with computational miRNA detection. Currently computational miRNA detection is mainly performed using machine learning and in particular two-class classification. For machine learning, the miRNAs need to be parametrized and more than 700 features have been described. Positive training examples for machine learning are readily available, but negative data is hard to come by. Therefore, it seems prerogative to use one-class classification instead of two-class classification. Previously, we were able to almost reach two-class classification accuracy using one-class classifiers. In this work, we employ feature selection procedures in conjunction with one-class classification and show that there is up to 36% difference in accuracy among these feature selection methods. The best feature set allowed the training of a one-class classifier which achieved an average accuracy of ~95.6% thereby outperforming previous two-class-based plant miRNA detection approaches by about 0.5%. We believe that this can be improved upon in the future by rigorous filtering of the positive training examples and by improving current feature clustering algorithms to better target pre-miRNA feature selection.

  13. Level Sets and Voronoi based Feature Extraction from any Imagery

    DEFF Research Database (Denmark)

    Sharma, O.; Anton, François; Mioc, Darka

    2012-01-01

    Polygon features are of interest in many GEOProcessing applications like shoreline mapping, boundary delineation, change detection, etc. This paper presents a unique new GPU-based methodology to automate feature extraction combining level sets, or mean shift based segmentation together with Voronoi...

  14. A Mixing Algorithm for Feature Attribute Selection

    Institute of Scientific and Technical Information of China (English)

    刘明吉; 王秀峰; 饶一梅

    2000-01-01

    Feature attribute selection is a very interesting problem. With the development of Rough Set (RS) theory in recent years, many researchers and scholars have proposed attribute selection methods based on RS. However, as the number of attributes increases, their efficiency declines rapidly. In this paper, we combine RS theory with GA and propose a mixing heuristic algorithm for attribute selection. The experimental results show that it achieves better results and higher efficiency, especially for problems with a large number of attributes.

  15. Ensemble classification of colon biopsy images based on information rich hybrid features.

    Science.gov (United States)

    Rathore, Saima; Hussain, Mutawarra; Aksam Iftikhar, Muhammad; Jalil, Abdul

    2014-04-01

    In recent years, classification of colon biopsy images has become an active research area. Traditionally, colon cancer is diagnosed using microscopic analysis. However, the process is subjective and leads to considerable inter/intra observer variation. Therefore, reliable computer-aided colon cancer detection techniques are in high demand. In this paper, we propose a colon biopsy image classification system, called CBIC, which benefits from discriminatory capabilities of information rich hybrid feature spaces, and performance enhancement based on ensemble classification methodology. Normal and malignant colon biopsy images differ with each other in terms of the color distribution of different biological constituents. The colors of different constituents are sharp in normal images, whereas the colors diffuse with each other in malignant images. In order to exploit this variation, two feature types, namely color components based statistical moments (CCSM) and Haralick features have been proposed, which are color components based variants of their traditional counterparts. Moreover, in normal colon biopsy images, epithelial cells possess sharp and well-defined edges. Histogram of oriented gradients (HOG) based features have been employed to exploit this information. Different combinations of hybrid features have been constructed from HOG, CCSM, and Haralick features. The minimum Redundancy Maximum Relevance (mRMR) feature selection method has been employed to select meaningful features from individual and hybrid feature sets. Finally, an ensemble classifier based on majority voting has been proposed, which classifies colon biopsy images using the selected features. Linear, RBF, and sigmoid SVM have been employed as base classifiers. The proposed system has been tested on 174 colon biopsy images, and improved performance (=98.85%) has been observed compared to previously reported studies. Additionally, the use of mRMR method has been justified by comparing the

  16. Exploitation of Intra-Spectral Band Correlation for Rapid Feature Selection, and Target Identification in Hyperspectral Imagery

    Science.gov (United States)

    2009-03-01

    entitled “Improved Feature Extraction, Feature Selection, and Identification Techniques that Create a Fast Unsupervised Hyperspectral Target Detection...thesis proposal “Improved Feature Extraction, Feature Selection, and Identification Techniques that Create a Fast Unsupervised Hyperspectral Target...target or non-target classifications . Integration of this type of autonomous target detection algorithm along with hyperspectral imaging sensors

  17. A DYNAMIC FEATURE SELECTION METHOD FOR DOCUMENT RANKING WITH RELEVANCE FEEDBACK APPROACH

    Directory of Open Access Journals (Sweden)

    K. Latha

    2010-07-01

    Full Text Available Ranking search results is essential for information retrieval and Web search. Search engines need to not only return highly relevant results, but also be fast to satisfy users. As a result, not all available features can be used for ranking, and in fact only a small percentage of these features can be used. Thus, it is crucial to have a feature selection mechanism that can find a subset of features that both meets latency requirements and achieves high relevance. In this paper we describe a 0/1 knapsack procedure for automatically selecting features to use within Generalization model for Document Ranking. We propose an approach for Relevance Feedback using Expectation Maximization method and evaluate the algorithm on the TREC Collection for describing classes of feedback textual information retrieval features. Experimental results, evaluated on standard TREC-9 part of the OHSUMED collections, show that our feature selection algorithm produces models that are either significantly more effective than, or equally effective as, models such as Markov Random Field model, Correlation Co-efficient and Count Difference method
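    To illustrate the 0/1 knapsack formulation mentioned above, here is a minimal sketch. It assumes, purely for illustration, that each feature has an integer latency cost and an additive relevance gain; the feature names, costs, and gains are hypothetical and are not taken from the paper.

```python
def knapsack_select(features, budget):
    """Pick a subset of features maximizing total gain within a cost budget.

    features: list of (name, integer_cost, gain) tuples.
    Returns (best_total_gain, list_of_chosen_names).
    """
    # best[b] holds the best (gain, chosen names) with total cost <= b.
    best = [(0.0, [])] * (budget + 1)
    for name, cost, gain in features:
        # Iterate budgets downward so each feature is used at most once.
        for b in range(budget, cost - 1, -1):
            candidate = best[b - cost][0] + gain
            if candidate > best[b][0]:
                best[b] = (candidate, best[b - cost][1] + [name])
    return best[budget]

# Hypothetical ranking features: (name, latency cost, relevance gain).
features = [("bm25", 3, 4.0), ("pagerank", 4, 5.0),
            ("click_rate", 2, 3.0), ("url_depth", 1, 0.5)]
total_gain, chosen = knapsack_select(features, budget=5)
print(total_gain, chosen)
```

    The additive-gain assumption is a simplification: in practice the usefulness of a feature depends on which other features are already selected, so the gains would need to be estimated with that in mind.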

  18. Comparative Study on Feature Selection and Fusion Schemes for Emotion Recognition from Speech

    Directory of Open Access Journals (Sweden)

    Santiago Planet

    2012-09-01

    Full Text Available The automatic analysis of speech to detect affective states may improve the way users interact with electronic devices. However, analysis at the acoustic level alone may not be enough to determine the emotion of a user in a realistic scenario. In this paper we analyzed the spontaneous speech recordings of the FAU Aibo Corpus at the acoustic and linguistic levels to extract two sets of features. The acoustic set was reduced by a greedy procedure selecting the most relevant features to optimize the learning stage. We compared two versions of this greedy selection algorithm by performing the search for the relevant features forwards and backwards. We experimented with three classification approaches: Naïve-Bayes, a support vector machine and a logistic model tree, and two fusion schemes: decision-level fusion, merging the hard decisions of the acoustic and linguistic classifiers by means of a decision tree; and feature-level fusion, concatenating both sets of features before the learning stage. Despite the low performance achieved by the linguistic data, a dramatic improvement was achieved after its combination with the acoustic information, improving the results achieved by this second modality on its own. The results achieved by the classifiers using the parameters merged at feature level outperformed the classification results of the decision-level fusion scheme, despite the simplicity of the scheme. Moreover, the extremely reduced set of acoustic features obtained by the greedy forward search selection algorithm improved the results provided by the full set.
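    A greedy forward search of the kind described above can be sketched generically as follows. The scoring function is a stand-in: in a real experiment it would be something like the cross-validated accuracy of a classifier trained on the candidate subset, whereas the toy score and feature names here are hypothetical.

```python
def greedy_forward_selection(candidates, score, max_features=None):
    """Add one feature at a time, keeping the addition that improves the score most.

    candidates: iterable of feature names.
    score: callable taking a list of feature names and returning a number.
    Stops when no remaining feature improves the score.
    """
    selected = []
    best_score = score(selected)
    remaining = list(candidates)
    while remaining and (max_features is None or len(selected) < max_features):
        # Evaluate every single-feature extension of the current subset.
        trial_score, trial_feature = max(
            ((score(selected + [f]), f) for f in remaining),
            key=lambda pair: pair[0],
        )
        if trial_score <= best_score:
            break  # no extension helps, so stop
        selected.append(trial_feature)
        remaining.remove(trial_feature)
        best_score = trial_score
    return selected, best_score

# Hypothetical score: a base accuracy plus per-feature benefit minus a size penalty.
benefit = {"pitch": 0.30, "energy": 0.20, "mfcc1": 0.05, "jitter": 0.0}

def toy_score(subset):
    return 0.5 + sum(benefit[f] for f in subset) - 0.08 * len(subset)

selected, final_score = greedy_forward_selection(benefit, toy_score)
print(selected, final_score)
```

    The backward variant starts from the full set and removes, at each step, the feature whose removal helps most; both are heuristics and neither is guaranteed to find the best subset.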

  19. A biological mechanism for Bayesian feature selection: Weight decay and raising the LASSO.

    Science.gov (United States)

    Connor, Patrick; Hollensen, Paul; Krigolson, Olav; Trappenberg, Thomas

    2015-07-01

    Biological systems are capable of learning that certain stimuli are valuable while ignoring the many that are not, and thus perform feature selection. In machine learning, one effective feature selection approach is the least absolute shrinkage and selection operator (LASSO) form of regularization, which is equivalent to assuming a Laplacian prior distribution on the parameters. We review how such Bayesian priors can be implemented in gradient descent as a form of weight decay, which is a biologically plausible mechanism for Bayesian feature selection. In particular, we describe a new prior that offsets or "raises" the Laplacian prior distribution. We evaluate this alongside the Gaussian and Cauchy priors in gradient descent using a generic regression task where there are few relevant and many irrelevant features. We find that raising the Laplacian leads to less prediction error because it is a better model of the underlying distribution. We also consider two biologically relevant online learning tasks, one synthetic and one modeled after the perceptual expertise task of Krigolson et al. (2009). Here, raising the Laplacian prior avoids the fast erosion of relevant parameters over the period following training because it only allows small weights to decay. This better matches the limited loss of association seen between days in the human data of the perceptual expertise task. Raising the Laplacian prior thus results in a biologically plausible form of Bayesian feature selection that is effective in biologically relevant contexts.
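    The link between priors and weight decay can be made concrete with a small sketch. It shows only the two standard cases (a Laplacian prior giving L1 decay and a Gaussian prior giving L2 decay); the "raised" Laplacian prior proposed in the paper is not reproduced here, and the function names and values are illustrative assumptions.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def l1_decay_step(weights, grads, lr, lam):
    """Gradient step with Laplacian-prior (LASSO) decay: a constant pull toward zero."""
    return [w - lr * (g + lam * sign(w)) for w, g in zip(weights, grads)]

def l2_decay_step(weights, grads, lr, lam):
    """Gradient step with Gaussian-prior decay: a pull proportional to the weight."""
    return [w - lr * (g + lam * w) for w, g in zip(weights, grads)]

# With zero data gradient, only the decay term acts on the weights.
weights = [1.0, -2.0, 0.0]
grads = [0.0, 0.0, 0.0]
after_l1 = l1_decay_step(weights, grads, lr=0.1, lam=0.5)
after_l2 = l2_decay_step(weights, grads, lr=0.1, lam=0.5)
print(after_l1, after_l2)
```

    Under L1 decay every nonzero weight shrinks by the same amount per step regardless of its size, which is what drives small weights toward zero; under L2 decay large weights shrink faster than small ones.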

  20. Stereo vision-based pedestrian detection using multiple features for automotive application

    Science.gov (United States)

    Lee, Chung-Hee; Kim, Dongyoung

    2015-12-01

    In this paper, we propose a stereo vision-based pedestrian detection method using multiple features for automotive applications. The disparity map from the stereo vision system and multiple features are utilized to enhance the pedestrian detection performance. The disparity map offers 3D information, which makes it possible to detect obstacles easily and to reduce the overall detection time by removing unnecessary background. The road feature is extracted from the v-disparity map calculated from the disparity map. The road feature is a decision criterion to determine the presence or absence of obstacles on the road. Obstacle detection is performed by comparing the road feature with all columns in the disparity map. The result of obstacle detection is segmented by bird's-eye-view mapping to separate an obstacle area that contains multiple objects into single-obstacle areas. Histogram-based clustering is performed in the bird's-eye-view map. Each segmented result is verified by the classifier with the training model. To enhance the pedestrian recognition performance, multiple features such as HOG, CSS, and symmetry features are utilized. In particular, the symmetry feature is well suited to representing a standing or walking pedestrian. The block-based symmetry feature is utilized to minimize the type of image and the best feature among the three symmetry features of H-S-V image is selected as the symmetry feature in each pixel. The ETH database is utilized to verify our pedestrian detection algorithm.

  1. A HYBRID APPROACH BASED MEDICAL IMAGE RETRIEVAL SYSTEM USING FEATURE OPTIMIZED CLASSIFICATION SIMILARITY FRAMEWORK

    Directory of Open Access Journals (Sweden)

    Yogapriya Jaganathan

    2013-01-01

    Full Text Available Over the past few years, Content Based Medical Image Retrieval (CBMIR) has advanced considerably, enabling effective use of medical images based on visual feature analysis for diagnosis and educational research. Existing medical image retrieval systems are still not optimal at solving the feature dimensionality reduction problem, which increases the computational complexity and decreases the speed of the retrieval process. The proposed CBMIR uses a hybrid approach based on Feature Extraction, Optimization of Feature Vectors, Classification of Features and Similarity Measurements. This type of CBMIR is called the Feature Optimized Classification Similarity (FOCS) framework. The selected features are textures obtained using Gray level Co-occurrence Matrix Features (GLCM) and Tamura Features (TF), and the extracted features form a feature vector database. The Fuzzy based Particle Swarm Optimization (FPSO) technique is used to reduce the feature vector dimensionality, and classification is performed using a Fuzzy based Relevance Vector Machine (FRVM) to form groups of relevant image features that provide a natural way to classify dimensionally reduced feature vectors of images. The Euclidean Distance (ED) is used as the similarity measurement between the query image and the target images. The FOCS approach takes a query from the user and retrieves the needed images from the databases. The performance of the retrieval algorithm is estimated in terms of precision and recall. The FOCS framework offers several benefits compared to existing CBMIR. GLCM and TF are used to extract texture features and form a feature vector database. Fuzzy-PSO is used to reduce the feature vector dimensionality while selecting the important features in the feature vector database, which decreases computational complexity. Fuzzy based RVM is used for feature classification in which it increases the

  2. Remote sensing image classification based on block feature point density analysis and multiple-feature fusion

    Science.gov (United States)

    Li, Shijin; Jiang, Yaping; Zhang, Yang; Feng, Jun

    2015-10-01

    With the development of remote sensing (RS) and the related technologies, the resolution of RS images is enhancing. Compared with moderate or low resolution images, high-resolution ones can provide more detailed ground information. However, a variety of terrain has complex spatial distribution. The different objectives of high-resolution images have a variety of features. The effectiveness of these features is not the same, but some of them are complementary. Considering the above information and characteristics, a new method is proposed to classify RS images based on hierarchical fusion of multi-features. Firstly, RS images are pre-classified into two categories in terms of whether feature points are uniformly or non-uniformly distributed. Then, the color histogram and Gabor texture feature are extracted from the uniformly-distributed categories, and the linear spatial pyramid matching using sparse coding (ScSPM) feature is obtained from the non-uniformly-distributed categories. Finally, the classification is performed by two support vector machine classifiers. The experimental results on a large RS image database with 2100 images show that the overall classification accuracy is boosted by 10.1% in comparison with the highest accuracy of single feature classification method. Compared with other multiple-feature fusion methods, the proposed method has achieved the highest classification accuracy on this dataset which has reached 90.1%, and the time complexity of the algorithm is also greatly reduced.

  3. Recursive Cluster Elimination (RCE) for classification and feature selection from gene expression data

    Directory of Open Access Journals (Sweden)

    Showe Louise C

    2007-05-01

    Full Text Available Background: Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. The selection of those genes that are important for distinguishing the different sample classes being compared poses a challenging problem in high dimensional data analysis. We describe a new procedure for selecting significant genes as recursive cluster elimination (RCE) rather than recursive feature elimination (RFE). We have tested this algorithm on six datasets and compared its performance with that of two related classification procedures with RFE. Results: We have developed a novel method for selecting significant genes in comparative gene expression studies. This method, which we refer to as SVM-RCE, combines K-means, a clustering method, to identify correlated gene clusters, and Support Vector Machines (SVMs), a supervised machine learning classification method, to identify and score (rank) those gene clusters for the purpose of classification. K-means is used initially to group genes into clusters. Recursive cluster elimination (RCE) is then applied to iteratively remove those clusters of genes that contribute the least to the classification performance. SVM-RCE identifies the clusters of correlated genes that are most significantly differentially expressed between the sample classes. Utilization of gene clusters, rather than individual genes, enhances the supervised classification accuracy of the same data as compared to the accuracy when either SVM or Penalized Discriminant Analysis (PDA) with recursive feature elimination (SVM-RFE and PDA-RFE) is used to remove genes based on their individual discriminant weights. Conclusion: SVM-RCE provides improved classification accuracy with complex microarray data sets when it is compared to the classification accuracy of the same datasets using either SVM-RFE or PDA-RFE. SVM-RCE identifies clusters of correlated genes that when considered together

  4. A feature selection approach for identification of signature genes from SAGE data

    Directory of Open Access Journals (Sweden)

    Silva Paulo JS

    2007-05-01

    Full Text Available Abstract Background One goal of gene expression profiling is to identify signature genes that robustly distinguish different types or grades of tumors. Several tumor classifiers based on expressi