WorldWideScience

Sample records for based feature selection

  1. CBFS: high performance feature selection algorithm based on feature clearness.

    Directory of Open Access Journals (Sweden)

    Minseok Seo

    Full Text Available BACKGROUND: The goal of feature selection is to select useful features and simultaneously exclude garbage features from a given dataset for classification purposes. This is expected to reduce processing time and improve classification accuracy. METHODOLOGY: In this study, we devised a new feature selection algorithm (CBFS) based on the clearness of features. Feature clearness expresses the separability among classes in a feature. Highly clear features contribute towards obtaining high classification accuracy. CScore is a measure that scores the clearness of each feature, based on how tightly samples cluster around the class centroids in that feature. We also suggest combining CBFS with other algorithms to improve classification accuracy. CONCLUSIONS/SIGNIFICANCE: From the experiments we confirm that CBFS outperforms state-of-the-art feature selection algorithms, including FeaLect. CBFS can be applied to microarray gene selection, text categorization, and image classification.
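
    The abstract names CScore but does not give its formula, so the sketch below is only a hedged stand-in: a clearness-style score computed as the ratio of between-class centroid spread to within-class spread for each feature, followed by top-k ranking. The function names and the exact ratio are illustrative assumptions, not the published CBFS definition.

```python
# Hedged stand-in for a "clearness"-style filter score (the exact CScore
# formula is not given in the abstract; a between/within centroid ratio is
# used here as a proxy).
import numpy as np

def clearness_scores(X, y):
    """Score each feature by how cleanly class samples cluster around
    their class centroid relative to the within-class spread."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        centroid = Xc.mean(axis=0)
        between += len(Xc) * (centroid - overall) ** 2
        within += ((Xc - centroid) ** 2).sum(axis=0)
    return between / (within + 1e-12)

def select_top_k(X, y, k):
    # Indices of the k "clearest" features.
    return np.argsort(clearness_scores(X, y))[::-1][:k]
```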

  2. Rough set-based feature selection method

    Institute of Scientific and Technical Information of China (English)

    ZHAN Yanmei; ZENG Xiangyang; SUN Jincai

    2005-01-01

    A new feature selection method based on the discernibility matrix in rough set theory is proposed in this paper. The main idea of this method is that the most effective feature, if used for classification, can distinguish the largest number of samples belonging to different classes. Experiments are performed using this method to select relevant features for artificial datasets and real-world datasets. Results show that the proposed selection method can correctly select all the relevant features of the artificial datasets and drastically reduce the number of features at the same time. In addition, when this method is used to select classification features of real-world underwater targets, the number of classification features after selection drops to 20% of the original feature set, and the classification accuracy increases by about 6% using the dataset after feature selection.
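
    A minimal sketch of the discernibility idea described above, assuming the features are already discretized: at each step the feature that separates the largest number of remaining cross-class sample pairs is selected. The greedy loop and helper name are illustrative; the paper's discernibility-matrix construction is not reproduced.

```python
# Greedy discernibility-style selection (illustrative sketch; assumes
# discretized features and small sample sizes).
import numpy as np
from itertools import combinations

def greedy_discernibility(X, y, n_features):
    X, y = np.asarray(X), np.asarray(y)
    # All cross-class sample pairs that still need to be discerned.
    pairs = [(i, j) for i, j in combinations(range(len(y)), 2) if y[i] != y[j]]
    selected = []
    while pairs and len(selected) < n_features:
        # Count, per candidate feature, how many remaining pairs it separates.
        counts = [sum(X[i, f] != X[j, f] for i, j in pairs)
                  for f in range(X.shape[1])]
        best = int(np.argmax(counts))
        if counts[best] == 0:
            break
        selected.append(best)
        # Keep only the pairs that the chosen feature does not yet separate.
        pairs = [(i, j) for i, j in pairs if X[i, best] == X[j, best]]
    return selected
```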

  3. A Genetic Algorithm-Based Feature Selection

    Directory of Open Access Journals (Sweden)

    Babatunde Oluleye

    2014-07-01

    Full Text Available This article details the exploration and application of a Genetic Algorithm (GA) for feature selection. In particular, a binary GA was used for dimensionality reduction to enhance the performance of the classifiers concerned. In this work, one hundred (100) features were extracted from the set of images found in the Flavia dataset (a publicly available dataset). The extracted features are Zernike Moments (ZM), Fourier Descriptors (FD), Legendre Moments (LM), Hu 7 Moments (Hu7M), Texture Properties (TP) and Geometrical Properties (GP). The main contributions of this article are (1) detailed documentation of the GA Toolbox in MATLAB and (2) the development of a GA-based feature selector using a novel fitness function (kNN-based classification error), which enabled the GA to obtain a combination of features giving rise to optimal accuracy. The results obtained were compared with various feature selectors from the WEKA software and were better in many respects, particularly in terms of classification accuracy.
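
    A compact, hedged sketch of a binary GA wrapper with a kNN-based fitness in the spirit of this record (cross-validated kNN accuracy is maximized here, which is equivalent to minimizing the classification error the article uses). Population size, operators and classifier settings are illustrative assumptions, not the MATLAB GA Toolbox configuration.

```python
# Illustrative binary GA for feature selection with a kNN fitness.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    # Cross-validated kNN accuracy of the candidate feature subset.
    if mask.sum() == 0:
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=3)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

def ga_select(X, y, pop_size=30, generations=40, p_mut=0.05):
    n = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n))
    for _ in range(generations):
        fit = np.array([fitness(ind, X, y) for ind in pop])
        # Binary tournament selection of parents.
        idx = [max(rng.choice(pop_size, 2), key=lambda i: fit[i])
               for _ in range(pop_size)]
        parents = pop[idx]
        # One-point crossover.
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            cut = rng.integers(1, n)
            children[i, cut:] = parents[i + 1, cut:]
            children[i + 1, cut:] = parents[i, cut:]
        # Bit-flip mutation.
        flips = rng.random(children.shape) < p_mut
        children[flips] = 1 - children[flips]
        pop = children
    fit = np.array([fitness(ind, X, y) for ind in pop])
    return pop[fit.argmax()].astype(bool)   # boolean mask of selected features
```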

  4. Feature selection with neighborhood entropy-based cooperative game theory.

    Science.gov (United States)

    Zeng, Kai; She, Kun; Niu, Xinzheng

    2014-01-01

    Feature selection plays an important role in machine learning and data mining. In recent years, various feature measurements have been proposed to select significant features from high-dimensional datasets. However, most traditional feature selection methods ignore features that have strong classification ability as a group but are weak as individuals. To deal with this problem, we redefine the redundancy, interdependence, and independence of features by using neighborhood entropy. Then the neighborhood entropy-based feature contribution is proposed under the framework of cooperative game theory. The evaluation criteria for features can be formalized as the product of contribution and other classical feature measures. Finally, the proposed method is tested on several UCI datasets. The results show that the neighborhood entropy-based cooperative game theory model (NECGT) yields better performance than classical ones.

  5. Feature Selection with Neighborhood Entropy-Based Cooperative Game Theory

    Directory of Open Access Journals (Sweden)

    Kai Zeng

    2014-01-01

    Full Text Available Feature selection plays an important role in machine learning and data mining. In recent years, various feature measurements have been proposed to select significant features from high-dimensional datasets. However, most traditional feature selection methods ignore features that have strong classification ability as a group but are weak as individuals. To deal with this problem, we redefine the redundancy, interdependence, and independence of features by using neighborhood entropy. Then the neighborhood entropy-based feature contribution is proposed under the framework of cooperative game theory. The evaluation criteria for features can be formalized as the product of contribution and other classical feature measures. Finally, the proposed method is tested on several UCI datasets. The results show that the neighborhood entropy-based cooperative game theory model (NECGT) yields better performance than classical ones.

  6. Feature Selection for Neural Network Based Stock Prediction

    Science.gov (United States)

    Sugunnasil, Prompong; Somhom, Samerkae

    We propose a new methodology of feature selection for stock movement prediction. The methodology is based upon finding those features which minimize the correlation relation function. We first produce all the combinations of features and evaluate each of them using our evaluation function. We search through the generated set with a hill-climbing approach. A self-organizing map based stock prediction model is utilized as the prediction method. We conduct the experiment on data sets of the Microsoft Corporation, General Electric Co. and Ford Motor Co. The results show that our feature selection method can improve the efficiency of the neural network based stock prediction.

  7. High Dimensional Data Clustering Using Fast Cluster Based Feature Selection

    Directory of Open Access Journals (Sweden)

    Karthikeyan.P

    2014-03-01

    Full Text Available Feature selection involves identifying a subset of the most useful features that produces results comparable to the original entire set of features. A feature selection algorithm may be evaluated from both the efficiency and effectiveness points of view. While efficiency concerns the time required to find a subset of features, effectiveness is related to the quality of the subset of features. Based on these criteria, a fast clustering-based feature selection algorithm (FAST) is proposed and experimentally evaluated in this paper. The FAST algorithm works in two steps. In the first step, features are divided into clusters by using graph-theoretic clustering methods. In the second step, the most representative feature that is strongly related to the target classes is selected from each cluster to form a subset of features. Features in different clusters are relatively independent; the clustering-based strategy of FAST has a high probability of producing a subset of useful and independent features. To ensure the efficiency of FAST, we adopt the efficient minimum spanning tree (MST) clustering method based on Kruskal's algorithm. The efficiency and effectiveness of the FAST algorithm are evaluated through an empirical study.
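
    A rough sketch of the two-step FAST idea under simplifying assumptions: features are clustered by cutting long edges of a minimum spanning tree built over feature dissimilarities, and one representative per cluster is kept. Absolute Pearson correlation stands in for the symmetric-uncertainty measure of the original algorithm, and the edge threshold is arbitrary.

```python
# MST-based feature clustering followed by per-cluster representative selection
# (a simplified stand-in for FAST; y is assumed to be numerically encoded).
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

def fast_like_select(X, y, edge_threshold=0.7):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_feat = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))      # feature-feature similarity
    dist = 1.0 - corr                                # dissimilarity graph
    mst = minimum_spanning_tree(dist).toarray()
    # Step 1: cut MST edges whose dissimilarity is too large -> feature clusters.
    mst[mst > edge_threshold] = 0.0
    n_clusters, labels = connected_components(mst != 0, directed=False)
    # Step 2: per cluster, keep the feature most correlated with the target.
    relevance = np.abs([np.corrcoef(X[:, f], y)[0, 1] for f in range(n_feat)])
    return [int(np.argmax(np.where(labels == c, relevance, -np.inf)))
            for c in range(n_clusters)]
```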

  8. Linear feature selection in texture analysis - A PLS based method

    DEFF Research Database (Denmark)

    Marques, Joselene; Igel, Christian; Lillholm, Martin;

    2013-01-01

    ...which first applied a PLS regression to rank the features and then defined the best number of features to retain in the model by an iterative learning phase. Outliers in the dataset, which could inflate the number of selected features, were eliminated by a pre-processing step. ... Considering all CV groups, the methods selected 36% of the original features available. The diagnosis evaluation reached a generalization area under the ROC curve of 0.92, which was higher than established cartilage-based markers known to relate to OA diagnosis.

  9. Acoustic Event Detection Based on MRMR Selected Feature Vectors

    OpenAIRE

    VOZARIKOVA Eva; Juhar, Jozef; CIZMAR Anton

    2012-01-01

    This paper is focused on the detection of potentially dangerous acoustic events such as gun shots and breaking glass in the urban environment. Various feature extraction methods can be used for representing the sound in a detection system based on Hidden Markov Models of acoustic events. Mel-frequency cepstral coefficients, low-level descriptors defined in the MPEG-7 standard and other time and spectral features were considered in the system. For the selection of the final subset of features Mi...

  10. Mutual information-based feature selection for radiomics

    Science.gov (United States)

    Oubel, Estanislao; Beaumont, Hubert; Iannessi, Antoine

    2016-03-01

    Background The extraction and analysis of image features (radiomics) is a promising field in the precision medicine era, with applications to prognosis, prediction, and quantification of response to treatment. In this work, we present a mutual information-based method for quantifying the reproducibility of features, a necessary step for qualification before their inclusion in big data systems. Materials and Methods Ten patients with Non-Small Cell Lung Cancer (NSCLC) lesions were followed over time (7 time points on average) with Computed Tomography (CT). Five observers segmented lesions by using a semi-automatic method, and 27 features describing shape and intensity distribution were extracted. Inter-observer reproducibility was assessed by computing the multi-information (MI) of feature changes over time, and the variability of global extrema. Results The highest MI values were obtained for volume-based features (VBF). The lesion mass (M), surface to volume ratio (SVR) and volume (V) presented statistically significantly higher values of MI than the rest of the features. Within the same VBF group, SVR also showed the lowest variability of extrema. The correlation coefficient (CC) of feature values was unable to differentiate between features. Conclusions MI allowed us to discriminate three features (M, SVR, and V) from the rest in a statistically significant manner. This result is consistent with the order obtained when sorting features by increasing values of extrema variability. MI is a promising alternative for selecting features to be considered as surrogate biomarkers in a precision medicine context.
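
    A hedged illustration of scoring a feature's inter-observer reproducibility with mutual information. The abstract does not specify the multi-information estimator or the binning, so pairwise mutual information over quantile-binned feature changes is used here purely as an assumption.

```python
# Sketch: reproducibility of one radiomic feature as the average pairwise
# mutual information between observers' longitudinal feature changes.
import numpy as np
from itertools import combinations
from sklearn.metrics import mutual_info_score

def reproducibility_mi(per_observer_values, n_bins=5):
    """per_observer_values: array (n_observers, n_timepoints) with one feature
    measured by several observers over the same time points."""
    V = np.asarray(per_observer_values, dtype=float)
    changes = np.diff(V, axis=1)                       # feature changes over time
    edges = np.quantile(changes, np.linspace(0, 1, n_bins + 1))
    digitized = np.digitize(changes, edges[1:-1])      # common quantile binning
    scores = [mutual_info_score(digitized[a], digitized[b])
              for a, b in combinations(range(V.shape[0]), 2)]
    return float(np.mean(scores))                      # higher = more reproducible
```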

  11. Simultaneous Channel and Feature Selection of Fused EEG Features Based on Sparse Group Lasso

    Directory of Open Access Journals (Sweden)

    Jin-Jia Wang

    2015-01-01

    Full Text Available Feature extraction and classification of EEG signals are core parts of brain-computer interfaces (BCIs). Due to the high dimension of the EEG feature vector, an effective feature selection algorithm has become an integral part of research studies. In this paper, we present a new method based on a wrapped Sparse Group Lasso for channel and feature selection of fused EEG signals. The high-dimensional fused features are first obtained; they include the power spectrum, time-domain statistics, AR model, and wavelet coefficient features extracted from the preprocessed EEG signals. The wrapped channel and feature selection method is then applied, which uses a logistic regression model with a Sparse Group Lasso penalty function. The model is fitted on the training data, and parameter estimation is obtained by a modified blockwise coordinate descent and coordinate gradient descent method. The best parameters and feature subset are selected by using 10-fold cross-validation. Finally, the test data are classified using the trained model. Compared with existing channel and feature selection methods, results show that the proposed method is more suitable, more stable, and faster for high-dimensional feature fusion. It can simultaneously achieve channel and feature selection with a lower error rate. The test accuracy on the data used from the international BCI Competition IV reached 84.72%.

  12. Feature selection gait-based gender classification under different circumstances

    Science.gov (United States)

    Sabir, Azhin; Al-Jawad, Naseer; Jassim, Sabah

    2014-05-01

    This paper proposes a gender classification approach based on human gait features and investigates the problem of two variations: clothing (wearing coats) and carrying a bag, in addition to the normal gait sequence. The feature vectors in the proposed system are constructed after applying the wavelet transform. Three different feature sets are proposed in this method. The first is the spatio-temporal distance, which deals with the distances between different parts of the human body (such as the feet, knees, hands, height and shoulders) during one gait cycle. The second and third feature sets are constructed from the approximation and non-approximation coefficients of the human body, respectively. To extract these two feature sets we divided the human body into two parts, the upper and lower body, based on the golden ratio proportion. In this paper, we have adopted a statistical method for constructing the feature vector from the above sets. The dimension of the constructed feature vector is reduced based on the Fisher score as a feature selection method to optimize its discriminating significance. Finally, k-Nearest Neighbor is applied as the classification method. Experimental results demonstrate that our approach provides a more realistic scenario and relatively better performance compared with existing approaches.

  13. Feature selection for splice site prediction: A new method using EDA-based feature ranking

    Directory of Open Access Journals (Sweden)

    Rouzé Pierre

    2004-05-01

    Full Text Available Abstract Background The identification of relevant biological features in large and complex datasets is an important step towards gaining insight into the processes underlying the data. Other advantages of feature selection include the ability of the classification system to attain good or even better solutions using a restricted subset of features, and faster classification. Thus, robust methods for fast feature selection are of key importance in extracting knowledge from complex biological data. Results In this paper we present a novel method for feature subset selection applied to splice site prediction, based on estimation of distribution algorithms, a more general framework of genetic algorithms. From the estimated distribution of the algorithm, a feature ranking is derived. Afterwards this ranking is used to iteratively discard features. We apply this technique to the problem of splice site prediction, and show how it can be used to gain insight into the underlying biological process of splicing. Conclusion We show that this technique proves to be more robust than the traditional use of estimation of distribution algorithms for feature selection: instead of returning a single best subset of features (as they normally do), this method provides a dynamic view of the feature selection process, like the traditional sequential wrapper methods. However, the method is faster than the traditional techniques, and scales better to datasets described by a large number of features.
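
    The sketch below shows the general EDA-based ranking idea with a simple univariate EDA (UMDA): per-feature inclusion probabilities are re-estimated from the fittest subsets each generation and finally used as a ranking. The Naïve Bayes fitness, population settings and function name are illustrative assumptions, not the paper's exact setup.

```python
# UMDA-style EDA that ranks features by their estimated inclusion probability.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

def eda_feature_ranking(X, y, pop_size=50, generations=30, top_frac=0.3):
    n = X.shape[1]
    probs = np.full(n, 0.5)                      # initial inclusion probabilities
    for _ in range(generations):
        pop = (rng.random((pop_size, n)) < probs).astype(int)
        fit = np.array([
            cross_val_score(GaussianNB(), X[:, ind.astype(bool)], y, cv=3).mean()
            if ind.any() else 0.0
            for ind in pop])
        elite = pop[np.argsort(fit)[::-1][:int(top_frac * pop_size)]]
        probs = elite.mean(axis=0)               # re-estimate the distribution
        probs = np.clip(probs, 0.05, 0.95)       # keep some exploration
    return np.argsort(probs)[::-1]               # ranking: most probable first
```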

  14. Feature-based attention across saccades and immediate postsaccadic selection.

    Science.gov (United States)

    Eymond, Cécile; Cavanagh, Patrick; Collins, Thérèse

    2016-07-01

    Before each eye movement, attentional resources are drawn to the saccade goal. This saccade-related attention is known to be spatial in nature, and in this study we asked whether it also evokes any feature selectivity that is maintained across the saccade. After a saccade toward a colored target, participants performed a postsaccadic feature search on an array displayed at landing. The saccade target either had the same color as the search target in the postsaccadic array (congruent trials) or a different color (incongruent or neutral trials). Our results show that the color of the saccade target did not prime the subsequent feature search. This suggests that "landmark search", the process of searching for the saccade target once the eye lands (Deubel in Visual Cognition, 11, 173-202, 2004), may not involve the attentional mechanisms that underlie feature search. We also analyzed intertrial effects and observed priming of pop-out (Maljkovic & Nakayama in Memory & Cognition, 22, 657-672, 1994) for the postsaccadic feature search: the detection of the color singleton became faster when its color was repeated on successive trials. However, search performance revealed no effect of congruency between the saccade and search targets, either within or across trials, suggesting that the priming of pop-out is specific to target repetitions within the same task and is not seen for repetitions across tasks. Our results support a dissociation between feature-based attention and the attentional mechanisms associated with eye movement programming. PMID:27084700

  15. Unsupervised Feature Selection Based on the Distribution of Features Attributed to Imbalanced Data Sets

    Directory of Open Access Journals (Sweden)

    Mina Alibeigi, Sattar Hashemi & Ali Hamzeh

    2011-04-01

    Full Text Available Since dealing with high dimensional data is computationally complex and sometimes even intractable, several feature reduction methods have recently been developed to reduce the dimensionality of the data in order to simplify the calculation analysis in various applications such as text categorization, signal processing, image retrieval and gene expressions, among many others. Among feature reduction techniques, feature selection is one of the most popular methods due to the preservation of the original meaning of features. However, most of the current feature selection methods do not have good performance when fed imbalanced data sets, which are pervasive in real world applications. In this paper, we propose a new unsupervised feature selection method attributed to imbalanced data sets, which will remove redundant features from the original feature space based on the distribution of features. To show the effectiveness of the proposed method, popular feature selection methods have been implemented and compared. Experimental results on several imbalanced data sets, derived from the UCI repository database, illustrate the effectiveness of the proposed method in comparison with other rival methods in terms of both the AUC and F1 performance measures of 1-Nearest Neighbor and Naïve Bayes classifiers and the percentage of selected features.

  16. Shape-based feature selection for microcalcification evaluation

    Science.gov (United States)

    Marti, Joan; Cufi, Xavier; Regincos, Jordi; Espanol, Josep; Pont, Josep; Barcelo, Carles

    1998-06-01

    This work focuses on the selection of a set of shape-based features in order to assist radiologists in differentiating between malignant and benign clustered microcalcifications in mammograms. The results obtained allow the creation of a model for evaluating the benign or malignant character of the microcalcifications in a mammogram, based exclusively on the following parameters: number of clusters, number of holes, area, Feret elongation, roughness and elongation. The performance of the classification scheme is close to the mean performance of three expert radiologists, which allows the proposed method to be considered for assisting diagnosis and encourages continued investigation in this field. Additionally, the work is based on an unpublished database formed by patients of the Regional Health Area of Girona, which in the future may contribute to expanding digital mammogram databases.

  17. Unsupervised Feature Selection Based on the Morisita Index

    Science.gov (United States)

    Golay, Jean; Kanevski, Mikhail

    2016-04-01

    Recent breakthroughs in technology have radically improved our ability to collect and store data. As a consequence, the size of datasets has been increasing rapidly both in terms of the number of variables (or features) and the number of instances. Since the mechanism of many phenomena is not well known, too many variables are sampled. A lot of them are redundant and contribute to the emergence of three major challenges in data mining: (1) the complexity of result interpretation, (2) the necessity to develop new methods and tools for data processing, (3) the possible reduction in the accuracy of learning algorithms because of the curse of dimensionality. This research deals with a new algorithm for selecting the smallest subset of features conveying all the information of a dataset (i.e. an algorithm for removing redundant features). It is a new version of the Fractal Dimensionality Reduction (FDR) algorithm [1] and it relies on two ideas: (a) In general, data lie on non-linear manifolds of much lower dimension than that of the spaces where they are embedded. (b) The situation described in (a) is partly due to redundant variables, since they do not contribute to increasing the dimension of the manifolds, called the Intrinsic Dimension (ID). The suggested algorithm implements these ideas by selecting only the variables influencing the data ID. Unlike the FDR algorithm, it resorts to a recently introduced ID estimator [2] based on the Morisita index of clustering and to a sequential forward search strategy. Consequently, in addition to its ability to capture non-linear dependences, it can deal with large datasets and its implementation is straightforward in any programming environment. Many real-world case studies are considered. They are related to environmental pollution and renewable resources. References [1] C. Traina Jr., A.J.M. Traina, L. Wu, C. Faloutsos, Fast feature selection using fractal dimension, in: Proceedings of the XV Brazilian Symposium on Databases, SBBD, pp. 158

  18. Target Tracking Feature Selection Algorithm Based on Adaboost

    OpenAIRE

    Chen Yi

    2013-01-01

    With the development of image processing technology and the popularization of computer technology, intelligent machine vision technology has a wide range of applications in the medical, military, industrial and other fields. The target tracking feature selection algorithm is one of the research focuses in intelligent machine vision technology. Therefore, designing a target tracking feature selection algorithm with high accuracy and good stability is extremely necessary. This paper presents a ta...

  19. Dominant Local Binary Pattern Based Face Feature Selection and Detection

    Directory of Open Access Journals (Sweden)

    Kavitha.T

    2010-04-01

    Full Text Available Face detection plays a major role in biometrics. Feature selection is a problem of formidable complexity. This paper proposes a novel approach to extract face features for face detection. The LBP features can be extracted faster in a single scan through the raw image and lie in a lower dimensional space, whilst still retaining facial information efficiently. The LBP features are robust to low-resolution images. The dominant local binary pattern (DLBP) is used to extract features accurately. A number of trainable methods are emerging in empirical practice due to their effectiveness. The proposed method is a trainable system for selecting face features from over-complete dictionaries of image measurements. After the feature selection procedure is completed, an SVM classifier is used for face detection. The main advantage of this proposal is that it is trained on a very small training set. The classifier is used to increase the selection accuracy. This is not only advantageous for facilitating the data-gathering stage, but, more importantly, for limiting the training time. The CBCL frontal faces dataset is used for training and validation.

  20. Hybrid Local Feature Selection In DNA Analysis Based Cancer Classification

    OpenAIRE

    Akila, M.; Mr.S.Senthamarai kannan

    2012-01-01

    Feature selection, as a preprocessing step to machine learning, is effective in reducing dimensionality, removing irrelevant data and increasing learning accuracy. The development of microarray dataset technology has supplied a large volume of data to many fields. In particular, it has been applied to prediction and diagnosis of cancer, so that it helps us to exactly predict and diagnose cancer. To precisely classify cancer we have to select genes related to cancer. The challenging task in ca...

  1. Emotion of Physiological Signals Classification Based on TS Feature Selection

    Institute of Scientific and Technical Information of China (English)

    Wang Yujing; Mo Jianlin

    2015-01-01

    This paper proposes a TS-MLP method for emotion recognition from physiological signals. It can recognize emotion successfully by using Tabu search, which selects features of the emotional physiological signals, and a multilayer perceptron, which is used to classify the emotion. Simulation shows that it achieves good emotion classification performance.

  2. Feature selection using a genetic algorithm-based hybrid approach

    Directory of Open Access Journals (Sweden)

    Luis Felipe Giraldo

    2010-04-01

    Full Text Available The present work proposes a hybrid feature selection model aimed at reducing training time whilst maintaining classification accuracy. The model includes adjusting a decision tree for producing feature subsets. The statistical relevance of such subsets was evaluated from their resulting classification error. Evaluation involved using the k-nearest neighbors rule. Dimension reduction techniques usually assume an element of error; however, in this work the hybrid selection model was tuned by means of genetic algorithms, which simultaneously minimise the number of features and the training error. Contrasting with conventional methods, this model also led to quantifying the relevance of each feature of the training set. The model was tested on speech signals (hypernasality classification) and ECG identification (ischemic cardiopathy). In the case of speech signals, the database consisted of 90 children (45 recordings per sample); the ECG database had 100 electrocardiograph records (50 recordings per sample). Results showed average reduction rates of up to 88%, with classification error being less than 6%.

  3. Supervised Feature Subset Selection based on Modified Fuzzy Relative Information Measure for classifier Cart

    OpenAIRE

    K.SAROJINI,; Dr. K.THANGAVEL; D.DEVAKUMARI

    2010-01-01

    Feature subset selection is an essential task in data mining. This paper presents a new method for dealing with supervised feature subset selection based on the Modified Fuzzy Relative Information Measure (MFRIM). First, a discretization algorithm is applied to discretize numeric features to construct the membership functions of each fuzzy set of a feature. Then the proposed MFRIM is applied to select the feature subset focusing on boundary samples. The proposed method can select feature subset wi...

  4. Sequence-based classification using discriminatory motif feature selection.

    Directory of Open Access Journals (Sweden)

    Hao Xiong

    Full Text Available Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is...

  5. Supervised Feature Subset Selection based on Modified Fuzzy Relative Information Measure for classifier Cart

    Directory of Open Access Journals (Sweden)

    K.SAROJINI,

    2010-06-01

    Full Text Available Feature subset selection is an essential task in data mining. This paper presents a new method for dealing with supervised feature subset selection based on the Modified Fuzzy Relative Information Measure (MFRIM). First, a discretization algorithm is applied to discretize numeric features to construct the membership functions of each fuzzy set of a feature. Then the proposed MFRIM is applied to select the feature subset focusing on boundary samples. The proposed method can select a feature subset with the minimum number of features, which are relevant for achieving higher average classification accuracy for datasets. The experimental results with UCI datasets show that the proposed algorithm is effective and efficient in selecting a subset with the minimum number of features while achieving higher average classification accuracy than the consistency-based feature subset selection method.

  6. Human motion recognition based on features and models selected HMM

    Science.gov (United States)

    Lu, Haixiang; Zhou, Hongjun

    2015-03-01

    This paper investigates motion recognition based on HMMs with Kinect. Kinect provides skeletal data consisting of 3D body joints, at a low price and with convenience. In this work, several methods are used to determine the optimal subset of features among Cartesian coordinates, distance to hip center, velocity, angle and angular velocity, in order to improve the recognition rate. K-means is used for vector quantization and an HMM is used as the recognition method. The HMM is an effective signal processing method which provides time calibration, a learning mechanism and recognition ability. The cluster number of K-means and the structure and state number of the HMM are optimized as well. The proposed methods are applied to the MSR Action3D dataset. Results show that the proposed methods obtain better recognition accuracy than state-of-the-art methods.

  7. Soft computing based feature selection for environmental sound classification

    NARCIS (Netherlands)

    Shakoor, A.; May, T.M.; Van Schijndel, N.H.

    2010-01-01

    Environmental sound classification has a wide range of applications, like hearing aids, mobile communication devices, portable media players, and auditory protection devices. Sound classification systems typically extract features from the input sound. Using too many features increases complexity unne

  8. Speech Emotion Feature Selection Method Based on Contribution Analysis Algorithm of Neural Network

    Science.gov (United States)

    Wang, Xiaojia; Mao, Qirong; Zhan, Yongzhao

    2008-11-01

    There are many emotion features. If all these features are employed to recognize emotions, redundant features may exist. Furthermore, the recognition result is unsatisfactory and the cost of feature extraction is high. In this paper, a method to select speech emotion features based on the contribution analysis algorithm of a neural network (NN) is presented. The emotion features are selected by using the contribution analysis algorithm of the NN from the 95 extracted features. Cluster analysis is applied to analyze the effectiveness of the selected features, and the time of feature extraction is evaluated. Finally, the 24 selected emotion features are used to recognize six speech emotions. The experiments show that this method can improve the recognition rate and reduce the time of feature extraction.

  9. Speech Emotion Feature Selection Method Based on Contribution Analysis Algorithm of Neural Network

    International Nuclear Information System (INIS)

    There are many emotion features. If all these features are employed to recognize emotions, redundant features may exist. Furthermore, the recognition result is unsatisfactory and the cost of feature extraction is high. In this paper, a method to select speech emotion features based on the contribution analysis algorithm of a neural network (NN) is presented. The emotion features are selected by using the contribution analysis algorithm of the NN from the 95 extracted features. Cluster analysis is applied to analyze the effectiveness of the selected features, and the time of feature extraction is evaluated. Finally, the 24 selected emotion features are used to recognize six speech emotions. The experiments show that this method can improve the recognition rate and reduce the time of feature extraction.

  10. Feature selection method based on multi-fractal dimension and harmony search algorithm and its application

    Science.gov (United States)

    Zhang, Chen; Ni, Zhiwei; Ni, Liping; Tang, Na

    2016-10-01

    Feature selection is an important method of data preprocessing in data mining. In this paper, a novel feature selection method based on multi-fractal dimension and harmony search algorithm is proposed. Multi-fractal dimension is adopted as the evaluation criterion of feature subset, which can determine the number of selected features. An improved harmony search algorithm is used as the search strategy to improve the efficiency of feature selection. The performance of the proposed method is compared with that of other feature selection algorithms on UCI data-sets. Besides, the proposed method is also used to predict the daily average concentration of PM2.5 in China. Experimental results show that the proposed method can obtain competitive results in terms of both prediction accuracy and the number of selected features.

  11. Lasso based feature selection for malaria risk exposure prediction

    OpenAIRE

    Kouwayè, Bienvenue; Fonton, Noël; Rossi, Fabrice

    2015-01-01

    In life sciences, experts generally use empirical knowledge to recode variables, choose interactions and perform selection by classical approaches. The aim of this work is to apply automatic learning algorithms for variable selection, which can help determine whether experts can be assisted in their decisions or simply replaced by the machine, and improve their knowledge and results. The Lasso method can detect the optimal subset of variables for estimation and prediction under some conditions. In this p...

  12. A Feature Selection Method for Large-Scale Network Traffic Classification Based on Spark

    Directory of Open Access Journals (Sweden)

    Yong Wang

    2016-02-01

    Full Text Available Currently, with the rapid increase of data scales in network traffic classification, how to select traffic features efficiently is becoming a big challenge. Although a number of traditional feature selection methods using the Hadoop-MapReduce framework have been proposed, the execution time is still unsatisfactory because of the numerous iterative computations during processing. To address this issue, an efficient feature selection method for network traffic based on a new parallel computing framework called Spark is proposed in this paper. In our approach, the complete feature set is first preprocessed based on the Fisher score, and a sequential forward search strategy is employed for subsets. The optimal feature subset is then selected using the continuous iterations of the Spark computing framework. The implementation demonstrates that, while preserving classification accuracy, our method reduces the time cost of modeling and classification, and significantly improves the execution efficiency of feature selection.
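
    A plain-Python sketch of the sequential forward search step mentioned above; the Spark parallelization and Fisher-score preprocessing are omitted, and the decision-tree classifier and cross-validation settings are illustrative assumptions.

```python
# Sequential forward search (wrapper) over feature indices.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def sequential_forward_search(X, y, max_features=10):
    X = np.asarray(X, dtype=float)
    remaining = list(range(X.shape[1]))
    selected, best_score = [], -np.inf
    while remaining and len(selected) < max_features:
        # Score every candidate extension of the current subset.
        scores = {f: cross_val_score(DecisionTreeClassifier(random_state=0),
                                     X[:, selected + [f]], y, cv=3).mean()
                  for f in remaining}
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best_score:         # stop when no improvement
            break
        best_score = scores[f_best]
        selected.append(f_best)
        remaining.remove(f_best)
    return selected, best_score
```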

  13. Dynamic frequency feature selection based approach for classification of motor imageries.

    Science.gov (United States)

    Luo, Jing; Feng, Zuren; Zhang, Jun; Lu, Na

    2016-08-01

    Electroencephalography (EEG) is one of the most popular techniques for recording brain activities such as motor imagery, which has a low signal-to-noise ratio and could lead to high classification error. Therefore, selection of the most discriminative features could be crucial to improving the classification performance. However, the traditional feature selection methods employed in the brain-computer interface (BCI) field (e.g. Mutual Information-based Best Individual Feature (MIBIF), Mutual Information-based Rough Set Reduction (MIRSR) and cross-validation) mainly focus on the overall performance on all the trials in the training set, and thus may have very poor performance on some specific samples, which is not acceptable. To address this problem, a novel sequential forward feature selection approach called Dynamic Frequency Feature Selection (DFFS) is proposed in this paper. The DFFS method emphasizes the importance of the samples that get misclassified, rather than only pursuing high overall classification performance. In the DFFS-based classification scheme, the EEG data is first transformed to the frequency domain using Wavelet Packet Decomposition (WPD), which is then employed as the candidate set for further discriminatory feature selection. The features are selected one by one in a boosting manner. After one feature is selected, the importance of the samples correctly classified based on that feature is decreased, which is equivalent to increasing the importance of the misclassified samples. Therefore, a feature complementary to the current features can be selected in the next run. The selected features are then fed to a classifier trained by the random forest algorithm. Finally, a time series voting-based method is utilized to improve the classification performance. Comparisons between the DFFS-based approach and state-of-the-art methods on BCI competition IV data set 2b have been conducted, which have shown the superiority of the proposed algorithm. PMID:27253616
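
    A hedged sketch of the boosting-like selection loop described above: features are added one at a time while sample weights shift toward samples that the currently selected features still misclassify. The logistic-regression classifier, the re-weighting factor and the absence of the WPD front end are simplifications, not the published DFFS procedure.

```python
# Weighted sequential selection in a boosting-like manner (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

def dffs_like_select(X, y, n_select=8):
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    weights = np.full(len(y), 1.0 / len(y))
    selected = []
    for _ in range(n_select):
        best_f, best_acc = None, -np.inf
        for f in range(X.shape[1]):
            if f in selected:
                continue
            clf = LogisticRegression(max_iter=200)
            clf.fit(X[:, selected + [f]], y, sample_weight=weights)
            acc = np.average(clf.predict(X[:, selected + [f]]) == y, weights=weights)
            if acc > best_acc:
                best_f, best_acc = f, acc
        selected.append(best_f)
        # Down-weight correctly classified samples, i.e. up-weight the misclassified.
        clf = LogisticRegression(max_iter=200).fit(X[:, selected], y, sample_weight=weights)
        correct = clf.predict(X[:, selected]) == y
        weights = np.where(correct, weights * 0.5, weights)
        weights /= weights.sum()
    return selected
```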

  14. Feature selection based on mutual information and redundancy-synergy coefficient

    Institute of Scientific and Technical Information of China (English)

    杨胜; 顾钧

    2004-01-01

    Mutual information is an important information measure for feature subsets. In this paper, a hashing mechanism is proposed to calculate the mutual information of a feature subset. The redundancy-synergy coefficient, a novel measure of the redundancy and synergy of features in expressing the class feature, is defined via mutual information. The information maximization rule was applied to derive a heuristic feature subset selection method based on mutual information and the redundancy-synergy coefficient. Our experimental results showed the good performance of the new feature selection method.
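
    A small illustration of balancing mutual-information relevance against redundancy when picking features one by one. This is an mRMR-style stand-in; the paper's hashing mechanism and redundancy-synergy coefficient are not reproduced, and the quantile binning is an assumption.

```python
# Greedy relevance-minus-redundancy selection using mutual information.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def mi_relevance_redundancy_select(X, y, n_select, n_bins=10):
    X = np.asarray(X, dtype=float)
    relevance = mutual_info_classif(X, y, random_state=0)
    # Discretize features once so feature-feature MI can be estimated.
    Xd = np.stack(
        [np.digitize(X[:, f],
                     np.quantile(X[:, f], np.linspace(0, 1, n_bins + 1))[1:-1])
         for f in range(X.shape[1])], axis=1)
    selected = [int(np.argmax(relevance))]
    while len(selected) < n_select:
        scores = []
        for f in range(X.shape[1]):
            if f in selected:
                scores.append(-np.inf)
                continue
            redundancy = np.mean([mutual_info_score(Xd[:, f], Xd[:, s])
                                  for s in selected])
            scores.append(relevance[f] - redundancy)
        selected.append(int(np.argmax(scores)))
    return selected
```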

  15. Regression-Based Feature Selection on Large Scale Human Activity Recognition

    Directory of Open Access Journals (Sweden)

    Hussein Mazaar

    2016-02-01

    Full Text Available In this paper, we present an approach for regression-based feature selection in human activity recognition. Due to the high-dimensional features in human activity recognition, the model may overfit and fail to learn parameters well. Moreover, the features are redundant or irrelevant. The goal is to select important discriminating features to recognize the human activities in videos. The R-squared regression criterion can identify the best features based on the ability of a feature to explain the variations in the target class. The features are significantly reduced, by nearly 99.33%, resulting in better classification accuracy. A Support Vector Machine with a linear kernel is used to classify the activities. The experiments are conducted on the UCF50 dataset. The results show that the proposed model significantly outperforms state-of-the-art methods.
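
    A minimal sketch of an R-squared relevance criterion: each feature is scored by how much of the (numerically encoded) target variance a univariate linear fit explains. The exact regression setup and the threshold are assumptions here, not the paper's configuration.

```python
# Per-feature R-squared scores from univariate least-squares fits.
import numpy as np

def r_squared_scores(X, y):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)          # class labels treated numerically
    y_centered = y - y.mean()               # assumes a non-constant target
    scores = []
    for f in range(X.shape[1]):
        x = X[:, f] - X[:, f].mean()
        denom = (x ** 2).sum()
        beta = (x @ y_centered) / denom if denom > 0 else 0.0
        residual = y_centered - beta * x
        scores.append(1.0 - (residual ** 2).sum() / (y_centered ** 2).sum())
    return np.array(scores)

def select_by_r2(X, y, threshold=0.01):
    # Keep only features explaining a meaningful share of the target variance.
    return np.where(r_squared_scores(X, y) >= threshold)[0]
```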

  16. Robust Feature Selection from Microarray Data Based on Cooperative Game Theory and Qualitative Mutual Information.

    Science.gov (United States)

    Mortazavi, Atiyeh; Moattar, Mohammad Hossein

    2016-01-01

    High dimensionality of microarray data sets may lead to low efficiency and overfitting. In this paper, a multiphase cooperative game theoretic feature selection approach is proposed for microarray data classification. In the first phase, due to the high dimensionality of microarray data sets, the features are reduced using one of two filter-based feature selection methods, namely, mutual information and Fisher ratio. In the second phase, the Shapley index is used to evaluate the power of each feature. The main innovation of the proposed approach is to employ Qualitative Mutual Information (QMI) for this purpose. The idea of Qualitative Mutual Information causes the selected features to have more stability, and this stability helps to deal with the problem of data imbalance and scarcity. In the third phase, a forward selection scheme is applied which uses a scoring function to weight each feature. The performance of the proposed method is compared with other popular feature selection algorithms such as Fisher ratio, minimum redundancy maximum relevance, and previous works on cooperative game based feature selection. The average classification accuracy on eleven microarray data sets shows that the proposed method improves both average accuracy and average stability compared to other approaches.

  17. Different Cortical Mechanisms for Spatial vs. Feature-Based Attentional Selection in Visual Working Memory

    Science.gov (United States)

    Heuer, Anna; Schubö, Anna; Crawford, J. D.

    2016-01-01

    The limited capacity of visual working memory (VWM) necessitates attentional mechanisms that selectively update and maintain only the most task-relevant content. Psychophysical experiments have shown that the retroactive selection of memory content can be based on visual properties such as location or shape, but the neural basis for such differential selection is unknown. For example, it is not known if there are different cortical modules specialized for spatial vs. feature-based mnemonic attention, in the same way that has been demonstrated for attention to perceptual input. Here, we used transcranial magnetic stimulation (TMS) to identify areas in human parietal and occipital cortex involved in the selection of objects from memory based on cues to their location (spatial information) or their shape (featural information). We found that TMS over the supramarginal gyrus (SMG) selectively facilitated spatial selection, whereas TMS over the lateral occipital cortex (LO) selectively enhanced feature-based selection for remembered objects in the contralateral visual field. Thus, different cortical regions are responsible for spatial vs. feature-based selection of working memory representations. Since the same regions are involved in terms of attention to external events, these new findings indicate overlapping mechanisms for attentional control over perceptual input and mnemonic representations. PMID:27582701

  18. Different cortical mechanisms for spatial vs. feature-based attentional selection in visual working memory

    Directory of Open Access Journals (Sweden)

    Anna Heuer

    2016-08-01

    Full Text Available The limited capacity of visual working memory necessitates attentional mechanisms that selectively update and maintain only the most task-relevant content. Psychophysical experiments have shown that the retroactive selection of memory content can be based on visual properties such as location or shape, but the neural basis for such differential selection is unknown. For example, it is not known if there are different cortical modules specialized for spatial versus feature-based mnemonic attention, in the same way that has been demonstrated for attention to perceptual input. Here, we used transcranial magnetic stimulation (TMS) to identify areas in human parietal and occipital cortex involved in the selection of objects from memory based on cues to their location (spatial information) or their shape (featural information). We found that TMS over the supramarginal gyrus (SMG) selectively facilitated spatial selection, whereas TMS over the lateral occipital cortex selectively enhanced feature-based selection for remembered objects in the contralateral visual field. Thus, different cortical regions are responsible for spatial vs. feature-based selection of working memory representations. Since the same regions are involved in attention to external events, these new findings indicate overlapping mechanisms for attentional control over perceptual input and mnemonic representations.

  19. Analysis of Different Feature Selection Criteria Based on a Covariance Convergence Perspective for a SLAM Algorithm

    Directory of Open Access Journals (Sweden)

    Fernando A. Auat Cheein

    2010-12-01

    Full Text Available This paper introduces several non-arbitrary feature selection techniques for a Simultaneous Localization and Mapping (SLAM) algorithm. The feature selection criteria are based on the determination of the most significant features from a SLAM convergence perspective. The SLAM algorithm implemented in this work is a sequential EKF (Extended Kalman Filter) SLAM. The feature selection criteria are applied in the correction stage of the SLAM algorithm, restricting the correction to the most significant features. This restriction also reduces the processing time of the SLAM. Several experiments with a mobile robot are shown in this work. The experiments concern map reconstruction and a comparison of the performance of the different proposed techniques. The experiments were carried out in an outdoor environment composed of trees, although the results shown herein are not restricted to a special type of features.

  20. A SEMI-OPEN-LOOP CODING MODE SELECTION ALGORITHM BASED ON EFM AND SELECTED AMR-WB+ FEATURES

    Institute of Scientific and Technical Information of China (English)

    Hong Ying; Zhao Shenghui; Kuang Jingming

    2009-01-01

    To solve the problems of the AMR-WB+ (Extended Adaptive Multi-Rate-WideBand) semi-open-loop coding mode selection algorithm, features for ACELP (Algebraic Code Excited Linear Prediction) and TCX (Transform Coded eXcitation) classification are investigated. Eleven classifying features in the AMR-WB+ codec are selected, and two novel classifying features, i.e., EFM (Energy Flatness Measurement) and stdEFM (standard deviation of EFM), are proposed. Consequently, a novel semi-open-loop mode selection algorithm based on EFM and selected AMR-WB+ features is proposed. The results of the classification test and listening test show that the performance of the novel algorithm is much better than that of the AMR-WB+ semi-open-loop coding mode selection algorithm.

  1. Feature subset selection based on mahalanobis distance: a statistical rough set method

    Institute of Scientific and Technical Information of China (English)

    Sun Liang; Han Chongzhao

    2008-01-01

    In order to select effective feature subsets for pattern classification, a novel statistical rough set method is presented based on generalized attribute reduction. Unlike classical reduction approaches, the objects in the universe of discourse are signs of the training sample sets, and the values of attributes are taken as statistical parameters. The binary relation and discernibility matrix for the reduction are induced by a distance function. Furthermore, based on the monotonicity of the distance function defined by the Mahalanobis distance, the effective feature subsets are obtained as generalized attribute reducts. Experimental results show that the classification performance can be improved by using the selected feature subsets.

  2. Biometric hashing for handwriting: entropy-based feature selection and semantic fusion

    Science.gov (United States)

    Scheidat, Tobias; Vielhauer, Claus

    2008-02-01

    Some biometric algorithms suffer from the problem of using a great number of features extracted from the raw data. This often results in feature vectors of high dimensionality and thus high computational complexity. However, in many cases subsets of features contribute little or nothing to the correct classification of biometric algorithms. The process of choosing more discriminative features from a given set is commonly referred to as feature selection. In this paper we present a study on feature selection for an existing biometric hash generation algorithm for the handwriting modality, based on the strategy of entropy analysis of single components of biometric hash vectors, in order to identify and suppress elements carrying little information. To evaluate the impact of our feature selection scheme on the authentication performance of our biometric algorithm, we present an experimental study based on data from 86 users. Besides discussing common biometric error rates such as Equal Error Rates, we suggest a novel measurement to determine the reproduction rate probability for biometric hashes. Our experiments show that, while the feature set size may be significantly reduced by 45% using our scheme, there are only marginal changes both in the results of a verification process and in the reproducibility of biometric hashes. Since multi-biometrics is a recent topic, we additionally carry out a first study on a pairwise multi-semantic fusion based on reduced hashes and analyze it using the introduced reproducibility measure.
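
    A short sketch of the entropy-analysis strategy described above: estimate the entropy of each hash-vector component across users and suppress low-entropy components. The binning and the retention threshold are illustrative assumptions, not values from the paper.

```python
# Per-component entropy of biometric hash vectors, used as a selection filter.
import numpy as np

def component_entropies(hashes, n_bins=16):
    """hashes: array (n_samples, n_components) of biometric hash vectors."""
    H = np.asarray(hashes, dtype=float)
    entropies = []
    for j in range(H.shape[1]):
        counts = np.histogram(H[:, j], bins=n_bins)[0].astype(float)
        p = counts / counts.sum()
        p = p[p > 0]
        entropies.append(-(p * np.log2(p)).sum())
    return np.array(entropies)

def keep_informative_components(hashes, min_entropy=1.0):
    # Indices of components carrying enough information to retain.
    return np.where(component_entropies(hashes) >= min_entropy)[0]
```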

  3. Improving Image steganalysis performance using a graph-based feature selection method

    Directory of Open Access Journals (Sweden)

    Amir Nouri

    2016-05-01

    Full Text Available Steganalysis is the skill of discovering the use of steganography algorithms within an image with little or no information regarding the steganography algorithm and/or its parameters. The high dimensionality of image data with a small number of samples has presented a difficult challenge for the steganalysis task. Several methods have been presented to improve steganalysis performance by feature selection. Feature selection, also known as variable selection, is one of the fundamental problems in the fields of machine learning, pattern recognition and statistics. The aim of feature selection is to reduce the dimensionality of image data in order to enhance the accuracy of the steganalysis task. In this paper, we propose a new graph-based blind steganalysis method for detecting stego images among cover images in JPEG images, using a feature selection technique based on community detection. The experimental results show that the proposed approach is easy to employ for steganalysis purposes. Moreover, the performance of the proposed method is better than that of several recent and well-known feature selection-based image steganalysis methods.

  4. Artificial immune system based on adaptive clonal selection for feature selection and parameters optimisation of support vector machines

    Science.gov (United States)

    Sadat Hashemipour, Maryam; Soleimani, Seyed Ali

    2016-01-01

    The artificial immune system (AIS) algorithm based on the clonal selection method can be defined as a soft computing method inspired by the theoretical immune system in order to solve science and engineering problems. The support vector machine (SVM) is a popular pattern classification method with many diverse applications. Kernel parameter setting in the SVM training procedure, along with feature selection, significantly impacts the classification accuracy rate. In this study, an AIS based on Adaptive Clonal Selection (AISACS) algorithm has been used to optimise the SVM parameters and feature subset selection without degrading the SVM classification accuracy. Several public datasets from the University of California Irvine machine learning (UCI) repository are employed to calculate the classification accuracy rate in order to evaluate the AISACS approach, which is then compared with a grid search algorithm and a Genetic Algorithm (GA) approach. The experimental results show that the feature reduction rate and running time of the AISACS approach are better than those of the GA approach.

  5. Feature Selection based on Machine Learning in MRIs for Hippocampal Segmentation

    CERN Document Server

    Tangaro, Sabina; Brescia, Massimo; Cavuoti, Stefano; Chincarini, Andrea; Errico, Rosangela; Inglese, Paolo; Longo, Giuseppe; Maglietta, Rosalia; Tateo, Andrea; Riccio, Giuseppe; Bellotti, Roberto

    2015-01-01

    Neurodegenerative diseases are frequently associated with structural changes in the brain. Magnetic Resonance Imaging (MRI) scans can show these variations and therefore be used as a supportive feature for a number of neurodegenerative diseases. The hippocampus has been known to be a biomarker for Alzheimer's disease and other neurological and psychiatric diseases. However, this requires accurate, robust and reproducible delineation of hippocampal structures. Fully automatic methods usually take a voxel-based approach, in which a number of local features are calculated for each voxel. In this paper we compared four different techniques for feature selection from a set of 315 features extracted for each voxel: (i) a filter method based on the Kolmogorov-Smirnov test; two wrapper methods, namely (ii) Sequential Forward Selection and (iii) Sequential Backward Elimination; and (iv) an embedded method based on the Random Forest classifier, on a set of 10 T1-weighted brain MRIs and tested on an independent set of 25 subjects...
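
    The filter method (i) can be sketched as follows, assuming binary voxel labels: features whose class-conditional distributions differ significantly under a two-sample Kolmogorov-Smirnov test are retained. The significance level is an illustrative choice, not the paper's setting.

```python
# Kolmogorov-Smirnov filter over per-voxel features (binary labels assumed).
import numpy as np
from scipy.stats import ks_2samp

def ks_filter(X, y, alpha=0.01):
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    keep = []
    for f in range(X.shape[1]):
        stat, p_value = ks_2samp(X[y == 0, f], X[y == 1, f])
        if p_value < alpha:              # class distributions differ -> keep feature
            keep.append(f)
    return keep
```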

  6. Genetic Algorithm (GA) in Feature Selection for CRF Based Manipuri Multiword Expression (MWE) Identification

    CERN Document Server

    Nongmeikapam, Kishorjit; 10.5121/ijcsit.2011.350

    2011-01-01

    This paper deals with the identification of Multiword Expressions (MWEs) in Manipuri, a highly agglutinative Indian language. Manipuri is listed in the Eighth Schedule of the Indian Constitution. MWEs play an important role in Natural Language Processing (NLP) applications like Machine Translation, Part of Speech tagging, Information Retrieval, Question Answering, etc. Feature selection is an important factor in the recognition of Manipuri MWEs using Conditional Random Fields (CRF). The disadvantage of manually selecting and choosing appropriate features for running the CRF motivates us to consider a Genetic Algorithm (GA). Using the GA we are able to find the optimal features to run the CRF. We have tried fifty generations in feature selection, along with three-fold cross-validation as the fitness function. This model demonstrated a Recall (R) of 64.08%, Precision (P) of 86.84% and F-measure (F) of 73.74%, showing an improvement over CRF-based Manipuri MWE identification without the GA application.

  7. Feature selection by separability assessment of input spaces for transient stability classification based on neural networks

    Energy Technology Data Exchange (ETDEWEB)

    Tso, S.K. [City University of Hong Kong (China). Dept. of Manufacturing Engineering; Gu, X.P. [North China Electric Power University, Baoding (China). Dept. of Electrical Engineering

    2004-03-01

    Power system transient-stability assessment based on neural networks can usually be treated as a two-pattern classification problem separating the stable class from the unstable class. In such a classification problem, feature extraction and selection is the first important task to be carried out. A new approach to feature selection is presented in this paper, using a new separability measure. By finding the 'inconsistent cases' in a sample set, a separability index of input spaces is defined. Using the defined separability index as the criterion, the breadth-first searching technique is employed to find the minimal or optimal subsets of the initial feature set. The numerical results based on extensive data obtained for the 10-unit 39-bus New England power system demonstrate the effectiveness of the proposed approach in extracting the 'best combination' of features for improving the quality of transient-stability classification. (author)
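
    A rough sketch of the separability idea, under the assumption of quantile-discretized features and integer class labels: 'inconsistent cases' are samples that share the same discretized values on a candidate feature subset but belong to different classes, and fewer inconsistencies indicate a more separable input space.

```python
# Inconsistency count of a candidate feature subset (lower = more separable).
import numpy as np
from collections import defaultdict

def inconsistency_count(X, y, subset, n_bins=10):
    X, y = np.asarray(X, dtype=float), np.asarray(y)   # integer class labels assumed
    # Discretize the selected features so identical patterns can be grouped.
    Xd = np.stack(
        [np.digitize(X[:, f],
                     np.quantile(X[:, f], np.linspace(0, 1, n_bins + 1))[1:-1])
         for f in subset], axis=1)
    groups = defaultdict(list)
    for pattern, label in zip(map(tuple, Xd), y):
        groups[pattern].append(label)
    # Inconsistent samples: those outside the majority class of their pattern group.
    return sum(len(labels) - np.bincount(np.asarray(labels)).max()
               for labels in groups.values())
```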

  8. DWFS: A Wrapper Feature Selection Tool Based on a Parallel Genetic Algorithm

    KAUST Repository

    Soufan, Othman

    2015-02-26

    Many scientific problems can be formulated as classification tasks. Data that harbor relevant information are usually described by a large number of features. Frequently, many of these features are irrelevant for the class prediction. The efficient implementation of classification models requires identification of suitable combinations of features. The smaller number of features reduces the problem's dimensionality and may result in higher classification performance. We developed DWFS, a web-based tool that allows for efficient selection of features for a variety of problems. DWFS follows the wrapper paradigm and applies a search strategy based on Genetic Algorithms (GAs). A parallel GA implementation simultaneously examines and evaluates a large number of candidate feature collections. DWFS also integrates various filtering methods that may be applied as a pre-processing step in the feature selection process. Furthermore, weights and parameters in the fitness function of the GA can be adjusted according to the application requirements. Experiments using heterogeneous datasets from different biomedical applications demonstrate that DWFS is fast and leads to a significant reduction of the number of features without sacrificing performance as compared to several widely used existing methods. DWFS can be accessed online at www.cbrc.kaust.edu.sa/dwfs.

  9. Attentional spreading to task-irrelevant object features: Experimental support and a 3-step model of attention for object-based selection and feature-based processing modulation

    Directory of Open Access Journals (Sweden)

    Detlef eWegener

    2014-06-01

    Directing attention to a specific feature of an object has been linked to different forms of attentional modulation. Object-based attention theory is founded on the finding that even task-irrelevant features of the selected object are subject to attentional modulation, while feature-based attention theory proposes a global processing benefit for the selected feature even at other objects. Most studies investigated either the one or the other form of attention, leaving open the possibility that both object- and feature-specific attentional effects occur at the same time and may just represent two sides of a single attention system. We investigate this issue by testing attentional spreading within and across objects, using reaction time measurements to changes of attended and unattended features on both attended and unattended objects. We asked subjects to report color and speed changes occurring on one of two overlapping random dot patterns, presented at the center of gaze. The key property of the stimulation was that only one of the features (e.g., motion direction) was unique for each object, whereas the other feature (e.g., color) was shared by both. The results of two experiments show that co-selection of unattended features occurs even when those features have no means for selecting the object. At the same time, they demonstrate that this processing benefit is not restricted to the selected object but spreads to the task-irrelevant one. We conceptualize these findings by a 3-step model of attention that assumes a task-dependent top-down gain, object-specific feature selection based on task and binding characteristics, and a global feature-specific processing enhancement. The model unifies a large body of experimental results into a single framework and makes various experimentally testable predictions for the interaction of object- and feature-specific processes.

  10. An ant colony optimization based feature selection for web page classification.

    Science.gov (United States)

    Saraç, Esra; Özel, Selma Ayşe

    2014-01-01

    The increased popularity of the web has caused a huge amount of information to be added to it, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features, such as HTML/XML tags, URLs, hyperlinks, and text contents, that should be considered during an automated classification process. The aim of this study is to reduce the number of features used in order to improve the runtime and accuracy of web page classification. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using ACO for feature selection improves both the accuracy and runtime performance of classification. We also showed that the proposed ACO-based algorithm can select better features than the well-known information gain and chi square feature selection methods. PMID:25136678
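
    As a hedged illustration of the general ACO-for-feature-selection scheme (not the authors' exact formulation or their WebKB/Conference setup), the sketch below lets each ant sample a subset with probability proportional to pheromone times a mutual-information heuristic, scores the subset by cross-validation, and reinforces the pheromone of the best subset. Colony size, subset size and the stand-in classifier are all illustrative choices.

      # Simplified ACO feature selection: ants sample subsets with probability
      # proportional to pheromone times a mutual-information heuristic, subsets
      # are scored by cross-validation, and the best subset reinforces pheromone.
      import numpy as np
      from sklearn.feature_selection import mutual_info_classif
      from sklearn.model_selection import cross_val_score
      from sklearn.naive_bayes import GaussianNB

      def aco_select(X, y, n_ants=10, n_iter=20, subset_size=15, rho=0.1, seed=0):
          rng = np.random.default_rng(seed)
          n = X.shape[1]
          heuristic = mutual_info_classif(X, y) + 1e-9    # desirability of each feature
          pheromone = np.ones(n)
          best_subset, best_score = None, -np.inf
          for _ in range(n_iter):
              for _ in range(n_ants):
                  prob = pheromone * heuristic
                  prob /= prob.sum()
                  subset = rng.choice(n, size=subset_size, replace=False, p=prob)
                  score = cross_val_score(GaussianNB(), X[:, subset], y, cv=3).mean()
                  if score > best_score:
                      best_subset, best_score = subset, score
              pheromone *= (1 - rho)                      # evaporation
              pheromone[best_subset] += rho * best_score  # reinforce the best subset
          return best_subset, best_score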

  11. A Feature Selection Approach Based on Interclass and Intraclass Relative Contributions of Terms.

    Science.gov (United States)

    Zhou, Hongfang; Guo, Jie; Wang, Yinghui; Zhao, Minghua

    2016-01-01

    Feature selection plays a critical role in text categorization. During feature selection, high-frequency terms and the interclass and intraclass relative contributions of terms all have significant effects on classification results. We therefore put forward a feature selection approach, IIRCT, based on the interclass and intraclass relative contributions of terms. In our proposed algorithm, three critical factors, namely term frequency and the interclass and intraclass relative contributions of terms, are all considered synthetically. Finally, experiments are conducted with the kNN classifier. The corresponding results on the 20 NewsGroup and SougouCS corpora show that the IIRCT algorithm achieves better performance than the DF, t-Test, and CMFS algorithms.

  12. A Feature Selection Approach Based on Interclass and Intraclass Relative Contributions of Terms

    Science.gov (United States)

    Wang, Yinghui; Zhao, Minghua

    2016-01-01

    Feature selection plays a critical role in text categorization. During feature selection, high-frequency terms and the interclass and intraclass relative contributions of terms all have significant effects on classification results. We therefore put forward a feature selection approach, IIRCT, based on the interclass and intraclass relative contributions of terms. In our proposed algorithm, three critical factors, namely term frequency and the interclass and intraclass relative contributions of terms, are all considered synthetically. Finally, experiments are conducted with the kNN classifier. The corresponding results on the 20 NewsGroup and SougouCS corpora show that the IIRCT algorithm achieves better performance than the DF, t-Test, and CMFS algorithms.

  13. MINT: Mutual Information Based Transductive Feature Selection for Genetic Trait Prediction.

    Science.gov (United States)

    He, Dan; Rish, Irina; Haws, David; Parida, Laxmi

    2016-01-01

    Whole genome prediction of complex phenotypic traits using high-density genotyping arrays has attracted a lot of attention, as it is relevant to the fields of plant and animal breeding and genetic epidemiology. Since the number of genotypes is generally much bigger than the number of samples, predictive models suffer from the curse of dimensionality. The curse of dimensionality problem not only affects the computational efficiency of a particular genomic selection method, but can also lead to a poor performance, mainly due to possible overfitting, or un-informative features. In this work, we propose a novel transductive feature selection method, called MINT, which is based on the MRMR (Max-Relevance and Min-Redundancy) criterion. We apply MINT on genetic trait prediction problems and show that, in general, MINT is a better feature selection method than the state-of-the-art inductive method MRMR. PMID:27295642
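
    MINT builds on the mRMR criterion. For reference, a minimal sketch of standard greedy mRMR selection is given below, using mutual information with the class for relevance and average pairwise mutual information with already-selected features for redundancy; it is the inductive baseline, not the transductive MINT method itself, and the discretisation step is an illustrative assumption.

      # Greedy mRMR sketch: relevance is mutual information with the class,
      # redundancy is the average mutual information with already-chosen features.
      import numpy as np
      from sklearn.feature_selection import mutual_info_classif
      from sklearn.metrics import mutual_info_score

      def mrmr(X, y, k, n_bins=10):
          n = X.shape[1]
          # Discretise features so pairwise MI between features is well defined.
          Xd = np.stack([np.digitize(X[:, j], np.histogram_bin_edges(X[:, j], n_bins)[1:-1])
                         for j in range(n)], axis=1)
          relevance = mutual_info_classif(X, y)
          selected, remaining = [], list(range(n))
          while len(selected) < k and remaining:
              def score(j):
                  if not selected:
                      return relevance[j]
                  redundancy = np.mean([mutual_info_score(Xd[:, j], Xd[:, s]) for s in selected])
                  return relevance[j] - redundancy
              best = max(remaining, key=score)
              selected.append(best)
              remaining.remove(best)
          return selected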

  14. An Empirical Study of Wrappers for Feature Subset Selection based on a Parallel Genetic Algorithm: The Multi-Wrapper Model

    KAUST Repository

    Soufan, Othman

    2012-09-01

    Feature selection is the first task of any learning approach applied in major fields such as biomedicine, bioinformatics, robotics, natural language processing and social networking. In the feature subset selection problem, a search methodology with a proper criterion seeks to find the best subset of features describing the data (relevance) and achieving better performance (optimality). Wrapper approaches are feature selection methods which are wrapped around a classification algorithm and use a performance measure to select the best subset of features. We analyze the proper design of the objective function for the wrapper approach and highlight an objective based on several classification algorithms. We compare the wrapper approaches to different feature selection methods based on distance and information based criteria. Significant improvement in performance, computational time, and selection of minimally sized feature subsets is achieved by combining different objectives for the wrapper model. In addition, considering various classification methods in the feature selection process could lead to a global solution of desirable characteristics.

  15. Feature subset selection based on mahalanobis distance: a statistical rough set method

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    In order to select effective feature subsets for pattern classification, a novel statistical rough set method is presented based on generalized attribute reduction. Unlike classical reduction approaches, the objects in the universe of discourse are signs of training sample sets and the values of attributes are taken as statistical parameters. The binary relation and discernibility matrix for the reduction are induced by a distance function. Furthermore, based on the monotonicity of the distance function defined by Mahalan...
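
    As a small illustration of scoring a feature subset with the Mahalanobis distance (following the general idea above rather than the paper's reduct construction), the sketch below computes the distance between two class means under a pooled covariance estimate; larger values indicate better separation on that subset.

      # Mahalanobis separation of a two-class problem on a candidate subset,
      # using a pooled covariance estimate; larger values mean better separation.
      import numpy as np

      def mahalanobis_separation(X, y, subset):
          Xs = X[:, subset]
          c0, c1 = np.unique(y)[:2]                      # two-class sketch
          a, b = Xs[y == c0], Xs[y == c1]
          pooled = ((len(a) - 1) * np.cov(a, rowvar=False) +
                    (len(b) - 1) * np.cov(b, rowvar=False)) / (len(a) + len(b) - 2)
          pooled = np.atleast_2d(pooled)
          diff = a.mean(axis=0) - b.mean(axis=0)
          return float(np.sqrt(diff @ np.linalg.solve(pooled, diff)))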

  16. Feature Selection Method Based on Neighborhood Relationships: Applications in EEG Signal Identification and Chinese Character Recognition

    Science.gov (United States)

    Zhao, Yu-Xiang; Chou, Chien-Hsing

    2016-01-01

    In this study, a new feature selection algorithm, the neighborhood-relationship feature selection (NRFS) algorithm, is proposed for identifying rat electroencephalogram signals and recognizing Chinese characters. In these two applications, dependent relationships exist among the feature vectors and their neighboring feature vectors. The proposed NRFS algorithm was designed to exploit this property. Under the NRFS algorithm, unselected feature vectors have a high priority of being added to the feature subset if their neighboring feature vectors have been selected, and selected feature vectors have a high priority of being eliminated if their neighboring feature vectors are not selected. In the experiments conducted in this study, the NRFS algorithm was compared with two other feature selection algorithms. The experimental results indicate that the NRFS algorithm can extract the crucial frequency bands for identifying rat vigilance states and the crucial character regions for recognizing Chinese characters. PMID:27314346

  17. Feature selection in bioinformatics

    Science.gov (United States)

    Wang, Lipo

    2012-06-01

    In bioinformatics, there are often a large number of input features. For example, there are millions of single nucleotide polymorphisms (SNPs) that are genetic variations which determine the difference between any two unrelated individuals. In microarrays, thousands of genes can be profiled in each test. It is important to find out which input features (e.g., SNPs or genes) are useful in classification of a certain group of people or diagnosis of a given disease. In this paper, we investigate some powerful feature selection techniques and apply them to problems in bioinformatics. We are able to identify a very small number of input features sufficient for the tasks at hand and we demonstrate this with some real-world data.

  18. Video Stabilization Based on Feature Trajectory Augmentation and Selection and Robust Mesh Grid Warping.

    Science.gov (United States)

    Koh, Yeong Jun; Lee, Chulwoo; Kim, Chang-Su

    2015-12-01

    We propose a video stabilization algorithm, which extracts a guaranteed number of reliable feature trajectories for robust mesh grid warping. We first estimate feature trajectories through a video sequence and transform the feature positions into rolling-free smoothed positions. When the number of the estimated trajectories is insufficient, we generate virtual trajectories by augmenting incomplete trajectories using a low-rank matrix completion scheme. Next, we detect feature points on a large moving object and exclude them so as to stabilize camera movements, rather than object movements. With the selected feature points, we set a mesh grid on each frame and warp each grid cell by moving the original feature positions to the smoothed ones. For robust warping, we formulate a cost function based on the reliability weights of each feature point and each grid cell. The cost function consists of a data term, a structure-preserving term, and a regularization term. By minimizing the cost function, we determine the robust mesh grid warping and achieve the stabilization. Experimental results demonstrate that the proposed algorithm reconstructs videos more stably than the conventional algorithms. PMID:26394425

  19. Feature Selection and Blind Source Separation in an EEG-Based Brain-Computer Interface

    Directory of Open Access Journals (Sweden)

    Michael H. Thaut

    2005-11-01

    Most EEG-based BCI systems make use of well-studied patterns of brain activity. However, those systems involve tasks that indirectly map to simple binary commands such as “yes” or “no” or require many weeks of biofeedback training. We hypothesized that signal processing and machine learning methods can be used to discriminate EEG in a direct “yes”/“no” BCI from a single session. Blind source separation (BSS) and spectral transformations of the EEG produced a 180-dimensional feature space. We used a modified genetic algorithm (GA) wrapped around a support vector machine (SVM) classifier to search the space of feature subsets. The GA-based search found feature subsets that outperform full feature sets and random feature subsets. Also, BSS transformations of the EEG outperformed the original time series, particularly in conjunction with a subset search of both spaces. The results suggest that BSS and feature selection can be used to improve the performance of even a “direct,” single-session BCI.

  20. A kernel-based multivariate feature selection method for microarray data classification.

    Directory of Open Access Journals (Sweden)

    Shiquan Sun

    High dimensionality and small sample sizes, and their inherent risk of overfitting, pose great challenges for constructing efficient classifiers in microarray data classification. Therefore a feature selection technique should be conducted prior to data classification to enhance prediction performance. In general, filter methods can be considered as a principal or auxiliary selection mechanism because of their simplicity, scalability, and low computational complexity. However, a series of trivial examples show that filter methods result in less accurate performance because they ignore the dependencies among features. Although a few publications have devoted attention to revealing the relationships among features by multivariate-based methods, these methods describe the relationships only linearly, and simple linear combinations restrict the improvement in performance. In this paper, we used a kernel method to discover inherent nonlinear correlations among features as well as between features and the target. Moreover, the number of orthogonal components was determined by kernel Fisher's linear discriminant analysis (FLDA) in a self-adaptive manner rather than by manual parameter settings. In order to show the effectiveness of our method we performed several experiments and compared the results between our method and other competitive multivariate-based feature selectors. In our comparison, we used two classifiers (support vector machine and k-nearest neighbor) on two groups of datasets, namely two-class and multi-class datasets. Experimental results demonstrate that the performance of our method is better than the others, especially on three hard-to-classify datasets, namely Wang's Breast Cancer, Gordon's Lung Adenocarcinoma and Pomeroy's Medulloblastoma.

  1. Selection of Entropy Based Features for Automatic Analysis of Essential Tremor

    Directory of Open Access Journals (Sweden)

    Karmele López-de-Ipiña

    2016-05-01

    Biomedical systems produce biosignals that arise from interaction mechanisms. In general, those mechanisms occur across multiple scales, both spatial and temporal, and contain linear and non-linear information. In this framework, entropy measures are good candidates for providing useful evidence about disorder in the system, lack of information in time series and/or irregularity of the signals. The most common movement disorder is essential tremor (ET), which occurs 20 times more often than Parkinson's disease. Interestingly, about 50%–70% of the cases of ET have a genetic origin. One of the most widely used standard tests for clinical diagnosis of ET is Archimedes' spiral drawing. This work focuses on the selection of non-linear biomarkers from such drawings and handwriting, and it is part of a wider cross-study on the diagnosis of essential tremor, where our piece of research presents the selection of entropy features for early ET diagnosis. Classic entropy features are compared with features based on permutation entropy. An automatic analysis system built on several machine learning paradigms is employed, while automatic feature selection is implemented by means of the ANOVA (analysis of variance) test. The obtained results for early detection are promising and appear applicable to real environments.
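
    Permutation entropy, one of the feature families compared above, can be computed directly from ordinal patterns of a 1-D signal (for example a spiral-drawing coordinate or pressure trace). The sketch below is a generic implementation; the embedding order and delay are illustrative choices, not the values used in the study.

      # Permutation entropy of a 1-D signal from ordinal patterns of short windows.
      import numpy as np
      from itertools import permutations

      def permutation_entropy(x, order=3, delay=1, normalize=True):
          x = np.asarray(x)
          patterns = list(permutations(range(order)))
          counts = np.zeros(len(patterns))
          for i in range(len(x) - delay * (order - 1)):
              window = x[i:i + delay * order:delay]
              pattern = tuple(int(r) for r in np.argsort(window))   # ordinal pattern
              counts[patterns.index(pattern)] += 1
          p = counts[counts > 0] / counts.sum()
          h = -np.sum(p * np.log2(p))
          return h / np.log2(len(patterns)) if normalize else h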

  2. A Semidefinite Programming Based Search Strategy for Feature Selection with Mutual Information Measure.

    Science.gov (United States)

    Naghibi, Tofigh; Hoffmann, Sarah; Pfister, Beat

    2015-08-01

    Feature subset selection, as a special case of the general subset selection problem, has been the topic of a considerable number of studies due to the growing importance of data-mining applications. In the feature subset selection problem there are two main issues that need to be addressed: (i) finding an appropriate measure function that can be fairly fast and robustly computed for high-dimensional data; (ii) a search strategy to optimize the measure over the subset space in a reasonable amount of time. In this article mutual information between features and class labels is considered to be the measure function. Two series expansions for mutual information are proposed, and it is shown that most heuristic criteria suggested in the literature are truncated approximations of these expansions. It is well known that searching the whole subset space is an NP-hard problem. Here, instead of the conventional sequential search algorithms, we suggest a parallel search strategy based on semidefinite programming (SDP) that can search through the subset space in polynomial time. By exploiting the similarities between the proposed algorithm and an instance of the maximum-cut problem in graph theory, the approximation ratio of this algorithm is derived and is compared with the approximation ratio of the backward elimination method. The experiments show that it can be misleading to judge the quality of a measure solely based on the classification accuracy, without taking the effect of the non-optimum search strategy into account. PMID:26352993

  3. Robust Ground Target Detection by SAR and IR Sensor Fusion Using Adaboost-Based Feature Selection.

    Science.gov (United States)

    Kim, Sungho; Song, Woo-Jin; Kim, So-Hyun

    2016-01-01

    Long-range ground targets are difficult to detect in a noisy cluttered environment using either synthetic aperture radar (SAR) images or infrared (IR) images. SAR-based detectors can provide a high detection rate with a high false alarm rate to background scatter noise. IR-based approaches can detect hot targets but are affected strongly by the weather conditions. This paper proposes a novel target detection method by decision-level SAR and IR fusion using an Adaboost-based machine learning scheme to achieve a high detection rate and low false alarm rate. The proposed method consists of individual detection, registration, and fusion architecture. This paper presents a single framework of a SAR and IR target detection method using modified Boolean map visual theory (modBMVT) and feature-selection based fusion. Previous methods applied different algorithms to detect SAR and IR targets because of the different physical image characteristics. One method that is optimized for IR target detection produces unsuccessful results in SAR target detection. This study examined the image characteristics and proposed a unified SAR and IR target detection method by inserting a median local average filter (MLAF, pre-filter) and an asymmetric morphological closing filter (AMCF, post-filter) into the BMVT. The original BMVT was optimized to detect small infrared targets. The proposed modBMVT can remove the thermal and scatter noise by the MLAF and detect extended targets by attaching the AMCF after the BMVT. Heterogeneous SAR and IR images were registered automatically using the proposed RANdom SAmple Region Consensus (RANSARC)-based homography optimization after a brute-force correspondence search using the detected target centers and regions. The final targets were detected by feature-selection based sensor fusion using Adaboost. The proposed method showed good SAR and IR target detection performance through feature selection-based decision fusion on a synthetic database generated
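
    AdaBoost with its default depth-1 weak learners performs implicit feature selection, since every boosting round commits to a single feature. The sketch below shows the generic idea of ranking features by the boosted importances; the SAR/IR detection features and the decision-level fusion architecture of the paper are not reproduced, and the parameter values are illustrative.

      # Ranking features by AdaBoost importances; the default weak learner in
      # scikit-learn's AdaBoostClassifier is a depth-1 decision tree (a stump),
      # so each boosting round commits to a single feature.
      import numpy as np
      from sklearn.ensemble import AdaBoostClassifier

      def adaboost_feature_ranking(X, y, n_estimators=50, top_k=10):
          model = AdaBoostClassifier(n_estimators=n_estimators)
          model.fit(X, y)
          importance = model.feature_importances_
          return np.argsort(importance)[::-1][:top_k], importance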

  4. Robust Ground Target Detection by SAR and IR Sensor Fusion Using Adaboost-Based Feature Selection

    Directory of Open Access Journals (Sweden)

    Sungho Kim

    2016-07-01

    Long-range ground targets are difficult to detect in a noisy cluttered environment using either synthetic aperture radar (SAR) images or infrared (IR) images. SAR-based detectors can provide a high detection rate with a high false alarm rate to background scatter noise. IR-based approaches can detect hot targets but are affected strongly by the weather conditions. This paper proposes a novel target detection method by decision-level SAR and IR fusion using an Adaboost-based machine learning scheme to achieve a high detection rate and low false alarm rate. The proposed method consists of individual detection, registration, and fusion architecture. This paper presents a single framework of a SAR and IR target detection method using modified Boolean map visual theory (modBMVT) and feature-selection based fusion. Previous methods applied different algorithms to detect SAR and IR targets because of the different physical image characteristics. One method that is optimized for IR target detection produces unsuccessful results in SAR target detection. This study examined the image characteristics and proposed a unified SAR and IR target detection method by inserting a median local average filter (MLAF, pre-filter) and an asymmetric morphological closing filter (AMCF, post-filter) into the BMVT. The original BMVT was optimized to detect small infrared targets. The proposed modBMVT can remove the thermal and scatter noise by the MLAF and detect extended targets by attaching the AMCF after the BMVT. Heterogeneous SAR and IR images were registered automatically using the proposed RANdom SAmple Region Consensus (RANSARC)-based homography optimization after a brute-force correspondence search using the detected target centers and regions. The final targets were detected by feature-selection based sensor fusion using Adaboost. The proposed method showed good SAR and IR target detection performance through feature selection-based decision fusion on a synthetic

  5. Feature selection of fMRI data based on normalized mutual information and fisher discriminant ratio.

    Science.gov (United States)

    Wang, Yanbin; Ji, Junzhong; Liang, Peipeng

    2016-03-17

    Pattern classification has been increasingly used in functional magnetic resonance imaging (fMRI) data analysis. However, classification performance is restricted by the high dimensionality and noise of fMRI data. In this paper, a new feature selection method (named "NMI-F") is proposed by sequentially combining normalized mutual information (NMI) and the Fisher discriminant ratio. In NMI-F, normalized mutual information is first used to evaluate the relationships between features, and the Fisher discriminant ratio is then applied to calculate the importance of each feature involved. Two fMRI datasets (task-related and resting state) were used to test the proposed method. It was found that classification based on the NMI-F method could differentiate brain cognitive and disease states effectively, and that the proposed NMI-F method outperformed the other related methods. The current results also have implications for future studies. PMID:27257882
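
    The two ingredients of NMI-F, normalized mutual information between features and the Fisher discriminant ratio of each feature, can be sketched as below. How the two scores are combined sequentially follows only the general description in the abstract and may differ in detail from the original method; the discretisation and the two-class Fisher ratio are illustrative assumptions.

      # The two NMI-F ingredients: pairwise normalised mutual information between
      # discretised features, and a per-feature Fisher discriminant ratio.
      import numpy as np
      from sklearn.metrics import normalized_mutual_info_score

      def fisher_ratio(x, y):
          c0, c1 = np.unique(y)[:2]                      # two-class sketch
          a, b = x[y == c0], x[y == c1]
          return (a.mean() - b.mean()) ** 2 / (a.var() + b.var() + 1e-12)

      def nmi_matrix(X, n_bins=10):
          n = X.shape[1]
          Xd = np.stack([np.digitize(X[:, j], np.histogram_bin_edges(X[:, j], n_bins)[1:-1])
                         for j in range(n)], axis=1)
          M = np.eye(n)
          for i in range(n):
              for j in range(i + 1, n):
                  M[i, j] = M[j, i] = normalized_mutual_info_score(Xd[:, i], Xd[:, j])
          return M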

  6. Adaptive Feature Selection and Extraction Approaches for Image Retrieval based on Region

    Directory of Open Access Journals (Sweden)

    Haiyu Song

    2010-02-01

    Image retrieval based on regions is one of the most promising and active research directions in CBIR in recent years, and region segmentation, feature selection and region feature extraction are key issues. However, existing approaches always adopt a uniform segmentation and feature extraction approach for all images in the same system. In this paper, we propose adaptive image segmentation and feature extraction approaches that depend on the image category, for an image retrieval system. To improve performance, we propose an adaptive segmentation approach: textured images are segmented by Gaussian Mixture Models (GMM), while non-textured images are segmented by our proposed block-based normalized cut. To accurately describe region features, we propose a weight assignment method for the centroid pixel and its neighbors by convolution with a normal distribution when segmenting images with the GMM. To improve generalization, we propose an adaptive number of Fourier descriptors of the shape signature, which depends on the energy distribution of the Fourier descriptors instead of a number fixed by experience. To simply and efficiently describe the spatial relationships of multiple objects or regions in the same image, we apply simplified topological relationships. The experiments demonstrate that the proposed approaches are superior to the traditional approaches.

  7. Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines

    Directory of Open Access Journals (Sweden)

    Liao Li

    2010-10-01

    Background: Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. Results: In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM), where interacting residues in domains are explicitly modeled according to the three-dimensional structural information available at the Protein Data Bank (PDB). Features of the domains are first extracted as Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. Conclusions: We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines. Datasets and source code are freely available on

  8. Data Visualization and Feature Selection Methods in Gel-based Proteomics

    DEFF Research Database (Denmark)

    Silva, Tomé Santos; Richard, Nadege; Dias, Jorge P.;

    2014-01-01

    Despite the increasing popularity of gel-free proteomic strategies, two-dimensional gel electrophoresis (2DE) is still the most widely used approach in top-down proteomic studies, for all sorts of biological models. In order to achieve meaningful biological insight using 2DE approaches, importance......-based proteomics, summarizing the current state of research within this field. Particular focus is given on discussing the usefulness of available multivariate analysis tools both for data visualization and feature selection purposes. Visual examples are given using a real gel-based proteomic dataset as basis....

  9. GalNAc-transferase specificity prediction based on feature selection method.

    Science.gov (United States)

    Lu, Lin; Niu, Bing; Zhao, Jun; Liu, Liang; Lu, Wen-Cong; Liu, Xiao-Jun; Li, Yi-Xue; Cai, Yu-Dong

    2009-02-01

    GalNAc-transferase can catalyze the biosynthesis of O-linked oligosaccharides. The specificity of GalNAc-transferase is composed of nine amino acid residues denoted by R4, R3, R2, R1, R0, R1', R2', R3', R4'. To predict whether the reducing monosaccharide will be covalently linked to the central residue R0 (Ser or Thr), a new method based on feature selection has been proposed in our work. 277 nonapeptides from reference [Chou KC. A sequence-coupled vector-projection model for predicting the specificity of GalNAc-transferase. Protein Sci 1995;4:1365-83] are chosen as the training set. Each nonapeptide is represented by hundreds of amino acid properties collected in the Amino Acid Index database (http://www.genome.jp/aaindex) and transformed into a numeric vector with 4554 features. The Maximum Relevance Minimum Redundancy (mRMR) method, combined with Incremental Feature Selection (IFS) and Feature Forward Selection (FFS), is then applied for feature selection. The Nearest Neighbor Algorithm (NNA) is used to build prediction models. The optimal model contains 54 features and its correct rate, tested by the jackknife cross-validation test, reaches 91.34%. Final feature analysis indicates that amino acid residues at position R3' play the most important role in the recognition of GalNAc-transferase specificity, which was confirmed by the experiments [Elhammer AP, Poorman RA, Brown E, Maggiora LL, Hoogerheide JG, Kezdy FJ. The specificity of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase as inferred from a database of in vivo substrates and from the in vitro glycosylation of proteins and peptides. J Biol Chem 1993;268:10029-38; O'Connell BC, Hagen FK, Tabak LA. The influence of flanking sequence on the O-glycosylation of threonine in vitro. J Biol Chem 1992;267:25010-8; Yoshida A, Suzuki M, Ikenaga H, Takeuchi M. Discovery of the shortest sequence motif for high level mucin-type O-glycosylation. J Biol Chem 1997;272:16884-8]. Our method can be used as a tool for predicting O

  10. Feature and Score Fusion Based Multiple Classifier Selection for Iris Recognition

    Directory of Open Access Journals (Sweden)

    Md. Rabiul Islam

    2014-01-01

    The aim of this work is to propose a new feature and score fusion based iris recognition approach in which a voting method over a Multiple Classifier Selection technique has been applied. The outputs of four Discrete Hidden Markov Model classifiers, that is, a left iris based unimodal system, a right iris based unimodal system, a left-right iris feature fusion based multimodal system, and a left-right iris likelihood ratio score fusion based multimodal system, are combined using the voting method to achieve the final recognition result. The CASIA-IrisV4 database has been used to measure the performance of the proposed system with various dimensions. Experimental results show the versatility of the proposed system of four different classifiers with various dimensions. Finally, the recognition accuracy of the proposed system has been compared with the existing N hamming distance score fusion approach proposed by Ma et al., the log-likelihood ratio score fusion approach proposed by Schmid et al., and the single level feature fusion approach proposed by Hollingsworth et al.

  11. Feature and score fusion based multiple classifier selection for iris recognition.

    Science.gov (United States)

    Islam, Md Rabiul

    2014-01-01

    The aim of this work is to propose a new feature and score fusion based iris recognition approach where voting method on Multiple Classifier Selection technique has been applied. Four Discrete Hidden Markov Model classifiers output, that is, left iris based unimodal system, right iris based unimodal system, left-right iris feature fusion based multimodal system, and left-right iris likelihood ratio score fusion based multimodal system, is combined using voting method to achieve the final recognition result. CASIA-IrisV4 database has been used to measure the performance of the proposed system with various dimensions. Experimental results show the versatility of the proposed system of four different classifiers with various dimensions. Finally, recognition accuracy of the proposed system has been compared with existing N hamming distance score fusion approach proposed by Ma et al., log-likelihood ratio score fusion approach proposed by Schmid et al., and single level feature fusion approach proposed by Hollingsworth et al. PMID:25114676

  12. Rough-fuzzy clustering and unsupervised feature selection for wavelet based MR image segmentation.

    Directory of Open Access Journals (Sweden)

    Pradipta Maji

    Image segmentation is an indispensable process in the visualization of human tissues, particularly during clinical analysis of brain magnetic resonance (MR) images. For many human experts, manual segmentation is a difficult and time consuming task, which makes an automated brain MR image segmentation method desirable. In this regard, this paper presents a new segmentation method for brain MR images, judiciously integrating the merits of rough-fuzzy computing and multiresolution image analysis. The proposed method assumes that the major brain tissues, namely, gray matter, white matter, and cerebrospinal fluid, have different textural properties in the MR images. Dyadic wavelet analysis is used to extract a scale-space feature vector for each pixel, while rough-fuzzy clustering is used to address the uncertainty problem of brain MR image segmentation. An unsupervised feature selection method, based on the maximum relevance-maximum significance criterion, is introduced to select relevant and significant textural features for the segmentation problem, while a mathematical morphology based skull stripping preprocessing step is proposed to remove non-cerebral tissues such as the skull. The performance of the proposed method, along with a comparison with related approaches, is demonstrated on a set of synthetic and real brain MR images using standard validity indices.

  13. Multi-Stage Feature Selection Based Intelligent Classifier for Classification of Incipient Stage Fire in Building

    Directory of Open Access Journals (Sweden)

    Allan Melvin Andrew

    2016-01-01

    In this study, an early fire detection algorithm is proposed based on a low cost array sensing system, utilising off-the-shelf gas sensors, dust particle sensors and ambient sensors such as temperature and humidity sensors. The odour or “smellprint” emanating from various fire sources and building construction materials at an early stage is measured. For this purpose, odour profile data from five common fire sources and three common building construction materials were used to develop the classification model. Normalised feature extraction of the smell print data was performed before being fed to the prediction classifier. These features represent the odour signals in the time domain. The obtained features undergo the proposed multi-stage feature selection technique and are finally further reduced by Principal Component Analysis (PCA), a dimension reduction technique. The hybrid PCA-PNN based approach has been applied to different datasets from the in-house developed system and the portable electronic nose unit. Experimental classification results show that the dimension reduction process performed by PCA improves the classification accuracy and provides high reliability, regardless of ambient temperature and humidity variation, baseline sensor drift, different gas concentration levels and exposure to different heating temperature ranges.
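
    The final dimension-reduction stage described above amounts to standardising the selected sensor features and projecting them with PCA before classification. The component count and the scikit-learn pipeline below are illustrative; the paper's probabilistic neural network classifier is not reproduced here.

      # Selected features standardised and projected with PCA before classification.
      from sklearn.decomposition import PCA
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      def reduce_with_pca(X_selected, n_components=5):
          # Standardise first so sensors on different scales contribute comparably.
          pipeline = make_pipeline(StandardScaler(), PCA(n_components=n_components))
          X_reduced = pipeline.fit_transform(X_selected)
          return X_reduced, pipeline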

  14. QSAR modeling for quinoxaline derivatives using genetic algorithm and simulated annealing based feature selection.

    Science.gov (United States)

    Ghosh, P; Bagchi, M C

    2009-01-01

    With a view to the rational design of selective quinoxaline derivatives, 2D and 3D-QSAR models have been developed for the prediction of anti-tubercular activities. Successful implementation of a predictive QSAR model largely depends on the selection of a preferred set of molecular descriptors that can signify the chemico-biological interaction. Genetic algorithm (GA) and simulated annealing (SA) are applied as variable selection methods for model development. 2D-QSAR modeling using GA or SA based partial least squares (GA-PLS and SA-PLS) methods identified some important topological and electrostatic descriptors as important factors for anti-tubercular activity. Kohonen network and counter propagation artificial neural network (CP-ANN) models considering GA and SA based feature selection methods have been applied for such QSAR modeling of quinoxaline compounds. Out of a variable pool of 380 molecular descriptors, predictive QSAR models are developed for the training set and validated on the test set compounds, and a comparative study of the relative effectiveness of linear and non-linear approaches has been carried out. Further analysis using the 3D-QSAR technique identifies two models obtained by GA-PLS and SA-PLS methods leading to anti-tubercular activity prediction. The influences of steric and electrostatic field effects generated by the contribution plots are discussed. The results indicate that SA is a very effective variable selection approach for such 3D-QSAR modeling.
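
    As a generic illustration of simulated-annealing variable selection (not the GA/SA-PLS or CP-ANN models of the paper), the sketch below flips one descriptor in or out of the subset per step and accepts worse subsets with a temperature-dependent probability; the stand-in classifier, iteration count and cooling schedule are illustrative.

      # Simulated-annealing subset search: flip one descriptor in or out per step
      # and accept worse subsets with a temperature-dependent probability.
      import numpy as np
      from sklearn.model_selection import cross_val_score
      from sklearn.neighbors import KNeighborsClassifier

      def sa_select(X, y, n_iter=500, t0=1.0, cooling=0.99, seed=0):
          rng = np.random.default_rng(seed)
          n = X.shape[1]
          def score(m):
              return cross_val_score(KNeighborsClassifier(), X[:, m], y, cv=3).mean() if m.any() else 0.0
          mask = rng.random(n) < 0.5
          current = score(mask)
          best, best_mask, t = current, mask.copy(), t0
          for _ in range(n_iter):
              neighbour = mask.copy()
              neighbour[rng.integers(n)] ^= True         # flip one feature in or out
              new = score(neighbour)
              if new > current or rng.random() < np.exp((new - current) / t):
                  mask, current = neighbour, new
              if current > best:
                  best, best_mask = current, mask.copy()
              t *= cooling                               # geometric cooling schedule
          return np.flatnonzero(best_mask), best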

  15. Comparison of Genetic Algorithm, Particle Swarm Optimization and Biogeography-based Optimization for Feature Selection to Classify Clusters of Microcalcifications

    Science.gov (United States)

    Khehra, Baljit Singh; Pharwaha, Amar Partap Singh

    2016-06-01

    Ductal carcinoma in situ (DCIS) is one type of breast cancer. Clusters of microcalcifications (MCCs) are symptoms of DCIS that are recognized by mammography. Selection of a robust feature vector is the process of selecting an optimal subset of features from a large number of available features in a given problem domain, after feature extraction and before any classification scheme. Feature selection reduces the feature space, which improves the performance of the classifier and decreases the computational burden imposed by using many features. Selecting an optimal subset of features from a large number of available features in a given problem domain is a difficult search problem: for n features, the total number of possible feature subsets is 2^n, so the problem belongs to the category of NP-hard problems. In this paper, an attempt is made to find the optimal subset of MCC features from all possible subsets of features using a genetic algorithm (GA), particle swarm optimization (PSO) and biogeography-based optimization (BBO). For simulation, a total of 380 benign and malignant MCC samples have been selected from mammogram images of the DDSM database. A total of 50 features extracted from the benign and malignant MCC samples are used in this study. In these algorithms, the fitness function is the correct classification rate of the classifier. A support vector machine is used as the classifier. From the experimental results, it is observed that the performance of the PSO-based and BBO-based algorithms in selecting an optimal subset of features for classifying MCCs as benign or malignant is better than that of the GA-based algorithm.
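
    A minimal sketch of binary PSO for feature selection is given below, with a sigmoid transfer function turning velocities into bit probabilities and cross-validated SVM accuracy as the fitness, matching the classifier named in the abstract; swarm size, inertia and acceleration coefficients are illustrative, and the GA and BBO variants compared in the paper are not shown.

      # Binary PSO sketch: velocities are squashed through a sigmoid into bit
      # probabilities, and fitness is cross-validated SVM accuracy.
      import numpy as np
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      def bpso_select(X, y, n_particles=20, n_iter=30, w=0.7, c1=1.5, c2=1.5, seed=0):
          rng = np.random.default_rng(seed)
          n = X.shape[1]
          def fit(mask):
              m = mask.astype(bool)
              return cross_val_score(SVC(), X[:, m], y, cv=3).mean() if m.any() else 0.0
          bits = (rng.random((n_particles, n)) < 0.5).astype(float)
          vel = np.zeros((n_particles, n))
          pbest, pbest_val = bits.copy(), np.array([fit(m) for m in bits])
          gbest = pbest[np.argmax(pbest_val)].copy()
          for _ in range(n_iter):
              r1, r2 = rng.random((2, n_particles, n))
              vel = w * vel + c1 * r1 * (pbest - bits) + c2 * r2 * (gbest - bits)
              prob = 1.0 / (1.0 + np.exp(-vel))          # sigmoid transfer function
              bits = (rng.random((n_particles, n)) < prob).astype(float)
              vals = np.array([fit(m) for m in bits])
              better = vals > pbest_val
              pbest[better], pbest_val[better] = bits[better], vals[better]
              gbest = pbest[np.argmax(pbest_val)].copy()
          return np.flatnonzero(gbest)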

  16. Genetic Particle Swarm Optimization-Based Feature Selection for Very-High-Resolution Remotely Sensed Imagery Object Change Detection.

    Science.gov (United States)

    Chen, Qiang; Chen, Yunhao; Jiang, Weiguo

    2016-01-01

    In the field of multiple-feature Object-Based Change Detection (OBCD) for very-high-resolution remotely sensed images, image objects have abundant features, and feature selection affects the precision and efficiency of OBCD. Through object-based image analysis, this paper proposes a Genetic Particle Swarm Optimization (GPSO)-based feature selection algorithm to solve the optimization problem of feature selection in multiple-feature OBCD. We select the Ratio of Mean to Variance (RMV) as the fitness function of GPSO, and apply the proposed algorithm to the object-based hybrid multivariate alternative detection model. Two experiments on Worldview-2/3 images confirm that GPSO can significantly improve the speed of convergence and effectively avoid premature convergence, relative to other feature selection algorithms. According to the accuracy evaluation of OBCD, GPSO achieves higher overall accuracy (84.17% and 83.59%) and Kappa coefficient (0.6771 and 0.6314) than the other algorithms. Moreover, the sensitivity analysis results show that the proposed algorithm is not easily influenced by the initial parameters, but the number of features to be selected and the size of the particle swarm do affect the algorithm. The comparison experiment results reveal that RMV is more suitable than other functions as the fitness function of a GPSO-based feature selection algorithm. PMID:27483285

  17. Research on Methods for Discovering and Selecting Cloud Infrastructure Services Based on Feature Modeling

    Directory of Open Access Journals (Sweden)

    Huamin Zhu

    2016-01-01

    Nowadays more and more cloud infrastructure service providers are offering large numbers of service instances that combine diversified resources such as computing, storage, and network. However, for cloud infrastructure services, the lack of a description standard and the inadequate research on systematic discovery and selection methods have made it difficult for users to discover and choose services. First, considering the highly configurable properties of a cloud infrastructure service, the feature model method is used to describe such a service. Second, based on this description, a systematic discovery and selection method for cloud infrastructure services is proposed. Automatic analysis techniques for the feature model are introduced to verify the model's validity and to perform the matching of the service and demand models. Finally, we determine the critical decision metrics and their corresponding measurement methods for cloud infrastructure services, where subjective and objective weighting results are combined to determine the weights of the decision metrics. The best matching instances from various providers are then ranked by their comprehensive evaluations. Experimental results show that the proposed methods can effectively improve the accuracy and efficiency of cloud infrastructure service discovery and selection.

  18. A Soft Computing Based Approach Using Modified Selection Strategy for Feature Reduction of Medical Systems

    Directory of Open Access Journals (Sweden)

    Kursat Zuhtuogullari

    2013-01-01

    Systems with high-dimensional input spaces require long processing times and large memory usage. Most attribute selection algorithms have problems with input dimension limits and information storage. These problems are eliminated by means of the developed feature reduction software, which uses a new modified selection mechanism with middle-region solution candidates added. The hybrid system software is constructed for reducing the input attributes of systems with a large number of input variables. The designed software also supports the roulette wheel selection mechanism. Linear order crossover is used as the recombination operator. In genetic algorithm based soft computing methods, locking into local solutions is also a problem, which is eliminated by the developed software. Faster and more effective results are obtained in the test procedures. Twelve input variables of the urological system have been reduced to reducts (reduced input attribute sets) with seven, six, and five elements. It can be seen from the obtained results that the developed software with modified selection has advantages in memory allocation, execution time, classification accuracy, sensitivity, and specificity when compared with the other reduction algorithms on the urological test data.

  19. A soft computing based approach using modified selection strategy for feature reduction of medical systems.

    Science.gov (United States)

    Zuhtuogullari, Kursat; Allahverdi, Novruz; Arikan, Nihat

    2013-01-01

    Systems with high-dimensional input spaces require long processing times and large memory usage. Most attribute selection algorithms have problems with input dimension limits and information storage. These problems are eliminated by means of the developed feature reduction software, which uses a new modified selection mechanism with middle-region solution candidates added. The hybrid system software is constructed for reducing the input attributes of systems with a large number of input variables. The designed software also supports the roulette wheel selection mechanism. Linear order crossover is used as the recombination operator. In genetic algorithm based soft computing methods, locking into local solutions is also a problem, which is eliminated by the developed software. Faster and more effective results are obtained in the test procedures. Twelve input variables of the urological system have been reduced to reducts (reduced input attribute sets) with seven, six, and five elements. It can be seen from the obtained results that the developed software with modified selection has advantages in memory allocation, execution time, classification accuracy, sensitivity, and specificity when compared with the other reduction algorithms on the urological test data.

  20. A soft computing based approach using modified selection strategy for feature reduction of medical systems.

    Science.gov (United States)

    Zuhtuogullari, Kursat; Allahverdi, Novruz; Arikan, Nihat

    2013-01-01

    Systems with high-dimensional input spaces require long processing times and large memory usage. Most attribute selection algorithms have problems with input dimension limits and information storage. These problems are eliminated by means of the developed feature reduction software, which uses a new modified selection mechanism with middle-region solution candidates added. The hybrid system software is constructed for reducing the input attributes of systems with a large number of input variables. The designed software also supports the roulette wheel selection mechanism. Linear order crossover is used as the recombination operator. In genetic algorithm based soft computing methods, locking into local solutions is also a problem, which is eliminated by the developed software. Faster and more effective results are obtained in the test procedures. Twelve input variables of the urological system have been reduced to reducts (reduced input attribute sets) with seven, six, and five elements. It can be seen from the obtained results that the developed software with modified selection has advantages in memory allocation, execution time, classification accuracy, sensitivity, and specificity when compared with the other reduction algorithms on the urological test data. PMID:23573172

  1. AN ANN APPROACH FOR NETWORK INTRUSION DETECTION USING ENTROPY BASED FEATURE SELECTION

    Directory of Open Access Journals (Sweden)

    Ashalata Panigrahi

    2015-06-01

    With the increase in Internet users, the number of malicious users is also growing day by day, posing a serious problem in distinguishing between normal and abnormal behavior of users in the network. This has led to the research area of intrusion detection, which essentially analyzes the network traffic and tries to determine normal and abnormal patterns of behavior. In this paper, we have analyzed the standard NSL-KDD intrusion dataset using some neural network based techniques for predicting possible intrusions. Four effective classification methods, namely Radial Basis Function Network, Self-Organizing Map, Sequential Minimal Optimization, and Projective Adaptive Resonance Theory, have been applied. In order to enhance the performance of the classifiers, three entropy based feature selection methods have been applied as a data preprocessing step. The performances of different combinations of classifiers and attribute reduction methods have also been compared.
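
    As an illustration of entropy-based feature scoring of the kind applied as preprocessing here (the specific entropy criteria in the paper may differ), the sketch below ranks discretised attributes by information gain with respect to the class label.

      # Information-gain ranking of discretised attributes with respect to the class.
      import numpy as np

      def entropy(labels):
          _, counts = np.unique(labels, return_counts=True)
          p = counts / counts.sum()
          return -np.sum(p * np.log2(p))

      def information_gain(feature, y, n_bins=10):
          bins = np.digitize(feature, np.histogram_bin_edges(feature, n_bins)[1:-1])
          h_cond = sum((bins == b).mean() * entropy(y[bins == b]) for b in np.unique(bins))
          return entropy(y) - h_cond

      def rank_features(X, y, top_k=20):
          gains = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
          return np.argsort(gains)[::-1][:top_k]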

  2. Electrocardiogram Based Identification using a New Effective Intelligent Selection of Fused Features

    OpenAIRE

    Abbaspour, Hamidreza; Razavi, Seyyed Mohammad; Mehrshad, Nasser

    2015-01-01

    Over the years, the feasibility of using Electrocardiogram (ECG) signal for human identification issue has been investigated, and some methods have been suggested. In this research, a new effective intelligent feature selection method from ECG signals has been proposed. This method is developed in such a way that it is able to select important features that are necessary for identification using analysis of the ECG signals. For this purpose, after ECG signal preprocessing, its characterizing ...

  3. A New Feature Selection Algorithm Based on the Mean Impact Variance

    Directory of Open Access Journals (Sweden)

    Weidong Cheng

    2014-01-01

    The selection of fewer or more representative features from multidimensional features is important when the artificial neural network (ANN) algorithm is used as a classifier. In this paper, a new feature selection method called the mean impact variance (MIVAR) method is proposed to determine the features that are more suitable for classification. Moreover, this method is constructed on the basis of the training process of the ANN algorithm. To verify the effectiveness of the proposed method, the MIVAR value is used to rank the multidimensional features of bearing fault diagnosis. In detail, (1) all 70-dimensional waveform features are extracted from a rolling bearing vibration signal with four different operating states, (2) the corresponding MIVAR values of all 70-dimensional features are calculated to rank all features, (3) 14 groups of 10-dimensional features are separately generated according to the ranking results and the principal component analysis (PCA) algorithm, and a back propagation (BP) network is constructed, and (4) the validity of the ranking result is proven by training this BP network with these seven groups of 10-dimensional features and by comparing the corresponding recognition rates. The results prove that features with a larger MIVAR value lead to higher recognition rates.
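
    The abstract does not give the exact MIVAR formula, so the sketch below only approximates the spirit of an impact-based ranking: train a network, perturb each input feature by a small factor, and rank features by the variance of the resulting output change across samples. The network, perturbation size and use of class probabilities are all illustrative assumptions, not the paper's definition.

      # Impact-based ranking in the spirit of MIVAR: perturb each input of a
      # trained network and rank features by the variance of the output change.
      import numpy as np
      from sklearn.neural_network import MLPClassifier

      def impact_variance_scores(X, y, delta=0.1, seed=0):
          net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=seed)
          net.fit(X, y)
          base = net.predict_proba(X)
          scores = np.zeros(X.shape[1])
          for j in range(X.shape[1]):
              Xp = X.copy()
              Xp[:, j] *= (1.0 + delta)                  # perturb one feature upward
              impact = np.abs(net.predict_proba(Xp) - base).sum(axis=1)
              scores[j] = impact.var()                   # variance of the impact
          return np.argsort(scores)[::-1], scores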

  4. Genetic Fuzzy System (GFS based wavelet co-occurrence feature selection in mammogram classification for breast cancer diagnosis

    Directory of Open Access Journals (Sweden)

    Meenakshi M. Pawar

    2016-09-01

    Breast cancer is a significant health problem diagnosed mostly in women worldwide. Therefore, early detection of breast cancer is performed with the help of digital mammography, which can reduce the mortality rate. This paper presents a wrapper based feature selection approach for wavelet co-occurrence features (WCF) using a Genetic Fuzzy System (GFS) for the mammogram classification problem. The performance of the GFS algorithm is evaluated using the mini-MIAS database. WCF features are obtained from the detail wavelet coefficients at each level of decomposition of the mammogram image. At the first level of decomposition, 18 features are applied to the GFS algorithm, which selects 5 features with an average classification success rate of 39.64%. Subsequently, at the second level it selects 9 features from 36 features and the classification success rate improves to 56.75%. At the third level, 16 features are selected from 54 features and the average success rate improves to 64.98%. Lastly, at the fourth level 72 features are applied to the GFS, which selects 16 features, thereby increasing the average success rate to 89.47%. Hence, the GFS algorithm is an effective way of obtaining an optimal set of features for breast cancer diagnosis.

  5. TOPSIS Based Multi-Criteria Decision Making of Feature Selection Techniques for Network Traffic Dataset

    Directory of Open Access Journals (Sweden)

    Raman Singh

    2014-01-01

    Intrusion detection systems (IDS) have to process millions of packets with many features, which delays the detection of anomalies. Sampling and feature selection may be used to reduce computation time and hence minimize intrusion detection time. This paper aims to suggest feature selection algorithms on the basis of the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS). TOPSIS is used to suggest one or more choice(s) among alternatives having many attributes. A total of ten feature selection techniques have been used for the analysis of the KDD network dataset. Three classifiers, namely Naïve Bayes, J48 and PART, have been considered for this experiment using the Weka data mining tool. Rankings of the techniques using TOPSIS have been calculated using MATLAB. Out of these techniques, Filtered Subset Evaluation has been found suitable for intrusion detection in terms of very low computational time with acceptable accuracy.
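
    A plain TOPSIS ranking can be sketched as below: the decision matrix of alternatives (feature selection techniques) against criteria such as accuracy and computation time is vector-normalised and weighted, ideal and anti-ideal points are formed, and alternatives are ranked by relative closeness. The weights and criterion directions are inputs chosen by the analyst; the values used in the paper are not reproduced.

      # Plain TOPSIS: normalise and weight the decision matrix, form ideal and
      # anti-ideal points, and rank alternatives by relative closeness.
      import numpy as np

      def topsis(decision_matrix, weights, benefit):
          # benefit[i] is True for criteria to maximise, False for those to minimise.
          M = np.asarray(decision_matrix, dtype=float)
          w = np.asarray(weights, dtype=float) / np.sum(weights)
          R = M / np.sqrt((M ** 2).sum(axis=0))          # vector-normalised matrix
          V = R * w                                      # weighted normalised matrix
          ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
          anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
          d_pos = np.sqrt(((V - ideal) ** 2).sum(axis=1))
          d_neg = np.sqrt(((V - anti) ** 2).sum(axis=1))
          closeness = d_neg / (d_pos + d_neg)
          return np.argsort(closeness)[::-1], closeness  # best alternative first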

  6. Feature selection for portfolio optimization

    DEFF Research Database (Denmark)

    Bjerring, Thomas Trier; Ross, Omri; Weissensteiner, Alex

    2016-01-01

    Most portfolio selection rules based on the sample mean and covariance matrix perform poorly out-of-sample. Moreover, there is a growing body of evidence that such optimization rules are not able to beat simple rules of thumb, such as 1/N. Parameter uncertainty has been identified as one major reason for these findings. A strand of literature addresses this problem by improving the parameter estimation and/or by relying on more robust portfolio selection methods. Independent of the chosen portfolio selection rule, we propose using feature selection first in order to reduce the asset menu. While most of the diversification benefits are preserved, the parameter estimation problem is alleviated. We conduct out-of-sample back-tests to show that in most cases different well-established portfolio selection rules applied on the reduced asset universe are able to improve alpha relative...

  7. Improving Bee Algorithm Based Feature Selection in Intrusion Detection System Using Membrane Computing

    Directory of Open Access Journals (Sweden)

    Kazeem I. Rufai

    2014-03-01

    Full Text Available Despite the great benefits accruing from the debut of computers and the internet, efforts are constantly being made by fraudulent and mischievous individuals to compromise the integrity, confidentiality or availability of electronic information systems. In cyber-security parlance, this is termed 'intrusion'. Hence, this has necessitated the introduction of Intrusion Detection Systems (IDS) to help detect and curb different types of attack. However, given the high volume of data traffic involved in a network system, the effects of redundant and irrelevant data should be minimized if a qualitative intrusion detection mechanism is genuinely desired. Several attempts, especially feature subset selection approaches using the Bee Algorithm (BA), Linear Genetic Programming (LGP), Support Vector Decision Function Ranking (SVDF), Rough, Rough-DPSO, and Multivariate Regression Splines (MARS), have been advanced in the past to measure the dependability and quality of a typical IDS. The observed problem among these approaches has to do with their general performance. This has therefore motivated this research work. We hereby propose a new but robust algorithm, called the membrane algorithm, to improve the Bee Algorithm based feature subset selection technique. This membrane computing paradigm is a class of parallel computing devices. Data used were taken from the KDD-Cup 99 dataset, which is the accepted standard benchmark for intrusion detection. When the final results were compared to those of the existing approaches using the three standard IDS measurements (attack detection, false alarm and classification accuracy rates), it was discovered that the Bee Algorithm-Membrane Computing (BA-MC) approach is a better technique. This is because our approach produced a very high attack detection rate of 89.11%, a classification accuracy of 95.60%, and also generated a reasonable decrease in false alarm rate of 0.004. A Receiver Operating Characteristic (ROC) curve was used for results

  8. Electrocardiogram Based Identification using a New Effective Intelligent Selection of Fused Features

    Science.gov (United States)

    Abbaspour, Hamidreza; Razavi, Seyyed Mohammad; Mehrshad, Nasser

    2015-01-01

    Over the years, the feasibility of using the Electrocardiogram (ECG) signal for human identification has been investigated, and some methods have been suggested. In this research, a new, effective, intelligent feature selection method for ECG signals is proposed. This method is developed in such a way that it is able to select the important features that are necessary for identification through analysis of the ECG signals. For this purpose, after ECG signal preprocessing, its characterizing features were extracted and then compressed using the cosine transform. The features that are more effective for identification, among the characterizing features, are selected using a combination of the genetic algorithm and artificial neural networks. The proposed method was tested on three public ECG databases, namely the MIT-BIH Arrhythmia Database, the MIT-BIH Normal Sinus Rhythm Database and the European ST-T Database, in order to evaluate the proposed subject identification method on normal ECG signals as well as ECG signals with arrhythmias. Identification rates of 99.89%, 99.84% and 99.99% are obtained for these databases, respectively. The proposed algorithm exhibits remarkable identification accuracy not only with normal ECG signals, but also in the presence of various arrhythmias. Simulation results showed that the proposed method, despite the low number of selected features, has high performance in the identification task. PMID:25709939

  9. Different Classification Algorithms Based on Arabic Text Classification: Feature Selection Comparative Study

    Directory of Open Access Journals (Sweden)

    Ghazi Raho

    2015-02-01

    Full Text Available Feature selection is necessary for effective text classification. Dataset preprocessing is essential to obtain sound results and effective performance. This paper investigates the effectiveness of using feature selection. In this paper we compare the performance of different classifiers in different situations, using feature selection with stemming and without stemming. The evaluation used a BBC Arabic dataset, and different classification algorithms such as the decision tree (DT), K-nearest neighbors (KNN), Naïve Bayesian (NB) and Naïve Bayes Multinomial (NBM) classifiers were used. The experimental results are presented in terms of precision, recall, F-measure, accuracy and time to build the model.

  10. A Soft Computing Based Approach Using Modified Selection Strategy for Feature Reduction of Medical Systems

    OpenAIRE

    Kursat Zuhtuogullari; Novruz Allahverdi; Nihat Arikan

    2013-01-01

    Systems with high-dimensional input spaces require long processing times and large memory usage. Most attribute selection algorithms suffer from limits on input dimensionality and from information storage problems. These problems are eliminated by means of the developed feature reduction software, which uses a new modified selection mechanism that adds middle-region solution candidates. The hybrid system software is constructed for reducing the input attributes of systems with a large number of input var...

  11. ECG Signal Feature Selection for Emotion Recognition

    OpenAIRE

    Lichen Xun; Gang Zheng

    2013-01-01

    This paper aims to study the selection of features based on ECG in emotion recognition. In the process of feature selection, we start from an existing feature selection algorithm, and also pay special attention to some intuitive values on the ECG waveform. Through the use of ANOVA and heuristic search, we picked out different features to distinguish the two emotions of joy and pleasure; we then combine this with pathological analysis of ECG signals from the view of medical experts to ...

  12. Feature Selection: A Data Perspective

    OpenAIRE

    Li, Jundong; Cheng, Kewei; Wang, Suhang; Morstatter, Fred; Trevino, Robert P.; Tang, Jiliang; Liu, Huan

    2016-01-01

    Feature selection, as a data preprocessing strategy, has been proven to be effective and efficient in preparing high-dimensional data for data mining and machine learning problems. The objectives of feature selection include: building simpler and more comprehensible models, improving data mining performance, and preparing clean, understandable data. The recent proliferation of big data has presented some substantial challenges and opportunities for feature selection algorithms. In this survey,...

  13. Feature Selection and Parameter Optimization of Support Vector Machines Based on Modified Artificial Fish Swarm Algorithms

    Directory of Open Access Journals (Sweden)

    Kuan-Cheng Lin

    2015-01-01

    Full Text Available Rapid advances in information and communication technology have made ubiquitous computing and the Internet of Things popular and practicable. These applications create enormous volumes of data, which are available for analysis and classification as an aid to decision-making. Among the methods used to deal with big data classification, feature selection has proven particularly effective. One common approach involves searching for a subset of the features that are the most relevant to the topic or represent the most accurate description of the dataset. Unfortunately, searching for this kind of subset is a combinatorial problem that can be very time consuming. Metaheuristic algorithms are commonly used to facilitate the selection of features. The artificial fish swarm algorithm (AFSA) employs the intelligence underlying fish swarming behavior as a means to overcome optimization of combinatorial problems. AFSA has proven highly successful in a diversity of applications; however, there remain shortcomings, such as the likelihood of falling into a local optimum and a lack of multiplicity. This study proposes a modified AFSA (MAFSA) to improve feature selection and parameter optimization for support vector machine classifiers. Experiment results demonstrate the superiority of MAFSA in classification accuracy using subsets with fewer features for given UCI datasets, compared to the original AFSA.

  14. Human activity recognition based on feature selection in smart home using back-propagation algorithm.

    Science.gov (United States)

    Fang, Hongqing; He, Lei; Si, Hao; Liu, Peng; Xie, Xiaolei

    2014-09-01

    In this paper, the back-propagation (BP) algorithm has been used to train a feed-forward neural network for human activity recognition in smart home environments, and an inter-class distance method for feature selection of observed motion sensor events is discussed and tested. The human activity recognition performance of the neural network trained with the BP algorithm has then been evaluated and compared with other probabilistic algorithms: the Naïve Bayes (NB) classifier and the Hidden Markov Model (HMM). The results show that different feature datasets yield different activity recognition accuracy. The selection of unsuitable feature datasets increases the computational complexity and degrades the activity recognition accuracy. Furthermore, the neural network using the BP algorithm has relatively better human activity recognition performance than the NB classifier and the HMM.
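
    A hedged sketch of an inter-class distance criterion for scoring features: each feature is ranked by how far apart the class means lie relative to the within-class spread (a Fisher-style ratio). This is a generic illustration of the idea; the paper's exact formulation over motion-sensor event features may differ, and the toy data are invented:

        # Inter-class distance scoring: higher score means the class means of a
        # feature are well separated relative to the within-class scatter.
        import numpy as np

        def inter_class_distance(X, y):
            """Return one score per feature (column of X)."""
            classes = np.unique(y)
            overall_mean = X.mean(axis=0)
            between = np.zeros(X.shape[1])
            within = np.zeros(X.shape[1])
            for c in classes:
                Xc = X[y == c]
                between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
                within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
            return between / (within + 1e-12)        # avoid division by zero

        # toy example: feature 0 separates the classes, feature 1 is noise
        X = np.vstack([np.random.normal(0, 1, (50, 2)),
                       np.random.normal([5, 0], 1, (50, 2))])
        y = np.array([0] * 50 + [1] * 50)
        print(inter_class_distance(X, y))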

  15. Multi-Stage Feature Selection by Using Genetic Algorithms for Fault Diagnosis in Gearboxes Based on Vibration Signal

    Directory of Open Access Journals (Sweden)

    Mariela Cerrada

    2015-09-01

    Full Text Available There are growing demands for condition-based monitoring of gearboxes, and techniques to improve the reliability, effectiveness and accuracy for fault diagnosis are considered valuable contributions. Feature selection is still an important aspect in machine learning-based diagnosis in order to reach good performance in the diagnosis system. The main aim of this research is to propose a multi-stage feature selection mechanism for selecting the best set of condition parameters on the time, frequency and time-frequency domains, which are extracted from vibration signals for fault diagnosis purposes in gearboxes. The selection is based on genetic algorithms, proposing in each stage a new subset of the best features regarding the classifier performance in a supervised environment. The selected features are augmented at each stage and used as input for a neural network classifier in the next step, while a new subset of feature candidates is treated by the selection process. As a result, the inherent exploration and exploitation of the genetic algorithms for finding the best solutions of the selection problem are locally focused. The approach is tested on a dataset from a real test bed with several fault classes under different running conditions of load and velocity. The model performance for diagnosis is over 98%.

  16. Feature selection for wearable smartphone-based human activity recognition with able bodied, elderly, and stroke patients.

    Directory of Open Access Journals (Sweden)

    Nicole A Capela

    Full Text Available Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations.
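
    A short sketch of the classifier-independent evaluation protocol described above: a filter method picks a feature subset, and the same subset is then tested with several generic classifiers. Relief-F, CFS and FCBF are not available in scikit-learn, so mutual information is used here as a stand-in filter; the dataset and the value of k are placeholders:

        # Filter-based (classifier-independent) selection, evaluated with
        # several generic classifiers inside a pipeline to avoid leakage.
        from sklearn.datasets import load_breast_cancer
        from sklearn.feature_selection import SelectKBest, mutual_info_classif
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import GaussianNB
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import SVC
        from sklearn.tree import DecisionTreeClassifier

        X, y = load_breast_cancer(return_X_y=True)    # placeholder dataset

        for clf in (GaussianNB(), SVC(), DecisionTreeClassifier()):
            full = cross_val_score(clf, X, y, cv=5).mean()
            pipe = make_pipeline(SelectKBest(mutual_info_classif, k=10), clf)
            sub = cross_val_score(pipe, X, y, cv=5).mean()
            print(type(clf).__name__, "all features:", round(full, 3),
                  "selected subset:", round(sub, 3))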

  17. Feature selection for wearable smartphone-based human activity recognition with able bodied, elderly, and stroke patients.

    Science.gov (United States)

    Capela, Nicole A; Lemaire, Edward D; Baddour, Natalie

    2015-01-01

    Human activity recognition (HAR), using wearable sensors, is a growing area with the potential to provide valuable information on patient mobility to rehabilitation specialists. Smartphones with accelerometer and gyroscope sensors are a convenient, minimally invasive, and low cost approach for mobility monitoring. HAR systems typically pre-process raw signals, segment the signals, and then extract features to be used in a classifier. Feature selection is a crucial step in the process to reduce potentially large data dimensionality and provide viable parameters to enable activity classification. Most HAR systems are customized to an individual research group, including a unique data set, classes, algorithms, and signal features. These data sets are obtained predominantly from able-bodied participants. In this paper, smartphone accelerometer and gyroscope sensor data were collected from populations that can benefit from human activity recognition: able-bodied, elderly, and stroke patients. Data from a consecutive sequence of 41 mobility tasks (18 different tasks) were collected for a total of 44 participants. Seventy-six signal features were calculated and subsets of these features were selected using three filter-based, classifier-independent, feature selection methods (Relief-F, Correlation-based Feature Selection, Fast Correlation Based Filter). The feature subsets were then evaluated using three generic classifiers (Naïve Bayes, Support Vector Machine, j48 Decision Tree). Common features were identified for all three populations, although the stroke population subset had some differences from both able-bodied and elderly sets. Evaluation with the three classifiers showed that the feature subsets produced similar or better accuracies than classification with the entire feature set. Therefore, since these feature subsets are classifier-independent, they should be useful for developing and improving HAR systems across and within populations.

  18. BLProt: Prediction of bioluminescent proteins based on support vector machine and relieff feature selection

    KAUST Repository

    Kandaswamy, Krishna Kumar

    2011-08-17

    Background: Bioluminescence is a process in which light is emitted by a living organism. Most creatures that emit light are sea creatures, but some insects, plants, fungi, etc., also emit light. The biotechnological application of bioluminescence has become routine and is considered essential for many medical and general technological advances. Identification of bioluminescent proteins is more challenging due to their poor sequence similarity. So far, no specific method has been reported to identify bioluminescent proteins from primary sequence. Results: In this paper, we propose a novel predictive method that uses a Support Vector Machine (SVM) and physicochemical properties to predict bioluminescent proteins. BLProt was trained using a dataset consisting of 300 bioluminescent proteins and 300 non-bioluminescent proteins, and evaluated on an independent set of 141 bioluminescent proteins and 18202 non-bioluminescent proteins. To identify the most prominent features, we carried out feature selection with three different filter approaches: ReliefF, infogain, and mRMR. We selected five different feature subsets by decreasing the number of features, and the performance of each feature subset was evaluated. Conclusion: BLProt achieves 80% accuracy in training (5-fold cross-validation) and 80.06% accuracy in testing. The performance of BLProt was compared with BLAST and HMM. The high prediction accuracy and successful prediction of hypothetical proteins suggest that BLProt can be a useful approach to identify bioluminescent proteins from sequence information, irrespective of their sequence similarity. 2011 Kandaswamy et al; licensee BioMed Central Ltd.

  19. A comparative study on classification of sleep stage based on EEG signals using feature selection and classification algorithms.

    Science.gov (United States)

    Şen, Baha; Peker, Musa; Çavuşoğlu, Abdullah; Çelebi, Fatih V

    2014-03-01

    Sleep scoring is one of the most important diagnostic methods in psychiatry and neurology. Sleep staging is a time-consuming and difficult task undertaken by sleep experts. This study aims to identify a method which would classify sleep stages automatically and with a high degree of accuracy and, in this manner, assist sleep experts. This study consists of three stages: feature extraction and feature selection from EEG signals, and classification of these signals. In the feature extraction stage, 20 attribute algorithms in four categories are used, and 41 feature parameters are obtained from these algorithms. Feature selection is important for the elimination of irrelevant and redundant features; in this manner prediction accuracy is improved and the computational overhead in classification is reduced. Effective feature selection algorithms such as minimum redundancy maximum relevance (mRMR), fast correlation-based feature selection (FCBF), ReliefF, t-test, and Fisher score are used at the feature selection stage to select a set of features which best represent the EEG signals. The features obtained are used as input parameters for the classification algorithms. At the classification stage, five different classification algorithms (random forest (RF), feed-forward neural network (FFNN), decision tree (DT), support vector machine (SVM), and radial basis function neural network (RBF)) classify the problem. The results obtained from the different classification algorithms are provided so that a comparison can be made between computation times and accuracy rates. Finally, a classification accuracy of 97.03% is obtained using the proposed method. The results show that the proposed method demonstrates the ability to design a new intelligent assistive sleep scoring system.

  20. A Flexible Mechanism of Rule Selection Enables Rapid Feature-Based Reinforcement Learning.

    Science.gov (United States)

    Balcarras, Matthew; Womelsdorf, Thilo

    2016-01-01

    Learning in a new environment is influenced by prior learning and experience. Correctly applying a rule that maps a context to stimuli, actions, and outcomes enables faster learning and better outcomes compared to relying on strategies for learning that are ignorant of task structure. However, it is often difficult to know when and how to apply learned rules in new contexts. In our study we explored how subjects employ different strategies for learning the relationship between stimulus features and positive outcomes in a probabilistic task context. We test the hypothesis that task naive subjects will show enhanced learning of feature specific reward associations by switching to the use of an abstract rule that associates stimuli by feature type and restricts selections to that dimension. To test this hypothesis we designed a decision making task where subjects receive probabilistic feedback following choices between pairs of stimuli. In the task, trials are grouped in two contexts by blocks, where in one type of block there is no unique relationship between a specific feature dimension (stimulus shape or color) and positive outcomes, and following an un-cued transition, alternating blocks have outcomes that are linked to either stimulus shape or color. Two-thirds of subjects (n = 22/32) exhibited behavior that was best fit by a hierarchical feature-rule model. Supporting the prediction of the model mechanism these subjects showed significantly enhanced performance in feature-reward blocks, and rapidly switched their choice strategy to using abstract feature rules when reward contingencies changed. Choice behavior of other subjects (n = 10/32) was fit by a range of alternative reinforcement learning models representing strategies that do not benefit from applying previously learned rules. In summary, these results show that untrained subjects are capable of flexibly shifting between behavioral rules by leveraging simple model-free reinforcement learning and context

  1. Improving mass detection using combined feature representations from projection views and reconstructed volume of DBT and boosting based classification with feature selection

    International Nuclear Information System (INIS)

    In digital breast tomosynthesis (DBT), the image characteristics of projection views and the reconstructed volume are different, and both have advantages for detecting breast masses, e.g. the reconstructed volume mitigates tissue overlap, while projection views have fewer reconstruction blur artifacts. In this paper, an improved mass detection is proposed by using combined feature representations from projection views and the reconstructed volume in DBT. To take advantage of the complementary effects of the different image characteristics of both data, combined feature representations are extracted from both projection views and the reconstructed volume concurrently. An indirect region-of-interest segmentation in projection views, which projects the volume-of-interest in the reconstructed volume into the corresponding projection views, is proposed to extract the combined feature representations. In addition, boosting based classification with feature selection has been employed for selecting effective feature representations among a large number of combined feature representations, and for reducing false positives. Experiments have been conducted on a clinical data set that contains malignant masses. Experimental results demonstrate that the proposed mass detection can achieve high sensitivity with a small number of false positives. In addition, the experimental results demonstrate that the selected feature representations for classifying masses come complementarily from both projection views and the reconstructed volume. (paper)

  2. THE FEATURE SUBSET SELECTION ALGORITHM

    Institute of Scientific and Technical Information of China (English)

    Liu Yongguo; Li Xueming; Wu Zhongfu

    2003-01-01

    The motivation of data mining is how to extract effective information from the huge amounts of data in very large databases. However, some redundant and irrelevant attributes, which result in low performance and high computing complexity, are generally included in very large databases. So, Feature Subset Selection (FSS) becomes an important issue in the field of data mining. In this letter, an FSS model based on the filter approach is built, which uses a simulated annealing genetic algorithm. Experimental results show that convergence and stability of this algorithm are adequately achieved.

  3. THE FEATURE SUBSET SELECTION ALGORITHM

    Institute of Scientific and Technical Information of China (English)

    LiuYongguo; LiXueming; 等

    2003-01-01

    The motivation of data mining is how to extract effective information from the huge amounts of data in very large databases. However, some redundant and irrelevant attributes, which result in low performance and high computing complexity, are generally included in very large databases. So, Feature Subset Selection (FSS) becomes an important issue in the field of data mining. In this letter, an FSS model based on the filter approach is built, which uses a simulated annealing genetic algorithm. Experimental results show that convergence and stability of this algorithm are adequately achieved.

  4. Genetic search feature selection for affective modeling

    DEFF Research Database (Denmark)

    Martínez, Héctor P.; Yannakakis, Georgios N.

    2010-01-01

    Automatic feature selection is a critical step towards the generation of successful computational models of affect. This paper presents a genetic search-based feature selection method which is developed as a global-search algorithm for improving the accuracy of the affective models built. The met...

  5. A Parallel Genetic Algorithm Based Feature Selection and Parameter Optimization for Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Zhi Chen

    2016-01-01

    Full Text Available The extensive applications of support vector machines (SVMs) require an efficient method of constructing an SVM classifier with high classification ability. The performance of an SVM crucially depends on whether the optimal feature subset and SVM parameters can be efficiently obtained. In this paper, a coarse-grained parallel genetic algorithm (CGPGA) is used to simultaneously optimize the feature subset and parameters for the SVM. The distributed topology and migration policy of the CGPGA help find the optimal feature subset and parameters for the SVM in significantly shorter time, so as to increase the quality of the solution found. In addition, a new fitness function, which combines the classification accuracy obtained from the bootstrap method, the number of chosen features, and the number of support vectors, is proposed to lead the search of the CGPGA in the direction of optimal generalization error. Experiment results on 12 benchmark datasets show that our proposed approach outperforms the genetic algorithm (GA) based method and the grid search method in terms of classification accuracy, number of chosen features, number of support vectors, and running time.
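
    A rough sketch of a fitness function in the spirit described above, combining bootstrap (out-of-bag) accuracy with penalties for the number of chosen features and the number of support vectors. The weights, dataset and SVM parameters are illustrative assumptions rather than the paper's values, and the parallel GA itself is omitted:

        # Fitness of a candidate (feature mask, C, gamma): reward out-of-bag
        # accuracy, penalise the fraction of features and of support vectors.
        import numpy as np
        from sklearn.datasets import load_breast_cancer
        from sklearn.svm import SVC

        X, y = load_breast_cancer(return_X_y=True)     # placeholder dataset
        rng = np.random.default_rng(0)

        def fitness(feature_mask, C, gamma, w_a=0.8, w_f=0.1, w_s=0.1):
            cols = np.flatnonzero(feature_mask)
            if cols.size == 0:
                return 0.0
            idx = rng.integers(0, len(X), len(X))      # bootstrap sample
            oob = np.setdiff1d(np.arange(len(X)), idx) # out-of-bag rows
            clf = SVC(C=C, gamma=gamma).fit(X[idx][:, cols], y[idx])
            acc = clf.score(X[oob][:, cols], y[oob])
            frac_features = cols.size / X.shape[1]
            frac_sv = clf.support_vectors_.shape[0] / idx.size
            return w_a * acc + w_f * (1 - frac_features) + w_s * (1 - frac_sv)

        mask = rng.random(X.shape[1]) < 0.5
        print("fitness of a random subset:", round(fitness(mask, C=1.0, gamma="scale"), 3))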

  6. Feature Selection: Algorithms and Challenges

    Institute of Scientific and Technical Information of China (English)

    Xindong Wu; Yanglan Gan; Hao Wang; Xuegang Hu

    2006-01-01

    Feature selection is an active area in data mining research and development. It consists of efforts and contributions from a wide variety of communities, including statistics, machine learning, and pattern recognition. The diversity, on one hand, equips us with many methods and tools. On the other hand, the profusion of options causes confusion. This paper reviews various feature selection methods and identifies research challenges that are at the forefront of this exciting area.

  7. A fast separability-based feature selection method for high-dimensional remotely-sensed image classification

    OpenAIRE

    Guo, B.; Damper, R.I.; Gunn, S.R.; Nelson, J. D. B.

    2008-01-01

    Because of the difficulty of obtaining an analytic expression for Bayes error, a wide variety of separability measures has been proposed for feature selection. In this paper, we show that there is a general framework based on the criterion of mutual information (MI) that can provide a realistic solution to the problem of feature selection for high-dimensional data. We give a theoretical argument showing that the MI of multi-dimensional data can be broken down into several one-dimensional comp...

  8. A Novel Feature Selection Based on One-Way ANOVA F-Test for E-Mail Spam Classification

    Directory of Open Access Journals (Sweden)

    Nadir Omer Fadl Elssied

    2014-01-01

    Full Text Available Spam is commonly defined as unwanted e-mail, and it has become a global threat to e-mail users. Although the Support Vector Machine (SVM) has been commonly used in e-mail spam classification, the problem of high data dimensionality of the feature space, due to the massive number of e-mails and features in the dataset, still exists. To address this limitation of SVM, reduce the computational complexity (efficiency) and enhance the classification accuracy (effectiveness), in this study feature selection based on a one-way ANOVA F-test statistics scheme was applied to determine the most important features contributing to e-mail spam classification. This feature selection based on the one-way ANOVA F-test is used to reduce the high data dimensionality of the feature space before the classification process. The experiment on the proposed scheme was carried out using the well-known Spambase benchmark dataset to evaluate the feasibility of the proposed method. The comparison covers different datasets, categorization algorithms and success measures. In addition, experimental results on the Spambase English dataset showed that the enhanced SVM (FSSVM) significantly outperforms SVM and many other recent spam classification methods for the English dataset in terms of computational complexity and dimension reduction.
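
    A minimal sketch of the scheme described above: one-way ANOVA F-test scores rank the features, the top k are kept, and an SVM is trained on the reduced space. scikit-learn's f_classif computes the one-way ANOVA F-test; the dataset and k are placeholders rather than the Spambase setup used in the paper:

        # ANOVA F-test filter followed by an SVM, evaluated with cross-validation.
        from sklearn.datasets import load_breast_cancer
        from sklearn.feature_selection import SelectKBest, f_classif
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        X, y = load_breast_cancer(return_X_y=True)     # placeholder dataset
        fs_svm = make_pipeline(SelectKBest(f_classif, k=15), StandardScaler(), SVC())
        print("accuracy with ANOVA-selected features:",
              cross_val_score(fs_svm, X, y, cv=5).mean())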

  9. Extremely high-dimensional feature selection via feature generating samplings.

    Science.gov (United States)

    Li, Shutao; Wei, Dan

    2014-06-01

    To select informative features for extremely high-dimensional problems, a sampling scheme is proposed in this paper to enhance the efficiency of recently developed feature generating machines (FGMs). Note that in FGMs O(m log r) time complexity is needed to order the features by their scores; the entire computational cost of feature ordering becomes unbearable when m is very large, for example, m > 10^11, where m is the feature dimensionality and r is the size of the selected feature subset. To solve this problem, we propose a feature generating sampling method, which can reduce this computational complexity to O(G_s log(G) + G(G + log(G))) while preserving the most informative features in a feature buffer, where G_s is the maximum number of nonzero features for each instance and G is the buffer size. Moreover, we show that our proposed sampling scheme can be deemed a birth-death process based on random process theory, which guarantees to include most of the informative features for feature selection. Empirical studies on real-world datasets show the effectiveness of the proposed sampling method. PMID:23864272

  10. Vinegar Classification Based on Feature Extraction and Selection From Tin Oxide Gas Sensor Array Data

    Directory of Open Access Journals (Sweden)

    Huang Xingyi

    2003-03-01

    Full Text Available Tin oxide gas sensor array based devices are often cited in publications dealing with food products. However, when using a tin oxide gas sensor array to analyse and identify different gases, the most important and difficult problem is how to obtain useful parameters from the sensors and how to optimize those parameters so that the sensor array can identify the gas rapidly and accurately; no convenient method has been available. For this reason we developed a device in which the gas sensor array reacts with the gas from vinegar. The parameters of the sensor response to the gas were extracted after acquiring data for the whole response process. In order to assess whether a feature parameter is optimal or not, a new method called the "distinguish index" (DI) is proposed in this paper, so that we can ensure the feature parameter is useful in the later pattern recognition process. Principal component analysis (PCA) and an artificial neural network (ANN) were used to combine the optimal feature parameters. Good separation among the gases from different vinegars is obtained using principal component analysis. The recognition probability of the ANN is 98%. The new method can also be applied to other pattern recognition problems.

  11. Application of image visual characterization and soft feature selection in content-based image retrieval

    Science.gov (United States)

    Jarrah, Kambiz; Kyan, Matthew; Lee, Ivan; Guan, Ling

    2006-01-01

    Fourier descriptors (FFT) and Hu's seven moment invariants (HSMI) are among the most popular shape-based image descriptors and have been used in various applications, such as recognition, indexing, and retrieval. In this work, we propose to use the invariance properties of Hu's seven moment invariants, as shape feature descriptors, for relevance identification in content-based image retrieval (CBIR) systems. The purpose of relevance identification is to find a collection of images that are statistically similar to, or match with, an original query image from within a large visual database. An automatic relevance identification module in the search engine is structured around an unsupervised learning algorithm, the self-organizing tree map (SOTM). In this paper we also proposed a new ranking function in the structure of the SOTM that exponentially ranks the retrieved images based on their similarities with respect to the query image. Furthermore, we propose to extend our studies to optimize the contribution of individual feature descriptors for enhancing the retrieval results. The proposed CBIR system is compatible with the different architectures of other CBIR systems in terms of its ability to adapt to different similarity matching algorithms for relevance identification purposes, whilst offering flexibility of choice for alternative optimization and weight estimation techniques. Experimental results demonstrate the satisfactory performance of the proposed CBIR system.

  12. Feature selection in wind speed prediction systems based on a hybrid coral reefs optimization – Extreme learning machine approach

    International Nuclear Information System (INIS)

    Highlights: • A novel approach for short-term wind speed prediction is presented. • The system is formed by a coral reefs optimization algorithm and an extreme learning machine. • Feature selection is carried out with the CRO to improve the ELM performance. • The method is tested in real wind farm data in USA, for the period 2007–2008. - Abstract: This paper presents a novel approach for short-term wind speed prediction based on a Coral Reefs Optimization algorithm (CRO) and an Extreme Learning Machine (ELM), using meteorological predictive variables from a physical model (the Weather Research and Forecast model, WRF). The approach is based on a Feature Selection Problem (FSP) carried out with the CRO, that must obtain a reduced number of predictive variables out of the total available from the WRF. This set of features will be the input of an ELM, that finally provides the wind speed prediction. The CRO is a novel bio-inspired approach, based on the simulation of reef formation and coral reproduction, able to obtain excellent results in optimization problems. On the other hand, the ELM is a new paradigm in neural networks’ training, that provides a robust and extremely fast training of the network. Together, these algorithms are able to successfully solve this problem of feature selection in short-term wind speed prediction. Experiments in a real wind farm in the USA show the excellent performance of the CRO–ELM approach in this FSP wind speed prediction problem

  13. Selective ensemble modeling load parameters of ball mill based on multi-scale frequency spectral features and sphere criterion

    Science.gov (United States)

    Tang, Jian; Yu, Wen; Chai, Tianyou; Liu, Zhuo; Zhou, Xiaojie

    2016-01-01

    It is difficult to model multi-frequency signals, such as the mechanical vibration and acoustic signals of a wet ball mill in the mineral grinding process. In this paper, these signals are decomposed into multi-scale intrinsic mode functions (IMFs) by the empirical mode decomposition (EMD) technique. A new adaptive multi-scale spectral feature selection approach based on the sphere criterion (SC) is applied to the frequency spectra of these IMFs. The candidate sub-models are constructed by partial least squares (PLS) with the selected features. Finally, the branch and bound based selective ensemble (BBSEN) algorithm is applied to select and combine these ensemble sub-models. This method can easily be extended to regression and classification problems with multi-time-scale signals. We successfully apply this approach to a laboratory-scale ball mill. The shell vibration and acoustic signals are used to model mill load parameters. The experimental results demonstrate that this novel approach is more effective than the other modeling methods based on multi-scale frequency spectral features.

  14. Prominent feature selection of microarray data

    Institute of Scientific and Technical Information of China (English)

    Yihui Liu

    2009-01-01

    For the wavelet transform, a set of orthogonal wavelet bases aims to detect the localized changing features contained in microarray data. In this research, we investigate the performance of the selected wavelet features based on wavelet detail coefficients at the second and third levels. A genetic algorithm is performed to optimize the wavelet detail coefficients and select the best discriminant features. Experiments are carried out on four microarray datasets to evaluate the classification performance. Experimental results prove that wavelet features optimized from detail coefficients efficiently characterize the differences between normal tissues and cancer tissues.

  15. A Modified Feature Selection and Artificial Neural Network-Based Day-Ahead Load Forecasting Model for a Smart Grid

    Directory of Open Access Journals (Sweden)

    Ashfaq Ahmad

    2015-12-01

    Full Text Available In the operation of a smart grid (SG), day-ahead load forecasting (DLF) is an important task. The SG can enhance the management of its conventional and renewable resources with a more accurate DLF model. However, DLF model development is highly challenging due to the non-linear characteristics of load time series in SGs. In the literature, DLF models do exist; however, these models trade off between execution time and forecast accuracy. The newly-proposed DLF model is able to accurately predict the load of the next day with a fair enough execution time. Our proposed model consists of three modules: the data preparation module, the feature selection module and the forecast module. The first module makes the historical load curve compatible with the feature selection module. The second module removes redundant and irrelevant features from the input data. The third module, which consists of an artificial neural network (ANN), predicts future load on the basis of the selected features. Moreover, the forecast module uses a sigmoid function for activation and a multi-variate auto-regressive model for weight updating during the training process. Simulations are conducted in MATLAB to validate the performance of our newly-proposed DLF model in terms of accuracy and execution time. Results show that our proposed modified feature selection and modified ANN, m(FS + ANN), based model for SGs is able to capture the non-linearity(ies) in the historical load curve with 97.11% accuracy. Moreover, this accuracy is achieved at the cost of a fair enough execution time, i.e., we have decreased the average execution time of the existing FS + ANN based model by 38.50%.

  16. Feature Engineering for Drug Name Recognition in Biomedical Texts: Feature Conjunction and Feature Selection

    Directory of Open Access Journals (Sweden)

    Shengyu Liu

    2015-01-01

    Full Text Available Drug name recognition (DNR) is a critical step for drug information extraction. Machine learning-based methods have been widely used for DNR with various types of features such as part-of-speech, word shape, and dictionary features. The features used in current machine learning-based methods are usually singleton features, possibly because of the explosion of features and the large number of noisy features that arise when singleton features are combined into conjunction features. However, singleton features that can only capture one linguistic characteristic of a word are not sufficient to describe the information needed for DNR when multiple characteristics should be considered. In this study, we explore feature conjunction and feature selection for DNR, which have never been reported. We intuitively select 8 types of singleton features and combine them into conjunction features in two ways. Then, chi-square, mutual information, and information gain are used to mine effective features. Experimental results show that feature conjunction and feature selection can improve the performance of the DNR system with a moderate number of features, and our DNR system significantly outperforms the best system in the DDIExtraction 2013 challenge.
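
    A sketch of feature conjunction followed by filter scoring as outlined above: singleton token-level features are paired into conjunction features, one-hot encoded, and then ranked with chi-square and mutual information. The tokens, labels and feature templates are toy assumptions, not the DDIExtraction 2013 data:

        # Build pairwise conjunction features from singleton features, then
        # score the resulting one-hot columns with chi-square and mutual information.
        from itertools import combinations
        from sklearn.feature_extraction import DictVectorizer
        from sklearn.feature_selection import chi2, mutual_info_classif

        samples = [
            {"word": "aspirin", "shape": "lower", "in_dict": "yes"},
            {"word": "take", "shape": "lower", "in_dict": "no"},
            {"word": "Ibuprofen", "shape": "Title", "in_dict": "yes"},
        ]
        labels = [1, 0, 1]          # 1 = drug-name token, 0 = other token

        def add_conjunctions(feats):
            out = dict(feats)
            for (k1, v1), (k2, v2) in combinations(sorted(feats.items()), 2):
                out[f"{k1}&{k2}"] = f"{v1}&{v2}"     # pairwise conjunction feature
            return out

        X = DictVectorizer(sparse=True).fit_transform(
            [add_conjunctions(s) for s in samples])
        chi2_scores, _ = chi2(X, labels)
        mi_scores = mutual_info_classif(X, labels, discrete_features=True)
        print(chi2_scores.round(2), mi_scores.round(2))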

  17. A wrapper-based approach for feature selection and classification of major depressive disorder-bipolar disorders.

    Science.gov (United States)

    Tekin Erguzel, Turker; Tas, Cumhur; Cebi, Merve

    2015-09-01

    Feature selection (FS) and classification are consecutive artificial intelligence (AI) methods used in data analysis, pattern classification, data mining and medical informatics. Besides promising studies on the application of AI methods to health informatics, working with more informative features is crucial in order to contribute to early diagnosis. Being one of the prevalent psychiatric disorders, depressive episodes of bipolar disorder (BD) are often misdiagnosed as major depressive disorder (MDD), leading to suboptimal therapy and poor outcomes. Therefore, discriminating MDD and BD at earlier stages of illness could help to facilitate efficient and specific treatment. In this study, a nature-inspired and novel FS algorithm based on standard Ant Colony Optimization (ACO), called improved ACO (IACO), was used to reduce the number of features by removing irrelevant and redundant data. The selected features were then fed into a support vector machine (SVM), a powerful mathematical tool for data classification, regression, function estimation and modeling processes, in order to classify MDD and BD subjects. The proposed method used coherence values, a promising quantitative electroencephalography (EEG) biomarker, calculated from the alpha, theta and delta frequency bands. The noteworthy performance of the novel IACO-SVM approach showed that it is possible to discriminate 46 BD and 55 MDD subjects using 22 of 48 features with 80.19% overall classification accuracy. The performance of the IACO algorithm was also compared to the performance of standard ACO, genetic algorithm (GA) and particle swarm optimization (PSO) algorithms in terms of classification accuracy and number of selected features. In order to provide an almost unbiased estimate of the classification error, the validation process was performed using a nested cross-validation (CV) procedure. PMID:26164033

  18. Neural network based approach to study the effect of feature selection on document summarization

    Directory of Open Access Journals (Sweden)

    Dipti Y. Sakhare

    2013-06-01

    Full Text Available As the amount of textual information increases, we experience a need for automatic text summarizers. In automatic summarization, a text document or a larger corpus of multiple documents is reduced to a short set of words or a paragraph that conveys the main meaning of the text. In this paper we propose various features for summary extraction. In the proposed approach, during the training phase the feature vector is computed for a set of sentences using the feature extraction technique. After that, the feature vectors and their corresponding feature scores are used to train the neural network optimally. Later, in the testing phase, the input document is subjected to pre-processing and feature extraction techniques. Finally, by making use of the sentence scores, the most important sentences are extracted from the input document. The experimentation is performed with the DUC 2002 dataset. The features to be applied depending upon the size of the document are also analyzed. The comparative results of the proposed approach and of MS Word are also presented here.

  19. Feature Selection in Scientific Applications

    Energy Technology Data Exchange (ETDEWEB)

    Cantu-Paz, E; Newsam, S; Kamath, C

    2004-02-27

    Numerous applications of data mining to scientific data involve the induction of a classification model. In many cases, the collection of data is not performed with this task in mind, and therefore, the data might contain irrelevant or redundant features that affect negatively the accuracy of the induction algorithms. The size and dimensionality of typical scientific data make it difficult to use any available domain information to identify features that discriminate between the classes of interest. Similarly, exploratory data analysis techniques have limitations on the amount and dimensionality of the data that can be effectively processed. In this paper, we describe applications of efficient feature selection methods to data sets from astronomy, plasma physics, and remote sensing. We use variations of recently proposed filter methods as well as traditional wrapper approaches where practical. We discuss the importance of these applications, the general challenges of feature selection in scientific datasets, the strategies for success that were common among our diverse applications, and the lessons learned in solving these problems.

  20. ECG Signal Feature Selection for Emotion Recognition

    Directory of Open Access Journals (Sweden)

    Lichen Xun

    2013-01-01

    Full Text Available This paper aims to study the selection of features based on ECG in emotion recognition. In the process of feature selection, we start from an existing feature selection algorithm, and also pay special attention to some intuitive values on the ECG waveform. Through the use of ANOVA and heuristic search, we picked out different features to distinguish the two emotions of joy and pleasure; we then combine this with pathological analysis of ECG signals from the view of medical experts to discuss the logical correspondence between the ECG waveform and emotion discrimination. Through experiments, using the method in this paper we picked out only five features and reached a 92% accuracy rate in the recognition of joy and pleasure.

  1. Identifying Metabolite and Protein Biomarkers in Unstable Angina In-patients by Feature Selection Based Data Mining Method

    Institute of Scientific and Technical Information of China (English)

    SHI Cheng-he; YANG Yi; WANG Wei; ZHAO Hui-hui; HOU Na; CHEN Jian-xin; SHI Qi; XU Xue-gong; WANG Juan; ZHENG Cheng-long; ZHAO Ling-yan

    2011-01-01

    Unstable angina (UA) is the most dangerous type of Coronary Heart Disease (CHD), causing ever more mortality and morbidity worldwide. Identification of biomarkers for UA at the level of proteomics and metabolomics is a better avenue to understand its inner mechanism. Feature selection based data mining methods are well suited to identifying biomarkers of UA. In this study, we carried out a clinical epidemiological study to collect plasma from UA in-patients and controls. Proteomics and metabolomics data were obtained via two-dimensional difference gel electrophoresis and gas chromatography techniques. We present a novel computational strategy to select as few biomarkers as possible for UA from the two groups of data. First, a decision tree was used to select biomarkers for UA, and 3-fold cross validation was used to evaluate the computational performance of the methods. Alternatively, we combined the independent t-test and a classification based data mining method as well as a backward elimination technique to select as few protein and metabolite biomarkers as possible with the best classification performance. By this method, we selected 6 proteins and 5 metabolites for UA. The novel method presented here provides a better insight into the pathology of the disease.

  2. Trains Trouble Shooting Based on Wavelet Analysis and Joint Selection Feature Classifier

    Directory of Open Access Journals (Sweden)

    Bo Yu

    2014-02-01

    Full Text Available Based on the running status of urban trains, and on the running status of the constraint, air spring and lateral damper components together with the vertical acceleration vibration signals of the vehicle body, and combined with the characteristics of urban train operation, we build an optimized train operation adjustment model and put forward a corresponding estimation method for the train state: the wavelet packet energy moment. First, we analyze the characteristics of the vertical vibration of the vehicle body, perform wavelet packet decomposition of the signals under different conditions and speeds, and reconstruct the frequency-band signals with larger energy. We then introduce the hybrid idea of the particle swarm algorithm, establish a fault diagnosis model and use an improved particle swarm algorithm to solve this model; specific steps for the solution are also given. We then calculate the wavelet packet energy moment features of each frequency band. Changes in the wavelet packet energy moment across the frequency bands reflect changes in the train operation state. Finally, the wavelet packet energy moments of the different frequency bands are composed into a feature vector and fed to support vector machines for fault identification.
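
    A hedged sketch of a wavelet packet energy moment feature: the signal is decomposed with a wavelet packet transform and, for each terminal frequency band, an energy moment is computed as the time-weighted sum of squared coefficients. The wavelet, decomposition level, sampling step and test signal are illustrative assumptions and may differ from the paper's choices:

        # Wavelet packet energy moments: one normalised value per terminal band.
        import numpy as np
        import pywt

        def wp_energy_moments(signal, wavelet="db4", level=3, dt=1.0):
            wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
            moments = []
            for node in wp.get_level(level, order="freq"):   # terminal frequency bands
                c = np.asarray(node.data)
                k = np.arange(1, len(c) + 1)
                moments.append(np.sum(k * dt * c ** 2))      # energy moment of the band
            moments = np.asarray(moments)
            return moments / moments.sum()                   # normalised feature vector

        t = np.linspace(0, 1, 1024)
        vibration = np.sin(2 * np.pi * 50 * t) + 0.3 * np.random.randn(t.size)
        print(wp_energy_moments(vibration))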

  3. Intrusion Detection In Mobile Ad Hoc Networks Using GA Based Feature Selection

    CERN Document Server

    Nallusamy, R; Duraiswamy, K

    2009-01-01

    Mobile ad hoc networking (MANET) has become an exciting and important technology in recent years because of the rapid proliferation of wireless devices. MANETs are highly vulnerable to attacks due to the open medium, dynamically changing network topology and lack of a centralized monitoring point. It is important to search for new architectures and mechanisms to protect wireless networks and mobile computing applications. IDS analyze network activities by means of audit data and use patterns of well-known attacks or normal profiles to detect potential attacks. There are two analysis methods: misuse detection and anomaly detection. Misuse detection is not effective against unknown attacks and, therefore, the anomaly detection method is used. In this approach, the audit data is collected from each mobile node after simulating the attack and compared with the normal behavior of the system. If there is any deviation from normal behavior, then the event is considered an attack. Some of the features of collected audi...

  4. Influence of Feature Selection Methods on Classification Sensitivity Based on the Example of A Study of Polish Voivodship Tourist Attractiveness

    Directory of Open Access Journals (Sweden)

    Bąk Iwona

    2014-07-01

    Full Text Available The purpose of this article is to determine the influence of various methods of selection of diagnostic features on the sensitivity of classification. Three options for feature selection are presented: a parametric feature selection method using the sum (option I) or the median (option II) of the column elements of the correlation coefficient matrix, and the method of a reversed matrix (option III). Efficiency of the groupings was verified by indicators of homogeneity, heterogeneity and the correctness of grouping. In the assessment of grouping efficiency, the approach with the Weber median was used. The problem is illustrated with research into the tourist attractiveness of voivodships in Poland in 2011.
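
    A rough sketch of options I and II described above: each diagnostic feature is scored by the sum or the median of the absolute values of its column in the correlation coefficient matrix, and features whose score exceeds a threshold (features carrying mostly redundant information) are dropped. The data and threshold are invented for illustration; the reversed-matrix variant (option III) is omitted:

        # Parametric feature selection from the correlation matrix: column sums
        # (option I) and column medians (option II) of absolute correlations.
        import numpy as np

        rng = np.random.default_rng(1)
        X = rng.normal(size=(30, 5))
        X[:, 3] = X[:, 0] * 0.9 + rng.normal(scale=0.1, size=30)   # a redundant feature

        R = np.abs(np.corrcoef(X, rowvar=False))
        np.fill_diagonal(R, 0.0)                    # ignore self-correlation

        score_sum = R.sum(axis=0)                   # option I
        score_median = np.median(R, axis=0)         # option II
        threshold = 1.0                             # invented cut-off
        keep = score_sum < threshold
        print("column sums:", score_sum.round(2))
        print("kept features (option I):", np.flatnonzero(keep))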

  5. Arc-Welding Spectroscopic Monitoring based on Feature Selection and Neural Networks

    Directory of Open Access Journals (Sweden)

    Jose M. Lopez-Higuera

    2008-10-01

    Full Text Available A new spectral processing technique designed for application in the on-line detection and classification of arc-welding defects is presented in this paper. A noninvasive fiber sensor embedded within a TIG torch collects the plasma radiation originated during the welding process. The spectral information is then processed in two consecutive stages. A compression algorithm is first applied to the data, allowing real-time analysis. The selected spectral bands are then used to feed a classification algorithm, which will be demonstrated to provide an efficient weld defect detection and classification. The results obtained with the proposed technique are compared to a similar processing scheme presented in previous works, giving rise to an improvement in the performance of the monitoring system.

  6. Feature Selection Approach in Animal Classification

    OpenAIRE

    Y H Sharath Kumar; C D Divya

    2014-01-01

    In this paper, we propose a model for automatic classification of animals using different classifiers: Nearest Neighbour, Probabilistic Neural Network and Symbolic classifiers. Animal images are segmented using maximal region merging segmentation. Gabor features are extracted from the segmented animal images. Discriminative texture features are then selected using different feature selection algorithms such as Sequential Forward Selection, Sequential Floating Forward Selection, Sequential B...

  7. The Importance of Feature Selection in Classification

    Directory of Open Access Journals (Sweden)

    Mrs.K. Moni Sushma Deep

    2014-01-01

    Full Text Available Feature selection is an important technique for classification; it reduces the dimensionality of the feature space and removes redundant, irrelevant, or noisy data. In this paper the features are selected based on ranking methods: (1) Information Gain (IG) attribute evaluation, (2) Gain Ratio (GR) attribute evaluation, (3) Symmetrical Uncertainty (SU) attribute evaluation. This paper evaluates the features derived from the three methods using the supervised learning algorithms K-Nearest Neighbor and Naïve Bayes. The measures used for the classifiers are True Positive rate, False Positive rate and Accuracy, and they are compared between the algorithms in the experimental results. We have taken two datasets, Pima and Wine, from the UCI Repository.
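
    A sketch of the three ranking measures named above for discrete attributes, written directly from their entropy definitions: IG = H(Y) - H(Y|X), GR = IG / H(X), and SU = 2*IG / (H(X) + H(Y)). The toy arrays are invented; real use on Pima or Wine would first discretise the continuous attributes:

        # Entropy-based ranking measures for discrete features.
        import numpy as np

        def entropy(labels):
            _, counts = np.unique(labels, return_counts=True)
            p = counts / counts.sum()
            return -np.sum(p * np.log2(p))

        def info_gain(x, y):
            cond = 0.0
            for v in np.unique(x):
                mask = x == v
                cond += mask.mean() * entropy(y[mask])   # H(Y|X)
            return entropy(y) - cond

        def gain_ratio(x, y):
            return info_gain(x, y) / max(entropy(x), 1e-12)

        def symmetrical_uncertainty(x, y):
            return 2 * info_gain(x, y) / max(entropy(x) + entropy(y), 1e-12)

        x1 = np.array([0, 0, 1, 1, 0, 1, 1, 0])     # informative feature
        x2 = np.array([0, 1, 0, 1, 0, 1, 0, 1])     # uninformative feature
        y = np.array([0, 0, 1, 1, 0, 1, 1, 0])
        for name, x in (("x1", x1), ("x2", x2)):
            print(name, round(info_gain(x, y), 3), round(gain_ratio(x, y), 3),
                  round(symmetrical_uncertainty(x, y), 3))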

  8. Wavelength Selection of Hyperspectral LIDAR Based on Feature Weighting for Estimation of Leaf Nitrogen Content in Rice

    Science.gov (United States)

    Du, Lin; Shi, Shuo; Gong, Wei; Yang, Jian; Sun, Jia; Mao, Feiyue

    2016-06-01

    Hyperspectral LiDAR (HSL) is a novel tool in the field of active remote sensing, which has been widely used in many domains because of its advantageous ability to acquire spectra. Especially in the precise monitoring of nitrogen in green plants, HSL plays an indispensable role. The existing HSL system used for nitrogen status monitoring has a multi-channel detector, which can improve the spectral resolution and receiving range, but may result in data redundancy, difficulty in system integration and high cost as well. Thus, it is necessary and urgent to pick out the nitrogen-sensitive feature wavelengths within the spectral range. The present study, aiming to solve this problem, assigns a feature weighting to each centre wavelength of the HSL system by using matrix coefficient analysis and a divergence threshold. The feature weighting is a criterion for amending the centre wavelengths of the detector to accommodate different purposes, especially the estimation of leaf nitrogen content (LNC) in rice. In this way, the wavelengths highly correlated with LNC can be ranked in descending order and used sequentially to estimate rice LNC. In this paper, an HSL system based on a wide-spectrum emission and a 32-channel detector is used to collect the reflectance spectra of rice leaves. The spectra collected by the HSL cover a range of 538 nm - 910 nm with a resolution of 12 nm; these 32 wavelengths are strongly absorbed by chlorophyll in green plants within this range. The relationship between rice LNC and the reflectance-based spectra is modeled using partial least squares (PLS) and support vector machines (SVMs) based on calibration and validation datasets, respectively. The results indicate that I) the wavelength selection method of HSL based on feature weighting is effective for choosing nitrogen-sensitive wavelengths, and can also be co-adapted with the hardware of the HSL system in a friendly manner; II) the chosen wavelengths have a high correlation with rice LNC, which can be

  9. Pneumonia identification using statistical feature selection

    Science.gov (United States)

    Xia, Fei; Vanderwende, Lucy; Wurfel, Mark M; Yetisgen-Yildiz, Meliha

    2012-01-01

    Objective This paper describes a natural language processing system for the task of pneumonia identification. Based on the information extracted from the narrative reports associated with a patient, the task is to identify whether or not the patient is positive for pneumonia. Design A binary classifier was employed to identify pneumonia from a dataset of multiple types of clinical notes created for 426 patients during their stay in the intensive care unit. For this purpose, three types of features were considered: (1) word n-grams, (2) Unified Medical Language System (UMLS) concepts, and (3) assertion values associated with pneumonia expressions. System performance was greatly increased by a feature selection approach which uses statistical significance testing to rank features based on their association with the two categories of pneumonia identification. Results Besides testing our system on the entire cohort of 426 patients (unrestricted dataset), we also used a smaller subset of 236 patients (restricted dataset). The performance of the system was compared with the results of a baseline previously proposed for these two datasets. The best results achieved by the system (85.71 and 81.67 F1-measure) are significantly better than the baseline results (50.70 and 49.10 F1-measure) on the restricted and unrestricted datasets, respectively. Conclusion Using a statistical feature selection approach that allows the feature extractor to consider only the most informative features from the feature space significantly improves the performance over a baseline that uses all the features from the same feature space. Extracting the assertion value for pneumonia expressions further improves the system performance. PMID:22539080
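
    A minimal sketch of the same general strategy is shown below: rank word n-gram features by a statistical association test and keep only the top-ranked ones before classification. The chi-square test, the toy notes and the logistic-regression classifier are assumptions for illustration; the paper's exact statistic, feature space and model may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the ICU narrative reports and their pneumonia labels.
notes = [
    "bilateral infiltrates with consolidation, suspect pneumonia",
    "clear lung fields, no acute cardiopulmonary process",
    "new right lower lobe opacity consistent with pneumonia",
    "chest radiograph unremarkable",
]
labels = [1, 0, 1, 0]                           # 1 = positive for pneumonia

pipeline = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),        # word uni- and bi-grams
    SelectKBest(chi2, k=20),                    # keep the 20 most class-associated n-grams
    LogisticRegression(max_iter=1000),
)
pipeline.fit(notes, labels)
print(pipeline.predict(["patchy opacity suggestive of pneumonia"]))
```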

  10. Clustering based gene expression feature selection method: A computational approach to enrich the classifier efficiency of differentially expressed genes

    KAUST Repository

    Abusamra, Heba

    2016-07-20

    The high-dimension, low-sample-size nature of gene expression data makes the classification task challenging, so feature (gene) selection becomes an apparent need. Selecting meaningful and relevant genes for the classifier not only decreases the computational time and cost, but also improves classification performance. Among the different approaches to feature selection, however, most methods suffer from several problems such as lack of robustness, validation issues, etc. Here, we present a new feature selection technique that takes advantage of clustering both samples and genes. Materials and methods: We used the leukemia gene expression dataset [1]. The effectiveness of the selected features was evaluated by four different classification methods: support vector machines, k-nearest neighbor, random forest, and linear discriminant analysis. The method evaluates the importance and relevance of each gene cluster by summing the expression levels of the genes belonging to that cluster. A gene cluster is considered important if it satisfies conditions depending on thresholds and percentages; otherwise it is eliminated. Results: Initial analysis identified 7120 differentially expressed genes of leukemia (Fig. 15a); after applying our feature selection methodology we end up with 1117 genes discriminating the two classes of leukemia (Fig. 15b). Further applying the same method with a more stringent higher positive and lower negative threshold condition, the number was reduced to 58 genes, which were tested to evaluate the effectiveness of the method (Fig. 15c). The results of the four classification methods are summarized in Table 11. Conclusions: The feature selection method gave good results with minimum classification error. Our heat-map result shows a distinct pattern of the refined genes discriminating between the two classes of leukemia.
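
    A toy sketch of the cluster-then-score idea might look like the following; the k-means clustering, the percentile threshold and the synthetic expression matrix are all assumptions and do not reproduce the paper's exact conditions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
expression = rng.normal(size=(72, 500))      # samples x genes (toy dimensions)

n_clusters = 20
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
gene_labels = km.fit_predict(expression.T)   # cluster the genes, not the samples

# Score each gene cluster by the summed expression of its members and keep
# only clusters whose score clears a threshold (illustrative 60th percentile).
cluster_scores = [expression[:, gene_labels == c].sum() for c in range(n_clusters)]
threshold = np.percentile(cluster_scores, 60)

selected_genes = []
for c in range(n_clusters):
    members = np.where(gene_labels == c)[0]
    if expression[:, members].sum() >= threshold:
        selected_genes.extend(members.tolist())

print(f"kept {len(selected_genes)} of {expression.shape[1]} genes")
```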

  11. Optimized features selection for gender classification using optimization algorithms

    OpenAIRE

    KHAN, Sajid Ali; Nazir, Muhammad; RIAZ, Naveed

    2013-01-01

    Optimized feature selection is an important task in gender classification. The optimized features not only reduce the dimensions, but also reduce the error rate. In this paper, we have proposed a technique for the extraction of facial features using both appearance-based and geometric-based feature extraction methods. The extracted features are then optimized using particle swarm optimization (PSO) and the bee algorithm. The geometric-based features are optimized by PSO with ensem...

  12. A New Approach of Feature Selection for Text Categorization

    Institute of Scientific and Technical Information of China (English)

    CUI Zifeng; XU Baowen; ZHANG Weifeng; XU Junling

    2006-01-01

    This paper proposes a new approach to feature selection for text categorization based on an independence measure between features. A fundamental hypothesis widely used in probabilistic models for text categorization (TC), that occurrences of terms in documents are independent of each other, is discussed. However, this basic hypothesis is incomplete with respect to the independence of the feature set. From the viewpoint of feature selection, a new independence measure between features is designed, from which a feature selection algorithm is derived to obtain a feature subset. The selected subset is highly relevant to the category and strongly independent between features, and thus satisfies the basic hypothesis to the maximum degree. Compared with traditional feature selection methods in TC, which only take relevance into account, the feature subset selected by our method performs better in experiments on the 20 Newsgroups benchmark dataset.

  13. Feature Selection Approaches In Antibody Display

    OpenAIRE

    Polaka, Inese

    2015-01-01

    Molecular diagnostics tools provide specific data that have high dimensionality due to many factors analyzed in one experiment and few records due to high costs of the experiments. This study addresses the problem of dimensionality in melanoma patient antibody display data by applying data mining feature selection techniques. The article describes feature selection ranking and subset selection approaches and analyzes the performance of various methods evaluating selected feature subsets using...

  14. Detecting Local Manifold Structure for Unsupervised Feature Selection

    Institute of Scientific and Technical Information of China (English)

    FENG Ding-Cheng; CHEN Feng; XU Wen-Li

    2014-01-01

    Unsupervised feature selection is fundamental in statistical pattern recognition, and has drawn persistent attention in the past several decades. Recently, much work has shown that feature selection can be formulated as nonlinear dimensionality reduction with discrete constraints. This line of research emphasizes utilizing the manifold learning techniques, where feature selection and learning can be studied based on the manifold assumption in data distribution. Many existing feature selection methods such as Laplacian score, SPEC (spectrum decomposition of graph Laplacian), TR (trace ratio) criterion, MSFS (multi-cluster feature selection) and EVSC (eigenvalue sensitive criterion) apply the basic properties of graph Laplacian, and select the optimal feature subsets which best preserve the manifold structure defined on the graph Laplacian. In this paper, we propose a new feature selection perspective from locally linear embedding (LLE), which is another popular manifold learning method. The main difficulty of using LLE for feature selection is that its optimization involves quadratic programming and eigenvalue decomposition, both of which are continuous procedures and different from discrete feature selection. We prove that the LLE objective can be decomposed with respect to data dimensionalities in the subset selection problem, which also facilitates constructing better coordinates from data using the principal component analysis (PCA) technique. Based on these results, we propose a novel unsupervised feature selection algorithm, called locally linear selection (LLS), to select a feature subset representing the underlying data manifold. The local relationship among samples is computed from the LLE formulation, which is then used to estimate the contribution of each individual feature to the underlying manifold structure. These contributions, represented as LLS scores, are ranked and selected as the candidate solution to feature selection. We further develop a

  15. Adversarial Feature Selection Against Evasion Attacks.

    Science.gov (United States)

    Zhang, Fei; Chan, Patrick P K; Biggio, Battista; Yeung, Daniel S; Roli, Fabio

    2016-03-01

    Pattern recognition and machine learning techniques have been increasingly adopted in adversarial settings such as spam, intrusion, and malware detection, although their security against well-crafted attacks that aim to evade detection by manipulating data at test time has not yet been thoroughly assessed. While previous work has been mainly focused on devising adversary-aware classification algorithms to counter evasion attempts, only few authors have considered the impact of using reduced feature sets on classifier security against the same attacks. An interesting, preliminary result is that classifier security to evasion may be even worsened by the application of feature selection. In this paper, we provide a more detailed investigation of this aspect, shedding some light on the security properties of feature selection against evasion attacks. Inspired by previous work on adversary-aware classifiers, we propose a novel adversary-aware feature selection model that can improve classifier security against evasion attacks, by incorporating specific assumptions on the adversary's data manipulation strategy. We focus on an efficient, wrapper-based implementation of our approach, and experimentally validate its soundness on different application examples, including spam and malware detection. PMID:25910268

  16. Filter-based feature selection and support vector machine for false positive reduction in computer-aided mass detection in mammograms

    Science.gov (United States)

    Nguyen, V. D.; Nguyen, D. T.; Nguyen, T. D.; Phan, V. A.; Truong, Q. D.

    2015-02-01

    In this paper, a method for reducing false positives in computer-aided mass detection in screening mammograms is proposed. A set of 32 features, including First Order Statistics (FOS) features, Gray-Level Co-occurrence Matrix (GLCM) features, Block Difference Inverse Probability (BDIP) features, and Block Variation of Local Correlation coefficients (BVLC), are extracted from detected Regions-Of-Interest (ROIs). An optimal subset of 8 features is selected from the full feature set by means of a filter-based Sequential Backward Selection (SBS). Then, a Support Vector Machine (SVM) is utilized to classify the ROIs into massive regions or normal regions. The method's performance is evaluated using the area under the Receiver Operating Characteristic (ROC) curve (AUC or AZ). On a dataset consisting of about 2700 ROIs detected from the mini-MIAS database of mammograms, the proposed method achieves AZ=0.938.
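
    A hedged sketch of this pipeline shape is given below, using scikit-learn's sequential selector (wrapper-style, standing in for the paper's filter-based SBS) to keep 8 of 32 features and an RBF SVM scored by ROC AUC; the random feature matrix is a placeholder for the FOS/GLCM/BDIP/BVLC descriptors.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 32))                      # 32 statistical/texture features per ROI
y = (X[:, :4].sum(axis=1) + rng.normal(scale=0.5, size=300) > 0).astype(int)  # toy mass label

svm = SVC(kernel="rbf")
# Backward elimination from 32 down to 8 features, scored by ROC AUC.
sbs = SequentialFeatureSelector(svm, n_features_to_select=8,
                                direction="backward", scoring="roc_auc", cv=3)
sbs.fit(X, y)
X_sel = sbs.transform(X)

auc = cross_val_score(svm, X_sel, y, cv=5, scoring="roc_auc").mean()
print("selected features:", np.flatnonzero(sbs.get_support()), "AUC:", round(auc, 3))
```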

  17. Feature dimensionality reduction for myoelectric pattern recognition: a comparison study of feature selection and feature projection methods.

    Science.gov (United States)

    Liu, Jie

    2014-12-01

    This study investigates the effect of the feature dimensionality reduction strategies on the classification of surface electromyography (EMG) signals toward developing a practical myoelectric control system. Two dimensionality reduction strategies, feature selection and feature projection, were tested on both EMG feature sets, respectively. A feature selection based myoelectric pattern recognition system was introduced to select the features by eliminating the redundant features of EMG recordings instead of directly choosing a subset of EMG channels. The Markov random field (MRF) method and a forward orthogonal search algorithm were employed to evaluate the contribution of each individual feature to the classification, respectively. Our results from 15 healthy subjects indicate that, with a feature selection analysis, independent of the type of feature set, across all subjects high overall accuracies can be achieved in classification of seven different forearm motions with a small number of top ranked original EMG features obtained from the forearm muscles (average overall classification accuracy >95% with 12 selected EMG features). Compared to various feature dimensionality reduction techniques in myoelectric pattern recognition, the proposed filter-based feature selection approach is independent of the type of classification algorithms and features, which can effectively reduce the redundant information not only across different channels, but also cross different features in the same channel. This may enable robust EMG feature dimensionality reduction without needing to change ongoing, practical use of classification algorithms, an important step toward clinical utility.

  18. Novel Feature Selection by Differential Evolution Algorithm

    Directory of Open Access Journals (Sweden)

    Ali Ghareaghaji

    2013-11-01

    Full Text Available Iris scan biometrics employs the unique characteristics and features of the human iris in order to verify the identity of an individual. In today's world, where terrorist attacks are on the rise, the employment of infallible security systems is a must, which makes iris recognition systems indispensable in emerging security authentication. The objective function is minimized using the Differential Evolution (DE) algorithm, where the population vector is encoded using Binary Encoded Decimal to avoid the floating-point optimization problem. An automatic clustering of the possible values of the Lagrangian multiplier provides a detailed insight into the selected features during the proposed DE-based optimization process. The classification accuracy of a Support Vector Machine (SVM) is used to measure the performance of the selected features. The proposed algorithm outperforms the existing DE-based approaches when tested on the IRIS, Wine, Wisconsin Breast Cancer, Sonar and Ionosphere datasets. The same algorithm, when applied to gait-based people identification using skeleton data points obtained from a Microsoft Kinect sensor, exceeds the previously reported accuracies.
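
    The sketch below illustrates differential-evolution-driven feature selection with SVM accuracy as the fitness, using a simple thresholded real-valued encoding; the paper's binary-encoded-decimal population, Lagrangian-multiplier clustering and iris features are not reproduced, and the Wine data and DE parameters are assumptions.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
rng = np.random.default_rng(0)
n_features, pop_size, n_gen, F, CR = X.shape[1], 20, 30, 0.8, 0.9

def fitness(mask):
    # Cross-validated SVM accuracy on the selected columns.
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(), X[:, mask], y, cv=3).mean()

# Real-valued population; a feature is "selected" when its gene exceeds 0.5.
pop = rng.random((pop_size, n_features))
fit = np.array([fitness(ind > 0.5) for ind in pop])

for _ in range(n_gen):
    for i in range(pop_size):
        a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)]
        mutant = np.clip(a + F * (b - c), 0, 1)       # DE/rand/1 mutation
        cross = rng.random(n_features) < CR           # binomial crossover
        trial = np.where(cross, mutant, pop[i])
        f_trial = fitness(trial > 0.5)
        if f_trial >= fit[i]:                         # greedy replacement
            pop[i], fit[i] = trial, f_trial

best = pop[fit.argmax()] > 0.5
print("selected features:", np.flatnonzero(best), "accuracy:", round(fit.max(), 3))
```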

  19. Selective Audiovisual Semantic Integration Enabled by Feature-Selective Attention

    OpenAIRE

    Yuanqing Li; Jinyi Long; Biao Huang; Tianyou Yu; Wei Wu; Peijun Li; Fang Fang; Pei Sun

    2016-01-01

    An audiovisual object may contain multiple semantic features, such as the gender and emotional features of the speaker. Feature-selective attention and audiovisual semantic integration are two brain functions involved in the recognition of audiovisual objects. Humans often selectively attend to one or several features while ignoring the other features of an audiovisual object. Meanwhile, the human brain integrates semantic information from the visual and auditory modalities. However, how thes...

  20. Fish recognition based on the combination between robust feature selection, image segmentation and geometrical parameter techniques using Artificial Neural Network and Decision Tree

    CERN Document Server

    Alsmadi, Mutasem Khalil Sari; Noah, Shahrul Azman; Almarashdah, Ibrahim

    2009-01-01

    We present in this paper a novel fish classification methodology based on a combination of robust feature selection, image segmentation and geometrical parameter techniques using an Artificial Neural Network and a Decision Tree. Unlike existing works on fish classification, which propose descriptors without analyzing their individual impact on the whole classification task and without combining feature selection, image segmentation and geometrical parameters, we propose a general feature extraction scheme using robust feature selection, image segmentation and geometrical parameters, together with corresponding weights that should be used as a priori information by the classifier. In this sense, instead of studying techniques for improving the classifier structure itself, we consider it as a black box and focus our research on determining which input information must bring about robust fish discrimination. The main contribution of this paper is enhanced recognition and classification of fishes...

  1. Feature Selection Based on Adaptive Fuzzy Membership Functions

    Institute of Scientific and Technical Information of China (English)

    谢衍涛; 桑农; 张天序

    2006-01-01

    Neuro-fuzzy (NF) networks are adaptive fuzzy inference systems (FIS) and have been applied to feature selection by some researchers. However, their rule number grows exponentially as the data dimension increases. On the other hand, feature selection algorithms with artificial neural networks (ANN) usually require normalization of the input data, which will probably change some characteristics of the original data that are important for classification. To overcome the problems mentioned above, this paper combines the fuzzification layer of the neuro-fuzzy system with the multi-layer perceptron (MLP) to form a new artificial neural network. Furthermore, a fuzzification strategy and a feature measurement based on membership space are proposed for feature selection. Finally, experiments with both natural and artificial data are carried out for comparison with other methods, and the results confirm the validity of the algorithm.

  2. Forward order feature selection algorithm based on mutual information

    Institute of Scientific and Technical Information of China (English)

    袁帅; 杨宏晖; 申昇

    2014-01-01

    Feature selection is important in classifying underwater acoustic targets. In this paper, an algorithm of forward order feature selection based on mutual information (the SFFSMI algorithm) is proposed. This algorithm sorts the classifying abilities of all features by calculating the mutual information between different features and the mutual information between features and classes. The features of 4 classes of underwater targets are extracted and used for feature selection and classification experiments. Experimental results show that this algorithm can choose effective feature subsets with a high correct identification rate, and it runs fast with high stability.
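
    A small sketch of mutual-information-driven sequential forward selection in this spirit is shown below: at each step the feature with the highest class relevance minus its average mutual information with the already selected features is added. The synthetic data, the binning and the subset size are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

# Synthetic stand-in for measured features of 4 target classes.
X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
# Quartile-binned copy for feature-feature mutual information.
X_binned = np.array([np.digitize(c, np.quantile(c, [0.25, 0.5, 0.75])) for c in X.T]).T

relevance = mutual_info_classif(X, y, random_state=0)   # MI between feature and class
selected, remaining = [], list(range(X.shape[1]))
k = 8  # illustrative subset size

while len(selected) < k:
    def score(j):
        if not selected:
            return relevance[j]
        redundancy = np.mean([mutual_info_score(X_binned[:, j], X_binned[:, s])
                              for s in selected])
        return relevance[j] - redundancy
    best = max(remaining, key=score)
    selected.append(best)
    remaining.remove(best)

print("selection order:", selected)
```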

  3. NEW FEATURE SELECTION METHOD IN MACHINE FAULT DIAGNOSIS

    Institute of Scientific and Technical Information of China (English)

    Wang Xinfeng; Qiu Jing; Liu Guanjun

    2005-01-01

    Aiming at the deficiencies of the filter and wrapper feature selection methods, a new method based on a composite of the filter and wrapper methods is proposed. First, the method filters the original features to form a feature subset which can meet the required classification correctness rate; then a wrapper feature selection method is applied to select the optimal feature subset. A successful technique for solving optimization problems is given by the genetic algorithm (GA), and GA is applied to the problem of optimal feature selection. The composite method saves several times the computing time of the wrapper method while holding the classification accuracy, both in data simulation and in an experiment on bearing fault feature selection. So this method possesses excellent optimization properties, saves selection time, and has the characteristics of high accuracy and high efficiency.

  4. Selective Audiovisual Semantic Integration Enabled by Feature-Selective Attention.

    Science.gov (United States)

    Li, Yuanqing; Long, Jinyi; Huang, Biao; Yu, Tianyou; Wu, Wei; Li, Peijun; Fang, Fang; Sun, Pei

    2016-01-13

    An audiovisual object may contain multiple semantic features, such as the gender and emotional features of the speaker. Feature-selective attention and audiovisual semantic integration are two brain functions involved in the recognition of audiovisual objects. Humans often selectively attend to one or several features while ignoring the other features of an audiovisual object. Meanwhile, the human brain integrates semantic information from the visual and auditory modalities. However, how these two brain functions correlate with each other remains to be elucidated. In this functional magnetic resonance imaging (fMRI) study, we explored the neural mechanism by which feature-selective attention modulates audiovisual semantic integration. During the fMRI experiment, the subjects were presented with visual-only, auditory-only, or audiovisual dynamical facial stimuli and performed several feature-selective attention tasks. Our results revealed that a distribution of areas, including heteromodal areas and brain areas encoding attended features, may be involved in audiovisual semantic integration. Through feature-selective attention, the human brain may selectively integrate audiovisual semantic information from attended features by enhancing functional connectivity and thus regulating information flows from heteromodal areas to brain areas encoding the attended features.

  6. Stable Feature Selection for Biomarker Discovery

    CERN Document Server

    He, Zengyou

    2010-01-01

    Feature selection techniques have been used as the workhorse in biomarker discovery applications for a long time. Surprisingly, the stability of feature selection with respect to sampling variations has long been under-considered. Only recently has this issue received more and more attention. In this article, we review existing stable feature selection methods for biomarker discovery using a generic hierarchical framework. We have two objectives: (1) providing an overview of this new yet fast-growing topic for convenient reference; (2) categorizing existing methods under an expandable framework for future research and development.

  7. Identification of apolipoprotein using feature selection technique.

    Science.gov (United States)

    Tang, Hua; Zou, Ping; Zhang, Chunmei; Chen, Rong; Chen, Wei; Lin, Hao

    2016-01-01

    Apolipoprotein is a kind of protein which can transport lipids through the lymphatic and circulatory systems. Abnormal expression levels of apolipoprotein often cause angiocardiopathy. Thus, correct recognition of apolipoproteins from proteomic data is very crucial to the comprehension of the cardiovascular system and to drug design. This study aims to develop a computational model to predict apolipoproteins. In the model, apolipoproteins and non-apolipoproteins were collected to form the benchmark dataset. On the basis of this dataset, we extracted g-gap dipeptide composition information from the residue sequences to formulate protein samples. To exclude redundant information or noise, an analysis of variance (ANOVA)-based feature selection technique was proposed to find the best feature subset. The support vector machine (SVM) was selected as the discrimination algorithm. Results show that a sensitivity of 96.2% and a specificity of 99.3% were achieved in five-fold cross-validation. These findings open new perspectives for improving apolipoprotein prediction by considering the specific dipeptides. We expect that these findings will help to improve drug development against angiocardiopathy. PMID:27443605
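
    A minimal sketch of the pipeline shape (g-gap dipeptide composition, ANOVA F-score ranking, SVM with cross-validation) is given below; the toy sequences, labels, g value and number of kept dipeptides are placeholders rather than the paper's benchmark data.

```python
import itertools
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in itertools.product(AMINO_ACIDS, repeat=2)]

def g_gap_composition(seq, g=1):
    """Frequency of residue pairs separated by g positions (400-dim vector)."""
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(len(seq) - g - 1):
        counts[seq[i] + seq[i + g + 1]] += 1
    total = max(sum(counts.values()), 1)
    return np.array([counts[p] / total for p in PAIRS])

# Toy random sequences standing in for apolipoprotein / non-apolipoprotein sets.
rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list(AMINO_ACIDS), size=200)) for _ in range(40)]
labels = np.array([0, 1] * 20)

X = np.vstack([g_gap_composition(s) for s in seqs])
model = make_pipeline(SelectKBest(f_classif, k=50),   # ANOVA-ranked dipeptides
                      SVC(kernel="rbf"))
print("5-fold accuracy:", round(cross_val_score(model, X, labels, cv=5).mean(), 3))
```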

  8. Feature Selection as a Multiagent Coordination Problem

    OpenAIRE

    Malialis, Kleanthis; Wang, Jun; Brooks, Gary; Frangou, George

    2016-01-01

    Datasets with hundreds to tens of thousands of features are the new norm. Feature selection constitutes a central problem in machine learning, where the aim is to derive a representative set of features from which to construct a classification (or prediction) model for a specific task. Our experimental study involves microarray gene expression datasets; these are high-dimensional and noisy datasets that contain genetic data typically used for distinguishing between benign or malicious tissues or ...

  9. An Evident Theoretic Feature Selection Approach for Text Categorization

    Directory of Open Access Journals (Sweden)

    UMARSATHIC ALI

    2012-06-01

    Full Text Available With the exponential growth of textual documents available in unstructured form on the Internet, feature selection approaches are increasingly significant for the preprocessing of textual documents for automatic text categorization. Feature selection, which focuses on identifying relevant and informative features, can help reduce the computational cost of processing voluminous amounts of data as well as increase the effectiveness of the subsequent text categorization tasks. In this paper, we propose a new evident theoretic feature selection approach for text categorization based on the transferable belief model (TBM). An evaluation of the performance of the proposed evident theoretic feature selection approach on a benchmark dataset is also presented. We empirically show the effectiveness of our approach in outperforming traditional feature selection methods on two standard benchmark datasets.

  10. Hadoop neural network for parallel and distributed feature selection

    OpenAIRE

    Hodge, Victoria Jane; O'Keefe, Simon; Austin, Jim

    2016-01-01

    In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of five feature selection algorithms...

  11. Feature Selection Criteria for Real Time EKF-SLAM Algorithm

    Directory of Open Access Journals (Sweden)

    Fernando Auat Cheein

    2010-02-01

    Full Text Available This paper presents a selection procedure for environment features for the correction stage of a SLAM (Simultaneous Localization and Mapping) algorithm based on an Extended Kalman Filter (EKF). This approach decreases the computational time of the correction stage, which allows for real- and constant-time implementations of the SLAM. The selection procedure consists in choosing the features to which the SLAM system state covariance is most sensitive. The entire system is implemented on a mobile robot equipped with a laser range sensor. The features extracted from the environment correspond to lines and corners. Experimental results of the real-time SLAM algorithm and an analysis of the processing time consumed by the SLAM with the proposed feature selection procedure are shown. A comparison between the proposed feature selection approach and the classical sequential EKF-SLAM, along with an entropy-based feature selection approach, is also performed.

  12. Integrated Clustering and Feature Selection Scheme for Text Documents.

    Directory of Open Access Journals (Sweden)

    M. Thangamani

    2010-01-01

    Full Text Available Problem statement: Text documents are unstructured databases that contain raw data collections. Clustering techniques are used to group text documents according to their similarity. Approach: Feature selection techniques were used to improve the efficiency and accuracy of the clustering process. Feature selection was done by eliminating redundant and irrelevant items from the text document contents. Statistical methods were used in the text clustering and feature selection algorithm. The cube size is very high and the accuracy is low in the term-based text clustering and feature selection method. A semantic clustering and feature selection method was proposed to improve the clustering and feature selection mechanism with semantic relations of the text documents. The proposed system was designed to identify the semantic relations using an ontology, which represents the term and concept relationships. Results: The synonym, meronym and hypernym relationships were represented in the ontology. Concept weights were estimated with reference to the ontology and used for the clustering process. The system was implemented in two ways: term clustering with feature selection and semantic clustering with feature selection. Conclusion: The performance analysis was carried out with the term clustering and semantic clustering methods, and the accuracy and efficiency factors were analyzed.

  13. Optimized Image Steganalysis through Feature Selection using MBEGA

    CERN Document Server

    Geetha, S

    2010-01-01

    Feature based steganalysis, an emerging branch in information forensics, aims at identifying the presence of a covert communication by employing the statistical features of the cover and stego image as clues/evidences. Due to the large volumes of security audit data as well as the complex and dynamic properties of steganogram behaviours, optimizing the performance of steganalysers becomes an important open problem. This paper is focused on fine-tuning the performance of six promising steganalysers in this field through feature selection. We propose to employ the Markov Blanket-Embedded Genetic Algorithm (MBEGA) for the stego-sensitive feature selection process. In particular, the embedded Markov blanket based memetic operators add or delete features (or genes) from a genetic algorithm (GA) solution so as to quickly improve the solution and fine-tune the search. Empirical results suggest that MBEGA is effective and efficient in eliminating irrelevant and redundant features based on both Markov blanket and predictive pow...

  14. Features Based Text Similarity Detection

    CERN Document Server

    Kent, Chow Kok

    2010-01-01

    As the Internet helps us cross cultural borders by providing different information, the plagiarism issue is bound to arise. As a result, plagiarism detection becomes more demanding in overcoming this issue. Different plagiarism detection tools have been developed based on various detection techniques. Nowadays, the fingerprint matching technique plays an important role in those detection tools. However, in handling some large-content articles, there are some weaknesses in the fingerprint matching technique, especially in space and time consumption. In this paper, we propose a new approach to detect plagiarism which integrates the use of the fingerprint matching technique with four key features to assist in the detection process. These proposed features are capable of choosing the main points or key sentences in the articles to be compared. Those selected sentences will undergo the fingerprint matching process in order to detect the similarity between the sentences. Hence, time and space usage for the comparison process is r...

  15. A Hybrid Feature Selection Approach for Arabic Documents Classification

    NARCIS (Netherlands)

    Habib, Mena B.; Fayed, Zaki T.; Gharib, Tarek F.; Sarhan, Ahmed A. E.; Salem, Abdel-Badeeh M.

    2006-01-01

    Text Categorization (classification) is the process of classifying documents into a predefined set of categories based on their content. Text categorization algorithms usually represent documents as bags of words and consequently have to deal with huge number of features. Feature selection tries to

  16. Hadoop neural network for parallel and distributed feature selection.

    Science.gov (United States)

    Hodge, Victoria J; O'Keefe, Simon; Austin, Jim

    2016-06-01

    In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel) allowing multiple feature selectors to be compared. We identify commonalities among the five feature selectors. All can be processed in the framework using a single representation and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector and the actual features to select to be identified for large and high dimensional data sets through exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop. PMID:26403824

  18. Feature and Region Selection for Visual Learning.

    Science.gov (United States)

    Zhao, Ji; Wang, Liantao; Cabral, Ricardo; De la Torre, Fernando

    2016-03-01

    Visual learning problems, such as object classification and action recognition, are typically approached using extensions of the popular bag-of-words (BoWs) model. Despite its great success, it is unclear what visual features the BoW model is learning. Which regions in the image or video are used to discriminate among classes? Which are the most discriminative visual words? Answering these questions is fundamental for understanding existing BoW models and inspiring better models for visual recognition. To answer these questions, this paper presents a method for feature selection and region selection in the visual BoW model. This allows for an intermediate visualization of the features and regions that are important for visual learning. The main idea is to assign latent weights to the features or regions, and jointly optimize these latent variables with the parameters of a classifier (e.g., support vector machine). There are four main benefits of our approach: 1) our approach accommodates non-linear additive kernels, such as the popular χ(2) and intersection kernel; 2) our approach is able to handle both regions in images and spatio-temporal regions in videos in a unified way; 3) the feature selection problem is convex, and both problems can be solved using a scalable reduced gradient method; and 4) we point out strong connections with multiple kernel learning and multiple instance learning approaches. Experimental results in the PASCAL VOC 2007, MSR Action Dataset II and YouTube illustrate the benefits of our approach. PMID:26742135

  19. Coevolution of active vision and feature selection.

    Science.gov (United States)

    Floreano, Dario; Kato, Toshifumi; Marocco, Davide; Sauser, Eric

    2004-03-01

    We show that complex visual tasks, such as position- and size-invariant shape recognition and navigation in the environment, can be tackled with simple architectures generated by a coevolutionary process of active vision and feature selection. Behavioral machines equipped with primitive vision systems and direct pathways between visual and motor neurons are evolved while they freely interact with their environments. We describe the application of this methodology in three sets of experiments, namely, shape discrimination, car driving, and robot navigation. We show that these systems develop sensitivity to a number of oriented, retinotopic visual features (oriented edges, corners, height) and a behavioral repertoire to locate, bring, and keep these features in sensitive regions of the vision system, resembling strategies observed in simple insects.

  20. Memetic Feature Selection: Benchmarking Hybridization Schemata

    Science.gov (United States)

    Esseghir, M. A.; Goncalves, Gilles; Slimani, Yahya

    Feature subset selection is an important preprocessing and guiding step for classification. The combinatorial nature of the problem has made the use of evolutionary and heuristic methods indispensable for the exploration of high-dimensional problem search spaces. In this paper, a set of hybridization schemata of a genetic algorithm with local search is investigated through a memetic framework. An empirical study compares and discusses the effectiveness of the proposed local search procedures as well as their components.

  1. Features Selection for Skin Micro-Image Symptomatic Recognition

    Institute of Scientific and Technical Information of China (English)

    HU Yue-li; CAO Jia-lin; ZHAO Qian; FENG Xu

    2004-01-01

    Automatic recognition of skin micro-image symptoms is important in skin diagnosis and treatment, and feature selection is used to improve the classification performance of skin micro-image symptom recognition. This paper proposes a hybrid approach based on the support vector machine (SVM) technique and a genetic algorithm (GA) to select an optimum feature subset from the feature group extracted from the skin micro-images. An adaptive GA is introduced for maintaining the convergence rate. With the proposed method, the average cross-validation accuracy is increased from 88.25% using all features to 96.92% using only the selected features provided by a classifier for classification of 5 classes of skin symptoms. The experimental results are satisfactory.

  3. Ensemble feature selection integrating elitist roles and quantum game model

    Institute of Scientific and Technical Information of China (English)

    Weiping Ding; Jiandong Wang; Zhijin Guan; Quan Shi

    2015-01-01

    To accelerate the selection process of feature subsets in rough set theory (RST), an ensemble elitist roles based quantum game (EERQG) algorithm is proposed for feature selection. Firstly, the multilevel elitist roles based dynamics equilibrium strategy is established, and both immigration and emigration of elitists are able to be self-adaptive to balance between exploration and exploitation for feature selection. Secondly, the utility matrix of trust margins is introduced to the model of multilevel elitist roles to enhance various elitist roles' performance of searching the optimal feature subsets, and the win-win utility solutions for feature selection can be attained. Meanwhile, a novel ensemble quantum game strategy is designed as an intriguing exhibiting structure to perfect the dynamics equilibrium of multilevel elitist roles. Finally, the ensemble manner of multilevel elitist roles is employed to achieve the global minimal feature subset, which will greatly improve the feasibility and effectiveness. Experiment results show the proposed EERQG algorithm has superiority compared to the existing feature selection algorithms.

  4. Feature selection for data classification based on PLS supervised feature extraction and false nearest neighbors

    Institute of Scientific and Technical Information of China (English)

    颜克胜; 李太福; 魏正元; 苏盈盈; 姚立忠

    2012-01-01

    In the classification of high-dimensional data, multicollinearity, redundant features and noise often lead to low recognition accuracy and large time and space overhead for the classifier. A feature selection method based on partial least squares (PLS) and false nearest neighbors (FNN) is proposed. Firstly, the partial least squares method is employed to extract the principal components of the high-dimensional data and to overcome the multicollinearity existing between the original features, obtaining an independent principal component space that carries supervision information. Then, a similarity measure based on FNN is established by calculating the correlation in this space before and after the selection of each feature, yielding a ranking of the original features by their explanatory power for the dependent variable. Finally, the features with weak explanatory ability are removed in turn to construct various classification models, and the recognition rate of a Support Vector Machine (SVM) is used as the evaluation criterion to search for the classification model that has the highest recognition rate while containing the fewest features; its feature set is the best feature subset. A series of experiments on different data models has been conducted. The simulation results show that this method has a good capability to select the best feature subset consistent with the nature of the classification features of the dataset. Therefore, this research provides a new approach to feature selection for data classification.

  5. Feature Selection Algorithm Based on Neighborhood Decision Distinguishing Rate

    Institute of Scientific and Technical Information of China (English)

    诸文智; 司刚全; 张彦斌

    2013-01-01

    The current feature selection algorithms based on the neighborhood rough set (NRS) model are unable to evaluate numerical datasets directly; a discretization procedure becomes necessary to transform the datasets into discrete form, but this inevitably leads to the loss of useful decision information. To solve this difficulty, a feature selection algorithm based on the neighborhood decision distinguishing rate is proposed. From the viewpoint of the granulated neighborhood, the relation between decision discernibility and decision distribution is analyzed, and the neighborhood decision certainty (Nc) is defined to indicate the degree of distinguishing capability in each individual neighborhood granule. The neighborhood decision distinguishing rate (NDDR) of a feature subset, which evaluates the ability of the subspace to approximate the decision space, is established based on the sum of the Nc values of the information granules induced by the corresponding feature space. Nominal and numerical datasets can then be integrated into the same feature selection algorithm framework. Simulations and practical applications illustrate that the proposed algorithm outperforms the other mainstream NRS-based feature selection methods.

  6. Combined SNP feature selection based on Relief and SVM-RFE

    Institute of Scientific and Technical Information of China (English)

    吴红霞; 吴悦; 刘宗田; 雷州

    2012-01-01

    The genome-wide association study (GWAS) of SNPs faces two big issues: the high-dimensional, small-sample nature of SNP data and the complex mechanisms of genetic diseases. This paper introduces feature selection into GWAS and proposes a combined SNP feature selection method. The method includes two stages: a filter stage, which uses the Relief algorithm to eliminate irrelevant SNP features, and a wrapper stage, which uses the support vector machine based recursive feature elimination (SVM-RFE) algorithm to select the key SNP set. Experiments show that the proposed method performs noticeably better than the SVM-RFE algorithm alone and also achieves higher classification accuracy than the Relief-SVM algorithm, providing an effective approach for SNP genome-wide association analysis.
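
    The two-stage filter-then-wrapper idea can be sketched as below: a basic Relief pass first discards weakly scored SNPs, then SVM-RFE ranks the survivors. The synthetic genotype matrix, the median cut-off and the target subset size are assumptions.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 300)).astype(float)   # 0/1/2 genotype codes (toy)
y = (X[:, 0] + X[:, 1] > 2).astype(int)                 # toy disease status

def relief_scores(X, y, n_iter=100):
    """Basic binary Relief: reward features that differ on near misses, not near hits."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        i = rng.integers(len(X))
        dists = np.abs(X - X[i]).sum(axis=1)
        dists[i] = np.inf
        same, diff = y == y[i], y != y[i]
        hit = np.argmin(np.where(same, dists, np.inf))
        miss = np.argmin(np.where(diff, dists, np.inf))
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter

# Stage 1 (filter): drop SNPs in the bottom half of Relief scores.
scores = relief_scores(X, y)
keep = np.flatnonzero(scores >= np.median(scores))

# Stage 2 (wrapper): SVM-RFE on the remaining SNPs.
rfe = RFE(LinearSVC(C=0.1, max_iter=5000), n_features_to_select=10, step=0.1)
rfe.fit(X[:, keep], y)
print("key SNPs:", keep[rfe.support_])
```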

  7. A Feature Selection Method Based on Maximal Marginal Relevance

    Institute of Scientific and Technical Information of China (English)

    刘赫; 张相洪; 刘大有; 李燕军; 尹立军

    2012-01-01

    With the rapid growth of textual information on the Internet, text categorization has become one of the key research directions in data mining. Text categorization is a supervised learning process, defined as automatically assigning free text to one or more predefined categories. At present, text categorization is necessary for managing textual information and has been applied in many fields. However, text categorization has two characteristics: high dimensionality of the feature space and a high level of feature redundancy. For these two characteristics, the χ2 statistic is used to deal with the high dimensionality of the feature space, and the idea of information novelty is used to deal with the high level of feature redundancy. According to the definition of maximal marginal relevance, the two are combined into a feature selection method based on maximal marginal relevance, which can reduce redundancy between features during the feature selection process. Experiments are carried out on two text datasets, Reuters-21578 Top10 and OHSCAL. The results indicate that the feature selection method based on maximal marginal relevance is more efficient than the χ2 statistic and information gain, and that it can improve the performance of three different classifiers, namely naive Bayes, Rocchio and kNN.
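
    A rough sketch of a maximal-marginal-relevance style selector is shown below: greedily pick the term with the best trade-off between chi-square relevance to the class and redundancy with the terms already selected (cosine similarity here, standing in for information novelty). The 20 Newsgroups corpus, the lambda weight and the subset size are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import chi2

data = fetch_20newsgroups(subset="train", categories=["sci.med", "rec.autos"],
                          remove=("headers", "footers", "quotes"))
X = TfidfVectorizer(max_features=1000, stop_words="english").fit_transform(data.data)
rel, _ = chi2(X, data.target)
rel = rel / rel.max()                          # normalise relevance to [0, 1]

lam, k = 0.7, 30                               # trade-off weight and subset size (illustrative)
Xd = np.asarray(X.todense())
norms = np.linalg.norm(Xd, axis=0) + 1e-12
selected, remaining = [], list(range(Xd.shape[1]))

while len(selected) < k:
    def mmr(j):
        # Relevance minus the worst-case redundancy with already selected terms.
        if not selected:
            return rel[j]
        sims = [Xd[:, j] @ Xd[:, s] / (norms[j] * norms[s]) for s in selected]
        return lam * rel[j] - (1 - lam) * max(sims)
    best = max(remaining, key=mmr)
    selected.append(best)
    remaining.remove(best)

print("first ten selected term indices:", selected[:10])
```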

  8. A Novel Iris Recognition Method Based on Feature Selection

    Institute of Scientific and Technical Information of China (English)

    姚明海; 王娜; 李劲松

    2014-01-01

    In order to improve the accuracy of iris recognition, a novel iris recognition method based on feature selection is proposed. An elastic template is used to locate the iris image effectively. According to the texture distribution characteristics of the iris image, multi-scale Gabor filters are used to extract features from the different texture regions of the iris. Then, a genetic algorithm and particle swarm optimization are used for feature selection, removing redundant information from the feature vector. Finally, an SVM classification model is used for iris recognition. To test the validity of the method, it is verified on the CASIA iris database; the experimental results show that the method has high recognition accuracy.

  9. Feature Extraction and Selection Strategies for Automated Target Recognition

    Science.gov (United States)

    Greene, W. Nicholas; Zhang, Yuhan; Lu, Thomas T.; Chao, Tien-Hsin

    2010-01-01

    Several feature extraction and selection methods for an existing automatic target recognition (ATR) system using JPL's Grayscale Optical Correlator (GOC) and Optimal Trade-Off Maximum Average Correlation Height (OT-MACH) filter were tested using MATLAB. The ATR system is composed of three stages: a cursory region-of-interest (ROI) search using the GOC and OT-MACH filter, a feature extraction and selection stage, and a final classification stage. Feature extraction and selection concerns transforming potential target data into more useful forms as well as selecting important subsets of that data which may aid in detection and classification. The strategies tested were built around two popular extraction methods: Principal Component Analysis (PCA) and Independent Component Analysis (ICA). Performance was measured based on the classification accuracy and free-response receiver operating characteristic (FROC) output of a support vector machine (SVM) and a neural net (NN) classifier.

  10. Network Intrusion Detection Based on Features Selecting and Samples Selecting

    Institute of Scientific and Technical Information of China (English)

    马世欢; 胡彬

    2015-01-01

    In order to obtain more ideal network intrusion detection results, and addressing the problems of network intrusion feature selection and sample selection, this paper proposes a network intrusion detection model based on feature selection and sample selection. Firstly, the features of network intrusions are extracted and normalized; secondly, kernel principal component analysis is used to select the intrusion features and the samples are selected; finally, an extreme learning machine is used to build the network intrusion detection classifier, and simulation experiments are carried out with the KDD Cup99 dataset. The simulation results show that the proposed model achieves good network intrusion detection results, with a detection rate above 95%, and its intrusion detection efficiency can meet the requirements of practical network security protection.
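
    The sketch below mirrors the model shape only: kernel PCA for intrusion feature extraction/selection followed by a minimal extreme learning machine (a random hidden layer plus a least-squares read-out). Synthetic traffic features replace KDD Cup99 and all sizes are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy stand-in for 41-dimensional KDD Cup99 connection records.
X, y = make_classification(n_samples=2000, n_features=41, n_informative=10, random_state=0)
X = StandardScaler().fit_transform(X)                          # normalisation step
X = KernelPCA(n_components=12, kernel="rbf").fit_transform(X)  # feature selection/extraction

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

class TinyELM:
    """Minimal extreme learning machine: random hidden layer, least-squares output."""
    def __init__(self, n_hidden=200, seed=0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)                  # random hidden activations
        self.beta = np.linalg.pinv(H) @ y                 # least-squares output weights
        return self

    def predict(self, X):
        return (np.tanh(X @ self.W + self.b) @ self.beta > 0.5).astype(int)

elm = TinyELM().fit(X_tr, y_tr)
print("detection accuracy:", round((elm.predict(X_te) == y_te).mean(), 3))
```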

  11. Fibre selection based on an overall analytical feature comparison for the solid-phase microextraction of trihalomethanes from drinking water.

    Science.gov (United States)

    San Juan, Pedro Manuel; Carrillo, José David; Tena, María Teresa

    2007-01-12

    This paper describes the optimization of solid-phase microextraction (SPME) conditions for three different fibres (Carboxen-polydimethylsiloxane (CAR-PDMS), divinylbenzene-Carboxen-polydimethylsiloxane (DVB-CAR-PDMS) and polydimethylsiloxane-divinylbenzene (PDMS-DVB)) used to determine trihalomethanes (THMs) in water by headspace solid-phase microextraction and gas chromatography (HS-SPME-GC). The influence of temperature and the salting-out effect was examined using a central composite design for each fibre. Extraction time was studied separately at the optimum values found for temperature and sodium chloride concentration (40 degrees C and 0.36 g mL-1). The HS-SPME-GC-MS method for each fibre was characterised in terms of linearity, detection (LOD) and quantification (LOQ) limits and repeatability. The fibre PDMS-DVB was selected as it provided a broader linear range, better repeatability and lower detection and quantification limits than the others, particularly the CAR-PDMS fibre. The accuracy of the proposed method using the PDMS-DVB fibre was checked by a recovery study in both ultrapure and tap water. A blank analysis study showed the absence of memory effects for this fibre. The reproducibility (expressed as a percentage of relative standard deviation) was 6-11% and the detection limits were between 0.078 and 0.52 microg L-1 for bromoform and chloroform, respectively. Finally, the method was applied to determine THM concentrations in two drinking water samples. PMID:17109874

  12. Effective feature selection for image steganalysis using extreme learning machine

    Science.gov (United States)

    Feng, Guorui; Zhang, Haiyan; Zhang, Xinpeng

    2014-11-01

    Image steganography delivers secret data by slight modifications of the cover. To detect these data, steganalysis tries to create features that embody the discrepancy between the cover and steganographic images. Therefore, the urgent problem is how to design an effective classification architecture for given feature vectors extracted from the images. We propose an approach to automatically select effective features based on the well-known JPEG steganographic methods. This approach, referred to as extreme learning machine revisited feature selection (ELM-RFS), can tune input weights in terms of the importance of input features. This idea is derived from cross-validation learning and one-dimensional (1-D) search. While updating input weights, we seek the energy-decreasing direction using leave-one-out (LOO) selection. Furthermore, we optimize the 1-D energy function instead of directly discarding the least significant feature. Since the recent Liu features achieve considerably lower detection errors than previous JPEG steganalysis, the experimental results demonstrate that the new approach results in less classification error than other classifiers such as SVM, the Kodovsky ensemble classifier, direct ELM-LOO learning, kernel ELM, and conventional ELM on the Liu features. Furthermore, ELM-RFS achieves similar performance to a deep Boltzmann machine with less training time.

  13. Feature Selection for Audio Surveillance in Urban Environment

    Directory of Open Access Journals (Sweden)

    KIKTOVA Eva

    2014-05-01

    Full Text Available This paper presents the work leading to an acoustic event detection system designed to recognize two types of acoustic events (shot and breaking glass) in an urban environment. For this purpose, extensive front-end processing was performed to obtain an effective parametric representation of the input sound. MFCC features and features computed during their extraction (MELSPEC and FBANK), as well as MPEG-7 audio descriptors and other temporal and spectral characteristics, were extracted. High-dimensional feature sets were created and in the next phase reduced by mutual-information-based selection algorithms. A Hidden Markov Model based classifier was applied and evaluated with the Viterbi decoding algorithm. In this way very effective feature sets were identified and the less important features were found as well.
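
    The mutual-information-based reduction step mentioned above can be sketched with scikit-learn; the snippet below is only an illustration with hypothetical data, and the HMM/Viterbi classification stage is omitted.

      import numpy as np
      from sklearn.feature_selection import mutual_info_classif

      # Hypothetical feature matrix: rows are audio segments, columns are MFCC, MELSPEC,
      # FBANK and MPEG-7 descriptors stacked together; y marks the event class.
      X = np.random.rand(300, 120)
      y = np.random.randint(0, 3, 300)            # e.g. shot / breaking glass / background

      mi = mutual_info_classif(X, y, random_state=0)
      top = np.argsort(mi)[::-1][:20]             # keep the 20 most informative features
      print("selected feature indices:", top)
      X_reduced = X[:, top]                        # reduced set passed on to the HMM classifier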

  14. Feature selection versus feature compression in the building of calibration models from FTIR-spectrophotometry datasets.

    Science.gov (United States)

    Vergara, Alexander; Llobet, Eduard

    2012-01-15

    Undoubtedly, FTIR spectrophotometry has become a standard in the chemical industry for monitoring, on the fly, the concentrations of reagents and by-products. However, representing chemical samples by FTIR spectra, which are characterized by hundreds if not thousands of variables, brings its own set of challenges because the spectra must be analyzed in a high-dimensional feature space, where many of these features are likely to be highly correlated and many others affected by noise. Therefore, identifying a subset of features that preserves the classifier/regressor performance seems imperative prior to any attempt to build an appropriate pattern recognition method. In this context, we investigate the benefit of utilizing two different dimensionality reduction methods, namely the minimum Redundancy-Maximum Relevance (mRMR) feature selection scheme and a new self-organized map (SOM) based feature compression, coupled to regression methods to quantitatively analyze two-component liquid samples by FTIR spectrophotometry. Since these methods make it possible to select a small subset of relevant features from FTIR spectra while preserving the statistical characteristics of the target variable being analyzed, we claim that expressing the FTIR spectra by this dimensionality-reduced set of features may be beneficial. We demonstrate the utility of these feature selection schemes in quantifying the distinct analytes within their binary mixtures measured with an FTIR spectrophotometer.
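
    A minimal greedy mRMR-style selection can be sketched as follows; this is not the authors' code, the feature-feature redundancy is approximated here with absolute correlation rather than mutual information, and the spectrum matrix and concentration vector are hypothetical stand-ins for FTIR data.

      import numpy as np
      from sklearn.feature_selection import mutual_info_regression

      def mrmr(X, y, k):
          """Greedy minimum Redundancy-Maximum Relevance selection (absolute correlation is
          used as a simple redundancy proxy; the original criterion uses mutual information)."""
          relevance = mutual_info_regression(X, y, random_state=0)
          selected = [int(np.argmax(relevance))]
          while len(selected) < k:
              scores = []
              for j in range(X.shape[1]):
                  if j in selected:
                      scores.append(-np.inf)
                      continue
                  redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) for s in selected])
                  scores.append(relevance[j] - redundancy)
              selected.append(int(np.argmax(scores)))
          return selected

      # Hypothetical stand-in for an FTIR absorbance matrix and an analyte concentration vector.
      X = np.random.rand(60, 400)
      y = X[:, 50] * 2.0 + 0.1 * np.random.rand(60)
      print("selected wavenumber indices:", mrmr(X, y, k=5))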

  15. Unsupervised Feature Selection for Latent Dirichlet Allocation

    Institute of Scientific and Technical Information of China (English)

    Xu Weiran; Du Gang; Chen Guang; Guo Jun; Yang Jie

    2011-01-01

    As a generative model, Latent Dirichlet Allocation (LDA) focuses on how to generate data and lacks any optimization of the topics' discrimination capability. This paper aims to improve the discrimination capability through unsupervised feature selection. Theoretical analysis shows that the discrimination capability of a topic is limited by the discrimination capability of its representative words. The discrimination capability of a word is approximated by the Information Gain of the word for topics, which is used to distinguish between "general words" and "special words" in LDA topics. Therefore, we add a constraint to the LDA objective function so that "general words" only occur in "general topics" rather than "special topics". A heuristic algorithm is then presented to obtain the solution. Experiments show that this method can not only improve the information gain of topics, but also make the topics easier for humans to understand.

  16. Improving Recognition and Identification of Facial Areas Involved in Non-verbal Communication by Feature Selection

    OpenAIRE

    Sheerman-Chase, T; Ong, E-J; Pugeault, N; Bowden, R.

    2013-01-01

    Meaningful Non-Verbal Communication (NVC) signals can be recognised by facial deformations based on video tracking. However, the geometric features previously used contain a significant amount of redundant or irrelevant information. A feature selection method is described for selecting a subset of features that improves performance and allows for the identification and visualisation of facial areas involved in NVC. The feature selection is based on a sequential backward elimination of features ...

  17. Binary classification of chalcone derivatives with LDA or KNN based on their antileishmanial activity and molecular descriptors selected using the Successive Projections Algorithm feature-selection technique.

    Science.gov (United States)

    Goodarzi, Mohammad; Saeys, Wouter; de Araujo, Mario Cesar Ugulino; Galvão, Roberto Kawakami Harrop; Vander Heyden, Yvan

    2014-01-23

    Chalcones are naturally occurring aromatic ketones, which consist of an α-, β-unsaturated carbonyl system joining two aryl rings. These compounds are reported to exhibit several pharmacological activities, including antiparasitic, antibacterial, antifungal, anticancer, immunomodulatory, nitric oxide inhibition and anti-inflammatory effects. In the present work, a Quantitative Structure-Activity Relationship (QSAR) study is carried out to classify chalcone derivatives with respect to their antileishmanial activity (active/inactive) on the basis of molecular descriptors. For this purpose, two techniques to select descriptors are employed, the Successive Projections Algorithm (SPA) and the Genetic Algorithm (GA). The selected descriptors are initially employed to build Linear Discriminant Analysis (LDA) models. An additional investigation is then carried out to determine whether the results can be improved by using a non-parametric classification technique (One Nearest Neighbour, 1NN). In a case study involving 100 chalcone derivatives, the 1NN models were found to provide better rates of correct classification than LDA, both in the training and test sets. The best result was achieved by a SPA-1NN model with six molecular descriptors, which provided correct classification rates of 97% and 84% for the training and test sets, respectively.

  18. Feature-Based Classification of Networks

    CERN Document Server

    Barnett, Ian; Kuijjer, Marieke L; Mucha, Peter J; Onnela, Jukka-Pekka

    2016-01-01

    Network representations of systems from various scientific and societal domains are neither completely random nor fully regular, but instead appear to contain recurring structural building blocks. These features tend to be shared by networks belonging to the same broad class, such as the class of social networks or the class of biological networks. At a finer scale of classification within each such class, networks describing more similar systems tend to have more similar features. This occurs presumably because networks representing similar purposes or constructions would be expected to be generated by a shared set of domain specific mechanisms, and it should therefore be possible to classify these networks into categories based on their features at various structural levels. Here we describe and demonstrate a new, hybrid approach that combines manual selection of features of potential interest with existing automated classification methods. In particular, selecting well-known and well-studied features that ...

  19. An Improved Particle Swarm Optimization for Feature Selection

    Institute of Scientific and Technical Information of China (English)

    Yuanning Liu; Gang Wang; Huiling Chen; Hao Dong; Xiaodong Zhu; Sujing Wang

    2011-01-01

    Particle Swarm Optimization (PSO) is a popular bionic algorithm, based on the social behavior associated with bird flocking, for optimization problems. To maintain the diversity of swarms, a few studies of multi-swarm strategy have been reported. However, the competition among swarms, i.e. the reservation or destruction of a swarm, has not been considered further. In this paper, we formulate four rules by introducing a survival-of-the-fittest mechanism, which simulates the competition among the swarms. Based on this mechanism, we design a modified Multi-Swarm PSO (MSPSO) to solve discrete problems, which consists of a number of sub-swarms and a multi-swarm scheduler that can monitor and control each sub-swarm using the rules. To further address feature selection problems, we propose an Improved Feature Selection (IFS) method by integrating MSPSO and Support Vector Machines (SVM) with the F-score method. The IFS method aims to achieve higher generalization capability by performing kernel parameter optimization and feature selection simultaneously. The performance of the proposed method is compared with that of the standard PSO based, Genetic Algorithm (GA) based and grid search based methods on 10 benchmark datasets taken from the UCI machine learning and StatLog databases. The numerical results and statistical analysis show that the proposed IFS method performs significantly better than the other three methods in terms of prediction accuracy with a smaller subset of features.
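
    The F-score filter used inside the IFS method can be sketched independently of the PSO search; the snippet below implements the standard two-class F-score and evaluates a fixed SVM on the top-ranked features of hypothetical data, leaving out the multi-swarm kernel-parameter optimization.

      import numpy as np
      from sklearn.svm import SVC
      from sklearn.model_selection import cross_val_score

      def f_score(X, y):
          """F-score of each feature for a binary problem (Chen & Lin style filter)."""
          pos, neg = X[y == 1], X[y == 0]
          num = (pos.mean(0) - X.mean(0)) ** 2 + (neg.mean(0) - X.mean(0)) ** 2
          den = pos.var(0, ddof=1) + neg.var(0, ddof=1)
          return num / (den + 1e-12)

      # Hypothetical data standing in for a UCI benchmark; the PSO search over the SVM
      # parameters and the feature mask is replaced here by a plain F-score ranking.
      X = np.random.rand(200, 30)
      y = np.random.randint(0, 2, 200)
      keep = np.argsort(f_score(X, y))[::-1][:10]
      print(cross_val_score(SVC(C=1.0, gamma="scale"), X[:, keep], y, cv=5).mean())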

  20. Towards literature-based feature selection for diagnostic classification: A meta-analysis of resting-state fMRI in depression

    Directory of Open Access Journals (Sweden)

    Benedikt eSundermann

    2014-09-01

    Full Text Available Information derived from functional magnetic resonance imaging (fMRI) during wakeful rest has been introduced as a candidate diagnostic biomarker in unipolar major depressive disorder (MDD). Multiple reports of resting state fMRI in MDD describe group effects. Such prior knowledge can be adopted to pre-select potentially discriminating features for diagnostic classification models with the aim to improve diagnostic accuracy. Purpose of this analysis was to consolidate spatial information about alterations of spontaneous brain activity in MDD, primarily to serve as feature selection for multivariate pattern analysis techniques (MVPA). 32 studies were included in final analyses. Coordinates extracted from the original reports were assigned to two categories based on directionality of findings. Meta-analyses were calculated using the non-additive activation likelihood estimation approach with coordinates organized by subject group to account for non-independent samples. Converging evidence revealed a distributed pattern of brain regions with increased or decreased spontaneous activity in MDD. The most distinct finding was hyperactivity/hyperconnectivity presumably reflecting the interaction of cortical midline structures (posterior default mode network components including the precuneus and neighboring posterior cingulate cortices) associated with self-referential processing and the subgenual anterior cingulate and neighboring medial frontal cortices with lateral prefrontal areas related to externally-directed cognition. Other areas of hyperactivity/hyperconnectivity include the left lateral parietal cortex, right hippocampus and right cerebellum whereas hypoactivity/hypoconnectivity was observed mainly in the left temporal cortex, the insula, precuneus, superior frontal gyrus, lentiform nucleus and thalamus. Results are made available in two different data formats to be used as spatial hypotheses in future studies, particularly for diagnostic

  1. A fast one-pass-training feature selection technique for GMM-based acoustic event detection with audio-visual data

    OpenAIRE

    Butko, Taras; Nadeu Camprubí, Climent

    2010-01-01

    Acoustic event detection becomes a difficult task, even for a small number of events, in scenarios where events are produced rather spontaneously and often overlap in time. In this work, we aim to improve the detection rate by means of feature selection. Using a one-against-all detection approach, a new fast one-pass-training algorithm, and an associated highly-precise metric are developed. Choosing a different subset of multimodal features for each acoustic event class, the results obtain...

  2. Optimal Features Subset Selection and Classification for Iris Recognition

    Directory of Open Access Journals (Sweden)

    Prabir Bhattacharya

    2008-06-01

    Full Text Available The selection of the optimal features subset and the classification have become an important issue in the field of iris recognition. We propose a feature selection scheme based on the multiobjectives genetic algorithm (MOGA) to improve the recognition accuracy and asymmetrical support vector machine for the classification of iris patterns. We also suggest a segmentation scheme based on the collarette area localization. The deterministic feature sequence is extracted from the iris images using the 1D log-Gabor wavelet technique, and the extracted feature sequence is used to train the support vector machine (SVM). The MOGA is applied to optimize the features sequence and to increase the overall performance based on the matching accuracy of the SVM. The parameters of SVM are optimized to improve the overall generalization performance, and the traditional SVM is modified to an asymmetrical SVM to treat the false accept and false reject cases differently and to handle the unbalanced data of a specific class with respect to the other classes. Our experimental results indicate that the performance of SVM as a classifier is better than the performance of the classifiers based on the feedforward neural network, the k-nearest neighbor, and the Hamming and the Mahalanobis distances. The proposed technique is computationally effective with recognition rates of 99.81% and 96.43% on CASIA and ICE datasets, respectively.

  3. Optimal Features Subset Selection and Classification for Iris Recognition

    Directory of Open Access Journals (Sweden)

    Roy Kaushik

    2008-01-01

    Full Text Available Abstract The selection of the optimal features subset and the classification have become an important issue in the field of iris recognition. We propose a feature selection scheme based on the multiobjectives genetic algorithm (MOGA) to improve the recognition accuracy and asymmetrical support vector machine for the classification of iris patterns. We also suggest a segmentation scheme based on the collarette area localization. The deterministic feature sequence is extracted from the iris images using the 1D log-Gabor wavelet technique, and the extracted feature sequence is used to train the support vector machine (SVM). The MOGA is applied to optimize the features sequence and to increase the overall performance based on the matching accuracy of the SVM. The parameters of SVM are optimized to improve the overall generalization performance, and the traditional SVM is modified to an asymmetrical SVM to treat the false accept and false reject cases differently and to handle the unbalanced data of a specific class with respect to the other classes. Our experimental results indicate that the performance of SVM as a classifier is better than the performance of the classifiers based on the feedforward neural network, the k-nearest neighbor, and the Hamming and the Mahalanobis distances. The proposed technique is computationally effective with recognition rates of 99.81% and 96.43% on CASIA and ICE datasets, respectively.

  4. Feature selection of infrared spectrum based on improved bat algorithm

    Institute of Scientific and Technical Information of China (English)

    陈媛媛; 王志斌; 王召巴

    2014-01-01

    Feature selection is an important step in the qualitative and quantitative analysis of infrared spectra. In order to overcome the shortcomings of traditional feature selection methods, such as many tunable parameters, slow convergence, poor accuracy and a tendency towards premature convergence, a novel feature selection algorithm is proposed that combines the basic bat algorithm with a Lévy flight search strategy. Because the original bat algorithm is only suitable for continuous problems, a binary version of the bat algorithm is also introduced for discrete optimization. Three infrared spectrum datasets were used to verify the performance of the proposed method, and comparisons with the genetic algorithm, simulated annealing and uninformative variable elimination were also carried out. The experimental results show that the proposed method can quickly find the global optimum and effectively improve the accuracy and stability of wavelength selection. The selected wavenumbers have clear physical and chemical meanings, and the quantitative model built on the selected wavebands outperforms the model built on the full spectrum. Meanwhile, the three datasets, which cover different phases and different spectral ranges, indicate that the proposed algorithm has a wide range of applicability and practical value.

  5. Improving the classification of nuclear receptors with feature selection.

    Science.gov (United States)

    Gao, Qing-Bin; Jin, Zhi-Chao; Ye, Xiao-Fei; Wu, Cheng; Lu, Jian; He, Jia

    2009-01-01

    Nuclear receptors are involved in multiple cellular signaling pathways that affect and regulate processes. Because of their physiology and pathophysiology significance, classification of nuclear receptors is essential for the proper understanding of their functions. Bhasin and Raghava have shown that the subfamilies of nuclear receptors are closely correlated with their amino acid composition and dipeptide composition [29]. They characterized each protein by a 400 dimensional feature vector. However, using high dimensional feature vectors for characterization of protein sequences will increase the computational cost as well as the risk of overfitting. Therefore, using only those features that are most relevant to the present task might improve the prediction system, and might also provide us with some biologically useful knowledge. In this paper a feature selection approach was proposed to identify relevant features and a prediction engine of support vector machines was developed to estimate the prediction accuracy of classification using the selected features. A reduced subset containing 30 features was accepted to characterize the protein sequences in view of its good discriminative power towards the classes, in which 18 are of amino acid composition and 12 are of dipeptide composition. This reduced feature subset resulted in an overall accuracy of 98.9% in a 5-fold cross-validation test, higher than 88.7% of amino acid composition based method and almost as high as 99.3% of dipeptide composition based method. Moreover, an overall accuracy of 93.7% was reached when it was evaluated on a blind data set of 63 nuclear receptors. On the other hand, an overall accuracy of 96.1% and 95.2% based on the reduced 12 dipeptide compositions was observed simultaneously in the 5-fold cross-validation test and the blind data set test, respectively. These results demonstrate the effectiveness of the present method. PMID:19601913

  6. Information based universal feature extraction

    Science.gov (United States)

    Amiri, Mohammad; Brause, Rüdiger

    2015-02-01

    In many real-world image-based pattern recognition tasks, the extraction and usage of task-relevant features are the most crucial part of the diagnosis. In the standard approach, they mostly remain task-specific, although humans who perform such a task always use the same image features, trained in early childhood. It seems that universal feature sets exist, but they have not yet been systematically found. In our contribution, we tried to find those universal image feature sets that are valuable for most image-related tasks. In our approach, we trained a neural network on natural and non-natural images of objects and background, using a Shannon information-based algorithm and learning constraints. The goal was to extract those features that give the most valuable information for the classification of visual objects such as hand-written digits. This gives a good start and a performance increase for all other image learning tasks, implementing a transfer learning approach. As a result, we found that we could indeed extract features which are valid in all three kinds of tasks.

  7. Spatial selection of features within perceived and remembered objects

    Directory of Open Access Journals (Sweden)

    Duncan E Astle

    2009-04-01

    Full Text Available Our representation of the visual world can be modulated by spatially specific attentional biases that depend flexibly on task goals. We compared searching for task-relevant features in perceived versus remembered objects. When searching perceptual input, selected task-relevant and suppressed task-irrelevant features elicited contrasting spatiotopic ERP effects, despite them being perceptually identical. This was also true when participants searched a memory array, suggesting that memory had retained the spatial organisation of the original perceptual input and that this representation could be modulated in a spatially specific fashion. However, task-relevant selection and task-irrelevant suppression effects were of the opposite polarity when searching remembered compared to perceived objects. We suggest that this surprising result stems from the nature of feature- and object-based representations when stored in visual short-term memory. When stored, features are integrated into objects, meaning that the spatially specific selection mechanisms must operate upon objects rather than specific feature-level representations.

  8. Magnetic field feature extraction and selection for indoor location estimation.

    Science.gov (United States)

    Galván-Tejada, Carlos E; García-Vázquez, Juan Pablo; Brena, Ramon F

    2014-01-01

    User indoor positioning has been under constant improvement especially with the availability of new sensors integrated into the modern mobile devices, which allows us to exploit not only infrastructures made for everyday use, such as WiFi, but also natural infrastructure, as is the case of the natural magnetic field. In this paper we present an extension and improvement of our current indoor localization model based on the extraction of 46 magnetic field signal features. The extension adds a feature selection phase to our methodology, which is performed through a Genetic Algorithm (GA) with the aim of optimizing the fitness of our current model. In addition, we present an evaluation of the final model in two different scenarios: home and office building. The results indicate that performing a feature selection process allows us to reduce the number of signal features of the model from 46 to 5 regardless of the scenario and room location distribution. Further, we verified that reducing the number of features increases the probability of our estimator correctly detecting the user's location (sensitivity) and its capacity to detect false positives (specificity) in both scenarios. PMID:24955944

  9. Feature selection algorithm based on sentiment topic model

    Institute of Scientific and Technical Information of China (English)

    郑妍; 庞琳; 毕慧; 刘玮; 程工

    2014-01-01

    Opinion mining plays an important role in fields such as enterprise business intelligence and government public-opinion analysis. In order to exploit the potential commercial and social value of subjective text, a novel feature selection algorithm based on a sentiment topic model is proposed, which takes both opinion terms and their co-occurrence into consideration to help topic modeling, so that the conditional distributions of opinion terms in positive and negative topics can be effectively estimated. The method aims to measure the importance of opinion features in expressing sentiment orientation. A support vector machine (SVM) classifier was used in the experimental stage. The experimental results show that the algorithm achieves a higher recognition ratio, effectively improves the accuracy of cross-domain sentiment classification, and has good practical value.

  10. Efficient Generation and Selection of Combined Features for Improved Classification

    KAUST Repository

    Shono, Ahmad N.

    2014-05-01

    This study contributes a methodology and associated toolkit developed to allow users to experiment with the use of combined features in classification problems. Methods are provided for efficiently generating combined features from an original feature set, for efficiently selecting the most discriminating of these generated combined features, and for efficiently performing a preliminary comparison of the classification results when using the original features exclusively against the results when using the selected combined features. The potential benefit of considering combined features in classification problems is demonstrated by applying the developed methodology and toolkit to three sample data sets where the discovery of combined features containing new discriminating information led to improved classification results.
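
    A rough sketch of the generate-then-select idea is shown below, assuming pairwise products as the combination operator and mutual information as the selection score; both choices are illustrative and not necessarily those used in the toolkit.

      import numpy as np
      from itertools import combinations
      from sklearn.feature_selection import mutual_info_classif

      # Hypothetical original feature set; combined features are formed as pairwise
      # products and then ranked together with the originals.
      X = np.random.rand(150, 8)
      y = np.random.randint(0, 2, 150)

      pairs = list(combinations(range(X.shape[1]), 2))
      X_comb = np.column_stack([X[:, i] * X[:, j] for i, j in pairs])
      X_all = np.hstack([X, X_comb])

      mi = mutual_info_classif(X_all, y, random_state=0)
      best = np.argsort(mi)[::-1][:5]
      labels = [f"f{i}" for i in range(X.shape[1])] + [f"f{i}*f{j}" for i, j in pairs]
      print([labels[i] for i in best])      # most discriminating original/combined features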

  11. Mutual information criterion for feature selection with application to classification of breast microcalcifications

    Science.gov (United States)

    Diamant, Idit; Shalhon, Moran; Goldberger, Jacob; Greenspan, Hayit

    2016-03-01

    Classification of clustered breast microcalcifications into benign and malignant categories is an extremely challenging task for computerized algorithms and expert radiologists alike. In this paper we present a novel method for feature selection based on mutual information (MI) criterion for automatic classification of microcalcifications. We explored the MI based feature selection for various texture features. The proposed method was evaluated on a standardized digital database for screening mammography (DDSM). Experimental results demonstrate the effectiveness and the advantage of using the MI-based feature selection to obtain the most relevant features for the task and thus to provide for improved performance as compared to using all features.

  12. Principal Feature Analysis: A Multivariate Feature Selection Method for fMRI Data

    Directory of Open Access Journals (Sweden)

    Lijun Wang

    2013-01-01

    Full Text Available Brain decoding with functional magnetic resonance imaging (fMRI) requires analysis of complex, multivariate data. Multivoxel pattern analysis (MVPA) has been widely used in recent years. MVPA treats the activation of multiple voxels from fMRI data as a pattern and decodes brain states using pattern classification methods. Feature selection is a critical procedure of MVPA because it decides which features will be included in the classification analysis of fMRI data, thereby improving the performance of the classifier. Features can be selected by limiting the analysis to specific anatomical regions or by computing univariate (voxel-wise) or multivariate statistics. However, these methods either discard some informative features or select features with redundant information. This paper introduces the principal feature analysis as a novel multivariate feature selection method for fMRI data processing. This multivariate approach aims to remove features with redundant information, thereby selecting fewer features, while retaining the most information.

  13. Unsupervised Feature Selection Based on Locality Preserving Projection and Sparse Representation

    Institute of Scientific and Technical Information of China (English)

    简彩仁; 陈晓云

    2015-01-01

    Traditional filter-based feature selection methods compute a score for each feature independently, from a statistical or geometric perspective only, and therefore ignore the correlations between different features. To solve this problem, an unsupervised feature selection method based on locality preserving projection and sparse representation is proposed, which exploits the advantages of both techniques and selects features by constraining the feature weights to be nonnegative and sparse. The experimental results on 4 gene expression datasets and 2 image datasets show that the method is effective.

  14. Clustering-based Improved K-means Text Feature Selection

    Institute of Scientific and Technical Information of China (English)

    刘海峰; 刘守生; 张学仁

    2011-01-01

    Text feature dimensionality reduction is a core technology in automatic text categorization, and K-means is a commonly used partitioning-based clustering method. To address the algorithm's excessive sensitivity to the initial cluster centers and to isolated points, an improved K-means algorithm is proposed for text feature selection. By optimizing the way the initial cluster centers are chosen and by removing isolated points, the quality of text feature clustering is improved. Subsequent text classification experiments show that the proposed improved K-means algorithm has good feature selection ability and yields efficient text categorization.
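
    A simplified sketch of the clustering-based selection is given below: candidate text features are clustered with k-means++ initialization after a crude outlier (isolated point) removal, and the feature closest to each centroid is kept as a representative. The matrix and thresholds are hypothetical, and this is not the paper's exact algorithm.

      import numpy as np
      from sklearn.cluster import KMeans

      # Hypothetical term matrix: each row of F describes one candidate text feature
      # (e.g. its distribution over documents or classes).
      F = np.random.rand(500, 40)

      # Crude isolated-point removal: drop terms far from the global centroid.
      dist = np.linalg.norm(F - F.mean(0), axis=1)
      keep = dist < np.percentile(dist, 95)
      F_clean = F[keep]

      km = KMeans(n_clusters=50, init="k-means++", n_init=10, random_state=0).fit(F_clean)
      # One representative feature per cluster: the term closest to its centroid.
      reps = []
      for c in range(km.n_clusters):
          idx = np.where(km.labels_ == c)[0]
          centroid = km.cluster_centers_[c]
          reps.append(idx[np.argmin(np.linalg.norm(F_clean[idx] - centroid, axis=1))])
      print("selected feature indices:", sorted(int(np.where(keep)[0][r]) for r in reps))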

  15. Online Feature Selection of Class Imbalance via PA Algorithm

    Institute of Scientific and Technical Information of China (English)

    Chao Han; Yun-Kun Tan; Jin-Hui Zhu; Yong Guo; Jian Chen; Qing-Yao Wu

    2016-01-01

    Imbalance classification techniques have been frequently applied in many machine learning application domains where the number of the majority (or positive) class of a dataset is much larger than that of the minority (or negative) class. Meanwhile, feature selection (FS) is one of the key techniques for the high-dimensional classification task in a manner which greatly improves the classification performance and the computational efficiency. However, most studies of feature selection and imbalance classification are restricted to off-line batch learning, which is not well adapted to some practical scenarios. In this paper, we aim to solve high-dimensional imbalanced classification problem accurately and efficiently with only a small number of active features in an online fashion, and we propose two novel online learning algorithms for this purpose. In our approach, a classifier which involves only a small and fixed number of features is constructed to classify a sequence of imbalanced data received in an online manner. We formulate the construction of such online learner into an optimization problem and use an iterative approach to solve the problem based on the passive-aggressive (PA) algorithm as well as a truncated gradient (TG) method. We evaluate the performance of the proposed algorithms based on several real-world datasets, and our experimental results have demonstrated the effectiveness of the proposed algorithms in comparison with the baselines.
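
    A minimal sketch of the online idea is shown below: a passive-aggressive (PA-I) update followed by a truncation step that keeps only a fixed number of active weights. The truncation rule and the synthetic imbalanced stream are simplifications for illustration, not a faithful reimplementation of the proposed algorithms.

      import numpy as np

      def pa_truncated(stream, n_features, C=1.0, k=20, shrink=0.01):
          """Passive-aggressive (PA-I) online learner with a simple truncation step that
          keeps only the k largest weights -- a rough sketch of the PA + truncated-gradient idea."""
          w = np.zeros(n_features)
          for x, y in stream:                               # y in {-1, +1}
              loss = max(0.0, 1.0 - y * np.dot(w, x))
              if loss > 0:
                  tau = min(C, loss / (np.dot(x, x) + 1e-12))
                  w += tau * y * x                          # passive-aggressive update
              w = np.sign(w) * np.maximum(np.abs(w) - shrink, 0.0)   # soft truncation
              if np.count_nonzero(w) > k:                   # hard cap on active features
                  w[np.argsort(np.abs(w))[:-k]] = 0.0
          return w

      # Hypothetical imbalanced stream: roughly 5% positives.
      rng = np.random.RandomState(0)
      stream = [(rng.randn(100), 1 if rng.rand() < 0.05 else -1) for _ in range(2000)]
      w = pa_truncated(stream, n_features=100)
      print("active features:", np.flatnonzero(w))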

  16. AIS TLS-ESPRIT feature selection for prostate tissue characterization

    Science.gov (United States)

    Mohamed, S. S.; Youssef, A. M.; El-Saadany, E. F.; Salama, M. M. A.

    2006-03-01

    The work in this paper aims at analyzing spectral features of the prostate using Trans-Rectal Ultra-Sound (TRUS) images for tissue classification. This research is expected to augment a beginner radiologist's decision with the experience of more experienced radiologists. Moreover, since in some situations the biopsy yields false negatives due to inaccurate biopsy locations, this research also aims to assist in determining the biopsy locations so as to decrease false negative results. In this paper, a new technique for prostate tissue characterization is developed. The proposed system is composed of four stages. The first stage automatically identifies Regions Of Interest (ROIs). This is achieved using the Gabor multiresolution analysis method, where preliminary regions are identified using the frequency response of the pixels; pixels that have the same response to the same filter are assigned to the same cluster. Next, the radiologist's knowledge is integrated into the system to select the most suspicious ROIs among the preliminarily identified regions. The second stage constructs the spectral features from the identified ROIs. The proposed technique is based on a novel spectral feature set for the TRUS images using the Total Least Square Estimation of Signal Parameters via Rotational Invariance Techniques (TLS-ESPRIT). Classifier-based feature selection is then performed to select the most salient features using the recently proposed Artificial Immune System (AIS) optimization technique. Finally, a Support Vector Machine (SVM) classifier is used as an accuracy measure; the proposed system obtains a classification accuracy of 94.4%, with 100% sensitivity and 83.3% specificity.

  17. Feature selection for facial expression recognition using deformation modeling

    Science.gov (United States)

    Srivastava, Ruchir; Sim, Terence; Yan, Shuicheng; Ranganath, Surendra

    2010-02-01

    Works on Facial Expression Recognition (FER) have mostly been done using image based approaches. However, in recent years, researchers have also been trying to explore the use of 3D information for the task of FER. Most of the time, there is a need for having a neutral (expressionless) face of the subject in both the image based and 3D model based approaches. However, this might not be practical in many applications. This paper tries to address this limitation in previous works by proposing a novel feature extraction technique which does not require any neutral face of the subjects. It has been proposed and validated experimentally that the motion of some landmark points on the face, in exhibiting a particular facial expression, is similar in different persons. A separate classifier is built and relevant feature points are selected for each expression. One-vs-all SVM classification gives promising results.

  18. An ensemble approach for feature selection of Cyber Attack Dataset

    CERN Document Server

    Singh, Shailendra

    2009-01-01

    Feature selection is an indispensable preprocessing step when mining huge datasets that can significantly improve the overall system performance. Therefore in this paper we focus on a hybrid approach to feature selection. This method falls into two phases: the filter phase selects the features with the highest information gain and guides the initialization of the search process for the wrapper phase, whose output is the final feature subset. The final feature subsets are passed to a K-nearest neighbor classifier for classification of attacks. The effectiveness of this algorithm is demonstrated on the DARPA KDDCUP99 cyber attack dataset.
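
    The two-phase idea can be sketched as a filter ranking followed by a greedy wrapper around a k-nearest neighbor classifier; the snippet below uses mutual information in place of information gain and random arrays in place of the DARPA KDDCUP99 records, so it is an illustration rather than the paper's algorithm.

      import numpy as np
      from sklearn.feature_selection import mutual_info_classif
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.model_selection import cross_val_score

      # Hypothetical stand-in for the cyber attack data.
      X = np.random.rand(1000, 41)
      y = np.random.randint(0, 2, 1000)

      # Filter phase: rank features (mutual information used here as an information-gain proxy).
      rank = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1]

      # Wrapper phase: grow the subset greedily while k-NN cross-validation accuracy improves.
      subset, best = [], 0.0
      for f in rank[:15]:
          trial = subset + [int(f)]
          acc = cross_val_score(KNeighborsClassifier(n_neighbors=5), X[:, trial], y, cv=3).mean()
          if acc > best:
              subset, best = trial, acc
      print("final subset:", subset, "accuracy:", round(best, 3))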

  19. Comparing observer models and feature selection methods for a task-based statistical assessment of digital breast tomosynthesis in reconstruction space

    Science.gov (United States)

    Park, Subok; Zhang, George Z.; Zeng, Rongping; Myers, Kyle J.

    2014-03-01

    A task-based assessment of image quality [1] for digital breast tomosynthesis (DBT) can be done in either the projected or reconstructed data space. As the choice of observer models and feature selection methods can vary depending on the type of task and data statistics, we previously investigated the performance of two channelized-Hotelling observer models in conjunction with 2D Laguerre-Gauss (LG) and two implementations of partial least squares (PLS) channels along with that of the Hotelling observer in binary detection tasks involving DBT projections [2,3]. The difference in these observers lies in how the spatial correlation in DBT angular projections is incorporated in the observer's strategy to perform the given task. In the current work, we extend our method to the reconstructed data space of DBT. We investigate how various model observers including the aforementioned compare for performing the binary detection of a spherical signal embedded in structured breast phantoms with the use of DBT slices reconstructed via filtered back projection. We explore how well the model observers incorporate the spatial correlation between different numbers of reconstructed DBT slices while varying the number of projections. For this, relatively small and large scan angles (24° and 96°) are used for comparison. Our results indicate that 1) given a particular scan angle, the number of projections needed to achieve the best performance for each observer is similar across all observer/channel combinations, i.e., Np = 25 for scan angle 96° and Np = 13 for scan angle 24°, and 2) given these sufficient numbers of projections, the number of slices for each observer to achieve the best performance differs depending on the channel/observer types, which is more pronounced in the narrow scan angle case.

  20. Feature selection based on adaptive multi-population genetic algorithm

    Institute of Scientific and Technical Information of China (English)

    刘元宁; 王刚; 朱晓冬; 赵正东; 陈慧灵; 邢翀

    2011-01-01

    An Adaptive Multi-population Genetic Algorithm (AMGA) is proposed and applied to feature selection in order to find the optimal feature subset from high-dimensional feature sets. AMGA consists of a self-designed Multi-population Planning (MPP) module and a Dynamic Selection Algorithm (DSA). The proposed method extends the search space and adaptively adjusts the running states of the multiple populations, thereby controlling premature convergence and strengthening the local search capacity. The Segment data from the UCI and StatLog data sets were used to evaluate the proposed method. Results show that, compared with the standard genetic algorithm, the proposed method obtains better results with fewer selected features and higher classification accuracy, and that it can be widely applied in the field of feature selection.
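
    A stripped-down, single-population genetic search for a feature mask is sketched below (truncation selection and bit-flip mutation only, with a small penalty on subset size); the multi-population planning and dynamic selection modules of AMGA are not reproduced, and the data are a random stand-in for the Segment set.

      import numpy as np
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.RandomState(0)
      X = np.random.rand(300, 19)                 # hypothetical stand-in for the Segment data
      y = np.random.randint(0, 7, 300)

      def fitness(mask):
          if mask.sum() == 0:
              return 0.0
          clf = KNeighborsClassifier(n_neighbors=3)
          acc = cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()
          return acc - 0.01 * mask.sum()          # reward accuracy, penalize large subsets

      pop = rng.randint(0, 2, size=(20, X.shape[1]))
      for gen in range(30):
          scores = np.array([fitness(ind) for ind in pop])
          parents = pop[np.argsort(scores)[::-1][:10]]     # keep the fittest half
          children = parents[rng.randint(0, 10, 10)].copy()
          flip = rng.rand(*children.shape) < 0.05           # bit-flip mutation
          children[flip] ^= 1
          pop = np.vstack([parents, children])
      best = pop[np.argmax([fitness(ind) for ind in pop])]
      print("selected features:", np.flatnonzero(best))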

  1. Geochemical dynamics in selected Yellowstone hydrothermal features

    Science.gov (United States)

    Druschel, G.; Kamyshny, A.; Findlay, A.; Nuzzio, D.

    2010-12-01

    Yellowstone National Park has a wide diversity of thermal features, and includes springs with a range of pH conditions that significantly impact sulfur speciation. We have utilized a combination of voltammetric and spectroscopic techniques to characterize the intermediate sulfur chemistry of Cinder Pool, Evening Primrose, Ojo Caliente, Frying Pan, Azure, and Dragon thermal springs. These measurements have additionally demonstrated the geochemical dynamics inherent in these systems; significant variability in chemical speciation occurs in many of these thermal features due to changes in gas supply rates, fluid discharge rates, and thermal differences that occur on timescales of seconds.

  2. HYBRID FEATURE SELECTION ALGORITHM FOR INTRUSION DETECTION SYSTEM

    Directory of Open Access Journals (Sweden)

    Seyed Reza Hasani

    2014-01-01

    Full Text Available Network security is a serious global concern, and the usefulness of Intrusion Detection Systems (IDS) is increasing incredibly in Information Security research using soft computing techniques. Previous research has recognized irrelevant and redundant features as causes of increased processing time when evaluating known intrusive patterns. In addition, an efficient feature selection method reduces the dimension of the data and removes the redundancy and ambiguity caused by unimportant attributes. Therefore, feature selection methods are well-known means to overcome this problem. Various approaches are utilized in intrusion detection, and they achieve relative improvements over one another. This work is based on enhancing the algorithm with the highest Detection Rate (DR), Linear Genetic Programming (LGP), by reducing the False Alarm Rate (FAR) through incorporation of the Bees Algorithm. Finally, the Support Vector Machine (SVM) is one of the best candidate solutions to settle IDS problems. In this study, four sample datasets containing 4000 random records are randomly extracted from this dataset for training and testing purposes. Experimental results show that the LGP_BA method improves the accuracy and efficiency compared with the previous related research, and the feature subset offered by LGP_BA gives a superior representation of the data.

  3. Comparative Study of Triangulation based and Feature based Image Morphing

    Directory of Open Access Journals (Sweden)

    Ms. Bhumika G. Bhatt

    2012-01-01

    Full Text Available Image morphing is one of the most powerful digital image processing techniques, used to enhance many multimedia projects, presentations, education and computer-based training. It is also used in the medical imaging field to recover features not visible in images by establishing correspondence of features among successive pairs of scanned images. This paper discusses what morphing is and the implementation of the triangulation-based morphing technique and feature-based image morphing. It analyzes both morphing techniques in terms of different attributes such as computational complexity, visual quality of the morph obtained, and the complexity involved in the selection of features.

  4. Use of genetic algorithm for the selection of EEG features

    Science.gov (United States)

    Asvestas, P.; Korda, A.; Kostopoulos, S.; Karanasiou, I.; Ouzounoglou, A.; Sidiropoulos, K.; Ventouras, E.; Matsopoulos, G.

    2015-09-01

    Genetic Algorithm (GA) is a popular optimization technique that can detect the global optimum of a multivariable function containing several local optima. GA has been widely used in the field of biomedical informatics, especially in the context of designing decision support systems that classify biomedical signals or images into classes of interest. The aim of this paper is to present a methodology, based on GA, for the selection of the optimal subset of features that can be used for the efficient classification of Event Related Potentials (ERPs), which are recorded during the observation of correct or incorrect actions. In our experiment, ERP recordings were acquired from sixteen (16) healthy volunteers who observed correct or incorrect actions of other subjects. The brain electrical activity was recorded at 47 locations on the scalp. The GA was formulated as a combinatorial optimizer for the selection of the combination of electrodes that maximizes the performance of the Fuzzy C Means (FCM) classification algorithm. In particular, during the evolution of the GA, for each candidate combination of electrodes, the well-known (Σ, Φ, Ω) features were calculated and were evaluated by means of the FCM method. The proposed methodology provided a combination of 8 electrodes, with classification accuracy 93.8%. Thus, GA can be the basis for the selection of features that discriminate ERP recordings of observations of correct or incorrect actions.

  5. A Study on Feature Selection Techniques in Educational Data Mining

    CERN Document Server

    Ramaswami, M

    2009-01-01

    Educational data mining (EDM) is a new growing research area and the essence of data mining concepts are used in the educational field for the purpose of extracting useful information on the behaviors of students in the learning process. In this EDM, feature selection is to be made for the generation of subset of candidate variables. As the feature selection influences the predictive accuracy of any performance model, it is essential to study elaborately the effectiveness of student performance model in connection with feature selection techniques. In this connection, the present study is devoted not only to investigate the most relevant subset features with minimum cardinality for achieving high predictive performance by adopting various filtered feature selection techniques in data mining but also to evaluate the goodness of subsets with different cardinalities and the quality of six filtered feature selection algorithms in terms of F-measure value and Receiver Operating Characteristics (ROC) value, generat...

  6. A Computer-Aided Diagnosis System for Dynamic Contrast-Enhanced MR Images Based on Level Set Segmentation and ReliefF Feature Selection

    Directory of Open Access Journals (Sweden)

    Zhiyong Pang

    2015-01-01

    Full Text Available This study established a fully automated computer-aided diagnosis (CAD) system for the classification of malignant and benign masses via breast magnetic resonance imaging (BMRI). A breast segmentation method consisting of a preprocessing step to identify the air-breast interfacing boundary and curve fitting for chest wall line (CWL) segmentation was included in the proposed CAD system. The Chan-Vese (CV) model level set (LS) segmentation method was adopted to segment breast mass and demonstrated sufficiently good segmentation performance. The support vector machine (SVM) classifier with ReliefF feature selection was used to merge the extracted morphological and texture features into a classification score. The accuracy, sensitivity, and specificity measurements for the leave-half-case-out resampling method were 92.3%, 98.2%, and 76.2%, respectively. For the leave-one-case-out resampling method, the measurements were 90.0%, 98.7%, and 73.8%, respectively.

  7. An Evident Theoretic Feature Selection Approach for Text Categorization

    OpenAIRE

    UMARSATHIC ALI; JOTHI VENKATESWARAN

    2012-01-01

    With the exponential growth of textual documents available in unstructured form on the Internet, feature selection approaches are increasingly significant for the preprocessing of textual documents for automatic text categorization. Feature selection, which focuses on identifying relevant and informative features, can help reduce the computational cost of processing voluminous amounts of data as well as increase the effectiveness of the subsequent text categorization tasks. In this paper, we ...

  8. Performance Investigation of Feature Selection Methods

    OpenAIRE

    Sharma, Anuj; Dey, Shubhamoy

    2013-01-01

    Sentiment analysis or opinion mining has become an open research domain after proliferation of Internet and Web 2.0 social media. People express their attitudes and opinions on social media including blogs, discussion forums, tweets, etc. and, sentiment analysis concerns about detecting and extracting sentiment or opinion from online text. Sentiment based text classification is different from topical text classification since it involves discrimination based on expressed opinion on a topic. F...

  9. Facial symmetry assessment based on geometric features

    Science.gov (United States)

    Xu, Guoping; Cao, Hanqiang

    2015-12-01

    Face image symmetry is an important factor affecting the accuracy of automatic face recognition. Selecting highly symmetrical face images could improve recognition performance. In this paper, we propose a novel facial symmetry evaluation scheme based on geometric features, including the centroid, singular value, in-plane rotation angle of the face, and the structural similarity index (SSIM). First, we calculate the value of the four features according to the corresponding formulas. Then, we use a fuzzy logic algorithm to integrate the values of the four features into a single number which represents the facial symmetry. The proposed method is efficient and can adapt to different recognition methods. Experimental results demonstrate its effectiveness in improving the robustness of face detection and recognition.

  10. Improving permafrost distribution modelling using feature selection algorithms

    Science.gov (United States)

    Deluigi, Nicola; Lambiel, Christophe; Kanevski, Mikhail

    2016-04-01

    The availability of an increasing number of spatial data on the occurrence of mountain permafrost allows the employment of machine learning (ML) classification algorithms for modelling the distribution of the phenomenon. One of the major problems when dealing with high-dimensional datasets is the number of input features (variables) involved. Application of ML classification algorithms to this large number of variables leads to the risk of overfitting, with the consequence of poor generalization/prediction. For this reason, applying feature selection (FS) techniques helps simplify the number of factors required and improves our knowledge of the adopted features and their relation with the studied phenomenon. Moreover, taking away irrelevant or redundant variables from the dataset effectively improves the quality of the ML prediction. This research deals with a comparative analysis of permafrost distribution models supported by FS variable importance assessment. The input dataset (dimension = 20-25, 10 m spatial resolution) was constructed using landcover maps, climate data and DEM derived variables (altitude, aspect, slope, terrain curvature, solar radiation, etc.). It was completed with permafrost evidence (geophysical and thermal data and rock glacier inventories) that serves as training permafrost data. The FS algorithms used indicate which variables appear less statistically important for permafrost presence/absence. Three different algorithms were compared: Information Gain (IG), Correlation-based Feature Selection (CFS) and Random Forest (RF). IG is a filter technique that evaluates the worth of a predictor by measuring the information gain with respect to the permafrost presence/absence. Conversely, CFS is a wrapper technique that evaluates the worth of a subset of predictors by considering the individual predictive ability of each variable along with the degree of redundancy between them. Finally, RF is a ML algorithm that performs FS as part of its
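
    Two of the three importance views can be sketched quickly with scikit-learn: a filter ranking by mutual information (used here as a proxy for information gain) and the embedded importances of a random forest; CFS is omitted because it requires a subset search. The predictor grid below is synthetic.

      import numpy as np
      from sklearn.feature_selection import mutual_info_classif
      from sklearn.ensemble import RandomForestClassifier

      # Hypothetical grid of terrain predictors (altitude, slope, radiation, ...) with a
      # binary permafrost presence/absence label.
      X = np.random.rand(2000, 22)
      y = (X[:, 0] + 0.5 * X[:, 3] + 0.2 * np.random.rand(2000) > 1.0).astype(int)

      ig = mutual_info_classif(X, y, random_state=0)                 # filter view
      rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
      emb = rf.feature_importances_                                   # embedded view

      for name, scores in [("IG", ig), ("RF", emb)]:
          print(name, "top-5 predictors:", np.argsort(scores)[::-1][:5])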

  11. Towards feature selection in actor-critic algorithms

    OpenAIRE

    Rohanimanesh, Khashayar; Roy, Nicholas; Tedrake, Russell Louis

    2009-01-01

    Choosing features for the critic in actor-critic algorithms with function approximation is known to be a challenge. Too few critic features can lead to degeneracy of the actor gradient, and too many features may lead to slower convergence of the learner. In this paper, we show that a well-studied class of actor policies satisfies the known requirements for convergence when the actor features are selected carefully. We demonstrate that two popular representations for va...

  12. Facial expression feature selection method based on neighborhood rough set theory and quantum genetic algorithm

    Institute of Scientific and Technical Information of China (English)

    冯林; 李聪; 沈莉

    2013-01-01

    Facial expression feature selection is one of the hot issues in the field of facial expression recognition. A novel facial expression feature selection method named feature selection based on neighborhood rough set theory and quantum genetic algorithm (FSNRSTQGA) is proposed. First, an evaluation criterion for the optimal expression feature subset is established on the basis of neighborhood rough set theory and used as the fitness function. Then, by means of the quantum genetic algorithm evolutionary strategy, an approach to facial expression feature selection is developed. Simulation results on the Cohn-Kanade expression dataset illustrate that the FSNRSTQGA method is effective.

  13. Feature Selection for Better Identification of Subtypes of Guillain-Barré Syndrome

    OpenAIRE

    José Hernández-Torruco; Juana Canul-Reich; Juan Frausto-Solís; Juan José Méndez-Castillo

    2014-01-01

    Guillain-Barré syndrome (GBS) is a neurological disorder which has not been explored using clustering algorithms. Clustering algorithms perform more efficiently when they work only with relevant features. In this work, we applied correlation-based feature selection (CFS), chi-squared, information gain, symmetrical uncertainty, and consistency filter methods to select the most relevant features from a 156-feature real dataset. This dataset contains clinical, serological, and nerve conduction t...

  14. Information Theory for Gabor Feature Selection for Face Recognition

    Science.gov (United States)

    Shen, Linlin; Bai, Li

    2006-12-01

    A discriminative and robust feature—kernel enhanced informative Gabor feature—is proposed in this paper for face recognition. Mutual information is applied to select a set of informative and nonredundant Gabor features, which are then further enhanced by kernel methods for recognition. Compared with one of the top performing methods in the 2004 Face Verification Competition (FVC2004), our methods demonstrate a clear advantage over existing methods in accuracy, computation efficiency, and memory cost. The proposed method has been fully tested on the FERET database using the FERET evaluation protocol. Significant improvements on three of the test data sets are observed. Compared with the classical Gabor wavelet-based approaches using a huge number of features, our method requires less than 4 milliseconds to retrieve a few hundreds of features. Due to the substantially reduced feature dimension, only 4 seconds are required to recognize 200 face images. The paper also unified different Gabor filter definitions and proposed a training sample generation algorithm to reduce the effects caused by unbalanced number of samples available in different classes.

  15. Information Theory for Gabor Feature Selection for Face Recognition

    Directory of Open Access Journals (Sweden)

    Shen Linlin

    2006-01-01

    Full Text Available A discriminative and robust feature—kernel enhanced informative Gabor feature—is proposed in this paper for face recognition. Mutual information is applied to select a set of informative and nonredundant Gabor features, which are then further enhanced by kernel methods for recognition. Compared with one of the top performing methods in the 2004 Face Verification Competition (FVC2004), our methods demonstrate a clear advantage over existing methods in accuracy, computation efficiency, and memory cost. The proposed method has been fully tested on the FERET database using the FERET evaluation protocol. Significant improvements on three of the test data sets are observed. Compared with the classical Gabor wavelet-based approaches using a huge number of features, our method requires less than 4 milliseconds to retrieve a few hundreds of features. Due to the substantially reduced feature dimension, only 4 seconds are required to recognize 200 face images. The paper also unified different Gabor filter definitions and proposed a training sample generation algorithm to reduce the effects caused by unbalanced number of samples available in different classes.

  16. A Sparse-Feature-Based Face Detector

    Institute of Scientific and Technical Information of China (English)

    LUXiaofeng; ZHENGNanning; ZHENGSongfeng

    2003-01-01

    Local features and global features are two kinds of important statistical features used to distinguish faces from nonfaces. They are both special cases of sparse features. A final classifier can be considered as a combination of a set of selected weak classifiers, and each weak classifier uses a sparse feature to classify samples. Motivated by this idea, we construct an overcomplete set of weak classifiers using the LPSVM (Linear Proximal Support Vector Machine) algorithm, then select part of them using the AdaBoost algorithm and combine the selected weak classifiers to form a strong classifier. During the course of feature extraction and selection, our method can minimize the classification error directly, whereas most previous works cannot do this. The main difference from other methods is that the local features are learned from the training set instead of being arbitrarily defined. We applied our method to face detection; the test results show that this method performs well.
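
    The select-and-combine idea can be loosely illustrated with boosted decision stumps, where each weak learner splits on a single feature; note that this uses stumps rather than the LPSVM weak classifiers described above, and the face/non-face data are hypothetical.

      import numpy as np
      from sklearn.ensemble import AdaBoostClassifier
      from sklearn.tree import DecisionTreeClassifier

      # Hypothetical face/non-face training matrix; each depth-1 stump acts as a weak
      # classifier that uses a single feature, loosely mirroring the idea of selecting
      # weak classifiers and combining them into a strong one.
      X = np.random.rand(1000, 64)
      y = np.random.randint(0, 2, 1000)

      ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50,
                               random_state=0).fit(X, y)
      used = sorted({int(t.tree_.feature[0]) for t in ada.estimators_ if t.tree_.feature[0] >= 0})
      print("features picked by the boosted stumps:", used)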

  17. An Investigation into Feature Selection of Oncological Survival Prediction

    OpenAIRE

    Strunkin, Dmitry; Mac Namee, Brian; Kelleher, John

    2012-01-01

    In machine learning based clinical decision support (CDS) systems, the features used to train prediction models are of paramount importance. Strong features will lead to accurate models, whereas weak features will have the opposite effect. Feature sets can either be designed by domain experts, or automatically extracted from unstructured data that happens to be available from some process other than a CDS system. This paper compares the usefulness of structured expert-designed features to fe...

  18. Comparison of Feature Selection Techniques in Machine Learning for Anatomical Brain MRI in Dementia.

    Science.gov (United States)

    Tohka, Jussi; Moradi, Elaheh; Huttunen, Heikki

    2016-07-01

    We present a comparative split-half resampling analysis of various data driven feature selection and classification methods for the whole brain voxel-based classification analysis of anatomical magnetic resonance images. We compared support vector machines (SVMs), with or without filter based feature selection, several embedded feature selection methods and stability selection. While comparisons of the accuracy of various classification methods have been reported previously, the variability of the out-of-training sample classification accuracy and the set of selected features due to independent training and test sets have not been previously addressed in a brain imaging context. We studied two classification problems: 1) Alzheimer's disease (AD) vs. normal control (NC) and 2) mild cognitive impairment (MCI) vs. NC classification. In AD vs. NC classification, the variability in the test accuracy due to the subject sample did not vary between different methods and exceeded the variability due to different classifiers. In MCI vs. NC classification, particularly with a large training set, embedded feature selection methods outperformed SVM-based ones with the difference in the test accuracy exceeding the test accuracy variability due to the subject sample. The filter and embedded methods produced divergent feature patterns for MCI vs. NC classification that suggests the utility of the embedded feature selection for this problem when linked with the good generalization performance. The stability of the feature sets was strongly correlated with the number of features selected, weakly correlated with the stability of classification accuracy, and uncorrelated with the average classification accuracy. PMID:26803769

  19. Investigation on the isoform selectivity of novel kinesin-like protein 1 (KIF11) inhibitor using chemical feature based pharmacophore, molecular docking, and quantum mechanical studies.

    Science.gov (United States)

    Karunagaran, Subramanian; Subhashchandrabose, Subramaniyan; Lee, Keun Woo; Meganathan, Chandrasekaran

    2016-04-01

    Kinesin-like protein (KIF11) is a molecular motor protein that is essential in mitosis. Removal of KIF11 prevents centrosome migration and causes cell arrest in mitosis. KIF11 defects are linked to microcephaly, lymphedema and mental retardation. The human KIF11 protein has been actively studied for its role in mitosis and its potential as a therapeutic target for cancer treatment. Pharmacophore modeling, molecular docking and density functional theory approaches were employed to reveal the structural, chemical and electronic features essential for the development of small-molecule inhibitors of KIF11. Chemical feature based pharmacophore models were developed using Discovery Studio v2.5 (DS). The best hypothesis (Hypo1), consisting of four chemical features (two hydrogen bond acceptors, one hydrophobic and one ring aromatic), exhibited a high correlation coefficient of 0.9521, a cost difference of 70.63 and a low RMS value of 0.9475. Hypo1 was cross-validated by the Cat Scramble method, a test set and a decoy set to prove its robustness, statistical significance and predictability, respectively. The well-validated Hypo1 was used as a 3D query to perform virtual screening. The hits obtained from the virtual screening were subjected to rigorous drug-likeness filters such as Lipinski's rule of five and ADMET properties. Finally, six hit compounds were identified based on their molecular interactions and electronic properties. Our final lead compound could serve as a powerful starting point for the discovery of potent KIF11 inhibitors. PMID:26815769

  20. A New Heuristic for Feature Selection by Consistent Biclustering

    CERN Document Server

    Mucherino, Antonio

    2010-01-01

    Given a set of data, biclustering aims at finding simultaneous partitions of its samples and of the features used to represent the samples into biclusters. Consistent biclusterings allow one to obtain correct classifications of the samples from the known classification of the features, and vice versa, and they are very useful for performing supervised classification. The problem of finding consistent biclusterings can be seen as a feature selection problem, where the features that are not relevant for classification purposes are removed from the set of data, while the total number of features is maximized in order to preserve information. This feature selection problem can be formulated as a linear fractional 0-1 optimization problem. We propose a reformulation of this problem as a bilevel optimization problem, and we present a heuristic algorithm for an efficient solution of the reformulated problem. Computational experiments show that the presented algorithm is able to find better solutions with re...

  1. Feature-based Image Sequence Compression Coding

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    A novel compression method for video teleconference applications is presented. Semantic-based coding based on human image features is realized, where human features are adopted as parameters. Model-based coding and the concept of vector coding are combined with work on image feature extraction to obtain the result.

  2. Improving Naive Bayes with Online Feature Selection for Quick Adaptation to Evolving Feature Usefulness

    Energy Technology Data Exchange (ETDEWEB)

    Pon, R K; Cardenas, A F; Buttler, D J

    2007-09-19

    The definition of what makes an article interesting varies from user to user and continually evolves even for a single user. As a result, for news recommendation systems, useless document features cannot be determined a priori and all features are usually considered for interestingness classification. Consequently, the presence of currently useless features degrades classification performance [1], particularly over the initial set of news articles being classified. The initial set of documents is critical for a user when considering which particular news recommendation system to adopt. To address these problems, we introduce an improved version of the naive Bayes classifier with online feature selection. We use correlation to determine the utility of each feature and take advantage of the conditional independence assumption used by naive Bayes for online feature selection and classification. The augmented naive Bayes classifier performs 28% better than the traditional naive Bayes classifier in recommending news articles from the Yahoo! RSS feeds.
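
    A minimal sketch of the idea, under several assumptions (binary bag-of-words features, binary interestingness labels, and a simple Pearson-correlation threshold as the utility test), is given below; it is illustrative only and is not the authors' implementation.

```python
# Sketch: online correlation-based feature utility wrapped around a naive Bayes update.
import numpy as np

class OnlineFSNaiveBayes:
    def __init__(self, n_features, corr_threshold=0.1, alpha=1.0):
        self.n = 0
        self.corr_threshold = corr_threshold
        self.sum_x = np.zeros(n_features)      # running sums for online correlation
        self.sum_y = 0.0
        self.sum_xy = np.zeros(n_features)
        self.sum_x2 = np.zeros(n_features)
        self.sum_y2 = 0.0
        self.counts = np.ones((2, n_features)) * alpha   # Laplace-smoothed feature counts
        self.class_counts = np.ones(2) * alpha

    def update(self, x, y):
        """x: binary feature vector, y: 0/1 interestingness label."""
        self.n += 1
        self.sum_x += x; self.sum_y += y
        self.sum_xy += x * y
        self.sum_x2 += x * x; self.sum_y2 += y * y
        self.counts[y] += x
        self.class_counts[y] += 1

    def _useful(self):
        # Pearson correlation of each feature with the label, maintained online.
        num = self.n * self.sum_xy - self.sum_x * self.sum_y
        den = np.sqrt((self.n * self.sum_x2 - self.sum_x ** 2) *
                      (self.n * self.sum_y2 - self.sum_y ** 2)) + 1e-12
        return np.abs(num / den) >= self.corr_threshold

    def predict(self, x):
        mask = self._useful() & (x > 0)        # classify with useful, present features only
        log_post = np.log(self.class_counts / self.class_counts.sum())
        for c in (0, 1):
            p = self.counts[c] / self.class_counts[c]
            log_post[c] += np.log(p[mask]).sum()
        return int(np.argmax(log_post))
```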

  3. The Effect of Feature Selection on Phish Website Detection

    Directory of Open Access Journals (Sweden)

    Hiba Zuhair

    2015-10-01

    Full Text Available Recently, limited anti-phishing campaigns have given phishers more opportunities to slip past detection with their advanced deceptions. Moreover, the failure to devise appropriate classification techniques to effectively identify these deceptions has degraded the detection of phishing websites. Consequently, exploiting features that are as new, few, predictive, and effective as possible has emerged as a key challenge in keeping detection resilient. Some prior works have investigated and applied selected methods to develop their own classification techniques; however, no study has reached general agreement on which feature selection method best assists classification performance. Hence, this study empirically examined these methods and their effects on classification performance. Furthermore, it recommends criteria for assessing their outcomes and offers a contribution to the problem at hand. Hybrid features, low- and high-dimensional datasets, different feature selection methods, and classification models were examined in this study. As a result, the findings showed notably improved detection precision with low latency, as well as noteworthy gains in robustness and prediction susceptibility. Although selecting an ideal feature subset is a challenging task, the findings of this study provide the most advantageous feature subset possible for robust selection and effective classification in the phishing detection domain.

  4. Selecting Optimal Subset of Features for Student Performance Model

    Directory of Open Access Journals (Sweden)

    Hany M. Harb

    2012-09-01

    Full Text Available Educational data mining (EDM) is a new and growing research area in which data mining concepts are used in the educational field to extract useful information about student behavior in the learning process. Classification methods like decision trees, rule mining, and Bayesian networks can be applied to educational data to predict student behavior, such as performance in an examination. This prediction may help in student evaluation. As feature selection influences the predictive accuracy of any performance model, it is essential to study the effectiveness of the student performance model in connection with feature selection techniques. The main objective of this work is to achieve high predictive performance by adopting various feature selection techniques to increase predictive accuracy with the least number of features. The outcomes show a reduction in computational time and construction cost in both the training and classification phases of the student performance model.

  5. System Entropy and Its Application in Feature Selection

    Institute of Scientific and Technical Information of China (English)

    ZHAO Jun; WU Zhong-fu; LI Hua

    2004-01-01

    Feature selection is always an important issue in research on data mining technologies. However, the problem of optimal feature selection is NP-hard, so heuristic approaches are more practical for actual learning systems. Such algorithms usually select features with the help of a heuristic metric that measures the relative importance of features in a learning system. Here a new notion of 'system entropy' is described in terms of rough set theory, and some of its algebraic characteristics are studied. After its intrinsic value bias is effectively counteracted, the system entropy is applied in BSE, a new heuristic algorithm for feature selection. BSE is efficient, with a time complexity lower than that of analogous algorithms; it is also effective, producing optimal results in the mini-feature biased sense from a variety of learning systems. Moreover, BSE is tolerant of and flexible toward inconsistency in a learning system, and is consequently able to handle data noise elegantly.

  6. Large-scale feature selection using evolved neural networks

    Science.gov (United States)

    Stathakis, Demetris; Topouzelis, Kostas; Karathanassi, Vassilia

    2006-09-01

    In this paper computational intelligence, referring here to the synergy of neural networks and genetic algorithms, is deployed in order to determine a near-optimal neural network for the classification of dark formations in oil spills and look-alikes. Optimality is sought in the framework of a multi-objective problem, i.e. the minimization of input features used and, at the same time, the maximization of overall testing classification accuracy. The proposed method consists of two concurrent actions. The first is the identification of the subset of features that results in the highest classification accuracy on the testing data set i.e. feature selection. The second parallel process is the search for the neural network topology, in terms of number of nodes in the hidden layer, which is able to yield optimal results with respect to the selected subset of features. The results show that the proposed method, i.e. concurrently evolving features and neural network topology, yields superior classification accuracy compared to sequential floating forward selection as well as to using all features together. The accuracy matrix is deployed to show the generalization capacity of the discovered neural network topology on the evolved sub-set of features.

  7. Online Distributed Security Feature Selection Based on Big Data in Power System Operation

    Institute of Scientific and Technical Information of China (English)

    黄天恩; 孙宏斌; 郭庆来; 温柏坚; 郭文鑫

    2016-01-01

    The latest developments in and existing problems of power system security feature selection are briefly introduced, and an online distributed security feature selection method is proposed. The method is based on grouping power system security features by correlation, adapts to the big data of power system operation, and can discover the critical features for power system security online. First, a security feature selection method for a single compute node is discussed; then a distributed method based on feature grouping is proposed. As the feature grouping scheme has an important influence on the selection results, a strategy based on grouping security features by correlation is put forward, so that the correlation of features within the same group is large while the correlation of features across different groups is small. The method is applied to the IEEE 9-bus system and the Guangdong provincial power system, where its practicality and effectiveness are verified: it can quickly find the weak spots in power system operation and accurately help operators grasp the critical features for power system security. Compared with traditional methods, this method performs well in both accuracy and speed.

  8. Feature Selection and Classification of Hyperspectral Data and LiDAR Data Based on Adaboost

    Institute of Scientific and Technical Information of China (English)

    朱江涛; 黄睿

    2014-01-01

    Hyperspectral remote sensing can accurately describe the spectral characteristics of objects and has become an effective way of identifying land covers. However, hyperspectral imagery does not readily provide the 3-D position information of targets, which is necessary for recognizing targets with similar spectral signatures but distinct topologies. Supplementing hyperspectral images with LiDAR data can compensate for this shortcoming and thereby improve classification performance. We propose an Adaboost-based feature selection method to integrate hyperspectral images and LiDAR data for classification. The spectral, altimetric, and textural features as well as vegetation indices are first extracted. The importance of each feature is then evaluated using Adaboost, and the feature subset is produced by discarding features with lower importance values. The final classification results are obtained based on this subset. Land cover classification experiments on Zhangye city, Gansu province, show that the proposed method can select features that are more useful for classification.
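
    The importance-based selection step can be illustrated with a short Python sketch using scikit-learn's AdaBoost implementation; the feature layout, the cut-off ratio and the downstream classifier are assumptions, not details taken from the paper.

```python
# Hedged sketch: keep the most important spectral/height/texture features
# according to AdaBoost's feature importances before the final classification.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def adaboost_feature_subset(X, y, keep_ratio=0.5):
    booster = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)
    importance = booster.feature_importances_
    k = max(1, int(keep_ratio * X.shape[1]))
    return np.argsort(importance)[::-1][:k]   # indices of the most important features

# usage: idx = adaboost_feature_subset(X_train, y_train); clf.fit(X_train[:, idx], y_train)
```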

  9. Unsupervised feature selection method based on latent Dirichlet allocation model and mutual information

    Institute of Scientific and Technical Information of China (English)

    董元元; 陈基漓; 唐小侠

    2012-01-01

    To address the category deficiency and the tendency to select low-frequency words in mutual information (MI) based feature selection, a method named LDA-σ is presented. First, latent topics are extracted with the latent Dirichlet allocation (LDA) model, and the standard deviation of the word-topic mutual information is then used as the feature evaluation function. In feature selection and categorization experiments on the Reuters-21578 corpus, the micro-averaged F1 of LDA-σ reached 0.9096, and its macro-averaged F1, up to 0.7823, was higher than that of the other algorithms. The experimental results indicate that LDA-σ can be applied to feature selection in text data sets.

  10. Highly comparative, feature-based time-series classification

    CERN Document Server

    Fulcher, Ben D

    2014-01-01

    A highly comparative, feature-based approach to time series classification is introduced that uses an extensive database of algorithms to extract thousands of interpretable features from time series. These features are derived from across the scientific time-series analysis literature, and include summaries of time series in terms of their correlation structure, distribution, entropy, stationarity, scaling properties, and fits to a range of time-series models. After computing thousands of features for each time series in a training set, those that are most informative of the class structure are selected using greedy forward feature selection with a linear classifier. The resulting feature-based classifiers automatically learn the differences between classes using a reduced number of time-series properties, and circumvent the need to calculate distances between time series. Representing time series in this way results in orders of magnitude of dimensionality reduction, allowing the method to perform well on ve...
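
    A small sketch of greedy forward feature selection with a linear classifier, in the spirit of the procedure described above, might look as follows; the scorer, the cross-validation setup and the stopping rule are illustrative assumptions.

```python
# Sketch: add one feature at a time, keeping the candidate that most improves
# cross-validated accuracy of a linear classifier; stop when nothing improves.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def greedy_forward_selection(X, y, max_features=20):
    remaining = list(range(X.shape[1]))
    selected, best_score = [], -np.inf
    while remaining and len(selected) < max_features:
        scores = [(np.mean(cross_val_score(LogisticRegression(max_iter=1000),
                                           X[:, selected + [f]], y, cv=5)), f)
                  for f in remaining]
        score, best_f = max(scores)
        if score <= best_score:          # no candidate improves accuracy any further
            break
        best_score = score
        selected.append(best_f)
        remaining.remove(best_f)
    return selected, best_score
```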

  11. Feature selection for surface electromyography signal based on ant colony optimization

    Institute of Scientific and Technical Information of China (English)

    黄虎; 谢洪波

    2012-01-01

    Objective To improve the classification performance of the surface electromyography (sEMG) based prosthesis and to reduce the dimensions of the features extracted from sEMG signals, a modified ant colony optimization (ACO) algorithm was employed to select the best feature subset. Methods The relationship between features and target classes was calculated as the heuristic function, the best feature subset was selected by ACO, and a trained artificial neural network was used to verify the classification performance. Results Ten healthy subjects participated in an experiment on the classification of hand and wrist motions using sEMG signals. Compared with principal component analysis (PCA) based feature subsets, the ACO-reduced feature subsets not only improved the classification accuracy but also greatly reduced the number of features in the original feature set, which in turn simplified the structure of the classifier and reduced the computational cost. Conclusions The proposed method exhibits great potential in real-time applications such as sEMG-based prosthesis control.

  12. Technical Evaluation Report 27: Educational Wikis: Features and selection criteria

    Directory of Open Access Journals (Sweden)

    Jim Rudolph

    2004-04-01

    Full Text Available This report discusses the educational uses of the ‘wiki,’ an increasingly popular approach to online community development. Wikis are defined and compared with ‘blogging’ methods; characteristics of major wiki engines are described; and wiki features and selection criteria are examined.

  13. An Efficient Cost-Sensitive Feature Selection Using Chaos Genetic Algorithm for Class Imbalance Problem

    Directory of Open Access Journals (Sweden)

    Jing Bian

    2016-01-01

    Full Text Available In the era of big data, feature selection is an essential process in machine learning. Although the class imbalance problem has recently attracted a great deal of attention, little effort has been undertaken to develop feature selection techniques for it. In addition, most applications involving feature selection focus on classification accuracy but not cost, although costs are important. To cope with imbalance problems, we developed a cost-sensitive feature selection algorithm that adds a cost-based evaluation function to filter feature selection and searches with a chaos genetic algorithm, referred to as CSFSG. The evaluation function considers both feature-acquiring costs (test costs) and misclassification costs in the field of network security, thereby weakening the influence of the many instances from the majority classes in large-scale datasets. The CSFSG algorithm reduces the total cost of feature selection and trades off both factors. The behavior of the CSFSG algorithm is tested on a large-scale dataset of network security, using two kinds of classifiers: C4.5 and k-nearest neighbor (KNN). The results of the experimental research show that the approach is efficient and able to effectively improve classification accuracy and to decrease classification time. In addition, the results of our method are more promising than the results of other cost-sensitive feature selection algorithms.
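
    The core of such an approach is a fitness function that trades off feature-acquisition (test) costs against misclassification costs. The sketch below shows one plausible form of that function, which a genetic algorithm (or any other search over feature subsets) would then maximize; the classifier, the cost weights and the weighting factor are assumptions of the sketch.

```python
# Sketch of a cost-sensitive fitness function for a candidate feature subset.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_predict

def cost_sensitive_fitness(mask, X, y, test_costs, fp_cost=1.0, fn_cost=5.0, lam=0.01):
    """mask: boolean vector marking the candidate feature subset; y: 0/1 labels."""
    if not mask.any():
        return -np.inf
    y_pred = cross_val_predict(KNeighborsClassifier(), X[:, mask], y, cv=5)
    fp = np.sum((y_pred == 1) & (y == 0))
    fn = np.sum((y_pred == 0) & (y == 1))
    misclass_cost = fp * fp_cost + fn * fn_cost        # errors on the minority class cost more
    acquisition_cost = test_costs[mask].sum()          # cost of acquiring the selected features
    return -(misclass_cost + lam * acquisition_cost)   # higher fitness = lower total cost
```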

  14. Feature selection for high-dimensional integrated data

    KAUST Repository

    Zheng, Charles

    2012-04-26

    Motivated by the problem of identifying correlations between genes or features of two related biological systems, we propose a model of feature selection in which only a subset of the predictors Xt is dependent on the multidimensional variate Y, and the remainder of the predictors constitute a “noise set” Xu independent of Y. Using Monte Carlo simulations, we investigated the relative performance of two methods, thresholding and singular-value decomposition, in combination with stochastic optimization to determine “empirical bounds” on the small-sample accuracy of an asymptotic approximation. We demonstrate the utility of the thresholding and SVD feature selection methods with respect to a recent infant intestinal gene expression and metagenomics dataset.

  15. Neural Gen Feature Selection for Supervised Learning Classifier

    Directory of Open Access Journals (Sweden)

    Mohammed Hasan Abdulameer

    2014-04-01

    Full Text Available Face recognition has received significant attention, especially during the past few years. Many face recognition techniques have been developed, such as PSO-SVM and LDA-SVM. However, inefficient features in face recognition may lead to inadequate recognition results. Hence, a new face recognition system based on a Genetic Algorithm and the FFBNN technique is proposed. Our proposed face recognition system initially performs feature extraction, and these optimal features are passed on to the recognition process. In the feature extraction stage, the optimal features are extracted from the face image database by a Genetic Algorithm (GA) with FFBNN, and the computed optimal features are given to the FFBNN technique to carry out the training and testing process. The optimal features from the feature database are fed to the FFBNN to accomplish the training process. The well-trained FFBNN with the optimal features provides the recognition results. The optimal features selected by the GA for the FFBNN efficiently perform the face recognition process. The human face dataset called YALE is utilized to analyze the performance of our proposed GA-FFBNN technique, which is also compared with standard SVM and PSO-SVM techniques.

  16. Individual discriminative face recognition models based on subsets of features

    DEFF Research Database (Denmark)

    Clemmensen, Line Katrine Harder; Gomez, David Delgado; Ersbøll, Bjarne Kjær

    2007-01-01

    The accuracy of data classification methods depends considerably on the data representation and on the selected features. In this work, the elastic net model selection is used to identify meaningful and important features in face recognition. Modelling the characteristics which distinguish one...... selection techniques such as forward selection or lasso regression become inadequate. In the experimental section, the performance of the elastic net model is compared with geometrical and color based algorithms widely used in face recognition such as Procrustes nearest neighbor, Eigenfaces, or Fisher...

  17. Novel 3D ultrasound image-based biomarkers based on a feature selection from a 2D standardized vessel wall thickness map: a tool for sensitive assessment of therapies for carotid atherosclerosis

    International Nuclear Information System (INIS)

    With the advent of new therapies and management strategies for carotid atherosclerosis, there is a parallel need for measurement tools or biomarkers to evaluate the efficacy of these new strategies. 3D ultrasound has been shown to provide reproducible measurements of plaque area/volume and vessel wall volume. However, since carotid atherosclerosis is a focal disease that predominantly occurs at bifurcations, biomarkers based on local plaque change may be more sensitive than global volumetric measurements in demonstrating efficacy of new therapies. The ultimate goal of this paper is to develop a biomarker that is based on the local distribution of vessel-wall-plus-plaque thickness change (VWT-Change) that has occurred during the course of a clinical study. To allow comparison between different treatment groups, the VWT-Change distribution of each subject must first be mapped to a standardized domain. In this study, we developed a technique to map the 3D VWT-Change distribution to a 2D standardized template. We then applied a feature selection technique to identify regions on the 2D standardized map on which subjects in different treatment groups exhibit greater difference in VWT-Change. The proposed algorithm was applied to analyse the VWT-Change of 20 subjects in a placebo-controlled study of the effect of atorvastatin (Lipitor). The average VWT-Change for each subject was computed (i) over all points in the 2D map and (ii) over feature points only. For the average computed over all points, 97 subjects per group would be required to detect an effect size of 25% that of atorvastatin in a six-month study. The sample size is reduced to 25 subjects if the average were computed over feature points only. The introduction of this sensitive quantification technique for carotid atherosclerosis progression/regression would allow many proof-of-principle studies to be performed before a more costly and longer study involving a larger population is held to confirm the treatment

  18. Novel 3D ultrasound image-based biomarkers based on a feature selection from a 2D standardized vessel wall thickness map: a tool for sensitive assessment of therapies for carotid atherosclerosis

    Science.gov (United States)

    Chiu, Bernard; Li, Bing; Chow, Tommy W. S.

    2013-09-01

    With the advent of new therapies and management strategies for carotid atherosclerosis, there is a parallel need for measurement tools or biomarkers to evaluate the efficacy of these new strategies. 3D ultrasound has been shown to provide reproducible measurements of plaque area/volume and vessel wall volume. However, since carotid atherosclerosis is a focal disease that predominantly occurs at bifurcations, biomarkers based on local plaque change may be more sensitive than global volumetric measurements in demonstrating efficacy of new therapies. The ultimate goal of this paper is to develop a biomarker that is based on the local distribution of vessel-wall-plus-plaque thickness change (VWT-Change) that has occurred during the course of a clinical study. To allow comparison between different treatment groups, the VWT-Change distribution of each subject must first be mapped to a standardized domain. In this study, we developed a technique to map the 3D VWT-Change distribution to a 2D standardized template. We then applied a feature selection technique to identify regions on the 2D standardized map on which subjects in different treatment groups exhibit greater difference in VWT-Change. The proposed algorithm was applied to analyse the VWT-Change of 20 subjects in a placebo-controlled study of the effect of atorvastatin (Lipitor). The average VWT-Change for each subject was computed (i) over all points in the 2D map and (ii) over feature points only. For the average computed over all points, 97 subjects per group would be required to detect an effect size of 25% that of atorvastatin in a six-month study. The sample size is reduced to 25 subjects if the average were computed over feature points only. The introduction of this sensitive quantification technique for carotid atherosclerosis progression/regression would allow many proof-of-principle studies to be performed before a more costly and longer study involving a larger population is held to confirm the treatment

  19. Feature-based sentiment analysis with ontologies

    OpenAIRE

    Taner, Berk

    2011-01-01

    Sentiment analysis is a topic that many researchers work on. In recent years, new research directions under sentiment analysis have appeared. Feature-based sentiment analysis is one such topic that deals not only with finding sentiment in a sentence but with providing a more detailed analysis of a given domain. In the beginning, researchers focused on commercial products and manually generated lists of features for a product. Then they tried to generate a feature-based approach to attach sentiments to th...

  20. A Critical Assessment of Feature Selection Methods for Biomarker Discovery in Clinical Proteomics

    NARCIS (Netherlands)

    Christin, Christin; Hoefsloot, Huub C. J.; Smilde, Age K.; Hoekman, B.; Suits, Frank; Bischoff, Rainer; Horvatovich, Peter

    2013-01-01

    In this paper, we compare the performance of six different feature selection methods for LC-MS-based proteomics and metabolomics biomarker discovery: the t test, the Mann-Whitney-Wilcoxon test (mww test), nearest shrunken centroid (NSC), linear support vector machine-recursive feature elimination (SVM-R

  1. Economic indicators selection for crime rates forecasting using cooperative feature selection

    Science.gov (United States)

    Alwee, Razana; Shamsuddin, Siti Mariyam Hj; Salleh Sallehuddin, Roselina

    2013-04-01

    Feature selection in a multivariate forecasting model is very important to ensure that the model is accurate. The purpose of this study is to apply the Cooperative Feature Selection method for feature selection. The features are economic indicators that will be used in a crime rate forecasting model. Cooperative Feature Selection combines grey relational analysis and an artificial neural network to establish a cooperative model that can rank and select the significant economic indicators. Grey relational analysis is used to select the best data series to represent each economic indicator and to rank the economic indicators according to their importance to the crime rate. After that, the artificial neural network is used to select the significant economic indicators for forecasting the crime rates. In this study, we used the economic indicators of unemployment rate, consumer price index, gross domestic product and consumer sentiment index, as well as property crime and violent crime rates for the United States. A Levenberg-Marquardt neural network is used in this study. From our experiments, we found that the consumer price index is an important economic indicator that has a significant influence on the violent crime rate, while for the property crime rate, the gross domestic product, unemployment rate and consumer price index are the influential economic indicators. Cooperative Feature Selection is also found to produce smaller errors than Multiple Linear Regression in forecasting property and violent crime rates.

  2. A Sentiment Feature Selection Method Based on Rough Set and Information Gain

    Institute of Scientific and Technical Information of China (English)

    蒲国林

    2016-01-01

    A sentiment feature selection method based on rough set theory and information gain is proposed to improve the accuracy of sentiment feature extraction and lay a solid foundation for high-performance sentiment analysis. The method first uses information gain to select a feature subset that has high relevance to the class attribute; features with high redundancy are then eliminated by rough set theory, yielding the optimal feature subset. Experimental results on several datasets show that the method increases accuracy by 4 to 9 percentage points over several classical methods. It is an effective feature selection method and clearly helps to improve the overall performance of sentiment analysis.
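
    The two-stage idea can be sketched as follows: rank features by information gain, then prune redundant ones. In this illustrative Python sketch, a pairwise mutual-information test stands in for the rough-set reduction step, which is a simplifying assumption.

```python
# Sketch: information-gain ranking followed by a redundancy filter on discrete features.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def ig_then_redundancy_filter(X_discrete, y, top_k=500, redundancy_thresh=0.8):
    ig = mutual_info_classif(X_discrete, y, discrete_features=True, random_state=0)
    candidates = np.argsort(ig)[::-1][:top_k]          # stage 1: high-relevance subset
    kept = []
    for f in candidates:                               # stage 2: drop redundant features
        if all(mutual_info_score(X_discrete[:, f], X_discrete[:, g]) /
               max(mutual_info_score(X_discrete[:, f], X_discrete[:, f]), 1e-12)
               < redundancy_thresh for g in kept):
            kept.append(f)
    return np.array(kept)
```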

  3. A FEATURE SELECTION ALGORITHM DESIGN AND ITS IMPLEMENTATION IN INTRUSION DETECTION SYSTEM

    Institute of Scientific and Technical Information of China (English)

    杨向荣; 沈钧毅

    2003-01-01

    Objective To present a new feature selection algorithm. Methods The algorithm is based on rule induction and field knowledge. Results This algorithm can be applied when capturing dataflow for network intrusion detection, so that only the sub-dataset containing discriminating features is captured. The time spent on subsequent behavior pattern mining is thus reduced, and the patterns mined are more precise. Conclusion The experimental results show that the feature subset captured by this algorithm is more informative and that the quantity of data is reduced significantly.

  4. Feature Selection : A Novel Approach for the Prediction of Learning Disabilities in School Aged Children

    Directory of Open Access Journals (Sweden)

    Sabu M.K

    2015-01-01

    Full Text Available Feature selection is a problem closely related to dimensionality reduction. A commonly used approach in feature selection is ranking the individual features according to some criteria and then searching for an optimal feature subset based on an evaluation criterion to test the optimality. The objective of this work is to predict more accurately the presence of Learning Disability (LD) in school-aged children with a reduced number of symptoms. For this purpose, a novel hybrid feature selection approach is proposed by integrating a popular Rough Set based feature ranking process with a modified backward feature elimination algorithm. The approach follows a ranking of the symptoms of LD according to their importance in the data domain. Each symptom's significance or priority value reflects its relative importance in predicting LD among the various cases. Then, by eliminating the least significant features one by one and evaluating the feature subset at each stage of the process, an optimal feature subset is generated. The experimental results show the success of the proposed method in removing redundant attributes efficiently from the LD dataset without sacrificing classification performance.

  5. Geometric Feature Based Face-Sketch Recognition

    OpenAIRE

    Pramanik, Sourav; Bhattacharjee, Debotosh

    2013-01-01

    This paper presents a novel facial sketch image or face-sketch recognition approach based on facial feature extraction. To recognize a face-sketch, we concentrate on a set of geometric face features such as the eyes, nose, eyebrows and lips, and their length and width ratios, since it is difficult to match photos and sketches directly because they belong to two different modalities. In this system, first the facial features/components from training images are extracted, then ratios of length, width, a...

  6. Core-Generating Discretization for Rough Set Feature Selection

    Science.gov (United States)

    Tian, David; Zeng, Xiao-Jun; Keane, John

    Rough set feature selection (RSFS) can be used to improve classifier performance. RSFS removes redundant attributes whilst keeping important ones that preserve the classification power of the original dataset. The feature subsets selected by RSFS are called reducts. The intersection of all reducts is called core. However, RSFS handles discrete attributes only. To process datasets consisting of real attributes, they are discretized before applying RSFS. Discretization controls core of the discrete dataset. Moreover, core may critically affect the classification performance of reducts. This paper defines core-generating discretization, a type of discretization method; analyzes the properties of core-generating discretization; models core-generating discretization using constraint satisfaction; defines core-generating approximate minimum entropy (C-GAME) discretization; models C-GAME using constraint satisfaction and evaluates the performance of C-GAME as a pre-processor of RSFS using ten datasets from the UCI Machine Learning Repository.

  7. Feature Selection for Generator Excitation Neurocontroller Development Using Filter Technique

    Directory of Open Access Journals (Sweden)

    Abdul Ghani Abro

    2011-09-01

    Full Text Available Essentially, the motive behind using a control system is to generate a suitable control signal for yielding the desired response of a physical process. Control of the synchronous generator has always been very critical in power system operation and control. For certain well-known reasons, power generators are normally operated well below their steady-state stability limit. This raises the demand for efficient and fast controllers. Artificial intelligence has been reported to give revolutionary outcomes in the field of control engineering. The Artificial Neural Network (ANN), a branch of artificial intelligence, has been used for nonlinear and adaptive control, utilizing its inherent observability. The overall performance of a neurocontroller is also dependent upon its input features. Selecting optimum features to train a neurocontroller optimally is very critical. Both the quality and the size of the data are of equal importance for better performance. In this work a filter technique is employed to select independent factors for ANN training.

  8. The Impact of Feature Selection on Web Spam Detection

    Directory of Open Access Journals (Sweden)

    Jaber Karimpour

    2012-08-01

    Full Text Available A search engine is one of the most important tools for managing the massive amount of distributed web content. Web spamming tries to deceive search engines into ranking some pages higher than they deserve. Many methods have been proposed to combat web spamming and to detect spam pages. A basic one is classification, i.e., learning a classification model for classifying web pages as spam or non-spam. This work tries to select the best feature set for the classification of web spam using the imperialist competitive algorithm and the genetic algorithm. The imperialist competitive algorithm is a novel optimization algorithm inspired by the socio-political process of imperialism in the real world. Experiments are carried out on the WEBSPAM-UK2007 data set; they show that feature selection improves classification accuracy, and that the imperialist competitive algorithm outperforms the genetic algorithm.
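
    For illustration, a compact genetic-algorithm search over feature subsets of the kind used in such studies is sketched below; the population size, the operators and the accuracy-based fitness are assumptions, and the imperialist competitive algorithm is not shown.

```python
# Sketch: binary-encoded GA for feature-subset selection with a cross-validated fitness.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    if not mask.any():
        return 0.0
    return cross_val_score(DecisionTreeClassifier(random_state=0),
                           X[:, mask], y, cv=3).mean()

def ga_feature_selection(X, y, pop_size=20, generations=30, p_mut=0.05):
    n = X.shape[1]
    pop = rng.random((pop_size, n)) < 0.5                        # random initial subsets
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        parents = pop[np.argsort(scores)[::-1][:pop_size // 2]]  # truncation selection
        children = []
        while len(children) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n)                             # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n) < p_mut                       # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[np.argmax(scores)]                                # best feature mask found
```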

  9. Feature Selection Strategies for Classifying High Dimensional Astronomical Data Sets

    CERN Document Server

    Donalek, Ciro; Djorgovski, S G; Mahabal, Ashish A; Graham, Matthew J; Fuchs, Thomas J; Turmon, Michael J; Philip, N Sajeeth; Yang, Michael Ting-Chang; Longo, Giuseppe

    2013-01-01

    The amount of collected data in many scientific fields is increasing, and all of them require a common task: extracting knowledge from massive, multi-parametric data sets as rapidly and efficiently as possible. This is especially true in astronomy, where synoptic sky surveys are enabling new research frontiers in time-domain astronomy and posing several new object classification challenges in multi-dimensional spaces; given the high number of parameters available for each object, feature selection is quickly becoming a crucial task in analyzing astronomical data sets. Using data sets extracted from the ongoing Catalina Real-Time Transient Survey (CRTS) and the Kepler Mission, we illustrate a variety of feature selection strategies used to identify the subsets that give the most information and the results achieved by applying these techniques to three major astronomical problems.

  10. A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data

    KAUST Repository

    Abusamra, Heba

    2013-05-01

    Microarray technology has enriched the study of gene expression in such a way that scientists are now able to measure the expression levels of thousands of genes in a single experiment. Microarray gene expression data gained great importance in recent years due to its role in disease diagnoses and prognoses, which help to choose the appropriate treatment plan for patients. This technology has ushered in a new era of molecular classification, yet interpreting gene expression data remains a difficult problem and an active research area due to its native nature of “high dimensional low sample size”. Such problems pose great challenges to existing classification methods. Thus, effective feature selection techniques are often needed in this case to aid in correctly classifying different tumor types and consequently lead to a better understanding of genetic signatures as well as improved treatment strategies. This thesis presents a comparative study of state-of-the-art feature selection methods, classification methods, and combinations of them, based on gene expression data. We compared the efficiency of three different classification methods, including support vector machines, k-nearest neighbor and random forest, and eight different feature selection methods, including information gain, twoing rule, sum minority, max minority, gini index, sum of variances, t-statistics, and one-dimensional support vector machine. Five-fold cross validation was used to evaluate the classification performance. Two publicly available gene expression data sets of glioma were used for this study. Different experiments have been applied to compare the performance of the classification methods with and without performing feature selection. Results revealed the important role of feature selection in classifying gene expression data. By performing feature selection, the classification accuracy can be significantly boosted by using a small number of genes. The relationship of features selected in
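
    The experimental protocol described above can be approximated with a short scikit-learn sketch that pairs feature-selection scores with classifiers and evaluates each combination by 5-fold cross-validation; the particular selectors, classifiers and the number of retained genes here are illustrative choices, not the thesis's exact setup.

```python
# Sketch: compare selector/classifier combinations with 5-fold cross-validation.
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

selectors = {"anova": f_classif, "mutual_info": mutual_info_classif}
classifiers = {"svm": SVC(), "knn": KNeighborsClassifier(), "rf": RandomForestClassifier()}

def compare(X, y, k=50):
    results = {}
    for s_name, score_fn in selectors.items():
        for c_name, clf in classifiers.items():
            pipe = Pipeline([("select", SelectKBest(score_fn, k=k)), ("clf", clf)])
            results[(s_name, c_name)] = cross_val_score(pipe, X, y, cv=5).mean()
    return results   # mean accuracy for each selector/classifier pair
```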

  11. Clustering and Feature Selection using Sparse Principal Component Analysis

    OpenAIRE

    Luss, Ronny; d'Aspremont, Alexandre

    2007-01-01

    In this paper, we study the application of sparse principal component analysis (PCA) to clustering and feature selection problems. Sparse PCA seeks sparse factors, or linear combinations of the data variables, explaining a maximum amount of variance in the data while having only a limited number of nonzero coefficients. PCA is often used as a simple clustering technique and sparse factors allow us here to interpret the clusters in terms of a reduced set of variables. We begin with a brief int...

  12. Feature selection for face recognition: a memetic algorithmic approach

    Institute of Scientific and Technical Information of China (English)

    Dinesh KUMAR; Shakti KUMAR; C. S. RAI

    2009-01-01

    The eigenface method that uses principal component analysis (PCA) has been the standard and popular method used in face recognition. This paper presents a PCA-memetic algorithm (PCA-MA) approach for feature selection. PCA has been extended by MAs where the former was used for feature extraction/dimensionality reduction and the latter exploited for feature selection. Simulations were performed over ORL and YaleB face databases using Euclidean norm as the classifier. It was found that as far as the recognition rate is concerned, PCA-MA completely outperforms the eigenface method. We compared the performance of PCA extended with genetic algorithm (PCA-GA) with our proposed PCA-MA method. The results also clearly established the supremacy of the PCA-MA method over the PCA-GA method. We further extended linear discriminant analysis (LDA) and kernel principal component analysis (KPCA) approaches with the MA and observed significant improvement in recognition rate with fewer features. This paper also compares the performance of PCA-MA, LDA-MA and KPCA-MA approaches.

  13. Linear feature detection based on ridgelet

    Institute of Scientific and Technical Information of China (English)

    HOU Biao (侯彪); LIU Fang (刘芳); JIAO Licheng (焦李成)

    2003-01-01

    Linear feature detection is very important in image processing. The detection efficiency directly affects the performance of pattern recognition and pattern classification. Based on the idea of the ridgelet, this paper presents a new discrete localized ridgelet transform and a new method for detecting linear features in anisotropic images. Experimental results prove the efficiency of the proposed method.

  14. Fault Feature Selection Method for Axial Piston Pump Based on Quantum Genetic Algorithm

    Institute of Scientific and Technical Information of China (English)

    李胜; 张培林; 李兵; 王国德

    2014-01-01

    In order to reduce the feature dimension, shorten calculation time and improve classification accuracy, a fault feature selection method for the axial piston pump was proposed based on a quantum genetic algorithm (QGA). In this method, chromosomes were coded by quantum bits and the population was updated with quantum gates. Firstly, the vibration signals of the axial piston pump were decomposed by the wavelet packet transform, and statistical features were extracted from the original signals and each wavelet coefficient. Then, the optimal feature set was selected from the original feature set by the QGA. Finally, using a neural network as the classifier, the optimal feature set was used as input for fault diagnosis. The proposed method was used to distinguish three operating states of the axial piston pump: normal, wear between the cylinder block and the valve plate, and looseness of the piston slipper. The experimental results show that, compared with a common genetic algorithm, the QGA can reduce the feature dimension more effectively and greatly improve classification accuracy.

  15. Discriminative multi-task feature selection for multi-modality classification of Alzheimer's disease.

    Science.gov (United States)

    Ye, Tingting; Zu, Chen; Jie, Biao; Shen, Dinggang; Zhang, Daoqiang

    2016-09-01

    Recently, multi-task based feature selection methods have been used in multi-modality based classification of Alzheimer's disease (AD) and its prodromal stage, i.e., mild cognitive impairment (MCI). However, in traditional multi-task feature selection methods, some useful discriminative information among subjects is usually not well mined for further improving the subsequent classification performance. Accordingly, in this paper, we propose a discriminative multi-task feature selection method to select the most discriminative features for multi-modality based classification of AD/MCI. Specifically, for each modality, we train a linear regression model using the corresponding modality of data, and further enforce the group-sparsity regularization on weights of those regression models for joint selection of common features across multiple modalities. Furthermore, we propose a discriminative regularization term based on the intra-class and inter-class Laplacian matrices to better use the discriminative information among subjects. To evaluate our proposed method, we perform extensive experiments on 202 subjects, including 51 AD patients, 99 MCI patients, and 52 healthy controls (HC), from the baseline MRI and FDG-PET image data of the Alzheimer's Disease Neuroimaging Initiative (ADNI). The experimental results show that our proposed method not only improves the classification performance, but also has potential to discover the disease-related biomarkers useful for diagnosis of disease, along with the comparison to several state-of-the-art methods for multi-modality based AD/MCI classification. PMID:26311394
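
    As a rough, heavily simplified sketch of joint selection across modalities (not the authors' solver), the snippet below applies plain proximal gradient steps with a row-wise group-lasso penalty, assuming both modalities describe the same regions and share a numeric target; the step size, iteration count and regularization weight are assumptions.

```python
# Sketch: group-sparse joint feature selection over several modalities via proximal gradient.
import numpy as np

def multimodal_group_sparse_select(X_list, y, lam=0.1, lr=0.01, iters=500):
    """X_list: list of (n, d) arrays, one per modality; returns indices of kept features."""
    n, d = X_list[0].shape
    W = np.zeros((d, len(X_list)))                     # one weight column per modality
    for _ in range(iters):
        for m, X in enumerate(X_list):                 # gradient step, modality by modality
            grad = X.T @ (X @ W[:, m] - y) / n
            W[:, m] -= lr * grad
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - lr * lam / np.maximum(norms, 1e-12))
        W *= shrink                                    # row-wise soft threshold: joint selection
    return np.flatnonzero(np.linalg.norm(W, axis=1) > 1e-8)
```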

  16. Discriminative multi-task feature selection for multi-modality classification of Alzheimer's disease.

    Science.gov (United States)

    Ye, Tingting; Zu, Chen; Jie, Biao; Shen, Dinggang; Zhang, Daoqiang

    2016-09-01

    Recently, multi-task based feature selection methods have been used in multi-modality based classification of Alzheimer's disease (AD) and its prodromal stage, i.e., mild cognitive impairment (MCI). However, in traditional multi-task feature selection methods, some useful discriminative information among subjects is usually not well mined for further improving the subsequent classification performance. Accordingly, in this paper, we propose a discriminative multi-task feature selection method to select the most discriminative features for multi-modality based classification of AD/MCI. Specifically, for each modality, we train a linear regression model using the corresponding modality of data, and further enforce the group-sparsity regularization on weights of those regression models for joint selection of common features across multiple modalities. Furthermore, we propose a discriminative regularization term based on the intra-class and inter-class Laplacian matrices to better use the discriminative information among subjects. To evaluate our proposed method, we perform extensive experiments on 202 subjects, including 51 AD patients, 99 MCI patients, and 52 healthy controls (HC), from the baseline MRI and FDG-PET image data of the Alzheimer's Disease Neuroimaging Initiative (ADNI). The experimental results show that our proposed method not only improves the classification performance, but also has potential to discover the disease-related biomarkers useful for diagnosis of disease, along with the comparison to several state-of-the-art methods for multi-modality based AD/MCI classification.

  17. Discriminative multi-task feature selection for multi-modality classification of Alzheimer’s disease

    Science.gov (United States)

    Ye, Tingting; Zu, Chen; Jie, Biao

    2016-01-01

    Recently, multi-task based feature selection methods have been used in multi-modality based classification of Alzheimer’s disease (AD) and its prodromal stage, i.e., mild cognitive impairment (MCI). However, in traditional multi-task feature selection methods, some useful discriminative information among subjects is usually not well mined for further improving the subsequent classification performance. Accordingly, in this paper, we propose a discriminative multitask feature selection method to select the most discriminative features for multi-modality based classification of AD/MCI. Specifically, for each modality, we train a linear regression model using the corresponding modality of data, and further enforce the group-sparsity regularization on weights of those regression models for joint selection of common features across multiple modalities. Furthermore, we propose a discriminative regularization term based on the intra-class and inter-class Laplacian matrices to better use the discriminative information among subjects. To evaluate our proposed method, we perform extensive experiments on 202 subjects, including 51 AD patients, 99 MCI patients, and 52 healthy controls (HC), from the baseline MRI and FDG-PET image data of the Alzheimer’s Disease Neuroimaging Initiative (ADNI). The experimental results show that our proposed method not only improves the classification performance, but also has potential to discover the disease-related biomarkers useful for diagnosis of disease, along with the comparison to several state-of-the-art methods for multi-modality based AD/MCI classification. PMID:26311394

  18. Binary Matrix Shuffling Filter for Feature Selection in Neuronal Morphology Classification

    OpenAIRE

    Congwei Sun; Zhijun Dai; Hongyan Zhang; Lanzhi Li; Zheming Yuan

    2015-01-01

    A prerequisite to understanding neuronal function and characteristics is to classify neurons correctly. Existing classification techniques are usually based on structural characteristics and employ principal component analysis to reduce feature dimensionality. In this work, we set out to classify neurons based on neuronal morphology. A new feature selection method named the binary matrix shuffling filter was used in neuronal morphology classification. This method, coupled with support vector machine fo...

  19. Support Vector Classifier with enhanced feature selection for transient stability evaluation

    Directory of Open Access Journals (Sweden)

    Balasingh Selvi Arul Dora

    2009-01-01

    Full Text Available Today's power transmission systems tend to operate closer and closer to their stability limits. In this scenario, there have been continuous efforts to develop new techniques and tools for assessing the stability status of power systems. This paper presents a Support Vector Classifier (SVC) to identify the transient stability of power systems subjected to severe disturbances. The nonlinear relationship between the pre-fault, during-fault and post-fault power system parameters and the stability status of the system in the post-fault state is captured by the SVC trained offline. Significant generators are selected by feature selection based on the sensitivity of the stability margin, and the features other than generators are selected by stepwise feature selection with three-fold cross validation. The performance of the proposed SVC is demonstrated through simulations carried out on the IEEE 17-generator reduced Iowa system.
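
    The stepwise part of the selection could be approximated with scikit-learn's sequential selector, as in the hedged sketch below; the estimator, the number of retained features and the forward direction are assumptions rather than details from the paper.

```python
# Sketch: stepwise (sequential forward) feature selection with three-fold cross-validation.
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import SVC

def stepwise_select(X, y, n_features=10):
    sfs = SequentialFeatureSelector(SVC(), n_features_to_select=n_features,
                                    direction="forward", cv=3)
    sfs.fit(X, y)
    return sfs.get_support(indices=True)   # indices of the selected input features
```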

  20. Multi-features Selection in Traffic Video Event Detection Based on Genetic Algorithm

    Institute of Scientific and Technical Information of China (English)

    2013-01-01

    A new multi-feature selection method for traffic videos, based on a genetic algorithm, is proposed to improve the accuracy and speed of traffic video event detection. The method first extracts multiple features of the traffic videos so as to capture as much information about the various traffic events as possible, and then fuses those features into a high-dimensional, redundant feature vector that characterizes the video. A genetic algorithm is then used to optimize and select among the multiple features, and the optimal feature subset is finally obtained by training an SVM classifier and applied to detect and analyze traffic events. Experimental results show that this method can effectively reduce the feature dimension while improving the accuracy and speed of traffic event detection.

  1. An Adaptive Feature and Weight Selection Method Based on Gabor Image for Face Recognition

    Institute of Scientific and Technical Information of China (English)

    刘中华; 殷俊; 金忠

    2011-01-01

    To overcome the negative effects of factors such as illumination and expression on face recognition, an adaptive feature and weight selection method based on Gabor images is proposed. First, by regarding every Gabor-wavelet-transformed output image as an independent sample, 40 independent feature matrices are reconstructed from the transform results of different face images at the same scale and direction. To enhance robustness to facial expression and illumination variations, the contribution of each new feature matrix is computed adaptively by the proposed adaptive weighting method. Second, after applying the discrete cosine transform to each feature matrix, the coefficients with the most power to discriminate between classes are selected by discrimination power analysis to construct the feature vectors. Finally, linear discriminant analysis features are extracted to fulfill the recognition task. Experiments on face databases demonstrate the effectiveness of the proposed method.

  2. Ontology Based Feature Driven Development Life Cycle

    Directory of Open Access Journals (Sweden)

    Farheen Siddiqui

    2012-01-01

    Full Text Available The upcoming technology support for the semantic web promises fresh directions for the Software Engineering community. Since the semantic web has its roots in knowledge engineering, software engineers are prompted to look for applications of ontologies throughout the Software Engineering lifecycle. The internal components of a semantic web application are "lightweight" and may be held to lower quality standards than the externally visible modules; in fact, the internal components are generated from the external (ontological) components. That is why agile development approaches such as feature driven development are suitable for developing the internal components of such applications. As yet there is no particular procedure that describes the role of ontology in FDD processes. We therefore propose an ontology based feature driven development for semantic web applications that can be used from application model development through feature design and implementation. Features are precisely defined in the OWL-based domain model, and the transition from the OWL-based domain model to the feature list is defined directly by transformation rules. The ontology based overall model, in turn, can easily be validated with automated tools. Advantages of ontology-based feature driven development are also discussed.

  3. ENBFS+kNN: Hybrid ensemble classifier using entropy-based naïve Bayes with feature selection and k-nearest neighbor

    Science.gov (United States)

    Sainin, Mohd Shamrie; Alfred, Rayner; Ahmad, Faudziah

    2016-08-01

    A hybrid ensemble classifier that combines an entropy-based naive Bayes (ENB) strategy with the k-nearest neighbor (k-NN) classifier is examined. The classifiers are combined because naive Bayes supplies entropy-based prior estimates, while k-NN contributes a local, instance-based estimate suited to deferred classification. Whereas the original NB uses probabilities, this study uses entropy as the prior for class estimation. The results show that, by combining the two classifiers, the proposed technique achieves promising performance on several benchmark datasets.

  4. A Novel Feature Selection Strategy for Enhanced Biomedical Event Extraction Using the Turku System

    Directory of Open Access Journals (Sweden)

    Jingbo Xia

    2014-01-01

    Full Text Available Feature selection is of paramount importance for text-mining classifiers with high-dimensional features. The Turku Event Extraction System (TEES), the best performing tool in the GENIA BioNLP 2009/2011 shared tasks, relies heavily on such high-dimensional features. This paper describes research which, based on an implementation of an accumulated effect evaluation (AEE) algorithm applying a greedy search strategy, analyses the contribution of every single feature class in TEES with a view to identifying important features and modifying the feature set accordingly. With the updated feature set, a new system is obtained with enhanced performance, achieving an F-score of 53.27%, up from 51.21%, for Task 1 under the strict evaluation criterion, and 57.24% under the approximate span and recursive criterion.
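
    The greedy search idea behind the AEE analysis can be illustrated with a small forward-selection loop over feature classes. The feature groups, classifier and scoring below are placeholders, not the actual TEES feature classes or evaluation protocol.

```python
# Greedy forward selection over feature *groups*, stopping when no group improves
# the cross-validated score; groups, data and classifier are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=30, n_informative=6, random_state=1)
# Pretend the 30 columns form 10 feature classes of 3 features each.
groups = {f"class_{g}": list(range(3 * g, 3 * g + 3)) for g in range(10)}

selected, remaining, best_score = [], set(groups), 0.0
while remaining:
    # Try adding each remaining feature class and keep the one that helps most.
    trials = {}
    for g in remaining:
        cols = [c for s in selected + [g] for c in groups[s]]
        trials[g] = cross_val_score(LogisticRegression(max_iter=1000), X[:, cols], y, cv=5).mean()
    g_best = max(trials, key=trials.get)
    if trials[g_best] <= best_score:      # stop when no class improves the score
        break
    best_score = trials[g_best]
    selected.append(g_best)
    remaining.remove(g_best)

print("selected feature classes:", selected, "score:", round(best_score, 3))
```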

  5. A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data of Glioma

    KAUST Repository

    Abusamra, Heba

    2013-11-01

    Microarray gene expression data gained great importance in recent years due to its role in disease diagnosis and prognosis, which helps to choose the appropriate treatment plan for patients. This technology has ushered in a new era of molecular classification. Interpreting gene expression data remains a difficult problem and an active research area due to its inherent nature of "high dimensional low sample size". Such problems pose great challenges to existing classification methods. Thus, effective feature selection techniques are often needed to help correctly classify different tumor types and consequently lead to a better understanding of genetic signatures as well as improved treatment strategies. This paper presents a comparative study of state-of-the-art feature selection methods, classification methods, and combinations of them, based on gene expression data. We compared the efficiency of three different classification methods, namely support vector machines, k-nearest neighbor and random forest, and eight different feature selection methods, including information gain, twoing rule, sum minority, max minority, gini index, sum of variances, t-statistics, and one-dimensional support vector machine. Five-fold cross validation was used to evaluate the classification performance. Two publicly available gene expression data sets of glioma were used in the experiments. Results revealed the important role of feature selection in classifying gene expression data: by performing feature selection, the classification accuracy can be significantly boosted using a small number of genes. The relationship of the features selected by the different feature selection methods is investigated, and the features most frequently selected in each fold among all methods for both datasets are evaluated.
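
    A rough sketch of the evaluation protocol described above: rank features with a univariate filter, keep a small subset, and compare several classifiers under 5-fold cross-validation. Synthetic data stands in for the glioma expression matrices, and the univariate F-test is used as a stand-in for the eight filters compared in the paper.

```python
# Compare SVM, kNN and random forest with univariate feature selection inside a
# pipeline, so the filter is re-fit on each cross-validation training fold.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=2000, n_informative=20, random_state=0)

classifiers = {
    "SVM": SVC(kernel="linear"),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, clf in classifiers.items():
    pipe = make_pipeline(SelectKBest(f_classif, k=50), clf)
    acc = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{name}: {acc:.3f} (top 50 features, 5-fold CV)")
```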

  6. Facial Features for Template Matching Based Face Recognition

    Directory of Open Access Journals (Sweden)

    Chai T. Yuen

    2009-01-01

    Full Text Available Problem statement: Template matching has been a conventional method for object detection, especially facial feature detection, since the early stages of face recognition research. The presence of a moustache or beard has long affected the performance of feature detection and face recognition systems. Approach: The proposed algorithm aimed to reduce the effect of beard and moustache on facial feature detection and to introduce facial-feature-based template matching as the classification method. An automated face recognition algorithm based on the detected facial features, iris and mouth, was developed. First, the face region was located using skin color information. Next, the algorithm computed costs for each pair of iris candidates from intensity valleys as references for iris selection. For mouth detection, a color space method was used to locate the lip region, image processing methods were applied to eliminate unwanted noise, and a corner detection technique refined the exact location of the mouth. Finally, template matching was used to classify faces based on the extracted features. Results: The proposed method showed a better feature detection rate (iris = 93.06%, mouth = 95.83%) than the conventional method. Template matching achieved a recognition rate of 86.11% with acceptable processing time (0.36 sec). Conclusion: The results indicate that the elimination of moustache and beard did not affect the performance of facial feature detection. The proposed feature-based template matching significantly improved the processing time of this method in face recognition research.

  7. Cost effective approach on feature selection using genetic algorithms and fuzzy logic for diabetes diagnosis

    CERN Document Server

    Ephzibah, E P

    2011-01-01

    A way to enhance the performance of a model that combines genetic algorithms and fuzzy logic for feature selection and classification is proposed. Early, low-cost diagnosis of any disease is preferable, and diabetes is one such disease. Diabetes has become the fourth leading cause of death in developed countries and there is substantial evidence that it is reaching epidemic proportions in many developing and newly industrialized nations. In medical diagnosis, patterns consist of observable symptoms along with the results of diagnostic tests, which have various associated costs and risks. Within the automated design of pattern classifiers, the proposed system solves the feature subset selection problem: the task of identifying and selecting a useful subset of pattern-representing features from a larger set of features. Using a fuzzy rule-based classification system, the proposed approach is shown to improve classification accuracy.

  8. Bankruptcy prediction using SVM models with a new approach to combine features selection and parameter optimisation

    Science.gov (United States)

    Zhou, Ligang; Keung Lai, Kin; Yen, Jerome

    2014-03-01

    Due to the economic significance of company bankruptcy prediction for financial institutions, investors and governments, many quantitative methods have been used to develop effective prediction models. The support vector machine (SVM), a powerful classification method, has been used for this task; however, the performance of SVM is sensitive to the model form, parameter settings and feature selection. In this study, a new approach based on direct search and feature ranking technology is proposed to optimise feature selection and parameter setting for 1-norm and least-squares SVM models for bankruptcy prediction. This approach is also compared with SVM models whose parameters and features are optimised by the popular genetic algorithm technique. The experimental results on a data set with 2010 instances show that the proposed models are good alternatives for bankruptcy prediction.

  9. Feature Extraction for Breast Cancer Data Based on Geometric Algebra Theory and Feature Selection Using Differential Evolution

    Institute of Scientific and Technical Information of China (English)

    李静; 洪文学

    2014-01-01

    Feature extraction and feature selection are important issues in pattern recognition. Based on the geometric algebra representation of vectors, a new feature extraction method using the blade coefficients of geometric algebra was proposed in this study. At the same time, an improved differential evolution (DE) feature selection method was proposed to address the resulting increase in dimensionality. Simple linear discriminant analysis was used as the classifier. On a public breast cancer biomedical dataset, the 10-fold cross-validation (10 CV) classification accuracy exceeded 96% and proved superior to that obtained with the original features and with traditional feature extraction methods.
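
    A hedged sketch of differential-evolution-based feature selection with an LDA classifier and 10-fold cross-validation. The scikit-learn breast-cancer dataset and the 0.5 threshold for turning DE's continuous vector into a feature mask are assumptions; the geometric-algebra blade-coefficient features of the paper are not reproduced.

```python
# Differential evolution searches a continuous vector in [0, 1]^d; thresholding
# it at 0.5 yields a binary feature mask whose 10-fold LDA error is minimized.
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

def objective(weights):
    mask = weights > 0.5
    if not mask.any():
        return 1.0
    acc = cross_val_score(LinearDiscriminantAnalysis(), X[:, mask], y, cv=10).mean()
    return 1.0 - acc                     # DE minimizes, so return the error rate

result = differential_evolution(objective, bounds=[(0, 1)] * X.shape[1],
                                maxiter=20, popsize=10, seed=0, polish=False)
mask = result.x > 0.5
print("features kept:", int(mask.sum()), "CV accuracy:", round(1.0 - result.fun, 3))
```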

  10. Robust feature-based object tracking

    Science.gov (United States)

    Han, Bing; Roberts, William; Wu, Dapeng; Li, Jian

    2007-04-01

    Object tracking is an important component of many computer vision systems. It is widely used in video surveillance, robotics, 3D image reconstruction, medical imaging, and human computer interface. In this paper, we focus on unsupervised object tracking, i.e., without prior knowledge about the object to be tracked. To address this problem, we take a feature-based approach, i.e., using feature points (or landmark points) to represent objects. Feature-based object tracking consists of feature extraction and feature correspondence. Feature correspondence is particularly challenging since a feature point in one image may have many similar points in another image, resulting in ambiguity in feature correspondence. To resolve the ambiguity, algorithms, which use exhaustive search and correlation over a large neighborhood, have been proposed. However, these algorithms incur high computational complexity, which is not suitable for real-time tracking. In contrast, Tomasi and Kanade's tracking algorithm only searches corresponding points in a small candidate set, which significantly reduces computational complexity; but the algorithm may lose track of feature points in a long image sequence. To mitigate the limitations of the aforementioned algorithms, this paper proposes an efficient and robust feature-based tracking algorithm. The key idea of our algorithm is as below. For a given target feature point in one frame, we first find a corresponding point in the next frame, which minimizes the sum-of-squared-difference (SSD) between the two points; then we test whether the corresponding point gives large value under the so-called Harris criterion. If not, we further identify a candidate set of feature points in a small neighborhood of the target point; then find a corresponding point from the candidate set, which minimizes the SSD between the two points. The algorithm may output no corresponding point due to disappearance of the target point. Our algorithm is capable of tracking

  11. A Text Feature Selection Algorithm Based on Analysing the Relationship Between Words

    Institute of Scientific and Technical Information of China (English)

    吴双; 张文生; 徐海瑞

    2012-01-01

    Traditional feature selection methods usually use an evaluation function to select, from the original word set, the features that best distinguish the document classes. These methods are based on a vector space model that treats each word as an independent semantic unit, ignoring the relationships between words, and therefore struggle to highlight the key features of the text content. To address these shortcomings, a new text feature selection algorithm based on the relationships between words is presented. The method focuses on the words that play a key role in representing the text content, uses an association rule mining algorithm to discover relationships between words, filters the strong association rules through correlation analysis, and finally generates a feature space closely related to the category attributes. Experiments show that the method better represents the semantic content of the documents and outperforms traditional algorithms in classification.

  12. Feature Selection of Speech Emotional Recognition Based on Ant Colony Optimization Algorithm

    Institute of Scientific and Technical Information of China (English)

    杨鸿章

    2013-01-01

    Speech emotion information is high-dimensional and redundant. To improve the accuracy of speech emotion recognition, this paper puts forward a speech emotion recognition model that selects features with an ant colony optimization algorithm. The classification accuracy of a KNN classifier and the dimension of the selected feature subset form the fitness function, and the ant colony optimization algorithm provides good global search capability and multiple sub-optimal solutions. A local refinement search scheme was designed to exclude redundant features and improve the convergence rate. The method was tested on a Chinese emotional speech database and the Danish Emotional Speech database. The simulation results show that the proposed method not only eliminates redundant and useless features, reducing the feature dimension, but also improves the speech emotion recognition rate; the proposed model is therefore an effective method for speech emotion recognition.

  13. Development of in Silico Models for Predicting P-Glycoprotein Inhibitors Based on a Two-Step Approach for Feature Selection and Its Application to Chinese Herbal Medicine Screening.

    Science.gov (United States)

    Yang, Ming; Chen, Jialei; Shi, Xiufeng; Xu, Liwen; Xi, Zhijun; You, Lisha; An, Rui; Wang, Xinhong

    2015-10-01

    P-glycoprotein (P-gp) is regarded as an important factor in determining the ADMET (absorption, distribution, metabolism, elimination, and toxicity) characteristics of drugs and drug candidates. Successful prediction of P-gp inhibitors can thus lead to an improved understanding of the underlying mechanisms of both changes in the pharmacokinetics of drugs and drug-drug interactions. Therefore, there has been considerable interest in the development of in silico modeling of P-gp inhibitors in recent years. Considering that a large number of molecular descriptors are used to characterize diverse structural moleculars, efficient feature selection methods are required to extract the most informative predictors. In this work, we constructed an extensive available data set of 2428 molecules that includes 1518 P-gp inhibitors and 910 P-gp noninhibitors from multiple resources. Importantly, a two-step feature selection approach based on a genetic algorithm and a greedy forward-searching algorithm was employed to select the minimum set of the most informative descriptors that contribute to the prediction of P-gp inhibitors. To determine the best machine learning algorithm, 18 classifiers coupled with the feature selection method were compared. The top three best-performing models (flexible discriminant analysis, support vector machine, and random forest) and their ensemble model using respectively only 3, 9, 7, and 14 descriptors achieve an overall accuracy of 83.2%-86.7% for the training set containing 1040 compounds, an overall accuracy of 82.3%-85.5% for the test set containing 1039 compounds, and a prediction accuracy of 77.4%-79.9% for the external validation set containing 349 compounds. The models were further extensively validated by DrugBank database (1890 compounds). The proposed models are competitive with and in some cases better than other published models in terms of prediction accuracy and minimum number of descriptors. Applicability domain then was addressed

  14. Electronic image stabilization system based on global feature tracking

    Institute of Scientific and Technical Information of China (English)

    Zhu Juanjuan; Guo Baolong

    2008-01-01

    A new robust electronic image stabilization system is presented, which combines feature-point-tracking-based global motion estimation with Kalman-filtering-based motion compensation. First, global motion is estimated from the local motions of selected feature points. To cope with local moving objects and inevitable mismatches, a matching validation based on the stable relative distances within the point set is proposed, maintaining high accuracy and robustness. Next, the accumulated global motion parameters are corrected by Kalman filtering. Experimental results illustrate that the proposed system effectively stabilizes translational, rotational and zooming jitter and is robust to local motions.

  15. Statistical feature extraction based iris recognition system

    Indian Academy of Sciences (India)

    ATUL BANSAL; RAVINDER AGARWAL; R K SHARMA

    2016-05-01

    Iris recognition systems have been proposed by numerous researchers using different feature extraction techniques for accurate and reliable biometric authentication. In this paper, a statistical feature extraction technique based on the correlation between adjacent pixels has been proposed and implemented. A Hamming distance based metric has been used for matching. Performance of the proposed iris recognition system (IRS) has been measured by recording the false acceptance rate (FAR) and false rejection rate (FRR) at different thresholds in the distance metric. System performance has been evaluated by computing statistical features along two directions, namely, the radial direction of the circular iris region and the angular direction extending from pupil to sclera. Experiments have also been conducted to study the effect of the number of statistical parameters on FAR and FRR. Results obtained from the experiments based on different sets of statistical features of iris images show that there is a significant improvement in the equal error rate (EER) when the number of statistical parameters for feature extraction is increased from three to six. Further, it has also been found that increasing the radial/angular resolution, with normalization in place, improves the EER of the proposed iris recognition system.

  16. Reducing Sweeping Frequencies in Microwave NDT Employing Machine Learning Feature Selection

    Science.gov (United States)

    Moomen, Abdelniser; Ali, Abdulbaset; Ramahi, Omar M.

    2016-01-01

    Nondestructive Testing (NDT) assessment of materials' health condition is useful for classifying healthy from unhealthy structures or detecting flaws in metallic or dielectric structures. Performing structural health testing for coated/uncoated metallic or dielectric materials with the same testing equipment requires a testing method that can work on metallics and dielectrics, such as microwave testing. Reducing the complexity and expense associated with current diagnostic practices of microwave NDT of structural health requires an effective and intelligent approach based on the feature selection and classification techniques of machine learning. Current microwave NDT methods are in general based on measuring variation in the S-matrix over the entire operating frequency range of the sensors. For instance, assessing the health of metallic structures using a microwave sensor depends on the reflection and/or transmission coefficient measurements as a function of the sweeping frequencies of the operating band. The aim of this work is to reduce the sweeping frequencies using machine learning feature selection techniques. By treating sweeping frequencies as features, the top important features can be identified, and then only the most influential features (frequencies) are considered when building the microwave NDT equipment. The proposed method of reducing sweeping frequencies was validated experimentally using a waveguide sensor and a metallic plate with different cracks. Among the investigated feature selection techniques are information gain, gain ratio, relief, and chi-squared. The effectiveness of the selected features was validated through performance evaluations of various classification models, namely Nearest Neighbor, Neural Networks, Random Forest, and Support Vector Machine. Results showed good crack classification accuracy rates after employing the feature selection algorithms.

  17. Reducing Sweeping Frequencies in Microwave NDT Employing Machine Learning Feature Selection.

    Science.gov (United States)

    Moomen, Abdelniser; Ali, Abdulbaset; Ramahi, Omar M

    2016-04-19

    Nondestructive Testing (NDT) assessment of materials' health condition is useful for classifying healthy from unhealthy structures or detecting flaws in metallic or dielectric structures. Performing structural health testing for coated/uncoated metallic or dielectric materials with the same testing equipment requires a testing method that can work on metallics and dielectrics, such as microwave testing. Reducing the complexity and expense associated with current diagnostic practices of microwave NDT of structural health requires an effective and intelligent approach based on the feature selection and classification techniques of machine learning. Current microwave NDT methods are in general based on measuring variation in the S-matrix over the entire operating frequency range of the sensors. For instance, assessing the health of metallic structures using a microwave sensor depends on the reflection and/or transmission coefficient measurements as a function of the sweeping frequencies of the operating band. The aim of this work is to reduce the sweeping frequencies using machine learning feature selection techniques. By treating sweeping frequencies as features, the top important features can be identified, and then only the most influential features (frequencies) are considered when building the microwave NDT equipment. The proposed method of reducing sweeping frequencies was validated experimentally using a waveguide sensor and a metallic plate with different cracks. Among the investigated feature selection techniques are information gain, gain ratio, relief, and chi-squared. The effectiveness of the selected features was validated through performance evaluations of various classification models, namely Nearest Neighbor, Neural Networks, Random Forest, and Support Vector Machine. Results showed good crack classification accuracy rates after employing the feature selection algorithms.
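
    The core idea, treating each swept frequency as a feature and keeping only the most informative ones, can be sketched as below. The synthetic data, the 401-point sweep and the mutual-information filter are assumptions standing in for the measured S-parameter data and the filters compared in the paper.

```python
# Rank sweeping frequencies by a filter criterion and check how few are needed.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Rows = specimens, columns = |S11| samples at 401 swept frequencies (synthetic).
X, y = make_classification(n_samples=120, n_features=401, n_informative=10, random_state=0)

scores = mutual_info_classif(X, y, random_state=0)
top = np.argsort(scores)[::-1]           # frequencies ordered by informativeness

for k in (5, 20, 401):
    acc = cross_val_score(SVC(), X[:, top[:k]], y, cv=5).mean()
    print(f"top {k:3d} frequencies -> CV accuracy {acc:.3f}")
```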

  18. A new ensemble feature selection and its application to pattern classification

    Institute of Scientific and Technical Information of China (English)

    Dongbo ZHANG; Yaonan WANG

    2009-01-01

    A neural network ensemble based on rough set reducts is proposed to decrease the computational complexity of the conventional ensemble feature selection algorithm. First, a dynamic reduction technique combining a genetic algorithm with a resampling method is adopted to obtain reducts with good generalization ability. Second, multiple BP neural networks based on different reducts are built as base classifiers. Following the idea of selective ensemble, the neural network ensemble with the best generalization ability is found by search strategies. Finally, classification is implemented by combining the predictions of the component networks by voting. The method has been verified in experiments on remote sensing image classification and five UCI datasets. Compared with conventional ensemble feature selection algorithms, it requires less time and lower computational complexity, and its classification accuracy is satisfactory.

  19. On the selection of optimal feature region set for robust digital image watermarking.

    Science.gov (United States)

    Tsai, Jen-Sheng; Huang, Win-Bin; Kuo, Yau-Hwang

    2011-03-01

    A novel feature region selection method for robust digital image watermarking is proposed in this paper. This method aims to select a nonoverlapping feature region set which has the greatest robustness against various attacks and preserves image quality as much as possible after watermarking. It first performs a simulated attacking procedure using some predefined attacks to evaluate the robustness of every candidate feature region. According to the evaluation results, it then adopts a track-with-pruning procedure to search for a minimal primary feature set which can resist the most predefined attacks. To enhance its resistance to undefined attacks under the constraint of preserving image quality, the primary feature set is then extended by adding some auxiliary feature regions. This is formulated as a multidimensional knapsack problem and solved by a genetic algorithm based approach. The experimental results for StirMark attacks on some benchmark images support our expectation that the primary feature set can resist all the predefined attacks and that its extension enhances the robustness against undefined attacks. Compared with some well-known feature-based methods, the proposed method exhibits better performance in robust digital watermarking.

  20. Stable feature selection for clinical prediction: exploiting ICD tree structure using Tree-Lasso.

    Science.gov (United States)

    Kamkar, Iman; Gupta, Sunil Kumar; Phung, Dinh; Venkatesh, Svetha

    2015-02-01

    Modern healthcare is getting reshaped by growing Electronic Medical Records (EMR). Recently, these records have been shown to be of great value for building clinical prediction models. In EMR data, patients' diseases and hospital interventions are captured through a set of diagnosis and procedure codes. These codes are usually represented in a tree form (e.g. the ICD-10 tree) and the codes within a tree branch may be highly correlated. These codes can be used as features to build a prediction model, and an appropriate feature selection can inform a clinician about important risk factors for a disease. Traditional feature selection methods (e.g. Information Gain, T-test, etc.) consider each variable independently and usually end up with a long feature list. Recently, Lasso and related l1-penalty based feature selection methods have become popular due to their joint feature selection property. However, Lasso is known to select one of many correlated features essentially at random. This hinders clinicians from arriving at a stable feature set, which is crucial for the clinical decision-making process. In this paper, we solve this problem by using a recently proposed Tree-Lasso model. Since the stability behavior of Tree-Lasso is not well understood, we study the stability behavior of Tree-Lasso and compare it with other feature selection methods. Using a synthetic and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso based feature selection is significantly more stable than Lasso and comparable to other methods, e.g. Information Gain, ReliefF and T-test. We further show that, using different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than other methods. Our result has implications in identifying stable risk factors for many healthcare problems and therefore can
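
    A small sketch of the kind of selection-stability analysis discussed above. Tree-Lasso itself is not available in scikit-learn, so plain L1-penalised logistic regression is used as a stand-in, and stability is estimated as the mean Jaccard overlap between the feature sets selected on bootstrap resamples.

```python
# Stability of L1-based feature selection measured over bootstrap resamples.
from itertools import combinations
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=100, n_informative=10, random_state=0)
rng = np.random.default_rng(0)

def selected_set(Xb, yb):
    # Features with nonzero coefficients under an L1 penalty are "selected".
    model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(Xb, yb)
    return frozenset(np.flatnonzero(model.coef_[0]))

subsets = []
for _ in range(20):                       # 20 bootstrap resamples
    idx = rng.integers(0, len(y), len(y))
    subsets.append(selected_set(X[idx], y[idx]))

jaccard = [len(a & b) / len(a | b) for a, b in combinations(subsets, 2) if a | b]
print("mean selection stability (Jaccard):", round(float(np.mean(jaccard)), 3))
```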

  1. Feature Selection, Flaring Size and Time-to-Flare Prediction Using Support Vector Regression, and Automated Prediction of Flaring Behavior Based on Spatio-Temporal Measures Using Hidden Markov Models

    Science.gov (United States)

    Al-Ghraibah, Amani

    Solar flares release stored magnetic energy in the form of radiation and can have significant detrimental effects on earth, including damage to technological infrastructure. Recent work has considered methods to predict future flare activity on the basis of quantitative measures of the solar magnetic field. Accurate advanced warning of solar flare occurrence is an area of increasing concern and much research is ongoing in this area. Our previous work [111] utilized standard pattern recognition and classification techniques to determine (classify) whether a region is expected to flare within a predictive time window, using a Relevance Vector Machine (RVM) classification method. We extracted 38 features describing the complexity of the photospheric magnetic field; the resulting classification metrics provide the baseline against which we compare our new work. We find a true positive rate (TPR) of 0.8, a true negative rate (TNR) of 0.7, and a true skill score (TSS) of 0.49. This dissertation proposes three basic topics. The first topic is an extension to our previous work [111], where we consider a feature selection method to determine an appropriate feature subset, with cross-validated classification based on a histogram analysis of selected features. Classification using the top five features resulting from this analysis yields better classification accuracies across a large unbalanced dataset. In particular, the feature subsets provide better discrimination of the many regions that flare: we find a TPR of 0.85, a TNR of 0.65 (slightly lower than our previous work), and a TSS of 0.5, an improvement over our previous work. In the second topic, we study the prediction of solar flare size and time-to-flare using support vector regression (SVR). When we consider flaring regions only, we find an average error in estimating flare size of approximately half a GOES class. When we additionally consider non-flaring regions, we find an increased average

  2. Clustering and Feature Selection using Sparse Principal Component Analysis

    CERN Document Server

    Luss, Ronny

    2007-01-01

    In this paper, we use sparse principal component analysis (PCA) to solve clustering and feature selection problems. Sparse PCA seeks sparse factors, or linear combinations of the data variables, explaining a maximum amount of variance in the data while having only a limited number of nonzero coefficients. PCA is often used as a simple clustering technique and sparse factors allow us here to interpret the clusters in terms of a reduced set of variables. We begin with a brief introduction and motivation on sparse PCA and detail our implementation of the algorithm in d'Aspremont et al. (2005). We finish by describing the application of sparse PCA to clustering and by a brief description of DSPCA, the numerical package used in these experiments.
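
    A brief illustration of sparse PCA used for feature interpretation: the nonzero loadings of each sparse component indicate which original variables drive it. scikit-learn's SparsePCA is used here rather than the DSPCA package referenced in the abstract, and the iris data is only a stand-in.

```python
# Sparse PCA: each component keeps only a few nonzero loadings, so clusters
# found along it can be interpreted in terms of a reduced set of variables.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import SparsePCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)

spca = SparsePCA(n_components=2, alpha=1.0, random_state=0).fit(X)
for i, comp in enumerate(spca.components_):
    active = np.flatnonzero(comp)
    print(f"component {i}: nonzero loadings on variables {active.tolist()}")
```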

  3. Relevant, irredundant feature selection and noisy example elimination.

    Science.gov (United States)

    Lashkia, George V; Anthony, Laurence

    2004-04-01

    In many real-world situations, the method for computing the desired output from a set of inputs is unknown. One strategy for solving these types of problems is to learn the input-output functionality from examples in a training set. However, in many situations it is difficult to know what information is relevant to the task at hand. Subsequently, researchers have investigated ways to deal with the so-called problem of consistency of attributes, i.e., attributes that can distinguish examples from different classes. In this paper, we first prove that the notion of relevance of attributes is directly related to the consistency of attributes, and show how relevant, irredundant attributes can be selected. We then compare different relevant attribute selection algorithms, and show the superiority of algorithms that select irredundant attributes over those that select relevant attributes. We also show that searching for an "optimal" subset of attributes, which is considered to be the main purpose of attribute selection, is not the best way to improve the accuracy of classifiers. Employing sets of relevant, irredundant attributes improves classification accuracy in many more cases. Finally, we propose a new method for selecting relevant examples, which is based on filtering the so-called pattern frequency domain. By identifying examples that are nontypical in the determination of relevant, irredundant attributes, irrelevant examples can be eliminated prior to the learning process. Empirical results using artificial and real databases show the effectiveness of the proposed method in selecting relevant examples leading to improved performance even on greatly reduced training sets.

  4. Our Selections and Decisions: Inherent Features of the Nervous System?

    Science.gov (United States)

    Rösler, Frank

    The chapter summarizes findings on the neuronal bases of decision-making. Taking the phenomenon of selection as a starting point, it is explained that systems built only from excitatory and inhibitory neurons (or neuron populations) have the emergent property of selecting between different alternatives. These considerations suggest that there exists a hierarchical architecture with central selection switches. However, in such a system, functions of selection and decision-making are not localized, but rather emerge from an interaction of several participating networks. These are, on the one hand, networks that process specific input and output representations and, on the other hand, networks that regulate the relative activation/inhibition of the specific input and output networks. These ideas are supported by recent empirical evidence. Moreover, other studies show that rather complex psychological variables, like subjective probability estimates, expected gains and losses, prediction errors, etc., do have biological correlates, i.e., they can be localized in time and space as activation states of neural networks and single cells. These findings suggest that selections and decisions are consequences of an architecture which, seen from a biological perspective, is fully deterministic. However, a transposition of such nomothetic functional principles into the idiographic domain, i.e., using them as elements for comprehensive 'mechanistic' explanations of individual decisions, seems not to be possible because of limitations in principle. Therefore, individual decisions will remain predictable by means of probabilistic models alone.

  5. Cost-Sensitive Feature Selection of Numeric Data with Measurement Errors

    Directory of Open Access Journals (Sweden)

    Hong Zhao

    2013-01-01

    Full Text Available Feature selection is an essential process in data mining applications since it reduces a model’s complexity. However, feature selection with various types of costs is still a new research topic. In this paper, we study the cost-sensitive feature selection problem of numeric data with measurement errors. The major contributions of this paper are fourfold. First, a new data model is built to address test costs and misclassification costs as well as error boundaries. It is distinguished from the existing models mainly on the error boundaries. Second, a covering-based rough set model with normal distribution measurement errors is constructed. With this model, coverings are constructed from data rather than assigned by users. Third, a new cost-sensitive feature selection problem is defined on this model. It is more realistic than the existing feature selection problems. Fourth, both backtracking and heuristic algorithms are proposed to deal with the new problem. Experimental results show the efficiency of the pruning techniques for the backtracking algorithm and the effectiveness of the heuristic algorithm. This study is a step toward realistic applications of the cost-sensitive learning.

  6. Intrinsic feature-based pose measurement for imaging motion compensation

    Science.gov (United States)

    Baba, Justin S.; Goddard, Jr., James Samuel

    2014-08-19

    Systems and methods for generating motion corrected tomographic images are provided. A method includes obtaining first images of a region of interest (ROI) to be imaged and associated with a first time, where the first images are associated with different positions and orientations with respect to the ROI. The method also includes defining an active region in each of the first images and selecting intrinsic features in each of the first images based on the active region. The method then includes identifying the portion of the intrinsic features that temporally and spatially match intrinsic features in corresponding second images of the ROI, associated with a second time prior to the first time, and computing three-dimensional (3D) coordinates for that portion of the intrinsic features. Finally, the method includes computing a relative pose for the first images based on the 3D coordinates.

  7. A Comparative Analysis of Swarm Intelligence Techniques for Feature Selection in Cancer Classification

    Directory of Open Access Journals (Sweden)

    Chellamuthu Gunavathi

    2014-01-01

    Full Text Available Feature selection in cancer classification is a central area of research in the field of bioinformatics and used to select the informative genes from thousands of genes of the microarray. The genes are ranked based on T-statistics, signal-to-noise ratio (SNR, and F-test values. The swarm intelligence (SI technique finds the informative genes from the top-m ranked genes. These selected genes are used for classification. In this paper the shuffled frog leaping with Lévy flight (SFLLF is proposed for feature selection. In SFLLF, the Lévy flight is included to avoid premature convergence of shuffled frog leaping (SFL algorithm. The SI techniques such as particle swarm optimization (PSO, cuckoo search (CS, SFL, and SFLLF are used for feature selection which identifies informative genes for classification. The k-nearest neighbour (k-NN technique is used to classify the samples. The proposed work is applied on 10 different benchmark datasets and examined with SI techniques. The experimental results show that the results obtained from k-NN classifier through SFLLF feature selection method outperform PSO, CS, and SFL.

  8. A comparative analysis of swarm intelligence techniques for feature selection in cancer classification.

    Science.gov (United States)

    Gunavathi, Chellamuthu; Premalatha, Kandasamy

    2014-01-01

    Feature selection in cancer classification is a central area of research in the field of bioinformatics and used to select the informative genes from thousands of genes of the microarray. The genes are ranked based on T-statistics, signal-to-noise ratio (SNR), and F-test values. The swarm intelligence (SI) technique finds the informative genes from the top-m ranked genes. These selected genes are used for classification. In this paper the shuffled frog leaping with Lévy flight (SFLLF) is proposed for feature selection. In SFLLF, the Lévy flight is included to avoid premature convergence of shuffled frog leaping (SFL) algorithm. The SI techniques such as particle swarm optimization (PSO), cuckoo search (CS), SFL, and SFLLF are used for feature selection which identifies informative genes for classification. The k-nearest neighbour (k-NN) technique is used to classify the samples. The proposed work is applied on 10 different benchmark datasets and examined with SI techniques. The experimental results show that the results obtained from k-NN classifier through SFLLF feature selection method outperform PSO, CS, and SFL.

  9. Spatiotemporal Features for Asynchronous Event-based Data

    Directory of Open Access Journals (Sweden)

    Xavier eLagorce

    2015-02-01

    Full Text Available Bio-inspired asynchronous event-based vision sensors are currently introducing a paradigm shift in visual information processing. These new sensors rely on a stimulus-driven principle of light acquisition similar to biological retinas. They are event-driven and fully asynchronous, thereby reducing redundancy and encoding exact times of input signal changes, leading to a very precise temporal resolution. Approaches for higher-level computer vision often rely on the reliable detection of features in visual frames, but similar definitions of features for the novel dynamic and event-based visual input representation of silicon retinas have so far been lacking. This article addresses the problem of learning and recognizing features for event-based vision sensors, which capture properties of truly spatiotemporal volumes of sparse visual event information. A novel computational architecture for learning and encoding spatiotemporal features is introduced based on a set of predictive recurrent reservoir networks, competing via winner-take-all selection. Features are learned in an unsupervised manner from real-world input recorded with event-based vision sensors. It is shown that the networks in the architecture learn distinct and task-specific dynamic visual features, and can predict their trajectories over time.

  10. Facial Feature Extraction Based on Wavelet Transform

    Science.gov (United States)

    Hung, Nguyen Viet

    Facial feature extraction is one of the most important processes in face recognition, expression recognition and face detection. The aims of facial feature extraction are to locate the eyes, the shape of the eyes, the eyebrows, the mouth, the head boundary, the face boundary, the chin and so on. The purpose of this paper is to develop an automatic facial feature extraction system which is able to identify the eye locations, the detailed shape of the eyes and mouth, the chin and the inner boundary from facial images. The system not only extracts the location information of the eyes, but also estimates four important points in each eye, which helps us to rebuild the eye shape. For the mouth shape, mouth extraction gives us the mouth location, the two corners of the mouth, and the top and bottom lips. From the inner boundary and the chin, we obtain the face boundary. Based on wavelet features, we can reduce the noise in the input image and detect edge information. To extract the eyes, mouth and inner boundary, we combine wavelet features and facial characteristics to design algorithms for finding the midpoint, the eye coordinates, the four important eye points, the mouth coordinates, the four important mouth points, the chin coordinate and then the inner boundary. The developed system is tested on Yale Faces and Pedagogy student's faces.

  11. Partial fingerprint matching based on SIFT Features

    Directory of Open Access Journals (Sweden)

    Ms. S.Malathi,

    2010-07-01

    Full Text Available Fingerprints are being extensively used for person identification in a number of commercial, civil, and forensic applications. While current fingerprint matching technology is quite mature for matching full prints, matching partial fingerprints still needs much improvement. Most of the current fingerprint identification systems utilize features that are based on minutiae points and ridge patterns, and this technology suffers from the problem of handling incomplete prints, often discarding any partial fingerprints obtained. The major challenges faced in partial fingerprint matching are the absence of sufficient minutiae features and of other structures such as core and delta. Recent research has begun to delve into the problems of latent or partial fingerprints. In this paper we present a novel approach for partial fingerprint matching based on SIFT (Scale Invariant Feature Transform) features, where matching is achieved using a modified point matching process. Using the Neurotechnology database, we demonstrate that the proposed method exhibits improved performance when matching a full print against a partial print.
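
    A hedged OpenCV sketch of SIFT-based matching between a partial and a full print. Lowe's ratio test stands in for the modified point-matching process of the paper, and the image file names are placeholders.

```python
# SIFT keypoints + brute-force matching with a ratio test; file names are
# hypothetical and the score is only a toy similarity measure.
import cv2

partial = cv2.imread("partial_print.png", cv2.IMREAD_GRAYSCALE)   # placeholder file
full = cv2.imread("full_print.png", cv2.IMREAD_GRAYSCALE)         # placeholder file

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(partial, None)
kp2, des2 = sift.detectAndCompute(full, None)

# Lowe's ratio test discards ambiguous correspondences.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2) if m.distance < 0.75 * n.distance]
score = len(good) / max(len(kp1), 1)
print(f"{len(good)} good matches, match score {score:.2f}")
```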

  12. Gender Recognition Based on Sift Features

    CERN Document Server

    Yousefi, Sahar

    2011-01-01

    This paper proposes a robust approach for face detection and gender classification in color images. Previous research on gender recognition has assumed an expensive, computationally intensive and time-consuming pre-processing alignment step, in which face images are aligned so that facial landmarks like the eyes, nose, lips and chin are placed in uniform locations in the image. In this paper, a novel technique based on mathematical analysis is presented in three stages that eliminates the alignment step. First, a new color based face detection method is presented, with better results and more robustness in complex backgrounds. Next, features which are invariant to affine transformations are extracted from each face using the scale invariant feature transform (SIFT) method. To evaluate the performance of the proposed algorithm, experiments have been conducted by employing an SVM classifier on a database of face images which contains 500 images from distinct people with an equal ratio of male and female.

  13. Harnessing the Power of GPUs to Speed Up Feature Selection for Outlier Detection

    Institute of Scientific and Technical Information of China (English)

    Fatemeh Azmandian; Ayse Yilmazer; Jennifer G Dy; Javed A Aslam; David R Kaeli

    2014-01-01

    Acquiring a set of features that emphasize the differences between normal data points and outliers can drastically facilitate the task of identifying outliers. In our work, we present a novel non-parametric evaluation criterion for filter-based feature selection which has an eye towards the final goal of outlier detection. The proposed method seeks the subset of features that represent the inherent characteristics of the normal dataset while forcing outliers to stand out, making them more easily distinguished by outlier detection algorithms. Experimental results on real datasets show the advantage of our feature selection algorithm compared with popular and state-of-the-art methods. We also show that the proposed algorithm is able to overcome the small sample space problem and perform well on highly imbalanced datasets. Furthermore, due to the highly parallelizable nature of the feature selection, we implement the algorithm on a graphics processing unit (GPU) to gain significant speedup over the serial version. The benefits of the GPU implementation are two-fold, as its performance scales very well in terms of the number of features, as well as the number of data points.

  14. Implementation of DB-Scan in Multi-Type Feature CoSelection for Clustering

    Directory of Open Access Journals (Sweden)

    K.Parimala

    2013-01-01

    Full Text Available Feature selection is a preprocessing technique in supervised learning for improving predictive accuracy while reducing dimensionality in clustering and categorization. Multitype Feature Coselection for Clustering (MFCC) with hard k-means is an algorithm that uses intermediate results in one type of feature space to enhance feature selection in the other spaces, so that a better feature set is co-selected from heterogeneous features to produce better clusters in each space. DB-Scan is a density-based clustering algorithm that finds a number of clusters starting from the estimated density distribution of the corresponding nodes; it is one of the most common clustering algorithms and among the most cited in the scientific literature, and its generalization to multiple ranges effectively replaces the ε parameter with a maximum search radius. This paper presents empirical results of the MFCC algorithm with DB-Scan and also compares MFCC with hard k-means against MFCC with DB-Scan. DB-Scan clustering is proposed for obtaining quality clusters in the presence of outliers, with a lower time requirement than other clustering methods on high-density data sets.
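
    A minimal DBSCAN example showing its two parameters, the search radius eps and the minimum neighbourhood size, on synthetic data; the document collections and MFCC features of the paper are not reproduced.

```python
# DBSCAN labels low-density points as noise (-1) instead of forcing them into clusters.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)
labels = DBSCAN(eps=0.4, min_samples=5).fit_predict(X)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"{n_clusters} clusters found, {np.sum(labels == -1)} points flagged as noise")
```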

  15. A Local Asynchronous Distributed Privacy Preserving Feature Selection Algorithm for Large Peer-to-Peer Networks

    Data.gov (United States)

    National Aeronautics and Space Administration — In this paper we develop a local distributed privacy preserving algorithm for feature selection in a large peer-to-peer environment. Feature selection is often used...

  16. Jointly Feature Learning and Selection for Robust Tracking via a Gating Mechanism.

    Science.gov (United States)

    Zhong, Bineng; Zhang, Jun; Wang, Pengfei; Du, Jixiang; Chen, Duansheng

    2016-01-01

    To achieve effective visual tracking, a robust feature representation composed of two separate components (i.e., feature learning and selection) for an object is one of the key issues. Typically, a common assumption used in visual tracking is that the raw video sequences are clear, while real-world data comes with significant noise and irrelevant patterns. Consequently, the learned features may be noisy and not all relevant. To address this problem, we propose a novel visual tracking method via a point-wise gated convolutional deep network (CPGDN) that jointly performs the feature learning and feature selection in a unified framework. The proposed method performs dynamic feature selection on raw features through a gating mechanism. Therefore, the proposed method can adaptively focus on the task-relevant patterns (i.e., a target object), while ignoring the task-irrelevant patterns (i.e., the surrounding background of a target object). Specifically, inspired by transfer learning, we first pre-train an object appearance model offline to learn generic image features and then transfer rich feature hierarchies from the offline pre-trained CPGDN into online tracking. In online tracking, the pre-trained CPGDN model is fine-tuned to adapt to the specific tracked objects. Finally, to alleviate the tracker drifting problem, inspired by the observation that a visual target should be an object rather than not, we combine an edge box-based object proposal method to further improve the tracking accuracy. Extensive evaluation on the widely used CVPR2013 tracking benchmark validates the robustness and effectiveness of the proposed method.

  18. Arabic writer identification based on diacritic's features

    Science.gov (United States)

    Maliki, Makki; Al-Jawad, Naseer; Jassim, Sabah A.

    2012-06-01

    Natural languages like Arabic, Kurdish, Farsi (Persian), Urdu, and other similar languages have many features which distinguish them from languages written in the Latin script. One of these important features is diacritics. These diacritics are classified as compulsory, like the dots used to identify/differentiate letters, and optional, like the short vowels used to emphasize consonants. Most native, well trained writers often omit some or all of this second class of diacritics, and expert readers can infer their presence from the context of the written text. In this paper, we investigate the use of diacritic shapes and other characteristics as parameters of feature vectors for Arabic writer identification/verification. Segmentation techniques are used to extract the diacritic-based feature vectors from examples of Arabic handwritten text. The results of an evaluation test, carried out on an in-house database of 50 writers, are presented, and the viability of using diacritics for writer recognition is demonstrated.

  19. FEATURE EXTRACTION FOR EMG BASED PROSTHESES CONTROL

    Directory of Open Access Journals (Sweden)

    R. Aishwarya

    2013-01-01

    Full Text Available The control of a prosthetic limb would be more effective if it were based on Surface Electromyogram (SEMG) signals from remnant muscles. The analysis of SEMG signals depends on a number of factors, such as amplitude as well as time- and frequency-domain properties. Time series analysis using an Auto Regressive (AR) model and the mean frequency, which is tolerant to white Gaussian noise, are used as feature extraction techniques. The EMG histogram is used as another feature vector, which was seen to give a more distinct classification. The work was done with an SEMG dataset obtained from the NINAPRO DATABASE, a resource for the biorobotics community. Eight classes of hand movements (hand open, hand close, wrist extension, wrist flexion, pointing index, ulnar deviation, thumbs up, thumb opposite to little finger) are taken into consideration and feature vectors are extracted. The feature vectors can be given to an artificial neural network for further classification in controlling the prosthetic arm, which is not dealt with in this paper.
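
    Two of the features mentioned above, AR-model coefficients and the mean frequency, can be computed for a single SEMG window roughly as follows; the sampling rate, window length and synthetic signal are assumptions rather than NINAPRO data.

```python
# 4th-order AR coefficients by least squares and the spectral mean frequency of
# one (synthetic) SEMG window; both would be stacked into a feature vector.
import numpy as np

rng = np.random.default_rng(0)
fs = 2000.0                           # assumed sampling rate in Hz
x = rng.standard_normal(400)          # one 200 ms window of placeholder SEMG

def ar_coefficients(signal, order=4):
    # Predict x[n] from its previous `order` samples: solve min ||X a - y||.
    X = np.column_stack([signal[order - k - 1:len(signal) - k - 1] for k in range(order)])
    y = signal[order:]
    return np.linalg.lstsq(X, y, rcond=None)[0]

def mean_frequency(signal, fs):
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return float(np.sum(freqs * spectrum) / np.sum(spectrum))

print("AR(4) coefficients:", np.round(ar_coefficients(x), 3))
print("mean frequency (Hz):", round(mean_frequency(x, fs), 1))
```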

  20. BROAD PHONEME CLASSIFICATION USING SIGNAL BASED FEATURES

    Directory of Open Access Journals (Sweden)

    Deekshitha G

    2014-12-01

    Full Text Available Speech is the most efficient and popular means of human communication. Speech is produced as a sequence of phonemes, and phoneme recognition is the first step performed by an automatic speech recognition system. State-of-the-art recognizers use mel-frequency cepstral coefficient (MFCC) features derived through short-time analysis, for which the recognition accuracy is limited. Instead, broad phoneme classification is achieved here using features derived directly from the speech at the signal level. Broad phoneme classes include vowels, nasals, fricatives, stops, approximants and silence. The features identified as useful for broad phoneme classification are the voiced/unvoiced decision, zero crossing rate (ZCR), short-time energy, most dominant frequency, energy in the most dominant frequency, spectral flatness measure and the first three formants. Features derived from short-time frames of training speech are used to train a multilayer feedforward neural network classifier with manually marked class labels as output, and the classification accuracy is then tested. This broad phoneme classifier is later used for broad syllable structure prediction, which is useful for applications such as automatic speech recognition and automatic language identification.
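
    Three of the signal-level features listed above (zero crossing rate, short-time energy and spectral flatness) can be computed per frame roughly as follows; the frame size, hop size and placeholder audio are assumptions, and the resulting per-frame vectors would feed a multilayer feedforward classifier.

```python
# Per-frame signal-level features for broad phoneme classification (a sketch).
import numpy as np

rng = np.random.default_rng(0)
fs = 16000
speech = rng.standard_normal(fs)                     # 1 s of placeholder audio
frame_len, hop = 400, 160                            # 25 ms frames, 10 ms hop

def frame_features(frame):
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0          # zero crossing rate
    energy = float(np.sum(frame ** 2))                            # short-time energy
    spectrum = np.abs(np.fft.rfft(frame)) + 1e-12
    flatness = float(np.exp(np.mean(np.log(spectrum))) / np.mean(spectrum))
    return [zcr, energy, flatness]

features = np.array([frame_features(speech[i:i + frame_len])
                     for i in range(0, len(speech) - frame_len, hop)])
print(features.shape)    # (n_frames, 3) -> per-frame input to an MLP classifier
```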

  1. Detection of approximately duplicated records based on entropy feature selection grouping clustering

    Institute of Scientific and Technical Information of China (English)

    张平; 党选举; 陈皓; 杨文雷

    2011-01-01

    At present, approximately duplicated records in massive data sets cannot be detected effectively by current methods, so an algorithm based on entropy feature selection grouping clustering (FSGC) is proposed. The basic idea is that, by constructing an entropy metric based on the similarity between objects, the importance of each property can be evaluated and a key property subset obtained. The data set is split into small data sets according to the key properties, and the approximately duplicated records are then identified with the density-based spatial clustering of applications with noise (DBSCAN) algorithm. Theoretical analysis and experimental results show that the identification precision and detection efficiency of the method are high and that it can effectively solve the problem of identifying approximately duplicated records in massive data sets.
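    The sketch below illustrates the overall idea under simplifying assumptions: attributes are scored with an entropy-style measure, records are grouped on the selected key attribute, and DBSCAN is run inside each group; the scoring formula, toy table and DBSCAN parameters are placeholders, not the paper's exact definitions.

    ```python
    import numpy as np
    from sklearn.cluster import DBSCAN

    def attribute_entropy(col):
        """Shannon entropy of one attribute's value distribution."""
        _, counts = np.unique(col, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    # Toy record table: rows are records, columns are attributes.
    rng = np.random.default_rng(2)
    records = rng.integers(0, 3, size=(60, 4)).astype(float)
    records[:, 0] = rng.integers(0, 2, size=60)          # a low-entropy "key" attribute

    # Pick a key attribute (here: the lowest-entropy one) and split records by its value.
    key = int(np.argmin([attribute_entropy(records[:, j]) for j in range(records.shape[1])]))
    for value in np.unique(records[:, key]):
        group = records[records[:, key] == value]
        labels = DBSCAN(eps=1.0, min_samples=3).fit_predict(group)
        # Records sharing a DBSCAN cluster label are candidate approximate duplicates.
        print(f"group {value}: {len(set(labels) - {-1})} candidate duplicate clusters")
    ```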

  2. Wetland remote sensing image classification method based on multi-feature selection and multi-classifiers combination

    Institute of Scientific and Technical Information of China (English)

    李畅; 刘鹏程

    2012-01-01

    Taking the characteristics of wetland remote sensing images into account, typical feature selection is discussed and a wetland classification method combining multiple classifiers is proposed. The independent component, texture, lake clarity, NDWI, GVI and WI features of the wetland image are extracted. Minimum Euclidean distance, spectral angle mapper, Bayes and support vector machine classifiers are each trained on samples. Weights for every classifier are derived from its confusion matrix, and whether the samples meet a normal distribution is tested. A multi-classifier combination based on a decision network is generated from the weights and the hypothesis test results. The experimental results show that the presented method has better performance and higher accuracy than the traditional single-classifier method.

  3. Dermoscopy analysis of RGB-images based on comparative features

    Science.gov (United States)

    Myakinin, Oleg O.; Zakharov, Valery P.; Bratchenko, Ivan A.; Artemyev, Dmitry N.; Neretin, Evgeny Y.; Kozlov, Sergey V.

    2015-09-01

    In this paper, we propose an algorithm for color and texture analysis of dermoscopic images of human skin based on Haar wavelets, Local Binary Patterns (LBP) and histogram analysis. This approach is a modification of the «7-point checklist» clinical method. It is an "absolute" diagnostic method in the sense that it uses only features extracted from the tumor's ROI (Region of Interest), which can be selected manually and/or using a special algorithm. We propose additional features extracted from the same image for comparative analysis of tumor and healthy skin. We used Euclidean distance, cosine similarity, and the Tanimoto coefficient as comparison metrics between the color and texture features extracted separately from the tumor's and the healthy skin's ROI. A classifier for separating melanoma images from other tumors was built with the SVM (Support Vector Machine) algorithm. Classification errors with and without comparative features between skin and tumor were analyzed, and a significant increase of recognition quality with comparative features was demonstrated. Moreover, we analyzed two modes (manual and automatic) for ROI selection on tumor and healthy skin areas. We reached 91% sensitivity using comparative features, in contrast with 77% sensitivity using only the "absolute" method. The specificity was unchanged (94%) in both cases.
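    A small sketch of the three comparison metrics named above, applied to one pair of feature vectors from a tumour ROI and a healthy-skin ROI; the vectors themselves are placeholders, not features produced by the described pipeline.

    ```python
    import numpy as np

    def comparative_features(tumor_vec, skin_vec):
        """Euclidean distance, cosine similarity and Tanimoto coefficient between two vectors."""
        euclidean = np.linalg.norm(tumor_vec - skin_vec)
        cosine = np.dot(tumor_vec, skin_vec) / (np.linalg.norm(tumor_vec) * np.linalg.norm(skin_vec))
        tanimoto = np.dot(tumor_vec, skin_vec) / (
            np.dot(tumor_vec, tumor_vec) + np.dot(skin_vec, skin_vec) - np.dot(tumor_vec, skin_vec))
        return np.array([euclidean, cosine, tanimoto])

    # Placeholder colour/texture feature vectors for the two regions of interest.
    tumor = np.array([0.8, 0.1, 0.4, 0.7])
    skin = np.array([0.5, 0.2, 0.3, 0.9])
    print(comparative_features(tumor, skin))
    ```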

  4. BUILDING ROBUST APPEARANCE MODELS USING ON-LINE FEATURE SELECTION

    Energy Technology Data Exchange (ETDEWEB)

    PORTER, REID B. [Los Alamos National Laboratory]; LOVELAND, ROHAN [Los Alamos National Laboratory]; ROSTEN, ED [Los Alamos National Laboratory]

    2007-01-29

    In many tracking applications, adapting the target appearance model over time can improve performance. This approach is most popular in high frame rate video applications where latent variables, related to the object's appearance (e.g., orientation and pose), vary slowly from one frame to the next. In these cases the appearance model and the tracking system are tightly integrated, and latent variables are often included as part of the tracking system's dynamic model. In this paper we describe our efforts to track cars in low frame rate data (1 frame/second) acquired from a highly unstable airborne platform. Due to the low frame rate and poor image quality, the appearance of a particular vehicle varies greatly from one frame to the next. This leads us to a different problem: how can we build the best appearance model from all instances of a vehicle we have seen so far? The best appearance model should maximize the future performance of the tracking system and maximize the chances of reacquiring the vehicle once it leaves the field of view. We propose an online feature selection approach to this problem and investigate the performance and computational trade-offs with a real-world dataset.

  5. GENDER RECOGNITION BASED ON SIFT FEATURES

    Directory of Open Access Journals (Sweden)

    Sahar Yousefi

    2011-08-01

    Full Text Available This paper proposes a robust approach for face detection and gender classification in color images. Previous research on gender recognition assumes an expensive, computationally time-consuming pre-processing step for alignment, in which face images are aligned so that facial landmarks like the eyes, nose, lips and chin are placed in uniform locations in the image. In this paper, a novel technique based on mathematical analysis is presented in three stages that eliminates the alignment step. First, a new color-based face detection method is presented with better results and more robustness in complex backgrounds. Next, features which are invariant to affine transformations are extracted from each face using the scale invariant feature transform (SIFT) method. To evaluate the performance of the proposed algorithm, experiments have been conducted by employing an SVM classifier on a database of face images which contains 500 images from distinct people with an equal ratio of male and female.

  6. Research on Chinese Word Segmentation Based on Bi-Directional Matching Method and Feature Selection Algorithm

    Institute of Scientific and Technical Information of China (English)

    麦范金; 李东普; 岳晓光

    2011-01-01

    The bi-directional matching method is a traditional algorithm that can detect ambiguity but cannot resolve it. In order to find a better solution, this paper proposes a combined method based on the bi-directional matching method and a feature selection algorithm. Using an accumulated corpus, a Chinese word segmentation system based on the two methods was designed and implemented. Experimental results show that the new Chinese word segmentation method performs better than traditional methods.

  7. Research into a Feature Selection Method for Hyperspectral Imagery Using PSO and SVM

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    Classification and recognition of hyperspectral remote sensing images is not the same as that of conventional multi-spectral remote sensing images. We propose a novel feature selection and classification method for hyperspectral images that combines the global optimization ability of the particle swarm optimization (PSO) algorithm and the superior classification performance of a support vector machine (SVM). The global optimal search performance of PSO is improved by using a chaotic optimization search technique. A granularity-based grid search strategy is used to optimize the SVM model parameters. Parameter optimization and classification of the SVM are addressed using the training data corresponding to the feature subset. A false classification rate is adopted as the fitness function. Tests of feature selection and classification are carried out on a hyperspectral data set, and classification performance is also compared among different feature extraction methods commonly used today. Results indicate that this hybrid method has a higher classification accuracy and can effectively extract optimal bands. A feasible approach is provided for feature selection and classification of hyperspectral image data.
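    A compact, simplified sketch of this kind of wrapper: a binary particle swarm searches over band subsets and each candidate is scored by SVM cross-validation error (the fitness); the swarm size, iteration count, toy data and fixed SVM parameters are assumptions, and the chaotic search and grid-based parameter tuning from the record are not reproduced.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(3)
    X, y = make_classification(n_samples=150, n_features=30, n_informative=6, random_state=3)

    def fitness(mask):
        """False classification rate of an SVM trained on the selected bands."""
        cols = mask.astype(bool)
        if not cols.any():
            return 1.0
        return 1.0 - cross_val_score(SVC(C=10, gamma="scale"), X[:, cols], y, cv=3).mean()

    n_particles, n_iter, dim = 12, 15, X.shape[1]
    vel = np.zeros((n_particles, dim))
    masks = (rng.random((n_particles, dim)) < 0.5).astype(float)
    pbest, pbest_fit = masks.copy(), np.array([fitness(m) for m in masks])
    gbest = pbest[np.argmin(pbest_fit)].copy()

    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - masks) + 1.5 * r2 * (gbest - masks)
        prob = 1.0 / (1.0 + np.exp(-vel))                    # sigmoid maps velocity to bit probability
        masks = (rng.random((n_particles, dim)) < prob).astype(float)
        fits = np.array([fitness(m) for m in masks])
        improved = fits < pbest_fit
        pbest[improved], pbest_fit[improved] = masks[improved], fits[improved]
        gbest = pbest[np.argmin(pbest_fit)].copy()

    print("selected bands:", np.flatnonzero(gbest), "CV error:", round(pbest_fit.min(), 3))
    ```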

  8. Feature selection and multi-kernel learning for sparse representation on a manifold

    KAUST Repository

    Wang, Jingyan

    2014-03-01

    Sparse representation has been widely studied as a part-based data representation method and applied in many scientific and engineering fields, such as bioinformatics and medical imaging. It seeks to represent a data sample as a sparse linear combination of some basic items in a dictionary. Gao et al. (2013) recently proposed Laplacian sparse coding by regularizing the sparse codes with an affinity graph. However, due to the noisy features and nonlinear distribution of the data samples, the affinity graph constructed directly from the original feature space is not necessarily a reliable reflection of the intrinsic manifold of the data samples. To overcome this problem, we integrate feature selection and multiple kernel learning into the sparse coding on the manifold. To this end, unified objectives are defined for feature selection, multiple kernel learning, sparse coding, and graph regularization. By optimizing the objective functions iteratively, we develop novel data representation algorithms with feature selection and multiple kernel learning respectively. Experimental results on two challenging tasks, N-linked glycosylation prediction and mammogram retrieval, demonstrate that the proposed algorithms outperform the traditional sparse coding methods. © 2013 Elsevier Ltd.

  9. Feature selection and multi-kernel learning for adaptive graph regularized nonnegative matrix factorization

    KAUST Repository

    Wang, Jim Jing-Yan

    2014-09-20

    Nonnegative matrix factorization (NMF), a popular part-based representation technique, does not capture the intrinsic local geometric structure of the data space. Graph regularized NMF (GNMF) was recently proposed to avoid this limitation by regularizing NMF with a nearest neighbor graph constructed from the input data set. However, GNMF has two main bottlenecks. First, using the original feature space directly to construct the graph is not necessarily optimal because of the noisy and irrelevant features and nonlinear distributions of data samples. Second, one possible way to handle the nonlinear distribution of data samples is by kernel embedding. However, it is often difficult to choose the most suitable kernel. To solve these bottlenecks, we propose two novel graph-regularized NMF methods, AGNMFFS and AGNMFMK, by introducing feature selection and multiple-kernel learning to the graph regularized NMF, respectively. Instead of using a fixed graph as in GNMF, the two proposed methods learn the nearest neighbor graph that is adaptive to the selected features and learned multiple kernels, respectively. For each method, we propose a unified objective function to conduct feature selection/multi-kernel learning, NMF and adaptive graph regularization simultaneously. We further develop two iterative algorithms to solve the two optimization problems. Experimental results on two challenging pattern classification tasks demonstrate that the proposed methods significantly outperform state-of-the-art data representation methods.

  10. Prediction of cell-penetrating peptides with feature selection techniques.

    Science.gov (United States)

    Tang, Hua; Su, Zhen-Dong; Wei, Huan-Huan; Chen, Wei; Lin, Hao

    2016-08-12

    Cell-penetrating peptides are a group of peptides which can transport different types of cargo molecules such as drugs across plasma membrane and have been applied in the treatment of various diseases. Thus, the accurate prediction of cell-penetrating peptides with bioinformatics methods will accelerate the development of drug delivery systems. The study aims to develop a powerful model to accurately identify cell-penetrating peptides. At first, the peptides were translated into a set of vectors with the same dimension by using dipeptide compositions. Secondly, the Analysis of Variance-based technique was used to reduce the dimension of the vector and explore the optimized features. Finally, the support vector machine was utilized to discriminate cell-penetrating peptides from non-cell-penetrating peptides. The five-fold cross-validated results showed that our proposed method could achieve an overall prediction accuracy of 83.6%. Based on the proposed model, we constructed a free webserver called C2Pred (http://lin.uestc.edu.cn/server/C2Pred). PMID:27291150
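    A rough sketch of the described pipeline under obvious simplifications: 400-dimensional dipeptide composition vectors, ANOVA F-scores for feature ranking, and an SVM evaluated with cross-validation; the toy peptides, labels and the number of retained features are placeholder assumptions, not the paper's data or settings.

    ```python
    import itertools
    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    AMINO = "ACDEFGHIKLMNPQRSTVWY"
    DIPEPTIDES = ["".join(p) for p in itertools.product(AMINO, repeat=2)]   # 400 dipeptides

    def dipeptide_composition(seq):
        """Normalised counts of the 400 possible dipeptides in a peptide sequence."""
        counts = np.array([sum(seq[i:i + 2] == d for i in range(len(seq) - 1)) for d in DIPEPTIDES],
                          dtype=float)
        return counts / max(len(seq) - 1, 1)

    # Placeholder peptides and labels (1 = cell-penetrating, 0 = non-cell-penetrating).
    rng = np.random.default_rng(4)
    peptides = ["".join(rng.choice(list(AMINO), size=30)) for _ in range(60)]
    labels = rng.integers(0, 2, size=60)

    X = np.vstack([dipeptide_composition(p) for p in peptides])
    model = make_pipeline(SelectKBest(f_classif, k=40), SVC(kernel="rbf"))
    print("CV accuracy:", cross_val_score(model, X, labels, cv=5).mean())
    ```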

  11. Feature Selection in Detection of Adverse Drug Reactions from The Health Improvement Network (THIN) Database

    Directory of Open Access Journals (Sweden)

    Yihui Liu

    2015-02-01

    Full Text Available Adverse drug reactions (ADRs) are a widely recognized public health concern and are among the most common reasons for withdrawing drugs from the market. Prescription event monitoring (PEM) is an important approach to detecting adverse drug reactions. The main challenge with this method is how to automatically extract the medical events or side effects from the high-throughput medical events collected in day-to-day clinical practice. In this study we propose the novel concept of a feature matrix to detect ADRs. The feature matrix, extracted from big medical data in The Health Improvement Network (THIN) database, is created to characterize the medical events for patients who take drugs, and builds the foundation for handling the irregular and large medical data. Feature selection methods are then applied to the feature matrix to detect the significant features, and the ADRs are located based on those significant features. Experiments are carried out on three drugs: Atorvastatin, Alendronate, and Metoclopramide. Major side effects for each drug are detected and better performance is achieved compared to other computerized methods. Since the detected ADRs are based on computerized methods, further investigation is needed.

  12. Cuckoo search optimisation for feature selection in cancer classification: a new approach.

    Science.gov (United States)

    Gunavathi, C; Premalatha, K

    2015-01-01

    The Cuckoo Search (CS) optimisation algorithm is used for feature selection in cancer classification using microarray gene expression data. Since gene expression data has thousands of genes and a small number of samples, feature selection methods can be used to select informative genes and improve the classification accuracy. Initially, the genes are ranked based on T-statistics, Signal-to-Noise Ratio (SNR) and F-statistics values. The CS is used to find the informative genes from the top-m ranked genes. The classification accuracy of the k-Nearest Neighbour (kNN) technique is used as the fitness function for CS. The proposed method is evaluated and analysed on ten different cancer gene expression datasets. The results show that CS gives 100% average accuracy for the DLBCL Harvard, Lung Michigan, Ovarian Cancer, AML-ALL and Lung Harvard2 datasets and that it outperforms the existing techniques on the DLBCL outcome and prostate datasets. PMID:26547979
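    The sketch below shows the evaluation such a wrapper relies on: genes are first ranked by a simple t-statistic-like score and candidate subsets drawn from the top-m genes are scored by kNN cross-validation accuracy (the CS fitness); the full Cuckoo Search update with Lévy flights is not reproduced, and the toy data, m and subset size are assumptions.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(5)
    X, y = make_classification(n_samples=80, n_features=500, n_informative=10, random_state=5)

    # Rank genes by a simple two-class t-statistic and keep the top m.
    m = 50
    mean_diff = X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0)
    pooled_sd = np.sqrt(X[y == 0].var(axis=0) / (y == 0).sum() + X[y == 1].var(axis=0) / (y == 1).sum())
    top_m = np.argsort(np.abs(mean_diff / (pooled_sd + 1e-12)))[::-1][:m]

    def knn_fitness(gene_subset):
        """kNN cross-validation accuracy used as the fitness of a candidate gene subset."""
        return cross_val_score(KNeighborsClassifier(n_neighbors=3), X[:, gene_subset], y, cv=5).mean()

    # Stand-in for the CS population: random nests drawn from the top-m ranked genes.
    nests = [rng.choice(top_m, size=8, replace=False) for _ in range(20)]
    scores = [knn_fitness(nest) for nest in nests]
    best = nests[int(np.argmax(scores))]
    print("best gene subset:", np.sort(best), "fitness:", round(max(scores), 3))
    ```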

  13. Feature selection for better identification of subtypes of Guillain-Barré syndrome.

    Science.gov (United States)

    Hernández-Torruco, José; Canul-Reich, Juana; Frausto-Solís, Juan; Méndez-Castillo, Juan José

    2014-01-01

    Guillain-Barré syndrome (GBS) is a neurological disorder which has not been explored using clustering algorithms. Clustering algorithms perform more efficiently when they work only with relevant features. In this work, we applied correlation-based feature selection (CFS), chi-squared, information gain, symmetrical uncertainty, and consistency filter methods to select the most relevant features from a 156-feature real dataset. This dataset contains clinical, serological, and nerve conduction test data obtained from GBS patients. The most relevant feature subsets, determined with each filter method, were used to identify four subtypes of GBS present in the dataset. We used the partitions around medoids (PAM) clustering algorithm to form four clusters, corresponding to the GBS subtypes, and applied the purity of each cluster as the evaluation measure. After experimentation, symmetrical uncertainty and information gain determined a feature subset of seven variables. These variables, taken together as a dataset, were used as input to PAM and reached a purity of 0.7984. This result leads to a first characterization of this syndrome using computational techniques. PMID:25302074

  14. Feature Selection for Better Identification of Subtypes of Guillain-Barré Syndrome

    Directory of Open Access Journals (Sweden)

    José Hernández-Torruco

    2014-01-01

    Full Text Available Guillain-Barré syndrome (GBS) is a neurological disorder which has not been explored using clustering algorithms. Clustering algorithms perform more efficiently when they work only with relevant features. In this work, we applied correlation-based feature selection (CFS), chi-squared, information gain, symmetrical uncertainty, and consistency filter methods to select the most relevant features from a 156-feature real dataset. This dataset contains clinical, serological, and nerve conduction test data obtained from GBS patients. The most relevant feature subsets, determined with each filter method, were used to identify four subtypes of GBS present in the dataset. We used the partitions around medoids (PAM) clustering algorithm to form four clusters, corresponding to the GBS subtypes, and applied the purity of each cluster as the evaluation measure. After experimentation, symmetrical uncertainty and information gain determined a feature subset of seven variables. These variables, taken together as a dataset, were used as input to PAM and reached a purity of 0.7984. This result leads to a first characterization of this syndrome using computational techniques.
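    A simplified sketch of the filter-then-cluster pipeline under stated assumptions: information gain is approximated with scikit-learn's mutual_info_classif, k-means stands in for PAM (which base scikit-learn does not ship), and purity is computed against known labels; the toy data replaces the GBS dataset.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import mutual_info_classif

    X, y = make_classification(n_samples=120, n_features=40, n_informative=7, n_classes=4,
                               n_clusters_per_class=1, random_state=6)

    # Filter step: keep the 7 features with the highest information-gain-style score.
    scores = mutual_info_classif(X, y, random_state=6)
    selected = np.argsort(scores)[::-1][:7]

    # Clustering step: k-means is used here as a stand-in for PAM.
    clusters = KMeans(n_clusters=4, n_init=10, random_state=6).fit_predict(X[:, selected])

    def purity(cluster_labels, true_labels):
        """Fraction of samples assigned to the majority true class of their cluster."""
        total = 0
        for c in np.unique(cluster_labels):
            members = true_labels[cluster_labels == c]
            total += np.bincount(members).max()
        return total / len(true_labels)

    print("purity of the 4 clusters:", round(purity(clusters, y), 4))
    ```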

  15. Feature Subset Selection for Hot Method Prediction using Genetic Algorithm wrapped with Support Vector Machines

    Directory of Open Access Journals (Sweden)

    S. Johnson

    2011-01-01

    Full Text Available Problem statement: All compilers have simple profiling-based heuristics to identify and predict program hot methods and to make optimization decisions. The major challenge in profile-based optimization is addressing the problem of overhead. The aim of this work is to perform feature subset selection using Genetic Algorithms (GA) to improve and refine the machine-learnt static hot method predictive technique and to compare the performance of the new models against the simple heuristics. Approach: The relevant features for training the predictive models are extracted from an initial set of ninety randomly selected static program features, with the help of the GA wrapped with the predictive model using the Support Vector Machine (SVM), a Machine Learning (ML) algorithm. Results: The GA-generated feature subsets, containing thirty and twenty-nine features respectively for the two predictive models, when tested on MiBench predict Long Running Hot Methods (LRHM) and Frequently Called Hot Methods (FCHM) with respective accuracies of 71% and 80%, achieving increases of 19% and 22%. Further, inlining of the predicted LRHM and FCHM improves program performance by 3% and 5%, as against 4% and 6% with the Low Level Virtual Machine (LLVM) default heuristics. When intra-procedural optimizations (IPO) are performed on the predicted hot methods, this system offers a performance improvement of 5% and 4%, as against 0% and 3% by the LLVM default heuristics on LRHM and FCHM respectively. However, we observe an improvement of 36% in certain individual programs. Conclusion: Overall, the results indicate that the GA wrapped with SVM derived feature reduction improves the hot method prediction accuracy and that the technique of hot-method-prediction-based optimization is potentially useful in selective optimization.
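    A condensed sketch of a GA wrapper of this kind: bit-string chromosomes encode feature subsets, fitness is SVM cross-validation accuracy, and tournament selection, single-point crossover and bit-flip mutation evolve the population; the population size, rates and toy data are assumptions, not the paper's configuration.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(7)
    X, y = make_classification(n_samples=120, n_features=30, n_informative=8, random_state=7)
    dim, pop_size, generations = X.shape[1], 16, 12

    def fitness(chrom):
        """SVM cross-validation accuracy on the features switched on in the chromosome."""
        cols = chrom.astype(bool)
        return cross_val_score(SVC(kernel="linear"), X[:, cols], y, cv=3).mean() if cols.any() else 0.0

    def tournament(pop, fits):
        """Return the fitter of two randomly drawn chromosomes."""
        i, j = rng.choice(len(pop), size=2, replace=False)
        return pop[i] if fits[i] >= fits[j] else pop[j]

    pop = (rng.random((pop_size, dim)) < 0.5).astype(int)
    for _ in range(generations):
        fits = np.array([fitness(c) for c in pop])
        children = []
        for _ in range(pop_size):
            p1, p2 = tournament(pop, fits), tournament(pop, fits)
            cut = rng.integers(1, dim)                          # single-point crossover
            child = np.concatenate([p1[:cut], p2[cut:]])
            flip = rng.random(dim) < 0.05                       # bit-flip mutation
            child[flip] = 1 - child[flip]
            children.append(child)
        pop = np.array(children)

    fits = np.array([fitness(c) for c in pop])
    best = pop[np.argmax(fits)]
    print("selected features:", np.flatnonzero(best), "CV accuracy:", round(fits.max(), 3))
    ```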

  16. Automatic suitable-matching area selection method based on multi-feature fusion

    Institute of Scientific and Technical Information of China (English)

    罗海波; 常铮; 余新荣; 丁庆海

    2011-01-01

    Tracking of locally textureless targets is a difficult and much-studied problem in the field of ground imaging guidance, and automatic suitable-matching area selection is an effective way to address it. An algorithm for automatic suitable-matching area selection based on multi-feature fusion is proposed. First, the edge density, the average edge strength, the edge direction dispersion and the spatial distance are fused to form a suitable-matching measure function. Then, the suitable-matching credibility of each point in the image is calculated with this function. Finally, through an adaptive selection strategy, three suitable-matching areas with high credibility are segmented and used as target templates for matching-based tracking. Experimental results show that the suitable-matching areas segmented with the proposed algorithm are close to those judged by human experience and yield good tracking results. The algorithm can be widely used in ground imaging guidance applications such as tracking of locally textureless targets and scene-matching mission planning.

  17. Feature selection and classification of multiparametric medical images using bagging and SVM

    Science.gov (United States)

    Fan, Yong; Resnick, Susan M.; Davatzikos, Christos

    2008-03-01

    This paper presents a framework for brain classification based on multi-parametric medical images. This method takes advantage of multi-parametric imaging to provide a set of discriminative features for classifier construction by using a regional feature extraction method which takes into account joint correlations among different image parameters; in the experiments herein, MRI and PET images of the brain are used. Support vector machine classifiers are then trained based on the most discriminative features selected from the feature set. To facilitate robust classification and optimal selection of parameters involved in classification, in view of the well-known "curse of dimensionality", base classifiers are constructed in a bagging (bootstrap aggregating) framework for building an ensemble classifier and the classification parameters of these base classifiers are optimized by means of maximizing the area under the ROC (receiver operating characteristic) curve estimated from their prediction performance on left-out samples of bootstrap sampling. This classification system is tested on a sex classification problem, where it yields over 90% classification rates for unseen subjects. The proposed classification method is also compared with other commonly used classification algorithms, with favorable results. These results illustrate that the methods built upon information jointly extracted from multi-parametric images have the potential to perform individual classification with high sensitivity and specificity.
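    A small sketch of the bagging-plus-AUC idea under simplifying assumptions: base SVMs are trained on bootstrap samples, each candidate parameter value is scored by the ROC area of its out-of-bag predictions, and the best value is kept; the toy data, the candidate grid and the oob_auc helper are placeholders, not the paper's setup.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.metrics import roc_auc_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(8)
    X, y = make_classification(n_samples=200, n_features=20, n_informative=6, random_state=8)
    n_bags = 15

    def oob_auc(C):
        """Average out-of-bag AUC of SVM base classifiers trained on bootstrap samples."""
        aucs = []
        for _ in range(n_bags):
            boot = rng.integers(0, len(X), size=len(X))
            oob = np.setdiff1d(np.arange(len(X)), boot)
            if len(np.unique(y[oob])) < 2:
                continue
            clf = SVC(C=C).fit(X[boot], y[boot])
            aucs.append(roc_auc_score(y[oob], clf.decision_function(X[oob])))  # left-out samples score the bag
        return np.mean(aucs)

    # Pick the classification parameter that maximises the estimated ROC area.
    grid = [0.1, 1.0, 10.0]
    best_C = max(grid, key=oob_auc)
    print("best C by out-of-bag AUC:", best_C)
    ```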

  18. Feature Selection for Bayesian Evaluation of Trauma Death Risk

    CERN Document Server

    Jakaite, L

    2008-01-01

    In the last year more than 70,000 people have been brought to UK hospitals with serious injuries. Each time, a clinician has to urgently take a patient through a screening procedure to make a reliable decision on the trauma treatment. Typically, such a procedure comprises around 20 tests; however, the condition of a trauma patient remains very difficult to test properly. What happens if these tests are ambiguously interpreted and the information about the severity of the injury is misleading? A mistake in the decision can be fatal: a treatment that is too mild can put a patient at risk of dying from posttraumatic shock, while overtreatment can also cause death. How can we reduce the risk of death caused by unreliable decisions? It has been shown that probabilistic reasoning, based on the Bayesian methodology of averaging over decision models, allows clinicians to evaluate the uncertainty in decision making. Based on this methodology, in this paper we aim at selecting the most important screeni...

  19. Feature selection from high resolution remote sensing data for biotope mapping

    Science.gov (United States)

    Bindel, M.; Hese, S.; Berger, C.; Schmullius, C.

    2011-09-01

    Mapping of Landscape Protection Areas with regard to user requirements for detailed land cover and biotope classes has been limited by the spatial and temporal resolution of Earth observation data. The synergistic use of new-generation optical and SAR data may overcome these limitations. The presented work is part of the ENVILAND-2 project, which focuses on the complementary use of RapidEye and TerraSAR-X data to derive land cover and biotope classes as needed by the Environmental Agencies. The goal is to semi-automatically update the corresponding maps by utilising more Earth observation data and less field-work-derived information. Properties of both sensors are used, including the red edge band of the RapidEye system and the high spatial and temporal resolution of the TerraSAR-X data. The main part of this work concentrates on the process of feature selection. Based upon multi-temporal optical and SAR data, various features such as textural measurements, spectral features and vegetation indices can be computed. The resulting information stacks can easily exceed hundreds of layers. The goal of this work is to reduce these information layers to obtain a set of decorrelated features for the classification of biotope types. The first step is to evaluate possible features, followed by feature extraction and pre-processing. The pre-processing contains outlier removal and feature normalization. The next step is the process of feature selection and is divided into two parts. The first part is a regression analysis to remove redundant information. The second part constitutes the class separability analysis: for the remaining features and for every class combination present in the study area, different separability measures such as divergence or the Jeffries-Matusita distance are computed. The result is a set of features for every class providing the highest class separability values. As the final step, an evaluation is performed to estimate how many features for a class are
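    As a concrete illustration of the separability step mentioned above, the sketch below computes the Jeffries-Matusita distance (range 0 to 2) between two classes for a candidate feature set under a Gaussian assumption; the toy class samples are placeholders, not biotope data.

    ```python
    import numpy as np

    def jeffries_matusita(class_a, class_b):
        """JM distance (0..2) between two classes, assuming Gaussian class distributions."""
        mean_a, mean_b = class_a.mean(axis=0), class_b.mean(axis=0)
        cov_a, cov_b = np.cov(class_a, rowvar=False), np.cov(class_b, rowvar=False)
        cov_mean = (cov_a + cov_b) / 2.0
        diff = mean_a - mean_b
        # Bhattacharyya distance between the two Gaussian class models.
        b = (diff @ np.linalg.solve(cov_mean, diff)) / 8.0 + 0.5 * np.log(
            np.linalg.det(cov_mean) / np.sqrt(np.linalg.det(cov_a) * np.linalg.det(cov_b)))
        return 2.0 * (1.0 - np.exp(-b))

    # Placeholder samples of two biotope classes described by three selected features.
    rng = np.random.default_rng(9)
    class_a = rng.normal(loc=[0.2, 0.5, 0.1], scale=0.1, size=(50, 3))
    class_b = rng.normal(loc=[0.6, 0.4, 0.3], scale=0.1, size=(50, 3))
    print("JM distance:", round(jeffries_matusita(class_a, class_b), 3))
    ```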

  20. Oxygen Saturation and RR Intervals Feature Selection for Sleep Apnea Detection

    Directory of Open Access Journals (Sweden)

    Antonio G. Ravelo-García

    2015-05-01

    Full Text Available A diagnostic system for sleep apnea based on oxygen saturation and RR intervals obtained from the EKG (electrocardiogram) is proposed, with the goal of detecting and quantifying minute-long segments of sleep with breathing pauses. We measured the discriminative capacity of combinations of features obtained from RR series and oximetry to evaluate improvements in performance compared to oximetry-based features alone. Time- and frequency-domain variables derived from oxygen saturation (SpO2) as well as linear and non-linear variables describing the RR series have been explored in recordings from 70 patients with suspected sleep apnea. We applied forward feature selection in order to select a minimal set of variables able to locate patterns indicating respiratory pauses. Linear discriminant analysis (LDA) was used to classify the presence of apnea during specific segments. The system finally provides a global score indicating the presence of clinically significant apnea by integrating the segment-based apnea detection. LDA results in an accuracy of 87%, sensitivity of 76% and specificity of 91% (AUC = 0.90), with a global classification of 97%, when only oxygen saturation is used. When features from the RR series are additionally included, the system performance improves to an accuracy of 87%, sensitivity of 73% and specificity of 92% (AUC = 0.92), with a global classification rate of 100%.
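    A minimal sketch of the forward-selection-plus-LDA segment classifier described above, using scikit-learn's SequentialFeatureSelector; the toy features standing in for SpO2 and RR-series variables, the number of selected features and the segment labels are assumptions.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.model_selection import cross_val_score

    # Toy per-minute segments: columns stand in for SpO2- and RR-interval-derived variables.
    X, y = make_classification(n_samples=300, n_features=25, n_informative=5, random_state=10)

    lda = LinearDiscriminantAnalysis()
    sfs = SequentialFeatureSelector(lda, n_features_to_select=5, direction="forward", cv=5)
    sfs.fit(X, y)
    selected = np.flatnonzero(sfs.get_support())

    acc = cross_val_score(lda, X[:, selected], y, cv=5).mean()
    print("selected feature indices:", selected)
    print("cross-validated segment accuracy:", round(acc, 3))
    ```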

  1. A hybrid features based image matching algorithm

    Science.gov (United States)

    Tu, Zhenbiao; Lin, Tao; Sun, Xiao; Dou, Hao; Ming, Delie

    2015-12-01

    In this paper, we present a novel image matching method to find the correspondences between two sets of image interest points. The proposed method is based on a revised third-order tensor graph matching method and introduces an energy function that takes four kinds of energy terms into account. The third-order tensor method can hardly deal with situations where the number of interest points is huge. To deal with this problem, we use a potential matching set and a voting mechanism to decompose the matching task into several sub-tasks. Moreover, the third-order tensor method sometimes finds only a local optimum solution, so we use a clustering method to divide the feature points into groups and only sample feature triangles between different groups, which makes it much easier for the algorithm to find the global optimum solution. Experiments on different image databases show that our new method obtains correct matching results with relatively high efficiency.

  2. Localization of neural efficiency of the mathematically gifted brain through a feature subset selection method.

    Science.gov (United States)

    Zhang, Li; Gan, John Q; Wang, Haixian

    2015-10-01

    Based on the neural efficiency hypothesis and task-induced EEG gamma-band response (GBR), this study investigated the brain regions where neural resource could be most efficiently recruited by the math-gifted adolescents in response to varying cognitive demands. In this experiment, various GBR-based mental states were generated with three factors (level of mathematical ability, task complexity, and short-term learning) modulating the level of neural activation. A feature subset selection method based on the sequential forward floating search algorithm was used to identify an "optimal" combination of EEG channel locations, where the corresponding GBR feature subset could obtain the highest accuracy in discriminating pairwise mental states influenced by each experiment factor. The integrative results from multi-factor selections suggest that the right-lateral fronto-parietal system is highly involved in neural efficiency of the math-gifted brain, primarily including the bilateral superior frontal, right inferior frontal, right-lateral central and right temporal regions. By means of the localization method based on single-trial classification of mental states, new GBR features and EEG channel-based brain regions related to mathematical giftedness were identified, which could be useful for the brain function improvement of children/adolescents in mathematical learning through brain-computer interface systems.

  3. Image Recommendation Algorithm Using Feature-Based Collaborative Filtering

    Science.gov (United States)

    Kim, Deok-Hwan

    As the multimedia contents market continues its rapid expansion, the amount of image contents used in mobile phone services, digital libraries, and catalog service is increasing remarkably. In spite of this rapid growth, users experience high levels of frustration when searching for the desired image. Even though new images are profitable to the service providers, traditional collaborative filtering methods cannot recommend them. To solve this problem, in this paper, we propose feature-based collaborative filtering (FBCF) method to reflect the user's most recent preference by representing his purchase sequence in the visual feature space. The proposed approach represents the images that have been purchased in the past as the feature clusters in the multi-dimensional feature space and then selects neighbors by using an inter-cluster distance function between their feature clusters. Various experiments using real image data demonstrate that the proposed approach provides a higher quality recommendation and better performance than do typical collaborative filtering and content-based filtering techniques.

  4. How Invariant Feature Selectivity Is Achieved in Cortex.

    Science.gov (United States)

    Sharpee, Tatyana O

    2016-01-01

    Parsing the visual scene into objects is paramount to survival. Yet, how this is accomplished by the nervous system remains largely unknown, even in the comparatively well understood visual system. It is especially unclear how detailed peripheral signal representations are transformed into the object-oriented representations that are independent of object position and are provided by the final stages of visual processing. This perspective discusses advances in computational algorithms for fitting large-scale models that make it possible to reconstruct the intermediate steps of visual processing based on neural responses to natural stimuli. In particular, it is now possible to characterize how different types of position invariance, such as local (also known as phase invariance) and more global, are interleaved with nonlinear operations to allow for coding of curved contours. Neurons in the mid-level visual area V4 exhibit selectivity to pairs of even- and odd-symmetric profiles along curved contours. Such pairing is reminiscent of the response properties of complex cells in the primary visual cortex (V1) and suggests specific ways in which V1 signals are transformed within subsequent visual cortical areas. These examples illustrate that large-scale models fitted to neural responses to natural stimuli can provide generative models of successive stages of sensory processing. PMID:27601991

  5. Feature Selection in Data-Mining for Genetics Using Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    V. N. Rajavarman

    2007-01-01

    Full Text Available We discovered genetic features and environmental factors that were involved in multifactorial diseases. To exploit the massive data obtained from the experiments conducted at the General Hospital, Chennai, data mining tools were required, and we proposed a two-phase approach using a specific genetic algorithm. This heuristic approach was chosen because the number of features to consider is large (up to 3,654 for the biological data under our study). The collected data indicated, for pairs of affected individuals of the same family, their similarity at given points (loci) of their chromosomes. This was represented in a matrix where each locus is represented by a column and each pair of individuals considered by a row. The objective was first to isolate the most relevant associations of features and then to class individuals that had the considered disease according to these associations. For the first phase, the feature selection problem, we used a genetic algorithm (GA). To deal with this very specific problem, some advanced mechanisms were introduced in the genetic algorithm, such as sharing, random immigrants and dedicated genetic operators, and a particular distance operator was defined. The second phase, a clustering based on the features selected during the previous phase, uses the k-means clustering algorithm.

  6. Multi scale feature based matched filter processing

    Institute of Scientific and Technical Information of China (English)

    LI Jun; HOU Chaohuan

    2004-01-01

    Using the extreme difference in self-similarity and kurtosis at large wavelet-transform approximation scales between PTFM (Pulse Trains of Frequency Modulated) signals and their reverberation, a feature-based matched filter method using the classify-before-detect paradigm is proposed to improve detection performance in reverberation and multipath environments. Processing of lake-trial data showed that the processing gain of the proposed method is about 10 dB higher than that of the matched filter. In multipath environments, the detection performance of the matched filter degrades badly, while that of the proposed method is much better preserved, showing that the method is much more robust to multipath effects.

  7. Improved AAG based recognization of machining feature

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    The information lost due to feature interaction is restored by using auxiliary faces (AF) and virtual links (VL). The delta volume of the interacting features, represented by a concave attachable connected graph (CACG), can be decomposed into several isolated features represented by complete concave adjacency graphs (CCAG). We can recognize a feature's rough type by using the CCAG as a hint; the exact type of the feature can be obtained by deleting the auxiliary faces from the isolated feature. A united machining feature (UMF) is used to represent features that can be machined in the same machining process. This is important for rationalizing process plans and reducing machining time. An example is given to demonstrate the effectiveness of this method.

  8. On a Variational Model for Selective Image Segmentation of Features with Infinite Perimeter

    Institute of Scientific and Technical Information of China (English)

    Lavdie RADA; Ke CHEN

    2013-01-01

    Variational models provide a reliable formulation for the segmentation of features and their boundaries in an image, following the seminal work of Mumford-Shah (1989, Commun. Pure Appl. Math.) on dividing a general surface into piecewise smooth sub-surfaces. A central idea of models based on this work is to minimize the length of the feature boundaries (i.e., their H¹ Hausdorff measure). However, problems arise with irregular and oscillatory object boundaries, where minimizing such a length is not appropriate, as noted by Barchiesi et al. (2010, SIAM J. Multiscale Model. Simu.), who proposed to minimize the L² Lebesgue measure of the γ-neighborhood of the boundaries. This paper presents a dual level set selective segmentation model based on Barchiesi et al. (2010) to automatically select a local feature instead of all global features. Our model uses two level set functions: a global level set which segments all boundaries, and a local level set which evolves and finds the boundary of the object closest to the geometric constraints. Using real-life images with oscillatory boundaries, we show qualitative results demonstrating the effectiveness of the proposed method.

  9. Feature Selection by Merging Sequential Bidirectional Search into Relevance Vector Machine in Condition Monitoring

    Institute of Scientific and Technical Information of China (English)

    ZHANG Kui; DONG Yu; BALL Andrew

    2015-01-01

    For more accurate fault detection and diagnosis, there is an increasing trend to use a large number of sensors and to collect data at high frequency. This inevitably produces large-scale data and causes difficulties in fault classification. Actually, the classification methods are simply intractable when applied to high-dimensional condition monitoring data. In order to solve the problem, engineers have to resort to complicated feature extraction methods to reduce the dimensionality of data. However, the features transformed by these methods cannot be understood by engineers due to the loss of their original engineering meaning. In this paper, another form of dimensionality reduction technique (feature selection methods) is employed to identify machinery condition, based only on frequency spectrum data. Feature selection methods are usually divided into three main types: filter, wrapper and embedded methods. Most studies are mainly focused on the first two types, whilst the development and application of the embedded feature selection methods are very limited. This paper attempts to explore a novel embedded method. The method is formed by merging a sequential bidirectional search algorithm into scale parameter tuning within a kernel function in the relevance vector machine. To demonstrate the potential for applying the method to machinery fault diagnosis, the method is applied to rolling bearing experimental data. The results obtained by using the method are consistent with the theoretical interpretation, proving that this algorithm has important engineering significance in revealing the correlation between the faults and relevant frequency features. The proposed method is a theoretical extension of the relevance vector machine, and provides an effective solution to detect the fault-related frequency components with high efficiency.

  10. Biosensor method and system based on feature vector extraction

    Science.gov (United States)

    Greenbaum, Elias; Rodriguez, Jr., Miguel; Qi, Hairong; Wang, Xiaoling

    2012-04-17

    A method of biosensor-based detection of toxins comprises the steps of providing at least one time-dependent control signal generated by a biosensor in a gas or liquid medium, and obtaining a time-dependent biosensor signal from the biosensor in the gas or liquid medium to be monitored or analyzed for the presence of one or more toxins selected from chemical, biological or radiological agents. The time-dependent biosensor signal is processed to obtain a plurality of feature vectors using at least one of amplitude statistics and a time-frequency analysis. At least one parameter relating to toxicity of the gas or liquid medium is then determined from the feature vectors based on reference to the control signal.

  11. UNLABELED SELECTED SAMPLES IN FEATURE EXTRACTION FOR CLASSIFICATION OF HYPERSPECTRAL IMAGES WITH LIMITED TRAINING SAMPLES

    Directory of Open Access Journals (Sweden)

    A. Kianisarkaleh

    2015-12-01

    Full Text Available Feature extraction plays a key role in hyperspectral image classification. Using unlabeled samples, which are often available in unlimited quantities, unsupervised and semisupervised feature extraction methods show better performance when only a limited number of training samples exists. This paper illustrates the importance of selecting appropriate unlabeled samples for use in feature extraction methods and also proposes a new method for unlabeled sample selection using spectral and spatial information. The proposed method has four parts: PCA, prior classification, posterior classification and sample selection. As the hyperspectral image passes through these parts, the selected unlabeled samples can be used in arbitrary feature extraction methods. The effectiveness of the proposed unlabeled sample selection in unsupervised and semisupervised feature extraction is demonstrated using two real hyperspectral datasets. Results show that, through selecting appropriate unlabeled samples, the proposed method can improve the performance of feature extraction methods and increase classification accuracy.

  12. Feature Selection for High-Dimensional Data with RapidMiner

    OpenAIRE

    Lee, Sangkyun; Schowe, Benjamin; Sivakumar, Viswanath

    2012-01-01

    Feature selection is an important task in machine learning, reducing dimensionality of learning problems by selecting few relevant features without losing too much information. Focusing on smaller sets of features, we can learn simpler models from data that are easier to understand and to apply. In fact, simpler models are more robust to input noise and outliers, often leading to better prediction performance than the models trained in higher dimensions with all features. We imple...

  13. Structural Features for Functional Selectivity at Serotonin Receptors

    OpenAIRE

    Wacker, Daniel; Wang, Chong; Katritch, Vsevolod; Han, Gye Won; Huang, Xi-Ping; Vardy, Eyal; McCorvy, John D.; Jiang, Yi; Chu, Meihua; Siu, Fai Yiu; Liu, Wei; Xu, H Eric; Cherezov, Vadim; Roth, Bryan L.; Stevens, Raymond C.

    2013-01-01

    Drugs active at G protein-coupled receptors (GPCRs) can differentially modulate either canonical or non-canonical signaling pathways via a phenomenon known as functional selectivity or biased signaling. We report biochemical studies that show that the hallucinogen lysergic acid diethylamide (LSD), its precursor ergotamine (ERG) and related ergolines display strong functional selectivity for β-arrestin signaling at the 5-hydroxytryptamine (5-HT) receptor 5-HT2B, while being relatively unbiased...

  14. Non-parametric Bayesian mixture of sparse regressions with application towards feature selection for statistical downscaling

    Directory of Open Access Journals (Sweden)

    D. Das

    2014-04-01

    Full Text Available Climate projections simulated by Global Climate Models (GCMs) are often used for assessing the impacts of climate change. However, the relatively coarse resolution of GCM outputs often precludes their application towards accurately assessing the effects of climate change on finer regional-scale phenomena. Downscaling of climate variables from coarser to finer regional scales using statistical methods is often performed for regional climate projections. Statistical downscaling (SD) is based on the understanding that the regional climate is influenced by two factors – the large-scale climatic state and regional or local features. A transfer-function approach to SD involves learning a regression model which relates these features (predictors) to a climatic variable of interest (the predictand) based on past observations. However, often a single regression model is not sufficient to describe complex dynamic relationships between the predictors and predictand. We focus on the covariate selection part of the transfer-function approach and propose a nonparametric Bayesian mixture of sparse regression models based on the Dirichlet Process (DP), for simultaneous clustering and discovery of covariates within the clusters while automatically finding the number of clusters. Sparse linear models are parsimonious and hence relatively more generalizable than non-sparse alternatives, and lend themselves to domain-relevant interpretation. Applications to synthetic data demonstrate the value of the new approach, and preliminary results related to feature selection for statistical downscaling show that our method can lead to new insights.

  15. RESEARCH ON AN INTEGRATED FEATURE SELECTION METHOD FOR THE EARLY DIAGNOSIS OF ALZHEIMER'S DISEASE

    Institute of Scientific and Technical Information of China (English)

    曹元磊; 胡斌; 高翔

    2016-01-01

    Alzheimer's disease seriously affects patients' lives and is difficult to cure. Diagnosing mild cognitive impairment (MCI), the early stage of Alzheimer's disease, is therefore key to delaying its progression and to treatment. Magnetic resonance imaging (MRI) is an important source of image data for diagnosing brain diseases, and analysing MRI with classification algorithms to separate MCI patients from normal controls is an important approach. Feature selection is an essential step for improving classification accuracy. We propose an integrated feature selection method combining mutual information and the Pearson correlation coefficient, which considers the relevance of each feature to the class labels while keeping the redundancy within the selected feature subset to a minimum. Compared with support vector machine classification using mutual information alone and using the max-relevance min-redundancy (mRMR) method, the results show that the proposed method achieves higher prediction accuracy, illustrating its advantages.
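    A rough sketch of the relevance-redundancy idea described above: each feature's relevance is measured with mutual information against the class label and its redundancy with the absolute Pearson correlation to the features already selected, greedily keeping high-relevance, low-redundancy features; the equal weighting of the two terms and the toy data are assumptions, not the paper's exact criterion.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import mutual_info_classif

    X, y = make_classification(n_samples=100, n_features=60, n_informative=8, random_state=11)

    relevance = mutual_info_classif(X, y, random_state=11)      # MI between each feature and the labels
    corr = np.abs(np.corrcoef(X, rowvar=False))                 # pairwise Pearson correlations

    def integrated_selection(n_select):
        """Greedy selection: maximise MI relevance minus mean |Pearson| redundancy."""
        selected = [int(np.argmax(relevance))]
        while len(selected) < n_select:
            candidates = [j for j in range(X.shape[1]) if j not in selected]
            score = [relevance[j] - corr[j, selected].mean() for j in candidates]
            selected.append(candidates[int(np.argmax(score))])
        return selected

    print("selected feature indices:", integrated_selection(8))
    ```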

  16. New Approach for Feature Selection of Thermomechanically Processed HSLA Steel using Pruned-Modular Neural Networks

    Science.gov (United States)

    Das, Prasun; Ghosh, Avishek; Bhattacharyay, Bidyut Kr.; Datta, Shubhabrata

    2012-10-01

    A new approach has been used in modeling the strength and ductility of high strength low alloy (HSLA) steel, where a comparative study among a fully-connected neural network, a modular network and a pruned-module architecture has been performed. The important features for modeling such a complex steel processing system have been worked out. Performance evaluation and feature selection in the soft computing domain are the two important activities for modeling the input-output relationship. The need arises especially when the system is complex in terms of the type of network architecture, the number of features involved, the number of inter-connections, the application domain, etc. In this paper, an attempt is made to develop a new metric of performance evaluation, using the mean squared error and the total number of inter-connections of a network, to improve the understanding of the complex system of thermomechanically controlled processed HSLA steels. The methodology for feature selection is then developed based on the functional form of the output in terms of the input variables, where the gradient of the function can be computed in the network.

  17. Advances in feature selection methods for hyperspectral image processing in food industry applications: a review.

    Science.gov (United States)

    Dai, Qiong; Cheng, Jun-Hu; Sun, Da-Wen; Zeng, Xin-An

    2015-01-01

    There is an increased interest in the applications of hyperspectral imaging (HSI) for assessing food quality, safety, and authenticity. HSI provides an abundance of spatial and spectral information from foods by combining both spectroscopy and imaging, resulting in hundreds of contiguous wavebands for each spatial position of food samples, also known as the curse of dimensionality. It is desirable to employ feature selection algorithms for decreasing the computational burden and increasing prediction accuracy, which is especially relevant in the development of online applications. Recently, a variety of feature selection algorithms have been proposed that can be categorized into three groups based on the searching strategy, namely complete search, heuristic search and random search. This review mainly introduces the fundamentals of each algorithm, illustrates its applications in hyperspectral data analysis in the food field, and discusses the advantages and disadvantages of these algorithms. It is hoped that this review will provide a guideline for feature selection and data processing in the future development of hyperspectral imaging techniques for foods.

  18. Intelligent feature selection techniques for pattern classification of Lamb wave signals

    International Nuclear Information System (INIS)

    Lamb wave interaction with flaws is a complex, three-dimensional phenomenon, which often frustrates signal interpretation schemes based on mode arrival time shifts predicted by dispersion curves. As the flaw severity increases, scattering and mode conversion effects will often dominate the time-domain signals, obscuring available information about flaws because multiple modes may arrive on top of each other. Even for idealized flaw geometries the scattering and mode conversion behavior of Lamb waves is very complex. Here, multi-mode Lamb waves in a metal plate are propagated across a rectangular flat-bottom hole in a sequence of pitch-catch measurements corresponding to the double crosshole tomography geometry. The flaw is sequentially deepened, with the Lamb wave measurements repeated at each flaw depth. Lamb wave tomography reconstructions are used to identify which waveforms have interacted with the flaw and thereby carry information about its depth. Multiple features are extracted from each of the Lamb wave signals using wavelets, which are then fed to statistical pattern classification algorithms that identify flaw severity. In order to achieve the highest classification accuracy, an optimal feature space is required but it’s never known a priori which features are going to be best. For structural health monitoring we make use of the fact that physical flaws, such as corrosion, will only increase over time. This allows us to identify feature vectors which are topologically well-behaved by requiring that sequential classes “line up” in feature vector space. An intelligent feature selection routine is illustrated that identifies favorable class distributions in multi-dimensional feature spaces using computational homology theory. Betti numbers and formal classification accuracies are calculated for each feature space subset to establish a correlation between the topology of the class distribution and the corresponding classification accuracy

  19. Local Feature based Gender Independent Bangla ASR

    Directory of Open Access Journals (Sweden)

    Bulbul Ahamed

    2012-11-01

    Full Text Available This paper presents an automatic speech recognition (ASR) system for Bangla (widely known as Bengali) that suppresses speaker gender effects based on local features extracted from the input speech. Speaker-specific characteristics play an important role in the performance of Bangla automatic speech recognition (ASR). The gender factor has an adverse effect on a classifier when it recognizes speech from the opposite gender, for example when the classifier is trained on male speech but tested on female speech, or vice versa. To obtain a robust ASR system in practice, it is necessary to design a system that behaves in a gender-independent way for each gender. In this paper, we propose a gender-independent technique for ASR that focuses on the gender factor. The proposed method trains the classifier with both genders, male and female, and evaluates the classifier on both male and female speech. For the experiments, we designed a medium-size Bangla (widely known as Bengali) speech corpus for both male and female speakers. The proposed system showed a significant improvement in word correct rates, word accuracies and sentence correct rates in comparison with a method that suffers from gender effects. Moreover, it provides the highest recognition performance while using fewer mixture components in the hidden Markov models (HMMs).

  20. A research of selected textural features for detection of asbestos-cement roofing sheets using orthoimages

    Science.gov (United States)

    Książek, Judyta

    2015-10-01

    At present, there is great interest in the development of texture-based image classification methods in many different areas. This study presents the results of research carried out to assess the usefulness of selected textural features for the detection of asbestos-cement roofs in orthophotomap classification. Two different orthophotomaps of southern Poland (with ground resolutions of 5 cm and 25 cm) were used. On both orthoimages, representative samples for two classes (asbestos-cement roofing sheets and other roofing materials) were selected. The usefulness of texture analysis was estimated using machine learning methods based on decision trees (C5.0 algorithm). For this purpose, various sets of texture parameters were calculated in the MaZda software. During the construction of the decision trees, different numbers of texture parameter groups were considered. In order to obtain the best settings for the decision tree models, cross-validation was performed, and the decision tree models with the lowest mean classification error were selected. The accuracy of the classification was assessed on validation data sets that were not used for training. For the 5 cm ground resolution samples, the lowest mean classification error was 15.6%; for the 25 cm ground resolution, it was 20.0%. The obtained results confirm the potential usefulness of texture-based image processing for the detection of asbestos-cement roofing sheets. In order to improve the accuracy, a further, extended study should be considered in which additional textural features as well as spectral characteristics are analyzed.

  1. Deep sparse multi-task learning for feature selection in Alzheimer's disease diagnosis.

    Science.gov (United States)

    Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang

    2016-06-01

    Recently, neuroimaging-based Alzheimer's disease (AD) or mild cognitive impairment (MCI) diagnosis has attracted researchers in the field, due to the increasing prevalence of the diseases. Unfortunately, the unfavorably high-dimensional nature of neuroimaging data, combined with the limited number of samples available, makes it challenging to build a robust computer-aided diagnosis system. Machine learning techniques have been considered as a useful tool in this respect and, among various methods, sparse regression has shown its validity in the literature. However, to the best of our knowledge, the existing sparse regression methods mostly try to select features based on the optimal regression coefficients in one step. We argue that since the training feature vectors are composed of both informative and uninformative or less informative features, the resulting optimal regression coefficients are inevitably affected by the uninformative or less informative features. To this end, we first propose a novel deep architecture to recursively discard uninformative features by performing sparse multi-task learning in a hierarchical fashion. We further hypothesize that the optimal regression coefficients reflect the relative importance of features in representing the target response variables. In this regard, we use the optimal regression coefficients learned in one hierarchy as feature weighting factors in the following hierarchy, and formulate a weighted sparse multi-task learning method. Lastly, we also take into account the distributional characteristics of samples per class and use clustering-induced subclass label vectors as target response values in our sparse regression model. In our experiments on the ADNI cohort, we performed both binary and multi-class classification tasks in AD/MCI diagnosis and showed the superiority of the proposed method by comparing with the state-of-the-art methods. PMID:25993900
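
    A minimal sketch of the recursive idea only is given below: at each hierarchy a sparse multi-task regression is fit, features with all-zero coefficients across tasks are discarded, and the surviving features are re-weighted by coefficient magnitude before the next level. MultiTaskLasso is a stand-in for the paper's weighted sparse multi-task formulation, and the arrays X and Y are synthetic placeholders rather than ADNI data.

```python
# Hedged sketch of hierarchical sparse multi-task feature selection (not the authors' implementation).
import numpy as np
from sklearn.linear_model import MultiTaskLasso

def hierarchical_sparse_selection(X, Y, n_levels=3, alpha=0.05):
    keep = np.arange(X.shape[1])          # indices of surviving features
    weights = np.ones(X.shape[1])         # feature weighting factors
    for level in range(n_levels):
        model = MultiTaskLasso(alpha=alpha, max_iter=5000)
        model.fit(X[:, keep] * weights[keep], Y)
        importance = np.abs(model.coef_).sum(axis=0)      # aggregate over tasks
        survivors = importance > 0                         # discard all-zero features
        keep = keep[survivors]
        # coefficients of one level become weighting factors of the next level
        weights[keep] = importance[survivors] / (importance[survivors].max() + 1e-12)
        if keep.size == 0:
            break
    return keep

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 50))
Y = np.column_stack([X[:, 0] - 2 * X[:, 3], X[:, 3] + X[:, 7]])   # two synthetic tasks
print("selected feature indices:", hierarchical_sparse_selection(X, Y))
```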

  2. Deep sparse multi-task learning for feature selection in Alzheimer's disease diagnosis.

    Science.gov (United States)

    Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang

    2016-06-01

    Recently, neuroimaging-based Alzheimer's disease (AD) or mild cognitive impairment (MCI) diagnosis has attracted researchers in the field, due to the increasing prevalence of the diseases. Unfortunately, the unfavorably high-dimensional nature of neuroimaging data, combined with the limited number of samples available, makes it challenging to build a robust computer-aided diagnosis system. Machine learning techniques have been considered as a useful tool in this respect and, among various methods, sparse regression has shown its validity in the literature. However, to the best of our knowledge, the existing sparse regression methods mostly try to select features based on the optimal regression coefficients in one step. We argue that since the training feature vectors are composed of both informative and uninformative or less informative features, the resulting optimal regression coefficients are inevitably affected by the uninformative or less informative features. To this end, we first propose a novel deep architecture to recursively discard uninformative features by performing sparse multi-task learning in a hierarchical fashion. We further hypothesize that the optimal regression coefficients reflect the relative importance of features in representing the target response variables. In this regard, we use the optimal regression coefficients learned in one hierarchy as feature weighting factors in the following hierarchy, and formulate a weighted sparse multi-task learning method. Lastly, we also take into account the distributional characteristics of samples per class and use clustering-induced subclass label vectors as target response values in our sparse regression model. In our experiments on the ADNI cohort, we performed both binary and multi-class classification tasks in AD/MCI diagnosis and showed the superiority of the proposed method by comparing with the state-of-the-art methods.

  3. Effective automated feature construction and selection for classification of biological sequences.

    Directory of Open Access Journals (Sweden)

    Uday Kamath

    Full Text Available Many open problems in bioinformatics involve elucidating underlying functional signals in biological sequences. DNA sequences, in particular, are characterized by rich architectures in which functional signals are increasingly found to combine local and distal interactions at the nucleotide level. Problems of interest include detection of regulatory regions, splice sites, exons, hypersensitive sites, and more. These problems naturally lend themselves to formulation as classification problems in machine learning. When classification is based on features extracted from the sequences under investigation, success is critically dependent on the chosen set of features. We present an algorithmic framework (EFFECT) for automated detection of functional signals in biological sequences. We focus here on classification problems involving DNA sequences, which state-of-the-art work in machine learning shows to be challenging and to involve complex combinations of local and distal features. EFFECT uses a two-stage process to first construct a set of candidate sequence-based features and then select a most effective subset for the classification task at hand. Both stages make heavy use of evolutionary algorithms to efficiently guide the search towards informative features capable of discriminating between sequences that contain a particular functional signal and those that do not. To demonstrate its generality, EFFECT is applied to three separate problems of importance in DNA research: the recognition of hypersensitive sites, splice sites, and ALU sites. Comparisons with state-of-the-art algorithms show that the framework is both general and powerful. In addition, a detailed analysis of the constructed features shows that they contain valuable biological information about DNA architecture, allowing biologists and other researchers to directly inspect the features and potentially use the insights obtained to assist wet-laboratory studies on retainment or modification

  4. Deep sparse multi-task learning for feature selection in Alzheimer’s disease diagnosis

    Science.gov (United States)

    Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang

    2016-01-01

    Recently, neuroimaging-based Alzheimer’s disease (AD) or mild cognitive impairment (MCI) diagnosis has attracted researchers in the field, due to the increasing prevalence of the diseases. Unfortunately, the unfavorably high-dimensional nature of neuroimaging data, combined with the limited number of samples available, makes it challenging to build a robust computer-aided diagnosis system. Machine learning techniques have been considered as a useful tool in this respect and, among various methods, sparse regression has shown its validity in the literature. However, to the best of our knowledge, the existing sparse regression methods mostly try to select features based on the optimal regression coefficients in one step. We argue that since the training feature vectors are composed of both informative and uninformative or less informative features, the resulting optimal regression coefficients are inevitably affected by the uninformative or less informative features. To this end, we first propose a novel deep architecture to recursively discard uninformative features by performing sparse multi-task learning in a hierarchical fashion. We further hypothesize that the optimal regression coefficients reflect the relative importance of features in representing the target response variables. In this regard, we use the optimal regression coefficients learned in one hierarchy as feature weighting factors in the following hierarchy, and formulate a weighted sparse multi-task learning method. Lastly, we also take into account the distributional characteristics of samples per class and use clustering-induced subclass label vectors as target response values in our sparse regression model. In our experiments on the ADNI cohort, we performed both binary and multi-class classification tasks in AD/MCI diagnosis and showed the superiority of the proposed method by comparing with the state-of-the-art methods. PMID:25993900

  5. A combinational feature selection and ensemble neural network method for classification of gene expression data

    Directory of Open Access Journals (Sweden)

    Jiang Tianzi

    2004-09-01

    Full Text Available Abstract Background Microarray experiments are becoming a powerful tool for clinical diagnosis, as they have the potential to discover gene expression patterns that are characteristic of a particular disease. To date, this problem has received most attention in the context of cancer research, especially in tumor classification. Various feature selection methods and classifier design strategies have been widely used and compared. However, most published articles on tumor classification have applied a certain technique to a certain dataset, and recently several researchers have compared these techniques on several public datasets. It has been shown that differently selected features reflect different aspects of the dataset, and some feature sets yield better solutions on certain problems. At the same time, faced with a large amount of microarray data and little prior knowledge, it is difficult to find the intrinsic characteristics using traditional methods. In this paper, we attempt to introduce a combinational feature selection method in conjunction with ensemble neural networks to generally improve the accuracy and robustness of sample classification. Results We validate our new method on several recent publicly available datasets, both in terms of predictive accuracy on testing samples and through cross-validation. Compared with the best performance of other current methods, remarkably improved results can be obtained using our new strategy on a wide range of different datasets. Conclusions Thus, we conclude that our methods can extract more information from microarray data to achieve more accurate classification and can also help to extract the latent marker genes of the diseases for better diagnosis and treatment.
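
    As a hedged sketch of the general idea in this record, the snippet below lets several feature-selection criteria each pick a gene subset, trains one small neural network per subset, and combines the networks by majority vote. The specific selectors, network sizes and combination rule of the paper are not reproduced; the expression matrix and labels are placeholders.

```python
# Hedged sketch: combinational feature selection feeding an ensemble of neural networks.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(11)
X = rng.normal(size=(100, 500))                  # placeholder gene-expression matrix
y = rng.integers(0, 2, size=100)                 # tumour / normal labels

members = [
    ("anova_net", make_pipeline(SelectKBest(f_classif, k=30),
                                MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                                              random_state=0))),
    ("mi_net", make_pipeline(SelectKBest(mutual_info_classif, k=30),
                             MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                                           random_state=1))),
    ("anova_wide_net", make_pipeline(SelectKBest(f_classif, k=100),
                                     MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                                                   random_state=2))),
]
ensemble = VotingClassifier(estimators=members, voting="hard")   # majority vote
print("CV accuracy: %.3f" % cross_val_score(ensemble, X, y, cv=5).mean())
```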

  6. Feature selection of seismic waveforms for long period event detection at Cotopaxi Volcano

    Science.gov (United States)

    Lara-Cueva, R. A.; Benítez, D. S.; Carrera, E. V.; Ruiz, M.; Rojo-Álvarez, J. L.

    2016-04-01

    Volcano Early Warning Systems (VEWS) have become a research topic aimed at saving human lives and reducing material losses. In this setting, event detection criteria based on classification using machine learning techniques have proven useful, and a number of systems have been proposed in the literature. However, to the best of our knowledge, no comprehensive and principled study has been conducted to compare the influence of the many different sets of possible features that have been used as input spaces in previous works. We present an automatic recognition system for volcano seismicity, considering feature extraction, event classification, and subsequent event detection, in order to reduce the processing time as a first step towards a highly reliable automatic detection system in real time. We compiled and extracted a comprehensive set of temporal, moving-average, spectral, and scale-domain features for separating long-period seismic events from background noise. We benchmarked the two usual kinds of feature selection techniques, namely filter (mutual information and statistical dependence) and embedded (cross-validation and pruning), each of them using suitable classification algorithms such as k-Nearest Neighbors (k-NN) and Decision Trees (DT). We applied this approach to the seismicity recorded at Cotopaxi Volcano in Ecuador during 2009 and 2010. The best results were obtained using a 15 s segmentation window, a feature matrix in the frequency domain, and a DT classifier, yielding 99% detection accuracy and sensitivity. Selected features and their interpretation were consistent among different input spaces, in simple terms of amplitude and spectral content. Our study provides the framework for an event detection system with high accuracy and reduced computational requirements.
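
    The following hedged sketch mirrors the filter-plus-classifier pipeline described in this record: spectral features of fixed-length windows are ranked by mutual information with the event/noise label, the top-ranked ones are kept, and a decision tree is trained. The feature matrix and labels are synthetic placeholders, not Cotopaxi data.

```python
# Hedged sketch: mutual-information filter followed by a decision-tree event detector.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 40))               # placeholder spectral-domain features per window
y = (X[:, 5] + 0.5 * X[:, 12] + rng.normal(scale=0.5, size=500) > 0).astype(int)  # 1 = LP event

scores = mutual_info_classif(X, y, random_state=0)
top = np.argsort(scores)[::-1][:10]           # filter step: keep the 10 best-ranked features

X_tr, X_te, y_tr, y_te = train_test_split(X[:, top], y, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("accuracy: %.2f  sensitivity: %.2f"
      % (accuracy_score(y_te, pred), recall_score(y_te, pred)))
```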

  7. ROI Selection and Image Retrieval Method Based on Contribution Matrix of SURF Features

    Institute of Scientific and Technical Information of China (English)

    薛峰; 顾靖; 崔国影; 徐珊; 徐娟

    2015-01-01

    In traditional image retrieval methods, features need to be extracted over the whole image region, which leads to high computational cost and semantic ambiguity. To address this issue, this paper proposes a technique to select regions of interest (ROI) based on a contribution matrix of SURF features and carries out the retrieval process within the ROI. First, the SURF descriptor is used to extract local features and keypoints; the contribution matrix is then computed from the Hessian matrices of the feature points, and dynamic programming is used to calculate the sums of sub-matrices of the feature-point distribution, which are finally used to extract the ROI. Within the ROI, the color, texture and shape features are fused and normalized into a single feature vector, and a nonlinear Gaussian distance function is used to retrieve images from the database for the user's query. Experimental results show that, compared with existing algorithms, the extracted ROI agrees closely with human visual intent and the method is effective for image retrieval.

  8. Multi-Feature Segmentation and Cluster based Approach for Product Feature Categorization

    Directory of Open Access Journals (Sweden)

    Bharat Singh

    2016-03-01

    Full Text Available In recent times, the web has become a valuable source of online consumer reviews; however, the number of reviews is growing rapidly, and it is infeasible for a user to read all of them to make a satisfying decision, because people can describe the same features with contrary words or phrases. To produce a useful summary, synonymous domain words and phrases need to be grouped into the same feature group. We focus on the feature-based opinion mining problem, and this paper mainly studies feature-based product categorization from the many user-generated reviews available on different websites. First, a multi-feature segmentation method is proposed which segments multi-feature review sentences into single-feature units. Second, a part-of-speech dictionary and context information are used for irrelevant feature identification, sentiment words are used to identify the polarity of each feature, and finally an unsupervised clustering-based product feature categorization method is proposed. Clustering is an unsupervised machine learning approach that groups features with a high degree of similarity into the same cluster. The proposed approach provides satisfactory results and can achieve 100% average precision for the clustering-based product feature categorization task. This approach is applicable to different products.

  9. Identification of bacteriophage virion proteins by the ANOVA feature selection and analysis.

    Science.gov (United States)

    Ding, Hui; Feng, Peng-Mian; Chen, Wei; Lin, Hao

    2014-08-01

    The bacteriophage virion proteins play extremely important roles in the fate of host bacterial cells. Accurate identification of bacteriophage virion proteins is very important for understanding their functions and clarifying the lysis mechanism of bacterial cells. In this study, a new sequence-based method was developed to identify phage virion proteins. In the new method, the protein sequences were initially formulated by the g-gap dipeptide compositions. Subsequently, the analysis of variance (ANOVA) with incremental feature selection (IFS) was used to search for the optimal feature set. It was observed that, in jackknife cross-validation, the optimal feature set including 160 optimized features can produce the maximum accuracy of 85.02%. By performing feature analysis, we found that the correlation between two amino acids with one gap was more important than other correlations for phage virion protein prediction and that some of the 1-gap dipeptides were important and mainly contributed to the virion protein prediction. This analysis will provide novel insights into the function of phage virion proteins. On the basis of the proposed method, an online web-server, PVPred, was established and can be freely accessed from the website (http://lin.uestc.edu.cn/server/PVPred). We believe that the PVPred will become a powerful tool to study phage virion proteins and to guide the related experimental validations.
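
    A hedged sketch of the ANOVA ranking plus incremental feature selection (IFS) procedure described here is shown below: features (standing in for g-gap dipeptide frequencies) are ranked by F-score, nested subsets of growing size are evaluated, and the best-scoring subset is kept. The SVM classifier and 5-fold cross-validation (rather than jackknife) are assumptions made for brevity.

```python
# Hedged sketch: ANOVA F-score ranking followed by incremental feature selection (IFS).
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.random(size=(150, 60))            # placeholder dipeptide-composition features
y = rng.integers(0, 2, size=150)          # 1 = virion protein, 0 = non-virion

F, _ = f_classif(X, y)
order = np.argsort(F)[::-1]               # features sorted by decreasing F-score

best_acc, best_k = 0.0, 0
for k in range(1, X.shape[1] + 1):        # incremental feature selection loop
    subset = order[:k]
    acc = cross_val_score(SVC(kernel="rbf"), X[:, subset], y, cv=5).mean()
    if acc > best_acc:
        best_acc, best_k = acc, k
print("optimal subset size: %d  CV accuracy: %.3f" % (best_k, best_acc))
```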

  10. Fingerprint Feature Extraction Based on Macroscopic Curvature

    Institute of Scientific and Technical Information of China (English)

    Zhang Xiong; He Gui-ming; Zhang Yun

    2003-01-01

    In an Automatic Fingerprint Identification System (AFIS), extracting fingerprint features is very important. The local curvature of fingerprint ridges is irregular, which makes it difficult to effectively extract curve features that describe the fingerprint. This article proposes a novel algorithm that incorporates information from a few nearby fingerprint ridges to extract a new characteristic describing the curvature of the fingerprint. Experimental results show that the algorithm is feasible and that the extracted characteristics clearly reveal the inner macroscopic curve properties of the fingerprint. The results also show that this kind of characteristic is robust to noise and contamination.

  11. Fingerprint Feature Extraction Based on Macroscopic Curvature

    Institute of Scientific and Technical Information of China (English)

    Zhang Xiong; He Gui-Ming; et al.

    2003-01-01

    In an Automatic Fingerprint Identification System (AFIS), extracting fingerprint features is very important. The local curvature of fingerprint ridges is irregular, which makes it difficult to effectively extract curve features that describe the fingerprint. This article proposes a novel algorithm that incorporates information from a few nearby fingerprint ridges to extract a new characteristic describing the curvature of the fingerprint. Experimental results show that the algorithm is feasible and that the extracted characteristics clearly reveal the inner macroscopic curve properties of the fingerprint. The results also show that this kind of characteristic is robust to noise and contamination.

  12. Innovations in individual feature history management - The significance of feature-based temporal model

    Science.gov (United States)

    Choi, J.; Seong, J.C.; Kim, B.; Usery, E.L.

    2008-01-01

    A feature relies on three dimensions (space, theme, and time) for its representation. Even though spatiotemporal models have been proposed, they have principally focused on the spatial changes of a feature. In this paper, a feature-based temporal model is proposed to represent changes of both space and theme independently. The proposed model modifies the ISO's temporal schema and adds a new explicit temporal relationship structure that stores temporal topological relationships together with the ISO's temporal primitives of a feature in order to keep track of feature history. The explicit temporal relationship can enhance query performance on feature history by removing topological comparisons during query processing. Further, a prototype system has been developed to test the proposed feature-based temporal model by querying land parcel history in Athens, Georgia. The results of temporal queries on individual feature history show the efficiency of the explicit temporal relationship structure. © Springer Science+Business Media, LLC 2007.

  13. Structural features for functional selectivity at serotonin receptors.

    Science.gov (United States)

    Wacker, Daniel; Wang, Chong; Katritch, Vsevolod; Han, Gye Won; Huang, Xi-Ping; Vardy, Eyal; McCorvy, John D; Jiang, Yi; Chu, Meihua; Siu, Fai Yiu; Liu, Wei; Xu, H Eric; Cherezov, Vadim; Roth, Bryan L; Stevens, Raymond C

    2013-05-01

    Drugs active at G protein-coupled receptors (GPCRs) can differentially modulate either canonical or noncanonical signaling pathways via a phenomenon known as functional selectivity or biased signaling. We report biochemical studies showing that the hallucinogen lysergic acid diethylamide, its precursor ergotamine (ERG), and related ergolines display strong functional selectivity for β-arrestin signaling at the 5-HT2B 5-hydroxytryptamine (5-HT) receptor, whereas they are relatively unbiased at the 5-HT1B receptor. To investigate the structural basis for biased signaling, we determined the crystal structure of the human 5-HT2B receptor bound to ERG and compared it with the 5-HT1B/ERG structure. Given the relatively poor understanding of GPCR structure and function to date, insight into different GPCR signaling pathways is important to better understand both adverse and favorable therapeutic activities. PMID:23519215

  14. Clustering Based Feature Learning on Variable Stars

    CERN Document Server

    Mackenzie, Cristóbal; Protopapas, Pavlos

    2016-01-01

    The success of automatic classification of variable stars strongly depends on the lightcurve representation. Usually, lightcurves are represented as a vector of many statistical descriptors designed by astronomers called features. These descriptors commonly demand significant computational power to calculate, require substantial research effort to develop and do not guarantee good performance on the final classification task. Today, lightcurve representation is not entirely automatic; algorithms that extract lightcurve features are designed by humans and must be manually tuned up for every survey. The vast amounts of data that will be generated in future surveys like LSST mean astronomers must develop analysis pipelines that are both scalable and automated. Recently, substantial efforts have been made in the machine learning community to develop methods that prescind from expert-designed and manually tuned features for features that are automatically learned from data. In this work we present what is, to our ...

  15. A study of selected textural features usefulness for impervious surface coverage estimation using Landsat images

    Science.gov (United States)

    Bernat, Katarzyna; Drzewiecki, Wojciech

    2015-10-01

    The aim of our research was to evaluate the applicability of textural measures for sub-pixel impervious surface estimation from Landsat TM images using machine learning algorithms. We put particular focus on determining the usefulness of five textural feature groups at the pixel and sub-pixel levels. A two-stage approach to impervious surface coverage estimation was also tested. We compared the accuracy of impervious surface estimation using spectral bands only with the results of imperviousness index estimation based on extended classification feature sets (spectral band values supplemented with measures derived from various groups of textural characteristics). Impervious surface coverage estimation was done using decision and regression trees based on the C5.0 and Cubist algorithms. At the classification stage, the research area was divided into two categories: i) completely permeable areas (imperviousness index less than 1%) and ii) fully or partially impervious areas. At the sub-pixel stage, the percentage of impervious surface coverage within a single pixel was estimated. Based on the results of cross-validation, we selected the approaches guaranteeing the lowest mean errors on the training set. The accuracy of the imperviousness index estimation was checked on a validation data set. The average error of hard classification using spectral features only was 6.5%, and about 4.4% for spectral features combined with absolute gradient-based characteristics. The root mean square error (RMSE) of the percentage impervious surface coverage determined within a single pixel was 9.46% for the best tested classification feature sets. The two-stage procedure was applied both to the primary approach involving spectral bands as the classification feature set and to the approach guaranteeing the best accuracy at the classification and regression stages. The results have shown that inclusion of textural measures into

  16. Margin-Maximizing Feature Elimination Methods for Linear and Nonlinear Kernel-Based Discriminant Functions

    OpenAIRE

    Aksu, Yaman; Miller, David J.; Kesidis, George; Yang, Qing X.

    2010-01-01

    Feature selection for classification in high-dimensional spaces can improve generalization, reduce classifier complexity, and identify important, discriminating feature “markers.” For support vector machine (SVM) classification, a widely used technique is recursive feature elimination (RFE). We demonstrate that RFE is not consistent with margin maximization, central to the SVM learning approach. We thus propose explicit margin-based feature elimination (MFE) for SVMs and demonstrate both impr...

  17. [Plant Spectral Discrimination Based on Phenological Features].

    Science.gov (United States)

    Zhang, Lei; Zhao, Jian-long; Jia, Kun; Li, Xiao-song

    2015-10-01

    Spectral analysis plays a significant role in plant characteristic identification and mechanism recognition. In past years, many papers have been published on absorption features in the spectra of chlorophyll and moisture, spectral analysis of the vegetation red-edge effect, spectral profile feature extraction, spectral profile conversion, and the impact of vegetation leaf structure and chemical composition on the spectra. However, less research has addressed spectral changes caused by seasonal changes in plant life form, chlorophyll, and leaf area index. This paper studied spectral observations of 11 plant types of various life forms, leaf structures and sizes, and phenological characteristics, including deciduous forest with broad vertical leaves, needle-leaf evergreen forest, needle-leaf deciduous forest, deciduous forest with broad flat leaves, high shrub with big leaves, high shrub with little leaves, deciduous forest with broad little leaves, short shrub, meadow, steppe and grass. Field spectral data were observed with an SVC-HR768 spectrometer (Spectra Vista, USA); the wavelength range covers 350-2500 nm and the spectral resolution reaches 1-4 nm. The NDVI and the maximum spectral absorption depths in the green and red bands were measured after continuum removal processing; the mean, amplitude and gradient of these features on the seasonal change profile were analyzed, and the separability of plant spectral features in the growth period and the maturation period was compared. The paper presents a method for calculating the separability of vegetation spectra which considers feature-space distances, and this index is used to analyze vegetation discrimination. The results show that the spectral features during the plant growth period are easier to distinguish than those during the maturation period; for the same features, plant separability in the growth period is 3 points higher than in the maturation period. The overall separability of vegetation

  18. Ischemia episode detection in ECG using kernel density estimation, support vector machine and feature selection

    Directory of Open Access Journals (Sweden)

    Park Jinho

    2012-06-01

    Full Text Available Abstract Background Myocardial ischemia can develop into more serious diseases. Early detection of the ischemic syndrome in the electrocardiogram (ECG), more accurately and automatically, can prevent it from developing into a catastrophic disease. To this end, we propose a new method, which employs wavelets and simple feature selection. Methods For training and testing, the European ST-T database is used, which comprises 367 ischemic ST episodes in 90 records. We first remove baseline wandering and detect the time positions of QRS complexes by a method based on the discrete wavelet transform. Next, for each heart beat, we extract three features which can be used for differentiating ST episodes from normal: (1) the area between the QRS offset and T-peak points, (2) the normalized and signed sum from the QRS offset to the effective zero voltage point, and (3) the slope from the QRS onset to the offset point. We average the feature values over five successive beats to reduce the effect of outliers. Finally we apply classifiers to those features. Results We evaluated the algorithm with kernel density estimation (KDE) and support vector machine (SVM) methods. Sensitivity and specificity for KDE were 0.939 and 0.912, respectively; the KDE classifier detects 349 ischemic ST episodes out of the total of 367. Sensitivity and specificity of SVM were 0.941 and 0.923, respectively; the SVM classifier detects 355 ischemic ST episodes. Conclusions We proposed a new method for detecting ischemia in the ECG. It contains signal processing techniques for removing baseline wandering and detecting the time positions of QRS complexes by the discrete wavelet transform, and explicit feature extraction from the morphology of ECG waveforms. It was shown that the number of selected features was sufficient to discriminate ischemic ST episodes from normal ones. We also showed how the proposed KDE classifier can automatically select kernel bandwidths, meaning that the algorithm does not require any numerical

  19. Feature Selection Methods for Early Predictive Biomarker Discovery Using Untargeted Metabolomic Data.

    Science.gov (United States)

    Grissa, Dhouha; Pétéra, Mélanie; Brandolini, Marion; Napoli, Amedeo; Comte, Blandine; Pujos-Guillot, Estelle

    2016-01-01

    Untargeted metabolomics is a powerful phenotyping tool for better understanding biological mechanisms involved in human pathology development and identifying early predictive biomarkers. This approach, based on multiple analytical platforms, such as mass spectrometry (MS), chemometrics and bioinformatics, generates massive and complex data that need appropriate analyses to extract the biologically meaningful information. Despite the various tools available, it is still a challenge to handle such large and noisy datasets with a limited number of individuals without risking overfitting. Moreover, when the objective is focused on the identification of early predictive markers of clinical outcome, a few years before occurrence, it becomes essential to use the appropriate algorithms and workflow to be able to discover subtle effects among this large amount of data. In this context, this work consists of studying a workflow describing the general feature selection process, using knowledge discovery and data mining methodologies to propose advanced solutions for predictive biomarker discovery. The strategy was focused on evaluating a combination of numeric-symbolic approaches for feature selection with the objective of obtaining the best combination of metabolites producing an effective and accurate predictive model. Relying first on numerical approaches, especially on machine learning methods (SVM-RFE, RF, RF-RFE) and on univariate statistical analyses (ANOVA), a comparative study was performed on an original metabolomic dataset and reduced subsets. As a resampling method, LOOCV was applied to minimize the risk of overfitting. The best k features obtained with different importance scores from the combination of these approaches were compared and allowed the variable stabilities to be determined using Formal Concept Analysis. The results revealed the interest of RF-Gini combined with ANOVA for feature selection, as these two complementary methods allowed selecting the 48
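
    Below is a hedged sketch of one numeric branch of such a workflow: recursive feature elimination driven by random-forest (Gini) importances, with leave-one-out cross-validation as the resampling method. The arrays are placeholders for a metabolomic feature matrix, and for brevity the RFE step is run once on all data rather than nested inside the CV loop as it should be in practice.

```python
# Hedged sketch: RF-RFE feature selection evaluated with leave-one-out cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(60, 80))            # placeholder metabolite intensities
y = rng.integers(0, 2, size=60)          # future case / control status

rf = RandomForestClassifier(n_estimators=200, random_state=0)
# Note: to avoid an optimistic bias, this selection step should sit inside the CV loop;
# it is applied once here only to keep the sketch short.
selector = RFE(rf, n_features_to_select=10, step=5).fit(X, y)    # RF-RFE step
X_sel = X[:, selector.support_]

acc = cross_val_score(rf, X_sel, y, cv=LeaveOneOut()).mean()      # LOOCV estimate
print("selected features:", np.flatnonzero(selector.support_))
print("LOOCV accuracy: %.3f" % acc)
```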

  20. Channel Selection and Feature Projection for Cognitive Load Estimation Using Ambulatory EEG

    Directory of Open Access Journals (Sweden)

    Tian Lan

    2007-01-01

    Full Text Available We present an ambulatory cognitive state classification system to assess the subject's mental load based on EEG measurements. The ambulatory cognitive state estimator is utilized in the context of a real-time augmented cognition (AugCog) system that aims to enhance the cognitive performance of a human user through computer-mediated assistance based on assessments of cognitive states using physiological signals including, but not limited to, EEG. This paper focuses particularly on the offline channel selection and feature projection phases of the design and aims to present mutual-information-based techniques that use a simple sample estimator for this quantity. Analyses conducted on data collected from 3 subjects performing 2 tasks (n-back/Larson) at 2 difficulty levels (low/high) demonstrate that the proposed mutual-information-based dimensionality reduction scheme can achieve up to 94% cognitive load estimation accuracy.

  1. Analytical Features: A Knowledge-Based Approach to Audio Feature Generation

    Directory of Open Access Journals (Sweden)

    Pachet François

    2009-01-01

    Full Text Available We present a feature generation system designed to create audio features for supervised classification tasks. The main contribution to feature generation studies is the notion of analytical features (AFs), a construct designed to support the representation of knowledge about audio signal processing. We describe the most important aspects of AFs, in particular their dimensional type system, on which are based pattern-based random generators, heuristics, and rewriting rules. We show how AFs generalize or improve previous approaches used in feature generation. We report on several projects using AFs for difficult audio classification tasks, demonstrating their advantage over standard audio features. More generally, we propose analytical features as a paradigm to bring raw signals into the world of symbolic computation.

  2. Feature-based Ontology Mapping from an Information Receivers’ Viewpoint

    DEFF Research Database (Denmark)

    Glückstad, Fumiko Kano; Mørup, Morten

    2012-01-01

    This paper compares four algorithms for computing feature-based similarities between concepts respectively possessing a distinctive set of features. The eventual purpose of comparing these feature-based similarity algorithms is to identify a candidate term in a Target Language (TL) that can...

  3. SOFT COMPUTING BASED MEDICAL IMAGE RETRIEVAL USING SHAPE AND TEXTURE FEATURES

    Directory of Open Access Journals (Sweden)

    M. Mary Helta Daisy

    2014-01-01

    Full Text Available Image retrieval is a challenging and important research area with applications such as digital libraries and medical image databases. Content-based image retrieval is useful for retrieving images from a database based on feature vectors generated from image features. In this study, we present image retrieval based on a genetic algorithm. Shape features and morphology-based texture features are extracted from the images in the database and from the query image. Chromosomes are then generated based on the distance values obtained from the differences between the feature vectors of the database images and the query image. Genetic operators such as crossover and mutation are applied to the selected chromosomes. The best chromosome is then selected, and the images most similar to the query image are displayed. The retrieval performance of the method shows improved retrieval results.

  4. COMPUTATIONALLY INEXPENSIVE SEQUENTIAL FORWARD FLOATING SELECTION FOR ACQUIRING SIGNIFICANT FEATURES FOR AUTHORSHIP INVARIANCENESS IN WRITER IDENTIFICATION

    OpenAIRE

    Satrya Fajri Pratama; Azah Kamilah Muda; Yun-Huoy Choo; and Noor Azilah Muda

    2011-01-01

    Handwriting is individualistic. The uniqueness of the shape and style of handwriting can be used to identify significant features for authenticating the author of a piece of writing. Acquiring these significant features is an important research topic in the Writer Identification domain, where the goal is to find the unique features of an individual, also known as the Individuality of Handwriting. This paper proposes an improved Sequential Forward Floating Selection method besides the exploration of significant features for...

  5. Surface characterization based upon significant topographic features

    Energy Technology Data Exchange (ETDEWEB)

    Blanc, J; Grime, D; Blateyron, F, E-mail: fblateyron@digitalsurf.fr [Digital Surf, 16 rue Lavoisier, F-25000 Besancon (France)

    2011-08-19

    Watershed segmentation and Wolf pruning, as defined in ISO 25178-2, allow the detection of significant features on surfaces and their characterization in terms of dimension, area, volume, curvature, shape or morphology. These new tools provide a robust way to specify functional surfaces.

  6. Evaluation of Meta-Heuristic Algorithms for Stable Feature Selection

    Directory of Open Access Journals (Sweden)

    Maysam Toghraee

    2016-07-01

    Full Text Available Nowadays, with the development of science, technology and technological tools, the ability to review and store important data has become available. Knowledge is needed to search these data and reach the necessary, useful results. Data mining is the automatic search of large data sources to find patterns and dependencies that cannot be uncovered by simple statistical analysis. The scope of this work is to study the predictive role and usage domain of data mining in medical science and to suggest a framework for creating, assessing and exploiting data mining patterns in this field. Since previous research has found that existing assessment methods cannot be used to characterize data discrepancies, we suggest a new approach for assessing data similarities in order to find the relations between variation in the data and stability of selection. We have therefore chosen meta-heuristic methods so that the best and most stable algorithms can be selected from a set of algorithms.

  7. Feature Extraction with Ordered Mean Values for Content Based Image Classification

    Directory of Open Access Journals (Sweden)

    Sudeep Thepade

    2014-01-01

    Full Text Available Categorization of images into meaningful classes by efficient extraction of feature vectors from image datasets has been dependent on feature selection techniques. Traditionally, feature vector extraction has been carried out using different methods of image binarization done with selection of global, local, or mean threshold. This paper has proposed a novel technique for feature extraction based on ordered mean values. The proposed technique was combined with feature extraction using discrete sine transform (DST for better classification results using multitechnique fusion. The novel methodology was compared to the traditional techniques used for feature extraction for content based image classification. Three benchmark datasets, namely, Wang dataset, Oliva and Torralba (OT-Scene dataset, and Caltech dataset, were used for evaluation purpose. Performance measure after evaluation has evidently revealed the superiority of the proposed fusion technique with ordered mean values and discrete sine transform over the popular approaches of single view feature extraction methodologies for classification.

  8. Object learning improves feature extraction but does not improve feature selection.

    Directory of Open Access Journals (Sweden)

    Linus Holm

    Full Text Available A single glance at your crowded desk is enough to locate your favorite cup. But finding an unfamiliar object requires more effort. This superiority in recognition performance for learned objects has at least two possible sources. For familiar objects observers might: (1) select more informative image locations upon which to fixate their eyes, or (2) extract more information from a given eye fixation. To test these possibilities, we had observers localize fragmented objects embedded in dense displays of random contour fragments. Eight participants searched for objects in 600 images while their eye movements were recorded in three daily sessions. Performance improved as subjects trained with the objects: The number of fixations required to find an object decreased by 64% across the 3 sessions. An ideal observer model that included measures of fragment confusability was used to calculate the information available from a single fixation. Comparing human performance to the model suggested that across sessions information extraction at each eye fixation increased markedly, by an amount roughly equal to the extra information that would be extracted following a 100% increase in functional field of view. Selection of fixation locations, on the other hand, did not improve with practice.

  9. Prediction of protein modification sites of pyrrolidone carboxylic acid using mRMR feature selection and analysis.

    Directory of Open Access Journals (Sweden)

    Lu-Lu Zheng

    Full Text Available Pyrrolidone carboxylic acid (PCA) is formed during a common post-translational modification (PTM) of extracellular and multi-pass membrane proteins. In this study, we developed a new predictor to predict the modification sites of PCA based on maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS). We incorporated 727 features that belonged to 7 kinds of protein properties to predict the modification sites, including sequence conservation, residual disorder, amino acid factor, secondary structure and solvent accessibility, gain/loss of amino acid during evolution, propensity of amino acid to be conserved at protein-protein interface and protein surface, and deviation of side chain carbon atom number. Among these 727 features, 244 features were selected by mRMR and IFS as the optimized features for the prediction, with which the prediction model achieved a maximum MCC of 0.7812. Feature analysis showed that all feature types contributed to the modification process. Further site-specific feature analysis showed that the features derived from PCA's surrounding sites contributed more to the determination of PCA sites than other sites. The detailed feature analysis in this paper might provide important clues for understanding the mechanism of the PCA formation and guide relevant experimental validations.
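
    A hedged sketch of mRMR greedy selection is given below: at each step the feature is picked whose mutual information with the class label is largest after subtracting its average mutual information with the already-selected features. The scikit-learn MI estimators stand in for the discretized estimators typically used with mRMR, and the arrays are placeholders rather than the 727 site-derived features of the paper.

```python
# Hedged sketch: greedy mRMR (maximum relevance, minimum redundancy) feature selection.
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr(X, y, n_select):
    relevance = mutual_info_classif(X, y, random_state=0)
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_select and remaining:
        best_j, best_score = None, -np.inf
        for j in remaining:
            if selected:
                redundancy = np.mean([
                    mutual_info_regression(X[:, [j]], X[:, s], random_state=0)[0]
                    for s in selected])
            else:
                redundancy = 0.0
            score = relevance[j] - redundancy        # mRMR difference criterion
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

rng = np.random.default_rng(5)
X = rng.normal(size=(120, 20))                       # placeholder per-site features
y = (X[:, 2] + 0.8 * X[:, 7] + rng.normal(scale=0.5, size=120) > 0).astype(int)
print("mRMR ranking of top features:", mrmr(X, y, n_select=5))
```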

  10. Privacy Preserving for Feature Selection in Data Mining Using Centralized Network

    Directory of Open Access Journals (Sweden)

    Hemanta Kumar Bhuyan

    2012-05-01

    Full Text Available This paper proposes feature selection with privacy preservation in a centralized network. Data privacy can be preserved by a perturbation technique using alias names. In centralized data evaluation, data classification and feature selection are performed for a data mining decision model, which forms the structural information of the model in this paper. The gain ratio technique is applied to improve the performance of feature selection in the centralized computational task. Not all features need privacy preservation of confidential data to obtain the best model. Chi-square testing is used for the classification of data by the centralized data mining model using its own processing unit. The alias data model for privacy-preserving data mining is used to develop a data mining technique that builds the best model without violating the privacy of individuals. The proposed data miner process performs the best feature selection, and two types of experimental tests are carried out in this paper.

  11. Content-Based Image Retrieval Using Multiple Features

    OpenAIRE

    Zhang, Chi; Huang, Lei

    2014-01-01

    Algorithms for Content-Based Image Retrieval (CBIR) have been well developed along with the explosion of information. These algorithms are mainly distinguished by the features used to describe the image content. In this paper, algorithms based on color features and texture features for image retrieval will be presented. A Color Coherence Vector based image retrieval algorithm is also attempted during the implementation process, but the best result is generated from the algorithms tha...

  12. Straight line feature based image distortion correction

    Institute of Scientific and Technical Information of China (English)

    Zhang Haofeng; Zhao Chunxia; Lu Jianfeng; Tang Zhenmin; Yang Jingyu

    2008-01-01

    An image distortion correction method that uses straight-line features is proposed. Many parallel lines of different directions were extracted from different images and then used to optimize the distortion parameters by nonlinear least squares. A step-by-step strategy was incorporated into the optimization. The 3D world coordinates do not need to be known, and the method is easy to implement. The experimental results show its high accuracy.

  13. Optimal Timer Based Selection Schemes

    CERN Document Server

    Shah, Virag; Yim, Raymond

    2009-01-01

    Timer-based mechanisms are often used to help a given (sink) node select the best helper node among many available nodes. Specifically, a node transmits a packet when its timer expires, and the timer value is a monotone non-increasing function of its local suitability metric. The best node is selected successfully if no other node's timer expires within a 'vulnerability' window after its timer expiry, and so long as the sink can hear the available nodes. In this paper, we show that the optimal metric-to-timer mapping that (i) maximizes the probability of success or (ii) minimizes the average selection time subject to a minimum constraint on the probability of success, maps the metric into a set of discrete timer values. We specify, in closed-form, the optimal scheme as a function of the maximum selection duration, the vulnerability window, and the number of nodes. An asymptotic characterization of the optimal scheme turns out to be elegant and insightful. For any probability distribution function of the metri...

  14. Deep Feature Selection: Theory and Application to Identify Enhancers and Promoters.

    Science.gov (United States)

    Li, Yifeng; Chen, Chih-Yu; Wasserman, Wyeth W

    2016-05-01

    Sparse linear models approximate target variable(s) by a sparse linear combination of input variables. Since they are simple, fast, and able to select features, they are widely used in classification and regression. Essentially they are shallow feed-forward neural networks that have three limitations: (1) incompatibility to model nonlinearity of features, (2) inability to learn high-level features, and (3) unnatural extensions to select features in a multiclass case. Deep neural networks are models structured by multiple hidden layers with nonlinear activation functions. Compared with linear models, they have two distinctive strengths: the capability to (1) model complex systems with nonlinear structures and (2) learn high-level representation of features. Deep learning has been applied in many large and complex systems where deep models significantly outperform shallow ones. However, feature selection at the input level, which is very helpful to understand the nature of a complex system, is still not well studied. In genome research, the cis-regulatory elements in noncoding DNA sequences play a key role in the expression of genes. Since the activity of regulatory elements involves highly interactive factors, a deep tool is strongly needed to discover informative features. In order to address the above limitations of shallow and deep models for selecting features of a complex system, we propose a deep feature selection (DFS) model that (1) takes advantages of deep structures to model nonlinearity and (2) conveniently selects a subset of features right at the input level for multiclass data. Simulation experiments convince us that this model is able to correctly identify both linear and nonlinear features. We applied this model to the identification of active enhancers and promoters by integrating multiple sources of genomic information. Results show that our model outperforms elastic net in terms of size of discriminative feature subset and classification accuracy
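
    As a hedged sketch of the deep feature selection idea in this record, the snippet below places a one-to-one weighting layer in front of a small multilayer network and applies an L1 penalty to those weights, so that uninformative inputs are driven toward zero and selection happens at the input level. PyTorch, the network sizes, and the regularization strength are assumptions; the paper's exact architecture and training schedule are not reproduced.

```python
# Hedged sketch: a sparse one-to-one input layer for feature selection in a deep model.
import torch

class DeepFeatureSelector(torch.nn.Module):
    def __init__(self, n_features, n_hidden, n_classes):
        super().__init__()
        self.w = torch.nn.Parameter(torch.ones(n_features))   # one-to-one selection layer
        self.net = torch.nn.Sequential(
            torch.nn.Linear(n_features, n_hidden), torch.nn.ReLU(),
            torch.nn.Linear(n_hidden, n_classes))

    def forward(self, x):
        return self.net(x * self.w)                            # elementwise feature weighting

torch.manual_seed(0)
X = torch.randn(256, 30)                                       # placeholder genomic features
y = (X[:, 0] - X[:, 4] > 0).long()                             # synthetic 2-class labels

model = DeepFeatureSelector(30, 16, 2)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = torch.nn.CrossEntropyLoss()
for _ in range(300):
    opt.zero_grad()
    loss = loss_fn(model(X), y) + 1e-2 * model.w.abs().sum()   # L1 sparsity on selection weights
    loss.backward()
    opt.step()

print("largest |w| at features:", torch.topk(model.w.abs(), 5).indices.tolist())
```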

  15. Grammar-based feature generation for time-series prediction

    CERN Document Server

    De Silva, Anthony Mihirana

    2015-01-01

    This book proposes a novel approach for time-series prediction using machine learning techniques with automatic feature generation. Application of machine learning techniques to predict time-series continues to attract considerable attention due to the difficulty of the prediction problems compounded by the non-linear and non-stationary nature of the real world time-series. The performance of machine learning techniques, among other things, depends on suitable engineering of features. This book proposes a systematic way for generating suitable features using context-free grammar. A number of feature selection criteria are investigated and a hybrid feature generation and selection algorithm using grammatical evolution is proposed. The book contains graphical illustrations to explain the feature generation process. The proposed approaches are demonstrated by predicting the closing price of major stock market indices, peak electricity load and net hourly foreign exchange client trade volume. The proposed method ...

  16. Geometrically Invariant Watermarking Scheme Based on Local Feature Points

    Directory of Open Access Journals (Sweden)

    Jing Li

    2012-06-01

    Full Text Available Based on local invariant feature points and the cross-ratio principle, this paper presents a feature-point-based image watermarking scheme. It is robust to geometric attacks and some signal processing operations. It extracts local invariant feature points from the image using an improved scale-invariant feature transform algorithm. Using these points as vertices, it constructs quadrilaterals that serve as local feature regions, and the watermark is inserted into these local feature regions repeatedly. In order to obtain stable local regions, it adjusts the number and distribution of the extracted feature points. In every chosen local feature region it decides the locations for embedding watermark bits based on the cross ratio of four collinear points, which is invariant under projective transformation. Watermark bits are embedded by quantization modulation, in which the quantization step value is computed from the given PSNR. Experimental results show that the proposed method can withstand many geometric attacks as well as compound geometric attacks.
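
    The following hedged sketch illustrates only the quantization-modulation embedding step mentioned in this record: each watermark bit is embedded into a scalar value (for example, an intensity derived from a local feature region) by quantizing it onto one of two interleaved lattices. The step size delta would, in the paper, follow from the target PSNR; here it is a fixed assumption.

```python
# Hedged sketch: embedding and extracting one watermark bit by quantization modulation.
import numpy as np

def embed_bit(value, bit, delta=8.0):
    # even multiples of delta encode bit 0, odd half-multiples encode bit 1
    return np.round((value - bit * delta / 2.0) / delta) * delta + bit * delta / 2.0

def extract_bit(value, delta=8.0):
    # decide by which of the two lattices the value is closer to
    d0 = np.abs(value - embed_bit(value, 0, delta))
    d1 = np.abs(value - embed_bit(value, 1, delta))
    return int(d1 < d0)

rng = np.random.default_rng(9)
samples = rng.uniform(0, 255, size=8)                 # placeholder region intensities
bits = rng.integers(0, 2, size=8)
marked = [embed_bit(v, b) for v, b in zip(samples, bits)]
print("embedded:", bits.tolist(), "recovered:", [extract_bit(v) for v in marked])
```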

  17. AN INTEGRATED FRAMEWORK BASED ON TEXTURE FEATURES, CUCKOO SEARCH AND RELEVANCE VECTOR MACHINE FOR MEDICAL IMAGE RETRIEVAL SYSTEM

    Directory of Open Access Journals (Sweden)

    Yogapriya Jaganathan

    2013-01-01

    Full Text Available As medical images are widely used in healthcare applications, a Content-Based Medical Image Retrieval (CBMIR) system is needed for physicians to convey effective decisions to patients and for medical research students to learn imaging characteristics for their research based on visual features. However, the performance of retrieval is restricted by the high dimensionality of the visual features. To reduce the feature dimensionality, an integrated approach is proposed comprising visual feature extraction, feature selection, feature classification and similarity measurement. The selected features are texture features obtained using Local Binary Patterns (LBP), and the extracted texture features form the feature vector database. Fuzzy-based Cuckoo Search (FCKS) is applied for feature selection to reduce the high feature vector dimensionality and to address the difficulty of feature vectors becoming trapped in local optima rather than reaching the global optimum. Fuzzy-based Relevance Vector Machine (FRVM) classification is an efficient method for organizing the collections of relevant image features and classifying the dimensionally optimized feature vectors of images. The Euclidean Distance (ED) is a standard technique for similarity measurement between the query image and the images in the database. The proposed system is implemented on thousands of medical images and achieved high retrieval precision and recall compared with two other methods, as validated through experiments.

  18. Cancer Feature Selection and Classification Using a Binary Quantum-Behaved Particle Swarm Optimization and Support Vector Machine.

    Science.gov (United States)

    Xi, Maolong; Sun, Jun; Liu, Li; Fan, Fangyun; Wu, Xiaojun

    2016-01-01

    This paper focuses on the feature gene selection for cancer classification, which employs an optimization algorithm to select a subset of the genes. We propose a binary quantum-behaved particle swarm optimization (BQPSO) for cancer feature gene selection, coupling support vector machine (SVM) for cancer classification. First, the proposed BQPSO algorithm is described, which is a discretized version of original QPSO for binary 0-1 optimization problems. Then, we present the principle and procedure for cancer feature gene selection and cancer classification based on BQPSO and SVM with leave-one-out cross validation (LOOCV). Finally, the BQPSO coupling SVM (BQPSO/SVM), binary PSO coupling SVM (BPSO/SVM), and genetic algorithm coupling SVM (GA/SVM) are tested for feature gene selection and cancer classification on five microarray data sets, namely, Leukemia, Prostate, Colon, Lung, and Lymphoma. The experimental results show that BQPSO/SVM has significant advantages in accuracy, robustness, and the number of feature genes selected compared with the other two algorithms. PMID:27642363
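
    A hedged sketch of swarm-based gene selection with an SVM fitness function is shown below. A plain binary PSO with a sigmoid transfer function stands in for the quantum-behaved variant (BQPSO) described in this record, and 5-fold cross-validation replaces LOOCV for speed; the expression matrix and labels are placeholders.

```python
# Hedged sketch: binary PSO feature (gene) selection with an SVM cross-validation fitness.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
X = rng.normal(size=(80, 40))                 # placeholder microarray expression matrix
y = rng.integers(0, 2, size=80)               # tumour class labels

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(kernel="linear"), X[:, mask == 1], y, cv=5).mean()

n_particles, n_genes, n_iter = 15, X.shape[1], 30
pos = rng.integers(0, 2, size=(n_particles, n_genes))
vel = rng.normal(scale=0.1, size=(n_particles, n_genes))
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = (rng.random(pos.shape) < 1.0 / (1.0 + np.exp(-vel))).astype(int)  # sigmoid transfer
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()].copy()

print("selected genes:", np.flatnonzero(gbest), " fitness: %.3f" % pbest_fit.max())
```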

  19. Cancer Feature Selection and Classification Using a Binary Quantum-Behaved Particle Swarm Optimization and Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Maolong Xi

    2016-01-01

    Full Text Available This paper focuses on the feature gene selection for cancer classification, which employs an optimization algorithm to select a subset of the genes. We propose a binary quantum-behaved particle swarm optimization (BQPSO) for cancer feature gene selection, coupling support vector machine (SVM) for cancer classification. First, the proposed BQPSO algorithm is described, which is a discretized version of original QPSO for binary 0-1 optimization problems. Then, we present the principle and procedure for cancer feature gene selection and cancer classification based on BQPSO and SVM with leave-one-out cross validation (LOOCV). Finally, the BQPSO coupling SVM (BQPSO/SVM), binary PSO coupling SVM (BPSO/SVM), and genetic algorithm coupling SVM (GA/SVM) are tested for feature gene selection and cancer classification on five microarray data sets, namely, Leukemia, Prostate, Colon, Lung, and Lymphoma. The experimental results show that BQPSO/SVM has significant advantages in accuracy, robustness, and the number of feature genes selected compared with the other two algorithms.

  20. Document image retrieval based on multi-density features

    Institute of Scientific and Technical Information of China (English)

    HU Zhilan; LIN Xinggang; YAN Hong

    2007-01-01

    The development of document image databases is becoming a challenge for document image retrieval techniques. Traditional layout-reconstruction-based methods rely on high-quality document images as well as high optical character recognition (OCR) precision, and can only deal with several widely used languages. The complexity of document layouts greatly hinders layout-analysis-based approaches. This paper describes a multi-density feature based algorithm for binary document images, which is independent of OCR and layout analysis. The text area was extracted after preprocessing such as skew correction and marginal noise removal. Then the aspect ratio and multi-density features were extracted from the text area to select the best candidates from the document image database. Experimental results show that this approach is simple, with loss rates of less than 3%, and can efficiently analyze images with different resolutions from different input systems. The system is also robust to noise such as annotations and complex layouts.

  1. Dwt - Based Feature Extraction from ecg Signal

    Directory of Open Access Journals (Sweden)

    V.K.Srivastava

    2013-01-01

    Full Text Available The electrocardiogram is used to measure the rate and regularity of heartbeats and to detect any irregularity of the heart. An ECG translates the heart's electrical activity into a waveform on paper or screen. For the feature extraction and classification task we will use the discrete wavelet transform (DWT): as a two-dimensional time-scale processing method, the wavelet transform is suitable for non-stationary ECG signals (due to adequate scale values and shifting in time). The data will then be analyzed and classified using a neuro-fuzzy system, a hybrid of artificial neural networks and fuzzy logic.
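
    As an illustration of the kind of DWT-based feature extraction described above, the following sketch decomposes a 1-D ECG segment with PyWavelets and summarizes each sub-band with simple statistics; the wavelet family, decomposition level, and chosen statistics are illustrative assumptions rather than the authors' exact configuration.

      # Minimal sketch: DWT sub-band statistics from a 1-D ECG segment.
      import numpy as np
      import pywt

      def dwt_features(ecg_segment, wavelet="db4", level=4):
          """Decompose an ECG segment and summarize each sub-band."""
          coeffs = pywt.wavedec(ecg_segment, wavelet, level=level)
          features = []
          for band in coeffs:  # [cA_level, cD_level, ..., cD_1]
              features.extend([np.mean(band), np.std(band),
                               np.max(band), np.min(band)])
          return np.array(features)

      # Usage with a synthetic one-second segment sampled at 360 Hz.
      segment = np.sin(2 * np.pi * 1.2 * np.arange(360) / 360.0)
      print(dwt_features(segment).shape)  # (20,) = 4 statistics x 5 sub-bands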

  2. Maize Seed Variety Classification Using the Integration of Spectral and Image Features Combined with Feature Transformation Based on Hyperspectral Imaging

    Directory of Open Access Journals (Sweden)

    Min Huang

    2016-06-01

    Full Text Available Hyperspectral imaging (HSI) technology has been extensively studied for the classification of seed variety. A novel procedure for the classification of maize seed varieties based on HSI was proposed in this study. The optimal wavelengths for the classification of maize seed varieties were selected using the successive projections algorithm (SPA) to improve the acquisition and processing speed of HSI. Subsequently, spectral and imaging features were extracted from regions of interest of the hyperspectral images. Principal component analysis and multidimensional scaling were then introduced to transform/reduce the classification features, overcoming the risk of the curse of dimensionality caused by the use of a large number of features. Finally, the integrated features were used to develop a least squares support vector machine (LS-SVM) model. The LS-SVM model, using the integration of spectral and image features combined with feature transformation methods, achieved more than 90% test accuracy, which was better than the 83.68% obtained by the model using the original spectral and image features, and much higher than the 76.18% obtained by the model using only the spectral features. This procedure provides a possible way to apply a multispectral imaging system to classify seed varieties with high accuracy.
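
    A minimal sketch of the feature-transformation-plus-classifier step described above is given below: PCA reduces a combined feature vector before a support vector classifier. A plain RBF-kernel SVC stands in for LS-SVM here, and the random data are placeholders rather than hyperspectral measurements.

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import SVC

      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 120))   # combined spectral + image features per seed (placeholder)
      y = rng.integers(0, 4, size=200)  # four seed varieties (placeholder labels)

      # Standardize, project onto 20 principal components, then classify.
      model = make_pipeline(StandardScaler(), PCA(n_components=20), SVC(kernel="rbf"))
      print(cross_val_score(model, X, y, cv=5).mean())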

  3. A Combined Approach for Feature Subset Selection and Size Reduction for High Dimensional Data

    Directory of Open Access Journals (Sweden)

    Anurag Dwivedi,

    2015-09-01

    Full Text Available Selection of relevant features from a given feature set is one of the important issues in the field of data mining as well as classification. In general, a dataset may contain a large number of features, but not all of them are important for a particular analysis or decision-making task, because features may share common information or may be completely irrelevant to the processing at hand. This generally happens because of improper selection of features during dataset formation or because of incomplete knowledge about the observed system. In both cases the data will contain features that only increase the processing burden and may ultimately lead to improper outcomes when used for analysis. For these reasons, methods are required to detect and remove such features; hence, in this paper we present an efficient approach that not only removes unimportant features but also reduces the overall size of the dataset. The proposed algorithm uses information theory to measure the information gain of each feature and a minimum spanning tree to group similar features; fuzzy c-means clustering is then used to remove similar entries from the dataset. Finally, the algorithm is tested with an SVM classifier on 35 publicly available real-world high-dimensional datasets, and the results show that the presented algorithm not only reduces the feature set and data length but also improves the performance of the classifier.
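
    The information-gain step of such a pipeline can be sketched with scikit-learn's mutual information estimator, as below; the synthetic dataset and the choice of k are placeholders, and the minimum-spanning-tree grouping and fuzzy c-means steps described above are not reproduced here.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.feature_selection import mutual_info_classif

      # Placeholder data: 200 samples, 50 features, 8 of which carry class information.
      X, y = make_classification(n_samples=200, n_features=50,
                                 n_informative=8, random_state=0)

      # Estimate mutual information between each feature and the class label.
      mi = mutual_info_classif(X, y, random_state=0)
      k = 10
      top_idx = np.argsort(mi)[::-1][:k]  # indices of the k most informative features
      X_reduced = X[:, top_idx]
      print(X_reduced.shape)              # (200, 10)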

  4. Kernel based visual tracking with scale invariant features

    Institute of Scientific and Technical Information of China (English)

    Risheng Han; Zhongliang Jing; Yuanxiang Li

    2008-01-01

    Kernel based tracking has two disadvantages: the tracking window size cannot be adjusted efficiently, and the kernel based color distribution may not have enough ability to discriminate the object from a cluttered background. To boost the features' discriminating ability, both scale invariant features and kernel based color distribution features are used as descriptors of the tracked object. The proposed algorithm can keep tracking objects of varying scales even when the surrounding background is similar to the object's appearance.

  5. Study on Isomerous CAD Model Exchange Based on Feature

    Institute of Scientific and Technical Information of China (English)

    SHAO Xiaodong; CHEN Feng; XU Chenguang

    2006-01-01

    A feature-based model-exchange method between isomerous CAD systems is put forward in this paper. In this method, CAD model information is accessed at both the feature and geometry levels and converted according to standard feature operations. The feature information, including the feature tree, dimensions and constraints, which would be lost in traditional data conversion, is converted completely from the source CAD system to the destination one, together with the geometry. As a result, the transferred model can be edited through feature operations, which cannot be achieved with a general model-exchange interface.

  6. Ensemble based system for whole-slide prostate cancer probability mapping using color texture features.

    LENUS (Irish Health Repository)

    DiFranco, Matthew D

    2011-01-01

    We present a tile-based approach for producing clinically relevant probability maps of prostatic carcinoma in histological sections from radical prostatectomy. Our methodology incorporates ensemble learning for feature selection and classification on expert-annotated images. Random forest feature selection performed over varying training sets provides a subset of generalized CIEL*a*b* co-occurrence texture features, while sample selection strategies with minimal constraints reduce training data requirements to achieve reliable results. Ensembles of classifiers are built using expert-annotated tiles from training images, and scores for the probability of cancer presence are calculated from the responses of each classifier in the ensemble. Spatial filtering of tile-based texture features prior to classification results in increased heat-map coherence as well as AUC values of 95% using ensembles of either random forests or support vector machines. Our approach is designed for adaptation to different imaging modalities, image features, and histological decision domains.

  7. Feature Extraction for Facial Expression Recognition based on Hybrid Face Regions

    Directory of Open Access Journals (Sweden)

    LAJEVARDI, S.M.

    2009-10-01

    Full Text Available Facial expression recognition has numerous applications, including psychological research, improved human-computer interaction, and sign language translation. A novel facial expression recognition system based on hybrid face regions (HFR) is investigated. The expression recognition system is fully automatic and consists of the following modules: face detection, facial feature detection, feature extraction, optimal feature selection, and classification. The features are extracted from both the whole face image and face regions (eyes and mouth) using log Gabor filters. Then, the most discriminative features are selected based on a mutual information criterion. The system can automatically recognize six expressions: anger, disgust, fear, happiness, sadness and surprise. The selected features are classified using the Naive Bayesian (NB) classifier. The proposed method has been extensively assessed using the Cohn-Kanade and JAFFE databases. The experiments highlight the efficiency of the proposed HFR method in enhancing the classification rate.

  8. CONSTRUCTION AND MODIFICATION OF FLEXIBLE FEATURE-BASED MODELS

    Institute of Scientific and Technical Information of China (English)

    1999-01-01

    A new approach is proposed to generate flexible feature-based models (FFBM), which can be modified dynamically. A BRep/CSFG/FRG hybrid scheme is used to describe FFBM, in which BRep explicitly defines the model, the CSFG (constructive solid-feature geometry) tree records the feature-based modelling procedure, and the FRG (feature relation graph) reflects different kinds of relationships among features. Topological operators with local retrievability are designed to implement feature addition, which is traced in detail by a topological operation list (TOL). As a result, FFBM can be modified directly in the system database. Related features' chain reactions and variable topologies are supported in design modification, after which the product information attached to features will not be lost. Furthermore, a feature can be modified as rapidly as it was added.

  9. Fuzzy-Rough Feature Selection with π-Membership Function for Mammogram Classification

    CERN Document Server

    Thangavel, K

    2012-01-01

    Breast cancer is the second leading cause of death among women, and it is diagnosed with the help of mammograms. Oncologists often fail to identify microcalcifications at an early stage from visual inspection of the mammogram alone. In order to improve the performance of breast cancer screening, most researchers have proposed computer-aided diagnosis using image processing. In this study mammograms are preprocessed, features are extracted, and the abnormality is then identified through classification. If all the extracted features are used, most cases are misidentified; hence a feature selection procedure is sought. In this paper, fuzzy-rough feature selection with a π membership function is proposed. The selected features are used to classify the abnormalities with the help of the Ant-Miner and Weka tools. The experimental analysis shows that the proposed method improves mammogram classification accuracy.

  10. Hybridization of Evolutionary Mechanisms for Feature Subset Selection in Unsupervised Learning

    Science.gov (United States)

    Torres, Dolores; Ponce-de-León, Eunice; Torres, Aurora; Ochoa, Alberto; Díaz, Elva

    Feature subset selection for unsupervised learning is a very important topic in artificial intelligence because it is the basis for saving computational resources. In this implementation we use the typical testors methodology in order to incorporate an importance index for each variable. This paper presents the general framework and the way two hybridized meta-heuristics work on this NP-complete problem. The evolutionary mechanisms are based on the Univariate Marginal Distribution Algorithm (UMDA) and the Genetic Algorithm (GA). The GA and the UMDA, an Estimation of Distribution Algorithm (EDA), use a very useful fast operator implemented for finding typical testors on a very large dataset, and both algorithms have a local search mechanism for improving time and fitness. Experiments show that the EDA is faster than the GA because it has better exploitation performance; nevertheless, the GA's solutions are more consistent.

  11. SELECTED FEATURES OF NEW ECONOMY AND ITS IMPACT ON BUSINESS MANAGEMENT

    OpenAIRE

    Marie Mikusova

    2010-01-01

    The article focuses on selected features of the new economy, especially globalization, the changes brought about as a consequence, and the new skills required for management, including a shift in the indicators used to assess business performance.

  12. Robust speech features representation based on computational auditory model

    Institute of Scientific and Technical Information of China (English)

    LU Xugang; JIA Chuan; DANG Jianwu

    2004-01-01

    A speech signal processing and feature extraction method based on a computational auditory model is proposed. The computational model is based on psychological and physiological knowledge and on digital signal processing methods. For each stage of the hearing perception system, there is a corresponding computational model to simulate its function, and speech features are extracted on the basis of this model. In each stage, features at a different level are extracted. A further processing step for the primary auditory spectrum, based on lateral inhibition, is proposed to extract much more robust speech features. All these features can be regarded as internal representations of the speech stimulus in the hearing system. Robust speech recognition experiments are conducted to test the robustness of the features. Results show that the representations based on the proposed computational auditory model are robust representations for speech signals.

  13. Features Fusion Based on FLD for Face Recognition

    OpenAIRE

    Changjun Zhou; Qiang Zhang; Xiaopeng Wei; Ziqi Wei

    2010-01-01

    In this paper, we introduce a feature fusion method for face recognition based on Fisher's Linear Discriminant (FLD). The method extracts features by employing Two-Dimensional Principal Component Analysis (2DPCA) and Gabor wavelets, and then fuses the respectively extracted features with FLD. As a holistic feature extraction method, 2DPCA performs dimensionality reduction on the input dataset while retaining characteristics of the dataset that contribute most to its variance by elimin...

  14. The role of thalamic population synchrony in the emergence of cortical feature selectivity.

    Directory of Open Access Journals (Sweden)

    Sean T Kelly

    2014-01-01

    Full Text Available In a wide range of studies, the emergence of orientation selectivity in primary visual cortex has been attributed to a complex interaction between feed-forward thalamic input and inhibitory mechanisms at the level of cortex. Although it is well known that layer 4 cortical neurons are highly sensitive to the timing of thalamic inputs, the role of the stimulus-driven timing of thalamic inputs in cortical orientation selectivity is not well understood. Here we show that the synchronization of thalamic firing contributes directly to the orientation tuned responses of primary visual cortex in a way that optimizes the stimulus information per cortical spike. From the recorded responses of geniculate X-cells in the anesthetized cat, we synthesized thalamic sub-populations that would likely serve as the synaptic input to a common layer 4 cortical neuron based on anatomical constraints. We used this synchronized input as the driving input to an integrate-and-fire model of cortical responses and demonstrated that the tuning properties match closely to those measured in primary visual cortex. By modulating the overall level of synchronization at the preferred orientation, we show that efficiency of information transmission in the cortex is maximized for levels of synchronization which match those reported in thalamic recordings in response to naturalistic stimuli, a property which is relatively invariant to the orientation tuning width. These findings indicate evidence for a more prominent role of the feed-forward thalamic input in cortical feature selectivity based on thalamic synchronization.

  15. Accurate Image Retrieval Algorithm Based on Color and Texture Feature

    Directory of Open Access Journals (Sweden)

    Chunlai Yan

    2013-06-01

    Full Text Available Content-Based Image Retrieval (CBIR) is one of the most active hot spots in the current research field of multimedia retrieval. Based on the description and extraction of the visual content (features) of an image, CBIR aims to find images that contain the specified content (features) in an image database. In this paper, several key technologies of CBIR, e.g. the extraction of color and texture features of an image, as well as similarity measures, are investigated. On the basis of this theoretical research, an image retrieval system based on color and texture features is designed. In this system, a weighted color feature based on HSV space is adopted as the color feature vector, four features of the co-occurrence matrix, namely energy, entropy, inertia quadrature and correlation, are used to construct texture vectors, and the Euclidean distance is employed as the similarity measure. Experimental results show that this CBIR system is efficient in image retrieval.
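
    The following sketch illustrates a descriptor of the kind described above: an HSV hue histogram combined with co-occurrence-matrix statistics, compared by Euclidean distance. It assumes scikit-image is available, and the bin counts and GLCM settings are illustrative choices rather than the paper's exact parameters.

      import numpy as np
      from skimage.color import rgb2gray, rgb2hsv
      from skimage.feature import graycomatrix, graycoprops

      def describe(rgb_image):
          # Color part, simplified here to a 16-bin hue histogram.
          hsv = rgb2hsv(rgb_image)
          hist, _ = np.histogram(hsv[..., 0], bins=16, range=(0, 1), density=True)
          # Texture part: energy, inertia (contrast), correlation and entropy of the GLCM.
          gray = (rgb2gray(rgb_image) * 255).astype(np.uint8)
          glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256,
                              symmetric=True, normed=True)
          props = [graycoprops(glcm, p)[0, 0]
                   for p in ("energy", "contrast", "correlation")]
          entropy = -np.sum(glcm * np.log2(glcm + np.finfo(float).eps))
          return np.concatenate([hist, props, [entropy]])

      def distance(img_a, img_b):
          # Euclidean distance between descriptors, as in the similarity measure above.
          return np.linalg.norm(describe(img_a) - describe(img_b))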

  16. A new approach for EEG feature extraction in P300-based lie detection.

    Science.gov (United States)

    Abootalebi, Vahid; Moradi, Mohammad Hassan; Khalilzadeh, Mohammad Ali

    2009-04-01

    The P300-based Guilty Knowledge Test (GKT) has been suggested as an alternative approach to conventional polygraphy. The purpose of this study was to extend a previously introduced pattern recognition method for ERP assessment in this application. The extension was achieved by further enlarging the feature set and by employing a method for the selection of optimal features. For the evaluation of the method, several subjects went through the designed GKT paradigm and their respective brain signals were recorded. Next, a P300 detection approach based on a set of features and a statistical classifier was implemented. The optimal feature set was selected using a genetic algorithm from a primary feature set including some morphological, frequency and wavelet features, and was used for the classification of the data. The rate of correct detection in guilty and innocent subjects was 86%, which was better than other previously used methods. PMID:19041154

  17. ViZPar: A GUI for ZPar with Manual Feature Selection

    OpenAIRE

    Ortiz, Isabel; Ballesteros Martínez, Miguel; Zhang, Yue

    2014-01-01

    Phrase-structure and dependency parsers are used massively in the Natural Language Processing community. ZPar implements fast and accurate versions of shift-reduce dependency and phrase-structure parsing algorithms. We present ViZPar, a tool that enhances the usability of ZPar, including parameter selection and output visualization. Moreover, ViZPar allows manual feature selection which makes the tool very useful for people interested in obtaining the best parser through feature engineering, ...

  18. Segmentation-Based PolSAR Image Classification Using Visual Features: RHLBP and Color Features

    Directory of Open Access Journals (Sweden)

    Jian Cheng

    2015-05-01

    Full Text Available A segmentation-based fully-polarimetric synthetic aperture radar (PolSAR) image classification method that incorporates texture features and color features is designed and implemented. This method is based on the framework that conjunctively uses statistical region merging (SRM) for segmentation and support vector machine (SVM) for classification. In the segmentation step, we propose an improved local binary pattern (LBP) operator named the regional homogeneity local binary pattern (RHLBP) to guarantee the regional homogeneity in PolSAR images. In the classification step, the color features extracted from false color images are applied to improve the classification accuracy. The RHLBP operator and color features can provide discriminative information to separate those pixels and regions with similar polarimetric features, which are from different classes. Extensive experimental comparison results with conventional methods on L-band PolSAR data demonstrate the effectiveness of our proposed method for PolSAR image classification.

  19. Audio Watermarking Algorithm Based on Centroid and Statistical Features

    Science.gov (United States)

    Zhang, Xiaoming; Yin, Xiong

    Experimental testing shows that the relative relation in the number of samples among the neighboring bins and the audio frequency centroid are two robust features to the Time Scale Modification (TSM) attacks. Accordingly, an audio watermark algorithm based on frequency centroid and histogram is proposed by modifying the frequency coefficients. The audio histogram with equal-sized bins is extracted from a selected frequency coefficient range referred to the audio centroid. The watermarked audio signal is perceptibly similar to the original one. The experimental results show that the algorithm is very robust to resample TSM and a variety of common attacks. Subjective quality evaluation of the algorithm shows that embedded watermark introduces low, inaudible distortion of host audio signal.

  20. License application design selection feature report: Additive and fillers design feature 19

    International Nuclear Information System (INIS)

    The estimated additional total system life-cycle cost for each of the filler options in 1999 dollars is as follows: $923.4 million for the iron oxide option, $42.4 million to $966.4 million (depending on the extent of surface facility involvement required) for the partial iron shot fill option, $1,012 million for the complete iron shot fill option, and $134.7 million for the integral filler option (Appendix A). All of the filler options evaluated showed improvements in some aspects of pre- and post-closure waste package and repository performance. However, all of the options, except for the integral filler option, negatively impacted other areas of performance, required modification to surface facility design and operations, and invoked additional uncertainty. The iron oxide filler option will require further testing to measure thermal conductivity to ensure that peak cladding temperatures will not exceed the 350 C limit. The complete iron shot fill option may require structural improvements to the waste package design (use of partial shot fill may eliminate this concern). Both the iron shot and iron oxide options will also require further testing to confirm that the conceptual loading strategy will efficiently load a waste package in a timely manner. In addition, both shot and oxide options will require further testing to develop models for their potential to provide resistance to water flow, and, in the case of iron shot, act as an oxygen getter. Finally, uncertainty also exists as to whether the iron shot option will damage the cladding if sufficient corrosion of the shot occurs. Based on the results presented in this evaluation, the integral filler option appears to be the simplest and most cost efficient method for achieving modest improvements in pre- and post-closure performance. Since unqualified inputs were used in the development of this evaluation, they should be considered TBV (to be verified). This document will not directly support any construction

  1. Ultra High-Dimensional Nonlinear Feature Selection for Big Biological Data

    OpenAIRE

    Yamada, Makoto; Tang, Jiliang; Lugo-Martinez, Jose; Hodzic, Ermin; Shrestha, Raunak; Saha, Avishek; Ouyang, Hua; Yin, Dawei; Mamitsuka, Hiroshi; Sahinalp, Cenk; Radivojac, Predrag; Menczer, Filippo; Chang, Yi

    2016-01-01

    Machine learning methods are used to discover complex nonlinear relationships in biological and medical data. However, sophisticated learning models are computationally infeasible for data with millions of features. Here we introduce the first feature selection method for nonlinear learning problems that can scale up to large, ultra-high-dimensional biological data. More specifically, we scale up the novel Hilbert-Schmidt Independence Criterion Lasso (HSIC Lasso) to handle millions of feature...

  2. Double feature selection and cluster analyses in mining of microarray data from cotton

    Directory of Open Access Journals (Sweden)

    Wilkins Thea A

    2008-06-01

    Full Text Available Abstract Background Cotton fiber is a single-celled seed trichome of major biological and economic importance. In recent years, genomic approaches such as microarray-based expression profiling were used to study fiber growth and development to understand the developmental mechanisms of fiber at the molecular level. The vast volume of microarray expression data generated requires a sophisticated means of data mining in order to extract novel information that addresses fundamental questions of biological interest. One of the ways to approach microarray data mining is to increase the number of dimensions/levels to the analysis, such as comparing independent studies from different genotypes. However, adding dimensions also creates a challenge in finding novel ways for analyzing multi-dimensional microarray data. Results Mining of independent microarray studies from Pima and Upland (TM1) cotton using double feature selection and cluster analyses identified species-specific and stage-specific gene transcripts that argue in favor of discrete genetic mechanisms that govern developmental programming of cotton fiber morphogenesis in these two cultivated species. Double feature selection analysis identified the highest number of differentially expressed genes that distinguish the fiber transcriptomes of developing Pima and TM1 fibers. These results were based on the finding that differences in fibers harvested between 17 and 24 day post-anthesis (dpa) represent the greatest expressional distance between the two species. This powerful selection method identified a subset of genes expressed during primary (PCW) and secondary (SCW) cell wall biogenesis in Pima fibers that exhibits an expression pattern that is generally reversed in TM1 at the same developmental stage. Cluster and functional analyses revealed that this subset of genes is primarily regulated during the transition stage that overlaps the termination of PCW and onset of SCW biogenesis, suggesting

  3. Semantic 3D scene interpretation: A framework combining optimal neighborhood size selection with relevant features

    Science.gov (United States)

    Weinmann, M.; Jutzi, B.; Mallet, C.

    2014-08-01

    3D scene analysis by automatically assigning 3D points a semantic label has become an issue of major interest in recent years. Whereas the tasks of feature extraction and classification have been in the focus of research, the idea of using only relevant and more distinctive features extracted from optimal 3D neighborhoods has only rarely been addressed in 3D lidar data processing. In this paper, we focus on the interleaved issue of extracting relevant, but not redundant features and increasing their distinctiveness by considering the respective optimal 3D neighborhood of each individual 3D point. We present a new, fully automatic and versatile framework consisting of four successive steps: (i) optimal neighborhood size selection, (ii) feature extraction, (iii) feature selection, and (iv) classification. In a detailed evaluation which involves 5 different neighborhood definitions, 21 features, 6 approaches for feature subset selection and 2 different classifiers, we demonstrate that optimal neighborhoods for individual 3D points significantly improve the results of scene interpretation and that the selection of adequate feature subsets may even further increase the quality of the derived results.

  4. Multifinger Feature Level Fusion Based Fingerprint Identification

    OpenAIRE

    Praveen N; Tessamma Thomas

    2012-01-01

    Fingerprint-based authentication systems are among the most cost-effective biometric authentication techniques employed for personal identification. As the database population increases, fast identification/recognition algorithms with high accuracy are required. Accuracy can be increased using multimodal evidence collected from multiple biometric traits. In this work, consecutive fingerprint images are taken, global singularities are located using directional field strength and their local orient...

  5. Selecting Optimal Feature Set in High-Dimensional Data by Swarm Search

    Directory of Open Access Journals (Sweden)

    Simon Fong

    2013-01-01

    Full Text Available Selecting the right set of features from data of high dimensionality for inducing an accurate classification model is a tough computational challenge. It is almost an NP-hard problem, as the combinations of features escalate exponentially as the number of features increases. Unfortunately, in data mining, as well as in other engineering applications and bioinformatics, some data are described by a long array of features. Many feature subset selection algorithms have been proposed in the past, but not all of them are effective. Since it takes seemingly forever to use brute force in exhaustively trying every possible combination of features, stochastic optimization may be a solution. In this paper, we propose a new feature selection scheme called Swarm Search to find an optimal feature set by using metaheuristics. The advantage of Swarm Search is its flexibility in integrating any classifier into its fitness function and plugging in any metaheuristic algorithm to facilitate heuristic search. Simulation experiments are carried out by testing Swarm Search over several high-dimensional datasets, with different classification algorithms and various metaheuristic algorithms. The comparative experimental results show that Swarm Search is able to attain relatively low error rates in classification without shrinking the size of the feature subset to its minimum.
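
    A toy wrapper in the spirit described above is sketched below: the fitness function wraps a classifier's cross-validated accuracy over a candidate feature mask, and a simple stochastic hill-climber stands in for a full swarm metaheuristic. The dataset, classifier and iteration budget are placeholders.

      import numpy as np
      from sklearn.datasets import load_breast_cancer
      from sklearn.model_selection import cross_val_score
      from sklearn.neighbors import KNeighborsClassifier

      X, y = load_breast_cancer(return_X_y=True)
      rng = np.random.default_rng(0)

      def fitness(mask):
          # Wrapper fitness: cross-validated accuracy of the plugged-in classifier
          # on the candidate feature subset.
          if not mask.any():
              return 0.0
          clf = KNeighborsClassifier(n_neighbors=5)
          return cross_val_score(clf, X[:, mask], y, cv=3).mean()

      mask = rng.random(X.shape[1]) < 0.5      # random initial subset
      best = fitness(mask)
      for _ in range(100):                     # flip one feature bit at a time
          candidate = mask.copy()
          j = rng.integers(X.shape[1])
          candidate[j] = ~candidate[j]
          score = fitness(candidate)
          if score >= best:
              mask, best = candidate, score
      print(mask.sum(), "features selected, CV accuracy approx.", round(best, 3))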

  6. Gender Classification Based on Geometry Features of Palm Image

    OpenAIRE

    Ming Wu; Yubo Yuan

    2014-01-01

    This paper presents a novel gender classification method based on geometry features of palm images which is simple, fast, and easy to handle. This gender classification method comprises two main components. The first is feature extraction by image processing. The other is a classification system based on a polynomial smooth support vector machine (PSSVM). A total of 180 palm images were collected from 30 persons to verify the validity of the proposed gender classi...

  7. Fingerprint image segmentation based on multi-features histogram analysis

    Science.gov (United States)

    Wang, Peng; Zhang, Youguang

    2007-11-01

    An effective fingerprint image segmentation method based on multi-feature histogram analysis is presented. We extract a new feature, together with three other features, to segment fingerprints. Two of these four features, each of which is related to one of the other two, are reciprocals of each other, so the features are divided into two groups. The histograms of these two features are calculated to determine which feature group is used to segment the target fingerprint. The features can also divide fingerprints into two classes of high and low quality. Experimental results show that our algorithm can classify foreground and background effectively with lower computational cost, and it can also reduce the number of pseudo-minutiae detected and improve the performance of an AFIS.

  8. A new approach to modeling the influence of image features on fixation selection in scenes.

    Science.gov (United States)

    Nuthmann, Antje; Einhäuser, Wolfgang

    2015-03-01

    Which image characteristics predict where people fixate when memorizing natural images? To answer this question, we introduce a new analysis approach that combines a novel scene-patch analysis with generalized linear mixed models (GLMMs). Our method allows for (1) directly describing the relationship between continuous feature value and fixation probability, and (2) assessing each feature's unique contribution to fixation selection. To demonstrate this method, we estimated the relative contribution of various image features to fixation selection: luminance and luminance contrast (low-level features); edge density (a mid-level feature); visual clutter and image segmentation to approximate local object density in the scene (higher-level features). An additional predictor captured the central bias of fixation. The GLMM results revealed that edge density, clutter, and the number of homogenous segments in a patch can independently predict whether image patches are fixated or not. Importantly, neither luminance nor contrast had an independent effect above and beyond what could be accounted for by the other predictors. Since the parcellation of the scene and the selection of features can be tailored to the specific research question, our approach allows for assessing the interplay of various factors relevant for fixation selection in scenes in a powerful and flexible manner. PMID:25752239

  9. Voltammetric Electronic Tongue and Support Vector Machines for Identification of Selected Features in Mexican Coffee

    Directory of Open Access Journals (Sweden)

    Rocio Berenice Domínguez

    2014-09-01

    Full Text Available This paper describes a new method based on a voltammetric electronic tongue (ET) for the recognition of distinctive features in coffee samples. An ET was directly applied to different samples from the main Mexican coffee regions without any pretreatment before the analysis. The resulting electrochemical information was modeled with two different mathematical tools, namely Linear Discriminant Analysis (LDA) and Support Vector Machines (SVM). Growing conditions (i.e., organic or non-organic practices) and altitude of crops were considered for a first classification. LDA results showed an average discrimination rate of 88% ± 6.53% while SVM successfully accomplished an overall accuracy of 96.4% ± 3.50% for the same task. A second classification based on geographical origin of samples was carried out. Results showed an overall accuracy of 87.5% ± 7.79% for LDA and a superior performance of 97.5% ± 3.22% for SVM. Given the complexity of coffee samples, the high accuracy percentages achieved by ET coupled with SVM in both classification problems suggested a potential applicability of ET in the assessment of selected coffee features with a simpler and faster methodology along with a null sample pretreatment. In addition, the proposed method can be applied to authentication assessment while improving cost, time and accuracy of the general procedure.

  10. Efficient IRIS Recognition through Improvement of Feature Extraction and subset Selection

    CERN Document Server

    Azizi, Amir

    2009-01-01

    The selection of the optimal feature subset and the classification have become important issues in the field of iris recognition. In this paper we propose several methods for iris feature subset selection and vector creation. The deterministic feature sequence is extracted from the iris image using the contourlet transform technique. The contourlet transform captures the intrinsic geometrical structure of the iris image. It decomposes the iris image into a set of directional sub-bands with texture details captured in different orientations at various scales, so to reduce the feature vector dimensions we use a method that extracts only the significant bits and information from normalized iris images. In this method we ignore fragile bits. Finally, we use an SVM (Support Vector Machine) classifier to estimate the people identification rate of our proposed system. Experimental results show that the proposed method reduces processing time and increases classification accuracy, and also the iris feature vec...

  11. Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems

    OpenAIRE

    Besse Philippe; Boitard Simon; Lê Cao Kim-Anh

    2011-01-01

    Abstract Background Variable selection on high throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), becomes inevitable to select relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas Machine Learning wrapper approaches c...

  12. Supervised Feature Selection in Graphs with Path Coding Penalties and Network Flows

    CERN Document Server

    Mairal, Julien

    2012-01-01

    We consider supervised learning problems where the features are embedded in a graph, such as gene expressions in a gene network. In this context, it is of much interest to take into account the problem structure, and automatically select a subgraph with a small number of connected components. By exploiting prior knowledge, one can indeed improve the prediction performance and/or obtain better interpretable results. Regularization or penalty functions for selecting features in graphs have recently been proposed but they raise new algorithmic challenges. For example, they typically require solving a combinatorially hard selection problem among all connected subgraphs. In this paper, we propose computationally feasible strategies to select a sparse and "well connected" subset of features sitting on a directed acyclic graph (DAG). We introduce structured sparsity penalties over paths on a DAG called "path coding" penalties. Unlike existing regularization functions, path coding penalties can both model long range ...

  13. YamiPred: A novel evolutionary method for predicting pre-miRNAs and selecting relevant features

    KAUST Repository

    Kleftogiannis, Dimitrios

    2015-01-23

    MicroRNAs (miRNAs) are small non-coding RNAs, which play a significant role in gene regulation. Predicting miRNA genes is a challenging bioinformatics problem and existing experimental and computational methods fail to deal with it effectively. We developed YamiPred, an embedded classification method that combines the efficiency and robustness of Support Vector Machines (SVM) with Genetic Algorithms (GA) for feature selection and parameters optimization. YamiPred was tested in a new and realistic human dataset and was compared with state-of-the-art computational intelligence approaches and the prevalent SVM-based tools for miRNA prediction. Experimental results indicate that YamiPred outperforms existing approaches in terms of accuracy and of geometric mean of sensitivity and specificity. The embedded feature selection component selects a compact feature subset that contributes to the performance optimization. Further experimentation with this minimal feature subset has achieved very high classification performance and revealed the minimum number of samples required for developing a robust predictor. YamiPred also confirmed the important role of commonly used features such as entropy and enthalpy, and uncovered the significance of newly introduced features, such as %A-U aggregate nucleotide frequency and positional entropy. The best model trained on human data has successfully predicted pre-miRNAs to other organisms including the category of viruses.

  14. Remote sensing image classification based on block feature point density analysis and multiple-feature fusion

    Science.gov (United States)

    Li, Shijin; Jiang, Yaping; Zhang, Yang; Feng, Jun

    2015-10-01

    With the development of remote sensing (RS) and related technologies, the resolution of RS images is increasing. Compared with moderate- or low-resolution images, high-resolution ones can provide more detailed ground information. However, terrain varieties have complex spatial distributions, and the different objects in high-resolution images exhibit a variety of features. These features are not equally effective, but some of them are complementary. Considering these characteristics, a new method is proposed to classify RS images based on hierarchical fusion of multiple features. Firstly, RS images are pre-classified into two categories according to whether their feature points are uniformly or non-uniformly distributed. Then, the color histogram and Gabor texture features are extracted from the uniformly-distributed category, and the linear spatial pyramid matching using sparse coding (ScSPM) feature is obtained from the non-uniformly-distributed category. Finally, classification is performed by two support vector machine classifiers. Experimental results on a large RS image database with 2100 images show that the overall classification accuracy is boosted by 10.1% in comparison with the highest accuracy of single-feature classification methods. Compared with other multiple-feature fusion methods, the proposed method achieves the highest classification accuracy on this dataset, reaching 90.1%, and the time complexity of the algorithm is also greatly reduced.

  15. Stereo vision-based pedestrian detection using multiple features for automotive application

    Science.gov (United States)

    Lee, Chung-Hee; Kim, Dongyoung

    2015-12-01

    In this paper, we propose a stereo vision-based pedestrian detection method using multiple features for automotive applications. The disparity map from the stereo vision system and multiple features are utilized to enhance pedestrian detection performance. Because the disparity map offers 3D information, obstacles can be detected easily and the overall detection time can be reduced by removing unnecessary background. The road feature is extracted from the v-disparity map calculated from the disparity map. The road feature is a decision criterion to determine the presence or absence of obstacles on the road. Obstacle detection is performed by comparing the road feature with all columns of the disparity map. The result of obstacle detection is segmented by bird's-eye-view mapping to separate obstacle areas containing multiple objects into single obstacle areas. Histogram-based clustering is performed in the bird's-eye-view map. Each segmented result is verified by a classifier with the trained model. To enhance pedestrian recognition performance, multiple features such as HOG, CSS and symmetry features are utilized. In particular, the symmetry feature is well suited to represent a pedestrian standing or walking. The block-based symmetry feature is utilized to minimize dependence on the image type, and the best feature among the three symmetry features of the H-S-V image is selected as the symmetry feature for each pixel. The ETH database is utilized to verify our pedestrian detection algorithm.

  16. A HYBRID APPROACH BASED MEDICAL IMAGE RETRIEVAL SYSTEM USING FEATURE OPTIMIZED CLASSIFICATION SIMILARITY FRAMEWORK

    Directory of Open Access Journals (Sweden)

    Yogapriya Jaganathan

    2013-01-01

    Full Text Available In the past few years, massive progress has been made in the field of Content Based Medical Image Retrieval (CBMIR) for the effective utilization of medical images based on visual feature analysis for diagnosis and educational research. Existing medical image retrieval systems are still not optimal at solving the feature dimensionality reduction problem, which increases computational complexity and decreases the speed of the retrieval process. The proposed CBMIR uses a hybrid approach based on feature extraction, optimization of feature vectors, classification of features and similarity measurement. This type of CBMIR is called the Feature Optimized Classification Similarity (FOCS) framework. The selected features are textures extracted using Gray Level Co-occurrence Matrix features (GLCM) and Tamura Features (TF), and the extracted features form a feature vector database. A Fuzzy based Particle Swarm Optimization (FPSO) technique is used to reduce the feature vector dimensionality, and classification is performed using a Fuzzy based Relevance Vector Machine (FRVM) to form groups of relevant image features that provide a natural way to classify dimensionally reduced feature vectors of images. The Euclidean Distance (ED) is used as the similarity measure between the query image and the target images. The FOCS approach takes a query from the user and retrieves the needed images from the database. Retrieval performance is estimated in terms of precision and recall. This FOCS framework has several benefits compared to existing CBMIR: GLCM and TF are used to extract texture features and form a feature vector database; Fuzzy-PSO is used to reduce the feature vector dimensionality while selecting the important features, which decreases computational complexity; and Fuzzy based RVM is used for feature classification in which it increases the

  17. Moment feature based fast feature extraction algorithm for moving object detection using aerial images.

    Directory of Open Access Journals (Sweden)

    A F M Saifuddin Saif

    Full Text Available Fast and computationally less complex feature extraction for moving object detection using aerial images from unmanned aerial vehicles (UAVs) remains an elusive goal in the field of computer vision research. The types of features used in current studies concerning moving object detection are typically chosen based on improving the detection rate rather than on providing fast and computationally less complex feature extraction methods. Because moving object detection using aerial images from UAVs involves motion as seen from a certain altitude, effective and fast feature extraction is a vital issue for optimum detection performance. This research proposes a two-layer bucket approach based on a new feature extraction algorithm referred to as the moment-based feature extraction algorithm (MFEA). Because a moment represents the coherent intensity of pixels and motion estimation is a motion pixel intensity measurement, this research used this relation to develop the proposed algorithm. The experimental results reveal the successful performance of the proposed MFEA algorithm and the proposed methodology.

  18. A DYNAMIC FEATURE SELECTION METHOD FOR DOCUMENT RANKING WITH RELEVANCE FEEDBACK APPROACH

    Directory of Open Access Journals (Sweden)

    K. Latha

    2010-07-01

    Full Text Available Ranking search results is essential for information retrieval and Web search. Search engines need to not only return highly relevant results, but also be fast to satisfy users. As a result, not all available features can be used for ranking, and in fact only a small percentage of these features can be used. Thus, it is crucial to have a feature selection mechanism that can find a subset of features that both meets latency requirements and achieves high relevance. In this paper we describe a 0/1 knapsack procedure for automatically selecting features to use within a generalization model for document ranking. We propose an approach for relevance feedback using the Expectation Maximization method and evaluate the algorithm on the TREC collection for describing classes of feedback textual information retrieval features. Experimental results, evaluated on the standard TREC-9 part of the OHSUMED collection, show that our feature selection algorithm produces models that are either significantly more effective than, or equally effective as, models such as the Markov Random Field model, Correlation Coefficient and Count Difference method.
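
    A 0/1 knapsack selection under a latency budget, in the spirit of the procedure described above, can be sketched with standard dynamic programming; the relevance values and integer latency costs below are made-up placeholders.

      def knapsack_select(values, costs, budget):
          """Return the feature indices maximizing total relevance within the cost budget."""
          n = len(values)
          dp = [0.0] * (budget + 1)                      # dp[c] = best value with total cost at most c
          choice = [[False] * (budget + 1) for _ in range(n)]
          for i in range(n):
              for c in range(budget, costs[i] - 1, -1):  # iterate capacity downwards (0/1 knapsack)
                  if dp[c - costs[i]] + values[i] > dp[c]:
                      dp[c] = dp[c - costs[i]] + values[i]
                      choice[i][c] = True
          # Backtrack to recover the chosen features.
          selected, c = [], budget
          for i in range(n - 1, -1, -1):
              if choice[i][c]:
                  selected.append(i)
                  c -= costs[i]
          return sorted(selected)

      values = [0.9, 0.8, 0.4, 0.3, 0.2]   # per-feature relevance scores (placeholder)
      costs  = [5, 4, 2, 1, 1]             # per-feature latency units (placeholder)
      print(knapsack_select(values, costs, budget=7))  # -> [1, 2, 3]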

  19. Comparative Study on Feature Selection and Fusion Schemes for Emotion Recognition from Speech

    Directory of Open Access Journals (Sweden)

    Santiago Planet

    2012-09-01

    Full Text Available The automatic analysis of speech to detect affective states may improve the way users interact with electronic devices. However, the analysis only at the acoustic level could be not enough to determine the emotion of a user in a realistic scenario. In this paper we analyzed the spontaneous speech recordings of the FAU Aibo Corpus at the acoustic and linguistic levels to extract two sets of features. The acoustic set was reduced by a greedy procedure selecting the most relevant features to optimize the learning stage. We compared two versions of this greedy selection algorithm by performing the search of the relevant features forwards and backwards. We experimented with three classification approaches: Naïve-Bayes, a support vector machine and a logistic model tree, and two fusion schemes: decision-level fusion, merging the hard-decisions of the acoustic and linguistic classifiers by means of a decision tree; and feature-level fusion, concatenating both sets of features before the learning stage. Despite the low performance achieved by the linguistic data, a dramatic improvement was achieved after its combination with the acoustic information, improving the results achieved by this second modality on its own. The results achieved by the classifiers using the parameters merged at feature level outperformed the classification results of the decision-level fusion scheme, despite the simplicity of the scheme. Moreover, the extremely reduced set of acoustic features obtained by the greedy forward search selection algorithm improved the results provided by the full set.

  20. Feature Extraction based Face Recognition, Gender and Age Classification

    OpenAIRE

    Venugopal K R; L M Patnaik; Ramesha K; K B Raja

    2010-01-01

    A face recognition system with large training sets for personal identification normally attains good accuracy. In this paper, we propose a Feature Extraction based Face Recognition, Gender and Age Classification (FEBFRGAC) algorithm with only small training sets, and it yields good results even with one image per person. This process involves three stages: pre-processing, feature extraction and classification. The geometric features of facial images like eyes, nose, mouth etc. are loc...

  1. Image Retrieval Based on Content Using Color Feature

    OpenAIRE

    Afifi, Ahmed J.; Wesam M. Ashour

    2012-01-01

    Content-based image retrieval from large resources has become an area of wide interest in many applications. In this paper we present a CBIR system that uses Ranklet Transform and the color feature as a visual feature to represent the images. Ranklet Transform is proposed as a preprocessing step to make the image invariant to rotation and any image enhancement operations. To speed up the retrieval time, images are clustered according to their features using k-means clustering algorithm.

  2. Recursive Cluster Elimination (RCE) for classification and feature selection from gene expression data

    Directory of Open Access Journals (Sweden)

    Showe Louise C

    2007-05-01

    Full Text Available Abstract Background Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. The selection of those genes that are important for distinguishing the different sample classes being compared poses a challenging problem in high dimensional data analysis. We describe a new procedure for selecting significant genes as recursive cluster elimination (RCE) rather than recursive feature elimination (RFE). We have tested this algorithm on six datasets and compared its performance with that of two related classification procedures with RFE. Results We have developed a novel method for selecting significant genes in comparative gene expression studies. This method, which we refer to as SVM-RCE, combines K-means, a clustering method, to identify correlated gene clusters, and Support Vector Machines (SVMs), a supervised machine learning classification method, to identify and score (rank) those gene clusters for the purpose of classification. K-means is used initially to group genes into clusters. Recursive cluster elimination (RCE) is then applied to iteratively remove those clusters of genes that contribute the least to the classification performance. SVM-RCE identifies the clusters of correlated genes that are most significantly differentially expressed between the sample classes. Utilization of gene clusters, rather than individual genes, enhances the supervised classification accuracy of the same data as compared to the accuracy when either SVM or Penalized Discriminant Analysis (PDA) with recursive feature elimination (SVM-RFE and PDA-RFE) are used to remove genes based on their individual discriminant weights. Conclusion SVM-RCE provides improved classification accuracy with complex microarray data sets when it is compared to the classification accuracy of the same datasets using either SVM-RFE or PDA-RFE. SVM-RCE identifies clusters of correlated genes that when considered together
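
    A rough sketch of the SVM-RCE idea is given below: K-means groups correlated genes, each cluster is scored by the cross-validated accuracy of a linear SVM trained on its genes alone, and the worst-scoring clusters are repeatedly eliminated. The cluster counts, elimination rate and synthetic data are placeholders, not the published protocol.

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.datasets import make_classification
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      # Placeholder "expression" data: 80 samples, 300 genes, 15 informative.
      X, y = make_classification(n_samples=80, n_features=300, n_informative=15,
                                 random_state=0)
      gene_idx = np.arange(X.shape[1])

      while len(gene_idx) > 30:
          n_clusters = max(2, len(gene_idx) // 20)
          # Cluster genes (not samples), so transpose the expression matrix.
          labels = KMeans(n_clusters=n_clusters, n_init=10,
                          random_state=0).fit_predict(X[:, gene_idx].T)
          # Score each gene cluster by how well its genes alone separate the classes.
          scores = []
          for c in range(n_clusters):
              cols = gene_idx[labels == c]
              acc = cross_val_score(SVC(kernel="linear"), X[:, cols], y, cv=3).mean()
              scores.append(acc)
          # Eliminate the worst-scoring half of the clusters.
          keep = np.argsort(scores)[n_clusters // 2:]
          gene_idx = np.concatenate([gene_idx[labels == c] for c in keep])

      print(len(gene_idx), "genes retained")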

  3. Feature-based multiresolution techniques for product design

    Institute of Scientific and Technical Information of China (English)

    LEE Sang Hun; LEE Kunwoo

    2006-01-01

    3D computer-aided design (CAD) systems based on feature-based solid modelling technique have been widely spread and used for product design. However, when part models associated with features are used in various downstream applications,simplified models in various levels of detail (LODs) are frequently more desirable than the full details of the parts. In particular,the need for feature-based multiresolution representation of a solid model representing an object at multiple LODs in the feature unit is increasing for engineering tasks. One challenge is to generate valid models at various LODs after an arbitrary rearrangement of features using a certain LOD criterion, because composite Boolean operations consisting of union and subtraction are not commutative. The other challenges are to devise proper topological framework for multiresolution representation, to suggest more reasonable LOD criteria, and to extend applications. This paper surveys the recent research on these issues.

  4. Spatial Circular Granulation Method Based on Multimodal Finger Feature

    Directory of Open Access Journals (Sweden)

    Jinfeng Yang

    2016-01-01

    Full Text Available Finger-based personal identification has become an active research topic in recent years because of its high user acceptance and convenience. How to reliably and effectively fuse multimodal finger features together, however, is still a challenging problem in practice. In this paper, viewing the finger trait as the combination of a fingerprint (FP), finger vein (FV), and finger-knuckle-print (FKP), a new multimodal finger feature recognition scheme is proposed based on granular computing. First, the ridge texture features of FP, FV, and FKP are extracted using Gabor Ordinal Measures (GOM). Second, combining the three-modal GOM feature maps in a color-based manner, we constitute the original feature object set of a finger. To represent finger features effectively, they are granulated at three levels of feature granules (FGs) in a bottom-up manner based on spatial circular granulation. In order to test the performance of the multilevel FGs, a top-down matching method is proposed. Experimental results show that the proposed method achieves a higher recognition rate in finger feature recognition.

  5. Feature-based attention enhances performance by increasing response gain.

    Science.gov (United States)

    Herrmann, Katrin; Heeger, David J; Carrasco, Marisa

    2012-12-01

    Covert spatial attention can increase contrast sensitivity either by changes in contrast gain or by changes in response gain, depending on the size of the attention field and the size of the stimulus (Herrmann et al., 2010), as predicted by the normalization model of attention (Reynolds & Heeger, 2009). For feature-based attention, unlike spatial attention, the model predicts only changes in response gain, regardless of whether the featural extent of the attention field is small or large. To test this prediction, we measured the contrast dependence of feature-based attention. Observers performed an orientation-discrimination task on a spatial array of grating patches. The spatial locations of the gratings were varied randomly so that observers could not attend to specific locations. Feature-based attention was manipulated with a 75% valid and 25% invalid pre-cue, and the featural extent of the attention field was manipulated by introducing uncertainty about the upcoming grating orientation. Performance accuracy was better for valid than for invalid pre-cues, consistent with a change in response gain, when the featural extent of the attention field was small (low uncertainty) or when it was large (high uncertainty) relative to the featural extent of the stimulus. These results for feature-based attention clearly differ from results of analogous experiments with spatial attention, yet both support key predictions of the normalization model of attention. PMID:22580017

  6. Application of Fisher Score and mRMR Techniques for Feature Selection in Compressed Medical Images

    Directory of Open Access Journals (Sweden)

    Vamsidhar Enireddy

    2015-12-01

    Full Text Available Nowadays there is a large increase in digital medical images, and different medical imaging equipment is available for diagnosis, so medical professionals are increasingly relying on computer-aided techniques both for indexing these images and for retrieving similar images from large repositories. Developing systems that are computationally less intensive, without compromising accuracy in the high-dimensional feature space, is always challenging. In this paper an investigation is made into the retrieval of compressed medical images. Images are compressed using a visually lossless compression technique. Shape and texture features are extracted and the best features are selected using the Fisher score technique and mRMR. Using these selected features, an RNN with BPTT was utilized for classification of the compressed images.
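
    The Fisher score used for ranking features can be sketched as the between-class scatter of per-class means divided by the within-class variance, computed per feature; the toy data below are placeholders.

      import numpy as np

      def fisher_score(X, y):
          """Per-feature Fisher score: sum_c n_c (mu_c - mu)^2 / sum_c n_c var_c."""
          classes = np.unique(y)
          overall_mean = X.mean(axis=0)
          numerator = np.zeros(X.shape[1])
          denominator = np.zeros(X.shape[1])
          for c in classes:
              Xc = X[y == c]
              numerator += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
              denominator += len(Xc) * Xc.var(axis=0)
          return numerator / (denominator + 1e-12)

      rng = np.random.default_rng(0)
      X = rng.normal(size=(100, 6))
      y = rng.integers(0, 2, size=100)
      X[y == 1, 0] += 3.0                      # make feature 0 clearly discriminative
      scores = fisher_score(X, y)
      print(np.argsort(scores)[::-1])          # feature 0 should rank first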

  7. Pattern Recognition Methods and Features Selection for Speech Emotion Recognition System

    Directory of Open Access Journals (Sweden)

    Pavol Partila

    2015-01-01

    Full Text Available The impact of the classification method and feature selection on speech emotion recognition accuracy is discussed in this paper. Selecting the correct parameters in combination with the classifier is an important part of reducing the computational complexity of the system. This step is necessary especially for systems that will be deployed in real-time applications. The reason for the development and improvement of speech emotion recognition systems is their wide usability in today's automatic voice-controlled systems. The Berlin database of emotional recordings was used in this experiment. The classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture models is measured considering the selection of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and group of features for stress detection in human speech. The research contribution lies in the design of the speech emotion recognition system with respect to its accuracy and efficiency.

  8. Feature extraction and selection for objective gait analysis and fall risk assessment by accelerometry

    Directory of Open Access Journals (Sweden)

    Cremer Gerald

    2011-01-01

    Full Text Available Abstract Background Falls in the elderly are a major concern nowadays because of their consequences for general health and morale. Moreover, the aging of the population and increasing life expectancy make the prediction of falls more and more important. The analysis presented in this article makes a first step in this direction by providing a way to analyze gait and classify hospitalized elderly fallers and non-fallers. This tool, based on an accelerometer network and signal processing, gives objective information about gait and does not need any special gait laboratory as optical analysis does. The tool is also simple enough to be used by a non-expert and can therefore be widely applied to a large set of patients. Method A population of 20 hospitalized elderly patients was asked to execute several classical clinical tests evaluating their risk of falling. They were also asked whether they had experienced any fall in the last 12 months. The accelerations of the limbs were recorded during the clinical tests with an accelerometer network distributed on the body. A total of 67 features were extracted from the accelerometric signal recorded during a simple 25 m walking test at comfortable speed. A feature selection algorithm was used to select those able to classify subjects at risk and not at risk for several types of classification algorithms. Results The results showed that several classification algorithms were able to discriminate between the two groups of interest: faller and non-faller hospitalized elderly patients. The classification performances of the algorithms were compared. Moreover, a subset of the 67 features was found to differ significantly between the two groups using a t-test. Conclusions This study gives a method to classify a population of hospitalized elderly patients into two groups, at risk of falling or not at risk, based on accelerometric data. This is a first step towards designing a risk of falling assessment system that could be used to provide
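
    A minimal sketch of the kind of univariate t-test screening mentioned in the conclusions, assuming a feature matrix X (subjects by features) and binary faller labels y; this is illustrative and not the authors' full feature selection algorithm.

```python
# Hedged sketch: keep accelerometric features that differ between fallers
# and non-fallers according to an independent two-sample t-test.
import numpy as np
from scipy import stats

def ttest_select(X, y, alpha=0.05):
    """X: (n_subjects, n_features); y: 1 for fallers, 0 for non-fallers.
    Returns indices of features with p-value below alpha."""
    fallers, non_fallers = X[y == 1], X[y == 0]
    _, pvals = stats.ttest_ind(fallers, non_fallers, axis=0, equal_var=False)
    return np.where(pvals < alpha)[0]
```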

  9. Multi-features Based Approach for Moving Shadow Detection

    Institute of Scientific and Technical Information of China (English)

    ZHOU Ning; ZHOU Man-li; XU Yi-ping; FANG Bao-hong

    2004-01-01

    In video-based surveillance applications, moving shadows can affect the correct localization and detection of moving objects. This paper presents a method for shadow detection and suppression used for moving visual object detection. The major novelty of the shadow suppression is the integration of several features, including a photometric invariant color feature, a motion edge feature, and a spatial feature. By modifying the handling of falsely detected shadows, the average detection rate of moving objects reaches above 90% in the test on the Hall-Monitor sequence.

  10. A new approach to modeling the influence of image features on fixation selection in scenes

    OpenAIRE

    Nuthmann, Antje; Einhäuser, Wolfgang

    2015-01-01

    Which image characteristics predict where people fixate when memorizing natural images? To answer this question, we introduce a new analysis approach that combines a novel scene-patch analysis with generalized linear mixed models (GLMMs). Our method allows for (1) directly describing the relationship between continuous feature value and fixation probability, and (2) assessing each feature's unique contribution to fixation selection. To demonstrate this method, we estimated the relative contri...

  11. Opinion mining feature-level using Naive Bayes and feature extraction based analysis dependencies

    Science.gov (United States)

    Sanda, Regi; Baizal, Z. K. Abdurahman; Nhita, Fhira

    2015-12-01

    The development of the Internet and related technology has had a major impact, giving rise to a new kind of business called e-commerce. Many e-commerce sites provide convenient transactions, and consumers can also provide reviews or opinions on the products they purchased. These opinions can be used by both consumers and producers: consumers can learn the advantages and disadvantages of particular features of a product, while producers can analyse their own strengths and weaknesses as well as those of competitors' products. With so many opinions, a method is needed that lets the reader grasp the point of the opinions as a whole. The idea stems from review summarization, which summarizes the overall opinion based on the sentiment and features it contains. In this study, the domain that is the main focus is digital cameras. This research consisted of four steps: 1) giving the system the knowledge to recognize the semantic orientation of an opinion, 2) identifying the features of a product, 3) identifying whether an opinion is positive or negative, and 4) summarizing the results. This research discusses methods such as Naïve Bayes for sentiment classification and a feature extraction algorithm based on dependency analysis, which is one of the tools in Natural Language Processing (NLP), together with a knowledge-based dictionary, which is useful for handling implicit features. The end result of the research is a summary that contains a set of reviews from consumers on the features and sentiment. With the proposed method, sentiment classification accuracy reaches 81.2% for positive test data and 80.2% for negative test data, and feature extraction accuracy reaches 90.3%.
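
    The sketch below shows a generic bag-of-words Naive Bayes sentiment classifier with scikit-learn; the review strings and labels are made up, and the paper's dependency-analysis feature extraction and implicit-feature dictionary are not reproduced.

```python
# Hedged sketch: Naive Bayes sentiment classification of product reviews.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

reviews = ["the lens is sharp and the battery lasts long",
           "autofocus is slow and the flash is useless"]
labels = ["positive", "negative"]

# Bag-of-words counts feed a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(reviews, labels)
print(model.predict(["battery life is great"]))
```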

  12. Functional connectivity supporting the selective maintenance of feature-location binding in visual working memory

    Directory of Open Access Journals (Sweden)

    Sachiko eTakahama

    2014-06-01

    Full Text Available Information on an object’s features bound to its location is very important for maintaining object representations in visual working memory. Interactions with dynamic multi-dimensional objects in an external environment require complex cognitive control, including the selective maintenance of feature-location binding. Here, we used event-related functional magnetic resonance imaging to investigate brain activity and functional connectivity related to the maintenance of complex feature-location binding. Participants were required to detect task-relevant changes in feature-location binding between objects defined by color, orientation, and location. We compared a complex binding task requiring complex feature-location binding (color-orientation-location) with a simple binding task in which simple feature-location binding, such as color-location, was task-relevant and the other feature was task-irrelevant. Univariate analyses showed that the dorsolateral prefrontal cortex (DLPFC), hippocampus, and frontoparietal network were activated during the maintenance of complex feature-location binding. Functional connectivity analyses indicated cooperation between the inferior precentral sulcus (infPreCS), DLPFC, and hippocampus during the maintenance of complex feature-location binding. In contrast, the connectivity for the spatial updating of simple feature-location binding determined by reanalyzing the data from Takahama et al. (2010) demonstrated that the superior parietal lobule (SPL) cooperated with the DLPFC and hippocampus. These results suggest that the connectivity for complex feature-location binding does not simply reflect general memory load and that the DLPFC and hippocampus flexibly modulate the dorsal frontoparietal network, depending on the task requirements, with the infPreCS involved in the maintenance of complex feature-location binding and the SPL involved in the spatial updating of simple feature-location binding.

  13. Functional connectivity supporting the selective maintenance of feature-location binding in visual working memory.

    Science.gov (United States)

    Takahama, Sachiko; Saiki, Jun

    2014-01-01

    Information on an object's features bound to its location is very important for maintaining object representations in visual working memory. Interactions with dynamic multi-dimensional objects in an external environment require complex cognitive control, including the selective maintenance of feature-location binding. Here, we used event-related functional magnetic resonance imaging to investigate brain activity and functional connectivity related to the maintenance of complex feature-location binding. Participants were required to detect task-relevant changes in feature-location binding between objects defined by color, orientation, and location. We compared a complex binding task requiring complex feature-location binding (color-orientation-location) with a simple binding task in which simple feature-location binding, such as color-location, was task-relevant and the other feature was task-irrelevant. Univariate analyses showed that the dorsolateral prefrontal cortex (DLPFC), hippocampus, and frontoparietal network were activated during the maintenance of complex feature-location binding. Functional connectivity analyses indicated cooperation between the inferior precentral sulcus (infPreCS), DLPFC, and hippocampus during the maintenance of complex feature-location binding. In contrast, the connectivity for the spatial updating of simple feature-location binding determined by reanalyzing the data from Takahama et al. (2010) demonstrated that the superior parietal lobule (SPL) cooperated with the DLPFC and hippocampus. These results suggest that the connectivity for complex feature-location binding does not simply reflect general memory load and that the DLPFC and hippocampus flexibly modulate the dorsal frontoparietal network, depending on the task requirements, with the infPreCS involved in the maintenance of complex feature-location binding and the SPL involved in the spatial updating of simple feature-location binding. PMID:24917833

  14. Multi Feature Content Based Video Retrieval Using High Level Semantic Concept

    Directory of Open Access Journals (Sweden)

    Hamdy K. Elminir

    2012-07-01

    Full Text Available Content-based retrieval allows information to be found by searching its content rather than its attributes. The challenge facing content-based video retrieval (CBVR) is to design systems that can accurately and automatically process large amounts of heterogeneous video. Moreover, a content-based video retrieval system requires, in its first stage, segmenting the video stream into separate shots. Afterwards, features are extracted to represent the video shots, and finally a similarity/distance metric and an algorithm efficient enough to retrieve query-related video results must be chosen. There are two main issues in this process: the first is how to determine the best way for video segmentation and key frame selection, and the second is the features used for video representation. Various features can be extracted for this purpose, including either low or high level features, and a key issue is how to bridge the gap between them. This paper proposes a content-based video retrieval system that tries to address the aforementioned issues by using an adaptive threshold for video segmentation and key frame selection as well as using both low level features and high level semantic object annotation for video representation. Experimental results show that the use of multiple features increases both precision and recall rates by about 13% to 19% compared with a traditional system that uses only the color feature for video retrieval.
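
    To make the adaptive-threshold shot segmentation idea concrete, here is a hedged sketch of histogram-difference shot-boundary detection with OpenCV; the threshold rule (mean plus k standard deviations) and all parameters are assumptions, not the paper's exact algorithm.

```python
# Hedged sketch: shot-boundary detection by color-histogram differences with
# an adaptive (mean + k*std) threshold over the whole video.
import cv2
import numpy as np

def detect_shot_boundaries(video_path, k=3.0):
    cap = cv2.VideoCapture(video_path)
    prev_hist, diffs = None, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if prev_hist is not None:
            diffs.append(cv2.compareHist(prev_hist, hist,
                                         cv2.HISTCMP_BHATTACHARYYA))
        prev_hist = hist
    cap.release()
    diffs = np.array(diffs)
    threshold = diffs.mean() + k * diffs.std()      # adaptive threshold
    return np.where(diffs > threshold)[0] + 1       # frame indices of cuts
```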

  15. Features of underwater echo extraction based on signal sparse decomposition

    Institute of Scientific and Technical Information of China (English)

    YANG Bo; BU Yinyong; ZHAO Haiming

    2012-01-01

    In order to better realize sound echo recognition of underwater materials with heavily uneven surfaces, a feature extraction method based on the theory of signal sparse decomposition is proposed. Instead of the common time-frequency dictionary, sets of training echo samples are used directly as the dictionary to perform echo sparse decomposition under L1 optimization and to extract a kind of energy feature of the echo. Experiments on three kinds of bottom materials, including cobalt crust, show that the Fisher distribution with this method is superior to that of edge features and of Singular Value Decomposition (SVD) features in the wavelet domain. This means that much better classification of underwater bottom materials can be obtained with the proposed energy features than with the other two. It is concluded that using echo samples as a dictionary is feasible and that the class information of the echo introduced by this dictionary helps to obtain better echo features.

  16. Content Based Image Retrieval by Multi Features using Image Blocks

    Directory of Open Access Journals (Sweden)

    Arpita Mathur

    2013-12-01

    Full Text Available Content based image retrieval (CBIR) is an effective method of retrieving images from large image resources. CBIR is a technique in which images are indexed by extracting their low-level features, such as color, texture, shape, and spatial location. Effective and efficient feature extraction mechanisms are required to improve existing CBIR performance. This paper presents a novel approach to a CBIR system in which higher retrieval efficiency is achieved by combining information from the color, shape, and texture features of an image. The color feature is extracted using color histograms for image blocks, the Canny edge detection algorithm is used for the shape feature, and HSB extraction in blocks is used for the texture feature. The feature set of the query image is compared with the feature set of each image in the database. The experiments show that the fusion of multiple features gives better retrieval results than the approach used by Rao et al. This paper also presents a comparative study of the performance of the two different CBIR approaches in which the color, shape, and texture features are used.

  17. An iterative feature selection method for GRNs inference by exploring topological properties

    CERN Document Server

    Lopes, Fabrício Martins; Barrera, Junior; Cesar-Jr, Roberto M

    2011-01-01

    An important problem in bioinformatics is the inference of gene regulatory networks (GRN) from temporal expression profiles. In general, the main limitations faced by GRN inference methods are the small number of samples with huge dimensionality and the noisy nature of the expression measurements. In the face of these limitations, alternatives are needed to obtain better accuracy on the GRN inference problem. This work addresses the problem by presenting an alternative feature selection method that applies prior knowledge in its search strategy, called SFFS-BA. The proposed search strategy is based on the Sequential Floating Forward Selection (SFFS) algorithm, with the inclusion of scale-free (Barabási-Albert) topology information to guide the search process and improve inference. The proposed algorithm exploits the scale-free property by pruning the search space and using a power law as a weight for reducing it. In this way, the search space traversed by the SFFS-BA method combines a breadth-first se...

  18. Explore Interregional EEG Correlations Changed by Sport Training Using Feature Selection

    Directory of Open Access Journals (Sweden)

    Jia Gao

    2016-01-01

    Full Text Available This paper investigated the interregional correlations changed by sport training through electroencephalography (EEG) signals using classification and feature selection techniques. The EEG data were obtained from students with long-term professional sport training and from normal students without sport training as a baseline. Every channel of the 19-channel EEG signals is considered as a node in the brain network, and Pearson correlation coefficients are calculated between every pair of nodes as the new features of the EEG signals. Then, Partial Least Squares (PLS) is used to select the top 10 most varied features, and the Pearson correlation coefficients of the selected features are compared to show the difference between the two groups. Results show that the classification accuracy for the two groups improves from 88.13%, using a measurement of EEG overall energy, to 97.19% using the EEG correlation measurement. Furthermore, the selected features reveal that the most important interregional EEG correlation changed by training is the correlation between the left inferior frontal and left middle temporal regions, which shows a decreased value.
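
    A minimal sketch of how the pairwise Pearson-correlation features described above can be built from multi-channel EEG epochs (the subsequent PLS ranking is not shown); array shapes and names are illustrative.

```python
# Hedged sketch: upper-triangle channel-correlation features per EEG epoch.
import numpy as np

def correlation_features(epochs):
    """epochs: (n_epochs, n_channels, n_samples). Returns (n_epochs, n_pairs)."""
    n_epochs, n_channels, _ = epochs.shape
    iu = np.triu_indices(n_channels, k=1)        # indices of channel pairs
    feats = np.empty((n_epochs, len(iu[0])))
    for i, epoch in enumerate(epochs):
        feats[i] = np.corrcoef(epoch)[iu]        # Pearson r for every pair
    return feats

# For 19 channels this yields 19*18/2 = 171 correlation features per epoch.
```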

  19. Texture feature selection with relevance learning to classify interstitial lung disease patterns

    Science.gov (United States)

    Huber, Markus B.; Bunte, Kerstin; Nagarajan, Mahesh B.; Biehl, Michael; Ray, Lawrence A.; Wismueller, Axel

    2011-03-01

    Generalized Matrix Learning Vector Quantization (GMLVQ) is used to estimate the relevance of texture features for their ability to classify interstitial lung disease patterns in high-resolution computed tomography (HRCT) images. After a stochastic gradient descent, the GMLVQ algorithm provides a discriminative distance measure of relevance factors, which can account for pairwise correlations between different texture features and their importance for the classification of healthy and diseased patterns. Texture features were extracted from gray-level co-occurrence matrices (GLCMs), and were ranked and selected according to their relevance obtained by GMLVQ and, for comparison, by a mutual information (MI) criterion. A k-nearest-neighbor (kNN) classifier and a Support Vector Machine with a radial basis function kernel (SVMrbf) were optimized in a 10-fold cross-validation for different texture feature sets. In our experiment with real-world data, the feature sets selected by the GMLVQ approach had a significantly better classification performance compared with feature sets selected by MI ranking.
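
    As a hedged illustration of GLCM texture descriptors like those ranked by GMLVQ, the sketch below uses scikit-image (recent releases spell the functions graycomatrix/graycoprops; older releases use the "grey" spelling); the distances, angles, and property list are illustrative choices, not the paper's configuration.

```python
# Hedged sketch: GLCM texture descriptors for a 2-D uint8 image patch.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(patch, distances=(1, 2), angles=(0, np.pi / 4, np.pi / 2)):
    """patch: 2-D uint8 region. Returns a flat vector of GLCM properties."""
    glcm = graycomatrix(patch, distances=distances, angles=angles,
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])
```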

  20. DS926 Digital surfaces and thicknesses of selected hydrogeologic units of the Floridan aquifer system in Florida and parts of Georgia, Alabama, and South Carolina -- Point features used for the base of the Floridan aquifer system

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — Digital surfaces and thicknesses of selected hydrogeologic units of the Floridan aquifer system were developed to define an updated hydrogeologic framework as part...

  1. An Effective Combined Feature For Web Based Image Retrieval

    Directory of Open Access Journals (Sweden)

    H.M.R.B Herath

    2015-08-01

    Full Text Available Abstract Technological advances, as well as the emergence of large-scale multimedia applications and the revolution of the World Wide Web, have changed the world into a digital age. Anybody can use their mobile phone to take a photo at any time anywhere and upload that image to ever-growing image databases. Developing effective techniques for visual and multimedia retrieval systems is one of the most challenging and important directions of future research. This paper proposes an effective combined feature for web based image retrieval. Frequently used colour and texture features are explored in order to develop a combined feature for this purpose. Three widely used colour features (colour moments, colour coherence vector, and colour correlogram) and three texture features (grey level co-occurrence matrix, Tamura features, and Gabor filter) were analyzed for their performance. Precision and recall were used to evaluate the performance of each of these techniques. By comparing precision and recall values, the methods that performed best were taken and combined to form a hybrid feature. The developed combined feature was evaluated by building a web based CBIR system. A web crawler was used to first crawl through web sites; images found on those sites were downloaded, and the combined feature representation technique was used to extract image features. The test results indicated that this web system can be used to index web images with the combined feature representation schema and to find similar images. Random image retrievals using the web system show that the combined feature can be used to retrieve images belonging to the general image domain. Retrieval accuracy is notably high for natural images such as outdoor scenes and images of flowers. Also, images which have a similar colour and texture distribution were retrieved as similar even though the images belonged to different semantic categories. This can be ideal for an artist who wants

  2. Classifying Features Selection and Classification Based on Mahalanobis Distance for Complex Short Time Power Quality Disturbances

    Institute of Scientific and Technical Information of China (English)

    汪洋; 肖先勇; 刘阳; 刘勃江

    2014-01-01

    Reasonable selection of classification features and the classification method are the core of classifying complex short-time power quality disturbances. Taking the comparability principle of scientific classification as the starting point and introducing the concept of a classification feature, a three-layer classification feature selection strategy for short-time power quality complex disturbances is proposed; taking the main frequency points and amplitude characteristics as classification features, the classification feature extraction algorithm is studied. While emphasizing the diversity among the classification features of different disturbance classes, the correlation of disturbance features within the same class is considered, and a Mahalanobis distance based classification method for short-time power quality complex disturbances is proposed. Eight types of single short-time disturbances and their complex disturbances, sixteen in total, are simulated, and the results are compared with those of two other classifiers, one based on a support vector machine and one based on a neural network. Experiments show that the proposed method runs in real time, classifies more accurately, and has good prospects for engineering application.
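
    A minimal sketch of a minimum-Mahalanobis-distance classifier of the kind described above, written with NumPy; it is generic and does not reproduce the paper's three-layer feature selection strategy or its disturbance features.

```python
# Hedged sketch: minimum-Mahalanobis-distance classifier for feature vectors.
import numpy as np

class MahalanobisClassifier:
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_, self.inv_covs_ = {}, {}
        for c in self.classes_:
            Xc = X[y == c]
            self.means_[c] = Xc.mean(axis=0)
            # pseudo-inverse guards against singular covariance matrices
            self.inv_covs_[c] = np.linalg.pinv(np.cov(Xc, rowvar=False))
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # squared Mahalanobis distance to each class centroid
            dists = {c: (x - self.means_[c]) @ self.inv_covs_[c] @ (x - self.means_[c])
                     for c in self.classes_}
            preds.append(min(dists, key=dists.get))
        return np.array(preds)
```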

  3. Feature-based tolerancing for intelligent inspection process definition

    Energy Technology Data Exchange (ETDEWEB)

    Brown, C.W.

    1993-07-01

    This paper describes a feature-based tolerancing capability that complements a geometric solid model with an explicit representation of conventional and geometric tolerances. This capability is focused on supporting an intelligent inspection process definition system. The feature-based tolerance model's benefits include advancing complete product definition initiatives (e.g., STEP -- Standard for the Exchange of Product model data), supplying computer-integrated manufacturing applications (e.g., generative process planning and automated part programming) with product definition information, and assisting in the solution of measurement performance issues. A feature-based tolerance information model was developed based upon the notion of a feature's toleranceable aspects and describes an object-oriented scheme for representing and relating tolerance features, tolerances, and datum reference frames. For easy incorporation, the tolerance feature entities are interconnected with STEP solid model entities. This schema will explicitly represent the tolerance specification for mechanical products, support advanced dimensional measurement applications, and assist in tolerance-related methods divergence issues.

  4. Automated cervical precancerous cells screening system based on Fourier transform infrared spectroscopy features

    Science.gov (United States)

    Jusman, Yessi; Mat Isa, Nor Ashidi; Ng, Siew-Cheok; Hasikin, Khairunnisa; Abu Osman, Noor Azuan

    2016-07-01

    The Fourier transform infrared (FTIR) spectroscopy technique can detect abnormality in a cervical cell that occurs before any morphological change can be observed under the light microscope, as employed in conventional techniques. This paper presents feature extraction developed for an automated cervical precancerous cell screening system based on FTIR spectroscopy, intended as a second opinion for pathologists. The automated system generally consists of feature extraction and classification stages. Signal processing techniques are used in the feature extraction stage. Then, discriminant analysis and principal component analysis are employed to select dominant features for the classification process. The datasets of cervical precancerous cells obtained from the feature selection process are classified using a hybrid multilayered perceptron network. The proposed system achieved 92% accuracy.

  5. How can selection of biologically inspired features improve the performance of a robust object recognition model?

    Directory of Open Access Journals (Sweden)

    Masoud Ghodrati

    Full Text Available Humans can effectively and swiftly recognize objects in complex natural scenes. This outstanding ability has motivated many computational object recognition models. Most of these models try to emulate the behavior of this remarkable system. The human visual system recognizes objects hierarchically in several processing stages, along which a set of features with increasing complexity is extracted by different parts of the visual system. Elementary features like bars and edges are processed in earlier levels of the visual pathway, and the further one goes up this pathway, the more complex the features that are spotted. An important question in the field of visual processing is which features of an object are selected and represented by the visual cortex. To address this issue, we extended a hierarchical, biologically motivated model for different object recognition tasks. In this model, a set of object parts, named patches, is extracted in the intermediate stages. These object parts are used in the training procedure of the model and have an important role in object recognition. These patches are selected indiscriminately from different positions in an image, and this can lead to the extraction of non-discriminating patches, which eventually may reduce performance. In the proposed model we used an evolutionary algorithm approach to select a set of informative patches. Our reported results indicate that these patches are more informative than the usual random patches. We demonstrate the strength of the proposed model on a range of object recognition tasks. The proposed model outperforms the original model in diverse object recognition tasks. It can be seen from the experiments that the selected features are generally particular parts of the target images. Our results suggest that selected features which are parts of target objects provide an efficient set for robust object recognition.

  6. Pattern Recognition by Dynamic Feature Analysis Based on PCA

    Directory of Open Access Journals (Sweden)

    Juliana Valencia-Aguirre

    2009-06-01

    Full Text Available Usually, in pattern recognition problems, we represent observations by means of measures on appropriate variables of a data set; these measures can be categorized as static and dynamic features. Static features are not always an accurate representation of the data. In this sense, many phenomena are better modeled by the dynamic changes of their measures. The advantage of using an extended form (dynamic features) is the inclusion of new information that allows us to obtain a better representation of the object. Nevertheless, it is sometimes difficult to deal with dynamic features in a classification stage, because the associated computational cost can often be higher than when dealing with static features. For analyzing such representations, we use Principal Component Analysis (PCA), arranging the dynamic data in such a way that we can consider variations related to the intrinsic dynamics of the observations. The method therefore made it possible to evaluate the dynamic information of the observations in a lower-dimensional feature space without decreasing accuracy. The algorithms were tested on real data to classify pathological speech versus normal voices, using PCA for dynamic feature selection as well.
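
    The sketch below illustrates the general idea of projecting concatenated dynamic feature trajectories onto a few principal components with scikit-learn; the synthetic array shapes stand in for per-frame speech measures and are not the authors' data arrangement.

```python
# Hedged sketch: PCA on flattened dynamic feature trajectories.
import numpy as np
from sklearn.decomposition import PCA

# trajectories: (n_observations, n_frames, n_measures); flatten time into one vector.
rng = np.random.default_rng(0)
trajectories = rng.normal(size=(100, 50, 4))         # synthetic stand-in data
X = trajectories.reshape(len(trajectories), -1)

pca = PCA(n_components=10)
X_low = pca.fit_transform(X)                          # reduced representation
print(X_low.shape, pca.explained_variance_ratio_.sum())
```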

  7. Fingerprints, Iris and DNA Features based Multimodal Systems: A Review

    Directory of Open Access Journals (Sweden)

    Prakash Chandra Srivastava

    2013-01-01

    Full Text Available Biometric systems are alternatives to traditional identification systems. This paper provides an overview of single-feature and multiple-feature based biometric systems, including the performance of physiological characteristics (such as fingerprint, hand geometry, head recognition, iris, retina, face recognition, DNA recognition, palm prints, heartbeat, finger veins, palate, etc.) and behavioral characteristics (such as body language, facial expression, signature verification, speech recognition, gait signature, etc.). Multimodal systems based on fingerprint, iris image, and DNA features, and their performance, are analyzed in terms of security, reliability, accuracy, and long-term stability. The strengths and weaknesses of the various multiple-feature based biometric approaches published so far are analyzed. Directions of future research work for robust personal identification are outlined.

  8. Using listener-based perceptual features as intermediate representations in music information retrieval.

    Science.gov (United States)

    Friberg, Anders; Schoonderwaldt, Erwin; Hedblad, Anton; Fabiani, Marco; Elowsson, Anders

    2014-10-01

    The notion of perceptual features is introduced for describing general music properties based on human perception. This is an attempt at rethinking the concept of features, aiming to approach the underlying human perception mechanisms. Instead of using concepts from music theory such as tones, pitches, and chords, a set of nine features describing overall properties of the music was selected. They were chosen from qualitative measures used in psychology studies and motivated from an ecological approach. The perceptual features were rated in two listening experiments using two different data sets. They were modeled both from symbolic and audio data using different sets of computational features. Ratings of emotional expression were predicted using the perceptual features. The results indicate that (1) at least some of the perceptual features are reliable estimates; (2) emotion ratings could be predicted by a small combination of perceptual features with an explained variance from 75% to 93% for the emotional dimensions activity and valence; (3) the perceptual features could only to a limited extent be modeled using existing audio features. Results clearly indicated that a small number of dedicated features were superior to a "brute force" model using a large number of general audio features.

  9. Computational Intelligence Modeling of the Macromolecules Release from PLGA Microspheres-Focus on Feature Selection.

    Science.gov (United States)

    Zawbaa, Hossam M; Szlȩk, Jakub; Grosan, Crina; Jachowicz, Renata; Mendyk, Aleksander

    2016-01-01

    Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, morphology of microspheres, release conditions, etc. Selecting a subset of relevant properties for PLGA is a challenging machine learning task as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem with the aim of minimizing the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms: antlion optimization, binary version of antlion optimization, grey wolf optimization, and social spider optimization are used to select the optimal feature set for predicting the dissolution profile of PLGA. Besides these, LASSO algorithm is also used for comparisons. Selection of crucial variables is performed under the assumption that both predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling for which various tools are chosen, such as Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression tree, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with the results reported by Szlȩk. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, and the number of selected input features is smaller, nine versus eleven. PMID:27315205
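
    As a hedged illustration of the LASSO baseline and the NRMSE metric mentioned above (not the bio-inspired optimizers or the original data), the sketch below runs cross-validated Lasso selection on synthetic data and reports a range-normalized RMSE.

```python
# Hedged sketch: LASSO-based attribute selection plus an NRMSE metric.
import numpy as np
from sklearn.linear_model import LassoCV

def nrmse(y_true, y_pred):
    """Root mean square error normalized by the observed range, in percent."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / (y_true.max() - y_true.min())

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 300))                  # ~300 candidate attributes
y = X[:, :5] @ np.array([3., -2., 1.5, 1., -1.]) + rng.normal(scale=0.5, size=120)

lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(lasso.coef_)           # attributes with non-zero weight
print(len(selected), "features selected, NRMSE =",
      round(nrmse(y, lasso.predict(X)), 2))
```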

  10. Computational Intelligence Modeling of the Macromolecules Release from PLGA Microspheres-Focus on Feature Selection.

    Directory of Open Access Journals (Sweden)

    Hossam M Zawbaa

    Full Text Available Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, morphology of microspheres, release conditions, etc. Selecting a subset of relevant properties for PLGA is a challenging machine learning task as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem with the aim of minimizing the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms: antlion optimization, binary version of antlion optimization, grey wolf optimization, and social spider optimization are used to select the optimal feature set for predicting the dissolution profile of PLGA. Besides these, LASSO algorithm is also used for comparisons. Selection of crucial variables is performed under the assumption that both predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling for which various tools are chosen, such as Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression tree, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with the results reported by Szlȩk. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, and the number of selected input features is smaller, nine versus eleven.

  11. Computational Intelligence Modeling of the Macromolecules Release from PLGA Microspheres—Focus on Feature Selection

    Science.gov (United States)

    Zawbaa, Hossam M.; Szlȩk, Jakub; Grosan, Crina; Jachowicz, Renata; Mendyk, Aleksander

    2016-01-01

    Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, morphology of microspheres, release conditions, etc. Selecting a subset of relevant properties for PLGA is a challenging machine learning task as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem with the aim of minimizing the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms: antlion optimization, binary version of antlion optimization, grey wolf optimization, and social spider optimization are used to select the optimal feature set for predicting the dissolution profile of PLGA. Besides these, LASSO algorithm is also used for comparisons. Selection of crucial variables is performed under the assumption that both predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling for which various tools are chosen, such as Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression tree, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with the results reported by Szlȩk. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, and the number of selected input features is smaller, nine versus eleven. PMID:27315205

  12. Privacy preserving data publishing of categorical data through k-anonymity and feature selection.

    Science.gov (United States)

    Aristodimou, Aristos; Antoniades, Athos; Pattichis, Constantinos S

    2016-03-01

    In healthcare, there is a vast amount of patient data which can lead to important discoveries if combined. Due to legal and ethical issues, such data cannot be shared, and hence such information is underused. A new area of research has emerged, called privacy preserving data publishing (PPDP), which aims to share data in a way that preserves privacy while keeping the information lost to a minimum. In this Letter, a new anonymisation algorithm for PPDP is proposed, which is based on k-anonymity through pattern-based multidimensional suppression (kPB-MS). The algorithm uses feature selection for reducing the data dimensionality and then combines attribute and record suppression to obtain k-anonymity. Five datasets from different areas of the life sciences [RETINOPATHY, Single Photon Emission Computed Tomography imaging, gene sequencing and drug discovery (two datasets)] were anonymised with kPB-MS. The produced anonymised datasets were evaluated using four different classifiers, and in 74% of the test cases they produced similar or better accuracies than using the full datasets.
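
    A minimal sketch, with made-up data, of how one can verify k-anonymity over quasi-identifier columns using pandas; it checks the property only and does not implement the kPB-MS suppression algorithm proposed in the Letter.

```python
# Hedged sketch: check whether a released table satisfies k-anonymity.
import pandas as pd

def is_k_anonymous(df, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs >= k times."""
    group_sizes = df.groupby(list(quasi_identifiers)).size()
    return bool((group_sizes >= k).all())

data = pd.DataFrame({
    "age_band": ["30-39", "30-39", "40-49", "40-49", "40-49"],
    "zip_prefix": ["101", "101", "102", "102", "102"],
    "diagnosis": ["A", "B", "A", "A", "C"],   # sensitive attribute, not grouped
})
print(is_k_anonymous(data, ["age_band", "zip_prefix"], k=2))   # True
```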

  13. Feature Selection Strategy for Classification of Single-Trial EEG Elicited by Motor Imagery

    DEFF Research Database (Denmark)

    Prasad, Swati; Tan, Zheng-Hua; Prasad, Ramjee;

    2011-01-01

    Brain-Computer Interface (BCI) provides new means of communication for people with motor disabilities by utilizing electroencephalographic activity. Selection of features from Electroencephalogram (EEG) signals for classification plays a key part in the development of BCI systems. In this paper, we...

  14. The Influence of Selected Personality and Workplace Features on Burnout among Nurse Academics

    Science.gov (United States)

    Kizilci, Sevgi; Erdogan, Vesile; Sozen, Emine

    2012-01-01

    This study aimed to determine the influence of selected individual and situational features on burnout among nurse academics. The Maslach Burnout Inventory was used to assess the burnout levels of the academics. The sample population comprised 94 female participants. The emotional exhaustion (EE) score of the nurse academics was 16.43 ± 5.97,…

  15. EMOTION ANALYSIS OF SONGS BASED ON LYRICAL AND AUDIO FEATURES

    Directory of Open Access Journals (Sweden)

    Adit Jamdar

    2015-05-01

    Full Text Available In this paper, a method is proposed to detect the emotion of a song based on its lyrical and audio features. Lyrical features are generated by segmentation of lyrics during the process of data extraction. ANEW and WordNet knowledge is then incorporated to compute Valence and Arousal values. In addition to this, linguistic association rules are applied to ensure that the issue of ambiguity is properly addressed. Audio features are used to supplement the lyrical ones and include attributes like energy, tempo, and danceability. These features are extracted from The Echo Nest, a widely used music intelligence platform. Construction of training and test sets is done on the basis of social tags extracted from the last.fm website. The classification is done by applying feature weighting and stepwise threshold reduction on the k-Nearest Neighbors algorithm to provide fuzziness in the classification.

  16. A survey on filter techniques for feature selection in gene expression microarray analysis.

    Science.gov (United States)

    Lazar, Cosmin; Taminau, Jonatan; Meganck, Stijn; Steenhoff, David; Coletta, Alain; Molter, Colin; de Schaetzen, Virginie; Duque, Robin; Bersini, Hugues; Nowé, Ann

    2012-01-01

    A plenitude of feature selection (FS) methods is available in the literature, most of them arising from the need to analyze data of very high dimension, usually hundreds or thousands of variables. Such data sets are now available in various application areas like combinatorial chemistry, text mining, multivariate imaging, or bioinformatics. As a generally accepted rule, these methods are grouped into filters, wrappers, and embedded methods. More recently, a new group of methods has been added to the general framework of FS: ensemble techniques. The focus of this survey is on filter feature selection methods for informative feature discovery in gene expression microarray (GEM) analysis, which is also known as differentially expressed genes (DEGs) discovery, gene prioritization, or biomarker discovery. We present them in a unified framework, using standardized notations in order to reveal their technical details and to highlight their common characteristics as well as their particularities.

  17. Canonical feature selection for joint regression and multi-class identification in Alzheimer's disease diagnosis.

    Science.gov (United States)

    Zhu, Xiaofeng; Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang

    2016-09-01

    Fusing information from different imaging modalities is crucial for more accurate identification of the brain state, because imaging data of different modalities can provide complementary perspectives on the complex nature of brain disorders. However, most existing fusion methods extract features independently from each modality and then simply concatenate them into a long vector for classification, without appropriate consideration of the correlation among modalities. In this paper, we propose a novel method that transforms the original features from different modalities into a common space via canonical correlation analysis, where the transformed features become comparable and their relations easier to find. We then perform sparse multi-task learning for discriminative feature selection by using the canonical features as regressors and penalizing a loss function with a canonical regularizer. In our experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, we use Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) images to jointly predict clinical scores of the Alzheimer's Disease Assessment Scale-Cognitive subscale (ADAS-Cog) and Mini-Mental State Examination (MMSE), and also identify multi-class disease status for Alzheimer's disease diagnosis. The experimental results showed that the proposed canonical feature selection method helped enhance the performance of both clinical score prediction and disease status identification, outperforming state-of-the-art methods. PMID:26254746
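
    The sketch below illustrates mapping two modality-specific feature matrices into a common space with scikit-learn's CCA, as a hedged stand-in for the canonical transformation described above; the data are synthetic and the sparse multi-task selection step is omitted.

```python
# Hedged sketch: canonical correlation analysis over two feature modalities.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(2)
X_mri = rng.normal(size=(80, 90))      # stand-in MRI features per subject
X_pet = rng.normal(size=(80, 90))      # stand-in PET features per subject

cca = CCA(n_components=10)
mri_canon, pet_canon = cca.fit_transform(X_mri, X_pet)
# The canonical features are now comparable across modalities and could serve
# as regressors for a subsequent sparse feature selection step.
print(mri_canon.shape, pet_canon.shape)
```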

  18. Integrated Feature Selection and Clustering for Taxonomic Problems within Fish Species Complexes

    Directory of Open Access Journals (Sweden)

    Huimin Chen

    2008-07-01

    Full Text Available As computer and database technologies advance rapidly, biologists all over the world can share biologically meaningful data from images of specimens and use the data to classify the specimens taxonomically. Accurate shape analysis of a specimen from multiple views of 2D images is crucial for finding diagnostic features using geometric morphometric techniques. We propose an integrated feature selection and clustering framework that automatically identifies a set of feature variables to group specimens into a binary cluster tree. The candidate features are generated from the reconstructed 3D shape and local saliency characteristics of the 2D images of the specimens. A Gaussian mixture model is used to estimate the significance value of each feature and to control the false discovery rate in the feature selection process, so that the clustering algorithm can efficiently partition the specimen samples into clusters that may correspond to different species. The experiments on a taxonomic problem involving species of suckers in the genus Carpiodes demonstrate promising results using the proposed framework with only a small sample size.

  19. Optimum location of external markers using feature selection algorithms for real-time tumor tracking in external-beam radiotherapy: a virtual phantom study.

    Science.gov (United States)

    Nankali, Saber; Esmaili Torshabi, Ahmad; Samadi Miandoab, Payam; Baghizadeh, Amin

    2016-01-01

    In external-beam radiotherapy, using external markers is one of the most reliable tools for predicting tumor position in clinical applications. The main challenge in this approach is tracking tumor motion with the highest accuracy, which depends heavily on the location of the external markers; this issue is the objective of this study. Four commercially available feature selection algorithms, namely 1) Correlation-based Feature Selection, 2) Classifier, 3) Principal Components, and 4) Relief, were proposed to find the optimum location of external markers, in combination with two searching procedures, "Genetic" and "Ranker". The performance of these algorithms was evaluated using the four-dimensional extended cardiac-torso anthropomorphic phantom. Six tumors in the lung, three tumors in the liver, and 49 points on the thorax surface were taken into account to simulate internal and external motions, respectively. The root mean square error of an adaptive neuro-fuzzy inference system (ANFIS) used as the prediction model was considered as the metric for quantitatively evaluating the performance of the proposed feature selection algorithms. To do this, the thorax surface region was divided into nine smaller segments, and the predefined tumor motion was predicted by ANFIS using the external motion data of the markers in each small segment separately. Our comparative results showed that all feature selection algorithms can reasonably select specific external markers from those segments where the root mean square error of the ANFIS model is minimum. Moreover, the performance accuracy of the proposed feature selection algorithms was compared separately. For this, each tumor motion was predicted using the motion data of those external markers selected by each feature selection algorithm. A Duncan statistical test, followed by an F-test, on the final results reflected that all the proposed feature selection algorithms have the same performance accuracy for lung tumors. But for liver tumors, a correlation-based feature selection algorithm, in

  20. Image feature extraction based multiple ant colonies cooperation

    Science.gov (United States)

    Zhang, Zhilong; Yang, Weiping; Li, Jicheng

    2015-05-01

    This paper presents a novel image feature extraction algorithm based on the cooperation of multiple ant colonies. First, a low resolution version of the input image is created using a Gaussian pyramid algorithm, and two ant colonies are spread on the source image and the low resolution image respectively. The ant colony on the low resolution image uses phase congruency as its inspiration information, while the ant colony on the source image uses gradient magnitude as its inspiration information. These two ant colonies cooperate to extract salient image features by sharing the same pheromone matrix. After the optimization process, image features are detected by thresholding the pheromone matrix. Since the gradient magnitude and phase congruency of the input image are used as inspiration information for the ant colonies, our algorithm shows higher intelligence and is capable of acquiring more complete and meaningful image features than simpler edge detectors.

  1. Sequential feature selection for detecting buried objects using forward looking ground penetrating radar

    Science.gov (United States)

    Shaw, Darren; Stone, Kevin; Ho, K. C.; Keller, James M.; Luke, Robert H.; Burns, Brian P.

    2016-05-01

    Forward looking ground penetrating radar (FLGPR) has the benefit of detecting objects at a significant standoff distance. The FLGPR signal is radiated over a large surface area and the radar signal return is often weak. Improving detection, especially for targets buried in roads, while maintaining an acceptable false alarm rate remains a challenging task. Various kinds of features have been developed over the years to increase FLGPR detection performance. This paper focuses on investigating the use of as many features as possible for detecting buried targets and uses the sequential feature selection technique to automatically choose the features that contribute most to improving performance. Experimental results using data collected at a government test site are presented.
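
    A hedged sketch of forward sequential feature selection with scikit-learn, analogous in spirit to the technique described above; the synthetic data and the logistic-regression scorer are stand-ins for the FLGPR features and detector used in the paper.

```python
# Hedged sketch: forward sequential feature selection on synthetic data.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=40, n_informative=6,
                           random_state=0)
selector = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                     n_features_to_select=6,
                                     direction="forward", cv=5)
selector.fit(X, y)
print("chosen feature indices:", selector.get_support(indices=True))
```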

  2. Prototype Theory Based Feature Representation for PolSAR Images

    OpenAIRE

    Huang Xiaojing; Yang Xiangli; Huang Pingping; Yang Wen

    2016-01-01

    This study presents a new feature representation approach for Polarimetric Synthetic Aperture Radar (PolSAR) image based on prototype theory. First, multiple prototype sets are generated using prototype theory. Then, regularized logistic regression is used to predict similarities between a test sample and each prototype set. Finally, the PolSAR image feature representation is obtained by ensemble projection. Experimental results of an unsupervised classification of PolSAR images show that our...

  3. Feature Learning for Fingerprint-Based Positioning in Indoor Environment

    OpenAIRE

    Zengwei Zheng; Yuanyi Chen; Tao He; Lin Sun; Dan Chen

    2015-01-01

    Recent years have witnessed a growing interest in using Wi-Fi received signal strength for indoor fingerprint-based positioning. However, previous studies of this problem have primarily faced two main challenges. One is that the positioning fingerprint feature using received signal strength is unstable due to heterogeneous devices and dynamic environment status, which greatly degrades the positioning accuracy. Another is that some improved positioning fingerprint features will suffer the curs...

  4. Feature-based attention enhances performance by increasing response gain

    OpenAIRE

    Herrmann, Katrin; Heeger, David J.; Carrasco, Marisa

    2012-01-01

    Covert spatial attention can increase contrast sensitivity either by changes in contrast gain or by changes in response gain, depending on the size of the attention field and the size of the stimulus (Herrmann, Montaser-Kouhsari, Carrasco, & Heeger, 2010), as predicted by the normalization model of attention (Reynolds & Heeger, 2009). For feature-based attention, unlike spatial attention, the model predicts only changes in response gain, regardless of whether the featural extent of the attent...

  5. Frequency feature based quantification of defect depth and thickness

    Science.gov (United States)

    Tian, Shulin; Chen, Kai; Bai, Libing; Cheng, Yuhua; Tian, Lulu; Zhang, Hong

    2014-06-01

    This study develops a frequency feature based pulsed eddy current method. A frequency feature, termed frequency to zero, is proposed for subsurface defects and metal loss quantification in metallic specimens. A curve fitting method is also employed to generate extra frequency components and improve the accuracy of the proposed method. Experimental validation is carried out. Conclusions and further work are derived on the basis of the studies.

  6. Feature Selection and Fault Classification of Reciprocating Compressors using a Genetic Algorithm and a Probabilistic Neural Network

    Energy Technology Data Exchange (ETDEWEB)

    Ahmed, M; Gu, F; Ball, A, E-mail: M.Ahmed@hud.ac.uk [Diagnostic Engineering Research Group, University of Huddersfield, HD1 3DH (United Kingdom)

    2011-07-19

    Reciprocating compressors are widely used in industry for various purposes and faults occurring in them can degrade their performance, consume additional energy and even cause severe damage to the machine. Vibration monitoring techniques are often used for early fault detection and diagnosis, but it is difficult to prescribe a given set of effective diagnostic features because of the wide variety of operating conditions and the complexity of the vibration signals which originate from the many different vibrating and impact sources. This paper studies the use of genetic algorithms (GAs) and neural networks (NNs) to select effective diagnostic features for the fault diagnosis of a reciprocating compressor. A large number of common features are calculated from the time and frequency domains and envelope analysis. Applying GAs and NNs to these features found that envelope analysis has the most potential for differentiating three common faults: valve leakage, inter-cooler leakage and a loose drive belt. Simultaneously, the spread parameter of the probabilistic NN was also optimised. The selected subsets of features were examined based on vibration source characteristics. The approach developed and the trained NN are confirmed as possessing general characteristics for fault detection and diagnosis.
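
    A compact, hedged sketch of GA-based feature selection: a binary genetic algorithm maximizes the cross-validated accuracy of a stand-in k-nearest-neighbour classifier (the paper uses a probabilistic neural network); the data and GA parameters are illustrative only.

```python
# Hedged sketch: binary GA selecting a feature subset by cross-validated accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)

def fitness(mask):
    if not mask.any():
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop_size, n_gen, n_feat = 20, 15, X.shape[1]
population = rng.random((pop_size, n_feat)) < 0.5            # random bit strings

for _ in range(n_gen):
    scores = np.array([fitness(ind) for ind in population])
    order = np.argsort(scores)[::-1]
    parents = population[order[: pop_size // 2]]              # truncation selection
    children = []
    for _ in range(pop_size - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, n_feat)                          # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n_feat) < 0.05                       # bit-flip mutation
        children.append(np.logical_xor(child, flip))
    population = np.vstack([parents, np.array(children)])

best = population[np.argmax([fitness(ind) for ind in population])]
print("selected features:", np.flatnonzero(best))
```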

  7. Feature Selection and Fault Classification of Reciprocating Compressors using a Genetic Algorithm and a Probabilistic Neural Network

    Science.gov (United States)

    Ahmed, M.; Gu, F.; Ball, A.

    2011-07-01

    Reciprocating compressors are widely used in industry for various purposes and faults occurring in them can degrade their performance, consume additional energy and even cause severe damage to the machine. Vibration monitoring techniques are often used for early fault detection and diagnosis, but it is difficult to prescribe a given set of effective diagnostic features because of the wide variety of operating conditions and the complexity of the vibration signals which originate from the many different vibrating and impact sources. This paper studies the use of genetic algorithms (GAs) and neural networks (NNs) to select effective diagnostic features for the fault diagnosis of a reciprocating compressor. A large number of common features are calculated from the time and frequency domains and envelope analysis. Applying GAs and NNs to these features found that envelope analysis has the most potential for differentiating three common faults: valve leakage, inter-cooler leakage and a loose drive belt. Simultaneously, the spread parameter of the probabilistic NN was also optimised. The selected subsets of features were examined based on vibration source characteristics. The approach developed and the trained NN are confirmed as possessing general characteristics for fault detection and diagnosis.

  8. Scoring relevancy of features based on combinatorial analysis of Lasso with application to lymphoma diagnosis

    Directory of Open Access Journals (Sweden)

    Zare Habil

    2013-01-01

    Full Text Available Abstract One challenge in applying bioinformatic tools to clinical or biological data is the high number of features that might be provided to the learning algorithm without any prior knowledge of which ones should be used. In such applications, the number of features can drastically exceed the number of training instances, which is often limited by the number of samples available for the study. The Lasso is one of many regularization methods that have been developed to prevent overfitting and improve prediction performance in high-dimensional settings. In this paper, we propose a novel algorithm for feature selection based on the Lasso; our hypothesis is that defining a scoring scheme that measures the "quality" of each feature can provide a more robust feature selection method. Our approach is to generate several samples from the training data by bootstrapping, determine the best relevance-ordering of the features for each sample, and finally combine these relevance-orderings to select highly relevant features. In addition to the theoretical analysis of our feature scoring scheme, we provide empirical evaluations on six real datasets from different fields to confirm the superiority of our method in exploratory data analysis and prediction performance. For example, we applied FeaLect, our feature scoring algorithm, to a lymphoma dataset, and according to a human expert, our method led to selecting more meaningful features than those commonly used in the clinic. This case study built a basis for discovering interesting new criteria for lymphoma diagnosis. Furthermore, to facilitate the use of our algorithm in other applications, the source code that implements our algorithm was released as FeaLect, a documented R package on CRAN.
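
    FeaLect itself is released as an R package and uses a more elaborate combinatorial scoring of Lasso solution paths; the Python sketch below only illustrates the underlying bootstrap-and-score idea, counting how often each feature survives a Lasso fit across bootstrap samples. The regularization strength and data set sizes are arbitrary assumptions.

```python
# Hedged sketch in the spirit of bootstrap-Lasso feature scoring (not FeaLect's exact
# combinatorial score): each feature is scored by its selection frequency.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X, y = make_regression(n_samples=80, n_features=200, n_informative=10,
                       noise=5.0, random_state=0)

n_boot, scores = 100, np.zeros(X.shape[1])
for _ in range(n_boot):
    idx = rng.integers(0, X.shape[0], X.shape[0])            # bootstrap sample
    model = Lasso(alpha=1.0, max_iter=10000).fit(X[idx], y[idx])
    scores += np.abs(model.coef_) > 1e-8                      # selection indicator

ranking = np.argsort(scores)[::-1]
print("top 10 features by selection frequency:", ranking[:10])
```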

  9. Incorporating Feature-Based Annotations into Automatically Generated Knowledge Representations

    Science.gov (United States)

    Lumb, L. I.; Lederman, J. I.; Aldridge, K. D.

    2006-12-01

    Earth Science Markup Language (ESML) is efficient and effective in representing scientific data in an XML-based formalism. However, features of the data being represented are not accounted for in ESML. Such features might derive from events (e.g., a gap in data collection due to instrument servicing), identifications (e.g., a scientifically interesting area/volume in an image), or some other source. In order to account for features in an ESML context, we consider them from the perspective of annotation, i.e., the addition of information to existing documents without changing the originals. Although it is possible to extend ESML to incorporate feature-based annotations internally (e.g., by extending the XML schema for ESML), there are a number of complicating factors that we identify. Rather than pursuing the ESML-extension approach, we focus on an external representation for feature-based annotations via the XML Pointer Language (XPointer). In previous work (Lumb & Aldridge, HPCS 2006, IEEE, doi:10.1109/HPCS.2006.26), we showed that it is possible to extract relationships from ESML-based representations and capture the results in the Resource Description Framework (RDF). We therefore explore and report on the same requirement for XPointer-based annotations of ESML representations. As in our past efforts, the Global Geodynamics Project (GGP) allows us to illustrate this approach for introducing annotations into automatically generated knowledge representations with a real-world example.

  10. Multivariate Feature Selection for Predicting Scour-Related Bridge Damage using a Genetic Algorithm

    Science.gov (United States)

    Anderson, I.

    2015-12-01

    Scour and hydraulic damage are the most common causes of bridge failure, reported to be responsible for over 60% of bridge failures nationwide. Scour is a complex process and is likely an epistatic function of bridge and stream conditions that are both stationary and in dynamic flux. Bridge inspections, conducted regularly on bridges nationwide, rate bridge health assuming a static stream condition and typically do not include dynamically changing geomorphological adjustments. The Vermont Agency of Natural Resources stream geomorphic assessment data could add value to the current bridge inspection and scour design. The 2011 bridge damage from Tropical Storm Irene served as a case study for feature selection to improve bridge scour damage prediction in extreme events. The bridge inspection data (with over 200 features on more than 300 damaged and 2,000 non-damaged bridges) and the stream geomorphic assessment (with over 300 features on more than 5,000 stream reaches) constitute "Big Data", and together have the potential to generate large numbers of combined features ("epistatic relationships") that might better predict scour-related bridge damage. These potential combined features pose significant computational challenges for traditional statistical techniques (e.g., multivariate logistic regression). This study uses a genetic algorithm to search the multivariate feature space and identify epistatic relationships that are indicative of bridge scour damage. The combined features identified could be used to improve bridge scour design and to better monitor and rate bridge scour vulnerability.
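
    The GA search for "epistatic" feature combinations is not reproduced here; as a much simpler stand-in, the sketch below expands a synthetic feature matrix with pairwise interaction terms and ranks them by mutual information with a binary damage label, which conveys how combined features can be screened before any heavier search.

```python
# Illustrative stand-in for combined-feature ("epistatic") screening: build pairwise
# interaction terms and rank them by association with the damage label.
# Data and the number of features are synthetic placeholders, not the VT datasets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import PolynomialFeatures
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=500, n_features=15, n_informative=5, random_state=1)

poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_int = poly.fit_transform(X)                      # original features + pairwise products
names = poly.get_feature_names_out()

mi = mutual_info_classif(X_int, y, random_state=1)
top = np.argsort(mi)[::-1][:10]
for i in top:
    print(f"{names[i]:>12s}  MI={mi[i]:.3f}")
```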

  11. Feature-based tolerancing for advanced manufacturing applications

    Energy Technology Data Exchange (ETDEWEB)

    Brown, C.W.; Kirk, W.J. III; Simons, W.R.; Ward, R.C.; Brooks, S.L.

    1994-11-01

    A primary requirement for the successful deployment of advanced manufacturing applications is the need for a complete and accessible definition of the product. This product definition must not only provide an unambiguous description of a product`s nominal shape but must also contain complete tolerance specification and general property attributes. Likewise, the product definition`s geometry, topology, tolerance data, and modeler manipulative routines must be fully accessible through a robust application programmer interface. This paper describes a tolerancing capability using features that complements a geometric solid model with a representation of conventional and geometric tolerances and non-shape property attributes. This capability guarantees a complete and unambiguous definition of tolerances for manufacturing applications. An object-oriented analysis and design of the feature-based tolerance domain was performed. The design represents and relates tolerance features, tolerances, and datum reference frames. The design also incorporates operations that verify correctness and check for the completeness of the overall tolerance definition. The checking algorithm is based upon the notion of satisfying all of a feature`s toleranceable aspects. Benefits from the feature-based tolerance modeler include: advancing complete product definition initiatives, incorporating tolerances in product data exchange, and supplying computer-integrated manufacturing applications with tolerance information.

  12. Supervised feature selection for linear and non-linear regression of L⁎a⁎b⁎ color from multispectral images of meat

    DEFF Research Database (Denmark)

    Sharifzadeh, Sara; Clemmensen, Line Katrine Harder; Borggaard, Claus;

    2014-01-01

    … of meat samples (430–970 nm) were used for training and testing of the L⁎a⁎b⁎ prediction models. Finding a sparse solution or the use of a minimum number of bands is of particular interest to make an industrial vision set-up simpler and cost-effective. In this paper, a wide range of linear, non-linear, kernel-based regression and sparse regression methods are compared. In order to improve the prediction results of these models, we propose a supervised feature selection strategy, which is compared with Principal Component Analysis (PCA) as a pre-processing step. The results showed that the proposed feature selection method outperforms PCA for both linear and non-linear methods. The highest performance was obtained by linear ridge regression applied on the features selected by the proposed Elastic Net (EN)-based feature selection strategy. All the best models use a reduced number …
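
    As a rough sketch of the EN-then-ridge idea described above (with synthetic stand-ins for the multispectral bands and a single response instead of the three L⁎a⁎b⁎ channels), the snippet below keeps the bands with non-zero ElasticNet coefficients and fits ridge regression on them. The l1_ratio and alpha choices are assumptions.

```python
# Hedged sketch: ElasticNet picks a sparse set of "bands", ridge regresses on them.
# Synthetic regression data stand in for multispectral measurements and L*a*b* targets.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV, Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=120, n_features=30, n_informative=8,
                       noise=2.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

en = ElasticNetCV(l1_ratio=0.7, cv=5, max_iter=10000).fit(X_tr, y_tr)
selected = np.flatnonzero(np.abs(en.coef_) > 1e-8)          # retained "bands"

ridge = Ridge(alpha=1.0).fit(X_tr[:, selected], y_tr)
print(f"{selected.size} bands kept, test R^2 = {ridge.score(X_te[:, selected], y_te):.3f}")
```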

  13. Ant-cuckoo colony optimization for feature selection in digital mammogram.

    Science.gov (United States)

    Jona, J B; Nagaveni, N

    2014-01-15

    Digital mammography is the only effective screening method for detecting breast cancer. Gray Level Co-occurrence Matrix (GLCM) textural features are extracted from the mammogram, but not all of these features are essential for classifying it; identifying the relevant features is therefore the aim of this work. Feature selection improves the classification rate and accuracy of any classifier. In this study, a new hybrid metaheuristic named Ant-Cuckoo Colony Optimization, a hybrid of Ant Colony Optimization (ACO) and Cuckoo Search (CS), is proposed for feature selection in digital mammograms. ACO is a good metaheuristic optimization technique, but its drawback is that ants tend to follow paths where the pheromone density is high, which slows the whole process; CS is therefore employed to carry out the local search of ACO. A Support Vector Machine (SVM) classifier with a Radial Basis Function (RBF) kernel is used together with the ACO to separate normal from abnormal mammograms. Experiments are conducted on the mini-MIAS database. The performance of the new hybrid algorithm is compared with the ACO and PSO algorithms. The results show that the hybrid Ant-Cuckoo Colony Optimization algorithm is more accurate than the other techniques. PMID:24783812

  14. Characterization of computer network events through simultaneous feature selection and clustering of intrusion alerts

    Science.gov (United States)

    Chen, Siyue; Leung, Henry; Dondo, Maxwell

    2014-05-01

    As computer network security threats increase, many organizations implement multiple Network Intrusion Detection Systems (NIDS) to maximize the likelihood of intrusion detection and provide a comprehensive understanding of intrusion activities. However, NIDS trigger a massive number of alerts on a daily basis. This can be overwhelming for computer network security analysts since it is a slow and tedious process to manually analyse each alert produced. Thus, automated and intelligent clustering of alerts is important to reveal the structural correlation of events by grouping alerts with common features. As the nature of computer network attacks, and therefore alerts, is not known in advance, unsupervised alert clustering is a promising approach to achieve this goal. We propose a joint optimization technique for feature selection and clustering to aggregate similar alerts and to reduce the number of alerts that analysts have to handle individually. More precisely, each identified feature is assigned a binary value, which reflects the feature's saliency. This value is treated as a hidden variable and incorporated into a likelihood function for clustering. Since computing the optimal solution of the likelihood function directly is analytically intractable, we use the Expectation-Maximisation (EM) algorithm to iteratively update the hidden variable and use it to maximize the expected likelihood. Our empirical results, using a labelled Defense Advanced Research Projects Agency (DARPA) 2000 reference dataset, show that the proposed method gives better results than the EM clustering without feature selection in terms of the clustering accuracy.

  15. EXAMINING THE EFFECT OF FEATURE SELECTION ON IMPROVING PATIENT DETERIORATION PREDICTION

    Directory of Open Access Journals (Sweden)

    Noura AlNuaimi

    2015-11-01

    Full Text Available A large amount of heterogeneous medical data is generated every day in various healthcare organizations. These data could yield insights for improving monitoring and care delivery in the Intensive Care Unit, but they also present a challenge: reducing the amount of data without information loss. Dimension reduction is the most popular approach for reducing data size and for removing noise and redundancy in data. In this paper, we investigate the effect of the average laboratory test value and the total number of laboratory tests in predicting patient deterioration in the Intensive Care Unit, where we consider laboratory tests as features. Choosing a subset of features means choosing the most important lab tests to perform. Our approach therefore uses state-of-the-art feature selection to identify the most discriminative attributes, giving a better understanding of the patient deterioration problem. If the number of tests can be reduced by identifying the most important ones, the redundant tests can also be identified. By omitting the redundant tests, observation time could be reduced and early treatment could be provided to avoid the risk. Additionally, unnecessary monetary cost would be avoided. We apply our technique on the publicly available MIMIC-II database and show the effectiveness of the feature selection. We also provide a detailed analysis of the best features identified by our approach.
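
    The exact selector used in the study is not stated in the abstract; the sketch below simply shows a univariate filter ranking hypothetical lab-test features by mutual information with a deterioration label. The data are synthetic and the column names are invented placeholders, not MIMIC-II variables.

```python
# Hedged sketch: rank "lab test" features by mutual information with a binary
# deterioration label and keep the top k. Synthetic data, hypothetical names.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = make_classification(n_samples=400, n_features=25, n_informative=6, random_state=0)
lab_tests = [f"lab_test_{i}" for i in range(X.shape[1])]      # hypothetical names

selector = SelectKBest(mutual_info_classif, k=8).fit(X, y)
for name, keep, score in zip(lab_tests, selector.get_support(), selector.scores_):
    if keep:
        print(f"{name}: score={score:.3f}")
```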

  16. Pattern Classification Using an Olfactory Model with PCA Feature Selection in Electronic Noses: Study and Application

    Directory of Open Access Journals (Sweden)

    Junbao Zheng

    2012-03-01

    Full Text Available Biologically-inspired models and algorithms are considered promising sensor array signal processing methods for electronic noses. Feature selection is one of the most important issues for developing robust pattern recognition models in machine learning. This paper describes an investigation into the classification performance of a bionic olfactory model as the dimensionality of the input feature vector (outer factor) and the number of its parallel channels (inner factor) increase. The principal component analysis technique was applied for feature selection and dimension reduction. Two data sets, three classes of wine derived from different cultivars and five classes of green tea derived from five different provinces of China, were used for the experiments. In the former case, the results showed that the average correct classification rate increased as more principal components were put into the feature vector. In the latter case, the results showed that sufficient parallel channels should be reserved in the model to avoid pattern space crowding. We conclude that 6~8 channels of the model, with a principal component feature vector covering at least 90% cumulative variance, are adequate for a classification task of 3~5 pattern classes, considering the trade-off between time consumption and classification rate.
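
    A minimal sketch of the PCA step, assuming the common scikit-learn convention of passing a variance fraction so that enough components are kept to reach 90% cumulative variance; the sensor matrix is a random placeholder rather than e-nose data.

```python
# Sketch of the PCA dimension-reduction step: keep the leading components that
# explain at least 90% of the cumulative variance. Placeholder sensor data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 32))              # e.g. 150 measurements x 32 sensor features

pca = PCA(n_components=0.90)                # fraction => keep 90% cumulative variance
X_reduced = pca.fit_transform(X)
print("components kept:", pca.n_components_,
      "cumulative variance:", pca.explained_variance_ratio_.sum().round(3))
```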

  17. Feature-based attentional modulation of orientation perception in somatosensation

    Directory of Open Access Journals (Sweden)

    Meike Annika Schweisfurth

    2014-07-01

    Full Text Available In a reaction time study of human tactile orientation detection, the effects of spatial attention and feature-based attention were investigated. Subjects had to give speeded responses to target orientations (parallel and orthogonal to the finger axis) in a random stream of oblique tactile distractor orientations presented to their index and ring fingers. Before each block of trials, subjects received a tactile cue at one finger. By manipulating the validity of this cue with respect to its location and orientation (feature), we provided an incentive to subjects to attend spatially to the cued location and, only there, to the cued orientation. Subjects showed quicker responses to parallel compared to orthogonal targets, pointing to an orientation anisotropy in sensory processing. Also, faster reaction times were observed in location-matched trials, i.e. when targets appeared on the cued finger, representing a perceptual benefit of spatial attention. Most importantly, reaction times were shorter to orientations matching the cue, both at the cued and at the uncued location, documenting a global enhancement of tactile sensation by feature-based attention. This is the first report of a perceptual benefit of feature-based attention outside the spatial focus of attention in somatosensory perception. The similarity to effects of feature-based attention in visual perception supports the notion of matching attentional mechanisms across sensory domains.

  18. Identification of Biomarkers for Esophageal Squamous Cell Carcinoma Using Feature Selection and Decision Tree Methods

    Directory of Open Access Journals (Sweden)

    Chun-Wei Tung

    2013-01-01

    Full Text Available Esophageal squamous cell cancer (ESCC) is one of the most common fatal human cancers. The identification of biomarkers for early detection could be a promising strategy to decrease mortality. Previous studies utilized microarray techniques to identify more than one hundred genes; however, it is desirable to identify a small set of biomarkers for clinical use. This study proposes a sequential forward feature selection algorithm to design decision tree models for discriminating ESCC from normal tissues. Two potential biomarkers, RUVBL1 and CNIH, were identified and validated based on two publicly available microarray datasets. To test the discrimination ability of the two biomarkers, 17 pairs of expression profiles of ESCC and normal tissues from Taiwanese male patients were measured using microarray techniques. The classification accuracies of the two biomarkers in all three datasets were higher than 90%. Interpretable decision tree models were constructed to analyze the expression patterns of the two biomarkers. RUVBL1 was consistently overexpressed in all three datasets, although we found inconsistent CNIH expression, possibly affected by the diverse major risk factors for ESCC across different areas.
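
    A minimal sketch of sequential forward selection wrapped around a decision tree, in the spirit of the biomarker search above; the expression matrix is synthetic and the choice of two retained features mirrors the two reported biomarkers only by analogy.

```python
# Hedged sketch: sequential forward feature selection driving a decision tree.
# Synthetic "expression" data; feature indices are illustrative, not gene names.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=100, n_features=50, n_informative=5, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
sfs = SequentialFeatureSelector(tree, n_features_to_select=2, direction="forward", cv=5)
sfs.fit(X, y)

genes = sfs.get_support(indices=True)                 # the two selected "biomarkers"
acc = cross_val_score(tree, X[:, genes], y, cv=5).mean()
print("selected feature indices:", genes, "CV accuracy:", round(acc, 3))
```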

  19. Review on Feature Selection Techniques and the Impact of SVM for Cancer Classification using Gene Expression Profile

    CERN Document Server

    George, G Victo Sudha; 10.5121/ijcses.2011.2302

    2011-01-01

    The DNA microarray technology has modernized the approach of biology research in such a way that scientists can now measure the expression levels of thousands of genes simultaneously in a single experiment. Gene expression profiles, which represent the state of a cell at a molecular level, have great potential as a medical diagnosis tool. But compared to the number of genes involved, available training data sets generally have a fairly small sample size for classification. These training data limitations constitute a challenge to certain classification methodologies. Feature selection techniques can be used to extract the marker genes which influence the classification accuracy effectively by eliminating unwanted, noisy and redundant genes. This paper presents a review of feature selection techniques that have been employed in microarray-based cancer classification and also the predominant role of SVM for cancer classification.

  20. REVIEW ON FEATURE SELECTION TECHNIQUES AND THE IMPACT OF SVM FOR CANCER CLASSIFICATION USING GENE EXPRESSION PROFILE

    Directory of Open Access Journals (Sweden)

    G.Victo Sudha George

    2011-09-01

    Full Text Available The DNA microarray technology has modernized the approach of biology research in such a way that scientists can now measure the expression levels of thousands of genes simultaneously in a single experiment. Gene expression profiles, which represent the state of a cell at a molecular level, have great potential as a medical diagnosis tool. But compared to the number of genes involved, available training data sets generally have a fairly small sample size for classification. These training data limitations constitute a challenge to certain classification methodologies. Feature selection techniques can be used to extract the marker genes which influence the classification accuracy effectively by eliminating unwanted, noisy and redundant genes. This paper presents a review of feature selection techniques that have been employed in microarray-based cancer classification and also the predominant role of SVM for cancer classification.

  1. Electricity market price spike analysis by a hybrid data model and feature selection technique

    International Nuclear Information System (INIS)

    In a competitive electricity market, energy price forecasting is an important activity for both suppliers and consumers. For this reason, many techniques have been proposed to predict electricity market prices in recent years. However, electricity price is a complex, volatile signal with many spikes. Most electricity price forecasting techniques focus on normal price prediction, while price spike forecasting is a different and more complex prediction process. Price spike forecasting has two main aspects: prediction of price spike occurrence and of its value. In this paper, a novel technique for price spike occurrence prediction is presented, composed of a new hybrid data model, a novel feature selection technique and an efficient forecast engine. The hybrid data model includes both wavelet and time domain variables as well as calendar indicators, comprising a large candidate input set. The set is refined by the proposed feature selection technique, which evaluates both the relevancy and the redundancy of the candidate inputs. The forecast engine is a probabilistic neural network, which is fed with the candidate inputs retained by the feature selection technique and predicts price spike occurrence. The efficiency of the whole proposed method for price spike occurrence forecasting is evaluated by means of real data from the Queensland and PJM electricity markets. (author)
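
    The paper's exact relevancy/redundancy criterion is not given in the abstract; the sketch below shows one generic greedy filter of that family, scoring relevance by mutual information with a (synthetic) spike label and redundancy by mean absolute correlation with the inputs already chosen. The weighting and input set are assumptions.

```python
# Hedged sketch of a relevancy/redundancy filter for candidate inputs: pick inputs
# greedily by (relevance - redundancy). Synthetic classification data stand in for
# the wavelet / time-domain / calendar candidate inputs and the spike label.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=600, n_features=30, n_informative=8, random_state=0)

relevance = mutual_info_classif(X, y, random_state=0)
corr = np.abs(np.corrcoef(X, rowvar=False))

selected = [int(np.argmax(relevance))]
while len(selected) < 10:
    redundancy = corr[:, selected].mean(axis=1)       # similarity to already-chosen inputs
    score = relevance - redundancy
    score[selected] = -np.inf                         # never re-pick a chosen input
    selected.append(int(np.argmax(score)))

print("selected candidate inputs:", selected)
```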

  2. Iris Recognition System Based on Feature Level Fusion

    Directory of Open Access Journals (Sweden)

    Dr. S. R. Ganorkar

    2013-11-01

    Full Text Available Multibiometric systems utilize the evidence presented by multiple biometric sources (e.g., face and fingerprint, multiple fingers of a single user, multiple matchers, etc.) in order to determine or verify the identity of an individual. Information from multiple sources can be consolidated at several distinct levels. However, fusing two different biometric traits is difficult because (i) the feature sets of multiple modalities may be incompatible (e.g., the minutiae set of fingerprints and the eigen-coefficients of a face); (ii) the relationship between the feature spaces of different biometric systems may not be known; and (iii) concatenating two feature vectors may result in a feature vector with very large dimensionality, leading to the 'curse of dimensionality' problem, huge storage requirements and different processing algorithms. Moreover, using multiple images of a single biometric trait does not provide much variation. In this paper, we therefore present an efficient technique for feature-based fusion in a multimodal system where the left eye and the right eye are used as input. Iris recognition basically comprises iris localization, feature extraction, and identification. The algorithm uses Canny edge detection to identify the inner and outer boundaries of the iris. The image is then fed to a Gabor wavelet transform to extract features, and matching is finally performed using an indexing algorithm. The results indicate that the proposed technique can lead to a substantial improvement in performance.
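
    As an illustration of the extraction-and-fusion stage only (iris localization by Canny edge detection and the indexing matcher are omitted), the sketch below summarises Gabor responses at a few orientations and concatenates the left-eye and right-eye descriptors into one fused vector. The "eye" images are random placeholders and the filter parameters are assumptions.

```python
# Hedged sketch of Gabor feature extraction plus feature-level fusion of two inputs.
# Requires scikit-image; the images here are random stand-ins, not real iris data.
import numpy as np
from skimage.filters import gabor

rng = np.random.default_rng(0)
left_eye, right_eye = rng.random((64, 64)), rng.random((64, 64))

def gabor_features(img, frequencies=(0.1, 0.2), n_theta=4):
    feats = []
    for f in frequencies:
        for t in np.linspace(0, np.pi, n_theta, endpoint=False):
            real, imag = gabor(img, frequency=f, theta=t)
            mag = np.hypot(real, imag)                 # response magnitude
            feats += [mag.mean(), mag.var()]           # simple summary statistics
    return np.array(feats)

# feature-level fusion: concatenate the two descriptors into one vector
fused = np.concatenate([gabor_features(left_eye), gabor_features(right_eye)])
print("fused feature vector length:", fused.size)
```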

  3. Image counter-forensics based on feature injection

    Science.gov (United States)

    Iuliani, M.; Rossetto, S.; Bianchi, T.; De Rosa, Alessia; Piva, A.; Barni, M.

    2014-02-01

    Starting from the concept that many image forensic tools are based on the detection of some feature revealing a particular aspect of the history of an image, in this work we model the counter-forensic attack as the injection of a specific fake feature pointing to the same history as an authentic reference image. We propose a general attack strategy that does not rely on a specific detector structure. Given a source image x and a target image y, the adversary processes x in the pixel domain, producing an attacked image ~x, perceptually similar to x, whose feature f(~x) is as close as possible to f(y) computed on y. The proposed counter-forensic attack consists in the constrained minimization of the feature distance Φ(z) = |f(z) - f(y)| through iterative methods based on gradient descent. To overcome the intrinsic limit posed by the numerical estimation of the gradient on large images, we propose a feature decomposition process that reduces the problem to many subproblems on the blocks into which the image is partitioned. The proposed strategy has been tested by attacking three different features, and its performance has been compared to state-of-the-art counter-forensic methods.
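
    A toy numerical sketch of the attack idea: the source image is nudged by gradient descent, with a finite-difference gradient, so that a simple smooth feature moves toward the feature of a target image. The soft-histogram feature, step size, image size and the omission of the perceptual-similarity constraint are illustrative assumptions, not the paper's setting.

```python
# Toy sketch of feature injection by gradient descent with a numerical gradient.
# The feature is a smooth "soft histogram" chosen so finite differences are informative;
# the perceptual-similarity constraint from the paper is not enforced here.
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 8))            # source image (tiny, so finite differences are cheap)
y = rng.random((8, 8))            # target/reference image

def soft_hist(img):
    centers = np.array([0.125, 0.375, 0.625, 0.875])
    w = np.exp(-((img.reshape(-1, 1) - centers) ** 2) / 0.02)
    w /= w.sum(axis=1, keepdims=True)
    return w.mean(axis=0)          # smooth 4-bin intensity summary

target = soft_hist(y)

def loss(img):
    return float(np.sum((soft_hist(img) - target) ** 2))

attacked, eps, step = x.copy(), 1e-4, 5.0
for _ in range(100):
    base = loss(attacked)
    grad = np.zeros_like(attacked)
    for i in range(attacked.shape[0]):          # numerical gradient, pixel by pixel
        for j in range(attacked.shape[1]):
            pert = attacked.copy()
            pert[i, j] += eps
            grad[i, j] = (loss(pert) - base) / eps
    attacked = np.clip(attacked - step * grad, 0.0, 1.0)

print("feature distance before:", round(loss(x), 5), "after:", round(loss(attacked), 5))
```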

  4. Hardwood species classification with DWT based hybrid texture feature extraction techniques

    Indian Academy of Sciences (India)

    Arvind R Yadav; R S Anand; M L Dewal; Sangeeta Gupta

    2015-12-01

    In this work, discrete wavelet transform (DWT) based hybrid texture feature extraction techniques have been used to categorize microscopic images of hardwood species into 75 different classes. Initially, the DWT is employed to decompose the image up to 7 levels using the Daubechies (db3) wavelet as the decomposition filter. First-order statistics (FOS) and four variants of local binary pattern (LBP) descriptors are then used to acquire distinct features of these images at the various levels. Linear support vector machine (SVM), radial basis function (RBF) kernel SVM and random forest classifiers have been employed for classification. The classification accuracies obtained with state-of-the-art and DWT-based hybrid texture features using the various classifiers are compared. The DWT-based FOS-uniform local binary pattern (DWTFOSLBPu2) texture features at the 4th level of image decomposition produced the best classification accuracies of 97.67 ± 0.79% and 98.40 ± 0.64% for grayscale and RGB images, respectively, using the linear SVM classifier. A reduction of the feature set is achieved with the minimal redundancy maximal relevance (mRMR) feature selection method, and the best classification accuracies of 99.00 ± 0.79% and 99.20 ± 0.42% have been obtained for the DWT-based FOS-LBP histogram Fourier features (DWTFOSLBP-HF) technique at the 5th and 6th levels of image decomposition for grayscale and RGB images, respectively, using the linear SVM classifier. The DWTFOSLBP-HF features selected with the mRMR method also established their superiority among the DWT-based hybrid texture feature extraction techniques for the database randomly divided into different proportions of training and test sets.
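
    A rough sketch of a DWT-plus-LBP hybrid descriptor of the kind described above, assuming PyWavelets and scikit-image are available; the decomposition level, LBP parameters and the random "micrograph" are placeholders rather than the paper's configuration.

```python
# Hedged sketch of a DWT + FOS + uniform-LBP hybrid texture descriptor.
# Requires PyWavelets (pywt) and scikit-image; input image is a random placeholder.
import numpy as np
import pywt
from skimage.feature import local_binary_pattern

rng = np.random.default_rng(0)
img = rng.random((256, 256))

coeffs = pywt.wavedec2(img, wavelet="db3", level=4)
approx = coeffs[0]                                      # approximation at level 4

fos = [approx.mean(), approx.std(), approx.min(), approx.max()]   # first-order stats
lbp = local_binary_pattern(approx, P=8, R=1, method="uniform")    # uniform LBP codes
lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)

feature_vector = np.concatenate([fos, lbp_hist])
print("hybrid feature vector length:", feature_vector.size)
```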

  5. A flower image retrieval method based on ROI feature

    Institute of Scientific and Technical Information of China (English)

    洪安祥; 陈刚; 李均利; 池哲儒; 张亶

    2004-01-01

    Flower image retrieval is a very important step for computer-aided plant species recognition. In this paper, we propose an efficient segmentation method based on color clustering and domain knowledge to extract flower regions from flower images. For flower retrieval, we use the color histogram of a flower region to characterize the color features of a flower and two shape-based feature sets, Centroid-Contour Distance (CCD) and Angle Code Histogram (ACH), to characterize the shape features of a flower contour. Experimental results showed that our flower region extraction method based on color clustering and domain knowledge can produce accurate flower regions. Flower retrieval results on a database of 885 flower images collected from 14 plant species showed that our Region-of-Interest (ROI) based retrieval approach using both color and shape features can perform better than a method based on the global color histogram proposed by Swain and Ballard (1991) and a method based on domain knowledge-driven segmentation and color names proposed by Das et al. (1999).

  6. A flower image retrieval method based on ROI feature

    Institute of Scientific and Technical Information of China (English)

    洪安祥; 陈刚; 李均利; 池哲儒; 张亶

    2004-01-01

    Flower image retrieval is a very important step for computer-aided plant species recognition. In this paper, we propose an efficient segmentation method based on color clustering and domain knowledge to extract flower regions from flower images. For flower retrieval, we use the color histogram of a flower region to characterize the color features of flower and two shape-based features sets, Centroid-Contour Distance (CCD) and Angle Code Histogram (ACH), to characterize the shape features of a flower contour. Experimental results showed that our flower region extraction method based on color clustering and domain knowledge can produce accurate flower regions. Flower retrieval results on a database of 885 flower images collected from 14 plant species showed that our Region-of-Interest (ROI) based retrieval approach using both color and shape features can perform better than a method based on the global color histogram proposed by Swain and Ballard (1991) and a method based on domain knowledge-driven segmentation and color names proposed by Das et al.(1999).

  7. Feature-Based Digital Modulation Recognition Using Compressive Sampling

    Directory of Open Access Journals (Sweden)

    Zhuo Sun

    2016-01-01

    Full Text Available Compressive sensing theory makes it possible to reconstruct a signal from far fewer measurements than are usually considered necessary. In many scenarios, however, such as spectrum detection and modulation recognition, we only need to acquire useful characteristics rather than the original signal, and selecting features with the required sparsity becomes the main challenge. With the aim of digital modulation recognition, this paper constructs two features that can be recovered directly from compressive samples: the spectrum of the received data and its nonlinear transformation, and a compositional feature built from multiple high-order moments of the received data; both have the sparsity required for reconstruction from subsamples. Recognition of multiple frequency shift keying, multiple phase shift keying, and multiple quadrature amplitude modulation is considered and implemented in a unified procedure. Simulation shows that the two identification features work effectively for digital modulation recognition, even at a relatively low signal-to-noise ratio.
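
    The recognition pipeline itself is not reproduced here; the toy sketch below only illustrates the compressive-sampling premise, recovering a sparse feature vector from far fewer random measurements than its length via orthogonal matching pursuit. The dimensions, sparsity level and sensing matrix are arbitrary assumptions.

```python
# Toy sketch: recover a sparse "feature" (e.g. a sparse spectrum-like vector) from
# compressive samples using orthogonal matching pursuit. Not the paper's pipeline.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n, m, k = 256, 64, 5                      # feature length, measurements, sparsity

s = np.zeros(n)
s[rng.choice(n, k, replace=False)] = rng.normal(size=k)      # sparse feature vector
A = rng.normal(size=(m, n)) / np.sqrt(m)                     # random sensing matrix
y = A @ s                                                    # compressive samples

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False).fit(A, y)
print("support recovered:", set(np.flatnonzero(omp.coef_)) == set(np.flatnonzero(s)))
```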

  8. Whispered speaker identification based on feature and model hybrid compensation

    Institute of Scientific and Technical Information of China (English)

    GU Xiaojiang; ZHAO Heming; Lu Gang

    2012-01-01

    In order to increase the short-time whispered speaker recognition rate under variable channel conditions, a hybrid compensation in the model and feature domains is proposed. The method is based on joint factor analysis in the model-training stage: it extracts the speaker factor and eliminates the channel factor by estimating speaker and channel spaces from the training speech. In the test stage, the channel factor of the test speech is projected into the feature space for feature compensation, so channel information is removed in both the model and feature domains to improve the recognition rate. Experimental results show that the hybrid compensation obtains similar recognition rates under three different training channel conditions and is more effective than joint factor analysis alone for short whispered speech.

  9. Remote Sensing Image Feature Extracting Based Multiple Ant Colonies Cooperation

    Directory of Open Access Journals (Sweden)

    Zhang Zhi-long

    2014-02-01

    Full Text Available This paper presents a novel feature extraction method for remote sensing imagery based on the cooperation of multiple ant colonies. First, multiresolution expression of the input remote sensing imagery is created, and two different ant colonies are spread on different resolution images. The ant colony in the low-resolution image uses phase congruency as the inspiration information, whereas that in the high-resolution image uses gradient magnitude. The two ant colonies cooperate to detect features in the image by sharing the same pheromone matrix. Finally, the image features are extracted on the basis of the pheromone matrix threshold. Because a substantial amount of information in the input image is used as inspiration information of the ant colonies, the proposed method shows higher intelligence and acquires more complete and meaningful image features than those of other simple edge detectors.

  10. Analysis of quantitative pore features based on mathematical morphology

    Institute of Scientific and Technical Information of China (English)

    QI Heng-nian; CHEN Feng-nong; WANG Hang-jun

    2008-01-01

    Wood id