WorldWideScience

Sample records for supervised feature selection

  1. Neural Gen Feature Selection for Supervised Learning Classifier

    Directory of Open Access Journals (Sweden)

    Mohammed Hasan Abdulameer

    2014-04-01

    Face recognition has received significant attention, especially in recent years, and many techniques have been developed, such as PSO-SVM and LDA-SVM. However, inefficient features may lead to inadequate recognition results. Hence, a new face recognition system based on a Genetic Algorithm (GA) and the FFBNN technique is proposed. The proposed system first performs feature extraction, and the resulting optimal features are passed on to the recognition process. In the feature extraction stage, the optimal features are extracted from the face image database by the GA together with the FFBNN, and these optimal features are then given to the FFBNN to carry out the training and testing process. The well-trained FFBNN with the optimal features provides the recognition result. The YALE human face dataset is used to analyze the performance of the proposed GA-FFBNN technique, and GA-FFBNN is also compared with standard SVM and PSO-SVM techniques.
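
    As a rough illustration of the GA-plus-neural-network idea above (not the authors' GA-FFBNN pipeline, and using scikit-learn's Wine data rather than the YALE face database), a genetic algorithm can evolve binary feature masks whose fitness is the cross-validated accuracy of a small feed-forward network trained on the selected features:

```python
# Hedged sketch: GA wrapper feature selection scored by a small neural network.
# Population size, crossover/mutation rates, and the dataset are assumptions.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X, y = load_wine(return_X_y=True)
n_features = X.shape[1]

def fitness(mask):
    """Cross-validated accuracy of a feed-forward net on the selected features."""
    if mask.sum() == 0:
        return 0.0
    net = make_pipeline(StandardScaler(),
                        MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0))
    return cross_val_score(net, X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, size=(12, n_features))              # random initial feature masks
for generation in range(10):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:6]]               # keep the fittest half
    children = []
    while len(parents) + len(children) < len(pop):
        a, b = parents[rng.integers(0, len(parents), size=2)]
        cut = rng.integers(1, n_features)                     # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child = np.where(rng.random(n_features) < 0.02, 1 - child, child)  # bit-flip mutation
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected", int(best.sum()), "of", n_features, "features")
```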

  2. Supervised Feature Subset Selection based on Modified Fuzzy Relative Information Measure for classifier Cart

    Directory of Open Access Journals (Sweden)

    K.SAROJINI,

    2010-06-01

    Feature subset selection is an essential task in data mining. This paper presents a new method for supervised feature subset selection based on a Modified Fuzzy Relative Information Measure (MFRIM). First, a discretization algorithm is applied to discretize numeric features and construct the membership functions of each fuzzy set of a feature. Then the proposed MFRIM is applied to select the feature subset, focusing on boundary samples. The proposed method can select a feature subset with a minimum number of features that are relevant for obtaining higher average classification accuracy. Experimental results on UCI datasets show that the proposed algorithm is effective and efficient, selecting subsets with fewer features and higher average classification accuracy than the consistency-based feature subset selection method.

  3. Interactive prostate segmentation using atlas-guided semi-supervised learning and adaptive feature selection

    Energy Technology Data Exchange (ETDEWEB)

    Park, Sang Hyun [Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599 (United States); Gao, Yaozong, E-mail: yzgao@cs.unc.edu [Department of Computer Science, Department of Radiology, and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599 (United States); Shi, Yinghuan, E-mail: syh@nju.edu.cn [State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023 (China); Shen, Dinggang, E-mail: dgshen@med.unc.edu [Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599 and Department of Brain and Cognitive Engineering, Korea University, Seoul 136-713 (Korea, Republic of)

    2014-11-01

    Purpose: Accurate prostate segmentation is necessary for maximizing the effectiveness of radiation therapy of prostate cancer. However, manual segmentation from 3D CT images is very time-consuming and often causes large intra- and interobserver variations across clinicians. Many segmentation methods have been proposed to automate this labor-intensive process, but tedious manual editing is still required due to the limited performance. In this paper, the authors propose a new interactive segmentation method that can (1) flexibly generate the editing result with a few scribbles or dots provided by a clinician, (2) quickly deliver intermediate results to the clinician, and (3) sequentially correct the segmentations from any type of automatic or interactive segmentation methods. Methods: The authors formulate the editing problem as a semisupervised learning problem which can utilize a priori knowledge of training data and also the valuable information from user interactions. Specifically, from a region of interest near the given user interactions, the appropriate training labels, which are well matched with the user interactions, can be locally searched from a training set. With voting from the selected training labels, both confident prostate and background voxels, as well as unconfident voxels, can be estimated. To reflect the informative relationships between voxels, location-adaptive features are selected from the confident voxels by using a regression forest and the Fisher separation criterion. Then, the manifold configuration computed in the derived feature space is enforced into the semisupervised learning algorithm. The labels of unconfident voxels are then predicted by the regularized semisupervised learning algorithm. Results: The proposed interactive segmentation method was applied to correct automatic segmentation results of 30 challenging CT images. The correction was conducted three times with different user interactions performed at different time periods, in order to

  4. Supervised feature selection for linear and non-linear regression of L⁎a⁎b⁎ color from multispectral images of meat

    DEFF Research Database (Denmark)

    Sharifzadeh, Sara; Clemmensen, Line Katrine Harder; Borggaard, Claus

    2014-01-01

    ...... of meat samples (430–970 nm) were used for training and testing of the L⁎a⁎b⁎ prediction models. Finding a sparse solution or the use of a minimum number of bands is of particular interest to make an industrial vision set-up simpler and cost effective. In this paper, a wide range of linear, non-linear, kernel-based regression and sparse regression methods are compared. In order to improve the prediction results of these models, we propose a supervised feature selection strategy which is compared with Principal component analysis (PCA) as a pre-processing step. The results showed that the proposed feature selection method outperforms the PCA for both linear and non-linear methods. The highest performance was obtained by linear ridge regression applied on the selected features from the proposed Elastic net (EN)-based feature selection strategy. All the best models use a reduced number......
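
    The two-stage idea above (sparse band selection, then ridge regression on the surviving bands) can be sketched as follows; the spectra here are synthetic stand-ins rather than the multispectral meat images, and the paper's exact EN-based strategy is not reproduced:

```python
# Illustrative sketch, not the authors' method: ElasticNet keeps a sparse set of
# bands, then ridge regression predicts a colour value from those bands only.
import numpy as np
from sklearn.linear_model import ElasticNetCV, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 60))                    # 200 samples x 60 spectral bands (assumed)
true_coef = np.zeros(60)
true_coef[[5, 17, 42]] = [2.0, -1.5, 1.0]
y = X @ true_coef + 0.1 * rng.normal(size=200)    # e.g. one L*a*b* channel

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

en = ElasticNetCV(l1_ratio=0.5, cv=5).fit(X_tr, y_tr)
selected = np.flatnonzero(np.abs(en.coef_) > 1e-6)   # bands retained by the sparse model

ridge = Ridge(alpha=1.0).fit(X_tr[:, selected], y_tr)
print("bands kept:", selected,
      " test R^2:", round(ridge.score(X_te[:, selected], y_te), 3))
```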

  5. Linked Social Media Data Based Semi-Supervised Feature Selection Method

    Institute of Scientific and Technical Information of China (English)

    王亦兵; 潘志松; 吴君青; 贾波; 胡谷雨

    2014-01-01

    Social media networks produce massive amounts of high-dimensional, unlabeled data, which poses tremendous challenges for data processing; moreover, the link graph formed between data samples is difficult to exploit effectively in existing pattern recognition algorithms. Motivated by this, the paper mines the link graph of social media data and, combining it with limited supervised information, proposes a semi-supervised feature selection method based on link relations (SSLFS). Using spectral analysis and a sparsity constraint, SSLFS selects feature subsets that preserve the local manifold structure and sparsity of the original data. Experimental results on the Flickr social media dataset show that the feature subsets selected by SSLFS yield significantly better classification performance than those obtained by other feature selection methods.
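
    A minimal, hedged illustration of the spectral-analysis ingredient (a Laplacian-score-style ranking on a k-NN similarity graph) is shown below. It is not the authors' SSLFS algorithm, which additionally uses partial label information and a sparsity constraint, and the graph size and dataset are assumptions:

```python
# Generic graph-based feature score (Laplacian score): features that vary
# smoothly over a k-NN similarity graph (small score) are preferred.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.neighbors import kneighbors_graph
from sklearn.preprocessing import StandardScaler

X, _ = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)

W = kneighbors_graph(X, n_neighbors=10, mode="connectivity", include_self=False)
W = (0.5 * (W + W.T)).toarray()                  # symmetrise the adjacency matrix
D = np.diag(W.sum(axis=1))
L = D - W                                        # unnormalised graph Laplacian

scores = []
for j in range(X.shape[1]):
    f = X[:, j]
    f = f - (f @ D.sum(axis=1)) / D.sum()        # remove the degree-weighted mean
    scores.append((f @ L @ f) / (f @ D @ f))     # Laplacian score (lower = smoother)

print("feature ranking (best first):", np.argsort(scores))
```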

  6. Scale selection for supervised image segmentation

    DEFF Research Database (Denmark)

    Li, Yan; Tax, David M J; Loog, Marco

    2012-01-01

    Finding the right scales for feature extraction is crucial for supervised image segmentation based on pixel classification. There are many scale selection methods in the literature; among them the one proposed by Lindeberg is widely used for image structures such as blobs, edges and ridges. Those...... schemes are usually unsupervised, as they do not take into account the actual segmentation problem at hand. In this paper, we consider the problem of selecting scales, which aims at an optimal discrimination between user-defined classes in the segmentation. We show the deficiency of the classical...... our approach back to Lindeberg's original proposal. In the experiments, the max rule is applied to artificial and real-world image segmentation tasks, which is shown to choose the right scales for different problems and lead to better segmentation results. © 2012 Elsevier B.V....

  7. Feature Selection for Data Classification Based on PLS Supervised Feature Extraction and False Nearest Neighbors

    Institute of Scientific and Technical Information of China (English)

    颜克胜; 李太福; 魏正元; 苏盈盈; 姚立忠

    2012-01-01

    In high-dimensional data classification, multicollinearity, redundant features, and noise often lead to low classifier recognition accuracy and high time and space overhead. A feature selection method combining partial least squares (PLS) supervised feature extraction and false nearest neighbors (FNN) is proposed. First, PLS is used to extract the principal components of the high-dimensional data, overcoming the multicollinearity among the original features and yielding an independent principal-component space that carries supervision information. Then, a similarity measure based on FNN is established by computing the correlation in this space before and after each feature is removed, which produces a ranking of the original features according to their ability to explain the dependent variable. Finally, the features with the weakest explanatory ability are removed in turn to construct a series of classification models; the recognition rate of a Support Vector Machine (SVM) is used as the evaluation criterion to find the model with the highest recognition rate and the fewest features, whose features form the best feature subset. A series of experiments on different data models shows that the method reliably selects the best feature subset, consistent with the intrinsic classification features of the data, and thus provides a new approach to feature selection for data classification.
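
    A simplified sketch of the overall loop is given below: PLS provides a supervised ranking of the original features, and progressively larger subsets are scored by SVM cross-validation. The FNN-based similarity measure of the paper is not reproduced; ranking by the magnitude of the PLS regression weights is an assumption made for brevity, and the dataset is a scikit-learn stand-in:

```python
# Hedged sketch of the PLS-then-SVM evaluation loop (not the paper's algorithm).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

pls = PLSRegression(n_components=5).fit(X, y)          # supervised latent components
ranking = np.argsort(-np.abs(pls.coef_).ravel())       # most explanatory features first

# Keep the subset with the highest SVM cross-validation accuracy and the fewest features.
best_k, best_acc = X.shape[1], 0.0
for k in range(5, X.shape[1] + 1, 5):
    acc = cross_val_score(SVC(), X[:, ranking[:k]], y, cv=5).mean()
    if acc > best_acc:
        best_k, best_acc = k, acc
print(f"best subset size: {best_k}, CV accuracy: {best_acc:.3f}")
```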

  8. Feature-space transformation improves supervised segmentation across scanners

    DEFF Research Database (Denmark)

    van Opbroek, Annegreet; Achterberg, Hakim C.; de Bruijne, Marleen

    2015-01-01

    Image-segmentation techniques based on supervised classification generally perform well on the condition that training and test samples have the same feature distribution. However, if training and test images are acquired with different scanners or scanning parameters, their feature distributions...

  9. Discovery of feature-based hot spots using supervised clustering

    Science.gov (United States)

    Ding, Wei; Stepinski, Tomasz F.; Parmar, Rachana; Jiang, Dan; Eick, Christoph F.

    2009-07-01

    Feature-based hot spots are localized regions where the attributes of objects attain high values. There is considerable interest in automatic identification of feature-based hot spots. This paper approaches the problem of finding feature-based hot spots from a data mining perspective, and describes a method that relies on supervised clustering to produce a list of hot spot regions. Supervised clustering uses a fitness function rewarding isolation of the hot spots to optimally subdivide the dataset. The clusters in the optimal division are ranked using an interestingness measure that encapsulates their utility as hot spots. Hot spots are associated with the top ranked clusters. The effectiveness of supervised clustering as a hot spot identification method is evaluated for four conceptually different clustering algorithms using a dataset describing the spatial distribution of ground ice on Mars. Clustering solutions are visualized by specially developed raster approximations. Further assessment of the ability of different algorithms to yield hot spots is performed using raster approximations. The density-based clustering algorithm is found to be the most effective for hot spot identification. The results of the hot spot discovery by supervised clustering are comparable to those obtained using the G* statistic, but the new method offers a high degree of automation, making it an ideal tool for mining large datasets for the existence of potential hot spots.

  10. Unsupervised Feature Subset Selection

    DEFF Research Database (Denmark)

    Søndberg-Madsen, Nicolaj; Thomsen, C.; Pena, Jose

    2003-01-01

    This paper studies filter and hybrid filter-wrapper feature subset selection for unsupervised learning (data clustering). We constrain the search for the best feature subset by scoring the dependence of every feature on the rest of the features, conjecturing that these scores discriminate some...... irrelevant features. We report experimental results on artificial and real data for unsupervised learning of naive Bayes models. Both the filter and hybrid approaches perform satisfactorily....

  11. Integrated Financial Supervision: Experiences in Selected Countries

    OpenAIRE

    Edgardo Demaestri; Diego Sourrouille

    2003-01-01

    This paper represents one of the first comparative analyses of experiences of integrated supervision. It discusses how several countries around the world have developed the processes of integrating financial regulation and supervision, and covers numerous relevant technical issues as well as the policy options. It describes the scope of the activities, institutions, responsibilities, and regulatory powers that integrated supervisors are expected to cover. Issues related to the organizational ...

  12. [RVM supervised feature extraction and Seyfert spectra classification].

    Science.gov (United States)

    Li, Xiang-Ru; Hu, Zhan-Yi; Zhao, Yong-Heng; Li, Xiao-Ming

    2009-06-01

    With recent technological advances in wide field survey astronomy and the implementation of several large-scale astronomical survey proposals (e.g. SDSS, 2dF and LAMOST), celestial spectra are becoming very abundant and rich. Therefore, research on automated classification methods based on celestial spectra has been attracting more and more attention in recent years. Feature extraction is a fundamental problem in automated spectral classification, which not only influences the difficulty and complexity of the problem, but also determines the performance of the designed classifying system. The available methods of feature extraction for spectra classification are usually unsupervised, e.g. principal components analysis (PCA), wavelet transform (WT), artificial neural networks (ANN) and Rough Set theory. These methods extract features not by their capability to classify spectra, but by some kind of power to approximate the original celestial spectra. Therefore, the features extracted by these methods are usually not the best ones for classification. In the present work, the authors point out the necessity of investigating supervised feature extraction by analyzing the characteristics of the spectra classification research in the available literature and the limitations of unsupervised feature extraction methods. The authors also study supervised feature extraction based on the relevance vector machine (RVM) and its application in Seyfert spectra classification. RVM is a recently introduced method based on Bayesian methodology, automatic relevance determination (ARD), regularization techniques and a hierarchical prior structure. With this method, one can easily fuse the information in the training data with prior knowledge and beliefs about the problem. RVM can effectively extract features and reduce the data based on classifying capability. Extensive experiments show its superior performance in dimensional reduction and feature extraction for Seyfert

  13. Feature selection in bioinformatics

    Science.gov (United States)

    Wang, Lipo

    2012-06-01

    In bioinformatics, there are often a large number of input features. For example, there are millions of single nucleotide polymorphisms (SNPs) that are genetic variations which determine the difference between any two unrelated individuals. In microarrays, thousands of genes can be profiled in each test. It is important to find out which input features (e.g., SNPs or genes) are useful in classification of a certain group of people or diagnosis of a given disease. In this paper, we investigate some powerful feature selection techniques and apply them to problems in bioinformatics. We are able to identify a very small number of input features sufficient for tasks at hand and we demonstrate this with some real-world data.

  14. Online feature selection with streaming features.

    Science.gov (United States)

    Wu, Xindong; Yu, Kui; Ding, Wei; Wang, Hao; Zhu, Xingquan

    2013-05-01

    We propose a new online feature selection framework for applications with streaming features where the knowledge of the full feature space is unknown in advance. We define streaming features as features that flow in one by one over time whereas the number of training examples remains fixed. This is in contrast with traditional online learning methods that only deal with sequentially added observations, with little attention being paid to streaming features. The critical challenges for Online Streaming Feature Selection (OSFS) include 1) the continuous growth of feature volumes over time, 2) a large feature space, possibly of unknown or infinite size, and 3) the unavailability of the entire feature set before learning starts. In the paper, we present a novel Online Streaming Feature Selection method to select strongly relevant and nonredundant features on the fly. An efficient Fast-OSFS algorithm is proposed to improve feature selection performance. The proposed algorithms are evaluated extensively on high-dimensional datasets and also with a real-world case study on impact crater detection. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.
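
    The relevance-then-redundancy flow for streaming features can be sketched as below; this is a conceptual illustration rather than the OSFS/Fast-OSFS algorithms themselves, and the mutual-information thresholds are arbitrary assumptions:

```python
# Conceptual sketch of streaming feature selection: features arrive one at a
# time; keep a feature if it is relevant to the label, then drop previously
# kept features that the new one makes largely redundant.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

X, y = load_breast_cancer(return_X_y=True)
kept = []

for j in range(X.shape[1]):                                  # features "flow in" one by one
    f = X[:, [j]]
    relevance = mutual_info_classif(f, y, random_state=0)[0]
    if relevance < 0.05:                                     # assumed relevance threshold
        continue
    # Redundancy check (assumed threshold): drop kept features the new one explains well.
    kept = [k for k in kept
            if mutual_info_regression(f, X[:, k], random_state=0)[0] < 0.5]
    kept.append(j)

print("features kept online:", kept)
```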

  15. The Importance of Feature Selection in Classification

    Directory of Open Access Journals (Sweden)

    Mrs.K. Moni Sushma Deep

    2014-01-01

    Feature selection is an important technique for classification that reduces the dimensionality of the feature space and removes redundant, irrelevant, or noisy data. In this paper the features are selected based on ranking methods: (1) Information Gain (IG) attribute evaluation, (2) Gain Ratio (GR) attribute evaluation, and (3) Symmetrical Uncertainty (SU) attribute evaluation. The paper evaluates the features derived from the three methods using the supervised learning algorithms K-Nearest Neighbor and Naive Bayes. The measures used for the classifiers are true positives, false positives, and accuracy, and the algorithms are compared on these measures in the experimental results. Two datasets, Pima and Wine, taken from the UCI Repository database, are used.
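
    The ranking-then-classify pipeline described above can be sketched in a few lines; this is an illustrative reconstruction rather than the authors' code, using scikit-learn's built-in Wine data in place of the UCI files and mutual information as a stand-in for the IG/GR/SU scores:

```python
# Rank features with an information-gain-style criterion, keep the top-k,
# then compare k-NN and Naive Bayes on the reduced data.
from sklearn.datasets import load_wine
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
X_top = SelectKBest(mutual_info_classif, k=6).fit_transform(X, y)   # keep 6 top-ranked features

for name, clf in [("k-NN", KNeighborsClassifier(5)), ("Naive Bayes", GaussianNB())]:
    acc = cross_val_score(clf, X_top, y, cv=5).mean()
    print(f"{name}: {acc:.3f}")
```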

  16. SLEAS: Supervised Learning using Entropy as Attribute Selection Measure

    Directory of Open Access Journals (Sweden)

    Kishor Kumar Reddy C

    2014-10-01

    There is emerging importance in scaling up the broadly used decision tree learning algorithms to huge datasets. Although numerous methodologies have been proposed, a fast tree-growing algorithm without a substantial decrease in accuracy or a substantial increase in space complexity is still needed. This paper aims at improving the performance of the SLIQ (Supervised Learning In Quest) decision tree algorithm for classification in data mining. In the present research, we adopt entropy as the attribute selection measure, which overcomes the problems faced with the Gini index. The classification accuracy of the proposed Supervised Learning using Entropy as Attribute Selection measure (SLEAS) algorithm is compared with the existing SLIQ algorithm using twelve datasets taken from the UCI Machine Learning Repository, and the results show that SLEAS outperforms the SLIQ decision tree. Further, the error rate is also computed, and the results clearly show that the SLEAS algorithm gives a lower error rate than the SLIQ decision tree.
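
    The two attribute-selection measures at issue can be written out directly; the snippet below compares them on a toy class distribution and, as a rough proxy for the SLIQ/SLEAS comparison (which it does not re-implement), trains the same scikit-learn decision tree with each criterion:

```python
# Gini index vs. entropy as attribute-selection measures (illustrative only).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def gini(p):
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p = np.array([0.7, 0.2, 0.1])             # class proportions at a candidate split node
print("Gini:", round(gini(p), 3), " entropy:", round(entropy(p), 3))

X, y = load_breast_cancer(return_X_y=True)
for criterion in ("gini", "entropy"):      # same tree grower, different selection measure
    acc = cross_val_score(DecisionTreeClassifier(criterion=criterion, random_state=0),
                          X, y, cv=5).mean()
    print(criterion, "accuracy:", round(acc, 3))
```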

  17. Feature Selection: Algorithms and Challenges

    Institute of Scientific and Technical Information of China (English)

    Xindong Wu; Yanglan Gan; Hao Wang; Xuegang Hu

    2006-01-01

    Feature selection is an active area in data mining research and development. It consists of efforts and contributions from a wide variety of communities, including statistics, machine learning, and pattern recognition. The diversity, on one hand, equips us with many methods and tools. On the other hand, the profusion of options causes confusion. This paper reviews various feature selection methods and identifies research challenges that are at the forefront of this exciting area.

  18. Supervised classification of solar features using prior information

    Directory of Open Access Journals (Sweden)

    De Visscher Ruben

    2015-01-01

    Context: The Sun as seen by Extreme Ultraviolet (EUV) telescopes exhibits a variety of large-scale structures. Of particular interest for space-weather applications is the extraction of active regions (AR) and coronal holes (CH). The next generation of GOES-R satellites will provide continuous monitoring of the solar corona in six EUV bandpasses that are similar to the ones provided by the SDO-AIA EUV telescope since May 2010. Supervised segmentations of EUV images that are consistent with manual segmentations by, for example, space-weather forecasters help in extracting useful information from the raw data. Aims: We present a supervised segmentation method based on the Maximum A Posteriori rule. Our method allows integrating both manually segmented images and other types of information. It is applied to SDO-AIA images to segment them into AR, CH, and the remaining Quiet Sun (QS) part. Methods: A Bayesian classifier is applied to training masks provided by the user. The noise structure in EUV images is non-trivial, which suggests the use of a non-parametric kernel density estimator to fit the intensity distribution within each class. Under the Naive Bayes assumption we can add information such as the latitude distribution and total coverage of each class in a consistent manner. This information can be prescribed by an expert or estimated with an Expectation-Maximization algorithm. Results: The segmentation masks are in line with the training masks given as input and show consistency over time. Introducing additional information besides pixel intensity improves the quality of the final segmentation. Conclusions: Such a tool can aid in building automated segmentations that are consistent with a 'ground truth' defined by the users.
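
    The core of the method, a per-class kernel density estimate combined with prior information under a Naive Bayes/MAP rule, can be sketched on one-dimensional synthetic intensities; the class means, bandwidth, and priors below are assumptions, not values from the paper:

```python
# MAP classification with per-class kernel density estimates and class priors.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
train = {"CH": rng.normal(10, 3, 500),            # synthetic intensities per class (assumed)
         "QS": rng.normal(40, 8, 2000),
         "AR": rng.normal(120, 25, 300)}
priors = {c: len(v) / sum(len(v) for v in train.values()) for c, v in train.items()}
kdes = {c: KernelDensity(bandwidth=3.0).fit(v.reshape(-1, 1)) for c, v in train.items()}

def classify(intensities):
    x = np.asarray(intensities, dtype=float).reshape(-1, 1)
    log_post = np.column_stack([kdes[c].score_samples(x) + np.log(priors[c])
                                for c in train])  # log p(x|c) + log p(c)
    return np.array(list(train))[np.argmax(log_post, axis=1)]

print(classify([8, 35, 150]))   # expected: CH, QS, AR
```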

  19. Adaptive feature selection for hyperspectral data analysis

    Science.gov (United States)

    Korycinski, Donna; Crawford, Melba M.; Barnes, J. Wesley

    2004-02-01

    Hyperspectral data can potentially provide greatly improved capability for discrimination between many land cover types, but new methods are required to process these data and extract the required information. Data sets are extremely large, and the data are not well distributed across these high dimensional spaces. The increased number and resolution of spectral bands, many of which are highly correlated, are problematic for supervised statistical classification techniques when the number of training samples is small relative to the dimension of the input vector. Selection of the most relevant subset of features is one means of mitigating these effects. A new algorithm based on the tabu search metaheuristic optimization technique was developed to perform subset feature selection and implemented within a binary hierarchical tree framework. Results obtained using the new approach were compared to those from a common greedy selection technique and to a Fisher discriminant based feature extraction method, both of which were implemented in the same binary hierarchical tree classification scheme. The tabu search based method generally yielded higher classification accuracies with lower variability than these other methods in experiments using hyperspectral data acquired by the EO-1 Hyperion sensor over the Okavango Delta of Botswana.

  20. A New Feature Selection Method for Text Clustering

    Institute of Scientific and Technical Information of China (English)

    XU Junling; XU Baowen; ZHANG Weifeng; CUI Zifeng; ZHANG Wei

    2007-01-01

    Feature selection methods have been successfully applied to text categorization but seldom applied to text clustering due to the unavailability of class label information. In this paper, a new feature selection method for text clustering based on expectation maximization and cluster validity is proposed. It applies a supervised feature selection method to the intermediate clustering result generated during iterative clustering; meanwhile, the Davies-Bouldin index is used to evaluate the intermediate feature subsets indirectly. Feature subsets are then selected according to the curve of the Davies-Bouldin index. Experiments are carried out on several popular datasets, and the results show the advantages of the proposed method.
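
    The loop described above, supervised feature selection on intermediate cluster labels checked against a cluster-validity index, can be sketched as follows; the corpus, chi-square selector, and subset sizes are assumptions (the snippet downloads the 20 Newsgroups data on first use):

```python
# Cluster, treat the intermediate cluster labels as pseudo-classes for a
# supervised selector, and use the Davies-Bouldin index to compare subsets.
from sklearn.cluster import KMeans
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import davies_bouldin_score

docs = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"]).data
X = TfidfVectorizer(max_features=2000, stop_words="english").fit_transform(docs)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)   # intermediate clustering

for k in (50, 200, 800):                                    # candidate feature-subset sizes
    X_k = SelectKBest(chi2, k=k).fit_transform(X, labels)   # supervised FS on pseudo-labels
    db = davies_bouldin_score(X_k.toarray(), labels)        # lower = better separated clusters
    print(f"k={k:4d}  Davies-Bouldin index: {db:.3f}")
```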

  1. Automatic feature selection for model-based reinforcement learning in factored MDPs

    NARCIS (Netherlands)

    Kroon, M.; Whiteson, S.; Wani, M.A.; Kantardzic, M.; Palade, V.; Kurgan, L.; Qi, A.

    2009-01-01

    Feature selection is an important challenge in machine learning. Unfortunately, most methods for automating feature selection are designed for supervised learning tasks and are thus either inapplicable or impractical for reinforcement learning. This paper presents a new approach to feature selection

  2. Fast Localization in Large-Scale Environments Using Supervised Indexing of Binary Features.

    Science.gov (United States)

    Youji Feng; Lixin Fan; Yihong Wu

    2016-01-01

    The essence of image-based localization lies in matching 2D key points in the query image and 3D points in the database. State-of-the-art methods mostly employ sophisticated key point detectors and feature descriptors, e.g., Difference of Gaussian (DoG) and Scale Invariant Feature Transform (SIFT), to ensure robust matching. While a high registration rate is attained, the registration speed is impeded by the expensive key point detection and the descriptor extraction. In this paper, we propose to use efficient key point detectors along with binary feature descriptors, since the extraction of such binary features is extremely fast. The naive usage of binary features, however, does not lend itself to significant speedup of localization, since existing indexing approaches, such as hierarchical clustering trees and locality sensitive hashing, are not efficient enough in indexing binary features and matching binary features turns out to be much slower than matching SIFT features. To overcome this, we propose a much more efficient indexing approach for approximate nearest neighbor search of binary features. This approach resorts to randomized trees that are constructed in a supervised training process by exploiting the label information derived from the fact that multiple features correspond to a common 3D point. In the tree construction process, node tests are selected in a way such that trees have uniform leaf sizes and low error rates, which are two desired properties for efficient approximate nearest neighbor search. To further improve the search efficiency, a probabilistic priority search strategy is adopted. Apart from the label information, this strategy also uses non-binary pixel intensity differences available in descriptor extraction. By using the proposed indexing approach, matching binary features is no longer much slower but slightly faster than matching SIFT features. Consequently, the overall localization speed is significantly improved due to the much faster key

  3. Enhanced features for supervised lecture video segmentation and indexing

    Science.gov (United States)

    Ma, Di; Agam, Gady

    2015-03-01

    Lecture videos are common and their number is increasing rapidly. Consequently, automatically and efficiently indexing such videos is an important task. Video segmentation is a crucial step of video indexing that directly affects the indexing quality. We are developing a system for automated video indexing, and in this paper we discuss our approach to video segmentation and classification of video segments. The novel contributions in this paper are twofold. First, we develop a dynamic Gabor filter and use it to extract features for video frame classification. Second, we propose a recursive video segmentation algorithm that is capable of clustering video frames into video segments. We then use these to classify and index the video segments. The proposed approach achieves a higher true positive rate (TPR = 89.5%) and a lower false discovery rate (FDR = 11.2%) compared with a commercial system (TPR = 81.8%, FDR = 39.4%), demonstrating that the performance is significantly improved by using the enhanced features.

  4. Feature Selection and Effective Classifiers.

    Science.gov (United States)

    Deogun, Jitender S.; Choubey, Suresh K.; Raghavan, Vijay V.; Sever, Hayri

    1998-01-01

    Develops and analyzes four algorithms for feature selection in the context of rough set methodology. Experimental results confirm the expected relationship between the time complexity of these algorithms and the classification accuracy of the resulting upper classifiers. When compared, results of upper classifiers perform better than lower…

  5. Feature selection for portfolio optimization

    DEFF Research Database (Denmark)

    Bjerring, Thomas Trier; Ross, Omri; Weissensteiner, Alex

    2016-01-01

    Most portfolio selection rules based on the sample mean and covariance matrix perform poorly out-of-sample. Moreover, there is a growing body of evidence that such optimization rules are not able to beat simple rules of thumb, such as 1/N. Parameter uncertainty has been identified as one major...... reason for these findings. A strand of literature addresses this problem by improving the parameter estimation and/or by relying on more robust portfolio selection methods. Independent of the chosen portfolio selection rule, we propose using feature selection first in order to reduce the asset menu....... While most of the diversification benefits are preserved, the parameter estimation problem is alleviated. We conduct out-of-sample back-tests to show that in most cases different well-established portfolio selection rules applied on the reduced asset universe are able to improve alpha relative...

  7. Feature Selection in Scientific Applications

    Energy Technology Data Exchange (ETDEWEB)

    Cantu-Paz, E; Newsam, S; Kamath, C

    2004-02-27

    Numerous applications of data mining to scientific data involve the induction of a classification model. In many cases, the collection of data is not performed with this task in mind, and therefore, the data might contain irrelevant or redundant features that affect negatively the accuracy of the induction algorithms. The size and dimensionality of typical scientific data make it difficult to use any available domain information to identify features that discriminate between the classes of interest. Similarly, exploratory data analysis techniques have limitations on the amount and dimensionality of the data that can be effectively processed. In this paper, we describe applications of efficient feature selection methods to data sets from astronomy, plasma physics, and remote sensing. We use variations of recently proposed filter methods as well as traditional wrapper approaches where practical. We discuss the importance of these applications, the general challenges of feature selection in scientific datasets, the strategies for success that were common among our diverse applications, and the lessons learned in solving these problems.

  8. Supervised feature evaluation by consistency analysis: application to measure sets used to characterise geographic objects

    CERN Document Server

    Taillandier, Patrick

    2012-01-01

    Nowadays, supervised learning is commonly used in many domains. Indeed, many works propose to learn new knowledge from examples that translate the expected behaviour of the considered system. A key issue of supervised learning concerns the description language used to represent the examples. In this paper, we propose a method to evaluate the feature set used to describe them. Our method is based on the computation of the consistency of the example base. We carried out a case study in the domain of geomatics in order to evaluate the sets of measures used to characterise geographic objects. The case study shows that our method gives relevant evaluations of measure sets.

  9. Fatigue Level Estimation of Bill Based on Acoustic Signal Feature by Supervised SOM

    Science.gov (United States)

    Teranishi, Masaru; Omatu, Sigeru; Kosaka, Toshihisa

    Fatigued bills have a harmful influence on the daily operation of Automated Teller Machines (ATMs). To make fatigued-bill classification more efficient, the development of an automatic classification method is desired. We propose a new method to estimate the bending rigidity of a bill from an acoustic signal feature of banking machines. The estimated bending rigidities are used as a continuous fatigue level for the classification of fatigued bills. Using a supervised Self-Organizing Map (supervised SOM), we effectively estimate the bending rigidity from the acoustic energy pattern alone. Experimental results with real bill samples show the effectiveness of the proposed method.

  10. A New Heuristic for Feature Selection by Consistent Biclustering

    CERN Document Server

    Mucherino, Antonio

    2010-01-01

    Given a set of data, biclustering aims at finding simultaneous partitions in biclusters of its samples and of the features which are used for representing the samples. Consistent biclusterings allow one to obtain correct classifications of the samples from the known classification of the features, and vice versa, and they are very useful for performing supervised classifications. The problem of finding consistent biclusterings can be seen as a feature selection problem, where the features that are not relevant for classification purposes are removed from the set of data, while the total number of features is maximized in order to preserve information. This feature selection problem can be formulated as a linear fractional 0-1 optimization problem. We propose a reformulation of this problem as a bilevel optimization problem, and we present a heuristic algorithm for an efficient solution of the reformulated problem. Computational experiments show that the presented algorithm is able to find better solutions with re...

  11. CBFS: high performance feature selection algorithm based on feature clearness.

    Directory of Open Access Journals (Sweden)

    Minseok Seo

    BACKGROUND: The goal of feature selection is to select useful features and simultaneously exclude garbage features from a given dataset for classification purposes. This is expected to reduce processing time and improve classification accuracy. METHODOLOGY: In this study, we devised a new feature selection algorithm (CBFS) based on the clearness of features. Feature clearness expresses the separability among classes in a feature. Highly clear features contribute towards obtaining high classification accuracy. CScore is a measure that scores the clearness of each feature, based on how well samples cluster around the class centroids in that feature. We also suggest combining CBFS with other algorithms to improve classification accuracy. CONCLUSIONS/SIGNIFICANCE: From the experiments we confirm that CBFS outperforms up-to-date feature selection algorithms, including FeaLect. CBFS can be applied to microarray gene selection, text categorization, and image classification.
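
    A hedged approximation of a "clearness"-style score is shown below: for each feature, the spread between class centroids is compared with the spread of samples around their own centroid. The exact CScore definition in the paper may differ:

```python
# Per-feature class-separability ("clearness"-style) score; an approximation,
# not the published CScore formula.
import numpy as np
from sklearn.datasets import load_wine

X, y = load_wine(return_X_y=True)
classes = np.unique(y)

def clearness(f, y):
    centroids = np.array([f[y == c].mean() for c in classes])
    within = np.mean([np.abs(f[y == c] - centroids[i]).mean()
                      for i, c in enumerate(classes)])
    between = np.abs(centroids[:, None] - centroids[None, :]).mean()
    return between / (within + 1e-12)            # higher = clearer class separation

scores = np.array([clearness(X[:, j], y) for j in range(X.shape[1])])
print("features ranked by clearness:", np.argsort(-scores))
```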

  12. Feature selection with the image grand tour

    Science.gov (United States)

    Marchette, David J.; Solka, Jeffrey L.

    2000-08-01

    The grand tour is a method for visualizing high dimensional data by presenting the user with a set of projections and the projected data. This idea was extended to multispectral images by viewing each pixel as a multidimensional value, and viewing the projections of the grand tour as an image. The user then looks for projections which provide a useful interpretation of the image, for example, separating targets from clutter. We discuss a modification of this which allows the user to select convolution kernels which provide useful discriminant ability, either in an unsupervised manner, as in the image grand tour, or in a supervised manner using training data. This approach is extended to other window-based features. For example, one can define a generalization of the median filter as a linear combination of the order statistics within a window. Thus the median filter is the projection containing zeros everywhere except for the middle value, which contains a one. Using the convolution grand tour one can select projections on these order statistics to obtain new nonlinear filters.
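
    The generalized median filter mentioned above, a linear combination of the order statistics within a window, can be written out directly; the weight vector with a single one in the middle recovers the ordinary median filter (checked against SciPy below), while other weights, such as the mid-range example, give new nonlinear filters:

```python
# Sliding-window filter whose output is a linear combination of the window's
# order statistics; the image and weights below are illustrative.
import numpy as np
from scipy.ndimage import median_filter

def order_statistic_filter(image, weights, size=3):
    """Apply a linear combination of sorted window values at every pixel."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + size, j:j + size].ravel()
            out[i, j] = np.sort(window) @ weights   # projection onto order statistics
    return out

img = np.random.default_rng(0).integers(0, 255, size=(32, 32)).astype(float)
median_weights = np.zeros(9); median_weights[4] = 1.0        # middle order statistic
midrange_weights = np.zeros(9); midrange_weights[[0, 8]] = 0.5

print("median filter matches scipy:",
      np.allclose(order_statistic_filter(img, median_weights),
                  median_filter(img, size=3, mode="nearest")))
print("mid-range filtered pixel [0, 0]:", order_statistic_filter(img, midrange_weights)[0, 0])
```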

  13. SELECTED FEATURES OF POLISH FARMERS

    Directory of Open Access Journals (Sweden)

    Grzegorz Spychalski

    2013-12-01

    The paper presents results of research carried out among farm owners in the Wielkopolskie voivodeship concerning selected features of social capital. The author identifies and estimates the impact of some socio-professional factors on social capital quality and draws statistical conclusions. The result is a list of economic policy measures facilitating rural area development in this respect. The level of education, civic activity and the tendency for collective activity are the main determinants of social capital quality in Polish rural areas.

  15. THE FEATURE SUBSET SELECTION ALGORITHM

    Institute of Scientific and Technical Information of China (English)

    Liu Yongguo; Li Xueming; Wu Zhongfu

    2003-01-01

    The motivation of data mining is to extract effective information from the huge amounts of data in very large databases. However, such databases generally include redundant and irrelevant attributes, which result in low performance and high computing complexity. Feature Subset Selection (FSS) therefore becomes an important issue in the field of data mining. In this letter, an FSS model based on the filter approach is built, which uses a simulated annealing genetic algorithm. Experimental results show that convergence and stability of this algorithm are adequately achieved.

  16. Partial imputation to improve predictive modelling in insurance risk classification using a hybrid positive selection algorithm and correlation-based feature selection

    CSIR Research Space (South Africa)

    Duma, M

    2013-09-01

    We propose a hybrid missing data imputation technique using positive selection and correlation-based feature selection for insurance data. The hybrid is used to help supervised learning methods improve their classification accuracy and resilience...

  17. Leveraging Large Data with Weak Supervision for Joint Feature and Opinion Word Extraction

    Institute of Scientific and Technical Information of China (English)

    房磊; 刘彪; 黄民烈

    2015-01-01

    Product feature and opinion word extraction is very important for fine-grained sentiment analysis. In this paper, we leverage large-scale unlabeled data for joint extraction of feature and opinion words under a knowledge-poor setting, in which only a few feature-opinion pairs are utilized as weak supervision. Our major contributions are twofold: first, we propose a data-driven approach to represent product features and opinion words as a list of corpus-level syntactic relations, which captures rich language structures; second, we build a simple yet robust unsupervised model with prior knowledge incorporated to extract new feature and opinion words, which obtains high performance robustly. The extraction process is based upon a bootstrapping framework which, to some extent, reduces error propagation under large data. Experimental results under various settings compared with state-of-the-art baselines demonstrate that our method is effective and promising.

  18. Rough set-based feature selection method

    Institute of Scientific and Technical Information of China (English)

    ZHAN Yanmei; ZENG Xiangyang; SUN Jincai

    2005-01-01

    A new feature selection method based on the discernibility matrix of rough set theory is proposed in this paper. The main idea of this method is that the most effective feature, if used for classification, can distinguish the largest number of samples belonging to different classes. Experiments are performed using this method to select relevant features for artificial datasets and real-world datasets. Results show that the proposed selection method can correctly select all the relevant features of artificial datasets and drastically reduce the number of features at the same time. In addition, when this method is used for the selection of classification features of real-world underwater targets, the number of classification features after selection drops to 20% of the original feature set, and the classification accuracy increases by about 6% using the dataset after feature selection.
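
    The idea that a good feature "distinguishes the largest number of samples belonging to different classes" is the standard discernibility-matrix construction in rough set theory; a compact greedy version on a toy decision table is sketched below (the paper's exact algorithm may differ):

```python
# Greedy reduct over a discernibility matrix: repeatedly keep the feature that
# discerns the most still-undiscerned pairs of samples from different classes.
import numpy as np

# Toy discrete decision table: rows are samples, columns are features.
X = np.array([[0, 1, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 1],
              [0, 1, 0, 1]])
y = np.array([0, 0, 1, 1, 1])

# Pairs of samples from different classes, and the features that discern each pair.
pairs = [(i, j) for i in range(len(y)) for j in range(i + 1, len(y)) if y[i] != y[j]]
discerns = {p: set(np.flatnonzero(X[p[0]] != X[p[1]])) for p in pairs}

selected, uncovered = [], set(pairs)
while uncovered:
    best = max(range(X.shape[1]),
               key=lambda f: sum(f in discerns[p] for p in uncovered))
    selected.append(best)
    uncovered = {p for p in uncovered if best not in discerns[p]}

print("selected feature indices:", selected)
```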

  19. Feature Extraction Using Supervised Independent Component Analysis by Maximizing Class Distance

    Science.gov (United States)

    Sakaguchi, Yoshinori; Ozawa, Seiichi; Kotani, Manabu

    Recently, Independent Component Analysis (ICA) has been applied not only to problems of blind signal separation, but also to feature extraction of patterns. However, the effectiveness of pattern features extracted by conventional ICA algorithms depends on the pattern set; that is, on how patterns are distributed in the feature space. As one of the reasons, we have pointed out that ICA features are obtained by increasing only their independence even if the class information is available. In this context, we can expect that higher-performance features can be obtained by introducing the class information into conventional ICA algorithms. In this paper, we propose a supervised ICA (SICA) that maximizes the Mahalanobis distance between features of different classes as well as their independence. In the first experiment, two-dimensional artificial data are applied to the proposed SICA algorithm to see how well maximizing the Mahalanobis distance works in feature extraction. As a result, we demonstrate that the proposed SICA algorithm gives good features with high separability as compared with principal component analysis and a conventional ICA. In the second experiment, the recognition performance of features extracted by the proposed SICA is evaluated using three data sets from the UCI Machine Learning Repository. From the results, we show that better recognition accuracy is obtained using our proposed SICA. Furthermore, we show that pattern features extracted by SICA are better than those extracted by only maximizing the Mahalanobis distance.

  20. Regularized generalized eigen-decomposition with applications to sparse supervised feature extraction and sparse discriminant analysis

    DEFF Research Database (Denmark)

    Han, Xixuan; Clemmensen, Line Katrine Harder

    2015-01-01

    We propose a general technique for obtaining sparse solutions to generalized eigenvalue problems, and call it Regularized Generalized Eigen-Decomposition (RGED). For decades, Fisher's discriminant criterion has been applied in supervised feature extraction and discriminant analysis...... techniques, for instance, 2D-Linear Discriminant Analysis (2D-LDA). Furthermore, an iterative algorithm based on the alternating direction method of multipliers is developed. The algorithm approximately solves RGED with monotonically decreasing convergence and at an acceptable speed for results of modest...... accuracy. Numerical experiments based on four data sets of different types of images show that RGED has competitive classification performance with existing multidimensional and sparse techniques of discriminant analysis.
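
    The underlying regularized generalized eigenvalue problem for Fisher's criterion can be written in a few lines; the sparsity-inducing part of RGED and its ADMM solver are omitted here, and the ridge parameter is an assumption:

```python
# Solve S_b v = lambda (S_w + rho I) v and keep the leading eigenvectors
# (dense, non-sparse version of the regularized Fisher problem).
import numpy as np
from scipy.linalg import eigh
from sklearn.datasets import load_wine

X, y = load_wine(return_X_y=True)
X = X - X.mean(axis=0)
classes, d = np.unique(y), X.shape[1]

S_w = np.zeros((d, d)); S_b = np.zeros((d, d))
for c in classes:
    Xc = X[y == c]
    mc = Xc.mean(axis=0, keepdims=True)
    S_w += (Xc - mc).T @ (Xc - mc)                 # within-class scatter
    S_b += len(Xc) * mc.T @ mc                     # between-class scatter

rho = 1e-2 * np.trace(S_w) / d                     # ridge-style regularizer (assumed)
evals, evecs = eigh(S_b, S_w + rho * np.eye(d))    # generalized eigendecomposition
W = evecs[:, np.argsort(evals)[::-1][:len(classes) - 1]]   # discriminant directions
print("projected data shape:", (X @ W).shape)
```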

  1. Supervised pixel classification using a feature space derived from an artificial visual system

    Science.gov (United States)

    Baxter, Lisa C.; Coggins, James M.

    1991-01-01

    Image segmentation involves labelling pixels according to their membership in image regions. This requires an understanding of what a region is. Using supervised pixel classification, the paper investigates how groups of pixels labelled manually according to perceived image semantics map onto the feature space created by an Artificial Visual System. The multiscale structure of regions is investigated, and it is shown that pixels form clusters based on their geometric roles in the image intensity function, not on image semantics. A tentative abstract definition of a 'region' is proposed based on this behavior.

  2. Semi-supervised classification of emotional pictures based on feature combination

    Science.gov (United States)

    Li, Shuo; Zhang, Yu-Jin

    2011-02-01

    Can the abundant emotions reflected in pictures be classified automatically by computer? Previous research considered only the visual features extracted from images, which have limited capability to reveal various emotions. In addition, the training database used by previous methods is a subset of the International Affective Picture System (IAPS) with a relatively small scale, which negatively affects the discrimination of emotion classifiers. To solve the above problems, this paper proposes a novel and practical emotional picture classification approach, using a semi-supervised learning scheme with both visual feature and keyword tag information. Besides the IAPS, which provides both emotion labels and keyword tags as part of the training dataset, nearly 2000 pictures with only keyword tags downloaded from the website Flickr form an auxiliary training dataset. The visual feature of the latent emotional semantic factors is extracted by a probabilistic Latent Semantic Analysis (pLSA) model, while the text feature is described by binary vectors on the tag vocabulary. A first Linear Programming Boost (LPBoost) classifier, trained on the samples from IAPS, combines the above two features and labels the remaining training samples from the internet. A second SVM classifier, trained on all training images using only the visual feature, is then applied to the test images. In the experiment, the categorization performance of our approach is better than that of the latest methods.

  3. Optimal Subset Selection of Time-Series MODIS Images and Sample Data Transfer with Random Forests for Supervised Classification Modelling.

    Science.gov (United States)

    Zhou, Fuqun; Zhang, Aining

    2016-10-25

    Nowadays, various time-series Earth Observation data with multiple bands are freely available, such as Moderate Resolution Imaging Spectroradiometer (MODIS) datasets including 8-day composites from NASA, and 10-day composites from the Canada Centre for Remote Sensing (CCRS). It is challenging to efficiently use these time-series MODIS datasets for long-term environmental monitoring due to their vast volume and information redundancy. This challenge will be greater when Sentinel 2-3 data become available. Another challenge that researchers face is the lack of in-situ data for supervised modelling, especially for time-series data analysis. In this study, we attempt to tackle the two important issues with a case study of land cover mapping using CCRS 10-day MODIS composites with the help of two features of Random Forests: variable importance and outlier identification. The variable importance feature is used to analyze and select optimal subsets of time-series MODIS imagery for efficient land cover mapping, and the outlier identification feature is utilized for transferring sample data available from one year to an adjacent year for supervised classification modelling. The results of the case study of agricultural land cover classification at a regional scale show that using only about half of the variables we can achieve land cover classification accuracy close to that generated using the full dataset. The proposed simple but effective solution of sample transferring could make supervised modelling possible for applications lacking sample data.
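
    The variable-importance half of the workflow can be sketched with synthetic stand-in data; the CCRS MODIS composites and the outlier-based sample transfer are not reproduced here:

```python
# Rank time-series variables by Random Forest importance and compare a model
# on roughly half of the variables with the full set (synthetic data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=36, n_informative=12,
                           random_state=0)          # e.g. 36 band/date composites (assumed)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = np.argsort(-rf.feature_importances_)

full = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()
half = cross_val_score(RandomForestClassifier(random_state=0),
                       X[:, ranking[:18]], y, cv=5).mean()
print(f"all 36 variables: {full:.3f}   top 18 variables: {half:.3f}")
```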

  4. Feature selection for co-training

    Institute of Scientific and Technical Information of China (English)

    李国正; 天羽

    2008-01-01

    Co-training is a semi-supervised learning method which employs two complementary learners to label the unlabeled data for each other and to predict the test sample together. Previous studies show that redundant information can help improve the ratio of prediction accuracy between semi-supervised learning methods and supervised learning methods. However, redundant information often hurts the performance of learning machines in practice. This paper investigates what effect redundant features have on semi-supervised learning methods such as co-training, and how to remove the redundant features as well as the irrelevant features. Here, FESCOT (feature selection for co-training) is proposed to improve the generalization performance of co-training with feature selection. Experimental results on artificial and real world data sets show that FESCOT helps to remove irrelevant and redundant features that hurt the performance of the co-training method.

  6. Genetic Feature Selection for Texture Classification

    Institute of Scientific and Technical Information of China (English)

    PAN Li; ZHENG Hong; ZHANG Zuxun; ZHANG Jianqing

    2004-01-01

    This paper presents a novel approach to feature subset selection using genetic algorithms. This approach has the ability to accommodate multiple criteria such as the accuracy and cost of classification into the process of feature selection and finds the effective feature subset for texture classification. On the basis of the effective feature subset selected, a method is described to extract the objects which are higher than their surroundings, such as trees or forest, in the color aerial images. The methodology presented in this paper is illustrated by its application to the problem of trees extraction from aerial images.

  7. Genetic search feature selection for affective modeling

    DEFF Research Database (Denmark)

    Martínez, Héctor P.; Yannakakis, Georgios N.

    2010-01-01

    Automatic feature selection is a critical step towards the generation of successful computational models of affect. This paper presents a genetic search-based feature selection method which is developed as a global-search algorithm for improving the accuracy of the affective models built....... The method is tested and compared against sequential forward feature selection and random search in a dataset derived from a game survey experiment which contains bimodal input features (physiological and gameplay) and expressed pairwise preferences of affect. Results suggest that the proposed method...

  8. Active constraints selection based semi-supervised dimensionality in ensemble subspaces

    Institute of Scientific and Technical Information of China (English)

    Jie Zeng; Wei Nie; Yong Zhang

    2015-01-01

    Semi-supervised dimensionality reduction (SSDR) has attracted an increasing amount of attention in this big-data era. Many algorithms have been developed with a small number of pairwise constraints to achieve performances comparable to those of fully supervised methods. However, one challenging problem with semi-supervised approaches is the appropriate choice of the constraint set, including the cardinality and the composition of the constraint set, which, to a large extent, affects the performance of the resulting algorithm. In this work, we address the problem by incorporating ensemble subspace and active learning into dimensionality reduction and propose a new algorithm, termed as global and local scatter based SSDR with active pairwise constraints selection in ensemble subspaces (SSGL-ESA). Unlike traditional methods that select the supervised information in one subspace, we pick up pairwise constraints in ensemble subspaces, where a novel active learning algorithm is designed with both exploration and filtering to generate informative pairwise constraints. The automatic constraint selection approach proposed in this paper can be generalized to be used with all constraint-based semi-supervised learning algorithms. Comparative experiments are conducted on two face databases and the results validate the effectiveness of the proposed method.

  9. Embedded Incremental Feature Selection for Reinforcement Learning

    Science.gov (United States)

    2012-05-01

    Classical reinforcement learning techniques become impractical in domains with large complex state spaces. The size of a domain’s state space is...require all the provided features. In this paper we present a feature selection algorithm for reinforcement learning called Incremental Feature

  10. Genetic Counseling Supervisors' Self-Efficacy for Select Clinical Supervision Competencies.

    Science.gov (United States)

    Finley, Sabra Ledare; Veach, Pat McCarthy; MacFarlane, Ian M; LeRoy, Bonnie S; Callanan, Nancy

    2016-04-01

    Supervision is a primary instructional vehicle for genetic counseling student clinical training. Approximately two-thirds of genetic counselors report teaching and education roles, which include supervisory roles. Recently, Eubanks Higgins and colleagues published the first comprehensive list of empirically-derived genetic counseling supervisor competencies. Studies have yet to evaluate whether supervisors possess these competencies and whether their competencies differ as a function of experience. This study investigated three research questions: (1) What are genetic counselor supervisors' perceptions of their capabilities (self-efficacy) for a select group of supervisor competencies?, (2) Are there differences in self-efficacy as a function of their supervision experience or their genetic counseling experience?, and (3) What training methods do they use and prefer to develop supervision skills? One-hundred thirty-one genetic counselor supervisors completed an anonymous online survey assessing demographics, self-efficacy (self-perceived capability) for 12 goal setting and 16 feedback competencies (Scale: 0-100), competencies that are personally challenging, and supervision training experiences and preferences (open-ended). A MANOVA revealed significant positive effects of supervision experience but not genetic counseling experience on participants' self-efficacy. Although mean self-efficacy ratings were high (>83.7), participant comments revealed several challenging competencies (e.g., incorporating student's report of feedback from previous supervisors into goal setting, and providing feedback about student behavior rather than personal traits). Commonly preferred supervision training methods included consultation with colleagues, peer discussion, and workshops/seminars.

  11. Selective Audiovisual Semantic Integration Enabled by Feature-Selective Attention.

    Science.gov (United States)

    Li, Yuanqing; Long, Jinyi; Huang, Biao; Yu, Tianyou; Wu, Wei; Li, Peijun; Fang, Fang; Sun, Pei

    2016-01-13

    An audiovisual object may contain multiple semantic features, such as the gender and emotional features of the speaker. Feature-selective attention and audiovisual semantic integration are two brain functions involved in the recognition of audiovisual objects. Humans often selectively attend to one or several features while ignoring the other features of an audiovisual object. Meanwhile, the human brain integrates semantic information from the visual and auditory modalities. However, how these two brain functions correlate with each other remains to be elucidated. In this functional magnetic resonance imaging (fMRI) study, we explored the neural mechanism by which feature-selective attention modulates audiovisual semantic integration. During the fMRI experiment, the subjects were presented with visual-only, auditory-only, or audiovisual dynamical facial stimuli and performed several feature-selective attention tasks. Our results revealed that a distribution of areas, including heteromodal areas and brain areas encoding attended features, may be involved in audiovisual semantic integration. Through feature-selective attention, the human brain may selectively integrate audiovisual semantic information from attended features by enhancing functional connectivity and thus regulating information flows from heteromodal areas to brain areas encoding the attended features.

  12. Modeling neuron selectivity over simple midlevel features for image classification.

    Science.gov (United States)

    Shu Kong; Zhuolin Jiang; Qiang Yang

    2015-08-01

    We now know that good mid-level features can greatly enhance the performance of image classification, but how to efficiently learn the image features is still an open question. In this paper, we present an efficient unsupervised midlevel feature learning approach (MidFea), which only involves simple operations such as k-means clustering, convolution, pooling, vector quantization, and random projection. We show that this simple feature can also achieve good performance in traditional classification tasks. To further boost the performance, we model the neuron selectivity (NS) principle by building an additional layer over the midlevel features prior to the classifier. The NS-layer learns category-specific neurons in a supervised manner with both bottom-up inference and top-down analysis, and thus supports fast inference for a query image. Through extensive experiments, we demonstrate that this higher-level NS-layer notably improves the classification accuracy with our simple MidFea, achieving comparable performances for face recognition, gender classification, age estimation, and object categorization. In particular, our approach runs faster in inference by an order of magnitude than sparse coding-based feature learning methods. As a conclusion, we argue that not only do carefully learned features (MidFea) bring improved performance, but a sophisticated mechanism (NS-layer) at a higher level also boosts the performance further.

  13. Prominent feature selection of microarray data

    Institute of Scientific and Technical Information of China (English)

    Yihui Liu

    2009-01-01

    For the wavelet transform, a set of orthogonal wavelet bases is used to detect the localized changing features contained in microarray data. In this research, we investigate the performance of the selected wavelet features based on wavelet detail coefficients at the second and third levels. A genetic algorithm is performed to optimize the wavelet detail coefficients and select the best discriminant features. Experiments are carried out on four microarray datasets to evaluate the classification performance. Experimental results show that wavelet features optimized from detail coefficients efficiently characterize the differences between normal tissues and cancer tissues.

  14. Stable Feature Selection for Biomarker Discovery

    CERN Document Server

    He, Zengyou

    2010-01-01

    Feature selection techniques have been used as the workhorse in biomarker discovery applications for a long time. Surprisingly, the stability of feature selection with respect to sampling variations has long been under-considered. It is only until recently that this issue has received more and more attention. In this article, we review existing stable feature selection methods for biomarker discovery using a generic hierarchal framework. We have two objectives: (1) providing an overview on this new yet fast growing topic for a convenient reference; (2) categorizing existing methods under an expandable framework for future research and development.

  15. Machinery running state identification based on discriminant semi-supervised local tangent space alignment for feature fusion and extraction

    Science.gov (United States)

    Su, Zuqiang; Xiao, Hong; Zhang, Yi; Tang, Baoping; Jiang, Yonghua

    2017-04-01

    Extraction of sensitive features is a challenging but key task in data-driven machinery running state identification. Aimed at solving this problem, a method for machinery running state identification that applies discriminant semi-supervised local tangent space alignment (DSS-LTSA) for feature fusion and extraction is proposed. Firstly, in order to extract more distinct features, the vibration signals are decomposed by wavelet packet decomposition (WPD), and a mixed-domain feature set consisting of statistical features, autoregressive (AR) model coefficients, instantaneous amplitude Shannon entropy and the WPD energy spectrum is extracted to comprehensively characterize the properties of the machinery running state. Then, the mixed-domain feature set is input into DSS-LTSA for feature fusion and extraction to eliminate redundant information and interference noise. The proposed DSS-LTSA can extract intrinsic structure information from both labeled and unlabeled state samples, and as a result the over-fitting problem of supervised manifold learning and the blindness problem of unsupervised manifold learning are overcome. Simultaneously, class discrimination information is integrated within the dimension reduction process in a semi-supervised manner to improve the sensitivity of the extracted fusion features. Lastly, the extracted fusion features are input into a pattern recognition algorithm to achieve running state identification. The effectiveness of the proposed method is verified by a running state identification case in a gearbox, and the results confirm the improved accuracy of the running state identification.

  16. ECG Signal Feature Selection for Emotion Recognition

    Directory of Open Access Journals (Sweden)

    Lichen Xun

    2013-01-01

    Full Text Available This paper studies the selection of ECG-based features for emotion recognition. In the feature selection process, we start from an existing feature selection algorithm and also pay special attention to some intuitive values on the ECG waveform. Through the use of ANOVA and heuristic search, we picked out different features to distinguish the two emotions of joy and pleasure, and then combined this with a pathological analysis of ECG signals from the viewpoint of medical experts to discuss the logical correspondence between ECG waveforms and emotion discrimination. Through experiments, using the method in this paper we picked out only five features and reached a 92% accuracy rate in the recognition of joy and pleasure.
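
    To make the ANOVA-plus-heuristic-search idea above concrete, the sketch below ranks candidate ECG features with an ANOVA F-test and then runs a small forward search over the top candidates. It is only an illustration, not the paper's pipeline: the feature matrix X, labels y, the function name anova_then_greedy, and all parameter values are assumptions.

    ```python
    # Sketch (not the paper's exact pipeline): ANOVA F-test ranking followed by a
    # small greedy forward search, assuming X holds one row per ECG segment and
    # y holds binary labels (joy vs. pleasure).
    import numpy as np
    from sklearn.feature_selection import f_classif
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score

    def anova_then_greedy(X, y, n_candidates=15, n_final=5):
        # Rank all candidate features by their ANOVA F statistic.
        f_scores, _ = f_classif(X, y)
        ranked = np.argsort(f_scores)[::-1][:n_candidates]

        # Heuristic forward search over the top-ranked candidates.
        selected, best_acc = [], 0.0
        clf = KNeighborsClassifier(n_neighbors=5)
        while len(selected) < n_final:
            best_feat = None
            for f in ranked:
                if f in selected:
                    continue
                acc = cross_val_score(clf, X[:, selected + [f]], y, cv=5).mean()
                if acc > best_acc:
                    best_acc, best_feat = acc, f
            if best_feat is None:      # no remaining feature improves accuracy
                break
            selected.append(best_feat)
        return selected, best_acc
    ```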

  17. A Genetic Algorithm-Based Feature Selection

    Directory of Open Access Journals (Sweden)

    Babatunde Oluleye

    2014-07-01

    Full Text Available This article details the exploration and application of a Genetic Algorithm (GA) for feature selection. In particular, a binary GA was used for dimensionality reduction to enhance the performance of the concerned classifiers. In this work, one hundred (100) features were extracted from the set of images found in the Flavia dataset (a publicly available dataset). The extracted features are Zernike Moments (ZM), Fourier Descriptors (FD), Legendre Moments (LM), Hu 7 Moments (Hu7M), Texture Properties (TP) and Geometrical Properties (GP). The main contributions of this article are (1) detailed documentation of the GA Toolbox in MATLAB and (2) the development of a GA-based feature selector using a novel fitness function (kNN-based classification error) which enabled the GA to obtain a combinatorial set of features giving rise to optimal accuracy. The results obtained were compared with various feature selectors from the WEKA software and were better in many respects, particularly in terms of classification accuracy.
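
    A minimal sketch of the kind of binary GA with a kNN-classification-error fitness described above. The record's own implementation uses the MATLAB GA Toolbox; the Python version here, including population size, operators, and the names fitness and ga_select, is an illustrative assumption.

    ```python
    # Minimal binary GA for feature selection with a kNN-error fitness; all
    # hyperparameters are illustrative, not the article's settings.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)

    def fitness(mask, X, y):
        # Fitness = cross-validated kNN error on the selected feature subset.
        if mask.sum() == 0:
            return 1.0
        clf = KNeighborsClassifier(n_neighbors=3)
        acc = cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()
        return 1.0 - acc

    def ga_select(X, y, pop_size=30, generations=40, p_mut=0.02):
        n = X.shape[1]
        pop = rng.integers(0, 2, size=(pop_size, n))
        for _ in range(generations):
            errs = np.array([fitness(ind, X, y) for ind in pop])
            parents = pop[np.argsort(errs)[: pop_size // 2]]   # truncation selection
            children = []
            while len(children) < pop_size - len(parents):
                a, b = parents[rng.integers(len(parents), size=2)]
                cut = rng.integers(1, n)                        # one-point crossover
                child = np.concatenate([a[:cut], b[cut:]])
                flip = rng.random(n) < p_mut                    # bit-flip mutation
                children.append(np.where(flip, 1 - child, child))
            pop = np.vstack([parents] + children)
        errs = np.array([fitness(ind, X, y) for ind in pop])
        return pop[np.argmin(errs)].astype(bool)                # best feature mask
    ```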

  18. Classification Using Markov Blanket for Feature Selection

    DEFF Research Database (Denmark)

    Zeng, Yifeng; Luo, Jian

    2009-01-01

    Selecting relevant features is in demand when a large data set is of interest in a classification task. It produces a tractable number of features that are sufficient and possibly improve the classification performance. This paper studies a statistical method of Markov blanket induction for filtering features and then applies a classifier using the Markov blanket predictors. The Markov blanket contains a minimal subset of relevant features that yields optimal classification performance. We experimentally demonstrate the improved performance of several classifiers using Markov blanket induction as a feature selection method. In addition, we point out an important assumption behind the Markov blanket induction algorithm and show its effect on the classification performance.

  19. Feature Engineering for Drug Name Recognition in Biomedical Texts: Feature Conjunction and Feature Selection

    Directory of Open Access Journals (Sweden)

    Shengyu Liu

    2015-01-01

    Full Text Available Drug name recognition (DNR) is a critical step for drug information extraction. Machine learning-based methods have been widely used for DNR with various types of features such as part-of-speech, word shape, and dictionary feature. Features used in current machine learning-based methods are usually singleton features which may be due to explosive features and a large number of noisy features when singleton features are combined into conjunction features. However, singleton features that can only capture one linguistic characteristic of a word are not sufficient to describe the information for DNR when multiple characteristics should be considered. In this study, we explore feature conjunction and feature selection for DNR, which have never been reported. We intuitively select 8 types of singleton features and combine them into conjunction features in two ways. Then, Chi-square, mutual information, and information gain are used to mine effective features. Experimental results show that feature conjunction and feature selection can improve the performance of the DNR system with a moderate number of features and our DNR system significantly outperforms the best system in the DDIExtraction 2013 challenge.

  20. Feature engineering for drug name recognition in biomedical texts: feature conjunction and feature selection.

    Science.gov (United States)

    Liu, Shengyu; Tang, Buzhou; Chen, Qingcai; Wang, Xiaolong; Fan, Xiaoming

    2015-01-01

    Drug name recognition (DNR) is a critical step for drug information extraction. Machine learning-based methods have been widely used for DNR with various types of features such as part-of-speech, word shape, and dictionary feature. Features used in current machine learning-based methods are usually singleton features which may be due to explosive features and a large number of noisy features when singleton features are combined into conjunction features. However, singleton features that can only capture one linguistic characteristic of a word are not sufficient to describe the information for DNR when multiple characteristics should be considered. In this study, we explore feature conjunction and feature selection for DNR, which have never been reported. We intuitively select 8 types of singleton features and combine them into conjunction features in two ways. Then, Chi-square, mutual information, and information gain are used to mine effective features. Experimental results show that feature conjunction and feature selection can improve the performance of the DNR system with a moderate number of features and our DNR system significantly outperforms the best system in the DDIExtraction 2013 challenge.
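
    The sketch below illustrates the general idea of building conjunction features from binary singleton indicators and scoring them with chi-square and mutual information (which, for discrete features, corresponds to information gain). It is not the authors' DNR system: X_bin, y, the function name score_conjunctions, and the choice of pairwise products as conjunctions are assumptions for illustration.

    ```python
    # Illustrative only: pairwise conjunctions of binary singleton features,
    # scored with chi-square and mutual information.
    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.feature_selection import chi2, mutual_info_classif, SelectKBest

    def score_conjunctions(X_bin, y, k=200):
        # interaction_only=True keeps the singleton features and adds products of
        # feature pairs, i.e. AND-style conjunctions of the binary indicators.
        conj = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
        X_conj = conj.fit_transform(X_bin)

        chi2_scores, _ = chi2(X_conj, y)
        # For discrete features, mutual information equals the information gain
        # of the feature with respect to the class labels.
        mi_scores = mutual_info_classif(X_conj, y, discrete_features=True)

        keep = SelectKBest(chi2, k=min(k, X_conj.shape[1])).fit(X_conj, y)
        return X_conj[:, keep.get_support()], chi2_scores, mi_scores
    ```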

  1. Medical Image Feature, Extraction, Selection And Classification

    Directory of Open Access Journals (Sweden)

    M.VASANTHA,

    2010-06-01

    Full Text Available Breast cancer is the most common type of cancer found in women. It is the most frequent form of cancer, and one in 22 women in India is likely to suffer from breast cancer. This paper proposes an image classifier to classify mammogram images. A mammogram image is classified into a normal image, a benign image or a malignant image. In total, 26 features, including histogram intensity features and GLCM features, are extracted from the mammogram image. A hybrid approach to feature selection is proposed in this paper which reduces the features by 75%. Decision tree algorithms are applied to mammography classification using these reduced features. Experimental results have been obtained for a data set of 113 images of different types taken from MIAS. This technique of classification has not been attempted before and it reveals the potential of data mining in medical treatment.

  2. Feature subset selection based on relevance

    Science.gov (United States)

    Wang, Hui; Bell, David; Murtagh, Fionn

    In this paper an axiomatic characterisation of feature subset selection is presented. Two axioms are presented: the sufficiency axiom (preservation of learning information) and the necessity axiom (minimising encoding length). The sufficiency axiom concerns the existing dataset and is derived based on the following understanding: any selected feature subset should be able to describe the training dataset without losing information, i.e. it is consistent with the training dataset. The necessity axiom concerns the predictability and is derived from Occam's razor, which states that the simplest among different alternatives is preferred for prediction. The two axioms are then restated in terms of relevance in a concise form: maximising both the r(X; Y) and r(Y; X) relevance. Based on the relevance characterisation, four feature subset selection algorithms are presented and analysed: one is exhaustive and the remaining three are heuristic. Experimentation is also presented and the results are encouraging. Comparison is also made with some well-known feature subset selection algorithms, in particular, with the built-in feature selection mechanism in C4.5.
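
    As a rough illustration of a directed relevance measure of the r(X; Y) / r(Y; X) kind, the sketch below uses the uncertainty coefficient I(X; Y)/H(Y) as a stand-in; the paper's own definition of relevance may differ, and the helper names are assumptions.

    ```python
    # Stand-in relevance measure: fraction of the target's entropy explained by
    # the feature (uncertainty coefficient). x and y are discrete label arrays.
    import numpy as np
    from sklearn.metrics import mutual_info_score

    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log(p)).sum()

    def relevance(x, y):
        """r(x; y): how much of y's uncertainty is removed by knowing x."""
        h_y = entropy(y)
        return mutual_info_score(x, y) / h_y if h_y > 0 else 0.0

    # Maximising both r(feature; class) and r(class; feature), in the spirit of
    # the two axioms, then amounts to keeping features scoring high both ways:
    # scores = [(relevance(X[:, j], y), relevance(y, X[:, j])) for j in range(X.shape[1])]
    ```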

  3. Are qualitative and quantitative sleep problems associated with delinquency when controlling for psychopathic features and parental supervision?

    Science.gov (United States)

    Backman, Heidi; Laajasalo, Taina; Saukkonen, Suvi; Salmi, Venla; Kivivuori, Janne; Aronen, Eeva T

    2015-10-01

    The aim of this study was to explore the relationship between sleep, including both qualitative and quantitative aspects, and delinquent behaviour while controlling for psychopathic features of adolescents and parental supervision at bedtime. We analysed data from a nationally representative sample of 4855 Finnish adolescents (mean age 15.3 years, 51% females). Sleep problems, hours of sleep and delinquency were evaluated via self-report. Psychopathic features were measured with the Antisocial Process Screening Device - Self-Report. In negative binomial regressions, gender and sleep-related variables acted as predictors for both property and violent crime after controlling for psychopathic features and parental supervision at bedtime. The results suggest that both sleep problems (at least three times per week, at least for a year) and an insufficient amount of sleep (less than 7 h) are associated with property crime and violent behaviour, and the relationship is not explained by gender, degree of parental supervision at bedtime or co-occurring psychopathic features. These results suggest that sleep difficulties and insufficient amount of sleep are associated with delinquent behaviour in adolescents. The significance of addressing sleep-related problems, both qualitative and quantitative, among adolescents is thus highlighted. Implications for a prevention technique of delinquent behaviour are discussed. © 2015 European Sleep Research Society.

  4. Discriminative feature selection for visual tracking

    Science.gov (United States)

    Ma, Junkai; Luo, Haibo; Zhou, Wei; Song, Yingchao; Hui, Bin; Chang, Zheng

    2017-06-01

    Visual tracking plays an important role in computer vision tasks. The robustness of a tracking algorithm is a challenge, especially in complex scenarios such as cluttered backgrounds, illumination variation and appearance changes. As an important component of a tracking algorithm, the appropriateness of the feature is closely related to the tracking precision. In this paper, an online discriminative feature selection method is proposed to provide the tracker with the most discriminative feature. Firstly, a feature pool which contains different information from the image, such as gradient, gray value and edges, is built, and when every frame is processed during tracking, all of these features are extracted. Secondly, these features are ranked depending on their discrimination between target and background, and the highest-scored feature is chosen to represent the candidate image patch. Then, after obtaining the tracking result, the target model is updated to adapt to appearance variation. Experiments show that our method is robust when compared with other state-of-the-art algorithms.
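
    A rough sketch of the per-frame feature-ranking step described above, using a two-class Fisher-style (variance-ratio) score to measure how well each candidate feature separates target pixels from background pixels. The scoring function is an assumption standing in for the paper's exact ranking criterion, and the array names are illustrative.

    ```python
    # Rank candidate features by how well they separate target from background
    # in the current frame (variance-ratio style score, used here as a stand-in).
    import numpy as np

    def rank_features(target_feats, background_feats):
        """target_feats, background_feats: arrays of shape (n_pixels, n_features)
        holding per-pixel values of each candidate feature (gradient, gray, edge...)."""
        mu_t, mu_b = target_feats.mean(0), background_feats.mean(0)
        var_t, var_b = target_feats.var(0), background_feats.var(0)
        # Between-class separation over within-class spread, per feature.
        scores = (mu_t - mu_b) ** 2 / (var_t + var_b + 1e-12)
        return np.argsort(scores)[::-1]      # most discriminative feature first

    # Per frame: extract all candidate features, call rank_features, and represent
    # the candidate patches with the top-ranked feature before matching/updating.
    ```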

  5. Entropy based unsupervised Feature Selection in digital mammogram image using rough set theory.

    Science.gov (United States)

    Velayutham, C; Thangavel, K

    2012-01-01

    Feature Selection (FS) is a process which attempts to select features that are more informative. In supervised FS methods, various feature subsets are evaluated using an evaluation function or metric to select only those features which are related to the decision classes of the data under consideration. However, for many data mining applications, decision class labels are often unknown or incomplete, thus indicating the significance of unsupervised FS. In unsupervised learning, decision class labels are not provided. The problem is that not all features are important: some of the features may be redundant, and others may be irrelevant and noisy. In this paper, a novel unsupervised FS method for mammogram images, using rough set-based entropy measures, is proposed. A typical mammogram image processing system generally consists of mammogram image acquisition, pre-processing of the image, segmentation, and features extracted from the segmented mammogram image. The proposed method is used to select features from the data set and is compared with existing rough set-based supervised FS methods; the classification performance of both methods is recorded and demonstrates the efficiency of the proposed method.

  6. A Comparison of Supervised Machine Learning Algorithms and Feature Vectors for MS Lesion Segmentation Using Multimodal Structural MRI

    Science.gov (United States)

    Sweeney, Elizabeth M.; Vogelstein, Joshua T.; Cuzzocreo, Jennifer L.; Calabresi, Peter A.; Reich, Daniel S.; Crainiceanu, Ciprian M.; Shinohara, Russell T.

    2014-01-01

    Machine learning is a popular method for mining and analyzing large collections of medical data. We focus on a particular problem from medical research, supervised multiple sclerosis (MS) lesion segmentation in structural magnetic resonance imaging (MRI). We examine the extent to which the choice of machine learning or classification algorithm and feature extraction function impacts the performance of lesion segmentation methods. As quantitative measures derived from structural MRI are important clinical tools for research into the pathophysiology and natural history of MS, the development of automated lesion segmentation methods is an active research field. Yet, little is known about what drives performance of these methods. We evaluate the performance of automated MS lesion segmentation methods, which consist of a supervised classification algorithm composed with a feature extraction function. These feature extraction functions act on the observed T1-weighted (T1-w), T2-weighted (T2-w) and fluid-attenuated inversion recovery (FLAIR) MRI voxel intensities. Each MRI study has a manual lesion segmentation that we use to train and validate the supervised classification algorithms. Our main finding is that the differences in predictive performance are due more to differences in the feature vectors, rather than the machine learning or classification algorithms. Features that incorporate information from neighboring voxels in the brain were found to increase performance substantially. For lesion segmentation, we conclude that it is better to use simple, interpretable, and fast algorithms, such as logistic regression, linear discriminant analysis, and quadratic discriminant analysis, and to develop the features to improve performance. PMID:24781953

  7. Coevolution of active vision and feature selection.

    Science.gov (United States)

    Floreano, Dario; Kato, Toshifumi; Marocco, Davide; Sauser, Eric

    2004-03-01

    We show that complex visual tasks, such as position- and size-invariant shape recognition and navigation in the environment, can be tackled with simple architectures generated by a coevolutionary process of active vision and feature selection. Behavioral machines equipped with primitive vision systems and direct pathways between visual and motor neurons are evolved while they freely interact with their environments. We describe the application of this methodology in three sets of experiments, namely, shape discrimination, car driving, and robot navigation. We show that these systems develop sensitivity to a number of oriented, retinotopic visual features (oriented edges, corners, height) and a behavioral repertoire to locate, bring, and keep these features in sensitive regions of the vision system, resembling strategies observed in simple insects.

  8. Novel Feature Selection by Differential Evolution Algorithm

    Directory of Open Access Journals (Sweden)

    Ali Ghareaghaji

    2013-11-01

    Full Text Available Iris scan biometrics employs the unique characteristics and features of the human iris in order to verify the identity of an individual. In today's world, where terrorist attacks are on the rise, the employment of infallible security systems is a must, which makes iris recognition systems unavoidable in emerging security and authentication applications. The objective function is minimized using the Differential Evolution (DE) algorithm, where the population vector is encoded using Binary Encoded Decimal to avoid the floating-point optimization problem. An automatic clustering of the possible values of the Lagrangian multiplier provides a detailed insight into the selected features during the proposed DE-based optimization process. The classification accuracy of a Support Vector Machine (SVM) is used to measure the performance of the selected features. The proposed algorithm outperforms existing DE-based approaches when tested on the IRIS, Wine, Wisconsin Breast Cancer, Sonar and Ionosphere datasets. The same algorithm, when applied to gait-based people identification using skeleton data points obtained from a Microsoft Kinect sensor, exceeds the previously reported accuracies.

  9. A Hybrid Feature Subset Selection using Metrics and Forward Selection

    Directory of Open Access Journals (Sweden)

    K. Fathima Bibi

    2015-04-01

    Full Text Available The aim of this study is to design a Feature Subset Selection technique that speeds up the Feature Selection (FS) process in high-dimensional datasets with reduced computational cost and great efficiency. FS has become the focus of much research in decision support system areas for which data with a tremendous number of variables are analyzed. Filters and wrappers are the proposed techniques for the feature subset selection process. Filters make use of an association-based approach, whereas wrappers adopt classification algorithms to identify important features. The filter method lacks the ability to minimize the simplification error, while the wrapper method burdens computational resources heavily. To overcome these difficulties, a hybrid approach is proposed combining both filters and wrappers. The filter approach uses a permutation of ranker search methods, and the wrapper improves the learning accuracy and obtains a reduction in the memory requirements and finishing time. The UCI machine learning repository was chosen to experiment with the approach. The classification accuracy resulting from our approach proves to be higher.
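
    A compact sketch of a filter-then-wrapper pipeline of the kind described above, written with scikit-learn. The concrete choices (an ANOVA F-test filter, a forward sequential wrapper around a kNN classifier, and the cut-off values) are assumptions; the original study used ranker search methods and its own wrapper.

    ```python
    # Filter stage prunes the feature space cheaply; wrapper stage refines the
    # subset using classifier accuracy. Estimator choices are illustrative.
    from sklearn.feature_selection import SelectKBest, f_classif, SequentialFeatureSelector
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import Pipeline

    def hybrid_selector(k_filter=30, k_final=10):
        return Pipeline([
            # Filter: univariate ranking keeps the k_filter highest-scoring features.
            ("filter", SelectKBest(f_classif, k=k_filter)),
            # Wrapper: forward selection driven by cross-validated kNN accuracy.
            ("wrapper", SequentialFeatureSelector(
                KNeighborsClassifier(n_neighbors=5),
                n_features_to_select=k_final, direction="forward", cv=5)),
            ("clf", KNeighborsClassifier(n_neighbors=5)),
        ])

    # Usage: hybrid_selector().fit(X_train, y_train).score(X_test, y_test)
    ```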

  10. Unsupervised Feature Selection for Latent Dirichlet Allocation

    Institute of Scientific and Technical Information of China (English)

    Xu Weiran; Du Gang; Chen Guang; Guo Jun; Yang Jie

    2011-01-01

    As a generative model, the Latent Dirichlet Allocation (LDA) model focuses on how to generate data and lacks optimization of the topics' discrimination capability. This paper aims to improve the discrimination capability through unsupervised feature selection. Theoretical analysis shows that the discrimination capability of a topic is limited by the discrimination capability of its representative words. The discrimination capability of a word is approximated by the information gain of the word for topics, which is used to distinguish between “general words” and “special words” in LDA topics. Therefore, we add a constraint to the LDA objective function to let the “general words” only occur in “general topics” rather than “special topics”. A heuristic algorithm is then presented to obtain the solution. Experiments show that this method can not only improve the information gain of topics, but also make the topics easier for humans to understand.

  11. Cloud detection in all-sky images via multi-scale neighborhood features and multiple supervised learning techniques

    Science.gov (United States)

    Cheng, Hsu-Yung; Lin, Chih-Lung

    2017-01-01

    Cloud detection is important for providing necessary information such as cloud cover in many applications. Existing cloud detection methods include red-to-blue ratio thresholding and other classification-based techniques. In this paper, we propose to perform cloud detection using supervised learning techniques with multi-resolution features. One of the major contributions of this work is that the features are extracted from local image patches with different sizes to include local structure and multi-resolution information. The cloud models are learned through the training process. We consider classifiers including random forest, support vector machine, and Bayesian classifier. To take advantage of the clues provided by multiple classifiers and various levels of patch sizes, we employ a voting scheme to combine the results to further increase the detection accuracy. In the experiments, we have shown that the proposed method can distinguish cloud and non-cloud pixels more accurately compared with existing works.
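
    The multi-classifier voting idea can be sketched as below, assuming per-pixel feature vectors X built from multi-scale patch descriptors and cloud/non-cloud labels y; the ensemble members mirror the classifiers named in the record, but the hyperparameters and the soft-voting choice are assumptions.

    ```python
    # Voting ensemble over the three classifier families named in the record.
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.svm import SVC
    from sklearn.naive_bayes import GaussianNB

    def cloud_voter():
        return VotingClassifier(
            estimators=[
                ("rf", RandomForestClassifier(n_estimators=200)),
                ("svm", SVC(probability=True)),   # probability=True enables soft voting
                ("nb", GaussianNB()),             # Bayesian classifier
            ],
            voting="soft",                        # average the predicted probabilities
        )

    # cloud_voter().fit(X_train, y_train).predict(X_pixels) gives per-pixel labels.
    ```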

  12. Feature Selection for Chemical Sensor Arrays Using Mutual Information

    Science.gov (United States)

    Wang, X. Rosalind; Lizier, Joseph T.; Nowotny, Thomas; Berna, Amalia Z.; Prokopenko, Mikhail; Trowell, Stephen C.

    2014-01-01

    We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set, and established best features and upper bounds on classification performance. We selected feature sets that exhibit the maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study using a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features that achieve classification performance approaching the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian Networks gave the best performance. Finally, we compared the observed classification performances with the performance of classifiers using randomly selected features. We found that the selected features consistently outperformed randomly selected features for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near optimal features for chemical sensor arrays. PMID:24595058
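
    A small sketch of the comparison described above: select the features with maximal mutual information, then contrast their cross-validated accuracy against randomly chosen feature subsets of the same size. The classifier, subset size k, and function name are assumptions rather than the study's exact protocol.

    ```python
    # Mutual-information filter selection vs. random feature subsets.
    import numpy as np
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    def mi_vs_random(X, y, k=8, n_random=20, seed=0):
        rng = np.random.default_rng(seed)
        mi = mutual_info_classif(X, y)
        top = np.argsort(mi)[::-1][:k]            # maximal-MI feature subset
        clf = SVC(kernel="rbf")
        acc_mi = cross_val_score(clf, X[:, top], y, cv=5).mean()
        acc_rand = np.mean([
            cross_val_score(clf, X[:, rng.choice(X.shape[1], k, replace=False)], y, cv=5).mean()
            for _ in range(n_random)              # baseline: random subsets of size k
        ])
        return acc_mi, acc_rand
    ```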

  13. A Hierarchical Feature and Sample Selection Framework and Its Application for Alzheimer’s Disease Diagnosis

    Science.gov (United States)

    An, Le; Adeli, Ehsan; Liu, Mingxia; Zhang, Jun; Lee, Seong-Whan; Shen, Dinggang

    2017-01-01

    Classification is one of the most important tasks in machine learning. Due to feature redundancy or outliers in samples, using all available data for training a classifier may be suboptimal. For example, the Alzheimer’s disease (AD) is correlated with certain brain regions or single nucleotide polymorphisms (SNPs), and identification of relevant features is critical for computer-aided diagnosis. Many existing methods first select features from structural magnetic resonance imaging (MRI) or SNPs and then use those features to build the classifier. However, with the presence of many redundant features, the most discriminative features are difficult to be identified in a single step. Thus, we formulate a hierarchical feature and sample selection framework to gradually select informative features and discard ambiguous samples in multiple steps for improved classifier learning. To positively guide the data manifold preservation process, we utilize both labeled and unlabeled data during training, making our method semi-supervised. For validation, we conduct experiments on AD diagnosis by selecting mutually informative features from both MRI and SNP, and using the most discriminative samples for training. The superior classification results demonstrate the effectiveness of our approach, as compared with the rivals. PMID:28358032

  14. A Hierarchical Feature and Sample Selection Framework and Its Application for Alzheimer’s Disease Diagnosis

    Science.gov (United States)

    An, Le; Adeli, Ehsan; Liu, Mingxia; Zhang, Jun; Lee, Seong-Whan; Shen, Dinggang

    2017-03-01

    Classification is one of the most important tasks in machine learning. Due to feature redundancy or outliers in samples, using all available data for training a classifier may be suboptimal. For example, the Alzheimer’s disease (AD) is correlated with certain brain regions or single nucleotide polymorphisms (SNPs), and identification of relevant features is critical for computer-aided diagnosis. Many existing methods first select features from structural magnetic resonance imaging (MRI) or SNPs and then use those features to build the classifier. However, with the presence of many redundant features, the most discriminative features are difficult to be identified in a single step. Thus, we formulate a hierarchical feature and sample selection framework to gradually select informative features and discard ambiguous samples in multiple steps for improved classifier learning. To positively guide the data manifold preservation process, we utilize both labeled and unlabeled data during training, making our method semi-supervised. For validation, we conduct experiments on AD diagnosis by selecting mutually informative features from both MRI and SNP, and using the most discriminative samples for training. The superior classification results demonstrate the effectiveness of our approach, as compared with the rivals.

  15. A New Approach of Feature Selection for Text Categorization

    Institute of Scientific and Technical Information of China (English)

    CUI Zifeng; XU Baowen; ZHANG Weifeng; XU Junling

    2006-01-01

    This paper proposes a new approach to feature selection based on an independence measure between features for text categorization. A fundamental hypothesis, that the occurrence of terms in documents is independent of each other, widely used in probabilistic models for text categorization (TC), is discussed. However, this basic hypothesis is incomplete regarding the independence of the feature set. From the view of feature selection, a new independence measure between features is designed, by which a feature selection algorithm is given to obtain a feature subset. The selected subset is highly relevant to the category and strongly independent between features, and satisfies the basic hypothesis to the maximum degree. Compared with other traditional feature selection methods in TC (which only take relevance into account), the performance of the feature subset selected by our method is superior to others in experiments on the benchmark 20 Newsgroups dataset.
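
    The record's idea of jointly maximizing category relevance and inter-feature independence has the same flavour as the well-known max-relevance/min-redundancy (mRMR) greedy scheme, which the sketch below uses as a stand-in; the paper's own independence measure is not reproduced here. The code assumes discrete feature columns (e.g. binary term occurrence) and is illustrative only.

    ```python
    # mRMR-style greedy selection: keep features that are relevant to the class
    # but weakly dependent on the features already selected.
    import numpy as np
    from sklearn.metrics import mutual_info_score

    def mrmr_like(X, y, k=20):
        n = X.shape[1]
        k = min(k, n)
        relevance = np.array([mutual_info_score(X[:, j], y) for j in range(n)])
        selected = [int(np.argmax(relevance))]
        while len(selected) < k:
            best_j, best_score = None, -np.inf
            for j in range(n):
                if j in selected:
                    continue
                # Redundancy = average dependence on the already-selected features.
                redundancy = np.mean([mutual_info_score(X[:, j], X[:, s]) for s in selected])
                score = relevance[j] - redundancy
                if score > best_score:
                    best_j, best_score = j, score
            selected.append(best_j)
        return selected
    ```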

  16. Feature Selection for Wheat Yield Prediction

    Science.gov (United States)

    Ruß, Georg; Kruse, Rudolf

    Carrying out effective and sustainable agriculture has become an important issue in recent years. Agricultural production has to keep up with an ever-increasing population by taking advantage of a field's heterogeneity. Nowadays, modern technology such as the global positioning system (GPS) and a multitude of developed sensors enable farmers to better measure their fields' heterogeneities. For this small-scale, precise treatment the term precision agriculture has been coined. However, the large amounts of data that are (literally) harvested during the growing season have to be analysed. In particular, the farmer is interested in knowing whether a newly developed heterogeneity sensor is potentially advantageous or not. Since the sensor data are readily available, this issue should be seen from an artificial intelligence perspective. There it can be treated as a feature selection problem. The additional task of yield prediction can be treated as a multi-dimensional regression problem. This article aims to present an approach towards solving these two practically important problems using artificial intelligence and data mining ideas and methodologies.

  17. Detecting Local Manifold Structure for Unsupervised Feature Selection

    Institute of Scientific and Technical Information of China (English)

    FENG Ding-Cheng; CHEN Feng; XU Wen-Li

    2014-01-01

    Unsupervised feature selection is fundamental in statistical pattern recognition, and has drawn persistent attention in the past several decades. Recently, much work has shown that feature selection can be formulated as nonlinear dimensionality reduction with discrete constraints. This line of research emphasizes utilizing the manifold learning techniques, where feature selection and learning can be studied based on the manifold assumption in data distribution. Many existing feature selection methods such as Laplacian score, SPEC (spectrum decomposition of graph Laplacian), TR (trace ratio) criterion, MSFS (multi-cluster feature selection) and EVSC (eigenvalue sensitive criterion) apply the basic properties of graph Laplacian, and select the optimal feature subsets which best preserve the manifold structure defined on the graph Laplacian. In this paper, we propose a new feature selection perspective from locally linear embedding (LLE), which is another popular manifold learning method. The main difficulty of using LLE for feature selection is that its optimization involves quadratic programming and eigenvalue decomposition, both of which are continuous procedures and different from discrete feature selection. We prove that the LLE objective can be decomposed with respect to data dimensionalities in the subset selection problem, which also facilitates constructing better coordinates from data using the principal component analysis (PCA) technique. Based on these results, we propose a novel unsupervised feature selection algorithm, called locally linear selection (LLS), to select a feature subset representing the underlying data manifold. The local relationship among samples is computed from the LLE formulation, which is then used to estimate the contribution of each individual feature to the underlying manifold structure. These contributions, represented as LLS scores, are ranked and selected as the candidate solution to feature selection. We further develop a

  18. SAR Target Recognition via Supervised Discriminative Dictionary Learning and Sparse Representation of the SAR-HOG Feature

    Directory of Open Access Journals (Sweden)

    Shengli Song

    2016-08-01

    Full Text Available Automatic target recognition (ATR) in synthetic aperture radar (SAR) images plays an important role in both national defense and civil applications. Although many methods have been proposed, SAR ATR is still very challenging due to the complex application environment. Feature extraction and classification are key points in SAR ATR. In this paper, we first design a novel feature, which is a histogram of oriented gradients (HOG)-like feature for SAR ATR (called SAR-HOG). Then, we propose a supervised discriminative dictionary learning (SDDL) method to learn a discriminative dictionary for SAR ATR and propose a strategy to simplify the optimization problem. Finally, we propose a SAR ATR classifier based on SDDL and sparse representation (called SDDLSR), in which both the reconstruction error and the classification error are considered. Extensive experiments are performed on the MSTAR database under standard operating conditions and extended operating conditions. The experimental results show that SAR-HOG can reliably capture the structures of targets in SAR images, and SDDL can further capture subtle differences among the different classes. By virtue of the SAR-HOG feature and SDDLSR, the proposed method achieves state-of-the-art performance on the MSTAR database. Especially for the extended operating conditions (EOC) scenario “Training 17°, Testing 45°”, the proposed method improves remarkably with respect to previous works.

  19. EMD-Based Temporal and Spectral Features for the Classification of EEG Signals Using Supervised Learning.

    Science.gov (United States)

    Riaz, Farhan; Hassan, Ali; Rehman, Saad; Niazi, Imran Khan; Dremstrup, Kim

    2016-01-01

    This paper presents a novel method for feature extraction from electroencephalogram (EEG) signals using empirical mode decomposition (EMD). Its use is motivated by the fact that the EMD gives an effective time-frequency analysis of nonstationary signals. The intrinsic mode functions (IMF) obtained as a result of EMD give the decomposition of a signal according to its frequency components. We present the usage of up to third-order temporal moments, and spectral features including spectral centroid, coefficient of variation and the spectral skew of the IMFs, for feature extraction from EEG signals. These features are physiologically relevant given that normal EEG signals have different temporal and spectral centroids, dispersions and symmetries when compared with pathological EEG signals. The calculated features are fed into the standard support vector machine (SVM) for classification purposes. The performance of the proposed method is studied on a publicly available dataset which is designed to handle various classification problems including the identification of epilepsy patients and detection of seizures. Experiments show that good classification results are obtained using the proposed methodology for the classification of EEG signals. Our proposed method also compares favorably to other state-of-the-art feature extraction methods.
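
    A sketch of the temporal and spectral feature computation described above, assuming the intrinsic mode functions have already been produced by some EMD routine and are stacked in an imfs array of shape (n_imfs, n_samples); the sampling rate, helper name, and feature ordering are assumptions.

    ```python
    # Temporal moments (mean, variance, skew) and spectral shape features
    # (centroid, coefficient of variation, spectral skew) per IMF.
    import numpy as np
    from scipy.stats import skew
    from sklearn.svm import SVC

    def imf_features(imfs, fs=173.61):          # fs is an illustrative sampling rate
        feats = []
        for imf in imfs:
            # Temporal moments up to third order.
            feats += [imf.mean(), imf.var(), skew(imf)]
            # Spectral shape features from the power spectrum of the IMF.
            spectrum = np.abs(np.fft.rfft(imf)) ** 2
            freqs = np.fft.rfftfreq(imf.size, d=1.0 / fs)
            p = spectrum / spectrum.sum()
            centroid = (freqs * p).sum()
            spread = np.sqrt(((freqs - centroid) ** 2 * p).sum())
            coeff_var = spread / (centroid + 1e-12)
            spec_skew = ((freqs - centroid) ** 3 * p).sum() / (spread ** 3 + 1e-12)
            feats += [centroid, coeff_var, spec_skew]
        return np.asarray(feats)

    # X = np.vstack([imf_features(emd(sig)) for sig in signals]); SVC().fit(X, y)
    # (emd() here stands for whatever EMD implementation is available.)
    ```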

  20. Naive Bayes-Guided Bat Algorithm for Feature Selection

    Directory of Open Access Journals (Sweden)

    Ahmed Majid Taha

    2013-01-01

    Full Text Available With the amount of data and information said to double every 20 months or so, feature selection has become highly important and beneficial. Further improvements in feature selection will positively affect a wide array of applications in fields such as pattern recognition, machine learning, and signal processing. A bio-inspired method called the Bat Algorithm, hybridized with a Naive Bayes classifier (BANB), is presented in this work. The performance of the proposed feature selection algorithm was investigated using twelve benchmark datasets from different domains and was compared to three other well-known feature selection algorithms. Discussion focused on four perspectives: number of features, classification accuracy, stability, and feature generalization. The results showed that BANB significantly outperformed the other algorithms in selecting a lower number of features, hence removing irrelevant, redundant, or noisy features while maintaining the classification accuracy. BANB is also proven to be more stable than other methods and is capable of producing more general feature subsets.

  1. GLOBAL SELECTED AS PARTNER TO SUPERVISE WEST-TO-EAST GAS PROJECT

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    On August 3, PetroChina West-to-East Gas Pipeline Company and the US-based Global Engineering Service Company signed the “Advisory Service Contract for Engineering Supervision of the West-to-East Gas Pipeline (1)” in Beijing. The signature of this contract marks the start of the engineering supervision business for the pipeline project.

  2. Feature dimensionality reduction for myoelectric pattern recognition: a comparison study of feature selection and feature projection methods.

    Science.gov (United States)

    Liu, Jie

    2014-12-01

    This study investigates the effect of the feature dimensionality reduction strategies on the classification of surface electromyography (EMG) signals toward developing a practical myoelectric control system. Two dimensionality reduction strategies, feature selection and feature projection, were tested on both EMG feature sets, respectively. A feature selection based myoelectric pattern recognition system was introduced to select the features by eliminating the redundant features of EMG recordings instead of directly choosing a subset of EMG channels. The Markov random field (MRF) method and a forward orthogonal search algorithm were employed to evaluate the contribution of each individual feature to the classification, respectively. Our results from 15 healthy subjects indicate that, with a feature selection analysis, independent of the type of feature set, across all subjects high overall accuracies can be achieved in classification of seven different forearm motions with a small number of top ranked original EMG features obtained from the forearm muscles (average overall classification accuracy >95% with 12 selected EMG features). Compared to various feature dimensionality reduction techniques in myoelectric pattern recognition, the proposed filter-based feature selection approach is independent of the type of classification algorithms and features, which can effectively reduce the redundant information not only across different channels, but also cross different features in the same channel. This may enable robust EMG feature dimensionality reduction without needing to change ongoing, practical use of classification algorithms, an important step toward clinical utility.

  3. Efficient Generation and Selection of Combined Features for Improved Classification

    KAUST Repository

    Shono, Ahmad N.

    2014-05-01

    This study contributes a methodology and associated toolkit developed to allow users to experiment with the use of combined features in classification problems. Methods are provided for efficiently generating combined features from an original feature set, for efficiently selecting the most discriminating of these generated combined features, and for efficiently performing a preliminary comparison of the classification results when using the original features exclusively against the results when using the selected combined features. The potential benefit of considering combined features in classification problems is demonstrated by applying the developed methodology and toolkit to three sample data sets where the discovery of combined features containing new discriminating information led to improved classification results.

  4. A Novel Pre-Processing Technique for Original Feature Matrix of Electronic Nose Based on Supervised Locality Preserving Projections

    Directory of Open Access Journals (Sweden)

    Pengfei Jia

    2016-06-01

    Full Text Available An electronic nose (E-nose) consisting of 14 metal oxide gas sensors and one electronic chemical gas sensor has been constructed to identify four different classes of wound infection. However, the classification results of the E-nose are not ideal if the original feature matrix containing the maximum steady-state response value of the sensors is processed by the classifier directly, so a novel pre-processing technique based on supervised locality preserving projections (SLPP) is proposed in this paper to process the original feature matrix before it is put into the classifier, in order to improve the performance of the E-nose. SLPP is good at finding and keeping the nonlinear structure of data; furthermore, it can provide an explicit mapping expression which is unreachable by the traditional manifold learning methods. Additionally, some effective optimization methods are used to optimize the parameters of SLPP and the classifier. Experimental results prove that the classification accuracy of a support vector machine (SVM) combined with the data pre-processed by SLPP outperforms the other considered methods. All results make it clear that SLPP has a better performance in processing the original feature matrix of the E-nose.

  5. Identification and Analysis of Driver Missense Mutations Using Rotation Forest with Feature Selection

    Directory of Open Access Journals (Sweden)

    Xiuquan Du

    2014-01-01

    Full Text Available Identifying cancer-associated mutations (driver mutations) is critical for understanding the cellular function of a cancer genome that leads to activation of oncogenes or inactivation of tumor suppressor genes. Many approaches have been proposed which use supervised machine learning techniques for prediction, with features obtained from several databases. However, we often do not know which features are important for driver mutation prediction. In this study, we propose a novel feature selection method (called DX) over a set of 126 candidate features. In order to obtain the best performance, the rotation forest algorithm was adopted to perform the experiments. On the training dataset, which was collected from the COSMIC and Swiss-Prot databases, we were able to obtain high prediction performance with 88.03% accuracy, 93.9% precision, and 81.35% recall when the 11 top-ranked features were used. Comparison with other various techniques on the TP53, EGFR, and Cosmic2plus datasets shows the generality of our method.

  6. Principal Feature Analysis: A Multivariate Feature Selection Method for fMRI Data

    Directory of Open Access Journals (Sweden)

    Lijun Wang

    2013-01-01

    Full Text Available Brain decoding with functional magnetic resonance imaging (fMRI) requires analysis of complex, multivariate data. Multivoxel pattern analysis (MVPA) has been widely used in recent years. MVPA treats the activation of multiple voxels from fMRI data as a pattern and decodes brain states using pattern classification methods. Feature selection is a critical procedure of MVPA because it decides which features will be included in the classification analysis of fMRI data, thereby improving the performance of the classifier. Features can be selected by limiting the analysis to specific anatomical regions or by computing univariate (voxel-wise) or multivariate statistics. However, these methods either discard some informative features or select features with redundant information. This paper introduces the principal feature analysis as a novel multivariate feature selection method for fMRI data processing. This multivariate approach aims to remove features with redundant information, thereby selecting fewer features, while retaining the most information.
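
    Principal feature analysis is commonly described as PCA on the features followed by clustering of the loading vectors and picking one representative feature per cluster; the sketch below follows that generic recipe and omits any fMRI-specific details, so the component counts and function name are assumptions.

    ```python
    # Generic principal-feature-analysis recipe: cluster the PCA loading vectors
    # and keep one representative feature per cluster to reduce redundancy.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    def principal_feature_analysis(X, n_components=10, n_features=10):
        pca = PCA(n_components=n_components).fit(X)
        # Row j of the loading matrix describes feature j in the principal subspace.
        loadings = pca.components_.T            # shape (n_original_features, n_components)
        km = KMeans(n_clusters=n_features, n_init=10).fit(loadings)
        selected = []
        for c in range(n_features):
            members = np.where(km.labels_ == c)[0]
            # Keep the feature whose loading vector is closest to the cluster centre,
            # so co-clustered (redundant) features contribute only one representative.
            d = np.linalg.norm(loadings[members] - km.cluster_centers_[c], axis=1)
            selected.append(int(members[np.argmin(d)]))
        return sorted(selected)
    ```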

  7. NEW FEATURE SELECTION METHOD IN MACHINE FAULT DIAGNOSIS

    Institute of Scientific and Technical Information of China (English)

    Wang Xinfeng; Qiu Jing; Liu Guanjun

    2005-01-01

    Aiming at the deficiencies of the filter and wrapper feature selection methods, a new method based on a composite of the filter and wrapper methods is proposed. First the method filters the original features to form a feature subset which can meet the required classification correctness rate, then it applies a wrapper feature selection method to select the optimal feature subset. A successful technique for solving optimization problems is given by the genetic algorithm (GA), and GA is applied to the problem of optimal feature selection. The composite method saves several times the computing time of the wrapper method while holding the classification accuracy, in both data simulation and an experiment on bearing fault feature selection. This method therefore possesses an excellent optimization property, can save considerable selection time, and has the characteristics of high accuracy and high efficiency.

  8. Supervised multi-view canonical correlation analysis (sMVCCA): integrating histologic and proteomic features for predicting recurrent prostate cancer.

    Science.gov (United States)

    Lee, George; Singanamalli, Asha; Wang, Haibo; Feldman, Michael D; Master, Stephen R; Shih, Natalie N C; Spangler, Elaine; Rebbeck, Timothy; Tomaszewski, John E; Madabhushi, Anant

    2015-01-01

    In this work, we present a new methodology to facilitate prediction of recurrent prostate cancer (CaP) following radical prostatectomy (RP) via the integration of quantitative image features and protein expression in the excised prostate. Creating a fused predictor from high-dimensional data streams is challenging because the classifier must 1) account for the "curse of dimensionality" problem, which hinders classifier performance when the number of features exceeds the number of patient studies, and 2) balance potential mismatches in the number of features across different channels to avoid classifier bias towards channels with more features. Our new data integration methodology, supervised Multi-view Canonical Correlation Analysis (sMVCCA), aims to integrate infinite views of high-dimensional data to provide more amenable data representations for disease classification. Additionally, we demonstrate sMVCCA using Spearman's rank correlation which, unlike Pearson's correlation, can account for nonlinear correlations and outliers. Forty CaP patients with pathological Gleason scores 6-8 were considered for this study. 21 of these men revealed biochemical recurrence (BCR) following RP, while 19 did not. For each patient, 189 quantitative histomorphometric attributes and 650 protein expression levels were extracted from the primary tumor nodule. The fused histomorphometric/proteomic representation via sMVCCA combined with a random forest classifier predicted BCR with a mean AUC of 0.74 and a maximum AUC of 0.9286. We found sMVCCA to perform statistically significantly better than state-of-the-art data fusion strategies for predicting BCR. Furthermore, Kaplan-Meier analysis demonstrated improved BCR-free survival prediction for the sMVCCA-fused classifier as compared to histology or proteomic features alone.

  9. A curriculum-based approach for feature selection

    Science.gov (United States)

    Kalavala, Deepthi; Bhagvati, Chakravarthy

    2017-06-01

    Curriculum learning is a learning technique in which a classifier learns from easy samples first and then from increasingly difficult samples. On similar lines, a curriculum-based feature selection framework is proposed for identifying the most useful features in a dataset. Given a dataset, first, easy and difficult samples are identified. In general, the number of easy samples is assumed to be larger than the number of difficult samples. Then, feature selection is done in two stages. In the first stage a fast feature selection method which gives feature scores is used. The feature scores are then updated incrementally with the set of difficult samples. The existing feature selection methods are not incremental in nature; the entire data needs to be used in feature selection. The use of curriculum learning is expected to decrease the time needed for feature selection with classification accuracy comparable to the existing methods. Curriculum learning also allows incremental refinements in feature selection as new training samples become available. Our experiments on a number of standard datasets demonstrate that feature selection is indeed faster without sacrificing classification accuracy.
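
    A sketch of the incremental scoring idea: accumulate per-class sufficient statistics on the easy samples first and update them as batches of difficult samples arrive, deriving a Fisher-style score at any point. The class name, the specific statistic, and the update scheme are assumptions, not the paper's exact formulation.

    ```python
    # Incremental feature scoring: call partial_fit first with easy samples,
    # then with batches of difficult samples; scores() can be read at any time.
    import numpy as np

    class CurriculumScorer:
        def __init__(self, n_features, classes):
            self.classes = list(classes)
            self.count = {c: 0 for c in self.classes}
            self.sum = {c: np.zeros(n_features) for c in self.classes}
            self.sumsq = {c: np.zeros(n_features) for c in self.classes}

        def partial_fit(self, X, y):
            # Accumulate per-class sufficient statistics (counts, sums, squares).
            for c in self.classes:
                Xc = X[y == c]
                self.count[c] += len(Xc)
                self.sum[c] += Xc.sum(0)
                self.sumsq[c] += (Xc ** 2).sum(0)

        def scores(self):
            means = {c: self.sum[c] / max(self.count[c], 1) for c in self.classes}
            var = {c: self.sumsq[c] / max(self.count[c], 1) - means[c] ** 2 for c in self.classes}
            grand = np.mean([means[c] for c in self.classes], axis=0)
            between = np.sum([(means[c] - grand) ** 2 for c in self.classes], axis=0)
            within = np.sum([var[c] for c in self.classes], axis=0)
            return between / (within + 1e-12)   # higher = more useful feature
    ```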

  10. Feature selection with neighborhood entropy-based cooperative game theory.

    Science.gov (United States)

    Zeng, Kai; She, Kun; Niu, Xinzheng

    2014-01-01

    Feature selection plays an important role in machine learning and data mining. In recent years, various feature measurements have been proposed to select significant features from high-dimensional datasets. However, most traditional feature selection methods will ignore some features which have strong classification ability as a group but are weak as individuals. To deal with this problem, we redefine the redundancy, interdependence, and independence of features by using neighborhood entropy. Then the neighborhood entropy-based feature contribution is proposed under the framework of cooperative game theory. The evaluative criteria of features can be formalized as the product of contribution and other classical feature measures. Finally, the proposed method is tested on several UCI datasets. The results show that the neighborhood entropy-based cooperative game theory model (NECGT) yields better performance than classical ones.

  11. An ensemble approach for feature selection of Cyber Attack Dataset

    CERN Document Server

    Singh, Shailendra

    2009-01-01

    Feature selection is an indispensable preprocessing step when mining huge datasets and can significantly improve overall system performance. In this paper we therefore focus on a hybrid approach to feature selection. The method falls into two phases. The filter phase selects the features with the highest information gain and guides the initialization of the search process for the wrapper phase, whose output is the final feature subset. The final feature subsets are passed through a K-nearest neighbor classifier for classification of attacks. The effectiveness of this algorithm is demonstrated on the DARPA KDDCUP99 cyber attack dataset.
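
    A minimal sketch of the filter-then-wrapper pattern described above, under stated assumptions: scikit-learn's mutual information score stands in for information gain, its greedy SequentialFeatureSelector stands in for the paper's wrapper search, and a small built-in dataset replaces the DARPA KDDCUP99 traffic.

```python
# Filter phase (information-gain-style scores) followed by a wrapper phase
# around a k-nearest-neighbour classifier.  The digits dataset replaces the
# DARPA KDDCUP99 traffic and the greedy SequentialFeatureSelector replaces the
# paper's wrapper search.
from sklearn.datasets import load_digits
from sklearn.feature_selection import (SelectKBest, SequentialFeatureSelector,
                                        mutual_info_classif)
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)

pipe = make_pipeline(
    SelectKBest(mutual_info_classif, k=30),                         # filter phase
    SequentialFeatureSelector(knn, n_features_to_select=10, cv=3),  # wrapper phase
    knn,
)
print("accuracy:", cross_val_score(pipe, X, y, cv=3).mean())
```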

  12. Geochemical dynamics in selected Yellowstone hydrothermal features

    Science.gov (United States)

    Druschel, G.; Kamyshny, A.; Findlay, A.; Nuzzio, D.

    2010-12-01

    Yellowstone National Park has a wide diversity of thermal features, including springs with a range of pH conditions that significantly impact sulfur speciation. We have utilized a combination of voltammetric and spectroscopic techniques to characterize the intermediate sulfur chemistry of Cinder Pool, Evening Primrose, Ojo Caliente, Frying Pan, Azure, and Dragon thermal springs. These measurements have additionally demonstrated the geochemical dynamics inherent in these systems; significant variability in chemical speciation occurs in many of these thermal features due to changes in gas supply rates, fluid discharge rates, and thermal differences that occur on time scales of seconds. The dynamics of the geochemical settings shown may significantly impact how microorganisms interact with the sulfur forms in these systems.

  13. A Study on Feature Selection Techniques in Educational Data Mining

    CERN Document Server

    Ramaswami, M

    2009-01-01

    Educational data mining (EDM) is a new and growing research area in which the essence of data mining concepts is used in the educational field for the purpose of extracting useful information on the behavior of students in the learning process. In EDM, feature selection is performed to generate a subset of candidate variables. As feature selection influences the predictive accuracy of any performance model, it is essential to study in detail the effectiveness of student performance models in connection with feature selection techniques. In this connection, the present study is devoted not only to investigating the most relevant subset of features with minimum cardinality for achieving high predictive performance by adopting various filter feature selection techniques in data mining, but also to evaluating the goodness of subsets with different cardinalities and the quality of six filter feature selection algorithms in terms of F-measure value and Receiver Operating Characteristics (ROC) value, generat...

  14. Aptamers overview: selection, features and applications.

    Science.gov (United States)

    Hernandez, Luiza I; Machado, Isabel; Schafer, Thomas; Hernandez, Frank J

    2015-01-01

    Aptamer technology has been around for a quarter of a century and the field has matured enough to start seeing real applications, especially in the medical field. Since their discovery, aptamers have rapidly emerged as key players in many fields, such as diagnostics, drug discovery, food science, drug delivery and therapeutics. Because of their synthetic nature, aptamers are evolving at an exponential rate, benefiting from the newest advances in chemistry, nanotechnology, biology and medicine. This review is meant to give an overview of the aptamer field, by including general aspects of aptamer identification and applications as well as highlighting certain features that contribute to their quick deployment in the biomedical field.

  15. Bayesian feature selection to estimate customer survival

    OpenAIRE

    Figini, Silvia; Giudici, Paolo; Brooks, S P

    2006-01-01

    We consider the problem of estimating the lifetime value of customers when a large number of features are present in the data. In order to measure lifetime value we use survival analysis models to estimate customer tenure. In such a context, a number of classical modelling challenges arise. We will show how our proposed Bayesian methods perform, and compare them with classical churn models on a real case study. More specifically, based on data from a media service company, our aim will be to p...

  16. FEATURE SELECTION USING GENETIC ALGORITHMS FOR HANDWRITTEN CHARACTER RECOGNITION

    NARCIS (Netherlands)

    Kim, G.; Kim, S.

    2004-01-01

    A feature selection method using genetic algorithms, which are a suitable means for selecting an appropriate set of features from ones of huge dimension, is proposed. SGA (Simple Genetic Algorithm) and its modified versions are applied to improve the recognition speed as well as the recognition accuracy.

  17. High Dimensional Data Clustering Using Fast Cluster Based Feature Selection

    Directory of Open Access Journals (Sweden)

    Karthikeyan.P

    2014-03-01

    Full Text Available Feature selection involves identifying a subset of the most useful features that produces compatible results as the original entire set of features. A feature selection algorithm may be evaluated from both the efficiency and effectiveness points of view. While the efficiency concerns the time required to find a subset of features, the effectiveness is related to the quality of the subset of features. Based on these criteria, a fast clustering-based feature selection algorithm (FAST) is proposed and experimentally evaluated in this paper. The FAST algorithm works in two steps. In the first step, features are divided into clusters by using graph-theoretic clustering methods. In the second step, the most representative feature that is strongly related to target classes is selected from each cluster to form a subset of features. Features in different clusters are relatively independent; the clustering-based strategy of FAST has a high probability of producing a subset of useful and independent features. To ensure the efficiency of FAST, we adopt the efficient minimum spanning tree (MST) clustering method using Kruskal's algorithm. The efficiency and effectiveness of the FAST algorithm are evaluated through an empirical study.
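
    The sketch below only approximates the FAST procedure: absolute Pearson correlation replaces the symmetric-uncertainty measure, the MST is cut at an ad hoc threshold, and the per-cluster representative is taken as the feature most correlated with the class label.

```python
# FAST-like sketch: feature graph -> minimum spanning tree -> cut long edges to
# form clusters -> keep one representative per cluster.  Absolute correlation
# replaces the symmetric-uncertainty measure, and the cut threshold is ad hoc.
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

# Edge weight = 1 - |correlation| between features (smaller = more related).
dist = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
np.fill_diagonal(dist, 0.0)

mst = minimum_spanning_tree(dist).toarray()
edges = mst[mst > 0]
mst[mst > edges.mean()] = 0.0                      # cut the long (weak) edges
n_clusters, labels = connected_components(mst, directed=False)

# Representative per cluster: the feature most correlated with the class label.
relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])
selected = [int(np.argmax(np.where(labels == c, relevance, -1.0)))
            for c in range(n_clusters)]
print("selected features:", sorted(selected))
```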

  18. Feature Selection Criteria for Real Time EKF-SLAM Algorithm

    Directory of Open Access Journals (Sweden)

    Fernando Auat Cheein

    2010-02-01

    Full Text Available This paper presents a selection procedure for environment features for the correction stage of a SLAM (Simultaneous Localization and Mapping) algorithm based on an Extended Kalman Filter (EKF). This approach decreases the computational time of the correction stage, which allows for real-time and constant-time implementations of the SLAM. The selection procedure consists in choosing the features the SLAM system state covariance is most sensitive to. The entire system is implemented on a mobile robot equipped with a laser range sensor. The features extracted from the environment correspond to lines and corners. Experimental results of the real-time SLAM algorithm and an analysis of the processing time consumed by the SLAM with the proposed feature selection procedure are shown. A comparison between the proposed feature selection approach and the classical sequential EKF-SLAM, along with an entropy-based feature selection approach, is also performed.

  19. Simultaneous Channel and Feature Selection of Fused EEG Features Based on Sparse Group Lasso

    Directory of Open Access Journals (Sweden)

    Jin-Jia Wang

    2015-01-01

    Full Text Available Feature extraction and classification of EEG signals are core parts of brain computer interfaces (BCIs). Due to the high dimension of the EEG feature vector, an effective feature selection algorithm has become an integral part of research studies. In this paper, we present a new method based on a wrapped Sparse Group Lasso for channel and feature selection of fused EEG signals. The high-dimensional fused features are first obtained; they include the power spectrum, time-domain statistics, AR model, and wavelet coefficient features extracted from the preprocessed EEG signals. The wrapped channel and feature selection method is then applied, which uses a logistic regression model with a Sparse Group Lasso penalty function. The model is fitted on the training data, and parameter estimation is obtained by modified blockwise coordinate descent and coordinate gradient descent methods. The best parameters and feature subset are selected by using 10-fold cross-validation. Finally, the test data are classified using the trained model. Compared with existing channel and feature selection methods, results show that the proposed method is more suitable, more stable, and faster for high-dimensional feature fusion. It can simultaneously achieve channel and feature selection with a lower error rate. The test accuracy on the data used from the international BCI Competition IV reached 84.72%.

  20. Simultaneous channel and feature selection of fused EEG features based on Sparse Group Lasso.

    Science.gov (United States)

    Wang, Jin-Jia; Xue, Fang; Li, Hui

    2015-01-01

    Feature extraction and classification of EEG signals are core parts of brain computer interfaces (BCIs). Due to the high dimension of the EEG feature vector, an effective feature selection algorithm has become an integral part of research studies. In this paper, we present a new method based on a wrapped Sparse Group Lasso for channel and feature selection of fused EEG signals. The high-dimensional fused features are first obtained; they include the power spectrum, time-domain statistics, AR model, and wavelet coefficient features extracted from the preprocessed EEG signals. The wrapped channel and feature selection method is then applied, which uses a logistic regression model with a Sparse Group Lasso penalty function. The model is fitted on the training data, and parameter estimation is obtained by modified blockwise coordinate descent and coordinate gradient descent methods. The best parameters and feature subset are selected by using 10-fold cross-validation. Finally, the test data are classified using the trained model. Compared with existing channel and feature selection methods, results show that the proposed method is more suitable, more stable, and faster for high-dimensional feature fusion. It can simultaneously achieve channel and feature selection with a lower error rate. The test accuracy on the data used from the international BCI Competition IV reached 84.72%.
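
    Since common toolkits do not ship a Sparse Group Lasso, the sketch below implements the penalty directly with a proximal-gradient loop on a logistic loss. The group layout, penalty weights and synthetic data are placeholders; the paper's blockwise coordinate descent fit and cross-validated parameter search are not reproduced.

```python
# Proximal-gradient sketch of Sparse Group Lasso penalised logistic regression.
# Groups play the role of EEG channels; data, group layout and penalty weights
# are synthetic placeholders (the paper's coordinate-descent fit and CV search
# are not reproduced).
import numpy as np

rng = np.random.default_rng(0)
n, n_groups, group_size = 200, 10, 8              # e.g. 10 "channels" x 8 features
p = n_groups * group_size
X = rng.normal(size=(n, p))
w_true = np.zeros(p)
w_true[:group_size] = 1.0                         # only the first group matters
y = (X @ w_true > 0).astype(float)
groups = np.repeat(np.arange(n_groups), group_size)

def prox_sgl(w, t, lam1, lam2):
    """Prox of lam1*||w||_1 + lam2*sum_g ||w_g||_2: soft-threshold, then group shrink."""
    w = np.sign(w) * np.maximum(np.abs(w) - t * lam1, 0.0)
    for g in range(n_groups):
        idx = groups == g
        norm = np.linalg.norm(w[idx])
        if norm > 0:
            w[idx] *= max(0.0, 1.0 - t * lam2 / norm)
    return w

w, step, lam1, lam2 = np.zeros(p), 0.01, 0.02, 0.05
for _ in range(2000):
    grad = X.T @ (1.0 / (1.0 + np.exp(-X @ w)) - y) / n   # logistic loss gradient
    w = prox_sgl(w - step * grad, step, lam1, lam2)

kept = sorted({int(g) for g in groups[np.abs(w) > 1e-6]})
print("groups (channels) kept:", kept)
```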

  1. Evaluation of Feature Selection Approaches for Urdu Text Categorization

    Directory of Open Access Journals (Sweden)

    Tehseen Zia

    2015-05-01

    Full Text Available Efficient feature selection is an important phase of designing an effective text categorization system. Various feature selection methods have been proposed for selecting dissimilar feature sets. It is often essential to evaluate which method is more effective for a given task and what size of feature set is an effective model selection choice. The aim of this paper is to answer these questions for designing an Urdu text categorization system. Five widely used feature selection methods were examined using six well-known classification algorithms: naive Bayes (NB), k-nearest neighbor (KNN), support vector machines (SVM) with linear, polynomial and radial basis kernels, and a decision tree (J48). The study was conducted over two test collections: the EMILLE collection and a naive collection. We have observed that three feature selection methods, i.e. information gain, Chi statistics, and symmetric uncertainty, performed uniformly in most if not all of the cases. Moreover, we have found that no single feature selection method is best for all classifiers. While gain ratio outperformed the others for naive Bayes and J48, information gain showed top performance for KNN and SVM with polynomial and radial basis kernels. Overall, linear SVM with any of the feature selection methods, including information gain, Chi statistics or symmetric uncertainty, turned out to be the first choice across other combinations of classifiers and feature selection methods on the moderately sized naive collection. On the other hand, naive Bayes with any of the feature selection methods showed its advantage for the small EMILLE corpus.
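
    The Urdu EMILLE and naive collections are not publicly bundled with common toolkits, so the sketch below reproduces only the shape of the comparison: two filter methods crossed with two classifiers on a 20 Newsgroups subset, with chi-squared and mutual information standing in for the filters studied in the paper.

```python
# Two filter methods crossed with two classifiers, mirroring the shape of the
# comparison above.  fetch_20newsgroups downloads a public corpus that stands
# in for the Urdu collections; chi2 and mutual information stand in for the
# filter methods studied in the paper.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

data = fetch_20newsgroups(subset="train",
                          categories=["sci.med", "sci.space", "rec.autos"],
                          remove=("headers", "footers", "quotes"))

for fs_name, fs in [("chi2", chi2), ("mutual_info", mutual_info_classif)]:
    for clf_name, clf in [("NB", MultinomialNB()), ("linear SVM", LinearSVC())]:
        pipe = make_pipeline(TfidfVectorizer(), SelectKBest(fs, k=1000), clf)
        score = cross_val_score(pipe, data.data, data.target, cv=3).mean()
        print(f"{fs_name:12s} + {clf_name}: {score:.3f}")
```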

  2. A Features Selection for Crops Classification

    Science.gov (United States)

    Zhao, Lei; Chen, Erxue; Li, Zengyuan; Li, Lan; Gu, Xinzhi

    2016-08-01

    Polarization orientation angle (POA) is a major parameter of an electromagnetic wave. This angle shifts due to azimuth slopes, which affects the radiometric quality of PolSAR data. Under the assumption of a reflection-symmetrical medium, the polarization orientation angle shift (POAS) can be estimated by the Circular Polarization Method (CPM). The shift angle can then be used to compensate PolSAR data or to extract DEM information. However, it is less effective when using high-frequency SAR (L-, C-band) in forest areas. The main reason is that the polarization orientation angle shift of a forest area is influenced not only by topography but also by the forest canopy. The influence of the former is interference that should be removed, while the latter carries polarization feature information that needs to be retained. ALOS2 PALSAR2 L-band full polarimetric SAR data were used in this study. Based on the Circular Polarization and DEM-based methods, we analyzed the variation of the polarization orientation angle shift and developed polarization orientation shift estimation and compensation for PolSAR data in forest areas.

  3. Feature selection using genetic algorithms for fetal heart rate analysis.

    Science.gov (United States)

    Xu, Liang; Redman, Christopher W G; Payne, Stephen J; Georgieva, Antoniya

    2014-07-01

    The fetal heart rate (FHR) is monitored on a paper strip (cardiotocogram) during labour to assess fetal health. If necessary, clinicians can intervene and assist with a prompt delivery of the baby. Data-driven computerized FHR analysis could help clinicians in the decision-making process. However, selecting the best computerized FHR features that relate to labour outcome is a pressing research problem. The objective of this study is to apply genetic algorithms (GA) as a feature selection method to select the best feature subset from 64 FHR features and to integrate these best features to recognize unfavourable FHR patterns. The GA was trained on 404 cases and tested on 106 cases (both balanced datasets) using three classifiers, respectively. Regularization methods and backward selection were used to optimize the GA. Reasonable classification performance is shown on the testing set for the best feature subset (Cohen's kappa values of 0.45 to 0.49 using different classifiers). This is, to our knowledge, the first time that a feature selection method for FHR analysis has been developed on a database of this size. This study indicates that different FHR features, when integrated, can show good performance in predicting labour outcome. It also gives the importance of each feature, which will be a valuable reference point for further studies.

  4. Hadoop neural network for parallel and distributed feature selection.

    Science.gov (United States)

    Hodge, Victoria J; O'Keefe, Simon; Austin, Jim

    2016-06-01

    In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel), allowing multiple feature selectors to be compared. We identify commonalities among the five feature selectors. All can be processed in the framework using a single representation, and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector and the actual features to select to be identified for large and high-dimensional data sets by exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  5. Feature Selection for Neural Network Based Stock Prediction

    Science.gov (United States)

    Sugunnasil, Prompong; Somhom, Samerkae

    We propose a new methodology of feature selection for stock movement prediction. The methodology is based upon finding those features which minimize the correlation relation function. We first produce all combinations of features and evaluate each of them by using our evaluation function. We search through the generated set with a hill climbing approach. A self-organizing map based stock prediction model is utilized as the prediction method. We conduct experiments on data sets of the Microsoft Corporation, General Electric Co. and Ford Motor Co. The results show that our feature selection method can improve the efficiency of neural network based stock prediction.
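
    The sketch below shows the generic hill-climbing wrapper this abstract describes, under stated substitutions: cross-validated accuracy of a logistic regression replaces the correlation relation function and the SOM-based predictor, and a built-in dataset replaces the stock data.

```python
# Generic hill-climbing wrapper: start from a random feature mask and greedily
# flip single features while the objective improves.  Cross-validated accuracy
# of a logistic regression stands in for the correlation relation function and
# the SOM-based predictor; the wine dataset stands in for the stock data.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)
rng = np.random.default_rng(0)
n_features = X.shape[1]

def score(mask):
    if not mask.any():
        return 0.0
    return cross_val_score(LogisticRegression(max_iter=5000), X[:, mask], y, cv=5).mean()

mask = rng.random(n_features) < 0.5              # random starting subset
best = score(mask)
improved = True
while improved:
    improved = False
    for j in range(n_features):                  # try flipping each feature
        candidate = mask.copy()
        candidate[j] = ~candidate[j]
        s = score(candidate)
        if s > best:
            mask, best, improved = candidate, s, True

print("selected features:", np.flatnonzero(mask), "score:", round(best, 3))
```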

  6. Selective attention to temporal features on nested time scales.

    Science.gov (United States)

    Henry, Molly J; Herrmann, Björn; Obleser, Jonas

    2015-02-01

    Meaningful auditory stimuli such as speech and music often vary simultaneously along multiple time scales. Thus, listeners must selectively attend to, and selectively ignore, separate but intertwined temporal features. The current study aimed to identify and characterize the neural network specifically involved in this feature-selective attention to time. We used a novel paradigm where listeners judged either the duration or modulation rate of auditory stimuli, and in which the stimulation, working memory demands, response requirements, and task difficulty were held constant. A first analysis identified all brain regions where individual brain activation patterns were correlated with individual behavioral performance patterns, which thus supported temporal judgments generically. A second analysis then isolated those brain regions that specifically regulated selective attention to temporal features: Neural responses in a bilateral fronto-parietal network including insular cortex and basal ganglia decreased with degree of change of the attended temporal feature. Critically, response patterns in these regions were inverted when the task required selectively ignoring this feature. The results demonstrate how the neural analysis of complex acoustic stimuli with multiple temporal features depends on a fronto-parietal network that simultaneously regulates the selective gain for attended and ignored temporal features.

  7. A New Evolutionary-Incremental Framework for Feature Selection

    Directory of Open Access Journals (Sweden)

    Mohamad-Hoseyn Sigari

    2014-01-01

    Full Text Available Feature selection is an NP-hard problem from the viewpoint of algorithm design and it is one of the main open problems in pattern recognition. In this paper, we propose a new evolutionary-incremental framework for feature selection. The proposed framework can be applied to an ordinary evolutionary algorithm (EA) such as a genetic algorithm (GA) or invasive weed optimization (IWO). The framework proposes some generic modifications to ordinary EAs to make them compatible with variable-length solutions. In this framework, the solutions in the early generations have short length. Then, the length of solutions may be increased gradually through the generations. In addition, our evolutionary-incremental framework deploys two new operators, called addition and deletion operators, which change the length of solutions randomly. For evaluation, we use the proposed framework for feature selection in a face recognition application. In this regard, we applied our feature selection method to a robust face recognition algorithm which is based on the extraction of Gabor coefficients. Experimental results show that our proposed evolutionary-incremental framework can efficiently select a small number of features from thousands of existing features. Comparison of the proposed method with previous methods shows that our framework is comprehensive, robust, and well-defined for application to many EAs for feature selection.

  8. Feature Selection for Image Retrieval based on Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Preeti Kushwaha

    2016-12-01

    Full Text Available This paper describes the development and implementation of feature selection for content based image retrieval. We are working on a CBIR system with a new, efficient technique. In this system, we use multi-feature extraction covering colour, texture and shape. Three techniques are used for feature extraction: colour moments, the gray level co-occurrence matrix and the edge histogram descriptor. To reduce the curse of dimensionality and find the best features from the feature set, feature selection based on a genetic algorithm is used. These features are divided into similar image classes using clustering for fast retrieval and improved execution time. Clustering is done by the k-means algorithm. The experimental results show that feature selection using GA reduces the retrieval time and also increases the retrieval precision, thus giving better and faster results compared to a normal image retrieval system. The results also show the precision and recall of the proposed approach compared to the previous approach for each image class. The CBIR system is more efficient and performs better using feature selection based on a genetic algorithm.

  9. Feature-selective attention in healthy old age: a selective decline in selective attention?

    Science.gov (United States)

    Quigley, Cliodhna; Müller, Matthias M

    2014-02-12

    Deficient selection against irrelevant information has been proposed to underlie age-related cognitive decline. We recently reported evidence for maintained early sensory selection when older and younger adults used spatial selective attention to perform a challenging task. Here we explored age-related differences when spatial selection is not possible and feature-selective attention must be deployed. We additionally compared the integrity of feedforward processing by exploiting the well established phenomenon of suppression of visual cortical responses attributable to interstimulus competition. Electroencephalogram was measured while older and younger human adults responded to brief occurrences of coherent motion in an attended stimulus composed of randomly moving, orientation-defined, flickering bars. Attention was directed to horizontal or vertical bars by a pretrial cue, after which two orthogonally oriented, overlapping stimuli or a single stimulus were presented. Horizontal and vertical bars flickered at different frequencies and thereby elicited separable steady-state visual-evoked potentials, which were used to examine the effect of feature-based selection and the competitive influence of a second stimulus on ongoing visual processing. Age differences were found in feature-selective attentional modulation of visual responses: older adults did not show consistent modulation of magnitude or phase. In contrast, the suppressive effect of a second stimulus was robust and comparable in magnitude across age groups, suggesting that bottom-up processing of the current stimuli is essentially unchanged in healthy old age. Thus, it seems that visual processing per se is unchanged, but top-down attentional control is compromised in older adults when space cannot be used to guide selection.

  10. Lazy learner text categorization algorithm based on embedded feature selection

    Institute of Scientific and Technical Information of China (English)

    Yan Peng; Zheng Xuefeng; Zhu Jianyong; Xiao Yunhong

    2009-01-01

    To avoid the curse of dimensionality, text categorization (TC) algorithms based on machine learning (ML) have to use a feature selection (FS) method to reduce the dimensionality of the feature space. Although widely used, the FS process generally causes information loss and thus has side effects on the overall performance of TC algorithms. On the basis of the sparsity characteristic of text vectors, a new TC algorithm based on lazy feature selection (LFS) is presented. As a new type of embedded feature selection approach, the LFS method can greatly reduce the dimension of features without any information loss, which improves both the efficiency and the performance of the algorithm. Experiments show the new algorithm simultaneously achieves much higher performance and efficiency than several classical TC algorithms.

  11. Features Selection for Skin Micro-Image Symptomatic Recognition

    Institute of Scientific and Technical Information of China (English)

    HU Yue-li; CAO Jia-lin; ZHAO Qian; FENG Xu

    2004-01-01

    Automatic recognition of skin micro-image symptoms is important in skin diagnosis and treatment. Feature selection is used to improve the classification performance of skin micro-image symptom recognition. This paper proposes a hybrid approach based on the support vector machine (SVM) technique and a genetic algorithm (GA) to select an optimal feature subset from the feature group extracted from the skin micro-images. An adaptive GA is introduced to maintain the convergence rate. With the proposed method, the average cross-validation accuracy is increased from 88.25% using all features to 96.92% using only the selected features provided by a classifier for classification of 5 classes of skin symptoms. The experimental results are satisfactory.

  13. Feature selection for domain knowledge representation through multitask learning

    CSIR Research Space (South Africa)

    Rosman, Benjamin S

    2014-10-01

    Full Text Available Representation learning is a difficult and important problem...

  14. Kollegial supervision

    DEFF Research Database (Denmark)

    Andersen, Ole Dibbern; Petersson, Erling

    The publication examines how collegial supervision can be organised in an educational institution...

  15. Optimized Image Steganalysis through Feature Selection using MBEGA

    CERN Document Server

    Geetha, S

    2010-01-01

    Feature based steganalysis, an emerging branch in information forensics, aims at identifying the presence of a covert communication by employing the statistical features of the cover and stego image as clues/evidence. Due to the large volumes of security audit data as well as the complex and dynamic properties of steganogram behaviours, optimizing the performance of steganalysers has become an important open problem. This paper focuses on fine-tuning the performance of six promising steganalysers in this field through feature selection. We propose to employ the Markov Blanket-Embedded Genetic Algorithm (MBEGA) for the stego-sensitive feature selection process. In particular, the embedded Markov blanket based memetic operators add or delete features (or genes) from a genetic algorithm (GA) solution so as to quickly improve the solution and fine-tune the search. Empirical results suggest that MBEGA is effective and efficient in eliminating irrelevant and redundant features based on both Markov blanket and predictive pow...

  16. Modeling Suspicious Email Detection using Enhanced Feature Selection

    OpenAIRE

    2013-01-01

    The paper presents a suspicious email detection model which incorporates enhanced feature selection. In the paper we propose the use of feature selection strategies along with classification techniques for terrorist email detection. The presented model focuses on the evaluation of machine learning algorithms such as decision tree (ID3), logistic regression, Naïve Bayes (NB), and Support Vector Machine (SVM) for detecting emails containing suspicious content. In the literature, various algo...

  17. Ensemble feature selection integrating elitist roles and quantum game model

    Institute of Scientific and Technical Information of China (English)

    Weiping Ding; Jiandong Wang; Zhijin Guan; Quan Shi

    2015-01-01

    To accelerate the selection process of feature subsets in the rough set theory (RST), an ensemble elitist roles based quantum game (EERQG) algorithm is proposed for feature selection. Firstly, the multilevel elitist roles based dynamics equilibrium strategy is established, and both immigration and emigration of elitists are able to be self-adaptive to balance between exploration and exploitation for feature selection. Secondly, the utility matrix of trust margins is introduced to the model of multilevel elitist roles to enhance various elitist roles' performance of searching the optimal feature subsets, and the win-win utility solutions for feature selection can be attained. Meanwhile, a novel ensemble quantum game strategy is designed as an intriguing exhibiting structure to perfect the dynamics equilibrium of multilevel elitist roles. Finally, the ensemble manner of multilevel elitist roles is employed to achieve the global minimal feature subset, which will greatly improve the feasibility and effectiveness. Experiment results show the proposed EERQG algorithm has superiority compared to the existing feature selection algorithms.

  18. Effective Feature Selection for 5G IM Applications Traffic Classification

    Directory of Open Access Journals (Sweden)

    Muhammad Shafiq

    2017-01-01

    Full Text Available Recently, machine learning (ML) algorithms have been widely applied in Internet traffic classification. However, due to inappropriate feature selection, ML-based classifiers are prone to misclassify Internet flows, as such traffic occupies the majority of traffic flows. To address this problem, a novel feature selection metric named weighted mutual information (WMI) is proposed. We develop a hybrid feature selection algorithm named WMI_ACC, which filters most of the features with the WMI metric. It further uses a wrapper method to select features for ML classifiers with an accuracy (ACC) metric. We evaluate our approach using five ML classifiers on traces captured from two different network environments. Furthermore, we also apply the Wilcoxon pairwise statistical test to the results of our proposed algorithm to find the robust features from the selected set of features. Experimental results show that our algorithm gives promising results in terms of classification accuracy, recall, and precision. Our proposed algorithm can achieve 99% flow accuracy, which is very promising.

  19. Feature selection for optimized skin tumor recognition using genetic algorithms.

    Science.gov (United States)

    Handels, H; Ross, T; Kreusch, J; Wolff, H H; Pöppl, S J

    1999-07-01

    In this paper, a new approach to computer supported diagnosis of skin tumors in dermatology is presented. High resolution skin surface profiles are analyzed to recognize malignant melanomas and nevocytic nevi (moles), automatically. In the first step, several types of features are extracted by 2D image analysis methods characterizing the structure of skin surface profiles: texture features based on cooccurrence matrices, Fourier features and fractal features. Then, feature selection algorithms are applied to determine suitable feature subsets for the recognition process. Feature selection is described as an optimization problem and several approaches including heuristic strategies, greedy and genetic algorithms are compared. As quality measure for feature subsets, the classification rate of the nearest neighbor classifier computed with the leaving-one-out method is used. Genetic algorithms show the best results. Finally, neural networks with error back-propagation as learning paradigm are trained using the selected feature sets. Different network topologies, learning parameters and pruning algorithms are investigated to optimize the classification performance of the neural classifiers. With the optimized recognition system a classification performance of 97.7% is achieved.

  20. Meaning Of The Term "Corruption Offense" As A Feature Of The Public Prosecutor's Supervision Over The Legislation On The Corruption Counteraction In The Municipal Governments Execution

    Directory of Open Access Journals (Sweden)

    Kseniya D. Okuneva

    2014-12-01

    Full Text Available In the present article, theoretical and practical aspects of the definition of a corruption offense, which are characteristic features of the methodology of prosecutorial supervision over the execution of anti-corruption legislation in local government, are analyzed. Federal Law of Jan. 17, 1992 No. 2202-1 "On the Procuracy of the Russian Federation" (Article 21) establishes the public prosecutor's supervision over the execution of legislation on combating corruption in local government, which is a special sub-cluster. The prosecutor's activities in this area rest on the general theoretical techniques of prosecutorial supervision, taking into account the specific and complex nature of corruption. The author emphasizes the characteristics of the corruption offense, as well as aspects of legal responsibility, which lie in the fact that measures of state coercion of a personal, financial or organizational nature are applied to the offender in accordance with the law for the offense committed, and the person who committed the offense is obliged to be subject to these measures. In the conclusion, the author notes that the specifics of corruption offenses that are the subject of prosecutorial supervision over the execution of legislation on combating corruption in local government are determined by the special status of the offense subjects, as well as the content of legal prohibitions and legal responsibilities in the field of anti-corruption at the municipal level.

  1. Feature selection using feature dissimilarity measure and density-based clustering: Application to biological data

    Indian Academy of Sciences (India)

    Debarka Sengupta; Indranil Aich; Sanghamitra Bandyopadhyay

    2015-10-01

    Reduction of dimensionality has emerged as a routine process in modelling complex biological systems. A large number of feature selection techniques have been reported in the literature to improve model performance in terms of accuracy and speed. In the present article an unsupervised feature selection technique is proposed, using maximum information compression index as the dissimilarity measure and the well-known density-based cluster identification technique DBSCAN for identifying the largest natural group of dissimilar features. The algorithm is fast and less sensitive to the user-supplied parameters. Moreover, the method automatically determines the required number of features and identifies them. We used the proposed method for reducing dimensionality of a number of benchmark data sets of varying sizes. Its performance was also extensively compared with some other well-known feature selection methods.
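
    A sketch of the two ingredients named above: the maximum information compression index (the smallest eigenvalue of the pairwise covariance matrix) as the feature dissimilarity, and DBSCAN run on the precomputed matrix. Keeping the largest resulting cluster only approximates the paper's selection step, and the eps/min_samples values below are ad hoc.

```python
# MICI (smallest eigenvalue of the pairwise covariance matrix) as the feature
# dissimilarity, with DBSCAN run on the precomputed matrix.  Keeping the largest
# cluster only approximates the paper's selection step; eps/min_samples are ad hoc.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_breast_cancer

X, _ = load_breast_cancer(return_X_y=True)
p = X.shape[1]

Xs = (X - X.mean(0)) / X.std(0)                  # standardise so scales compare
mici = np.zeros((p, p))
for i in range(p):
    for j in range(i + 1, p):
        cov = np.cov(Xs[:, i], Xs[:, j])
        mici[i, j] = mici[j, i] = np.linalg.eigvalsh(cov)[0]   # smallest eigenvalue

labels = DBSCAN(eps=0.3, min_samples=2, metric="precomputed").fit(mici).labels_
clusters = set(labels) - {-1}
if clusters:
    largest = max(clusters, key=lambda c: (labels == c).sum())
    print("largest feature cluster:", np.flatnonzero(labels == largest))
else:
    print("no cluster found; adjust eps/min_samples")
```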

  2. Multi-task GLOH feature selection for human age estimation

    CERN Document Server

    Liang, Yixiong; Xu, Ying; Xiang, Yao; Zou, Beiji

    2011-01-01

    In this paper, we propose a novel age estimation method based on the GLOH feature descriptor and multi-task learning (MTL). The GLOH feature descriptor, one of the state-of-the-art feature descriptors, is used to capture the age-related local and spatial information of the face image. As the extracted GLOH features are often redundant, MTL is designed to select the most informative feature bins for the age estimation problem, while the corresponding weights are determined by ridge regression. This approach largely reduces the dimensionality of the features, which can not only improve performance but also decrease the computational burden. Experiments on the publicly available FG-NET database show that the proposed method can achieve comparable performance to previous approaches while using far fewer features.

  3. Improving Naive Bayes with Online Feature Selection for Quick Adaptation to Evolving Feature Usefulness

    Energy Technology Data Exchange (ETDEWEB)

    Pon, R K; Cardenas, A F; Buttler, D J

    2007-09-19

    The definition of what makes an article interesting varies from user to user and continually evolves even for a single user. As a result, for news recommendation systems, useless document features cannot be determined a priori and all features are usually considered for interestingness classification. Consequently, the presence of currently useless features degrades classification performance [1], particularly over the initial set of news articles being classified. The initial set of documents is critical for a user when considering which particular news recommendation system to adopt. To address these problems, we introduce an improved version of the naive Bayes classifier with online feature selection. We use correlation to determine the utility of each feature and take advantage of the conditional independence assumption used by naive Bayes for online feature selection and classification. The augmented naive Bayes classifier performs 28% better than the traditional naive Bayes classifier in recommending news articles from the Yahoo! RSS feeds.

  4. Dominant Local Binary Pattern Based Face Feature Selection and Detection

    Directory of Open Access Journals (Sweden)

    Kavitha.T

    2010-04-01

    Full Text Available Face detection plays a major role in biometrics. Feature selection is a problem of formidable complexity. This paper proposes a novel approach to extract face features for face detection. The LBP features can be extracted faster in a single scan through the raw image and lie in a lower dimensional space, whilst still retaining facial information efficiently. The LBP features are robust to low-resolution images. The dominant local binary pattern (DLBP) is used to extract features accurately. A number of trainable methods are emerging in empirical practice due to their effectiveness. The proposed method is a trainable system for selecting face features from over-complete dictionaries of image measurements. After the feature selection procedure is completed, an SVM classifier is used for face detection. The main advantage of this proposal is that it is trained on a very small training set. The classifier is used to increase the selection accuracy. This is not only advantageous to facilitate the data-gathering stage, but, more importantly, to limit the training time. The CBCL frontal faces dataset is used for training and validation.

  5. The Effect of Feature Selection on Phish Website Detection

    Directory of Open Access Journals (Sweden)

    Hiba Zuhair

    2015-10-01

    Full Text Available Recently, limited anti-phishing campaigns have given phishers more possibilities to bypass detection through their advanced deceptions. Moreover, failure to devise appropriate classification techniques to effectively identify these deceptions has degraded the detection of phishing websites. Consequently, exploiting as new, few, predictive, and effective features as possible has emerged as a key challenge to keep the detection resilient. Thus, some prior works have investigated and applied certain selected methods to develop their own classification techniques. However, no study has generally agreed on which feature selection method could be employed as the best assistant to enhance classification performance. Hence, this study empirically examined these methods and their effects on classification performance. Furthermore, it recommends some criteria to assess their outcomes and offers a contribution to the problem at hand. Hybrid features, low- and high-dimensional datasets, different feature selection methods, and classification models were examined in this study. As a result, the findings displayed notably improved detection precision with low latency, as well as noteworthy gains in robustness and prediction susceptibilities. Although selecting an ideal feature subset was a challenging task, the findings retrieved from this study provide the most advantageous feature subset possible for robust selection and effective classification in the phishing detection domain.

  6. Selecting Optimal Subset of Features for Student Performance Model

    Directory of Open Access Journals (Sweden)

    Hany M. Harb

    2012-09-01

    Full Text Available Educational data mining (EDM) is a new and growing research area in which the essence of data mining concepts is used in the educational field for the purpose of extracting useful information on student behavior in the learning process. Classification methods like decision trees, rule mining, and Bayesian networks can be applied to educational data for predicting student behavior such as performance in an examination. This prediction may help in student evaluation. As feature selection influences the predictive accuracy of any performance model, it is essential to study in detail the effectiveness of student performance models in connection with feature selection techniques. The main objective of this work is to achieve high predictive performance by adopting various feature selection techniques to increase predictive accuracy with the least number of features. The outcomes show a reduction in computational time and construction cost in both the training and classification phases of the student performance model.

  7. Protein fold classification with genetic algorithms and feature selection.

    Science.gov (United States)

    Chen, Peng; Liu, Chunmei; Burge, Legand; Mahmood, Mohammad; Southerland, William; Gloster, Clay

    2009-10-01

    Protein fold classification is a key step to predicting protein tertiary structures. This paper proposes a novel approach based on genetic algorithms and feature selection to classifying protein folds. Our dataset is divided into a training dataset and a test dataset. Each individual for the genetic algorithms represents a selection function of the feature vectors of the training dataset. A support vector machine is applied to each individual to evaluate the fitness value (fold classification rate) of each individual. The aim of the genetic algorithms is to search for the best individual that produces the highest fold classification rate. The best individual is then applied to the feature vectors of the test dataset and a support vector machine is built to classify protein folds based on selected features. Our experimental results on Ding and Dubchak's benchmark dataset of 27-class folds show that our approach achieves an accuracy of 71.28%, which outperforms current state-of-the-art protein fold predictors.
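
    A compact sketch of the scheme described above: each GA individual is a binary feature mask and its fitness is the cross-validated accuracy of an SVM trained on the selected features. A small built-in dataset stands in for the Ding and Dubchak fold data, and the GA operators and parameters are arbitrary illustrative choices.

```python
# GA sketch: individuals are binary feature masks, fitness is cross-validated
# SVM accuracy on the selected columns.  The wine dataset stands in for the
# Ding and Dubchak fold data; operators and parameters are illustrative.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
rng = np.random.default_rng(0)
n_features, pop_size, n_gen = X.shape[1], 20, 15

def fitness(mask):
    if not mask.any():
        return 0.0
    clf = make_pipeline(StandardScaler(), SVC())
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop = rng.random((pop_size, n_features)) < 0.5                 # initial random masks
for _ in range(n_gen):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]   # truncation selection
    children = []
    while len(children) < pop_size - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, n_features)                      # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n_features) < 0.05                   # bit-flip mutation
        children.append(np.where(flip, ~child, child))
    pop = np.vstack([parents, np.array(children)])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best))
```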

  8. DYNAMIC FEATURE SELECTION FOR WEB USER IDENTIFICATION ON LINGUISTIC AND STYLISTIC FEATURES OF ONLINE TEXTS

    Directory of Open Access Journals (Sweden)

    A. A. Vorobeva

    2017-01-01

    Full Text Available The paper deals with identification and authentication of web users participating in Internet information processes, based on features of online texts. In digital forensics, web user identification based on various linguistic features can be used to discover the identity of individuals, criminals or terrorists using the Internet to commit cybercrimes. The Internet can be used as a tool in different types of cybercrimes (fraud and identity theft, harassment and anonymous threats, terrorist or extremist statements, distribution of illegal content and information warfare). Linguistic identification of web users is a kind of biometric identification; it can be used to narrow down the suspects, identify a criminal and prosecute him. The feature set includes various linguistic and stylistic features extracted from online texts. We propose dynamic feature selection for each web user identification task. Selection is based on calculating the Manhattan distance to k-nearest neighbors (the Relief-f algorithm). This approach improves the identification accuracy and minimizes the number of features. Experiments were carried out on several datasets with different levels of class imbalance. The experimental results showed that feature relevance varies across different sets of web users (probable authors of some text); feature selection for each set of web users improves identification accuracy by 4% on average, which is approximately 1% higher than with the use of a static set of features. The proposed approach is most effective for a small number of training samples (messages per user).

  9. Feature Extraction and Selection Strategies for Automated Target Recognition

    Science.gov (United States)

    Greene, W. Nicholas; Zhang, Yuhan; Lu, Thomas T.; Chao, Tien-Hsin

    2010-01-01

    Several feature extraction and selection methods for an existing automatic target recognition (ATR) system using JPL's Grayscale Optical Correlator (GOC) and Optimal Trade-Off Maximum Average Correlation Height (OT-MACH) filter were tested using MATLAB. The ATR system is composed of three stages: a cursory region-of-interest (ROI) search using the GOC and OT-MACH filter, a feature extraction and selection stage, and a final classification stage. Feature extraction and selection concerns transforming potential target data into more useful forms as well as selecting important subsets of that data which may aid in detection and classification. The strategies tested were built around two popular extraction methods: Principal Component Analysis (PCA) and Independent Component Analysis (ICA). Performance was measured based on the classification accuracy and free-response receiver operating characteristic (FROC) output of a support vector machine (SVM) and a neural net (NN) classifier.
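
    A minimal sketch of the extraction-then-classification pattern described above, with PCA and ICA features feeding an SVM. The digits dataset stands in for the correlator ROI chips, the component counts are arbitrary, and the FROC analysis and neural-net classifier are omitted.

```python
# PCA and ICA feature extraction feeding an SVM; the digits dataset stands in
# for the correlator ROI chips, and the component counts are arbitrary.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, FastICA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

for name, extractor in [("PCA", PCA(n_components=20)),
                        ("ICA", FastICA(n_components=20, max_iter=1000))]:
    pipe = make_pipeline(StandardScaler(), extractor, SVC())
    print(name, "accuracy:", cross_val_score(pipe, X, y, cv=3).mean())
```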

  11. Mutual information-based feature selection for radiomics

    Science.gov (United States)

    Oubel, Estanislao; Beaumont, Hubert; Iannessi, Antoine

    2016-03-01

    Background: The extraction and analysis of image features (radiomics) is a promising field in the precision medicine era, with applications to prognosis, prediction, and quantification of response to treatment. In this work, we present a mutual information-based method for quantifying the reproducibility of features, a necessary step for qualification before their inclusion in big data systems. Materials and Methods: Ten patients with Non-Small Cell Lung Cancer (NSCLC) lesions were followed over time (7 time points on average) with Computed Tomography (CT). Five observers segmented lesions by using a semi-automatic method and 27 features describing shape and intensity distribution were extracted. Inter-observer reproducibility was assessed by computing the multi-information (MI) of feature changes over time, and the variability of global extrema. Results: The highest MI values were obtained for volume-based features (VBF). The lesion mass (M), surface to volume ratio (SVR) and volume (V) presented statistically significantly higher values of MI than the rest of the features. Within the same VBF group, SVR also showed the lowest variability of extrema. The correlation coefficient (CC) of feature values was unable to discriminate between features. Conclusions: MI allowed us to discriminate three features (M, SVR, and V) from the rest in a statistically significant manner. This result is consistent with the order obtained when sorting features by increasing values of extrema variability. MI is a promising alternative for selecting features to be considered as surrogate biomarkers in a precision medicine context.

  12. Joint feature-sample selection and robust diagnosis of Parkinson's disease from MRI data.

    Science.gov (United States)

    Adeli, Ehsan; Shi, Feng; An, Le; Wee, Chong-Yaw; Wu, Guorong; Wang, Tao; Shen, Dinggang

    2016-11-01

    Parkinson's disease (PD) is an overwhelming neurodegenerative disorder caused by deterioration of a neurotransmitter, known as dopamine. Lack of this chemical messenger impairs several brain regions and yields various motor and non-motor symptoms. Incidence of PD is predicted to double in the next two decades, which urges more research to focus on its early diagnosis and treatment. In this paper, we propose an approach to diagnose PD using magnetic resonance imaging (MRI) data. Specifically, we first introduce a joint feature-sample selection (JFSS) method for selecting an optimal subset of samples and features, to learn a reliable diagnosis model. The proposed JFSS model effectively discards poor samples and irrelevant features. As a result, the selected features play an important role in PD characterization, which will help identify the most relevant and critical imaging biomarkers for PD. Then, a robust classification framework is proposed to simultaneously de-noise the selected subset of features and samples, and learn a classification model. Our model can also de-noise testing samples based on the cleaned training data. Unlike many previous works that perform de-noising in an unsupervised manner, we perform supervised de-noising for both training and testing data, thus boosting the diagnostic accuracy. Experimental results on both synthetic and publicly available PD datasets show promising results. To evaluate the proposed method, we use the popular Parkinson's progression markers initiative (PPMI) database. Our results indicate that the proposed method can differentiate between PD and normal control (NC), and outperforms the competing methods by a relatively large margin. It is noteworthy to mention that our proposed framework can also be used for diagnosis of other brain disorders. To show this, we have also conducted experiments on the widely-used ADNI database. The obtained results indicate that our proposed method can identify the imaging biomarkers and

  13. Hyperspectral image classification based on NMF Features Selection Method

    Science.gov (United States)

    Abe, Bolanle T.; Jordaan, J. A.

    2013-12-01

    Hyperspectral instruments are capable of collecting hundreds of images corresponding to wavelength channels for the same area on the earth's surface. Due to the huge number of features (bands) in hyperspectral imagery, land cover classification procedures are computationally expensive and pose a problem known as the curse of dimensionality. In addition, high correlation among contiguous bands increases the redundancy within the bands. Hence, dimension reduction of hyperspectral data is crucial for obtaining good classification accuracy. This paper presents a new feature selection technique. A Non-negative Matrix Factorization (NMF) algorithm is proposed to obtain reduced, relevant features in the input domain of each class label, aiming to reduce classification error and the dimensionality of the classification problem. The Indian Pines dataset of Northwest Indiana is used to evaluate the performance of the proposed method through feature selection and classification experiments. The Waikato Environment for Knowledge Analysis (WEKA) data mining framework is selected as the tool to implement the classification using Support Vector Machines and Neural Networks. The selected feature subsets are subjected to land cover classification to investigate the performance of the classifiers and how the feature set size affects classification accuracy. The results obtained show that the performances of the classifiers are significant. The study makes a positive contribution to the problems of hyperspectral imagery by exploring NMF, SVMs and NNs to improve classification accuracy. The performances of the classifiers are valuable for decision makers weighing trade-offs between method accuracy and method complexity.

  14. Tournament screening cum EBIC for feature selection with high-dimensional feature spaces

    Institute of Scientific and Technical Information of China (English)

    2009-01-01

    The feature selection characterized by relatively small sample size and extremely high-dimensional feature space is common in many areas of contemporary statistics. The high dimensionality of the feature space causes serious difficulties: (i) the sample correlations between features become high even if the features are stochastically independent; (ii) the computation becomes intractable. These difficulties make conventional approaches either inapplicable or inefficient. The reduction of dimensionality of the feature space followed by low dimensional approaches appears the only feasible way to tackle the problem. Along this line, we develop in this article a tournament screening cum EBIC approach for feature selection with high dimensional feature space. The procedure of tournament screening mimics that of a tournament. It is shown theoretically that the tournament screening has the sure screening property, a necessary property which should be satisfied by any valid screening procedure. It is demonstrated by numerical studies that the tournament screening cum EBIC approach enjoys desirable properties such as having higher positive selection rate and lower false discovery rate than other approaches.

  15. Feature selection versus feature compression in the building of calibration models from FTIR-spectrophotometry datasets.

    Science.gov (United States)

    Vergara, Alexander; Llobet, Eduard

    2012-01-15

    Undoubtedly, FTIR-spectrophotometry has become a standard in the chemical industry for monitoring, on the fly, the concentrations of reagents and by-products. However, representing chemical samples by FTIR spectra, which are characterized by hundreds if not thousands of variables, conveys its own set of challenges because the spectra must be analyzed in a high-dimensional feature space, where many of these features are likely to be highly correlated and many others affected by noise. Therefore, identifying a subset of features that preserves classifier/regressor performance seems imperative prior to any attempt to build an appropriate pattern recognition method. In this context, we investigate the benefit of utilizing two different dimensionality reduction methods, namely the minimum Redundancy-Maximum Relevance (mRMR) feature selection scheme and a new self-organizing map (SOM) based feature compression, coupled to regression methods to quantitatively analyze two-component liquid samples using FTIR-spectrophotometry. Since these methods give us the possibility of selecting a small subset of relevant features from FTIR spectra while preserving the statistical characteristics of the target variable being analyzed, we claim that expressing the FTIR spectra by this dimensionality-reduced set of features may be beneficial. We demonstrate the utility of these feature selection schemes in quantifying the distinct analytes within their binary mixtures using FTIR-spectrophotometry.
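
    A greedy mRMR-style sketch for a regression target: at each step the feature with the best relevance-minus-mean-redundancy score is added, with both terms estimated by mutual information. A built-in regression dataset stands in for the FTIR spectra, the subset size is an arbitrary choice, and the SOM-based compression branch is not shown.

```python
# Greedy mRMR-style selection for a regression target: add the feature with the
# best relevance-minus-mean-redundancy score, both estimated by mutual
# information.  The diabetes dataset stands in for the FTIR spectra and the
# subset size is arbitrary; the SOM-based compression branch is not shown.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import mutual_info_regression

X, y = load_diabetes(return_X_y=True)
p, n_select = X.shape[1], 5

relevance = mutual_info_regression(X, y, random_state=0)
# Pairwise feature-feature MI (redundancy term), computed once up front.
redundancy = np.array([mutual_info_regression(X, X[:, j], random_state=0)
                       for j in range(p)])

first = int(np.argmax(relevance))
selected, remaining = [first], set(range(p)) - {first}
while len(selected) < n_select:
    scores = {j: relevance[j] - redundancy[j, selected].mean() for j in remaining}
    best = max(scores, key=scores.get)
    selected.append(best)
    remaining.remove(best)

print("mRMR-selected features:", selected)
```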

  16. Selective processing of multiple features in the human brain: effects of feature type and salience.

    Science.gov (United States)

    McGinnis, E Menton; Keil, Andreas

    2011-02-09

    Identifying targets in a stream of items at a given constant spatial location relies on selection of aspects such as color, shape, or texture. Such attended (target) features of a stimulus elicit a negative-going event-related brain potential (ERP), termed Selection Negativity (SN), which has been used as an index of selective feature processing. In two experiments, participants viewed a series of Gabor patches in which targets were defined as a specific combination of color, orientation, and shape. Distracters were composed of different combinations of color, orientation, and shape of the target stimulus. This design allows comparisons of items with and without specific target features. Consistent with previous ERP research, SN deflections extended between 160-300 ms. Data from the subsequent P3 component (300-450 ms post-stimulus) were also examined, and were regarded as an index of target processing. In Experiment A, predominant effects of target color on SN and P3 amplitudes were found, along with smaller ERP differences in response to variations of orientation and shape. Manipulating color to be less salient while enhancing the saliency of the orientation of the Gabor patch (Experiment B) led to delayed color selection and enhanced orientation selection. Topographical analyses suggested that the location of SN on the scalp reliably varies with the nature of the to-be-attended feature. No interference of non-target features on the SN was observed. These results suggest that target feature selection operates by means of electrocortical facilitation of feature-specific sensory processes, and that selective electrocortical facilitation is more effective when stimulus saliency is heightened.

  17. Selective processing of multiple features in the human brain: effects of feature type and salience.

    Directory of Open Access Journals (Sweden)

    E Menton McGinnis

    Full Text Available Identifying targets in a stream of items at a given constant spatial location relies on selection of aspects such as color, shape, or texture. Such attended (target) features of a stimulus elicit a negative-going event-related brain potential (ERP), termed Selection Negativity (SN), which has been used as an index of selective feature processing. In two experiments, participants viewed a series of Gabor patches in which targets were defined as a specific combination of color, orientation, and shape. Distracters were composed of different combinations of color, orientation, and shape of the target stimulus. This design allows comparisons of items with and without specific target features. Consistent with previous ERP research, SN deflections extended between 160-300 ms. Data from the subsequent P3 component (300-450 ms post-stimulus) were also examined, and were regarded as an index of target processing. In Experiment A, predominant effects of target color on SN and P3 amplitudes were found, along with smaller ERP differences in response to variations of orientation and shape. Manipulating color to be less salient while enhancing the saliency of the orientation of the Gabor patch (Experiment B) led to delayed color selection and enhanced orientation selection. Topographical analyses suggested that the location of SN on the scalp reliably varies with the nature of the to-be-attended feature. No interference of non-target features on the SN was observed. These results suggest that target feature selection operates by means of electrocortical facilitation of feature-specific sensory processes, and that selective electrocortical facilitation is more effective when stimulus saliency is heightened.

  18. [Electroencephalogram Feature Selection Based on Correlation Coefficient Analysis].

    Science.gov (United States)

    Zhou, Jinzhi; Tang, Xiaofang

    2015-08-01

    In order to improve the accuracy of classification with a small amount of motor imagery training data in the development of brain-computer interface (BCI) systems, we propose an analysis method that automatically selects the characteristic parameters based on correlation coefficient analysis. Using the five samples of dataset IVa from the 2005 BCI Competition, we applied the short-time Fourier transform (STFT) and correlation coefficient calculation to reduce the dimensionality of the raw electroencephalogram, then performed feature extraction based on common spatial patterns (CSP) and classification by linear discriminant analysis (LDA). Simulation results showed that the average classification accuracy could be improved by using the correlation coefficient feature selection method compared with not using it. Compared with a support vector machine (SVM) feature optimization algorithm, correlation coefficient analysis leads to better parameter selection and improves the accuracy of classification.
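
    A minimal sketch of the screening idea described in the record, correlation-based feature selection followed by linear discriminant analysis, is given below. The STFT and CSP stages of the paper are omitted, the data are synthetic, and the number of retained features is an arbitrary choice.

      # Hedged sketch: keep the features whose absolute correlation with the class label
      # is highest, then classify with LDA.
      import numpy as np
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      n_trials, n_features = 120, 60          # e.g. band-power features per EEG channel
      X = rng.normal(size=(n_trials, n_features))
      y = rng.integers(0, 2, size=n_trials)
      X[:, :4] += y[:, None] * 0.8            # make a few features class-dependent

      corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])
      keep = np.argsort(corr)[::-1][:8]       # retain the 8 most correlated features

      lda = LinearDiscriminantAnalysis()
      print("CV accuracy, selected features:", cross_val_score(lda, X[:, keep], y, cv=5).mean())
      print("CV accuracy, all features:     ", cross_val_score(lda, X, y, cv=5).mean())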

  19. Informative Feature Selection for Object Recognition via Sparse PCA

    Science.gov (United States)

    2011-04-07

    the BMW database [17] are used for training. For each image pair in SfM, SURF features are deemed informative if the consensus of the corresponding...observe that the first two sparse PVs are sufficient for selecting in- formative features that lie on the foreground objects in the BMW database (as... BMW ) database [17]. The database consists of multiple-view images of 20 landmark buildings on the Berkeley campus. For each building, wide-baseline

  20. Multi-Stage Feature Selection by Using Genetic Algorithms for Fault Diagnosis in Gearboxes Based on Vibration Signal

    Directory of Open Access Journals (Sweden)

    Mariela Cerrada

    2015-09-01

    Full Text Available There are growing demands for condition-based monitoring of gearboxes, and techniques to improve the reliability, effectiveness and accuracy of fault diagnosis are considered valuable contributions. Feature selection is still an important aspect of machine learning-based diagnosis in order to reach good performance in the diagnosis system. The main aim of this research is to propose a multi-stage feature selection mechanism for selecting the best set of condition parameters in the time, frequency and time-frequency domains, which are extracted from vibration signals for fault diagnosis purposes in gearboxes. The selection is based on genetic algorithms, proposing in each stage a new subset of the best features with respect to the classifier performance in a supervised environment. The selected features are augmented at each stage and used as input for a neural network classifier in the next step, while a new subset of feature candidates is treated by the selection process. As a result, the inherent exploration and exploitation of the genetic algorithms for finding the best solutions of the selection problem are locally focused. The approach is tested on a dataset from a real test bed with several fault classes under different running conditions of load and velocity, and the model performance for diagnosis is over 98%.
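
    The following short sketch illustrates genetic-algorithm feature selection with a neural-network fitness function in the spirit of the record. The staged, multi-domain design of the paper is not reproduced: population size, operators, mutation rate and the synthetic "condition parameters" are all illustrative assumptions.

      # Hedged sketch of GA-based feature selection scored by a neural-network classifier.
      import numpy as np
      from sklearn.neural_network import MLPClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      X = rng.normal(size=(300, 30))                       # stand-in for condition parameters
      y = (X[:, 0] + X[:, 5] - X[:, 9] > 0).astype(int)    # synthetic fault labels

      def fitness(mask):
          if mask.sum() == 0:
              return 0.0
          clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
          return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

      pop_size, n_gen, n_feat = 12, 6, X.shape[1]
      population = rng.integers(0, 2, size=(pop_size, n_feat))
      for gen in range(n_gen):
          scores = np.array([fitness(ind) for ind in population])
          parents = population[np.argsort(scores)[::-1][: pop_size // 2]]  # truncation selection
          children = []
          for _ in range(pop_size - len(parents)):
              a, b = parents[rng.integers(len(parents), size=2)]
              cut = rng.integers(1, n_feat)                # one-point crossover
              child = np.concatenate([a[:cut], b[cut:]])
              flip = rng.random(n_feat) < 0.05             # bit-flip mutation
              child[flip] = 1 - child[flip]
              children.append(child)
          population = np.vstack([parents, children])

      best = population[np.argmax([fitness(ind) for ind in population])]
      print("selected feature indices:", np.flatnonzero(best))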

  1. Selecting Features of Single Lead ECG Signal for Automatic Sleep Stages Classification using Correlation-based Feature Subset Selection

    Directory of Open Access Journals (Sweden)

    Ary Noviyanto

    2011-09-01

    Full Text Available Knowing our sleep quality can help us maximize our daily performance. The ECG signal has the potential to determine sleep stages so that sleep quality can be measured. The data used in this research are single-lead ECG signals from the MIT-BIH Polysomnographic Database. The ECG features can be derived from the RR interval, EDR information and the raw ECG signal. Correlation-based Feature Subset Selection (CFS) is used to choose the features which are significant for determining the sleep stages. Those features are evaluated using four classifiers with different characteristics (Bayesian network, multilayer perceptron, IB1 and random forest). Performance evaluations with the Bayesian network, IB1 and random forest show that CFS performs excellently: it can reduce the number of features significantly with only a small decrease in accuracy. The best classification result in this research is obtained by combining the feature set derived from the raw ECG signal with the random forest classifier.
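
    The CFS merit heuristic mentioned in the record can be sketched as follows: a simple greedy forward search over a synthetic feature matrix, using the standard merit k*r_cf / sqrt(k + k*(k-1)*r_ff). The record's actual search strategy and ECG-derived features are not reproduced; data and sizes are assumptions.

      # Hedged sketch of Correlation-based Feature Subset Selection (CFS) with greedy search.
      import numpy as np

      def cfs_merit(X, y, subset):
          k = len(subset)
          r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
          if k == 1:
              return r_cf
          r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                          for i, a in enumerate(subset) for b in subset[i + 1:]])
          return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

      def cfs_forward(X, y, max_features=10):
          selected, remaining = [], list(range(X.shape[1]))
          best_merit = -np.inf
          while remaining and len(selected) < max_features:
              merit, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
              if merit <= best_merit:          # stop when the merit no longer improves
                  break
              best_merit = merit
              selected.append(j)
              remaining.remove(j)
          return selected

      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 25))            # stand-in for RR-interval/EDR features
      y = (X[:, 2] + X[:, 11] > 0).astype(int)  # stand-in for sleep-stage labels
      print("CFS-selected features:", cfs_forward(X, y))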

  2. Feature Selection for Audio Surveillance in Urban Environment

    Directory of Open Access Journals (Sweden)

    KIKTOVA Eva

    2014-05-01

    Full Text Available This paper presents the work leading to an acoustic event detection system designed to recognize two types of acoustic events (shot and breaking glass) in an urban environment. For this purpose, extensive front-end processing was performed to obtain an effective parametric representation of the input sound. MFCC features and features computed during their extraction (MELSPEC and FBANK), as well as MPEG-7 audio descriptors and other temporal and spectral characteristics, were extracted. High-dimensional feature sets were created and then reduced by mutual information based selection algorithms. A Hidden Markov Model based classifier was applied and evaluated with the Viterbi decoding algorithm. In this way very effective feature sets were identified, and the less important features were also found.

  3. Feature selection gait-based gender classification under different circumstances

    Science.gov (United States)

    Sabir, Azhin; Al-Jawad, Naseer; Jassim, Sabah

    2014-05-01

    This paper proposes gender classification based on human gait features and investigates two variations, clothing (wearing coats) and carrying a bag, in addition to the normal gait sequence. The feature vectors in the proposed system are constructed after applying the wavelet transform. Three different feature sets are proposed in this method. The first, spatio-temporal distance, deals with the distances between different parts of the human body (such as feet, knees, hands, height and shoulders) during one gait cycle. The second and third feature sets are constructed from the approximation and non-approximation coefficients of the human body, respectively. To extract these two feature sets we divided the human body into two parts, the upper and lower body, based on the golden ratio proportion. In this paper, we adopt a statistical method for constructing the feature vector from the above sets. The dimension of the constructed feature vector is reduced based on the Fisher score as a feature selection method to optimize the discriminating significance of the features. Finally, k-Nearest Neighbor is applied as the classification method. Experimental results demonstrate that our approach provides a more realistic scenario and relatively better performance compared with existing approaches.
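
    A hedged sketch of the final two steps described in the record, Fisher-score ranking followed by k-Nearest Neighbor classification, is shown below. The wavelet-based gait features are replaced by synthetic data, and the number of retained features is an arbitrary assumption.

      # Hedged sketch of Fisher-score feature ranking followed by k-NN classification.
      import numpy as np
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.model_selection import cross_val_score

      def fisher_score(X, y):
          classes = np.unique(y)
          overall_mean = X.mean(axis=0)
          num = np.zeros(X.shape[1])
          den = np.zeros(X.shape[1])
          for c in classes:
              Xc = X[y == c]
              num += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2   # between-class scatter
              den += len(Xc) * Xc.var(axis=0)                          # within-class scatter
          return num / (den + 1e-12)

      rng = np.random.default_rng(0)
      X = rng.normal(size=(240, 40))            # stand-in for gait feature vectors
      y = rng.integers(0, 2, size=240)          # gender labels
      X[:, :6] += y[:, None] * 0.7              # a few genuinely discriminative features

      top = np.argsort(fisher_score(X, y))[::-1][:10]
      knn = KNeighborsClassifier(n_neighbors=5)
      print("CV accuracy with top-10 Fisher features:",
            cross_val_score(knn, X[:, top], y, cv=5).mean())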

  4. Spatial selection of features within perceived and remembered objects

    Directory of Open Access Journals (Sweden)

    Duncan E Astle

    2009-04-01

    Full Text Available Our representation of the visual world can be modulated by spatially specific attentional biases that depend flexibly on task goals. We compared searching for task-relevant features in perceived versus remembered objects. When searching perceptual input, selected task-relevant and suppressed task-irrelevant features elicited contrasting spatiotopic ERP effects, despite them being perceptually identical. This was also true when participants searched a memory array, suggesting that memory had retained the spatial organisation of the original perceptual input and that this representation could be modulated in a spatially specific fashion. However, task-relevant selection and task-irrelevant suppression effects were of the opposite polarity when searching remembered compared to perceived objects. We suggest that this surprising result stems from the nature of feature- and object-based representations when stored in visual short-term memory. When stored, features are integrated into objects, meaning that the spatially specific selection mechanisms must operate upon objects rather than specific feature-level representations.

  5. Technical Evaluation Report 27: Educational Wikis: Features and selection criteria

    Directory of Open Access Journals (Sweden)

    Jim Rudolph

    2004-04-01

    Full Text Available This report discusses the educational uses of the ‘wiki,’ an increasingly popular approach to online community development. Wikis are defined and compared with ‘blogging’ methods; characteristics of major wiki engines are described; and wiki features and selection criteria are examined.

  6. Variance Ranklets: Orientation-selective rank features for contrast modulations

    NARCIS (Netherlands)

    Azzopardi, George; Smeraldi, Fabrizio

    2009-01-01

    We introduce a novel type of orientation–selective rank features that are sensitive to contrast modulations (second–order stimuli). Variance Ranklets are designed in close analogy with the standard Ranklets, but use the Siegel–Tukey statistics for dispersion instead of the Wilcoxon statistics. Their

  7. Emotion of Physiological Signals Classification Based on TS Feature Selection

    Institute of Scientific and Technical Information of China (English)

    Wang Yujing; Mo Jianlin

    2015-01-01

    This paper proposes TS-MLP, a method for emotion recognition from physiological signals. It recognizes emotions by using Tabu search to select features of the emotional physiological signals and a multilayer perceptron to classify the emotions. Simulations show that it achieves good emotion classification performance.

  8. Magnetic Field Feature Extraction and Selection for Indoor Location Estimation

    Directory of Open Access Journals (Sweden)

    Carlos E. Galván-Tejada

    2014-06-01

    Full Text Available User indoor positioning has been under constant improvement, especially with the availability of new sensors integrated into modern mobile devices, which allow us to exploit not only infrastructure made for everyday use, such as WiFi, but also natural infrastructure, as is the case of the natural magnetic field. In this paper we present an extension and improvement of our current indoor localization model based on the extraction of 46 magnetic field signal features. The extension adds a feature selection phase to our methodology, performed through a Genetic Algorithm (GA) with the aim of optimizing the fitness of our current model. In addition, we present an evaluation of the final model in two different scenarios: a home and an office building. The results indicate that performing a feature selection process allows us to reduce the number of signal features in the model from 46 to 5, regardless of the scenario and room location distribution. Further, we verified that reducing the number of features increases the probability of our estimator correctly detecting the user's location (sensitivity) and its capacity to reject false positives (specificity) in both scenarios.

  9. Using PSO-Based Hierarchical Feature Selection Algorithm

    Directory of Open Access Journals (Sweden)

    Zhiwei Ji

    2014-01-01

    Full Text Available Hepatocellular carcinoma (HCC) is one of the most common malignant tumors. Clinical symptoms attributable to HCC are usually absent, so the best therapeutic opportunities are often missed. Traditional Chinese Medicine (TCM) plays an active role in the diagnosis and treatment of HCC. In this paper, we propose a particle swarm optimization-based hierarchical feature selection (PSOHFS) model to infer potential syndromes for the diagnosis of HCC. Firstly, the hierarchical feature representation is developed as a three-layer tree: the clinical symptoms and the positive score of a patient are the leaf nodes and the root of the tree, respectively, while each syndrome feature on the middle layer is extracted from a group of symptoms. Secondly, an improved PSO-based algorithm is applied in the new reduced feature space to search for an optimal syndrome subset. Based on the result of feature selection, the causal relationships between symptoms and syndromes are inferred via Bayesian networks. In our experiment, 147 symptoms were aggregated into 27 groups and 27 syndrome features were extracted. The proposed approach discovered 24 syndromes which clearly improved the diagnosis accuracy. Finally, the Bayesian approach was applied to represent the causal relationships at both the symptom and syndrome levels. The results show that our computational model can facilitate the clinical diagnosis of HCC.

  10. Auditory-model based robust feature selection for speech recognition.

    Science.gov (United States)

    Koniaris, Christos; Kuropatwinski, Marcin; Kleijn, W Bastiaan

    2010-02-01

    It is shown that robust dimension-reduction of a feature set for speech recognition can be based on a model of the human auditory system. Whereas conventional methods optimize classification performance, the proposed method exploits knowledge implicit in the auditory periphery, inheriting its robustness. Features are selected to maximize the similarity of the Euclidean geometry of the feature domain and the perceptual domain. Recognition experiments using mel-frequency cepstral coefficients (MFCCs) confirm the effectiveness of the approach, which does not require labeled training data. For noisy data the method outperforms commonly used discriminant-analysis based dimension-reduction methods that rely on labeling. The results indicate that selecting MFCCs in their natural order results in subsets with good performance.

  11. Review and Evaluation of Feature Selection Algorithms in Synthetic Problems

    CERN Document Server

    Belanche, L A

    2011-01-01

    The main purpose of feature subset selection is to find a reduced subset of attributes from a data set described by a feature set. The task of a feature selection algorithm (FSA) is to provide a computational solution motivated by a certain definition of relevance or by a reliable evaluation measure. In this paper several fundamental algorithms are studied to assess their performance in a controlled experimental scenario. A measure to evaluate FSAs is devised that computes the degree of matching between the output given by an FSA and the known optimal solutions. An extensive experimental study on synthetic problems is carried out to assess the behaviour of the algorithms in terms of solution accuracy and size as a function of the relevance, irrelevance, redundancy and size of the data samples. The controlled experimental conditions facilitate the derivation of better-supported and meaningful conclusions.

  12. Feature selection for high-dimensional integrated data

    KAUST Repository

    Zheng, Charles

    2012-04-26

    Motivated by the problem of identifying correlations between genes or features of two related biological systems, we propose a model of feature selection in which only a subset of the predictors Xt are dependent on the multidimensional variate Y, and the remainder of the predictors constitute a “noise set” Xu independent of Y. Using Monte Carlo simulations, we investigated the relative performance of two methods, thresholding and singular-value decomposition, in combination with stochastic optimization to determine “empirical bounds” on the small-sample accuracy of an asymptotic approximation. We demonstrate the utility of the thresholding and SVD feature selection methods with respect to a recent infant intestinal gene expression and metagenomics dataset.

  13. Direct and two-stage data analysis procedures based on PCA, PLS-DA and ANN for ISE-based electronic tongue-Effect of supervised feature extraction.

    Science.gov (United States)

    Ciosek, P; Brzózka, Z; Wróblewski, W; Martinelli, E; Di Natale, C; D'Amico, A

    2005-09-15

    A novel strategy of data analysis for artificial taste and odour systems is presented in this work. It is demonstrated that using a supervised method also in the feature extraction phase enhances the fruit juice classification capability of the sensor array developed at Warsaw University of Technology. A comparison of direct processing (raw data processed by an Artificial Neural Network (ANN), raw data processed by Partial Least Squares-Discriminant Analysis (PLS-DA)) and two-stage processing (Principal Component Analysis (PCA) outputs processed by an ANN, PLS-DA outputs processed by an ANN) is presented. It is shown that a considerable increase in classification capability occurred in the case of the new method proposed by the authors.

  14. Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines

    Directory of Open Access Journals (Sweden)

    Liao Li

    2010-10-01

    Full Text Available Abstract Background Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. Results In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM), where interacting residues in domains are explicitly modeled according to the three-dimensional structural information available at the Protein Data Bank (PDB). Features of the domains are extracted first as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors, and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. Conclusions We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines. Datasets and source code are freely available on

  15. Economic indicators selection for crime rates forecasting using cooperative feature selection

    Science.gov (United States)

    Alwee, Razana; Shamsuddin, Siti Mariyam Hj; Salleh Sallehuddin, Roselina

    2013-04-01

    Feature selection in a multivariate forecasting model is very important to ensure that the model is accurate. The purpose of this study is to apply the Cooperative Feature Selection method for feature selection. The features are economic indicators that will be used in a crime rate forecasting model. Cooperative Feature Selection combines grey relational analysis and an artificial neural network to establish a cooperative model that can rank and select the significant economic indicators. Grey relational analysis is used to select the best data series to represent each economic indicator and is also used to rank the economic indicators according to their importance to the crime rate. After that, the artificial neural network is used to select the significant economic indicators for forecasting the crime rates. In this study, we used the economic indicators of unemployment rate, consumer price index, gross domestic product and consumer sentiment index, as well as property crime and violent crime rates for the United States. A Levenberg-Marquardt neural network is used in this study. From our experiments, we found that the consumer price index is an important economic indicator that has a significant influence on the violent crime rate, while for the property crime rate, the gross domestic product, unemployment rate and consumer price index are the influential economic indicators. Cooperative Feature Selection is also found to produce smaller errors than Multiple Linear Regression in forecasting property and violent crime rates.

  16. An Improved Particle Swarm Optimization for Feature Selection

    Institute of Scientific and Technical Information of China (English)

    Yuanning Liu; Gang Wang; Huiling Chen; Hao Dong; Xiaodong Zhu; Sujing Wang

    2011-01-01

    Particle Swarm Optimization (PSO) is a popular bio-inspired algorithm, based on the social behavior associated with bird flocking, for optimization problems. To maintain the diversity of swarms, a few studies of multi-swarm strategies have been reported. However, the competition among swarms, and the reservation or destruction of a swarm, has not been considered further. In this paper, we formulate four rules by introducing a survival-of-the-fittest mechanism, which simulates the competition among the swarms. Based on this mechanism, we design a modified Multi-Swarm PSO (MSPSO) to solve discrete problems, which consists of a number of sub-swarms and a multi-swarm scheduler that can monitor and control each sub-swarm using the rules. To further address feature selection problems, we propose an Improved Feature Selection (IFS) method by integrating MSPSO and Support Vector Machines (SVM) with the F-score method. The IFS method aims to achieve higher generalization capability by performing kernel parameter optimization and feature selection simultaneously. The performance of the proposed method is compared with that of the standard PSO based, Genetic Algorithm (GA) based and grid search based methods on 10 benchmark datasets taken from the UCI machine learning and StatLog databases. The numerical results and statistical analysis show that the proposed IFS method performs significantly better than the other three methods in terms of prediction accuracy with a smaller subset of features.
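
    The sketch below illustrates only the F-score ranking plus SVM parameter tuning ingredients of the record; the multi-swarm PSO search is replaced by a plain grid search, so it is a simplification rather than the IFS method itself. The F-score definition used is the common binary-class form, and the data are synthetic.

      # Hedged sketch: F-score feature ranking combined with SVM hyperparameter tuning.
      import numpy as np
      from sklearn.svm import SVC
      from sklearn.model_selection import GridSearchCV

      def f_score(X, y):
          pos, neg = X[y == 1], X[y == 0]
          num = (pos.mean(0) - X.mean(0)) ** 2 + (neg.mean(0) - X.mean(0)) ** 2
          den = pos.var(0, ddof=1) + neg.var(0, ddof=1)
          return num / (den + 1e-12)

      rng = np.random.default_rng(0)
      X = rng.normal(size=(300, 50))
      y = (X[:, 1] - X[:, 4] + 0.5 * rng.normal(size=300) > 0).astype(int)

      top = np.argsort(f_score(X, y))[::-1][:8]               # keep the 8 best-ranked features
      grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}, cv=5)
      grid.fit(X[:, top], y)
      print("best params:", grid.best_params_, "CV accuracy:", round(grid.best_score_, 3))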

  17. Optimal Features Subset Selection and Classification for Iris Recognition

    Directory of Open Access Journals (Sweden)

    Roy Kaushik

    2008-01-01

    Full Text Available Abstract The selection of the optimal feature subset and the classification have become an important issue in the field of iris recognition. We propose a feature selection scheme based on the multiobjective genetic algorithm (MOGA) to improve the recognition accuracy and an asymmetrical support vector machine for the classification of iris patterns. We also suggest a segmentation scheme based on the collarette area localization. The deterministic feature sequence is extracted from the iris images using the 1D log-Gabor wavelet technique, and the extracted feature sequence is used to train the support vector machine (SVM). The MOGA is applied to optimize the feature sequence and to increase the overall performance based on the matching accuracy of the SVM. The parameters of the SVM are optimized to improve the overall generalization performance, and the traditional SVM is modified to an asymmetrical SVM to treat the false accept and false reject cases differently and to handle the unbalanced data of a specific class with respect to the other classes. Our experimental results indicate that the performance of the SVM as a classifier is better than the performance of the classifiers based on the feedforward neural network, the k-nearest neighbor, and the Hamming and Mahalanobis distances. The proposed technique is computationally effective, with recognition rates of 99.81% and 96.43% on the CASIA and ICE datasets, respectively.

  18. Optimal Features Subset Selection and Classification for Iris Recognition

    Directory of Open Access Journals (Sweden)

    Prabir Bhattacharya

    2008-06-01

    Full Text Available The selection of the optimal feature subset and the classification have become an important issue in the field of iris recognition. We propose a feature selection scheme based on the multiobjective genetic algorithm (MOGA) to improve the recognition accuracy and an asymmetrical support vector machine for the classification of iris patterns. We also suggest a segmentation scheme based on the collarette area localization. The deterministic feature sequence is extracted from the iris images using the 1D log-Gabor wavelet technique, and the extracted feature sequence is used to train the support vector machine (SVM). The MOGA is applied to optimize the feature sequence and to increase the overall performance based on the matching accuracy of the SVM. The parameters of the SVM are optimized to improve the overall generalization performance, and the traditional SVM is modified to an asymmetrical SVM to treat the false accept and false reject cases differently and to handle the unbalanced data of a specific class with respect to the other classes. Our experimental results indicate that the performance of the SVM as a classifier is better than the performance of the classifiers based on the feedforward neural network, the k-nearest neighbor, and the Hamming and Mahalanobis distances. The proposed technique is computationally effective, with recognition rates of 99.81% and 96.43% on the CASIA and ICE datasets, respectively.

  19. Making Trillion Correlations Feasible in Feature Grouping and Selection.

    Science.gov (United States)

    Zhai, Yiteng; Ong, Yew-Soon; Tsang, Ivor W

    2016-12-01

    Today, modern databases with “Big Dimensionality” are experiencing a growing trend. Existing approaches that require the calculation of pairwise feature correlations in their algorithmic designs have scored miserably on such databases, since computing the full correlation matrix (i.e., square of the dimensionality in size) is computationally very intensive (i.e., a million features would translate to a trillion correlations). This poses a notable challenge that has received much less attention in the field of machine learning and data mining research. Thus, this paper presents a study to fill in this gap. Our findings on several established databases with big dimensionality across a wide spectrum of domains have indicated that an extremely small portion of the feature pairs contributes significantly to the underlying interactions and that there exist feature groups that are highly correlated. Inspired by these intriguing observations, we introduce a novel learning approach that exploits the presence of sparse correlations for the efficient identification of informative and correlated feature groups from big dimensional data, which translates to a reduction in complexity from O(m^2 n) to O(m log m + K_a mn), where K_a is much smaller than m. The strategy designed to filter out the large number of non-contributing correlations, which could otherwise confuse the classifier while identifying the correlated and informative feature groups, forms one of the highlights of our approach. We also demonstrate the proposed method on one-class learning, where a notable speedup can be observed when solving one-class problems on big dimensional data. Further, to identify robust informative features with minimal sampling bias, our feature selection strategy embeds V-fold cross validation in the learning model, so as to seek features that exhibit stable or consistent performance accuracy on multiple data folds. Extensive empirical studies on both synthetic and several real-world datasets comprising up to 30 million

  20. Feature selection and survival modeling in The Cancer Genome Atlas

    Directory of Open Access Journals (Sweden)

    Kim H

    2013-09-01

    Full Text Available Hyunsoo Kim,1 Markus Bredel2 1Department of Pathology, The University of Alabama at Birmingham, Birmingham, AL, USA; 2Department of Radiation Oncology, and Comprehensive Cancer Center, The University of Alabama at Birmingham, Birmingham, AL, USA Purpose: Personalized medicine is predicated on the concept of identifying subgroups of a common disease for better treatment. Identifying biomarkers that predict disease subtypes has been a major focus of biomedical science. In the era of genome-wide profiling, there is controversy as to the optimal number of genes as an input of a feature selection algorithm for survival modeling. Patients and methods: The expression profiles and outcomes of 544 patients were retrieved from The Cancer Genome Atlas. We compared four different survival prediction methods: (1) 1-nearest neighbor (1-NN) survival prediction method; (2) random patient selection method and a Cox-based regression method with nested cross-validation; (3) least absolute shrinkage and selection operator (LASSO) optimization using whole-genome gene expression profiles; or (4) gene expression profiles of cancer pathway genes. Results: The 1-NN method performed better than the random patient selection method in terms of survival predictions, although it does not include a feature selection step. The Cox-based regression method with LASSO optimization using whole-genome gene expression data demonstrated higher survival prediction power than the 1-NN method, but was outperformed by the same method when using gene expression profiles of cancer pathway genes alone. Conclusion: The 1-NN survival prediction method may require more patients for better performance, even when omitting censored data. Using preexisting biological knowledge for survival prediction is reasonable as a means to understand the biological system of a cancer, unless the analysis goal is to identify completely unknown genes relevant to cancer biology. Keywords: brain, feature selection

  1. Feature Extraction and Selection From the Perspective of Explosive Detection

    Energy Technology Data Exchange (ETDEWEB)

    Sengupta, S K

    2009-09-01

    digitized 3-dimensional attenuation images with a voxel resolution of the order of one quarter of a millimeter. In the task of feature extraction and the subsequent selection of an appropriate subset thereof, several important factors need to be considered. Foremost among them are: (1) the definition of the sampling unit from which the features will be extracted for the purpose of detection/identification of the explosives; (2) the choice of features (given the sampling unit) to be extracted that can be used to signal the existence/identity of the explosive; (3) the robustness of the computed features under different inspection conditions (to attain robustness, invariance under the transformations of translation, scaling, rotation and change of orientation is highly desirable); and (4) the computational costs in the process of feature extraction, selection and their use in explosive detection/identification. In the search for extractable features, we have done a thorough literature survey with the above factors in mind and come up with a list of features that could possibly help us in meeting our objective. We are assuming that features will be based on sampling units that are single CT slices of the target. This may, however, change if appropriate modifications are made to the feature extraction process. We indicate below some of the major types of features in 2- or 3-dimensional images that have been used in the literature on the application of pattern recognition (PR) techniques in image understanding and are possibly pertinent to our study. In the following paragraph, we briefly indicate the motivation that guided us in the choice of these features, and identify the nature of the constraints. The principal feature types derivable from an image will be discussed in section 2. Once the features are extracted, one must select a subset of this feature set that will retain the most useful information and remove any redundant and irrelevant information that may have a detrimental effect

  2. Feature Selection for Generator Excitation Neurocontroller Development Using Filter Technique

    Directory of Open Access Journals (Sweden)

    Abdul Ghani Abro

    2011-09-01

    Full Text Available Essentially, the motive behind using a control system is to generate a suitable control signal that yields the desired response of a physical process. Control of the synchronous generator has always remained very critical in power system operation and control. For certain well known reasons, power generators are normally operated well below their steady-state stability limit. This raises the demand for efficient and fast controllers. Artificial intelligence has been reported to give revolutionary outcomes in the field of control engineering. The Artificial Neural Network (ANN), a branch of artificial intelligence, has been used for nonlinear and adaptive control, utilizing its inherent observability. The overall performance of a neurocontroller also depends upon its input features. Selecting optimum features to train a neurocontroller optimally is very critical. Both the quality and the size of the data are of equal importance for better performance. In this work a filter technique is employed to select independent factors for ANN training.

  3. Feature Selection Strategies for Classifying High Dimensional Astronomical Data Sets

    CERN Document Server

    Donalek, Ciro; Djorgovski, S G; Mahabal, Ashish A; Graham, Matthew J; Fuchs, Thomas J; Turmon, Michael J; Philip, N Sajeeth; Yang, Michael Ting-Chang; Longo, Giuseppe

    2013-01-01

    The amount of data collected in many scientific fields is increasing, and all of these fields require a common task: extracting knowledge from massive, multi-parametric data sets as rapidly and efficiently as possible. This is especially true in astronomy, where synoptic sky surveys are enabling new research frontiers in time-domain astronomy and posing several new object classification challenges in multi-dimensional spaces; given the high number of parameters available for each object, feature selection is quickly becoming a crucial task in analyzing astronomical data sets. Using data sets extracted from the ongoing Catalina Real-Time Transient Surveys (CRTS) and the Kepler Mission, we illustrate a variety of feature selection strategies used to identify the subsets that give the most information, and the results achieved by applying these techniques to three major astronomical problems.

  4. Acute Exercise Modulates Feature-selective Responses in Human Cortex.

    Science.gov (United States)

    Bullock, Tom; Elliott, James C; Serences, John T; Giesbrecht, Barry

    2017-04-01

    An organism's current behavioral state influences ongoing brain activity. Nonhuman mammalian and invertebrate brains exhibit large increases in the gain of feature-selective neural responses in sensory cortex during locomotion, suggesting that the visual system becomes more sensitive when actively exploring the environment. This raises the possibility that human vision is also more sensitive during active movement. To investigate this possibility, we used an inverted encoding model technique to estimate feature-selective neural response profiles from EEG data acquired from participants performing an orientation discrimination task. Participants (n = 18) fixated at the center of a flickering (15 Hz) circular grating presented at one of nine different orientations and monitored for a brief shift in orientation that occurred on every trial. Participants completed the task while seated on a stationary exercise bike at rest and during low- and high-intensity cycling. We found evidence for inverted-U effects, such that the peak of the reconstructed feature-selective tuning profiles was highest during low-intensity exercise compared with those estimated during rest and high-intensity exercise. When modeled, these effects were driven by changes in the gain of the tuning curve and in the profile bandwidth during low-intensity exercise relative to rest. Thus, despite profound differences in visual pathways across species, these data show that sensitivity in human visual cortex is also enhanced during locomotive behavior. Our results reveal the nature of exercise-induced gain on feature-selective coding in human sensory cortex and provide valuable evidence linking the neural mechanisms of behavior state across species.

  5. HYBRID FEATURE SELECTION ALGORITHM FOR INTRUSION DETECTION SYSTEM

    Directory of Open Access Journals (Sweden)

    Seyed Reza Hasani

    2014-01-01

    Full Text Available Network security is a serious global concern, and the usefulness of Intrusion Detection Systems (IDS) is increasing greatly in information security research using soft computing techniques. Previous research has recognized irrelevant and redundant features as a cause of increased processing time when evaluating known intrusive patterns. In addition, an efficient feature selection method reduces the dimensionality of the data and removes the redundancy and ambiguity caused by unimportant attributes. Therefore, feature selection methods are well-known means of overcoming this problem. Various approaches have been utilized in intrusion detection, and each achieves some relative improvement with its own method. This work is based on enhancing the algorithm with the highest Detection Rate (DR), Linear Genetic Programming (LGP), and reducing the False Alarm Rate (FAR) by incorporating the Bees Algorithm. Finally, the Support Vector Machine (SVM) is one of the best candidate solutions for settling IDS problems. In this study four sample datasets, each containing 4000 random records, are extracted randomly from the dataset for training and testing purposes. Experimental results show that the LGP_BA method improves the accuracy and efficiency compared with previous related research, and the feature subset offered by LGP_BA gives a superior representation of the data.

  6. Online Feature Selection of Class Imbalance via PA Algorithm

    Institute of Scientific and Technical Information of China (English)

    Chao Han; Yun-Kun Tan; Jin-Hui Zhu; Yong Guo; Jian Chen; Qing-Yao Wu

    2016-01-01

    Imbalance classification techniques have been frequently applied in many machine learning application domains where the number of samples in the majority (or positive) class of a dataset is much larger than that in the minority (or negative) class. Meanwhile, feature selection (FS) is one of the key techniques for high-dimensional classification tasks, as it greatly improves classification performance and computational efficiency. However, most studies of feature selection and imbalance classification are restricted to off-line batch learning, which is not well adapted to some practical scenarios. In this paper, we aim to solve the high-dimensional imbalanced classification problem accurately and efficiently with only a small number of active features in an online fashion, and we propose two novel online learning algorithms for this purpose. In our approach, a classifier which involves only a small and fixed number of features is constructed to classify a sequence of imbalanced data received in an online manner. We formulate the construction of such an online learner as an optimization problem and use an iterative approach to solve the problem based on the passive-aggressive (PA) algorithm as well as a truncated gradient (TG) method. We evaluate the performance of the proposed algorithms on several real-world datasets, and our experimental results demonstrate the effectiveness of the proposed algorithms in comparison with the baselines.
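
    A compact sketch of an online passive-aggressive learner that keeps only a fixed budget of active features by truncating small weights is given below. It is a simplification of the paper's PA plus truncated-gradient formulation: the PA-I step size, the truncation rule and the toy data stream are assumptions made for illustration.

      # Hedged sketch: online PA-I updates with weight truncation to a fixed feature budget.
      import numpy as np

      def online_pa_truncated(stream, n_features, budget=10, C=1.0):
          w = np.zeros(n_features)
          for x, y in stream:                       # y in {-1, +1}
              loss = max(0.0, 1.0 - y * w.dot(x))   # hinge loss on the current example
              if loss > 0:
                  tau = min(C, loss / (x.dot(x) + 1e-12))   # PA-I step size
                  w += tau * y * x
                  if np.count_nonzero(w) > budget:          # truncation: keep largest weights
                      small = np.argsort(np.abs(w))[:-budget]
                      w[small] = 0.0
          return w

      rng = np.random.default_rng(0)
      n, d = 2000, 100
      X = rng.normal(size=(n, d))
      y = np.sign(X[:, 0] - 2 * X[:, 3])            # separable toy labels
      w = online_pa_truncated(zip(X, y), d, budget=5)
      print("active features:", np.flatnonzero(w))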

  7. Feature selection for face recognition: a memetic algorithmic approach

    Institute of Scientific and Technical Information of China (English)

    Dinesh KUMAR; Shakti KUMAR; C. S. RAI

    2009-01-01

    The eigenface method that uses principal component analysis (PCA) has been the standard and popular method used in face recognition. This paper presents a PCA-memetic algorithm (PCA-MA) approach for feature selection. PCA has been extended by MAs where the former was used for feature extraction/dimensionality reduction and the latter exploited for feature selection. Simulations were performed over ORL and YaleB face databases using Euclidean norm as the classifier. It was found that as far as the recognition rate is concerned, PCA-MA completely outperforms the eigenface method. We compared the performance of PCA extended with genetic algorithm (PCA-GA) with our proposed PCA-MA method. The results also clearly established the supremacy of the PCA-MA method over the PCA-GA method. We further extended linear discriminant analysis (LDA) and kernel principal component analysis (KPCA) approaches with the MA and observed significant improvement in recognition rate with fewer features. This paper also compares the performance of PCA-MA, LDA-MA and KPCA-MA approaches.

  8. Use of genetic algorithm for the selection of EEG features

    Science.gov (United States)

    Asvestas, P.; Korda, A.; Kostopoulos, S.; Karanasiou, I.; Ouzounoglou, A.; Sidiropoulos, K.; Ventouras, E.; Matsopoulos, G.

    2015-09-01

    Genetic Algorithm (GA) is a popular optimization technique that can detect the global optimum of a multivariable function containing several local optima. GA has been widely used in the field of biomedical informatics, especially in the context of designing decision support systems that classify biomedical signals or images into classes of interest. The aim of this paper is to present a methodology, based on GA, for the selection of the optimal subset of features that can be used for the efficient classification of Event Related Potentials (ERPs), which are recorded during the observation of correct or incorrect actions. In our experiment, ERP recordings were acquired from sixteen (16) healthy volunteers who observed correct or incorrect actions of other subjects. The brain electrical activity was recorded at 47 locations on the scalp. The GA was formulated as a combinatorial optimizer for the selection of the combination of electrodes that maximizes the performance of the Fuzzy C Means (FCM) classification algorithm. In particular, during the evolution of the GA, for each candidate combination of electrodes, the well-known (Σ, Φ, Ω) features were calculated and were evaluated by means of the FCM method. The proposed methodology provided a combination of 8 electrodes, with classification accuracy 93.8%. Thus, GA can be the basis for the selection of features that discriminate ERP recordings of observations of correct or incorrect actions.

  9. Processing of Feature Selectivity in Cortical Networks with Specific Connectivity.

    Directory of Open Access Journals (Sweden)

    Sadra Sadeh

    Full Text Available Although non-specific at the onset of eye opening, networks in rodent visual cortex attain a non-random structure after eye opening, with a specific bias for connections between neurons of similar preferred orientations. As orientation selectivity is already present at eye opening, it remains unclear how this specificity in network wiring contributes to feature selectivity. Using large-scale inhibition-dominated spiking networks as a model, we show that feature-specific connectivity leads to a linear amplification of feedforward tuning, consistent with recent electrophysiological single-neuron recordings in rodent neocortex. Our results show that optimal amplification is achieved at an intermediate regime of specific connectivity. In this configuration a moderate increase of pairwise correlations is observed, consistent with recent experimental findings. Furthermore, we observed that feature-specific connectivity leads to the emergence of orientation-selective reverberating activity, and entails pattern completion in network responses. Our theoretical analysis provides a mechanistic understanding of subnetworks' responses to visual stimuli, and casts light on the regime of operation of sensory cortices in the presence of specific connectivity.

  10. Evaluation of entropy and JM-distance criterions as features selection methods using spectral and spatial features derived from LANDSAT images

    Science.gov (United States)

    Parada, N. D. J. (Principal Investigator); Dutra, L. V.; Mascarenhas, N. D. A.; Mitsuo, Fernando Augusta, II

    1984-01-01

    A study area near Ribeirao Preto in Sao Paulo state was selected, with a predominance of sugar cane. Eight features were extracted from the 4 original bands of the LANDSAT image, using low-pass and high-pass filtering to obtain spatial features. There were 5 training sites used to acquire the necessary parameters. Two groups of four channels were selected from the 12 channels using the JM-distance and entropy criteria. The number of selected channels was defined by physical restrictions of the image analyzer and computational costs. The evaluation was performed by extracting the confusion matrix for training and test areas with a maximum likelihood classifier, and by defining performance indexes based on those matrices for each group of channels. Results show that, for spatial features and supervised classification, the entropy criterion is better in the sense that it allows a more accurate and generalized definition of class signatures. On the other hand, the JM-distance criterion strongly reduces the misclassification within training areas.
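
    For reference, the Jeffries-Matusita separability criterion mentioned in the record can be computed for two classes modelled as Gaussians as in the sketch below (the standard JM = 2(1 - exp(-B)) form, with B the Bhattacharyya distance); the synthetic two-class data merely stand in for LANDSAT channel values.

      # Hedged sketch of the Jeffries-Matusita (JM) separability between two Gaussian classes.
      import numpy as np

      def jm_distance(X1, X2):
          m1, m2 = X1.mean(0), X2.mean(0)
          S1 = np.cov(X1, rowvar=False)
          S2 = np.cov(X2, rowvar=False)
          S = (S1 + S2) / 2.0
          diff = m1 - m2
          # Bhattacharyya distance between the two Gaussian class models
          b = (0.125 * diff @ np.linalg.solve(S, diff)
               + 0.5 * np.log(np.linalg.det(S) / np.sqrt(np.linalg.det(S1) * np.linalg.det(S2))))
          return 2.0 * (1.0 - np.exp(-b))           # JM ranges from 0 to 2

      rng = np.random.default_rng(0)
      class_a = rng.normal(loc=0.0, size=(300, 4))  # e.g. 4 selected channels
      class_b = rng.normal(loc=0.8, size=(300, 4))
      print("JM distance:", round(jm_distance(class_a, class_b), 3))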

  11. Recursive Cluster Elimination (RCE) for classification and feature selection from gene expression data

    Directory of Open Access Journals (Sweden)

    Showe Louise C

    2007-05-01

    Full Text Available Abstract Background Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. The selection of those genes that are important for distinguishing the different sample classes being compared poses a challenging problem in high dimensional data analysis. We describe a new procedure for selecting significant genes as recursive cluster elimination (RCE) rather than recursive feature elimination (RFE). We have tested this algorithm on six datasets and compared its performance with that of two related classification procedures with RFE. Results We have developed a novel method for selecting significant genes in comparative gene expression studies. This method, which we refer to as SVM-RCE, combines K-means, a clustering method, to identify correlated gene clusters, and Support Vector Machines (SVMs), a supervised machine learning classification method, to identify and score (rank) those gene clusters for the purpose of classification. K-means is used initially to group genes into clusters. Recursive cluster elimination (RCE) is then applied to iteratively remove those clusters of genes that contribute the least to the classification performance. SVM-RCE identifies the clusters of correlated genes that are most significantly differentially expressed between the sample classes. Utilization of gene clusters, rather than individual genes, enhances the supervised classification accuracy of the same data as compared to the accuracy when either SVM or Penalized Discriminant Analysis (PDA) with recursive feature elimination (SVM-RFE and PDA-RFE) is used to remove genes based on their individual discriminant weights. Conclusion SVM-RCE provides improved classification accuracy with complex microarray data sets when it is compared to the classification accuracy of the same datasets using either SVM-RFE or PDA-RFE. SVM-RCE identifies clusters of correlated genes that when considered together
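
    The following sketch captures the gist of SVM-RCE as summarised in the record: cluster correlated features with K-means, score each cluster by the cross-validated accuracy of a linear SVM trained on that cluster alone, and repeatedly drop the weakest clusters. The cluster count, elimination schedule and scoring details are simplifications, not the published algorithm.

      # Hedged sketch of recursive cluster elimination with K-means and a linear SVM.
      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.svm import SVC
      from sklearn.model_selection import cross_val_score

      def svm_rce(X, y, n_clusters=10, drop_per_round=3, min_features=10, seed=0):
          keep = np.arange(X.shape[1])
          while keep.size > min_features:
              k = min(n_clusters, keep.size)
              labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X[:, keep].T)
              scores = []
              for c in range(k):
                  cols = keep[labels == c]
                  if cols.size == 0:
                      continue
                  acc = cross_val_score(SVC(kernel="linear"), X[:, cols], y, cv=3).mean()
                  scores.append((acc, c))
              worst = [c for _, c in sorted(scores)[:drop_per_round]]   # weakest clusters
              keep = keep[~np.isin(labels, worst)]
          return keep

      rng = np.random.default_rng(0)
      X = rng.normal(size=(80, 300))               # stand-in for a gene-expression matrix
      y = (X[:, :5].sum(axis=1) > 0).astype(int)   # labels driven by a small gene cluster
      print("surviving features:", svm_rce(X, y))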

  12. An Optimal SVM with Feature Selection Using Multiobjective PSO

    Directory of Open Access Journals (Sweden)

    Iman Behravan

    2016-01-01

    Full Text Available The support vector machine is a classifier based on the structural risk minimization principle. The performance of the SVM depends on different parameters such as the penalty factor, C, and the kernel factor, σ. Choosing an appropriate kernel function can also improve the recognition score and lower the amount of computation. Furthermore, selecting the useful features among the many features in a dataset not only increases the performance of the SVM, but also reduces the computational time and complexity. This is therefore an optimization problem which can be solved by a heuristic algorithm. In some cases, besides the recognition score, the reliability of the classifier's output is important, so in such cases a multiobjective optimization algorithm is needed. In this paper we use the MOPSO algorithm to optimize the parameters of the SVM, choose an appropriate kernel function, and select the best feature subset simultaneously, in order to optimize the recognition score and the reliability of the SVM concurrently. Nine different datasets from the UCI machine learning repository are used to evaluate the power and effectiveness of the proposed method (MOPSO-SVM). The results of the proposed method are compared to those achieved by a single SVM, RBF, and MLP neural networks.

  13. Feature selection applied to ultrasound carotid images segmentation.

    Science.gov (United States)

    Rosati, Samanta; Molinari, Filippo; Balestra, Gabriella

    2011-01-01

    The automated tracing of the carotid layers on ultrasound images is complicated by noise, different morphology and pathology of the carotid artery. In this study we benchmarked four methods for feature selection on a set of variables extracted from ultrasound carotid images. The main goal was to select those parameters containing the highest amount of information useful to classify the pixels in the carotid regions they belong to. Six different classes of pixels were identified: lumen, lumen-intima interface, intima-media complex, media-adventitia interface, adventitia and adventitia far boundary. The performances of QuickReduct Algorithm (QRA), Entropy-Based Algorithm (EBR), Improved QuickReduct Algorithm (IQRA) and Genetic Algorithm (GA) were compared using Artificial Neural Networks (ANNs). All methods returned subsets with a high dependency degree, even if the average classification accuracy was about 50%. Among all classes, the best results were obtained for the lumen. Overall, the four methods for feature selection assessed in this study return comparable results. Despite the need for accuracy improvement, this study could be useful to build a pre-classifier stage for the optimization of segmentation performance in ultrasound automated carotid segmentation.

  14. Information Theory for Gabor Feature Selection for Face Recognition

    Directory of Open Access Journals (Sweden)

    Shen Linlin

    2006-01-01

    Full Text Available A discriminative and robust feature—kernel enhanced informative Gabor feature—is proposed in this paper for face recognition. Mutual information is applied to select a set of informative and nonredundant Gabor features, which are then further enhanced by kernel methods for recognition. Compared with one of the top performing methods in the 2004 Face Verification Competition (FVC2004), our methods demonstrate a clear advantage over existing methods in accuracy, computation efficiency, and memory cost. The proposed method has been fully tested on the FERET database using the FERET evaluation protocol. Significant improvements on three of the test data sets are observed. Compared with the classical Gabor wavelet-based approaches using a huge number of features, our method requires less than 4 milliseconds to retrieve a few hundreds of features. Due to the substantially reduced feature dimension, only 4 seconds are required to recognize 200 face images. The paper also unified different Gabor filter definitions and proposed a training sample generation algorithm to reduce the effects caused by unbalanced number of samples available in different classes.

  15. Information Theory for Gabor Feature Selection for Face Recognition

    Science.gov (United States)

    Shen, Linlin; Bai, Li

    2006-12-01

    A discriminative and robust feature—kernel enhanced informative Gabor feature—is proposed in this paper for face recognition. Mutual information is applied to select a set of informative and nonredundant Gabor features, which are then further enhanced by kernel methods for recognition. Compared with one of the top performing methods in the 2004 Face Verification Competition (FVC2004), our methods demonstrate a clear advantage over existing methods in accuracy, computation efficiency, and memory cost. The proposed method has been fully tested on the FERET database using the FERET evaluation protocol. Significant improvements on three of the test data sets are observed. Compared with the classical Gabor wavelet-based approaches using a huge number of features, our method requires less than 4 milliseconds to retrieve a few hundreds of features. Due to the substantially reduced feature dimension, only 4 seconds are required to recognize 200 face images. The paper also unified different Gabor filter definitions and proposed a training sample generation algorithm to reduce the effects caused by unbalanced number of samples available in different classes.

  16. Recursive Feature Selection with Significant Variables of Support Vectors

    Directory of Open Access Journals (Sweden)

    Chen-An Tsai

    2012-01-01

    Full Text Available The development of DNA microarrays allows researchers to screen thousands of genes simultaneously and helps determine high- and low-expression level genes in normal and disease tissues. Selecting relevant genes for cancer classification is an important issue. Most gene selection methods use univariate ranking criteria and arbitrarily choose a threshold to select genes. However, the parameter setting may not be compatible with the selected classification algorithms. In this paper, we propose a new gene selection method (SVM-t) based on the use of t-statistics embedded in a support vector machine. We compared its performance with two similar SVM-based methods: SVM recursive feature elimination (SVMRFE) and recursive support vector machine (RSVM). The three methods were compared based on extensive simulation experiments and analyses of two published microarray datasets. In the simulation experiments, we found that the proposed method is more robust in selecting informative genes than SVMRFE and RSVM and capable of attaining good classification performance when the variations of informative and noninformative genes are different. In the analysis of two microarray datasets, the proposed method yields better performance in identifying fewer genes with good prediction accuracy, compared to SVMRFE and RSVM.
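    The SVM-RFE baseline mentioned in this record can be sketched with scikit-learn's RFE wrapper as below; this is not the proposed SVM-t method, and the synthetic data and the number of retained features are placeholder assumptions.

```python
# Sketch of SVM-RFE (the baseline discussed above), not the proposed SVM-t method.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic "expression" data standing in for a microarray set.
X, y = make_classification(n_samples=120, n_features=500, n_informative=15,
                           random_state=1)

# A linear SVM provides the weights used to eliminate features recursively.
rfe = RFE(estimator=SVC(kernel="linear", C=1.0),
          n_features_to_select=20, step=0.1)
X_sel = rfe.fit_transform(X, y)

# Note: for an unbiased estimate the RFE step should be nested inside the CV loop;
# this flat version is only meant to show the API shape.
acc = cross_val_score(SVC(kernel="linear"), X_sel, y, cv=5).mean()
print("selected features:", rfe.get_support(indices=True))
print("5-fold CV accuracy on selected features: %.3f" % acc)
```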

  17. Improving permafrost distribution modelling using feature selection algorithms

    Science.gov (United States)

    Deluigi, Nicola; Lambiel, Christophe; Kanevski, Mikhail

    2016-04-01

    The availability of an increasing number of spatial data on the occurrence of mountain permafrost allows the employment of machine learning (ML) classification algorithms for modelling the distribution of the phenomenon. One of the major problems when dealing with high-dimensional datasets is the number of input features (variables) involved. Applying ML classification algorithms to this large number of variables leads to the risk of overfitting, with the consequence of poor generalization/prediction. For this reason, applying feature selection (FS) techniques helps reduce the number of factors required and improves the understanding of the adopted features and their relation with the studied phenomenon. Moreover, removing irrelevant or redundant variables from the dataset effectively improves the quality of the ML prediction. This research deals with a comparative analysis of permafrost distribution models supported by FS variable importance assessment. The input dataset (dimension = 20-25, 10 m spatial resolution) was constructed using landcover maps, climate data and DEM-derived variables (altitude, aspect, slope, terrain curvature, solar radiation, etc.). It was completed with permafrost evidence (geophysical and thermal data and rock glacier inventories) that serves as permafrost training data. The FS algorithms used indicate which variables appear less statistically important for permafrost presence/absence. Three different algorithms were compared: Information Gain (IG), Correlation-based Feature Selection (CFS) and Random Forest (RF). IG is a filter technique that evaluates the worth of a predictor by measuring the information gain with respect to permafrost presence/absence. Conversely, CFS is a wrapper technique that evaluates the worth of a subset of predictors by considering the individual predictive ability of each variable along with the degree of redundancy between them. Finally, RF is a ML algorithm that performs FS as part of its
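    A minimal single-machine sketch of two of the rankings discussed above is given below, using mutual information as a stand-in for Information Gain and random forest importances; CFS is omitted, and the synthetic predictors only mimic the shape of the terrain/climate dataset.

```python
# Sketch: two of the rankings mentioned above, information gain (approximated
# here by mutual information) and Random Forest importance, on synthetic data
# standing in for terrain/climate predictors. CFS is omitted from this sketch.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=1000, n_features=22, n_informative=8,
                           random_state=0)   # roughly 20-25 predictors, as above

ig_scores = mutual_info_classif(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

print("IG ranking :", np.argsort(ig_scores)[::-1])
print("RF ranking :", np.argsort(rf.feature_importances_)[::-1])
```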

  18. Feature-Selective Attentional Modulations in Human Frontoparietal Cortex.

    Science.gov (United States)

    Ester, Edward F; Sutterer, David W; Serences, John T; Awh, Edward

    2016-08-03

    Control over visual selection has long been framed in terms of a dichotomy between "source" and "site," where top-down feedback signals originating in frontoparietal cortical areas modulate or bias sensory processing in posterior visual areas. This distinction is motivated in part by observations that frontoparietal cortical areas encode task-level variables (e.g., what stimulus is currently relevant or what motor outputs are appropriate), while posterior sensory areas encode continuous or analog feature representations. Here, we present evidence that challenges this distinction. We used fMRI, a roving searchlight analysis, and an inverted encoding model to examine representations of an elementary feature property (orientation) across the entire human cortical sheet while participants attended either the orientation or luminance of a peripheral grating. Orientation-selective representations were present in a multitude of visual, parietal, and prefrontal cortical areas, including portions of the medial occipital cortex, the lateral parietal cortex, and the superior precentral sulcus (thought to contain the human homolog of the macaque frontal eye fields). Additionally, representations in many, but not all, of these regions were stronger when participants were instructed to attend orientation relative to luminance. Collectively, these findings challenge models that posit a strict segregation between sources and sites of attentional control on the basis of representational properties by demonstrating that simple feature values are encoded by cortical regions throughout the visual processing hierarchy, and that representations in many of these areas are modulated by attention. Influential models of visual attention posit a distinction between top-down control and bottom-up sensory processing networks. These models are motivated in part by demonstrations showing that frontoparietal cortical areas associated with top-down control represent abstract or categorical stimulus

  19. Unsupervised Feature Selection Based on the Morisita Index

    Science.gov (United States)

    Golay, Jean; Kanevski, Mikhail

    2016-04-01

    Recent breakthroughs in technology have radically improved our ability to collect and store data. As a consequence, the size of datasets has been increasing rapidly both in terms of number of variables (or features) and number of instances. Since the mechanism of many phenomena is not well known, too many variables are sampled. A lot of them are redundant and contribute to the emergence of three major challenges in data mining: (1) the complexity of result interpretation, (2) the necessity to develop new methods and tools for data processing, (3) the possible reduction in the accuracy of learning algorithms because of the curse of dimensionality. This research deals with a new algorithm for selecting the smallest subset of features conveying all the information of a dataset (i.e. an algorithm for removing redundant features). It is a new version of the Fractal Dimensionality Reduction (FDR) algorithm [1] and it relies on two ideas: (a) In general, data lie on non-linear manifolds of much lower dimension than that of the spaces where they are embedded. (b) The situation described in (a) is partly due to redundant variables, since they do not contribute to increasing the dimension of manifolds, called the Intrinsic Dimension (ID). The suggested algorithm implements these ideas by selecting only the variables influencing the data ID. Unlike the FDR algorithm, it resorts to a recently introduced ID estimator [2] based on the Morisita index of clustering and to a sequential forward search strategy. Consequently, in addition to its ability to capture non-linear dependences, it can deal with large datasets and its implementation is straightforward in any programming environment. Many real-world case studies are considered. They are related to environmental pollution and renewable resources. References [1] C. Traina Jr., A.J.M. Traina, L. Wu, C. Faloutsos, Fast feature selection using fractal dimension, in: Proceedings of the XV Brazilian Symposium on Databases, SBBD, pp. 158

  20. Discussion on Selection and Appointment of Teaching Supervision Staff in Military Academies

    Institute of Scientific and Technical Information of China (English)

    杜菲菲; 曹树聪; 孟丽

    2013-01-01

    In order to establish and improve the mechanism for selecting and appointing teaching supervision staff in military academies, this paper analyses the current state of supervisor selection and appointment, identifies the main problems in this work and their causes, and proposes improvements in four areas: selection concepts, standards, procedures and methods. Tightening the selection and appointment of supervisors helps rationalize the structure of the supervision team, so that teaching supervision can truly become a positive force driving the transformation of military education and teaching and raising the quality of education in military academies.

  1. [Feature extraction for breast cancer data based on geometric algebra theory and feature selection using differential evolution].

    Science.gov (United States)

    Li, Jing; Hong, Wenxue

    2014-12-01

    Feature extraction and feature selection are important issues in pattern recognition. Based on the geometric algebra representation of vectors, a new feature extraction method using the blade coefficient of geometric algebra was proposed in this study. At the same time, an improved differential evolution (DE) feature selection method was proposed to address the resulting high-dimensionality issue. Simple linear discriminant analysis was used as the classifier. The 10-fold cross-validation (10 CV) classification accuracy on a public breast cancer biomedical dataset was more than 96%, superior to that obtained with the original features and with a traditional feature extraction method.

  2. Variable selection in near-infrared spectroscopy: Benchmarking of feature selection methods on biodiesel data

    Energy Technology Data Exchange (ETDEWEB)

    Balabin, Roman M., E-mail: balabin@org.chem.ethz.ch [Department of Chemistry and Applied Biosciences, ETH Zurich, 8093 Zurich (Switzerland); Smirnov, Sergey V. [Unimilk Joint Stock Co., 143421 Moscow Region (Russian Federation)

    2011-04-29

    During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields, from petroleum to biomedical sectors. The NIR spectrum (above 4000 cm⁻¹) of a sample is typically measured by modern instruments at a few hundred wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. The results of other spectroscopic
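    A much-simplified, interval-PLS-style search (one of the families of methods listed above) might look like the sketch below; the synthetic spectra, interval count and number of PLS components are assumptions, and none of the paper's biodiesel data are reproduced.

```python
# Simplified iPLS-style sketch: split the wavelength axis into intervals, fit a
# PLS model per interval, and keep the interval with the lowest CV error.
# Synthetic spectra stand in for the NIR biodiesel data; sizes are arbitrary.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples, n_wavelengths, n_intervals = 80, 200, 10
X = rng.normal(size=(n_samples, n_wavelengths))
y = X[:, 60:80].sum(axis=1) + 0.1 * rng.normal(size=n_samples)  # signal in one band

interval_len = n_wavelengths // n_intervals
best = None
for i in range(n_intervals):
    cols = slice(i * interval_len, (i + 1) * interval_len)
    mse = -cross_val_score(PLSRegression(n_components=3), X[:, cols], y,
                           cv=5, scoring="neg_mean_squared_error").mean()
    rmse = np.sqrt(mse)
    if best is None or rmse < best[1]:
        best = (i, rmse)
    print("interval %2d  RMSECV = %.3f" % (i, rmse))

print("selected interval:", best[0])
```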

  3. Clinical supervision.

    Science.gov (United States)

    Goorapah, D

    1997-05-01

    The introduction of clinical supervision to a wider sphere of nursing is being considered from a professional and organizational point of view. Positive views are being expressed about adopting this concept, although there are indications to suggest that there are also strong reservations. This paper examines the potential for its success amidst the scepticism that exists. One important question raised is whether clinical supervision will replace or run alongside other support systems.

  4. Automatic ICD-10 multi-class classification of cause of death from plaintext autopsy reports through expert-driven feature selection

    Science.gov (United States)

    Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa; Al-Garadi, Mohammed Ali

    2017-01-01

    Objectives Widespread implementation of electronic databases has improved the accessibility of plaintext clinical information for supplementary use. Numerous machine learning techniques, such as supervised machine learning approaches or ontology-based approaches, have been employed to obtain useful information from plaintext clinical data. This study proposes an automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models. Methods Accident-related autopsy reports were obtained from one of the largest hospitals in Kuala Lumpur. These reports belong to nine different accident-related causes of death. A master feature vector was prepared by extracting features from the collected autopsy reports using unigrams with lexical categorization. This master feature vector was used to detect the cause of death [according to the International Classification of Diseases version 10 (ICD-10) classification system] through five automated feature selection schemes, the proposed expert-driven approach, five subset sizes of features, and five machine learning classifiers. Model performance was evaluated using macro-averaged precision, recall and F-measure, accuracy, and area under the ROC curve. Four baselines were used to compare the results with the proposed system. Results Random forest and J48 decision models parameterized using expert-driven feature selection yielded the highest evaluation measures, approaching 85% to 90% for most metrics, using a feature subset size of 30. The proposed system also showed approximately 14% to 16% improvement in overall accuracy compared with the existing techniques and four baselines. Conclusion The proposed system is feasible and practical to use for automatic classification of ICD-10-related cause of death from autopsy reports. The proposed system assists pathologists to accurately and rapidly determine underlying

  5. Automatic ICD-10 multi-class classification of cause of death from plaintext autopsy reports through expert-driven feature selection.

    Science.gov (United States)

    Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa; Al-Garadi, Mohammed Ali

    2017-01-01

    Widespread implementation of electronic databases has improved the accessibility of plaintext clinical information for supplementary use. Numerous machine learning techniques, such as supervised machine learning approaches or ontology-based approaches, have been employed to obtain useful information from plaintext clinical data. This study proposes an automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models. Accident-related autopsy reports were obtained from one of the largest hospitals in Kuala Lumpur. These reports belong to nine different accident-related causes of death. A master feature vector was prepared by extracting features from the collected autopsy reports using unigrams with lexical categorization. This master feature vector was used to detect the cause of death [according to the International Classification of Diseases version 10 (ICD-10) classification system] through five automated feature selection schemes, the proposed expert-driven approach, five subset sizes of features, and five machine learning classifiers. Model performance was evaluated using macro-averaged precision, recall and F-measure, accuracy, and area under the ROC curve. Four baselines were used to compare the results with the proposed system. Random forest and J48 decision models parameterized using expert-driven feature selection yielded the highest evaluation measures, approaching 85% to 90% for most metrics, using a feature subset size of 30. The proposed system also showed approximately 14% to 16% improvement in overall accuracy compared with the existing techniques and four baselines. The proposed system is feasible and practical to use for automatic classification of ICD-10-related cause of death from autopsy reports. The proposed system assists pathologists to accurately and rapidly determine the underlying cause of death based on autopsy findings.
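    The overall pipeline shape described in these two records (unigram features, a feature selection step, a decision-model classifier) can be sketched as below; note that the selection step here is a generic chi-squared filter rather than the expert-driven scheme, the classifier settings are arbitrary, and the toy strings are invented stand-ins rather than real autopsy text.

```python
# Generic sketch of the pipeline shape discussed above: unigram features,
# an automated feature selection step (chi-squared here, not the expert-driven
# scheme), and a random forest classifier. The toy strings and labels are
# invented stand-ins, not real autopsy reports.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline

docs = ["fall from height with multiple fractures",
        "vehicle collision with head injury",
        "drowning with water in airways",
        "fall from ladder with hip fracture",
        "motorcycle collision with chest trauma",
        "drowning in open water"]
labels = ["fall", "traffic", "drowning", "fall", "traffic", "drowning"]

clf = Pipeline([
    ("unigrams", CountVectorizer(ngram_range=(1, 1))),
    ("select", SelectKBest(chi2, k=5)),        # subset size is arbitrary here
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
])
clf.fit(docs, labels)
print(clf.predict(["collision with leg fracture"]))
```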

  6. A Local Asynchronous Distributed Privacy Preserving Feature Selection Algorithm for Large Peer-to-Peer Networks

    Data.gov (United States)

    National Aeronautics and Space Administration — In this paper we develop a local distributed privacy preserving algorithm for feature selection in a large peer-to-peer environment. Feature selection is often used...

  7. BUILDING ROBUST APPEARANCE MODELS USING ON-LINE FEATURE SELECTION

    Energy Technology Data Exchange (ETDEWEB)

    PORTER, REID B. [Los Alamos National Laboratory; LOVELAND, ROHAN [Los Alamos National Laboratory; ROSTEN, ED [Los Alamos National Laboratory

    2007-01-29

    In many tracking applications, adapting the target appearance model over time can improve performance. This approach is most popular in high frame rate video applications where latent variables, related to the object's appearance (e.g., orientation and pose), vary slowly from one frame to the next. In these cases the appearance model and the tracking system are tightly integrated, and latent variables are often included as part of the tracking system's dynamic model. In this paper we describe our efforts to track cars in low frame rate data (1 frame/second) acquired from a highly unstable airborne platform. Due to the low frame rate and poor image quality, the appearance of a particular vehicle varies greatly from one frame to the next. This leads us to a different problem: how can we build the best appearance model from all instances of a vehicle we have seen so far? The best appearance model should maximize the future performance of the tracking system and maximize the chances of reacquiring the vehicle once it leaves the field of view. We propose an online feature selection approach to this problem and investigate the performance and computational trade-offs with a real-world dataset.

  8. GAIN RATIO BASED FEATURE SELECTION METHOD FOR PRIVACY PRESERVATION

    Directory of Open Access Journals (Sweden)

    R. Praveena Priyadarsini

    2011-04-01

    Full Text Available Privacy preservation is a step in data mining that tries to safeguard sensitive information from unsanctioned disclosure and hence protects individual data records and their privacy. There are various privacy preservation techniques such as k-anonymity, l-diversity, t-closeness and data perturbation. In this paper the k-anonymity privacy protection technique is applied to high-dimensional datasets such as Adult and Census. Since both datasets are high-dimensional, a feature subset selection method, Gain Ratio, is applied; the attributes of the datasets are ranked and low-ranking attributes are filtered to form new reduced data subsets. The k-anonymization privacy preservation technique is then applied to the reduced datasets. The accuracy of the privacy-preserved reduced datasets and the original datasets is compared on two data mining tasks, namely classification and clustering, using the naïve Bayesian and k-means algorithms respectively. Experimental results show that classification and clustering accuracy are comparatively the same for the reduced k-anonymized datasets and the original datasets.
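    As a small illustration of the Gain Ratio criterion used above for attribute ranking (the k-anonymization step is not shown), the sketch below computes the gain ratio of one categorical attribute against a class label on made-up toy arrays.

```python
# Sketch: gain ratio of a single categorical attribute with respect to a class
# label (information gain divided by split information). Toy arrays only; the
# ranking/filtering and k-anonymization steps described above are not shown.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(attribute, labels):
    values, counts = np.unique(attribute, return_counts=True)
    weights = counts / counts.sum()
    cond_entropy = sum(w * entropy(labels[attribute == v])
                       for v, w in zip(values, weights))
    info_gain = entropy(labels) - cond_entropy
    split_info = entropy(attribute)   # entropy of the attribute itself
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical categorical attribute (e.g. an education-level code) and class.
attribute = np.array(["hs", "hs", "college", "college", "grad", "grad", "hs", "grad"])
labels = np.array([0, 0, 1, 1, 1, 0, 0, 1])
print("gain ratio: %.3f" % gain_ratio(attribute, labels))
```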

  9. Soft computing based feature selection for environmental sound classification

    NARCIS (Netherlands)

    Shakoor, A.; May, T.M.; Van Schijndel, N.H.

    2010-01-01

    Environmental sound classification has a wide range of applications, like hearing aids, mobile communication devices, portable media players, and auditory protection devices. Sound classification systems typically extract features from the input sound. Using too many features increases complexity unne

  10. Soft computing based feature selection for environmental sound classification

    NARCIS (Netherlands)

    Shakoor, A.; May, T.M.; Van Schijndel, N.H.

    2010-01-01

    Environmental sound classification has a wide range of applications, like hearing aids, mobile communication devices, portable media players, and auditory protection devices. Sound classification systems typically extract features from the input sound. Using too many features increases complexity unne

  11. A bidirectional feature selection method based on mutual information and redundancy-synergy coefficient

    Institute of Scientific and Technical Information of China (English)

    YANG Sheng; ZHANG Zhi; SHI Peng-fei

    2006-01-01

    Feature subset selection is a fundamental problem of data mining. The mutual information of a feature subset is a measure of how much class information the subset contains. A hashing mechanism is proposed to calculate the mutual information of a feature subset. Feature relevancy is defined by mutual information. The redundancy-synergy coefficient, a novel redundancy and synergy measure for features describing the class feature, is defined. In terms of the information maximization rule, a bidirectional heuristic feature subset selection method based on mutual information and the redundancy-synergy coefficient is presented. This study's experiments show the good performance of the new method.

  12. Feature Selection Method Based on Neighborhood Relationships: Applications in EEG Signal Identification and Chinese Character Recognition

    Directory of Open Access Journals (Sweden)

    Yu-Xiang Zhao

    2016-06-01

    Full Text Available In this study, a new feature selection algorithm, the neighborhood-relationship feature selection (NRFS) algorithm, is proposed for identifying rat electroencephalogram signals and recognizing Chinese characters. In these two applications, dependent relationships exist among the feature vectors and their neighboring feature vectors. Therefore, the proposed NRFS algorithm was designed for solving this problem. By applying the NRFS algorithm, unselected feature vectors have a high priority of being added into the feature subset if the neighboring feature vectors have been selected. In addition, selected feature vectors have a high priority of being eliminated if the neighboring feature vectors are not selected. In the experiments conducted in this study, the NRFS algorithm was compared with two feature selection algorithms. The experimental results indicated that the NRFS algorithm can extract the crucial frequency bands for identifying rat vigilance states and identify crucial character regions for recognizing Chinese characters.

  13. Feature Selection Method Based on Neighborhood Relationships: Applications in EEG Signal Identification and Chinese Character Recognition

    Science.gov (United States)

    Zhao, Yu-Xiang; Chou, Chien-Hsing

    2016-01-01

    In this study, a new feature selection algorithm, the neighborhood-relationship feature selection (NRFS) algorithm, is proposed for identifying rat electroencephalogram signals and recognizing Chinese characters. In these two applications, dependent relationships exist among the feature vectors and their neighboring feature vectors. Therefore, the proposed NRFS algorithm was designed for solving this problem. By applying the NRFS algorithm, unselected feature vectors have a high priority of being added into the feature subset if the neighboring feature vectors have been selected. In addition, selected feature vectors have a high priority of being eliminated if the neighboring feature vectors are not selected. In the experiments conducted in this study, the NRFS algorithm was compared with two feature selection algorithms. The experimental results indicated that the NRFS algorithm can extract the crucial frequency bands for identifying rat vigilance states and identify crucial character regions for recognizing Chinese characters. PMID:27314346

  14. Feature Selection Method Based on Neighborhood Relationships: Applications in EEG Signal Identification and Chinese Character Recognition.

    Science.gov (United States)

    Zhao, Yu-Xiang; Chou, Chien-Hsing

    2016-06-14

    In this study, a new feature selection algorithm, the neighborhood-relationship feature selection (NRFS) algorithm, is proposed for identifying rat electroencephalogram signals and recognizing Chinese characters. In these two applications, dependent relationships exist among the feature vectors and their neighboring feature vectors. Therefore, the proposed NRFS algorithm was designed for solving this problem. By applying the NRFS algorithm, unselected feature vectors have a high priority of being added into the feature subset if the neighboring feature vectors have been selected. In addition, selected feature vectors have a high priority of being eliminated if the neighboring feature vectors are not selected. In the experiments conducted in this study, the NRFS algorithm was compared with two feature selection algorithms. The experimental results indicated that the NRFS algorithm can extract the crucial frequency bands for identifying rat vigilance states and identify crucial character regions for recognizing Chinese characters.

  15. Speech Emotion Feature Selection Method Based on Contribution Analysis Algorithm of Neural Network

    Science.gov (United States)

    Wang, Xiaojia; Mao, Qirong; Zhan, Yongzhao

    2008-11-01

    There are many emotion features. If all these features are employed to recognize emotions, redundant features may exist. Furthermore, the recognition result is unsatisfactory and the cost of feature extraction is high. In this paper, a method to select speech emotion features based on a contribution analysis algorithm of a neural network (NN) is presented. The emotion features are selected from the 95 extracted features by using the contribution analysis algorithm of the NN. Cluster analysis is applied to analyze the effectiveness of the selected features, and the time of feature extraction is evaluated. Finally, the 24 selected emotion features are used to recognize six speech emotions. The experiments show that this method can improve the recognition rate and reduce the time of feature extraction.

  16. Whither Supervision?

    Directory of Open Access Journals (Sweden)

    Duncan Waite

    2006-11-01

    Full Text Available This paper asks whether school supervision is in decline. Dr. Waite responds that the answer depends on the perspective from which it is viewed. Dr. Waite suggests considering three related elements: the field itself; the experts in the field (the professor, the theorist, the student and the administrator); and the context. Reviewing these three elements, he emphasizes that there is no consensus about the field of supervision, but there is agreement on its importance and on its connection to improving students' practice in school for their benefit. Dr. Waite suggests that practice in this field is not always in harmony with what the theorists affirm. Regarding the supervisor or expert, the author indicates that his or her perspective depends on his or her epistemological beliefs or on the way he or she conceives of learning; that is why supervision can be understood in different ways. Concerning the context, Waite suggests that the social and external forces that influence people and society have to be taken into consideration, because through them education is affected. Dr. Waite concludes that the way supervision is understood depends on the performer's perspective. He responds to the initial question by saying that the supervision authorities, the knowledge in this field, its practitioners, and its practice may be dispersed but are not extinct, because supervision will always be part of the great enterprise that we call education.

  17. Feature selection method based on multi-fractal dimension and harmony search algorithm and its application

    Science.gov (United States)

    Zhang, Chen; Ni, Zhiwei; Ni, Liping; Tang, Na

    2016-10-01

    Feature selection is an important method of data preprocessing in data mining. In this paper, a novel feature selection method based on multi-fractal dimension and the harmony search algorithm is proposed. Multi-fractal dimension is adopted as the evaluation criterion of a feature subset, which can determine the number of selected features. An improved harmony search algorithm is used as the search strategy to improve the efficiency of feature selection. The performance of the proposed method is compared with that of other feature selection algorithms on UCI datasets. In addition, the proposed method is also used to predict the daily average concentration of PM2.5 in China. Experimental results show that the proposed method can obtain competitive results in terms of both prediction accuracy and the number of selected features.

  18. Novel Automatic Filter-Class Feature Selection for Machine Learning Regression

    DEFF Research Database (Denmark)

    Wollsen, Morten Gill; Hallam, John; Jørgensen, Bo Nørregaard

    2017-01-01

    With the increased focus on application of Big Data in all sectors of society, the performance of machine learning becomes essential. Efficient machine learning depends on efficient feature selection algorithms. Filter feature selection algorithms are model-free and therefore very fast, but require...... model in the feature selection process. PCA is often used in machine learning literature and can be considered the default feature selection method. RDESF outperformed PCA in both experiments in both prediction error and computational speed. RDESF is a new step into filter-based automatic feature...

  19. Novel Automatic Filter-Class Feature Selection for Machine Learning Regression

    DEFF Research Database (Denmark)

    Wollsen, Morten Gill; Hallam, John; Jørgensen, Bo Nørregaard

    2016-01-01

    With the increased focus on application of Big Data in all sectors of society, the performance of machine learning becomes essential. Efficient machine learning depends on efficient feature selection algorithms. Filter feature selection algorithms are model-free and therefore very fast, but require...... model in the feature selection process. PCA is often used in machine learning literature and can be considered the default feature selection method. RDESF outperformed PCA in both experiments in both prediction error and computational speed. RDESF is a new step into filter-based automatic feature...

  20. A Rank Aggregation Algorithm for Ensemble of Multiple Feature Selection Techniques in Credit Risk Evaluation

    Directory of Open Access Journals (Sweden)

    Shashi Dahiya

    2016-10-01

    Full Text Available In credit risk evaluation the accuracy of a classifier is very significant for classifying high-risk loan applicants correctly. Feature selection is one way of improving the accuracy of a classifier. It provides the classifier with important and relevant features for model development. This study uses an ensemble of multiple feature ranking techniques for feature selection on credit data. It uses five individual rank-based feature selection methods. It proposes a novel rank aggregation algorithm for combining the ranks of the individual feature selection methods of the ensemble. This algorithm uses the rank order along with the rank score of the features in the ranked list of each feature selection method for rank aggregation. The ensemble of multiple feature selection techniques uses the novel rank aggregation algorithm and selects the relevant features using the 80%, 60%, 40% and 20% thresholds from the top of the aggregated ranked list for building the C4.5, MLP, C4.5-based Bagging and MLP-based Bagging models. It was observed that the performance of models using the ensemble of multiple feature selection techniques is better than the performance of the five individual rank-based feature selection methods. The average performance of all the models was observed to be best for the ensemble of feature selection techniques at the 60% threshold. Also, the bagging-based models outperformed the individual models most significantly at the 60% threshold. This increase in performance is all the more significant given that the number of features was reduced by 40% for building the highest performing models. This reduces the data dimensions and hence the overall data size considerably for model building. The use of the ensemble of feature selection techniques with the novel aggregation algorithm provided more accurate models which are simpler, faster and easy to interpret.
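    A simplified sketch of the general idea of aggregating several feature rankings is shown below; the aggregation used here is a plain average of rank positions, not the paper's rank-order-plus-rank-score algorithm, and the scorers and 60% threshold are placeholder choices.

```python
# Simplified sketch of combining several individual feature rankings into one
# aggregated list and keeping the top 60%. The aggregation here is a plain
# average of rank positions, NOT the paper's rank-order-plus-score algorithm.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif, mutual_info_classif

X, y = make_classification(n_samples=300, n_features=30, n_informative=8,
                           random_state=0)

# Three rank-based scorers (stand-ins for the five used in the study).
scores = [
    f_classif(X, y)[0],
    mutual_info_classif(X, y, random_state=0),
    RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y).feature_importances_,
]

# Convert each score vector to ranks (0 = best), then average across methods.
ranks = np.array([np.argsort(np.argsort(-s)) for s in scores])
aggregated = ranks.mean(axis=0)

threshold = int(0.6 * X.shape[1])   # keep the top 60% of features
selected = np.argsort(aggregated)[:threshold]
print("aggregated top-60% feature indices:", np.sort(selected))
```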

  1. Sequence-based classification using discriminatory motif feature selection.

    Directory of Open Access Journals (Sweden)

    Hao Xiong

    Full Text Available Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is

  2. Bias and Stability of Single Variable Classifiers for Feature Ranking and Selection.

    Science.gov (United States)

    Fakhraei, Shobeir; Soltanian-Zadeh, Hamid; Fotouhi, Farshad

    2014-11-01

    Feature rankings are often used for supervised dimension reduction, especially when the discriminating power of each feature is of interest, the dimensionality of the dataset is extremely high, or computational power is limited to perform more complicated methods. In practice, it is recommended to start dimension reduction via simple methods such as feature rankings before applying more complex approaches. Single Variable Classifier (SVC) ranking is a feature ranking based on the predictive performance of a classifier built using only a single feature. While benefiting from the capabilities of classifiers, this ranking method is not as computationally intensive as wrappers. In this paper, we report the results of an extensive study on the bias and stability of such a feature ranking method. We study whether the classifiers influence the SVC rankings or whether the discriminative power of the features themselves has a dominant impact on the final rankings. We show that the common intuition of using the same classifier for feature ranking and final classification does not always result in the best prediction performance. We then study whether heterogeneous classifier ensemble approaches provide more unbiased rankings and whether they improve final classification performance. Furthermore, we calculate an empirical prediction performance loss for using the same classifier in SVC feature ranking and final classification relative to the optimal choices.
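    The Single Variable Classifier ranking described above can be sketched directly: score each feature by the cross-validated accuracy of a classifier trained on that feature alone, then sort. The classifier, dataset and CV settings below are placeholder assumptions.

```python
# Sketch of Single Variable Classifier (SVC) ranking: score each feature by the
# cross-validated accuracy of a classifier trained on that feature alone.
# Dataset, classifier and CV settings are placeholder choices.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

single_feature_acc = np.array([
    cross_val_score(DecisionTreeClassifier(max_depth=3, random_state=0),
                    X[:, [j]], y, cv=5).mean()
    for j in range(X.shape[1])
])

ranking = np.argsort(single_feature_acc)[::-1]
print("best single features (indices):", ranking[:5])
print("their CV accuracies:", np.round(single_feature_acc[ranking[:5]], 3))
```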

  3. Training, supervision and quality of care in selected integrated community case management (iCCM) programmes: A scoping review of programmatic evidence

    Science.gov (United States)

    Bosch–Capblanch, Xavier; Marceau, Claudine

    2014-01-01

    Aim To describe the training, supervision and quality of care components of integrated Community Case Management (iCCM) programmes and to draw lessons learned from existing evaluations of those programmes. Methods Scoping review of reports from 29 selected iCCM programmes purposively provided by stakeholders containing any information relevant to understand quality of care issues. Results The number of people reached by iCCM programmes varied from the tens of thousands to more than a million. All programmes aimed at improving access of vulnerable populations to health care, focusing on the main childhood illnesses, managed by Community Health Workers (CHW), often selected by communities. Training and supervision were widely implemented, in different ways and intensities, and often complemented with tools (e.g., guides, job aids), supplies, equipment and incentives. Quality of care was measured using many outcomes (e.g., access or appropriate treatment). Overall, there seemed to be positive effects for those strategies that involved policy change, organisational change, standardisation of clinical practices and alignment with other programmes. Positive effects were mostly achieved in large multi-component programmes. Mild or no effects have been described on mortality reduction amongst the few programmes for which data on this outcome was available to us. Promising strategies included teaming-up of CHW, micro-franchising or social franchising. On-site training and supervision of CHW have been shown to improve clinical practices. Effects on caregivers seemed positive, with increases in knowledge, care seeking behaviour, or caregivers' basic disease management. Evidence on iCCM is often of low quality, cannot relate specific interventions or the ways they are implemented with outcomes and lacks standardisation; this limits the capacity to identify promising strategies to improve quality of care. Conclusion Large, multi-faceted, iCCM programmes, with strong

  4. A Meta-Heuristic Regression-Based Feature Selection for Predictive Analytics

    Directory of Open Access Journals (Sweden)

    Bharat Singh

    2014-11-01

    Full Text Available High-dimensional feature selection, that is, finding an optimal feature subset among a very large number of features, is an NP-complete problem. Because conventional optimization techniques are unable to tackle large-scale feature selection problems, meta-heuristic algorithms are widely used. In this paper, we propose a particle swarm optimization technique that utilizes regression techniques for feature selection. We then use the selected features to classify the data. Classification accuracy is used as a criterion to evaluate classifier performance, and classification is accomplished through the use of k-nearest neighbour (KNN) and Bayesian techniques. Various high-dimensional data sets are used to evaluate the usefulness of the proposed approach. Results show that our approach gives better results when compared with other conventional feature selection algorithms.

  5. Feature selection and validated predictive performance in the domain of Legionella pneumophila: A comparative study

    NARCIS (Netherlands)

    T. van der Ploeg (Tjeerd); E.W. Steyerberg (Ewout)

    2016-01-01

    textabstractBackground: Genetic comparisons of clinical and environmental Legionella strains form an essential part of outbreak investigations. DNA microarrays often comprise many DNA markers (features). Feature selection and the development of prediction models are particularly challenging in this

  6. Compact cancer biomarkers discovery using a swarm intelligence feature selection algorithm.

    Science.gov (United States)

    Martinez, Emmanuel; Alvarez, Mario Moises; Trevino, Victor

    2010-08-01

    Biomarker discovery is a typical application of functional genomics. Due to the large number of genes studied simultaneously in microarray data, feature selection is a key step. Swarm intelligence has emerged as a solution for the feature selection problem. However, swarm intelligence settings for feature selection fail to select small feature subsets. We have proposed a swarm intelligence feature selection algorithm based on the initialization and update of only a subset of particles in the swarm. In this study, we tested our algorithm on 11 microarray datasets for brain, leukemia, lung, prostate, and others. We show that the proposed swarm intelligence algorithm successfully increases the classification accuracy and decreases the number of selected features compared to other swarm intelligence methods.

  7. UNLABELED SELECTED SAMPLES IN FEATURE EXTRACTION FOR CLASSIFICATION OF HYPERSPECTRAL IMAGES WITH LIMITED TRAINING SAMPLES

    Directory of Open Access Journals (Sweden)

    A. Kianisarkaleh

    2015-12-01

    Full Text Available Feature extraction plays a key role in hyperspectral image classification. Using unlabeled samples, which are often available in unlimited quantity, unsupervised and semisupervised feature extraction methods show better performance when a limited number of training samples exists. This paper illustrates the importance of selecting appropriate unlabeled samples for use in feature extraction methods. It also proposes a new method for unlabeled sample selection using spectral and spatial information. The proposed method has four parts: PCA, prior classification, posterior classification and sample selection. As the hyperspectral image passes through these parts, the selected unlabeled samples can be used in arbitrary feature extraction methods. The effectiveness of the proposed unlabeled selected samples in unsupervised and semisupervised feature extraction is demonstrated using two real hyperspectral datasets. Results show that, by selecting appropriate unlabeled samples, the proposed method can improve the performance of feature extraction methods and increase classification accuracy.

  8. Electrophysiological correlates of early attentional feature selection and distractor filtering

    NARCIS (Netherlands)

    Akyürek, Elkan G.; Schubö, Anna

    Using electrophysiology, the attentional functions of target selection and distractor filtering were investigated during visual search. Observers searched for multiple tilted line segments amidst vertical distractors. In different conditions, observers were either looking for a specific line

  9. Feature and Model Selection in Feedforward Neural Networks

    Science.gov (United States)

    1994-06-01

    smaller than those experienced with the derivative-based saliencies. However, a minimal number of nodes were used to analyze the FLUIR problem. [Table 15, FLUIR Problem: Saliency Metric Loadings after Varimax Rotation; the table values are not recoverable from the extracted text.]

  10. Linear feature selection in texture analysis - A PLS based method

    DEFF Research Database (Denmark)

    Marques, Joselene; Igel, Christian; Lillholm, Martin

    2013-01-01

    We present a texture analysis methodology that combines uncommitted machine-learning techniques and partial least squares (PLS) in a fully automatic framework. Our approach introduces a robust PLS-based dimensionality reduction (DR) step to specifically address outliers and high-dimensional featur...

  11. Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets

    Science.gov (United States)

    Aalaei, Shokoufeh; Shahraki, Hadi; Rowhanimanesh, Alireza; Eslami, Saeid

    2016-01-01

    Objective(s): This study addresses feature selection for breast cancer diagnosis. The process uses a wrapper approach with GA-based feature selection and a PS-classifier. The results of the experiment show that the proposed model is comparable to the other models on the Wisconsin breast cancer datasets. Materials and Methods: To evaluate the effectiveness of the proposed feature selection method, we employed three different classifiers, an artificial neural network (ANN), a PS-classifier and a genetic algorithm based classifier (GA-classifier), on the Wisconsin breast cancer datasets, including the Wisconsin breast cancer dataset (WBC), Wisconsin diagnosis breast cancer (WDBC), and Wisconsin prognosis breast cancer (WPBC). Results: For the WBC dataset, it is observed that feature selection improved the accuracy of all classifiers except ANN, and the best accuracy with feature selection was achieved by the PS-classifier. For WDBC and WPBC, results show that feature selection improved the accuracy of all three classifiers and the best accuracy with feature selection was achieved by ANN. Specificity and sensitivity also improved after feature selection. Conclusion: The results show that feature selection can improve the accuracy, specificity and sensitivity of classifiers. The results of this study are comparable with other studies on the Wisconsin breast cancer datasets. PMID:27403253

  12. Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets

    Directory of Open Access Journals (Sweden)

    Shokoufeh Aalaei

    2016-05-01

    Full Text Available Objective(s): This study addresses feature selection for breast cancer diagnosis. The process uses a wrapper approach with GA-based feature selection and a PS-classifier. The results of the experiment show that the proposed model is comparable to the other models on the Wisconsin breast cancer datasets. Materials and Methods: To evaluate the effectiveness of the proposed feature selection method, we employed three different classifiers, an artificial neural network (ANN), a PS-classifier and a genetic algorithm based classifier (GA-classifier), on the Wisconsin breast cancer datasets, including the Wisconsin breast cancer dataset (WBC), Wisconsin diagnosis breast cancer (WDBC), and Wisconsin prognosis breast cancer (WPBC). Results: For the WBC dataset, it is observed that feature selection improved the accuracy of all classifiers except ANN, and the best accuracy with feature selection was achieved by the PS-classifier. For WDBC and WPBC, results show that feature selection improved the accuracy of all three classifiers and the best accuracy with feature selection was achieved by ANN. Specificity and sensitivity also improved after feature selection. Conclusion: The results show that feature selection can improve the accuracy, specificity and sensitivity of classifiers. The results of this study are comparable with other studies on the Wisconsin breast cancer datasets.
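    A compact, generic GA wrapper in the spirit of these two records is sketched below; it scores binary feature masks by the cross-validated accuracy of a k-nearest-neighbour classifier, which is a stand-in for the paper's PS-classifier and ANN setups, and all GA hyperparameters are arbitrary assumptions.

```python
# Compact sketch of a GA wrapper for feature selection: individuals are binary
# masks, fitness is the CV accuracy of a KNN classifier on the selected columns.
# This is a generic illustration, not the paper's GA / PS-classifier pipeline;
# population size, rates and generations are arbitrary.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

def fitness(mask):
    if not mask.any():
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop_size, generations, p_mut = 20, 15, 0.05
population = rng.random((pop_size, n_features)) < 0.5
scores = np.array([fitness(ind) for ind in population])

for _ in range(generations):
    children = []
    for _ in range(pop_size):
        # Tournament selection of two parents.
        a, b = rng.integers(pop_size, size=2), rng.integers(pop_size, size=2)
        p1 = population[a[np.argmax(scores[a])]]
        p2 = population[b[np.argmax(scores[b])]]
        # Uniform crossover followed by bit-flip mutation.
        child = np.where(rng.random(n_features) < 0.5, p1, p2)
        child ^= rng.random(n_features) < p_mut
        children.append(child)
    children = np.array(children)
    child_scores = np.array([fitness(ind) for ind in children])
    # Elitist replacement: keep the best individuals of parents + children.
    merged = np.vstack([population, children])
    merged_scores = np.concatenate([scores, child_scores])
    keep = np.argsort(merged_scores)[::-1][:pop_size]
    population, scores = merged[keep], merged_scores[keep]

best = population[np.argmax(scores)]
print("selected %d features, CV accuracy %.3f" % (best.sum(), scores.max()))
```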

  13. Feature selection based on mutual information and redundancy-synergy coefficient

    Institute of Scientific and Technical Information of China (English)

    杨胜; 顾钧

    2004-01-01

    Mutual information is an important information measure for a feature subset. In this paper, a hashing mechanism is proposed to calculate the mutual information of a feature subset. The redundancy-synergy coefficient, a novel redundancy and synergy measure of features for expressing the class feature, is defined by mutual information. The information maximization rule was applied to derive a heuristic feature subset selection method based on mutual information and the redundancy-synergy coefficient. Our experimental results showed the good performance of the new feature selection method.

  14. Our Selections and Decisions: Inherent Features of the Nervous System?

    Science.gov (United States)

    Rösler, Frank

    The chapter summarizes findings on the neuronal bases of decision-making. Taking the phenomenon of selection, it is explained that systems built only from excitatory and inhibitory neuron populations have the emergent property of selecting between different alternatives. These considerations suggest that there exists a hierarchical architecture with central selection switches. However, in such a system, functions of selection and decision-making are not localized, but rather emerge from an interaction of several participating networks. These are, on the one hand, networks that process specific input and output representations and, on the other hand, networks that regulate the relative activation/inhibition of the specific input and output networks. These ideas are supported by recent empirical evidence. Moreover, other studies show that rather complex psychological variables, like subjective probability estimates, expected gains and losses, prediction errors, etc., do have biological correlates, i.e., they can be localized in time and space as activation states of neural networks and single cells. These findings suggest that selections and decisions are consequences of an architecture which, seen from a biological perspective, is fully deterministic. However, a transposition of such nomothetic functional principles into the idiographic domain, i.e., using them as elements for comprehensive 'mechanistic' explanations of individual decisions, seems not to be possible because of fundamental limitations. Therefore, individual decisions will remain predictable by means of probabilistic models alone.

  15. Enhancing the Performance of LibSVM Classifier by Kernel F-Score Feature Selection

    Science.gov (United States)

    Sarojini, Balakrishnan; Ramaraj, Narayanasamy; Nickolas, Savarimuthu

    Medical data mining is the search for relationships and patterns within medical datasets that could provide useful knowledge for effective clinical decisions. The inclusion of irrelevant, redundant and noisy features in the process model results in poor predictive accuracy. Much research work in data mining has gone into improving the predictive accuracy of classifiers by applying feature selection techniques. Feature selection in medical data mining is valuable because the diagnosis of the disease can then be made in this patient-care activity with a minimum number of significant features. The objective of this work is to show that selecting the more significant features improves the performance of the classifier. We empirically evaluate the classification effectiveness of the LibSVM classifier on the reduced feature subset of a diabetes dataset. The evaluations suggest that the selected feature subset improves the predictive accuracy of the classifier and reduces false negatives and false positives.
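    As a hedged illustration, the sketch below computes the standard F-score for a binary problem and trains an SVM on the top-ranked features; it uses the plain F-score, not the kernel F-score variant proposed in this record, and the subset size is arbitrary.

```python
# Sketch: plain F-score feature ranking for a binary problem, then an SVM on the
# top-scoring features. This is the standard F-score, not the kernel F-score
# variant proposed in the record above; k=10 is an arbitrary subset size.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
pos, neg = X[y == 1], X[y == 0]
mean_all, mean_pos, mean_neg = X.mean(0), pos.mean(0), neg.mean(0)

# F(j) = ((m+ - m)^2 + (m- - m)^2) / (var+ + var-), computed per feature j.
f_score = ((mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2) / \
          (pos.var(0, ddof=1) + neg.var(0, ddof=1))

top = np.argsort(f_score)[::-1][:10]
acc = cross_val_score(SVC(kernel="rbf", gamma="scale"), X[:, top], y, cv=5).mean()
print("top features by F-score:", top)
print("5-fold CV accuracy with 10 features: %.3f" % acc)
```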

  16. Multi-Objective Feature Subset Selection using Non-dominated Sorting Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    A. Khan

    2015-02-01

    Full Text Available This paper presents an evolutionary algorithm based technique to solve the multi-objective feature subset selection problem. The data used for classification contain a large number of features, called attributes. Some of these attributes are not relevant and need to be eliminated. In the classification procedure, each feature has an effect on the accuracy, cost and learning time of the classifier. So, there is a strong requirement to select a subset of the features before building the classifier. The proposed technique treats feature subset selection as a multi-objective optimization problem. This research uses one of the latest multi-objective genetic algorithms (NSGA-II). The fitness value of a particular feature subset is measured by using ID3. The testing accuracy acquired is then assigned as the fitness value. This technique is tested on several datasets taken from the UCI machine learning repository. The experiments demonstrate the feasibility of using NSGA-II for feature subset selection.

  17. Neighbourhood search feature selection method for content-based mammogram retrieval.

    Science.gov (United States)

    Chandy, D Abraham; Christinal, A Hepzibah; Theodore, Alwyn John; Selvan, S Easter

    2017-03-01

    Content-based image retrieval plays an increasing role in the clinical process for supporting diagnosis. This paper proposes a neighbourhood search method to select the near-optimal feature subsets for the retrieval of mammograms from the Mammographic Image Analysis Society (MIAS) database. The features based on grey level cooccurrence matrix, Daubechies-4 wavelet, Gabor, Cohen-Daubechies-Feauveau 9/7 wavelet and Zernike moments are extracted from mammograms available in the MIAS database to form the combined or fused feature set for testing various feature selection methods. The performance of feature selection methods is evaluated using precision, storage requirement and retrieval time measures. Using the proposed method, a significant improvement is achieved in mean precision rate and feature dimension. The results show that the proposed method outperforms the state-of-the-art feature selection methods.

  18. A Feature Selection Method for Large-Scale Network Traffic Classification Based on Spark

    Directory of Open Access Journals (Sweden)

    Yong Wang

    2016-02-01

    Full Text Available Currently, with the rapid increase of data scales in network traffic classification, how to select traffic features efficiently is becoming a big challenge. Although a number of traditional feature selection methods using the Hadoop-MapReduce framework have been proposed, their execution time remained unsatisfactory because of the numerous iterative computations during processing. To address this issue, an efficient feature selection method for network traffic based on a new parallel computing framework called Spark is proposed in this paper. In our approach, the complete feature set is first preprocessed based on the Fisher score, and a sequential forward search strategy is employed for subsets. The optimal feature subset is then selected using the continuous iterations of the Spark computing framework. The implementation demonstrates that, while preserving classification accuracy, our method reduces the time cost of modeling and classification and improves the execution efficiency of feature selection significantly.
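    A single-machine stand-in for the two stages described above (a univariate prefilter followed by a sequential forward search) is sketched below with scikit-learn; the ANOVA F-score is used as a proxy for the Fisher score, and the distributed Spark implementation is not reproduced.

```python
# Single-machine stand-in for the two stages described above: a univariate
# prefilter (ANOVA F-score here as a proxy for the Fisher score) followed by a
# sequential forward search. The distributed Spark implementation is not shown.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, SequentialFeatureSelector, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=60, n_informative=10,
                           random_state=0)

# Stage 1: keep the 20 best features by the univariate score.
pre = SelectKBest(f_classif, k=20).fit(X, y)
X_pre = pre.transform(X)

# Stage 2: sequential forward search for a compact subset of 8 features.
sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=8,
                                direction="forward", cv=5)
sfs.fit(X_pre, y)

kept = pre.get_support(indices=True)[sfs.get_support(indices=True)]
print("final selected feature indices:", kept)
```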

  19. A feature selection method based on multiple kernel learning with expression profiles of different types.

    Science.gov (United States)

    Du, Wei; Cao, Zhongbo; Song, Tianci; Li, Ying; Liang, Yanchun

    2017-01-01

    With the development of high-throughput technology, researchers can acquire a large amount of expression data of different types from several public databases. Because most of these datasets have a small number of samples and hundreds or thousands of features, how to extract informative features from expression data effectively and robustly using feature selection techniques is challenging and crucial. So far, a large number of feature selection approaches have been proposed and applied to analyse expression data of different types. However, most of these methods are limited to measuring performance on a single type of expression data by classification accuracy or error rate. In this article, we propose a hybrid feature selection method based on Multiple Kernel Learning (MKL) and evaluate its performance on expression datasets of different types. First, the relevance between features and classifying samples is measured by using the optimizing function of MKL. In this step, an iterative gradient descent process is used to perform the optimization both on the parameters of the Support Vector Machine (SVM) and on the kernel confidence. Then, a set of relevant features is selected by sorting the optimizing function of each feature. Furthermore, we apply an embedded scheme of forward selection to detect compact feature subsets from the relevant feature set. We not only compare the classification accuracy with other methods, but also compare the stability, similarity and consistency of different algorithms. The proposed method has a satisfactory capability of feature selection for analysing expression datasets of different types using different performance measurements.

  20. Unbiased Feature Selection in Learning Random Forests for High-Dimensional Data

    Directory of Open Access Journals (Sweden)

    Thanh-Tung Nguyen

    2015-01-01

    Full Text Available Random forests (RFs) have been widely used as a powerful classification method. However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting, which gives RFs poor accuracy when working with high-dimensional data. In addition, RFs are biased in the feature selection process, favoring multivalued features. Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features when learning RFs for high-dimensional data. We first remove uninformative features using a p-value assessment, and a subset of unbiased features is then selected based on several statistical measures. This feature subset is partitioned into two subsets, and a feature weighting sampling technique is used to sample features from these two subsets for building trees. This approach generates more accurate trees while reducing the dimensionality and the amount of data needed for learning RFs. An extensive set of experiments has been conducted on 47 high-dimensional real-world datasets, including image datasets. The experimental results show that RFs with the proposed approach outperform existing random forests in both accuracy and the AUC measure.
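    The filtering-plus-weighted-sampling idea can be illustrated with a rough single-file sketch (this is not the authors' xRF implementation): features are filtered by univariate p-values, the survivors are sampled with weights proportional to their scores, and a small bagged forest is grown on the sampled subsets. The dataset, thresholds and forest size are assumptions for illustration only.

```python
# Illustrative sketch: p-value filtering and score-weighted feature sampling for trees.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=200, n_informative=10, random_state=0)

F, pvals = f_classif(X, y)
informative = np.where(pvals < 0.05)[0]          # drop clearly uninformative features
weights = F[informative] / F[informative].sum()  # weight survivors by univariate strength

trees, feats_per_tree = [], 20
for _ in range(25):
    feats = rng.choice(informative, size=min(feats_per_tree, len(informative)),
                       replace=False, p=weights)
    boot = rng.integers(0, len(X), len(X))       # bagging sample
    tree = DecisionTreeClassifier().fit(X[boot][:, feats], y[boot])
    trees.append((tree, feats))

# Majority-vote prediction over the small forest (training accuracy only).
votes = np.mean([t.predict(X[:, f]) for t, f in trees], axis=0)
print("training accuracy: %.3f" % np.mean((votes > 0.5) == y))
```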

  1. Unbiased feature selection in learning random forests for high-dimensional data.

    Science.gov (United States)

    Nguyen, Thanh-Tung; Huang, Joshua Zhexue; Nguyen, Thuy Thi

    2015-01-01

    Random forests (RFs) have been widely used as a powerful classification method. However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting, which gives RFs poor accuracy when working with high-dimensional data. In addition, RFs are biased in the feature selection process, favoring multivalued features. Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features when learning RFs for high-dimensional data. We first remove uninformative features using a p-value assessment, and a subset of unbiased features is then selected based on several statistical measures. This feature subset is partitioned into two subsets, and a feature weighting sampling technique is used to sample features from these two subsets for building trees. This approach generates more accurate trees while reducing the dimensionality and the amount of data needed for learning RFs. An extensive set of experiments has been conducted on 47 high-dimensional real-world datasets, including image datasets. The experimental results show that RFs with the proposed approach outperform existing random forests in both accuracy and the AUC measure.

  2. Comparison of Feature Selection Techniques in Machine Learning for Anatomical Brain MRI in Dementia.

    Science.gov (United States)

    Tohka, Jussi; Moradi, Elaheh; Huttunen, Heikki

    2016-07-01

    We present a comparative split-half resampling analysis of various data driven feature selection and classification methods for the whole brain voxel-based classification analysis of anatomical magnetic resonance images. We compared support vector machines (SVMs), with or without filter based feature selection, several embedded feature selection methods and stability selection. While comparisons of the accuracy of various classification methods have been reported previously, the variability of the out-of-training sample classification accuracy and the set of selected features due to independent training and test sets have not been previously addressed in a brain imaging context. We studied two classification problems: 1) Alzheimer's disease (AD) vs. normal control (NC) and 2) mild cognitive impairment (MCI) vs. NC classification. In AD vs. NC classification, the variability in the test accuracy due to the subject sample did not vary between different methods and exceeded the variability due to different classifiers. In MCI vs. NC classification, particularly with a large training set, embedded feature selection methods outperformed SVM-based ones with the difference in the test accuracy exceeding the test accuracy variability due to the subject sample. The filter and embedded methods produced divergent feature patterns for MCI vs. NC classification that suggests the utility of the embedded feature selection for this problem when linked with the good generalization performance. The stability of the feature sets was strongly correlated with the number of features selected, weakly correlated with the stability of classification accuracy, and uncorrelated with the average classification accuracy.

  3. Feature Selection Based on the SVM Weight Vector for Classification of Dementia.

    Science.gov (United States)

    Bron, Esther E; Smits, Marion; Niessen, Wiro J; Klein, Stefan

    2015-09-01

    Computer-aided diagnosis of dementia using a support vector machine (SVM) can be improved with feature selection. The relevance of individual features can be quantified from the SVM weights as a significance map (p-map). Although these p-maps previously showed clusters of relevant voxels in dementia-related brain regions, they have not yet been used for feature selection. Therefore, we introduce two novel feature selection methods based on p-maps using a direct approach (filter) and an iterative approach (wrapper). To evaluate these p-map feature selection methods, we compared them with methods based on the SVM weight vector directly, t-statistics, and expert knowledge. We used MRI data from the Alzheimer's disease neuroimaging initiative classifying Alzheimer's disease (AD) patients, mild cognitive impairment (MCI) patients who converted to AD (MCIc), MCI patients who did not convert to AD (MCInc), and cognitively normal controls (CN). Features for each voxel were derived from gray matter morphometry. Feature selection based on the SVM weights gave better results than t-statistics and expert knowledge. The p-map methods performed slightly better than those using the weight vector. The wrapper method scored better than the filter method. Recursive feature elimination based on the p-map improved most for AD-CN: the area under the receiver-operating-characteristic curve (AUC) significantly increased from 90.3% without feature selection to 92.0% when selecting 1.5%-3% of the features. This feature selection method also improved the other classifications: AD-MCI 0.1% improvement in AUC (not significant), MCI-CN 0.7%, and MCIc-MCInc 0.1% (not significant). Although the performance improvement due to feature selection was limited, the methods based on the p-map generally had the best performance, and were therefore better in estimating the relevance of individual features.
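    Feature ranking driven by the linear SVM weight vector is readily sketched with scikit-learn's recursive feature elimination; the p-map construction and the ADNI data from the paper are not reproduced, and the synthetic dataset and elimination step size below are illustrative assumptions.

```python
# Minimal sketch of SVM-weight-based feature selection via recursive feature elimination.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=50, n_informative=5, random_state=1)
svm = LinearSVC(C=1.0, max_iter=5000)
rfe = RFE(estimator=svm, n_features_to_select=5, step=0.1)  # drop 10% of features per round
rfe.fit(X, y)
print("selected feature indices:", [i for i, kept in enumerate(rfe.support_) if kept])
```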

  4. An Approach for Optimal Feature Subset Selection using a New Term Weighting Scheme and Mutual Information

    Directory of Open Access Journals (Sweden)

    Shine N Das

    2011-01-01

    Full Text Available With the development of the web, large numbers of documents are available on the Internet and their number grows drastically day by day, so automatic text categorization is becoming increasingly important for dealing with massive data. A major problem in document categorization is the high dimensionality of the feature space. Methods that reduce the feature dimension without degrading recognition performance are known as feature extraction or feature selection. Working with a reduced, relevant feature set can be both more efficient and more effective. The objective of feature selection is to find a subset of features that preserves the characteristics of the full feature set; dependency among features is also important for classification. Over the past years, various metrics have been proposed to measure the dependency among different features. A popular approach is maximal-relevance feature selection: selecting the features with the highest relevance to the target class. The new feature weighting scheme we propose achieves substantial improvements in dimensionality reduction of the feature space, and the experimental results clearly show that this integrated method works considerably better than the others.
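    A minimal sketch of the maximal-relevance idea for text categorization is shown below: each term is scored by its mutual information with the class label and only the top-ranked terms are kept. The toy corpus, the count-based weighting and the cut-off are assumptions for illustration and do not reproduce the weighting scheme proposed in the paper.

```python
# Toy illustration of maximal-relevance term selection for text categorization.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import mutual_info_classif

docs = ["cheap loans apply now", "meeting agenda attached", "win cash prize now",
        "project schedule update", "limited offer cash loans", "weekly status report"]
labels = [1, 0, 1, 0, 1, 0]   # 1 = spam-like, 0 = work-like (made-up labels)

vec = CountVectorizer()
X = vec.fit_transform(docs)                      # sparse term-count matrix
mi = mutual_info_classif(X, labels, discrete_features=True, random_state=0)

top = np.argsort(mi)[::-1][:5]                   # indices of the most relevant terms
print("most relevant terms:", [vec.get_feature_names_out()[i] for i in top])
```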

  5. Feature Subset Selection by Estimation of Distribution Algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Cantu-Paz, E

    2002-01-17

    This paper describes the application of four evolutionary algorithms to the identification of feature subsets for classification problems. Besides a simple GA, the paper considers three estimation of distribution algorithms (EDAs): a compact GA, an extended compact GA, and the Bayesian Optimization Algorithm. The objective is to determine if the EDAs present advantages over the simple GA in terms of accuracy or speed in this problem. The experiments used a Naive Bayes classifier and public-domain and artificial data sets. In contrast with previous studies, we did not find evidence to support or reject the use of EDAs for this problem.
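    A compact wrapper of the kind compared in the paper can be sketched with a plain genetic algorithm whose fitness is the cross-validated accuracy of a Naive Bayes classifier on the encoded feature subset; the EDA variants (compact GA, extended compact GA, BOA) are not reproduced, and the population size, operators and synthetic data are illustrative choices.

```python
# Simple GA wrapper for feature subset selection with a Naive Bayes fitness.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(42)
X, y = make_classification(n_samples=300, n_features=25, n_informative=5, random_state=42)
pop_size, n_gen, dim = 20, 15, X.shape[1]

def fitness(mask):
    return 0.0 if mask.sum() == 0 else cross_val_score(
        GaussianNB(), X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, (pop_size, dim))        # binary chromosomes = feature subsets
for _ in range(n_gen):
    fit = np.array([fitness(ind) for ind in pop])
    # Tournament selection, one-point crossover, bit-flip mutation.
    parents = pop[[max(rng.integers(0, pop_size, 2), key=lambda i: fit[i])
                   for _ in range(pop_size)]]
    cut = rng.integers(1, dim, pop_size // 2)
    children = parents.copy()
    for k, c in enumerate(cut):
        children[2 * k, c:], children[2 * k + 1, c:] = parents[2 * k + 1, c:], parents[2 * k, c:]
    mutate = rng.random(children.shape) < 0.02
    pop = np.where(mutate, 1 - children, children)

fit = np.array([fitness(ind) for ind in pop])
best = pop[fit.argmax()]
print("best subset:", np.where(best == 1)[0], "accuracy: %.3f" % fit.max())
```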

  6. Selecting Testlet Features With Predictive Value for the Testlet Effect

    Directory of Open Access Journals (Sweden)

    Muirne C. S. Paap

    2015-04-01

    Full Text Available High-stakes tests often consist of sets of questions (i.e., items) grouped around a common stimulus. Such groupings of items are often called testlets. A basic assumption of item response theory (IRT), the mathematical model commonly used in the analysis of test data, is that individual items are independent of one another. The potential dependency among items within a testlet is often ignored in practice. In this study, a technique called tree-based regression (TBR) was applied to identify key features of stimuli that could properly predict the dependence structure of testlet data for the Analytical Reasoning section of a high-stakes test. Relevant features identified included Percentage of “If” Clauses, Number of Entities, Theme/Topic, and Predicate Propositional Density; the testlet effect was smallest for stimuli that contained 31% or fewer “if” clauses, contained 9.8% or fewer verbs, and had Media or Animals as the main theme. This study illustrates the merits of TBR in the analysis of test data.

  7. Machine Learning Feature Selection for Tuning Memory Page Swapping

    Science.gov (United States)

    2013-09-01

    erroneous and generally results in useful pages being paged out too early, only to be paged back in shortly thereafter [1]. The first-in/first-out (FIFO) ... the tail of the queue are selected. This algorithm has been shown to have significant shortcomings. When using a FIFO PRA, it is possible to encounter a ... page which was just paged out. FIFO is therefore a sub-optimal page replacement algorithm. Least recently used (LRU) is incredibly simple in concept

  8. Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach

    Directory of Open Access Journals (Sweden)

    Daniel Peralta

    2015-01-01

    Full Text Available Nowadays, many disciplines have to deal with big datasets that additionally involve a high number of features. Feature selection methods aim at eliminating noisy, redundant, or irrelevant features that may deteriorate classification performance. However, traditional methods lack the scalability to cope with datasets of millions of instances and to extract successful results within a limited time. This paper presents a feature selection algorithm based on evolutionary computation that uses the MapReduce paradigm to obtain subsets of features from big datasets. The algorithm decomposes the original dataset into blocks of instances to learn from them in the map phase; the reduce phase then merges the obtained partial results into a final vector of feature weights, which allows a flexible application of the feature selection procedure by using a threshold to determine the selected subset of features. The feature selection method is evaluated using three well-known classifiers (SVM, Logistic Regression, and Naive Bayes) implemented within the Spark framework to address big data problems. In the experiments, datasets of up to 67 million instances and up to 2000 attributes have been managed, showing that this is a suitable framework for performing evolutionary feature selection, improving both classification accuracy and runtime when dealing with big data problems.
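    The map/merge/threshold workflow can be outlined as below. For brevity the per-block evolutionary search is replaced by a univariate F-score, so this is only a structural sketch of the block-wise decomposition under an assumed block count and threshold, not the published method or its Spark/MapReduce implementation.

```python
# Structural sketch of map (per-block weights) -> reduce (merge) -> threshold.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif

X, y = make_classification(n_samples=2000, n_features=100, n_informative=15, random_state=0)

def map_block(X_block, y_block):
    """Map phase stand-in: learn a normalised weight per feature from one block."""
    scores, _ = f_classif(X_block, y_block)
    return scores / scores.max()

n_blocks = 4
idx = np.array_split(np.random.default_rng(0).permutation(len(X)), n_blocks)
partial = [map_block(X[i], y[i]) for i in idx]

# Reduce phase: merge partial results into one weight vector, then threshold it.
weights = np.mean(partial, axis=0)
threshold = 0.5
selected = np.where(weights >= threshold)[0]
print("features kept after thresholding:", selected)
```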

  9. Particle Swarm Optimization Based Feature Enhancement and Feature Selection for Improved Emotion Recognition in Speech and Glottal Signals

    Science.gov (United States)

    Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

    2015-01-01

    In recent years, many research works have been published using speech-related features for speech emotion recognition; however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features, and relative wavelet packet energy and entropy features were extracted from emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization based clustering (PSOC) and wrapper-based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features, respectively. Three different emotional speech databases were utilized to evaluate the proposed method. An extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted, and the results show that the proposed method significantly improves speech emotion recognition performance compared with previous works published in the literature. PMID:25799141

  10. Feature selection for anomaly–based network intrusion detection using cluster validity indices

    CSIR Research Space (South Africa)

    Naidoo, T

    2015-09-01

    Full Text Available A feature selection algorithm that is novel in the context of anomaly–based network intrusion detection is proposed in this paper. The distinguishing factor of the proposed feature selection algorithm is its complete lack of dependency on labelled...

  11. Feature Selection in Classification of Eye Movements Using Electrooculography for Activity Recognition

    Directory of Open Access Journals (Sweden)

    S. Mala

    2014-01-01

    Full Text Available Activity recognition is needed in many applications, for example, surveillance systems, patient monitoring, and human-computer interfaces. Feature selection plays an important role in activity recognition, data mining, and machine learning. For selecting a subset of features, Differential Evolution (DE), an efficient evolutionary optimizer, is used to find informative features from eye movements recorded using electrooculography (EOG). Many researchers use EOG signals in human-computer interaction with various computational intelligence methods to analyze eye movements. The proposed system involves analysis of EOG signals using clearness-based features, minimum redundancy maximum relevance features, and Differential Evolution based features. This work concentrates on the DE-based feature selection algorithm in order to improve classification for faultless activity recognition.

  12. Effect of feature-selective attention on neuronal responses in macaque area MT.

    Science.gov (United States)

    Chen, X; Hoffmann, K-P; Albright, T D; Thiele, A

    2012-03-01

    Attention influences visual processing in striate and extrastriate cortex, which has been extensively studied for spatial-, object-, and feature-based attention. Most studies exploring neural signatures of feature-based attention have trained animals to attend to an object identified by a certain feature and ignore objects/displays identified by a different feature. Little is known about the effects of feature-selective attention, where subjects attend to one stimulus feature domain (e.g., color) of an object while features from different domains (e.g., direction of motion) of the same object are ignored. To study this type of feature-selective attention in area MT in the middle temporal sulcus, we trained macaque monkeys to either attend to and report the direction of motion of a moving sine wave grating (a feature for which MT neurons display strong selectivity) or attend to and report its color (a feature for which MT neurons have very limited selectivity). We hypothesized that neurons would upregulate their firing rate during attend-direction conditions compared with attend-color conditions. We found that feature-selective attention significantly affected 22% of MT neurons. Contrary to our hypothesis, these neurons did not necessarily increase firing rate when animals attended to direction of motion but fell into one of two classes. In one class, attention to color increased the gain of stimulus-induced responses compared with attend-direction conditions. The other class displayed the opposite effects. Feature-selective activity modulations occurred earlier in neurons modulated by attention to color compared with neurons modulated by attention to motion direction. Thus feature-selective attention influences neuronal processing in macaque area MT but often exhibited a mismatch between the preferred stimulus dimension (direction of motion) and the preferred attention dimension (attention to color).

  13. A robust and accurate method for feature selection and prioritization from multi-class OMICs data.

    Directory of Open Access Journals (Sweden)

    Vittorio Fortino

    Full Text Available Selecting relevant features is a common task in most OMICs data analysis, where the aim is to identify a small set of key features to be used as biomarkers. To this end, two alternative but equally valid methods are mainly available, namely the univariate (filter) or the multivariate (wrapper) approach. The stability of the selected lists of features is an often neglected but very important requirement. If the same features are selected in multiple independent iterations, they more likely are reliable biomarkers. In this study, we developed and evaluated the performance of a novel method for feature selection and prioritization, aiming at generating robust and stable sets of features with high predictive power. The proposed method uses fuzzy logic for a first unbiased feature selection and a Random Forest built from conditional inference trees to prioritize the candidate discriminant features. Analyzing several multi-class gene expression microarray data sets, we demonstrate that our technique provides equal or better classification performance and greater stability as compared to other Random Forest-based feature selection methods.

  14. An Efficient Cost-Sensitive Feature Selection Using Chaos Genetic Algorithm for Class Imbalance Problem

    Directory of Open Access Journals (Sweden)

    Jing Bian

    2016-01-01

    Full Text Available In the era of big data, feature selection is an essential process in machine learning. Although the class imbalance problem has recently attracted a great deal of attention, little effort has been undertaken to develop feature selection techniques. In addition, most applications involving feature selection focus on classification accuracy but not cost, although costs are important. To cope with imbalance problems, we developed a cost-sensitive feature selection algorithm that adds the cost-based evaluation function of a filter feature selection using a chaos genetic algorithm, referred to as CSFSG. The evaluation function considers both feature-acquiring costs (test costs) and misclassification costs in the field of network security, thereby weakening the influence of many instances from the majority of classes in large-scale datasets. The CSFSG algorithm reduces the total cost of feature selection and trades off both factors. The behavior of the CSFSG algorithm is tested on a large-scale dataset of network security, using two kinds of classifiers: C4.5 and k-nearest neighbor (KNN). The results of the experimental research show that the approach is efficient and able to effectively improve classification accuracy and to decrease classification time. In addition, the results of our method are more promising than the results of other cost-sensitive feature selection algorithms.

  15. Entropy-Based and Weighted Selective SIFT Clustering as an Energy Aware Framework for Supervised Visual Recognition of Man-Made Structures

    Directory of Open Access Journals (Sweden)

    Ayman El Mobacher

    2013-01-01

    Full Text Available Using local invariant features has been proven by published literature to be powerful for image processing and pattern recognition tasks. However, in energy aware environments, these invariant features would not scale easily because of their computational requirements. Motivated to find an efficient building recognition algorithm based on scale invariant feature transform (SIFT) keypoints, we present in this paper uSee, a supervised learning framework which exploits the symmetrical and repetitive structural patterns in buildings to identify subsets of relevant clusters formed by these keypoints. Once an image is captured by a smart phone, uSee preprocesses it using variations in gradient angle- and entropy-based measures before extracting the building signature and comparing its representative SIFT keypoints against a repository of building images. Experimental results on 2 different databases confirm the effectiveness of uSee in delivering, at a greatly reduced computational cost, the high matching scores for building recognition that local descriptors can achieve. With only 14.3% of image SIFT keypoints, uSee exceeded prior literature results by achieving an accuracy of 99.1% on the Zurich Building Database with no manual rotation; thus saving significantly on the computational requirements of the task at hand.

  16. COMPUTATIONALLY INEXPENSIVE SEQUENTIAL FORWARD FLOATING SELECTION FOR ACQUIRING SIGNIFICANT FEATURES FOR AUTHORSHIP INVARIANCENESS IN WRITER IDENTIFICATION

    OpenAIRE

    Satrya Fajri Pratama; Azah Kamilah Muda; Yun-Huoy Choo; and Noor Azilah Muda

    2011-01-01

    Handwriting is individualistic. The uniqueness of shape and style of handwriting can be used to identify the significant features in authenticating the author of writing. Acquiring these significant features leads to an important research in Writer Identification domain where to find the unique features of individual which also known as Individuality of Handwriting. This paper proposes an improved Sequential Forward Floating Selection method besides the exploration of significant features for...

  17. New evolutions in TRNSYS : a selection of version 16 features

    Energy Technology Data Exchange (ETDEWEB)

    Bradley, D. [Thermal Energy System Specialists, Madison, WI (United States); Kummert, M. [Wisconsin Univ., Madison, WI (United States). Solar Energy Laboratory

    2005-07-01

    TRNSYS is a transient energy simulation package that has undergone continuous improvement since its development in 1975. TRNSYS was initially developed for the simulation of solar thermal processes, but has since expanded into a total energy modeling package. It models each component of an energy system as an individual black box component. Simulating a system involves connecting the inputs and outputs of the components to one another. If certain models are missing, they are quickly developed and added to the package by the international group of developers and users, which includes the Solar Energy Laboratory at the University of Wisconsin in Madison, United States; the Centre Scientifique et Technique du Batiment in Nice, France; and Transsolar Energietechnik GmbH in Stuttgart, Germany. This paper presented some of the issues faced by the users in updating the TRNSYS simulation tool to meet the challenges posed by new technologies and to make use of better algorithms and updated computing resources. In particular, it focused on adding new component models to the program and on increasing the ease of use of the program and continuing the trend to move TRNSYS from an academic research tool to a manageable commercial tool. The subset of the features that were added to the sixteenth version of the simulation package in November 2004 were presented. These include modeling the energy transfer between a conditioned building and the surrounding ground, implementing ASHRAE's effective Heat Flow method into TRNSYS, and implementing combined thermal/air flow simulations using a software link between TRNSYS and COMIS or CONTAM for the air flow simulation. A brief description of the hydrogen system components which model hydrogen power systems was also included along with graphical interface enhancements, and a description of simulation engine modifications such as starting time and the drop-in dynamic link libraries (DLL). 12 refs., 5 figs.

  18. Evaluation of Meta-Heuristic Algorithms for Stable Feature Selection

    Directory of Open Access Journals (Sweden)

    Maysam Toghraee

    2016-07-01

    Full Text Available Nowadays, with the development of science, technology and technological tools, the ability to review and store important data has become available, and knowledge is needed to search this data and reach useful results. Data mining is the automatic search of large data sources to find patterns and dependencies that are not revealed by simple statistical analysis. The scope is to study the predictive role and usage domain of data mining in medical science and to suggest a framework for creating, assessing and exploiting data mining patterns in this field. Since previous research has found that existing assessment methods cannot be used to specify data discrepancies, our suggestion is a new approach for assessing data similarities in order to find the relations between variation in the data and stability in selection. We therefore chose meta-heuristic methods so as to be able to choose the best and most stable algorithms among a set of algorithms.

  19. Feature Selection for Bayesian Evaluation of Trauma Death Risk

    CERN Document Server

    Jakaite, L

    2008-01-01

    In the last year, more than 70,000 people have been brought to UK hospitals with serious injuries. Each time, a clinician has to urgently take a patient through a screening procedure to make a reliable decision on the trauma treatment. Typically, such a procedure comprises around 20 tests; however, the condition of a trauma patient remains very difficult to test properly. What happens if these tests are ambiguously interpreted and the information about the severity of the injury is misleading? A mistake in a decision can be fatal: using a mild treatment can put a patient at risk of dying from posttraumatic shock, while overtreatment can also cause death. How can we reduce the risk of death caused by unreliable decisions? It has been shown that probabilistic reasoning, based on the Bayesian methodology of averaging over decision models, allows clinicians to evaluate the uncertainty in decision making. Based on this methodology, in this paper we aim at selecting the most important screeni...

  20. Adaptive feature selection using v-shaped binary particle swarm optimization

    Science.gov (United States)

    Dong, Hongbin; Zhou, Xiurong

    2017-01-01

    Feature selection is an important preprocessing method in machine learning and data mining. This process can be used not only to reduce the amount of data to be analyzed but also to build models with stronger interpretability based on fewer features. Traditional feature selection methods evaluate the dependency and redundancy of features separately, which leads to a lack of measurement of their combined effect. Moreover, a greedy search considers only the optimization of the current round and thus cannot be a global search. To evaluate the combined effect of different subsets in the entire feature space, an adaptive feature selection method based on V-shaped binary particle swarm optimization is proposed. In this method, the fitness function is constructed using the correlation information entropy. Feature subsets are regarded as individuals in a population, and the feature space is searched using V-shaped binary particle swarm optimization. The above procedure overcomes the hard constraint on the number of features, enables the combined evaluation of each subset as a whole, and improves the search ability of conventional binary particle swarm optimization. The proposed algorithm is an adaptive method with respect to the number of feature subsets. The experimental results show the advantages of optimizing the feature subsets using the V-shaped transfer function and confirm the effectiveness and efficiency of the feature subsets obtained under different classifiers. PMID:28358850
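    A minimal sketch of binary particle swarm optimization with a V-shaped transfer function is given below: the probability of flipping a bit grows with the magnitude of the velocity, here via |tanh(v)|. The swarm size, coefficients, KNN fitness and synthetic data are assumptions for illustration and do not use the correlation-information-entropy fitness from the paper.

```python
# Binary PSO feature selection with a V-shaped transfer function (|tanh(v)|).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=30, n_informative=6, random_state=0)
n_particles, n_iter, dim = 15, 20, X.shape[1]

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

pos = rng.integers(0, 2, (n_particles, dim))          # binary positions = feature subsets
vel = rng.normal(0, 1, (n_particles, dim))
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random((n_particles, dim)), rng.random((n_particles, dim))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    flip = rng.random((n_particles, dim)) < np.abs(np.tanh(vel))  # V-shaped transfer
    pos = np.where(flip, 1 - pos, pos)
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()].copy()

print("best subset size:", gbest.sum(), "cv accuracy: %.3f" % pbest_fit.max())
```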

  1. Feature-Selective Attention Adaptively Shifts Noise Correlations in Primary Auditory Cortex.

    Science.gov (United States)

    Downer, Joshua D; Rapone, Brittany; Verhein, Jessica; O'Connor, Kevin N; Sutter, Mitchell L

    2017-05-24

    Sensory environments often contain an overwhelming amount of information, with both relevant and irrelevant information competing for neural resources. Feature attention mediates this competition by selecting the sensory features needed to form a coherent percept. How attention affects the activity of populations of neurons to support this process is poorly understood because population coding is typically studied through simulations in which one sensory feature is encoded without competition. Therefore, to study the effects of feature attention on population-based neural coding, investigations must be extended to include stimuli with both relevant and irrelevant features. We measured noise correlations (rnoise) within small neural populations in primary auditory cortex while rhesus macaques performed a novel feature-selective attention task. We found that the effect of feature-selective attention on rnoise depended not only on the population tuning to the attended feature, but also on the tuning to the distractor feature. To attempt to explain how these observed effects might support enhanced perceptual performance, we propose an extension of a simple and influential model in which shifts in rnoise can simultaneously enhance the representation of the attended feature while suppressing the distractor. These findings present a novel mechanism by which attention modulates neural populations to support sensory processing in cluttered environments.SIGNIFICANCE STATEMENT Although feature-selective attention constitutes one of the building blocks of listening in natural environments, its neural bases remain obscure. To address this, we developed a novel auditory feature-selective attention task and measured noise correlations (rnoise) in rhesus macaque A1 during task performance. Unlike previous studies showing that the effect of attention on rnoise depends on population tuning to the attended feature, we show that the effect of attention depends on the tuning to the

  2. Object learning improves feature extraction but does not improve feature selection.

    Directory of Open Access Journals (Sweden)

    Linus Holm

    Full Text Available A single glance at your crowded desk is enough to locate your favorite cup. But finding an unfamiliar object requires more effort. This superiority in recognition performance for learned objects has at least two possible sources. For familiar objects observers might: 1) select more informative image locations upon which to fixate their eyes, or 2) extract more information from a given eye fixation. To test these possibilities, we had observers localize fragmented objects embedded in dense displays of random contour fragments. Eight participants searched for objects in 600 images while their eye movements were recorded in three daily sessions. Performance improved as subjects trained with the objects: the number of fixations required to find an object decreased by 64% across the 3 sessions. An ideal observer model that included measures of fragment confusability was used to calculate the information available from a single fixation. Comparing human performance to the model suggested that across sessions information extraction at each eye fixation increased markedly, by an amount roughly equal to the extra information that would be extracted following a 100% increase in functional field of view. Selection of fixation locations, on the other hand, did not improve with practice.

  3. Analysis of Different Feature Selection Criteria Based on a Covariance Convergence Perspective for a SLAM Algorithm

    Directory of Open Access Journals (Sweden)

    Fernando A. Auat Cheein

    2010-12-01

    Full Text Available This paper introduces several non-arbitrary feature selection techniques for a Simultaneous Localization and Mapping (SLAM) algorithm. The feature selection criteria are based on the determination of the most significant features from a SLAM convergence perspective. The SLAM algorithm implemented in this work is a sequential EKF (Extended Kalman Filter) SLAM. The feature selection criteria are applied in the correction stage of the SLAM algorithm, restricting the correction to the most significant features; this restriction also reduces the processing time of the SLAM. Several experiments with a mobile robot are shown in this work. The experiments concern map reconstruction and a comparison of the performance of the different proposed techniques. The experiments were carried out in an outdoor environment composed of trees, although the results shown herein are not restricted to a particular type of features.

  4. Analysis of different feature selection criteria based on a covariance convergence perspective for a SLAM algorithm.

    Science.gov (United States)

    Auat Cheein, Fernando A; Carelli, Ricardo

    2011-01-01

    This paper introduces several non-arbitrary feature selection techniques for a Simultaneous Localization and Mapping (SLAM) algorithm. The feature selection criteria are based on the determination of the most significant features from a SLAM convergence perspective. The SLAM algorithm implemented in this work is a sequential EKF (Extended Kalman Filter) SLAM. The feature selection criteria are applied in the correction stage of the SLAM algorithm, restricting the correction to the most significant features; this restriction also reduces the processing time of the SLAM. Several experiments with a mobile robot are shown in this work. The experiments concern map reconstruction and a comparison of the performance of the different proposed techniques. The experiments were carried out in an outdoor environment composed of trees, although the results shown herein are not restricted to a particular type of features.

  5. Robust Feature Selection from Microarray Data Based on Cooperative Game Theory and Qualitative Mutual Information.

    Science.gov (United States)

    Mortazavi, Atiyeh; Moattar, Mohammad Hossein

    2016-01-01

    High dimensionality of microarray data sets may lead to low efficiency and overfitting. In this paper, a multiphase cooperative game theoretic feature selection approach is proposed for microarray data classification. In the first phase, due to high dimension of microarray data sets, the features are reduced using one of the two filter-based feature selection methods, namely, mutual information and Fisher ratio. In the second phase, Shapley index is used to evaluate the power of each feature. The main innovation of the proposed approach is to employ Qualitative Mutual Information (QMI) for this purpose. The idea of Qualitative Mutual Information causes the selected features to have more stability and this stability helps to deal with the problem of data imbalance and scarcity. In the third phase, a forward selection scheme is applied which uses a scoring function to weight each feature. The performance of the proposed method is compared with other popular feature selection algorithms such as Fisher ratio, minimum redundancy maximum relevance, and previous works on cooperative game based feature selection. The average classification accuracy on eleven microarray data sets shows that the proposed method improves both average accuracy and average stability compared to other approaches.

  6. Robust Feature Selection from Microarray Data Based on Cooperative Game Theory and Qualitative Mutual Information

    Directory of Open Access Journals (Sweden)

    Atiyeh Mortazavi

    2016-01-01

    Full Text Available High dimensionality of microarray data sets may lead to low efficiency and overfitting. In this paper, a multiphase cooperative game theoretic feature selection approach is proposed for microarray data classification. In the first phase, due to high dimension of microarray data sets, the features are reduced using one of the two filter-based feature selection methods, namely, mutual information and Fisher ratio. In the second phase, Shapley index is used to evaluate the power of each feature. The main innovation of the proposed approach is to employ Qualitative Mutual Information (QMI) for this purpose. The idea of Qualitative Mutual Information causes the selected features to have more stability and this stability helps to deal with the problem of data imbalance and scarcity. In the third phase, a forward selection scheme is applied which uses a scoring function to weight each feature. The performance of the proposed method is compared with other popular feature selection algorithms such as Fisher ratio, minimum redundancy maximum relevance, and previous works on cooperative game based feature selection. The average classification accuracy on eleven microarray data sets shows that the proposed method improves both average accuracy and average stability compared to other approaches.

  7. Efficient feature selection using a hybrid algorithm for the task of epileptic seizure detection

    Science.gov (United States)

    Lai, Kee Huong; Zainuddin, Zarita; Ong, Pauline

    2014-07-01

    Feature selection is a very important aspect in the field of machine learning. It entails the search of an optimal subset from a very large data set with high dimensional feature space. Apart from eliminating redundant features and reducing computational cost, a good selection of feature also leads to higher prediction and classification accuracy. In this paper, an efficient feature selection technique is introduced in the task of epileptic seizure detection. The raw data are electroencephalography (EEG) signals. Using discrete wavelet transform, the biomedical signals were decomposed into several sets of wavelet coefficients. To reduce the dimension of these wavelet coefficients, a feature selection method that combines the strength of both filter and wrapper methods is proposed. Principal component analysis (PCA) is used as part of the filter method. As for wrapper method, the evolutionary harmony search (HS) algorithm is employed. This metaheuristic method aims at finding the best discriminating set of features from the original data. The obtained features were then used as input for an automated classifier, namely wavelet neural networks (WNNs). The WNNs model was trained to perform a binary classification task, that is, to determine whether a given EEG signal was normal or epileptic. For comparison purposes, different sets of features were also used as input. Simulation results showed that the WNNs that used the features chosen by the hybrid algorithm achieved the highest overall classification accuracy.

  8. Stable feature selection for clinical prediction: exploiting ICD tree structure using Tree-Lasso.

    Science.gov (United States)

    Kamkar, Iman; Gupta, Sunil Kumar; Phung, Dinh; Venkatesh, Svetha

    2015-02-01

    Modern healthcare is being reshaped by growing Electronic Medical Records (EMR). Recently, these records have been shown to be of great value for building clinical prediction models. In EMR data, patients' diseases and hospital interventions are captured through a set of diagnosis and procedure codes. These codes are usually represented in a tree form (e.g. the ICD-10 tree) and the codes within a tree branch may be highly correlated. These codes can be used as features to build a prediction model, and an appropriate feature selection can inform a clinician about important risk factors for a disease. Traditional feature selection methods (e.g. Information Gain, T-test, etc.) consider each variable independently and usually end up with a long feature list. Recently, Lasso and related l1-penalty based feature selection methods have become popular due to their joint feature selection property. However, Lasso is known to randomly select only one of several correlated features, which hinders clinicians from arriving at a stable feature set, something crucial for the clinical decision making process. In this paper, we solve this problem by using the recently proposed Tree-Lasso model. Since the stability behavior of Tree-Lasso is not well understood, we study its stability behavior and compare it with other feature selection methods. Using a synthetic and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso based feature selection is significantly more stable than Lasso and comparable to other methods, e.g. Information Gain, ReliefF and T-test. We further show that, using different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than other methods. Our result has implications in identifying stable risk factors for many healthcare problems and therefore can
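    The stability concern discussed above can be illustrated by running an L1-penalised model on bootstrap resamples and recording how often each feature is selected; Tree-Lasso itself is not available in scikit-learn, so plain L1-penalised logistic regression is used here, and the dataset, penalty strength and resample count are illustrative assumptions.

```python
# Selection-stability sketch: count how often each feature survives L1 selection
# across bootstrap resamples of the data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=60, n_informative=8, random_state=0)

n_boot = 30
counts = np.zeros(X.shape[1])
for _ in range(n_boot):
    idx = rng.integers(0, len(X), len(X))               # bootstrap resample
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    clf.fit(X[idx], y[idx])
    counts += (np.abs(clf.coef_[0]) > 1e-8)             # feature selected in this resample?

stability = counts / n_boot                             # selection frequency per feature
print("features selected in over 80 percent of resamples:", np.where(stability > 0.8)[0])
```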

  9. Deep Feature Selection: Theory and Application to Identify Enhancers and Promoters.

    Science.gov (United States)

    Li, Yifeng; Chen, Chih-Yu; Wasserman, Wyeth W

    2016-05-01

    Sparse linear models approximate target variable(s) by a sparse linear combination of input variables. Since they are simple, fast, and able to select features, they are widely used in classification and regression. Essentially they are shallow feed-forward neural networks that have three limitations: (1) incompatibility to model nonlinearity of features, (2) inability to learn high-level features, and (3) unnatural extensions to select features in a multiclass case. Deep neural networks are models structured by multiple hidden layers with nonlinear activation functions. Compared with linear models, they have two distinctive strengths: the capability to (1) model complex systems with nonlinear structures and (2) learn high-level representation of features. Deep learning has been applied in many large and complex systems where deep models significantly outperform shallow ones. However, feature selection at the input level, which is very helpful to understand the nature of a complex system, is still not well studied. In genome research, the cis-regulatory elements in noncoding DNA sequences play a key role in the expression of genes. Since the activity of regulatory elements involves highly interactive factors, a deep tool is strongly needed to discover informative features. In order to address the above limitations of shallow and deep models for selecting features of a complex system, we propose a deep feature selection (DFS) model that (1) takes advantages of deep structures to model nonlinearity and (2) conveniently selects a subset of features right at the input level for multiclass data. Simulation experiments convince us that this model is able to correctly identify both linear and nonlinear features. We applied this model to the identification of active enhancers and promoters by integrating multiple sources of genomic information. Results show that our model outperforms elastic net in terms of size of discriminative feature subset and classification accuracy.

  10. Discussion on the Key Points of Teaching Supervision Construction Based on the Features of Independent Colleges

    Institute of Scientific and Technical Information of China (English)

    杨华萍

    2014-01-01

    The teaching quality of independent colleges cannot be improved in isolation from the role of teaching supervision, as independent colleges, unlike common public colleges, possess many distinctive features and characteristics in education and teaching. This paper focuses on the characteristics and current situation of teaching supervision construction in independent colleges, analyzes the key and difficult points supervisors encounter in their work, and proposes the author's views on the construction of the supervision system in independent colleges.

  11. A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data

    KAUST Repository

    Abusamra, Heba

    2013-05-01

    Microarray technology has enriched the study of gene expression in such a way that scientists are now able to measure the expression levels of thousands of genes in a single experiment. Microarray gene expression data have gained great importance in recent years due to their role in disease diagnosis and prognosis, which helps to choose the appropriate treatment plan for patients. Although this technology has ushered in a new era of molecular classification, interpreting gene expression data remains a difficult problem and an active research area due to its native “high dimensional, low sample size” nature. Such problems pose great challenges to existing classification methods. Thus, effective feature selection techniques are often needed in this case to help correctly classify different tumor types and consequently lead to a better understanding of genetic signatures as well as improved treatment strategies. This thesis presents a comparative study of state-of-the-art feature selection methods, classification methods, and combinations of them, based on gene expression data. We compared the efficiency of three classification methods, namely support vector machines, k-nearest neighbor and random forest, and eight feature selection methods, including: information gain, twoing rule, sum minority, max minority, gini index, sum of variances, t-statistics, and one-dimensional support vector machine. Five-fold cross validation was used to evaluate the classification performance. Two publicly available gene expression data sets of glioma were used for this study. Different experiments have been applied to compare the performance of the classification methods with and without performing feature selection. Results revealed the important role of feature selection in classifying gene expression data: by performing feature selection, the classification accuracy can be significantly boosted using a small number of genes. The relationship of features selected in

  12. Regression-Based Feature Selection on Large Scale Human Activity Recognition

    Directory of Open Access Journals (Sweden)

    Hussein Mazaar

    2016-02-01

    Full Text Available In this paper, we present an approach for regression-based feature selection in human activity recognition. Because of the high-dimensional features in human activity recognition, the model may overfit and fail to learn its parameters well; moreover, many features are redundant or irrelevant. The goal is to select important discriminating features to recognize human activities in videos. The R-squared regression criterion can identify the best features based on the ability of a feature to explain the variations in the target class. The features are significantly reduced, by nearly 99.33%, resulting in better classification accuracy. A Support Vector Machine with a linear kernel is used to classify the activities. The experiments are tested on the UCF50 dataset. The results show that the proposed model significantly outperforms state-of-the-art methods.
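    An R-squared style univariate criterion can be sketched by regressing the class label on each feature and ranking features by explained variance, as below; the synthetic features stand in for the video descriptors and UCF50 data used in the paper, and the subset size is an assumption.

```python
# Univariate R-squared scoring of features, followed by a linear SVM on the top subset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=400, n_features=200, n_informative=10, random_state=0)

# R-squared of a one-feature regression of the label on each feature.
r2 = np.array([LinearRegression().fit(X[:, [j]], y).score(X[:, [j]], y)
               for j in range(X.shape[1])])

keep = np.argsort(r2)[::-1][:10]          # keep the handful of best-explaining features
acc = cross_val_score(LinearSVC(max_iter=5000), X[:, keep], y, cv=5).mean()
print("kept %d of %d features, cv accuracy: %.3f" % (len(keep), X.shape[1], acc))
```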

  13. Emotional textile image classification based on cross-domain convolutional sparse autoencoders with feature selection

    Science.gov (United States)

    Li, Zuhe; Fan, Yangyu; Liu, Weihua; Yu, Zeqi; Wang, Fengqin

    2017-01-01

    We aim to apply sparse autoencoder-based unsupervised feature learning to emotional semantic analysis for textile images. To tackle the problem of limited training data, we present a cross-domain feature learning scheme for emotional textile image classification using convolutional autoencoders. We further propose a correlation-analysis-based feature selection method for the weights learned by sparse autoencoders to reduce the number of features extracted from large size images. First, we randomly collect image patches on an unlabeled image dataset in the source domain and learn local features with a sparse autoencoder. We then conduct feature selection according to the correlation between different weight vectors corresponding to the autoencoder's hidden units. We finally adopt a convolutional neural network including a pooling layer to obtain global feature activations of textile images in the target domain and send these global feature vectors into logistic regression models for emotional image classification. The cross-domain unsupervised feature learning method achieves 65% to 78% average accuracy in the cross-validation experiments corresponding to eight emotional categories and performs better than conventional methods. Feature selection can reduce the computational cost of global feature extraction by about 50% while improving classification performance.

  14. Eigenvalue-weighting and feature selection for computer-aided polyp detection in CT colonography

    Science.gov (United States)

    Zhu, Hongbin; Wang, Su; Fan, Yi; Lu, Hongbing; Liang, Zhengrong

    2010-03-01

    With the development of computer-aided polyp detection towards virtual colonoscopy screening, the trade-off between detection sensitivity and specificity has gained increasing attention. An optimum detection, with least number of false positives and highest true positive rate, is desirable and involves interdisciplinary knowledge, such as feature extraction, feature selection as well as machine learning. Toward that goal, various geometrical and textural features, associated with each suspicious polyp candidate, have been individually extracted and stacked together as a feature vector. However, directly inputting these high-dimensional feature vectors into a learning machine, e.g., neural network, for polyp detection may introduce redundant information due to feature correlation and induce the curse of dimensionality. In this paper, we explored an indispensable building block of computer-aided polyp detection, i.e., principal component analysis (PCA)-weighted feature selection for neural network classifier of true and false positives. The major concepts proposed in this paper include (1) the use of PCA to reduce the feature correlation, (2) the scheme of adaptively weighting each principal component (PC) by the associated eigenvalue, and (3) the selection of feature combinations via the genetic algorithm. As such, the eigenvalue is also taken as part of the characterizing feature, and the necessary number of features can be exposed to mitigate the curse of dimensionality. Learned and tested by radial basis neural network, the proposed computer-aided polyp detection has achieved 95% sensitivity at a cost of average 2.99 false positives per polyp.
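    The eigenvalue-weighting step can be sketched as follows: project the features onto principal components and scale each component by its explained-variance ratio before classification. The genetic search over component combinations and the radial basis network from the paper are not reproduced; a standard MLP classifier and synthetic data are used as stand-ins.

```python
# PCA projection with eigenvalue-based weighting of the principal components.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=50, n_informative=8, random_state=0)

pca = PCA(n_components=10).fit(X)
Z = pca.transform(X) * pca.explained_variance_ratio_   # weight each PC by its eigenvalue share
clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
print("cv accuracy on weighted PCs: %.3f" % cross_val_score(clf, Z, y, cv=5).mean())
```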

  15. Artificial immune system based on adaptive clonal selection for feature selection and parameters optimisation of support vector machines

    Science.gov (United States)

    Sadat Hashemipour, Maryam; Soleimani, Seyed Ali

    2016-01-01

    Artificial immune system (AIS) algorithm based on the clonal selection method can be defined as a soft computing method, inspired by the theoretical immune system, for solving science and engineering problems. Support vector machine (SVM) is a popular pattern classification method with many diverse applications. Kernel parameter setting in the SVM training procedure, along with feature selection, significantly impacts the classification accuracy. In this study, an AIS based on Adaptive Clonal Selection (AISACS) algorithm has been used to optimise the SVM parameters and feature subset selection without degrading the SVM classification accuracy. Several public datasets from the University of California Irvine machine learning (UCI) repository are employed to calculate the classification accuracy rate in order to evaluate the AISACS approach, which was then compared with the grid search algorithm and a Genetic Algorithm (GA) approach. The experimental results show that the feature reduction rate and running time of the AISACS approach are better than those of the GA approach.

  16. Selection of individual features of a speech signal using genetic algorithms

    Directory of Open Access Journals (Sweden)

    Kamil Kamiński

    2016-03-01

    Full Text Available The paper presents an automatic speaker recognition system, implemented in the Matlab environment, and demonstrates how to achieve and optimize various elements of the system. The main emphasis was put on the selection of features of a speech signal using a genetic algorithm that takes into account the synergy of features. The results of optimizing selected elements of the classifier are also shown, including the number of Gaussian distributions used to model each of the voices. In addition, a universal voice model has been used for creating the voice models. Keywords: biometrics, automatic speaker recognition, genetic algorithms, feature selection

  17. On Training Targets for Supervised Speech Separation

    OpenAIRE

    Wang, Yuxuan; Narayanan, Arun; Wang, DeLiang

    2014-01-01

    Formulation of speech separation as a supervised learning problem has shown considerable promise. In its simplest form, a supervised learning algorithm, typically a deep neural network, is trained to learn a mapping from noisy features to a time-frequency representation of the target of interest. Traditionally, the ideal binary mask (IBM) is used as the target because of its simplicity and large speech intelligibility gains. The supervised learning framework, however, is not restricted to the...

  18. AlPOs Synthetic Factor Analysis Based on Maximum Weight and Minimum Redundancy Feature Selection

    Directory of Open Access Journals (Sweden)

    Yinghua Lv

    2013-11-01

    Full Text Available The relationship between synthetic factors and the resulting structures is critical for rational synthesis of zeolites and related microporous materials. In this paper, we develop a new feature selection method for synthetic factor analysis of (6,12)-ring-containing microporous aluminophosphates (AlPOs). The proposed method is based on a maximum weight and minimum redundancy criterion. With the proposed method, we can select the feature subset in which the features are most relevant to the synthetic structure while the redundancy among these selected features is minimal. Based on the database of AlPO synthesis, we use (6,12)-ring-containing AlPOs as the target class and incorporate 21 synthetic factors including gel composition, solvent and organic template to predict the formation of (6,12)-ring-containing microporous aluminophosphates (AlPOs). From these 21 features, 12 selected features are deemed as the optimized features to distinguish (6,12)-ring-containing AlPOs from other AlPOs without such rings. The prediction model achieves a classification accuracy rate of 91.12% using the optimal feature subset. Comprehensive experiments demonstrate the effectiveness of the proposed algorithm, and deep analysis is given for the synthetic factors selected by the proposed method.
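    A greedy sketch of a maximum-weight / minimum-redundancy criterion is given below: at each step the feature with the highest class relevance minus its average correlation with the already-selected features is added. The mutual-information weight, the correlation-based redundancy term and the synthetic data are assumptions; they approximate the criterion's spirit rather than the exact formulation in the paper.

```python
# Greedy maximum-weight / minimum-redundancy feature selection sketch.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=300, n_features=21, n_informative=6, random_state=0)
relevance = mutual_info_classif(X, y, random_state=0)        # "weight" term per feature
corr = np.abs(np.corrcoef(X, rowvar=False))                  # pairwise redundancy term

selected, remaining = [], list(range(X.shape[1]))
for _ in range(12):                                          # target subset size (assumed)
    def score(f):
        red = np.mean([corr[f, s] for s in selected]) if selected else 0.0
        return relevance[f] - red                            # weight minus average redundancy
    best = max(remaining, key=score)
    selected.append(best)
    remaining.remove(best)

print("selected feature indices:", selected)
```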

  19. Using genetic algorithms to select and create features for pattern classification. Technical report

    Energy Technology Data Exchange (ETDEWEB)

    Chang, E.I.; Lippmann, R.P.

    1991-03-11

    Genetic algorithms were used to select and create features and to select reference exemplar patterns for machine vision and speech pattern classification tasks. On a 15-feature machine-vision inspection task, it was found that genetic algorithms performed no better than conventional approaches to feature selection but required much more computation. For a speech recognition task, genetic algorithms required no more computation time than traditional approaches but reduced the number of features required by a factor of five (from 153 to 33 features). On a difficult artificial machine-vision task, genetic algorithms were able to create new features (polynomial functions of the original features) that reduced classification error rates from 10 to almost 0 percent. Neural net and nearest-neighbor classifiers were unable to provide such low error rates using only the original features. Genetic algorithms were also used to reduce the number of reference exemplar patterns and to select the value of k for a k-nearest-neighbor classifier. On a 338-training-pattern vowel recognition problem with 10 classes, genetic algorithms simultaneously reduced the number of stored exemplars from 338 to 63 and selected k without significantly decreasing classification accuracy. In all applications, genetic algorithms were easy to apply and found good solutions in many fewer trials than would be required by an exhaustive search. Run times were long but not unreasonable. These results suggest that genetic algorithms may soon be practical for pattern classification problems as faster serial and parallel computers are developed.

  20. Feature selection for appearance-based vehicle tracking in geospatial video

    Science.gov (United States)

    Poostchi, Mahdieh; Bunyak, Filiz; Palaniappan, Kannappan; Seetharaman, Guna

    2013-05-01

    Current video tracking systems often employ a rich set of intensity, edge, texture, shape and object level features combined with descriptors for appearance modeling. This approach increases tracker robustness but is computationally expensive for realtime applications, and localization accuracy can be adversely affected by including distracting features in the feature fusion or object classification processes. This paper explores offline feature subset selection using a filter-based evaluation approach for video tracking to reduce the dimensionality of the feature space and to discover relevant representative lower dimensional subspaces for online tracking. We compare the performance of the exhaustive FOCUS algorithm to the sequential heuristic SFFS, SFS and RELIEF feature selection methods. Experiments show that using offline feature selection reduces computational complexity, improves feature fusion and is expected to translate into better online tracking performance. Overall SFFS and SFS perform very well, close to the optimum determined by FOCUS, but RELIEF does not work as well for feature selection in the context of appearance-based object tracking.
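    Sequential forward selection of the kind compared in this record is readily sketched with scikit-learn's SequentialFeatureSelector; the wrapped k-NN classifier and digits subset below are stand-ins, and the paper's FOCUS/RELIEF comparison is not reproduced.

```python
# Sequential forward selection (SFS) with a wrapped classifier, as a compact
# illustration of the kind of offline subset search compared in the paper.
from sklearn.datasets import load_digits
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)
X, y = X[:500], y[:500]                      # small subset to keep the sketch fast
knn = KNeighborsClassifier(n_neighbors=3)

sfs = SequentialFeatureSelector(knn, n_features_to_select=10,
                                direction="forward", cv=3)
sfs.fit(X, y)
X_small = sfs.transform(X)

print("kept features:", list(sfs.get_support(indices=True)))
print("CV accuracy on reduced set:",
      cross_val_score(knn, X_small, y, cv=3).mean().round(3))
```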

  1. Selection of LiDAR geometric features with adaptive neighborhood size for urban land cover classification

    Science.gov (United States)

    Dong, Weihua; Lan, Jianhang; Liang, Shunlin; Yao, Wei; Zhan, Zhicheng

    2017-08-01

    LiDAR has been an effective technology for acquiring urban land cover data in recent decades. Previous studies indicate that geometric features have a strong impact on land cover classification. Here, we analyzed an urban LiDAR dataset to explore the optimal feature subset from 25 geometric features incorporating 25 scales under 6 neighborhood definitions for urban land cover classification. We applied a feature selection strategy to remove irrelevant or redundant features based on the correlation coefficient between features and the classification accuracy of each feature. The neighborhood scales were divided into small (0.5-1.5 m), medium (1.5-6 m) and large (>6 m) scales. Combining features with lower correlation coefficients and better individual classification performance improved classification accuracy. Features depicting the homogeneity or heterogeneity of points are best calculated at a small scale, features that smooth points at a medium scale, and features based on height differences at a large scale. As to the neighborhood definition, the cuboid and cylinder were recommended. This study can guide the selection of optimal geometric features with adaptive neighborhood scale for urban land cover classification.

  2. Diagnosis of Hepatocellular Carcinoma Spectroscopy Based on the Feature Selection Approach of the Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Shao-qing Wang

    2013-06-01

    Full Text Available This paper aims to study the application of medical imaging technology together with artificial intelligence technology to improve the diagnostic accuracy rate for hepatocellular carcinoma. A recognition method based on a genetic algorithm (GA) and neural networks is presented. The GA was used to select 20 optimal features from the 401 initial features. A back-propagation neural network (BP) and a probabilistic neural network (PNN) were used to classify the tested samples based on these optimized features, and the results based on the 20 optimal features were compared with those based on all 401 features. The results of the experiment show that the method can improve the recognition rate.

  3. Simultaneous Spectral-Spatial Feature Selection and Extraction for Hyperspectral Images.

    Science.gov (United States)

    Zhang, Lefei; Zhang, Qian; Du, Bo; Huang, Xin; Tang, Yuan Yan; Tao, Dacheng

    2016-09-12

    In hyperspectral remote sensing data mining, it is important to take into account both spectral and spatial information, such as the spectral signature, texture features, and morphological properties, to improve performance, e.g., image classification accuracy. From a feature representation point of view, a natural approach to handle this situation is to concatenate the spectral and spatial features into a single but high-dimensional vector and then apply a certain dimension reduction technique directly to that concatenated vector before feeding it into the subsequent classifier. However, multiple features from various domains have different physical meanings and statistical properties, so such concatenation does not efficiently exploit the complementary properties among the different features, which should help boost feature discriminability. Furthermore, it is also difficult to interpret the transformed results of the concatenated vector. Consequently, finding a physically meaningful consensus low-dimensional feature representation of the original multiple features is still a challenging task. In order to address these issues, we propose a novel feature learning framework, i.e., a simultaneous spectral-spatial feature selection and extraction algorithm, for spectral-spatial feature representation and classification of hyperspectral images. Specifically, the proposed method learns a latent low-dimensional subspace by projecting the spectral-spatial features into a common feature space, where the complementary information is effectively exploited and, simultaneously, only the most significant original features are transformed. Encouraging experimental results on three publicly available hyperspectral remote sensing datasets confirm that our proposed method is effective and efficient.

  4. Fuzzy-Rough Feature Selection With π-Membership Function For Mammogram Classification

    CERN Document Server

    Thangavel, K

    2012-01-01

    Breast cancer is the second leading cause of death among women, and it is diagnosed with the help of mammograms. Oncologists often fail to identify microcalcifications at an early stage from visual inspection of mammograms. In order to improve the performance of breast cancer screening, many researchers have proposed computer-aided diagnosis using image processing. In this study, mammograms are preprocessed and features are extracted, and the abnormality is then identified through classification. If all the extracted features are used, many cases are misidentified; hence a feature selection procedure is sought. In this paper, fuzzy-rough feature selection with a π membership function is proposed. The selected features are used to classify the abnormalities with the help of the Ant-Miner and Weka tools. The experimental analysis shows that the proposed method improves mammogram classification accuracy.

  5. Feature Selection Strategy for Classification of Single-Trial EEG Elicited by Motor Imagery

    DEFF Research Database (Denmark)

    Prasad, Swati; Tan, Zheng-Hua; Prasad, Ramjee

    2011-01-01

    Brain-Computer Interface (BCI) provides new means of communication for people with motor disabilities by utilizing electroencephalographic activity. Selection of features from Electroencephalogram (EEG) signals for classification plays a key part in the development of BCI systems. In this paper, we present a feature selection strategy consisting of channel selection by Fisher ratio analysis in the frequency domain and time segment selection by visual inspection in the time domain. The proposed strategy achieves an absolute improvement of 7.5% in the misclassification rate as compared with the baseline...
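    A minimal sketch of the channel-selection step by Fisher ratio is given below: each channel is scored by the between-class separation of a band-power feature relative to its within-class variance. The synthetic EEG array, class labels and log-power feature are assumptions for illustration.

```python
# Sketch: ranking EEG channels by the Fisher ratio of a per-channel feature
# (here, log band power) between two classes. Data are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_channels, n_samples = 80, 22, 250
eeg = rng.normal(size=(n_trials, n_channels, n_samples))
labels = rng.integers(0, 2, n_trials)              # two motor imagery classes (assumed)
eeg[labels == 1, 3] *= 1.5                         # make channel 3 informative

feature = np.log(np.mean(eeg ** 2, axis=2))        # log power per trial and channel

def fisher_ratio(f, y):
    f0, f1 = f[y == 0], f[y == 1]
    return (f0.mean(0) - f1.mean(0)) ** 2 / (f0.var(0) + f1.var(0) + 1e-12)

scores = fisher_ratio(feature, labels)
ranking = np.argsort(scores)[::-1]
print("channels ranked by Fisher ratio:", ranking[:5])
```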

  6. Computing visual target distinctness through selective filtering, statistical features, and visual patterns

    NARCIS (Netherlands)

    Fdez-Vidal, X.R.; Toet, A.; Garcia, J.A.; Fdez-Valdivia, J.

    2000-01-01

    This paper presents three computational visual distinctness measures, computed from image representational models based on selective filtering, statistical features, and visual patterns, respectively. They are applied to quantify the visual distinctness of targets in complex natural scenes. The

  7. A Novel Feature Extraction Method with Feature Selection to Identify Golgi-Resident Protein Types from Imbalanced Data.

    Science.gov (United States)

    Yang, Runtao; Zhang, Chengjin; Gao, Rui; Zhang, Lina

    2016-02-06

    The Golgi Apparatus (GA) is a major collection and dispatch station for numerous proteins destined for secretion, plasma membranes and lysosomes. The dysfunction of GA proteins can result in neurodegenerative diseases. Therefore, accurate identification of protein subGolgi localizations may assist in drug development and understanding the mechanisms of the GA involved in various cellular processes. In this paper, a new computational method is proposed for identifying cis-Golgi proteins from trans-Golgi proteins. Based on the concept of Common Spatial Patterns (CSP), a novel feature extraction technique is developed to extract evolutionary information from protein sequences. To deal with the imbalanced benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is adopted. A feature selection method called Random Forest-Recursive Feature Elimination (RF-RFE) is employed to search the optimal features from the CSP based features and g-gap dipeptide composition. Based on the optimal features, a Random Forest (RF) module is used to distinguish cis-Golgi proteins from trans-Golgi proteins. Through the jackknife cross-validation, the proposed method achieves a promising performance with a sensitivity of 0.889, a specificity of 0.880, an accuracy of 0.885, and a Matthew's Correlation Coefficient (MCC) of 0.765, which remarkably outperforms previous methods. Moreover, when tested on a common independent dataset, our method also achieves a significantly improved performance. These results highlight the promising performance of the proposed method to identify Golgi-resident protein types. Furthermore, the CSP based feature extraction method may provide guidelines for protein function predictions.
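    The SMOTE-plus-RF-RFE pipeline described above can be sketched with imbalanced-learn and scikit-learn as below. The toy feature matrix stands in for the CSP and g-gap dipeptide features, and the class ratio and RFE settings are assumptions.

```python
# SMOTE oversampling followed by random-forest-driven recursive feature
# elimination (RF-RFE), mirroring the general pipeline described above.
# The toy matrix stands in for the CSP / g-gap dipeptide features.
import numpy as np
from imblearn.over_sampling import SMOTE           # pip install imbalanced-learn
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 60))
y = np.r_[np.ones(160, dtype=int), np.zeros(40, dtype=int)]   # imbalanced classes
X[y == 0, :5] += 1.0                               # a few genuinely informative features

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rfe = RFE(rf, n_features_to_select=10, step=5)
rfe.fit(X_bal, y_bal)

selected = np.where(rfe.support_)[0]
print("selected features:", selected)
print("CV accuracy on selected features:",
      cross_val_score(rf, X_bal[:, selected], y_bal, cv=5).mean().round(3))
```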

  8. A Novel Feature Extraction Method with Feature Selection to Identify Golgi-Resident Protein Types from Imbalanced Data

    Directory of Open Access Journals (Sweden)

    Runtao Yang

    2016-02-01

    Full Text Available The Golgi Apparatus (GA) is a major collection and dispatch station for numerous proteins destined for secretion, plasma membranes and lysosomes. The dysfunction of GA proteins can result in neurodegenerative diseases. Therefore, accurate identification of protein subGolgi localizations may assist in drug development and understanding the mechanisms of the GA involved in various cellular processes. In this paper, a new computational method is proposed for identifying cis-Golgi proteins from trans-Golgi proteins. Based on the concept of Common Spatial Patterns (CSP), a novel feature extraction technique is developed to extract evolutionary information from protein sequences. To deal with the imbalanced benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is adopted. A feature selection method called Random Forest-Recursive Feature Elimination (RF-RFE) is employed to search the optimal features from the CSP based features and g-gap dipeptide composition. Based on the optimal features, a Random Forest (RF) module is used to distinguish cis-Golgi proteins from trans-Golgi proteins. Through the jackknife cross-validation, the proposed method achieves a promising performance with a sensitivity of 0.889, a specificity of 0.880, an accuracy of 0.885, and a Matthew’s Correlation Coefficient (MCC) of 0.765, which remarkably outperforms previous methods. Moreover, when tested on a common independent dataset, our method also achieves a significantly improved performance. These results highlight the promising performance of the proposed method to identify Golgi-resident protein types. Furthermore, the CSP based feature extraction method may provide guidelines for protein function predictions.

  9. A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data of Glioma

    KAUST Repository

    Abusamra, Heba

    2013-11-01

    Microarray gene expression data have gained great importance in recent years due to their role in disease diagnosis and prognosis, which helps to choose the appropriate treatment plan for patients. This technology has ushered in a new era of molecular classification. Interpreting gene expression data remains a difficult problem and an active research area due to its native “high dimensional, low sample size” nature. Such problems pose great challenges to existing classification methods. Thus, effective feature selection techniques are often needed in this case to help correctly classify different tumor types and consequently lead to a better understanding of genetic signatures as well as improved treatment strategies. This paper presents a comparative study of state-of-the-art feature selection methods, classification methods, and combinations of them, based on gene expression data. We compared the efficiency of three classification methods: support vector machines, k-nearest neighbor and random forest, and eight feature selection methods: information gain, twoing rule, sum minority, max minority, Gini index, sum of variances, t-statistics, and one-dimensional support vector machine. Five-fold cross-validation was used to evaluate the classification performance. Two publicly available gene expression datasets of glioma were used in the experiments. Results revealed the important role of feature selection in classifying gene expression data. By performing feature selection, the classification accuracy can be significantly boosted using a small number of genes. The relationship of the features selected by the different feature selection methods is investigated, and the most frequently selected features in each fold among all methods for both datasets are evaluated.

  10. A New Feature Selection Algorithm Based on the Mean Impact Variance

    Directory of Open Access Journals (Sweden)

    Weidong Cheng

    2014-01-01

    Full Text Available The selection of fewer or more representative features from multidimensional features is important when the artificial neural network (ANN) algorithm is used as a classifier. In this paper, a new feature selection method called the mean impact variance (MIVAR) method is proposed to determine which features are more suitable for classification. Moreover, this method is constructed on the basis of the training process of the ANN algorithm. To verify the effectiveness of the proposed method, the MIVAR value is used to rank the multidimensional features of a bearing fault diagnosis task. In detail, (1) 70-dimensional waveform features are extracted from a rolling bearing vibration signal with four different operating states, (2) the corresponding MIVAR values of all 70-dimensional features are calculated to rank all features, (3) 14 groups of 10-dimensional features are separately generated according to the ranking results and the principal component analysis (PCA) algorithm, and a back propagation (BP) network is constructed, and (4) the validity of the ranking result is proven by training this BP network with these seven groups of 10-dimensional features and by comparing the corresponding recognition rates. The results prove that the features with larger MIVAR values lead to higher recognition rates.
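    The underlying mean-impact idea can be sketched as follows: train a small neural network, perturb each input feature by ±10%, and rank features by the average change in the network output. This is a generic illustration; the paper's exact MIVAR statistic and the bearing data are not reproduced.

```python
# Sketch of the mean-impact idea behind MIVAR: train a small neural network,
# perturb each input by +/-10%, and rank features by the resulting change in
# the network output. The paper's exact MIVAR statistic may differ.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
net = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0).fit(X, y)

def mean_impact_values(net, X, delta=0.1):
    impacts = []
    for j in range(X.shape[1]):
        up, down = X.copy(), X.copy()
        up[:, j] *= 1 + delta
        down[:, j] *= 1 - delta
        diff = net.predict_proba(up)[:, 1] - net.predict_proba(down)[:, 1]
        impacts.append(np.abs(diff).mean())        # average output change per feature
    return np.array(impacts)

miv = mean_impact_values(net, X)
print("features ranked by mean impact:", np.argsort(miv)[::-1][:10])
```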

  11. Evaluation of Feature Selection Methods for Predictive Modeling Using Neural Networks in Credits Scoring

    Directory of Open Access Journals (Sweden)

    Raghavendra B. K

    2010-11-01

    Full Text Available A credit-risk evaluation decision involves processing huge volumes of raw data, and hence requires powerful data mining tools. Several techniques developed in machine learning have been used for financial credit-risk evaluation decisions. Data mining is the process of finding patterns and relations in large databases. Neural networks are one of the popular tools for building predictive models in data mining. The major drawback of neural networks is the curse of dimensionality, which requires an optimal feature subset. Feature selection is an important topic of research in data mining; it is the problem of choosing a small subset of features that is ideally necessary and sufficient to describe the target concept. In this research, an attempt has been made to investigate a preprocessing framework for feature selection in credit scoring using neural networks. Feature selection techniques such as best-first search and information gain have been evaluated for the effectiveness of the classification of risk groups on publicly available data sets. In particular, the German, Australian, and Japanese credit rating data sets have been used for evaluation. The results are conclusive about the effectiveness of feature selection for neural networks and validate the hypothesis of the research.

  12. A FEATURE SELECTION ALGORITHM DESIGN AND ITS IMPLEMENTATION IN INTRUSION DETECTION SYSTEM

    Institute of Scientific and Technical Information of China (English)

    杨向荣; 沈钧毅

    2003-01-01

    Objective: To present a new feature selection algorithm. Methods: The algorithm is based on rule induction and domain knowledge. Results: This algorithm can be applied to capture the data flow when detecting network intrusions; only the sub-dataset containing the discriminating features is captured. The time spent on subsequent behavior-pattern mining is therefore reduced and the patterns mined are more precise. Conclusion: The experimental results show that the feature subset captured by this algorithm is more informative and the size of the dataset is reduced significantly.

  13. Feature Selection Combined with Neural Network Structure Optimization for HIV-1 Protease Cleavage Site Prediction

    Directory of Open Access Journals (Sweden)

    Hui Liu

    2015-01-01

    Full Text Available It is crucial to understand the specificity of HIV-1 protease for designing HIV-1 protease inhibitors. In this paper, a new feature selection method combined with neural network structure optimization is proposed to analyze the specificity of HIV-1 protease and find the important positions in an octapeptide that determine its cleavability. Two kinds of newly proposed features based on the Amino Acid Index database, plus traditional orthogonal encoding features, are used in this paper, taking both physicochemical and sequence information into consideration. Results of feature selection show that p2, p1, p1′, and p2′ are the most important positions. Two feature fusion methods are used in this paper, combination fusion and decision fusion, aiming to obtain a comprehensive feature representation and improve prediction performance. Decision fusion of the subsets obtained after feature selection achieves excellent prediction performance, which shows that feature selection combined with decision fusion is an effective and useful method for the task of HIV-1 protease cleavage site prediction. The results and analysis in this paper can provide useful guidance for designing HIV-1 protease inhibitors in the future.

  14. A hybrid feature selection method using multiclass SVM for diagnosis of erythemato-squamous disease

    Science.gov (United States)

    Maryam, Setiawan, Noor Akhmad; Wahyunggoro, Oyas

    2017-08-01

    The diagnosis of erythemato-squamous disease is a complex problem, and the disease is difficult to detect in dermatology. Besides that, it is a major cause of skin cancer. Data mining in the medical field helps experts to diagnose precisely, accurately, and inexpensively. In this research, we use data mining techniques to develop a diagnosis model based on a multiclass SVM with a novel hybrid feature selection method to diagnose erythemato-squamous disease. Our hybrid feature selection method, named ChiGA (Chi Square and Genetic Algorithm), uses the advantages of both filter and wrapper methods to select the optimal feature subset from the original features. Chi square is used as the filter method to remove redundant features, and GA as the wrapper method to select the ideal feature subset, with SVM used as the classifier. Experiments were performed with 10-fold cross-validation on the erythemato-squamous disease dataset taken from the University of California Irvine (UCI) machine learning database. The experimental results show that the proposed multiclass SVM model with Chi Square and GA can give an optimal feature subset: 18 optimal features with 99.18% accuracy.
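    A compact sketch of the filter-plus-wrapper pattern is given below: a chi-square filter (SelectKBest) prunes the feature set, and a small generational search over binary masks, scored by a cross-validated SVM, plays the role of the GA wrapper. The digits data stand in for the UCI dermatology dataset, and the simplified evolutionary loop is an assumption, not the authors' ChiGA implementation.

```python
# Chi-square filtering followed by a wrapper search with an SVM, echoing the
# ChiGA filter+wrapper idea. The GA wrapper stage is simplified to a small
# generational loop over binary feature masks; digits stand in for the data.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X, y = load_digits(return_X_y=True)
X, y = X[:600], y[:600]                                 # subset to keep the sketch fast

# Filter stage: keep the 30 features with the highest chi-square statistic.
filt = SelectKBest(chi2, k=30).fit(X, y)
X_f = filt.transform(X)

def score(mask):
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(), X_f[:, mask], y, cv=3).mean()

# Wrapper stage (GA-like): evolve a small population of feature masks.
pop = [rng.random(X_f.shape[1]) < 0.5 for _ in range(8)]
for _ in range(5):
    pop = sorted(pop, key=score, reverse=True)[:4]      # selection
    children = []
    for a, b in zip(pop[::2], pop[1::2]):
        cut = rng.integers(1, X_f.shape[1])             # one-point crossover
        child = np.r_[a[:cut], b[cut:]]
        flip = rng.random(child.size) < 0.05            # mutation
        child[flip] = ~child[flip]
        children.append(child)
    pop = pop + children + [rng.random(X_f.shape[1]) < 0.5]

best = max(pop, key=score)
print("features kept:", int(best.sum()), "CV accuracy:", round(score(best), 3))
```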

  15. Multi-Stage Recognition of Speech Emotion Using Sequential Forward Feature Selection

    Directory of Open Access Journals (Sweden)

    Liogienė Tatjana

    2016-07-01

    Full Text Available The intensive research on speech emotion recognition has introduced a huge collection of speech emotion features. Large feature sets complicate the speech emotion recognition task. Among various feature selection and transformation techniques for one-stage classification, multiple classifier systems have been proposed. The main idea of multiple classifiers is to arrange the emotion classification process in stages. Besides parallel and serial cases, the hierarchical arrangement of multi-stage classification is most widely used for speech emotion recognition. In this paper, we present a sequential-forward-feature-selection-based multi-stage classification scheme. The Sequential Forward Selection (SFS) and Sequential Floating Forward Selection (SFFS) techniques were employed at every stage of the multi-stage classification scheme. Experimental testing of the proposed scheme was performed using the German and Lithuanian emotional speech datasets. Sequential-feature-selection-based multi-stage classification outperformed the single-stage scheme by 12–42 % for different emotion sets. The multi-stage scheme showed higher robustness to the growth of the emotion set: the decrease in recognition rate with an increasing emotion set was 10–20 % lower for the multi-stage scheme than for the single-stage case. Differences between SFS and SFFS for feature selection were negligible.

  16. Different Cortical Mechanisms for Spatial vs. Feature-Based Attentional Selection in Visual Working Memory.

    Science.gov (United States)

    Heuer, Anna; Schubö, Anna; Crawford, J D

    2016-01-01

    The limited capacity of visual working memory (VWM) necessitates attentional mechanisms that selectively update and maintain only the most task-relevant content. Psychophysical experiments have shown that the retroactive selection of memory content can be based on visual properties such as location or shape, but the neural basis for such differential selection is unknown. For example, it is not known if there are different cortical modules specialized for spatial vs. feature-based mnemonic attention, in the same way that has been demonstrated for attention to perceptual input. Here, we used transcranial magnetic stimulation (TMS) to identify areas in human parietal and occipital cortex involved in the selection of objects from memory based on cues to their location (spatial information) or their shape (featural information). We found that TMS over the supramarginal gyrus (SMG) selectively facilitated spatial selection, whereas TMS over the lateral occipital cortex (LO) selectively enhanced feature-based selection for remembered objects in the contralateral visual field. Thus, different cortical regions are responsible for spatial vs. feature-based selection of working memory representations. Since the same regions are involved in terms of attention to external events, these new findings indicate overlapping mechanisms for attentional control over perceptual input and mnemonic representations.

  17. Different Cortical Mechanisms for Spatial vs. Feature-Based Attentional Selection in Visual Working Memory

    Science.gov (United States)

    Heuer, Anna; Schubö, Anna; Crawford, J. D.

    2016-01-01

    The limited capacity of visual working memory (VWM) necessitates attentional mechanisms that selectively update and maintain only the most task-relevant content. Psychophysical experiments have shown that the retroactive selection of memory content can be based on visual properties such as location or shape, but the neural basis for such differential selection is unknown. For example, it is not known if there are different cortical modules specialized for spatial vs. feature-based mnemonic attention, in the same way that has been demonstrated for attention to perceptual input. Here, we used transcranial magnetic stimulation (TMS) to identify areas in human parietal and occipital cortex involved in the selection of objects from memory based on cues to their location (spatial information) or their shape (featural information). We found that TMS over the supramarginal gyrus (SMG) selectively facilitated spatial selection, whereas TMS over the lateral occipital cortex (LO) selectively enhanced feature-based selection for remembered objects in the contralateral visual field. Thus, different cortical regions are responsible for spatial vs. feature-based selection of working memory representations. Since the same regions are involved in terms of attention to external events, these new findings indicate overlapping mechanisms for attentional control over perceptual input and mnemonic representations. PMID:27582701

  18. Different cortical mechanisms for spatial vs. feature-based attentional selection in visual working memory

    Directory of Open Access Journals (Sweden)

    Anna Heuer

    2016-08-01

    Full Text Available The limited capacity of visual working memory necessitates attentional mechanisms that selectively update and maintain only the most task-relevant content. Psychophysical experiments have shown that the retroactive selection of memory content can be based on visual properties such as location or shape, but the neural basis for such differential selection is unknown. For example, it is not known if there are different cortical modules specialized for spatial versus feature-based mnemonic attention, in the same way that has been demonstrated for attention to perceptual input. Here, we used transcranial magnetic stimulation (TMS) to identify areas in human parietal and occipital cortex involved in the selection of objects from memory based on cues to their location (spatial information) or their shape (featural information). We found that TMS over the supramarginal gyrus (SMG) selectively facilitated spatial selection, whereas TMS over the lateral occipital cortex selectively enhanced feature-based selection for remembered objects in the contralateral visual field. Thus, different cortical regions are responsible for spatial vs. feature-based selection of working memory representations. Since the same regions are involved in attention to external events, these new findings indicate overlapping mechanisms for attentional control over perceptual input and mnemonic representations.

  19. Selecting Optimal Feature Set in High-Dimensional Data by Swarm Search

    Directory of Open Access Journals (Sweden)

    Simon Fong

    2013-01-01

    Full Text Available Selecting the right set of features from data of high dimensionality for inducing an accurate classification model is a tough computational challenge. It is almost an NP-hard problem, as the combinations of features escalate exponentially as the number of features increases. Unfortunately, in data mining, as well as in other engineering applications and bioinformatics, some data are described by a long array of features. Many feature subset selection algorithms have been proposed in the past, but not all of them are effective. Since it takes seemingly forever to use brute force in exhaustively trying every possible combination of features, stochastic optimization may be a solution. In this paper, we propose a new feature selection scheme called Swarm Search to find an optimal feature set by using metaheuristics. The advantage of Swarm Search is its flexibility in integrating any classifier into its fitness function and plugging in any metaheuristic algorithm to facilitate heuristic search. Simulation experiments are carried out by testing Swarm Search over some high-dimensional datasets, with different classification algorithms and various metaheuristic algorithms. The comparative experiment results show that Swarm Search is able to attain relatively low error rates in classification without shrinking the size of the feature subset to its minimum.

  20. Improved semi-supervised online boosting for object tracking

    Science.gov (United States)

    Li, Yicui; Qi, Lin; Tan, Shukun

    2016-10-01

    An online semi-supervised boosting method, which treats the object tracking problem as a classification problem, has the advantage of training a binary classifier from both labeled and unlabeled examples. Appropriate object features are selected based on real-time changes in the object. However, the online semi-supervised boosting method faces one key problem: traditional self-training, which uses the classification results to update the classifier itself, often leads to drifting or tracking failure due to the error accumulated during each update of the tracker. To overcome these disadvantages, the contribution of this paper is an improved online semi-supervised boosting method in which the learning process is guided by positive (P) and negative (N) constraints, termed P-N constraints, which restrict the labeling of the unlabeled samples. First, we train the classifier by online semi-supervised boosting. Then, this classifier is used to process the next frame. Finally, the classification results are analyzed with the P-N constraints, which verify whether the labels assigned to unlabeled data by the classifier are in line with the assumptions made about positive and negative samples. The proposed algorithm can effectively improve the discriminative ability of the classifier and significantly alleviate the drifting problem in tracking applications. In the experiments, we demonstrate real-time tracking with our tracker on several challenging test sequences, where it outperforms other related online tracking methods and achieves promising tracking performance.

  1. Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources.

    Science.gov (United States)

    Yu, Sheng; Liao, Katherine P; Shaw, Stanley Y; Gainer, Vivian S; Churchill, Susanne E; Szolovits, Peter; Murphy, Shawn N; Kohane, Isaac S; Cai, Tianxi

    2015-09-01

    Analysis of narrative (text) data from electronic health records (EHRs) can improve population-scale phenotyping for clinical and genetic research. Currently, selection of text features for phenotyping algorithms is slow and laborious, requiring extensive and iterative involvement by domain experts. This paper introduces a method to develop phenotyping algorithms in an unbiased manner by automatically extracting and selecting informative features, which can be comparable to expert-curated ones in classification accuracy. Comprehensive medical concepts were collected from publicly available knowledge sources in an automated, unbiased fashion. Natural language processing (NLP) revealed the occurrence patterns of these concepts in EHR narrative notes, which enabled selection of informative features for phenotype classification. When combined with additional codified features, a penalized logistic regression model was trained to classify the target phenotype. The authors applied this method to develop algorithms to identify rheumatoid arthritis (RA) patients and coronary artery disease (CAD) cases among those with rheumatoid arthritis from a large multi-institutional EHR. The areas under the receiver operating characteristic curves (AUC) for classifying RA and CAD using models trained with automated features were 0.951 and 0.929, respectively, compared to AUCs of 0.938 and 0.929 for models trained with expert-curated features. Models trained with NLP text features selected through an unbiased, automated procedure achieved comparable or slightly higher accuracy than those trained with expert-curated features, and the majority of the selected model features were interpretable. The proposed automated feature extraction method, generating highly accurate phenotyping algorithms with improved efficiency, is a significant step toward high-throughput phenotyping.
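    The final modeling step lends itself to a short sketch: an L1-penalized logistic regression over concept-count features, which implicitly performs feature selection by driving uninformative coefficients to zero. The synthetic count matrix, log transform and regularization strength below are assumptions standing in for real EHR data.

```python
# Sketch of the final modeling step: an L1-penalized logistic regression over
# NLP concept counts plus codified features, which zeroes out uninformative
# features. The count matrix below is a synthetic stand-in for EHR data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n_patients, n_concepts = 500, 200
X = rng.poisson(1.0, size=(n_patients, n_concepts)).astype(float)   # concept mention counts
logits = 0.8 * X[:, 0] + 0.6 * X[:, 1] - 0.7 * X[:, 2] - 1.0        # 3 truly informative concepts
y = (rng.random(n_patients) < 1 / (1 + np.exp(-logits))).astype(int)

X = np.log1p(X)                                   # dampen heavy-tailed counts
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.3)
print("CV AUC:", cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean().round(3))

model.fit(X, y)
kept = np.flatnonzero(model.coef_[0])
print("non-zero coefficients:", len(kept), "e.g.", kept[:10])
```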

  2. Performance Evaluation of Content Based Image Retrieval on Feature Optimization and Selection Using Swarm Intelligence

    Directory of Open Access Journals (Sweden)

    Kirti Jain

    2016-03-01

    Full Text Available The diversity and applicability of swarm intelligence are increasing every day in the fields of science and engineering. Swarm intelligence provides a dynamic approach to feature optimization. We have used swarm intelligence for feature optimization and feature selection in content-based image retrieval. The performance of content-based image retrieval is measured in terms of precision and recall, whose values depend on the retrieval capability of the image features. The basic raw image content has visual features such as color, texture, shape and size. The partial feature extraction technique is based on a geometric invariant function. Three swarm intelligence algorithms were used for the optimization of features: ant colony optimization, particle swarm optimization (PSO), and the glowworm optimization algorithm. The Coral image dataset and MATLAB software were used for evaluating performance.

  3. A Novel Feature Selection Strategy for Enhanced Biomedical Event Extraction Using the Turku System

    Directory of Open Access Journals (Sweden)

    Jingbo Xia

    2014-01-01

    Full Text Available Feature selection is of paramount importance for text-mining classifiers with high-dimensional features. The Turku Event Extraction System (TEES) is the best performing tool in the GENIA BioNLP 2009/2011 shared tasks, and it relies heavily on high-dimensional features. This paper describes research which, based on an implementation of an accumulated effect evaluation (AEE) algorithm applying a greedy search strategy, analyses the contribution of every single feature class in TEES with a view to identifying important features and modifying the feature set accordingly. With the updated feature set, a new system is obtained with enhanced performance, achieving an increased F-score of 53.27%, up from 51.21%, for Task 1 under strict evaluation criteria and 57.24% according to the approximate span and recursive criterion.

  4. Feature subset selection based on mahalanobis distance: a statistical rough set method

    Institute of Scientific and Technical Information of China (English)

    Sun Liang; Han Chongzhao

    2008-01-01

    In order to select effective feature subsets for pattern classification, a novel statistical rough set method is presented based on generalized attribute reduction. Unlike classical reduction approaches, the objects in the universe of discourse are signs of the training sample sets, and the values of attributes are taken as statistical parameters. The binary relation and discernibility matrix for the reduction are induced by a distance function. Furthermore, based on the monotonicity of the distance function defined by the Mahalanobis distance, the effective feature subsets are obtained as generalized attribute reducts. Experimental results show that the classification performance can be improved by using the selected feature subsets.

  5. Survival Prediction and Feature Selection in Patients with Breast Cancer Using Support Vector Regression

    Directory of Open Access Journals (Sweden)

    Shahrbanoo Goli

    2016-01-01

    Full Text Available The Support Vector Regression (SVR) model has been broadly used for response prediction. However, few researchers have used SVR for survival analysis. In this study, a new SVR model is proposed, and SVRs with different kernels and the traditional Cox model are trained. The models are compared based on different performance measures. We also select the best subset of features using three feature selection methods: a combination of SVR and statistical tests, univariate feature selection based on the concordance index, and recursive feature elimination. The evaluations are performed using available medical datasets and also a Breast Cancer (BC) dataset consisting of 573 patients who visited the Oncology Clinic of Hamadan province in Iran. Results show that, for the BC dataset, survival time can be predicted more accurately by linear SVR than by nonlinear SVR. Based on the three feature selection methods, metastasis status, progesterone receptor status, and human epidermal growth factor receptor 2 status are the features most strongly associated with survival. Also, according to the obtained results, the performance of linear and nonlinear kernels is comparable. The proposed SVR model performs similarly to or slightly better than the other models. Also, SVR performs similarly to or better than Cox when all features are included in the model.

  6. A feature selection approach towards progressive vector transmission over the Internet

    Science.gov (United States)

    Miao, Ru; Song, Jia; Feng, Min

    2017-09-01

    WebGIS has been widely applied for visualizing and sharing geospatial information over the Internet. In order to improve the efficiency of client applications, a web-based progressive vector transmission approach is proposed: important features should be selected and transferred first, so methods for measuring the importance of features need to be considered in progressive transmission. However, studies on progressive transmission for large-volume vector data have mostly focused on map generalization in the field of cartography and have rarely discussed the quantitative selection of geographic features. This paper applies information theory to measure the feature importance of vector maps. A measurement model for the amount of information carried by vector features is defined to deal with feature selection; it involves a geometry factor, a spatial distribution factor and a thematic attribute factor. Moreover, a real-time transport protocol (RTP)-based progressive transmission method is presented to improve the transmission of vector data. To clearly demonstrate the essential methodology and key techniques, a prototype for web-based progressive vector transmission is presented, and an experiment on progressive selection and transmission of vector features is conducted. The experimental results indicate that our approach clearly improves the performance and end-user experience of delivering and manipulating large vector data over the Internet.

  7. Genetic Algorithm (GA) in Feature Selection for CRF Based Manipuri Multiword Expression (MWE) Identification

    CERN Document Server

    Nongmeikapam, Kishorjit; 10.5121/ijcsit.2011.350

    2011-01-01

    This paper deals with the identification of Multiword Expressions (MWEs) in Manipuri, a highly agglutinative Indian language listed in the Eighth Schedule of the Indian Constitution. MWEs play an important role in Natural Language Processing (NLP) applications such as Machine Translation, Part of Speech tagging, Information Retrieval and Question Answering. Feature selection is an important factor in the recognition of Manipuri MWEs using Conditional Random Fields (CRF). The disadvantage of manually selecting and choosing appropriate features for running the CRF motivates the use of a Genetic Algorithm (GA). Using GA we are able to find the optimal features to run the CRF. We experimented with fifty generations of feature selection along with three-fold cross-validation as the fitness function. This model demonstrated a Recall (R) of 64.08%, Precision (P) of 86.84% and F-measure (F) of 73.74%, showing an improvement over CRF-based Manipuri MWE identification without GA.

  8. TOPSIS Based Multi-Criteria Decision Making of Feature Selection Techniques for Network Traffic Dataset

    Directory of Open Access Journals (Sweden)

    Raman Singh

    2014-01-01

    Full Text Available Intrusion detection systems (IDS) have to process millions of packets with many features, which delays the detection of anomalies. Sampling and feature selection may be used to reduce computation time and hence minimize intrusion detection time. This paper aims to recommend feature selection algorithms on the basis of the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS). TOPSIS is used to suggest one or more choices among several alternatives characterized by many attributes. A total of ten feature selection techniques have been used for the analysis of the KDD network dataset. Three classifiers, namely Naïve Bayes, J48 and PART, have been considered for this experiment using the Weka data mining tool. The ranking of the techniques using TOPSIS has been calculated using MATLAB. Out of these techniques, Filtered Subset Evaluation has been found suitable for intrusion detection in terms of very low computational time with acceptable accuracy.
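    TOPSIS itself is a short calculation: normalize the decision matrix, weight it, measure each alternative's distance to the positive and negative ideal solutions, and rank by relative closeness. The sketch below uses an illustrative decision matrix (accuracy, detection rate, build time), not the paper's measured values.

```python
# Compact TOPSIS ranking of alternatives (e.g. feature selection techniques)
# over several criteria. The decision matrix and weights are illustrative,
# not taken from the paper.
import numpy as np

# rows: techniques, columns: criteria (accuracy, detection rate, build time)
decision = np.array([[0.95, 0.93, 12.0],
                     [0.92, 0.90, 4.0],
                     [0.96, 0.94, 40.0],
                     [0.90, 0.88, 2.0]])
weights = np.array([0.4, 0.4, 0.2])
benefit = np.array([True, True, False])            # time is a cost criterion

def topsis(decision, weights, benefit):
    norm = decision / np.linalg.norm(decision, axis=0)        # vector normalization
    v = norm * weights
    ideal = np.where(benefit, v.max(0), v.min(0))             # positive ideal solution
    anti = np.where(benefit, v.min(0), v.max(0))              # negative ideal solution
    d_pos = np.linalg.norm(v - ideal, axis=1)
    d_neg = np.linalg.norm(v - anti, axis=1)
    return d_neg / (d_pos + d_neg)                            # relative closeness

closeness = topsis(decision, weights, benefit)
print("ranking (best first):", np.argsort(closeness)[::-1])
```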

  9. Cost effective approach on feature selection using genetic algorithms and fuzzy logic for diabetes diagnosis

    CERN Document Server

    Ephzibah, E P

    2011-01-01

    A way to enhance the performance of a model that combines genetic algorithms and fuzzy logic for feature selection and classification is proposed. Early diagnosis of any disease at low cost is preferable, and diabetes is one such disease. Diabetes has become the fourth leading cause of death in developed countries, and there is substantial evidence that it is reaching epidemic proportions in many developing and newly industrialized nations. In medical diagnosis, patterns consist of observable symptoms along with the results of diagnostic tests, which have various associated costs and risks. In the automated design of pattern classification, the proposed system solves the feature subset selection problem, i.e., the task of identifying and selecting a useful subset of pattern-representing features from a larger set of features. Using a fuzzy rule-based classification system, the proposed approach is shown to improve classification accuracy.

  10. A Multistage Feature Selection Model for Document Classification Using Information Gain and Rough Set

    Directory of Open Access Journals (Sweden)

    Mrs. Leena. H. Patil

    2014-11-01

    Full Text Available The number of documents is increasing rapidly, and organizing them in digitized form makes text categorization a challenging issue. A major issue for text categorization is the large number of features, most of which are noisy, irrelevant and redundant and may mislead the classifier. Hence, it is most important to reduce the dimensionality of the data to get a smaller subset that provides the greatest gain in information. Feature selection techniques reduce the dimensionality of the feature space and improve overall accuracy and performance. Therefore, we propose a multistage feature selection model to improve the overall accuracy and performance of classification. In the first stage, document preprocessing is performed. Secondly, each term within the documents is ranked according to its importance for classification using information gain. Thirdly, a rough set technique is applied to the highly ranked terms and feature reduction is carried out. Finally, document classification is performed on the core features using Naive Bayes and KNN classifiers. Experiments are carried out on three UCI datasets: Reuters 21578, Classic 04 and Newsgroup 20. Results show the better accuracy and performance of the proposed model.

  11. Biometric hashing for handwriting: entropy-based feature selection and semantic fusion

    Science.gov (United States)

    Scheidat, Tobias; Vielhauer, Claus

    2008-02-01

    Some biometric algorithms suffer from the problem of using a great number of features extracted from the raw data. This often results in feature vectors of high dimensionality and thus high computational complexity. However, in many cases subsets of features contribute little or nothing to the correct classification of biometric algorithms. The process of choosing the more discriminative features from a given set is commonly referred to as feature selection. In this paper we present a study on feature selection for an existing biometric hash generation algorithm for the handwriting modality, based on the strategy of entropy analysis of the single components of biometric hash vectors, in order to identify and suppress elements carrying little information. To evaluate the impact of our feature selection scheme on the authentication performance of our biometric algorithm, we present an experimental study based on data from 86 users. Besides discussing common biometric error rates such as equal error rates, we suggest a novel measurement to determine the reproduction rate probability for biometric hashes. Our experiments show that, while the feature set size may be significantly reduced by 45% using our scheme, there are only marginal changes both in the results of the verification process and in the reproducibility of biometric hashes. Since multi-biometrics is a recent topic, we additionally carry out a first study on pairwise multi-semantic fusion based on reduced hashes and analyze it with the introduced reproducibility measure.
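    The entropy-analysis strategy can be sketched as follows: estimate the entropy of each hash-vector component across users and suppress the lowest-entropy components. The synthetic hash matrix, the binning and the 45% reduction target below are assumptions for illustration.

```python
# Sketch of the entropy-analysis idea: estimate the entropy of each hash
# component across users and suppress low-entropy components. Binning and
# the 45% reduction target are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(6)
n_users, n_components = 86, 50
hashes = rng.integers(0, 8, size=(n_users, n_components))
hashes[:, :10] = 3                                 # some components carry almost no information

def component_entropy(column, n_bins=8):
    counts = np.bincount(column, minlength=n_bins).astype(float)
    p = counts / counts.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

entropies = np.array([component_entropy(hashes[:, j]) for j in range(n_components)])
keep = entropies >= np.quantile(entropies, 0.45)   # keep components at or above the 45th entropy percentile
reduced = hashes[:, keep]
print("components kept:", int(keep.sum()), "of", n_components)
```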

  12. A HYBRID FILTER AND WRAPPER FEATURE SELECTION APPROACH FOR DETECTING CONTAMINATION IN DRINKING WATER MANAGEMENT SYSTEM

    Directory of Open Access Journals (Sweden)

    S. VISALAKSHI

    2017-07-01

    Full Text Available Feature selection is an important task in predictive modeling, as it helps to identify the irrelevant features in a high-dimensional dataset. For this water contamination detection dataset, a standard wrapper algorithm alone cannot be applied because of its complexity. To overcome this computational complexity and lighten the process, a filter-wrapper based algorithm is proposed. In this work, reducing the feature space is a significant component of water contamination detection. The main findings are as follows: (1) the main goal is to speed up the feature selection process, so a filter-based feature pre-selection is applied, which ensures that useful data are unlikely to be discarded in the initial stage, as discussed briefly in this paper; (2) the resulting features are filtered again using a Genetic Algorithm coupled with a Support Vector Machine, which narrows down the feature subset with high accuracy and decreases the expense. Experimental results show that the proposed methods trim down redundant features effectively and achieve better classification accuracy.

  13. Feature selection and classification methodology for the detection of knee-joint disorders.

    Science.gov (United States)

    Nalband, Saif; Sundar, Aditya; Prince, A Amalin; Agarwal, Anita

    2016-04-01

    Vibroarthrographic (VAG) signals emitted from the knee joint provide an early diagnostic tool for knee-joint disorders. The nonstationary and nonlinear nature of the VAG signal is an important consideration for feature extraction. In this work, we investigate VAG signals by proposing a wavelet-based decomposition, in which the VAG signals are decomposed into sub-band signals of different frequencies. Nonlinear measures such as recurrence quantification analysis (RQA), approximate entropy (ApEn) and sample entropy (SampEn) are extracted as features of the VAG signal; a total of twenty-four features form a vector to characterize a VAG signal. Two feature selection (FS) techniques, the apriori algorithm and a genetic algorithm (GA), select six and four features, respectively, as the most significant features. Least squares support vector machines (LS-SVM) and random forest are proposed as classifiers to evaluate the performance of the FS techniques. Results indicate that classification accuracy was higher with features selected by the FS algorithms. LS-SVM using the apriori algorithm gives the highest accuracy of 94.31% with a false discovery rate (FDR) of 0.0892. The proposed work also provided better classification accuracy than that reported in previous studies, which achieved an accuracy of 88%. This work can enhance the performance of existing technology for accurately distinguishing normal and abnormal VAG signals, and the proposed methodology could provide an effective non-invasive diagnostic tool for knee-joint disorders.
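    Sample entropy, one of the nonlinear features listed above, is compact enough to sketch directly; m = 2 and r = 0.2·std are common defaults and not necessarily the values used in the paper, and the sine-plus-noise signals are stand-ins for VAG sub-band signals.

```python
# Compact sample entropy (SampEn), one of the nonlinear features extracted
# from VAG sub-band signals above. m=2 and r=0.2*std are common defaults,
# not necessarily the values used in the paper.
import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    x = np.asarray(x, dtype=float)
    r = r_factor * x.std()
    n = len(x)

    def count_matches(m):
        # number of template pairs whose Chebyshev distance is below r
        templates = np.array([x[i:i + m] for i in range(n - m)])
        count = 0
        for i in range(len(templates)):
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist < r))
        return count

    b, a = count_matches(m), count_matches(m + 1)
    return np.inf if a == 0 or b == 0 else -np.log(a / b)

rng = np.random.default_rng(8)
regular = np.sin(np.linspace(0, 20 * np.pi, 1000))
noisy = regular + 0.5 * rng.normal(size=1000)
print("SampEn regular:", round(sample_entropy(regular), 3))
print("SampEn noisy  :", round(sample_entropy(noisy), 3))
```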

  14. An Enhancement of Bayesian Inference Network for Ligand-Based Virtual Screening using Features Selection

    Directory of Open Access Journals (Sweden)

    Ali Ahmed

    2011-01-01

    Full Text Available Problem statement: Similarity-based Virtual Screening (VS) deals with a large amount of data containing irrelevant and/or redundant fragments or features. The recent use of the Bayesian network as an alternative to existing tools for similarity-based VS has received noticeable attention from researchers in the field of chemoinformatics. Approach: To this end, different models of Bayesian network have been developed. In this study, we enhance the Bayesian Inference Network (BIN) using a subset of selected molecular features. Results: In this approach, a few features were filtered from the molecular fingerprint features based on a feature selection approach. Conclusion: Simulated virtual screening experiments with MDL Drug Data Report (MDDR) data sets showed that the proposed method provides a simple way of enhancing the cost effectiveness of ligand-based virtual screening searches, especially for higher-diversity data sets.

  15. Using genetic algorithm feature selection in neural classification systems for image pattern recognition

    Directory of Open Access Journals (Sweden)

    Margarita R. Gamarra A.

    2012-09-01

    Full Text Available Pattern recognition performance depends on variations during the extraction, selection and classification stages. This paper presents an approach to feature selection using genetic algorithms with regard to digital image recognition and quality control. The error rate and kappa coefficient were used for evaluating the genetic algorithm approach. Neural networks were used for classification, involving the features selected by the genetic algorithms. The neural network approach was compared to a K-nearest neighbor classifier. The proposed approach performed better than the other methods.

  16. Improving Image steganalysis performance using a graph-based feature selection method

    Directory of Open Access Journals (Sweden)

    Amir Nouri

    2016-05-01

    Full Text Available Steganalysis is the skill of discovering the use of steganography algorithms within an image with little or no information regarding the steganography algorithm and/or its parameters. The high dimensionality of image data combined with a small number of samples has presented a difficult challenge for the steganalysis task. Several methods have been presented to improve steganalysis performance through feature selection. Feature selection, also known as variable selection, is one of the fundamental problems in the fields of machine learning, pattern recognition and statistics. The aim of feature selection is to reduce the dimensionality of image data in order to enhance the accuracy of the steganalysis task. In this paper, we propose a new graph-based blind steganalysis method for distinguishing stego images from cover images in JPEG format, using a feature selection technique based on community detection. The experimental results show that the proposed approach is easy to employ for steganalysis purposes. Moreover, the performance of the proposed method is better than that of several recent and well-known feature-selection-based image steganalysis methods.

  17. Minimum redundancy maximum relevance feature selection approach for temporal gene expression data.

    Science.gov (United States)

    Radovic, Milos; Ghalwash, Mohamed; Filipovic, Nenad; Obradovic, Zoran

    2017-01-03

    Feature selection, aiming to identify a subset of features among a possibly large set of features that are relevant for predicting a response, is an important preprocessing step in machine learning. In gene expression studies this is not a trivial task for several reasons, including the potentially temporal character of the data. However, most feature selection approaches developed for microarray data cannot handle multivariate temporal data without prior data flattening, which results in loss of temporal information. We propose a temporal minimum redundancy - maximum relevance (TMRMR) feature selection approach, which is able to handle multivariate temporal data without prior data flattening. In the proposed approach we compute the relevance of a gene by averaging F-statistic values calculated across individual time steps, and we compute redundancy between genes by using a dynamic time warping approach. The proposed method is evaluated on three temporal gene expression datasets from human viral challenge studies. The obtained results show that the proposed method outperforms alternatives widely used in gene expression studies. In particular, the proposed method achieved improvement in accuracy in 34 out of 54 experiments, while the other methods outperformed it in no more than 4 experiments. We developed a filter-based feature selection method for temporal gene expression data based on maximum relevance and minimum redundancy criteria. The proposed method incorporates temporal information by combining relevance, which is calculated as an average F-statistic value across different time steps, with redundancy, which is calculated by employing a dynamic time warping approach. As evident in our experiments, incorporating the temporal information into the feature selection process leads to selection of more discriminative features.
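    A simplified reading of the TMRMR idea is sketched below: the relevance of a gene is its F-statistic averaged over time steps, redundancy is measured by a small (unoptimized) dynamic time warping routine on mean temporal profiles, and genes are picked greedily. The synthetic expression tensor and the relevance-minus-similarity score are assumptions, not the authors' formulation.

```python
# Simplified reading of the TMRMR idea: relevance of a gene is its F-statistic
# averaged over time steps, redundancy between genes is a small dynamic time
# warping distance between their mean temporal profiles, and features are
# picked greedily to maximize relevance minus average similarity.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(7)
n_subjects, n_genes, n_times = 40, 100, 8
expr = rng.normal(size=(n_subjects, n_genes, n_times))
labels = rng.integers(0, 2, n_subjects)
expr[labels == 1, :3] += 1.0                       # three genes respond to the condition

def relevance(expr, labels):
    rel = np.zeros(expr.shape[1])
    for g in range(expr.shape[1]):
        stats = [f_oneway(expr[labels == 0, g, t], expr[labels == 1, g, t])[0]
                 for t in range(expr.shape[2])]
        rel[g] = np.mean(stats)                    # average F over time steps
    return rel

def dtw(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rel = relevance(expr, labels)
profiles = expr.mean(axis=0)                       # mean temporal profile per gene
selected = [int(np.argmax(rel))]
while len(selected) < 5:
    best_g, best_score = None, -np.inf
    for g in range(n_genes):
        if g in selected:
            continue
        similarity = np.mean([1.0 / (1.0 + dtw(profiles[g], profiles[s])) for s in selected])
        score_g = rel[g] - similarity              # high relevance, low similarity
        if score_g > best_score:
            best_g, best_score = g, score_g
    selected.append(best_g)
print("selected genes:", selected)
```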

  18. Supervisor's HEXACO personality traits and subordinate perceptions of abusive supervision

    NARCIS (Netherlands)

    Breevaart, Kimberley; Vries, de Reinout E.

    2017-01-01

    Abusive supervision is detrimental to both subordinates and organizations. Knowledge about individual differences in personality related to abusive supervision may improve personnel selection and potentially reduce the harmful effects of this type of leadership. Using the HEXACO personality framework

  19. A simulation to analyze feature selection methods utilizing gene ontology for gene expression classification.

    Science.gov (United States)

    Gillies, Christopher E; Siadat, Mohammad-Reza; Patel, Nilesh V; Wilson, George D

    2013-12-01

    Gene expression profile classification is a pivotal research domain assisting in the transformation from traditional to personalized medicine. A major challenge associated with gene expression data classification is the small number of samples relative to the large number of genes. To address this problem, researchers have devised various feature selection algorithms to reduce the number of genes. Recent studies have been experimenting with the use of semantic similarity between genes in Gene Ontology (GO) as a method to improve feature selection. While there are few studies that discuss how to use GO for feature selection, there is no simulation study that addresses when to use GO-based feature selection. To investigate this, we developed a novel simulation, which generates binary class datasets, where the differentially expressed genes between two classes have some underlying relationship in GO. This allows us to investigate the effects of various factors such as the relative connectedness of the underlying genes in GO, the mean magnitude of separation between differentially expressed genes denoted by δ, and the number of training samples. Our simulation results suggest that the connectedness in GO of the differentially expressed genes for a biological condition is the primary factor for determining the efficacy of GO-based feature selection. In particular, as the connectedness of differentially expressed genes increases, the classification accuracy improvement increases. To quantify this notion of connectedness, we defined a measure called Biological Condition Annotation Level BCAL(G), where G is a graph of differentially expressed genes. Our main conclusions with respect to GO-based feature selection are the following: (1) it increases classification accuracy when BCAL(G) ≥ 0.696; (2) it decreases classification accuracy when BCAL(G) ≤ 0.389; (3) it provides marginal accuracy improvement when 0.389 < BCAL(G) < 0.696 ... as the number of genes in a biological condition increases beyond 50 and ...

  20. Electrocardiogram Based Identification using a New Effective Intelligent Selection of Fused Features

    Science.gov (United States)

    Abbaspour, Hamidreza; Razavi, Seyyed Mohammad; Mehrshad, Nasser

    2015-01-01

    Over the years, the feasibility of using the Electrocardiogram (ECG) signal for human identification has been investigated, and some methods have been suggested. In this research, a new effective intelligent method for feature selection from ECG signals is proposed. This method is developed in such a way that it is able to select the important features that are necessary for identification through analysis of the ECG signals. For this purpose, after ECG signal preprocessing, its characterizing features were extracted and then compressed using the cosine transform. The features most effective for identification are then selected from the characterizing features using a combination of a genetic algorithm and artificial neural networks. The proposed method was tested on three public ECG databases, namely the MIT-BIH Arrhythmia Database, the MIT-BIH Normal Sinus Rhythm Database and the European ST-T Database, in order to evaluate the proposed subject identification method on normal ECG signals as well as ECG signals with arrhythmias. Identification rates of 99.89%, 99.84% and 99.99% were obtained for these databases, respectively. The proposed algorithm exhibits remarkable identification accuracy not only with normal ECG signals, but also in the presence of various arrhythmias. Simulation results showed that the proposed method, despite the low number of selected features, achieves high performance in the identification task. PMID:25709939

  1. An ant colony optimization based feature selection for web page classification.

    Science.gov (United States)

    Saraç, Esra; Özel, Selma Ayşe

    2014-01-01

    The increased popularity of the web has caused a huge amount of information to be added to it, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features, such as HTML/XML tags, URLs, hyperlinks, and text contents, that should be considered during an automated classification process. The aim of this study is to reduce the number of features used, in order to improve the runtime and accuracy of web page classification. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both the accuracy and the runtime performance of classification. We also showed that the proposed ACO-based algorithm selects better features than the well-known information gain and chi-square feature selection methods.

  2. An Ant Colony Optimization Based Feature Selection for Web Page Classification

    Directory of Open Access Journals (Sweden)

    Esra Saraç

    2014-01-01

    Full Text Available The increased popularity of the web has caused a huge amount of information to be added to it, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features, such as HTML/XML tags, URLs, hyperlinks, and text contents, that should be considered during an automated classification process. The aim of this study is to reduce the number of features used, in order to improve the runtime and accuracy of web page classification. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both the accuracy and the runtime performance of classification. We also showed that the proposed ACO-based algorithm selects better features than the well-known information gain and chi-square feature selection methods.
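
    The wrapper idea in the two records above can be illustrated with a deliberately small ant-colony-style search, assuming scikit-learn is available: ants sample feature subsets with probabilities proportional to pheromone levels, subsets are scored by cross-validated k-NN accuracy, and pheromone on good subsets is reinforced. The dataset, parameter values and function name are illustrative assumptions, not the authors' setup.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def aco_feature_selection(X, y, n_ants=10, n_iters=20, n_feats=5, rho=0.1, seed=0):
    """Toy ACO: ants draw feature subsets with probability proportional to pheromone,
    subsets are scored by 3-fold k-NN accuracy, and good subsets deposit pheromone."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pheromone = np.ones(d)
    best_subset, best_score = None, -np.inf
    for _ in range(n_iters):
        for _ in range(n_ants):
            probs = pheromone / pheromone.sum()
            subset = rng.choice(d, size=n_feats, replace=False, p=probs)
            score = cross_val_score(KNeighborsClassifier(), X[:, subset], y, cv=3).mean()
            if score > best_score:
                best_subset, best_score = subset, score
            pheromone[subset] += score          # reinforcement by ant's solution quality
        pheromone *= (1.0 - rho)                # evaporation
    return best_subset, best_score

X, y = load_breast_cancer(return_X_y=True)      # stand-in for a web page feature matrix
subset, score = aco_feature_selection(X, y)
print(sorted(subset.tolist()), round(score, 3))
```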

  3. Reducing Sweeping Frequencies in Microwave NDT Employing Machine Learning Feature Selection

    Directory of Open Access Journals (Sweden)

    Abdelniser Moomen

    2016-04-01

    Full Text Available Nondestructive Testing (NDT) assessment of materials’ health condition is useful for classifying healthy from unhealthy structures or detecting flaws in metallic or dielectric structures. Performing structural health testing for coated/uncoated metallic or dielectric materials with the same testing equipment requires a testing method that can work on metallics and dielectrics, such as microwave testing. Reducing the complexity and expense associated with current diagnostic practices of microwave NDT of structural health requires an effective and intelligent approach based on the feature selection and classification techniques of machine learning. Current microwave NDT methods are in general based on measuring variations in the S-matrix over the entire operating frequency range of the sensors. For instance, assessing the health of metallic structures using a microwave sensor depends on reflection and/or transmission coefficient measurements as a function of the sweeping frequencies of the operating band. The aim of this work is to reduce the number of sweeping frequencies using machine learning feature selection techniques. By treating sweeping frequencies as features, the most important features can be identified, and then only the most influential features (frequencies) are considered when building the microwave NDT equipment. The proposed method of reducing sweeping frequencies was validated experimentally using a waveguide sensor and a metallic plate with different cracks. Among the investigated feature selection techniques are information gain, gain ratio, relief, and chi-squared. The effectiveness of the selected features was validated through performance evaluations of various classification models, namely Nearest Neighbor, Neural Networks, Random Forest, and Support Vector Machine. Results showed good crack classification accuracy rates after employing the feature selection algorithms.

  4. Reducing Sweeping Frequencies in Microwave NDT Employing Machine Learning Feature Selection.

    Science.gov (United States)

    Moomen, Abdelniser; Ali, Abdulbaset; Ramahi, Omar M

    2016-04-19

    Nondestructive Testing (NDT) assessment of materials' health condition is useful for classifying healthy from unhealthy structures or detecting flaws in metallic or dielectric structures. Performing structural health testing for coated/uncoated metallic or dielectric materials with the same testing equipment requires a testing method that can work on metallics and dielectrics, such as microwave testing. Reducing the complexity and expense associated with current diagnostic practices of microwave NDT of structural health requires an effective and intelligent approach based on the feature selection and classification techniques of machine learning. Current microwave NDT methods are in general based on measuring variations in the S-matrix over the entire operating frequency range of the sensors. For instance, assessing the health of metallic structures using a microwave sensor depends on reflection and/or transmission coefficient measurements as a function of the sweeping frequencies of the operating band. The aim of this work is to reduce the number of sweeping frequencies using machine learning feature selection techniques. By treating sweeping frequencies as features, the most important features can be identified, and then only the most influential features (frequencies) are considered when building the microwave NDT equipment. The proposed method of reducing sweeping frequencies was validated experimentally using a waveguide sensor and a metallic plate with different cracks. Among the investigated feature selection techniques are information gain, gain ratio, relief, and chi-squared. The effectiveness of the selected features was validated through performance evaluations of various classification models, namely Nearest Neighbor, Neural Networks, Random Forest, and Support Vector Machine. Results showed good crack classification accuracy rates after employing the feature selection algorithms.

  5. Exploitation of Intra-Spectral Band Correlation for Rapid Feature Selection, and Target Identification in Hyperspectral Imagery

    Science.gov (United States)

    2009-03-01

    Thesis proposal entitled “Improved Feature Extraction, Feature Selection, and Identification Techniques that Create a Fast Unsupervised Hyperspectral Target Detection” ... target or non-target classifications. Integration of this type of autonomous target detection algorithm along with hyperspectral imaging sensors ...

  6. SVM-RFE based feature selection and Taguchi parameters optimization for multiclass SVM classifier.

    Science.gov (United States)

    Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W M; Li, R K; Jiang, Bo-Ru

    2014-01-01

    Recently, the support vector machine (SVM) has shown excellent performance in classification and prediction and is widely used in disease diagnosis and medical assistance. However, SVM functions well only on two-group classification problems. This study combines feature selection with SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for the Dermatology and Zoo databases. The Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, the Taguchi method was combined with the SVM classifier in order to optimize the parameters C and γ and increase classification accuracy for multiclass classification. The experimental results show that the classification accuracy can exceed 95% after SVM-RFE feature selection and Taguchi parameter optimization for the Dermatology and Zoo databases.
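
    A minimal scikit-learn sketch of the same two-stage idea: SVM-RFE ranks and retains a subset of features, and a grid search (standing in for the Taguchi design) tunes C and gamma of an RBF SVM on the retained features. The dataset and the grid values are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)               # stand-in for the Dermatology/Zoo data

# SVM-RFE ranks features with the weights of a linear SVM and keeps a subset;
# the grid search over C and gamma stands in for the Taguchi parameter design.
pipe = Pipeline([
    ("rfe", RFE(SVC(kernel="linear"), n_features_to_select=2)),
    ("svm", SVC(kernel="rbf")),
])
param_grid = {
    "svm__C": [0.1, 1, 10, 100],
    "svm__gamma": [0.001, 0.01, 0.1, 1],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```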

  7. A DYNAMIC FEATURE SELECTION METHOD FOR DOCUMENT RANKING WITH RELEVANCE FEEDBACK APPROACH

    Directory of Open Access Journals (Sweden)

    K. Latha

    2010-07-01

    Full Text Available Ranking search results is essential for information retrieval and Web search. Search engines need not only to return highly relevant results, but also to be fast in order to satisfy users. As a result, not all available features can be used for ranking; in fact, only a small percentage of these features can be used. Thus, it is crucial to have a feature selection mechanism that can find a subset of features that both meets latency requirements and achieves high relevance. In this paper we describe a 0/1 knapsack procedure for automatically selecting features to use within a generalization model for document ranking. We propose an approach to relevance feedback using the Expectation Maximization method and evaluate the algorithm on the TREC collection for describing classes of feedback textual information retrieval features. Experimental results, evaluated on the standard TREC-9 part of the OHSUMED collection, show that our feature selection algorithm produces models that are either significantly more effective than, or equally effective as, models such as the Markov Random Field model, the Correlation Coefficient and the Count Difference method.

  8. SVM-RFE Based Feature Selection and Taguchi Parameters Optimization for Multiclass SVM Classifier

    Science.gov (United States)

    Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W. M.; Li, R. K.; Jiang, Bo-Ru

    2014-01-01

    Recently, the support vector machine (SVM) has shown excellent performance in classification and prediction and is widely used in disease diagnosis and medical assistance. However, SVM functions well only on two-group classification problems. This study combines feature selection with SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for the Dermatology and Zoo databases. The Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, the Taguchi method was combined with the SVM classifier in order to optimize the parameters C and γ and increase classification accuracy for multiclass classification. The experimental results show that the classification accuracy can exceed 95% after SVM-RFE feature selection and Taguchi parameter optimization for the Dermatology and Zoo databases. PMID:25295306

  9. Comparative Study on Feature Selection and Fusion Schemes for Emotion Recognition from Speech

    Directory of Open Access Journals (Sweden)

    Santiago Planet

    2012-09-01

    Full Text Available The automatic analysis of speech to detect affective states may improve the way users interact with electronic devices. However, analysis at the acoustic level alone may not be enough to determine the emotion of a user in a realistic scenario. In this paper we analyzed the spontaneous speech recordings of the FAU Aibo Corpus at the acoustic and linguistic levels to extract two sets of features. The acoustic set was reduced by a greedy procedure that selects the most relevant features to optimize the learning stage. We compared two versions of this greedy selection algorithm, performing the search for relevant features forwards and backwards. We experimented with three classification approaches: Naïve-Bayes, a support vector machine and a logistic model tree, and two fusion schemes: decision-level fusion, merging the hard decisions of the acoustic and linguistic classifiers by means of a decision tree; and feature-level fusion, concatenating both sets of features before the learning stage. Despite the low performance achieved by the linguistic data, a dramatic improvement was achieved after combining it with the acoustic information, improving the results achieved by the latter modality on its own. The results achieved by the classifiers using the parameters merged at the feature level outperformed the classification results of the decision-level fusion scheme, despite the simplicity of the scheme. Moreover, the extremely reduced set of acoustic features obtained by the greedy forward search selection algorithm improved the results provided by the full set.
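
    Greedy forward and backward selection of this kind can be sketched with scikit-learn's SequentialFeatureSelector; the snippet below wraps a naive Bayes classifier and compares both search directions on a stand-in dataset. The dataset and the number of retained features are assumptions made only for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_wine(return_X_y=True)               # stand-in for the acoustic feature set
clf = GaussianNB()

# Greedy selection run in both directions, mirroring the forward/backward comparison above.
for direction in ("forward", "backward"):
    sfs = SequentialFeatureSelector(clf, n_features_to_select=5, direction=direction, cv=5)
    sfs.fit(X, y)
    acc = cross_val_score(clf, sfs.transform(X), y, cv=5).mean()
    print(direction, sfs.get_support(indices=True), round(acc, 3))
```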

  10. Discriminative multi-task feature selection for multi-modality classification of Alzheimer's disease.

    Science.gov (United States)

    Ye, Tingting; Zu, Chen; Jie, Biao; Shen, Dinggang; Zhang, Daoqiang

    2016-09-01

    Recently, multi-task based feature selection methods have been used in multi-modality based classification of Alzheimer's disease (AD) and its prodromal stage, i.e., mild cognitive impairment (MCI). However, in traditional multi-task feature selection methods, useful discriminative information among subjects is usually not well mined to further improve the subsequent classification performance. Accordingly, in this paper, we propose a discriminative multi-task feature selection method to select the most discriminative features for multi-modality based classification of AD/MCI. Specifically, for each modality, we train a linear regression model using the corresponding modality of data, and further enforce group-sparsity regularization on the weights of those regression models for joint selection of common features across multiple modalities. Furthermore, we propose a discriminative regularization term based on the intra-class and inter-class Laplacian matrices to better use the discriminative information among subjects. To evaluate our proposed method, we perform extensive experiments on 202 subjects, including 51 AD patients, 99 MCI patients, and 52 healthy controls (HC), from the baseline MRI and FDG-PET image data of the Alzheimer's Disease Neuroimaging Initiative (ADNI). The experimental results show that, in comparison with several state-of-the-art methods for multi-modality based AD/MCI classification, our proposed method not only improves the classification performance, but also has the potential to discover disease-related biomarkers useful for diagnosis.

  11. A biological mechanism for Bayesian feature selection: Weight decay and raising the LASSO.

    Science.gov (United States)

    Connor, Patrick; Hollensen, Paul; Krigolson, Olav; Trappenberg, Thomas

    2015-07-01

    Biological systems are capable of learning that certain stimuli are valuable while ignoring the many that are not, and thus perform feature selection. In machine learning, one effective feature selection approach is the least absolute shrinkage and selection operator (LASSO) form of regularization, which is equivalent to assuming a Laplacian prior distribution on the parameters. We review how such Bayesian priors can be implemented in gradient descent as a form of weight decay, which is a biologically plausible mechanism for Bayesian feature selection. In particular, we describe a new prior that offsets or "raises" the Laplacian prior distribution. We evaluate this alongside the Gaussian and Cauchy priors in gradient descent using a generic regression task where there are few relevant and many irrelevant features. We find that raising the Laplacian leads to less prediction error because it is a better model of the underlying distribution. We also consider two biologically relevant online learning tasks, one synthetic and one modeled after the perceptual expertise task of Krigolson et al. (2009). Here, raising the Laplacian prior avoids the fast erosion of relevant parameters over the period following training because it only allows small weights to decay. This better matches the limited loss of association seen between days in the human data of the perceptual expertise task. Raising the Laplacian prior thus results in a biologically plausible form of Bayesian feature selection that is effective in biologically relevant contexts.
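
    The LASSO-as-weight-decay view described above can be written down in a few lines: the sketch below trains a linear model by gradient descent and applies an L1 decay step, which is the MAP update under a Laplacian prior. It uses plain NumPy on synthetic data with a few relevant and many irrelevant features; it illustrates the standard Laplacian case only, not the authors' raised-Laplacian prior, and all names and parameter values are illustrative.

```python
import numpy as np

def lasso_weight_decay(X, y, lr=0.01, lam=0.1, epochs=500):
    """Linear regression trained by gradient descent with an L1 weight decay step,
    i.e. the MAP update under a Laplacian prior on the weights."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / n        # gradient of the mean squared error
        w -= lr * grad
        w -= lr * lam * np.sign(w)          # Laplacian prior = constant decay toward zero
    return w

# few relevant and many irrelevant features, as in the regression task described above
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
true_w = np.zeros(50)
true_w[:3] = [2.0, -1.5, 1.0]
y = X @ true_w + 0.1 * rng.normal(size=200)

w = lasso_weight_decay(X, y)
print(np.nonzero(np.abs(w) > 0.5)[0])       # features that survive the decay
```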

  12. Cost-Sensitive Feature Selection of Numeric Data with Measurement Errors

    Directory of Open Access Journals (Sweden)

    Hong Zhao

    2013-01-01

    Full Text Available Feature selection is an essential process in data mining applications since it reduces a model's complexity. However, feature selection with various types of costs is still a new research topic. In this paper, we study the cost-sensitive feature selection problem for numeric data with measurement errors. The major contributions of this paper are fourfold. First, a new data model is built to address test costs and misclassification costs as well as error boundaries. It is distinguished from existing models mainly by the error boundaries. Second, a covering-based rough set model with normally distributed measurement errors is constructed. With this model, coverings are constructed from data rather than assigned by users. Third, a new cost-sensitive feature selection problem is defined on this model. It is more realistic than existing feature selection problems. Fourth, both backtracking and heuristic algorithms are proposed to deal with the new problem. Experimental results show the efficiency of the pruning techniques for the backtracking algorithm and the effectiveness of the heuristic algorithm. This study is a step toward realistic applications of cost-sensitive learning.

  13. On the selection of optimal feature region set for robust digital image watermarking.

    Science.gov (United States)

    Tsai, Jen-Sheng; Huang, Win-Bin; Kuo, Yau-Hwang

    2011-03-01

    A novel feature region selection method for robust digital image watermarking is proposed in this paper. This method aims to select a nonoverlapping feature region set that has the greatest robustness against various attacks and can preserve image quality as much as possible after being watermarked. It first performs a simulated attacking procedure using some predefined attacks to evaluate the robustness of every candidate feature region. According to the evaluation results, it then adopts a track-with-pruning procedure to search for a minimal primary feature set that can resist the most predefined attacks. In order to enhance its resistance to undefined attacks under the constraint of preserving image quality, the primary feature set is then extended by adding some auxiliary feature regions. This work is formulated as a multidimensional knapsack problem and solved by a genetic algorithm based approach. The experimental results for StirMark attacks on some benchmark images support our expectation that the primary feature set can resist all the predefined attacks and that its extension can enhance the robustness against undefined attacks. Compared with some well-known feature-based methods, the proposed method exhibits better performance in robust digital watermarking.

  14. DWFS: A Wrapper Feature Selection Tool Based on a Parallel Genetic Algorithm

    KAUST Repository

    Soufan, Othman

    2015-02-26

    Many scientific problems can be formulated as classification tasks. Data that harbor relevant information are usually described by a large number of features. Frequently, many of these features are irrelevant for the class prediction. The efficient implementation of classification models requires identification of suitable combinations of features. A smaller number of features reduces the problem's dimensionality and may result in higher classification performance. We developed DWFS, a web-based tool that allows for efficient selection of features for a variety of problems. DWFS follows the wrapper paradigm and applies a search strategy based on Genetic Algorithms (GAs). A parallel GA implementation examines and evaluates simultaneously a large number of candidate collections of features. DWFS also integrates various filtering methods that may be applied as a pre-processing step in the feature selection process. Furthermore, weights and parameters in the fitness function of the GA can be adjusted according to the application requirements. Experiments using heterogeneous datasets from different biomedical applications demonstrate that DWFS is fast and leads to a significant reduction in the number of features without sacrificing performance as compared to several widely used existing methods. DWFS can be accessed online at www.cbrc.kaust.edu.sa/dwfs.

  15. An Empirical Study of Wrappers for Feature Subset Selection based on a Parallel Genetic Algorithm: The Multi-Wrapper Model

    KAUST Repository

    Soufan, Othman

    2012-09-01

    Feature selection is the first task of any learning approach applied in the major fields of biomedicine, bioinformatics, robotics, natural language processing and social networking. In the feature subset selection problem, a search methodology with a proper criterion seeks to find the best subset of features describing the data (relevance) and achieving better performance (optimality). Wrapper approaches are feature selection methods that are wrapped around a classification algorithm and use a performance measure to select the best subset of features. We analyze the proper design of the objective function for the wrapper approach and highlight an objective based on several classification algorithms. We compare the wrapper approaches to different feature selection methods based on distance and information based criteria. Significant improvements in performance, computational time, and selection of minimally sized feature subsets are achieved by combining different objectives for the wrapper model. In addition, considering various classification methods in the feature selection process can lead to a global solution with desirable characteristics.

  16. Feature Selection based on Machine Learning in MRIs for Hippocampal Segmentation

    CERN Document Server

    Tangaro, Sabina; Brescia, Massimo; Cavuoti, Stefano; Chincarini, Andrea; Errico, Rosangela; Inglese, Paolo; Longo, Giuseppe; Maglietta, Rosalia; Tateo, Andrea; Riccio, Giuseppe; Bellotti, Roberto

    2015-01-01

    Neurodegenerative diseases are frequently associated with structural changes in the brain. Magnetic Resonance Imaging (MRI) scans can show these variations and can therefore be used as a supportive feature for a number of neurodegenerative diseases. The hippocampus has been known to be a biomarker for Alzheimer's disease and other neurological and psychiatric diseases. However, this requires accurate, robust and reproducible delineation of hippocampal structures. Fully automatic methods usually follow a voxel-based approach, in which a number of local features are calculated for each voxel. In this paper we compared four different techniques for feature selection from a set of 315 features extracted for each voxel: (i) a filter method based on the Kolmogorov-Smirnov test; two wrapper methods, namely (ii) Sequential Forward Selection and (iii) Sequential Backward Elimination; and (iv) an embedded method based on the Random Forest classifier, on a set of 10 T1-weighted brain MRIs and tested on an independent set of 25 subjects...

  17. Application of Fisher Score and mRMR Techniques for Feature Selection in Compressed Medical Images

    Directory of Open Access Journals (Sweden)

    Vamsidhar Enireddy

    2015-12-01

    Full Text Available Nowadays, with the large increase in digital medical images and the variety of medical imaging equipment available for diagnosis, medical professionals are increasingly relying on computer-aided techniques both for indexing these images and for retrieving similar images from large repositories. Developing systems that are computationally less intensive, without compromising on accuracy in the high-dimensional feature space, is always challenging. In this paper an investigation is made into the retrieval of compressed medical images. Images are compressed using a visually lossless compression technique. Shape and texture features are extracted, and the best features are selected using the Fisher score technique and mRMR. Using these selected features, an RNN with BPTT is utilized for classification of the compressed images.
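
    The Fisher score used for ranking above has a simple closed form: the between-class scatter of a feature divided by its within-class scatter. The NumPy sketch below computes it on synthetic data; the function name and the data are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

def fisher_score(X, y):
    """Fisher score per feature: between-class scatter divided by within-class scatter."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        nc = Xc.shape[0]
        between += nc * (Xc.mean(axis=0) - overall_mean) ** 2
        within += nc * Xc.var(axis=0)
    return between / (within + 1e-12)

# synthetic demo: only the first feature carries class information
rng = np.random.default_rng(1)
y = np.array([0] * 50 + [1] * 50)
X = rng.normal(size=(100, 30))
X[:, 0] += 2.0 * y
scores = fisher_score(X, y)
print(np.argsort(scores)[::-1][:5])          # the informative feature should rank first
```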

  18. Feature Selection for Natural Language Call Routing Based on Self-Adaptive Genetic Algorithm

    Science.gov (United States)

    Koromyslova, A.; Semenkina, M.; Sergienko, R.

    2017-02-01

    The text classification problem for natural language call routing was considered in the paper. Seven different term weighting methods were applied. As a dimensionality reduction method, feature selection based on a self-adaptive GA was considered. k-NN, linear SVM and ANN were used as classification algorithms. The tasks of the research are the following: to study text classification for natural language call routing with different term weighting methods and classification algorithms, and to investigate the feature selection method based on the self-adaptive GA. The numerical results showed that the most effective term weighting is TRR and the most effective classification algorithm is ANN. Feature selection with the self-adaptive GA provides an improvement in classification effectiveness and a significant dimensionality reduction with all term weighting methods and with all classification algorithms.

  19. Pattern Recognition Methods and Features Selection for Speech Emotion Recognition System

    Science.gov (United States)

    Partila, Pavol; Voznak, Miroslav; Tovarek, Jaromir

    2015-01-01

    The impact of the classification method and feature selection on speech emotion recognition accuracy is discussed in this paper. Selecting the correct parameters in combination with the classifier is an important part of reducing the computational complexity of the system. This step is necessary especially for systems that will be deployed in real-time applications. The reason for the development and improvement of speech emotion recognition systems is their wide usability in today's automatic voice-controlled systems. The Berlin database of emotional recordings was used in this experiment. The classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture models is measured considering the selection of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and group of features for stress detection in human speech. The research contribution lies in the design of a speech emotion recognition system that is both accurate and efficient. PMID:26346654

  20. A Feature Selection Approach Based on Interclass and Intraclass Relative Contributions of Terms.

    Science.gov (United States)

    Zhou, Hongfang; Guo, Jie; Wang, Yinghui; Zhao, Minghua

    2016-01-01

    Feature selection plays a critical role in text categorization. During feature selection, high-frequency terms and the interclass and intraclass relative contributions of terms all have significant effects on classification results. We therefore put forward a feature selection approach, IIRCT, based on the interclass and intraclass relative contributions of terms. In our proposed algorithm, three critical factors, namely term frequency and the interclass and intraclass relative contributions of terms, are all considered synthetically. Finally, experiments are carried out with the help of a kNN classifier, and the corresponding results on the 20 NewsGroup and SougouCS corpora show that the IIRCT algorithm achieves better performance than the DF, t-Test, and CMFS algorithms.

  1. Pattern Recognition Methods and Features Selection for Speech Emotion Recognition System

    Directory of Open Access Journals (Sweden)

    Pavol Partila

    2015-01-01

    Full Text Available The impact of the classification method and feature selection on speech emotion recognition accuracy is discussed in this paper. Selecting the correct parameters in combination with the classifier is an important part of reducing the computational complexity of the system. This step is necessary especially for systems that will be deployed in real-time applications. The reason for the development and improvement of speech emotion recognition systems is their wide usability in today's automatic voice-controlled systems. The Berlin database of emotional recordings was used in this experiment. The classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture models is measured considering the selection of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and group of features for stress detection in human speech. The research contribution lies in the design of a speech emotion recognition system that is both accurate and efficient.

  2. Identity Recognition Algorithm Using Improved Gabor Feature Selection of Gait Energy Image

    Science.gov (United States)

    Chao, LIANG; Ling-yao, JIA; Dong-cheng, SHI

    2017-01-01

    This paper describes an effective gait recognition approach based on Gabor features of the gait energy image. Kernel Fisher analysis combined with a kernel matrix is proposed to select dominant features. A nearest neighbor classifier based on whitened cosine distance is used to discriminate different gait patterns. The proposed approach is tested on the CASIA and USF gait databases. The results show that our approach outperforms other state-of-the-art gait recognition approaches in terms of recognition accuracy and robustness.

  3. Training, supervision and quality of care in selected integrated community case management (iCCM programmes: A scoping review of programmatic evidence

    Directory of Open Access Journals (Sweden)

    Xavier Bosch–Capblanch

    2014-11-01

    Full Text Available To describe the training, supervision and quality of care components of integrated Community Case Management (iCCM) programmes and to draw lessons learned from existing evaluations of those programmes.

  4. Functional connectivity supporting the selective maintenance of feature-location binding in visual working memory

    Directory of Open Access Journals (Sweden)

    Sachiko eTakahama

    2014-06-01

    Full Text Available Information on an object’s features bound to its location is very important for maintaining object representations in visual working memory. Interactions with dynamic multi-dimensional objects in an external environment require complex cognitive control, including the selective maintenance of feature-location binding. Here, we used event-related functional magnetic resonance imaging to investigate brain activity and functional connectivity related to the maintenance of complex feature-location binding. Participants were required to detect task-relevant changes in feature-location binding between objects defined by color, orientation, and location. We compared a complex binding task requiring complex feature-location binding (color-orientation-location) with a simple binding task in which simple feature-location binding, such as color-location, was task-relevant and the other feature was task-irrelevant. Univariate analyses showed that the dorsolateral prefrontal cortex (DLPFC), hippocampus, and frontoparietal network were activated during the maintenance of complex feature-location binding. Functional connectivity analyses indicated cooperation between the inferior precentral sulcus (infPreCS), DLPFC, and hippocampus during the maintenance of complex feature-location binding. In contrast, the connectivity for the spatial updating of simple feature-location binding, determined by reanalyzing the data from Takahama et al. (2010), demonstrated that the superior parietal lobule (SPL) cooperated with the DLPFC and hippocampus. These results suggest that the connectivity for complex feature-location binding does not simply reflect general memory load and that the DLPFC and hippocampus flexibly modulate the dorsal frontoparietal network, depending on the task requirements, with the infPreCS involved in the maintenance of complex feature-location binding and the SPL involved in the spatial updating of simple feature-location binding.

  5. Feature Selection Applying Statistical and Neurofuzzy Methods to EEG-Based BCI.

    Science.gov (United States)

    Martinez-Leon, Juan-Antonio; Cano-Izquierdo, Jose-Manuel; Ibarrola, Julio

    2015-01-01

    This paper presents an investigation aimed at drastically reducing the processing burden required by motor imagery brain-computer interface (BCI) systems based on electroencephalography (EEG). In this research, the focus has moved from the channel to the feature paradigm, and a 96% reduction in the number of features required in the process has been achieved while maintaining, and even improving, the classification success rate. This way, it is possible to build cheaper, quicker, and more portable BCI systems. The data set used was provided within the framework of BCI Competition III, which allows the presented results to be compared with the classification accuracy achieved in the contest. Furthermore, a new three-step methodology has been developed which includes a feature discriminant character calculation stage; a score, order, and selection phase; and a final feature selection step. For the first stage, both a statistical method and fuzzy criteria are used. The fuzzy criteria are based on the S-dFasArt classification algorithm, which has shown excellent performance in previous papers addressing the BCI multiclass motor imagery problem. The score, order, and selection stage is used to sort the features according to their discriminant nature. Finally, both order selection and Group Method of Data Handling (GMDH) approaches are used to choose the most discriminant ones.

  6. Explore Interregional EEG Correlations Changed by Sport Training Using Feature Selection

    Directory of Open Access Journals (Sweden)

    Jia Gao

    2016-01-01

    Full Text Available This paper investigated the interregional correlations changed by sport training through electroencephalography (EEG) signals, using the techniques of classification and feature selection. The EEG data were obtained from students with long-time professional sport training and from normal students without sport training as a baseline. Every channel of the 19-channel EEG signals is considered as a node in the brain network, and Pearson correlation coefficients are calculated between every two nodes as the new features of the EEG signals. Then, Partial Least Squares (PLS) is used to select the top 10 most varied features, and the Pearson correlation coefficients of the selected features are compared to show the difference between the two groups. Results show that the classification accuracy for the two groups is improved from 88.13%, by the method using the measurement of EEG overall energy, to 97.19% by the method using the EEG correlation measurement. Furthermore, the features selected reveal that the most important interregional EEG correlation changed by training is the correlation between the left inferior frontal and left middle temporal regions, with a decreased value.
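
    A compact sketch of the feature construction and selection step described above, assuming NumPy and scikit-learn: pairwise Pearson correlations between channels form the feature vector of each trial, and a PLS model is used to rank those correlation features. The array shapes, group labels and the choice of ranking by the first PLS weight vector are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def correlation_features(trials):
    """trials: (n_trials, n_channels, n_samples) EEG; returns the upper-triangle
    Pearson correlations between channels as one feature vector per trial."""
    n_trials, n_channels, _ = trials.shape
    iu = np.triu_indices(n_channels, k=1)
    feats = np.empty((n_trials, len(iu[0])))
    for t in range(n_trials):
        feats[t] = np.corrcoef(trials[t])[iu]
    return feats

# hypothetical 19-channel recordings for two groups (trained vs. untrained)
rng = np.random.default_rng(0)
trials = rng.normal(size=(40, 19, 256))
labels = np.array([0] * 20 + [1] * 20)

X = correlation_features(trials)
pls = PLSRegression(n_components=2).fit(X, labels)
top10 = np.argsort(np.abs(pls.x_weights_[:, 0]))[::-1][:10]   # ten most heavily weighted correlations
print(top10)
```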

  7. Optimal Feature Space Selection in Detecting Epileptic Seizure based on Recurrent Quantification Analysis and Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Saleh LAshkari

    2016-06-01

    Full Text Available Selecting optimal features based on the nature of the phenomenon and on high discriminant ability is very important in data classification problems. Since Recurrent Quantification Analysis (RQA) does not require any assumption about the stationarity or the size of the signal and the noise, it may be useful for epileptic seizure detection. In this study, RQA was used to discriminate ictal EEG from normal EEG, where optimal features were selected by a combination of a genetic algorithm and a Bayesian classifier. Recurrence plots of a hundred samples in each of the two categories were obtained with five distance norms: Euclidean, Maximum, Minimum, Normalized and Fixed Norm. In order to choose the optimal threshold for each norm, ten thresholds of ε were generated, and the best feature space was then selected by the genetic algorithm in combination with the Bayesian classifier. The results show that the proposed method is capable of discriminating ictal EEG from normal EEG, where for the Minimum norm and 0.1 < ε < 1 the accuracy was 100%. In addition, the sensitivity of the proposed framework to the ε and distance norm parameters was low. The optimal feature presented in this study is Trans, which was selected in most feature spaces with high accuracy.

  8. Less is more: Avoiding the LIBS dimensionality curse through judicious feature selection for explosive detection

    Science.gov (United States)

    Kumar Myakalwar, Ashwin; Spegazzini, Nicolas; Zhang, Chi; Kumar Anubham, Siva; Dasari, Ramachandra R.; Barman, Ishan; Kumar Gundawar, Manoj

    2015-08-01

    Despite its intrinsic advantages, translation of laser induced breakdown spectroscopy for material identification has often been impeded by the lack of robustness of the developed classification models, often due to the presence of spurious correlations. While a number of classifiers exhibiting high discriminatory power have been reported, efforts to establish the subset of relevant spectral features that enables a fundamental interpretation of the segmentation capability and avoids the ‘curse of dimensionality’ have been lacking. Using LIBS data acquired from a set of secondary explosives, we investigate judicious feature selection approaches and architect two different chemometric classifiers, based on feature selection through prerequisite knowledge of the sample composition and through a genetic algorithm, respectively. While the full spectral input results in a classification rate of ca. 92%, selection of only the carbon-to-hydrogen spectral window results in near-identical performance. Importantly, the genetic algorithm-derived classifier shows a statistically significant improvement to ca. 94% accuracy for prospective classification, even though the number of features used is an order of magnitude smaller. Our findings demonstrate the impact of rigorous feature selection in LIBS and also hint at the feasibility of using a discrete filter based detector, thereby enabling a cheaper and more compact system more amenable to field operations.

  9. Texture feature selection with relevance learning to classify interstitial lung disease patterns

    Science.gov (United States)

    Huber, Markus B.; Bunte, Kerstin; Nagarajan, Mahesh B.; Biehl, Michael; Ray, Lawrence A.; Wismueller, Axel

    2011-03-01

    The Generalized Matrix Learning Vector Quantization (GMLVQ) is used to estimate the relevance of texture features with respect to their ability to classify interstitial lung disease patterns in high-resolution computed tomography (HRCT) images. After a stochastic gradient descent, the GMLVQ algorithm provides a discriminative distance measure of relevance factors, which can account for pairwise correlations between different texture features and their importance for the classification of healthy and diseased patterns. Texture features were extracted from gray-level co-occurrence matrices (GLCMs), and were ranked and selected according to their relevance obtained by GMLVQ and, for comparison, according to a mutual information (MI) criterion. A k-nearest-neighbor (kNN) classifier and a Support Vector Machine with a radial basis function kernel (SVMrbf) were optimized in a 10-fold cross-validation for different texture feature sets. In our experiment with real-world data, the feature sets selected by the GMLVQ approach had a significantly better classification performance compared with feature sets selected by MI ranking.
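
    The mutual information ranking used as the comparison baseline above can be reproduced in a few lines with scikit-learn's mutual_info_classif: score each feature against the class label, keep the top-ranked ones, and check a k-NN classifier on the reduced set. The dataset and the number of retained features are stand-ins, not the texture features or the experimental setup of the paper.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)      # stand-in for GLCM texture features

# rank features by mutual information with the class label and keep the top ten,
# then check a k-NN classifier on the reduced feature set
mi = mutual_info_classif(X, y, random_state=0)
top = np.argsort(mi)[::-1][:10]
print(top, round(cross_val_score(KNeighborsClassifier(), X[:, top], y, cv=10).mean(), 3))
```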

  10. Harnessing the Power of GPUs to Speed Up Feature Selection for Outlier Detection

    Institute of Scientific and Technical Information of China (English)

    Fatemeh Azmandian; Ayse Yilmazer; Jennifer G Dy; Javed A Aslam; David R Kaeli

    2014-01-01

    Acquiring a set of features that emphasize the differences between normal data points and outliers can drastically facilitate the task of identifying outliers. In our work, we present a novel non-parametric evaluation criterion for filter-based feature selection which has an eye towards the final goal of outlier detection. The proposed method seeks the subset of features that represent the inherent characteristics of the normal dataset while forcing outliers to stand out, making them more easily distinguished by outlier detection algorithms. Experimental results on real datasets show the advantage of our feature selection algorithm compared with popular and state-of-the-art methods. We also show that the proposed algorithm is able to overcome the small sample space problem and perform well on highly imbalanced datasets. Furthermore, due to the highly parallelizable nature of the feature selection, we implement the algorithm on a graphics processing unit (GPU) to gain significant speedup over the serial version. The benefits of the GPU implementation are two-fold, as its performance scales very well in terms of the number of features, as well as the number of data points.

  11. A ROC-based feature selection method for computer-aided detection and diagnosis

    Science.gov (United States)

    Wang, Songyuan; Zhang, Guopeng; Liao, Qimei; Zhang, Junying; Jiao, Chun; Lu, Hongbing

    2014-03-01

    Image-based computer-aided detection and diagnosis (CAD) has been a very active research topic, aiming to assist physicians in detecting lesions and distinguishing benign from malignant ones. However, the datasets fed into a classifier usually suffer from a small number of samples, as well as from significantly fewer samples being available in one class (having a disease) than in the other, resulting in suboptimal classifier performance. How to identify the most characterizing features of the observed data for lesion detection is critical for improving the sensitivity and minimizing the false positives of a CAD system. In this study, we propose a novel feature selection method, mR-FAST, that combines the minimal-redundancy-maximal-relevance (mRMR) framework with the selection metric FAST (feature assessment by sliding thresholds), based on the area under the ROC curve (AUC) generated on optimal simple linear discriminants. With three feature datasets extracted from CAD systems for colon polyps and bladder cancer, we show that the space of candidate features selected by mR-FAST is more characterizing for lesion detection, with higher AUC, enabling a compact subset of superior features to be found at low cost.
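
    A minimal sketch of a FAST-like filter, assuming scikit-learn: each feature is scored by the AUC obtained when its raw value is used directly as a decision score, and features are ranked by that score. This leaves out the mRMR redundancy term and the optimal linear discriminants of mR-FAST; the function name and dataset are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score

def auc_feature_ranking(X, y):
    """Score each feature by the AUC obtained when its raw value is used as the
    decision score, taking the better of the two possible orientations."""
    aucs = np.array([max(roc_auc_score(y, X[:, j]), roc_auc_score(y, -X[:, j]))
                     for j in range(X.shape[1])])
    return np.argsort(aucs)[::-1], aucs

X, y = load_breast_cancer(return_X_y=True)      # stand-in for CAD-derived features
order, aucs = auc_feature_ranking(X, y)
print(order[:5], np.round(aucs[order[:5]], 3))
```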

  12. Supervised Transfer Sparse Coding

    KAUST Repository

    Al-Shedivat, Maruan

    2014-07-27

    A combination of the sparse coding and transfer learning techniques was shown to be accurate and robust in classification tasks where training and testing objects have a shared feature space but are sampled from different underlying distributions, i.e., belong to different domains. The key assumption in such a case is that, in spite of the domain disparity, samples from different domains share some common hidden factors. Previous methods often assumed that all the objects in the target domain are unlabeled, and thus the training set solely comprised objects from the source domain. However, in real world applications, the target domain often has some labeled objects, or one can always manually label a small number of them. In this paper, we explore such a possibility and show how a small number of labeled data in the target domain can significantly leverage the classification accuracy of state-of-the-art transfer sparse coding methods. We further propose a unified framework named supervised transfer sparse coding (STSC) which simultaneously optimizes sparse representation, domain transfer and classification. Experimental results on three applications demonstrate that a little manual labeling and then learning the model in a supervised fashion can significantly improve classification accuracy.

  13. Analysis and Selection of Features for Gesture Recognition Based on a Micro Wearable Device

    Directory of Open Access Journals (Sweden)

    Yinghui Zhou

    2012-01-01

    Full Text Available More and more researchers are concerned with designing a health supporting system for elders that is lightweight, does not disturb the user, and has low computing complexity. In this paper, we introduce a micro wearable device based on a tri-axis accelerometer, which can detect acceleration changes of the human body depending on where the device is placed. Considering the flexibility of the human finger, we put it on a finger to detect finger gestures. Twelve kinds of one-stroke finger gestures are defined according to the sensing characteristics of the accelerometer. Features are a paramount factor in the recognition task. In the paper, gesture features in both the time domain and the frequency domain are described, since features directly determine recognition accuracy. The feature generation method and selection process are analyzed in detail to obtain the optimal feature subset from the candidate feature set. Experimental results indicate that the selected feature subset achieves satisfactory classification results of 90.08% accuracy using 12 features, considering both recognition accuracy and the dimension of the feature set.

  14. Textural feature selection for enhanced detection of stationary humans in through-the-wall radar imagery

    Science.gov (United States)

    Chaddad, A.; Ahmad, F.; Amin, M. G.; Sevigny, P.; DiFilippo, D.

    2014-05-01

    Feature-based methods have recently been considered in the literature for the detection of stationary human targets in through-the-wall radar imagery. Specifically, textural features, such as contrast, correlation, energy, entropy, and homogeneity, have been extracted from gray-level co-occurrence matrices (GLCMs) to aid in discriminating the true targets from multipath ghosts and clutter that closely mimic the target in size and intensity. In this paper, we address the task of feature selection to identify the relevant subset of features in the GLCM domain, while discarding those that are either redundant or confusing, thereby improving the performance of the feature-based scheme in distinguishing between targets and ghosts/clutter. We apply a decision tree algorithm to find the optimal combination of co-occurrence based textural features for the problem at hand. We employ a K-Nearest Neighbor classifier to evaluate the performance of the optimal textural feature based scheme in terms of its target and ghost/clutter discrimination capability, and use real data collected with the vehicle-borne multi-channel through-the-wall radar imaging system of Defence Research and Development Canada. For the specific data analyzed, it is shown that the identified dominant features yield a higher classification accuracy, with a lower number of false alarms and missed detections, compared to the full GLCM based feature set.
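
    The GLCM textural features named above can be computed with scikit-image, as in the sketch below: a gray-level co-occurrence matrix is built for an image patch, contrast, correlation, energy and homogeneity are read off with graycoprops, and entropy is computed directly from the normalized matrix. The quantization to 64 gray levels, the offsets and the random test patch are assumptions for illustration; older scikit-image releases spell the functions greycomatrix/greycoprops.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops   # scikit-image >= 0.19 spelling

def glcm_features(patch, levels=64):
    """Contrast, correlation, energy, homogeneity and entropy of one image patch."""
    patch = (patch / patch.max() * (levels - 1)).astype(np.uint8)   # quantize gray levels
    glcm = graycomatrix(patch, distances=[1], angles=[0, np.pi / 2],
                        levels=levels, symmetric=True, normed=True)
    feats = {p: graycoprops(glcm, p).mean()
             for p in ("contrast", "correlation", "energy", "homogeneity")}
    p = glcm.mean(axis=(2, 3))                                      # average over offsets
    feats["entropy"] = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return feats

rng = np.random.default_rng(0)
print(glcm_features(rng.integers(0, 255, size=(32, 32)).astype(float)))
```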

  15. Cluster analysis based on dimensional information with applications to feature selection and classification

    Science.gov (United States)

    Eigen, D. J.; Fromm, F. R.; Northouse, R. A.

    1974-01-01

    A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.

  17. Diagnosis of Chronic Kidney Disease Based on Support Vector Machine by Feature Selection Methods.

    Science.gov (United States)

    Polat, Huseyin; Danaei Mehr, Homay; Cetin, Aydin

    2017-04-01

    As Chronic Kidney Disease progresses slowly, early detection and effective treatment are the only way to reduce the mortality rate. Machine learning techniques are gaining significance in medical diagnosis because of their classification ability with high accuracy rates. The accuracy of classification algorithms depends on the use of correct feature selection algorithms to reduce the dimension of datasets. In this study, the Support Vector Machine classification algorithm was used to diagnose Chronic Kidney Disease. To diagnose the disease, two essential types of feature selection methods, namely wrapper and filter approaches, were chosen to reduce the dimension of the Chronic Kidney Disease dataset. In the wrapper approach, the classifier subset evaluator with the greedy stepwise search engine and the wrapper subset evaluator with the Best First search engine were used. In the filter approach, the correlation feature selection subset evaluator with the greedy stepwise search engine and the filtered subset evaluator with the Best First search engine were used. The results showed that the Support Vector Machine classifier using the filtered subset evaluator with the Best First search engine feature selection method has a higher accuracy rate (98.5%) in the diagnosis of Chronic Kidney Disease compared to the other selected methods.

  18. Examining applying high performance genetic data feature selection and classification algorithms for colon cancer diagnosis.

    Science.gov (United States)

    Al-Rajab, Murad; Lu, Joan; Xu, Qiang

    2017-07-01

    This paper examines the accuracy and efficiency (time complexity) of high performance genetic data feature selection and classification algorithms for colon cancer diagnosis. The need for this research derives from the urgent and increasing need for accurate and efficient algorithms. Colon cancer is a leading cause of death worldwide, hence it is vitally important for cancer tissues to be expertly identified and classified in a rapid and timely manner, to assure both fast detection of the disease and an expedited drug discovery process. In this research, a three-phase approach was proposed and implemented: Phases One and Two examined the feature selection algorithms and classification algorithms employed separately, and Phase Three examined the performance of their combination. It was found in Phase One that the Particle Swarm Optimization (PSO) algorithm performed best on the colon dataset for feature selection (29 genes selected), and in Phase Two that the Support Vector Machine (SVM) algorithm outperformed the other classifiers, with an accuracy of almost 86%. It was also found in Phase Three that the combined use of PSO and SVM surpassed the other algorithms in accuracy and performance, and was faster in terms of time analysis (94%). It is concluded that applying feature selection algorithms prior to classification algorithms results in better accuracy than when the latter are applied alone. This conclusion is important and significant to industry and society. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. Early Visual Cortex Dynamics during Top-Down Modulated Shifts of Feature-Selective Attention.

    Science.gov (United States)

    Müller, Matthias M; Trautmann, Mireille; Keitel, Christian

    2016-04-01

    Shifting attention from one color to another color or from color to another feature dimension such as shape or orientation is imperative when searching for a certain object in a cluttered scene. Most attention models that emphasize feature-based selection implicitly assume that all shifts in feature-selective attention underlie identical temporal dynamics. Here, we recorded time courses of behavioral data and steady-state visual evoked potentials (SSVEPs), an objective electrophysiological measure of neural dynamics in early visual cortex to investigate temporal dynamics when participants shifted attention from color or orientation toward color or orientation, respectively. SSVEPs were elicited by four random dot kinematograms that flickered at different frequencies. Each random dot kinematogram was composed of dashes that uniquely combined two features from the dimensions color (red or blue) and orientation (slash or backslash). Participants were cued to attend to one feature (such as color or orientation) and respond to coherent motion targets of the to-be-attended feature. We found that shifts toward color occurred earlier after the shifting cue compared with shifts toward orientation, regardless of the original feature (i.e., color or orientation). This was paralleled in SSVEP amplitude modulations as well as in the time course of behavioral data. Overall, our results suggest different neural dynamics during shifts of attention from color and orientation and the respective shifting destinations, namely, either toward color or toward orientation.

  20. Feature selection from short amino acid sequences in phosphorylation prediction problem

    Science.gov (United States)

    Wecławski, Jakub; Jankowski, Stanisław; Szymański, Zbigniew

    The paper describes a solution to the problem of feature selection from amino acid sequences for phosphorylation prediction. We show that even for short sequences, variable selection leads to better classification performance. Moreover, the simplicity of the final models allows for better data understanding and can be used by an expert for further analysis. The feature selection process is divided into two parts: i) a classification tree is used to find the most relevant positions in the amino acid sequences, and ii) the contrast pattern kernel is then applied for pattern selection. This work summarizes the research made on the classification of short amino acid sequences. The results of the research allowed us to propose a general scheme of amino acid sequence analysis.

  1. Challenges for Better thesis supervision.

    Science.gov (United States)

    Ghadirian, Laleh; Sayarifard, Azadeh; Majdzadeh, Reza; Rajabi, Fatemeh; Yunesian, Masoud

    2014-01-01

    Conducting a thesis is one of the students' major academic activities. Thesis quality and the experience acquired are highly dependent on the supervision. Our study aimed at identifying the challenges in thesis supervision from both the students' and the faculty members' points of view. This study was conducted using individual in-depth interviews and Focus Group Discussions (FGD). The participants were 43 students and faculty members selected by purposive sampling. It was carried out in Tehran University of Medical Sciences in 2012. Data analysis was done concurrently with data gathering using the content analysis method. Our data analysis resulted in 162 codes, 17 subcategories and 4 major categories: "supervisory knowledge and skills", "atmosphere", "bylaws and regulations relating to supervision" and "monitoring and evaluation". This study showed that more attention and planning is needed for modifying related rules and regulations, qualitative and quantitative improvement of mentorship training, improvement of the research atmosphere, and effective monitoring and evaluation in the supervisory area.

  2. How can selection of biologically inspired features improve the performance of a robust object recognition model?

    Directory of Open Access Journals (Sweden)

    Masoud Ghodrati

    Full Text Available Humans can effectively and swiftly recognize objects in complex natural scenes. This outstanding ability has motivated many computational object recognition models. Most of these models try to emulate the behavior of this remarkable system. The human visual system hierarchically recognizes objects in several processing stages. Along these stages a set of features with increasing complexity is extracted by different parts of visual system. Elementary features like bars and edges are processed in earlier levels of visual pathway and as far as one goes upper in this pathway more complex features will be spotted. It is an important interrogation in the field of visual processing to see which features of an object are selected and represented by the visual cortex. To address this issue, we extended a hierarchical model, which is motivated by biology, for different object recognition tasks. In this model, a set of object parts, named patches, extracted in the intermediate stages. These object parts are used for training procedure in the model and have an important role in object recognition. These patches are selected indiscriminately from different positions of an image and this can lead to the extraction of non-discriminating patches which eventually may reduce the performance. In the proposed model we used an evolutionary algorithm approach to select a set of informative patches. Our reported results indicate that these patches are more informative than usual random patches. We demonstrate the strength of the proposed model on a range of object recognition tasks. The proposed model outperforms the original model in diverse object recognition tasks. It can be seen from the experiments that selected features are generally particular parts of target images. Our results suggest that selected features which are parts of target objects provide an efficient set for robust object recognition.

  3. How can selection of biologically inspired features improve the performance of a robust object recognition model?

    Science.gov (United States)

    Ghodrati, Masoud; Khaligh-Razavi, Seyed-Mahdi; Ebrahimpour, Reza; Rajaei, Karim; Pooyan, Mohammad

    2012-01-01

    Humans can effectively and swiftly recognize objects in complex natural scenes. This outstanding ability has motivated many computational object recognition models. Most of these models try to emulate the behavior of this remarkable system. The human visual system hierarchically recognizes objects in several processing stages. Along these stages a set of features with increasing complexity is extracted by different parts of the visual system. Elementary features like bars and edges are processed in earlier levels of the visual pathway, and the further one goes up this pathway, the more complex the features that are detected. It is an important question in the field of visual processing which features of an object are selected and represented by the visual cortex. To address this issue, we extended a hierarchical, biologically motivated model for different object recognition tasks. In this model, a set of object parts, named patches, is extracted in the intermediate stages. These object parts are used in the training procedure of the model and have an important role in object recognition. These patches are selected indiscriminately from different positions of an image, and this can lead to the extraction of non-discriminating patches, which may eventually reduce the performance. In the proposed model we used an evolutionary algorithm approach to select a set of informative patches. Our reported results indicate that these patches are more informative than usual random patches. We demonstrate the strength of the proposed model on a range of object recognition tasks. The proposed model outperforms the original model in diverse object recognition tasks. It can be seen from the experiments that the selected features are generally particular parts of the target images. Our results suggest that selected features which are parts of target objects provide an efficient set for robust object recognition.

  4. Efficient feature selection and multiclass classification with integrated instance and model based learning.

    Science.gov (United States)

    Liu, Zhenqiu; Bensmail, Halima; Tan, Ming

    2012-01-01

    Multiclass classification and feature (variable) selections are commonly encountered in many biological and medical applications. However, extending binary classification approaches to multiclass problems is not trivial. Instance-based methods such as the K nearest neighbor (KNN) can naturally extend to multiclass problems and usually perform well with unbalanced data, but suffer from the curse of dimensionality. Their performance is degraded when applied to high dimensional data. On the other hand, model-based methods such as logistic regression require the decomposition of the multiclass problem into several binary problems with one-vs.-one or one-vs.-rest schemes. Even though they can be applied to high dimensional data with L(1) or L(p) penalized methods, such approaches can only select independent features and the features selected with different binary problems are usually different. They also produce unbalanced classification problems with the one-vs.-rest scheme even if the original multiclass problem is balanced. By combining instance-based and model-based learning, we propose an efficient learning method with integrated KNN and constrained logistic regression (KNNLog) for simultaneous multiclass classification and feature selection. Our proposed method simultaneously minimizes the intra-class distance and maximizes the interclass distance with fewer estimated parameters. It is very efficient for problems with small sample size and unbalanced classes, a case common in many real applications. In addition, our model-based feature selection methods can identify highly correlated features simultaneously avoiding the multiplicity problem due to multiple tests. The proposed method is evaluated with simulation and real data including one unbalanced microRNA dataset for leukemia and one multiclass metagenomic dataset from the Human Microbiome Project (HMP). It performs well with limited computational experiments.

  5. Supervision as Metaphor

    Science.gov (United States)

    Lee, Alison; Green, Bill

    2009-01-01

    This article takes up the question of the language within which discussion of research degree supervision is couched and framed, and the consequences of such framings for supervision as a field of pedagogical practice. It examines the proliferation and intensity of metaphor, allegory and allusion in the language of candidature and supervision,…

  6. A Supervision of Solidarity

    Science.gov (United States)

    Reynolds, Vikki

    2010-01-01

    This article illustrates an approach to therapeutic supervision informed by a philosophy of solidarity and social justice activism. Called a "Supervision of Solidarity", this approach addresses the particular challenges in the supervision of therapists who work alongside clients who are subjected to social injustice and extreme marginalization. It…

  7. Feature selection and multi-kernel learning for adaptive graph regularized nonnegative matrix factorization

    KAUST Repository

    Wang, Jim Jing-Yan

    2014-09-20

    Nonnegative matrix factorization (NMF), a popular part-based representation technique, does not capture the intrinsic local geometric structure of the data space. Graph regularized NMF (GNMF) was recently proposed to avoid this limitation by regularizing NMF with a nearest neighbor graph constructed from the input data set. However, GNMF has two main bottlenecks. First, using the original feature space directly to construct the graph is not necessarily optimal because of the noisy and irrelevant features and nonlinear distributions of data samples. Second, one possible way to handle the nonlinear distribution of data samples is by kernel embedding. However, it is often difficult to choose the most suitable kernel. To solve these bottlenecks, we propose two novel graph-regularized NMF methods, AGNMFFS and AGNMFMK, by introducing feature selection and multiple-kernel learning to the graph regularized NMF, respectively. Instead of using a fixed graph as in GNMF, the two proposed methods learn the nearest neighbor graph that is adaptive to the selected features and learned multiple kernels, respectively. For each method, we propose a unified objective function to conduct feature selection/multi-kernel learning, NMF and adaptive graph regularization simultaneously. We further develop two iterative algorithms to solve the two optimization problems. Experimental results on two challenging pattern classification tasks demonstrate that the proposed methods significantly outperform state-of-the-art data representation methods.

  8. Research into a Feature Selection Method for Hyperspectral Imagery Using PSO and SVM

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    Classification and recognition of hyperspectral remote sensing images is not the same as that of conventional multi-spectral remote sensing images. We propose a novel feature selection and classification method for hyperspectral images that combines the global optimization ability of the particle swarm optimization (PSO) algorithm with the superior classification performance of a support vector machine (SVM). The global search performance of PSO is improved by using a chaotic optimization search technique. A granularity-based grid search strategy is used to optimize the SVM model parameters. Parameter optimization and classification of the SVM are addressed using the training data corresponding to the feature subset. A false classification rate is adopted as the fitness function. Tests of feature selection and classification are carried out on a hyperspectral data set. Classification performance is also compared among different feature extraction methods in common use today. Results indicate that this hybrid method has a higher classification accuracy and can effectively extract optimal bands. A feasible approach is provided for feature selection and classification of hyperspectral image data.

  9. Feature selection and multi-kernel learning for sparse representation on a manifold.

    Science.gov (United States)

    Wang, Jim Jing-Yan; Bensmail, Halima; Gao, Xin

    2014-03-01

    Sparse representation has been widely studied as a part-based data representation method and applied in many scientific and engineering fields, such as bioinformatics and medical imaging. It seeks to represent a data sample as a sparse linear combination of some basic items in a dictionary. Gao et al. (2013) recently proposed Laplacian sparse coding by regularizing the sparse codes with an affinity graph. However, due to the noisy features and nonlinear distribution of the data samples, the affinity graph constructed directly from the original feature space is not necessarily a reliable reflection of the intrinsic manifold of the data samples. To overcome this problem, we integrate feature selection and multiple kernel learning into the sparse coding on the manifold. To this end, unified objectives are defined for feature selection, multiple kernel learning, sparse coding, and graph regularization. By optimizing the objective functions iteratively, we develop novel data representation algorithms with feature selection and multiple kernel learning respectively. Experimental results on two challenging tasks, N-linked glycosylation prediction and mammogram retrieval, demonstrate that the proposed algorithms outperform the traditional sparse coding methods.

  10. Feature selection and multi-kernel learning for sparse representation on a manifold

    KAUST Repository

    Wang, Jim Jing-Yan

    2014-03-01

    Sparse representation has been widely studied as a part-based data representation method and applied in many scientific and engineering fields, such as bioinformatics and medical imaging. It seeks to represent a data sample as a sparse linear combination of some basic items in a dictionary. Gao et al. (2013) recently proposed Laplacian sparse coding by regularizing the sparse codes with an affinity graph. However, due to the noisy features and nonlinear distribution of the data samples, the affinity graph constructed directly from the original feature space is not necessarily a reliable reflection of the intrinsic manifold of the data samples. To overcome this problem, we integrate feature selection and multiple kernel learning into the sparse coding on the manifold. To this end, unified objectives are defined for feature selection, multiple kernel learning, sparse coding, and graph regularization. By optimizing the objective functions iteratively, we develop novel data representation algorithms with feature selection and multiple kernel learning respectively. Experimental results on two challenging tasks, N-linked glycosylation prediction and mammogram retrieval, demonstrate that the proposed algorithms outperform the traditional sparse coding methods. © 2013 Elsevier Ltd.

  11. A hybrid feature selection approach for the early diagnosis of Alzheimer’s disease

    Science.gov (United States)

    Gallego-Jutglà, Esteve; Solé-Casals, Jordi; Vialatte, François-Benoît; Elgendi, Mohamed; Cichocki, Andrzej; Dauwels, Justin

    2015-02-01

    Objective. Recently, significant advances have been made in the early diagnosis of Alzheimer’s disease (AD) from electroencephalography (EEG). However, choosing suitable measures is a challenging task. Among other measures, frequency relative power (RP) and loss of complexity have been used with promising results. In the present study we investigate the early diagnosis of AD using synchrony measures and frequency RP on EEG signals, examining the changes found in different frequency ranges. Approach. We first explore the use of a single feature for computing the classification rate (CR), looking for the best frequency range. Then, we present a multiple feature classification system that outperforms all previous results using a feature selection strategy. These two approaches are tested in two different databases, one containing mild cognitive impairment (MCI) and healthy subjects (patients age: 71.9 ± 10.2, healthy subjects age: 71.7 ± 8.3), and the other containing Mild AD and healthy subjects (patients age: 77.6 ± 10.0, healthy subjects age: 69.4 ± 11.5). Main results. Using a single feature to compute CRs we achieve a performance of 78.33% for the MCI data set and of 97.56% for Mild AD. Results are clearly improved using the multiple feature classification, where a CR of 95% is found for the MCI data set using 11 features, and 100% for the Mild AD data set using four features. Significance. The new feature selection method described in this work may be a reliable tool that could help to design a realistic system that does not require prior knowledge of a patient's status. With that aim, we explore the standardization of features for MCI and Mild AD data sets with promising results.
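
    A single-feature pipeline of the kind described above can be illustrated by computing band-wise relative power with Welch's method. The snippet is a hedged sketch: the sampling rate, band edges, and random placeholder signal are assumptions, and the synchrony measures and actual MCI / Mild AD recordings are not reproduced.

```python
# Hedged sketch: relative power (RP) features in standard EEG bands via
# Welch's method. Sampling rate, band edges, and the random signal are
# placeholder assumptions, not the study's recordings.
import numpy as np
from scipy.signal import welch

fs = 256                                 # assumed sampling rate (Hz)
eeg = np.random.randn(30 * fs)           # placeholder 30 s single-channel signal

freqs, psd = welch(eeg, fs=fs, nperseg=4 * fs)

bands = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
broadband = (freqs >= 1) & (freqs < 30)
total_power = np.trapz(psd[broadband], freqs[broadband])

relative_power = {}
for name, (lo, hi) in bands.items():
    sel = (freqs >= lo) & (freqs < hi)
    relative_power[name] = np.trapz(psd[sel], freqs[sel]) / total_power

print(relative_power)                    # candidate features for the classifier
```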

  12. Computational Intelligence Modeling of the Macromolecules Release from PLGA Microspheres-Focus on Feature Selection.

    Directory of Open Access Journals (Sweden)

    Hossam M Zawbaa

    Full Text Available Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, morphology of microspheres, release conditions, etc. Selecting a subset of relevant properties for PLGA is a challenging machine learning task as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem with the aim of minimizing the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms: antlion optimization, binary version of antlion optimization, grey wolf optimization, and social spider optimization are used to select the optimal feature set for predicting the dissolution profile of PLGA. Besides these, the LASSO algorithm is also used for comparisons. Selection of crucial variables is performed under the assumption that both predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling for which various tools are chosen, such as Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression tree, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with the results reported by Szlęk. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, and the number of selected input features is smaller, nine versus eleven.
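
    Of the methods compared above, the LASSO baseline is the simplest to sketch: an L1-penalized linear model whose zeroed coefficients discard features. The snippet below is a minimal illustration assuming scikit-learn and synthetic regression data, not the PLGA dissolution dataset or the bio-inspired optimizers.

```python
# Hedged sketch of the LASSO baseline: L1 regularization zeroes out
# coefficients, and the surviving features form the selected subset.
# Synthetic data and the CV setup are assumptions, not the PLGA dataset.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=150, n_features=300, n_informative=12,
                       noise=5.0, random_state=0)

model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
model.fit(X, y)

coefs = model.named_steps["lassocv"].coef_
selected = np.flatnonzero(coefs)         # indices of features LASSO kept
print(f"LASSO kept {selected.size} of {X.shape[1]} features")
```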

  13. Feature Selection Strategy for Classification of Single-Trial EEG Elicited by Motor Imagery

    DEFF Research Database (Denmark)

    Prasad, Swati; Tan, Zheng-Hua; Prasad, Ramjee

    2011-01-01

    Brain-Computer Interface (BCI) provides new means of communication for people with motor disabilities by utilizing electroencephalographic activity. Selection of features from Electroencephalogram (EEG) signals for classification plays a key part in the development of BCI systems. In this paper, we...

  14. The Influence of Selected Personality and Workplace Features on Burnout among Nurse Academics

    Science.gov (United States)

    Kizilci, Sevgi; Erdogan, Vesile; Sozen, Emine

    2012-01-01

    This study aimed to determine the influence of selected individual and situational features on burnout among nurse academics. The Maslach Burnout Inventory was used to assess the burnout levels of academics. The sample population comprised 94 female participants. The emotional exhaustion (EE) score of the nurse academics was 16.43 ± 5.97,…

  15. The Use of Self Organizing Map Method and Feature Selection in Image Database Classification System

    CERN Document Server

    Pratiwi, Dian

    2012-01-01

    This paper presents a technique for classifying images into a desired number of classes or clusters by means of the Self Organizing Map (SOM) artificial neural network method. A set of 250 color images is classified after some preprocessing, such as RGB to grayscale conversion, color histogram computation and feature vector selection, before classification by the SOM. Feature vector selection in this paper uses two methods, PCA (Principal Component Analysis) and LSA (Latent Semantic Analysis), each of which retains 50, 100, or 150 of the 256 initial feature vector components obtained from the color histogram. The selected vectors are then fed into the SOM network to be classified into five classes using a learning rate of 0.5, and the accuracy is calculated. The test results showed that the highest accuracy, equal to 88%, was obtained when using PCA with a selection of 100 feature vector components, compared to when using...

  16. Attentional spreading to task-irrelevant object features: experimental support and a 3-step model of attention for object-based selection and feature-based processing modulation.

    Science.gov (United States)

    Wegener, Detlef; Galashan, Fingal Orlando; Aurich, Maike Kathrin; Kreiter, Andreas Kurt

    2014-01-01

    Directing attention to a specific feature of an object has been linked to different forms of attentional modulation. Object-based attention theory is founded on the finding that even task-irrelevant features at the selected object are subject to attentional modulation, while feature-based attention theory proposes a global processing benefit for the selected feature even at other objects. Most studies investigated either the one or the other form of attention, leaving open the possibility that both object- and feature-specific attentional effects do occur at the same time and may just represent two sides of a single attention system. We here investigate this issue by testing attentional spreading within and across objects, using reaction time (RT) measurements to changes of attended and unattended features on both attended and unattended objects. We asked subjects to report color and speed changes occurring on one of two overlapping random dot patterns (RDPs), presented at the center of gaze. The key property of the stimulation was that only one of the features (e.g., motion direction) was unique for each object, whereas the other feature (e.g., color) was shared by both. The results of two experiments show that co-selection of unattended features even occurs when those features have no means for selecting the object. At the same time, they demonstrate that this processing benefit is not restricted to the selected object but spreads to the task-irrelevant one. We conceptualize these findings by a 3-step model of attention that assumes a task-dependent top-down gain, object-specific feature selection based on task- and binding characteristics, and a global feature-specific processing enhancement. The model allows for the unification of a vast amount of experimental results into a single model, and makes various experimentally testable predictions for the interaction of object- and feature-specific processes.

  17. Attentional spreading to task-irrelevant object features: Experimental support and a 3-step model of attention for object-based selection and feature-based processing modulation

    Directory of Open Access Journals (Sweden)

    Detlef eWegener

    2014-06-01

    Full Text Available Directing attention to a specific feature of an object has been linked to different forms of attentional modulation. Object-based attention theory is founded on the finding that even task-irrelevant features at the selected object are subject to attentional modulation, while feature-based attention theory proposes a global processing benefit for the selected feature even at other objects. Most studies investigated either the one or the other form of attention, leaving open the possibility that both object- and feature-specific attentional effects do occur at the same time and may just represent two sides of a single attention system. We here investigate this issue by testing attentional spreading within and across objects, using reaction time measurements to changes of attended and unattended features on both attended and unattended objects. We asked subjects to report color and speed changes occurring on one of two overlapping random dot patterns, presented at the center of gaze. The key property of the stimulation was that only one of the features (e.g., motion direction) was unique for each object, whereas the other feature (e.g., color) was shared by both. The results of two experiments show that co-selection of unattended features even occurs when those features have no means for selecting the object. At the same time, they demonstrate that this processing benefit is not restricted to the selected object but spreads to the task-irrelevant one. We conceptualize these findings by a 3-step model of attention that assumes a task-dependent top-down gain, object-specific feature selection based on task- and binding characteristics, and a global feature-specific processing enhancement. The model allows for the unification of a vast amount of experimental results into a single model, and makes various experimentally testable predictions for the interaction of object- and feature-specific processes.

  18. Feature selection for outcome prediction in oesophageal cancer using genetic algorithm and random forest classifier.

    Science.gov (United States)

    Paul, Desbordes; Su, Ruan; Romain, Modzelewski; Sébastien, Vauclin; Pierre, Vera; Isabelle, Gardin

    2016-12-28

    The outcome prediction of patients can greatly help to personalize cancer treatment. A large number of quantitative features (clinical exams, imaging, …) are potentially useful to assess the patient outcome. The challenge is to choose the most predictive subset of features. In this paper, we propose a new feature selection strategy called GARF (genetic algorithm based on random forest), applied to features extracted from positron emission tomography (PET) images and clinical data. The most relevant features, predictive of the therapeutic response or prognostic of patient survival 3 years after the end of treatment, were selected using GARF on a cohort of 65 patients with a locally advanced oesophageal cancer eligible for chemo-radiation therapy. The most relevant predictive results were obtained with a subset of 9 features, leading to a random forest misclassification rate of 18±4% and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.823±0.032. The most relevant prognostic results were obtained with 8 features, leading to an error rate of 20±7% and an AUC of 0.750±0.108. Both predictive and prognostic results show better performance using GARF than using the 4 other studied methods.
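
    A full genetic search is beyond a short sketch, so the snippet below swaps in a simpler forest-driven procedure: features are ranked by random forest importance and nested subsets are scored by cross-validation. This is explicitly not GARF, only an illustration of how a random-forest-based subset can be evaluated; the data are synthetic stand-ins for the PET and clinical features.

```python
# Hedged stand-in (not GARF): rank features by random forest importance and
# score nested subsets by cross-validated misclassification rate. Synthetic
# data approximate the small-cohort, many-feature setting.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=65, n_features=60, n_informative=9,
                           random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]   # most important first

for k in (5, 9, 15, 30):
    subset = ranking[:k]
    acc = cross_val_score(RandomForestClassifier(n_estimators=300, random_state=0),
                          X[:, subset], y, cv=5).mean()
    print(f"top-{k} features -> misclassification rate {1 - acc:.2f}")
```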

  19. Feature Selection for Better Identification of Subtypes of Guillain-Barré Syndrome

    Directory of Open Access Journals (Sweden)

    José Hernández-Torruco

    2014-01-01

    Full Text Available Guillain-Barré syndrome (GBS) is a neurological disorder which has not been explored using clustering algorithms. Clustering algorithms perform more efficiently when they work only with relevant features. In this work, we applied correlation-based feature selection (CFS), chi-squared, information gain, symmetrical uncertainty, and consistency filter methods to select the most relevant features from a 156-feature real dataset. This dataset contains clinical, serological, and nerve conduction test data obtained from GBS patients. The most relevant feature subsets, determined with each filter method, were used to identify four subtypes of GBS present in the dataset. We used the partitions around medoids (PAM) clustering algorithm to form four clusters, corresponding to the GBS subtypes. We applied the purity of each cluster as the evaluation measure. After experimentation, symmetrical uncertainty and information gain determined a feature subset of seven variables. These variables, taken together as a dataset, were used as input to PAM and reached a purity of 0.7984. This result leads to a first characterization of this syndrome using computational techniques.

  20. A survey on filter techniques for feature selection in gene expression microarray analysis.

    Science.gov (United States)

    Lazar, Cosmin; Taminau, Jonatan; Meganck, Stijn; Steenhoff, David; Coletta, Alain; Molter, Colin; de Schaetzen, Virginie; Duque, Robin; Bersini, Hugues; Nowé, Ann

    2012-01-01

    A plenitude of feature selection (FS) methods is available in the literature, most of them arising from the need to analyze data of very high dimension, usually hundreds or thousands of variables. Such data sets are now available in various application areas like combinatorial chemistry, text mining, multivariate imaging, or bioinformatics. As a generally accepted rule, these methods are grouped into filters, wrappers, and embedded methods. More recently, a new group of methods has been added to the general framework of FS: ensemble techniques. The focus in this survey is on filter feature selection methods for informative feature discovery in gene expression microarray (GEM) analysis, which is also known as differentially expressed genes (DEGs) discovery, gene prioritization, or biomarker discovery. We present them in a unified framework, using standardized notations in order to reveal their technical details and to highlight their common characteristics as well as their particularities.

  1. Improving the performance of the Ripper in insurance risk classification: A comparative study using feature selection

    CERN Document Server

    Duma, Mlungisi; Marwala, Tshilidzi

    2011-01-01

    The Ripper algorithm is designed to generate rule sets for large datasets with many features. However, it has been shown that the algorithm struggles with classification performance in the presence of missing data: it struggles to classify instances as the quality of the data deteriorates with increasing amounts of missing data. In this paper, a feature selection technique is used to help improve the classification performance of the Ripper model. Principal component analysis and evidence automatic relevance determination techniques are used to improve the performance. A comparison is done to see which technique helps the algorithm improve the most. Training datasets with completely observable data were used to construct the model, and testing datasets with missing values were used for measuring accuracy. The results showed that principal component analysis is the better feature selection technique for improving the classification performance of the Ripper.

  2. Human activity recognition based on feature selection in smart home using back-propagation algorithm.

    Science.gov (United States)

    Fang, Hongqing; He, Lei; Si, Hao; Liu, Peng; Xie, Xiaolei

    2014-09-01

    In this paper, the Back-propagation (BP) algorithm has been used to train the feed forward neural network for human activity recognition in smart home environments, and an inter-class distance method for feature selection of observed motion sensor events is discussed and tested. Then, the human activity recognition performance of the neural network using the BP algorithm has been evaluated and compared with other probabilistic algorithms: the Naïve Bayes (NB) classifier and the Hidden Markov Model (HMM). The results show that different feature datasets yield different activity recognition accuracy. The selection of unsuitable feature datasets increases the computational complexity and degrades the activity recognition accuracy. Furthermore, the neural network using the BP algorithm has relatively better human activity recognition performance than the NB classifier and HMM.

  3. Feature-based and spatial attentional selection in visual working memory.

    Science.gov (United States)

    Heuer, Anna; Schubö, Anna

    2016-05-01

    The contents of visual working memory (VWM) can be modulated by spatial cues presented during the maintenance interval ("retrocues"). Here, we examined whether attentional selection of representations in VWM can also be based on features. In addition, we investigated whether the mechanisms of feature-based and spatial attention in VWM differ with respect to parallel access to noncontiguous locations. In two experiments, we tested the efficacy of valid retrocues relying on different kinds of information. Specifically, participants were presented with a typical spatial retrocue pointing to two locations, a symbolic spatial retrocue (numbers mapping onto two locations), and two feature-based retrocues: a color retrocue (a blob of the same color as two of the items) and a shape retrocue (an outline of the shape of two of the items). The two cued items were presented at either contiguous or noncontiguous locations. Overall retrocueing benefits, as compared to a neutral condition, were observed for all retrocue types. Whereas feature-based retrocues yielded benefits for cued items presented at both contiguous and noncontiguous locations, spatial retrocues were only effective when the cued items had been presented at contiguous locations. These findings demonstrate that attentional selection and updating in VWM can operate on different kinds of information, allowing for a flexible and efficient use of this limited system. The observation that the representations of items presented at noncontiguous locations could only be reliably selected with feature-based retrocues suggests that feature-based and spatial attentional selection in VWM rely on different mechanisms, as has been shown for attentional orienting in the external world.

  4. Good supervision and PBL

    DEFF Research Database (Denmark)

    Otrel-Cass, Kathrin

    This field study was conducted at the Faculty of Social Sciences at Aalborg University with the intention to investigate how students reflect on their experiences with supervision in a PBL environment. The overall aim of this study was to inform the continued work on strengthening supervision at this faculty. This particular study invited Master level students to discuss: • How a typical supervision process proceeds • How they experienced and what they expected of PBL in the supervision process • What makes a good supervision process...

  5. Cuckoo search optimisation for feature selection in cancer classification: a new approach.

    Science.gov (United States)

    Gunavathi, C; Premalatha, K

    2015-01-01

    Cuckoo Search (CS) optimisation algorithm is used for feature selection in cancer classification using microarray gene expression data. Since the gene expression data has thousands of genes and a small number of samples, feature selection methods can be used for the selection of informative genes to improve the classification accuracy. Initially, the genes are ranked based on T-statistics, Signal-to-Noise Ratio (SNR) and F-statistics values. The CS is used to find the informative genes from the top-m ranked genes. The classification accuracy of k-Nearest Neighbour (kNN) technique is used as the fitness function for CS. The proposed method is experimented and analysed with ten different cancer gene expression datasets. The results show that the CS gives 100% average accuracy for DLBCL Harvard, Lung Michigan, Ovarian Cancer, AML-ALL and Lung Harvard2 datasets and it outperforms the existing techniques in DLBCL outcome and prostate datasets.

  6. A new ensemble feature selection and its application to pattern classification

    Institute of Scientific and Technical Information of China (English)

    Dongbo ZHANG; Yaonan WANG

    2009-01-01

    A neural network ensemble based on rough set reducts is proposed to decrease the computational complexity of conventional ensemble feature selection algorithms. First, a dynamic reduction technique combining a genetic algorithm with a resampling method is adopted to obtain reducts with good generalization ability. Second, multiple BP neural networks based on different reducts are built as base classifiers. According to the idea of selective ensembles, the neural network ensemble with the best generalization ability can be found by search strategies. Finally, classification based on the neural network ensemble is implemented by combining the predictions of the component networks by voting. The method has been verified in experiments on remote sensing image classification and five UCI datasets. Compared with conventional ensemble feature selection algorithms, it costs less time, has lower computational complexity, and its classification accuracy is satisfactory.

  7. Low-Complexity Discriminative Feature Selection From EEG Before and After Short-Term Memory Task.

    Science.gov (United States)

    Behzadfar, Neda; Firoozabadi, S Mohammad P; Badie, Kambiz

    2016-10-01

    A reliable and unobtrusive quantification of changes in cortical activity during a short-term memory task can be used to evaluate the efficacy of interfaces and to provide real-time user-state information. In this article, we investigate changes in electroencephalogram signals in short-term memory with respect to the baseline activity. The electroencephalogram signals have been analyzed using 9 linear and nonlinear/dynamic measures. We applied the Wilcoxon statistical test and the Davies-Bouldin criterion to select optimal discriminative features. The results show that, among the features, the permutation entropy significantly increased in the frontal lobe and the occipital second lower alpha band activity decreased during the memory task. These 2 features reflect the same mental task; however, their correlation with the memory task varies in different intervals. In conclusion, it is suggested that the combination of the 2 features would improve the performance of memory-based neurofeedback systems. © EEG and Clinical Neuroscience Society (ECNS) 2016.
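
    Permutation entropy, the feature reported to increase frontally, can be computed directly from the ordinal patterns of a signal. The sketch below is a plain NumPy implementation; the embedding order, delay, and test signal are illustrative assumptions rather than the study's recording setup.

```python
# Hedged sketch: normalized permutation entropy of a 1-D signal.
# Order, delay, and the test signal are illustrative assumptions.
import numpy as np
from math import factorial

def permutation_entropy(x, order=3, delay=1):
    """Normalized permutation entropy (0..1) of signal x."""
    x = np.asarray(x)
    n = len(x) - (order - 1) * delay
    # Ordinal pattern (rank order) of each embedded vector.
    patterns = np.array([np.argsort(x[i:i + order * delay:delay]) for i in range(n)])
    # Relative frequency of each distinct pattern.
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p)) / np.log2(factorial(order))

sig = np.sin(np.linspace(0, 20 * np.pi, 1000)) + 0.3 * np.random.randn(1000)
print("permutation entropy:", round(permutation_entropy(sig, order=3, delay=1), 3))
```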

  8. Feature selection and definition for contours classification of thermograms in breast cancer detection

    Science.gov (United States)

    Jagodziński, Dariusz; Matysiewicz, Mateusz; Neumann, Łukasz; Nowak, Robert M.; Okuniewski, Rafał; Oleszkiewicz, Witold; Cichosz, Paweł

    2016-09-01

    This contribution introduces a method for detecting cancer pathologies in breast skin temperature distribution images. The use of thermosensitive foils applied to the breast skin allows thermograms to be created, which display the amount of infrared energy emitted by all breast cells. Significant foci of hyperthermia or inflammation are typical of cancer cells. Those foci can be recognized on thermograms as contours, which are areas of higher temperature. Every contour can be converted to a feature set that describes it, using the raw, central, Hu, outline, Fourier and colour moments computed from the image pixels. This paper also defines a new way of describing a set of contours through their neighbourhood relations. The contribution moreover introduces a way of ranking and selecting the most relevant features. The authors used a neural network with Gevrey's concept and recursive feature elimination to estimate feature importance.
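
    One of the moment families listed above, the Hu moments, can be extracted per contour with OpenCV as in the hedged sketch below. The random "temperature map" and the threshold value are placeholders for the real thermogram pipeline.

```python
# Hedged sketch: Hu-moment feature vectors from contours of warm regions.
# The random placeholder image and the threshold are illustrative assumptions.
import cv2
import numpy as np

thermo = (np.random.rand(256, 256) * 255).astype(np.uint8)   # placeholder thermogram
_, mask = cv2.threshold(thermo, 200, 255, cv2.THRESH_BINARY)  # keep warm regions only
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

features = []
for cnt in contours:
    if cv2.contourArea(cnt) < 20:          # skip tiny speckles
        continue
    hu = cv2.HuMoments(cv2.moments(cnt)).flatten()
    # Log-scale the Hu moments so they live on comparable ranges.
    hu = -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)
    features.append(hu)

print(f"{len(features)} contours -> {len(features[0]) if features else 0} Hu features each")
```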

  9. Sequential feature selection for detecting buried objects using forward looking ground penetrating radar

    Science.gov (United States)

    Shaw, Darren; Stone, Kevin; Ho, K. C.; Keller, James M.; Luke, Robert H.; Burns, Brian P.

    2016-05-01

    Forward looking ground penetrating radar (FLGPR) has the benefit of detecting objects at a significant standoff distance. The FLGPR signal is radiated over a large surface area and the radar signal return is often weak. Improving detection, especially for targets buried in roads, while maintaining an acceptable false alarm rate remains a challenging task. Various kinds of features have been developed over the years to increase the FLGPR detection performance. This paper focuses on investigating the use of as many features as possible for detecting buried targets and uses the sequential feature selection technique to automatically choose the features that contribute most to improving performance. Experimental results using data collected at a government test site are presented.

  10. Improved face representation by nonuniform multilevel selection of Gabor convolution features.

    Science.gov (United States)

    Du, Shan; Ward, Rabab Kreidieh

    2009-12-01

    Gabor wavelets are widely employed in face representation to decompose face images into their spatial-frequency domains. The Gabor wavelet transform, however, introduces very high dimensional data. To reduce this dimensionality, uniform sampling of Gabor features has traditionally been used. Since uniform sampling equally treats all the features, it can lead to a loss of important features while retaining trivial ones. In this paper, we propose a new face representation method that employs nonuniform multilevel selection of Gabor features. The proposed method is based on the local statistics of the Gabor features and is implemented using a coarse-to-fine hierarchical strategy. Gabor features that correspond to important face regions are automatically selected and sampled finer than other features. The nonuniformly extracted Gabor features are then classified using principal component analysis and/or linear discriminant analysis for the purpose of face recognition. To verify the effectiveness of the proposed method, experiments have been conducted on benchmark face image databases where the images vary in illumination, expression, pose, and scale. Compared with the methods that use the original gray-scale image with 4096-dimensional data and uniform sampling with 2560-dimensional data, the proposed method results in a significantly higher recognition rate, with a substantial lower dimension of around 700. The experimental results also show that the proposed method works well not only when multiple sample images are available for training but also when only one sample image is available for each person. The proposed face representation method has the advantages of low complexity, low dimensionality, and high discriminance.
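
    A rough feel for Gabor feature extraction and selection is given by the sketch below, which filters an image at a few scales and orientations and keeps the most variable responses. This variance ranking is a crude stand-in, plainly not the paper's nonuniform multilevel, region-aware selection; the test image and filter bank are assumptions.

```python
# Hedged sketch: a small Gabor filter bank followed by a simple variance-based
# pruning of the responses. Not the paper's coarse-to-fine selection scheme.
import numpy as np
from skimage.data import camera
from skimage.filters import gabor

image = camera() / 255.0
responses = []
for frequency in (0.1, 0.2, 0.4):
    for theta in np.arange(0, np.pi, np.pi / 4):
        real, imag = gabor(image, frequency=frequency, theta=theta)
        responses.append(np.sqrt(real**2 + imag**2).ravel())   # magnitude map

features = np.stack(responses, axis=1)          # shape: (pixels, n_filters)
variances = features.var(axis=0)
keep = np.argsort(variances)[::-1][:6]          # retain the 6 most variable filters
print("kept filter indices:", keep)
```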

  11. Multivariate Feature Selection for Predicting Scour-Related Bridge Damage using a Genetic Algorithm

    Science.gov (United States)

    Anderson, I.

    2015-12-01

    Scour and hydraulic damage are the most common cause of bridge failure, reported to be responsible for over 60% of bridge failure nationwide. Scour is a complex process, and is likely an epistatic function of both bridge and stream conditions that are both stationary and in dynamic flux. Bridge inspections, conducted regularly on bridges nationwide, rate bridge health assuming a static stream condition, and typically do not include dynamically changing geomorphological adjustments. The Vermont Agency of Natural Resources stream geomorphic assessment data could add value into the current bridge inspection and scour design. The 2011 bridge damage from Tropical Storm Irene served as a case study for feature selection to improve bridge scour damage prediction in extreme events. The bridge inspection (with over 200 features on more than 300 damaged and 2,000 non-damaged bridges), and the stream geomorphic assessment (with over 300 features on more than 5000 stream reaches) constitute "Big Data", and together have the potential to generate large numbers of combined features ("epistatic relationships") that might better predict scour-related bridge damage. The potential combined features pose significant computational challenges for traditional statistical techniques (e.g., multivariate logistic regression). This study uses a genetic algorithm to perform a search of the multivariate feature space to identify epistatic relationships that are indicative of bridge scour damage. The combined features identified could be used to improve bridge scour design, and to better monitor and rate bridge scour vulnerability.

  12. Feature Selection in Detection of Adverse Drug Reactions from the Health Improvement Network (THIN) Database

    Directory of Open Access Journals (Sweden)

    Yihui Liu

    2015-02-01

    Full Text Available Adverse drug reactions (ADRs) are a widely recognized public health issue and one of the most common reasons for withdrawing drugs from the market. Prescription event monitoring (PEM) is an important approach to detect adverse drug reactions. The main problem with this method is how to automatically extract the medical events or side effects from the high-throughput medical events collected in day-to-day clinical practice. In this study we propose a novel concept of a feature matrix to detect ADRs. The feature matrix, which is extracted from big medical data from The Health Improvement Network (THIN) database, is created to characterize the medical events for the patients who take drugs. The feature matrix builds the foundation for the irregular and big medical data. Feature selection methods are then performed on the feature matrix to detect the significant features. Finally, the ADRs can be located based on the significant features. The experiments are carried out on three drugs: Atorvastatin, Alendronate, and Metoclopramide. Major side effects for each drug are detected and better performance is achieved compared to other computerized methods. The detected ADRs are based on computerized methods; further investigation is needed.

  13. Selection of Entropy Based Features for Automatic Analysis of Essential Tremor

    Directory of Open Access Journals (Sweden)

    Karmele López-de-Ipiña

    2016-05-01

    Full Text Available Biomedical systems produce biosignals that arise from interaction mechanisms. In a general form, those mechanisms occur across multiple scales, both spatial and temporal, and contain linear and non-linear information. In this framework, entropy measures are good candidates for providing useful evidence about disorder in the system, lack of information in time-series and/or irregularity of the signals. The most common movement disorder is essential tremor (ET), which occurs 20 times more frequently than Parkinson’s disease. Interestingly, about 50%–70% of the cases of ET have a genetic origin. One of the most widely used standard tests for clinical diagnosis of ET is Archimedes’ spiral drawing. This work focuses on the selection of non-linear biomarkers from such drawings and handwriting; it is part of a wider cross study on the diagnosis of essential tremor, where our piece of research presents the selection of entropy features for early ET diagnosis. Classic entropy features are compared with features based on permutation entropy. An automatic analysis system based on several machine learning paradigms is employed, while automatic feature selection is implemented by means of the ANOVA (analysis of variance) test. The obtained results for early detection are promising and appear applicable to real environments.

  14. A fuzzy based feature selection from independent component subspace for machine learning classification of microarray data

    Directory of Open Access Journals (Sweden)

    Rabia Aziz

    2016-06-01

    Full Text Available Feature (gene) selection and classification of microarray data are two of the most interesting machine learning challenges. In the present work two existing feature selection/extraction algorithms, namely independent component analysis (ICA) and fuzzy backward feature elimination (FBFE), are used in a new combination of selection/extraction. The main objective of this paper is to select the independent components of the DNA microarray data using FBFE to improve the performance of support vector machine (SVM) and Naïve Bayes (NB) classifiers, while keeping the computational expense affordable. To show the validity of the proposed method, it is applied to reduce the number of genes for five DNA microarray datasets, namely colon cancer, acute leukemia, prostate cancer, lung cancer II, and high-grade glioma. These datasets are then classified using the SVM and NB classifiers. Experimental results on these five microarray datasets demonstrate that the genes selected by the proposed approach effectively improve the performance of the SVM and NB classifiers in terms of classification accuracy. We compare our proposed method with principal component analysis (PCA) as a standard extraction algorithm and find that the proposed method can obtain better classification accuracy, using SVM and NB classifiers with a smaller number of selected genes than PCA. The curve between the average error rate and the number of genes for each dataset indicates the number of genes required for the highest accuracy with our proposed method for both classifiers. ROC curves show the best subset of genes for both classifiers on the different datasets with the proposed method.
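
    The ICA-plus-elimination idea can be approximated with off-the-shelf tools, as in the hedged sketch below: FastICA produces the components and scikit-learn's greedy backward SequentialFeatureSelector replaces the fuzzy backward elimination step, which is a stated simplification; the data are synthetic, not the five microarray datasets.

```python
# Hedged sketch: ICA components followed by greedy backward elimination,
# evaluated with SVM and Naive Bayes. The backward selector is a simplified
# stand-in for FBFE, and the data are synthetic assumptions.
from sklearn.datasets import make_classification
from sklearn.decomposition import FastICA
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=120, n_features=500, n_informative=15,
                           random_state=0)

# Project the microarray-like data onto a handful of independent components.
components = FastICA(n_components=20, random_state=0).fit_transform(X)

for clf in (SVC(kernel="linear"), GaussianNB()):
    sfs = SequentialFeatureSelector(clf, n_features_to_select=8,
                                    direction="backward", cv=3)
    reduced = sfs.fit_transform(components, y)
    acc = cross_val_score(clf, reduced, y, cv=5).mean()
    print(type(clf).__name__, "accuracy on selected components:", round(acc, 3))
```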

  15. EXAMINING THE EFFECT OF FEATURE SELECTION ON IMPROVING PATIENT DETERIORATION PREDICTION

    Directory of Open Access Journals (Sweden)

    Noura AlNuaimi

    2015-11-01

    Full Text Available A large amount of heterogeneous medical data is generated every day in various healthcare organizations. These data could provide insights for improving monitoring and care delivery in the Intensive Care Unit. However, these data also present a challenge: reducing their volume without information loss. Dimension reduction is the most popular approach for reducing data size and also for reducing noise and redundancies in data. In this paper, we investigate the effect of the average laboratory test value and the total number of laboratory tests in predicting patient deterioration in the Intensive Care Unit, where we consider laboratory tests as features. Choosing a subset of features would mean choosing the most important lab tests to perform. Thus, our approach uses state-of-the-art feature selection to identify the most discriminative attributes, so that we gain a better understanding of the patient deterioration problem. If the number of tests can be reduced by identifying the most important tests, then we could also identify the redundant tests. By omitting the redundant tests, observation time could be reduced and early treatment could be provided to avoid the risk. Additionally, unnecessary monetary cost would be avoided. We apply our technique to the publicly available MIMIC-II database and show the effectiveness of the feature selection. We also provide a detailed analysis of the best features identified by our approach.

  16. BLProt: Prediction of bioluminescent proteins based on support vector machine and relieff feature selection

    KAUST Repository

    Kandaswamy, Krishna Kumar

    2011-08-17

    Background: Bioluminescence is a process in which light is emitted by a living organism. Most creatures that emit light are sea creatures, but some insects, plants, fungi, etc., also emit light. The biotechnological application of bioluminescence has become routine and is considered essential for many medical and general technological advances. Identification of bioluminescent proteins is challenging due to their poor sequence similarity. So far, no specific method has been reported to identify bioluminescent proteins from primary sequence. Results: In this paper, we propose a novel predictive method that uses a Support Vector Machine (SVM) and physicochemical properties to predict bioluminescent proteins. BLProt was trained using a dataset consisting of 300 bioluminescent proteins and 300 non-bioluminescent proteins, and evaluated on an independent set of 141 bioluminescent proteins and 18202 non-bioluminescent proteins. To identify the most prominent features, we carried out feature selection with three different filter approaches: ReliefF, infogain, and mRMR. We selected five different feature subsets by decreasing the number of features, and the performance of each feature subset was evaluated. Conclusion: BLProt achieves 80% accuracy in training (5-fold cross-validation) and 80.06% accuracy in testing. The performance of BLProt was compared with BLAST and HMM. The high prediction accuracy and successful prediction of hypothetical proteins suggest that BLProt can be a useful approach to identify bioluminescent proteins from sequence information, irrespective of their sequence similarity. © 2011 Kandaswamy et al.; licensee BioMed Central Ltd.

  17. Characterization of computer network events through simultaneous feature selection and clustering of intrusion alerts

    Science.gov (United States)

    Chen, Siyue; Leung, Henry; Dondo, Maxwell

    2014-05-01

    As computer network security threats increase, many organizations implement multiple Network Intrusion Detection Systems (NIDS) to maximize the likelihood of intrusion detection and provide a comprehensive understanding of intrusion activities. However, NIDS trigger a massive number of alerts on a daily basis. This can be overwhelming for computer network security analysts since it is a slow and tedious process to manually analyse each alert produced. Thus, automated and intelligent clustering of alerts is important to reveal the structural correlation of events by grouping alerts with common features. As the nature of computer network attacks, and therefore alerts, is not known in advance, unsupervised alert clustering is a promising approach to achieve this goal. We propose a joint optimization technique for feature selection and clustering to aggregate similar alerts and to reduce the number of alerts that analysts have to handle individually. More precisely, each identified feature is assigned a binary value, which reflects the feature's saliency. This value is treated as a hidden variable and incorporated into a likelihood function for clustering. Since computing the optimal solution of the likelihood function directly is analytically intractable, we use the Expectation-Maximisation (EM) algorithm to iteratively update the hidden variable and use it to maximize the expected likelihood. Our empirical results, using a labelled Defense Advanced Research Projects Agency (DARPA) 2000 reference dataset, show that the proposed method gives better results than the EM clustering without feature selection in terms of the clustering accuracy.

  18. Selection of clinical features for pattern recognition applied to gait analysis.

    Science.gov (United States)

    Altilio, Rosa; Paoloni, Marco; Panella, Massimo

    2017-04-01

    This paper deals with the opportunity of extracting useful information from medical data retrieved directly from a stereophotogrammetric system applied to gait analysis. A feature selection method that exhaustively evaluates all possible combinations of the gait parameters is presented, in order to find the best subset able to discriminate between diseased and healthy subjects. This procedure is used to estimate the performance of widely used classification algorithms, whose behaviour has been ascertained in many real-world problems against well-known classification benchmarks, both in terms of the number of selected features and of classification accuracy. Specifically, support vector machine, naive Bayes and K-nearest-neighbour classifiers obtain the lowest classification error, with an accuracy greater than 97%. For the considered classification problem, the whole feature set proves to be redundant and can be significantly pruned: groups of only 3 or 5 features are able to preserve high accuracy when the aim is to detect gait anomalies. The step length and the swing speed are the most informative features for gait analysis, but cadence and stride may also add useful information for movement evaluation.
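
    The exhaustive evaluation of all feature combinations mentioned above can be written directly with itertools; the sketch below is a hedged illustration assuming scikit-learn, with a synthetic gait-parameter matrix, invented feature names, and a k-NN classifier as one of the scorers named in the record.

```python
from itertools import combinations

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
feature_names = ["step_length", "swing_speed", "cadence", "stride", "stance_time"]
X = rng.normal(size=(80, len(feature_names)))   # placeholder gait parameters
y = rng.integers(0, 2, size=80)                 # 1 = diseased, 0 = healthy

# Score every non-empty subset of features with a cross-validated k-NN classifier.
best_acc, best_subset = 0.0, None
for k in range(1, len(feature_names) + 1):
    for subset in combinations(range(len(feature_names)), k):
        acc = cross_val_score(KNeighborsClassifier(5), X[:, subset], y, cv=5).mean()
        if acc > best_acc:
            best_acc, best_subset = acc, subset

print("best accuracy %.3f with features %s"
      % (best_acc, [feature_names[i] for i in best_subset]))
```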

  19. Ant-cuckoo colony optimization for feature selection in digital mammogram.

    Science.gov (United States)

    Jona, J B; Nagaveni, N

    2014-01-15

    Digital mammography is the only effective screening method for detecting breast cancer. Gray Level Co-occurrence Matrix (GLCM) textural features are extracted from the mammogram. Not all of these features are essential for classifying the mammogram, so identifying the relevant features is the aim of this work. Feature selection improves the classification rate and accuracy of any classifier. In this study, a new hybrid metaheuristic named Ant-Cuckoo Colony Optimization, a hybrid of Ant Colony Optimization (ACO) and Cuckoo Search (CS), is proposed for feature selection in digital mammograms. ACO is a good metaheuristic optimization technique, but its drawback is that the ants tend to walk along paths where the pheromone density is high, which makes the whole process slow; hence CS is employed to carry out the local search of ACO. A Support Vector Machine (SVM) classifier with a Radial Basis Function (RBF) kernel is used along with the ACO to classify normal mammograms from abnormal ones. Experiments are conducted on the mini-MIAS database. The performance of the new hybrid algorithm is compared with the ACO and PSO algorithms. The results show that the hybrid Ant-Cuckoo Colony Optimization algorithm is more accurate than the other techniques.

  20. Feature Selection and Classifier Parameters Estimation for EEG Signals Peak Detection Using Particle Swarm Optimization

    Directory of Open Access Journals (Sweden)

    Asrul Adam

    2014-01-01

    Full Text Available Electroencephalogram (EEG) signal peak detection is widely used in clinical applications. The peak point can be detected using several approaches, including time, frequency, time-frequency, and nonlinear domains, depending on various peak features from several models. However, no study has established the importance of each peak feature in contributing to a good and generalized model. In this study, feature selection and classifier parameter estimation based on particle swarm optimization (PSO) are proposed as a framework for peak detection on EEG signals in time-domain analysis. Two versions of PSO are used in the study: (1) standard PSO and (2) random asynchronous particle swarm optimization (RA-PSO). The proposed framework searches for the combination of available features that offers good peak detection and a high classification rate in the conducted experiments. The evaluation results indicate that the accuracy of the peak detection can be improved up to 99.90% and 98.59% for training and testing, respectively, as compared to the framework without feature selection adaptation. Additionally, the framework based on RA-PSO offers a better and more reliable classification rate than standard PSO, as it produces a low-variance model.

  1. Pattern Classification Using an Olfactory Model with PCA Feature Selection in Electronic Noses: Study and Application

    Directory of Open Access Journals (Sweden)

    Junbao Zheng

    2012-03-01

    Full Text Available Biologically-inspired models and algorithms are considered promising sensor array signal processing methods for electronic noses. Feature selection is one of the most important issues for developing robust pattern recognition models in machine learning. This paper describes an investigation into the classification performance of a bionic olfactory model as the dimension of the input feature vector (outer factor) and the number of its parallel channels (inner factor) increase. The principal component analysis technique was applied for feature selection and dimension reduction. Two data sets, three classes of wine derived from different cultivars and five classes of green tea derived from five different provinces of China, were used for the experiments. In the former case the results showed that the average correct classification rate increased as more principal components were put into the feature vector. In the latter case the results showed that sufficient parallel channels should be reserved in the model to avoid pattern space crowding. We concluded that 6~8 channels of the model, with a principal component feature vector covering at least 90% cumulative variance, are adequate for a classification task of 3~5 pattern classes, considering the trade-off between time consumption and classification rate.
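
    The dimension-reduction step described above (keep enough principal components to reach at least 90% cumulative variance, then classify) can be sketched as follows, assuming scikit-learn; the sensor matrix and labels are synthetic placeholders and the downstream SVM stands in for the bionic olfactory model.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 16))     # placeholder e-nose sensor array responses
y = rng.integers(0, 3, size=150)   # three classes (e.g., wine cultivars)

# PCA with a float n_components keeps the smallest number of components
# whose cumulative explained variance reaches 90%.
model = make_pipeline(StandardScaler(), PCA(n_components=0.90), SVC())
print("CV accuracy: %.3f" % cross_val_score(model, X, y, cv=5).mean())

reducer = make_pipeline(StandardScaler(), PCA(n_components=0.90)).fit(X)
print("components kept:", reducer.named_steps["pca"].n_components_)
```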

  2. Feature Subset Selection for Hot Method Prediction using Genetic Algorithm wrapped with Support Vector Machines

    Directory of Open Access Journals (Sweden)

    S. Johnson

    2011-01-01

    Full Text Available Problem statement: All compilers have simple profiling-based heuristics to identify and predict program hot methods and to make optimization decisions. The major challenge in profile-based optimization is addressing the problem of overhead. The aim of this work is to perform feature subset selection using Genetic Algorithms (GA) to improve and refine the machine-learnt static hot method predictive technique and to compare the performance of the new models against the simple heuristics. Approach: The relevant features for training the predictive models are extracted from an initial set of ninety randomly selected static program features, with the help of the GA wrapped around the predictive model built with the Support Vector Machine (SVM), a Machine Learning (ML) algorithm. Results: The GA-generated feature subsets, containing thirty and twenty-nine features respectively for the two predictive models, when tested on MiBench predict Long Running Hot Methods (LRHM) and Frequently Called Hot Methods (FCHM) with respective accuracies of 71% and 80%, an increase of 19% and 22%. Further, inlining of the predicted LRHM and FCHM improves program performance by 3% and 5%, as against 4% and 6% with the Low Level Virtual Machine (LLVM) default heuristics. When intra-procedural optimizations (IPO) are performed on the predicted hot methods, this system offers a performance improvement of 5% and 4%, as against 0% and 3% by the LLVM default heuristics on LRHM and FCHM respectively. However, we observe an improvement of 36% in certain individual programs. Conclusion: Overall, the results indicate that GA-wrapped, SVM-derived feature reduction improves hot method prediction accuracy and that hot-method-prediction-based optimization is potentially useful in selective optimization.
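
    A toy genetic-algorithm wrapper in the spirit of the record above (bit-string chromosomes encoding feature subsets, SVM cross-validation accuracy as the fitness) might look like the sketch below; the population size, mutation rate and synthetic data are arbitrary illustrative choices, not the paper's settings.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.normal(size=(120, 30))     # placeholder static program features
y = rng.integers(0, 2, size=120)   # 1 = hot method, 0 = otherwise

def fitness(mask):
    # Cross-validated SVM accuracy on the encoded feature subset.
    return cross_val_score(SVC(), X[:, mask], y, cv=3).mean() if mask.any() else 0.0

pop = rng.integers(0, 2, size=(20, X.shape[1])).astype(bool)   # random bit-strings
for _ in range(15):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]               # truncation selection
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(0, 10, size=2)]
        cut = rng.integers(1, X.shape[1])                      # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        children.append(child ^ (rng.random(X.shape[1]) < 0.02))  # bit-flip mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected %d of %d features" % (best.sum(), X.shape[1]))
```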

  3. Feature Selection by Merging Sequential Bidirectional Search into Relevance Vector Machine in Condition Monitoring

    Institute of Scientific and Technical Information of China (English)

    ZHANG Kui; DONG Yu; BALL Andrew

    2015-01-01

    For more accurate fault detection and diagnosis, there is an increasing trend to use a large number of sensors and to collect data at high frequency. This inevitably produces large-scale data and causes difficulties in fault classification. Indeed, the classification methods are simply intractable when applied to high-dimensional condition monitoring data. In order to solve the problem, engineers have to resort to complicated feature extraction methods to reduce the dimensionality of the data. However, the features transformed by these methods cannot be understood by the engineers, because the original engineering meaning is lost. In this paper, another form of dimensionality reduction technique (feature selection methods) is employed to identify machinery condition, based only on frequency spectrum data. Feature selection methods are usually divided into three main types: filter, wrapper and embedded methods. Most studies focus on the first two types, whilst the development and application of embedded feature selection methods are very limited. This paper attempts to explore a novel embedded method. The method is formed by merging a sequential bidirectional search algorithm into the tuning of the scale parameters of a kernel function in the relevance vector machine. To demonstrate the potential for applying the method to machinery fault diagnosis, the method is applied to rolling bearing experimental data. The results obtained by using the method are consistent with the theoretical interpretation, proving that this algorithm has important engineering significance in revealing the correlation between the faults and the relevant frequency features. The proposed method is a theoretical extension of the relevance vector machine, and provides an effective solution for detecting the fault-related frequency components with high efficiency.
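
    The embedded formulation inside the relevance vector machine is specific to that paper, but the sequential bidirectional search itself can be illustrated with a plain wrapper that alternates a forward (add) step and a backward (remove) step scored by cross-validation. The sketch below assumes scikit-learn; the synthetic spectrum features and the k-NN scorer are placeholders, not the rolling-bearing data or the RVM.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 20))    # placeholder frequency-spectrum features
y = rng.integers(0, 2, size=100)  # fault vs. normal condition

def score(features):
    return cross_val_score(KNeighborsClassifier(3), X[:, sorted(features)], y, cv=5).mean()

selected, remaining, best = set(), set(range(X.shape[1])), 0.0
improved = True
while improved and remaining:
    improved = False
    # Forward step: add the single feature that helps most.
    f_add, s_add = max(((f, score(selected | {f})) for f in remaining), key=lambda t: t[1])
    if s_add > best:
        selected.add(f_add); remaining.remove(f_add)
        best, improved = s_add, True
    # Backward step: drop a feature if that improves the score further.
    if len(selected) > 1:
        f_del, s_del = max(((f, score(selected - {f})) for f in selected), key=lambda t: t[1])
        if s_del > best:
            selected.remove(f_del); remaining.add(f_del)
            best, improved = s_del, True

print("selected features:", sorted(selected), "CV accuracy: %.3f" % best)
```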

  4. Feature selection by merging sequential bidirectional search into relevance vector machine in condition monitoring

    Science.gov (United States)

    Zhang, Kui; Dong, Yu; Ball, Andrew

    2015-11-01

    For more accurate fault detection and diagnosis, there is an increasing trend to use a large number of sensors and to collect data at high frequency. This inevitably produces large-scale data and causes difficulties in fault classification. Indeed, the classification methods are simply intractable when applied to high-dimensional condition monitoring data. In order to solve the problem, engineers have to resort to complicated feature extraction methods to reduce the dimensionality of the data. However, the features transformed by these methods cannot be understood by the engineers, because the original engineering meaning is lost. In this paper, another form of dimensionality reduction technique (feature selection methods) is employed to identify machinery condition, based only on frequency spectrum data. Feature selection methods are usually divided into three main types: filter, wrapper and embedded methods. Most studies focus on the first two types, whilst the development and application of embedded feature selection methods are very limited. This paper attempts to explore a novel embedded method. The method is formed by merging a sequential bidirectional search algorithm into the tuning of the scale parameters of a kernel function in the relevance vector machine. To demonstrate the potential for applying the method to machinery fault diagnosis, the method is applied to rolling bearing experimental data. The results obtained by using the method are consistent with the theoretical interpretation, proving that this algorithm has important engineering significance in revealing the correlation between the faults and the relevant frequency features. The proposed method is a theoretical extension of the relevance vector machine, and provides an effective solution for detecting the fault-related frequency components with high efficiency.

  5. SU-E-T-214: Predicting Plan Quality from Patient Geometry: Feature Selection and Inference Modeling.

    Science.gov (United States)

    Ruan, D; Shao, W; DeMarco, J; Kupelian, P; Low, D

    2012-06-01

    To investigate and develop methods to infer treatment plan quality from the geometric features of PTV/OAR structures, and to discover and identify features of high prognostic value. This study explores the prognostic utility of geometric features of two categories: (1) absolute geometry, characterizing the volumes of single structures (PTV, OARs); and (2) relative geometry, based on the minimal 3D distance and/or overlapping volume between pairs of structures. Using prostate as a pilot site, we developed inference models to 'predict' the SBRT plan quality of DVH end points. We developed and assessed (1) a full linear regression model based on both absolute and relative geometric features, (2) a sparsity-penalized linear regression model, (3) a linear regression model based on absolute geometry features only, and (4) a learning-based nonparametric model. Cross-validation was used both for selecting the parameter values and for quantifying the inference performance. The best inference method for each of the DVH end points was identified to reveal the structural and prognostic differences among them. For linear regression, sparsity regularization discovered geometric features that were mostly absolute, demonstrating their dominant linear prognostic utility. However, introducing relative geometric features improved the plan quality prediction by 15% for all DVH end points. In contrast, nonparametric models had a heavier dependence on relative geometry features. While linear regression based on both feature sets predicted OAR DVH points slightly better, the nonparametric method excelled in predicting PTV coverage and conformality. The inference result from this study provides an 'expectation' for the plan quality before planning is performed, giving reference goals for the planner and a baseline for detecting abnormality. The use of relative geometry complements the absolute geometry with information on spatial configuration of the PTV/OAR structures of
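
    The sparsity-penalized linear regression mentioned above is essentially a Lasso fit whose surviving coefficients identify the prognostic geometric features. Below is a minimal sketch assuming scikit-learn; the geometric feature names and the toy DVH end point are invented placeholders, not the study's data.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
names = ["ptv_volume", "oar_volume", "min_distance", "overlap_volume", "body_volume"]
X = rng.normal(size=(60, len(names)))                               # placeholder geometric features
y = 0.8 * X[:, 2] - 0.5 * X[:, 3] + rng.normal(scale=0.3, size=60)  # toy DVH end point

# Lasso with the penalty weight chosen by cross-validation; zeroed coefficients
# correspond to geometric features dropped from the inference model.
lasso = LassoCV(cv=5).fit(StandardScaler().fit_transform(X), y)
for name, coef in zip(names, lasso.coef_):
    print(f"{name:16s} {coef:+.3f}  ({'selected' if abs(coef) > 1e-6 else 'dropped'})")
```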

  6. Context-dependent feature selection for landmine detection with ground-penetrating radar

    Science.gov (United States)

    Ratto, Christopher R.; Torrione, Peter A.; Collins, Leslie M.

    2009-05-01

    We present a novel method for improving landmine detection with ground-penetrating radar (GPR) by utilizing a priori knowledge of environmental conditions to facilitate algorithm training. The goal of Context-Dependent Feature Selection (CDFS) is to mitigate performance degradation caused by environmental factors. CDFS operates on GPR data by first identifying the environmental context and then fusing the decisions of several classifiers trained on context-dependent subsets of features. CDFS was evaluated on GPR data collected at several distinct sites under a variety of weather conditions. Results show that using prior environmental knowledge in this fashion has the potential to improve landmine detection.

  7. Highly accurate SVM model with automatic feature selection for word sense disambiguation

    Institute of Scientific and Technical Information of China (English)

    王浩; 陈贵林; 吴连献

    2004-01-01

    A novel algorithm for word sense disambiguation (WSD) based on an SVM model improved with automatic feature selection is introduced. This learning method employs rich contextual features to predict the proper senses for specific words. Experimental results show that the algorithm achieves excellent performance on the data set released during the SENSEVAL-2 competition. We present the results obtained and discuss the transplantation of this algorithm to other languages such as Chinese. Experimental results on a Chinese corpus show that our algorithm achieves an accuracy of 70.0% even with small training data.

  8. A Flexible Mechanism of Rule Selection Enables Rapid Feature-Based Reinforcement Learning.

    Science.gov (United States)

    Balcarras, Matthew; Womelsdorf, Thilo

    2016-01-01

    Learning in a new environment is influenced by prior learning and experience. Correctly applying a rule that maps a context to stimuli, actions, and outcomes enables faster learning and better outcomes compared to relying on strategies for learning that are ignorant of task structure. However, it is often difficult to know when and how to apply learned rules in new contexts. In our study we explored how subjects employ different strategies for learning the relationship between stimulus features and positive outcomes in a probabilistic task context. We test the hypothesis that task-naive subjects will show enhanced learning of feature-specific reward associations by switching to the use of an abstract rule that associates stimuli by feature type and restricts selections to that dimension. To test this hypothesis we designed a decision-making task where subjects receive probabilistic feedback following choices between pairs of stimuli. In the task, trials are grouped in two contexts by blocks, where in one type of block there is no unique relationship between a specific feature dimension (stimulus shape or color) and positive outcomes, and following an un-cued transition, alternating blocks have outcomes that are linked to either stimulus shape or color. Two-thirds of subjects (n = 22/32) exhibited behavior that was best fit by a hierarchical feature-rule model. Supporting the prediction of the model mechanism, these subjects showed significantly enhanced performance in feature-reward blocks, and rapidly switched their choice strategy to using abstract feature rules when reward contingencies changed. Choice behavior of other subjects (n = 10/32) was fit by a range of alternative reinforcement learning models representing strategies that do not benefit from applying previously learned rules. In summary, these results show that untrained subjects are capable of flexibly shifting between behavioral rules by leveraging simple model-free reinforcement learning and context

  9. A kernel-based multivariate feature selection method for microarray data classification.

    Directory of Open Access Journals (Sweden)

    Shiquan Sun

    Full Text Available High dimensionality and small sample sizes, and their inherent risk of overfitting, pose great challenges for constructing efficient classifiers in microarray data classification. Therefore a feature selection technique should be applied prior to data classification to enhance prediction performance. In general, filter methods can be considered as a principal or auxiliary selection mechanism because of their simplicity, scalability, and low computational complexity. However, a series of trivial examples shows that filter methods result in less accurate performance because they ignore the dependencies of features. Although a few publications have devoted their attention to revealing the relationships among features by multivariate-based methods, these methods describe the relationships only linearly, and such simple linear combinations restrict the improvement in performance. In this paper, we used a kernel method to discover inherent nonlinear correlations among features as well as between features and the target. Moreover, the number of orthogonal components was determined by kernel Fisher's linear discriminant analysis (FLDA) in a self-adaptive manner rather than by manual parameter settings. In order to reveal the effectiveness of our method we performed several experiments and compared the results of our method with other competitive multivariate-based feature selectors. In our comparison, we used two classifiers (support vector machine and k-nearest neighbor) on two groups of datasets, namely two-class and multi-class datasets. Experimental results demonstrate that the performance of our method is better than the others, especially on three hard-to-classify datasets, namely Wang's Breast Cancer, Gordon's Lung Adenocarcinoma and Pomeroy's Medulloblastoma.

  10. A flexible mechanism of rule selection enables rapid feature-based reinforcement learning

    Directory of Open Access Journals (Sweden)

    Matthew eBalcarras

    2016-03-01

    Full Text Available Learning in a new environment is influenced by prior learning and experience. Correctly applying a rule that maps a context to stimuli, actions, and outcomes enables faster learning and better outcomes compared to relying on strategies for learning that are ignorant of task structure. However, it is often difficult to know when and how to apply learned rules in new contexts. In our study we explored how subjects employ different strategies for learning the relationship between stimulus features and positive outcomes in a probabilistic task context. We test the hypothesis that task-naive subjects will show enhanced learning of feature-specific reward associations by switching to the use of an abstract rule that associates stimuli by feature type and restricts selections to that dimension. To test this hypothesis we designed a decision-making task where subjects receive probabilistic feedback following choices between pairs of stimuli. In the task, trials are grouped in two contexts by blocks, where in one type of block there is no unique relationship between a specific feature dimension (stimulus shape or colour) and positive outcomes, and following an un-cued transition, alternating blocks have outcomes that are linked to either stimulus shape or colour. Two-thirds of subjects (n = 22/32) exhibited behaviour that was best fit by a hierarchical feature-rule model. Supporting the prediction of the model mechanism, these subjects showed significantly enhanced performance in feature-reward blocks, and rapidly switched their choice strategy to using abstract feature rules when reward contingencies changed. Choice behaviour of other subjects (n = 10/32) was fit by a range of alternative reinforcement learning models representing strategies that do not benefit from applying previously learned rules. In summary, these results show that untrained subjects are capable of flexibly shifting between behavioural rules by leveraging simple model-free reinforcement

  11. Bidirectional Automated Branch and Bound Algorithm for Feature Selection

    Institute of Scientific and Technical Information of China (English)

    杨胜; 施鹏飞

    2005-01-01

    Feature selection is a process in which a minimal feature subset is selected from an original feature set according to a certain measure. In this paper, feature relevancy is defined by an inconsistency rate. A bidirectional automated branch and bound algorithm is presented. It is a new complete search algorithm for feature selection, which performs feature deletion and feature addition in parallel and is well suited to feature selection.

  12. Empirical study of supervised gene screening

    Directory of Open Access Journals (Sweden)

    Ma Shuangge

    2006-12-01

    Full Text Available Abstract Background Microarray studies provide a way of linking variations of phenotypes with their genetic causes. Constructing predictive models using high dimensional microarray measurements usually consists of three steps: (1) unsupervised gene screening; (2) supervised gene screening; and (3) statistical model building. Supervised gene screening based on marginal gene ranking is commonly used to reduce the number of genes in the model building. Various simple statistics, such as the t-statistic or the signal-to-noise ratio, have been used to rank genes in the supervised screening. Despite its extensive usage, statistical study of supervised gene screening remains scarce. Our study is partly motivated by the differences in gene discovery results caused by using different supervised gene screening methods. Results We investigate the concordance and reproducibility of supervised gene screening based on eight commonly used marginal statistics. Concordance is assessed by the relative fractions of overlaps between top-ranked genes screened using different marginal statistics. We propose a Bootstrap Reproducibility Index, which measures the reproducibility of individual genes under the supervised screening. Empirical studies are based on four public microarray datasets. We consider the cases where the top 20%, 40% and 60% of genes are screened. Conclusion From a gene discovery point of view, the effect of supervised gene screening based on different marginal statistics cannot be ignored. Empirical studies show that (1) genes that pass different supervised screenings may be considerably different; (2) concordance may vary, depending on the underlying data structure and the percentage of selected genes; (3) evaluated with the Bootstrap Reproducibility Index, genes that pass supervised screenings are only moderately reproducible; and (4) concordance cannot be improved by supervised screening based on reproducibility.
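
    A hedged sketch of the kind of analysis described above, assuming NumPy and SciPy: rank genes by a marginal t-statistic on bootstrap resamples and record how often each gene re-enters the top list, a simplified stand-in for the paper's Bootstrap Reproducibility Index. The expression matrix and labels are synthetic.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(7)
n_genes, n_samples, top = 500, 40, 100        # screen the top 20% of genes
X = rng.normal(size=(n_samples, n_genes))     # placeholder expression matrix
y = rng.integers(0, 2, size=n_samples)        # two phenotype classes

def top_genes(Xb, yb):
    # Marginal t-statistic ranking (one of the simple supervised screening statistics).
    t, _ = ttest_ind(Xb[yb == 0], Xb[yb == 1], axis=0)
    return set(np.argsort(-np.abs(t))[:top])

baseline = top_genes(X, y)
hits = np.zeros(n_genes)
for _ in range(200):                          # bootstrap resamples
    idx = rng.integers(0, n_samples, size=n_samples)
    hits[list(top_genes(X[idx], y[idx]))] += 1

reproducibility = hits / 200                  # fraction of resamples in which each gene was selected
print("mean reproducibility of baseline top genes: %.2f" % reproducibility[list(baseline)].mean())
```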

  13. Inference for feature selection using the Lasso with high-dimensional data

    DEFF Research Database (Denmark)

    Brink-Jensen, Kasper; Ekstrøm, Claus Thorn

    2014-01-01

    Penalized regression models such as the Lasso have proved useful for variable selection in many fields - especially for situations with high-dimensional data where the number of predictors far exceeds the number of observations. These methods identify and rank variables of importance but do not generally provide any inference about the selected variables. Thus, the variables selected might be the "most important" but need not be significant. We propose a significance test for the selection found by the Lasso. We introduce a procedure that computes inference and p-values for features chosen by the Lasso. This method rephrases the null hypothesis and uses a randomization approach which ensures that the error rate is controlled even for small samples. We demonstrate the ability of the algorithm to compute p-values of the expected magnitude with simulated data using a multitude of scenarios...

  14. Feature Selection and Blind Source Separation in an EEG-Based Brain-Computer Interface

    Directory of Open Access Journals (Sweden)

    Michael H. Thaut

    2005-11-01

    Full Text Available Most EEG-based BCI systems make use of well-studied patterns of brain activity. However, those systems involve tasks that indirectly map to simple binary commands such as “yes” or “no” or require many weeks of biofeedback training. We hypothesized that signal processing and machine learning methods can be used to discriminate EEG in a direct “yes”/“no” BCI from a single session. Blind source separation (BSS and spectral transformations of the EEG produced a 180-dimensional feature space. We used a modified genetic algorithm (GA wrapped around a support vector machine (SVM classifier to search the space of feature subsets. The GA-based search found feature subsets that outperform full feature sets and random feature subsets. Also, BSS transformations of the EEG outperformed the original time series, particularly in conjunction with a subset search of both spaces. The results suggest that BSS and feature selection can be used to improve the performance of even a “direct,” single-session BCI.

  15. Feature selection using angle modulated simulated Kalman filter for peak classification of EEG signals.

    Science.gov (United States)

    Adam, Asrul; Ibrahim, Zuwairie; Mokhtar, Norrima; Shapiai, Mohd Ibrahim; Mubin, Marizan; Saad, Ismail

    2016-01-01

    In existing electroencephalogram (EEG) signal peak classification research, the existing models, such as the Dumpala, Acir, Liu, and Dingle peak models, employ different sets of features. However, these models may not offer good performance across various applications, and their performance is found to be problem dependent. Therefore, the objective of this study is to combine all the associated features from the existing models before selecting the best combination of features. A new optimization algorithm, namely the angle modulated simulated Kalman filter (AMSKF), is employed as the feature selector. Also, the neural network random weight method is utilized in the proposed AMSKF technique as a classifier. In the conducted experiment, 11,781 peak candidate samples are employed for validation. The samples are collected from three different peak event-related EEG signals of 30 healthy subjects: (1) single eye blink, (2) double eye blink, and (3) eye movement signals. The experimental results show that the proposed AMSKF feature selector is able to find the best combination of features and performs on par with existing related studies of epileptic EEG event classification.

  16. Classification of features selected through Optimum Index Factor (OIF) for improving classification accuracy

    Institute of Scientific and Technical Information of China (English)

    Nilanchal Patel; Brijesh Kaushal

    2011-01-01

    The present investigation was performed to determine whether features selected through the Optimum Index Factor (OIF) could provide improved classification accuracy of the various categories on the satellite images of individual years, as well as on stacked images of two different years, as compared to all the features considered together. Further, in order to determine whether the classification accuracy of the different categories increases with the OIF values of the features extracted from both the individual years' and stacked images, we performed linear regression between the producer's accuracy (PA) of the various categories and the OIF values of the different combinations of features. The investigation demonstrated a significant improvement in the PA of two impervious categories, viz. moderate built-up and low density built-up, determined from the classification of the bands and principal components associated with the highest OIF value, as compared to all the bands and principal components, for both the individual years' and stacked images respectively. Regression analyses exhibited positive trends between the regression coefficients and OIF values for the various categories determined for the individual years' and stacked images respectively, signifying a direct relationship between the increase in information content and the corresponding increase in the OIF values. The research proved that features extracted through OIF from both the individual years' and stacked images are capable of providing significantly improved PA as compared to all the features pooled together.
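
    The Optimum Index Factor used above has a simple closed form: for a three-band combination it is the sum of the band standard deviations divided by the sum of the absolute pairwise correlations. The sketch below, assuming only NumPy and a synthetic six-band stack, ranks all three-band combinations by that value.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(8)
bands = rng.normal(size=(6, 100, 100))   # placeholder 6-band image stack
flat = bands.reshape(bands.shape[0], -1)

std = flat.std(axis=1)                   # per-band standard deviation
corr = np.corrcoef(flat)                 # band-to-band correlation matrix

def oif(triplet):
    i, j, k = triplet
    return (std[i] + std[j] + std[k]) / (abs(corr[i, j]) + abs(corr[i, k]) + abs(corr[j, k]))

for triplet in sorted(combinations(range(6), 3), key=oif, reverse=True)[:3]:
    print("bands", triplet, "OIF = %.2f" % oif(triplet))
```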

  17. Supervised retinal biometrics in different lighting conditions.

    Science.gov (United States)

    Azemin, Mohd Zulfaezal Che; Kumar, Dinesh K; Sugavaneswaran, Lakshmi; Krishnan, Sridhar

    2011-01-01

    Retinal images have been considered for a number of health and biometrics applications. However, the reliability of these applications has not been investigated thoroughly. The variation observed in retina scans taken at different times is attributable to differences in illumination and positioning of the camera, which causes some bifurcations and crossovers of the retinal vessels to be missed. Exhaustive selection of optimal parameters is needed to construct the best similarity metric equation to overcome the incomplete landmarks. In this paper, we extract multiple features from the retina scans and employ supervised classification to overcome the shortcomings of the current techniques. Experimental results on 60 retina scans taken under different lighting conditions demonstrate the efficacy of this technique. The results were compared with existing methods.

  18. Hybrid Binary Imperialist Competition Algorithm and Tabu Search Approach for Feature Selection Using Gene Expression Data

    Science.gov (United States)

    Aorigele; Zeng, Weiming; Hong, Xiaomin

    2016-01-01

    Gene expression data composed of thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features from a large number of gene expression data. Lately, many researchers have devoted themselves to feature selection using diverse computational intelligence methods. However, in the process of selecting informative genes, many computational methods face difficulties in selecting small subsets for cancer classification due to the huge number of genes (high dimension) compared to the small number of samples, noisy genes, and irrelevant genes. In this paper, we propose a new hybrid algorithm, HICATS, incorporating the imperialist competition algorithm (ICA), which performs a global search, and tabu search (TS), which conducts a fine-tuned search. In order to verify the performance of the proposed algorithm HICATS, we have tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The performance of our proposed method proved to be superior to that of other related works, including the conventional version of the binary optimization algorithm, in terms of classification accuracy and the number of selected genes. PMID:27579323

  19. GalNAc-transferase specificity prediction based on feature selection method.

    Science.gov (United States)

    Lu, Lin; Niu, Bing; Zhao, Jun; Liu, Liang; Lu, Wen-Cong; Liu, Xiao-Jun; Li, Yi-Xue; Cai, Yu-Dong

    2009-02-01

    GalNAc-transferase can catalyze the biosynthesis of O-linked oligosaccharides. The specificity of GalNAc-transferase is composed of nine amino acid residues denoted by R4, R3, R2, R1, R0, R1', R2', R3', R4'. To predict whether the reducing monosaccharide will be covalently linked to the central residue R0 (Ser or Thr), a new method based on feature selection has been proposed in our work. 277 nonapeptides from reference [Chou KC. A sequence-coupled vector-projection model for predicting the specificity of GalNAc-transferase. Protein Sci 1995;4:1365-83] are chosen as the training set. Each nonapeptide is represented by hundreds of amino acid properties collected in the Amino Acid Index database (http://www.genome.jp/aaindex) and transformed into a numeric vector with 4554 features. The Maximum Relevance Minimum Redundancy (mRMR) method, combined with Incremental Feature Selection (IFS) and Feature Forward Selection (FFS), is then applied for feature selection. The Nearest Neighbor Algorithm (NNA) is used to build prediction models. The optimal model contains 54 features, and its correct rate, tested by the Jackknife cross-validation test, reaches 91.34%. Final feature analysis indicates that amino acid residues at position R3' play the most important role in the recognition of GalNAc-transferase specificity, which is confirmed by the experiments [Elhammer AP, Poorman RA, Brown E, Maggiora LL, Hoogerheide JG, Kezdy FJ. The specificity of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase as inferred from a database of in vivo substrates and from the in vitro glycosylation of proteins and peptides. J Biol Chem 1993;268:10029-38; O'Connell BC, Hagen FK, Tabak LA. The influence of flanking sequence on the O-glycosylation of threonine in vitro. J Biol Chem 1992;267:25010-8; Yoshida A, Suzuki M, Ikenaga H, Takeuchi M. Discovery of the shortest sequence motif for high level mucin-type O-glycosylation. J Biol Chem 1997;272:16884-8]. Our method can be used as a tool for predicting O

  20. 2-DE combined with two-layer feature selection accurately establishes the origin of oolong tea.

    Science.gov (United States)

    Chien, Han-Ju; Chu, Yen-Wei; Chen, Chi-Wei; Juang, Yu-Min; Chien, Min-Wei; Liu, Chih-Wei; Wu, Chia-Chang; Tzen, Jason T C; Lai, Chien-Chen

    2016-11-15

    Taiwan is known for its high quality oolong tea. Because of high consumer demand, some tea manufacturers mix lower quality leaves with genuine Taiwan oolong tea in order to increase profits. Robust scientific methods are, therefore, needed to verify the origin and quality of tea leaves. In this study, we investigated whether two-dimensional gel electrophoresis (2-DE) and nanoscale liquid chromatography/tandem mass spectrometry (nano-LC/MS/MS) coupled with a two-layer feature selection mechanism comprising information gain attribute evaluation (IGAE) and support vector machine feature selection (SVM-FS) are useful in identifying characteristic proteins that can be used as markers of the original source of oolong tea. Samples in this study included oolong tea leaves from 23 different sources. We found that our method had an accuracy of 95.5% in correctly identifying the origin of the leaves. Overall, our method is a novel approach for determining the origin of oolong tea leaves.

  1. Intelligent feature selection techniques for pattern classification of Lamb wave signals

    Energy Technology Data Exchange (ETDEWEB)

    Hinders, Mark K.; Miller, Corey A. [College of William and Mary, Department of Applied Science, Williamsburg, Virginia 23187-8795 (United States)

    2014-02-18

    Lamb wave interaction with flaws is a complex, three-dimensional phenomenon, which often frustrates signal interpretation schemes based on mode arrival time shifts predicted by dispersion curves. As the flaw severity increases, scattering and mode conversion effects will often dominate the time-domain signals, obscuring available information about flaws because multiple modes may arrive on top of each other. Even for idealized flaw geometries the scattering and mode conversion behavior of Lamb waves is very complex. Here, multi-mode Lamb waves in a metal plate are propagated across a rectangular flat-bottom hole in a sequence of pitch-catch measurements corresponding to the double crosshole tomography geometry. The flaw is sequentially deepened, with the Lamb wave measurements repeated at each flaw depth. Lamb wave tomography reconstructions are used to identify which waveforms have interacted with the flaw and thereby carry information about its depth. Multiple features are extracted from each of the Lamb wave signals using wavelets, which are then fed to statistical pattern classification algorithms that identify flaw severity. In order to achieve the highest classification accuracy, an optimal feature space is required, but it is never known a priori which features will be best. For structural health monitoring we make use of the fact that physical flaws, such as corrosion, will only increase over time. This allows us to identify feature vectors which are topologically well-behaved by requiring that sequential classes “line up” in feature vector space. An intelligent feature selection routine is illustrated that identifies favorable class distributions in multi-dimensional feature spaces using computational homology theory. Betti numbers and formal classification accuracies are calculated for each feature space subset to establish a correlation between the topology of the class distribution and the corresponding classification accuracy.

  2. Effective feature selection of clinical and genetic to predict warfarin dose using artificial neural network

    Directory of Open Access Journals (Sweden)

    Mohammad Karim Sohrabi

    2016-03-01

    Full Text Available Background: Warfarin is one of the most common oral anticoagulants, whose role is to prevent clots. The dose of this medicine is very important because changes can be dangerous for patients. Dose determination is difficult for physicians because both increasing and decreasing the warfarin dose can be dangerous for patients. Identifying the clinical and genetic features involved in determining the dose could be useful for prediction using data mining techniques. The aim of this paper is to provide a convenient way to select the clinical and genetic features that determine the dose of warfarin using artificial neural networks (ANN) and to evaluate it in order to predict the patients' dose. Methods: This experimental study was conducted from April to May 2014 on 552 patients at Tehran Heart Center Hospital (THC) who were candidates for warfarin anticoagulant therapy within the international normalized ratio (INR) therapeutic target. Factors affecting the dose, comprising clinical and genetic characteristics, were extracted, and different feature selection methods based on a genetic algorithm and particle swarm optimization (PSO), with a neural network as the evaluation function, were implemented in MATLAB (MathWorks, MA, USA). Results: Among the algorithms used, the particle swarm optimization algorithm was the most accurate: the mean square error (MSE), root mean square error (RMSE) and mean absolute error (MAE) were 0.0262, 0.1621 and 0.1164, respectively. Conclusion: In this article, the most important characteristics were identified using feature selection methods, and the stable dose was predicted based on artificial neural networks. The output is acceptable, and with fewer features it is possible to predict the warfarin dose accurately. Since the prescribed dose is important for patients, the output of the obtained model can be used as a decision support system.

  3. Supervised and Unsupervised Classification for Pattern Recognition Purposes

    Directory of Open Access Journals (Sweden)

    Catalina COCIANU

    2006-01-01

    Full Text Available A cluster analysis task has to identify the grouping trends of data, to decide on the sound clusters and to somehow validate the resulting structure. The identification of the grouping tendency existing in a data collection assumes the selection of a framework stated in terms of a mathematical model allowing one to express the degree of similarity between pairs of particular objects, together with quasi-metrics expressing the similarity between an object and a cluster and between clusters, respectively. In supervised classification, we are provided with a collection of preclassified patterns, and the problem is to label a newly encountered pattern. Typically, the given training patterns are used to learn the descriptions of classes, which in turn are used to label a new pattern. The final section of the paper presents a new methodology for supervised learning based on PCA. The classes are represented in the measurement/feature space by continuous repartitions

  4. A New Method for Solving Supervised Data Classification Problems

    Directory of Open Access Journals (Sweden)

    Parvaneh Shabanzadeh

    2014-01-01

    Full Text Available Supervised data classification is one of the techniques used to extract nontrivial information from data. Classification is a widely used technique in various fields, including data mining, industry, medicine, science, and law. This paper considers a new algorithm for supervised data classification problems associated with cluster analysis. The mathematical formulation of this algorithm is based on nonsmooth, nonconvex optimization. A new algorithm for solving this optimization problem is utilized. The new algorithm uses a derivative-free technique, with robustness and efficiency. To improve classification performance and efficiency in generating the classification model, a new feature selection algorithm based on techniques of convex programming is suggested. The proposed methods are tested on real-world datasets. Results of numerical experiments are presented, demonstrating the effectiveness of the proposed algorithms.

  5. Explore Interregional EEG Correlations Changed by Sport Training Using Feature Selection

    OpenAIRE

    2016-01-01

    This paper investigated the interregional correlations changed by sport training through electroencephalography (EEG) signals, using the techniques of classification and feature selection. The EEG data are obtained from students with long-term professional sport training and from normal students without sport training as a baseline. Every channel of the 19-channel EEG signals is considered as a node in the brain network, and Pearson correlation coefficients are calculated between each pair of nodes as the...
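
    Building the interregional correlation features described above is straightforward with NumPy; the sketch below uses a synthetic 19-channel recording and keeps the upper triangle of the channel-by-channel Pearson correlation matrix as the feature vector for one subject.

```python
import numpy as np

rng = np.random.default_rng(9)
eeg = rng.normal(size=(19, 5000))   # placeholder: 19 channels x time samples for one subject

# Pearson correlation between every pair of channels; the upper-triangle entries
# form the subject's interregional connectivity feature vector.
corr = np.corrcoef(eeg)
features = corr[np.triu_indices(19, k=1)]
print("feature vector length:", features.size)   # 19 * 18 / 2 = 171
```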

  6. Applications of feature selection. [development of classification algorithms for LANDSAT data

    Science.gov (United States)

    Guseman, L. F., Jr.

    1976-01-01

    The use of satellite-acquired (LANDSAT) multispectral scanner (MSS) data to conduct an inventory of some crop of economic interest such as wheat over a large geographical area is considered in relation to the development of accurate and efficient algorithms for data classification. The dimension of the measurement space and the computational load for a classification algorithm are increased by the use of multitemporal measurements. Feature selection/combination techniques used to reduce the dimensionality of the problem are described.

  7. Feature subset selection based on mahalanobis distance: a statistical rough set method

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    In order to select effective feature subsets for pattern classification, a novel statistical rough set method is presented based on generalized attribute reduction. Unlike classical reduction approaches, the objects in the universe of discourse are signs of training sample sets, and the values of attributes are taken as statistical parameters. The binary relation and discernibility matrix for the reduction are induced by a distance function. Furthermore, based on the monotonicity of the distance function defined by Mahalan...
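
    The Mahalanobis distance that the reduction criterion above builds on is computed from the inverse covariance of the samples; a minimal NumPy sketch with placeholder data:

```python
import numpy as np

rng = np.random.default_rng(10)
X = rng.normal(size=(200, 4))                     # placeholder samples of one class
mean = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))  # inverse class covariance

def mahalanobis(x):
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

print("distance of first sample: %.3f" % mahalanobis(X[0]))
```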

  8. A combinational feature selection and ensemble neural network method for classification of gene expression data

    Directory of Open Access Journals (Sweden)

    Jiang Tianzi

    2004-09-01

    Full Text Available Abstract Background Microarray experiments are becoming a powerful tool for clinical diagnosis, as they have the potential to discover gene expression patterns that are characteristic of a particular disease. To date, this problem has received most attention in the context of cancer research, especially in tumor classification. Various feature selection methods and classifier design strategies have also been used and compared. However, most published articles on tumor classification have applied a certain technique to a certain dataset, and recently several researchers compared these techniques based on several public datasets. But it has been verified that differently selected features reflect different aspects of the dataset, and some selected features can obtain better solutions on certain problems. At the same time, faced with a large amount of microarray data with little knowledge, it is difficult to find the intrinsic characteristics using traditional methods. In this paper, we attempt to introduce a combinational feature selection method in conjunction with ensemble neural networks to generally improve the accuracy and robustness of sample classification. Results We validate our new method on several recent publicly available datasets, both in terms of predictive accuracy on testing samples and through cross-validation. Compared with the best performance of other current methods, remarkably improved results can be obtained using our new strategy on a wide range of different datasets. Conclusions Thus, we conclude that our method can extract more information from microarray data to obtain more accurate classification, and can also help to extract the latent marker genes of the diseases for better diagnosis and treatment.

  9. Feature selection of seismic waveforms for long period event detection at Cotopaxi Volcano

    Science.gov (United States)

    Lara-Cueva, R. A.; Benítez, D. S.; Carrera, E. V.; Ruiz, M.; Rojo-Álvarez, J. L.

    2016-04-01

    Volcano Early Warning Systems (VEWS) have become a research topic in order to preserve human lives and prevent material losses. In this setting, event detection criteria based on classification using machine learning techniques have proven useful, and a number of systems have been proposed in the literature. However, to the best of our knowledge, no comprehensive and principled study has been conducted to compare the influence of the many different sets of possible features that have been used as input spaces in previous works. We present an automatic recognition system for volcano seismicity, considering feature extraction, event classification, and subsequent event detection, in order to reduce the processing time as a first step towards a highly reliable automatic detection system in real time. We compiled and extracted a comprehensive set of temporal, moving average, spectral, and scale-domain features for separating long period seismic events from background noise. We benchmarked two usual kinds of feature selection techniques, namely filter (mutual information and statistical dependence) and embedded (cross-validation and pruning), each of them using suitable classification algorithms such as k-Nearest Neighbors (k-NN) and Decision Trees (DT). We applied this approach to the seismicity observed at Cotopaxi Volcano in Ecuador during 2009 and 2010. The best results were obtained by using a 15 s segmentation window, a feature matrix in the frequency domain, and a DT classifier, yielding 99% detection accuracy and sensitivity. The selected features and their interpretation were consistent among different input spaces, in simple terms of amplitude and spectral content. Our study provides the framework for an event detection system with high accuracy and reduced computational requirements.

  10. Microcanonical Annealing and Threshold Accepting for Parameter Determination and Feature Selection of Support Vector Machines

    Directory of Open Access Journals (Sweden)

    Seyyid Ahmed Medjahed

    2016-12-01

    Full Text Available Support vector machine (SVM) is a popular classification technique with many diverse applications. Parameter determination and feature selection significantly influence the classification accuracy rate and the quality of the SVM model. This paper proposes two novel approaches, based on Microcanonical Annealing (MA-SVM) and Threshold Accepting (TA-SVM), to determine the optimal parameter values and the relevant feature subset without reducing SVM classification accuracy. In order to evaluate the performance of MA-SVM and TA-SVM, several public datasets are employed to compute the classification accuracy rate. The proposed approaches were tested in the context of medical diagnosis. We also tested the approaches on DNA microarray datasets used for cancer diagnosis. The results obtained by the MA-SVM and TA-SVM algorithms are shown to be superior, giving good performance on the DNA microarray datasets, which are characterized by a large number of features. Therefore, the MA-SVM and TA-SVM approaches are well suited for parameter determination and feature selection in SVM.

  11. An Enhanced Grey Wolf Optimization Based Feature Selection Wrapped Kernel Extreme Learning Machine for Medical Diagnosis

    Science.gov (United States)

    Li, Qiang; Zhao, Xuehua; Cai, ZhenNao; Tong, Changfei; Liu, Wenbin; Tian, Xin

    2017-01-01

    In this study, a new predictive framework is proposed by integrating an improved grey wolf optimization (IGWO) and a kernel extreme learning machine (KELM), termed IGWO-KELM, for medical diagnosis. The proposed IGWO feature selection approach is used for the purpose of finding the optimal feature subset for medical data. In the proposed approach, a genetic algorithm (GA) was first adopted to generate diversified initial positions, and then grey wolf optimization (GWO) was used to update the current positions of the population in the discrete search space, thus obtaining the optimal feature subset for better classification based on KELM. The proposed approach is compared against the original GA and GWO on two common disease diagnosis problems in terms of a set of performance metrics, including classification accuracy, sensitivity, specificity, precision, G-mean, F-measure, and the size of selected features. The simulation results have proven the superiority of the proposed method over the other two competitive counterparts. PMID:28246543

  12. Self-adaptive MOEA feature selection for classification of bankruptcy prediction data.

    Science.gov (United States)

    Gaspar-Cunha, A; Recio, G; Costa, L; Estébanez, C

    2014-01-01

    Bankruptcy prediction is a vast area of finance and accounting whose importance lies in its relevance for creditors and investors in evaluating the likelihood of a company going bankrupt. As companies become complex, they develop sophisticated schemes to hide their real situation. In turn, estimating the credit risks associated with counterparts or predicting bankruptcy becomes harder. Evolutionary algorithms have been shown to be an excellent tool for dealing with complex problems in finance and economics where a large number of irrelevant features are involved. This paper provides a methodology for feature selection in the classification of bankruptcy data sets using an evolutionary multiobjective approach that simultaneously minimises the number of features and maximises the classifier quality measure (e.g., accuracy). The proposed methodology makes use of self-adaptation by applying the feature selection algorithm while simultaneously optimising the parameters of the classifier used. The methodology was applied to four different sets of data. The obtained results showed the utility of the self-adaptation of the classifier.

  13. Oxygen Saturation and RR Intervals Feature Selection for Sleep Apnea Detection

    Directory of Open Access Journals (Sweden)

    Antonio G. Ravelo-García

    2015-05-01

    Full Text Available A diagnostic system for sleep apnea based on oxygen saturation and RR intervals obtained from the EKG (electrocardiogram) is proposed, with the goal of detecting and quantifying minute-long segments of sleep with breathing pauses. We measured the discriminative capacity of combinations of features obtained from the RR series and oximetry to evaluate improvements in performance compared to oximetry-based features alone. Time and frequency domain variables derived from oxygen saturation (SpO2) as well as linear and non-linear variables describing the RR series have been explored in recordings from 70 patients with suspected sleep apnea. We applied forward feature selection in order to select a minimal set of variables that are able to locate patterns indicating respiratory pauses. Linear discriminant analysis (LDA) was used to classify the presence of apnea during specific segments. The system finally provides a global score indicating the presence of clinically significant apnea, integrating the segment-based apnea detection. LDA results in an accuracy of 87%, sensitivity of 76% and specificity of 91% (AUC = 0.90), with a global classification rate of 97% when only oxygen saturation is used. When features from the RR series are additionally included, the system performance improves to an accuracy of 87%, sensitivity of 73% and specificity of 92% (AUC = 0.92), with a global classification rate of 100%.
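
    The forward-selection-plus-LDA pipeline described above can be sketched with scikit-learn as follows; the per-minute SpO2 and RR-interval feature names are invented placeholders, not the study's actual variables.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(11)
names = ["spo2_mean", "spo2_delta_index", "spo2_below_90", "rr_sdnn", "rr_rmssd", "rr_lf_hf"]
X = rng.normal(size=(400, len(names)))   # placeholder per-minute features
y = rng.integers(0, 2, size=400)         # 1 = apneic minute, 0 = normal minute

# Greedy forward selection of a small variable set, scored with cross-validated LDA.
lda = LinearDiscriminantAnalysis()
sfs = SequentialFeatureSelector(lda, n_features_to_select=3, direction="forward", cv=5).fit(X, y)

chosen = [n for n, keep in zip(names, sfs.get_support()) if keep]
acc = cross_val_score(lda, sfs.transform(X), y, cv=5).mean()
print("forward-selected features:", chosen, " CV accuracy: %.3f" % acc)
```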

  14. Feature selection and classification of multiparametric medical images using bagging and SVM

    Science.gov (United States)

    Fan, Yong; Resnick, Susan M.; Davatzikos, Christos

    2008-03-01

    This paper presents a framework for brain classification based on multi-parametric medical images. This method takes advantage of multi-parametric imaging to provide a set of discriminative features for classifier construction by using a regional feature extraction method which takes into account joint correlations among different image parameters; in the experiments herein, MRI and PET images of the brain are used. Support vector machine classifiers are then trained based on the most discriminative features selected from the feature set. To facilitate robust classification and optimal selection of parameters involved in classification, in view of the well-known "curse of dimensionality", base classifiers are constructed in a bagging (bootstrap aggregating) framework for building an ensemble classifier and the classification parameters of these base classifiers are optimized by means of maximizing the area under the ROC (receiver operating characteristic) curve estimated from their prediction performance on left-out samples of bootstrap sampling. This classification system is tested on a sex classification problem, where it yields over 90% classification rates for unseen subjects. The proposed classification method is also compared with other commonly used classification algorithms, with favorable results. These results illustrate that the methods built upon information jointly extracted from multi-parametric images have the potential to perform individual classification with high sensitivity and specificity.
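
    The bootstrap-aggregating framework described above (an ensemble of SVM base classifiers, each trained on a bootstrap sample) can be sketched with scikit-learn's BaggingClassifier; the regional imaging features and labels below are synthetic placeholders.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(12)
X = rng.normal(size=(200, 50))     # placeholder regional MRI/PET features
y = rng.integers(0, 2, size=200)   # e.g., the sex-classification test problem

# Bagging: each base SVM is trained on a bootstrap sample of the subjects,
# and the ensemble prediction aggregates the base classifiers' votes.
base = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
bagged = BaggingClassifier(base, n_estimators=25, max_samples=0.8, random_state=0)
print("CV accuracy: %.3f" % cross_val_score(bagged, X, y, cv=5).mean())
```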

  15. On a Variational Model for Selective Image Segmentation of Features with Infinite Perimeter

    Institute of Scientific and Technical Information of China (English)

    Lavdie RADA; Ke CHEN

    2013-01-01

    Variational models provide a reliable formulation for segmentation of features and their boundaries in an image, following the seminal work of Mumford-Shah (1989, Commun. Pure Appl. Math.) on dividing a general surface into piecewise smooth sub-surfaces. A central idea of models based on this work is to minimize the length of the features' boundaries (i.e., the H^1 Hausdorff measure). However, there exist problems with irregular and oscillatory object boundaries, where minimizing such a length is not appropriate, as noted by Barchiesi et al. (2010, SIAM J. Multiscale Model. Simul.), who proposed to minimize the L^2 Lebesgue measure of the γ-neighborhood of the boundaries. This paper presents a dual level set selective segmentation model based on Barchiesi et al. (2010) to automatically select a local feature instead of all global features. Our model uses two level set functions: a global level set which segments all boundaries, and a local level set which evolves and finds the boundary of the object closest to the geometric constraints. Using real-life images with oscillatory boundaries, we show qualitative results demonstrating the effectiveness of the proposed method.

  16. Self-Adaptive MOEA Feature Selection for Classification of Bankruptcy Prediction Data

    Directory of Open Access Journals (Sweden)

    A. Gaspar-Cunha

    2014-01-01

    Full Text Available Bankruptcy prediction is a vast area of finance and accounting whose importance lies in its relevance to creditors and investors when evaluating the likelihood that a company will go bankrupt. As companies become more complex, they develop sophisticated schemes to hide their real situation. In turn, estimating the credit risk associated with counterparts or predicting bankruptcy becomes harder. Evolutionary algorithms have been shown to be an excellent tool for dealing with complex problems in finance and economics where a large number of irrelevant features are involved. This paper provides a methodology for feature selection in the classification of bankruptcy data sets using an evolutionary multiobjective approach that simultaneously minimises the number of features and maximises the classifier quality measure (e.g., accuracy). The proposed methodology makes use of self-adaptation by applying the feature selection algorithm while simultaneously optimising the parameters of the classifier used. The methodology was applied to four different sets of data. The obtained results showed the utility of the self-adaptation of the classifier.
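
    A toy sketch of the two-objective trade-off described above (fewer features versus higher classifier quality), assuming scikit-learn. Random feature masks stand in for the evolutionary population and the Pareto-optimal masks are kept; the self-adaptive MOEA and simultaneous classifier-parameter tuning of the paper are not reproduced here.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      X, y = make_classification(n_samples=400, n_features=30, n_informative=8, random_state=0)

      def objectives(mask):
          """Return (number of features, 1 - CV accuracy); both are to be minimised."""
          if not mask.any():
              return (X.shape[1] + 1, 1.0)
          acc = cross_val_score(LogisticRegression(max_iter=1000), X[:, mask], y, cv=5).mean()
          return (int(mask.sum()), 1.0 - acc)

      # Evaluate random feature masks and keep the non-dominated (Pareto) ones.
      candidates = [rng.random(X.shape[1]) < rng.uniform(0.1, 0.6) for _ in range(60)]
      scored = [(m, objectives(m)) for m in candidates]
      pareto = [
          (m, f) for m, f in scored
          if not any(g[0] <= f[0] and g[1] <= f[1] and g != f for _, g in scored)
      ]
      for m, (k, err) in sorted(pareto, key=lambda t: t[1][0]):
          print(f"{k:2d} features  accuracy={1 - err:.3f}")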

  17. Comparison of Genetic Algorithm, Particle Swarm Optimization and Biogeography-based Optimization for Feature Selection to Classify Clusters of Microcalcifications

    Science.gov (United States)

    Khehra, Baljit Singh; Pharwaha, Amar Partap Singh

    2016-06-01

    Ductal carcinoma in situ (DCIS) is one type of breast cancer. Clusters of microcalcifications (MCCs) are symptoms of DCIS that are recognized by mammography. Selection of a robust feature vector is the process of selecting an optimal subset of features from a large number of available features in a given problem domain, after feature extraction and before any classification scheme. Feature selection reduces the feature space, which improves the performance of the classifier and decreases the computational burden imposed on the classifier by using many features. Selection of an optimal subset of features from a large number of available features in a given problem domain is a difficult search problem. For n features, the total number of possible feature subsets is 2^n. Thus, selecting an optimal subset of features belongs to the category of NP-hard problems. In this paper, an attempt is made to find the optimal subset of MCC features from all possible subsets of features using a genetic algorithm (GA), particle swarm optimization (PSO) and biogeography-based optimization (BBO). For simulation, a total of 380 benign and malignant MCC samples have been selected from mammogram images of the DDSM database. A total of 50 features extracted from benign and malignant MCC samples are used in this study. In these algorithms, the fitness function is the correct classification rate of the classifier. A support vector machine is used as the classifier. From the experimental results, it is observed that the performance of the PSO-based and BBO-based algorithms in selecting an optimal subset of features for classifying MCCs as benign or malignant is better than that of the GA-based algorithm.
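
    A simplified sketch of one of the wrapper searches described above: a binary particle swarm optimization over feature masks with an SVM classification rate as fitness, assuming scikit-learn and synthetic stand-in data (50 features, 380 samples) rather than the DDSM-derived MCC features. The GA and BBO variants and the exact PSO settings of the study are not reproduced.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      rng = np.random.default_rng(1)
      # Stand-in for the 50 features of the 380 benign/malignant MCC samples.
      X, y = make_classification(n_samples=380, n_features=50, n_informative=12, random_state=1)

      def fitness(mask):
          # Fitness = correct classification rate of an SVM on the selected feature subset.
          if not mask.any():
              return 0.0
          return cross_val_score(SVC(kernel="rbf"), X[:, mask], y, cv=3).mean()

      n_particles, n_iter, dim = 20, 15, X.shape[1]
      pos = rng.random((n_particles, dim)) < 0.5              # binary positions = feature masks
      vel = rng.normal(scale=0.1, size=(n_particles, dim))
      pbest = pos.copy()
      pbest_fit = np.array([fitness(p) for p in pos])
      gbest = pbest[pbest_fit.argmax()].copy()

      for _ in range(n_iter):
          r1, r2 = rng.random((2, n_particles, dim))
          vel = (0.7 * vel
                 + 1.5 * r1 * (pbest.astype(float) - pos)
                 + 1.5 * r2 * (gbest.astype(float) - pos))
          pos = rng.random((n_particles, dim)) < 1.0 / (1.0 + np.exp(-vel))   # sigmoid transfer
          fit = np.array([fitness(p) for p in pos])
          improved = fit > pbest_fit
          pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
          gbest = pbest[pbest_fit.argmax()].copy()

      print("features selected:", int(gbest.sum()), "best accuracy:", round(pbest_fit.max(), 3))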

  18. Unbiased Prediction and Feature Selection in High-Dimensional Survival Regression

    Science.gov (United States)

    Laimighofer, Michael; Krumsiek, Jan; Theis, Fabian J.

    2016-01-01

    Abstract With widespread availability of omics profiling techniques, the analysis and interpretation of high-dimensional omics data, for example, for biomarkers, is becoming an increasingly important part of clinical medicine because such datasets constitute a promising resource for predicting survival outcomes. However, early experience has shown that biomarkers often generalize poorly. Thus, it is crucial that models are not overfitted and give accurate results with new data. In addition, reliable detection of multivariate biomarkers with high predictive power (feature selection) is of particular interest in clinical settings. We present an approach that addresses both aspects in high-dimensional survival models. Within a nested cross-validation (CV), we fit a survival model, evaluate a dataset in an unbiased fashion, and select features with the best predictive power by applying a weighted combination of CV runs. We evaluate our approach using simulated toy data, as well as three breast cancer datasets, to predict the survival of breast cancer patients after treatment. In all datasets, we achieve more reliable estimation of predictive power for unseen cases and better predictive performance compared to the standard CoxLasso model. Taken together, we present a comprehensive and flexible framework for survival models, including performance estimation, final feature selection, and final model construction. The proposed algorithm is implemented in an open source R package (SurvRank) available on CRAN. PMID:26894327

  19. Advances in feature selection methods for hyperspectral image processing in food industry applications: a review.

    Science.gov (United States)

    Dai, Qiong; Cheng, Jun-Hu; Sun, Da-Wen; Zeng, Xin-An

    2015-01-01

    There is an increased interest in the applications of hyperspectral imaging (HSI) for assessing food quality, safety, and authenticity. HSI provides an abundance of spatial and spectral information from foods by combining both spectroscopy and imaging, resulting in hundreds of contiguous wavebands for each spatial position of food samples; this high dimensionality is often referred to as the curse of dimensionality. It is desirable to employ feature selection algorithms for decreasing the computational burden and increasing the prediction accuracy, which is especially relevant in the development of online applications. Recently, a variety of feature selection algorithms have been proposed that can be categorized into three groups based on the searching strategy, namely complete search, heuristic search and random search. This review mainly introduces the fundamentals of each algorithm, illustrates its applications in hyperspectral data analysis in the food field, and discusses the advantages and disadvantages of these algorithms. It is hoped that this review will provide a guideline for feature selection and data processing in the future development of hyperspectral imaging techniques in foods.

  20. Kernel-based Joint Feature Selection and Max-Margin Classification for Early Diagnosis of Parkinson’s Disease

    Science.gov (United States)

    Adeli, Ehsan; Wu, Guorong; Saghafi, Behrouz; An, Le; Shi, Feng; Shen, Dinggang

    2017-01-01

    Feature selection methods usually select the most compact and relevant set of features based on their contribution to a linear regression model. Thus, these features might not be the best for a non-linear classifier. This is especially crucial for tasks in which performance is heavily dependent on the feature selection technique, such as the diagnosis of neurodegenerative diseases. Parkinson's disease (PD) is one of the most common neurodegenerative disorders, which progresses slowly while dramatically affecting quality of life. In this paper, we use multi-modal neuroimaging data to diagnose PD by investigating the brain regions known to be affected at the early stages. We propose a joint kernel-based feature selection and classification framework. Unlike conventional feature selection techniques that select features based on their performance in the original input feature space, we select features that best benefit the classification scheme in the kernel space. We further propose kernel functions specifically designed for our non-negative feature types. We use MRI and SPECT data of 538 subjects from the PPMI database, and obtain a diagnosis accuracy of 97.5%, which outperforms all baseline and state-of-the-art methods.

  1. Kernel-based Joint Feature Selection and Max-Margin Classification for Early Diagnosis of Parkinson’s Disease

    Science.gov (United States)

    Adeli, Ehsan; Wu, Guorong; Saghafi, Behrouz; An, Le; Shi, Feng; Shen, Dinggang

    2017-01-01

    Feature selection methods usually select the most compact and relevant set of features based on their contribution to a linear regression model. Thus, these features might not be the best for a non-linear classifier. This is especially crucial for tasks in which performance is heavily dependent on the feature selection technique, such as the diagnosis of neurodegenerative diseases. Parkinson's disease (PD) is one of the most common neurodegenerative disorders, which progresses slowly while dramatically affecting quality of life. In this paper, we use multi-modal neuroimaging data to diagnose PD by investigating the brain regions known to be affected at the early stages. We propose a joint kernel-based feature selection and classification framework. Unlike conventional feature selection techniques that select features based on their performance in the original input feature space, we select features that best benefit the classification scheme in the kernel space. We further propose kernel functions specifically designed for our non-negative feature types. We use MRI and SPECT data of 538 subjects from the PPMI database, and obtain a diagnosis accuracy of 97.5%, which outperforms all baseline and state-of-the-art methods. PMID:28120883
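
    To illustrate the general idea of scoring features in a kernel space rather than via a linear model, the sketch below ranks each feature by the alignment between its single-feature RBF kernel and an ideal label kernel. This kernel-target-alignment scoring is a simple stand-in, not the joint max-margin feature selection objective of the paper, and the data are synthetic.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.metrics.pairwise import rbf_kernel

      X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)

      # Ideal "target" kernel: +1 for same-class pairs, -1 otherwise.
      yy = np.where(y[:, None] == y[None, :], 1.0, -1.0)

      def alignment(K, T):
          """Alignment between a (centered) feature kernel K and the label kernel T."""
          K = K - K.mean(0) - K.mean(1)[:, None] + K.mean()   # center the feature kernel
          return (K * T).sum() / (np.linalg.norm(K) * np.linalg.norm(T))

      # Score each feature by the alignment of its single-feature RBF kernel with the labels.
      scores = [alignment(rbf_kernel(X[:, [j]]), yy) for j in range(X.shape[1])]
      ranking = np.argsort(scores)[::-1]
      print("top features by kernel-target alignment:", ranking[:5])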

  2. Ischemia episode detection in ECG using kernel density estimation, support vector machine and feature selection

    Directory of Open Access Journals (Sweden)

    Park Jinho

    2012-06-01

    Full Text Available Background: Myocardial ischemia can develop into more serious diseases. Detecting the ischemic syndrome in the electrocardiogram (ECG) more accurately and automatically at an early stage can prevent it from developing into a catastrophic disease. To this end, we propose a new method, which employs wavelets and simple feature selection. Methods: For training and testing, the European ST-T database is used, which comprises 367 ischemic ST episodes in 90 records. We first remove baseline wandering, and detect the time positions of QRS complexes by a method based on the discrete wavelet transform. Next, for each heart beat, we extract three features which can be used for differentiating ST episodes from normal ones: (1) the area between the QRS offset and T-peak points, (2) the normalized and signed sum from the QRS offset to the effective zero voltage point, and (3) the slope from the QRS onset to the offset point. We average the feature values over five successive beats to reduce the effect of outliers. Finally we apply classifiers to those features. Results: We evaluated the algorithm with kernel density estimation (KDE) and support vector machine (SVM) methods. Sensitivity and specificity for KDE were 0.939 and 0.912, respectively. The KDE classifier detects 349 ischemic ST episodes out of the total 367 ST episodes. Sensitivity and specificity of SVM were 0.941 and 0.923, respectively. The SVM classifier detects 355 ischemic ST episodes. Conclusions: We proposed a new method for detecting ischemia in ECG. It contains signal processing techniques for removing baseline wandering and detecting the time positions of QRS complexes by the discrete wavelet transform, and explicit feature extraction from the morphology of ECG waveforms. It was shown that the number of selected features was sufficient to discriminate ischemic ST episodes from normal ones. We also showed how the proposed KDE classifier can automatically select kernel bandwidths, meaning that the algorithm does not require any numerical

  3. Mutual information-based feature selection for low-cost BCIs based on motor imagery.

    Science.gov (United States)

    Schiatti, L; Faes, L; Tessadori, J; Barresi, G; Mattos, L

    2016-08-01

    In the present study a feature selection algorithm based on mutual information (MI) was applied to electroencephalographic (EEG) data acquired during three different motor imagery tasks from two datasets: Dataset I from BCI Competition IV, including full-scalp recordings from four subjects, and new data recorded from three subjects using the popular low-cost Emotiv EPOC EEG headset. The aim was to evaluate optimal channels and band-power (BP) features for motor imagery task discrimination, in order to assess the feasibility of a portable, low-cost, motor imagery-based Brain-Computer Interface (BCI) system. The minimal subset of features most relevant to the task description and least redundant to each other was determined, and the corresponding classification accuracy was assessed offline employing a linear support vector machine (SVM) in a 10-fold cross-validation scheme. The analysis was performed: (a) on the original full Dataset I from BCI Competition IV, (b) on a restricted channel set from Dataset I corresponding to the available Emotiv EPOC electrode locations, and (c) on data recorded with the EPOC system. Results from (a) showed that an offline classification accuracy above 80% can be reached using only 5 features. Limiting the analysis to EPOC channels caused a decrease in classification accuracy, although it still remained above chance level, both for data from (b) and (c). A top accuracy of 70% was achieved using 2 optimal features. These results encourage further research towards the development of portable low-cost motor imagery-based BCI systems.
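
    A minimal sketch of the "relevant but non-redundant" selection idea described above, assuming scikit-learn: features are ranked by mutual information with the task label, a small subset is picked greedily while penalizing redundancy (here approximated by absolute correlation rather than pairwise MI), and the subset is scored with a linear SVM under 10-fold cross-validation. The band-power data are synthetic stand-ins, not the competition or EPOC recordings.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.feature_selection import mutual_info_classif
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import LinearSVC

      # Stand-in for band-power features (rows = trials, columns = channel/band pairs).
      X, y = make_classification(n_samples=240, n_features=56, n_informative=6, random_state=0)

      relevance = mutual_info_classif(X, y, random_state=0)     # MI(feature; task label)
      selected, remaining = [], list(range(X.shape[1]))

      for _ in range(5):                                        # pick a small feature subset
          def score(j):
              if not selected:
                  return relevance[j]
              # Penalize redundancy: mean absolute correlation with already chosen features.
              red = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) for s in selected])
              return relevance[j] - red
          best = max(remaining, key=score)
          selected.append(best)
          remaining.remove(best)

      acc = cross_val_score(LinearSVC(max_iter=5000), X[:, selected], y, cv=10).mean()
      print("selected:", selected, "10-fold accuracy:", round(acc, 3))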

  4. Classifying human voices by using hybrid SFX time-series preprocessing and ensemble feature selection.

    Science.gov (United States)

    Fong, Simon; Lan, Kun; Wong, Raymond

    2013-01-01

    Voice biometrics is based on a physiological characteristic: each person's voice is distinct. Due to this uniqueness, voice classification has found useful applications in classifying speakers' gender, mother tongue or ethnicity (accent), emotional state, identity verification, verbal command control, and so forth. In this paper, we adopt a new preprocessing method named Statistical Feature Extraction (SFX) for extracting important features for training a classification model, based on piecewise transformation treating an audio waveform as a time series. Using SFX we can faithfully remodel the statistical characteristics of the time series; together with spectral analysis, a substantial number of features are extracted in combination. An ensemble is utilized in selecting only the influential features to be used in classification model induction. We focus on comparing the effects of various popular data mining algorithms on multiple datasets. Our experiment consists of classification tests over four typical categories of human voice data, namely Female and Male, Emotional Speech, Speaker Identification, and Language Recognition. The experiments yield encouraging results supporting the fact that heuristically choosing significant features from both time and frequency domains indeed produces better performance in voice classification than traditional signal processing techniques alone, such as wavelets and LPC-to-CC.

  5. Suitable features selection for monitoring thermal condition of electrical equipment using infrared thermography

    Science.gov (United States)

    Huda, A. S. N.; Taib, S.

    2013-11-01

    Monitoring the thermal condition of electrical equipment is necessary for maintaining the reliability of an electrical system. The degradation of electrical equipment can cause excessive overheating, which can lead to the eventual failure of the equipment. Additionally, equipment failure requires a lot of maintenance cost and manpower and can also be catastrophic, causing injuries or even deaths. Therefore, the process of recognizing equipment conditions as normal or defective is an essential step towards maintaining the reliability and stability of the system. The study introduces infrared thermography based condition monitoring of electrical equipment. Manual analysis of thermal images for detecting defects and classifying the status of equipment takes a lot of time and effort and can also lead to incorrect diagnosis results. An intelligent system that can classify the equipment conditions automatically could help to overcome these problems. This paper discusses an intelligent classification system for the conditions of equipment using neural networks. Three sets of features, namely first-order histogram-based statistical features, grey-level co-occurrence matrix features, and component-based intensity features, are extracted by image analysis and used as input data for the neural networks. The multilayered perceptron networks are trained using four different training algorithms, namely resilient backpropagation, Bayesian regularization, Levenberg-Marquardt and scaled conjugate gradient. The experimental results show that the component-based intensity features perform better than the other two sets of features. Finally, after selecting the best features, the multilayered perceptron network trained using the Levenberg-Marquardt algorithm achieved the best results in classifying the conditions of electrical equipment.

  6. Identification of bacteriophage virion proteins by the ANOVA feature selection and analysis.

    Science.gov (United States)

    Ding, Hui; Feng, Peng-Mian; Chen, Wei; Lin, Hao

    2014-08-01

    The bacteriophage virion proteins play extremely important roles in the fate of host bacterial cells. Accurate identification of bacteriophage virion proteins is very important for understanding their functions and clarifying the lysis mechanism of bacterial cells. In this study, a new sequence-based method was developed to identify phage virion proteins. In the new method, the protein sequences were initially formulated by the g-gap dipeptide compositions. Subsequently, the analysis of variance (ANOVA) with incremental feature selection (IFS) was used to search for the optimal feature set. It was observed that, in jackknife cross-validation, the optimal feature set including 160 optimized features can produce the maximum accuracy of 85.02%. By performing feature analysis, we found that the correlation between two amino acids with one gap was more important than other correlations for phage virion protein prediction and that some of the 1-gap dipeptides were important and mainly contributed to the virion protein prediction. This analysis will provide novel insights into the function of phage virion proteins. On the basis of the proposed method, an online web-server, PVPred, was established and can be freely accessed from the website (http://lin.uestc.edu.cn/server/PVPred). We believe that the PVPred will become a powerful tool to study phage virion proteins and to guide the related experimental validations.
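
    A compact sketch of the ANOVA-plus-incremental-feature-selection (IFS) procedure described above, assuming scikit-learn: features are ranked by their ANOVA F-score and the classifier is re-evaluated on progressively larger top-k subsets to locate the best set size. Synthetic data and 5-fold cross-validation replace the paper's g-gap dipeptide compositions and jackknife test.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.feature_selection import f_classif
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      # Stand-in for g-gap dipeptide composition vectors (hundreds of dimensions).
      X, y = make_classification(n_samples=300, n_features=400, n_informative=20, random_state=0)

      F, _ = f_classif(X, y)                  # ANOVA F-score of each feature
      order = np.argsort(F)[::-1]             # rank features from most to least discriminative

      best_k, best_acc = 0, 0.0
      for k in range(10, 201, 10):            # incremental feature selection (IFS)
          acc = cross_val_score(SVC(), X[:, order[:k]], y, cv=5).mean()
          if acc > best_acc:
              best_k, best_acc = k, acc

      print(f"optimal feature set size: {best_k}, cross-validated accuracy: {best_acc:.3f}")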

  7. Genetic Fuzzy System (GFS) based wavelet co-occurrence feature selection in mammogram classification for breast cancer diagnosis

    Directory of Open Access Journals (Sweden)

    Meenakshi M. Pawar

    2016-09-01

    Full Text Available Breast cancer is a significant health problem diagnosed mostly in women worldwide. Therefore, early detection of breast cancer is performed with the help of digital mammography, which can reduce the mortality rate. This paper presents a wrapper-based feature selection approach for wavelet co-occurrence features (WCF) using a Genetic Fuzzy System (GFS) in the mammogram classification problem. The performance of the GFS algorithm is demonstrated using the mini-MIAS database. WCF features are obtained from the detail wavelet coefficients at each level of decomposition of the mammogram image. At the first level of decomposition, 18 features are applied to the GFS algorithm, which selects 5 features with an average classification success rate of 39.64%. Subsequently, at the second level it selects 9 features from 36, and the classification success rate improves to 56.75%. At the third level, 16 features are selected from 54, and the average success rate improves to 64.98%. Lastly, at the fourth level, 72 features are applied to the GFS, which selects 16 features, thereby increasing the average success rate to 89.47%. Hence, the GFS algorithm is an effective way of obtaining an optimal set of features for breast cancer diagnosis.

  8. Deep sparse multi-task learning for feature selection in Alzheimer's disease diagnosis.

    Science.gov (United States)

    Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang

    2016-06-01

    Recently, neuroimaging-based Alzheimer's disease (AD) or mild cognitive impairment (MCI) diagnosis has attracted researchers in the field, due to the increasing prevalence of the diseases. Unfortunately, the unfavorable high-dimensional nature of neuroimaging data, combined with the small number of samples available, makes it challenging to build a robust computer-aided diagnosis system. Machine learning techniques have been considered a useful tool in this respect and, among various methods, sparse regression has shown its validity in the literature. However, to the best of our knowledge, the existing sparse regression methods mostly try to select features based on the optimal regression coefficients in one step. We argue that since the training feature vectors are composed of both informative and uninformative or less informative features, the resulting optimal regression coefficients are inevitably affected by the uninformative or less informative features. To this end, we first propose a novel deep architecture to recursively discard uninformative features by performing sparse multi-task learning in a hierarchical fashion. We further hypothesize that the optimal regression coefficients reflect the relative importance of features in representing the target response variables. In this regard, we use the optimal regression coefficients learned in one hierarchy as feature weighting factors in the following hierarchy, and formulate a weighted sparse multi-task learning method. Lastly, we also take into account the distributional characteristics of samples per class and use clustering-induced subclass label vectors as target response values in our sparse regression model. In our experiments on the ADNI cohort, we performed both binary and multi-class classification tasks in AD/MCI diagnosis and showed the superiority of the proposed method by comparing it with state-of-the-art methods.
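
    The hierarchical re-weighting idea above can be illustrated very crudely with a two-pass re-weighted Lasso on a single task, assuming scikit-learn: coefficients from the first sparse fit re-weight the surviving features before a second sparse fit. This is only a sketch of the coefficient-as-weight mechanism; the deep multi-task architecture, subclass labels and ADNI data are not reproduced.

      import numpy as np
      from sklearn.datasets import make_regression
      from sklearn.linear_model import Lasso

      X, y = make_regression(n_samples=150, n_features=300, n_informative=10, noise=5.0, random_state=0)
      y = (y - y.mean()) / y.std()            # standardize the target for a stable penalty scale

      # Stage 1: ordinary sparse regression discards clearly uninformative features.
      stage1 = Lasso(alpha=0.1, max_iter=5000).fit(X, y)
      keep = np.flatnonzero(stage1.coef_ != 0)

      # Stage 2: surviving features are re-weighted by the magnitude of their
      # stage-1 coefficients before fitting the next sparse model.
      weights = np.abs(stage1.coef_[keep])
      stage2 = Lasso(alpha=0.1, max_iter=5000).fit(X[:, keep] * weights, y)
      selected = keep[stage2.coef_ != 0]

      print("stage-1 features:", keep.size, "-> stage-2 features:", selected.size)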

  9. Effective automated feature construction and selection for classification of biological sequences.

    Directory of Open Access Journals (Sweden)

    Uday Kamath

    Full Text Available Many open problems in bioinformatics involve elucidating underlying functional signals in biological sequences. DNA sequences, in particular, are characterized by rich architectures in which functional signals are increasingly found to combine local and distal interactions at the nucleotide level. Problems of interest include detection of regulatory regions, splice sites, exons, hypersensitive sites, and more. These problems naturally lend themselves to formulation as classification problems in machine learning. When classification is based on features extracted from the sequences under investigation, success is critically dependent on the chosen set of features.We present an algorithmic framework (EFFECT for automated detection of functional signals in biological sequences. We focus here on classification problems involving DNA sequences which state-of-the-art work in machine learning shows to be challenging and involve complex combinations of local and distal features. EFFECT uses a two-stage process to first construct a set of candidate sequence-based features and then select a most effective subset for the classification task at hand. Both stages make heavy use of evolutionary algorithms to efficiently guide the search towards informative features capable of discriminating between sequences that contain a particular functional signal and those that do not.To demonstrate its generality, EFFECT is applied to three separate problems of importance in DNA research: the recognition of hypersensitive sites, splice sites, and ALU sites. Comparisons with state-of-the-art algorithms show that the framework is both general and powerful. In addition, a detailed analysis of the constructed features shows that they contain valuable biological information about DNA architecture, allowing biologists and other researchers to directly inspect the features and potentially use the insights obtained to assist wet-laboratory studies on retainment or modification

  10. Evaluating EMG Feature and Classifier Selection for Application to Partial-Hand Prosthesis Control.

    Science.gov (United States)

    Adewuyi, Adenike A; Hargrove, Levi J; Kuiken, Todd A

    2016-01-01

    Pattern recognition-based myoelectric control of upper-limb prostheses has the potential to restore control of multiple degrees of freedom. Though this control method has been extensively studied in individuals with higher-level amputations, few studies have investigated its effectiveness for individuals with partial-hand amputations. Most partial-hand amputees retain a functional wrist and the ability of pattern recognition-based methods to correctly classify hand motions from different wrist positions is not well studied. In this study, focusing on partial-hand amputees, we evaluate (1) the performance of non-linear and linear pattern recognition algorithms and (2) the performance of optimal EMG feature subsets for classification of four hand motion classes in different wrist positions for 16 non-amputees and 4 amputees. Our results show that linear discriminant analysis and linear and non-linear artificial neural networks perform significantly better than the quadratic discriminant analysis for both non-amputees and partial-hand amputees. For amputees, including information from multiple wrist positions significantly decreased error (p < 0.001) but no further significant decrease in error occurred when more than 4, 2, or 3 positions were included for the extrinsic (p = 0.07), intrinsic (p = 0.06), or combined extrinsic and intrinsic muscle EMG (p = 0.08), respectively. Finally, we found that a feature set determined by selecting optimal features from each channel outperformed the commonly used time domain (p < 0.001) and time domain/autoregressive feature sets (p < 0.01). This method can be used as a screening filter to select the features from each channel that provide the best classification of hand postures across different wrist positions.

  11. Evaluating EMG Feature and Classifier Selection for Application to Partial-Hand Prosthesis Control

    Directory of Open Access Journals (Sweden)

    Adenike A. Adewuyi

    2016-10-01

    Full Text Available Pattern recognition-based myoelectric control of upper limb prostheses has the potential to restore control of multiple degrees of freedom. Though this control method has been extensively studied in individuals with higher-level amputations, few studies have investigated its effectiveness for individuals with partial-hand amputations. Most partial-hand amputees retain a functional wrist and the ability of pattern recognition-based methods to correctly classify hand motions from different wrist positions is not well studied. In this study, focusing on partial-hand amputees, we evaluate (1) the performance of non-linear and linear pattern recognition algorithms and (2) the performance of optimal EMG feature subsets for classification of four hand motion classes in different wrist positions for 16 non-amputees and 4 amputees. Our results show that linear discriminant analysis and linear and non-linear artificial neural networks perform significantly better than the quadratic discriminant analysis for both non-amputees and partial-hand amputees. For amputees, including information from multiple wrist positions significantly decreased error (p < 0.001), but no further significant decrease in error occurred when more than 4, 2, or 3 positions were included for the extrinsic (p = 0.07), intrinsic (p = 0.06), or combined extrinsic and intrinsic muscle EMG (p = 0.08), respectively. Finally, we found that a feature set determined by selecting optimal features from each channel outperformed the commonly used time domain (p < 0.001) and time domain/autoregressive feature sets (p < 0.01). This method can be used as a screening filter to select the features from each channel that provide the best classification of hand postures across different wrist positions.

  12. Evaluating EMG Feature and Classifier Selection for Application to Partial-Hand Prosthesis Control

    Science.gov (United States)

    Adewuyi, Adenike A.; Hargrove, Levi J.; Kuiken, Todd A.

    2016-01-01

    Pattern recognition-based myoelectric control of upper-limb prostheses has the potential to restore control of multiple degrees of freedom. Though this control method has been extensively studied in individuals with higher-level amputations, few studies have investigated its effectiveness for individuals with partial-hand amputations. Most partial-hand amputees retain a functional wrist and the ability of pattern recognition-based methods to correctly classify hand motions from different wrist positions is not well studied. In this study, focusing on partial-hand amputees, we evaluate (1) the performance of non-linear and linear pattern recognition algorithms and (2) the performance of optimal EMG feature subsets for classification of four hand motion classes in different wrist positions for 16 non-amputees and 4 amputees. Our results show that linear discriminant analysis and linear and non-linear artificial neural networks perform significantly better than the quadratic discriminant analysis for both non-amputees and partial-hand amputees. For amputees, including information from multiple wrist positions significantly decreased error (p < 0.001) but no further significant decrease in error occurred when more than 4, 2, or 3 positions were included for the extrinsic (p = 0.07), intrinsic (p = 0.06), or combined extrinsic and intrinsic muscle EMG (p = 0.08), respectively. Finally, we found that a feature set determined by selecting optimal features from each channel outperformed the commonly used time domain (p < 0.001) and time domain/autoregressive feature sets (p < 0.01). This method can be used as a screening filter to select the features from each channel that provide the best classification of hand postures across different wrist positions. PMID:27807418
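
    To illustrate the kind of classifier comparison reported in the study above (LDA and artificial neural networks versus QDA), the sketch below runs the three model families on synthetic multi-class data standing in for pooled EMG features, assuming scikit-learn. It does not reproduce the amputee/non-amputee protocol, the per-channel feature optimization, or the statistical tests.

      from sklearn.datasets import make_classification
      from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                                 QuadraticDiscriminantAnalysis)
      from sklearn.model_selection import cross_val_score
      from sklearn.neural_network import MLPClassifier
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      # Stand-in for EMG features from four hand-motion classes across several wrist positions.
      X, y = make_classification(n_samples=800, n_features=40, n_informative=12,
                                 n_classes=4, n_clusters_per_class=2, random_state=0)

      classifiers = {
          "LDA": LinearDiscriminantAnalysis(),
          "QDA": QuadraticDiscriminantAnalysis(),
          "ANN": make_pipeline(StandardScaler(), MLPClassifier(hidden_layer_sizes=(32,),
                                                               max_iter=2000, random_state=0)),
      }
      for name, clf in classifiers.items():
          err = 1.0 - cross_val_score(clf, X, y, cv=5).mean()
          print(f"{name}: classification error = {err:.3f}")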

  13. Genetic Particle Swarm Optimization-Based Feature Selection for Very-High-Resolution Remotely Sensed Imagery Object Change Detection.

    Science.gov (United States)

    Chen, Qiang; Chen, Yunhao; Jiang, Weiguo

    2016-07-30

    In the field of multiple-feature Object-Based Change Detection (OBCD) for very-high-resolution remotely sensed images, image objects have abundant features, and feature selection affects the precision and efficiency of OBCD. Through object-based image analysis, this paper proposes a Genetic Particle Swarm Optimization (GPSO)-based feature selection algorithm to solve the optimization problem of feature selection in multiple-feature OBCD. We select the Ratio of Mean to Variance (RMV) as the fitness function of GPSO, and apply the proposed algorithm to the object-based hybrid multivariate alternative detection model. Two experimental cases on Worldview-2/3 images confirm that GPSO can significantly improve the speed of convergence and effectively avoid the problem of premature convergence, relative to other feature selection algorithms. According to the accuracy evaluation of OBCD, GPSO is superior to the other algorithms in overall accuracy (84.17% and 83.59%) and Kappa coefficient (0.6771 and 0.6314). Moreover, the sensitivity analysis results show that the proposed algorithm is not easily influenced by the initial parameters, but the number of features to be selected and the size of the particle swarm do affect the algorithm. The comparison experiment results reveal that RMV is more suitable than other functions as the fitness function of the GPSO-based feature selection algorithm.

  14. Localization of neural efficiency of the mathematically gifted brain through a feature subset selection method.

    Science.gov (United States)

    Zhang, Li; Gan, John Q; Wang, Haixian

    2015-10-01

    Based on the neural efficiency hypothesis and task-induced EEG gamma-band response (GBR), this study investigated the brain regions where neural resources could be most efficiently recruited by math-gifted adolescents in response to varying cognitive demands. In this experiment, various GBR-based mental states were generated with three factors (level of mathematical ability, task complexity, and short-term learning) modulating the level of neural activation. A feature subset selection method based on the sequential forward floating search algorithm was used to identify an "optimal" combination of EEG channel locations, where the corresponding GBR feature subset could obtain the highest accuracy in discriminating pairwise mental states influenced by each experimental factor. The integrative results from the multi-factor selections suggest that the right-lateral fronto-parietal system is highly involved in the neural efficiency of the math-gifted brain, primarily including the bilateral superior frontal, right inferior frontal, right-lateral central and right temporal regions. By means of this localization method based on single-trial classification of mental states, new GBR features and EEG channel-based brain regions related to mathematical giftedness were identified, which could be useful for improving the brain function of children and adolescents in mathematical learning through brain-computer interface systems.
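
    A compact sketch of sequential forward floating search (SFFS), the selection strategy named above, written from the standard description of the algorithm rather than the authors' implementation: features (standing in for channel/GBR features) are added greedily and conditionally removed when removal improves a cross-validated classification criterion. The k-NN criterion, the synthetic data, and the iteration cap are assumptions for illustration.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.model_selection import cross_val_score
      from sklearn.neighbors import KNeighborsClassifier

      X, y = make_classification(n_samples=300, n_features=30, n_informative=8, random_state=0)

      def score(subset):
          """Criterion: cross-validated accuracy of a simple classifier on the subset."""
          if not subset:
              return 0.0
          return cross_val_score(KNeighborsClassifier(), X[:, sorted(subset)], y, cv=5).mean()

      def sffs(n_select, max_rounds=50):
          selected = set()
          for _ in range(max_rounds):
              if len(selected) >= n_select:
                  break
              # Forward step: add the feature that improves the criterion the most.
              best = max(set(range(X.shape[1])) - selected, key=lambda j: score(selected | {j}))
              selected.add(best)
              # Floating step: conditionally drop previously selected features
              # (never the one just added) while removal improves the criterion.
              while len(selected) > 2:
                  worst = max(selected - {best}, key=lambda j: score(selected - {j}))
                  if score(selected - {worst}) > score(selected):
                      selected.remove(worst)
                  else:
                      break
          return sorted(selected), score(selected)

      features, acc = sffs(6)
      print("selected features:", features, "accuracy:", round(acc, 3))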

  15. Context-dependent feature selection using unsupervised contexts applied to GPR-based landmine detection

    Science.gov (United States)

    Ratto, Christopher R.; Torrione, Peter A.; Collins, Leslie M.

    2010-04-01

    Context-dependent classification techniques applied to landmine detection with ground-penetrating radar (GPR) have demonstrated substantial performance improvements over conventional classification algorithms. Context-dependent algorithms compute a decision statistic by integrating over uncertainty in the unknown, but probabilistically inferable, context of the observation. When applied to GPR, contexts may be defined by differences in electromagnetic properties of the subsurface environment, which are due to discrepancies in soil composition, moisture levels, and surface texture. Context-dependent Feature Selection (CDFS) is a technique developed for selecting a unique subset of features for classifying landmines from clutter in different environmental contexts. In past work, context definitions were assumed to be soil moisture conditions which were known during training. However, knowledge of environmental conditions could be difficult to obtain in the field. In this paper, we utilize an unsupervised learning algorithm for defining contexts which are unknown a priori. Our method performs unsupervised context identification based on similarities in physics-based and statistical features that characterize the subsurface environment of the raw GPR data. Results indicate that utilizing this contextual information improves classification performance, and provides performance improvements over non-context-dependent approaches. Implications for on-line context identification will be suggested as a possible avenue for future work.

  16. Non-parametric Bayesian mixture of sparse regressions with application towards feature selection for statistical downscaling

    Directory of Open Access Journals (Sweden)

    D. Das

    2014-04-01

    Full Text Available Climate projections simulated by Global Climate Models (GCMs) are often used for assessing the impacts of climate change. However, the relatively coarse resolutions of GCM outputs often preclude their application to accurately assessing the effects of climate change on finer regional-scale phenomena. Downscaling of climate variables from coarser to finer regional scales using statistical methods is often performed for regional climate projections. Statistical downscaling (SD) is based on the understanding that the regional climate is influenced by two factors – the large-scale climatic state and regional or local features. A transfer-function approach to SD involves learning a regression model which relates these features (predictors) to a climatic variable of interest (the predictand) based on past observations. However, a single regression model is often not sufficient to describe complex dynamic relationships between the predictors and the predictand. We focus on the covariate selection part of the transfer-function approach and propose a nonparametric Bayesian mixture of sparse regression models based on the Dirichlet Process (DP), for simultaneous clustering and discovery of covariates within the clusters while automatically finding the number of clusters. Sparse linear models are parsimonious and hence relatively more generalizable than non-sparse alternatives, and lend themselves to domain-relevant interpretation. Applications to synthetic data demonstrate the value of the new approach, and preliminary results related to feature selection for statistical downscaling show that our method can lead to new insights.

  17. Feature selection for disruption prediction from scratch in JET by using genetic algorithms and probabilistic predictors

    Energy Technology Data Exchange (ETDEWEB)

    Pereira, Augusto, E-mail: augusto.pereira@ciemat.es [Laboratorio Nacional de Fusión, CIEMAT, Madrid (Spain); Vega, Jesús; Moreno, Raúl [Laboratorio Nacional de Fusión, CIEMAT, Madrid (Spain); Dormido-Canto, Sebastián [Dpto. Informática y Automática – UNED, Madrid (Spain); Rattá, Giuseppe A. [Laboratorio Nacional de Fusión, CIEMAT, Madrid (Spain); Pavón, Fernando [Dpto. Informática y Automática – UNED, Madrid (Spain)

    2015-10-15

    Recently, a probabilistic classifier has been developed at JET to be used as a predictor from scratch. It has been applied to a database of 1237 JET ITER-like wall (ILW) discharges (of which 201 disrupted) with good results: a success rate of 94% and a false alarm rate of 4.21%. A combinatorial analysis of 14 features was performed to ensure the selection of the best ones for achieving good enough results in terms of success rate and false alarm rate. All possible combinations with a number of features between 2 and 7 were tested and 9893 different predictors were analyzed. An important drawback of this analysis was the time required to compute the results, which can be estimated at 1731 h (∼2.4 months). Genetic algorithms (GA) are search algorithms that simulate the process of natural selection. In this article, the GA and the Venn predictors are combined with the objective not only of finding good enough features within the 14 available ones but also of reducing the computational time requirements. Five different performance metrics were evaluated as measures of the GA fitness function. The best metric was Informedness, which required just 6 generations (168 predictors, 29.4 h).
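
    For reference, the Informedness metric singled out above is Youden's J statistic, sensitivity + specificity - 1. The sketch below implements only this fitness measure; the Venn predictors, the GA loop and the JET data are not reproduced, and the sample labels are made up for illustration.

      import numpy as np

      def informedness(y_true, y_pred):
          """Informedness (Youden's J) = sensitivity + specificity - 1."""
          y_true, y_pred = np.asarray(y_true, bool), np.asarray(y_pred, bool)
          tp = np.sum(y_true & y_pred)
          tn = np.sum(~y_true & ~y_pred)
          fp = np.sum(~y_true & y_pred)
          fn = np.sum(y_true & ~y_pred)
          sensitivity = tp / (tp + fn)
          specificity = tn / (tn + fp)
          return sensitivity + specificity - 1.0

      # Reading the abstract's success rate as sensitivity and false alarm rate as
      # 1 - specificity, a predictor at 94% / 4.21% would score roughly 0.94 - 0.0421 ≈ 0.898.
      y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
      y_pred = np.array([1, 1, 0, 0, 0, 0, 1, 0])
      print("informedness:", round(informedness(y_true, y_pred), 3))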

  18. Evaluation of feature selection algorithms for classification in temporal lobe epilepsy based on MR images

    Science.gov (United States)

    Lai, Chunren; Guo, Shengwen; Cheng, Lina; Wang, Wensheng; Wu, Kai

    2017-02-01

    It is very important to differentiate temporal lobe epilepsy (TLE) patients from healthy people and to localize the abnormal brain regions of TLE patients. Cortical features and changes can reveal the unique anatomical patterns of brain regions from structural MR images. In this study, structural MR images from 28 normal controls (NC), 18 left TLE (LTLE), and 21 right TLE (RTLE) patients were acquired, and four types of cortical feature, namely cortical thickness (CTh), cortical surface area (CSA), gray matter volume (GMV), and mean curvature (MCu), were explored for discriminative analysis. Three feature selection methods, independent-sample t-test filtering, the sparse-constrained dimensionality reduction model (SCDRM), and support vector machine-recursive feature elimination (SVM-RFE), were investigated to extract dominant regions with significant differences among the compared groups for classification using the SVM classifier. The results showed that SVM-RFE achieved the highest performance (most classifications with more than 92% accuracy), followed by the SCDRM and the t-test. In particular, the surface area and gray matter volume exhibited prominent discriminative ability, and the performance of the SVM improved significantly when the four cortical features were combined. Additionally, the dominant regions with higher classification weights were mainly located in the temporal and frontal lobes, including the inferior temporal, entorhinal cortex, fusiform, parahippocampal cortex, middle frontal and frontal pole. It was demonstrated that the cortical features provide effective information for determining the abnormal anatomical pattern and that the proposed method has the potential to improve the clinical diagnosis of TLE.
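
    A minimal sketch of the best-performing step named above, SVM-RFE, using scikit-learn's RFE wrapper around a linear SVM on synthetic stand-in data for the per-region cortical measures; the group sizes, feature counts and evaluation protocol of the study are not reproduced.

      from sklearn.datasets import make_classification
      from sklearn.feature_selection import RFE
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import SVC

      # Stand-in for per-region cortical measures (thickness, surface area, volume, curvature).
      X, y = make_classification(n_samples=67, n_features=272, n_informative=15, random_state=0)

      # Recursive feature elimination driven by linear-SVM weights (SVM-RFE).
      selector = RFE(estimator=SVC(kernel="linear"), n_features_to_select=20, step=0.1)
      model = make_pipeline(StandardScaler(), selector, SVC(kernel="linear"))

      acc = cross_val_score(model, X, y, cv=5).mean()
      print("cross-validated accuracy with 20 SVM-RFE features:", round(acc, 3))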

  19. A comprehensive analysis of earthquake damage patterns using high dimensional model representation feature selection

    Science.gov (United States)

    Taşkin Kaya, Gülşen

    2013-10-01

    Recently, earthquake damage assessment using satellite images has been a very popular ongoing research direction. Especially with the availability of very high resolution (VHR) satellite images, quite detailed damage maps at the building scale have been produced, and various studies have been conducted in the literature. As the spatial resolution of satellite images increases, distinguishing damage patterns becomes more difficult, especially when only spectral information is used during classification. In order to overcome this difficulty, textural information needs to be incorporated into the classification to improve the visual quality and reliability of the damage map. Many kinds of textural information can be derived from VHR satellite images depending on the algorithm used. However, extracting and evaluating textural information is generally a time-consuming process, especially for the large areas affected by an earthquake, due to the size of VHR images. Therefore, in order to provide a quick damage map, the most useful features describing damage patterns need to be known in advance, as well as the redundant features. In this study, a very high resolution satellite image acquired after the Bam, Iran earthquake was used to identify the earthquake damage. Textural information was used during the classification in addition to spectral information. For textural information, second-order Haralick features were extracted from the panchromatic image for the area of interest using the gray-level co-occurrence matrix with different window sizes and directions. In addition to using spatial features in classification, the most useful features representing the damage characteristics were selected with a novel feature selection method based on high dimensional model representation (HDMR), which gives the sensitivity of each feature during classification. The method called HDMR was recently proposed as an efficient tool to capture the input

  20. Networks of Professional Supervision

    Science.gov (United States)

    Annan, Jean; Ryba, Ken

    2013-01-01

    An ecological analysis of the supervisory activity of 31 New Zealand school psychologists examined simultaneously the theories of school psychology, supervision practices, and the contextual qualities that mediated participants' supervisory actions. The findings indicated that the school psychologists worked to achieve the supervision goals of…

  1. Forskellighed i supervision

    DEFF Research Database (Denmark)

    Petersen, Birgitte; Beck, Emma

    2009-01-01

    Impressions and trends from the second Danish conference on supervision, which was held at Københavns Universitet in October 2008.

  2. Experiments in Virtual Supervision.

    Science.gov (United States)

    Walker, Rob

    This paper examines the use of First Class conferencing software to create a virtual culture among research students and as a vehicle for supervision and advising. Topics discussed include: computer-mediated communication and research; entry to cyberculture, i.e., research students' induction into the research community; supervision and the…

  3. Discharges Classification using Genetic Algorithms and Feature Selection Algorithms on Time and Frequency Domain Data Extracted from Leakage Current Measurements

    Directory of Open Access Journals (Sweden)

    D. Pylarinos

    2013-12-01

    Full Text Available A set of 387 discharge-portraying waveforms recorded on 18 different 150 kV post insulators installed at two different substations in Crete, Greece, is considered in this paper. Twenty different features are extracted from each waveform and two feature selection algorithms (t-test and mRMR) are employed. Genetic algorithms are used to classify the waveforms into two different classes related to the portrayed discharges. Five different data sets are employed: (1) the original feature vector, (2) time-domain features, (3) frequency-domain features, (4) t-test-selected features, and (5) mRMR-selected features. The results are discussed and compared with previous classification implementations on this particular data group.

  4. Research on Methods for Discovering and Selecting Cloud Infrastructure Services Based on Feature Modeling

    Directory of Open Access Journals (Sweden)

    Huamin Zhu

    2016-01-01

    Full Text Available Nowadays more and more cloud infrastructure service providers are offering large numbers of service instances which are combinations of diversified resources, such as computing, storage, and network. However, for cloud infrastructure services, the lack of a description standard and the inadequate research on systematic discovery and selection methods have made it difficult for users to discover and choose services. First, considering the highly configurable properties of a cloud infrastructure service, the feature model method is used to describe such a service. Second, based on this description, a systematic discovery and selection method for cloud infrastructure services is proposed. The automatic analysis techniques of the feature model are introduced to verify the model's validity and to perform the matching of the service and demand models. Finally, we determine the critical decision metrics and their corresponding measurement methods for cloud infrastructure services, where the subjective and objective weighting results are combined to determine the weights of the decision metrics. The best-matching instances from various providers are then ranked by their comprehensive evaluations. Experimental results show that the proposed methods can effectively improve the accuracy and efficiency of cloud infrastructure service discovery and selection.

  5. Audio-visual synchrony and feature-selective attention co-amplify early visual processing.

    Science.gov (United States)

    Keitel, Christian; Müller, Matthias M

    2016-05-01

    Our brain relies on neural mechanisms of selective attention and converging sensory processing to efficiently cope with rich and unceasing multisensory inputs. One prominent assumption holds that audio-visual synchrony can act as a strong attractor for spatial attention. Here, we tested for a similar effect of audio-visual synchrony on feature-selective attention. We presented two superimposed Gabor patches that differed in colour and orientation. On each trial, participants were cued to selectively attend to one of the two patches. Over time, the spatial frequencies of both patches varied sinusoidally at distinct rates (3.14 and 3.63 Hz), giving rise to pulse-like percepts. A simultaneously presented pure tone carried a frequency modulation at the pulse rate of one of the two visual stimuli to introduce audio-visual synchrony. Pulsed stimulation elicited distinct time-locked oscillatory electrophysiological brain responses. These steady-state responses were quantified in the spectral domain to examine individual stimulus processing under conditions of synchronous versus asynchronous tone presentation and when the respective stimuli were attended versus unattended. We found that both attending to the colour of a stimulus and its synchrony with the tone enhanced its processing. Moreover, both gain effects combined linearly for attended in-sync stimuli. Our results suggest that audio-visual synchrony can attract attention to specific stimulus features when stimuli overlap in space.

  6. Continuous wavelet transform-based feature selection applied to near-infrared spectral diagnosis of cancer.

    Science.gov (United States)

    Chen, Hui; Lin, Zan; Mo, Lin; Wu, Hegang; Wu, Tong; Tan, Chao

    2015-01-01

    Spectrum is inherently local in nature, since it can be thought of as a signal composed of various frequency components. The wavelet transform (WT) is a powerful tool that partitions a signal into components with different frequencies. The property of multi-resolution makes WT a very effective and natural tool for analyzing spectrum-like signals. In this study, a continuous wavelet transform (CWT)-based variable selection procedure was proposed to search for a set of informative wavelet coefficients for constructing a near-infrared (NIR) spectral diagnosis model of cancer. The CWT provided a fine multi-resolution feature space for selecting the best predictors. A measure of discriminating power (DP) was defined to evaluate the coefficients. Partial least squares-discriminant analysis (PLS-DA) was used as the classification algorithm. A NIR spectral dataset associated with cancer diagnosis was used for the experiment. The optimal results were obtained with the db2 wavelet. While performing better on the training set, the optimal PLS-DA model using only 40 wavelet coefficients across 10 scales achieved the same performance on the test set as the model using all the variables in the original space: an overall accuracy of 93.8%, sensitivity of 92.5% and specificity of 96.3%. This confirms that CWT-based feature selection coupled with PLS-DA is feasible and effective for constructing models for diagnosing cancer by NIR spectroscopy.

  7. Analysis of the GRNs Inference by Using Tsallis Entropy and a Feature Selection Approach

    Science.gov (United States)

    Lopes, Fabrício M.; de Oliveira, Evaldo A.; Cesar, Roberto M.

    An important problem in the bioinformatics field is to understand how genes are regulated and interact through gene networks. This knowledge can be helpful for many applications, such as disease treatment design and drug creation. For this reason, it is very important to uncover the functional relationships among genes and then to construct the gene regulatory network (GRN) from temporal expression data. However, this task usually involves data with a large number of variables and a small number of observations. In this way, there is a strong motivation to use pattern recognition and dimensionality reduction approaches. In particular, feature selection is especially important in order to select the most important predictor genes that can explain some phenomena associated with the target genes. This work presents a first study of the sensitivity of entropy methods with respect to the entropy functional form, applied to the problem of topology recovery of GRNs. The generalized entropy proposed by Tsallis is used to study this sensitivity. The inference process is based on a feature selection approach, which is applied to simulated temporal expression data generated by an artificial gene network (AGN) model. The inferred GRNs are validated in terms of global network measures. Some interesting conclusions can be drawn from the experimental results, as reported for the first time in the present paper.
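
    For concreteness, the Tsallis generalized entropy referred to above is S_q(p) = (1 - Σ_i p_i^q) / (q - 1), which recovers the Shannon entropy (in nats) as q → 1. The snippet below implements only this standard formula; it is not the authors' GRN inference code.

      import numpy as np

      def tsallis_entropy(p, q):
          """S_q(p) = (1 - sum_i p_i^q) / (q - 1); tends to the Shannon entropy (nats) as q -> 1."""
          p = np.asarray(p, dtype=float)
          p = p[p > 0]
          if np.isclose(q, 1.0):
              return -np.sum(p * np.log(p))        # Shannon limit
          return (1.0 - np.sum(p ** q)) / (q - 1.0)

      dist = [0.5, 0.25, 0.25]
      for q in (0.5, 1.0, 2.0):
          print(f"q = {q}: S_q = {tsallis_entropy(dist, q):.4f}")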

  8. Identification of landscape features influencing gene flow: How useful are habitat selection models?

    Science.gov (United States)

    Roffler, Gretchen H.; Schwartz, Michael K.; Pilgrim, Kristy L.; Talbot, Sandra; Sage, Kevin; Adams, Layne G.; Luikart, Gordon

    2016-01-01

    Understanding how dispersal patterns are influenced by landscape heterogeneity is critical for modeling species connectivity. Resource selection function (RSF) models are increasingly used in landscape genetics approaches. However, because the ecological factors that drive habitat selection may be different from those influencing dispersal and gene flow, it is important to consider explicit assumptions and spatial scales of measurement. We calculated pairwise genetic distance among 301 Dall's sheep (Ovis dalli dalli) in southcentral Alaska using an intensive noninvasive sampling effort and 15 microsatellite loci. We used multiple regression of distance matrices to assess the correlation of pairwise genetic distance and landscape resistance derived from an RSF, and combinations of landscape features hypothesized to influence dispersal. Dall's sheep gene flow was positively correlated with steep slopes, moderate peak normalized difference vegetation indices (NDVI), and open land cover. Whereas RSF covariates were significant in predicting genetic distance, the RSF model itself was not significantly correlated with Dall's sheep gene flow, suggesting that certain habitat features important during summer (rugged terrain, mid-range elevation) were not influential to effective dispersal. This work underscores that consideration of both habitat selection and landscape genetics models may be useful in developing management strategies to both meet the immediate survival of a species and allow for long-term genetic connectivity.

  9. Feature Selection and Classification of Electroencephalographic Signals: An Artificial Neural Network and Genetic Algorithm Based Approach.

    Science.gov (United States)

    Erguzel, Turker Tekin; Ozekes, Serhat; Tan, Oguz; Gultekin, Selahattin

    2015-10-01

    Feature selection is an important step in many pattern recognition systems aiming to overcome the so-called curse of dimensionality. In this study, an optimized classification method was tested in 147 patients with major depressive disorder (MDD) treated with repetitive transcranial magnetic stimulation (rTMS). The performance of the combination of a genetic algorithm (GA) and a back-propagation (BP) neural network (BPNN) was evaluated using 6-channel pre-rTMS electroencephalographic (EEG) patterns of theta and delta frequency bands. The GA was first used to eliminate the redundant and less discriminant features to maximize classification performance. The BPNN was then applied to test the performance of the feature subset. Finally, classification performance using the subset was evaluated using 6-fold cross-validation. Although the slow bands of the frontal electrodes are widely used to collect EEG data for patients with MDD and provide quite satisfactory classification results, the outcomes of the proposed approach indicate noticeably increased overall accuracy of 89.12% and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.904 using the reduced feature set.

  10. Fuzzy rough sets, and a granular neural network for unsupervised feature selection.

    Science.gov (United States)

    Ganivada, Avatharam; Ray, Shubhra Sankar; Pal, Sankar K

    2013-12-01

    A granular neural network for identifying salient features of data, based on the concepts of fuzzy sets and a newly defined fuzzy rough set, is proposed. The formation of the network mainly involves an input vector, initial connection weights and a target value. Each feature of the data is normalized between 0 and 1 and used to develop granulation structures by a user-defined α-value. The input vector and the target value of the network are defined using granulation structures, based on the concept of fuzzy sets. The same granulation structures are also presented to a decision system. The decision system helps in extracting the domain knowledge about the data in the form of dependency factors, using the notion of the new fuzzy rough set. These dependency factors are assigned as the initial connection weights of the proposed network. The network is then trained by minimizing a novel feature evaluation index in an unsupervised manner. The effectiveness of the proposed network in evaluating selected features is demonstrated on several real-life datasets. The results of FRGNN are found to be statistically more significant than those of related methods in 28 of 40 instances (70%), using the paired t-test.
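    The paired t-test used for this kind of comparison can be reproduced with SciPy; the accuracy values below are hypothetical and serve only to show the mechanics of pairing results measured on the same instances.

```python
from scipy.stats import ttest_rel

# Hypothetical per-dataset accuracies of the proposed network and a competing
# method, measured on the same instances (paired observations).
proposed   = [0.91, 0.88, 0.83, 0.95, 0.79, 0.86, 0.90, 0.84]
competitor = [0.89, 0.84, 0.80, 0.94, 0.75, 0.85, 0.88, 0.81]
t_stat, p_value = ttest_rel(proposed, competitor)
print(f"paired t-test: t = {t_stat:.3f}, p = {p_value:.4f}")
```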

  11. Multi-Stage Feature Selection Based Intelligent Classifier for Classification of Incipient Stage Fire in Building

    Directory of Open Access Journals (Sweden)

    Allan Melvin Andrew

    2016-01-01

    Full Text Available In this study, an early fire detection algorithm is proposed based on a low-cost array sensing system, utilising off-the-shelf gas sensors, dust particle sensors and ambient sensors such as temperature and humidity sensors. The odour or “smellprint” emanating from various fire sources and building construction materials at an early stage is measured. For this purpose, odour profile data from five common fire sources and three common building construction materials were used to develop the classification model. Normalised feature extraction of the smellprint data was performed before the features were passed to the prediction classifier. These features represent the odour signals in the time domain. The obtained features undergo the proposed multi-stage feature selection technique and are lastly further reduced by Principal Component Analysis (PCA), a dimension reduction technique. The hybrid PCA-PNN based approach has been applied to different datasets from the in-house developed system and the portable electronic nose unit. Experimental classification results show that the dimension reduction process performed by PCA has improved the classification accuracy and provided high reliability, regardless of ambient temperature and humidity variation, baseline sensor drift, different gas concentration levels and exposure to different heating temperature ranges.
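    A minimal sketch of the final dimension-reduction step is given below, with synthetic data standing in for the odour-profile features; scikit-learn has no probabilistic neural network (PNN), so Gaussian naive Bayes stands in for the classification stage.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the normalized odour-profile feature vectors
# (5 fire sources + 3 building materials = 8 classes, chosen arbitrarily here).
X, y = make_classification(n_samples=400, n_features=40, n_informative=12,
                           n_classes=8, n_clusters_per_class=1, random_state=0)

# Normalization -> PCA dimension reduction -> classifier (PNN replaced by GaussianNB).
model = make_pipeline(StandardScaler(), PCA(n_components=6), GaussianNB())
print("5-fold CV accuracy:", cross_val_score(model, X, y, cv=5).mean().round(3))
```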

  12. QSAR modeling for quinoxaline derivatives using genetic algorithm and simulated annealing based feature selection.

    Science.gov (United States)

    Ghosh, P; Bagchi, M C

    2009-01-01

    With a view to the rational design of selective quinoxaline derivatives, 2D- and 3D-QSAR models have been developed for the prediction of anti-tubercular activities. Successful implementation of a predictive QSAR model largely depends on the selection of a preferred set of molecular descriptors that can signify the chemico-biological interaction. Genetic algorithm (GA) and simulated annealing (SA) are applied as variable selection methods for model development. 2D-QSAR modeling using GA- or SA-based partial least squares (GA-PLS and SA-PLS) methods identified some important topological and electrostatic descriptors as important factors for anti-tubercular activity. Kohonen networks and counter-propagation artificial neural networks (CP-ANN), combined with GA- and SA-based feature selection, have been applied for such QSAR modeling of quinoxaline compounds. Out of a variable pool of 380 molecular descriptors, predictive QSAR models are developed for the training set and validated on the test set compounds, and a comparative study of the relative effectiveness of the linear and non-linear approaches has been carried out. Further analysis using the 3D-QSAR technique identifies two models, obtained by the GA-PLS and SA-PLS methods, for anti-tubercular activity prediction. The influences of the steric and electrostatic field effects generated by the contribution plots are discussed. The results indicate that SA is a very effective variable selection approach for such 3D-QSAR modeling.

  13. A Feature Selection Method Based on Maximal Marginal Relevance

    Institute of Scientific and Technical Information of China (English)

    刘赫; 张相洪; 刘大有; 李燕军; 尹立军

    2012-01-01

    With the rapid growth of textual information on the Internet, text categorization has become one of the key research directions in data mining. Text categorization is a supervised learning process, defined as automatically distributing free text into one or more predefined categories. At present, text categorization is necessary for managing textual information and has been applied in many fields. However, text categorization has two characteristics: high dimensionality of the feature space and a high level of feature redundancy. To address these two characteristics, the χ2 statistic is used to deal with the high dimensionality of the feature space, and information novelty is used to deal with the high level of feature redundancy. Based on the definition of maximal marginal relevance, a feature selection method based on maximal marginal relevance is proposed, which can remove a large number of redundant features during the feature selection process. Experiments are carried out on two text data sets, Reuters-21578 Top10 and OHSCAL. The results indicate that the feature selection method based on maximal marginal relevance is more efficient than the χ2 statistic and information gain. Moreover, it can improve the performance of three different classifiers: naive Bayes, Rocchio and kNN.
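    A greedy sketch of this relevance-minus-redundancy idea is shown below; it uses the χ2 statistic for relevance and absolute feature correlation as a simple stand-in for the information-novelty redundancy term, so it illustrates the scheme rather than reproducing the paper's exact method, and it assumes a dense, non-negative term matrix with hypothetical toy data.

```python
import numpy as np
from sklearn.feature_selection import chi2

def mmr_select(X, y, k=100, lam=0.7):
    """Greedy maximal-marginal-relevance style feature selection (sketch).

    Relevance  : chi-square score of each term against the class labels.
    Redundancy : maximum absolute correlation with already selected terms.
    lam trades relevance off against redundancy.
    """
    relevance, _ = chi2(X, y)                        # requires non-negative X
    relevance = relevance / (relevance.max() + 1e-12)
    corr = np.nan_to_num(np.abs(np.corrcoef(X, rowvar=False)))
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < k:
        def mmr_score(j):
            redundancy = corr[j, selected].max() if selected else 0.0
            return lam * relevance[j] - (1.0 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage with a random non-negative "term count" matrix (hypothetical data).
rng = np.random.default_rng(0)
X_toy = rng.integers(0, 5, size=(200, 50)).astype(float)
y_toy = rng.integers(0, 2, size=200)
print("first selected term indices:", mmr_select(X_toy, y_toy, k=10))
```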

  14. Document Clustering Based on Semi-Supervised Term Clustering

    Directory of Open Access Journals (Sweden)

    Hamid Mahmoodi

    2012-05-01

    Full Text Available The study proposes a multi-step feature (term) selection process that, in a semi-supervised fashion, provides initial centers for term clusters. The fuzzy c-means (FCM) clustering algorithm is then utilized for clustering terms, and finally each document is assigned to its closest associated term clusters. While most text clustering algorithms use documents directly for clustering, we propose to first group the terms using the FCM algorithm and then cluster documents based on the term clusters. We evaluate the effectiveness of our technique on several standard text collections and compare our results with some classical text clustering algorithms.
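    Since fuzzy c-means is central to this approach but is not provided by scikit-learn, a compact NumPy implementation is sketched below for illustration; the term vectors are random placeholders rather than a real term-document matrix.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Minimal fuzzy c-means. Returns cluster centers and the membership matrix U."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = dist ** (-2.0 / (m - 1.0))            # standard FCM membership update
        new_U = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(new_U - U).max() < tol:
            return centers, new_U
        U = new_U
    return centers, U

# Toy usage: cluster hypothetical term vectors, then read off hard term clusters.
rng = np.random.default_rng(1)
term_vectors = rng.random((60, 8))                  # 60 terms described over 8 documents
centers, U = fuzzy_c_means(term_vectors, c=4)
print("hard labels of the first 10 terms:", U.argmax(axis=1)[:10])
```

    Each document could then be assigned to the term cluster whose members carry the most weight in it, as the abstract describes.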

  15. Classification methodology and feature selection to assist fault location in power distribution systems

    Directory of Open Access Journals (Sweden)

    Juan José Mora Flórez

    2008-01-01

    Full Text Available A classification methodology based on Support Vector Machines (SVM) is proposed to locate the faulted zone in power distribution networks. The goal is to reduce the multiple-estimation problem inherent in methods that use single-end measurements (in the substation) to estimate the fault location in radial systems. A selection of features or descriptors obtained from the voltages and currents measured in the substation is analyzed and used as input to the SVM classifier. The performance of the fault locator with several combinations of these features has been evaluated according to its capability to discriminate between faults in different zones located at similar distances. An application example illustrates the precision in locating the faulted zone obtained with the proposed methodology in a simulated framework. The proposal provides appropriate information for the prevention and timely attention of faults, requires minimum investment, and overcomes the multiple-estimation problem of the classic impedance-based methods.

  16. Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection.

    Science.gov (United States)

    Botsis, Taxiarchis; Nguyen, Michael D; Woo, Emily Jane; Markatou, Marianthi; Ball, Robert

    2011-01-01

    The US Vaccine Adverse Event Reporting System (VAERS) collects spontaneous reports of adverse events following vaccination. Medical officers review the reports and often apply standardized case definitions, such as those developed by the Brighton Collaboration. Our objective was to demonstrate a multi-level text mining approach for automated text classification of VAERS reports that could potentially reduce human workload. We selected 6034 VAERS reports for H1N1 vaccine that were classified by medical officers as potentially positive (N(pos)=237) or negative for anaphylaxis. We created a categorized corpus of text files that included the class label and the symptom text field of each report. A validation set of 1100 labeled text files was also used. Text mining techniques were applied to extract three feature sets for important keywords and low- and high-level patterns. A rule-based classifier processed the high-level feature representation, while several machine learning classifiers were trained for the remaining two feature representations. Classifier performance was evaluated by macro-averaged recall, precision, and F-measure, together with Friedman's test; misclassification error rate analysis was also performed. The rule-based classifier, boosted trees, and weighted support vector machines performed well in terms of macro-recall, although at the expense of a higher mean misclassification error rate. The rule-based classifier also performed very well in terms of average sensitivity and specificity (79.05% and 94.80%, respectively). Our validated results showed the possibility of developing effective medical text classifiers for VAERS reports by combining text mining with informative feature selection; this strategy has the potential to reduce reviewer workload considerably.
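    The general recipe (text vectorization, informative feature selection, a classifier, and macro-averaged evaluation) can be sketched as below. Public newsgroup posts stand in for the VAERS symptom text, which is not redistributable, the corpus is downloaded on first use, and the pipeline is illustrative rather than the authors' exact multi-level system.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Public texts as a stand-in two-class corpus (~ "positive"/"negative" reports).
data = fetch_20newsgroups(subset="all", categories=["sci.med", "rec.autos"],
                          remove=("headers", "footers", "quotes"))
X_tr, X_te, y_tr, y_te = train_test_split(data.data, data.target, test_size=0.3,
                                          random_state=0, stratify=data.target)

model = make_pipeline(TfidfVectorizer(stop_words="english"),
                      SelectKBest(chi2, k=1000),        # informative feature selection
                      LinearSVC())
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
for name, fn in [("recall", recall_score), ("precision", precision_score),
                 ("F-measure", f1_score)]:
    print(f"macro {name}: {fn(y_te, pred, average='macro'):.3f}")
```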

  17. Robust Ground Target Detection by SAR and IR Sensor Fusion Using Adaboost-Based Feature Selection.

    Science.gov (United States)

    Kim, Sungho; Song, Woo-Jin; Kim, So-Hyun

    2016-07-19

    Long-range ground targets are difficult to detect in a noisy cluttered environment using either synthetic aperture radar (SAR) images or infrared (IR) images. SAR-based detectors can provide a high detection rate with a high false alarm rate to background scatter noise. IR-based approaches can detect hot targets but are affected strongly by the weather conditions. This paper proposes a novel target detection method by decision-level SAR and IR fusion using an Adaboost-based machine learning scheme to achieve a high detection rate and low false alarm rate. The proposed method consists of individual detection, registration, and fusion architecture. This paper presents a single framework of a SAR and IR target detection method using modified Boolean map visual theory (modBMVT) and feature-selection based fusion. Previous methods applied different algorithms to detect SAR and IR targets because of the different physical image characteristics. One method that is optimized for IR target detection produces unsuccessful results in SAR target detection. This study examined the image characteristics and proposed a unified SAR and IR target detection method by inserting a median local average filter (MLAF, pre-filter) and an asymmetric morphological closing filter (AMCF, post-filter) into the BMVT. The original BMVT was optimized to detect small infrared targets. The proposed modBMVT can remove the thermal and scatter noise by the MLAF and detect extended targets by attaching the AMCF after the BMVT. Heterogeneous SAR and IR images were registered automatically using the proposed RANdom SAmple Region Consensus (RANSARC)-based homography optimization after a brute-force correspondence search using the detected target centers and regions. The final targets were detected by feature-selection based sensor fusion using Adaboost. The proposed method showed good SAR and IR target detection performance through feature selection-based decision fusion on a synthetic database generated

  18. Robust Ground Target Detection by SAR and IR Sensor Fusion Using Adaboost-Based Feature Selection

    Directory of Open Access Journals (Sweden)

    Sungho Kim

    2016-07-01

    Full Text Available Long-range ground targets are difficult to detect in a noisy cluttered environment using either synthetic aperture radar (SAR images or infrared (IR images. SAR-based detectors can provide a high detection rate with a high false alarm rate to background scatter noise. IR-based approaches can detect hot targets but are affected strongly by the weather conditions. This paper proposes a novel target detection method by decision-level SAR and IR fusion using an Adaboost-based machine learning scheme to achieve a high detection rate and low false alarm rate. The proposed method consists of individual detection, registration, and fusion architecture. This paper presents a single framework of a SAR and IR target detection method using modified Boolean map visual theory (modBMVT and feature-selection based fusion. Previous methods applied different algorithms to detect SAR and IR targets because of the different physical image characteristics. One method that is optimized for IR target detection produces unsuccessful results in SAR target detection. This study examined the image characteristics and proposed a unified SAR and IR target detection method by inserting a median local average filter (MLAF, pre-filter and an asymmetric morphological closing filter (AMCF, post-filter into the BMVT. The original BMVT was optimized to detect small infrared targets. The proposed modBMVT can remove the thermal and scatter noise by the MLAF and detect extended targets by attaching the AMCF after the BMVT. Heterogeneous SAR and IR images were registered automatically using the proposed RANdom SAmple Region Consensus (RANSARC-based homography optimization after a brute-force correspondence search using the detected target centers and regions. The final targets were detected by feature-selection based sensor fusion using Adaboost. The proposed method showed good SAR and IR target detection performance through feature selection-based decision fusion on a synthetic

  19. Welding Diagnostics by Means of Particle Swarm Optimization and Feature Selection

    Directory of Open Access Journals (Sweden)

    J. Mirapeix

    2012-01-01

    Full Text Available In a previous contribution, a welding diagnostics approach based on plasma optical spectroscopy was presented. It consisted of the employment of optimization algorithms and synthetic spectra to obtain the participation profiles of the species present in the plasma. A modification of the model is discussed here: on the one hand, the controlled random search algorithm has been replaced by a particle swarm optimization implementation; on the other hand, a feature selection stage has been included to determine the spectral windows where the optimization process will take place. Both experimental and field tests are shown to illustrate the performance of the solution, which improves the results of the previous work.

  20. An Approach with Support Vector Machine using Variable Features Selection on Breast Cancer Prognosis

    Directory of Open Access Journals (Sweden)

    Sandeep Chaurasia

    2013-09-01

    Full Text Available Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of machine learning. In this paper we use a support vector machine classifier to construct a model for breast cancer survivability prediction. We applied both 5-fold and 10-fold cross-validation with variable selection on the input feature vectors, and measured class performance in terms of AUC, specificity and sensitivity. The performance of the SVM is much better than that of the other machine learning classifiers.
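    A hedged sketch of this kind of evaluation is shown below, using the public Wisconsin breast-cancer dataset purely as a convenient stand-in (it is not the survivability data used in the paper), with 10-fold cross-validation and AUC, sensitivity and specificity as metrics.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)          # public stand-in dataset
model = make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Out-of-fold class probabilities give an unbiased basis for AUC and for the
# confusion-matrix-derived sensitivity/specificity.
proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
pred = (proba >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
print("AUC        :", round(roc_auc_score(y, proba), 3))
print("sensitivity:", round(tp / (tp + fn), 3))
print("specificity:", round(tn / (tn + fp), 3))
```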

  1. Psychological features of youths at a selection to beach volley-ball

    Directory of Open Access Journals (Sweden)

    Samoday V.

    2010-04-01

    Full Text Available The psychological features of young athletes at the initial stage of sports training are considered. The dynamics of the mental state of youths in beach volleyball at the stage of sports selection are determined. 25 young athletes from initial training groups took part in the research. The average age of the young athletes was 12 years. According to the self-assessment indices of mental state, the overwhelming majority of the youths showed a slight overstrain of well-being, activity and mood. For the psychological indices of concentration, a medium level of development of stability and concentration of attention was found.

  2. Data Visualization and Feature Selection Methods in Gel-based Proteomics

    DEFF Research Database (Denmark)

    Silva, Tomé Santos; Richard, Nadege; Dias, Jorge P.;

    2014-01-01

    Despite the increasing popularity of gel-free proteomic strategies, two-dimensional gel electrophoresis (2DE) is still the most widely used approach in top-down proteomic studies, for all sorts of biological models. In order to achieve meaningful biological insight using 2DE approaches, importance......-based proteomics, summarizing the current state of research within this field. Particular focus is given on discussing the usefulness of available multivariate analysis tools both for data visualization and feature selection purposes. Visual examples are given using a real gel-based proteomic dataset as basis....

  3. Feature selection and the information content of Thematic Mapper simulator data for forest structural assessment

    Science.gov (United States)

    Spanner, M. A.; Brass, J. A.; Peterson, D. L.

    1984-01-01

    An assessment is made of the information content of Thematic Mapper Simulator (TMS) data for the case of a forested region, in order to determine the sensitivity of such data to forest crown closure and tree size class. Principal components analysis and Monte Carlo simulation indicated that channels 4, 7, 5 and 3 were optimal for four-channel forest structure analysis. As the number of channels supplied to the Monte Carlo feature selection routine increased, classification accuracy increased. The greatest sensitivity to the forest structural parameters, which included succession within clearcuts as well as crown closure and size class, was obtained from the 7-channel TMS data.

  4. Providing effective supervision in clinical neuropsychology.

    Science.gov (United States)

    Stucky, Kirk J; Bush, Shane; Donders, Jacobus

    2010-01-01

    A specialty like clinical neuropsychology is shaped by its selection of trainees, educational standards, expected competencies, and the structure of its training programs. The development of individual competency in this specialty is dependent to a considerable degree on the provision of competent supervision to its trainees. In clinical neuropsychology, as in other areas of professional health-service psychology, supervision is the most frequently used method for teaching a variety of skills, including assessment, report writing, differential diagnosis, and treatment. Although much has been written about the provision of quality supervision in clinical and counseling psychology, very little published guidance is available regarding the teaching and provision of supervision in clinical neuropsychology. The primary focus of this article is to provide a framework and guidance for the development of suggested competency standards for training of neuropsychological supervisors, particularly at the residency level. In this paper we outline important components of supervision for neuropsychology trainees and suggest ways in which clinicians can prepare for supervisory roles. Similar to Falender and Shafranske (2004), we propose a competency-based approach to supervision that advocates for a science-informed, formalized, and objective process that clearly delineates the competencies required for good supervisory practice. As much as possible, supervisory competencies are related to foundational and functional competencies in professional psychology, as well as recent legislative initiatives mandating training in supervision. It is our hope that this article will foster further discussion regarding this complex topic, and eventually enhance training in clinical neuropsychology.

  5. Electrical Identification and Selective Microstimulation of Neuronal Compartments Based on Features of Extracellular Action Potentials

    Science.gov (United States)

    Radivojevic, Milos; Jäckel, David; Altermatt, Michael; Müller, Jan; Viswam, Vijay; Hierlemann, Andreas; Bakkum, Douglas J.

    2016-08-01

    A detailed, high-spatiotemporal-resolution characterization of neuronal responses to local electrical fields and the capability of precise extracellular microstimulation of selected neurons are pivotal for studying and manipulating neuronal activity and circuits in networks and for developing neural prosthetics. Here, we studied cultured neocortical neurons by using high-density microelectrode arrays and optical imaging, complemented by the patch-clamp technique, and with the aim to correlate morphological and electrical features of neuronal compartments with their responsiveness to extracellular stimulation. We developed strategies to electrically identify any neuron in the network, while subcellular spatial resolution recording of extracellular action potential (AP) traces enabled their assignment to the axon initial segment (AIS), axonal arbor and proximal somatodendritic compartments. Stimulation at the AIS required low voltages and provided immediate, selective and reliable neuronal activation, whereas stimulation at the soma required high voltages and produced delayed and unreliable responses. Subthreshold stimulation at the soma depolarized the somatic membrane potential without eliciting APs.

  6. An Application of Discriminant Analysis to Pattern Recognition of Selected Contaminated Soil Features in Thin Sections

    DEFF Research Database (Denmark)

    Ribeiro, Alexandra B.; Nielsen, Allan Aasbjerg

    1997-01-01

    qualitative microprobe results: present elements Al, Si, Cr, Fe, As (associated with others). Selected groups of calibrated images (same light conditions and magnification) submitted to discriminant analysis, in order to find a pattern of recognition in the soil features corresponding to contamination already...... in the soluble and exchangeable phase, these elements being associated primarily with amorphous-crystalline Fe-oxides, organic matter and/or resistant phases. The results obtained with sequential extraction were the prerequisite to the attempt to identify the Cr and As distribution in the solid phase. If high...... concentrations of contaminants are indicated by chemical wet analysis, these contaminants must occur directly in the solid phase. Thin sections of soil aggregates were scanned for Cu, Cr and As using an electron microprobe, and qualitative analysis was made on selected areas. Microphotographs of thin sections...

  7. An application of locally linear model tree algorithm with combination of feature selection in credit scoring

    Science.gov (United States)

    Siami, Mohammad; Gholamian, Mohammad Reza; Basiri, Javad

    2014-10-01

    Nowadays, credit scoring is one of the most important topics in the banking sector. Credit scoring models have been widely used to facilitate the credit assessment process. In this paper, the locally linear model tree algorithm (LOLIMOT) was applied to predict customers' credit status and to evaluate its performance. The algorithm was adapted to the credit scoring domain by means of data fusion and feature selection techniques. Two real-world credit data sets - Australian and German - from the UCI machine learning repository were selected to demonstrate the performance of our new classifier. The analytical results indicate that the improved LOLIMOT significantly increases the prediction accuracy.

  8. Mining for diagnostic information in body surface potential maps: A comparison of feature selection techniques

    Directory of Open Access Journals (Sweden)

    McCullagh Paul J

    2005-09-01

    Full Text Available Abstract Background In body surface potential mapping, increased spatial sampling is used to allow more accurate detection of a cardiac abnormality. Although diagnostically superior to more conventional electrocardiographic techniques, the perceived complexity of the Body Surface Potential Map (BSPM) acquisition process has prohibited its acceptance in clinical practice. For this reason there is an interest in striking a compromise between the minimum number of electrocardiographic recording sites required to sample the maximum electrocardiographic information. Methods In the current study, several techniques widely used in the domains of data mining and knowledge discovery have been employed to mine for diagnostic information in 192-lead BSPMs. In particular, the Single Variable Classifier (SVC) based filter and Sequential Forward Selection (SFS) based wrapper approaches to feature selection have been implemented and evaluated. Using a set of recordings from 116 subjects, the diagnostic ability of subsets of 3, 6, 9, 12, 24 and 32 electrocardiographic recording sites has been evaluated based on their ability to correctly assess the presence or absence of Myocardial Infarction (MI). Results It was observed that the wrapper approach, using sequential forward selection and a 5-nearest-neighbour classifier, was capable of choosing a set of 24 recording sites that could correctly classify 82.8% of BSPMs. Although the filter method performed slightly less favourably, the performance was comparable, with a classification accuracy of 79.3%. In addition, experiments were conducted to show how (a) features chosen using the wrapper approach were specific to the classifier used in the selection model, and (b) lead subsets chosen were not necessarily unique. Conclusion It was concluded that both the filter and wrapper approaches adopted were suitable for guiding the choice of recording sites useful for determining the presence of MI. It should be noted however
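    The wrapper strategy described here (sequential forward selection scored by a 5-nearest-neighbour classifier) can be sketched with scikit-learn's SequentialFeatureSelector; the breast-cancer dataset below is only a placeholder for the 192-lead BSPM feature set.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)     # placeholder for 192-lead BSPM features
knn = KNeighborsClassifier(n_neighbors=5)

# Wrapper approach: grow the subset one feature at a time, keeping the feature
# that most improves the cross-validated accuracy of the 5-NN classifier.
sfs = SequentialFeatureSelector(knn, n_features_to_select=12,
                                direction="forward", cv=5)
sfs.fit(X, y)
subset = sfs.get_support(indices=True)
print("selected feature indices:", subset)
print("CV accuracy on subset   :",
      cross_val_score(knn, X[:, subset], y, cv=5).mean().round(3))
```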

  9. A soft computing based approach using modified selection strategy for feature reduction of medical systems.

    Science.gov (United States)

    Zuhtuogullari, Kursat; Allahverdi, Novruz; Arikan, Nihat

    2013-01-01

    Systems with high-dimensional input spaces require long processing times and large memory usage. Most attribute selection algorithms suffer from input dimension limits and information storage problems. These problems are eliminated by means of the developed feature reduction software, which uses a new modified selection mechanism that adds solution candidates from the middle region. The hybrid system software is constructed for reducing the input attributes of systems with a large number of input variables. The designed software also supports the roulette wheel selection mechanism. Linear order crossover is used as the recombination operator. In genetic algorithm based soft computing methods, locking into local solutions is also a problem, which is eliminated by the developed software. Faster and more effective results are obtained in the test procedures. Twelve input variables of the urological system have been reduced to reducts (reduced input attribute sets) with seven, six, and five elements. It can be seen from the obtained results that the developed software with modified selection has advantages in terms of memory allocation, execution time, classification accuracy, sensitivity, and specificity when compared with other reduction algorithms on the urological test data.

  10. A Soft Computing Based Approach Using Modified Selection Strategy for Feature Reduction of Medical Systems

    Directory of Open Access Journals (Sweden)

    Kursat Zuhtuogullari

    2013-01-01

    Full Text Available Systems with high-dimensional input spaces require long processing times and large memory usage. Most attribute selection algorithms suffer from input dimension limits and information storage problems. These problems are eliminated by means of the developed feature reduction software, which uses a new modified selection mechanism that adds solution candidates from the middle region. The hybrid system software is constructed for reducing the input attributes of systems with a large number of input variables. The designed software also supports the roulette wheel selection mechanism. Linear order crossover is used as the recombination operator. In genetic algorithm based soft computing methods, locking into local solutions is also a problem, which is eliminated by the developed software. Faster and more effective results are obtained in the test procedures. Twelve input variables of the urological system have been reduced to reducts (reduced input attribute sets) with seven, six, and five elements. It can be seen from the obtained results that the developed software with modified selection has advantages in terms of memory allocation, execution time, classification accuracy, sensitivity, and specificity when compared with other reduction algorithms on the urological test data.

  11. Supervision as a Teaching Method in Adult Special Education (Supervision som undervisningsform i voksenspecialundervisningen)

    DEFF Research Database (Denmark)

    Kristensen, René

    2000-01-01

    Supervision as a teaching method in adult special education. Process work in the teaching of adults.

  12. EMG feature assessment for myoelectric pattern recognition and channel selection: a study with incomplete spinal cord injury.

    Science.gov (United States)

    Liu, Jie; Li, Xiaoyan; Li, Guanglin; Zhou, Ping

    2014-07-01

    Myoelectric pattern recognition with a large number of electromyogram (EMG) channels provides an approach to assessing the motor control information available from the recorded muscles. In order to develop a practical myoelectric control system, a feature-dependent channel reduction method was developed in this study to determine a small number of EMG channels for myoelectric pattern recognition analysis. The method selects appropriate raw EMG features for classification of different movements, using the minimum Redundancy Maximum Relevance (mRMR) and the Markov random field (MRF) methods, respectively, to rank a large number of EMG features. A k-nearest neighbor (KNN) classifier was used to evaluate the performance of the selected features in terms of classification accuracy. The method was tested using surface EMG signals from 57 channels recorded from forearm and hand muscles of individuals with incomplete spinal cord injury (SCI). Our results demonstrate that appropriate selection of a small number of raw EMG features from different recording channels resulted in classification accuracies similar to the high accuracies achieved by using all the EMG channels or features. Compared with the conventional sequential forward selection (SFS) method, the feature-dependent method does not require repeated classifier implementation. It can effectively reduce redundant information not only across different channels, but also across different features within the same channel. Such hybrid feature-channel selection from a large number of EMG recording channels can reduce the computational cost of implementing a myoelectric pattern recognition based control system.

  13. On the use of feature selection to improve the detection of sea oil spills in SAR images

    Science.gov (United States)

    Mera, David; Bolon-Canedo, Veronica; Cotos, J. M.; Alonso-Betanzos, Amparo

    2017-03-01

    Fast and effective oil spill detection systems are crucial to ensure a proper response to environmental emergencies caused by hydrocarbon pollution on the ocean's surface. Typically, these systems uncover not only oil spills, but also a high number of look-alikes. The feature extraction is a critical and computationally intensive phase where each detected dark spot is independently examined. Traditionally, detection systems use an arbitrary set of features to discriminate between oil spills and look-alike phenomena. However, Feature Selection (FS) methods based on Machine Learning (ML) have proved to be very useful in real domains for enhancing the generalization capabilities of classifiers while discarding the existing irrelevant features. In this work, we present a generic and systematic approach, based on FS methods, for choosing a concise and relevant set of features to improve oil spill detection systems. We have compared five FS methods: Correlation-based feature selection (CFS), Consistency-based filter, Information Gain, ReliefF and Recursive Feature Elimination for Support Vector Machine (SVM-RFE). They were applied to a 141-input vector composed of features from a collection of outstanding studies. Selected features were validated via a Support Vector Machine (SVM) classifier and the results were compared with previous works. Test experiments revealed that the classifier trained with the 6-input feature vector proposed by SVM-RFE achieved the best accuracy and Cohen's kappa coefficient (87.1% and 74.06%, respectively). This is a smaller feature combination with similar or even better classification accuracy than previous works. This finding makes it possible to speed up the feature extraction phase without reducing classifier accuracy. Experiments also confirmed the significance of the geometrical features, since 75.0% of the different features selected by the applied FS methods as well as 66.67% of the proposed 6-input feature vector belong to
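    A compact sketch of the SVM-RFE step and the reported metrics is given below; the dataset is a public placeholder rather than the 141-feature dark-spot vectors used in the study.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

X, y = load_breast_cancer(return_X_y=True)     # placeholder for the dark-spot features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

# SVM-RFE: recursively drop the features with the smallest linear-SVM weights
# until only 6 remain, mirroring the 6-input vector discussed above.
scaler = StandardScaler().fit(X_tr)
rfe = RFE(LinearSVC(dual=False), n_features_to_select=6)
rfe.fit(scaler.transform(X_tr), y_tr)
keep = rfe.get_support(indices=True)

clf = make_pipeline(StandardScaler(), SVC()).fit(X_tr[:, keep], y_tr)
pred = clf.predict(X_te[:, keep])
print("accuracy     :", round(accuracy_score(y_te, pred), 3))
print("Cohen's kappa:", round(cohen_kappa_score(y_te, pred), 3))
```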

  14. Online Learning of Hierarchical Pitman-Yor Process Mixture of Generalized Dirichlet Distributions With Feature Selection.

    Science.gov (United States)

    Fan, Wentao; Sallay, Hassen; Bouguila, Nizar

    2016-06-09

    In this paper, a novel statistical generative model based on hierarchical Pitman-Yor process and generalized Dirichlet distributions (GDs) is presented. The proposed model allows us to perform joint clustering and feature selection thanks to the interesting properties of the GD distribution. We develop an online variational inference algorithm, formulated in terms of the minimization of a Kullback-Leibler divergence, of our resulting model that tackles the problem of learning from high-dimensional examples. This variational Bayes formulation allows simultaneously estimating the parameters, determining the model's complexity, and selecting the appropriate relevant features for the clustering structure. Moreover, the proposed online learning algorithm allows data instances to be processed in a sequential manner, which is critical for large-scale and real-time applications. Experiments conducted using challenging applications, namely, scene recognition and video segmentation, where our approach is viewed as an unsupervised technique for visual learning in high-dimensional spaces, showed that the proposed approach is suitable and promising.

  15. Comparative Analysis of PSO and GA in Geom-Statistical Character Features Selection for Online Character Recognition

    Directory of Open Access Journals (Sweden)

    Fenwa O.D

    2015-08-01

    Full Text Available Online handwriting recognition attracts special interest today due to the increased usage of handheld devices, and it has become a difficult problem because of the high variability and ambiguity in the character shapes written by individuals. One major problem encountered by researchers in developing character recognition systems is the selection of efficient (optimal) features. In this paper, a feature extraction technique for online character recognition systems was developed using a hybrid of geometrical and statistical (Geom-statistical) features. Through the integration of geometrical and statistical features, insights were gained into new character properties, since these types of features are considered to be complementary. Several optimization techniques have been used in the literature for feature selection in character recognition, such as Ant Colony Optimization (ACO), Genetic Algorithm (GA), Particle Swarm Optimization (PSO) and Simulated Annealing, but a comparative analysis of GA and PSO in online character recognition has not been carried out. In this paper, a comparative analysis of performance was made between GA and PSO in optimizing the Geom-statistical features in online character recognition, using the Modified Optical Backpropagation (MOBP) network as classifier. Simulation of the system was carried out in Matlab 7.10a. The results show that PSO is a well-accepted optimization algorithm for the selection of optimal features, as it outperforms the GA in terms of the number of features selected, training time and recognition accuracy.
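    To make the comparison concrete, a minimal binary PSO for feature selection is sketched below; a k-NN classifier replaces the paper's MOBP network as the fitness evaluator, and the digits dataset is only a placeholder for the Geom-statistical character features.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_digits(return_X_y=True)            # placeholder character data
n_features = X.shape[1]

def fitness(mask):
    """CV accuracy of k-NN on the selected features, minus a small size penalty."""
    if not mask.any():
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X[:, mask], y, cv=3).mean()
    return acc - 0.01 * mask.mean()

# Standard binary PSO with a sigmoid transfer function.
n_particles, n_iter, w, c1, c2 = 10, 15, 0.7, 1.5, 1.5
vel = rng.normal(scale=0.1, size=(n_particles, n_features))
pos = rng.random((n_particles, n_features)) > 0.5
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random((2, n_particles, n_features))
    vel = (w * vel + c1 * r1 * (pbest.astype(float) - pos.astype(float))
                   + c2 * r2 * (gbest.astype(float) - pos.astype(float)))
    pos = rng.random((n_particles, n_features)) < 1.0 / (1.0 + np.exp(-vel))
    fit = np.array([fitness(p) for p in pos])
    better = fit > pbest_fit
    pbest[better], pbest_fit[better] = pos[better], fit[better]
    gbest = pbest[pbest_fit.argmax()].copy()

print("features kept:", int(gbest.sum()), "of", n_features,
      "| best fitness:", round(pbest_fit.max(), 3))
```

    A GA wrapper would differ only in how the binary masks are updated (selection, crossover and mutation instead of velocity updates), while the fitness evaluation stays the same.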

  16. Feature and Score Fusion Based Multiple Classifier Selection for Iris Recognition

    Directory of Open Access Journals (Sweden)

    Md. Rabiul Islam

    2014-01-01

    Full Text Available The aim of this work is to propose a new feature and score fusion based iris recognition approach in which a voting method on the Multiple Classifier Selection technique is applied. The outputs of four Discrete Hidden Markov Model classifiers, that is, a left iris based unimodal system, a right iris based unimodal system, a left-right iris feature fusion based multimodal system, and a left-right iris likelihood ratio score fusion based multimodal system, are combined using the voting method to achieve the final recognition result. The CASIA-IrisV4 database has been used to measure the performance of the proposed system with various dimensions. Experimental results show the versatility of the proposed system of four different classifiers with various dimensions. Finally, the recognition accuracy of the proposed system has been compared with the existing Hamming distance score fusion approach proposed by Ma et al., the log-likelihood ratio score fusion approach proposed by Schmid et al., and the single-level feature fusion approach proposed by Hollingsworth et al.
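    The decision-level voting used here can be illustrated with a generic hard-voting ensemble; the three base classifiers and the digits data below are stand-ins, since the paper's discrete HMM matchers and the CASIA-IrisV4 features are not reproduced.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Stand-in data and base classifiers; each base learner plays the role of one
# unimodal/multimodal matcher, and majority voting fuses their decisions.
X, y = load_digits(return_X_y=True)
voter = VotingClassifier(estimators=[("knn", KNeighborsClassifier()),
                                     ("svm", SVC()),
                                     ("logreg", LogisticRegression(max_iter=5000))],
                         voting="hard")
print("hard-voting CV accuracy:", cross_val_score(voter, X, y, cv=5).mean().round(3))
```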

  17. Neurons in the thalamic reticular nucleus are selective for diverse and complex visual features

    Directory of Open Access Journals (Sweden)

    Vishal eVaingankar

    2012-12-01

    Full Text Available All visual signals the cortex receives are influenced by the perigeniculate sector of the thalamic reticular nucleus, which receives input from relay cells in the lateral geniculate and provides feedback inhibition in return. Relay cells have been studied in quantitative depth; they behave