WorldWideScience

Sample records for based feature selection

  1. Feature Selection Based on Confidence Machine

    OpenAIRE

    Liu, Chang; Xu, Yi

    2014-01-01

    In machine learning and pattern recognition, feature selection has been a hot topic in the literature. Unsupervised feature selection is challenging due to the absence of labels, which would otherwise supply the relevant information. How to define an appropriate metric is the key question for feature selection. We propose a filter method for unsupervised feature selection which is based on the Confidence Machine. The Confidence Machine offers an estimate of confidence in a feature's reliability. In this paper, we provide...

  2. CBFS: high performance feature selection algorithm based on feature clearness.

    Directory of Open Access Journals (Sweden)

    Minseok Seo

    Full Text Available BACKGROUND: The goal of feature selection is to select useful features and simultaneously exclude garbage features from a given dataset for classification purposes. This is expected to reduce processing time and improve classification accuracy. METHODOLOGY: In this study, we devised a new feature selection algorithm (CBFS) based on the clearness of features. Feature clearness expresses the separability among classes in a feature. Highly clear features contribute towards obtaining high classification accuracy. CScore is a measure that scores the clearness of each feature and is based on how closely samples cluster around the class centroids in a feature. We also suggest combining CBFS with other algorithms to improve classification accuracy. CONCLUSIONS/SIGNIFICANCE: The experiments confirm that CBFS outperforms state-of-the-art feature selection algorithms, including FeaLect. CBFS can be applied to microarray gene selection, text categorization, and image classification.
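
    The exact CScore formula is not reproduced in this record, so the following Python sketch is only a hypothetical per-feature separability proxy in the same spirit (between-class centroid spread divided by within-class spread); X, y and k are placeholder names.

        import numpy as np

        def clearness_scores(X, y):
            """Illustrative stand-in for a per-feature 'clearness' measure."""
            X, y = np.asarray(X, dtype=float), np.asarray(y)
            overall = X.mean(axis=0)
            between = np.zeros(X.shape[1])
            within = np.zeros(X.shape[1])
            for c in np.unique(y):
                Xc = X[y == c]
                centroid = Xc.mean(axis=0)
                between += len(Xc) * (centroid - overall) ** 2   # class centroids vs overall centroid
                within += ((Xc - centroid) ** 2).sum(axis=0)     # samples vs their own class centroid
            return between / (within + 1e-12)                    # higher = "clearer" feature

        # keep the k highest-scoring features
        # top_k = np.argsort(clearness_scores(X, y))[::-1][:k]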

  3. Feature Selection Based on Mutual Correlation

    Czech Academy of Sciences Publication Activity Database

    Haindl, Michal; Somol, Petr; Ververidis, D.; Kotropoulos, C.

    2006-01-01

    Roč. 19, č. 4225 (2006), s. 569-577 ISSN 0302-9743. [Iberoamerican Congress on Pattern Recognition. CIARP 2006 /11./. Cancun, 14.11.2006-17.11.2006] R&D Projects: GA AV ČR 1ET400750407; GA MŠk 1M0572; GA AV ČR IAA2075302 EU Projects: European Commission(XE) 507752 - MUSCLE Institutional research plan: CEZ:AV0Z10750506 Keywords : feature selection Subject RIV: BD - Theory of Information Impact factor: 0.402, year: 2005 http://library.utia.cas.cz/separaty/historie/haindl-feature selection based on mutual correlation.pdf

  4. Feature selection based classifier combination approach for ...

    Indian Academy of Sciences (India)

    2016-08-26

    Aug 26, 2016 ... Feature selection based classifier combination approach for handwritten Devanagari numeral recognition. Pratibha Singh, Ajay Verma ... ensemble of classifiers. The main contribution of the proposed method is that it gives quite efficient results while utilizing only 10% of the patterns in the available dataset.

  5. EEG feature selection method based on decision tree.

    Science.gov (United States)

    Duan, Lijuan; Ge, Hui; Ma, Wei; Miao, Jun

    2015-01-01

    This paper aims to solve the automated feature selection problem in brain computer interfaces (BCI). In order to automate the feature selection process, we proposed a novel EEG feature selection method based on a decision tree (DT). During electroencephalogram (EEG) signal processing, a feature extraction method based on principal component analysis (PCA) was used, and the selection process based on the decision tree was performed by searching the feature space and automatically selecting the optimal features. Considering that EEG signals are a series of non-linear signals, a generalized linear classifier, the support vector machine (SVM), was chosen. In order to test the validity of the proposed method, we applied the EEG feature selection method based on the decision tree to BCI Competition II dataset Ia, and the experiment showed encouraging results.
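
    A minimal sketch of the pipeline described above (PCA-based feature extraction, decision-tree-driven selection, SVM classification) using scikit-learn; the component count, tree depth and kernel choice are illustrative assumptions rather than the paper's settings, and X_train/y_train are placeholders.

        from sklearn.decomposition import PCA
        from sklearn.feature_selection import SelectFromModel
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import SVC
        from sklearn.tree import DecisionTreeClassifier

        pipeline = make_pipeline(
            PCA(n_components=20),                                   # PCA-based feature extraction
            SelectFromModel(DecisionTreeClassifier(max_depth=5)),   # keep features the tree finds important
            SVC(kernel="rbf"),                                      # final SVM classifier
        )
        # pipeline.fit(X_train, y_train); accuracy = pipeline.score(X_test, y_test)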

  6. Graph-based unsupervised feature selection and multiview ...

    Indian Academy of Sciences (India)

    Journal of Biosciences, Volume 40, Issue 4. Graph-based unsupervised feature selection and multiview clustering for microarray data. Tripti Swarnkar, Pabitra Mitra ... Keywords: biological functional enrichment; clustering; explorative data analysis; feature selection; gene selection; graph-based learning.

  7. Linear regression-based feature selection for microarray data classification.

    Science.gov (United States)

    Abid Hasan, Md; Hasan, Md Kamrul; Abdul Mottalib, M

    2015-01-01

    Predicting the class of gene expression profiles helps improve the diagnosis and treatment of diseases. Analysing huge gene expression data, otherwise known as microarray data, is complicated due to its high dimensionality. Hence, traditional classifiers do not perform well when the number of features far exceeds the number of samples. A good set of features helps classifiers to classify the dataset efficiently. Moreover, a manageable set of features is also desirable for the biologist for further analysis. In this paper, we propose a linear regression-based feature selection method for selecting discriminative features. Our main focus is to classify the dataset more accurately using fewer features than other traditional feature selection methods. Our method has been compared with several other methods, and in almost every case the classification accuracy is higher while using fewer features than the other popular feature selection methods.
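
    The record does not give the exact selection criterion, so the sketch below is a hedged stand-in: it ranks each gene by the R^2 of a simple linear regression of the (numerically coded) class label on that gene, in the spirit of a linear regression-based filter; X and y are placeholder names.

        import numpy as np

        def r2_per_feature(X, y):
            """R^2 of a simple linear regression of the class label on each feature."""
            X = np.asarray(X, dtype=float)
            y = np.asarray(y, dtype=float)            # class labels coded numerically (e.g. 0/1)
            yc = y - y.mean()
            scores = []
            for j in range(X.shape[1]):
                x = X[:, j] - X[:, j].mean()
                denom = (x ** 2).sum() * (yc ** 2).sum()
                r = (x @ yc) / np.sqrt(denom) if denom > 0 else 0.0
                scores.append(r ** 2)
            return np.array(scores)

        # keep the 50 most discriminative genes (the cut-off is illustrative)
        # top = np.argsort(r2_per_feature(X, y))[::-1][:50]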

  8. Feature Selection with Neighborhood Entropy-Based Cooperative Game Theory

    Directory of Open Access Journals (Sweden)

    Kai Zeng

    2014-01-01

    Full Text Available Feature selection plays an important role in machine learning and data mining. In recent years, various feature measures have been proposed to select significant features from high-dimensional datasets. However, most traditional feature selection methods ignore features which have strong classification ability as a group but are weak as individuals. To deal with this problem, we redefine the redundancy, interdependence, and independence of features by using neighborhood entropy. The neighborhood entropy-based feature contribution is then proposed under the framework of cooperative games. The evaluation criteria of features can be formalized as the product of the contribution and other classical feature measures. Finally, the proposed method is tested on several UCI datasets. The results show that the neighborhood entropy-based cooperative game theory model (NECGT) yields better performance than classical ones.

  9. Feature selection based classifier combination approach for ...

    Indian Academy of Sciences (India)

    Majority voting based classifier combination is the simplest method, in which the final decision is the class for which the majority (more than N/2) of the N participating classifiers vote. 3.2b Decision templates: The decision template method (Kuncheva et al 2001) first creates a DT for each class using ...
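
    A minimal sketch of the majority-voting rule described in this excerpt: the combined decision is the class receiving more than N/2 of the N classifier votes; the fallback behaviour when no class reaches a majority is an assumption, since the excerpt is truncated.

        from collections import Counter

        def majority_vote(predictions, fallback=None):
            """predictions: one class label per participating classifier."""
            n = len(predictions)
            label, votes = Counter(predictions).most_common(1)[0]
            return label if votes > n / 2 else fallback

        # majority_vote(['3', '3', '8'])  ->  '3'
        # majority_vote(['3', '8', '5'])  ->  None  (no class exceeds N/2)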

  10. Feature selection based classifier combination approach for ...

    Indian Academy of Sciences (India)

    3.2c Dempster-Shafer rule based classifier combination: The Dempster–Shafer (DS) method is based on evidence theory, proposed by Glenn Shafer as a way to represent cognitive knowledge. Here the probability is obtained using a belief function instead of the Bayesian distribution. Probability values are assigned to a ...

  11. Graph-based unsupervised feature selection and multiview ...

    Indian Academy of Sciences (India)

    2015-09-28

    Sep 28, 2015 ... Biological functional enrichment; clustering; explorative data analysis; feature selection; gene selection; graph-based learning. Published online: 28 September ...... RFGS: random forest gene selection; SVST: Support vector sampling technique; SOM: Self-organizing map; GUFS: proposed graph-based.

  12. Feature selection for fMRI-based deception detection

    Science.gov (United States)

    Jin, Bo; Strasburger, Alvin; Laken, Steven J; Kozel, F Andrew; Johnson, Kevin A; George, Mark S; Lu, Xinghua

    2009-01-01

    Background Functional magnetic resonance imaging (fMRI) is a technology used to detect brain activity. Patterns of brain activation have been utilized as biomarkers for various neuropsychiatric applications. Detecting deception based on the pattern of brain activation characterized with fMRI is getting attention – with machine learning algorithms being applied to this field in recent years. The high dimensionality of fMRI data makes it a difficult task to directly utilize the original data as input for classification algorithms in detecting deception. In this paper, we investigated the procedures of feature selection to enhance fMRI-based deception detection. Results We used the t-statistic map derived from the statistical parametric mapping analysis of fMRI signals to construct features that reflect brain activation patterns. We subsequently investigated various feature selection methods including an ensemble method to identify discriminative features to detect deception. Using 124 features selected from a set of 65,166 original features as inputs for a support vector machine classifier, our results indicate that feature selection significantly enhanced the classification accuracy of the support vector machine in comparison to the models trained using all features and dimension reduction based models. Furthermore, the selected features are shown to form anatomic clusters within brain regions, which supports the hypothesis that specific brain regions may play a role during deception processes. Conclusion Feature selection not only enhances classification accuracy in fMRI-based deception detection but also provides support for the biological hypothesis that brain activities in certain regions of the brain are important for discrimination of deception. PMID:19761569
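
    The ensemble selection procedure itself is not detailed in this abstract; the sketch below is a simplified stand-in that keeps 124 of the original t-map features with a univariate F-test filter and feeds them to a linear SVM (the choice of filter is an assumption, and X_train/y_train are placeholders).

        from sklearn.feature_selection import SelectKBest, f_classif
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import SVC

        model = make_pipeline(
            SelectKBest(f_classif, k=124),   # keep 124 of the ~65,166 t-statistic features
            SVC(kernel="linear"),            # linear SVM classifier for deception vs. truth
        )
        # model.fit(X_train, y_train); model.score(X_test, y_test)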

  13. Linear feature selection in texture analysis - A PLS based method

    DEFF Research Database (Denmark)

    Marques, Joselene; Igel, Christian; Lillholm, Martin

    2013-01-01

    We present a texture analysis methodology that combines uncommitted machine-learning techniques and partial least squares (PLS) in a fully automatic framework. Our approach introduces a robust PLS-based dimensionality reduction (DR) step to specifically address outliers and high-dimensional feature......, which first applies a PLS regression to rank the features and then defines the best number of features to retain in the model by an iterative learning phase. The outliers in the dataset, which could inflate the number of selected features, were eliminated by a pre-processing step. To cope...... and considering all CV groups, the methods selected 36% of the original features available. The diagnosis evaluation reached a generalization area under the ROC curve of 0.92, which was higher than established cartilage-based markers known to relate to OA diagnosis.
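
    A hedged sketch of the PLS-driven ranking step only: fit a PLS regression and order features by the magnitude of their coefficients. The paper's outlier pre-processing and its iterative phase for choosing how many features to retain are omitted, and the retained fraction in the usage comment merely echoes the 36% reported above.

        import numpy as np
        from sklearn.cross_decomposition import PLSRegression

        def pls_feature_ranking(X, y, n_components=5):
            """Return feature indices ordered from most to least important."""
            pls = PLSRegression(n_components=n_components).fit(X, y)   # y coded numerically
            weights = np.abs(pls.coef_).ravel()                        # one coefficient per feature
            return np.argsort(weights)[::-1]

        # keep roughly a third of the features, as reported in the abstract
        # selected = pls_feature_ranking(X_train, y_train)[: int(0.36 * X_train.shape[1])]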

  14. Mutual information-based feature selection for radiomics

    Science.gov (United States)

    Oubel, Estanislao; Beaumont, Hubert; Iannessi, Antoine

    2016-03-01

    Background The extraction and analysis of image features (radiomics) is a promising field in the precision medicine era, with applications to prognosis, prediction, and quantification of response to treatment. In this work, we present a mutual information-based method for quantifying the reproducibility of features, a necessary qualification step before their inclusion in big data systems. Materials and Methods Ten patients with Non-Small Cell Lung Cancer (NSCLC) lesions were followed over time (7 time points on average) with Computed Tomography (CT). Five observers segmented lesions by using a semi-automatic method, and 27 features describing shape and intensity distribution were extracted. Inter-observer reproducibility was assessed by computing the multi-information (MI) of feature changes over time, and the variability of global extrema. Results The highest MI values were obtained for volume-based features (VBF). The lesion mass (M), surface to volume ratio (SVR) and volume (V) presented statistically significantly higher values of MI than the rest of the features. Within the same VBF group, SVR also showed the lowest variability of extrema. The correlation coefficient (CC) of feature values was unable to differentiate between features. Conclusions MI allowed three features (M, SVR, and V) to be discriminated from the rest in a statistically significant manner. This result is consistent with the order obtained when sorting features by increasing values of extrema variability. MI is a promising alternative for selecting features to be considered as surrogate biomarkers in a precision medicine context.

  15. Simultaneous Channel and Feature Selection of Fused EEG Features Based on Sparse Group Lasso

    Directory of Open Access Journals (Sweden)

    Jin-Jia Wang

    2015-01-01

    Full Text Available Feature extraction and classification of EEG signals are core parts of brain computer interfaces (BCIs). Due to the high dimension of the EEG feature vector, an effective feature selection algorithm has become an integral part of research studies. In this paper, we present a new method based on a wrapped Sparse Group Lasso for channel and feature selection of fused EEG signals. The high-dimensional fused features are first obtained; they include the power spectrum, time-domain statistics, AR model, and wavelet coefficient features extracted from the preprocessed EEG signals. The wrapped channel and feature selection method is then applied, which uses a logistic regression model with a Sparse Group Lasso penalty function. The model is fitted on the training data, and parameter estimates are obtained by modified blockwise coordinate descent and coordinate gradient descent methods. The best parameters and feature subset are selected by using 10-fold cross-validation. Finally, the test data are classified using the trained model. Compared with existing channel and feature selection methods, results show that the proposed method is more suitable, more stable, and faster for high-dimensional feature fusion. It can simultaneously achieve channel and feature selection with a lower error rate. The test accuracy on data from the international BCI Competition IV reached 84.72%.
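
    To illustrate the penalty at the core of this method, below is a minimal proximal-gradient (ISTA-style) sketch of logistic regression with a sparse group lasso penalty, with EEG channels as the groups. The paper instead uses modified blockwise coordinate descent with cross-validated parameters, so the step size and regularisation weights here are purely illustrative, and the variable names are placeholders.

        import numpy as np

        def _sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def sgl_logistic(X, y, groups, lam1=0.01, lam2=0.01, lr=0.1, n_iter=500):
            """X: (n, p) fused features; y: labels in {0, 1}; groups: channel id per feature."""
            X, y, groups = np.asarray(X, float), np.asarray(y, float), np.asarray(groups)
            n, p = X.shape
            w = np.zeros(p)
            for _ in range(n_iter):
                grad = X.T @ (_sigmoid(X @ w) - y) / n                     # logistic-loss gradient
                w -= lr * grad
                w = np.sign(w) * np.maximum(np.abs(w) - lr * lam1, 0.0)    # lasso prox (feature level)
                for g in np.unique(groups):                                # group lasso prox (channel level)
                    idx = groups == g
                    norm = np.linalg.norm(w[idx])
                    if norm > 0.0:
                        w[idx] *= max(0.0, 1.0 - lr * lam2 * np.sqrt(idx.sum()) / norm)
            return w   # zeroed groups = dropped channels; zeroed entries = dropped features

        # w = sgl_logistic(X_train, y_train, channel_ids); selected = np.flatnonzero(w)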

  16. Auditory-model based robust feature selection for speech recognition.

    Science.gov (United States)

    Koniaris, Christos; Kuropatwinski, Marcin; Kleijn, W Bastiaan

    2010-02-01

    It is shown that robust dimension-reduction of a feature set for speech recognition can be based on a model of the human auditory system. Whereas conventional methods optimize classification performance, the proposed method exploits knowledge implicit in the auditory periphery, inheriting its robustness. Features are selected to maximize the similarity of the Euclidean geometry of the feature domain and the perceptual domain. Recognition experiments using mel-frequency cepstral coefficients (MFCCs) confirm the effectiveness of the approach, which does not require labeled training data. For noisy data the method outperforms commonly used discriminant-analysis based dimension-reduction methods that rely on labeling. The results indicate that selecting MFCCs in their natural order results in subsets with good performance.

  17. Using PSO-Based Hierarchical Feature Selection Algorithm

    Directory of Open Access Journals (Sweden)

    Zhiwei Ji

    2014-01-01

    Full Text Available Hepatocellular carcinoma (HCC) is one of the most common malignant tumors. Clinical symptoms attributable to HCC are usually absent, so the best therapeutic opportunities are often missed. Traditional Chinese Medicine (TCM) plays an active role in the diagnosis and treatment of HCC. In this paper, we propose a particle swarm optimization-based hierarchical feature selection (PSOHFS) model to infer potential syndromes for the diagnosis of HCC. Firstly, the hierarchical feature representation is developed as a three-layer tree. The clinical symptoms and the positive score of a patient are the leaf nodes and the root of the tree, respectively, while each syndrome feature on the middle layer is extracted from a group of symptoms. Secondly, an improved PSO-based algorithm is applied in the new reduced feature space to search for an optimal syndrome subset. Based on the result of feature selection, the causal relationships between symptoms and syndromes are inferred via Bayesian networks. In our experiment, 147 symptoms were aggregated into 27 groups and 27 syndrome features were extracted. The proposed approach discovered 24 syndromes which markedly improved the diagnosis accuracy. Finally, the Bayesian approach was applied to represent the causal relationships at both the symptom and syndrome levels. The results show that our computational model can facilitate the clinical diagnosis of HCC.

  18. An opinion formation based binary optimization approach for feature selection

    Science.gov (United States)

    Hamedmoghadam, Homayoun; Jalili, Mahdi; Yu, Xinghuo

    2018-02-01

    This paper proposes a novel optimization method based on opinion formation in complex network systems. The proposed optimization technique mimics the human-human interaction mechanism, based on a mathematical model derived from the social sciences. Our method encodes a subset of selected features as the opinion of an artificial agent and simulates the opinion formation process among a population of agents to solve the feature selection problem. The agents interact using an underlying interaction network structure and converge towards a consensus in their opinions, while finding better solutions to the problem. A number of mechanisms are employed to avoid getting trapped in local minima. We compare the performance of the proposed method with a number of classical population-based optimization methods and a state-of-the-art opinion formation based method. Our experiments on a number of high-dimensional datasets show that the proposed algorithm outperforms the others.

  19. Local-Nearest-Neighbors-Based Feature Weighting for Gene Selection.

    Science.gov (United States)

    An, Shuai; Wang, Jun; Wei, Jinmao

    2017-06-07

    Selecting functional genes is essential for analyzing microarray data. Among many available feature (gene) selection approaches, the ones on the basis of the large margin nearest neighbor receive more attention due to their low computational costs and high accuracies in analyzing the high-dimensional data. Yet there still exist some problems that hamper the existing approaches in sifting real target genes, including selecting erroneous nearest neighbors, high sensitivity to irrelevant genes, and inappropriate evaluation criteria. Previous pioneer works have partly addressed some of the problems, but none of them are capable of solving these problems simultaneously. In this paper, we propose a new local-nearest-neighbors-based feature weighting approach to alleviate the above problems. The proposed approach is based on the trick of locally minimizing the within-class distances and maximizing the between-class distances with the k nearest neighbors rule. We further define a feature weight vector, and construct it by minimizing the cost function with a regularization term. The proposed approach can be applied naturally to the multi-class problems and does not require extra modification. Experimental results on the UCI and the open microarray data sets validate the effectiveness and efficiency of the new approach.
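
    The cost function and regulariser are not reproduced in this record; the sketch below is only a ReliefF-flavoured proxy in the same spirit, rewarding features along which a sample's nearest same-class neighbours ("hits") are closer than its nearest other-class neighbours ("misses"). X, y and k are placeholders.

        import numpy as np

        def local_nn_weights(X, y, k=5):
            """ReliefF-style local feature weighting; assumes each class has more than k samples."""
            X, y = np.asarray(X, float), np.asarray(y)
            n, p = X.shape
            w = np.zeros(p)
            for i in range(n):
                d = np.abs(X - X[i]).sum(axis=1)                          # L1 distance to every sample
                d[i] = np.inf                                             # exclude the sample itself
                hits = np.argsort(np.where(y == y[i], d, np.inf))[:k]     # nearest same-class neighbours
                misses = np.argsort(np.where(y != y[i], d, np.inf))[:k]   # nearest other-class neighbours
                w += np.abs(X[misses] - X[i]).mean(axis=0) - np.abs(X[hits] - X[i]).mean(axis=0)
            return w / n

        # genes ranked by weight, largest (most discriminative) first
        # ranking = np.argsort(local_nn_weights(X, y))[::-1]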

  20. Efficient Multi-Label Feature Selection Using Entropy-Based Label Selection

    Directory of Open Access Journals (Sweden)

    Jaesung Lee

    2016-11-01

    Full Text Available Multi-label feature selection is designed to select a subset of features according to their importance to multiple labels. This task can be achieved by ranking the dependencies of features and selecting the features with the highest rankings. In a multi-label feature selection problem, the algorithm may be faced with a dataset containing a large number of labels. Because the computational cost of multi-label feature selection increases according to the number of labels, the algorithm may suffer from a degradation in performance when processing very large datasets. In this study, we propose an efficient multi-label feature selection method based on an information-theoretic label selection strategy. By identifying a subset of labels that significantly influence the importance of features, the proposed method efficiently outputs a feature subset. Experimental results demonstrate that the proposed method can identify a feature subset much faster than conventional multi-label feature selection methods for large multi-label datasets.

  1. Unsupervised Feature Selection Based on the Morisita Index

    Science.gov (United States)

    Golay, Jean; Kanevski, Mikhail

    2016-04-01

    Recent breakthroughs in technology have radically improved our ability to collect and store data. As a consequence, the size of datasets has been increasing rapidly, both in terms of the number of variables (or features) and the number of instances. Since the mechanism of many phenomena is not well known, too many variables are sampled. A lot of them are redundant and contribute to the emergence of three major challenges in data mining: (1) the complexity of result interpretation, (2) the necessity to develop new methods and tools for data processing, (3) the possible reduction in the accuracy of learning algorithms because of the curse of dimensionality. This research deals with a new algorithm for selecting the smallest subset of features conveying all the information of a dataset (i.e. an algorithm for removing redundant features). It is a new version of the Fractal Dimensionality Reduction (FDR) algorithm [1] and it relies on two ideas: (a) In general, data lie on non-linear manifolds of much lower dimension than that of the spaces where they are embedded. (b) The situation described in (a) is partly due to redundant variables, since they do not contribute to increasing the dimension of the manifolds, called the Intrinsic Dimension (ID). The suggested algorithm implements these ideas by selecting only the variables influencing the data ID. Unlike the FDR algorithm, it resorts to a recently introduced ID estimator [2] based on the Morisita index of clustering and to a sequential forward search strategy. Consequently, in addition to its ability to capture non-linear dependences, it can deal with large datasets and its implementation is straightforward in any programming environment. Many real-world case studies are considered. They are related to environmental pollution and renewable resources. References [1] C. Traina Jr., A.J.M. Traina, L. Wu, C. Faloutsos, Fast feature selection using fractal dimension, in: Proceedings of the XV Brazilian Symposium on Databases, SBBD, pp. 158

  2. Information Theory based Feature Selection for Customer Classification

    OpenAIRE

    Barraza, Néstor Rubén; Moro, Sergio; Ferreyra, Marcelo; de la Peña, Adolfo

    2016-01-01

    The application of Information Theory techniques to customer feature selection is analyzed. This method, usually called information gain, has been demonstrated to be simple and fast for feature selection. The important concept of mutual information, originally introduced to analyze and model a noisy channel, is used in order to measure relations between the characteristics of given customers. An application to a bank customers data set of telemarketing calls for selling bank long-term deposits is s...
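
    A minimal sketch of the information-gain filter described above, using scikit-learn's mutual information estimator; the number of retained attributes is an illustrative assumption, and X/y are placeholders for the customer attributes and the term-deposit outcome.

        from sklearn.feature_selection import SelectKBest, mutual_info_classif

        selector = SelectKBest(mutual_info_classif, k=20)   # keep the 20 most informative customer attributes
        # X_reduced = selector.fit_transform(X, y)          # y: long-term deposit subscription outcome
        # kept_columns = selector.get_support(indices=True)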

  3. Boosting feature selection for Neural Network based regression.

    Science.gov (United States)

    Bailly, Kevin; Milgram, Maurice

    2009-01-01

    The head pose estimation problem is well known to be a challenging task in computer vision and is a useful tool for several applications involving human-computer interaction. This problem can be stated as a regression one where the input is an image and the output is pan and tilt angles. Finding the optimal regression is a hard problem because of the high dimensionality of the input (number of image pixels) and the large variety of morphologies and illumination. We propose a new method combining a boosting strategy for feature selection and a neural network for the regression. Potential features are a very large set of Haar-like wavelets which are well known to be adapted to face image processing. To achieve the feature selection, a new Fuzzy Functional Criterion (FFC) is introduced which is able to evaluate the link between a feature and the output without any estimation of the joint probability density function as in the Mutual Information. The boosting strategy uses this criterion at each step: features are evaluated by the FFC using weights on examples computed from the error produced by the neural network trained at the previous step. Tests are carried out on the commonly used Pointing 04 database and compared with three state-of-the-art methods. We also evaluate the accuracy of the estimation on FacePix, a database with a high angular resolution. Our method is compared positively to a Convolutional Neural Network, which is well known to incorporate feature extraction in its first layers.

  4. Sequence-based classification using discriminatory motif feature selection.

    Directory of Open Access Journals (Sweden)

    Hao Xiong

    Full Text Available Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters. A Python pipeline implementing the approach is

  5. Conditional Mutual Information Based Feature Selection for Classification Task

    Czech Academy of Sciences Publication Activity Database

    Novovičová, Jana; Somol, Petr; Haindl, Michal; Pudil, Pavel

    2007-01-01

    Roč. 45, č. 4756 (2007), s. 417-426 ISSN 0302-9743 R&D Projects: GA MŠk 1M0572; GA AV ČR IAA2075302 EU Projects: European Commission(XE) 507752 - MUSCLE Grant - others:GA MŠk(CZ) 2C06019 Institutional research plan: CEZ:AV0Z10750506 Keywords : Pattern classification * feature selection * conditional mutual information * text categorization Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.402, year: 2005

  6. Train axle bearing fault detection using a feature selection scheme based multi-scale morphological filter

    Science.gov (United States)

    Li, Yifan; Liang, Xihui; Lin, Jianhui; Chen, Yuejian; Liu, Jianxin

    2018-02-01

    This paper presents a novel signal processing scheme, a feature selection based multi-scale morphological filter (MMF), for train axle bearing fault detection. In this scheme, more than 30 feature indicators of vibration signals are calculated for axle bearings in different conditions, and the features which reflect fault characteristics most effectively and representatively are selected using the max-relevance and min-redundancy principle. Then, a filtering scale selection approach for MMF based on feature selection and grey relational analysis is proposed. The feature selection based MMF method is tested on the diagnosis of artificially created damage to rolling bearings of railway trains. Experimental results show that the proposed method has a superior performance in extracting fault features of defective train axle bearings. In addition, comparisons are performed with the kurtosis criterion based MMF and the spectral kurtosis criterion based MMF. The proposed feature selection based MMF method outperforms these two methods in the detection of train axle bearing faults.
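
    A hedged sketch of the max-relevance / min-redundancy (mRMR) step used above to pick fault-sensitive indicators, written with scikit-learn mutual information estimators; the grey relational analysis for MMF scale selection is not reproduced, and the variable names are placeholders.

        import numpy as np
        from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

        def mrmr(X, y, n_select=10):
            """Greedy max-relevance / min-redundancy selection over condition indicators."""
            X = np.asarray(X, float)
            relevance = mutual_info_classif(X, y)                  # MI(feature; bearing condition)
            selected, remaining = [], list(range(X.shape[1]))
            while remaining and len(selected) < n_select:
                scores = []
                for j in remaining:
                    redundancy = np.mean([mutual_info_regression(X[:, [j]], X[:, s])[0]
                                          for s in selected]) if selected else 0.0
                    scores.append(relevance[j] - redundancy)       # relevance minus average redundancy
                best = remaining[int(np.argmax(scores))]
                selected.append(best)
                remaining.remove(best)
            return selected

        # indicator_idx = mrmr(feature_matrix, condition_labels, n_select=10)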

  7. [Feature extraction for breast cancer data based on geometric algebra theory and feature selection using differential evolution].

    Science.gov (United States)

    Li, Jing; Hong, Wenxue

    2014-12-01

    Feature extraction and feature selection are important issues in pattern recognition. Based on the geometric algebra representation of vectors, a new feature extraction method using the blade coefficients of geometric algebra was proposed in this study. At the same time, an improved differential evolution (DE) feature selection method was proposed to address the resulting high-dimensionality issue. Simple linear discriminant analysis was used as the classifier. The 10-fold cross-validation (10 CV) classification accuracy on a public breast cancer biomedical dataset was more than 96%, superior to that of the original features and a traditional feature extraction method.

  8. GAIN RATIO BASED FEATURE SELECTION METHOD FOR PRIVACY PRESERVATION

    Directory of Open Access Journals (Sweden)

    R. Praveena Priyadarsini

    2011-04-01

    Full Text Available Privacy preservation is a step in data mining that tries to safeguard sensitive information from unsanctioned disclosure, thereby protecting individual data records and their privacy. There are various privacy preservation techniques, such as k-anonymity, l-diversity, t-closeness and data perturbation. In this paper, the k-anonymity privacy protection technique is applied to high-dimensional datasets such as Adult and Census. Since both datasets are high dimensional, a feature subset selection method, Gain Ratio, is applied; the attributes of the datasets are ranked, and low-ranking attributes are filtered out to form new reduced data subsets. The k-anonymization privacy preservation technique is then applied to the reduced datasets. The accuracy on the privacy-preserved reduced datasets and on the original datasets is compared for two data mining tasks, namely classification and clustering, using the naïve Bayesian and k-means algorithms respectively. Experimental results show that classification and clustering accuracy are comparable for the reduced k-anonymized datasets and the original datasets.

  9. Speech Emotion Feature Selection Method Based on Contribution Analysis Algorithm of Neural Network

    International Nuclear Information System (INIS)

    Wang Xiaojia; Mao Qirong; Zhan Yongzhao

    2008-01-01

    There are many emotion features. If all these features are employed to recognize emotions, redundant features may exist. Furthermore, the recognition results may be unsatisfactory and the cost of feature extraction is high. In this paper, a method to select speech emotion features based on the contribution analysis algorithm of a neural network (NN) is presented. The emotion features are selected by using the contribution analysis algorithm of the NN from the 95 extracted features. Cluster analysis is applied to analyze the effectiveness of the selected features, and the time of feature extraction is evaluated. Finally, the 24 selected emotion features are used to recognize six speech emotions. The experiments show that this method can improve the recognition rate and reduce the feature extraction time

  10. Selecting protein families for environmental features based on manifold regularization.

    Science.gov (United States)

    Jiang, Xingpeng; Xu, Weiwei; Park, E K; Li, Guangrong

    2014-06-01

    Recently, statistical and machine learning methods have been developed to identify functional or taxonomic features associated with environmental features or physiological status. Proteins (or other functional and taxonomic entities) that are important to environmental features can potentially be used as biosensors. A major challenge is how the distribution of protein and gene functions embodies the adaptation of microbial communities across environments and host habitats. In this paper, we propose a novel regularization method for linear regression to address this challenge. The approach is inspired by locally linear embedding (LLE) and we call it manifold-constrained regularization for linear regression (McRe). The novel regularization procedure also has the potential to be used in solving other linear systems. We demonstrate the efficiency and performance of the approach on both simulated and real data.

  11. Selection of a green manufacturing process based on CAD features

    OpenAIRE

    Gaha, Raoudha; Yannou, Bernard; Benamara, Abdelmajid

    2016-01-01

    Environmentally conscious manufacturing process (ECMP) has become an obligation to the environment and to society itself, enforced primarily by governmental regulations and the customer perspective on environmental issues. ECMP involves integrating environmental thinking into new product development. This is especially true in the computer-aided design (CAD) phase, which is the last phase in the design process. At this stage, more than 80% of choices are made. Feature ...

  12. A Feature Selection Method for Large-Scale Network Traffic Classification Based on Spark

    Directory of Open Access Journals (Sweden)

    Yong Wang

    2016-02-01

    Full Text Available Currently, with the rapid increase of data scales in network traffic classification, how to select traffic features efficiently is becoming a big challenge. Although a number of traditional feature selection methods using the Hadoop-MapReduce framework have been proposed, the execution time is still unsatisfactory because of the numerous iterative computations during processing. To address this issue, an efficient feature selection method for network traffic based on a new parallel computing framework called Spark is proposed in this paper. In our approach, the complete feature set is first preprocessed based on the Fisher score, and a sequential forward search strategy is employed for subsets. The optimal feature subset is then selected using the continuous iterations of the Spark computing framework. The implementation demonstrates that, while preserving classification accuracy, our method reduces the time cost of modeling and classification and significantly improves the execution efficiency of feature selection.

  13. Dynamic frequency feature selection based approach for classification of motor imageries.

    Science.gov (United States)

    Luo, Jing; Feng, Zuren; Zhang, Jun; Lu, Na

    2016-08-01

    Electroencephalography (EEG) is one of the most popular techniques to record brain activities such as motor imagery, which has a low signal-to-noise ratio and could lead to high classification error. Therefore, selection of the most discriminative features could be crucial to improving the classification performance. However, the traditional feature selection methods employed in the brain-computer interface (BCI) field (e.g. Mutual Information-based Best Individual Feature (MIBIF), Mutual Information-based Rough Set Reduction (MIRSR) and cross-validation) mainly focus on the overall performance on all the trials in the training set, and thus may perform very poorly on some specific samples, which is not acceptable. To address this problem, a novel sequential forward feature selection approach called Dynamic Frequency Feature Selection (DFFS) is proposed in this paper. The DFFS method emphasizes the importance of the samples that get misclassified when only high overall classification performance is pursued. In the DFFS-based classification scheme, the EEG data are first transformed to the frequency domain using Wavelet Packet Decomposition (WPD), which is then employed as the candidate set for further discriminative feature selection. The features are selected one by one in a boosting manner. After one feature is selected, the importance of the correctly classified samples based on that feature is decreased, which is equivalent to increasing the importance of the misclassified samples. Therefore, a feature complementary to the current features can be selected in the next run. The selected features are then fed to a classifier trained by the random forest algorithm. Finally, a time series voting-based method is utilized to improve the classification performance. Comparisons between the DFFS-based approach and state-of-the-art methods on BCI Competition IV data set 2b have been conducted, which show the superiority of the proposed algorithm.

  14. Feature selection for anomaly–based network intrusion detection using cluster validity indices

    CSIR Research Space (South Africa)

    Naidoo, T

    2015-09-01

    Full Text Available A feature selection algorithm that is novel in the context of anomaly–based network intrusion detection is proposed in this paper. The distinguishing factor of the proposed feature selection algorithm is its complete lack of dependency on labelled...

  15. Graph-based unsupervised feature selection and multiview ...

    Indian Academy of Sciences (India)

    2015-09-28

    Sep 28, 2015 ... llama.med.harvard.edu/funcassociate), a Web-based application which discovers properties enriched in lists of genes or proteins that emerge from large-scale experimentation (Berriz et al. 2009) is also used for biological significance measurement. Further, gene-card (http://www.genecards.org/) (Safran ...

  16. Non-Negative Spectral Learning and Sparse Regression-Based Dual-Graph Regularized Feature Selection.

    Science.gov (United States)

    Shang, Ronghua; Wang, Wenbing; Stolkin, Rustam; Jiao, Licheng

    2018-02-01

    Feature selection is an important approach for reducing the dimension of high-dimensional data. In recent years, many feature selection algorithms have been proposed, but most of them only exploit information from the data space. They often neglect useful information contained in the feature space and do not make full use of the characteristics of the data. To overcome this problem, this paper proposes a new unsupervised feature selection algorithm, called non-negative spectral learning and sparse regression-based dual-graph regularized feature selection (NSSRD). NSSRD is based on the feature selection framework of joint embedding learning and sparse regression, but extends this framework by introducing the feature graph. By using low-dimensional embedding learning in both the data space and the feature space, NSSRD simultaneously exploits the geometric information of both spaces. Second, the algorithm uses non-negative constraints on the low-dimensional embedding matrices of both the feature space and the data space, ensuring that the elements in the matrices are non-negative. Third, NSSRD unifies the embedding matrix of the feature space and the sparse transformation matrix. To ensure the sparsity of the feature array, the sparse transformation matrix is constrained using the ℓ2,1-norm. Thus feature selection can obtain accurate discriminative information from these matrices. Finally, NSSRD uses an iterative and alternating updating rule to optimize the objective function, enabling it to select the representative features more quickly and efficiently. This paper explains the objective function, the iterative updating rules and a proof of convergence. Experimental results show that NSSRD is significantly more effective than several other feature selection algorithms from the literature, on a variety of test data.

  17. RANDOM FORESTS-BASED FEATURE SELECTION FOR LAND-USE CLASSIFICATION USING LIDAR DATA AND ORTHOIMAGERY

    Directory of Open Access Journals (Sweden)

    H. Guan

    2012-07-01

    Full Text Available The development of lidar systems, especially those incorporating high-resolution camera components, has shown great potential for urban classification. However, how to automatically select the best features for land-use classification is challenging. Random Forests, a newly developed machine learning algorithm, is receiving considerable attention in the field of image classification and pattern recognition. In particular, it can provide a measure of variable importance. Thus, in this study the performance of Random Forests-based feature selection for urban areas was explored. First, we extract features from the lidar data, including height-based and intensity-based GLCM measures; other spectral features can be obtained from imagery, such as the red, green and blue bands and GLCM-based measures. Finally, Random Forests is used to automatically select the optimal and uncorrelated features for land-use classification. 0.5-meter resolution lidar data and aerial imagery are used to assess the feature selection performance of Random Forests in the study area located in Mannheim, Germany. The results clearly demonstrate that the use of Random Forests-based feature selection can improve the classification performance through the selected features.
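
    A minimal sketch of Random Forests-based feature selection with scikit-learn: fit a forest on the stacked lidar/imagery features and keep the variables with the highest importance. The forest size and importance threshold are illustrative assumptions, and X/y are placeholders for the per-object feature vectors and land-use labels.

        from sklearn.ensemble import RandomForestClassifier
        from sklearn.feature_selection import SelectFromModel

        selector = SelectFromModel(
            RandomForestClassifier(n_estimators=500, random_state=0),  # forest provides variable importance
            threshold="median",                                        # keep the more important half of the features
        )
        # X_selected = selector.fit_transform(X, y)
        # kept = selector.get_support(indices=True)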

  18. Improving Classification of Protein Interaction Articles Using Context Similarity-Based Feature Selection

    Directory of Open Access Journals (Sweden)

    Yifei Chen

    2015-01-01

    Full Text Available Protein interaction article classification is a text classification task in the biological domain to determine which articles describe protein-protein interactions. Since the feature space in text classification is high-dimensional, feature selection is widely used for reducing the dimensionality of features to speed up computation without sacrificing classification performance. Many existing feature selection methods are based on the statistical measure of document frequency and term frequency. One potential drawback of these methods is that they treat features separately. Hence, first we design a similarity measure between the context information to take word cooccurrences and phrase chunks around the features into account. Then we introduce the similarity of context information to the importance measure of the features to substitute the document and term frequency. Hence we propose new context similarity-based feature selection methods. Their performance is evaluated on two protein interaction article collections and compared against the frequency-based methods. The experimental results reveal that the context similarity-based methods perform better in terms of the F1 measure and the dimension reduction rate. Benefiting from the context information surrounding the features, the proposed methods can select distinctive features effectively for protein interaction article classification.

  19. Improving Classification of Protein Interaction Articles Using Context Similarity-Based Feature Selection.

    Science.gov (United States)

    Chen, Yifei; Sun, Yuxing; Han, Bing-Qing

    2015-01-01

    Protein interaction article classification is a text classification task in the biological domain to determine which articles describe protein-protein interactions. Since the feature space in text classification is high-dimensional, feature selection is widely used for reducing the dimensionality of features to speed up computation without sacrificing classification performance. Many existing feature selection methods are based on the statistical measure of document frequency and term frequency. One potential drawback of these methods is that they treat features separately. Hence, first we design a similarity measure between the context information to take word cooccurrences and phrase chunks around the features into account. Then we introduce the similarity of context information to the importance measure of the features to substitute the document and term frequency. Hence we propose new context similarity-based feature selection methods. Their performance is evaluated on two protein interaction article collections and compared against the frequency-based methods. The experimental results reveal that the context similarity-based methods perform better in terms of the F1 measure and the dimension reduction rate. Benefiting from the context information surrounding the features, the proposed methods can select distinctive features effectively for protein interaction article classification.

  20. Mutual information-based feature selection for low-cost BCIs based on motor imagery.

    Science.gov (United States)

    Schiatti, L; Faes, L; Tessadori, J; Barresi, G; Mattos, L

    2016-08-01

    In the present study, a feature selection algorithm based on mutual information (MI) was applied to electroencephalographic (EEG) data acquired during three different motor imagery tasks from two datasets: Dataset I from BCI Competition IV, including full-scalp recordings from four subjects, and new data recorded from three subjects using the popular low-cost Emotiv EPOC EEG headset. The aim was to evaluate optimal channels and band-power (BP) features for motor imagery task discrimination, in order to assess the feasibility of a portable low-cost motor imagery based Brain-Computer Interface (BCI) system. The minimal subset of features most relevant to the task description and least redundant to each other was determined, and the corresponding classification accuracy was assessed offline employing a linear support vector machine (SVM) in a 10-fold cross-validation scheme. The analysis was performed: (a) on the original full Dataset I from BCI Competition IV, (b) on a restricted channel set from Dataset I corresponding to the available Emotiv EPOC electrode locations, and (c) on data recorded with the EPOC system. Results from (a) showed that an offline classification accuracy above 80% can be reached using only 5 features. Limiting the analysis to EPOC channels caused a decrease in classification accuracy, although it still remained above chance level, both for data from (b) and (c). A top accuracy of 70% was achieved using 2 optimal features. These results encourage further research towards the development of portable low-cost motor imagery-based BCI systems.

  1. An intelligent system for lung cancer diagnosis using a new genetic algorithm based feature selection method.

    Science.gov (United States)

    Lu, Chunhong; Zhu, Zhaomin; Gu, Xiaofeng

    2014-09-01

    In this paper, we develop a novel feature selection algorithm based on the genetic algorithm (GA) using a specifically devised trace-based separability criterion. According to the scores of class separability and variable separability, this criterion measures the significance of a feature subset, independently of any specific classification algorithm. In addition, a mutual information matrix between variables is used as features for classification, and no prior knowledge about the cardinality of the feature subset is required. Experiments are performed using a standard lung cancer dataset. The obtained solutions are verified with three different classifiers, including the support vector machine (SVM), the back-propagation neural network (BPNN), and the K-nearest neighbor (KNN), and compared with those obtained by the whole feature set, the F-score and the correlation-based feature selection methods. The comparison results show that the proposed intelligent system has a good diagnosis performance and can be used as a promising tool for lung cancer diagnosis.
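
    The trace-based separability criterion itself is not given in this record, so the sketch below is a generic genetic-algorithm feature-subset search in which that criterion is stood in for by any user-supplied fitness function; population size, number of generations and mutation rate are illustrative, and X/y are placeholders.

        import numpy as np

        rng = np.random.default_rng(0)

        def ga_select(n_features, fitness, pop_size=30, n_gen=50, p_mut=0.02):
            """Evolve 0/1 feature masks; `fitness` receives a boolean mask and returns a score."""
            pop = rng.integers(0, 2, size=(pop_size, n_features))         # one bit mask per individual
            for _ in range(n_gen):
                scores = np.array([fitness(m.astype(bool)) for m in pop])
                parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]  # truncation selection
                children = []
                for _ in range(pop_size - len(parents)):
                    a, b = parents[rng.integers(len(parents), size=2)]
                    cut = rng.integers(1, n_features)
                    child = np.concatenate([a[:cut], b[cut:]])            # one-point crossover
                    flip = rng.random(n_features) < p_mut                 # bit-flip mutation
                    child[flip] ^= 1
                    children.append(child)
                pop = np.vstack([parents] + children)
            scores = np.array([fitness(m.astype(bool)) for m in pop])
            return pop[np.argmax(scores)].astype(bool)

        # Example fitness (requires SVC and cross_val_score from scikit-learn to be imported):
        # fitness = lambda m: cross_val_score(SVC(), X[:, m], y, cv=3).mean() if m.any() else 0.0
        # best_mask = ga_select(X.shape[1], fitness)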

  2. Feature Selection by Reordering

    Czech Academy of Sciences Publication Activity Database

    Jiřina, Marcel; Jiřina jr., M.

    2005-01-01

    Roč. 2, č. 1 (2005), s. 155-161 ISSN 1738-6438 Institutional research plan: CEZ:AV0Z10300504 Keywords : feature selection * data reduction * ordering of features Subject RIV: BA - General Mathematics

  3. Robust Feature Selection from Microarray Data Based on Cooperative Game Theory and Qualitative Mutual Information

    Directory of Open Access Journals (Sweden)

    Atiyeh Mortazavi

    2016-01-01

    Full Text Available High dimensionality of microarray data sets may lead to low efficiency and overfitting. In this paper, a multiphase cooperative game theoretic feature selection approach is proposed for microarray data classification. In the first phase, due to the high dimensionality of microarray data sets, the features are reduced using one of two filter-based feature selection methods, namely mutual information and the Fisher ratio. In the second phase, the Shapley index is used to evaluate the power of each feature. The main innovation of the proposed approach is to employ Qualitative Mutual Information (QMI) for this purpose. The idea of Qualitative Mutual Information makes the selected features more stable, and this stability helps to deal with the problem of data imbalance and scarcity. In the third phase, a forward selection scheme is applied which uses a scoring function to weight each feature. The performance of the proposed method is compared with other popular feature selection algorithms such as the Fisher ratio, minimum redundancy maximum relevance, and previous works on cooperative game based feature selection. The average classification accuracy on eleven microarray data sets shows that the proposed method improves both average accuracy and average stability compared to other approaches.

  4. Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis.

    Science.gov (United States)

    Inbarani, H Hannah; Azar, Ahmad Taher; Jothi, G

    2014-01-01

    Medical datasets are often characterized by a large number of disease measurements and a relatively small number of patient records. Not all of these measurements (features) are important; some are irrelevant or noisy. These features may be especially harmful in the case of relatively small training sets, where this irrelevancy and redundancy is harder to evaluate. On the other hand, this extreme number of features carries the problem of memory usage in order to represent the dataset. Feature Selection (FS) is a solution that involves finding a subset of prominent features to improve predictive accuracy and to remove the redundant features. Thus, the learning model receives a concise structure without forfeiting predictive accuracy, built by using only the selected prominent features. Therefore, nowadays, FS is an essential part of knowledge discovery. In this study, new supervised feature selection methods based on hybridization of Particle Swarm Optimization (PSO), PSO based Relative Reduct (PSO-RR) and PSO based Quick Reduct (PSO-QR) are presented for disease diagnosis. The experimental results on several standard medical datasets prove the efficiency of the proposed technique as well as enhancements over the existing feature selection techniques. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  5. Different cortical mechanisms for spatial vs. feature-based attentional selection in visual working memory

    Directory of Open Access Journals (Sweden)

    Anna Heuer

    2016-08-01

    Full Text Available The limited capacity of visual working memory necessitates attentional mechanisms that selectively update and maintain only the most task-relevant content. Psychophysical experiments have shown that the retroactive selection of memory content can be based on visual properties such as location or shape, but the neural basis for such differential selection is unknown. For example, it is not known if there are different cortical modules specialized for spatial versus feature-based mnemonic attention, in the same way that has been demonstrated for attention to perceptual input. Here, we used transcranial magnetic stimulation (TMS) to identify areas in human parietal and occipital cortex involved in the selection of objects from memory based on cues to their location (spatial information) or their shape (featural information). We found that TMS over the supramarginal gyrus (SMG) selectively facilitated spatial selection, whereas TMS over the lateral occipital cortex selectively enhanced feature-based selection for remembered objects in the contralateral visual field. Thus, different cortical regions are responsible for spatial vs. feature-based selection of working memory representations. Since the same regions are involved in attention to external events, these new findings indicate overlapping mechanisms for attentional control over perceptual input and mnemonic representations.

  6. Emotional textile image classification based on cross-domain convolutional sparse autoencoders with feature selection

    Science.gov (United States)

    Li, Zuhe; Fan, Yangyu; Liu, Weihua; Yu, Zeqi; Wang, Fengqin

    2017-01-01

    We aim to apply sparse autoencoder-based unsupervised feature learning to emotional semantic analysis for textile images. To tackle the problem of limited training data, we present a cross-domain feature learning scheme for emotional textile image classification using convolutional autoencoders. We further propose a correlation-analysis-based feature selection method for the weights learned by sparse autoencoders to reduce the number of features extracted from large size images. First, we randomly collect image patches on an unlabeled image dataset in the source domain and learn local features with a sparse autoencoder. We then conduct feature selection according to the correlation between different weight vectors corresponding to the autoencoder's hidden units. We finally adopt a convolutional neural network including a pooling layer to obtain global feature activations of textile images in the target domain and send these global feature vectors into logistic regression models for emotional image classification. The cross-domain unsupervised feature learning method achieves 65% to 78% average accuracy in the cross-validation experiments corresponding to eight emotional categories and performs better than conventional methods. Feature selection can reduce the computational cost of global feature extraction by about 50% while improving classification performance.

  7. Feature selection for elderly faller classification based on wearable sensors.

    Science.gov (United States)

    Howcroft, Jennifer; Kofman, Jonathan; Lemaire, Edward D

    2017-05-30

    Wearable sensors can be used to derive numerous gait pattern features for elderly fall risk and faller classification; however, an appropriate feature set is required to avoid high computational costs and the inclusion of irrelevant features. The objectives of this study were to identify and evaluate smaller feature sets for faller classification from large feature sets derived from wearable accelerometer and pressure-sensing insole gait data. A convenience sample of 100 older adults (75.5 ± 6.7 years; 76 non-fallers, 24 fallers based on 6 month retrospective fall occurrence) walked 7.62 m while wearing pressure-sensing insoles and tri-axial accelerometers at the head, pelvis, left and right shanks. Feature selection was performed using correlation-based feature selection (CFS), fast correlation based filter (FCBF), and Relief-F algorithms. Faller classification was performed using multi-layer perceptron neural network, naïve Bayesian, and support vector machine classifiers, with 75:25 single stratified holdout and repeated random sampling. The best performing model was a support vector machine with 78% accuracy, 26% sensitivity, 95% specificity, 0.36 F1 score, and 0.31 MCC and one posterior pelvis accelerometer input feature (left acceleration standard deviation). The second best model achieved better sensitivity (44%) and used a support vector machine with 74% accuracy, 83% specificity, 0.44 F1 score, and 0.29 MCC. This model had ten input features: maximum, mean and standard deviation posterior acceleration; maximum, mean and standard deviation anterior acceleration; mean superior acceleration; and three impulse features. The best multi-sensor model sensitivity (56%) was achieved using posterior pelvis and both shank accelerometers and a naïve Bayesian classifier. The best single-sensor model sensitivity (41%) was achieved using the posterior pelvis accelerometer and a naïve Bayesian classifier. Feature selection provided models with smaller feature

  8. Unsupervised Feature Subset Selection

    DEFF Research Database (Denmark)

    Søndberg-Madsen, Nicolaj; Thomsen, C.; Pena, Jose

    2003-01-01

    This paper studies filter and hybrid filter-wrapper feature subset selection for unsupervised learning (data clustering). We constrain the search for the best feature subset by scoring the dependence of every feature on the rest of the features, conjecturing that these scores discriminate some … irrelevant features. We report experimental results on artificial and real data for unsupervised learning of naive Bayes models. Both the filter and hybrid approaches perform satisfactorily…

  9. Feature selection for neural network based defect classification of ceramic components using high frequency ultrasound.

    Science.gov (United States)

    Kesharaju, Manasa; Nagarajah, Romesh

    2015-09-01

    The motivation for this research stems from the need to provide a non-destructive testing method capable of detecting and locating defects and microstructural variations within armour ceramic components before issuing them to the soldiers who rely on them for their survival. The development of an automated ultrasonic-inspection-based classification system would make it possible to check each ceramic component and immediately alert the operator to the presence of defects. Generally, in many classification problems the choice of features or dimensionality reduction is significant and, at the same time, very difficult, as a substantial computational effort is required to evaluate possible feature subsets. In this research, a combination of artificial neural networks and genetic algorithms is used to optimize the feature subset used in the classification of various defects in reaction-sintered silicon carbide ceramic components. Initially, wavelet-based feature extraction is performed on the region of interest. An artificial neural network classifier is employed to evaluate the performance of these features, and genetic-algorithm-based feature selection is performed. Principal component analysis, a popular technique for feature selection, is compared with the genetic-algorithm-based technique in terms of classification accuracy and the number of features selected. The experimental results confirm that the features identified by principal component analysis lead to better classification performance (96%) than those selected by the genetic algorithm (94%).
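
    The PCA branch of the comparison above can be reproduced in outline with scikit-learn: keep the principal components that explain most of the variance and feed their scores to a small neural-network classifier. The sketch below uses synthetic data in place of the wavelet features extracted from the ultrasound scans; the 95% variance threshold and the network size are assumptions, and the genetic-algorithm branch is not shown.

      from sklearn.datasets import make_classification
      from sklearn.decomposition import PCA
      from sklearn.model_selection import train_test_split
      from sklearn.neural_network import MLPClassifier
      from sklearn.metrics import accuracy_score

      # Synthetic stand-in for wavelet features of ultrasound signals, 3 defect classes.
      X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                                 n_classes=3, n_clusters_per_class=1, random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

      pca = PCA(n_components=0.95).fit(X_tr)          # keep components covering 95% of variance
      clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000,
                          random_state=0).fit(pca.transform(X_tr), y_tr)
      print("components kept:", pca.n_components_)
      print("test accuracy:", accuracy_score(y_te, clf.predict(pca.transform(X_te))))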

  10. Selecting Feature Subsets Based on SVM-RFE and the Overlapping Ratio with Applications in Bioinformatics

    Directory of Open Access Journals (Sweden)

    Xiaohui Lin

    2017-12-01

    Feature selection is an important topic in bioinformatics. Defining informative features from complex high-dimensional biological data is critical in disease study, drug development, and related areas. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique that has shown its power in many applications; it ranks the features according to the recursive feature deletion sequence based on the SVM. In this study, we propose a method, SVM-RFE-OA, which combines the classification accuracy rate and the average overlapping ratio of the samples to determine the number of features to be selected from the feature ranking of SVM-RFE. To measure the feature weights more accurately, we also propose a modified SVM-RFE-OA (M-SVM-RFE-OA) algorithm that temporarily screens out the samples lying in a heavily overlapping area in each iteration. Experiments on eight public biological datasets show that the discriminative ability of the feature subset is measured more accurately by combining the classification accuracy rate with the average overlapping degree of the samples than by using the classification accuracy rate alone, and that shielding the samples in the overlapping area makes the calculation of the feature weights more stable and accurate. The methods proposed in this study can also be used with other RFE techniques to define potential biomarkers from big biological data.
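
    The SVM-RFE ranking on which SVM-RFE-OA builds is available directly in scikit-learn. The sketch below ranks features with a linear-kernel SVM and then picks the subset size by cross-validated accuracy alone; the overlapping-ratio criterion that distinguishes SVM-RFE-OA is not reproduced, the data are synthetic, and the 2-30 search range is an assumption.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.feature_selection import RFE
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      # Synthetic high-dimensional stand-in for an omics matrix.
      X, y = make_classification(n_samples=120, n_features=200, n_informative=10,
                                 random_state=0)

      # Full SVM-RFE ranking: remove one feature per iteration until one is left.
      ranking = RFE(SVC(kernel="linear"), n_features_to_select=1, step=1).fit(X, y).ranking_
      order = np.argsort(ranking)                      # best-ranked features first

      best_k, best_acc = 2, 0.0
      for k in range(2, 31):
          acc = cross_val_score(SVC(kernel="linear"), X[:, order[:k]], y, cv=5).mean()
          if acc > best_acc:
              best_k, best_acc = k, acc
      print(f"keep top {best_k} features, CV accuracy {best_acc:.3f}")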

  11. Entropy based unsupervised Feature Selection in digital mammogram image using rough set theory.

    Science.gov (United States)

    Velayutham, C; Thangavel, K

    2012-01-01

    Feature selection (FS) is a process that attempts to select the most informative features. In supervised FS methods, various feature subsets are evaluated using an evaluation function or metric to select only those features that are related to the decision classes of the data under consideration. However, for many data mining applications, decision class labels are often unknown or incomplete, which underlines the significance of unsupervised FS. In unsupervised learning, decision class labels are not provided, and not all features are important: some may be redundant, and others may be irrelevant and noisy. In this paper, a novel unsupervised FS method for mammogram images, using rough set-based entropy measures, is proposed. A typical mammogram image processing system generally consists of mammogram image acquisition, image pre-processing, segmentation, and feature extraction from the segmented mammogram image. The proposed method is used to select features from the resulting data set; it is compared with existing rough set-based supervised FS methods, and the classification performance of both is recorded, demonstrating the efficiency of the proposed method.

  12. Swarm intelligence based wavelet coefficient feature selection for mass spectral classification: an application to proteomics data.

    Science.gov (United States)

    Zhao, Weixiang; Davis, Cristina E

    2009-09-28

    This paper introduces the ant colony algorithm, a novel swarm intelligence based optimization method, to select appropriate wavelet coefficients from mass spectral data as a new feature selection method for ovarian cancer diagnostics. By determining the proper parameters for the ant colony algorithm (ACA) based searching algorithm, we perform the feature searching process for 100 times with the number of selected features fixed at 5. The results of this study show: (1) the classification accuracy based on the five selected wavelet coefficients can reach up to 100% for all the training, validating and independent testing sets; (2) the eight most popular selected wavelet coefficients of the 100 runs can provide 100% accuracy for the training set, 100% accuracy for the validating set, and 98.8% accuracy for the independent testing set, which suggests the robustness and accuracy of the proposed feature selection method; and (3) the mass spectral data corresponding to the eight popular wavelet coefficients can be located by reverse wavelet transformation and these located mass spectral data still maintain high classification accuracies (100% for the training set, 97.6% for the validating set, and 98.8% for the testing set) and also provide sufficient physical and medical meaning for future ovarian cancer mechanism studies. Furthermore, the corresponding mass spectral data (potential biomarkers) are in good agreement with other studies which have used the same sample set. Together these results suggest this feature extraction strategy will benefit the development of intelligent and real-time spectroscopy instrumentation based diagnosis and monitoring systems.

  13. An improved wrapper-based feature selection method for machinery fault diagnosis.

    Science.gov (United States)

    Hui, Kar Hoou; Ooi, Ching Sheng; Lim, Meng Hee; Leong, Mohd Salman; Al-Obaidi, Salah Mahdi

    2017-01-01

    A major issue in machinery fault diagnosis using vibration signals is that it is over-reliant on personnel knowledge and experience in interpreting the signal. Thus, machine learning has been adapted for machinery fault diagnosis. The quantity and quality of the input features, however, influence the fault classification performance. Feature selection plays a vital role in selecting the most representative feature subset for the machine learning algorithm. However, a trade-off between the capability to find the best feature subset and the computational effort required is inevitable in the wrapper-based feature selection (WFS) method. This paper proposes an improved WFS technique integrated with a support vector machine (SVM) classifier as a complete fault diagnosis system for a rolling element bearing case study. The bearing vibration dataset made available by the Case Western Reserve University Bearing Data Centre was analysed using the proposed WFS, and its performance is discussed. The results reveal that the proposed WFS secures the best feature subset with a lower computational effort by eliminating the redundancy of re-evaluation. The proposed WFS has therefore been found to be capable of carrying out feature selection tasks efficiently.
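
    A minimal wrapper of the kind discussed above can be written as a sequential forward search around an SVM: at each step the feature whose addition most improves cross-validated accuracy is added, and the search stops when no addition helps. The sketch below, on synthetic data, omits the re-evaluation-saving improvement that the paper proposes and does not use the Case Western Reserve bearing data; the cap of eight features is an assumption.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      def forward_wrapper(X, y, max_features=8, cv=5):
          # Greedily add the feature that most improves the SVM's CV accuracy.
          selected, best_acc = [], 0.0
          remaining = list(range(X.shape[1]))
          while remaining and len(selected) < max_features:
              acc, f = max((cross_val_score(SVC(), X[:, selected + [f]], y, cv=cv).mean(), f)
                           for f in remaining)
              if acc <= best_acc:
                  break
              selected.append(f)
              remaining.remove(f)
              best_acc = acc
          return selected, best_acc

      # Synthetic stand-in for statistical features of bearing vibration signals, 4 fault classes.
      X, y = make_classification(n_samples=200, n_features=20, n_informative=6,
                                 n_classes=4, n_clusters_per_class=1, random_state=0)
      subset, acc = forward_wrapper(X, y)
      print("selected features:", subset, "CV accuracy:", round(acc, 3))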

  14. [Study on Sleep Staging Based on Support Vector Machines and Feature Selection in Single Channel Electroencephalogram].

    Science.gov (United States)

    Lin, Xiujing; Xia, Yongming; Qian, Songrong

    2015-06-01

    Sleep electroencephalogram (EEG) is an important index for diagnosing sleep disorders and related diseases. Manual sleep staging is time-consuming and often influenced by subjective factors, while existing automatic sleep staging methods have high complexity and low accuracy. This paper proposes a sleep staging method based on support vector machines (SVM) and feature selection using a single-channel EEG signal. Thirty-eight features were extracted from the single-channel EEG signal. The F-Score feature selection method was then extended to the multiclass case, with an added eliminate factor, in order to find suitable features, which were used as SVM classifier inputs. The eliminate factor was adopted to reduce the negative interaction of features on the result. The F-Score with the added eliminate factor was evaluated on data from a standard open-source database, and the results were compared with no feature selection and with standard F-Score feature selection. The results showed that the present method can effectively improve sleep staging accuracy and reduce the computation time.
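
    The multi-class F-Score used above for ranking EEG features is straightforward to compute: for each feature, the between-class scatter of the class means is divided by the pooled within-class variance. The sketch below implements that basic score on random stand-in data; the "eliminate factor" extension proposed in the paper is not reproduced, and the 38-feature, 5-stage shapes are merely illustrative.

      import numpy as np

      def fisher_score(X, y):
          # Between-class scatter of feature means over pooled within-class variance.
          classes = np.unique(y)
          mean_all = X.mean(axis=0)
          num = np.zeros(X.shape[1])
          den = np.zeros(X.shape[1])
          for c in classes:
              Xc = X[y == c]
              num += len(Xc) * (Xc.mean(axis=0) - mean_all) ** 2
              den += len(Xc) * Xc.var(axis=0)
          return num / (den + 1e-12)           # small constant guards against zero variance

      rng = np.random.default_rng(0)
      X = rng.standard_normal((500, 38))       # 38 candidate EEG features
      y = rng.integers(0, 5, size=500)         # 5 sleep stages
      print("top 10 features by F-score:", np.argsort(fisher_score(X, y))[::-1][:10])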

  15. Selecting frequency feature for license plate detection based on AdaBoost

    Science.gov (United States)

    Tan, Huachun; Chen, Hao; Deng, Yafeng; Liu, Junhui

    2009-01-01

    In this paper, a new method for license plate detection based on AdaBoost is proposed. In the new method, character frequency features, which are powerful for detecting license plate characters, are introduced into the feature pool. These frequency features, obtained from the FFT of the horizontal projection of the binary image, are selected by AdaBoost. Haar-like features selected by AdaBoost are then used to capture the subtle structure of the license plate. Furthermore, considering that Chinese license plates come in two types, with either a darker background and lighter characters or a lighter background and darker characters, two detectors are designed to extract the two kinds of plates respectively. Experimental results show the efficiency of the proposed method.
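
    The frequency features mentioned above come from the FFT of the horizontal (column-wise) projection of a binarised image patch, where the regular spacing of plate characters shows up as strong low-order coefficients. The sketch below computes such features for a toy patch; the patch size, the number of coefficients kept and the synthetic stroke pattern are assumptions, and the AdaBoost selection stage itself is not shown.

      import numpy as np

      def frequency_features(binary_patch, n_coeffs=8):
          # Column-wise pixel counts, DC component removed, then FFT magnitudes.
          projection = binary_patch.sum(axis=0).astype(float)
          projection -= projection.mean()
          return np.abs(np.fft.rfft(projection))[1:n_coeffs + 1]

      patch = np.zeros((20, 96), dtype=np.uint8)
      patch[:, ::8] = 1                        # vertical "character" strokes every 8 pixels
      print(frequency_features(patch))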

  16. Feature selection toolbox

    Czech Academy of Sciences Publication Activity Database

    Somol, Petr; Pudil, Pavel

    2002-01-01

    Roč. 35, č. 12 (2002), s. 2749-2759 ISSN 0031-3203 R&D Projects: GA MŠk ME 187; GA ČR GA402/01/0981 Institutional research plan: CEZ:AV0Z1075907 Keywords : pattern recognition * feature selection * subset search Subject RIV: JC - Computer Hardware ; Software Impact factor: 1.038, year: 2002 http://library.utia.cas.cz/separaty/historie/somol- feature selection toolbox.pdf

  17. Tensor-based Multi-view Feature Selection with Applications to Brain Diseases

    Science.gov (United States)

    Cao, Bokai; He, Lifang; Kong, Xiangnan; Yu, Philip S.; Hao, Zhifeng; Ragin, Ann B.

    2015-01-01

    In the era of big data, we can easily access information from multiple views, which may be obtained from different sources or feature subsets. Generally, different views provide complementary information for learning tasks, so multi-view learning can facilitate the learning process and is prevalent in a wide range of application domains. For example, in medical science, measurements from a series of medical examinations are documented for each subject, including clinical, imaging, immunologic, serologic and cognitive measures obtained from multiple sources. Specifically, for brain diagnosis, different quantitative analyses can be seen as different feature subsets of a subject. It is desirable to combine all these features in an effective way for disease diagnosis. However, some measurements from less relevant medical examinations can introduce irrelevant information, which can even be exaggerated after view combination. Feature selection should therefore be incorporated in the process of multi-view learning. In this paper, we explore the tensor product to bring different views together in a joint space, and present a dual method of tensor-based multi-view feature selection (dual-Tmfs) based on the idea of support vector machine recursive feature elimination. Experiments conducted on datasets derived from a neurological disorder demonstrate that the features selected by our proposed method yield better classification performance and are relevant to disease diagnosis. PMID:25937823

  18. Feature Selection for Natural Language Call Routing Based on Self-Adaptive Genetic Algorithm

    Science.gov (United States)

    Koromyslova, A.; Semenkina, M.; Sergienko, R.

    2017-02-01

    The text classification problem for natural language call routing was considered in the paper. Seven different term weighting methods were applied. As dimensionality reduction methods, the feature selection based on self-adaptive GA is considered. k-NN, linear SVM and ANN were used as classification algorithms. The tasks of the research are the following: perform research of text classification for natural language call routing with different term weighting methods and classification algorithms and investigate the feature selection method based on self-adaptive GA. The numerical results showed that the most effective term weighting is TRR. The most effective classification algorithm is ANN. Feature selection with self-adaptive GA provides improvement of classification effectiveness and significant dimensionality reduction with all term weighting methods and with all classification algorithms.

  19. DWFS: A Wrapper Feature Selection Tool Based on a Parallel Genetic Algorithm

    KAUST Repository

    Soufan, Othman

    2015-02-26

    Many scientific problems can be formulated as classification tasks. Data that harbor relevant information are usually described by a large number of features, and frequently many of these features are irrelevant for the class prediction. The efficient implementation of classification models requires identification of suitable combinations of features; a smaller number of features reduces the problem's dimensionality and may result in higher classification performance. We developed DWFS, a web-based tool that allows for efficient selection of features for a variety of problems. DWFS follows the wrapper paradigm and applies a search strategy based on genetic algorithms (GAs). A parallel GA implementation simultaneously examines and evaluates a large number of candidate collections of features. DWFS also integrates various filtering methods that may be applied as a pre-processing step in the feature selection process. Furthermore, the weights and parameters in the fitness function of the GA can be adjusted according to the application requirements. Experiments using heterogeneous datasets from different biomedical applications demonstrate that DWFS is fast and leads to a significant reduction of the number of features without sacrificing performance, as compared to several widely used existing methods. DWFS can be accessed online at www.cbrc.kaust.edu.sa/dwfs.

  20. An enhanced PSO-DEFS based feature selection with biometric authentication for identification of diabetic retinopathy

    Directory of Open Access Journals (Sweden)

    Umarani Balakrishnan

    2016-11-01

    Automatic diagnosis of diabetic retinopathy (DR) from retinal images has recently become a significant research topic in medical applications. Diabetic macular edema (DME) is the major reason for loss of vision in patients suffering from DR. Early identification of DR makes it possible to prevent vision loss and encourage diabetic control activities. Many techniques have been developed to diagnose DR, but the major drawbacks of the existing techniques are low accuracy and high time complexity. To overcome these issues, this paper proposes an enhanced particle swarm optimization-differential evolution feature selection (PSO-DEFS) based approach with biometric authentication for the identification of DR. Initially, a hybrid median filter (HMF) is used to pre-process the input images. The pre-processed images are then embedded with each other using the least significant bit (LSB) for authentication purposes. Simultaneously, image features are extracted using the convoluted local tetra pattern (CLTrP) and Tamura features. Feature selection is performed using PSO-DEFS and a PSO-gravitational search algorithm (PSO-GSA) to reduce time complexity; based on several performance metrics, PSO-DEFS is chosen as the better option for feature selection, with selection performed according to the fitness value. A multi-relevance vector machine (M-RVM) is introduced to classify the 13 normal and 62 abnormal images among 75 images from 60 patients, and the DR patients are further classified by the M-RVM. The experimental results show that the proposed approach achieves better accuracy, sensitivity, and specificity than the existing techniques.

  1. An ant colony optimization based feature selection for web page classification.

    Science.gov (United States)

    Saraç, Esra; Özel, Selma Ayşe

    2014-01-01

    The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used to improve runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods.

  2. A ROC-based feature selection method for computer-aided detection and diagnosis

    Science.gov (United States)

    Wang, Songyuan; Zhang, Guopeng; Liao, Qimei; Zhang, Junying; Jiao, Chun; Lu, Hongbing

    2014-03-01

    Image-based computer-aided detection and diagnosis (CAD) has been a very active research topic aiming to assist physicians in detecting lesions and distinguishing benign from malignant ones. However, the datasets fed into a classifier usually suffer from a small number of samples, as well as significantly fewer samples in one class (with disease) than in the other, resulting in suboptimal classifier performance. Identifying the most characterizing features of the observed data for lesion detection is therefore critical to improving the sensitivity and minimizing the false positives of a CAD system. In this study, we propose a novel feature selection method, mR-FAST, that combines the minimal-redundancy-maximal-relevance (mRMR) framework with the selection metric FAST (feature assessment by sliding thresholds), which is based on the area under the ROC curve (AUC) generated on optimal simple linear discriminants. With three feature datasets extracted from CAD systems for colon polyps and bladder cancer, we show that the space of candidate features selected by mR-FAST is more characterizing for lesion detection, with higher AUC, enabling a compact subset of superior features to be found at low cost.
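
    A stripped-down version of the FAST idea above is to score every candidate feature by the AUC it achieves when its raw value is used as the decision variable, then keep the highest-scoring ones. The sketch below does this on a synthetic, imbalanced dataset; the mRMR redundancy term of mR-FAST and the optimal linear discriminants are omitted, and the class imbalance and the cut-off of eight features are assumptions.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.metrics import roc_auc_score

      # Imbalanced stand-in for CAD candidate-lesion features (few positives).
      X, y = make_classification(n_samples=400, n_features=30, n_informative=5,
                                 weights=[0.9, 0.1], random_state=0)

      # Single-feature AUC; take the better of the two orientations of each feature.
      auc = np.array([max(roc_auc_score(y, X[:, j]), roc_auc_score(y, -X[:, j]))
                      for j in range(X.shape[1])])
      top = np.argsort(auc)[::-1][:8]
      print("top features by single-feature AUC:", top, np.round(auc[top], 3))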

  3. A Feature Selection Approach Based on Interclass and Intraclass Relative Contributions of Terms

    Directory of Open Access Journals (Sweden)

    Hongfang Zhou

    2016-01-01

    Feature selection plays a critical role in text categorization. During feature selection, high-frequency terms and the interclass and intraclass relative contributions of terms all have significant effects on classification results. In this paper, we therefore put forward a feature selection approach, IIRCT, based on the interclass and intraclass relative contributions of terms. In the proposed algorithm, three critical factors, namely term frequency and the interclass and intraclass relative contributions of terms, are all considered synthetically. Finally, experiments are conducted with a kNN classifier, and the corresponding results on the 20 Newsgroups and SougouCS corpora show that the IIRCT algorithm achieves better performance than the DF, t-Test, and CMFS algorithms.

  4. An Empirical Study of Wrappers for Feature Subset Selection based on a Parallel Genetic Algorithm: The Multi-Wrapper Model

    KAUST Repository

    Soufan, Othman

    2012-09-01

    Feature selection is the first task of any learning approach that is applied in major fields of biomedical, bioinformatics, robotics, natural language processing and social networking. In feature subset selection problem, a search methodology with a proper criterion seeks to find the best subset of features describing data (relevance) and achieving better performance (optimality). Wrapper approaches are feature selection methods which are wrapped around a classification algorithm and use a performance measure to select the best subset of features. We analyze the proper design of the objective function for the wrapper approach and highlight an objective based on several classification algorithms. We compare the wrapper approaches to different feature selection methods based on distance and information based criteria. Significant improvement in performance, computational time, and selection of minimally sized feature subsets is achieved by combining different objectives for the wrapper model. In addition, considering various classification methods in the feature selection process could lead to a global solution of desirable characteristics.

  5. A PSO-based multi-objective multi-label feature selection method in classification.

    Science.gov (United States)

    Zhang, Yong; Gong, Dun-Wei; Sun, Xiao-Yan; Guo, Yi-Nan

    2017-03-23

    Feature selection is an important data preprocessing technique in multi-label classification. Although a large number of studies have tackled the feature selection problem, only a few address multi-label data. This paper studies a multi-label feature selection algorithm using an improved multi-objective particle swarm optimization (PSO), with the purpose of searching for a Pareto set of non-dominated solutions (feature subsets). Two new operators are employed to improve the performance of the proposed PSO-based algorithm. One is adaptive uniform mutation with an action range that varies over time, which is used to extend the exploration capability of the swarm; the other is a local learning strategy, which is designed to exploit the areas with sparse solutions in the search space. Moreover, the ideas of the archive and the crowding distance are applied to PSO for finding the Pareto set. Finally, experiments verify that the proposed algorithm is a useful feature selection approach for the multi-label classification problem.

  6. Automatic feature selection for model-based reinforcement learning in factored MDPs

    NARCIS (Netherlands)

    Kroon, M.; Whiteson, S.; Wani, M.A.; Kantardzic, M.; Palade, V.; Kurgan, L.; Qi, A.

    2009-01-01

    Feature selection is an important challenge in machine learning. Unfortunately, most methods for automating feature selection are designed for supervised learning tasks and are thus either inapplicable or impractical for reinforcement learning. This paper presents a new approach to feature selection

  7. Feature Selection based on Machine Learning in MRIs for Hippocampal Segmentation

    Science.gov (United States)

    Tangaro, Sabina; Amoroso, Nicola; Brescia, Massimo; Cavuoti, Stefano; Chincarini, Andrea; Errico, Rosangela; Paolo, Inglese; Longo, Giuseppe; Maglietta, Rosalia; Tateo, Andrea; Riccio, Giuseppe; Bellotti, Roberto

    2015-01-01

    Neurodegenerative diseases are frequently associated with structural changes in the brain. Magnetic resonance imaging (MRI) scans can show these variations and can therefore be used as a supportive feature for a number of neurodegenerative diseases. The hippocampus has been known to be a biomarker for Alzheimer disease and other neurological and psychiatric diseases. However, this requires accurate, robust, and reproducible delineation of hippocampal structures. Fully automatic methods usually follow a voxel-based approach, in which a number of local features are calculated for each voxel. In this paper, we compared four different techniques for selecting from a set of 315 features extracted for each voxel: (i) a filter method based on the Kolmogorov-Smirnov test; two wrapper methods, namely (ii) sequential forward selection and (iii) sequential backward elimination; and (iv) an embedded method based on the Random Forest classifier. The methods were trained on a set of 10 T1-weighted brain MRIs and tested on an independent set of 25 subjects. The resulting segmentations were compared with manual reference labelling. By using only 23 features for each voxel (sequential backward elimination), we obtained performance comparable to the state of the art, as represented by the standard tool FreeSurfer.
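
    Of the four techniques compared above, the embedded Random Forest option is the quickest to sketch: train the forest on the per-voxel features and rank them by impurity importance. The code below does this on synthetic data standing in for the 315 voxel features; the forest size and the cut-off of 23 features (borrowed from the backward-elimination result quoted above, not a Random Forest outcome) are assumptions.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier

      # Synthetic stand-in for per-voxel features labelled hippocampus vs. background.
      X, y = make_classification(n_samples=1000, n_features=315, n_informative=25,
                                 random_state=0)
      forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
      order = np.argsort(forest.feature_importances_)[::-1]   # most important first
      selected = order[:23]
      print("highest-ranked features:", selected[:10], "...")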

  8. A Hybrid PSO-DEFS Based Feature Selection for the Identification of Diabetic Retinopathy.

    Science.gov (United States)

    Balakrishnan, Umarani; Venkatachalapathy, Krishnamurthi; Marimuthu, Girirajkumar S

    2015-01-01

    Diabetic Retinopathy (DR) is an eye disease that may cause blindness through the upsurge of insulin in the blood. The major cause of visual loss in diabetic patients is macular edema. To diagnose and follow up Diabetic Macular Edema (DME), the powerful Optical Coherence Tomography (OCT) technique is used for clinical assessment. Many existing methods identify DME-affected patients by estimating the fovea thickness; these methods suffer from lower accuracy and higher time complexity. To overcome these limitations, a hybrid-approach-based DR detection is introduced in the proposed work. First, the input image is preprocessed using green channel extraction and a median filter. Subsequently, features are extracted using gradient-based descriptors, namely the Histogram of Oriented Gradients (HOG) together with the Complete Local Binary Pattern (CLBP). The texture features are computed over various rotations to capture the edges. We present a hybrid feature selection that combines Particle Swarm Optimization (PSO) and Differential Evolution Feature Selection (DEFS) to minimize the time complexity. A binary Support Vector Machine (SVM) classifier categorizes the 13 normal and 75 abnormal images from 60 patients. Finally, the patients affected by DR are further classified by a Multi-Layer Perceptron (MLP). The experimental results exhibit better accuracy, sensitivity, and specificity than the existing methods.

  9. A New Representation in PSO for Discretization-Based Feature Selection.

    Science.gov (United States)

    Tran, Binh; Xue, Bing; Zhang, Mengjie

    2017-06-23

    In machine learning, discretization and feature selection (FS) are important techniques for preprocessing data to improve the performance of an algorithm on high-dimensional data. Since many FS methods require discrete data, a common practice is to apply discretization before FS. In addition, for the sake of efficiency, features are usually discretized individually (or univariate). This scheme works based on the assumption that each feature independently influences the task, which may not hold in cases where feature interactions exist. Therefore, univariate discretization may degrade the performance of the FS stage since information showing feature interactions may be lost during the discretization process. Initial results of our previous proposed method [evolve particle swarm optimization (EPSO)] showed that combining discretization and FS in a single stage using bare-bones particle swarm optimization (BBPSO) can lead to a better performance than applying them in two separate stages. In this paper, we propose a new method called potential particle swarm optimization (PPSO) which employs a new representation that can reduce the search space of the problem and a new fitness function to better evaluate candidate solutions to guide the search. The results on ten high-dimensional datasets show that PPSO select less than 5% of the number of features for all datasets. Compared with the two-stage approach which uses BBPSO for FS on the discretized data, PPSO achieves significantly higher accuracy on seven datasets. In addition, PPSO obtains better (or similar) classification performance than EPSO on eight datasets with a smaller number of selected features on six datasets. Furthermore, PPSO also outperforms the three compared (traditional) methods and performs similar to one method on most datasets in terms of both generalization ability and learning capacity.

  10. Feature Selection and Blind Source Separation in an EEG-Based Brain-Computer Interface

    Directory of Open Access Journals (Sweden)

    Michael H. Thaut

    2005-11-01

    Full Text Available Most EEG-based BCI systems make use of well-studied patterns of brain activity. However, those systems involve tasks that indirectly map to simple binary commands such as “yes” or “no” or require many weeks of biofeedback training. We hypothesized that signal processing and machine learning methods can be used to discriminate EEG in a direct “yes”/“no” BCI from a single session. Blind source separation (BSS and spectral transformations of the EEG produced a 180-dimensional feature space. We used a modified genetic algorithm (GA wrapped around a support vector machine (SVM classifier to search the space of feature subsets. The GA-based search found feature subsets that outperform full feature sets and random feature subsets. Also, BSS transformations of the EEG outperformed the original time series, particularly in conjunction with a subset search of both spaces. The results suggest that BSS and feature selection can be used to improve the performance of even a “direct,” single-session BCI.

  11. A kernel-based multivariate feature selection method for microarray data classification.

    Directory of Open Access Journals (Sweden)

    Shiquan Sun

    High dimensionality and small sample sizes, with their inherent risk of overfitting, pose great challenges for constructing efficient classifiers in microarray data classification. Therefore, a feature selection technique should be applied prior to data classification to enhance prediction performance. In general, filter methods can be considered as a principal or auxiliary selection mechanism because of their simplicity, scalability, and low computational complexity. However, a series of trivial examples shows that filter methods result in less accurate performance because they ignore the dependencies between features. Although a few publications have attempted to reveal the relationships among features using multivariate methods, these methods describe the relationships only in linear terms, and simple linear combinations restrict the achievable improvement in performance. In this paper, we used a kernel method to discover inherent nonlinear correlations among features as well as between features and the target. Moreover, the number of orthogonal components was determined by kernel Fisher's linear discriminant analysis (FLDA) in a self-adaptive manner rather than by manual parameter settings. To evaluate the effectiveness of our method, we performed several experiments and compared the results of our method with those of other competitive multivariate feature selectors. In our comparison, we used two classifiers (support vector machine and k-nearest neighbor) on two groups of datasets, namely two-class and multi-class datasets. Experimental results demonstrate that the performance of our method is better than that of the others, especially on three hard-to-classify datasets, namely Wang's Breast Cancer, Gordon's Lung Adenocarcinoma and Pomeroy's Medulloblastoma.

  12. Online feature selection with streaming features.

    Science.gov (United States)

    Wu, Xindong; Yu, Kui; Ding, Wei; Wang, Hao; Zhu, Xingquan

    2013-05-01

    We propose a new online feature selection framework for applications with streaming features where the knowledge of the full feature space is unknown in advance. We define streaming features as features that flow in one by one over time whereas the number of training examples remains fixed. This is in contrast with traditional online learning methods that only deal with sequentially added observations, with little attention being paid to streaming features. The critical challenges for Online Streaming Feature Selection (OSFS) include 1) the continuous growth of feature volumes over time, 2) a large feature space, possibly of unknown or infinite size, and 3) the unavailability of the entire feature set before learning starts. In the paper, we present a novel Online Streaming Feature Selection method to select strongly relevant and nonredundant features on the fly. An efficient Fast-OSFS algorithm is proposed to improve feature selection performance. The proposed algorithms are evaluated extensively on high-dimensional datasets and also with a real-world case study on impact crater detection. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.

  13. Feature-based attention modulates direction-selective hemodynamic activity within human MT.

    Science.gov (United States)

    Stoppel, Christian Michael; Boehler, Carsten Nicolas; Strumpf, Hendrik; Heinze, Hans-Jochen; Noesselt, Toemme; Hopf, Jens-Max; Schoenfeld, Mircea Ariel

    2011-12-01

    Attending to the spatial location or to nonspatial features of a stimulus modulates neural activity in cortical areas that process its perceptual attributes. The feature-based attentional selection of the direction of a moving stimulus is associated with increased firing of individual neurons tuned to the direction of the movement in area V5/MT, while the responses of neurons tuned to opposite directions are suppressed. However, it is not known how these multiplicatively scaled responses of individual neurons tuned to different motion directions are integrated at the population level in order to facilitate the processing of stimuli that match the perceptual goals. Using functional magnetic resonance imaging (fMRI), the present study revealed that attending to the movement direction of a dot field enhances the response in a number of areas, including the human MT region (hMT), as a function of the coherence of the stimulus. Attending to the opposite direction, however, led to a suppressed response in hMT that was inversely correlated with stimulus coherence. These findings demonstrate that the multiplicative scaling of single-neuron responses by feature-based attention results in an enhanced direction-selective population response within those cortical modules that process the physical attributes of the attended stimuli. Our results provide strong support for the validity of the "feature similarity gain model" at the level of the integrated population response, as quantified by parametric fMRI in humans.

  14. A Semidefinite Programming Based Search Strategy for Feature Selection with Mutual Information Measure.

    Science.gov (United States)

    Naghibi, Tofigh; Hoffmann, Sarah; Pfister, Beat

    2015-08-01

    Feature subset selection, as a special case of the general subset selection problem, has been the topic of a considerable number of studies due to the growing importance of data-mining applications. In the feature subset selection problem there are two main issues that need to be addressed: (i) finding an appropriate measure function that can be fairly fast and robustly computed for high-dimensional data, and (ii) a search strategy to optimize the measure over the subset space in a reasonable amount of time. In this article, mutual information between features and class labels is considered to be the measure function. Two series expansions for mutual information are proposed, and it is shown that most heuristic criteria suggested in the literature are truncated approximations of these expansions. It is well known that searching the whole subset space is an NP-hard problem. Here, instead of the conventional sequential search algorithms, we suggest a parallel search strategy based on semidefinite programming (SDP) that can search through the subset space in polynomial time. By exploiting the similarities between the proposed algorithm and an instance of the maximum-cut problem in graph theory, the approximation ratio of this algorithm is derived and compared with the approximation ratio of the backward elimination method. The experiments show that it can be misleading to judge the quality of a measure solely on the basis of classification accuracy, without taking the effect of a non-optimal search strategy into account.
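
    The measure function discussed above, mutual information between each feature and the class label, can be estimated directly with scikit-learn; only the SDP-based search over subsets is specific to the paper and is not attempted here. The sketch below simply ranks features by mutual information on synthetic data.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.feature_selection import mutual_info_classif

      X, y = make_classification(n_samples=300, n_features=50, n_informative=8,
                                 random_state=0)
      mi = mutual_info_classif(X, y, random_state=0)   # MI between each feature and the label
      print("top 10 features by mutual information:", np.argsort(mi)[::-1][:10])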

  15. A Meta-Heuristic Regression-Based Feature Selection for Predictive Analytics

    Directory of Open Access Journals (Sweden)

    Bharat Singh

    2014-11-01

    High-dimensional feature selection, in which an optimal feature subset must be found among a very large number of features, is an NP-complete problem. Because conventional optimization techniques are unable to tackle large-scale feature selection problems, meta-heuristic algorithms are widely used. In this paper, we propose a particle swarm optimization technique that utilizes regression techniques for feature selection. We then use the selected features to classify the data. Classification accuracy is used as the criterion to evaluate classifier performance, and classification is accomplished with k-nearest neighbour (KNN) and Bayesian techniques. Various high-dimensional data sets are used to evaluate the usefulness of the proposed approach. Results show that our approach gives better results when compared with other conventional feature selection algorithms.

  16. A hybrid approach to select features and classify diseases based on medical data

    Science.gov (United States)

    AbdelLatif, Hisham; Luo, Jiawei

    2018-03-01

    Feature selection is a popular problem in the classification of diseases in clinical medicine. Here, we develop a hybrid methodology to classify diseases based on three medical datasets: the Arrhythmia, Breast Cancer, and Hepatitis datasets. This methodology, called k-means ANOVA Support Vector Machine (K-ANOVA-SVM), uses k-means clustering with the ANOVA statistic to preprocess the data and select the significant features, and support vector machines in the classification process. To compare and evaluate the performance, we chose three classification algorithms, decision tree, naïve Bayes, and support vector machines, and applied the medical datasets directly to these algorithms. Our methodology gave much better classification accuracy, 98% on the Arrhythmia dataset, 92% on the Breast Cancer dataset and 88% on the Hepatitis dataset, compared with applying the medical data directly to decision tree, naïve Bayes, and support vector machines. The ROC curve and precision of K-ANOVA-SVM also achieved better results than the other algorithms.
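
    The ANOVA-plus-SVM core of the pipeline above maps onto a short scikit-learn pipeline: an F-test filter keeps the k most significant features and an SVM does the classification. The sketch below uses the breast-cancer data bundled with scikit-learn as a stand-in for the three medical datasets; the k-means preprocessing step of K-ANOVA-SVM is omitted and k=10 is an assumption.

      from sklearn.datasets import load_breast_cancer
      from sklearn.feature_selection import SelectKBest, f_classif
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import SVC

      X, y = load_breast_cancer(return_X_y=True)
      model = make_pipeline(SelectKBest(f_classif, k=10),   # ANOVA F-test filter
                            StandardScaler(),
                            SVC(kernel="rbf", C=1.0))
      print("5-fold CV accuracy:", cross_val_score(model, X, y, cv=5).mean().round(3))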

  17. The Research and Application of SURF Algorithm Based on Feature Point Selection Algorithm

    Directory of Open Access Journals (Sweden)

    Zhang Fang Hu

    2014-04-01

    Full Text Available As the pixel information of depth image is derived from the distance information, when implementing SURF algorithm with KINECT sensor for static sign language recognition, there can be some mismatched pairs in palm area. This paper proposes a feature point selection algorithm, by filtering the SURF feature points step by step based on the number of feature points within adaptive radius r and the distance between the two points, it not only greatly improves the recognition rate, but also ensures the robustness under the environmental factors, such as skin color, illumination intensity, complex background, angle and scale changes. The experiment results show that the improved SURF algorithm can effectively improve the recognition rate, has a good robustness.

  18. Unsupervised Feature Selection Based on the Morisita Index for Hyperspectral Images

    Science.gov (United States)

    Golay, Jean; Kanevski, Mikhail

    2017-04-01

    Hyperspectral sensors are capable of acquiring images with hundreds of narrow and contiguous spectral bands. Compared with traditional multispectral imagery, the use of hyperspectral images allows better performance in discriminating between land-cover classes, but it also results in large redundancy and high computational data processing. To alleviate such issues, unsupervised feature selection techniques for redundancy minimization can be implemented. Their goal is to select the smallest subset of features (or bands) in such a way that all the information content of a data set is preserved as much as possible. The present research deals with the application to hyperspectral images of a recently introduced technique of unsupervised feature selection: the Morisita-Based filter for Redundancy Minimization (MBRM). MBRM is based on the (multipoint) Morisita index of clustering and on the Morisita estimator of Intrinsic Dimension (ID). The fundamental idea of the technique is to retain only the bands which contribute to increasing the ID of an image. In this way, redundant bands are disregarded, since they have no impact on the ID. Besides, MBRM has several advantages over benchmark techniques: in addition to its ability to deal with large data sets, it can capture highly-nonlinear dependences and its implementation is straightforward in any programming environment. Experimental results on freely available hyperspectral images show the good effectiveness of MBRM in remote sensing data processing. Comparisons with benchmark techniques are carried out and random forests are used to assess the performance of MBRM in reducing the data dimensionality without loss of relevant information.

  19. Integrative approaches to the prediction of protein functions based on the feature selection

    Directory of Open Access Journals (Sweden)

    Lee Hyunju

    2009-12-01

    Full Text Available Abstract Background Protein function prediction has been one of the most important issues in functional genomics. With the current availability of various genomic data sets, many researchers have attempted to develop integration models that combine all available genomic data for protein function prediction. These efforts have resulted in the improvement of prediction quality and the extension of prediction coverage. However, it has also been observed that integrating more data sources does not always increase the prediction quality. Therefore, selecting data sources that highly contribute to the protein function prediction has become an important issue. Results We present systematic feature selection methods that assess the contribution of genome-wide data sets to predict protein functions and then investigate the relationship between genomic data sources and protein functions. In this study, we use ten different genomic data sources in Mus musculus, including: protein-domains, protein-protein interactions, gene expressions, phenotype ontology, phylogenetic profiles and disease data sources to predict protein functions that are labelled with Gene Ontology (GO terms. We then apply two approaches to feature selection: exhaustive search feature selection using a kernel based logistic regression (KLR, and a kernel based L1-norm regularized logistic regression (KL1LR. In the first approach, we exhaustively measure the contribution of each data set for each function based on its prediction quality. In the second approach, we use the estimated coefficients of features as measures of contribution of data sources. Our results show that the proposed methods improve the prediction quality compared to the full integration of all data sources and other filter-based feature selection methods. We also show that contributing data sources can differ depending on the protein function. Furthermore, we observe that highly contributing data sets can be similar among

  20. Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines.

    Science.gov (United States)

    González, Alvaro J; Liao, Li

    2010-10-29

    Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM) where interacting residues in domains are explicitly modeled according to the three dimensional structural information available at the Protein Data Bank (PDB). Features about the domains are extracted first as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors, and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines. Datasets and source code are freely available on the web at http

  1. Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines

    Directory of Open Access Journals (Sweden)

    Liao Li

    2010-10-01

    Full Text Available Abstract Background Protein-protein interaction (PPI plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. Results In this paper, we propose a computational method to predict DDI using support vector machines (SVMs, based on domains represented as interaction profile hidden Markov models (ipHMM where interacting residues in domains are explicitly modeled according to the three dimensional structural information available at the Protein Data Bank (PDB. Features about the domains are extracted first as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD. Domain pairs are represented by concatenating their selected feature vectors, and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure, an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. Conclusions We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines. Datasets and source code are freely available on

  2. Data Visualization and Feature Selection Methods in Gel-based Proteomics

    DEFF Research Database (Denmark)

    Silva, Tomé Santos; Richard, Nadege; Dias, Jorge P.

    2014-01-01

    … must be given not only to ensure proper experimental design, experimental practice and 2DE technical performance, but also a valid approach for data acquisition, processing and analysis. This paper reviews and illustrates several different aspects of data analysis within the context of gel-based … proteomics, summarizing the current state of research within this field. Particular focus is given to discussing the usefulness of available multivariate analysis tools for both data visualization and feature selection purposes. Visual examples are given using a real gel-based proteomic dataset as a basis…

  3. An Appraisal Model Based on a Synthetic Feature Selection Approach for Students’ Academic Achievement

    Directory of Open Access Journals (Sweden)

    Ching-Hsue Cheng

    2017-11-01

    Obtaining necessary information (and even extracting hidden messages) from existing big data, and then transforming it into knowledge, is an important skill. Data mining technology has received increased attention in various fields in recent years because it can be used to find historical patterns and employ machine learning to aid in decision-making. When we find unexpected rules or patterns in the data, they are likely to be of high value. This paper proposes a synthetic feature selection approach (SFSA), combined with a support vector machine (SVM), to extract patterns and find the key features that influence students’ academic achievement. To verify the proposed model, two databases, namely “Student Profile” and “Tutorship Record”, were collected from an elementary school in Taiwan and concatenated into an integrated dataset, keyed on students’ names, as the research dataset. The results indicate the following: (1) the accuracy of the proposed feature selection approach is better than that of the Minimum-Redundancy-Maximum-Relevance (mRMR) approach; (2) the proposed model outperforms the listed methods when the six least influential features have been deleted; and (3) the proposed model can enhance the accuracy and facilitate the interpretation of the patterns in a hybrid-type dataset of students’ academic achievement.

  4. Rough-Fuzzy Clustering and Unsupervised Feature Selection for Wavelet Based MR Image Segmentation

    Science.gov (United States)

    Maji, Pradipta; Roy, Shaswati

    2015-01-01

    Image segmentation is an indispensable process in the visualization of human tissues, particularly during clinical analysis of brain magnetic resonance (MR) images. For many human experts, manual segmentation is a difficult and time consuming task, which makes an automated brain MR image segmentation method desirable. In this regard, this paper presents a new segmentation method for brain MR images, integrating judiciously the merits of rough-fuzzy computing and multiresolution image analysis technique. The proposed method assumes that the major brain tissues, namely, gray matter, white matter, and cerebrospinal fluid from the MR images are considered to have different textural properties. The dyadic wavelet analysis is used to extract the scale-space feature vector for each pixel, while the rough-fuzzy clustering is used to address the uncertainty problem of brain MR image segmentation. An unsupervised feature selection method is introduced, based on maximum relevance-maximum significance criterion, to select relevant and significant textural features for segmentation problem, while the mathematical morphology based skull stripping preprocessing step is proposed to remove the non-cerebral tissues like skull. The performance of the proposed method, along with a comparison with related approaches, is demonstrated on a set of synthetic and real brain MR images using standard validity indices. PMID:25848961

  5. Rough-fuzzy clustering and unsupervised feature selection for wavelet based MR image segmentation.

    Science.gov (United States)

    Maji, Pradipta; Roy, Shaswati

    2015-01-01

    Image segmentation is an indispensable process in the visualization of human tissues, particularly during clinical analysis of brain magnetic resonance (MR) images. For many human experts, manual segmentation is a difficult and time consuming task, which makes an automated brain MR image segmentation method desirable. In this regard, this paper presents a new segmentation method for brain MR images, integrating judiciously the merits of rough-fuzzy computing and multiresolution image analysis technique. The proposed method assumes that the major brain tissues, namely, gray matter, white matter, and cerebrospinal fluid from the MR images are considered to have different textural properties. The dyadic wavelet analysis is used to extract the scale-space feature vector for each pixel, while the rough-fuzzy clustering is used to address the uncertainty problem of brain MR image segmentation. An unsupervised feature selection method is introduced, based on maximum relevance-maximum significance criterion, to select relevant and significant textural features for segmentation problem, while the mathematical morphology based skull stripping preprocessing step is proposed to remove the non-cerebral tissues like skull. The performance of the proposed method, along with a comparison with related approaches, is demonstrated on a set of synthetic and real brain MR images using standard validity indices.

  6. Fisher Information Based Meteorological Factors Introduction and Features Selection for Short-Term Load Forecasting

    Directory of Open Access Journals (Sweden)

    Shuping Cai

    2018-03-01

    Full Text Available Weather information is an important factor in short-term load forecasting (STLF. However, for a long time, more importance has always been attached to forecasting models instead of other processes such as the introduction of weather factors or feature selection for STLF. The main aim of this paper is to develop a novel methodology based on Fisher information for meteorological variables introduction and variable selection in STLF. Fisher information computation for one-dimensional and multidimensional weather variables is first described, and then the introduction of meteorological factors and variables selection for STLF models are discussed in detail. On this basis, different forecasting models with the proposed methodology are established. The proposed methodology is implemented on real data obtained from Electric Power Utility of Zhenjiang, Jiangsu Province, in southeast China. The results show the advantages of the proposed methodology in comparison with other traditional ones regarding prediction accuracy, and it has very good practical significance. Therefore, it can be used as a unified method for introducing weather variables into STLF models, and selecting their features.

  7. Feature Selection and Kernel Learning for Local Learning-Based Clustering.

    Science.gov (United States)

    Zeng, Hong; Cheung, Yiu-ming

    2011-08-01

    The performance of most clustering algorithms relies heavily on the representation of the data in the input space or in the Hilbert space of kernel methods. This paper aims to obtain an appropriate data representation through feature selection or kernel learning within the framework of the Local Learning-Based Clustering (LLC) method (Wu and Schölkopf 2006), which can outperform global learning-based approaches when dealing with high-dimensional data lying on a manifold. Specifically, we associate a weight with each feature or kernel and incorporate it into the built-in regularization of the LLC algorithm to take into account the relevance of each feature or kernel for the clustering. Accordingly, the weights are estimated iteratively during the clustering process. We show that the resulting weighted regularization with an additional constraint on the weights is equivalent to a known sparsity-promoting penalty. Hence, the weights of irrelevant features or kernels can be shrunk toward zero. Extensive experiments show the efficacy of the proposed methods on benchmark data sets.

  8. An MLP-based feature subset selection for HIV-1 protease cleavage site analysis.

    Science.gov (United States)

    Kim, Gilhan; Kim, Yeonjoo; Lim, Heuiseok; Kim, Hyeoncheol

    2010-01-01

    In recent years, several machine learning approaches have been applied to modeling the specificity of the human immunodeficiency virus type 1 (HIV-1) protease cleavage domain. However, the high-dimensional domain dataset contains a small number of samples, which could misguide classification modeling and its interpretation. Appropriate feature selection can alleviate the problem by eliminating irrelevant and redundant features, and thus improve prediction performance. We introduce a new feature subset selection method, FS-MLP, that selects relevant features using multi-layered perceptron (MLP) learning. The method comprises MLP learning with a training dataset followed by feature subset selection using a decompositional approach to analyze the trained MLP. Our method is able to select a subset of relevant features in high-dimensional, multi-variate and non-linear domains. Using five artificial datasets that represent four data types, we compared the performance of FS-MLP with that of seven other feature selection methods. Experimental results showed that FS-MLP is superior in high-dimensional, multi-variate and non-linear domains. In experiments with the HIV-1 protease cleavage dataset, FS-MLP selected a set of 14 highly relevant features among the 160 original features. On a validation set of 131 test instances, classifiers that used the 14 features showed about 95% accuracy, which outperformed the other seven methods in terms of accuracy and the number of features. Our experimental results indicate that FS-MLP is effective in analyzing multi-variate, non-linear and high-dimensional datasets such as the HIV-1 protease cleavage dataset. The 14 relevant features selected by FS-MLP also provide useful insights into the HIV-1 cleavage site domain. The FS-MLP is a useful method for computational sequence analysis in general. 2009 Elsevier B.V. All rights reserved.

  9. Genetic Particle Swarm Optimization-Based Feature Selection for Very-High-Resolution Remotely Sensed Imagery Object Change Detection.

    Science.gov (United States)

    Chen, Qiang; Chen, Yunhao; Jiang, Weiguo

    2016-07-30

    In the field of multiple-feature Object-Based Change Detection (OBCD) for very-high-resolution remotely sensed images, image objects have abundant features, and feature selection affects the precision and efficiency of OBCD. Through object-based image analysis, this paper proposes a Genetic Particle Swarm Optimization (GPSO)-based feature selection algorithm to solve the optimization problem of feature selection in multiple-feature OBCD. We select the Ratio of Mean to Variance (RMV) as the fitness function of GPSO and apply the proposed algorithm to the object-based hybrid multivariate alteration detection model. Two experimental cases on WorldView-2/3 images confirm that GPSO can significantly improve the speed of convergence and effectively avoid the problem of premature convergence, relative to other feature selection algorithms. According to the accuracy evaluation of OBCD, GPSO achieves higher overall accuracy (84.17% and 83.59%) and Kappa coefficients (0.6771 and 0.6314) than the other algorithms. Moreover, the sensitivity analysis results show that the proposed algorithm is not easily influenced by the initial parameters, but the number of features to be selected and the size of the particle swarm affect the algorithm. The comparison experiment results reveal that RMV is more suitable than other functions as the fitness function of the GPSO-based feature selection algorithm.
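
    A toy illustration of a Ratio-of-Mean-to-Variance style fitness for a candidate feature subset is sketched below in Python. The exact RMV definition used in the cited work is not reproduced in the abstract, so the per-feature mean-over-variance form, the small epsilon guard and all names are assumptions.

    ```python
    import numpy as np

    def rmv_fitness(X, mask):
        """Ratio-of-Mean-to-Variance style fitness for a candidate feature subset.

        X    : (n_objects, n_features) array of object features.
        mask : boolean vector marking the candidate subset.
        Illustrative definition only (per-feature |mean| over variance, averaged
        across the selected features); the paper's exact RMV may differ.
        """
        sub = X[:, mask]
        means = np.abs(sub.mean(axis=0))
        variances = sub.var(axis=0) + 1e-12   # guard against zero variance
        return float(np.mean(means / variances))

    # toy usage with random data and an arbitrary candidate subset
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 8))
    mask = np.array([True, False, True, True, False, False, True, False])
    print(rmv_fitness(X, mask))
    ```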

  10. Research on Methods for Discovering and Selecting Cloud Infrastructure Services Based on Feature Modeling

    Directory of Open Access Journals (Sweden)

    Huamin Zhu

    2016-01-01

    Full Text Available Nowadays more and more cloud infrastructure service providers are providing large numbers of service instances which are a combination of diversified resources, such as computing, storage, and network. However, for cloud infrastructure services, the lack of a description standard and of systematic discovery and selection methods has made it difficult for users to discover and choose services. First, considering the highly configurable properties of a cloud infrastructure service, the feature model method is used to describe such a service. Second, based on this description, a systematic discovery and selection method for cloud infrastructure services is proposed. The automatic analysis techniques of the feature model are introduced to verify the model’s validity and to perform the matching of the service and demand models. Finally, we determine the critical decision metrics and their corresponding measurement methods for cloud infrastructure services, where the subjective and objective weighting results are combined to determine the weights of the decision metrics. The best matching instances from various providers are then ranked by their comprehensive evaluations. Experimental results show that the proposed methods can effectively improve the accuracy and efficiency of cloud infrastructure service discovery and selection.

  11. Gene- and evidence-based candidate gene selection for schizophrenia and gene feature analysis.

    Science.gov (United States)

    Sun, Jingchun; Han, Leng; Zhao, Zhongming

    2010-01-01

    Schizophrenia is a chronic psychiatric disorder that affects about 1% of the population globally. A tremendous amount of effort has been expended in the past decade, including more than 2400 association studies, to identify genes influencing susceptibility to the disorder. However, few genes or markers have been reliably replicated. The wealth of this information calls for an integration of gene association data, evidence-based gene ranking, and follow-up replication in large samples. The objective of this study is to develop and evaluate evidence-based gene ranking methods and to examine the features of top-ranking candidate genes for schizophrenia. We proposed a gene-based approach for selecting and prioritizing candidate genes by combining odds ratios (ORs) of multiple markers in each association study and then combining ORs across multiple studies of a gene. We named it the combination-combination OR method (CCOR). CCOR is similar to our recently published method, which first selects the largest OR of the markers in each study and then combines these ORs across multiple studies (i.e., the selection-combination OR method, SCOR), but differs in selecting the representative OR in each study. Features of top-ranking genes were examined by Gene Ontology terms and gene expression in tissues. Our evaluation suggested that the SCOR method overall outperforms the CCOR method. Using the SCOR, a list of 75 top-ranking genes was selected as schizophrenia candidate genes (SZGenes). We found that SZGenes had a strong correlation with neuro-related functional terms and were highly expressed in brain-related tissues. The scientific landscape for schizophrenia genetics and other complex disease studies is expected to change dramatically in the next few years; thus, the gene-based combined OR method is useful in candidate gene selection for follow-up association studies and in further artificial intelligence in medicine. This method for prioritization of candidate genes can be applied to other

  12. Feature Selection and Predictors of Falls with Foot Force Sensors Using KNN-Based Algorithms

    Directory of Open Access Journals (Sweden)

    Shengyun Liang

    2015-11-01

    Full Text Available The aging process may lead to the degradation of lower extremity function in the elderly population, which can restrict their daily quality of life and gradually increase the fall risk. We aimed to determine whether objective measures of physical function could predict subsequent falls. Ground reaction force (GRF) data, quantified by sample entropy, were collected by foot force sensors. Thirty-eight subjects (23 fallers and 15 non-fallers) participated in functional movement tests, including walking and sit-to-stand (STS). A feature selection algorithm was used to select relevant features to classify the elderly into two groups, at risk and not at risk of falling down, for three KNN-based classifiers: local mean-based k-nearest neighbor (LMKNN), pseudo nearest neighbor (PNN), and local mean pseudo nearest neighbor (LMPNN) classification. We compared classification performances, and achieved the best results with LMPNN, with sensitivity, specificity and accuracy all 100%. Moreover, a subset of GRFs was significantly different between the two groups via the Wilcoxon rank sum test, which is compatible with the classification results. This method could potentially be used by non-experts to monitor balance and the risk of falling down in the elderly population.
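
    Sample entropy, used above to quantify the GRF signals, has a standard definition; the sketch below is a generic implementation with common default parameters (m = 2, tolerance 0.2 times the standard deviation), which are not necessarily those used in the study.

    ```python
    import numpy as np

    def sample_entropy(x, m=2, r_factor=0.2):
        """Standard sample entropy of a 1-D signal: -ln(A/B), where B counts
        template matches of length m and A counts matches of length m+1,
        self-matches excluded.  Parameter values are common defaults."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        r = r_factor * x.std()

        def count_matches(dim):
            templates = np.array([x[i:i + dim] for i in range(n - dim)])
            count = 0
            for i in range(len(templates)):
                dist = np.max(np.abs(templates - templates[i]), axis=1)
                count += np.sum(dist <= r) - 1      # exclude the self-match
            return count

        b = count_matches(m)
        a = count_matches(m + 1)
        return -np.log(a / b) if a > 0 and b > 0 else np.inf

    print(sample_entropy(np.sin(np.linspace(0, 20, 500))))
    ```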

  13. A New Feature Selection Algorithm Based on the Mean Impact Variance

    Directory of Open Access Journals (Sweden)

    Weidong Cheng

    2014-01-01

    Full Text Available The selection of fewer or more representative features from multidimensional features is important when the artificial neural network (ANN) algorithm is used as a classifier. In this paper, a new feature selection method called the mean impact variance (MIVAR) method is proposed to determine which features are more suitable for classification. Moreover, this method is constructed on the basis of the training process of the ANN algorithm. To verify the effectiveness of the proposed method, the MIVAR value is used to rank the multidimensional features of bearing fault diagnosis. In detail, (1) all 70-dimensional waveform features are extracted from a rolling bearing vibration signal with four different operating states, (2) the corresponding MIVAR values of all 70-dimensional features are calculated to rank all features, (3) 14 groups of 10-dimensional features are separately generated according to the ranking results and the principal component analysis (PCA) algorithm, and a back propagation (BP) network is constructed, and (4) the validity of the ranking result is proven by training this BP network with these seven groups of 10-dimensional features and by comparing the corresponding recognition rates. The results prove that features with larger MIVAR values lead to higher recognition rates.
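
    The ranking idea can be illustrated with a mean-impact-style score computed around a trained network: each feature is perturbed and the spread of the resulting output change is taken as its score. The +/-10% perturbation, the use of predicted class probabilities and the variance-based aggregation are assumptions, since the abstract does not reproduce the exact MIVAR formula.

    ```python
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def mivar_style_scores(model, X, delta=0.10):
        """Score each feature by the variance of the change in the model output
        when that feature is scaled by (1 +/- delta).  Illustrative only."""
        scores = np.zeros(X.shape[1])
        for j in range(X.shape[1]):
            X_up, X_dn = X.copy(), X.copy()
            X_up[:, j] *= (1 + delta)
            X_dn[:, j] *= (1 - delta)
            impact = model.predict_proba(X_up)[:, 1] - model.predict_proba(X_dn)[:, 1]
            scores[j] = impact.var()          # spread of the per-sample impact
        return scores

    # toy usage: train a small network, then rank the features by the score
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 10))
    y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)
    net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=1).fit(X, y)
    print(np.argsort(mivar_style_scores(net, X))[::-1])   # best features first
    ```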

  14. A Learning-Based CT Prostate Segmentation Method via Joint Transductive Feature Selection and Regression.

    Science.gov (United States)

    Shi, Yinghuan; Gao, Yaozong; Liao, Shu; Zhang, Daoqiang; Gao, Yang; Shen, Dinggang

    2016-01-15

    In recent years, there has been great interest in prostate segmentation, which is an important and challenging task for CT image-guided radiotherapy. In this paper, a learning-based segmentation method via joint transductive feature selection and transductive regression is presented, which incorporates the physician's simple manual specification (taking only a few seconds) to aid accurate segmentation, especially for cases with large irregular prostate motion. More specifically, for the current treatment image, an experienced physician is first allowed to manually assign the labels for a small subset of prostate and non-prostate voxels, especially in the first and last slices of the prostate region. Then, the proposed method follows two steps: in the prostate-likelihood estimation step, two novel algorithms, tLasso and wLapRLS, are sequentially employed for transductive feature selection and transductive regression, respectively, aiming to generate the prostate-likelihood map; in the multi-atlas-based label fusion step, the final segmentation result is obtained according to the corresponding prostate-likelihood map and the previous images of the same patient. The proposed method has been substantially evaluated on a real prostate CT dataset including 24 patients with 330 CT images, and compared with several state-of-the-art methods. Experimental results show that the proposed method outperforms the state-of-the-art methods in terms of higher Dice ratio, higher true positive fraction, and lower centroid distances. The results also demonstrate that simple manual specification can help improve the segmentation performance, which is clinically feasible in real practice.

  15. Feature selection for portfolio optimization

    DEFF Research Database (Denmark)

    Bjerring, Thomas Trier; Ross, Omri; Weissensteiner, Alex

    2016-01-01

    Most portfolio selection rules based on the sample mean and covariance matrix perform poorly out-of-sample. Moreover, there is a growing body of evidence that such optimization rules are not able to beat simple rules of thumb, such as 1/N. Parameter uncertainty has been identified as one major...... reason for these findings. A strand of literature addresses this problem by improving the parameter estimation and/or by relying on more robust portfolio selection methods. Independent of the chosen portfolio selection rule, we propose using feature selection first in order to reduce the asset menu....... While most of the diversification benefits are preserved, the parameter estimation problem is alleviated. We conduct out-of-sample back-tests to show that in most cases different well-established portfolio selection rules applied on the reduced asset universe are able to improve alpha relative...

  16. Short-term traffic flow forecasting based on feature selection with mutual information

    Science.gov (United States)

    Yuan, Zhengwu; Tu, Chuan

    2017-05-01

    Traffic flow forecasting involves many traffic variables, and selecting an appropriate combination of variables is important because it can reduce the computational cost and improve the forecasting precision. In this paper, a feature selection technique based on mutual information is proposed for this purpose. First, mutual information is used to evaluate the relevance and redundancy of the variables, and feature selection is used to retain the relevant variables and filter out redundancy among the selected variables. Second, a BP neural network is used as the forecasting engine. Finally, a numerical example with traffic flow data from PeMS is used to verify the forecasting performance of the proposed method; the results indicate that the proposed method can effectively reduce the computational cost and also improve the forecasting precision.
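
    A greedy relevance-redundancy selector in the spirit of the description above can be sketched with scikit-learn's mutual information estimators; the subtraction-style criterion, the number of variables kept and the names are illustrative choices rather than the paper's exact scheme.

    ```python
    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    def mi_select(X, y, k=5):
        """Greedy selection: relevance is the MI between a candidate variable and
        the target flow; redundancy is its mean MI with already-selected variables."""
        relevance = mutual_info_regression(X, y, random_state=0)
        selected = [int(np.argmax(relevance))]
        while len(selected) < k:
            best_j, best_score = None, -np.inf
            for j in range(X.shape[1]):
                if j in selected:
                    continue
                redundancy = np.mean([
                    mutual_info_regression(X[:, [j]], X[:, s], random_state=0)[0]
                    for s in selected
                ])
                score = relevance[j] - redundancy     # relevance minus redundancy
                if score > best_score:
                    best_j, best_score = j, score
            selected.append(best_j)
        return selected
    ```

    The selected columns would then be fed to the BP neural network (or any other forecasting engine) in place of the full variable set.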

  17. Improved feature selection based on genetic algorithms for real time disruption prediction on JET

    Energy Technology Data Exchange (ETDEWEB)

    Ratta, G.A., E-mail: garatta@gateme.unsj.edu.ar [GATEME, Facultad de Ingenieria, Universidad Nacional de San Juan, Avda. San Martin 1109 (O), 5400 San Juan (Argentina); JET EFDA, Culham Science Centre, OX14 3DB Abingdon (United Kingdom); Vega, J. [Asociacion EURATOM/CIEMAT para Fusion, Avda. Complutense, 40, 28040 Madrid (Spain); JET EFDA, Culham Science Centre, OX14 3DB Abingdon (United Kingdom); Murari, A. [Associazione EURATOM-ENEA per la Fusione, Consorzio RFX, 4-35127 Padova (Italy); JET EFDA, Culham Science Centre, OX14 3DB Abingdon (United Kingdom)

    2012-09-15

    Highlights: ► A new signal selection methodology to improve disruption prediction is reported. ► The approach is based on Genetic Algorithms. ► An advanced predictor has been created with the new set of signals. ► The new system obtains considerably higher prediction rates. - Abstract: The early prediction of disruptions is an important aspect of the research in the field of Tokamak control. A very recent predictor, called 'Advanced Predictor Of Disruptions' (APODIS), developed for the 'Joint European Torus' (JET), implements the real time recognition of incoming disruptions with the best success rate achieved ever and an outstanding stability for long periods following training. In this article, a new methodology to select the set of the signals' parameters in order to maximize the performance of the predictor is reported. The approach is based on 'Genetic Algorithms' (GAs). With the feature selection derived from GAs, a new version of APODIS has been developed. The results are significantly better than the previous version not only in terms of success rates but also in extending the interval before the disruption in which reliable predictions are achieved. Correct disruption predictions with a success rate in excess of 90% have been achieved 200 ms before the time of the disruption. The predictor response is compared with that of JET's Protection System (JPS) and the APODIS predictor is shown to be far superior. Both systems have been carefully tested with a wide number of discharges to understand their relative merits and the most profitable directions of further improvements.

  18. Improved feature selection based on genetic algorithms for real time disruption prediction on JET

    International Nuclear Information System (INIS)

    Rattá, G.A.; Vega, J.; Murari, A.

    2012-01-01

    Highlights: ► A new signal selection methodology to improve disruption prediction is reported. ► The approach is based on Genetic Algorithms. ► An advanced predictor has been created with the new set of signals. ► The new system obtains considerably higher prediction rates. - Abstract: The early prediction of disruptions is an important aspect of the research in the field of Tokamak control. A very recent predictor, called “Advanced Predictor Of Disruptions” (APODIS), developed for the “Joint European Torus” (JET), implements the real time recognition of incoming disruptions with the best success rate achieved ever and an outstanding stability for long periods following training. In this article, a new methodology to select the set of the signals’ parameters in order to maximize the performance of the predictor is reported. The approach is based on “Genetic Algorithms” (GAs). With the feature selection derived from GAs, a new version of APODIS has been developed. The results are significantly better than the previous version not only in terms of success rates but also in extending the interval before the disruption in which reliable predictions are achieved. Correct disruption predictions with a success rate in excess of 90% have been achieved 200 ms before the time of the disruption. The predictor response is compared with that of JET's Protection System (JPS) and the APODIS predictor is shown to be far superior. Both systems have been carefully tested with a wide number of discharges to understand their relative merits and the most profitable directions of further improvements.

  19. A Permutation Importance-Based Feature Selection Method for Short-Term Electricity Load Forecasting Using Random Forest

    Directory of Open Access Journals (Sweden)

    Nantian Huang

    2016-09-01

    Full Text Available The prediction accuracy of short-term load forecast (STLF) depends on the choice of prediction model and the feature selection result. In this paper, a novel random forest (RF)-based feature selection method for STLF is proposed. First, 243 related features were extracted from historical load data and the time information of prediction points to form the original feature set. Subsequently, the original feature set was used to train an RF as the original model. After the training process, the prediction error of the original model on the test set was recorded and the permutation importance (PI) value of each feature was obtained. Then, an improved sequential backward search method was used to select the optimal forecasting feature subset based on the PI value of each feature. Finally, the optimal forecasting feature subset was used to train a new RF model as the final prediction model. Experiments showed that the prediction accuracy of the RF trained by the optimal forecasting feature subset was higher than that of the original model and of comparative models based on support vector regression and artificial neural networks.
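
    The procedure above (train an RF, score features by permutation importance, then search backwards for the best subset) can be approximated with scikit-learn as follows; the hold-out split, the MSE criterion and the stopping rule are simplifications of the improved sequential backward search used in the paper.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    def pi_backward_select(X, y, min_features=5):
        """Permutation-importance-guided backward elimination: repeatedly drop the
        least important feature and keep the subset with the lowest validation MSE."""
        X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=0)
        features = list(range(X.shape[1]))
        best_feats, best_err = list(features), np.inf
        while True:
            rf = RandomForestRegressor(n_estimators=200, random_state=0)
            rf.fit(X_tr[:, features], y_tr)
            err = np.mean((rf.predict(X_va[:, features]) - y_va) ** 2)
            if err < best_err:
                best_err, best_feats = err, list(features)
            if len(features) <= min_features:
                break
            pi = permutation_importance(rf, X_va[:, features], y_va,
                                        n_repeats=10, random_state=0)
            features.pop(int(np.argmin(pi.importances_mean)))   # drop the weakest
        return best_feats
    ```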

  20. Sparse distance-based learning for simultaneous multiclass classification and feature selection of metagenomic data.

    Science.gov (United States)

    Liu, Zhenqiu; Hsiao, William; Cantarel, Brandi L; Drábek, Elliott Franco; Fraser-Liggett, Claire

    2011-12-01

    Direct sequencing of microbes in human ecosystems (the human microbiome) has complemented single genome cultivation and sequencing to understand and explore the impact of commensal microbes on human health. As sequencing technologies improve and costs decline, the sophistication of data has outgrown available computational methods. While several existing machine learning methods have been adapted for analyzing microbiome data recently, there is not yet an efficient and dedicated algorithm available for multiclass classification of human microbiota. By combining instance-based and model-based learning, we propose a novel sparse distance-based learning method for simultaneous class prediction and feature (variable or taxa, which is used interchangeably) selection from multiple treatment populations on the basis of 16S rRNA sequence count data. Our proposed method simultaneously minimizes the intraclass distance and maximizes the interclass distance with many fewer estimated parameters than other methods. It is very efficient for problems with small sample sizes and unbalanced classes, which are common in metagenomic studies. We implemented this method in a MATLAB toolbox called MetaDistance. We also propose several approaches for data normalization and variance stabilization transformation in MetaDistance. We validate this method on several real and simulated 16S rRNA datasets to show that it outperforms existing methods for classifying metagenomic data. This article is the first to address simultaneous multifeature selection and class prediction with metagenomic count data. The MATLAB toolbox is freely available online at http://metadistance.igs.umaryland.edu/. zliu@umm.edu Supplementary data are available at Bioinformatics online.

  1. Human activity recognition based on feature selection in smart home using back-propagation algorithm.

    Science.gov (United States)

    Fang, Hongqing; He, Lei; Si, Hao; Liu, Peng; Xie, Xiaolei

    2014-09-01

    In this paper, the back-propagation (BP) algorithm has been used to train a feed-forward neural network for human activity recognition in smart home environments, and an inter-class distance method for feature selection of observed motion sensor events is discussed and tested. The human activity recognition performance of the neural network trained with the BP algorithm has then been evaluated and compared with other probabilistic algorithms: the Naïve Bayes (NB) classifier and the Hidden Markov Model (HMM). The results show that different feature datasets yield different activity recognition accuracy. The selection of unsuitable feature datasets increases the computational complexity and degrades the activity recognition accuracy. Furthermore, the neural network trained with the BP algorithm has relatively better human activity recognition performance than the NB classifier and the HMM. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.
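
    One common realization of an inter-class distance criterion is a per-feature ratio of between-class to within-class scatter, as sketched below; the paper's exact inter-class distance measure may differ, so treat this as a generic stand-in used only to rank sensor-event features before training the network.

    ```python
    import numpy as np

    def interclass_distance_scores(X, y):
        """Score each feature by between-class scatter divided by within-class
        scatter (a Fisher-ratio-style inter-class distance criterion)."""
        classes = np.unique(y)
        overall_mean = X.mean(axis=0)
        between = np.zeros(X.shape[1])
        within = np.zeros(X.shape[1])
        for c in classes:
            Xc = X[y == c]
            between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
            within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
        return between / (within + 1e-12)   # higher score = better class separation
    ```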

  2. Fuzzy Mutual Information Based min-Redundancy and Max-Relevance Heterogeneous Feature Selection

    Directory of Open Access Journals (Sweden)

    Daren Yu

    2011-08-01

    Full Text Available Feature selection is an important preprocessing step in pattern classification and machine learning, and mutual information is widely used to measure relevance between features and decision. However, it is difficult to directly calculate relevance between continuous or fuzzy features using mutual information. In this paper we introduce the fuzzy information entropy and fuzzy mutual information for computing relevance between numerical or fuzzy features and decision. The relationship between fuzzy information entropy and differential entropy is also discussed. Moreover, we combine fuzzy mutual information with the "min-Redundancy-Max-Relevance", "Max-Dependency" and "min-Redundancy-Max-Dependency" algorithms. The performance and stability of the proposed algorithms are tested on benchmark data sets. Experimental results show the proposed algorithms are effective and stable.

  3. TEHRAN AIR POLLUTANTS PREDICTION BASED ON RANDOM FOREST FEATURE SELECTION METHOD

    Directory of Open Access Journals (Sweden)

    A. Shamsoddini

    2017-09-01

    Full Text Available Air pollution, as one of the most serious forms of environmental pollution, poses a huge threat to human life. Air pollution leads to environmental instability and has harmful and undesirable effects on the environment. Modern methods for predicting pollutant concentrations are able to improve decision making and provide appropriate solutions. This study examines the performance of Random Forest feature selection in combination with multiple linear regression and Multilayer Perceptron Artificial Neural Network methods, in order to achieve an efficient model to estimate carbon monoxide, nitrogen dioxide, sulfur dioxide and PM2.5 contents in the air. The results indicated that Artificial Neural Networks fed by the attributes selected by the Random Forest feature selection method performed more accurately than the other models for all pollutants. The estimation accuracy for sulfur dioxide was lower than for the other air contaminants, whereas nitrogen dioxide was predicted more accurately than the other pollutants.
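
    The two-stage pipeline described above (rank predictors with a Random Forest, then feed the selected attributes to a neural network) might look roughly like the scikit-learn sketch below; the number of retained attributes, the network size and the other settings are placeholders rather than the study's configuration.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    def rf_then_mlp(X, y, top_k=10):
        """Rank predictors with random-forest importances, keep the top_k,
        and fit an MLP regressor on the reduced attribute set."""
        rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
        keep = np.argsort(rf.feature_importances_)[::-1][:top_k]
        model = Pipeline([
            ("scale", StandardScaler()),
            ("mlp", MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=3000,
                                 random_state=0)),
        ]).fit(X[:, keep], y)
        return keep, model
    ```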

  4. Tehran Air Pollutants Prediction Based on Random Forest Feature Selection Method

    Science.gov (United States)

    Shamsoddini, A.; Aboodi, M. R.; Karami, J.

    2017-09-01

    Air pollution, as one of the most serious forms of environmental pollution, poses a huge threat to human life. Air pollution leads to environmental instability and has harmful and undesirable effects on the environment. Modern methods for predicting pollutant concentrations are able to improve decision making and provide appropriate solutions. This study examines the performance of Random Forest feature selection in combination with multiple linear regression and Multilayer Perceptron Artificial Neural Network methods, in order to achieve an efficient model to estimate carbon monoxide, nitrogen dioxide, sulfur dioxide and PM2.5 contents in the air. The results indicated that Artificial Neural Networks fed by the attributes selected by the Random Forest feature selection method performed more accurately than the other models for all pollutants. The estimation accuracy for sulfur dioxide was lower than for the other air contaminants, whereas nitrogen dioxide was predicted more accurately than the other pollutants.

  5. Multi-Stage Feature Selection by Using Genetic Algorithms for Fault Diagnosis in Gearboxes Based on Vibration Signal.

    Science.gov (United States)

    Cerrada, Mariela; Vinicio Sánchez, René; Cabrera, Diego; Zurita, Grover; Li, Chuan

    2015-09-18

    There are growing demands for condition-based monitoring of gearboxes, and techniques to improve the reliability, effectiveness and accuracy of fault diagnosis are considered valuable contributions. Feature selection is still an important aspect of machine learning-based diagnosis in order to reach good performance in the diagnosis system. The main aim of this research is to propose a multi-stage feature selection mechanism for selecting the best set of condition parameters in the time, frequency and time-frequency domains, which are extracted from vibration signals for fault diagnosis purposes in gearboxes. The selection is based on genetic algorithms, proposing in each stage a new subset of the best features regarding the classifier performance in a supervised environment. The selected features are augmented at each stage and used as input for a neural network classifier in the next step, while a new subset of feature candidates is treated by the selection process. As a result, the inherent exploration and exploitation of the genetic algorithms for finding the best solutions of the selection problem are locally focused. The approach is tested on a dataset from a real test bed with several fault classes under different running conditions of load and velocity. The model performance for diagnosis is over 98%.
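
    A single stage of a GA-based wrapper search, with binary chromosomes marking selected condition parameters and cross-validated classifier accuracy as fitness, is sketched below. All hyper-parameters, the small classifier and the tournament/uniform-crossover operators are illustrative, and the multi-stage augmentation of the selected features described in the paper is omitted.

    ```python
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier

    def ga_feature_selection(X, y, pop_size=20, generations=15,
                             p_crossover=0.8, p_mutation=0.02, seed=0):
        """One GA stage: evolve binary feature masks, fitness = 3-fold CV accuracy."""
        rng = np.random.default_rng(seed)
        n_features = X.shape[1]
        pop = rng.integers(0, 2, size=(pop_size, n_features))

        def fitness(chrom):
            if chrom.sum() == 0:
                return 0.0
            clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                                random_state=seed)
            return cross_val_score(clf, X[:, chrom.astype(bool)], y, cv=3).mean()

        for _ in range(generations):
            scores = np.array([fitness(c) for c in pop])
            # binary tournament selection
            parents = pop[[max(rng.choice(pop_size, 2), key=lambda i: scores[i])
                           for _ in range(pop_size)]]
            children = parents.copy()
            for i in range(0, pop_size - 1, 2):          # uniform crossover
                if rng.random() < p_crossover:
                    swap = rng.integers(0, 2, n_features).astype(bool)
                    children[i, swap], children[i + 1, swap] = \
                        parents[i + 1, swap], parents[i, swap]
            flips = rng.random(children.shape) < p_mutation   # bit-flip mutation
            children[flips] = 1 - children[flips]
            pop = children

        scores = np.array([fitness(c) for c in pop])
        return pop[int(np.argmax(scores))].astype(bool)
    ```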

  6. BLProt: Prediction of bioluminescent proteins based on support vector machine and relieff feature selection

    KAUST Repository

    Kandaswamy, Krishna Kumar

    2011-08-17

    Background: Bioluminescence is a process in which light is emitted by a living organism. Most creatures that emit light are sea creatures, but some insects, plants, fungi, etc., also emit light. The biotechnological application of bioluminescence has become routine and is considered essential for many medical and general technological advances. Identification of bioluminescent proteins is more challenging due to their poor similarity in sequence. So far, no specific method has been reported to identify bioluminescent proteins from primary sequence. Results: In this paper, we propose a novel predictive method that uses a Support Vector Machine (SVM) and physicochemical properties to predict bioluminescent proteins. BLProt was trained using a dataset consisting of 300 bioluminescent proteins and 300 non-bioluminescent proteins, and evaluated by an independent set of 141 bioluminescent proteins and 18202 non-bioluminescent proteins. To identify the most prominent features, we carried out feature selection with three different filter approaches, ReliefF, infogain, and mRMR. We selected five different feature subsets by decreasing the number of features, and the performance of each feature subset was evaluated. Conclusion: BLProt achieves 80% accuracy from training (5-fold cross-validation) and 80.06% accuracy from testing. The performance of BLProt was compared with BLAST and HMM. High prediction accuracy and successful prediction of hypothetical proteins suggests that BLProt can be a useful approach to identify bioluminescent proteins from sequence information, irrespective of their sequence similarity. 2011 Kandaswamy et al; licensee BioMed Central Ltd.
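
    A stripped-down binary Relief scorer conveys the weighting idea behind the ReliefF filter mentioned above; the study uses full ReliefF (multiple neighbors, robust multi-class handling), so the single-hit/single-miss update below is only an illustration and assumes each class has at least two samples.

    ```python
    import numpy as np

    def relief_scores(X, y, n_iter=100, seed=0):
        """Simplified binary Relief: for each sampled instance, reward features
        that differ from the nearest miss and penalize those that differ from
        the nearest hit."""
        rng = np.random.default_rng(seed)
        X = (X - X.min(axis=0)) / (np.ptp(X, axis=0) + 1e-12)   # scale to [0, 1]
        w = np.zeros(X.shape[1])
        for _ in range(n_iter):
            i = rng.integers(len(X))
            dists = np.abs(X - X[i]).sum(axis=1)
            dists[i] = np.inf                       # never match the instance itself
            hit = np.argmin(np.where(y == y[i], dists, np.inf))
            miss = np.argmin(np.where(y != y[i], dists, np.inf))
            w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
        return w / n_iter
    ```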

  7. A Flexible Mechanism of Rule Selection Enables Rapid Feature-Based Reinforcement Learning.

    Science.gov (United States)

    Balcarras, Matthew; Womelsdorf, Thilo

    2016-01-01

    Learning in a new environment is influenced by prior learning and experience. Correctly applying a rule that maps a context to stimuli, actions, and outcomes enables faster learning and better outcomes compared to relying on strategies for learning that are ignorant of task structure. However, it is often difficult to know when and how to apply learned rules in new contexts. In our study we explored how subjects employ different strategies for learning the relationship between stimulus features and positive outcomes in a probabilistic task context. We test the hypothesis that task naive subjects will show enhanced learning of feature specific reward associations by switching to the use of an abstract rule that associates stimuli by feature type and restricts selections to that dimension. To test this hypothesis we designed a decision making task where subjects receive probabilistic feedback following choices between pairs of stimuli. In the task, trials are grouped in two contexts by blocks, where in one type of block there is no unique relationship between a specific feature dimension (stimulus shape or color) and positive outcomes, and following an un-cued transition, alternating blocks have outcomes that are linked to either stimulus shape or color. Two-thirds of subjects (n = 22/32) exhibited behavior that was best fit by a hierarchical feature-rule model. Supporting the prediction of the model mechanism these subjects showed significantly enhanced performance in feature-reward blocks, and rapidly switched their choice strategy to using abstract feature rules when reward contingencies changed. Choice behavior of other subjects (n = 10/32) was fit by a range of alternative reinforcement learning models representing strategies that do not benefit from applying previously learned rules. In summary, these results show that untrained subjects are capable of flexibly shifting between behavioral rules by leveraging simple model-free reinforcement learning and context

  8. Feature selection method based on support vector machine and shape analysis for high-throughput medical data.

    Science.gov (United States)

    Liu, Qiong; Gu, Qiong; Wu, Zhao

    2017-12-01

    Proteomics data analysis based on the mass-spectrometry technique can provide a powerful tool for early diagnosis of tumors and other diseases. It can be used for exploring the features that reflect the difference between samples from high-throughput mass spectrometry data, which are important for the identification of tumor markers. Proteomics mass spectrometry data have the characteristics of too few samples, too many features and noise interference, which pose a great challenge to traditional machine learning methods. Traditional unsupervised dimensionality reduction methods do not utilize the label information effectively, so the subspaces they find may not be the most separable ones of the data. To overcome the shortcomings of traditional methods, in this paper, we present a novel feature selection method based on support vector machine (SVM) and shape analysis. In the process of feature selection, our method considers not only the interaction between features but also the relationship between features and class labels, which improves the classification performance. The experimental results obtained from four groups of proteomics data show that, compared with traditional unsupervised feature extraction methods (i.e., Principal Component Analysis - Procrustes Analysis, PCA-PA), our method not only ensures that fewer features are selected but also ensures a high recognition rate. In addition, compared with the two kinds of multivariate filter methods, i.e., Max-Relevance Min-Redundancy (MRMR) and Fast Correlation-Based Filter (FCBF), our method has a higher recognition rate. Copyright © 2017 Elsevier Ltd. All rights reserved.

  9. Improving mass detection using combined feature representations from projection views and reconstructed volume of DBT and boosting based classification with feature selection

    International Nuclear Information System (INIS)

    Kim, Dae Hoe; Kim, Seong Tae; Ro, Yong Man

    2015-01-01

    In digital breast tomosynthesis (DBT), the image characteristics of projection views and reconstructed volume are different, and both have advantages for detecting breast masses, e.g., the reconstructed volume mitigates tissue overlap, while projection views have fewer reconstruction blur artifacts. In this paper, an improved mass detection method is proposed that uses combined feature representations from projection views and reconstructed volume in DBT. To take advantage of the complementary effects of the different image characteristics of both data, combined feature representations are extracted from both projection views and reconstructed volume concurrently. An indirect region-of-interest segmentation in projection views, which projects the volume-of-interest in the reconstructed volume into the corresponding projection views, is proposed to extract combined feature representations. In addition, a boosting-based classification with feature selection has been employed for selecting effective feature representations among a large number of combined feature representations, and for reducing false positives. Experiments have been conducted on a clinical data set that contains malignant masses. Experimental results demonstrate that the proposed mass detection can achieve high sensitivity with a small number of false positives. In addition, the experimental results demonstrate that the selected feature representations for classifying masses come complementarily from both projection views and reconstructed volume. (paper)

  10. Selecting an informative features vocabulary for recognition algorithms based on Fourier-descriptors

    Directory of Open Access Journals (Sweden)

    V. Ya. Kolyuchkin

    2014-01-01

    Full Text Available A working vocabulary of features includes the most informative features of the objects to be recognized. The aim is to develop a method for forming a working vocabulary of features for recognition algorithms based on Fourier-descriptors of the object image contours. To solve this problem, the paper proposes maximizing a functional defined as the ratio of the distance between classes to the spread of objects within each class, represented in the feature space formed from the Fourier-descriptors. To check the effectiveness of the proposed method of forming a working vocabulary of features, numerical experiments have been carried out. The experiments used two databases of reference images consisting of 10 and 13 reference images. Test images were created from these reference images by rotation, by zooming, and by adding noise following the normal distribution. The contours of the images were extracted by the author's algorithm, which uses the Prewitt operator, threshold segmentation, and morphological processing. The original vocabulary of features derived from the Fourier-descriptors has a dimension of 98. Working vocabularies of features with dimensions of 3 and 4, respectively, have been formed for the two reference image bases by maximizing the functional. In the course of the numerical experiments, the rate of correct recognition decisions on the reference image bases has been evaluated for the original and working vocabularies. It has been shown that the recognition algorithm with the formed working vocabularies of features provides highly efficient automatic recognition of objects. There are known publications that use a similar method to form a working vocabulary of features in algorithms for recognizing humans from images. But there are no publications on choosing the vocabulary of features for recognition algorithms based on the

  11. BLProt: prediction of bioluminescent proteins based on support vector machine and relieff feature selection

    Directory of Open Access Journals (Sweden)

    Hazrati Mehrnaz

    2011-08-01

    Full Text Available Abstract Background Bioluminescence is a process in which light is emitted by a living organism. Most creatures that emit light are sea creatures, but some insects, plants, fungi, etc., also emit light. The biotechnological application of bioluminescence has become routine and is considered essential for many medical and general technological advances. Identification of bioluminescent proteins is more challenging due to their poor similarity in sequence. So far, no specific method has been reported to identify bioluminescent proteins from primary sequence. Results In this paper, we propose a novel predictive method that uses a Support Vector Machine (SVM) and physicochemical properties to predict bioluminescent proteins. BLProt was trained using a dataset consisting of 300 bioluminescent proteins and 300 non-bioluminescent proteins, and evaluated by an independent set of 141 bioluminescent proteins and 18202 non-bioluminescent proteins. To identify the most prominent features, we carried out feature selection with three different filter approaches, ReliefF, infogain, and mRMR. We selected five different feature subsets by decreasing the number of features, and the performance of each feature subset was evaluated. Conclusion BLProt achieves 80% accuracy from training (5-fold cross-validation) and 80.06% accuracy from testing. The performance of BLProt was compared with BLAST and HMM. High prediction accuracy and successful prediction of hypothetical proteins suggests that BLProt can be a useful approach to identify bioluminescent proteins from sequence information, irrespective of their sequence similarity. The BLProt software is available at http://www.inb.uni-luebeck.de/tools-demos/bioluminescent%20protein/BLProt

  12. Feature Selection and Classification of Electroencephalographic Signals: An Artificial Neural Network and Genetic Algorithm Based Approach.

    Science.gov (United States)

    Erguzel, Turker Tekin; Ozekes, Serhat; Tan, Oguz; Gultekin, Selahattin

    2015-10-01

    Feature selection is an important step in many pattern recognition systems aiming to overcome the so-called curse of dimensionality. In this study, an optimized classification method was tested in 147 patients with major depressive disorder (MDD) treated with repetitive transcranial magnetic stimulation (rTMS). The performance of the combination of a genetic algorithm (GA) and a back-propagation (BP) neural network (BPNN) was evaluated using 6-channel pre-rTMS electroencephalographic (EEG) patterns of theta and delta frequency bands. The GA was first used to eliminate the redundant and less discriminant features to maximize classification performance. The BPNN was then applied to test the performance of the feature subset. Finally, classification performance using the subset was evaluated using 6-fold cross-validation. Although the slow bands of the frontal electrodes are widely used to collect EEG data for patients with MDD and provide quite satisfactory classification results, the outcomes of the proposed approach indicate noticeably increased overall accuracy of 89.12% and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.904 using the reduced feature set. © EEG and Clinical Neuroscience Society (ECNS) 2014.

  13. Pitting binding against selection--electrophysiological measures of feature-based attention are attenuated by Gestalt object grouping.

    Science.gov (United States)

    Snyder, Adam C; Fiebelkorn, Ian C; Foxe, John J

    2012-03-01

    Humans have limited cognitive resources to process the nearly limitless information available in the environment. Endogenous, or 'top-down', selective attention to basic visual features such as color or motion is a common strategy for biasing resources in favor of the most relevant information sources in a given context. Opposing this top-down separation of features is a 'bottom-up' tendency to integrate, or bind, the various features that constitute objects. We pitted these two processes against each other in an electrophysiological experiment to test if top-down selective attention can overcome constitutive binding processes. Our results demonstrate that bottom-up binding processes can dominate top-down feature-based attention even when explicitly detrimental to task performance. © 2012 The Authors. European Journal of Neuroscience © 2012 Federation of European Neuroscience Societies and Blackwell Publishing Ltd.

  14. Prediction of antimicrobial peptides based on sequence alignment and feature selection methods.

    Directory of Open Access Journals (Sweden)

    Ping Wang

    Full Text Available Antimicrobial peptides (AMPs) represent a class of natural peptides that form a part of the innate immune system, and this kind of 'nature's antibiotics' is quite promising for solving the problem of increasing antibiotic resistance. In view of this, it is highly desirable to develop an effective computational method for accurately predicting novel AMPs, because it can provide us with more candidates and useful insights for drug design. In this study, a new method for predicting AMPs was implemented by integrating a sequence alignment method and a feature selection method. It was observed that the overall jackknife success rate of the new predictor on a newly constructed benchmark dataset was over 80.23%, and the Matthews correlation coefficient is 0.73, indicating a good prediction. Moreover, an in-depth feature analysis indicates that the results are quite consistent with the previously known knowledge that some amino acids are preferential in AMPs and that these amino acids do play an important role in antimicrobial activity. For the convenience of most experimental scientists who want to use the prediction method without following the mathematical details, a user-friendly web-server is provided at http://amp.biosino.org/.

  15. DNABP: Identification of DNA-Binding Proteins Based on Feature Selection Using a Random Forest and Predicting Binding Residues.

    Science.gov (United States)

    Ma, Xin; Guo, Jing; Sun, Xiao

    2016-01-01

    DNA-binding proteins are fundamentally important in cellular processes. Several computational-based methods have been developed to improve the prediction of DNA-binding proteins in previous years. However, insufficient work has been done on the prediction of DNA-binding proteins from protein sequence information. In this paper, a novel predictor, DNABP (DNA-binding proteins), was designed to predict DNA-binding proteins using the random forest (RF) classifier with a hybrid feature. The hybrid feature contains two types of novel sequence features, which reflect information about the conservation of physicochemical properties of the amino acids, and the binding propensity of DNA-binding residues and non-binding propensities of non-binding residues. The comparisons with each feature demonstrated that these two novel features contributed most to the improvement in predictive ability. Furthermore, to improve the prediction performance of the DNABP model, feature selection using the minimum redundancy maximum relevance (mRMR) method combined with incremental feature selection (IFS) was carried out during the model construction. The results showed that the DNABP model could achieve 86.90% accuracy, 83.76% sensitivity, 90.03% specificity and a Matthews correlation coefficient of 0.727. High prediction accuracy and performance comparisons with previous research suggested that DNABP could be a useful approach to identify DNA-binding proteins from sequence information. The DNABP web server system is freely available at http://www.cbi.seu.edu.cn/DNABP/.
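
    The incremental feature selection (IFS) step described above can be sketched as follows: given a pre-computed ranking (for example an mRMR ordering), prefixes of increasing length are evaluated and the best-scoring prefix is kept. The classifier settings and the accuracy metric (the paper also reports sensitivity, specificity and the Matthews correlation coefficient) are illustrative.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    def incremental_feature_selection(X, y, ranked_features, cv=5):
        """IFS: walk down a pre-computed feature ranking, add one feature at a
        time, and keep the prefix with the best cross-validated accuracy."""
        best_k, best_score, curve = 1, -np.inf, []
        for k in range(1, len(ranked_features) + 1):
            cols = ranked_features[:k]
            clf = RandomForestClassifier(n_estimators=200, random_state=0)
            score = cross_val_score(clf, X[:, cols], y, cv=cv).mean()
            curve.append(score)
            if score > best_score:
                best_k, best_score = k, score
        return ranked_features[:best_k], curve
    ```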

  16. Genetic search feature selection for affective modeling

    DEFF Research Database (Denmark)

    Martínez, Héctor P.; Yannakakis, Georgios N.

    2010-01-01

    Automatic feature selection is a critical step towards the generation of successful computational models of affect. This paper presents a genetic search-based feature selection method which is developed as a global-search algorithm for improving the accuracy of the affective models built....... The method is tested and compared against sequential forward feature selection and random search in a dataset derived from a game survey experiment which contains bimodal input features (physiological and gameplay) and expressed pairwise preferences of affect. Results suggest that the proposed method...

  17. Feature selection in wind speed prediction systems based on a hybrid coral reefs optimization – Extreme learning machine approach

    International Nuclear Information System (INIS)

    Salcedo-Sanz, S.; Pastor-Sánchez, A.; Prieto, L.; Blanco-Aguilera, A.; García-Herrera, R.

    2014-01-01

    Highlights: • A novel approach for short-term wind speed prediction is presented. • The system is formed by a coral reefs optimization algorithm and an extreme learning machine. • Feature selection is carried out with the CRO to improve the ELM performance. • The method is tested in real wind farm data in USA, for the period 2007–2008. - Abstract: This paper presents a novel approach for short-term wind speed prediction based on a Coral Reefs Optimization algorithm (CRO) and an Extreme Learning Machine (ELM), using meteorological predictive variables from a physical model (the Weather Research and Forecast model, WRF). The approach is based on a Feature Selection Problem (FSP) carried out with the CRO, that must obtain a reduced number of predictive variables out of the total available from the WRF. This set of features will be the input of an ELM, that finally provides the wind speed prediction. The CRO is a novel bio-inspired approach, based on the simulation of reef formation and coral reproduction, able to obtain excellent results in optimization problems. On the other hand, the ELM is a new paradigm in neural networks’ training, that provides a robust and extremely fast training of the network. Together, these algorithms are able to successfully solve this problem of feature selection in short-term wind speed prediction. Experiments in a real wind farm in the USA show the excellent performance of the CRO–ELM approach in this FSP wind speed prediction problem
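
    The ELM component is simple enough to sketch directly: hidden-layer weights are drawn at random and only the output weights are fitted, in closed form, by least squares. The CRO search over the WRF predictors is omitted here; it would simply call fit/predict on different column subsets. The hidden-layer size and the sigmoid activation are illustrative choices.

    ```python
    import numpy as np

    class ELMRegressor:
        """Basic extreme learning machine: random hidden weights, sigmoid
        activation, output weights obtained by least squares."""

        def __init__(self, n_hidden=50, seed=0):
            self.n_hidden = n_hidden
            self.rng = np.random.default_rng(seed)

        def _hidden(self, X):
            return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))   # sigmoid layer

        def fit(self, X, y):
            self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
            self.b = self.rng.normal(size=self.n_hidden)
            H = self._hidden(X)
            self.beta, *_ = np.linalg.lstsq(H, y, rcond=None)      # closed form
            return self

        def predict(self, X):
            return self._hidden(X) @ self.beta
    ```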

  18. Genetic algorithm based feature selection combined with dual classification for the automated detection of proliferative diabetic retinopathy.

    Science.gov (United States)

    Welikala, R A; Fraz, M M; Dehmeshki, J; Hoppe, A; Tah, V; Mann, S; Williamson, T H; Barman, S A

    2015-07-01

    Proliferative diabetic retinopathy (PDR) is a condition that carries a high risk of severe visual impairment. The hallmark of PDR is the growth of abnormal new vessels. In this paper, an automated method for the detection of new vessels from retinal images is presented. This method is based on a dual classification approach. Two vessel segmentation approaches are applied to create two separate binary vessel maps, each of which holds vital information. Local morphology features are measured from each binary vessel map to produce two separate 4-D feature vectors. Independent classification is performed for each feature vector using a support vector machine (SVM) classifier. The system then combines these individual outcomes to produce a final decision. This is followed by the creation of additional features to generate 21-D feature vectors, which feed into a genetic algorithm-based feature selection approach with the objective of finding feature subsets that improve the performance of the classification. Sensitivity and specificity results using a dataset of 60 images are 0.9138 and 0.9600, respectively, on a per patch basis and 1.000 and 0.975, respectively, on a per image basis. Copyright © 2015 Elsevier Ltd. All rights reserved.

  19. Feature selection for anomaly–based network intrusion detection using cluster validity indices

    CSIR Research Space (South Africa)

    Naidoo, Tyrone

    2015-09-01

    Full Text Available data, which is rarely available in operational networks. It uses normalized cluster validity indices as an objective function that is optimized over the search space of candidate feature subsets via a genetic algorithm. Feature sets produced...

  20. Classification of Alzheimer's disease patients with hippocampal shape wrapper-based feature selection and support vector machine

    Science.gov (United States)

    Young, Jonathan; Ridgway, Gerard; Leung, Kelvin; Ourselin, Sebastien

    2012-02-01

    It is well known that hippocampal atrophy is a marker of the onset of Alzheimer's disease (AD), and as a result hippocampal volumetry has been used in a number of studies to provide early diagnosis of AD and predict conversion of mild cognitive impairment patients to AD. However, rates of atrophy are not uniform across the hippocampus, making shape analysis a potentially more accurate biomarker. This study examines the hippocampi of 226 healthy controls, 148 AD patients and 330 MCI patients obtained from T1-weighted structural MRI images from the ADNI database. The hippocampi are anatomically segmented using the MAPS multi-atlas segmentation method, and the resulting binary images are then processed with SPHARM software to decompose their shapes as a weighted sum of spherical harmonic basis functions. The resulting parameterizations are then used as feature vectors in Support Vector Machine (SVM) classification. A wrapper-based feature selection method was used, as this considers the utility of features in discriminating classes in combination, fully exploiting the multivariate nature of the data and optimizing the selected set of features for the type of classifier that is used. The leave-one-out cross-validated accuracy obtained on training data is 88.6% for classifying AD vs. controls and 74% for classifying MCI-converters vs. MCI-stable with very compact feature sets, showing that this is a highly promising method. There is currently a considerable fall in accuracy on unseen data, indicating that the feature selection is sensitive to the data used; however, feature ensemble methods may overcome this.
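
    A wrapper selection loop in the spirit of the one described above can be sketched with a greedy forward search, leave-one-out cross-validation and a linear SVM; the actual wrapper in the study may use a different search strategy and kernel, so treat these details as assumptions.

    ```python
    import numpy as np
    from sklearn.model_selection import LeaveOneOut, cross_val_score
    from sklearn.svm import SVC

    def wrapper_forward_selection(X, y, max_features=10):
        """Greedy forward wrapper: at each step add the feature (e.g. a spherical
        harmonic coefficient) that most improves leave-one-out SVM accuracy."""
        selected, remaining, best_overall = [], list(range(X.shape[1])), 0.0
        while remaining and len(selected) < max_features:
            scores = []
            for j in remaining:
                acc = cross_val_score(SVC(kernel="linear"), X[:, selected + [j]], y,
                                      cv=LeaveOneOut()).mean()
                scores.append((acc, j))
            acc, j = max(scores)
            if acc <= best_overall:          # stop when no candidate improves
                break
            best_overall = acc
            selected.append(j)
            remaining.remove(j)
        return selected, best_overall
    ```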

  1. Feature selection toolbox software package

    Czech Academy of Sciences Publication Activity Database

    Pudil, Pavel; Novovičová, Jana; Somol, Petr

    2002-01-01

    Roč. 23, č. 4 (2002), s. 487-492 ISSN 0167-8655 R&D Projects: GA ČR GA402/01/0981 Institutional research plan: CEZ:AV0Z1075907 Keywords : pattern recognition * feature selection * floating search algorithms Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.409, year: 2002

  2. A Modified Feature Selection and Artificial Neural Network-Based Day-Ahead Load Forecasting Model for a Smart Grid

    Directory of Open Access Journals (Sweden)

    Ashfaq Ahmad

    2015-12-01

    Full Text Available In the operation of a smart grid (SG), day-ahead load forecasting (DLF) is an important task. The SG can enhance the management of its conventional and renewable resources with a more accurate DLF model. However, DLF model development is highly challenging due to the non-linear characteristics of load time series in SGs. In the literature, DLF models do exist; however, these models trade off between execution time and forecast accuracy. The newly proposed DLF model is able to accurately predict the load of the next day with a reasonable execution time. Our proposed model consists of three modules: the data preparation module, the feature selection module and the forecast module. The first module makes the historical load curve compatible with the feature selection module. The second module removes redundant and irrelevant features from the input data. The third module, which consists of an artificial neural network (ANN), predicts future load on the basis of the selected features. Moreover, the forecast module uses a sigmoid function for activation and a multi-variate auto-regressive model for weight updating during the training process. Simulations are conducted in MATLAB to validate the performance of our newly proposed DLF model in terms of accuracy and execution time. Results show that our proposed modified feature selection and modified ANN (m(FS + ANN))-based model for SGs is able to capture the non-linearities in the historical load curve with 97.11% accuracy. Moreover, this accuracy is achieved at the cost of a reasonably short execution time, i.e., we have decreased the average execution time of the existing FS + ANN-based model by 38.50%.

  3. LLE Score: A New Filter-Based Unsupervised Feature Selection Method Based on Nonlinear Manifold Embedding and Its Application to Image Recognition.

    Science.gov (United States)

    Chao Yao; Ya-Feng Liu; Bo Jiang; Jungong Han; Junwei Han

    2017-11-01

    The task of feature selection is to find the most representative features in the original high-dimensional data. Because of the absence of class label information, selecting the appropriate features in unsupervised learning scenarios is much harder than in supervised scenarios. In this paper, we investigate the potential of locally linear embedding (LLE), a popular manifold learning method, for the feature selection task. It is straightforward to apply the idea of LLE to the graph-preserving feature selection framework. However, we find that this straightforward application suffers from some problems: for example, it fails when the elements of a feature are all equal, it does not enjoy the property of scaling invariance, and it cannot capture the change of the graph efficiently. To solve these problems, we propose a new filter-based feature selection method based on LLE, named the LLE score. The proposed criterion measures the difference between the local structure of each feature and that of the original data. Our classification experiments on two face image data sets, an object image data set, and a handwritten digits data set show that the LLE score outperforms state-of-the-art methods, including data variance, Laplacian score, and sparsity score.

  4. Building an intrusion detection system using a filter-based feature selection algorithm

    NARCIS (Netherlands)

    Ambusaidi, Mohammed A.; He, Xiangjian; Nanda, Priyadarsi; Tan, Zhiyuan

    2016-01-01

    Redundant and irrelevant features in data have caused a long-term problem in network traffic classification. These features not only slow down the process of classification but also prevent a classifier from making accurate decisions, especially when coping with big data. In this paper, we propose a

  5. A novel computer-aided diagnosis system for breast MRI based on feature selection and ensemble learning.

    Science.gov (United States)

    Lu, Wei; Li, Zhe; Chu, Jinghui

    2017-04-01

    Breast cancer is a common cancer among women. With the development of modern medical science and information technology, medical imaging techniques have an increasingly important role in the early detection and diagnosis of breast cancer. In this paper, we propose an automated computer-aided diagnosis (CADx) framework for magnetic resonance imaging (MRI). The scheme consists of an ensemble of several machine learning-based techniques, including ensemble under-sampling (EUS) for imbalanced data processing, the Relief algorithm for feature selection, the subspace method for providing data diversity, and Adaboost for improving the performance of base classifiers. We extracted morphological, various texture, and Gabor features. To clarify the feature subsets' physical meaning, subspaces are built by combining morphological features with each kind of texture or Gabor feature. We tested our proposal using a manually segmented Region of Interest (ROI) data set, which contains 438 images of malignant tumors and 1898 images of normal tissues or benign tumors. Our proposal achieves an area under the ROC curve (AUC) value of 0.9617, which outperforms most other state-of-the-art breast MRI CADx systems. Compared with other methods, our proposal significantly reduces the false-positive classification rate. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. Decision boundary feature selection for non-parametric classifier

    Science.gov (United States)

    Lee, Chulhee; Landgrebe, David A.

    1991-01-01

    Feature selection has been one of the most important topics in pattern recognition. Although many authors have studied feature selection for parametric classifiers, few algorithms are available for feature selection for nonparametric classifiers. In this paper we propose a new feature selection algorithm based on decision boundaries for nonparametric classifiers. We first note that feature selection for pattern recognition is equivalent to retaining 'discriminantly informative features', and a discriminantly informative feature is related to the decision boundary. A procedure to extract discriminantly informative features based on a decision boundary for nonparametric classification is proposed. Experiments show that the proposed algorithm finds effective features for the nonparametric classifier with Parzen density estimation.

  7. Classification of Pap Smear Cell Nuclei Based on Texture Analysis Using Correlation-Based Feature Selection with the C4.5 Algorithm

    Directory of Open Access Journals (Sweden)

    Toni Arifin

    2014-09-01

    Full Text Available Abstract - A Pap smear is an early examination to diagnose whether or not there is an indication of cervical cancer; the observation is done by examining pap smear cells under a microscope. Much research has been done to differentiate between normal and abnormal cells. This research presents a classification of pap smear cell nuclei based on texture analysis. The images used are the Harlev images, 280 in total: 140 images are used as training data and the other 140 are used for testing. Texture analysis uses the Gray Level Co-occurrence Matrix (GLCM) method with five parameters, namely correlation, energy, homogeneity and entropy, plus the brightness value. The correlation-based feature selection method is used to choose the best attributes, and the C4.5 algorithm is then used to produce the classification rules. The resulting accuracy of the normal/abnormal classification using the C4.5 decision tree is 96.43%, with a prediction error of 3.57%. Keywords: Classification, Pap smear cell image, texture analysis, correlation-based feature selection, C4.5 algorithm.
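
    A minimal sketch of the feature-extraction and classification steps, assuming scikit-image and scikit-learn, with random images standing in for segmented nuclei. GLCM correlation, energy, homogeneity and entropy plus mean brightness are fed to an entropy-criterion decision tree (scikit-learn's CART, used here as a stand-in for C4.5); the correlation-based feature selection step is omitted for brevity.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def texture_features(img):
    # One GLCM per image (distance 1, angle 0), from which the four texture
    # measures are read off; entropy is computed directly from the matrix.
    glcm = graycomatrix(img, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    p = glcm[:, :, 0, 0]
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return [graycoprops(glcm, "correlation")[0, 0],
            graycoprops(glcm, "energy")[0, 0],
            graycoprops(glcm, "homogeneity")[0, 0],
            entropy,
            img.mean()]                     # brightness

rng = np.random.default_rng(2)
images = rng.integers(0, 256, size=(40, 32, 32), dtype=np.uint8)
labels = rng.integers(0, 2, size=40)        # synthetic normal/abnormal labels

X = np.array([texture_features(im) for im in images])
tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
print("5-fold CV accuracy:", round(cross_val_score(tree, X, labels, cv=5).mean(), 3))
```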

  8. Spectral Feature Selection for Data Mining

    CERN Document Server

    Zhao, Zheng Alan

    2011-01-01

    Spectral Feature Selection for Data Mining introduces a novel feature selection technique that establishes a general platform for studying existing feature selection algorithms and developing new algorithms for emerging problems in real-world applications. This technique represents a unified framework for supervised, unsupervised, and semisupervised feature selection. The book explores the latest research achievements, sheds light on new research directions, and stimulates readers to make the next creative breakthroughs. It presents the intrinsic ideas behind spectral feature selection, its th

  9. Influence of Feature Selection Methods on Classification Sensitivity Based on the Example of A Study of Polish Voivodship Tourist Attractiveness

    Directory of Open Access Journals (Sweden)

    Bąk Iwona

    2014-07-01

    Full Text Available The purpose of this article is to determine the influence of various methods of selection of diagnostic features on the sensitivity of classification. Three options of feature selection are presented: a parametric feature selection method with a sum (option I), a median of the correlation coefficients matrix column elements (option II) and the method of a reversed matrix (option III). Efficiency of the groupings was verified by the indicators of homogeneity, heterogeneity and the correctness of grouping. In the assessment of group efficiency the approach with the Weber median was used. The undertaken problem was illustrated with a research into the tourist attractiveness of voivodships in Poland in 2011.

  10. Arc-Welding Spectroscopic Monitoring based on Feature Selection and Neural Networks

    Directory of Open Access Journals (Sweden)

    Jose M. Lopez-Higuera

    2008-10-01

    Full Text Available A new spectral processing technique designed for application in the on-line detection and classification of arc-welding defects is presented in this paper. A noninvasive fiber sensor embedded within a TIG torch collects the plasma radiation originating during the welding process. The spectral information is then processed in two consecutive stages. A compression algorithm is first applied to the data, allowing real-time analysis. The selected spectral bands are then used to feed a classification algorithm, which is demonstrated to provide efficient weld defect detection and classification. The results obtained with the proposed technique are compared to a similar processing scheme presented in previous works, giving rise to an improvement in the performance of the monitoring system.

  11. The co-feature ratio, a novel method for the measurement of chromatographic and signal selectivity in LC-MS-based metabolomics

    Energy Technology Data Exchange (ETDEWEB)

    Elmsjö, Albert, E-mail: Albert.Elmsjo@farmkemi.uu.se [Department of Medicinal Chemistry, Division of Analytical Pharmaceutical Chemistry, Uppsala University (Sweden); Haglöf, Jakob; Engskog, Mikael K.R. [Department of Medicinal Chemistry, Division of Analytical Pharmaceutical Chemistry, Uppsala University (Sweden); Nestor, Marika [Department of Immunology, Genetics and Pathology, Uppsala University (Sweden); Arvidsson, Torbjörn [Department of Medicinal Chemistry, Division of Analytical Pharmaceutical Chemistry, Uppsala University (Sweden); Medical Product Agency, Uppsala (Sweden); Pettersson, Curt [Department of Medicinal Chemistry, Division of Analytical Pharmaceutical Chemistry, Uppsala University (Sweden)

    2017-03-01

    Evaluation of analytical procedures, especially in regards to measuring chromatographic and signal selectivity, is highly challenging in untargeted metabolomics. The aim of this study was to suggest a new straightforward approach for a systematic examination of chromatographic and signal selectivity in LC-MS-based metabolomics. By calculating the ratio between each feature and its co-eluting features (the co-features), a measurement of the chromatographic selectivity (i.e. extent of co-elution) as well as the signal selectivity (e.g. amount of adduct formation) of each feature could be acquired, the co-feature ratio. This approach was used to examine possible differences in chromatographic and signal selectivity present in samples exposed to three different sample preparation procedures. The capability of the co-feature ratio was evaluated both in a classical targeted setting using isotope labelled standards as well as without standards in an untargeted setting. For the targeted analysis, several metabolites showed a skewed quantitative signal due to poor chromatographic selectivity and/or poor signal selectivity. Moreover, evaluation of the untargeted approach through multivariate analysis of the co-feature ratios demonstrated the possibility to screen for metabolites displaying poor chromatographic and/or signal selectivity characteristics. We conclude that the co-feature ratio can be a useful tool in the development and evaluation of analytical procedures in LC-MS-based metabolomics investigations. Increased selectivity through proper choice of analytical procedures may decrease the false positive and false negative discovery rate and thereby increase the validity of any metabolomic investigation. - Highlights: • The co-feature ratio (CFR) is introduced. • CFR measures chromatographic and signal selectivity of a feature. • CFR can be used for evaluating experimental procedures in metabolomics. • CFR can aid in locating features with poor selectivity.
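
    A minimal sketch of the co-feature ratio idea, assuming a flat feature table of retention times and intensities. The 0.1 min co-elution window and the use of summed co-feature intensity are illustrative assumptions, not the authors' exact definition.

```python
import numpy as np

rng = np.random.default_rng(3)
rt = rng.uniform(0, 20, size=200)                      # retention times (min)
intensity = rng.lognormal(mean=10, sigma=1, size=200)  # feature intensities

def co_feature_ratio(rt, intensity, window=0.1):
    ratios = np.empty_like(intensity)
    for i in range(rt.size):
        co = np.abs(rt - rt[i]) <= window   # features co-eluting with feature i
        co[i] = False                       # exclude the feature itself
        co_sum = intensity[co].sum()
        ratios[i] = intensity[i] / co_sum if co_sum > 0 else np.inf
    return ratios

cfr = co_feature_ratio(rt, intensity)
# Low ratios flag features whose signal is dominated by co-eluting material,
# i.e. candidates for poor chromatographic and/or signal selectivity.
print("features with co-feature ratio < 1:", int((cfr < 1).sum()))
```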

  12. Normed kernel function-based fuzzy possibilistic C-means (NKFPCM) algorithm for high-dimensional breast cancer database classification with feature selection based on Laplacian Score

    Science.gov (United States)

    Lestari, A. W.; Rustam, Z.

    2017-07-01

    In the last decade, breast cancer has become the focus of world attention, as this disease is one of the leading causes of death for women. Therefore, it is necessary to have the correct precautions and treatment. In previous studies, the Fuzzy Kernel K-Medoid algorithm has been used for multi-class data. This paper proposes an algorithm to classify the high-dimensional data of breast cancer using Fuzzy Possibilistic C-Means (FPCM) and a new method based on clustering analysis, Normed Kernel Function-Based Fuzzy Possibilistic C-Means (NKFPCM). The objective of this paper is to obtain the best accuracy in classification of breast cancer data. In order to improve the accuracy of the two methods, the candidate features are evaluated using feature selection, where the Laplacian Score is used. The results compare the accuracy and running time of FPCM and NKFPCM with and without feature selection.
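
    A minimal sketch of the Laplacian-score filter mentioned above, assuming scikit-learn for the k-NN graph; the neighbourhood size, heat-kernel width and synthetic data are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def laplacian_score(X, n_neighbors=5):
    # Symmetric k-NN affinity graph with a heat kernel whose width is set
    # from the observed neighbour distances.
    dist = kneighbors_graph(X, n_neighbors, mode="distance").toarray()
    t = dist[dist > 0].mean() ** 2
    W = np.exp(-dist ** 2 / t) * (dist > 0)
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))
    L = D - W
    d = np.diag(D)
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        f = f - (f @ d) / d.sum()           # remove the trivial constant component
        scores[r] = (f @ L @ f) / (f @ D @ f)
    return scores                           # smaller score = better locality preservation

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 30))              # stand-in for breast cancer features
scores = laplacian_score(X)
selected = np.argsort(scores)[:10]          # keep the 10 best-scoring features
print("selected feature indices:", selected)
```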

  13. Exploring Language-Independent Emotional Acoustic Features via Feature Selection

    OpenAIRE

    Shaukat, Arslan; Chen, Ke

    2010-01-01

    We propose a novel feature selection strategy to discover language-independent acoustic features that tend to be responsible for emotions regardless of languages, linguistics and other factors. Experimental results suggest that the language-independent feature subset discovered yields the performance comparable to the full feature set on various emotional speech corpora.

  14. A hybrid feature selection and health indicator construction scheme for delay-time-based degradation modelling of rolling element bearings

    Science.gov (United States)

    Zhang, Bin; Deng, Congying; Zhang, Yi

    2018-03-01

    Rolling element bearings are mechanical components used frequently in most rotating machinery, and they are also vulnerable links representing the main source of failures in such systems. Thus, health condition monitoring and fault diagnosis of rolling element bearings have long been studied to improve the operational reliability and maintenance efficiency of rotating machines. Over the past decade, prognosis, which enables forewarning of failure and estimation of residual life, has attracted increasing attention. To accurately and efficiently predict failure of a rolling element bearing, its degradation needs to be well represented and modelled. For this purpose, degradation of the rolling element bearing is analysed with a delay-time-based model in this paper. In addition, a hybrid feature selection and health indicator construction scheme is proposed for extracting bearing-health-relevant information from condition monitoring sensor data. The effectiveness of the presented approach is validated through case studies on rolling element bearing run-to-failure experiments.

  15. WAVELENGTH SELECTION OF HYPERSPECTRAL LIDAR BASED ON FEATURE WEIGHTING FOR ESTIMATION OF LEAF NITROGEN CONTENT IN RICE

    Directory of Open Access Journals (Sweden)

    L. Du

    2016-06-01

    Full Text Available Hyperspectral LiDAR (HSL) is a novel tool in the field of active remote sensing which has been widely used in many domains because of its ability to acquire spectral information. In the precise monitoring of nitrogen in green plants in particular, HSL plays an indispensable role. The existing HSL system used for nitrogen status monitoring has a multi-channel detector, which improves the spectral resolution and receiving range, but may also result in data redundancy, difficulty in system integration and high cost. Thus, it is necessary to pick out the nitrogen-sensitive feature wavelengths within the spectral range. The present study, aiming to solve this problem, assigns a feature weighting to each centre wavelength of the HSL system using matrix coefficient analysis and a divergence threshold. The feature weighting is a criterion for amending the centre wavelengths of the detector to accommodate different purposes, especially the estimation of leaf nitrogen content (LNC) in rice. In this way, the wavelengths highly correlated with LNC can be ranked in descending order and used to estimate rice LNC sequentially. In this paper, an HSL system based on wide-spectrum emission and a 32-channel detector is used to collect the reflectance spectra of rice leaves. The spectra collected by the HSL cover a range of 538 nm – 910 nm with a resolution of 12 nm; these 32 wavelengths are strongly absorbed by chlorophyll in green plants. The relationship between rice LNC and the reflectance spectra is modeled using partial least squares (PLS) and support vector machines (SVMs) based on calibration and validation datasets, respectively. The results indicate that (I) the wavelength selection method of HSL based on feature weighting is effective in choosing the nitrogen-sensitive wavelengths and can be co-adapted with the hardware of the HSL system, and (II) the chosen wavelengths have a high correlation with rice LNC.
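
    A rough analogue of this workflow, assuming scikit-learn: channels are weighted by their absolute correlation with leaf nitrogen content and the top-ranked wavelengths feed a PLS regression. The synthetic spectra, the correlation-based weighting and the number of retained channels are illustrative assumptions; the paper's matrix coefficient analysis and divergence threshold are not reproduced.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
wavelengths = np.linspace(538, 910, 32)          # nm, 32-channel detector
X = rng.normal(size=(80, 32))                    # stand-in for leaf reflectance spectra
lnc = X[:, 10] * 0.8 + X[:, 20] * 0.5 + rng.normal(0, 0.3, 80)  # synthetic LNC

# Feature weighting: absolute correlation of each channel with LNC.
weights = np.abs([np.corrcoef(X[:, j], lnc)[0, 1] for j in range(32)])
order = np.argsort(weights)[::-1]                # channels ranked by weight
top = order[:8]
print("top wavelengths (nm):", np.round(wavelengths[top], 1))

pls = PLSRegression(n_components=4)
print("CV R^2 with selected channels:",
      round(cross_val_score(pls, X[:, top], lnc, cv=5).mean(), 3))
```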

  16. Clustering based gene expression feature selection method: A computational approach to enrich the classifier efficiency of differentially expressed genes

    KAUST Repository

    Abusamra, Heba

    2016-07-20

    The high-dimension, low-sample-size nature of gene expression data makes the classification task more challenging. Therefore, feature (gene) selection becomes an apparent need. Selecting meaningful and relevant genes for a classifier not only decreases the computational time and cost, but also improves the classification performance. However, most feature selection approaches suffer from several problems, such as lack of robustness and validation issues. Here, we present a new feature selection technique that takes advantage of clustering both samples and genes. Materials and methods: We used a leukemia gene expression dataset [1]. The effectiveness of the selected features was evaluated by four different classification methods: support vector machines, k-nearest neighbor, random forest, and linear discriminant analysis. The method evaluates the importance and relevance of each gene cluster by summing the expression levels of the genes belonging to that cluster. A gene cluster is considered important if it satisfies conditions depending on thresholds and percentages; otherwise it is eliminated. Results: Initial analysis identified 7120 differentially expressed genes of leukemia (Fig. 15a); after applying our feature selection methodology we end up with 1117 genes discriminating the two classes of leukemia (Fig. 15b). Further applying the same method with a more stringent (higher positive and lower negative) threshold condition, the number was reduced to 58 genes, which were tested to evaluate the effectiveness of the method (Fig. 15c). The results of the four classification methods are summarized in Table 11. Conclusions: The feature selection method gave good results with minimum classification error. Our heat-map result shows a distinct pattern of refined genes discriminating between the two classes of leukemia.

  17. Feature selection using feature dissimilarity measure and density ...

    Indian Academy of Sciences (India)

    2015-09-28

    Sep 28, 2015 ... ical data for improving the models involved in sequence analysis, microarray analysis, spectral analysis, literature mining, etc. Feature selection is useful for multiple reasons. The main objectives of feature selection are as follows: (a) accelerating the model creation task; (b) avoiding model over-fitting or ...

  18. Feature selection using feature dissimilarity measure and density ...

    Indian Academy of Sciences (India)

    Reduction of dimensionality has emerged as a routine process in modelling complex biological systems. A large number of feature selection techniques have been reported in the literature to improve model performance in terms of accuracy and speed. In the present article an unsupervised feature selection technique is ...

  19. A combined Fisher and Laplacian score for feature selection in QSAR based drug design using compounds with known and unknown activities

    Science.gov (United States)

    Valizade Hasanloei, Mohammad Amin; Sheikhpour, Razieh; Sarram, Mehdi Agha; Sheikhpour, Elnaz; Sharifi, Hamdollah

    2018-02-01

    Quantitative structure-activity relationship (QSAR) is an effective computational technique for drug design that relates the chemical structures of compounds to their biological activities. Feature selection is an important step in QSAR-based drug design to select the most relevant descriptors. One of the most popular feature selection methods for classification problems is the Fisher score, whose aim is to minimize the within-class distance and maximize the between-class distance. In this study, the properties of the Fisher criterion were extended for QSAR models to define new distance metrics based on the continuous activity values of compounds with known activities. Then, a semi-supervised feature selection method was proposed based on the combination of the Fisher and Laplacian criteria, which exploits both compounds with known and unknown activities to select the relevant descriptors. To demonstrate the efficiency of the proposed semi-supervised feature selection method in selecting the relevant descriptors, we applied the method and other feature selection methods to three QSAR data sets: serine/threonine-protein kinase PLK3 inhibitors, ROCK inhibitors and phenol compounds. The results demonstrated that the QSAR models built on the descriptors selected by the proposed semi-supervised method perform better than other models. This indicates the efficiency of the proposed method in selecting the relevant descriptors using the compounds with known and unknown activities. The results of this study showed that the compounds with known and unknown activities can be helpful in improving the performance of the combined Fisher- and Laplacian-based feature selection methods.
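
    A minimal sketch of the classical (supervised) Fisher score that the combined criterion starts from, assuming synthetic descriptors and binary activity labels; the semi-supervised Fisher + Laplacian combination itself is not reproduced here.

```python
import numpy as np

def fisher_score(X, y):
    # Ratio of between-class scatter to within-class scatter, per feature.
    classes = np.unique(y)
    mu = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - mu) ** 2   # between-class scatter
        den += len(Xc) * Xc.var(axis=0)                # within-class scatter
    return num / den

rng = np.random.default_rng(6)
X = rng.normal(size=(120, 40))            # stand-in for molecular descriptors
y = rng.integers(0, 2, size=120)          # active / inactive labels
scores = fisher_score(X, y)
print("top descriptors:", np.argsort(scores)[::-1][:10])
```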

  20. The FlyCatwalk: a high-throughput feature-based sorting system for artificial selection in Drosophila.

    Science.gov (United States)

    Medici, Vasco; Vonesch, Sibylle Chantal; Fry, Steven N; Hafen, Ernst

    2015-01-02

    Experimental evolution is a powerful tool for investigating complex traits. Artificial selection can be applied for a specific trait and the resulting phenotypically divergent populations pool-sequenced to identify alleles that occur at substantially different frequencies in the extreme populations. To maximize the proportion of loci that are causal to the phenotype among all enriched loci, population size and number of replicates need to be high. These requirements have, in fact, limited evolution studies in higher organisms, where the time investment required for phenotyping is often prohibitive for large-scale studies. Animal size is a highly multigenic trait that remains poorly understood, and an experimental evolution approach may thus aid in gaining new insights into the genetic basis of this trait. To this end, we developed the FlyCatwalk, a fully automated, high-throughput system to sort live fruit flies (Drosophila melanogaster) based on morphometric traits. With the FlyCatwalk, we can detect gender and quantify body and wing morphology parameters at a four-fold higher throughput compared with manual processing. The phenotyping results acquired using the FlyCatwalk correlate well with those obtained using the standard manual procedure. We demonstrate that an automated, high-throughput, feature-based sorting system is able to avoid previous limitations in population size and replicate numbers. Our approach can likewise be applied for a variety of traits and experimental settings that require high-throughput phenotyping. Copyright © 2015 Medici et al.

  1. Adversarial Feature Selection Against Evasion Attacks.

    Science.gov (United States)

    Zhang, Fei; Chan, Patrick P K; Biggio, Battista; Yeung, Daniel S; Roli, Fabio

    2016-03-01

    Pattern recognition and machine learning techniques have been increasingly adopted in adversarial settings such as spam, intrusion, and malware detection, although their security against well-crafted attacks that aim to evade detection by manipulating data at test time has not yet been thoroughly assessed. While previous work has been mainly focused on devising adversary-aware classification algorithms to counter evasion attempts, only few authors have considered the impact of using reduced feature sets on classifier security against the same attacks. An interesting, preliminary result is that classifier security to evasion may be even worsened by the application of feature selection. In this paper, we provide a more detailed investigation of this aspect, shedding some light on the security properties of feature selection against evasion attacks. Inspired by previous work on adversary-aware classifiers, we propose a novel adversary-aware feature selection model that can improve classifier security against evasion attacks, by incorporating specific assumptions on the adversary's data manipulation strategy. We focus on an efficient, wrapper-based implementation of our approach, and experimentally validate its soundness on different application examples, including spam and malware detection.

  2. Feature Selection for Object-Based Classification of High-Resolution Remote Sensing Images Based on the Combination of a Genetic Algorithm and Tabu Search

    Directory of Open Access Journals (Sweden)

    Lei Shi

    2018-01-01

    Full Text Available In object-based image analysis of high-resolution images, the number of features can reach hundreds, so it is necessary to perform feature reduction prior to classification. In this paper, a feature selection method based on the combination of a genetic algorithm (GA and tabu search (TS is presented. The proposed GATS method aims to reduce the premature convergence of the GA by the use of TS. A prematurity index is first defined to judge the convergence situation during the search. When premature convergence does take place, an improved mutation operator is executed, in which TS is performed on individuals with higher fitness values. As for the other individuals with lower fitness values, mutation with a higher probability is carried out. Experiments using the proposed GATS feature selection method and three other methods, a standard GA, the multistart TS method, and ReliefF, were conducted on WorldView-2 and QuickBird images. The experimental results showed that the proposed method outperforms the other methods in terms of the final classification accuracy.

  3. Feature dimensionality reduction for myoelectric pattern recognition: a comparison study of feature selection and feature projection methods.

    Science.gov (United States)

    Liu, Jie

    2014-12-01

    This study investigates the effect of the feature dimensionality reduction strategies on the classification of surface electromyography (EMG) signals toward developing a practical myoelectric control system. Two dimensionality reduction strategies, feature selection and feature projection, were tested on both EMG feature sets, respectively. A feature selection based myoelectric pattern recognition system was introduced to select the features by eliminating the redundant features of EMG recordings instead of directly choosing a subset of EMG channels. The Markov random field (MRF) method and a forward orthogonal search algorithm were employed to evaluate the contribution of each individual feature to the classification, respectively. Our results from 15 healthy subjects indicate that, with a feature selection analysis, independent of the type of feature set, across all subjects high overall accuracies can be achieved in classification of seven different forearm motions with a small number of top ranked original EMG features obtained from the forearm muscles (average overall classification accuracy >95% with 12 selected EMG features). Compared to various feature dimensionality reduction techniques in myoelectric pattern recognition, the proposed filter-based feature selection approach is independent of the type of classification algorithms and features, which can effectively reduce the redundant information not only across different channels, but also cross different features in the same channel. This may enable robust EMG feature dimensionality reduction without needing to change ongoing, practical use of classification algorithms, an important step toward clinical utility. Copyright © 2014 IPEM. Published by Elsevier Ltd. All rights reserved.
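
    A minimal sketch of the general approach described above, assuming scikit-learn: time-domain EMG features (MAV, waveform length, zero crossings, RMS) are ranked with a classifier-independent filter and fed to an LDA classifier. The synthetic signals, the ANOVA F-score filter and the number of kept features are illustrative assumptions, not the MRF ranking used in the paper.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

def td_features(window):
    # Classical time-domain EMG features for one channel window.
    mav = np.mean(np.abs(window))
    wl = np.sum(np.abs(np.diff(window)))
    zc = np.sum(np.diff(np.sign(window)) != 0)
    rms = np.sqrt(np.mean(window ** 2))
    return [mav, wl, zc, rms]

rng = np.random.default_rng(7)
n_windows, n_channels, win_len = 200, 8, 256
emg = rng.normal(size=(n_windows, n_channels, win_len))    # stand-in recordings
y = rng.integers(0, 7, size=n_windows)                     # 7 forearm motions

# One feature vector per window: 4 features x 8 channels = 32 features.
X = np.array([[f for ch in w for f in td_features(ch)] for w in emg])

clf = make_pipeline(SelectKBest(f_classif, k=12), LinearDiscriminantAnalysis())
print("CV accuracy with 12 selected features:",
      round(cross_val_score(clf, X, y, cv=5).mean(), 3))
```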

  4. A Pareto-based Ensemble with Feature and Instance Selection for Learning from Multi-Class Imbalanced Datasets.

    Science.gov (United States)

    Fernández, Alberto; Carmona, Cristobal José; José Del Jesus, María; Herrera, Francisco

    2017-09-01

    Imbalanced classification is related to problems that have an uneven distribution among classes. Furthermore, when instances are located in overlapping areas, correct modeling of the problem becomes harder. Current solutions for both issues are often focused on the binary case, as multi-class datasets require additional effort to be addressed. In this research, we overcome these problems by carrying out a combination of feature and instance selection. Feature selection simplifies the overlapping areas, easing the generation of rules to distinguish among the classes. Selection of instances from all classes addresses the imbalance itself by finding the most appropriate class distribution for the learning task, as well as possibly removing noise and difficult borderline examples. For the sake of obtaining an optimal joint set of features and instances, we embedded the search for both parameters in a Multi-Objective Evolutionary Algorithm, using the C4.5 decision tree as the baseline classifier in this wrapper approach. The multi-objective scheme provides a double advantage: the search space becomes broader, and we may provide a set of different solutions in order to build an ensemble of classifiers. This proposal has been contrasted against several state-of-the-art solutions on imbalanced classification, showing excellent results in both binary and multi-class problems.

  5. Multi-domain feature selection in auditory MisMatch Negativity via PARAFAC-based template matching approach.

    Science.gov (United States)

    Pouryazdian, Saeed; Chang, Andrew; Bosnyak, Dan J; Trainor, Laurel J; Beheshti, Soosan; Krishnan, Sridhar

    2016-08-01

    MisMatch Negativity (MMN) is a small event-related potential (ERP) that provides an index of sensory learning and perceptual accuracy for cognitive research. Group-level analysis plays an important role in detecting differences at the group or condition level, especially when the signal-to-noise ratio is low. Tensor factorization has provided a framework for group-level analysis of ERPs by exploiting more information from brain responses in more domains simultaneously. A 4-way ERP tensor of time × frequency × channel × subjects/conditions is generated and decomposed via PARAFAC. A crucial step after PARAFAC decomposition is to select the component that corresponds to the event of interest and, moreover, differentiates the two groups/conditions. This is usually done manually, which is tedious when the number of components is high. Here we propose a technique to select the multi-domain feature of an ERP among all extracted features by a template matching approach that uses the MMN temporal and spectral signatures. Following a statistical test, the selected feature significantly discriminated subjects between the two experimental conditions.

  6. PolSAR Land Cover Classification Based on Roll-Invariant and Selected Hidden Polarimetric Features in the Rotation Domain

    Directory of Open Access Journals (Sweden)

    Chensong Tao

    2017-07-01

    Full Text Available Land cover classification is an important application for polarimetric synthetic aperture radar (PolSAR). Target polarimetric response is strongly dependent on its orientation. Backscattering responses of the same target with different orientations to the SAR flight path may be quite different. This target orientation diversity effect hinders PolSAR image understanding and interpretation. Roll-invariant polarimetric features such as entropy, anisotropy, mean alpha angle, and total scattering power are independent of the target orientation and are commonly adopted for PolSAR image classification. On the other hand, target orientation diversity also contains rich information which may not be sensed by roll-invariant polarimetric features. In this vein, using only the roll-invariant polarimetric features may limit the final classification accuracy. To address this problem, this work uses the recently reported uniform polarimetric matrix rotation theory and a visualization and characterization tool of polarimetric coherence patterns to investigate hidden polarimetric features in the rotation domain along the radar line of sight. Then, a feature selection scheme is established and a set of hidden polarimetric features is selected in the rotation domain. Finally, a classification method is developed using the complementary information between roll-invariant and selected hidden polarimetric features with a support vector machine (SVM)/decision tree (DT) classifier. Comparison experiments are carried out with NASA/JPL AIRSAR and multi-temporal UAVSAR data. For AIRSAR data, the overall classification accuracy of the proposed classification method is 95.37% (with SVM)/96.38% (with DT), while that of the conventional classification method is 93.87% (with SVM)/94.12% (with DT), respectively. Meanwhile, for multi-temporal UAVSAR data, the mean overall classification accuracy of the proposed method is up to 97.47% (with SVM)/99.39% (with DT), which is also higher

  7. Discriminative semi-supervised feature selection via manifold regularization.

    Science.gov (United States)

    Xu, Zenglin; King, Irwin; Lyu, Michael Rung-Tsong; Jin, Rong

    2010-07-01

    Feature selection has attracted a huge amount of interest in both research and application communities of data mining. We consider the problem of semi-supervised feature selection, where we are given a small amount of labeled examples and a large amount of unlabeled examples. Since a small number of labeled samples are usually insufficient for identifying the relevant features, the critical problem arising from semi-supervised feature selection is how to take advantage of the information underneath the unlabeled data. To address this problem, we propose a novel discriminative semi-supervised feature selection method based on the idea of manifold regularization. The proposed approach selects features through maximizing the classification margin between different classes and simultaneously exploiting the geometry of the probability distribution that generates both labeled and unlabeled data. In comparison with previous semi-supervised feature selection algorithms, our proposed semi-supervised feature selection method is an embedded feature selection method and is able to find more discriminative features. We formulate the proposed feature selection method into a convex-concave optimization problem, where the saddle point corresponds to the optimal solution. To find the optimal solution, the level method, a fairly recent optimization method, is employed. We also present a theoretic proof of the convergence rate for the application of the level method to our problem. Empirical evaluation on several benchmark data sets demonstrates the effectiveness of the proposed semi-supervised feature selection method.

  8. Feature Selection Method Based on Artificial Bee Colony Algorithm and Support Vector Machines for Medical Datasets Classification

    Directory of Open Access Journals (Sweden)

    Mustafa Serter Uzer

    2013-01-01

    Full Text Available This paper offers a hybrid approach that uses the artificial bee colony (ABC) algorithm for feature selection and support vector machines for classification. The purpose of this paper is to test the effect of eliminating the unimportant and obsolete features of the datasets on the success of the classification, using the SVM classifier. The approach is developed for the diagnostics of liver diseases and diabetes, which are commonly observed and reduce quality of life. For the diagnosis of these diseases, the hepatitis, liver disorders and diabetes datasets from the UCI database were used, and the proposed system reached classification accuracies of 94.92%, 74.81%, and 79.29%, respectively. For these datasets, the classification accuracies were obtained with the help of the 10-fold cross-validation method. The results show that the performance of the method is highly successful compared to other results attained and seems very promising for pattern recognition applications.

  9. Genetic algorithm based feature selection combined with dual classification for the automated detection of proliferative diabetic retinopathy

    OpenAIRE

    Welikala, R; Fraz, M; Dehmeshki, J; Hoppe, A; Tah, V; Mann, S; Williamson, T H; Barman, S A

    2015-01-01

    Proliferative diabetic retinopathy (PDR) is a condition that carries a high risk of severe visual impairment. The hallmark of PDR is the growth of abnormal new vessels. In this paper, an automated method for the detection of new vessels from retinal images is presented. This method is based on a dual classification approach. Two vessel segmentation approaches are applied to create two separate binary vessel maps, each of which holds vital information. Local morphology features are measured from eac...

  10. Selective Audiovisual Semantic Integration Enabled by Feature-Selective Attention.

    Science.gov (United States)

    Li, Yuanqing; Long, Jinyi; Huang, Biao; Yu, Tianyou; Wu, Wei; Li, Peijun; Fang, Fang; Sun, Pei

    2016-01-13

    An audiovisual object may contain multiple semantic features, such as the gender and emotional features of the speaker. Feature-selective attention and audiovisual semantic integration are two brain functions involved in the recognition of audiovisual objects. Humans often selectively attend to one or several features while ignoring the other features of an audiovisual object. Meanwhile, the human brain integrates semantic information from the visual and auditory modalities. However, how these two brain functions correlate with each other remains to be elucidated. In this functional magnetic resonance imaging (fMRI) study, we explored the neural mechanism by which feature-selective attention modulates audiovisual semantic integration. During the fMRI experiment, the subjects were presented with visual-only, auditory-only, or audiovisual dynamical facial stimuli and performed several feature-selective attention tasks. Our results revealed that a distribution of areas, including heteromodal areas and brain areas encoding attended features, may be involved in audiovisual semantic integration. Through feature-selective attention, the human brain may selectively integrate audiovisual semantic information from attended features by enhancing functional connectivity and thus regulating information flows from heteromodal areas to brain areas encoding the attended features.

  11. Receptive fields selection for binary feature description.

    Science.gov (United States)

    Fan, Bin; Kong, Qingqun; Trzcinski, Tomasz; Wang, Zhiheng; Pan, Chunhong; Fua, Pascal

    2014-06-01

    Feature description for local image patches is widely used in computer vision. While the conventional way to design a local descriptor is based on expert experience and knowledge, learning-based methods for designing local descriptors have become more and more popular because of their good performance and data-driven property. This paper proposes a novel data-driven method for designing binary feature descriptors, which we call the receptive fields descriptor (RFD). Technically, RFD is constructed by thresholding responses of a set of receptive fields, which are selected from a large number of candidates according to their distinctiveness and correlations in a greedy way. Using two different kinds of receptive fields (namely rectangular pooling areas and Gaussian pooling areas) for selection, we obtain two binary descriptors, RFDR and RFDG, accordingly. Image matching experiments on the well-known patch data set and the Oxford data set demonstrate that RFD significantly outperforms state-of-the-art binary descriptors, and is comparable with the best float-valued descriptors at a fraction of the processing time. Finally, experiments on object recognition tasks confirm that both RFDR and RFDG successfully bridge the performance gap between binary descriptors and their floating-point competitors.

  12. Feature selection environment for genomic applications

    Directory of Open Access Journals (Sweden)

    Martins David

    2008-10-01

    Full Text Available Abstract Background Feature selection is a pattern recognition approach to choose important variables according to some criteria in order to distinguish or explain certain phenomena (i.e., for dimensionality reduction). There are many genomic and proteomic applications that rely on feature selection to answer questions such as selecting signature genes which are informative about some biological state, e.g., normal tissues and several types of cancer, or inferring a prediction network among elements such as genes, proteins and external stimuli. In these applications, a recurrent problem is the lack of samples to perform an adequate estimate of the joint probabilities between element states. A myriad of feature selection algorithms and criterion functions have been proposed, although it is difficult to point to the best solution for each application. Results The intent of this work is to provide an open-source multiplatform graphical environment for bioinformatics problems, which supports many feature selection algorithms, criterion functions and graphic visualization tools such as scatterplots, parallel coordinates and graphs. A feature selection approach for growing genetic networks from seed genes (targets or predictors) is also implemented in the system. Conclusion The proposed feature selection environment allows data analysis using several algorithms, criterion functions and graphic visualization tools. Our experiments have shown the software's effectiveness in two distinct types of biological problems. Moreover, the environment can be used in different pattern recognition applications, although the main concern regards bioinformatics tasks.

  13. Classification Using Markov Blanket for Feature Selection

    DEFF Research Database (Denmark)

    Zeng, Yifeng; Luo, Jian

    2009-01-01

    Selecting relevant features is in demand when a large data set is of interest in a classification task. It produces a tractable number of features that are sufficient and possibly improve the classification performance. This paper studies a statistical method of Markov blanket induction algorithm...... for filtering features and then applies a classifier using the Markov blanket predictors. The Markov blanket contains a minimal subset of relevant features that yields optimal classification performance. We experimentally demonstrate the improved performance of several classifiers using a Markov blanket...... induction as a feature selection method. In addition, we point out an important assumption behind the Markov blanket induction algorithm and show its effect on the classification performance....

  14. Feature Selection Criteria for Real Time EKF-SLAM Algorithm

    Directory of Open Access Journals (Sweden)

    Fernando Auat Cheein

    2010-02-01

    Full Text Available This paper presents a selection procedure for environment features for the correction stage of a SLAM (Simultaneous Localization and Mapping) algorithm based on an Extended Kalman Filter (EKF). This approach decreases the computational time of the correction stage, which allows for real- and constant-time implementations of the SLAM. The selection procedure consists of choosing the features to which the SLAM system state covariance is most sensitive. The entire system is implemented on a mobile robot equipped with a laser range sensor. The features extracted from the environment correspond to lines and corners. Experimental results of the real-time SLAM algorithm and an analysis of the processing time consumed by the SLAM with the proposed feature selection procedure are shown. A comparison between the proposed feature selection approach and the classical sequential EKF-SLAM, along with an entropy-based feature selection approach, is also performed.

  15. Feature Selection for Ridge Regression with Provable Guarantees.

    Science.gov (United States)

    Paul, Saurabh; Drineas, Petros

    2016-04-01

    We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analog to ridge regression. The method is unsupervised and gives worst-case guarantees of the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world data sets; a subset of TechTC-300 data sets, to support our theory. Experimental results indicate that the proposed methods perform better than the existing feature selection methods.
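
    A minimal sketch of leverage-score sampling of features (columns) prior to ridge regression, assuming NumPy and scikit-learn; the rank parameter, the number of sampled columns, the importance-sampling rescaling and the synthetic data are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(8)
n, d, k, r = 200, 500, 20, 60            # samples, features, rank, features kept
X = rng.normal(size=(n, d))
y = X[:, :5] @ rng.normal(size=5) + rng.normal(0, 0.5, n)

# Column leverage scores from the top-k right singular vectors of X.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
lev = np.sum(Vt[:k] ** 2, axis=0)        # one score per feature
p = lev / lev.sum()

# Randomized, unsupervised feature selection: sample columns by leverage score.
cols = rng.choice(d, size=r, replace=False, p=p)
X_sub = X[:, cols] / np.sqrt(r * p[cols])    # rescale sampled columns

ridge = Ridge(alpha=1.0)
print("full-feature CV R^2:   ", round(cross_val_score(ridge, X, y, cv=5).mean(), 3))
print("sampled-feature CV R^2:", round(cross_val_score(ridge, X_sub, y, cv=5).mean(), 3))
```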

  16. Feature Selection via Chaotic Antlion Optimization.

    Science.gov (United States)

    Zawbaa, Hossam M; Emary, E; Grosan, Crina

    2016-01-01

    Selecting a subset of relevant properties from a large set of features that describe a dataset is a challenging machine learning task. In biology, for instance, the advances in the available technologies enable the generation of a very large number of biomarkers that describe the data. Choosing the more informative markers along with performing a high-accuracy classification over the data can be a daunting task, particularly if the data are high dimensional. An often adopted approach is to formulate the feature selection problem as a biobjective optimization problem, with the aim of maximizing the performance of the data analysis model (the quality of the data training fitting) while minimizing the number of features used. We propose an optimization approach for the feature selection problem that considers a "chaotic" version of the antlion optimizer method, a nature-inspired algorithm that mimics the hunting mechanism of antlions in nature. The balance between exploration of the search space and exploitation of the best solutions is a challenge in multi-objective optimization. The exploration/exploitation rate is controlled by the parameter I that limits the random walk range of the ants/prey. This variable is increased iteratively in a quasi-linear manner to decrease the exploration rate as the optimization progresses. The quasi-linear decrease in the variable I may lead to immature convergence in some cases and trapping in local minima in other cases. The chaotic system proposed here attempts to improve the tradeoff between exploration and exploitation. The methodology is evaluated using different chaotic maps on a number of feature selection datasets. To ensure generality, we used ten biological datasets, but we also used other types of data from various sources. The results are compared with the particle swarm optimizer and with genetic algorithm variants for feature selection using a set of quality metrics.

  17. Feature Selection via Chaotic Antlion Optimization.

    Directory of Open Access Journals (Sweden)

    Hossam M Zawbaa

    Full Text Available Selecting a subset of relevant properties from a large set of features that describe a dataset is a challenging machine learning task. In biology, for instance, the advances in the available technologies enable the generation of a very large number of biomarkers that describe the data. Choosing the more informative markers along with performing a high-accuracy classification over the data can be a daunting task, particularly if the data are high dimensional. An often adopted approach is to formulate the feature selection problem as a biobjective optimization problem, with the aim of maximizing the performance of the data analysis model (the quality of the data training fitting) while minimizing the number of features used. We propose an optimization approach for the feature selection problem that considers a "chaotic" version of the antlion optimizer method, a nature-inspired algorithm that mimics the hunting mechanism of antlions in nature. The balance between exploration of the search space and exploitation of the best solutions is a challenge in multi-objective optimization. The exploration/exploitation rate is controlled by the parameter I that limits the random walk range of the ants/prey. This variable is increased iteratively in a quasi-linear manner to decrease the exploration rate as the optimization progresses. The quasi-linear decrease in the variable I may lead to immature convergence in some cases and trapping in local minima in other cases. The chaotic system proposed here attempts to improve the tradeoff between exploration and exploitation. The methodology is evaluated using different chaotic maps on a number of feature selection datasets. To ensure generality, we used ten biological datasets, but we also used other types of data from various sources. The results are compared with the particle swarm optimizer and with genetic algorithm variants for feature selection using a set of quality metrics.

  18. Evaluating feature-selection stability in next-generation proteomics.

    Science.gov (United States)

    Goh, Wilson Wen Bin; Wong, Limsoon

    2016-10-01

    Identifying reproducible yet relevant features is a major challenge in biological research. This is well documented in genomics data. Using a proposed set of three reliability benchmarks, we find that this issue exists also in proteomics for commonly used feature-selection methods, e.g. the t-test and recursive feature elimination. Moreover, due to high test variability, selecting the top proteins based on p-value ranks - even when restricted to high-abundance proteins - does not improve reproducibility. Statistical tests based on networks are believed to be more robust, but this does not always hold true: the commonly used hypergeometric enrichment that tests for enrichment of protein subnets performs abysmally due to its dependence on unstable protein pre-selection steps. We demonstrate here for the first time the utility of a novel suite of network-based algorithms called rank-based network algorithms (RBNAs) on proteomics. These have originally been introduced and tested extensively on genomics data. We show here that they are highly stable, reproducible and select relevant features when applied to proteomics data. It is also evident from these results that use of statistical feature testing on protein expression data should be executed with due caution. Careless use of networks does not resolve poor-performance issues, and can even mislead. We recommend augmenting statistical feature-selection methods with concurrent analysis on stability and reproducibility to improve the quality of the selected features prior to experimental validation.
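
    A minimal sketch of one way to quantify the stability recommended above, assuming scikit-learn: the average pairwise Jaccard index of feature sets selected on bootstrap resamples. The ANOVA F-test filter (as a t-test analogue), the resample count and the synthetic data are illustrative assumptions.

```python
import numpy as np
from itertools import combinations
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(9)
X = rng.normal(size=(80, 1000))          # stand-in for protein expression
y = rng.integers(0, 2, size=80)

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

selections = []
for _ in range(20):                      # 20 bootstrap resamples
    idx = rng.choice(len(y), size=len(y), replace=True)
    sel = SelectKBest(f_classif, k=50).fit(X[idx], y[idx])
    selections.append(np.flatnonzero(sel.get_support()))

stability = np.mean([jaccard(a, b) for a, b in combinations(selections, 2)])
print("mean pairwise Jaccard stability:", round(stability, 3))
```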

  19. Floating search methods in feature selection

    Czech Academy of Sciences Publication Activity Database

    Pudil, Pavel; Novovičová, Jana; Kittler, J.

    1994-01-01

    Roč. 15, - (1994), s. 1119-1125 ISSN 0167-8655 Grant - others:GA AV(CZ) A275107 Impact factor: 0.381, year: 1994 http://library.utia.cas.cz/separaty/historie/somol-floating search methods in feature selection.pdf

  20. Fizzy: feature subset selection for metagenomics.

    Science.gov (United States)

    Ditzler, Gregory; Morrison, J Calvin; Lan, Yemin; Rosen, Gail L

    2015-11-04

    Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α- & β-diversity. Feature subset selection--a sub-field of machine learning--can also provide a unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can obtain the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we have used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the human gut microbiome. We have developed a new Python command line tool, which is compatible with the widely adopted BIOM format, for microbial ecologists that implements information-theoretic subset selection methods for biological data formats. We demonstrate the software tool's capabilities on publicly available datasets. We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy.
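
    A minimal sketch in the spirit of an information-theoretic filter, assuming scikit-learn: OTU abundances are ranked by mutual information with the phenotype and the top k are kept. Fizzy itself implements richer criteria and reads BIOM tables; the synthetic count table below is an illustrative assumption.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(10)
otu_table = rng.poisson(5, size=(60, 300))      # samples x OTUs (counts)
phenotype = rng.integers(0, 2, size=60)         # e.g. two age groups

mi = mutual_info_classif(otu_table, phenotype, discrete_features=True,
                         random_state=0)
top_otus = np.argsort(mi)[::-1][:25]            # keep the 25 most informative OTUs
print("top 25 OTUs by mutual information:", top_otus)
```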

  1. Online Feature Selection for Classifying Emphysema in HRCT Images

    Directory of Open Access Journals (Sweden)

    M. Prasad

    2008-06-01

    Full Text Available Feature subset selection, applied as a pre-processing step to machine learning, is valuable in dimensionality reduction, eliminating irrelevant data and improving classifier performance. In the classic formulation of the feature selection problem, it is assumed that all the features are available at the beginning. However, in many real world problems, there are scenarios where not all features are present initially and must be integrated as they become available. In such scenarios, online feature selection provides an efficient way to sort through a large space of features. It is in this context that we introduce online feature selection for the classification of emphysema, a smoking related disease that appears as low attenuation regions in High Resolution Computed Tomography (HRCT) images. The technique was successfully evaluated on 61 HRCT scans and compared with different online feature selection approaches, including hill climbing, best first search, grafting, and correlation-based feature selection. The results were also compared against 'density mask', a standard approach used for emphysema detection in medical image analysis.

  2. Discriminative feature selection for visual tracking

    Science.gov (United States)

    Ma, Junkai; Luo, Haibo; Zhou, Wei; Song, Yingchao; Hui, Bin; Chang, Zheng

    2017-06-01

    Visual tracking plays an important role in computer vision tasks. The robustness of a tracking algorithm is a challenge, especially in complex scenarios such as cluttered backgrounds, illumination variation and appearance changes. As an important component of a tracking algorithm, the appropriateness of the feature is closely related to the tracking precision. In this paper, an online discriminative feature selection method is proposed to provide the tracker with the most discriminative feature. First, a feature pool containing different information about the image, such as gradient, gray value and edges, is built, and when every frame is processed during tracking, all of these features are extracted. Second, these features are ranked according to their discrimination between target and background, and the highest-scored feature is chosen to represent the candidate image patch. Then, after obtaining the tracking result, the target model is updated to adapt to appearance variation. The experiments show that our method is robust when compared with other state-of-the-art algorithms.

  3. Hierarchical feature selection for erythema severity estimation

    Science.gov (United States)

    Wang, Li; Shi, Chenbo; Shu, Chang

    2014-10-01

    At present the PASI scoring system is used for evaluating erythema severity, which can help doctors to diagnose psoriasis [1-3]. The system relies on the subjective judgment of doctors, so its accuracy and stability cannot be guaranteed [4]. This paper proposes a stable and precise algorithm for erythema severity estimation. Our contributions are twofold. On one hand, in order to extract the multi-scale redness of erythema, we design a hierarchical feature. Different from traditional methods, we not only utilize the color statistical features, but also divide the detection window into small windows and extract hierarchical features from them. Further, a feature re-ranking step is introduced, which can guarantee that the extracted features are not redundant with respect to each other. On the other hand, an adaptive boosting classifier is applied for further feature selection. During the training step, the classifier will seek out the most valuable features for evaluating erythema severity, due to its strong learning ability. Experimental results demonstrate the high precision and robustness of our algorithm. The accuracy is 80.1% on a dataset which comprises 116 patients' images with various kinds of erythema. Our system has now been applied for erythema medical efficacy evaluation in Union Hosp, China.

  4. A Hybrid Feature Selection Approach for Arabic Documents Classification

    NARCIS (Netherlands)

    Habib, Mena Badieh; Sarhan, Ahmed A. E.; Salem, Abdel-Badeeh M.; Fayed, Zaki T.; Gharib, Tarek F.

    Text Categorization (classification) is the process of classifying documents into a predefined set of categories based on their content. Text categorization algorithms usually represent documents as bags of words and consequently have to deal with a huge number of features. Feature selection tries to reduce this number by retaining only the features that are most useful for the categorization task.

  5. Tracing the breeding farm of domesticated pig using feature selection (Sus scrofa)

    Directory of Open Access Journals (Sweden)

    Taehyung Kwon

    2017-11-01

    Full Text Available Objective Increasing food safety demands in the animal product market have created a need for a system to trace the food distribution process, from the manufacturer to the retailer, and genetic traceability is an effective method to trace the origin of animal products. In this study, we successfully achieved the farm tracing of 6,018 multi-breed pigs, using single nucleotide polymorphism (SNP) markers strictly selected through least absolute shrinkage and selection operator (LASSO) feature selection. Methods We performed farm tracing of domesticated pig (Sus scrofa) from SNP markers and selected the most relevant features for accurate prediction. Considering the multi-breed composition of our data, we performed feature selection using LASSO penalization on 4,002 SNPs that are shared between breeds, which also include 179 SNPs with small between-breed differences. The 100 highest-scored features were extracted from iterative simulations and then evaluated using machine-learning based classifiers. Results We selected 1,341 SNPs from over 45,000 SNPs through iterative LASSO feature selection, to minimize between-breed differences. We subsequently selected the 100 highest-scored SNPs from iterative scoring, and observed high statistical measures in classification of breeding farms by cross-validation using only these SNPs. Conclusion The study represents a successful application of LASSO feature selection on multi-breed pig SNP data to trace farm information, which provides a valuable method and possibility for further research on genetic traceability.
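
    As a rough illustration of LASSO-style SNP selection (not the authors' pipeline), the sketch below ranks SNPs by the magnitude of L1-penalized logistic regression coefficients. The genotype matrix X, the farm-of-origin labels y, and the values of C and top_k are assumptions.

```python
# Sketch of LASSO-style (L1-penalized) selection on a SNP genotype matrix.
# X (samples x SNPs, 0/1/2 coding) and farm labels y are assumed to be loaded.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

def lasso_rank_snps(X, y, C=0.1, top_k=100):
    """Return indices of the top_k SNPs by mean absolute L1-penalized coefficient."""
    Xs = StandardScaler().fit_transform(X)
    clf = LogisticRegression(penalty="l1", solver="saga", C=C, max_iter=5000)
    clf.fit(Xs, y)
    importance = np.abs(clf.coef_).mean(axis=0)   # average |coefficients| over farm classes
    return np.argsort(importance)[::-1][:top_k]
```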

  6. Hadoop neural network for parallel and distributed feature selection.

    Science.gov (United States)

    Hodge, Victoria J; O'Keefe, Simon; Austin, Jim

    2016-06-01

    In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel) allowing multiple feature selectors to be compared. We identify commonalities among the five feature selectors. All can be processed in the framework using a single representation and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector and the actual features to select to be identified for large and high dimensional data sets through exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  7. Adaptive floating search methods in feature selection

    Czech Academy of Sciences Publication Activity Database

    Somol, Petr; Pudil, Pavel; Novovičová, Jana; Paclík, Pavel

    1999-01-01

    Roč. 20, 11/13 (1999), s. 1157-1163 ISSN 0167-8655 R&D Projects: GA ČR GA402/97/1242 Grant - others:MŠMT(CZ) VS96063 Institutional research plan: AV0Z1075907 Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.315, year: 1999 http://library.utia.cas.cz/separaty/historie/somol-adaptive floating search methods in feature selection.pdf

  8. DYNAMIC FEATURE SELECTION FOR WEB USER IDENTIFICATION ON LINGUISTIC AND STYLISTIC FEATURES OF ONLINE TEXTS

    Directory of Open Access Journals (Sweden)

    A. A. Vorobeva

    2017-01-01

    Full Text Available The paper deals with identification and authentication of web users participating in Internet information processes, based on features of online texts. In digital forensics, web user identification based on various linguistic features can be used to discover the identity of individuals, criminals or terrorists using the Internet to commit cybercrimes. The Internet can be used as a tool in different types of cybercrimes (fraud and identity theft, harassment and anonymous threats, terrorist or extremist statements, distribution of illegal content and information warfare). Linguistic identification of web users is a kind of biometric identification; it can be used to narrow down the suspects, identify a criminal and prosecute him. The feature set includes various linguistic and stylistic features extracted from online texts. We propose dynamic feature selection for each web user identification task. Selection is based on calculating the Manhattan distance to k-nearest neighbors (the Relief-f algorithm). This approach improves the identification accuracy and minimizes the number of features. Experiments were carried out on several datasets with different levels of class imbalance. Experiment results showed that feature relevance varies across different sets of web users (probable authors of some text); feature selection for each set of web users improves identification accuracy by 4% on average, which is approximately 1% higher than with the use of a static set of features. The proposed approach is most effective for a small number of training samples (messages per user).

  9. Effective Feature Selection for 5G IM Applications Traffic Classification

    Directory of Open Access Journals (Sweden)

    Muhammad Shafiq

    2017-01-01

    Full Text Available Recently, machine learning (ML) algorithms have been widely applied in Internet traffic classification. However, due to inappropriate feature selection, ML-based classifiers are prone to misclassifying Internet flows, as such traffic occupies the majority of traffic flows. To address this problem, a novel feature selection metric named weighted mutual information (WMI) is proposed. We develop a hybrid feature selection algorithm named WMI_ACC, which filters most of the features with the WMI metric. It further uses a wrapper method to select features for ML classifiers with an accuracy (ACC) metric. We evaluate our approach using five ML classifiers on two traces captured from different network environments. Furthermore, we also apply the Wilcoxon pairwise statistical test to the results of our proposed algorithm to find the robust features among the selected set of features. Experimental results show that our algorithm gives promising results in terms of classification accuracy, recall, and precision. Our proposed algorithm can achieve 99% flow accuracy, which is very promising.
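
    The general filter-then-wrapper pattern described here can be sketched as follows: a mutual-information filter prunes the feature pool and a greedy wrapper keeps features that improve cross-validated accuracy. This is only an illustration of the hybrid scheme, not the paper's WMI metric or its exact ACC wrapper; the pool size and classifier are arbitrary choices.

```python
# Generic filter-then-wrapper hybrid: an MI filter prunes the pool, then a greedy
# forward wrapper keeps features that improve cross-validated accuracy.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def hybrid_select(X, y, pool_size=20, cv=5):
    """X: numpy array (samples x features), y: class labels."""
    mi = mutual_info_classif(X, y)
    pool = list(np.argsort(mi)[::-1][:pool_size])      # filter stage: keep top-MI features
    chosen, best_acc = [], 0.0
    for f in pool:                                      # wrapper stage: greedy forward search
        candidate = chosen + [f]
        acc = cross_val_score(DecisionTreeClassifier(), X[:, candidate], y, cv=cv).mean()
        if acc > best_acc:
            chosen, best_acc = candidate, acc
    return chosen, best_acc
```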

  10. Hybrid feature selection for supporting lightweight intrusion detection systems

    Science.gov (United States)

    Song, Jianglong; Zhao, Wentao; Liu, Qiang; Wang, Xin

    2017-08-01

    Redundant and irrelevant features not only cause high resource consumption but also degrade the performance of Intrusion Detection Systems (IDS), especially when coping with big data. These features slow down the process of training and testing in network traffic classification. Therefore, a hybrid feature selection approach combining wrapper and filter selection is designed in this paper to build a lightweight intrusion detection system. Two main phases are involved in this method. The first phase conducts a preliminary search for an optimal subset of features, in which chi-square feature selection is utilized. The set of features selected in the previous phase is further refined in the second phase in a wrapper manner, in which Random Forest (RF) is used to guide the selection process and retain an optimized set of features. After that, we build an RF-based detection model and make a fair comparison with other approaches. The experimental results on NSL-KDD datasets show that our approach results in higher detection accuracy as well as faster training and testing processes.
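
    A minimal sketch of the two-phase idea, assuming scikit-learn and placeholder subset sizes rather than the paper's NSL-KDD settings: a chi-square filter produces the preliminary subset, and Random-Forest-guided recursive feature elimination refines it.

```python
# Two-phase selection: a chi-square filter picks a preliminary subset, then
# Random-Forest-guided recursive feature elimination (RFE) refines it.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, chi2, RFE
from sklearn.ensemble import RandomForestClassifier

def two_phase_select(X, y, k_filter=40, k_final=20):
    X_pos = MinMaxScaler().fit_transform(X)            # chi2 requires non-negative values
    phase1 = SelectKBest(chi2, k=k_filter).fit(X_pos, y)
    X1 = phase1.transform(X_pos)
    phase2 = RFE(RandomForestClassifier(n_estimators=200),
                 n_features_to_select=k_final).fit(X1, y)
    idx1 = np.where(phase1.get_support())[0]            # map back to original column indices
    return idx1[phase2.get_support()]
```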

  11. An improved chaotic fruit fly optimization based on a mutation strategy for simultaneous feature selection and parameter optimization for SVM and its applications.

    Directory of Open Access Journals (Sweden)

    Fei Ye

    Full Text Available This paper proposes a new support vector machine (SVM) optimization scheme based on an improved chaotic fruit fly optimization algorithm (FOA) with a mutation strategy to simultaneously perform parameter tuning for the SVM and feature selection. In the improved FOA, a chaotic particle initializes the fruit fly swarm location and replaces the expression of distance for the fruit fly to find the food source. Additionally, the proposed mutation strategy uses two distinct generative mechanisms for new food sources at the osphresis phase, allowing the algorithm to search for the optimal solution both in the whole solution space and within the local solution space containing the fruit fly swarm location. In an evaluation based on a group of ten benchmark problems, the proposed algorithm's performance is compared with that of other well-known algorithms, and the results support the superiority of the proposed algorithm. Moreover, this algorithm is successfully applied in an SVM to perform both parameter tuning and feature selection to solve real-world classification problems. This method, called the chaotic fruit fly optimization algorithm SVM (CIFOA-SVM), has been shown to be a more robust and effective optimization method than other well-known methods, particularly in terms of solving the medical diagnosis problem and the credit card problem.

  12. Confidence-Based Feature Acquisition

    Science.gov (United States)

    Wagstaff, Kiri L.; desJardins, Marie; MacGlashan, James

    2010-01-01

    Confidence-based Feature Acquisition (CFA) is a novel, supervised learning method for acquiring missing feature values when there is missing data at both training (learning) and test (deployment) time. To train a machine learning classifier, data is encoded with a series of input features describing each item. In some applications, the training data may have missing values for some of the features, which can be acquired at a given cost. A relevant JPL example is that of the Mars rover exploration in which the features are obtained from a variety of different instruments, with different power consumption and integration time costs. The challenge is to decide which features will lead to increased classification performance and are therefore worth acquiring (paying the cost). To solve this problem, CFA, which is made up of two algorithms (CFA-train and CFA-predict), has been designed to greedily minimize total acquisition cost (during training and testing) while aiming for a specific accuracy level (specified as a confidence threshold). With this method, it is assumed that there is a nonempty subset of features that are free; that is, every instance in the data set includes these features initially for zero cost. It is also assumed that the feature acquisition (FA) cost associated with each feature is known in advance, and that the FA cost for a given feature is the same for all instances. Finally, CFA requires that the base-level classifiers produce not only a classification, but also a confidence (or posterior probability).

  13. Leukemia and colon tumor detection based on microarray data classification using momentum backpropagation and genetic algorithm as a feature selection method

    Science.gov (United States)

    Wisesty, Untari N.; Warastri, Riris S.; Puspitasari, Shinta Y.

    2018-03-01

    Cancer is one of the major causes of morbidity and mortality worldwide. There is therefore a need for a system that can analyze and identify whether a person is suffering from cancer by using microarray data derived from the patient's Deoxyribonucleic Acid (DNA). However, microarray data have thousands of attributes, which makes data processing challenging; this is often referred to as the curse of dimensionality. Therefore, in this study we built a system capable of detecting whether a patient has cancer or not. The algorithms used are a Genetic Algorithm for feature selection and a Momentum Backpropagation Neural Network as the classification method, with data taken from the Kent Ridge Bio-medical Dataset. Based on the system testing that has been done, the system can detect Leukemia and Colon Tumor with a best accuracy of 98.33% for the colon tumor data and 100% for the leukemia data. The Genetic Algorithm as a feature selection algorithm can improve system accuracy from 64.52% to 98.33% for the colon tumor data and from 65.28% to 100% for the leukemia data, and the use of momentum parameters can accelerate the convergence of the system in the training process of the Neural Network.

  14. Feature Extraction and Selection Strategies for Automated Target Recognition

    Science.gov (United States)

    Greene, W. Nicholas; Zhang, Yuhan; Lu, Thomas T.; Chao, Tien-Hsin

    2010-01-01

    Several feature extraction and selection methods for an existing automatic target recognition (ATR) system using JPL's Grayscale Optical Correlator (GOC) and Optimal Trade-Off Maximum Average Correlation Height (OT-MACH) filter were tested using MATLAB. The ATR system is composed of three stages: a cursory region-of-interest (ROI) search using the GOC and OT-MACH filter, a feature extraction and selection stage, and a final classification stage. Feature extraction and selection concerns transforming potential target data into more useful forms as well as selecting important subsets of that data which may aid in detection and classification. The strategies tested were built around two popular extraction methods: Principal Component Analysis (PCA) and Independent Component Analysis (ICA). Performance was measured based on the classification accuracy and free-response receiver operating characteristic (FROC) output of a support vector machine (SVM) and a neural net (NN) classifier.
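
    The extraction-plus-classification portion of such a pipeline can be sketched under the assumption of a PCA front end and an RBF SVM; the component count and classifier settings are illustrative, not the JPL system's.

```python
# PCA feature extraction followed by an SVM classifier, evaluated by cross-validation.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def pca_svm_accuracy(X, y, n_components=20, cv=5):
    """X: ROI feature vectors, y: target/clutter labels."""
    model = make_pipeline(StandardScaler(),
                          PCA(n_components=n_components),
                          SVC(kernel="rbf"))
    return cross_val_score(model, X, y, cv=cv).mean()
```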

  15. Feature Import Vector Machine: A General Classifier with Flexible Feature Selection.

    Science.gov (United States)

    Ghosh, Samiran; Wang, Yazhen

    2015-02-01

    The support vector machine (SVM) and other reproducing kernel Hilbert space (RKHS) based classifier systems are drawing much attention recently due to their robustness and generalization capability. The general theme here is to construct classifiers based on the training data in a high dimensional space by using all available dimensions. The SVM achieves huge data compression by selecting only the few observations which lie close to the boundary of the classifier function. However, when the number of observations is not very large (small n) but the number of dimensions/features is large (large p), then it is not necessarily the case that all available features are of equal importance in the classification context. Selecting a useful fraction of the available features may result in huge data compression. In this paper we propose an algorithmic approach by means of which such an optimal set of features can be selected. In short, we reverse the traditional sequential observation selection strategy of the SVM to that of sequential feature selection. To achieve this, we have modified the solution proposed by Zhu and Hastie (2005) in the context of the import vector machine (IVM), to select an optimal sub-dimensional model to build the final classifier with sufficient accuracy.

  16. Pairwise Constraint-Guided Sparse Learning for Feature Selection.

    Science.gov (United States)

    Liu, Mingxia; Zhang, Daoqiang

    2016-01-01

    Feature selection aims to identify the most informative features for a compact and accurate data representation. As typical supervised feature selection methods, Lasso and its variants using L1-norm-based regularization terms have received much attention in recent studies, most of which use class labels as supervised information. Besides class labels, there are other types of supervised information, e.g., pairwise constraints that specify whether a pair of data samples belong to the same class (must-link constraint) or different classes (cannot-link constraint). However, most existing L1-norm-based sparse learning methods do not take advantage of the pairwise constraints that provide weaker and more general supervised information. To address this problem, we propose a pairwise constraint-guided sparse (CGS) learning method for feature selection, where the must-link and the cannot-link constraints are used as discriminative regularization terms that directly concentrate on the local discriminative structure of data. Furthermore, we develop two variants of CGS, including: 1) semi-supervised CGS that utilizes labeled data, pairwise constraints, and unlabeled data and 2) ensemble CGS that uses the ensemble of pairwise constraint sets. We conduct a series of experiments on a number of data sets from University of California-Irvine machine learning repository, a gene expression data set, two real-world neuroimaging-based classification tasks, and two large-scale attribute classification tasks. Experimental results demonstrate the efficacy of our proposed methods, compared with several established feature selection methods.

  17. Feature Selection for Audio Surveillance in Urban Environment

    Directory of Open Access Journals (Sweden)

    KIKTOVA Eva

    2014-05-01

    Full Text Available This paper presents the work leading to an acoustic event detection system, which is designed to recognize two types of acoustic events (shots and breaking glass) in an urban environment. For this purpose, extensive front-end processing was performed for an effective parametric representation of the input sound. MFCC features and features computed during their extraction (MELSPEC and FBANK), then MPEG-7 audio descriptors and other temporal and spectral characteristics were extracted. High dimensional feature sets were created and in the next phase reduced by mutual information based selection algorithms. A Hidden Markov Model based classifier was applied and evaluated by the Viterbi decoding algorithm. Thus very effective feature sets were identified and the less important features were also found.

  18. Feature-selective attention in healthy old age: a selective decline in selective attention?

    Science.gov (United States)

    Quigley, Cliodhna; Müller, Matthias M

    2014-02-12

    Deficient selection against irrelevant information has been proposed to underlie age-related cognitive decline. We recently reported evidence for maintained early sensory selection when older and younger adults used spatial selective attention to perform a challenging task. Here we explored age-related differences when spatial selection is not possible and feature-selective attention must be deployed. We additionally compared the integrity of feedforward processing by exploiting the well established phenomenon of suppression of visual cortical responses attributable to interstimulus competition. Electroencephalogram was measured while older and younger human adults responded to brief occurrences of coherent motion in an attended stimulus composed of randomly moving, orientation-defined, flickering bars. Attention was directed to horizontal or vertical bars by a pretrial cue, after which two orthogonally oriented, overlapping stimuli or a single stimulus were presented. Horizontal and vertical bars flickered at different frequencies and thereby elicited separable steady-state visual-evoked potentials, which were used to examine the effect of feature-based selection and the competitive influence of a second stimulus on ongoing visual processing. Age differences were found in feature-selective attentional modulation of visual responses: older adults did not show consistent modulation of magnitude or phase. In contrast, the suppressive effect of a second stimulus was robust and comparable in magnitude across age groups, suggesting that bottom-up processing of the current stimuli is essentially unchanged in healthy old age. Thus, it seems that visual processing per se is unchanged, but top-down attentional control is compromised in older adults when space cannot be used to guide selection.

  19. A Variance Minimization Criterion to Feature Selection Using Laplacian Regularization.

    Science.gov (United States)

    He, Xiaofei; Ji, Ming; Zhang, Chiyuan; Bao, Hujun

    2011-10-01

    In many information processing tasks, one is often confronted with very high-dimensional data. Feature selection techniques are designed to find the meaningful feature subset of the original features which can facilitate clustering, classification, and retrieval. In this paper, we consider the feature selection problem in unsupervised learning scenarios, which is particularly difficult due to the absence of class labels that would guide the search for relevant information. Based on Laplacian regularized least squares, which finds a smooth function on the data manifold and minimizes the empirical loss, we propose two novel feature selection algorithms which aim to minimize the expected prediction error of the regularized regression model. Specifically, we select those features such that the size of the parameter covariance matrix of the regularized regression model is minimized. Motivated from experimental design, we use trace and determinant operators to measure the size of the covariance matrix. Efficient computational schemes are also introduced to solve the corresponding optimization problems. Extensive experimental results over various real-life data sets have demonstrated the superiority of the proposed algorithms.
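
    In experimental-design terminology, the trace and determinant criteria mentioned here correspond to A-optimality and D-optimality. The following is a generic statement of those objectives over a candidate feature subset S, not the paper's exact derivation for Laplacian regularized least squares.

```latex
% Generic experimental-design criteria over a candidate feature subset S,
% with \hat{a}_S the parameters of the regularized regression fitted on S:
% A-optimality (trace) and D-optimality (determinant).
\min_{S}\ \operatorname{tr}\!\bigl(\operatorname{Cov}(\hat{a}_S)\bigr)
\qquad\text{and}\qquad
\min_{S}\ \det\!\bigl(\operatorname{Cov}(\hat{a}_S)\bigr)
```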

  20. Feature Selection on Hyperspectral Data for Dismount Skin Analysis

    Science.gov (United States)

    2014-03-27

    concentration can change based on many environmental factors, including pregnancy, aging, drug intake, and stress [5]. Since dismount characterization... The selection procedure takes a predefined relevance threshold δ as input and outputs S_best, the optimal subset of selected features: for i = 1 to N, compute the relevance score S_U(i,c) for feature F_i and, if S_U(i,c) ≥ δ, append F_i to S_best.
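
    A small sketch of such a relevance-threshold loop, using symmetric uncertainty computed on binned features; the threshold delta and the bin count are illustrative assumptions rather than values from the thesis.

```python
# Relevance-threshold selection: keep feature i when SU(F_i, class) >= delta.
# Continuous features are binned so that discrete entropies can be used.
import numpy as np

def _entropy(codes):
    _, counts = np.unique(codes, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def symmetric_uncertainty(f_codes, c_codes):
    h_f, h_c = _entropy(f_codes), _entropy(c_codes)
    joint = f_codes * (c_codes.max() + 1) + c_codes      # encode each (feature, class) pair
    mi = h_f + h_c - _entropy(joint)
    return 2.0 * mi / (h_f + h_c) if (h_f + h_c) > 0 else 0.0

def threshold_select(X, y, delta=0.1, bins=10):
    """Return the indices of features whose SU with the class labels meets delta."""
    _, y_codes = np.unique(y, return_inverse=True)
    selected = []
    for i in range(X.shape[1]):
        edges = np.histogram_bin_edges(X[:, i], bins=bins)
        f_codes = np.digitize(X[:, i], edges[1:-1])
        if symmetric_uncertainty(f_codes, y_codes) >= delta:
            selected.append(i)
    return selected
```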

  1. Feature selection using feature dissimilarity measure and density ...

    Indian Academy of Sciences (India)

    2015-09-28

    Sep 28, 2015 ... quantify information loss between a pair of features. In the following derivation we show how the larger eigenvalue can be used as a measure of feature dissimilarity. Let x and y be two random variables and the covariance matrix of x and y be Σ: X = var x( ) cov x,y. ( ) cov x,y. ( ) var y( ) . We have to find the ...

  2. Research and Application of Hybrid Forecasting Model Based on an Optimal Feature Selection System—A Case Study on Electrical Load Forecasting

    Directory of Open Access Journals (Sweden)

    Yunxuan Dong

    2017-04-01

    Full Text Available The process of modernizing the smart grid prominently increases the complexity and uncertainty in the scheduling and operation of power systems, and, in order to develop a more reliable, flexible, efficient and resilient grid, electrical load forecasting is not only an important key but also remains a difficult and challenging task. In this paper, a short-term electrical load forecasting model, with a unit for feature learning named Pyramid System and recurrent neural networks, has been developed, and it can effectively promote the stability and security of the power grid. Nine types of methods for feature learning are compared in this work to select the best one for the learning target, and two criteria have been employed to evaluate the accuracy of the prediction intervals. Furthermore, an electrical load forecasting method based on recurrent neural networks has been formed to obtain the relational diagram of historical data; to be specific, the proposed techniques are applied to electrical load forecasting using the data collected from New South Wales, Australia. The simulation results show that the proposed hybrid models can not only satisfactorily approximate the actual values but are also effective tools in the planning of smart grids.

  3. Toward literature-based feature selection for diagnostic classification: a meta-analysis of resting-state fMRI in depression.

    Science.gov (United States)

    Sundermann, Benedikt; Olde Lütke Beverborg, Mona; Pfleiderer, Bettina

    2014-01-01

    Information derived from functional magnetic resonance imaging (fMRI) during wakeful rest has been introduced as a candidate diagnostic biomarker in unipolar major depressive disorder (MDD). Multiple reports of resting state fMRI in MDD describe group effects. Such prior knowledge can be adopted to pre-select potentially discriminating features for diagnostic classification models with the aim to improve diagnostic accuracy. The purpose of this analysis was to consolidate spatial information about alterations of spontaneous brain activity in MDD, primarily to serve as feature selection for multivariate pattern analysis techniques (MVPA). Thirty-two studies were included in final analyses. Coordinates extracted from the original reports were assigned to two categories based on directionality of findings. Meta-analyses were calculated using the non-additive activation likelihood estimation approach with coordinates organized by subject group to account for non-independent samples. Converging evidence revealed a distributed pattern of brain regions with increased or decreased spontaneous activity in MDD. The most distinct finding was hyperactivity/hyperconnectivity presumably reflecting the interaction of cortical midline structures (posterior default mode network components including the precuneus and neighboring posterior cingulate cortices associated with self-referential processing and the subgenual anterior cingulate and neighboring medial frontal cortices) with lateral prefrontal areas related to externally-directed cognition. Other areas of hyperactivity/hyperconnectivity include the left lateral parietal cortex, right hippocampus and right cerebellum whereas hypoactivity/hypoconnectivity was observed mainly in the left temporal cortex, the insula, precuneus, superior frontal gyrus, lentiform nucleus and thalamus. Results are made available in two different data formats to be used as spatial hypotheses in future studies, particularly for diagnostic classification.

  4. Towards literature-based feature selection for diagnostic classification: A meta-analysis of resting-state fMRI in depression

    Directory of Open Access Journals (Sweden)

    Benedikt eSundermann

    2014-09-01

    Full Text Available Information derived from functional magnetic resonance imaging (fMRI) during wakeful rest has been introduced as a candidate diagnostic biomarker in unipolar major depressive disorder (MDD). Multiple reports of resting state fMRI in MDD describe group effects. Such prior knowledge can be adopted to pre-select potentially discriminating features for diagnostic classification models with the aim to improve diagnostic accuracy. Purpose of this analysis was to consolidate spatial information about alterations of spontaneous brain activity in MDD, primarily to serve as feature selection for multivariate pattern analysis techniques (MVPA). 32 studies were included in final analyses. Coordinates extracted from the original reports were assigned to two categories based on directionality of findings. Meta-analyses were calculated using the non-additive activation likelihood estimation approach with coordinates organized by subject group to account for non-independent samples. Converging evidence revealed a distributed pattern of brain regions with increased or decreased spontaneous activity in MDD. The most distinct finding was hyperactivity/hyperconnectivity presumably reflecting the interaction of cortical midline structures (posterior default mode network components including the precuneus and neighboring posterior cingulate cortices associated with self-referential processing and the subgenual anterior cingulate and neighboring medial frontal cortices) with lateral prefrontal areas related to externally-directed cognition. Other areas of hyperactivity/hyperconnectivity include the left lateral parietal cortex, right hippocampus and right cerebellum whereas hypoactivity/hypoconnectivity was observed mainly in the left temporal cortex, the insula, precuneus, superior frontal gyrus, lentiform nucleus and thalamus. Results are made available in two different data formats to be used as spatial hypotheses in future studies, particularly for diagnostic

  5. Feature selection using feature dissimilarity measure and density ...

    Indian Academy of Sciences (India)

    The algorithm is fast and less sensitive to the user-supplied parameters. Moreover, the method automatically determines the required number of features and identifies them. We used the proposed method for reducing dimensionality of a number of benchmark data sets of varying sizes. Its performance was also extensively ...

  6. Tile-based Fisher-ratio software for improved feature selection analysis of comprehensive two-dimensional gas chromatography-time-of-flight mass spectrometry data.

    Science.gov (United States)

    Marney, Luke C; Siegler, W Christopher; Parsons, Brendon A; Hoggard, Jamin C; Wright, Bob W; Synovec, Robert E

    2013-10-15

    Comprehensive two-dimensional (2D) gas chromatography coupled with time-of-flight mass spectrometry (GC × GC-TOFMS) is a highly capable instrumental platform that produces complex and information-rich multi-dimensional chemical data. The data can be initially overwhelming, especially when many samples (of various sample classes) are analyzed with multiple injections for each sample. Thus, the data must be analyzed in such a way as to extract the most meaningful information. The pixel-based and peak table-based Fisher ratio algorithmic approaches have been used successfully in the past to reduce the multi-dimensional data down to those chemical compounds that are changing between the sample classes relative to those that are not changing (i.e., chemical feature selection). We report on the initial development of a computationally fast novel tile-based Fisher-ratio software that addresses the challenges due to 2D retention time misalignment without explicitly aligning the data, which is often a shortcoming for both pixel-based and peak table-based algorithmic approaches. Concurrently, the tile-based Fisher-ratio algorithm significantly improves the sensitivity contrast of true positives against a background of potential false positives and noise. In this study, eight compounds, plus one internal standard, were spiked into diesel at various concentrations. The tile-based F-ratio algorithmic approach was able to "discover" all spiked analytes, within the complex diesel sample matrix with thousands of potential false positives, in each possible concentration comparison, even at the lowest absolute spiked analyte concentration ratio of 1.06, the ratio of the concentration in the spiked diesel sample to the native concentration in diesel. Copyright © 2013 Elsevier B.V. All rights reserved.
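
    The per-feature Fisher ratio underlying the pixel-, peak-table- and tile-based variants is the between-class variance divided by the within-class variance; a generic sketch is below. The 2D tiling and retention-time handling of the actual software are not reproduced here.

```python
# Per-feature Fisher ratio: between-class variance over within-class variance.
import numpy as np

def fisher_ratio(X, y):
    """X: samples x features, y: class labels. Returns one F-ratio per feature."""
    y = np.asarray(y)
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += Xc.shape[0] * (Xc.mean(axis=0) - grand_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    dof_between = len(classes) - 1
    dof_within = X.shape[0] - len(classes)
    return (between / dof_between) / (within / dof_within + 1e-12)
```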

  7. Correlation Feature Selection and Mutual Information Theory Based Quantitative Research on Meteorological Impact Factors of Module Temperature for Solar Photovoltaic Systems

    Directory of Open Access Journals (Sweden)

    Yujing Sun

    2016-12-01

    Full Text Available The module temperature is the most important parameter influencing the output power of solar photovoltaic (PV) systems, aside from solar irradiance. In this paper, we focus on interdisciplinary research that combines correlation analysis, mutual information (MI) and heat transfer theory, and aims to figure out the correlative relations between different meteorological impact factors (MIFs) and PV module temperature from both qualitative and quantitative aspects. The identification and confirmation of the primary MIFs of PV module temperature are investigated as the first step of this research from the perspective of the physical meaning and mathematical analysis of the electrical performance and thermal characteristics of PV modules based on the PV effect and heat transfer theory. Furthermore, the quantitative description of the MIFs' influence on PV module temperature is mathematically formulated as several indexes using correlation-based feature selection (CFS) and MI theory to explore the specific impact degrees under four different typical weather statuses named general weather classes (GWCs). Case studies for the proposed methods were conducted using actual measurement data of a 500 kW grid-connected solar PV plant in China. The results not only verified the knowledge about the main MIFs of PV module temperature but, more importantly, also provide the specific ratios of the quantitative impact degrees of these three MIFs through CFS and MI based measures under the four different GWCs.

  8. Multiclass sequential feature selection and classification method for ...

    African Journals Online (AJOL)

    Multiclass sequential feature selection and classification method for gene expression data. ... The quality of the features selected by the mk-SS algorithm was validated by hybridizing the feature selection mechanism into the standard SVM algorithm to develop the SVM-kSS classifier. The prediction performance of the SVM-kSS ...

  9. Towards cardinality-based service feature diagrams

    Directory of Open Access Journals (Sweden)

    Ghulam Mustafa Assad

    2015-03-01

    Full Text Available To provide efficient services to end-users, it is essential to manage variability among services. Feature modelling is an important approach to managing the variability and commonalities of a system in a product line. Feature models are composed of feature diagrams. Service feature diagrams (an extended form of feature diagrams) changed the basic framework of feature diagrams by proposing new feature types and their relevance. Service feature diagrams provide selection rights for variable features. In this paper we argue that it is essential to put cardinalities on service feature diagrams. That is, the selection of features should be done under some constraints, providing a lower and upper limit for the selection of features. The use of cardinalities on service feature diagrams reduces the number of feature types by half, while keeping the integrity of all features.

  10. Neural network feature selection for breast cancer diagnosis

    Science.gov (United States)

    Kocur, Catherine M.; Rogers, Steven K.; Bauer, Kenneth W., Jr.; Steppe, Jean M.; Hoffmeister, Jeffrey W.

    1995-04-01

    More than 50 million women over the age of 40 are currently at risk for breast cancer in the United States. Computer-aided diagnosis, as a second opinion to radiologists, will aid in decreasing the number of false readings of mammograms. Neural network benefits are exploited at both the classification and feature selection stages in the development of a computer-aided breast cancer diagnostic system. The multilayer perceptron is used to classify and contrast three features (angular second moment, eigenmasses, and wavelets) developed to distinguish benign from malignant lesions in a database of 94 difficult-to-diagnose digitized microcalcification cases. System performance of 74 percent correct classifications is achieved. Feature selection techniques are presented which further improve performance. Neural and decision boundary-based methods are implemented, compared, and validated to isolate and remove useless features. The contribution from this analysis is an increase to 88 percent correct classification in system performance. These feature selection techniques can also process risk factor data.

  11. Naive Bayes-Guided Bat Algorithm for Feature Selection

    Science.gov (United States)

    Taha, Ahmed Majid; Mustapha, Aida; Chen, Soong-Der

    2013-01-01

    When the amount of data and information is said to double every 20 months or so, feature selection has become highly important and beneficial. Further improvements in feature selection will positively affect a wide array of applications in fields such as pattern recognition, machine learning, or signal processing. A bio-inspired method called the Bat Algorithm hybridized with a Naive Bayes classifier is presented in this work. The performance of the proposed feature selection algorithm was investigated using twelve benchmark datasets from different domains and was compared to three other well-known feature selection algorithms. Discussion focused on four perspectives: number of features, classification accuracy, stability, and feature generalization. The results showed that BANB significantly outperformed other algorithms in selecting lower numbers of features, hence removing irrelevant, redundant, or noisy features while maintaining the classification accuracy. BANB is also proven to be more stable than other methods and is capable of producing more general feature subsets. PMID:24396295

  12. Application of Global Optimization Methods for Feature Selection and Machine Learning

    Directory of Open Access Journals (Sweden)

    Shaohua Wu

    2013-01-01

    Full Text Available The feature selection process constitutes a commonly encountered problem of global combinatorial optimization. The process reduces the number of features by removing irrelevant and redundant data. This paper proposes a novel immune clonal genetic algorithm, based on the immune clonal algorithm, designed to solve the feature selection problem. The proposed algorithm has greater exploration and exploitation abilities due to the clonal selection theory, and each antibody in the search space specifies a subset of the possible features. Experimental results show that the proposed algorithm simplifies the feature selection process effectively and obtains higher classification accuracy than other feature selection algorithms.
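
    As a plain-vanilla illustration of evolutionary feature selection (without the immune clonal operators of the proposed method), the sketch below evolves binary feature masks with cross-validated accuracy as the fitness; the population size, mutation rate, and the Naive Bayes fitness classifier are arbitrary choices.

```python
# Plain genetic algorithm for feature subset selection: binary chromosomes,
# cross-validated accuracy as fitness, tournament selection, one-point crossover.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def ga_feature_select(X, y, pop_size=30, generations=20, mut_rate=0.05, rng=None):
    rng = rng or np.random.default_rng(0)
    n_features = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n_features))

    def fitness(mask):
        if mask.sum() == 0:
            return 0.0
        return cross_val_score(GaussianNB(), X[:, mask.astype(bool)], y, cv=3).mean()

    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        children = []
        for _ in range(pop_size):
            a, b = rng.choice(pop_size, 2, replace=False)   # tournament selection, parent 1
            p1 = pop[a] if scores[a] >= scores[b] else pop[b]
            a, b = rng.choice(pop_size, 2, replace=False)   # tournament selection, parent 2
            p2 = pop[a] if scores[a] >= scores[b] else pop[b]
            cut = rng.integers(1, n_features)               # one-point crossover
            child = np.concatenate([p1[:cut], p2[cut:]])
            flip = rng.random(n_features) < mut_rate        # bit-flip mutation
            child = np.where(flip, 1 - child, child)
            children.append(child)
        pop = np.array(children)

    scores = np.array([fitness(ind) for ind in pop])
    best = pop[scores.argmax()]
    return np.where(best == 1)[0], scores.max()
```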

  13. Efficient Generation and Selection of Combined Features for Improved Classification

    KAUST Repository

    Shono, Ahmad N.

    2014-05-01

    This study contributes a methodology and associated toolkit developed to allow users to experiment with the use of combined features in classification problems. Methods are provided for efficiently generating combined features from an original feature set, for efficiently selecting the most discriminating of these generated combined features, and for efficiently performing a preliminary comparison of the classification results when using the original features exclusively against the results when using the selected combined features. The potential benefit of considering combined features in classification problems is demonstrated by applying the developed methodology and toolkit to three sample data sets where the discovery of combined features containing new discriminating information led to improved classification results.

  14. Feature subset selection and ranking for data dimensionality reduction

    OpenAIRE

    Wei, H.L.; Billings, S.A.

    2007-01-01

    A new unsupervised forward orthogonal search (FOS) algorithm is introduced for feature selection and ranking. In the new algorithm, features are selected in a stepwise way, one at a time, by estimating the capability of each specified candidate feature subset to represent the overall features in the measurement space. A squared correlation function is employed as the criterion to measure the dependency between features and this makes the new algorithm easy to implement. The forward orthogonal...

  15. A compound structure of ELM based on feature selection and parameter optimization using hybrid backtracking search algorithm for wind speed forecasting

    International Nuclear Information System (INIS)

    Zhang, Chu; Zhou, Jianzhong; Li, Chaoshun; Fu, Wenlong; Peng, Tian

    2017-01-01

    Highlights: • A novel hybrid approach is proposed for wind speed forecasting. • The variational mode decomposition (VMD) is optimized to decompose the original wind speed series. • The input matrix and parameters of ELM are optimized simultaneously by using a hybrid BSA. • Results show that OVMD-HBSA-ELM achieves better performance in terms of prediction accuracy. - Abstract: Reliable wind speed forecasting is essential for wind power integration in wind power generation systems. The purpose of this paper is to develop a novel hybrid model for short-term wind speed forecasting and demonstrate its efficiency. In the proposed model, a compound structure of extreme learning machine (ELM) based on feature selection and parameter optimization using hybrid backtracking search algorithm (HBSA) is employed as the predictor. The real-valued BSA (RBSA) is exploited to search for the optimal combination of weights and bias of ELM while the binary-valued BSA (BBSA) is exploited as a feature selection method applied to the candidate inputs predefined by partial autocorrelation function (PACF) values to reconstruct the input-matrix. Due to the volatility and randomness of wind speed signal, an optimized variational mode decomposition (OVMD) is employed to eliminate the redundant noises. The parameters of the proposed OVMD are determined according to the center frequencies of the decomposed modes and the residual evaluation index (REI). The wind speed signal is decomposed into a few modes via OVMD. The aggregation of the forecasting results of these modes constructs the final forecasting result of the proposed model. The proposed hybrid model has been applied to the mean half-hour wind speed observation data from two wind farms in Inner Mongolia, China, and 10-min wind speed data from the Sotavento Galicia wind farm are studied as an additional case. Parallel experiments have been designed to compare with the proposed model. Results obtained from this study indicate that the

  16. The Short-Term Power Load Forecasting Based on Sperm Whale Algorithm and Wavelet Least Square Support Vector Machine with DWT-IR for Feature Selection

    Directory of Open Access Journals (Sweden)

    Jin-peng Liu

    2017-07-01

    Full Text Available Short-term power load forecasting is an important basis for the operation of integrated energy systems, and the accuracy of load forecasting directly affects the economy of system operation. To improve the forecasting accuracy, this paper proposes a load forecasting system based on the wavelet least square support vector machine and the sperm whale algorithm. Firstly, the methods of discrete wavelet transform and inconsistency rate model (DWT-IR) are used to select the optimal features, which aims to reduce the redundancy of input vectors. Secondly, the kernel function of the least square support vector machine (LSSVM) is replaced by the wavelet kernel function to improve the nonlinear mapping ability of LSSVM. Lastly, the parameters of W-LSSVM are optimized by the sperm whale algorithm, and the short-term load forecasting method W-LSSVM-SWA is established. Additionally, the example verification results show that the proposed model outperforms other alternative methods and has strong effectiveness and feasibility in short-term power load forecasting.

  17. Input significance analysis: feature selection through synaptic ...

    African Journals Online (AJOL)

    This work is interested in ISA methods that can manipulate synaptic weights, namely Connection Weights (CW) and Garson's Algorithm (GA), and the classifier selected is Evolving Fuzzy Neural Networks (EFuNNs). Firstly, it tests the FS method on a dataset selected from the UCI Machine Learning Repository and executed in an ...

  18. Improved feature-selection method considering the imbalance problem in text categorization.

    Science.gov (United States)

    Yang, Jieming; Qu, Zhaoyang; Liu, Zhiying

    2014-01-01

    The filtering feature-selection algorithm is a kind of important approach to dimensionality reduction in the field of the text categorization. Most of filtering feature-selection algorithms evaluate the significance of a feature for category based on balanced dataset and do not consider the imbalance factor of dataset. In this paper, a new scheme was proposed, which can weaken the adverse effect caused by the imbalance factor in the corpus. We evaluated the improved versions of nine well-known feature-selection methods (Information Gain, Chi statistic, Document Frequency, Orthogonal Centroid Feature Selection, DIA association factor, Comprehensive Measurement Feature Selection, Deviation from Poisson Feature Selection, improved Gini index, and Mutual Information) using naïve Bayes and support vector machines on three benchmark document collections (20-Newsgroups, Reuters-21578, and WebKB). The experimental results show that the improved scheme can significantly enhance the performance of the feature-selection methods.

  19. Improved Feature-Selection Method Considering the Imbalance Problem in Text Categorization

    Directory of Open Access Journals (Sweden)

    Jieming Yang

    2014-01-01

    Full Text Available The filtering feature-selection algorithm is a kind of important approach to dimensionality reduction in the field of the text categorization. Most of filtering feature-selection algorithms evaluate the significance of a feature for category based on balanced dataset and do not consider the imbalance factor of dataset. In this paper, a new scheme was proposed, which can weaken the adverse effect caused by the imbalance factor in the corpus. We evaluated the improved versions of nine well-known feature-selection methods (Information Gain, Chi statistic, Document Frequency, Orthogonal Centroid Feature Selection, DIA association factor, Comprehensive Measurement Feature Selection, Deviation from Poisson Feature Selection, improved Gini index, and Mutual Information) using naïve Bayes and support vector machines on three benchmark document collections (20-Newsgroups, Reuters-21578, and WebKB). The experimental results show that the improved scheme can significantly enhance the performance of the feature-selection methods.

  20. Integrated Clustering and Feature Selection Scheme for Text Documents.

    OpenAIRE

    M. Thangamani; P. Thangaraj

    2010-01-01

    Problem statement: Text documents are unstructured databases that contain raw data collections. Clustering techniques are used to group text documents with reference to their similarity. Approach: Feature selection techniques were used to improve the efficiency and accuracy of the clustering process. Feature selection was done by eliminating the redundant and irrelevant items from the text document contents. Statistical methods were used in the text clustering and feature selection...

  1. Feature subset selection and ranking for data dimensionality reduction.

    Science.gov (United States)

    Wei, Hua-Liang; Billings, Stephen A

    2007-01-01

    A new unsupervised forward orthogonal search (FOS) algorithm is introduced for feature selection and ranking. In the new algorithm, features are selected in a stepwise way, one at a time, by estimating the capability of each specified candidate feature subset to represent the overall features in the measurement space. A squared correlation function is employed as the criterion to measure the dependency between features and this makes the new algorithm easy to implement. The forward orthogonalization strategy, which combines good effectiveness with high efficiency, enables the new algorithm to produce efficient feature subsets with a clear physical interpretation.

  2. AVC: Selecting discriminative features on basis of AUC by maximizing variable complementarity.

    Science.gov (United States)

    Sun, Lei; Wang, Jun; Wei, Jinmao

    2017-03-14

    The Receiver Operating Characteristic (ROC) curve is well-known in evaluating classification performance in the biomedical field. Owing to its superiority in dealing with imbalanced and cost-sensitive data, the ROC curve has been exploited as a popular metric to evaluate and find out disease-related genes (features). The existing ROC-based feature selection approaches are simple and effective in evaluating individual features. However, these approaches may fail to find the real target feature subset due to their lack of effective means to reduce the redundancy between features, which is essential in machine learning. In this paper, we propose to assess feature complementarity by a trick of measuring the distances between the misclassified instances and their nearest misses on the dimensions of pairwise features. If a misclassified instance and its nearest miss on one feature dimension are far apart on another feature dimension, the two features are regarded as complementary to each other. Subsequently, we propose a novel filter feature selection approach on the basis of the ROC analysis. The new approach employs an efficient heuristic search strategy to select optimal features with the highest complementarities. The experimental results on a broad range of microarray data sets validate that the classifiers built on the feature subset selected by our approach can achieve the minimal balanced error rate with a small number of significant features. Compared with other ROC-based feature selection approaches, our new approach can select fewer features and effectively improve the classification performance.
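
    The single-feature AUC ranking that ROC-based filters start from can be sketched as below; the complementarity search of the AVC method itself is not reproduced, and binary labels are assumed.

```python
# Rank features by single-feature AUC, flipping the direction so that a feature
# that ranks the classes in reverse order is still scored as informative.
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_rank(X, y):
    """X: samples x features, y: binary labels. Returns (indices, AUCs) in descending order."""
    aucs = []
    for i in range(X.shape[1]):
        auc = roc_auc_score(y, X[:, i])
        aucs.append(max(auc, 1.0 - auc))
    order = np.argsort(aucs)[::-1]
    return order, np.asarray(aucs)[order]
```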

  3. A Computer-Aided Diagnosis System for Dynamic Contrast-Enhanced MR Images Based on Level Set Segmentation and ReliefF Feature Selection

    Directory of Open Access Journals (Sweden)

    Zhiyong Pang

    2015-01-01

    Full Text Available This study established a fully automated computer-aided diagnosis (CAD) system for the classification of malignant and benign masses via breast magnetic resonance imaging (BMRI). A breast segmentation method consisting of a preprocessing step to identify the air-breast interfacing boundary and curve fitting for chest wall line (CWL) segmentation was included in the proposed CAD system. The Chan-Vese (CV) model level set (LS) segmentation method was adopted to segment breast mass and demonstrated sufficiently good segmentation performance. The support vector machine (SVM) classifier with ReliefF feature selection was used to merge the extracted morphological and texture features into a classification score. The accuracy, sensitivity, and specificity measurements for the leave-half-case-out resampling method were 92.3%, 98.2%, and 76.2%, respectively. For the leave-one-case-out resampling method, the measurements were 90.0%, 98.7%, and 73.8%, respectively.

  4. Feature selection from hyperspectral imaging for guava fruit defects detection

    Science.gov (United States)

    Mat Jafri, Mohd. Zubir; Tan, Sou Ching

    2017-06-01

    Development of technology has made hyperspectral imaging commonly used for defect detection. In this research, a hyperspectral imaging system was set up in the lab to target guava fruit defect detection. Guava fruit was selected as the object because, to our knowledge, few attempts have been made at guava defect detection based on hyperspectral imaging. A common fluorescent light source was used to represent the uncontrolled lighting condition in the lab, and analysis was carried out in a specific wavelength range due to the inefficiency of this particular light source. Based on the data, the reflectance intensity of this specific setup could be categorized into two groups. Sequential feature selection with linear discriminant (LD) and quadratic discriminant (QD) functions was used to select features that could potentially be used in defect detection. Besides the ordinary training method, the training dataset for the discriminant was separated into two parts to cater for the uncontrolled lighting condition; these two parts were separated based on the brighter and dimmer areas. Four evaluation configurations were assessed: LD with the common training method, QD with the common training method, LD with the two-part training method and QD with the two-part training method. These configurations were evaluated using the F1-score on a total of 48 defect areas. The experiment showed that the F1-score of the linear discriminant with the compensated training method reached 0.8, the highest score among all.
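
    Sequential forward selection wrapped around a linear discriminant can be sketched with scikit-learn's SequentialFeatureSelector as below; the band count, the F1 scoring and the binary 0/1 defect labels are assumptions, not the study's settings.

```python
# Sequential forward selection of spectral bands, wrapped around an LDA classifier.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector

def select_bands(X, y, n_bands=10):
    """X: samples x spectral bands, y: 0/1 defect labels. Returns selected band indices."""
    sfs = SequentialFeatureSelector(LinearDiscriminantAnalysis(),
                                    n_features_to_select=n_bands,
                                    direction="forward",
                                    scoring="f1")
    sfs.fit(X, y)
    return sfs.get_support(indices=True)
```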

  5. Efficient feature subset selection with probabilistic distance criteria. [pattern recognition

    Science.gov (United States)

    Chittineni, C. B.

    1979-01-01

    Recursive expressions are derived for efficiently computing the commonly used probabilistic distance measures as a change in the criteria both when a feature is added to and when a feature is deleted from the current feature subset. A combinatorial algorithm for generating all possible r feature combinations from a given set of s features in $\binom{s}{r}$ steps with a change of a single feature at each step is presented. These expressions can also be used for both forward and backward sequential feature selection.

  6. Individual discriminative face recognition models based on subsets of features

    DEFF Research Database (Denmark)

    Clemmensen, Line Katrine Harder; Gomez, David Delgado; Ersbøll, Bjarne Kjær

    2007-01-01

    The accuracy of data classification methods depends considerably on the data representation and on the selected features. In this work, the elastic net model selection is used to identify meaningful and important features in face recognition. Modelling the characteristics which distinguish one person from another using only subsets of features will both decrease the computational cost and increase the generalization capacity of the face recognition algorithm. Moreover, identifying which are the features that better discriminate between persons will also provide a deeper understanding of the face recognition problem. The elastic net model is able to select a subset of features with low computational effort compared to other state-of-the-art feature selection methods. Furthermore, the fact that the number of features usually is larger than the number of images in the data base makes feature...
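
    Elastic net based selection can be sketched by keeping the predictors with nonzero coefficients; the one-vs-rest encoding of person labels and the l1_ratio value below are assumptions, not the authors' setup.

```python
# Elastic net as a feature selector: keep the columns with nonzero coefficients.
import numpy as np
from sklearn.linear_model import ElasticNetCV

def elastic_net_select(X, y_binary, l1_ratio=0.5):
    """y_binary: 1 for the target person, 0 otherwise (one-vs-rest encoding)."""
    enet = ElasticNetCV(l1_ratio=l1_ratio, cv=5).fit(X, y_binary)
    return np.flatnonzero(enet.coef_)
```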

  7. Feature Selection and Dimensionality Reduction in Genomics and Proteomics

    OpenAIRE

    Hauskrecht, Milos; Pelikan, Richard; Valko, Michal; Lyons-Weiler, James

    2006-01-01

    International audience; Finding reliable, meaningful patterns in data with high numbers of attributes can be extremely difficult. Feature selection helps us to decide what attributes or combination of attributes are most important for finding these patterns. In this chapter, we study feature selection methods for building classification models from high-throughput genomic (microarray) and proteomic (mass spectrometry) data sets. Thousands of feature candidates must be analyzed, compared and c...

  8. Performance Investigation of Feature Selection Methods

    OpenAIRE

    sharma, Anuj; Dey, Shubhamoy

    2013-01-01

    Sentiment analysis or opinion mining has become an open research domain after the proliferation of the Internet and Web 2.0 social media. People express their attitudes and opinions on social media, including blogs, discussion forums, tweets, etc., and sentiment analysis is concerned with detecting and extracting sentiment or opinion from online text. Sentiment-based text classification differs from topical text classification since it involves discrimination based on the opinion expressed on a topic. F...

  9. Mutual information for enhanced feature selection in visual tracking

    Science.gov (United States)

    Stamatescu, Victor; Wong, Sebastien; Kearney, David; Lee, Ivan; Milton, Anthony

    2015-05-01

    In this paper we investigate the problem of fusing a set of features for a discriminative visual tracking algorithm, where good features are those that best discriminate an object from the local background. Using a principled Mutual Information approach, we introduce a novel online feature selection algorithm that preserves discriminative features while reducing redundant information. Applying this algorithm to a discriminative visual tracking system, we experimentally demonstrate improved tracking performance on standard data sets.
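
    A hedged sketch of the general recipe (not the authors' algorithm): rank candidate features by mutual information with the object/background label and greedily drop features that are highly correlated with ones already kept; the features, labels and redundancy threshold are illustrative.

      # Hedged sketch: rank candidate tracking features by mutual information
      # with the object/background label, then skip mutually redundant ones.
      import numpy as np
      from sklearn.feature_selection import mutual_info_classif

      rng = np.random.default_rng(0)
      X = rng.normal(size=(500, 20))           # 20 candidate features per sample
      y = rng.integers(0, 2, size=500)         # 1 = object, 0 = local background
      X[:, 0] += y                             # two genuinely informative features
      X[:, 1] += 0.5 * y

      relevance = mutual_info_classif(X, y, random_state=0)
      order = np.argsort(relevance)[::-1]

      selected = []
      for f in order:                          # greedy: keep informative features,
          corr = [abs(np.corrcoef(X[:, f], X[:, s])[0, 1]) for s in selected]
          if not corr or max(corr) < 0.8:      # drop highly redundant ones
              selected.append(f)
      print("selected features:", selected[:5])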

  10. Improving permafrost distribution modelling using feature selection algorithms

    Science.gov (United States)

    Deluigi, Nicola; Lambiel, Christophe; Kanevski, Mikhail

    2016-04-01

    The availability of an increasing amount of spatial data on the occurrence of mountain permafrost allows the employment of machine learning (ML) classification algorithms for modelling the distribution of the phenomenon. One of the major problems when dealing with high-dimensional datasets is the number of input features (variables) involved. Applying ML classification algorithms to such a large number of variables carries a risk of overfitting and, consequently, poor generalization/prediction. For this reason, applying feature selection (FS) techniques helps simplify the set of factors required and improves the understanding of the adopted features and of their relation with the studied phenomenon. Moreover, removing irrelevant or redundant variables from the dataset effectively improves the quality of the ML prediction. This research presents a comparative analysis of permafrost distribution models supported by FS variable importance assessment. The input dataset (dimension = 20-25, 10 m spatial resolution) was constructed using landcover maps, climate data and DEM-derived variables (altitude, aspect, slope, terrain curvature, solar radiation, etc.). It was completed with permafrost evidence (geophysical and thermal data and rock glacier inventories) that serves as training permafrost data. The FS algorithms used indicated which variables appeared statistically less important for permafrost presence/absence. Three different algorithms were compared: Information Gain (IG), Correlation-based Feature Selection (CFS) and Random Forest (RF). IG is a filter technique that evaluates the worth of a predictor by measuring the information gain with respect to permafrost presence/absence. Conversely, CFS is a wrapper technique that evaluates the worth of a subset of predictors by considering the individual predictive ability of each variable along with the degree of redundancy between them. Finally, RF is a ML algorithm that performs FS as part of its
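
    As a rough illustration of the kind of comparison described (with mocked-up predictors standing in for the permafrost dataset), the sketch below contrasts an Information Gain style filter score with Random Forest variable importances on the same inputs.

      # Hedged sketch: information gain (mutual information) versus Random
      # Forest importances on the same predictor matrix. The permafrost
      # predictors are mocked up here; real DEM/climate variables would replace X.
      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.feature_selection import mutual_info_classif

      rng = np.random.default_rng(0)
      names = [f"var{i}" for i in range(10)]   # e.g. altitude, slope, aspect, ...
      X = rng.normal(size=(1000, 10))
      y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

      ig = mutual_info_classif(X, y, random_state=0)      # filter: information gain
      rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

      for n, a, b in sorted(zip(names, ig, rf.feature_importances_),
                            key=lambda t: -t[2]):
          print(f"{n:6s}  IG={a:.3f}  RF={b:.3f}")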

  11. Single feature polymorphism (SFP)-based selective sweep identification and association mapping of growth-related metabolic traits in Arabidopsis thaliana.

    Science.gov (United States)

    Childs, Liam H; Witucka-Wall, Hanna; Günther, Torsten; Sulpice, Ronan; Korff, Maria V; Stitt, Mark; Walther, Dirk; Schmid, Karl J; Altmann, Thomas

    2010-03-20

    Natural accessions of Arabidopsis thaliana are characterized by a high level of phenotypic variation that can be used to investigate the extent and mode of selection on the primary metabolic traits. A collection of 54 A. thaliana natural accession-derived lines was subjected to deep genotyping through Single Feature Polymorphism (SFP) detection via genomic DNA hybridization to Arabidopsis Tiling 1.0 Arrays for the detection of selective sweeps, and identification of associations between sweep regions and growth-related metabolic traits. A total of 1,072,557 high-quality SFPs were detected and indications for 3,943 deletions and 1,007 duplications were obtained. A significantly lower than expected SFP frequency was observed in protein-, rRNA-, and tRNA-coding regions and in non-repetitive intergenic regions, while pseudogenes, transposons, and non-coding RNA genes are enriched with SFPs. Gene families involved in plant defence or in signalling were identified as highly polymorphic, while several other families including transcription factors are depleted of SFPs. 198 significant associations between metabolic genes and 9 metabolic and growth-related phenotypic traits were detected, with annotation hinting at the nature of the relationship. Five significant selective sweep regions were also detected, of which one associated significantly with a metabolic trait. We generated a high density polymorphism map for 54 A. thaliana accessions that highlights the variability of resistance genes across geographic ranges and used it to identify selective sweeps and associations between metabolic genes and metabolic phenotypes. Several associations show a clear biological relationship, while many others require further investigation.

  12. Single feature polymorphism (SFP)-based selective sweep identification and association mapping of growth-related metabolic traits in Arabidopsis thaliana

    Directory of Open Access Journals (Sweden)

    Stitt Mark

    2010-03-01

    Full Text Available Abstract Background Natural accessions of Arabidopsis thaliana are characterized by a high level of phenotypic variation that can be used to investigate the extent and mode of selection on the primary metabolic traits. A collection of 54 A. thaliana natural accession-derived lines were subjected to deep genotyping through Single Feature Polymorphism (SFP detection via genomic DNA hybridization to Arabidopsis Tiling 1.0 Arrays for the detection of selective sweeps, and identification of associations between sweep regions and growth-related metabolic traits. Results A total of 1,072,557 high-quality SFPs were detected and indications for 3,943 deletions and 1,007 duplications were obtained. A significantly lower than expected SFP frequency was observed in protein-, rRNA-, and tRNA-coding regions and in non-repetitive intergenic regions, while pseudogenes, transposons, and non-coding RNA genes are enriched with SFPs. Gene families involved in plant defence or in signalling were identified as highly polymorphic, while several other families including transcription factors are depleted of SFPs. 198 significant associations between metabolic genes and 9 metabolic and growth-related phenotypic traits were detected with annotation hinting at the nature of the relationship. Five significant selective sweep regions were also detected of which one associated significantly with a metabolic trait. Conclusions We generated a high density polymorphism map for 54 A. thaliana accessions that highlights the variability of resistance genes across geographic ranges and used it to identify selective sweeps and associations between metabolic genes and metabolic phenotypes. Several associations show a clear biological relationship, while many remain requiring further investigation.

  13. Information Theory for Gabor Feature Selection for Face Recognition

    Directory of Open Access Journals (Sweden)

    Shen Linlin

    2006-01-01

    Full Text Available A discriminative and robust feature—kernel enhanced informative Gabor feature—is proposed in this paper for face recognition. Mutual information is applied to select a set of informative and nonredundant Gabor features, which are then further enhanced by kernel methods for recognition. Compared with one of the top performing methods in the 2004 Face Verification Competition (FVC2004), our methods demonstrate a clear advantage over existing methods in accuracy, computation efficiency, and memory cost. The proposed method has been fully tested on the FERET database using the FERET evaluation protocol. Significant improvements on three of the test data sets are observed. Compared with the classical Gabor wavelet-based approaches using a huge number of features, our method requires less than 4 milliseconds to retrieve a few hundred features. Due to the substantially reduced feature dimension, only 4 seconds are required to recognize 200 face images. The paper also unifies different Gabor filter definitions and proposes a training sample generation algorithm to reduce the effects caused by the unbalanced number of samples available in different classes.

  14. A fast approach for detection of erythemato-squamous diseases based on extreme learning machine with maximum relevance minimum redundancy feature selection

    Science.gov (United States)

    Liu, Tong; Hu, Liang; Ma, Chao; Wang, Zhi-Yan; Chen, Hui-Ling

    2015-04-01

    In this paper, a novel hybrid method, which integrates an effective filter maximum relevance minimum redundancy (MRMR) and a fast classifier extreme learning machine (ELM), has been introduced for diagnosing erythemato-squamous (ES) diseases. In the proposed method, MRMR is employed as a feature selection tool for dimensionality reduction in order to further improve the diagnostic accuracy of the ELM classifier. The impact of the type of activation functions, the number of hidden neurons and the size of the feature subsets on the performance of ELM have been investigated in detail. The effectiveness of the proposed method has been rigorously evaluated against the ES disease dataset, a benchmark dataset, from UCI machine learning database in terms of classification accuracy. Experimental results have demonstrated that our method has achieved the best classification accuracy of 98.89% and an average accuracy of 98.55% via 10-fold cross-validation technique. The proposed method might serve as a new candidate of powerful methods for diagnosing ES diseases.
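
    A hedged sketch of a greedy MRMR-style selector, not the authors' code: at each step the feature with the highest relevance (mutual information with the class) minus mean redundancy (mutual information with the already selected features) is added; the data are synthetic rather than the UCI erythemato-squamous set.

      # Hedged sketch of a greedy maximum-relevance minimum-redundancy selector.
      import numpy as np
      from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

      def mrmr(X, y, k):
          relevance = mutual_info_classif(X, y, random_state=0)
          selected = [int(np.argmax(relevance))]
          while len(selected) < k:
              scores = []
              for f in range(X.shape[1]):
                  if f in selected:
                      scores.append(-np.inf)
                      continue
                  red = np.mean([mutual_info_regression(
                      X[:, [f]], X[:, s], random_state=0)[0] for s in selected])
                  scores.append(relevance[f] - red)     # relevance minus redundancy
              selected.append(int(np.argmax(scores)))
          return selected

      rng = np.random.default_rng(0)
      X = rng.normal(size=(300, 15))
      y = (X[:, 2] - X[:, 7] > 0).astype(int)           # two informative features
      print(mrmr(X, y, 5))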

  15. Feature selection using genetic algorithms for fetal heart rate analysis

    International Nuclear Information System (INIS)

    Xu, Liang; Redman, Christopher W G; Georgieva, Antoniya; Payne, Stephen J

    2014-01-01

    The fetal heart rate (FHR) is monitored on a paper strip (cardiotocogram) during labour to assess fetal health. If necessary, clinicians can intervene and assist with a prompt delivery of the baby. Data-driven computerized FHR analysis could help clinicians in the decision-making process. However, selecting the best computerized FHR features that relate to labour outcome is a pressing research problem. The objective of this study is to apply genetic algorithms (GA) as a feature selection method to select the best feature subset from 64 FHR features and to integrate these best features to recognize unfavourable FHR patterns. The GA was trained on 404 cases and tested on 106 cases (both balanced datasets) using three classifiers, respectively. Regularization methods and backward selection were used to optimize the GA. Reasonable classification performance is shown on the testing set for the best feature subset (Cohen's kappa values of 0.45 to 0.49 using different classifiers). This is, to our knowledge, the first time that a feature selection method for FHR analysis has been developed on a database of this size. This study indicates that different FHR features, when integrated, can show good performance in predicting labour outcome. It also gives the importance of each feature, which will be a valuable reference point for further studies. (paper)
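
    The following is a compact, hedged sketch of GA-based wrapper selection of the kind described: individuals are boolean masks over the candidate features and fitness is cross-validated accuracy of a simple classifier on the masked subset. The 64 synthetic features stand in for the FHR features, and the classifier, population size and mutation rate are arbitrary choices.

      # Hedged sketch of genetic-algorithm wrapper feature selection.
      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      X = rng.normal(size=(400, 64))
      y = rng.integers(0, 2, 400)
      X[:, 5] += y
      X[:, 20] += y                                  # two informative features

      def fitness(mask):
          if not mask.any():
              return 0.0
          acc = cross_val_score(LogisticRegression(max_iter=1000),
                                X[:, mask], y, cv=3).mean()
          return acc - 0.001 * mask.sum()            # mild penalty on subset size

      pop = rng.random((20, 64)) < 0.2               # 20 random boolean masks
      for gen in range(15):
          fits = np.array([fitness(m) for m in pop])
          parents = pop[np.argsort(fits)[-10:]]      # keep the 10 fittest
          children = []
          while len(children) < 10:
              a, b = parents[rng.integers(10, size=2)]
              child = np.where(rng.random(64) < 0.5, a, b)   # uniform crossover
              child ^= rng.random(64) < 0.02                 # bit-flip mutation
              children.append(child)
          pop = np.vstack([parents, np.array(children)])

      best = pop[np.argmax([fitness(m) for m in pop])]
      print("selected feature indices:", np.flatnonzero(best))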

  16. Feature selection for MLP neural network: the use of random permutation of probabilistic outputs.

    Science.gov (United States)

    Yang, Jian-Bo; Shen, Kai-Quan; Ong, Chong-Jin; Li, Xiao-Ping

    2009-12-01

    This paper presents a new wrapper-based feature selection method for multilayer perceptron (MLP) neural networks. It uses a feature ranking criterion to measure the importance of a feature by computing the aggregate difference, over the feature space, of the probabilistic outputs of the MLP with and without the feature. Thus, a score of importance with respect to every feature can be provided using this criterion. Based on the numerical experiments on several artificial and real-world data sets, the proposed method performs, in general, better than several selected feature selection methods for MLP, particularly when the data set is sparse or has many redundant features. In addition, as a wrapper-based approach, the computational cost for the proposed method is modest.
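
    In a similar spirit (though not identical to the paper's criterion of aggregating differences in the MLP's probabilistic outputs), the sketch below ranks features for an MLP by permutation importance on synthetic data.

      # Hedged sketch: permutation-based feature ranking for an MLP.
      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.inspection import permutation_importance
      from sklearn.neural_network import MLPClassifier

      X, y = make_classification(n_samples=600, n_features=20, n_informative=4,
                                 n_redundant=6, random_state=0)
      mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                          random_state=0).fit(X, y)

      result = permutation_importance(mlp, X, y, n_repeats=10, random_state=0)
      ranking = np.argsort(result.importances_mean)[::-1]
      print("features ranked by importance:", ranking[:8])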

  17. Application of Global Optimization Methods for Feature Selection and Machine Learning

    OpenAIRE

    Wu, Shaohua; Hu, Yong; Wang, Wei; Feng, Xinyong; Shu, Wanneng

    2013-01-01

    The feature selection process constitutes a commonly encountered problem of global combinatorial optimization. The process reduces the number of features by removing irrelevant and redundant data. This paper proposes a novel immune clonal genetic algorithm, based on the immune clonal algorithm, designed to solve the feature selection problem. The proposed algorithm has stronger exploration and exploitation abilities due to clonal selection theory, and each antibody in the search space specifie...

  18. sCwc/sLcc: Highly Scalable Feature Selection Algorithms

    Directory of Open Access Journals (Sweden)

    Kilho Shin

    2017-12-01

    Full Text Available Feature selection is a useful tool for identifying which features, or attributes, of a dataset cause or explain the phenomena that the dataset describes, and improving the efficiency and accuracy of learning algorithms for discovering such phenomena. Consequently, feature selection has been studied intensively in machine learning research. However, while feature selection algorithms that exhibit excellent accuracy have been developed, they are seldom used for analysis of high-dimensional data because high-dimensional data usually include too many instances and features, which make traditional feature selection algorithms inefficient. To eliminate this limitation, we tried to improve the run-time performance of two of the most accurate feature selection algorithms known in the literature. The result is two accurate and fast algorithms, namely sCwc and sLcc. Multiple experiments with real social media datasets have demonstrated that our algorithms improve the performance of their original algorithms remarkably. For example, we have two datasets, one with 15,568 instances and 15,741 features, and another with 200,569 instances and 99,672 features. sCwc performed feature selection on these datasets in 1.4 seconds and in 405 seconds, respectively. In addition, sLcc has turned out to be as fast as sCwc on average. This is a remarkable improvement because it is estimated that the original algorithms would need several hours to dozens of days to process the same datasets. In addition, we introduce a fast implementation of our algorithms: sCwc does not require any adjusting parameter, while sLcc requires a threshold parameter, which we can use to control the number of features that the algorithm selects.

  19. An efficient swarm intelligence approach to feature selection based on invasive weed optimization: Application to multivariate calibration and classification using spectroscopic data.

    Science.gov (United States)

    Sheykhizadeh, Saheleh; Naseri, Abdolhossein

    2018-04-05

    Variable selection plays a key role in classification and multivariate calibration. Variable selection methods aim at choosing, from a large pool of available predictors, a set of variables relevant to estimating the analyte concentrations or to achieving better classification results. Many variable selection techniques have now been introduced; among them, those based on swarm intelligence optimization methodologies have received particular attention over the last few decades since they are mainly inspired by nature. In this work, a simple new variable selection algorithm based on the invasive weed optimization (IWO) concept is proposed. IWO is a bio-inspired metaheuristic mimicking the ecological behavior of weeds in colonizing and finding an appropriate place for growth and reproduction; it has been shown to be highly adaptive to environmental changes. In this paper, the first application of IWO, as a very simple and powerful method, to variable selection is reported using different experimental datasets, including FTIR and NIR data, to undertake classification and multivariate calibration tasks. Accordingly, invasive weed optimization - linear discriminant analysis (IWO-LDA) and invasive weed optimization - partial least squares (IWO-PLS) are introduced for multivariate classification and calibration, respectively. Copyright © 2018 Elsevier B.V. All rights reserved.

  20. An efficient swarm intelligence approach to feature selection based on invasive weed optimization: Application to multivariate calibration and classification using spectroscopic data

    Science.gov (United States)

    Sheykhizadeh, Saheleh; Naseri, Abdolhossein

    2018-04-01

    Variable selection plays a key role in classification and multivariate calibration. Variable selection methods aim at choosing, from a large pool of available predictors, a set of variables relevant to estimating the analyte concentrations or to achieving better classification results. Many variable selection techniques have now been introduced; among them, those based on swarm intelligence optimization methodologies have received particular attention over the last few decades since they are mainly inspired by nature. In this work, a simple new variable selection algorithm based on the invasive weed optimization (IWO) concept is proposed. IWO is a bio-inspired metaheuristic mimicking the ecological behavior of weeds in colonizing and finding an appropriate place for growth and reproduction; it has been shown to be highly adaptive to environmental changes. In this paper, the first application of IWO, as a very simple and powerful method, to variable selection is reported using different experimental datasets, including FTIR and NIR data, to undertake classification and multivariate calibration tasks. Accordingly, invasive weed optimization - linear discriminant analysis (IWO-LDA) and invasive weed optimization - partial least squares (IWO-PLS) are introduced for multivariate classification and calibration, respectively.

  1. Novel Automatic Filter-Class Feature Selection for Machine Learning Regression

    DEFF Research Database (Denmark)

    Wollsen, Morten Gill; Hallam, John; Jørgensen, Bo Nørregaard

    2017-01-01

    With the increased focus on application of Big Data in all sectors of society, the performance of machine learning becomes essential. Efficient machine learning depends on efficient feature selection algorithms. Filter feature selection algorithms are model-free and therefore very fast, but require...... 4 other common automatic feature selection algorithms: Backward selection, forward selection, NLPCA and PCA as well as using no algorithms at all. The benchmarking will be performed through two experiments with two different data sets that are both time-series regression-based problems...... selection algorithms that can be used for many different applications....

  2. A comparison study on feature selection of DNA structural properties for promoter prediction.

    Science.gov (United States)

    Gan, Yanglan; Guan, Jihong; Zhou, Shuigeng

    2012-01-07

    Promoter prediction is an integrant step for understanding gene regulation and annotating genomes. Traditional promoter analysis is mainly based on sequence compositional features. Recently, many kinds of structural features have been employed in promoter prediction. However, considering the high-dimensionality and overfitting problems, it is unfeasible to utilize all available features for promoter prediction. Thus it is necessary to choose some appropriate features for the prediction task. This paper conducts an extensive comparison study on feature selection of DNA structural properties for promoter prediction. Firstly, to examine whether promoters possess some special structures, we carry out a systematical comparison among the profiles of thirteen structural features on promoter and non-promoter sequences. Secondly, we investigate the correlations between these structural features and promoter sequences. Thirdly, both filter and wrapper methods are utilized to select appropriate feature subsets from thirteen different kinds of structural features for promoter prediction, and the predictive power of the selected feature subsets is evaluated. Finally, we compare the prediction performance of the feature subsets selected in this paper with nine existing promoter prediction approaches. Experimental results show that the structural features are differentially correlated to promoters. Specifically, DNA-bending stiffness, DNA denaturation and energy-related features are highly correlated with promoters. The predictive power for promoter sequences differentiates greatly among different structural features. Selecting the relevant features can significantly improve the accuracy of promoter prediction.

  3. Methods for pattern selection, class-specific feature selection and classification for automated learning.

    Science.gov (United States)

    Roy, Asim; Mackin, Patrick D; Mukhopadhyay, Somnath

    2013-05-01

    This paper presents methods for training pattern (prototype) selection, class-specific feature selection and classification for automated learning. For training pattern selection, we propose a method of sampling that extracts a small number of representative training patterns (prototypes) from the dataset. The idea is to extract a set of prototype training patterns that represents each class region in a classification problem. In class-specific feature selection, we try to find a separate feature set for each class such that it is the best one to separate that class from the other classes. We then build a separate classifier for that class based on its own feature set. The paper also presents a new hypersphere classification algorithm. Hypersphere nets are similar to radial basis function (RBF) nets and belong to the group of kernel function nets. Polynomial time complexity of the methods is proven. Polynomial time complexity of learning algorithms is important to the field of neural networks. Computational results are provided for a number of well-known datasets. None of the parameters of the algorithm were fine tuned for any of the problems solved and this supports the idea of automation of learning methods. Automation of learning is crucial to wider deployment of learning technologies. Copyright © 2012 Elsevier Ltd. All rights reserved.

  4. Compact cancer biomarkers discovery using a swarm intelligence feature selection algorithm.

    Science.gov (United States)

    Martinez, Emmanuel; Alvarez, Mario Moises; Trevino, Victor

    2010-08-01

    Biomarker discovery is a typical application of functional genomics. Due to the large number of genes studied simultaneously in microarray data, feature selection is a key step. Swarm intelligence has emerged as a solution for the feature selection problem. However, typical swarm intelligence settings for feature selection fail to select small feature subsets. We have proposed a swarm intelligence feature selection algorithm based on the initialization and update of only a subset of particles in the swarm. In this study, we tested our algorithm on 11 microarray datasets for brain, leukemia, lung, prostate, and other cancers. We show that the proposed swarm intelligence algorithm successfully increases the classification accuracy and decreases the number of selected features compared to other swarm intelligence methods. Copyright © 2010 Elsevier Ltd. All rights reserved.

  5. Review of research in feature based design

    NARCIS (Netherlands)

    Salomons, O.W.; van Houten, Frederikus J.A.M.; Kals, H.J.J.

    1993-01-01

    Research in feature-based design is reviewed. Feature-based design is regarded as a key factor towards CAD/CAPP integration from a process planning point of view. From a design point of view, feature-based design offers possibilities for supporting the design process better than current CAD systems

  6. Feature Selection Using Adaboost for Face Expression Recognition

    National Research Council Canada - National Science Library

    Silapachote, Piyanuch; Karuppiah, Deepak R; Hanson, Allen R

    2005-01-01

    We propose a classification technique for face expression recognition using AdaBoost that learns by selecting the relevant global and local appearance features with the most discriminating information...
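
    A hedged sketch of the underlying principle: boosting over depth-one decision stumps doubles as feature selection, since each round picks a single most discriminative feature; the appearance features below are synthetic stand-ins.

      # Hedged sketch: AdaBoost over decision stumps as an implicit feature selector.
      # (scikit-learn's default AdaBoost base learner is a depth-1 stump.)
      import numpy as np
      from sklearn.ensemble import AdaBoostClassifier

      rng = np.random.default_rng(0)
      X = rng.normal(size=(400, 100))          # 100 candidate appearance features
      y = rng.integers(0, 2, size=400)
      X[:, 7] += y
      X[:, 42] += 0.8 * y                      # a few genuinely useful features

      ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
      used = np.flatnonzero(ada.feature_importances_)
      print("features picked by the boosted stumps:", used)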

  7. Feature selection for domain knowledge representation through multitask learning

    CSIR Research Space (South Africa)

    Rosman, Benjamin S

    2014-10-01

    Full Text Available Representation learning is a difficult and important problem...

  8. Semisupervised feature selection via spline regression for video semantic recognition.

    Science.gov (United States)

    Han, Yahong; Yang, Yi; Yan, Yan; Ma, Zhigang; Sebe, Nicu; Zhou, Xiaofang

    2015-02-01

    To improve both the efficiency and accuracy of video semantic recognition, we can perform feature selection on the extracted video features to select a subset of features from the high-dimensional feature set for a compact and accurate video data representation. Provided the number of labeled videos is small, supervised feature selection could fail to identify the relevant features that are discriminative to target classes. In many applications, abundant unlabeled videos are easily accessible. This motivates us to develop semisupervised feature selection algorithms to better identify the relevant video features, which are discriminative to target classes, by effectively exploiting the information underlying the huge amount of unlabeled video data. In this paper, we propose a framework of video semantic recognition by semisupervised feature selection via spline regression (S²FS²R). Two scatter matrices are combined to capture both the discriminative information and the local geometry structure of labeled and unlabeled training videos: a within-class scatter matrix encoding discriminative information of labeled training videos and a spline scatter output from a local spline regression encoding data distribution. An ℓ2,1-norm is imposed as a regularization term on the transformation matrix to ensure it is sparse in rows, making it particularly suitable for feature selection. To efficiently solve S²FS²R, we develop an iterative algorithm and prove its convergence. In the experiments, three typical tasks of video semantic recognition, namely video concept detection, video classification, and human action recognition, are used to demonstrate that the proposed S²FS²R achieves better performance compared with the state-of-the-art methods.

  9. Unsupervised Feature Selection via Nonnegative Spectral Analysis and Redundancy Control.

    Science.gov (United States)

    Li, Zechao; Tang, Jinhui

    2015-12-01

    In many image processing and pattern recognition problems, visual contents of images are currently described by high-dimensional features, which are often redundant and noisy. Toward this end, we propose a novel unsupervised feature selection scheme, namely, nonnegative spectral analysis with constrained redundancy, by jointly leveraging nonnegative spectral clustering and redundancy analysis. The proposed method can directly identify a discriminative subset of the most useful and redundancy-constrained features. Nonnegative spectral analysis is developed to learn more accurate cluster labels of the input images, during which the feature selection is performed simultaneously. The joint learning of the cluster labels and the feature selection matrix enables the selection of the most discriminative features. Row-wise sparse models with a general ℓ2,p-norm are leveraged to perform feature selection, and the method is evaluated on several image benchmarks, including face data, handwritten digit data, and object image data. The proposed method achieves encouraging experimental results in comparison with several representative algorithms, which demonstrates the effectiveness of the proposed algorithm for unsupervised feature selection.

  10. Relevant test set using feature selection algorithm for early detection ...

    African Journals Online (AJOL)

    The objective of feature selection is to find the most relevant features for classification. Thus, the dimensionality of the information will be reduced and may improve classification's accuracy. This paper proposed a minimum set of relevant questions that can be used for early detection of dyslexia. In this research, we ...

  11. Feature selection using genetic algorithm for breast cancer diagnosis: experiment on three different datasets

    NARCIS (Netherlands)

    Aalaei, Shokoufeh; Shahraki, Hadi; Rowhanimanesh, Alireza; Eslami, Saeid

    2016-01-01

    This study addresses feature selection for breast cancer diagnosis. The process uses a wrapper approach with GA-based feature selection and a PS-classifier. The results of the experiment show that the proposed model is comparable to the other models on the Wisconsin breast cancer datasets. To

  12. Supervised, Unsupervised, and Semi-Supervised Feature Selection: A Review on Gene Selection.

    Science.gov (United States)

    Ang, Jun Chin; Mirzal, Andri; Haron, Habibollah; Hamed, Haza Nuzly Abdull

    2016-01-01

    Recently, feature selection and dimensionality reduction have become fundamental tools for many data mining tasks, especially for processing high-dimensional data such as gene expression microarray data. Gene expression microarray data comprises up to hundreds of thousands of features with relatively small sample size. Because learning algorithms usually do not work well with this kind of data, a challenge to reduce the data dimensionality arises. A huge number of gene selection methods have been applied to select a subset of relevant features for model construction and to seek better cancer classification performance. This paper presents the basic taxonomy of feature selection, and also reviews the state-of-the-art gene selection methods by grouping the literature into three categories: supervised, unsupervised, and semi-supervised. The comparison of experimental results on the top 5 representative gene expression datasets indicates that the classification accuracy of unsupervised and semi-supervised feature selection is competitive with supervised feature selection.

  13. Improving Naive Bayes with Online Feature Selection for Quick Adaptation to Evolving Feature Usefulness

    Energy Technology Data Exchange (ETDEWEB)

    Pon, R K; Cardenas, A F; Buttler, D J

    2007-09-19

    The definition of what makes an article interesting varies from user to user and continually evolves even for a single user. As a result, for news recommendation systems, useless document features cannot be determined a priori and all features are usually considered for interestingness classification. Consequently, the presence of currently useless features degrades classification performance [1], particularly over the initial set of news articles being classified. The initial set of documents is critical for a user when considering which particular news recommendation system to adopt. To address these problems, we introduce an improved version of the naive Bayes classifier with online feature selection. We use correlation to determine the utility of each feature and take advantage of the conditional independence assumption used by naive Bayes for online feature selection and classification. The augmented naive Bayes classifier performs 28% better than the traditional naive Bayes classifier in recommending news articles from the Yahoo! RSS feeds.
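
    A hedged sketch of the general idea, not the authors' exact algorithm: as articles arrive, each feature's utility is re-scored by its correlation with the interestingness label and a naive Bayes model is retrained on the currently useful features only; the binary word features and the 0.1 threshold are illustrative.

      # Hedged sketch: correlation-scored feature utility feeding a naive Bayes
      # classifier as more labeled articles become available.
      import numpy as np
      from sklearn.naive_bayes import BernoulliNB

      rng = np.random.default_rng(0)
      X = rng.integers(0, 2, size=(500, 30)).astype(float)   # binary word features
      y = rng.integers(0, 2, size=500)                       # 1 = interesting
      X[:, 3] = (y ^ (rng.random(500) < 0.1)).astype(float)  # a genuinely useful word

      for seen in range(100, 501, 100):        # simulate the growing article stream
          Xs, ys = X[:seen], y[:seen]
          corr = np.array([abs(np.corrcoef(Xs[:, j], ys)[0, 1])
                           for j in range(X.shape[1])])
          useful = np.flatnonzero(corr > 0.1)  # current estimate of useful words
          model = BernoulliNB().fit(Xs[:, useful], ys)
          print(f"after {seen} articles: {len(useful)} useful features, "
                f"train acc {model.score(Xs[:, useful], ys):.2f}")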

  14. Penerapan Ensemble Feature Selection dan Klasterisasi Fitur pada Klasifikasi Dokumen Teks (Application of Ensemble Feature Selection and Feature Clustering in Text Document Classification)

    Directory of Open Access Journals (Sweden)

    Mediana Aryuni

    2013-06-01

    Full Text Available An ensemble method is an approach where several classifiers are created from the training data, which can often be more accurate than any of the single classifiers, especially if the base classifiers are accurate and different from each other. Meanwhile, feature clustering can reduce the feature space by joining similar words into one cluster. The objective of this research is to develop a text categorization system that employs feature clustering based on ensemble feature selection. The research methodology consists of text document preprocessing, feature subspace generation using genetic algorithm-based iterative refinement, implementation of base classifiers by applying feature clustering, and integration of the classification results of each base classifier using both the static selection and majority voting methods. Experimental results show that the computational time consumed in classifying the dataset into 2 and 3 categories using the feature clustering method is 1.18 and 27.04 seconds faster, respectively, compared to those that do not employ the feature selection method. Also, using the static selection method, the ensemble feature selection method with genetic algorithm-based iterative refinement produces 10% and 10.66% better accuracy than the single classifier in classifying the dataset into 2 and 3 categories, respectively. Meanwhile, using the majority voting method for the same experiment, the same ensemble method produces 10% and 12% better accuracy than the single classifier, respectively.

  15. Graphical matching rules for cardinality based service feature diagrams

    Directory of Open Access Journals (Sweden)

    Faiza Kanwal

    2017-03-01

    Full Text Available To provide efficient services to end-users, managing variability and commonality among the features of a product line is a challenge for industry and researchers. Feature modeling offers strong support for dealing with variability and commonality among the features of a product line. Cardinality-based service feature diagrams extend the basic framework of service feature diagrams by attaching constraints to them, which makes service specifications more flexible; moreover, apart from variation in their selection, third-party services may also need to be customizable. To control variability, cardinality-based service feature diagrams provide high-level visual notations. For specifying variability, however, the use of cardinality-based service feature diagrams raises the problem of matching a required feature diagram against the set of provided diagrams.

  16. Super-selective cryogenic etching for sub-10 nm features

    Science.gov (United States)

    Liu, Zuwei; Wu, Ying; Harteneck, Bruce; Olynick, Deirdre

    2013-01-01

    Plasma etching is a powerful technique for transferring high-resolution lithographic masks into functional materials. Significant challenges arise with shrinking feature sizes, such as etching with thin masks. Traditionally this has been addressed with hard masks and consequently additional costly steps. Here we present a pathway to high selectivity soft mask pattern transfer using cryogenic plasma etching towards low-cost high throughput sub-10 nm nanofabrication. Cryogenic SF6/O2 gas chemistry is studied for high fidelity, high selectivity inductively coupled plasma etching of silicon. Selectivity was maximized on large features (400 nm-1.5 μm) with a focus on minimizing photoresist etch rates. An overall anisotropic profile with selectivity around 140:1 with a photoresist mask for feature size 1.5 μm was realized with this clean, low damage process. At the deep nanoscale, selectivity is reduced by an order of magnitude. Despite these limits, high selectivity is achieved for anisotropic high aspect ratio 10 nm scale etching with thin polymeric masks. Gentler ion bombardment resulted in planar-dependent etching and produced faceted sub-100 nm features.

  17. HUMAN IDENTIFICATION BASED ON EXTRACTED GAIT FEATURES

    OpenAIRE

    Hu Ng; Hau-Lee Ton; Wooi-Haw Tan; Timothy Tzen-Vun Yap; Pei-Fen Chong; Junaidi Abdullah

    2011-01-01

    This paper presents a human identification system based on automatically extracted gait features. The proposed approach consists of three parts: extraction of human gait features from an enhanced human silhouette, a smoothing process on the extracted gait features, and classification by three techniques: fuzzy k-nearest neighbour, linear discriminant analysis and linear support vector machine. The gait features extracted are height, width, crotch height, step-size of the human silhouett...

  18. Feature Extraction Based on Decision Boundaries

    Science.gov (United States)

    Lee, Chulhee; Landgrebe, David A.

    1993-01-01

    In this paper, a novel approach to feature extraction for classification is proposed based directly on the decision boundaries. We note that feature extraction is equivalent to retaining informative features or eliminating redundant features; thus, the terms 'discriminantly informative feature' and 'discriminantly redundant feature' are first defined relative to feature extraction for classification. Next, it is shown how discriminantly redundant features and discriminantly informative features are related to decision boundaries. A novel characteristic of the proposed method arises by noting that usually only a portion of the decision boundary is effective in discriminating between classes, and the concept of the effective decision boundary is therefore introduced. Next, a procedure to extract discriminantly informative features based on a decision boundary is proposed. The proposed feature extraction algorithm has several desirable properties: (1) It predicts the minimum number of features necessary to achieve the same classification accuracy as in the original space for a given pattern recognition problem; and (2) it finds the necessary feature vectors. The proposed algorithm does not deteriorate under the circumstances of equal class means or equal class covariances as some previous algorithms do. Experiments show that the performance of the proposed algorithm compares favorably with those of previous algorithms.

  19. Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach

    Directory of Open Access Journals (Sweden)

    Daniel Peralta

    2015-01-01

    Full Text Available Nowadays, many disciplines have to deal with big datasets that additionally involve a high number of features. Feature selection methods aim at eliminating noisy, redundant, or irrelevant features that may deteriorate the classification performance. However, traditional methods lack enough scalability to cope with datasets of millions of instances and extract successful results in a delimited time. This paper presents a feature selection algorithm based on evolutionary computation that uses the MapReduce paradigm to obtain subsets of features from big datasets. The algorithm decomposes the original dataset in blocks of instances to learn from them in the map phase; then, the reduce phase merges the obtained partial results into a final vector of feature weights, which allows a flexible application of the feature selection procedure using a threshold to determine the selected subset of features. The feature selection method is evaluated by using three well-known classifiers (SVM, Logistic Regression, and Naive Bayes implemented within the Spark framework to address big data problems. In the experiments, datasets up to 67 millions of instances and up to 2000 attributes have been managed, showing that this is a suitable framework to perform evolutionary feature selection, improving both the classification accuracy and its runtime when dealing with big data problems.
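
    A hedged sketch of the map/reduce decomposition only: each map task scores the features on its own block of instances (a simple mutual-information score stands in here for the per-block evolutionary search), and the reduce step merges the partial weights and thresholds them into the selected subset.

      # Hedged sketch: map tasks compute partial feature weights per data block,
      # the reduce step averages them and thresholds the result.
      import numpy as np
      from multiprocessing import Pool
      from sklearn.datasets import make_classification
      from sklearn.feature_selection import mutual_info_classif

      X, y = make_classification(n_samples=8000, n_features=50, n_informative=8,
                                 random_state=0)

      def map_task(block):
          Xb, yb = block
          return mutual_info_classif(Xb, yb, random_state=0)   # partial weights

      if __name__ == "__main__":
          blocks = [(X[i::4], y[i::4]) for i in range(4)]      # split into 4 blocks
          with Pool(4) as pool:
              partial_weights = pool.map(map_task, blocks)     # "map" phase
          weights = np.mean(partial_weights, axis=0)           # "reduce": merge
          selected = np.flatnonzero(weights > 0.01)            # threshold
          print(len(selected), "features selected")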

  20. Feature Selection Methods for Zero-Shot Learning of Neural Activity

    Directory of Open Access Journals (Sweden)

    Carlos A. Caceres

    2017-06-01

    Full Text Available Dimensionality poses a serious challenge when making predictions from human neuroimaging data. Across imaging modalities, large pools of potential neural features (e.g., responses from particular voxels, electrodes, and temporal windows have to be related to typically limited sets of stimuli and samples. In recent years, zero-shot prediction models have been introduced for mapping between neural signals and semantic attributes, which allows for classification of stimulus classes not explicitly included in the training set. While choices about feature selection can have a substantial impact when closed-set accuracy, open-set robustness, and runtime are competing design objectives, no systematic study of feature selection for these models has been reported. Instead, a relatively straightforward feature stability approach has been adopted and successfully applied across models and imaging modalities. To characterize the tradeoffs in feature selection for zero-shot learning, we compared correlation-based stability to several other feature selection techniques on comparable data sets from two distinct imaging modalities: functional Magnetic Resonance Imaging and Electrocorticography. While most of the feature selection methods resulted in similar zero-shot prediction accuracies and spatial/spectral patterns of selected features, there was one exception; A novel feature/attribute correlation approach was able to achieve those accuracies with far fewer features, suggesting the potential for simpler prediction models that yield high zero-shot classification accuracy.

  1. Simultaneous feature selection and classification via Minimax Probability Machine

    Directory of Open Access Journals (Sweden)

    Liming Yang

    2010-12-01

    Full Text Available This paper presents a novel method for simultaneous feature selection and classification by incorporating a robust L1-norm into the objective function of Minimax Probability Machine (MPM. A fractional programming framework is derived by using a bound on the misclassification error involving the mean and covariance of the data. Furthermore, the problems are solved by the Quadratic Interpolation method. Experiments show that our methods can select fewer features to improve the generalization compared to MPM, which illustrates the effectiveness of the proposed algorithms.

  2. Flexible-hybrid sequential floating search in statistical feature selection

    Czech Academy of Sciences Publication Activity Database

    Somol, Petr; Novovičová, Jana; Pudil, Pavel

    2006-01-01

    Roč. 44, č. 4109 (2006), s. 623-639 ISSN 0302-9743. [Joint IAPR International Workshops SSPR 2006 and SPR 2006. Hong Kong , 17.08.2006-19.08.2006] R&D Projects: GA AV ČR IAA2075302; GA MŠk 1M0572 Grant - others:Commision EU(XE) FP6507752 Institutional research plan: CEZ:AV0Z10750506 Keywords : feature selection * flexible-hybrid * sequential floating search Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.402, year: 2005 http://library.utia.cas.cz/separaty/historie/somol-flexible-hybrid sequential floating search in statistical feature selection.pdf

  3. Effect of feature-selective attention on neuronal responses in macaque area MT

    Science.gov (United States)

    Chen, X.; Hoffmann, K.-P.; Albright, T. D.

    2012-01-01

    Attention influences visual processing in striate and extrastriate cortex, which has been extensively studied for spatial-, object-, and feature-based attention. Most studies exploring neural signatures of feature-based attention have trained animals to attend to an object identified by a certain feature and ignore objects/displays identified by a different feature. Little is known about the effects of feature-selective attention, where subjects attend to one stimulus feature domain (e.g., color) of an object while features from different domains (e.g., direction of motion) of the same object are ignored. To study this type of feature-selective attention in area MT in the middle temporal sulcus, we trained macaque monkeys to either attend to and report the direction of motion of a moving sine wave grating (a feature for which MT neurons display strong selectivity) or attend to and report its color (a feature for which MT neurons have very limited selectivity). We hypothesized that neurons would upregulate their firing rate during attend-direction conditions compared with attend-color conditions. We found that feature-selective attention significantly affected 22% of MT neurons. Contrary to our hypothesis, these neurons did not necessarily increase firing rate when animals attended to direction of motion but fell into one of two classes. In one class, attention to color increased the gain of stimulus-induced responses compared with attend-direction conditions. The other class displayed the opposite effects. Feature-selective activity modulations occurred earlier in neurons modulated by attention to color compared with neurons modulated by attention to motion direction. Thus feature-selective attention influences neuronal processing in macaque area MT but often exhibited a mismatch between the preferred stimulus dimension (direction of motion) and the preferred attention dimension (attention to color). PMID:22170961

  4. Histological image classification using biologically interpretable shape-based features

    International Nuclear Information System (INIS)

    Kothari, Sonal; Phan, John H; Young, Andrew N; Wang, May D

    2013-01-01

    Automatic cancer diagnostic systems based on histological image classification are important for improving therapeutic decisions. Previous studies propose textural and morphological features for such systems. These features capture patterns in histological images that are useful for both cancer grading and subtyping. However, because many of these features lack a clear biological interpretation, pathologists may be reluctant to adopt these features for clinical diagnosis. We examine the utility of biologically interpretable shape-based features for classification of histological renal tumor images. Using Fourier shape descriptors, we extract shape-based features that capture the distribution of stain-enhanced cellular and tissue structures in each image and evaluate these features using a multi-class prediction model. We compare the predictive performance of the shape-based diagnostic model to that of traditional models, i.e., using textural, morphological and topological features. The shape-based model, with an average accuracy of 77%, outperforms or complements traditional models. We identify the most informative shapes for each renal tumor subtype from the top-selected features. Results suggest that these shapes are not only accurate diagnostic features, but also correlate with known biological characteristics of renal tumors. Shape-based analysis of histological renal tumor images accurately classifies disease subtypes and reveals biologically insightful discriminatory features. This method for shape-based analysis can be extended to other histological datasets to aid pathologists in diagnostic and therapeutic decisions
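
    As a hedged illustration of the kind of shape summary involved (not the authors' pipeline), the sketch below computes Fourier shape descriptors for a closed contour; a noisy ellipse stands in for a segmented stain-enhanced structure.

      # Hedged sketch: Fourier shape descriptors of a closed contour. A noisy
      # ellipse stands in for a structure segmented from a histological image.
      import numpy as np

      rng = np.random.default_rng(0)
      t = np.linspace(0, 2 * np.pi, 256, endpoint=False)
      x = 3.0 * np.cos(t) + 0.05 * rng.normal(size=256)   # contour x-coordinates
      y = 1.5 * np.sin(t) + 0.05 * rng.normal(size=256)   # contour y-coordinates
      z = x + 1j * y                                      # complex boundary signal

      coeffs = np.fft.fft(z)
      # Drop the DC term (position), normalise by the first harmonic (scale), and
      # keep the magnitudes of a few low-order harmonics as compact descriptors.
      descriptors = np.abs(coeffs[2:10]) / np.abs(coeffs[1])
      print(np.round(descriptors, 3))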

  5. Individual discriminative face recognition models based on subsets of features

    DEFF Research Database (Denmark)

    Clemmensen, Line Katrine Harder; Gomez, David Delgado; Ersbøll, Bjarne Kjær

    2007-01-01

    The accuracy of data classification methods depends considerably on the data representation and on the selected features. In this work, the elastic net model selection is used to identify meaningful and important features in face recognition. Modelling the characteristics which distinguish one person from another using only subsets of features will both decrease the computational cost and increase the generalization capacity of the face recognition algorithm. The fact that the number of features is usually larger than the number of images in the database makes feature selection techniques such as forward selection or lasso regression inadequate. In the experimental section, the performance of the elastic net model is compared with geometrical and color based algorithms widely used in face recognition such as Procrustes nearest neighbor, Eigenfaces, or Fisher...

  6. Boosting color feature selection for color face recognition.

    Science.gov (United States)

    Choi, Jae Young; Ro, Yong Man; Plataniotis, Konstantinos N

    2011-05-01

    This paper introduces the new color face recognition (FR) method that makes effective use of boosting learning as color-component feature selection framework. The proposed boosting color-component feature selection framework is designed for finding the best set of color-component features from various color spaces (or models), aiming to achieve the best FR performance for a given FR task. In addition, to facilitate the complementary effect of the selected color-component features for the purpose of color FR, they are combined using the proposed weighted feature fusion scheme. The effectiveness of our color FR method has been successfully evaluated on the following five public face databases (DBs): CMU-PIE, Color FERET, XM2VTSDB, SCface, and FRGC 2.0. Experimental results show that the results of the proposed method are impressively better than the results of other state-of-the-art color FR methods over different FR challenges including highly uncontrolled illumination, moderate pose variation, and small resolution face images.

  7. A counter-example in linear feature selection theory

    Science.gov (United States)

    Brown, D. R.; Omalley, M. J.

    1976-01-01

    The paper shows that it is possible to construct two k x n matrices, both of which maximize divergence in the transformed space of the linear feature selection problem in multiclass pattern recognition, and which are not row equivalent. Thus, even under extremely strong conditions, it is not possible to assume that all matrix solutions which maximize transformed divergence are row equivalent.

  8. Technical Evaluation Report 27: Educational Wikis: Features and selection criteria

    Directory of Open Access Journals (Sweden)

    Jim Rudolph

    2004-04-01

    Full Text Available This report discusses the educational uses of the ‘wiki,’ an increasingly popular approach to online community development. Wikis are defined and compared with ‘blogging’ methods; characteristics of major wiki engines are described; and wiki features and selection criteria are examined.

  9. Feature extraction and sensor selection for NPP initiating event identification

    International Nuclear Information System (INIS)

    Lin, Ting-Han; Wu, Shun-Chi; Chen, Kuang-You; Chou, Hwai-Pwu

    2017-01-01

    Highlights: • A two-stage feature extraction scheme for NPP initiating event identification. • With stBP, interrelations among the sensors can be retained for identification. • With dSFS, sensors that are crucial for identification can be efficiently selected. • Efficacy of the scheme is illustrated with data from the Maanshan NPP simulator. - Abstract: Initiating event identification is essential in managing nuclear power plant (NPP) severe accidents. In this paper, a novel two-stage feature extraction scheme that incorporates the proposed sensor type-wise block projection (stBP) and deflatable sequential forward selection (dSFS) is used to elicit the discriminant information in the data obtained from various NPP sensors to facilitate event identification. With the stBP, the primal features can be extracted without eliminating the interrelations among the sensors of the same type. The extracted features are then subjected to a further dimensionality reduction by selecting the sensors that are most relevant to the events under consideration. This selection is not easy, and a combinatorial optimization technique is normally required. With the dSFS, an optimal sensor set can be found with less computational load. Moreover, its sensor deflation stage allows sensors in the preselected set to be iteratively refined to avoid being trapped into a local optimum. Results from detailed experiments containing data of 12 event categories and a total of 112 events generated with a Taiwan’s Maanshan NPP simulator are presented to illustrate the efficacy of the proposed scheme.

  10. Novel 3D ultrasound image-based biomarkers based on a feature selection from a 2D standardized vessel wall thickness map: a tool for sensitive assessment of therapies for carotid atherosclerosis

    International Nuclear Information System (INIS)

    Chiu, Bernard; Li Bing; Chow, Tommy W S

    2013-01-01

    With the advent of new therapies and management strategies for carotid atherosclerosis, there is a parallel need for measurement tools or biomarkers to evaluate the efficacy of these new strategies. 3D ultrasound has been shown to provide reproducible measurements of plaque area/volume and vessel wall volume. However, since carotid atherosclerosis is a focal disease that predominantly occurs at bifurcations, biomarkers based on local plaque change may be more sensitive than global volumetric measurements in demonstrating efficacy of new therapies. The ultimate goal of this paper is to develop a biomarker that is based on the local distribution of vessel-wall-plus-plaque thickness change (VWT-Change) that has occurred during the course of a clinical study. To allow comparison between different treatment groups, the VWT-Change distribution of each subject must first be mapped to a standardized domain. In this study, we developed a technique to map the 3D VWT-Change distribution to a 2D standardized template. We then applied a feature selection technique to identify regions on the 2D standardized map on which subjects in different treatment groups exhibit greater difference in VWT-Change. The proposed algorithm was applied to analyse the VWT-Change of 20 subjects in a placebo-controlled study of the effect of atorvastatin (Lipitor). The average VWT-Change for each subject was computed (i) over all points in the 2D map and (ii) over feature points only. For the average computed over all points, 97 subjects per group would be required to detect an effect size of 25% that of atorvastatin in a six-month study. The sample size is reduced to 25 subjects if the average were computed over feature points only. The introduction of this sensitive quantification technique for carotid atherosclerosis progression/regression would allow many proof-of-principle studies to be performed before a more costly and longer study involving a larger population is held to confirm the treatment

  11. Feature selection for high-dimensional integrated data

    KAUST Repository

    Zheng, Charles

    2012-04-26

    Motivated by the problem of identifying correlations between genes or features of two related biological systems, we propose a model of feature selection in which only a subset of the predictors Xt are dependent on the multidimensional variate Y, and the remainder of the predictors constitute a “noise set” Xu independent of Y. Using Monte Carlo simulations, we investigated the relative performance of two methods: thresholding and singular-value decomposition, in combination with stochastic optimization to determine “empirical bounds” on the small-sample accuracy of an asymptotic approximation. We demonstrate the utility of the thresholding and SVD feature selection methods with respect to a recent infant intestinal gene expression and metagenomics dataset.

  12. Degree of contribution (DoC) feature selection algorithm for structural brain MRI volumetric features in depression detection.

    Science.gov (United States)

    Kipli, Kuryati; Kouzani, Abbas Z

    2015-07-01

    Accurate detection of depression at an individual level using structural magnetic resonance imaging (sMRI) remains a challenge. Brain volumetric changes at a structural level appear to have importance in depression biomarkers studies. An automated algorithm is developed to select brain sMRI volumetric features for the detection of depression. A feature selection (FS) algorithm called degree of contribution (DoC) is developed for selection of sMRI volumetric features. This algorithm uses an ensemble approach to determine the degree of contribution in detection of major depressive disorder. The DoC is the score of feature importance used for feature ranking. The algorithm involves four stages: feature ranking, subset generation, subset evaluation, and DoC analysis. The performance of DoC is evaluated on the Duke University Multi-site Imaging Research in the Analysis of Depression sMRI dataset. The dataset consists of 115 brain sMRI scans of 88 healthy controls and 27 depressed subjects. Forty-four sMRI volumetric features are used in the evaluation. The DoC score of forty-four features was determined as the accuracy threshold (Acc_Thresh) was varied. The DoC performance was compared with that of four existing FS algorithms. At all defined Acc_Threshs, DoC outperformed the four examined FS algorithms for the average classification score and the maximum classification score. DoC has a good ability to generate reduced-size subsets of important features that could yield high classification accuracy. Based on the DoC score, the most discriminant volumetric features are those from the left-brain region.

  13. Joint embedding learning and sparse regression: a framework for unsupervised feature selection.

    Science.gov (United States)

    Hou, Chenping; Nie, Feiping; Li, Xuelong; Yi, Dongyun; Wu, Yi

    2014-06-01

    Feature selection has attracted considerable research interest during the last few decades. Traditional learning-based feature selection methods separate embedding learning and feature ranking. In this paper, we propose a novel unsupervised feature selection framework, termed joint embedding learning and sparse regression (JELSR), in which the embedding learning and sparse regression are jointly performed. Specifically, the proposed JELSR joins embedding learning with sparse regression to perform feature selection. To show the effectiveness of the proposed framework, we also provide a method that constructs the weights via local linear approximation and adds l2,1-norm regularization, and design an effective algorithm to solve the corresponding optimization problem. Furthermore, we also conduct some insightful discussion on the proposed feature selection approach, including the convergence analysis, computational complexity, and parameter determination. In all, the proposed framework not only provides a new perspective on traditional methods but also motivates further research on feature selection. Compared with traditional unsupervised feature selection methods, our approach could integrate the merits of embedding learning and sparse regression. Promising experimental results on different kinds of data sets, including image, voice, and biological data, have validated the effectiveness of our proposed algorithm.
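
    For reference, a commonly cited form of the JELSR objective, reconstructed from the description above (the notation and exact constraints in the paper may differ), couples a graph-embedding term with an l2,1-regularized regression:

        \min_{\mathbf{W},\,\mathbf{Y}} \;
            \operatorname{tr}\!\left(\mathbf{Y}^{\top}\mathbf{L}\mathbf{Y}\right)
            + \beta \left( \left\| \mathbf{X}^{\top}\mathbf{W} - \mathbf{Y} \right\|_{F}^{2}
            + \alpha \left\| \mathbf{W} \right\|_{2,1} \right)
        \quad \text{s.t.} \quad \mathbf{Y}^{\top}\mathbf{Y} = \mathbf{I},

    where L is a graph Laplacian built from the local linear approximation weights, Y is the low-dimensional embedding, and features are ranked by the row norms of W, whose l2,1 penalty drives whole rows to zero.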

  14. A Critical Assessment of Feature Selection Methods for Biomarker Discovery in Clinical Proteomics

    NARCIS (Netherlands)

    Christin, Christin; Hoefsloot, Huub C. J.; Smilde, Age K.; Hoekman, B.; Suits, Frank; Bischoff, Rainer; Horvatovich, Peter

    In this paper, we compare the performance of six different feature selection methods for LC-MS-based proteomics and metabolomics biomarker discovery: the t test, the Mann-Whitney-Wilcoxon test (mww test), nearest shrunken centroid (NSC), linear support vector machine-recursive feature elimination

  15. iFeature: a python package and web server for features extraction and selection from protein and peptide sequences.

    Science.gov (United States)

    Chen, Zhen; Zhao, Pei; Li, Fuyi; Leier, André; Marquez-Lago, Tatiana T; Wang, Yanan; Webb, Geoffrey I; Smith, A Ian; Daly, Roger J; Chou, Kuo-Chen; Song, Jiangning

    2018-03-08

    Structural and physiochemical descriptors extracted from sequence data have been widely used to represent sequences and predict structural, functional, expression and interaction profiles of proteins and peptides as well as DNAs/RNAs. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences. iFeature is capable of calculating and extracting a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors. It also allows users to extract specific amino acid properties from the AAindex database. Furthermore, iFeature integrates 12 different types of commonly used feature clustering, selection, and dimensionality reduction algorithms, greatly facilitating training, analysis, and benchmarking of machine-learning models. The functionality of iFeature is made freely available via an online web server and a stand-alone toolkit. http://iFeature.erc.monash.edu/; https://github.com/Superzchen/iFeature/. jiangning.song@monash.edu; kcchou@gordonlifescience.org; roger.daly@monash.edu. Supplementary data are available at Bioinformatics online.

  16. Economic indicators selection for crime rates forecasting using cooperative feature selection

    Science.gov (United States)

    Alwee, Razana; Shamsuddin, Siti Mariyam Hj; Salleh Sallehuddin, Roselina

    2013-04-01

    Feature selection in multivariate forecasting models is very important to ensure that the model is accurate. The purpose of this study is to apply the Cooperative Feature Selection method for feature selection. The features are economic indicators that will be used in a crime rate forecasting model. The Cooperative Feature Selection combines grey relational analysis and artificial neural network to establish a cooperative model that can rank and select the significant economic indicators. Grey relational analysis is used to select the best data series to represent each economic indicator and is also used to rank the economic indicators according to their importance to the crime rate. After that, the artificial neural network is used to select the significant economic indicators for forecasting the crime rates. In this study, we used the economic indicators of unemployment rate, consumer price index, gross domestic product and consumer sentiment index, as well as property crime and violent crime rates for the United States. A Levenberg-Marquardt neural network is used in this study. From our experiments, we found that consumer price index is an important economic indicator that has a significant influence on the violent crime rate. For the property crime rate, gross domestic product, unemployment rate and consumer price index are the influential economic indicators. The Cooperative Feature Selection is also found to produce smaller errors as compared to Multiple Linear Regression in forecasting property and violent crime rates.
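
    The grey relational analysis step can be illustrated with the usual textbook formulation of the grey relational coefficient and grade; the distinguishing coefficient zeta = 0.5 is the conventional default and not necessarily the authors' exact parameterisation.

        # Grey relational grade of each candidate series against a reference
        # series (e.g. the crime-rate series); generic formulation.
        import numpy as np

        def grey_relational_grade(reference, candidates, zeta=0.5):
            ref = (reference - reference.min()) / (reference.max() - reference.min())
            grades = []
            for x in candidates:
                xn = (x - x.min()) / (x.max() - x.min())        # normalise to [0, 1]
                delta = np.abs(ref - xn)                        # point-wise deviation
                coeff = (delta.min() + zeta * delta.max()) / (delta + zeta * delta.max())
                grades.append(coeff.mean())                     # grade = mean coefficient
            return np.array(grades)

        # Indicators can then be ranked by their grade and the top-ranked ones
        # passed on to the neural network stage.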

  17. Prediction of periventricular leukomalacia. Part II: Selection of hemodynamic features using computational intelligence.

    Science.gov (United States)

    Samanta, Biswanath; Bird, Geoffrey L; Kuijpers, Marijn; Zimmerman, Robert A; Jarvik, Gail P; Wernovsky, Gil; Clancy, Robert R; Licht, Daniel J; Gaynor, J William; Nataraj, Chandrasekhar

    2009-07-01

    The objective of Part II is to analyze the dataset of extracted hemodynamic features (Case 3 of Part I) through computational intelligence (CI) techniques for identification of potential prognostic factors for periventricular leukomalacia (PVL) occurrence in neonates with congenital heart disease. The extracted features (Case 3 dataset of Part I) were used as inputs to CI based classifiers, namely, multi-layer perceptron (MLP) and probabilistic neural network (PNN) in combination with genetic algorithms (GA) for selection of the most suitable features predicting the occurrence of PVL. The selected features were next used as inputs to a decision tree (DT) algorithm for generating easily interpretable rules of PVL prediction. Prediction performance for the two CI based classifiers, MLP and PNN coupled with GA, is presented for different numbers of selected features. The best prediction performances were achieved with 6 and 7 selected features. The prediction success was 100% in training and the best ranges of sensitivity (SN), specificity (SP) and accuracy (AC) in test were 60-73%, 74-84% and 71-74%, respectively. The identified features, when used with the DT algorithm, gave best SN, SP and AC in the ranges of 87-90% in training and 80-87%, 74-79% and 79-82% in test. Among the variables selected in CI, systolic and diastolic blood pressures, and pCO2 figured prominently, similar to Part I. Decision tree based rules for prediction of PVL occurrence were obtained using the CI selected features. The proposed approach combines the generalization capability of the CI based feature selection approach with the generation of easily interpretable classification rules of the decision tree. The combination of CI techniques with DT gave substantially better test prediction performance than using CI and DT separately.
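
    The GA-plus-classifier wrapper described above can be sketched generically as follows; this is a simplified illustration with a scikit-learn MLP as the stand-in classifier and cross-validated accuracy as the fitness, not the authors' MLP/PNN implementation or GA settings.

        # Simplified GA wrapper for feature selection: binary masks are evolved,
        # each scored by the cross-validated accuracy of a classifier trained on
        # the selected columns. Illustrative only.
        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.neural_network import MLPClassifier

        rng = np.random.default_rng(0)

        def fitness(mask, X, y):
            if not mask.any():
                return 0.0
            clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
            return cross_val_score(clf, X[:, mask], y, cv=5).mean()

        def ga_select(X, y, pop_size=20, generations=30, p_mut=0.05):
            n = X.shape[1]
            pop = rng.random((pop_size, n)) < 0.5             # random initial masks
            for _ in range(generations):
                scores = np.array([fitness(m, X, y) for m in pop])
                order = np.argsort(scores)[::-1]
                parents = pop[order[: pop_size // 2]]         # truncation selection
                children = []
                while len(children) < pop_size - len(parents):
                    a, b = parents[rng.integers(len(parents), size=2)]
                    cut = rng.integers(1, n)                  # one-point crossover
                    child = np.concatenate([a[:cut], b[cut:]])
                    child ^= rng.random(n) < p_mut            # bit-flip mutation
                    children.append(child)
                pop = np.vstack([parents, children])
            scores = np.array([fitness(m, X, y) for m in pop])
            return pop[scores.argmax()]                       # best feature mask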

  18. Statistical feature extraction based iris recognition system

    Indian Academy of Sciences (India)

    Atul Bansal

    Abstract. Iris recognition systems have been proposed by numerous researchers using different feature extraction techniques for accurate and reliable biometric authentication. In this paper, a statistical feature extraction technique based on correlation between adjacent pixels has been proposed and implemented. Ham-.

  19. Separable mechanisms underlying global feature-based attention.

    Science.gov (United States)

    Bondarenko, Rowena; Boehler, Carsten N; Stoppel, Christian M; Heinze, Hans-Jochen; Schoenfeld, Mircea A; Hopf, Jens-Max

    2012-10-31

    Feature-based attention is known to operate in a spatially global manner, in that the selection of attended features is not bound to the spatial focus of attention. Here we used electromagnetic recordings in human observers to characterize the spatiotemporal signature of such global selection of an orientation feature. Observers performed a simple orientation-discrimination task while ignoring task-irrelevant orientation probes outside the focus of attention. We observed that global feature-based selection, indexed by the brain response to unattended orientation probes, is composed of separable functional components. One such component reflects global selection based on the similarity of the probe with task-relevant orientation values ("template matching"), which is followed by a component reflecting selection based on the similarity of the probe with the orientation value under discrimination in the focus of attention ("discrimination matching"). Importantly, template matching occurs at ∼150 ms after stimulus onset, ∼80 ms before the onset of discrimination matching. Moreover, source activity underlying template matching and discrimination matching was found to originate from ventral extrastriate cortex, with the former being generated in more anterolateral and the latter in more posteromedial parts, suggesting template matching to occur in visual cortex higher up in the visual processing hierarchy than discrimination matching. We take these observations to indicate that the population-level signature of global feature-based selection reflects a sequence of hierarchically ordered operations in extrastriate visual cortex, in which the selection based on task relevance has temporal priority over the selection based on the sensory similarity between input representations.

  20. Semantics of cardinality-based service feature diagrams based on linear logic

    Directory of Open Access Journals (Sweden)

    Ghulam Mustafa Assad

    2015-12-01

    Full Text Available To provide efficient services to end-users, it is essential to manage variability among services. Feature modeling is an important approach to manage variability and commonalities of a system in a product line. Feature models are composed of feature diagrams. Service feature diagrams (an extended form of feature diagrams) introduced some new notations to classical feature diagrams. Service feature diagrams provide selection rights for variable features. In our previous work, we introduced cardinalities for the selection of features from a service feature diagram, which we call cardinality-based service feature diagrams (CSFDs). In this paper, we provide semantics for CSFDs. These semantics are backed by the formal calculus of Linear Logic. We provide rules to interpret CSFDs as linear logic formulas. Our results show that the linear logic formulas of CSFDs give the same results as expected from the CSFDs.

  1. Fast Branch & Bound algorithms for optimal feature selection

    Czech Academy of Sciences Publication Activity Database

    Somol, Petr; Pudil, Pavel; Kittler, J.

    2004-01-01

    Roč. 26, č. 7 (2004), s. 900-912 ISSN 0162-8828 R&D Projects: GA ČR GA402/02/1271; GA ČR GA402/03/1310; GA AV ČR KSK1019101 Institutional research plan: CEZ:AV0Z1075907 Keywords : subset search * feature selection * search tree Subject RIV: BD - Theory of Information Impact factor: 4.352, year: 2004

  2. A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data

    KAUST Repository

    Abusamra, Heba

    2013-05-01

    Microarray technology has enriched the study of gene expression in such a way that scientists are now able to measure the expression levels of thousands of genes in a single experiment. Microarray gene expression data gained great importance in recent years due to its role in disease diagnoses and prognoses which help to choose the appropriate treatment plan for patients. This technology has ushered in a new era in molecular classification; however, interpreting gene expression data remains a difficult problem and an active research area due to its native “high dimensional, low sample size” nature. Such problems pose great challenges to existing classification methods. Thus, effective feature selection techniques are often needed in this case to aid in correctly classifying different tumor types and consequently lead to a better understanding of genetic signatures as well as improve treatment strategies. This thesis presents a comparative study of state-of-the-art feature selection methods, classification methods, and the combination of them, based on gene expression data. We compared the efficiency of three different classification methods including: support vector machines, k-nearest neighbor and random forest, and eight different feature selection methods, including: information gain, twoing rule, sum minority, max minority, gini index, sum of variances, t-statistics, and one-dimension support vector machine. Five-fold cross validation was used to evaluate the classification performance. Two publicly available gene expression data sets of glioma were used for this study. Different experiments have been applied to compare the performance of the classification methods with and without performing feature selection. Results revealed the important role of feature selection in classifying gene expression data. By performing feature selection, the classification accuracy can be significantly boosted by using a small number of genes. The relationship of features selected in

  3. A feature selection approach towards progressive vector transmission over the Internet

    Science.gov (United States)

    Miao, Ru; Song, Jia; Feng, Min

    2017-09-01

    WebGIS has been widely applied for visualizing and sharing geospatial information over the Internet. In order to improve the efficiency of client applications, a web-based progressive vector transmission approach is proposed. Important features should be selected and transferred first, and methods for measuring the importance of features should be further considered in progressive transmission. However, studies on progressive transmission for large-volume vector data have mostly focused on map generalization in the field of cartography, and have rarely discussed the quantitative selection of geographic features. This paper applies information theory to measure the feature importance of vector maps. A measurement model for the amount of information of vector features is defined to deal with feature selection issues. The measurement model involves a geometry factor, a spatial distribution factor and a thematic attribute factor. Moreover, a real-time transport protocol (RTP)-based progressive transmission method is then presented to improve the transmission of vector data. To clearly demonstrate the essential methodology and key techniques, a prototype for web-based progressive vector transmission is presented, and an experiment on progressive selection and transmission of vector features is conducted. The experimental results indicate that our approach clearly improves the performance and end-user experience of delivering and manipulating large vector data over the Internet.

  4. Novel feature selection method based on Stochastic Methods Coupled to Support Vector Machines using H- NMR data (data of olive and hazelnut oils

    Directory of Open Access Journals (Sweden)

    Oscar Eduardo Gualdron

    2014-12-01

    Full Text Available One of the principal difficulties in analysis and information processing is the representation of the dataset. Normally, one encounters a high number of samples, each one with thousands of variables, and in many cases with irrelevant information and noise. Therefore, in order to represent findings in a clearer way, it is necessary to reduce the number of variables. In this paper, a novel variable selection technique for multivariable data analysis, inspired by stochastic methods and designed to work with support vector machines (SVM), is described. The approach is demonstrated in a food application involving the detection of adulteration of olive oil (more expensive) with hazelnut oil (cheaper). Fingerprinting by H NMR spectroscopy was used to analyze the different samples. Results show that it is possible to reduce the number of variables without affecting classification results.

  5. Selection of LiDAR geometric features with adaptive neighborhood size for urban land cover classification

    Science.gov (United States)

    Dong, Weihua; Lan, Jianhang; Liang, Shunlin; Yao, Wei; Zhan, Zhicheng

    2017-08-01

    LiDAR has been an effective technology for acquiring urban land cover data in recent decades. Previous studies indicate that geometric features have a strong impact on land cover classification. Here, we analyzed an urban LiDAR dataset to explore the optimal feature subset from 25 geometric features incorporating 25 scales under 6 definitions for urban land cover classification. We performed a feature selection strategy to remove irrelevant or redundant features, based on the correlation coefficient between features and the classification accuracy of each feature. The neighborhood scales were divided into small (0.5-1.5 m), medium (1.5-6 m) and large (>6 m) scales. Combining features with lower correlation coefficients and better classification performance would improve classification accuracy. Features depicting the homogeneity or heterogeneity of points should be calculated at a small scale, features that smooth points at a medium scale, and features of height difference at a large scale. As to the neighborhood definition, cuboid and cylinder were recommended. This study can guide the selection of optimal geometric features with adaptive neighborhood scale for urban land cover classification.

  6. A hybrid feature selection method using multiclass SVM for diagnosis of erythemato-squamous disease

    Science.gov (United States)

    Maryam, Setiawan, Noor Akhmad; Wahyunggoro, Oyas

    2017-08-01

    The diagnosis of erythemato-squamous disease is a complex problem, and the disease is difficult to detect in dermatology. Besides that, it is a major cause of skin cancer. Data mining implementation in the medical field helps experts to diagnose precisely, accurately, and inexpensively. In this research, we use data mining techniques to develop a diagnosis model based on multiclass SVM with a novel hybrid feature selection method to diagnose erythemato-squamous disease. Our hybrid feature selection method, named ChiGA (Chi Square and Genetic Algorithm), uses the advantages of filter and wrapper methods to select the optimal feature subset from the original features. Chi square is used as the filter method to remove redundant features and GA as the wrapper method to select the ideal feature subset, with SVM used as the classifier. Experiments were performed with 10-fold cross validation on the erythemato-squamous disease dataset taken from the University of California Irvine (UCI) machine learning database. The experimental results show that the proposed model based on multiclass SVM with Chi Square and GA can give an optimum feature subset. There are 18 optimum features with 99.18% accuracy.

  7. Multi-Stage Recognition of Speech Emotion Using Sequential Forward Feature Selection

    Directory of Open Access Journals (Sweden)

    Liogienė Tatjana

    2016-07-01

    Full Text Available The intensive research of speech emotion recognition introduced a huge collection of speech emotion features. Large feature sets complicate the speech emotion recognition task. Among various feature selection and transformation techniques for one-stage classification, multiple classifier systems were proposed. The main idea of multiple classifiers is to arrange the emotion classification process in stages. Besides parallel and serial cases, the hierarchical arrangement of multi-stage classification is most widely used for speech emotion recognition. In this paper, we present a sequential-forward-feature-selection-based multi-stage classification scheme. The Sequential Forward Selection (SFS) and Sequential Floating Forward Selection (SFFS) techniques were employed for every stage of the multi-stage classification scheme. Experimental testing of the proposed scheme was performed using the German and Lithuanian emotional speech datasets. Sequential-feature-selection-based multi-stage classification outperformed the single-stage scheme by 12–42% for different emotion sets. The multi-stage scheme has shown higher robustness to the growth of the emotion set. The decrease in recognition rate with the increase in emotion set for the multi-stage scheme was lower by 10–20% in comparison with the single-stage case. Differences between SFS and SFFS employment for feature selection were negligible.
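
    For reference, a generic sequential forward selection loop of the kind employed above looks as follows; this is illustrative only, with cross-validated accuracy of a stand-in linear SVM as the criterion rather than the emotion-recognition setup of the paper. SFFS additionally allows conditional backward steps, removing a previously selected feature when doing so improves the criterion.

        # Generic Sequential Forward Selection (SFS): greedily add the feature
        # that most improves cross-validated accuracy until nothing helps.
        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        def sfs(X, y, max_features=None):
            selected, remaining = [], list(range(X.shape[1]))
            best_score = 0.0
            while remaining and (max_features is None or len(selected) < max_features):
                scores = []
                for j in remaining:
                    cols = selected + [j]
                    clf = SVC(kernel="linear")
                    scores.append(cross_val_score(clf, X[:, cols], y, cv=5).mean())
                best_idx = int(np.argmax(scores))
                if scores[best_idx] <= best_score:   # stop when no candidate improves
                    break
                best_score = scores[best_idx]
                j_best = remaining[best_idx]
                selected.append(j_best)
                remaining.remove(j_best)
            return selected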

  8. Survival Prediction and Feature Selection in Patients with Breast Cancer Using Support Vector Regression

    Directory of Open Access Journals (Sweden)

    Shahrbanoo Goli

    2016-01-01

    Full Text Available The Support Vector Regression (SVR) model has been broadly used for response prediction. However, few researchers have used SVR for survival analysis. In this study, a new SVR model is proposed, and SVR with different kernels and the traditional Cox model are trained. The models are compared based on different performance measures. We also select the best subset of features using three feature selection methods: a combination of SVR and statistical tests, univariate feature selection based on the concordance index, and recursive feature elimination. The evaluations are performed using available medical datasets and also a Breast Cancer (BC) dataset consisting of 573 patients who visited the Oncology Clinic of Hamadan province in Iran. Results show that, for the BC dataset, survival time can be predicted more accurately by linear SVR than nonlinear SVR. Based on the three feature selection methods, metastasis status, progesterone receptor status, and human epidermal growth factor receptor 2 status are the best features associated with survival. Also, according to the obtained results, the performance of linear and nonlinear kernels is comparable. The proposed SVR model performs similarly to or slightly better than other models. Also, SVR performs similarly to or better than Cox when all features are included in the model.

  9. Comparing of athletic performance and biometric features of selected teenagers based on the specific talent identification pattern of Karate with elite athletes

    Directory of Open Access Journals (Sweden)

    seyed Ehsan Naghibi

    2017-12-01

    Conclusion: Given that there was no significant difference between the two groups in athletic performance and biometric parameters, except for one variable, we can conclude that the talent identification process in the club studied is potentially effective in distinguishing talented individuals from others. Also, based on the native data used in the analysis of the tests in this process, patterns of biometric indices based on talent and sport performance in karate can be developed.

  10. Comparison of feature selection and classification for MALDI-MS data

    Directory of Open Access Journals (Sweden)

    Yang Mary

    2009-07-01

    Full Text Available Abstract Introduction In the classification of Mass Spectrometry (MS) proteomics data, peak detection, feature selection, and learning classifiers are critical to classification accuracy. To better understand which methods are more accurate when classifying data, some publicly available peak detection algorithms for Matrix assisted Laser Desorption Ionization Mass Spectrometry (MALDI-MS) data were recently compared; however, the issue of different feature selection methods and different classification models as they relate to classification performance has not been addressed. With the application of intelligent computing, much progress has been made in the development of feature selection methods and learning classifiers for the analysis of high-throughput biological data. The main objective of this paper is to compare the methods of feature selection and different learning classifiers when applied to MALDI-MS data and to provide a subsequent reference for the analysis of MS proteomics data. Results We compared a well-known method of feature selection, Support Vector Machine Recursive Feature Elimination (SVMRFE), and a recently developed method, Gradient based Leave-one-out Gene Selection (GLGS), that effectively performs microarray data analysis. We also compared several learning classifiers including K-Nearest Neighbor Classifier (KNNC), Naïve Bayes Classifier (NBC), Nearest Mean Scaled Classifier (NMSC), the uncorrelated normal based quadratic Bayes Classifier recorded as UDC, Support Vector Machines, and a distance metric learning for Large Margin Nearest Neighbor classifier (LMNN) based on Mahalanobis distance. To compare, we conducted a comprehensive experimental study using three types of MALDI-MS data. Conclusion Regarding feature selection, SVMRFE outperformed GLGS in classification. As for the learning classifiers, when classification models derived from the best training were compared, SVMs performed the best with respect to the expected testing

  11. Audiovisual laughter detection based on temporal features

    NARCIS (Netherlands)

    Petridis, Stavros; Nijholt, Antinus; Nijholt, A.; Pantic, M.; Pantic, Maja; Poel, Mannes; Poel, M.; Hondorp, G.H.W.

    2008-01-01

    Previous research on automatic laughter detection has mainly been focused on audio-based detection. In this study we present an audiovisual approach to distinguishing laughter from speech based on temporal features and we show that the integration of audio and visual information leads to improved

  12. Feature Selection Combined with Neural Network Structure Optimization for HIV-1 Protease Cleavage Site Prediction

    Directory of Open Access Journals (Sweden)

    Hui Liu

    2015-01-01

    Full Text Available It is crucial to understand the specificity of HIV-1 protease for designing HIV-1 protease inhibitors. In this paper, a new feature selection method combined with neural network structure optimization is proposed to analyze the specificity of HIV-1 protease and find the important positions in an octapeptide that determine its cleavability. Two kinds of newly proposed features based on the Amino Acid Index database, plus traditional orthogonal encoding features, are used in this paper, taking both physicochemical and sequence information into consideration. Results of feature selection show that p2, p1, p1′, and p2′ are the most important positions. Two feature fusion methods are used in this paper: combination fusion and decision fusion, aiming to obtain a comprehensive feature representation and improve prediction performance. Decision fusion of the subsets obtained after feature selection achieves excellent prediction performance, which shows that feature selection combined with decision fusion is an effective and useful method for the task of HIV-1 protease cleavage site prediction. The results and analysis in this paper can provide useful guidance and help in designing HIV-1 protease inhibitors in the future.
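
    The traditional orthogonal encoding mentioned above maps each residue of the octapeptide to a 20-dimensional indicator vector; below is a minimal sketch of that standard encoding (independent of the AAindex-based features also used in the paper).

        # Orthogonal (one-hot) encoding of an octapeptide: each of the 8 positions
        # (p4..p1, p1'..p4') becomes a 20-dimensional indicator vector.
        import numpy as np

        AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

        def orthogonal_encode(octapeptide):
            assert len(octapeptide) == 8, "HIV-1 protease substrates are octapeptides"
            code = np.zeros((8, 20))
            for pos, residue in enumerate(octapeptide.upper()):
                code[pos, AMINO_ACIDS.index(residue)] = 1.0
            return code.ravel()              # 160-dimensional feature vector

        x = orthogonal_encode("SQNYPIVQ")    # example octapeptide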

  13. Automatic migraine classification via feature selection committee and machine learning techniques over imaging and questionnaire data.

    Science.gov (United States)

    Garcia-Chimeno, Yolanda; Garcia-Zapirain, Begonya; Gomez-Beldarrain, Marian; Fernandez-Ruanova, Begonya; Garcia-Monco, Juan Carlos

    2017-04-13

    Feature selection methods are commonly used to identify subsets of relevant features to facilitate the construction of models for classification, yet little is known about how feature selection methods perform in diffusion tensor images (DTIs). In this study, feature selection and machine learning classification methods were tested for the purpose of automating the diagnosis of migraine using both DTIs and questionnaire answers related to emotion and cognition, factors that influence pain perception. We selected 52 adult subjects for the study, divided into three groups: a control group (15), subjects with sporadic migraine (19) and subjects with chronic migraine and medication overuse (18). These subjects underwent magnetic resonance imaging with diffusion tensor sequences to assess the white matter pathway integrity of the regions of interest involved in pain and emotion. The tests also gather data about pathology. The DTI images and test results were then introduced into feature selection algorithms (Gradient Tree Boosting, L1-based, Random Forest and Univariate) to reduce the features of the first dataset, and into classification algorithms (SVM (Support Vector Machine), Boosting (AdaBoost) and Naive Bayes) to classify the migraine group. Moreover, we implemented a committee method based on the feature selection algorithms to improve classification accuracy. When classifying the migraine group, the greatest improvements in accuracy were made using the proposed committee-based feature selection method. Using this approach, the accuracy of classification into three types improved from 67 to 93% when using the Naive Bayes classifier, from 90 to 95% with the support vector machine classifier, and from 93 to 94% with boosting. The features determined to be most useful for classification are related to pain, analgesics and the left uncinate region of the brain (connected with pain and emotions). The proposed feature selection committee method improved the performance of migraine diagnosis
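
    A committee of feature selectors of the kind described above can be approximated by letting several scikit-learn selectors vote on each feature; the sketch below assumes simple majority voting with placeholder estimators and thresholds, not the study's configuration.

        # Committee-style feature selection: several selectors vote, and features
        # chosen by a majority of them are retained. Placeholder settings.
        import numpy as np
        from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
        from sklearn.feature_selection import SelectFromModel, SelectKBest, f_classif
        from sklearn.linear_model import LogisticRegression

        def committee_select(X, y, k_univariate=20):
            selectors = [
                SelectFromModel(GradientBoostingClassifier(random_state=0)),
                SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear")),
                SelectFromModel(RandomForestClassifier(random_state=0)),
                SelectKBest(f_classif, k=min(k_univariate, X.shape[1])),
            ]
            votes = np.zeros(X.shape[1], dtype=int)
            for sel in selectors:
                votes += sel.fit(X, y).get_support().astype(int)
            return np.where(votes >= len(selectors) // 2 + 1)[0]   # majority vote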

  14. A Comparative Study of Feature Selection and Classification Methods for Gene Expression Data of Glioma

    KAUST Repository

    Abusamra, Heba

    2013-11-01

    Microarray gene expression data gained great importance in recent years due to its role in disease diagnoses and prognoses which help to choose the appropriate treatment plan for patients. This technology has ushered in a new era in molecular classification. Interpreting gene expression data remains a difficult problem and an active research area due to its native “high dimensional, low sample size” nature. Such problems pose great challenges to existing classification methods. Thus, effective feature selection techniques are often needed in this case to aid in correctly classifying different tumor types and consequently lead to a better understanding of genetic signatures as well as improve treatment strategies. This paper presents a comparative study of state-of-the-art feature selection methods, classification methods, and the combination of them, based on gene expression data. We compared the efficiency of three different classification methods including: support vector machines, k-nearest neighbor and random forest, and eight different feature selection methods, including: information gain, twoing rule, sum minority, max minority, gini index, sum of variances, t-statistics, and one-dimension support vector machine. Five-fold cross validation was used to evaluate the classification performance. Two publicly available gene expression data sets of glioma were used in the experiments. Results revealed the important role of feature selection in classifying gene expression data. By performing feature selection, the classification accuracy can be significantly boosted by using a small number of genes. The relationship of features selected by different feature selection methods is investigated and the most frequent features selected in each fold among all methods for both datasets are evaluated.
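
    The kind of comparison carried out above (feature selection method crossed with classifier, evaluated by five-fold cross validation) can be organised with scikit-learn pipelines; the sketch below uses stand-in selectors and classifiers rather than the eight ranking criteria of the paper, and nests selection inside the pipeline so it is refit within each fold to avoid selection bias.

        # Grid of (feature selector, classifier) pairs evaluated by 5-fold CV.
        # Stand-in methods only; the paper compares eight rankers and three classifiers.
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
        from sklearn.model_selection import StratifiedKFold, cross_val_score
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.pipeline import Pipeline
        from sklearn.svm import SVC

        def compare(X, y, n_genes=50):
            selectors = {"f_test": f_classif, "mutual_info": mutual_info_classif}
            classifiers = {"svm": SVC(), "knn": KNeighborsClassifier(),
                           "rf": RandomForestClassifier(random_state=0)}
            cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
            results = {}
            for s_name, score_fn in selectors.items():
                for c_name, clf in classifiers.items():
                    pipe = Pipeline([("select", SelectKBest(score_fn, k=n_genes)),
                                     ("clf", clf)])   # selection refit in each fold
                    results[(s_name, c_name)] = cross_val_score(pipe, X, y, cv=cv).mean()
            return results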

  15. A HYBRID FILTER AND WRAPPER FEATURE SELECTION APPROACH FOR DETECTING CONTAMINATION IN DRINKING WATER MANAGEMENT SYSTEM

    Directory of Open Access Journals (Sweden)

    S. VISALAKSHI

    2017-07-01

    Full Text Available Feature selection is an important task in predictive models which helps to identify irrelevant features in high-dimensional datasets. In the case of the water contamination detection dataset, a standard wrapper algorithm alone cannot be applied because of the complexity. To overcome this computational complexity problem and make the task lighter, a filter-wrapper based algorithm has been proposed. In this work, reducing the feature space is a significant component of water contamination detection. The main findings are as follows: (1) the main goal is speeding up the feature selection process, so the proposed filter-based feature pre-selection is applied, which guarantees that useful data are unlikely to be discarded in the initial stage, as discussed briefly in this paper; (2) the resulting features are filtered again using a Genetic Algorithm coupled with the Support Vector Machine method, which helps to narrow down the subset of features with high accuracy and decreases the computational expense. Experimental results show that the proposed method trims down redundant features effectively and achieves better classification accuracy.

  16. Compensatory selection for roads over natural linear features by wolves in northern Ontario: Implications for caribou conservation.

    Science.gov (United States)

    Newton, Erica J; Patterson, Brent R; Anderson, Morgan L; Rodgers, Arthur R; Vander Vennen, Lucas M; Fryxell, John M

    2017-01-01

    Woodland caribou (Rangifer tarandus caribou) in Ontario are a threatened species that have experienced a substantial retraction of their historic range. Part of their decline has been attributed to increasing densities of anthropogenic linear features such as trails, roads, railways, and hydro lines. These features have been shown to increase the search efficiency and kill rate of wolves. However, it is unclear whether selection for anthropogenic linear features is additive or compensatory to selection for natural (water) linear features which may also be used for travel. We studied the selection of water and anthropogenic linear features by 52 resident wolves (Canis lupus x lycaon) over four years across three study areas in northern Ontario that varied in degrees of forestry activity and human disturbance. We used Euclidean distance-based resource selection functions (mixed-effects logistic regression) at the seasonal range scale with random coefficients for distance to water linear features, primary/secondary roads/railways, and hydro lines, and tertiary roads to estimate the strength of selection for each linear feature and for several habitat types, while accounting for availability of each feature. Next, we investigated the trade-off between selection for anthropogenic and water linear features. Wolves selected both anthropogenic and water linear features; selection for anthropogenic features was stronger than for water during the rendezvous season. Selection for anthropogenic linear features increased with increasing density of these features on the landscape, while selection for natural linear features declined, indicating compensatory selection of anthropogenic linear features. These results have implications for woodland caribou conservation. Prey encounter rates between wolves and caribou seem to be strongly influenced by increasing linear feature densities. This behavioral mechanism-a compensatory functional response to anthropogenic linear feature

  17. Compensatory selection for roads over natural linear features by wolves in northern Ontario: Implications for caribou conservation.

    Directory of Open Access Journals (Sweden)

    Erica J Newton

    Full Text Available Woodland caribou (Rangifer tarandus caribou) in Ontario are a threatened species that have experienced a substantial retraction of their historic range. Part of their decline has been attributed to increasing densities of anthropogenic linear features such as trails, roads, railways, and hydro lines. These features have been shown to increase the search efficiency and kill rate of wolves. However, it is unclear whether selection for anthropogenic linear features is additive or compensatory to selection for natural (water) linear features which may also be used for travel. We studied the selection of water and anthropogenic linear features by 52 resident wolves (Canis lupus x lycaon) over four years across three study areas in northern Ontario that varied in degrees of forestry activity and human disturbance. We used Euclidean distance-based resource selection functions (mixed-effects logistic regression) at the seasonal range scale with random coefficients for distance to water linear features, primary/secondary roads/railways, and hydro lines, and tertiary roads to estimate the strength of selection for each linear feature and for several habitat types, while accounting for availability of each feature. Next, we investigated the trade-off between selection for anthropogenic and water linear features. Wolves selected both anthropogenic and water linear features; selection for anthropogenic features was stronger than for water during the rendezvous season. Selection for anthropogenic linear features increased with increasing density of these features on the landscape, while selection for natural linear features declined, indicating compensatory selection of anthropogenic linear features. These results have implications for woodland caribou conservation. Prey encounter rates between wolves and caribou seem to be strongly influenced by increasing linear feature densities. This behavioral mechanism-a compensatory functional response to anthropogenic

  18. Multispectral image fusion based on fractal features

    Science.gov (United States)

    Tian, Jie; Chen, Jie; Zhang, Chunhua

    2004-01-01

    they can be dealt with by special rules. Various fusion rules have been developed, each aiming at a specific problem. These rules have different performance, so it is very important to select an appropriate rule when designing an image fusion system. Recent research indicates that the rule should be adjustable so that it is always suitable for extracting the features of targets and preserving the pixels carrying useful information. In this paper, based on the consideration that fractal dimension is one of the main features distinguishing man-made targets from natural objects, the fusion rule is defined as follows: if the studied image region contains a man-made target, the pixels of the source image whose fractal dimension is minimal are kept as the pixels of the fused image; otherwise, a weighted average operator is adopted to avoid loss of information. The main idea of this rule is to keep the pixels with low fractal dimensions, so it is named the Minimal Fractal Dimension (MFD) fusion rule. This fractal-based algorithm is compared with a common weighted average fusion algorithm, and an objective assessment of the two fusion results is performed. The criteria of Entropy, Cross-Entropy, Peak Signal-to-Noise Ratio (PSNR) and Standard Gray Scale Difference are defined in this paper. In contrast to the idea of constructing an ideal image as the assessment reference, the source images are selected as the reference in this paper. This assessment thus calculates how much the image quality has been enhanced and the quantity of information increased when the fused image is compared with the source images. The experimental results imply that the fractal-based multi-spectral fusion algorithm can effectively preserve the information of man-made objects with high contrast. The algorithm preserves features of military targets well because battlefield targets are mostly man-made objects and their images generally differ from
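
    A rough sketch of the MFD rule described above: estimate a local fractal dimension for the corresponding region in each source image (here via simple box counting, which is only one of several possible estimators) and, where a man-made target is assumed to be present, keep the pixels of the source with the smaller dimension; otherwise take a weighted average. The target-detection decision itself is treated as an external input.

        # Sketch of the Minimal Fractal Dimension (MFD) fusion rule. Assumes the
        # region patches are at least a few pixels on a side and not all zero.
        import numpy as np

        def box_counting_dimension(patch, threshold=0.5):
            binary = patch > threshold * patch.max()
            sizes, counts = [], []
            k = min(binary.shape) // 2
            while k >= 1:
                count = 0
                for i in range(0, binary.shape[0], k):
                    for j in range(0, binary.shape[1], k):
                        if binary[i:i + k, j:j + k].any():
                            count += 1
                sizes.append(k)
                counts.append(count)
                k //= 2
            # Dimension = slope of log(box count) versus log(1 / box size).
            coeffs = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
            return coeffs[0]

        def mfd_fuse(region_a, region_b, is_target, w=0.5):
            if is_target:          # man-made target suspected in this region
                da = box_counting_dimension(region_a)
                db = box_counting_dimension(region_b)
                return region_a if da <= db else region_b
            return w * region_a + (1 - w) * region_b   # weighted average elsewhere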

  19. Notes on the evolution of feature selection methodology

    Czech Academy of Sciences Publication Activity Database

    Somol, Petr; Novovičová, Jana; Pudil, Pavel

    2007-01-01

    Roč. 43, č. 5 (2007), s. 713-730 ISSN 0023-5954 R&D Projects: GA ČR GA102/07/1594; GA MŠk 1M0572; GA AV ČR IAA2075302 EU Projects: European Commission(XE) 507752 - MUSCLE Grant - others:GA MŠk(CZ) 2C06019 Institutional research plan: CEZ:AV0Z10750506 Keywords : feature selection * branch and bound * sequential search * mixture model Subject RIV: IN - Informatics, Computer Science Impact factor: 0.552, year: 2007

  20. An Ensemble Method with Integration of Feature Selection and Classifier Selection to Detect the Landslides

    Science.gov (United States)

    Zhongqin, G.; Chen, Y.

    2017-12-01

    Abstract Quickly and automatically identifying the spatial distribution of landslides is essential for the prevention, mitigation and assessment of landslide hazards. It is still a challenging job owing to the complicated characteristics and vague boundaries of landslide areas on the image. High-resolution remote sensing images have multiple scales, complex spatial distributions and abundant features; object-oriented image classification methods can make full use of this information and thus effectively detect landslides after the hazard has happened. In this research we present a new semi-supervised workflow, taking advantage of recent object-oriented image analysis and machine learning algorithms, to quickly locate landslides of different origins in some areas in the southwest part of China. Besides a sequence of image segmentation, feature selection, object classification and error testing, this workflow ensembles feature selection and classifier selection. The features this study utilized were normalized difference vegetation index (NDVI) change, textural features derived from the gray-level co-occurrence matrix (GLCM), spectral features, etc. The results show that this algorithm significantly removes redundant features and makes full use of the classifiers. All these improvements lead to higher accuracy in determining the shape of landslides on high-resolution remote sensing images, and in particular to flexibility toward different kinds of landslides.

  1. A Novel Feature Extraction Method with Feature Selection to Identify Golgi-Resident Protein Types from Imbalanced Data.

    Science.gov (United States)

    Yang, Runtao; Zhang, Chengjin; Gao, Rui; Zhang, Lina

    2016-02-06

    The Golgi Apparatus (GA) is a major collection and dispatch station for numerous proteins destined for secretion, plasma membranes and lysosomes. The dysfunction of GA proteins can result in neurodegenerative diseases. Therefore, accurate identification of protein subGolgi localizations may assist in drug development and understanding the mechanisms of the GA involved in various cellular processes. In this paper, a new computational method is proposed for identifying cis-Golgi proteins from trans-Golgi proteins. Based on the concept of Common Spatial Patterns (CSP), a novel feature extraction technique is developed to extract evolutionary information from protein sequences. To deal with the imbalanced benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is adopted. A feature selection method called Random Forest-Recursive Feature Elimination (RF-RFE) is employed to search the optimal features from the CSP based features and g-gap dipeptide composition. Based on the optimal features, a Random Forest (RF) module is used to distinguish cis-Golgi proteins from trans-Golgi proteins. Through the jackknife cross-validation, the proposed method achieves a promising performance with a sensitivity of 0.889, a specificity of 0.880, an accuracy of 0.885, and a Matthew's Correlation Coefficient (MCC) of 0.765, which remarkably outperforms previous methods. Moreover, when tested on a common independent dataset, our method also achieves a significantly improved performance. These results highlight the promising performance of the proposed method to identify Golgi-resident protein types. Furthermore, the CSP based feature extraction method may provide guidelines for protein function predictions.
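
    The SMOTE plus RF-RFE stage described above can be sketched with imbalanced-learn and scikit-learn; the number of retained features and the estimator settings below are placeholders, and the CSP-based and g-gap dipeptide features are assumed to be computed upstream. In practice, SMOTE should be applied inside each training fold only, so that synthetic samples do not leak into evaluation.

        # Oversample the minority class with SMOTE, then rank features with
        # Random Forest-Recursive Feature Elimination and classify with RF.
        # Generic sketch with placeholder settings.
        from imblearn.over_sampling import SMOTE
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.feature_selection import RFE

        def train_golgi_classifier(X, y, n_features=50):
            X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)   # balance classes
            selector = RFE(RandomForestClassifier(n_estimators=200, random_state=0),
                           n_features_to_select=n_features)
            selector.fit(X_bal, y_bal)
            clf = RandomForestClassifier(n_estimators=200, random_state=0)
            clf.fit(selector.transform(X_bal), y_bal)
            return selector, clf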

  2. Bankruptcy prediction using SVM models with a new approach to combine features selection and parameter optimisation

    Science.gov (United States)

    Zhou, Ligang; Keung Lai, Kin; Yen, Jerome

    2014-03-01

    Due to the economic significance of bankruptcy prediction of companies for financial institutions, investors and governments, many quantitative methods have been used to develop effective prediction models. Support vector machine (SVM), a powerful classification method, has been used for this task; however, the performance of SVM is sensitive to model form, parameter setting and features selection. In this study, a new approach based on direct search and features ranking technology is proposed to optimise features selection and parameter setting for 1-norm and least-squares SVM models for bankruptcy prediction. This approach is also compared to the SVM models with parameter optimisation and features selection by the popular genetic algorithm technique. The experimental results on a data set with 2010 instances show that the proposed models are good alternatives for bankruptcy prediction.

  3. Effective and extensible feature extraction method using genetic algorithm-based frequency-domain feature search for epileptic EEG multiclassification.

    Science.gov (United States)

    Wen, Tingxi; Zhang, Zhongnan

    2017-05-01

    In this paper, genetic algorithm-based frequency-domain feature search (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of epilepsy. In this method, frequency-domain features are first searched and then combined with nonlinear features. Subsequently, these features are selected and optimized to classify EEG signals. The extracted features are analyzed experimentally. The features extracted by GAFDS show remarkable independence, and they are superior to the nonlinear features in terms of the ratio of interclass distance and intraclass distance. Moreover, the proposed feature search method can search for features of instantaneous frequency in a signal after Hilbert transformation. The classification results achieved using these features are reasonable; thus, GAFDS exhibits good extensibility. Multiple classical classifiers (i.e., k-nearest neighbor, linear discriminant analysis, decision tree, AdaBoost, multilayer perceptron, and Naïve Bayes) achieve satisfactory classification accuracies by using the features generated by the GAFDS method and the optimized feature selection. The accuracies for 2-classification and 3-classification problems may reach up to 99% and 97%, respectively. Results of several cross-validation experiments illustrate that GAFDS is effective in the extraction of effective features for EEG classification. Therefore, the proposed feature selection and optimization model can improve classification accuracy.
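
    The instantaneous-frequency quantities mentioned above come from the analytic signal obtained by the Hilbert transform; below is a minimal SciPy sketch (a generic computation, not the full GAFDS search).

        # Instantaneous frequency of an EEG channel via the Hilbert transform:
        # the analytic signal's unwrapped phase is differentiated over time.
        import numpy as np
        from scipy.signal import hilbert

        def instantaneous_frequency(x, fs):
            analytic = hilbert(x)                          # analytic signal
            phase = np.unwrap(np.angle(analytic))          # instantaneous phase
            return np.diff(phase) * fs / (2.0 * np.pi)     # Hz, length len(x) - 1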

  4. Feature-Selective Attention Adaptively Shifts Noise Correlations in Primary Auditory Cortex.

    Science.gov (United States)

    Downer, Joshua D; Rapone, Brittany; Verhein, Jessica; O'Connor, Kevin N; Sutter, Mitchell L

    2017-05-24

    Sensory environments often contain an overwhelming amount of information, with both relevant and irrelevant information competing for neural resources. Feature attention mediates this competition by selecting the sensory features needed to form a coherent percept. How attention affects the activity of populations of neurons to support this process is poorly understood because population coding is typically studied through simulations in which one sensory feature is encoded without competition. Therefore, to study the effects of feature attention on population-based neural coding, investigations must be extended to include stimuli with both relevant and irrelevant features. We measured noise correlations (r_noise) within small neural populations in primary auditory cortex while rhesus macaques performed a novel feature-selective attention task. We found that the effect of feature-selective attention on r_noise depended not only on the population tuning to the attended feature, but also on the tuning to the distractor feature. To attempt to explain how these observed effects might support enhanced perceptual performance, we propose an extension of a simple and influential model in which shifts in r_noise can simultaneously enhance the representation of the attended feature while suppressing the distractor. These findings present a novel mechanism by which attention modulates neural populations to support sensory processing in cluttered environments. SIGNIFICANCE STATEMENT Although feature-selective attention constitutes one of the building blocks of listening in natural environments, its neural bases remain obscure. To address this, we developed a novel auditory feature-selective attention task and measured noise correlations (r_noise) in rhesus macaque A1 during task performance. Unlike previous studies showing that the effect of attention on r_noise depends on population tuning to the attended feature, we show that the effect of attention depends on the tuning

  5. Feature Selection, Flaring Size and Time-to-Flare Prediction Using Support Vector Regression, and Automated Prediction of Flaring Behavior Based on Spatio-Temporal Measures Using Hidden Markov Models

    Science.gov (United States)

    Al-Ghraibah, Amani

    Solar flares release stored magnetic energy in the form of radiation and can have significant detrimental effects on earth including damage to technological infrastructure. Recent work has considered methods to predict future flare activity on the basis of quantitative measures of the solar magnetic field. Accurate advanced warning of solar flare occurrence is an area of increasing concern and much research is ongoing in this area. Our previous work [111] utilized standard pattern recognition and classification techniques to determine (classify) whether a region is expected to flare within a predictive time window, using a Relevance Vector Machine (RVM) classification method. We extracted 38 features describing the complexity of the photospheric magnetic field; the resulting classification metrics provide the baseline against which we compare our new work. We find a true positive rate (TPR) of 0.8, true negative rate (TNR) of 0.7, and true skill score (TSS) of 0.49. This dissertation proposes three basic topics; the first topic is an extension to our previous work [111], where we consider a feature selection method to determine an appropriate feature subset with cross validation classification based on a histogram analysis of selected features. Classification using the top five features resulting from this analysis yields better classification accuracies across a large unbalanced dataset. In particular, the feature subsets provide better discrimination of the many regions that flare, where we find a TPR of 0.85, a TNR of 0.65 (slightly lower than our previous work), and a TSS of 0.5, which is an improvement over our previous work. In the second topic, we study the prediction of solar flare size and time-to-flare using support vector regression (SVR). When we consider flaring regions only, we find an average error in estimating flare size of approximately half a GOES class. When we additionally consider non-flaring regions, we find an increased average
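
    For reference, the skill metrics quoted above follow from confusion-matrix counts, with the true skill score commonly defined as TSS = TPR + TNR - 1; a minimal sketch is shown below (the example numbers are illustrative and only match the quoted baseline up to rounding).

        # True positive rate, true negative rate and true skill score (TSS)
        # from confusion-matrix counts; TSS = TPR + TNR - 1.
        def skill_scores(tp, fn, tn, fp):
            tpr = tp / (tp + fn)
            tnr = tn / (tn + fp)
            return tpr, tnr, tpr + tnr - 1.0

        # Illustrative counts roughly matching the baseline TPR/TNR above.
        print(skill_scores(tp=80, fn=20, tn=70, fp=30))   # (0.8, 0.7, 0.5)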

  6. Stable feature selection for clinical prediction: exploiting ICD tree structure using Tree-Lasso.

    Science.gov (United States)

    Kamkar, Iman; Gupta, Sunil Kumar; Phung, Dinh; Venkatesh, Svetha

    2015-02-01

    Modern healthcare is getting reshaped by growing Electronic Medical Records (EMR). Recently, these records have been shown to be of great value for building clinical prediction models. In EMR data, patients' diseases and hospital interventions are captured through a set of diagnoses and procedures codes. These codes are usually represented in a tree form (e.g. ICD-10 tree) and the codes within a tree branch may be highly correlated. These codes can be used as features to build a prediction model and an appropriate feature selection can inform a clinician about important risk factors for a disease. Traditional feature selection methods (e.g. Information Gain, T-test, etc.) consider each variable independently and usually end up with a long feature list. Recently, Lasso and related l1-penalty based feature selection methods have become popular due to their joint feature selection property. However, Lasso is known to have the problem of randomly selecting one feature out of many correlated features. This hinders clinicians from arriving at a stable feature set, which is crucial for the clinical decision making process. In this paper, we solve this problem by using a recently proposed Tree-Lasso model. Since the stability behavior of Tree-Lasso is not well understood, we study the stability behavior of Tree-Lasso and compare it with other feature selection methods. Using a synthetic and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso based feature selection is significantly more stable than Lasso and comparable to other methods e.g. Information Gain, ReliefF and T-test. We further show that, using different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than other methods. Our result has implications for identifying stable risk factors for many healthcare problems and therefore can
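
    The stability issue discussed above is typically quantified by how often each feature is selected across resampled fits; the sketch below measures that selection frequency, with plain l1-penalised logistic regression standing in for Tree-Lasso (whose grouped, tree-structured penalty is not part of this snippet), and all settings are illustrative.

        # Selection-frequency stability of l1-penalised selection: refit on
        # bootstrap resamples and count how often each feature keeps a nonzero
        # coefficient. A stable selector concentrates frequencies near 0 or 1.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def selection_frequency(X, y, C=1.0, n_boot=100, seed=0):
            rng = np.random.default_rng(seed)
            n, p = X.shape
            counts = np.zeros(p)
            for _ in range(n_boot):
                idx = rng.integers(0, n, size=n)           # bootstrap resample
                model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
                model.fit(X[idx], y[idx])
                counts += (np.abs(model.coef_.ravel()) > 1e-8)
            return counts / n_boot   # fraction of resamples selecting each feature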

  7. Survival analysis for high-dimensional, heterogeneous medical data: Exploring feature extraction as an alternative to feature selection.

    Science.gov (United States)

    Pölsterl, Sebastian; Conjeti, Sailesh; Navab, Nassir; Katouzian, Amin

    2016-09-01

    In clinical research, the primary interest is often the time until occurrence of an adverse event, i.e., survival analysis. Its application to electronic health records is challenging for two main reasons: (1) patient records are comprised of high-dimensional feature vectors, and (2) feature vectors are a mix of categorical and real-valued features, which implies varying statistical properties among features. To learn from high-dimensional data, researchers can choose from a wide range of methods in the fields of feature selection and feature extraction. Whereas feature selection is well studied, little work focused on utilizing feature extraction techniques for survival analysis. We investigate how well feature extraction methods can deal with features having varying statistical properties. In particular, we consider multiview spectral embedding algorithms, which specifically have been developed for these situations. We propose to use random survival forests to accurately determine local neighborhood relations from right censored survival data. We evaluated 10 combinations of feature extraction methods and 6 survival models with and without intrinsic feature selection in the context of survival analysis on 3 clinical datasets. Our results demonstrate that for small sample sizes - less than 500 patients - models with built-in feature selection (Cox model with ℓ1 penalty, random survival forest, and gradient boosted models) outperform feature extraction methods by a median margin of 6.3% in concordance index (inter-quartile range: [-1.2%;14.6%]). If the number of samples is insufficient, feature extraction methods are unable to reliably identify the underlying manifold, which makes them of limited use in these situations. For large sample sizes - in our experiments, 2500 samples or more - feature extraction methods perform as well as feature selection methods. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. Variable selection in near-infrared spectroscopy: benchmarking of feature selection methods on biodiesel data.

    Science.gov (United States)

    Balabin, Roman M; Smirnov, Sergey V

    2011-04-29

    During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields from petroleum to biomedical sectors. The NIR spectrum (above 4000 cm(-1)) of a sample is typically measured by modern instruments at a few hundred of wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. The results of other spectroscopic

  9. Variable selection in near-infrared spectroscopy: Benchmarking of feature selection methods on biodiesel data

    International Nuclear Information System (INIS)

    Balabin, Roman M.; Smirnov, Sergey V.

    2011-01-01

    During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields from petroleum to biomedical sectors. The NIR spectrum (above 4000 cm⁻¹) of a sample is typically measured by modern instruments at a few hundred wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. The results of other spectroscopic
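
    As a hedged sketch of one family of methods benchmarked above, the following simplified interval-PLS (iPLS) style selection splits a synthetic spectrum into equal wavelength intervals, fits a PLS model per interval, and keeps the interval with the lowest cross-validated RMSE; the interval count, component number, and data are illustrative assumptions, not the benchmarked implementations.

```python
# A simplified interval-PLS (iPLS) style sketch, not the benchmarked implementations:
# split the wavelength axis into equal intervals, fit a PLS model on each interval,
# and keep the interval with the lowest cross-validated RMSE. Data here are synthetic.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
n_samples, n_wavelengths = 120, 400
X = rng.normal(size=(n_samples, n_wavelengths))
y = X[:, 150:170].sum(axis=1) + 0.1 * rng.normal(size=n_samples)  # signal lives in one band

n_intervals = 20
width = n_wavelengths // n_intervals
results = []
for k in range(n_intervals):
    cols = slice(k * width, (k + 1) * width)
    pls = PLSRegression(n_components=3)
    y_hat = cross_val_predict(pls, X[:, cols], y, cv=5).ravel()
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))
    results.append((rmse, k))

best_rmse, best_interval = min(results)
print(f"best interval: {best_interval} (wavelength columns "
      f"{best_interval * width}-{(best_interval + 1) * width - 1}), RMSE={best_rmse:.3f}")
```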

  10. Spatiotemporal Features for Asynchronous Event-based Data

    Directory of Open Access Journals (Sweden)

    Xavier Lagorce

    2015-02-01

    Full Text Available Bio-inspired asynchronous event-based vision sensors are currently introducing a paradigm shift in visual information processing. These new sensors rely on a stimulus-driven principle of light acquisition similar to biological retinas. They are event-driven and fully asynchronous, thereby reducing redundancy and encoding exact times of input signal changes, leading to a very precise temporal resolution. Approaches for higher-level computer vision often rely on the reliable detection of features in visual frames, but similar definitions of features for the novel dynamic and event-based visual input representation of silicon retinas have so far been lacking. This article addresses the problem of learning and recognizing features for event-based vision sensors, which capture properties of truly spatiotemporal volumes of sparse visual event information. A novel computational architecture for learning and encoding spatiotemporal features is introduced based on a set of predictive recurrent reservoir networks, competing via winner-take-all selection. Features are learned in an unsupervised manner from real-world input recorded with event-based vision sensors. It is shown that the networks in the architecture learn distinct and task-specific dynamic visual features, and can predict their trajectories over time.

  11. Schizophrenia with prominent catatonic features: A selective review.

    Science.gov (United States)

    Ungvari, Gabor S; Gerevich, Jozsef; Takács, Rozália; Gazdag, Gábor

    2017-08-14

    A widely accepted consensus holds that a variety of motor symptoms subsumed under the term 'catatonia' have been an integral part of the symptomatology of schizophrenia since 1896, when Kraepelin proposed the concept of dementia praecox (schizophrenia). Until recently, psychiatric classifications included catatonic schizophrenia mainly through tradition, without compelling evidence of its validity as a schizophrenia subtype. This selective review briefly summarizes the history, psychopathology, demographic and epidemiological data, and treatment options for schizophrenia with prominent catatonic features. Although most catatonic signs and symptoms are easy to observe and measure, the lack of conceptual clarity of catatonia and consensus about the threshold and criteria for its diagnosis have hampered our understanding of how catatonia contributes to the pathophysiology of schizophrenic psychoses. Diverse study samples and methodologies have further hindered research on schizophrenia with prominent catatonic features. A focus on the motor aspects of broadly defined schizophrenia using modern methods of detecting and quantifying catatonic signs and symptoms coupled with sophisticated neuroimaging techniques offers a new approach to research in this long-overlooked field. Copyright © 2017. Published by Elsevier B.V.

  12. Choice: 36 band feature selection software with applications to multispectral pattern recognition

    Science.gov (United States)

    Jones, W. C.

    1973-01-01

    Feature selection software was developed at the Earth Resources Laboratory that is capable of inputting up to 36 channels and selecting channel subsets according to several criteria based on divergence. One of the criteria used is compatible with the table look-up classifier requirements. The software indicates which channel subset best separates (based on average divergence) each class from all other classes. The software employs an exhaustive search technique, and computer time is not prohibitive. A typical task to select the best 4 of 22 channels for 12 classes takes 9 minutes on a Univac 1108 computer.
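
    A minimal sketch of the exhaustive-search idea described above, assuming Gaussian class models and the classical symmetric divergence criterion; small synthetic numbers of channels and classes keep the enumeration fast, and none of this reproduces the Earth Resources Laboratory software itself.

```python
# A sketch of exhaustive channel-subset search using average pairwise divergence
# between Gaussian class models (the classic criterion in multispectral work);
# it is not the Earth Resources Laboratory code itself. Data are synthetic.
import itertools
import numpy as np

def divergence(mu1, cov1, mu2, cov2):
    """Symmetric (Kullback) divergence between two Gaussian class models."""
    inv1, inv2 = np.linalg.inv(cov1), np.linalg.inv(cov2)
    dmu = (mu1 - mu2).reshape(-1, 1)
    term_mean = 0.5 * float(dmu.T @ (inv1 + inv2) @ dmu)
    term_cov = 0.5 * np.trace(inv1 @ cov2 + inv2 @ cov1 - 2 * np.eye(len(mu1)))
    return term_mean + term_cov

rng = np.random.default_rng(2)
n_channels, n_classes = 8, 4                      # small numbers keep the search fast
classes = [rng.normal(loc=rng.normal(size=n_channels), size=(200, n_channels))
           for _ in range(n_classes)]

best = (-np.inf, None)
for subset in itertools.combinations(range(n_channels), 4):
    idx = list(subset)
    stats = [(c[:, idx].mean(axis=0), np.cov(c[:, idx], rowvar=False)) for c in classes]
    score = np.mean([divergence(*stats[i], *stats[j])
                     for i, j in itertools.combinations(range(n_classes), 2)])
    if score > best[0]:
        best = (score, subset)

print(f"best 4-channel subset: {best[1]} (average divergence {best[0]:.2f})")
```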

  13. A Comparison of Different Dimensionality Reduction and Feature Selection Methods for Single Trial ERP Detection

    Science.gov (United States)

    Lan, Tian; Erdogmus, Deniz; Black, Lois; Van Santen, Jan

    2014-01-01

    Dimensionality reduction and feature selection are important aspects of electroencephalography-based event-related potential detection systems such as brain-computer interfaces. In our study, a predefined sequence of letters was presented to subjects in a Rapid Serial Visual Presentation (RSVP) paradigm. EEG data were collected and analyzed offline. A linear discriminant analysis (LDA) classifier was designed as the ERP (Event Related Potential) detector for its simplicity. Different dimensionality reduction and feature selection methods were applied and compared in a greedy wrapper framework. Experimental results showed that PCA with the first 10 principal components for each channel performed best and could be used in both online and offline systems. PMID:21097171

  14. A comparison of different dimensionality reduction and feature selection methods for single trial ERP detection.

    Science.gov (United States)

    Lan, Tian; Erdogmus, Deniz; Black, Lois; Van Santen, Jan

    2010-01-01

    Dimensionality reduction and feature selection are important aspects of electroencephalography-based event-related potential detection systems such as brain-computer interfaces. In our study, a predefined sequence of letters was presented to subjects in a Rapid Serial Visual Presentation (RSVP) paradigm. EEG data were collected and analyzed offline. A linear discriminant analysis (LDA) classifier was designed as the ERP (Event Related Potential) detector for its simplicity. Different dimensionality reduction and feature selection methods were applied and compared in a greedy wrapper framework. Experimental results showed that PCA with the first 10 principal components for each channel performed best and could be used in both online and offline systems.
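
    A minimal sketch of the winning configuration reported above (first 10 principal components per channel followed by an LDA classifier), assuming synthetic EEG-like epochs; channel count, epoch length, and labels are illustrative, not those of the study.

```python
# A minimal sketch: reduce each channel to its first 10 principal components,
# concatenate the per-channel components, and classify with LDA. Epoch length
# and channel count are illustrative assumptions; the data are synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n_trials, n_channels, n_samples = 200, 8, 128      # trials x channels x time points
X = rng.normal(size=(n_trials, n_channels, n_samples))
y = rng.integers(0, 2, size=n_trials)              # target vs non-target labels
X[y == 1, :, 40:60] += 0.5                         # crude "ERP" bump for the target class

# PCA per channel (fit on all trials; unsupervised, but note this is optimistic
# relative to a fully nested cross-validation).
features = []
for ch in range(n_channels):
    pca = PCA(n_components=10)
    features.append(pca.fit_transform(X[:, ch, :]))
features = np.hstack(features)                     # shape: (n_trials, n_channels * 10)

lda = LinearDiscriminantAnalysis()
scores = cross_val_score(lda, features, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.3f}")
```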

  15. Cost-Sensitive Feature Selection of Numeric Data with Measurement Errors

    Directory of Open Access Journals (Sweden)

    Hong Zhao

    2013-01-01

    Full Text Available Feature selection is an essential process in data mining applications since it reduces a model's complexity. However, feature selection with various types of costs is still a new research topic. In this paper, we study the cost-sensitive feature selection problem of numeric data with measurement errors. The major contributions of this paper are fourfold. First, a new data model is built to address test costs and misclassification costs as well as error boundaries. It is distinguished from the existing models mainly by its error boundaries. Second, a covering-based rough set model with normal distribution measurement errors is constructed. With this model, coverings are constructed from data rather than assigned by users. Third, a new cost-sensitive feature selection problem is defined on this model. It is more realistic than the existing feature selection problems. Fourth, both backtracking and heuristic algorithms are proposed to deal with the new problem. Experimental results show the efficiency of the pruning techniques for the backtracking algorithm and the effectiveness of the heuristic algorithm. This study is a step toward realistic applications of cost-sensitive learning.

  16. Familiarity facilitates feature-based face processing

    Science.gov (United States)

    Wheeler, Kelsey G.; Cipolli, Carlo; Gobbini, M. Ida

    2017-01-01

    Recognition of personally familiar faces is remarkably efficient, effortless and robust. We asked if feature-based face processing facilitates detection of familiar faces by testing the effect of face inversion on a visual search task for familiar and unfamiliar faces. Because face inversion disrupts configural and holistic face processing, we hypothesized that inversion would diminish the familiarity advantage to the extent that it is mediated by such processing. Subjects detected personally familiar and stranger target faces in arrays of two, four, or six face images. Subjects showed significant facilitation of personally familiar face detection for both upright and inverted faces. The effect of familiarity on target absent trials, which involved only rejection of unfamiliar face distractors, suggests that familiarity facilitates rejection of unfamiliar distractors as well as detection of familiar targets. The preserved familiarity effect for inverted faces suggests that facilitation of face detection afforded by familiarity reflects mostly feature-based processes. PMID:28582439

  17. A comparative analysis of swarm intelligence techniques for feature selection in cancer classification.

    Science.gov (United States)

    Gunavathi, Chellamuthu; Premalatha, Kandasamy

    2014-01-01

    Feature selection in cancer classification is a central area of research in the field of bioinformatics and is used to select informative genes from the thousands of genes on a microarray. The genes are ranked based on T-statistics, signal-to-noise ratio (SNR), and F-test values. The swarm intelligence (SI) technique finds the informative genes from the top-m ranked genes. These selected genes are used for classification. In this paper, shuffled frog leaping with Lévy flight (SFLLF) is proposed for feature selection. In SFLLF, the Lévy flight is included to avoid premature convergence of the shuffled frog leaping (SFL) algorithm. SI techniques such as particle swarm optimization (PSO), cuckoo search (CS), SFL, and SFLLF are used for feature selection, which identifies informative genes for classification. The k-nearest neighbour (k-NN) technique is used to classify the samples. The proposed work is applied to 10 different benchmark datasets and examined with SI techniques. The experimental results show that the results obtained from the k-NN classifier with SFLLF feature selection outperform those of PSO, CS, and SFL.
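
    A hedged sketch of the ranking-and-evaluation part only: genes are ranked by signal-to-noise ratio and the top-m genes are scored with a k-NN classifier on synthetic data; the swarm search (SFLLF) itself is not reproduced here.

```python
# Rank genes by the signal-to-noise ratio and score the top-m genes with a k-NN
# classifier. The swarm search (SFLLF) is not reproduced; data are synthetic.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n_samples, n_genes = 100, 2000
X = rng.normal(size=(n_samples, n_genes))
y = rng.integers(0, 2, size=n_samples)
X[y == 1, :30] += 1.0                              # 30 informative genes

mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
sd0, sd1 = X[y == 0].std(axis=0), X[y == 1].std(axis=0)
snr = np.abs(mu0 - mu1) / (sd0 + sd1 + 1e-12)      # signal-to-noise ratio per gene

# (Ranking on all samples before CV is optimistic; kept only to keep the sketch short.)
top_m = 50
top_genes = np.argsort(snr)[::-1][:top_m]
knn = KNeighborsClassifier(n_neighbors=5)
scores = cross_val_score(knn, X[:, top_genes], y, cv=5)
print(f"accuracy with top-{top_m} SNR-ranked genes: {scores.mean():.3f}")
```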

  18. FEATURE EXTRACTION FOR EMG BASED PROSTHESES CONTROL

    Directory of Open Access Journals (Sweden)

    R. Aishwarya

    2013-01-01

    Full Text Available The control of a prosthetic limb would be more effective if it were based on Surface Electromyogram (SEMG) signals from remnant muscles. The analysis of SEMG signals depends on a number of factors, such as amplitude as well as time- and frequency-domain properties. Time series analysis using an Auto Regressive (AR) model and the mean frequency, which is tolerant to white Gaussian noise, are used as feature extraction techniques. The EMG histogram is used as another feature vector that was seen to give more distinct classification. The work was done with an SEMG dataset obtained from the NINAPRO DATABASE, a resource for the biorobotics community. Eight classes of hand movements (hand open, hand close, wrist extension, wrist flexion, pointing index, ulnar deviation, thumbs up, thumb opposite to little finger) are taken into consideration and feature vectors are extracted. The feature vectors can be given to an artificial neural network for further classification in controlling the prosthetic arm, which is not dealt with in this paper.
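
    A hedged sketch of one of the feature extraction steps mentioned above: Yule-Walker autoregressive (AR) coefficients computed per analysis window of a (here synthetic) EMG channel; the AR order and window length are illustrative assumptions.

```python
# Yule-Walker AR coefficients per analysis window of an EMG channel.
# AR order and window/step sizes are illustrative assumptions.
import numpy as np
from scipy.linalg import toeplitz

def ar_coefficients(x, order=4):
    """Estimate AR coefficients of a 1-D signal via the Yule-Walker equations."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # Biased autocorrelation estimates r[0..order]
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    R = toeplitz(r[:order])                  # order x order autocorrelation matrix
    return np.linalg.solve(R, r[1:order + 1])

rng = np.random.default_rng(5)
emg = rng.normal(size=4000)                  # stand-in for a surface EMG channel
window, step = 256, 128
features = np.array([ar_coefficients(emg[s:s + window])
                     for s in range(0, len(emg) - window + 1, step)])
print("feature matrix shape (windows x AR order):", features.shape)
```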

  19. Memory Driven Feature-Based Design

    Science.gov (United States)

    1993-01-01

    (OCR-damaged abstract fragment from report AD-A264 697 / WL-TR-93-4021, "Memory Driven Feature-Based Design", Y.H. Pao, F.L. Merat, G.M. Radack, Case Western Reserve University.) Recoverable content: the work concerns memory, measures of similarity, and the question of how to manage remembering and recollecting on the basis of similarity [18]. There is a large body ... It is also influenced by the Dynamic Memory ideas of Schank [20], by the episodic memory ideas of Kolodner [21], and by the case-based planning approach ...

  20. Modified kernel-based nonlinear feature extraction.

    Energy Technology Data Exchange (ETDEWEB)

    Ma, J. (Junshui); Perkins, S. J. (Simon J.); Theiler, J. P. (James P.); Ahalt, S. (Stanley)

    2002-01-01

    Feature Extraction (FE) techniques are widely used in many applications to pre-process data in order to reduce the complexity of subsequent processes. A group of kernel-based nonlinear FE (KFE) algorithms has attracted much attention due to their high performance. However, a serious limitation that is inherent in these algorithms -- the maximal number of features extracted by them is limited by the number of classes involved -- dramatically degrades their flexibility. Here we propose a modified version of those KFE algorithms (MKFE). This algorithm is developed from a special form of scatter-matrix, whose rank is not determined by the number of classes involved, and thus breaks the inherent limitation in those KFE algorithms. Experimental results suggest that the MKFE algorithm is especially useful when the training set is small.

  1. A Local Asynchronous Distributed Privacy Preserving Feature Selection Algorithm for Large Peer-to-Peer Networks

    Data.gov (United States)

    National Aeronautics and Space Administration — In this paper we develop a local distributed privacy preserving algorithm for feature selection in a large peer-to-peer environment. Feature selection is often used...

  2. A Feature Selection Algorithm to Compute Gene Centric Methylation from Probe Level Methylation Data.

    Directory of Open Access Journals (Sweden)

    Brittany Baur

    Full Text Available DNA methylation is an important epigenetic event that affects gene expression during development and various diseases such as cancer. Understanding the mechanism of action of DNA methylation is important for downstream analysis. In the Illumina Infinium HumanMethylation 450K array, there are tens of probes associated with each gene. Given methylation intensities of all these probes, it is necessary to compute which of these probes are most representative of the gene centric methylation level. In this study, we developed a feature selection algorithm based on sequential forward selection that utilized different classification methods to compute gene centric DNA methylation using probe level DNA methylation data. We compared our algorithm to other feature selection algorithms such as support vector machines with recursive feature elimination, genetic algorithms and ReliefF. We evaluated all methods based on the predictive power of selected probes on their mRNA expression levels and found that a K-Nearest Neighbors classification using the sequential forward selection algorithm performed better than other algorithms based on all metrics. We also observed that transcriptional activities of certain genes were more sensitive to DNA methylation changes than transcriptional activities of other genes. Our algorithm was able to predict the expression of those genes with high accuracy using only DNA methylation data. Our results also showed that those DNA methylation-sensitive genes were enriched in Gene Ontology terms related to the regulation of various biological processes.

  3. A Feature Selection Algorithm to Compute Gene Centric Methylation from Probe Level Methylation Data.

    Science.gov (United States)

    Baur, Brittany; Bozdag, Serdar

    2016-01-01

    DNA methylation is an important epigenetic event that affects gene expression during development and various diseases such as cancer. Understanding the mechanism of action of DNA methylation is important for downstream analysis. In the Illumina Infinium HumanMethylation 450K array, there are tens of probes associated with each gene. Given methylation intensities of all these probes, it is necessary to compute which of these probes are most representative of the gene centric methylation level. In this study, we developed a feature selection algorithm based on sequential forward selection that utilized different classification methods to compute gene centric DNA methylation using probe level DNA methylation data. We compared our algorithm to other feature selection algorithms such as support vector machines with recursive feature elimination, genetic algorithms and ReliefF. We evaluated all methods based on the predictive power of selected probes on their mRNA expression levels and found that a K-Nearest Neighbors classification using the sequential forward selection algorithm performed better than other algorithms based on all metrics. We also observed that transcriptional activities of certain genes were more sensitive to DNA methylation changes than transcriptional activities of other genes. Our algorithm was able to predict the expression of those genes with high accuracy using only DNA methylation data. Our results also showed that those DNA methylation-sensitive genes were enriched in Gene Ontology terms related to the regulation of various biological processes.

  4. A counter example in linear feature selection theory

    Science.gov (United States)

    Brown, D. R.; Omalley, M. J.

    1975-01-01

    The linear feature selection problem in multi-class pattern recognition is described as that of linearly transforming statistical information from n-dimensional (real Euclidean) space into k-dimensional space, while requiring that average interclass divergence in the transformed space decrease as little as possible. Divergence is the expected interclass divergence derived from Hajek two-class divergence; it is known that there always exists a k x n matrix B such that the transformation determined by B maximizes the divergence in k-dimensional space. It is known that, if Q is any k x k invertible matrix, and B is as defined above, then QB again maximizes the divergence in k-space. It is shown that the converse of this result is false: two matrices exist, B1 and B2, each of which maximizes transformed divergence, which are not related in the fashion B2 = QB1 for any k x k matrix Q.
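
    A compact restatement of the claim in the abstract, using D_k(B) to denote the average interclass divergence after the linear map B (notation assumed here for brevity, not taken from the paper):

```latex
% D_k(B) denotes the average interclass divergence after the linear map B;
% this symbol is an assumption made here for compactness, not the paper's notation.
\begin{align*}
&\text{Known: } \exists\, B \in \mathbb{R}^{k\times n} \text{ with }
  D_k(B) = \max_{A \in \mathbb{R}^{k\times n}} D_k(A),
  \quad\text{and } D_k(QB) = D_k(B) \text{ for every invertible } Q \in \mathbb{R}^{k\times k}. \\
&\text{Counterexample: } \exists\, B_1, B_2 \text{ both maximizing } D_k(\cdot)
  \text{ such that } B_2 \neq Q B_1 \text{ for every } Q \in \mathbb{R}^{k\times k}.
\end{align*}
```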

  5. Qualified matching feature collection for feature point-based copy-move forgery detection

    Science.gov (United States)

    Yu, Liyang; Han, Qi; Niu, Xiamu

    2015-03-01

    The feature matching step plays a critical role during the copy-move forgery detection procedure. However, when several highly similar features simultaneously exist in the feature space, current feature matching methods will miss a considerable number of genuine matching feature pairs. To this end, we propose a clustering-based method to collect qualified matching features for the feature point-based methods. The proposed method can collect far more genuine matching features than existing methods do, and thus significantly improve the detection performance, especially for multiple pasting cases. Experimental results confirm the efficacy of the proposed method.

  6. Feature selection and multi-kernel learning for adaptive graph regularized nonnegative matrix factorization

    KAUST Repository

    Wang, Jim Jing-Yan

    2014-09-20

    Nonnegative matrix factorization (NMF), a popular part-based representation technique, does not capture the intrinsic local geometric structure of the data space. Graph regularized NMF (GNMF) was recently proposed to avoid this limitation by regularizing NMF with a nearest neighbor graph constructed from the input data set. However, GNMF has two main bottlenecks. First, using the original feature space directly to construct the graph is not necessarily optimal because of the noisy and irrelevant features and nonlinear distributions of data samples. Second, one possible way to handle the nonlinear distribution of data samples is by kernel embedding. However, it is often difficult to choose the most suitable kernel. To solve these bottlenecks, we propose two novel graph-regularized NMF methods, AGNMFFS and AGNMFMK, by introducing feature selection and multiple-kernel learning to the graph regularized NMF, respectively. Instead of using a fixed graph as in GNMF, the two proposed methods learn the nearest neighbor graph that is adaptive to the selected features and learned multiple kernels, respectively. For each method, we propose a unified objective function to conduct feature selection/multi-kernel learning, NMF and adaptive graph regularization simultaneously. We further develop two iterative algorithms to solve the two optimization problems. Experimental results on two challenging pattern classification tasks demonstrate that the proposed methods significantly outperform state-of-the-art data representation methods.

  7. Feature selection and multi-kernel learning for sparse representation on a manifold

    KAUST Repository

    Wang, Jim Jing-Yan

    2014-03-01

    Sparse representation has been widely studied as a part-based data representation method and applied in many scientific and engineering fields, such as bioinformatics and medical imaging. It seeks to represent a data sample as a sparse linear combination of some basic items in a dictionary. Gao et al. (2013) recently proposed Laplacian sparse coding by regularizing the sparse codes with an affinity graph. However, due to the noisy features and nonlinear distribution of the data samples, the affinity graph constructed directly from the original feature space is not necessarily a reliable reflection of the intrinsic manifold of the data samples. To overcome this problem, we integrate feature selection and multiple kernel learning into the sparse coding on the manifold. To this end, unified objectives are defined for feature selection, multiple kernel learning, sparse coding, and graph regularization. By optimizing the objective functions iteratively, we develop novel data representation algorithms with feature selection and multiple kernel learning respectively. Experimental results on two challenging tasks, N-linked glycosylation prediction and mammogram retrieval, demonstrate that the proposed algorithms outperform the traditional sparse coding methods. © 2013 Elsevier Ltd.

  8. Feature selection and multi-kernel learning for sparse representation on a manifold.

    Science.gov (United States)

    Wang, Jim Jing-Yan; Bensmail, Halima; Gao, Xin

    2014-03-01

    Sparse representation has been widely studied as a part-based data representation method and applied in many scientific and engineering fields, such as bioinformatics and medical imaging. It seeks to represent a data sample as a sparse linear combination of some basic items in a dictionary. Gao et al. (2013) recently proposed Laplacian sparse coding by regularizing the sparse codes with an affinity graph. However, due to the noisy features and nonlinear distribution of the data samples, the affinity graph constructed directly from the original feature space is not necessarily a reliable reflection of the intrinsic manifold of the data samples. To overcome this problem, we integrate feature selection and multiple kernel learning into the sparse coding on the manifold. To this end, unified objectives are defined for feature selection, multiple kernel learning, sparse coding, and graph regularization. By optimizing the objective functions iteratively, we develop novel data representation algorithms with feature selection and multiple kernel learning respectively. Experimental results on two challenging tasks, N-linked glycosylation prediction and mammogram retrieval, demonstrate that the proposed algorithms outperform the traditional sparse coding methods. Copyright © 2013 Elsevier Ltd. All rights reserved.

  9. On the use of feature selection to improve the detection of sea oil spills in SAR images

    Science.gov (United States)

    Mera, David; Bolon-Canedo, Veronica; Cotos, J. M.; Alonso-Betanzos, Amparo

    2017-03-01

    Fast and effective oil spill detection systems are crucial to ensure a proper response to environmental emergencies caused by hydrocarbon pollution on the ocean's surface. Typically, these systems uncover not only oil spills, but also a high number of look-alikes. Feature extraction is a critical and computationally intensive phase where each detected dark spot is independently examined. Traditionally, detection systems use an arbitrary set of features to discriminate between oil spills and look-alike phenomena. However, Feature Selection (FS) methods based on Machine Learning (ML) have proved to be very useful in real domains for enhancing the generalization capabilities of the classifiers, while discarding the existing irrelevant features. In this work, we present a generic and systematic approach, based on FS methods, for choosing a concise and relevant set of features to improve the oil spill detection systems. We have compared five FS methods: Correlation-based feature selection (CFS), Consistency-based filter, Information Gain, ReliefF and Recursive Feature Elimination for Support Vector Machine (SVM-RFE). They were applied on a 141-input vector composed of features from a collection of outstanding studies. Selected features were validated via a Support Vector Machine (SVM) classifier and the results were compared with previous works. Test experiments revealed that the classifier trained with the 6-input feature vector proposed by SVM-RFE achieved the best accuracy and Cohen's kappa coefficient (87.1% and 74.06% respectively). This is a smaller feature combination with similar or even better classification accuracy than previous works. This finding allows the feature extraction phase to be sped up without reducing the classifier accuracy. Experiments also confirmed the significance of the geometrical features since 75.0% of the different features selected by the applied FS methods as well as 66.67% of the proposed 6-input feature vector belong to
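
    A hedged sketch of the SVM-RFE step referred to above: recursive feature elimination wrapped around a linear SVM and asked for 6 features, run here on a synthetic 141-feature dataset rather than the dark-spot data of the study.

```python
# SVM-RFE sketch: recursive feature elimination around a linear SVM, keeping 6
# features. The dataset is synthetic, not the 141-feature dark-spot data above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=141, n_informative=10,
                           random_state=0)
selector = RFE(estimator=SVC(kernel="linear"), n_features_to_select=6, step=5)
selector.fit(X, y)

selected = np.flatnonzero(selector.support_)
print("selected feature indices:", selected)

acc = cross_val_score(SVC(kernel="linear"), X[:, selected], y, cv=5).mean()
print(f"cross-validated accuracy with 6 features: {acc:.3f}")
```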

  10. Feature selection for outcome prediction in oesophageal cancer using genetic algorithm and random forest classifier.

    Science.gov (United States)

    Paul, Desbordes; Su, Ruan; Romain, Modzelewski; Sébastien, Vauclin; Pierre, Vera; Isabelle, Gardin

    2017-09-01

    The outcome prediction of patients can greatly help to personalize cancer treatment. A large number of quantitative features (clinical exams, imaging, …) are potentially useful to assess the patient outcome. The challenge is to choose the most predictive subset of features. In this paper, we propose a new feature selection strategy called GARF (genetic algorithm based on random forest), applied to features extracted from positron emission tomography (PET) images and clinical data. The most relevant features, predictive of the therapeutic response or prognostic of patient survival 3 years after the end of treatment, were selected using GARF on a cohort of 65 patients with locally advanced oesophageal cancer eligible for chemo-radiation therapy. The most relevant predictive results were obtained with a subset of 9 features, leading to a random forest misclassification rate of 18±4% and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.823±0.032. The most relevant prognostic results were obtained with 8 features, leading to an error rate of 20±7% and an AUC of 0.750±0.108. Both predictive and prognostic results show better performance using GARF than using the 4 other studied methods. Copyright © 2016 Elsevier Ltd. All rights reserved.
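
    In the spirit of GARF, but not the authors' implementation, the sketch below runs a small genetic algorithm over feature-subset bitmasks with a random forest cross-validation score as fitness; population size, generation count, and mutation rate are illustrative assumptions.

```python
# A small genetic-algorithm feature selection sketch with a random forest fitness
# function, in the spirit of GARF but not the authors' implementation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=40, n_informative=8,
                           random_state=0)

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop_size, n_generations, mutation_rate = 16, 10, 0.05
population = rng.random((pop_size, X.shape[1])) < 0.2        # sparse initial masks
scores = np.array([fitness(ind) for ind in population])

for _ in range(n_generations):
    children = []
    for _ in range(pop_size):
        # Tournament selection of two parents
        i, j = rng.integers(pop_size, size=2), rng.integers(pop_size, size=2)
        p1 = population[i[np.argmax(scores[i])]]
        p2 = population[j[np.argmax(scores[j])]]
        # Uniform crossover followed by bit-flip mutation
        child = np.where(rng.random(X.shape[1]) < 0.5, p1, p2)
        child ^= rng.random(X.shape[1]) < mutation_rate
        children.append(child)
    population = np.array(children)
    scores = np.array([fitness(ind) for ind in population])

best = population[np.argmax(scores)]
print(f"best subset size: {best.sum()}, CV accuracy: {scores.max():.3f}")
```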

  11. Game Theoretic Approach for Systematic Feature Selection; Application in False Alarm Detection in Intensive Care Units

    Directory of Open Access Journals (Sweden)

    Fatemeh Afghah

    2018-03-01

    Full Text Available Intensive Care Units (ICUs) are equipped with many sophisticated sensors and monitoring devices to provide the highest quality of care for critically ill patients. However, these devices might generate false alarms that reduce the standard of care and result in desensitization of caregivers to alarms. Therefore, reducing the number of false alarms is of great importance. Many approaches, such as signal processing, machine learning, and designing more accurate sensors, have been developed for this purpose. However, the significant intrinsic correlation among the features extracted from different sensors has been mostly overlooked. A majority of current data mining techniques fail to capture such correlation among the collected signals from different sensors, which limits their alarm recognition capabilities. Here, we propose a novel information-theoretic predictive modeling technique based on the idea of coalition game theory to enhance the accuracy of false alarm detection in ICUs by accounting for the synergistic power of signal attributes in the feature selection stage. This approach brings together techniques from information theory and game theory to account for inter-feature mutual information in determining the most correlated predictors with respect to false alarms by calculating the Banzhaf power of each feature. The numerical results show that the proposed method can enhance classification accuracy and improve the area under the ROC (receiver operating characteristic) curve compared to other feature selection techniques, when integrated in classifiers such as Bayes-Net that consider inter-feature dependencies.

  12. Feature Selection for Better Identification of Subtypes of Guillain-Barré Syndrome

    Directory of Open Access Journals (Sweden)

    José Hernández-Torruco

    2014-01-01

    Full Text Available Guillain-Barré syndrome (GBS) is a neurological disorder which has not been explored using clustering algorithms. Clustering algorithms perform more efficiently when they work only with relevant features. In this work, we applied correlation-based feature selection (CFS), chi-squared, information gain, symmetrical uncertainty, and consistency filter methods to select the most relevant features from a 156-feature real dataset. This dataset contains clinical, serological, and nerve conduction test data obtained from GBS patients. The most relevant feature subsets, determined with each filter method, were used to identify four subtypes of GBS present in the dataset. We used the partitioning around medoids (PAM) clustering algorithm to form four clusters, corresponding to the GBS subtypes. We applied the purity of each cluster as the evaluation measure. After experimentation, symmetrical uncertainty and information gain determined a feature subset of seven variables. These variables, assembled as a dataset, were used as input to PAM and reached a purity of 0.7984. This result leads to a first characterization of this syndrome using computational techniques.
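
    A minimal sketch of the purity measure used above to evaluate clusters against known subtypes; the clustering itself (PAM) and the GBS data are not reproduced, and the toy labels below are purely illustrative.

```python
# Cluster purity: the fraction of samples assigned to the majority class of
# their cluster. The clustering step (PAM) is not reproduced here.
import numpy as np

def purity(cluster_labels, class_labels):
    """Fraction of samples assigned to the majority class of their cluster."""
    cluster_labels = np.asarray(cluster_labels)
    class_labels = np.asarray(class_labels)
    total = 0
    for c in np.unique(cluster_labels):
        members = class_labels[cluster_labels == c]
        total += np.bincount(members).max()
    return total / len(class_labels)

# Toy example: 4 clusters against 4 true subtypes.
clusters = [0, 0, 0, 1, 1, 2, 2, 3, 3, 3]
subtypes = [0, 0, 1, 1, 1, 2, 2, 3, 3, 0]
print(f"purity = {purity(clusters, subtypes):.4f}")   # 0.8 for this toy example
```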

  13. Feature-Based Nonlocal Polarimetric SAR Filtering

    Directory of Open Access Journals (Sweden)

    Xiaoli Xing

    2017-10-01

    Full Text Available Polarimetric synthetic aperture radar (PolSAR) images are inherently contaminated by multiplicative speckle noise, which complicates image interpretation and image analyses. To reduce the speckle effect, several adaptive speckle filters have been developed based on the weighted average of similarity measures that commonly depend on a model or probability distribution, which are often affected by the distribution parameters and modeling of texture components. In this paper, a novel filtering method introduces the coefficient of variation (CV) and the Pauli basis (PB) to measure the similarity, and the two features are combined within the framework of nonlocal mean filtering. The CV is used to describe the complexity of various scenes and distinguish scene heterogeneity; moreover, the Pauli basis is able to express the polarimetric information in PolSAR image processing. The proposed filter combines the CV and Pauli basis to improve the estimation accuracy of the similarity weights. Then, the similarity of the features is deduced according to the test statistic. Subsequently, filtering proceeds using nonlocal weighted estimation. The performance of the proposed filter is tested with simulated images and real PolSAR images, which were acquired by the AIRSAR and ESAR systems. The qualitative and quantitative experiments indicate the validity of the proposed method by comparison with widely-used despeckling methods.

  14. Image Recommendation Algorithm Using Feature-Based Collaborative Filtering

    Science.gov (United States)

    Kim, Deok-Hwan

    As the multimedia contents market continues its rapid expansion, the amount of image content used in mobile phone services, digital libraries, and catalog services is increasing remarkably. In spite of this rapid growth, users experience high levels of frustration when searching for the desired image. Even though new images are profitable to the service providers, traditional collaborative filtering methods cannot recommend them. To solve this problem, in this paper, we propose a feature-based collaborative filtering (FBCF) method to reflect the user's most recent preference by representing his purchase sequence in the visual feature space. The proposed approach represents the images that have been purchased in the past as feature clusters in the multi-dimensional feature space and then selects neighbors by using an inter-cluster distance function between their feature clusters. Various experiments using real image data demonstrate that the proposed approach provides a higher quality recommendation and better performance than do typical collaborative filtering and content-based filtering techniques.

  15. Underwater Object Segmentation Based on Optical Features

    Directory of Open Access Journals (Sweden)

    Zhe Chen

    2018-01-01

    Full Text Available Underwater optical environments are seriously affected by various optical inputs, such as artificial light, sky light, and ambient scattered light. The latter two can block underwater object segmentation tasks, since they inhibit the emergence of objects of interest and distort image information, while artificial light can contribute to segmentation. Artificial light often focuses on the object of interest, and, therefore, we can initially identify the region of target objects if the collimation of artificial light is recognized. Based on this concept, we propose an optical feature extraction, calculation, and decision method to identify the collimated region of artificial light as a candidate object region. Then, the second phase employs a level set method to segment the objects of interest within the candidate region. This two-phase structure largely removes background noise and highlights the outline of underwater objects. We test the performance of the method with diverse underwater datasets, demonstrating that it outperforms previous methods.

  16. Feature-driven model-based segmentation

    Science.gov (United States)

    Qazi, Arish A.; Kim, John; Jaffray, David A.; Pekar, Vladimir

    2011-03-01

    The accurate delineation of anatomical structures is required in many medical image analysis applications. One example is radiation therapy planning (RTP), where traditional manual delineation is tedious, labor intensive, and can require hours of a clinician's valuable time. The majority of automated segmentation methods in RTP belong to either model-based or atlas-based approaches. One substantial limitation of model-based segmentation is that its accuracy may be restricted by the uncertainties in image content, specifically when segmenting low-contrast anatomical structures, e.g. soft tissue organs in computed tomography images. In this paper, we introduce a non-parametric feature enhancement filter which replaces raw intensity image data by a high level probabilistic map which guides the deformable model to reliably segment low-contrast regions. The method is evaluated by segmenting the submandibular and parotid glands in the head and neck region and comparing the results to manual segmentations in terms of the volume overlap. Quantitative results show that we are in overall good agreement with expert segmentations, achieving volume overlap of up to 80%. Qualitatively, we demonstrate that we are able to segment low-contrast regions, which otherwise are difficult to delineate with deformable models relying on distinct object boundaries from the original image data.

  17. Suitable features selection for monitoring thermal condition of electrical equipment using infrared thermography

    Science.gov (United States)

    Huda, A. S. N.; Taib, S.

    2013-11-01

    Monitoring the thermal condition of electrical equipment is necessary for maintaining the reliability of the electrical system. The degradation of electrical equipment can cause excessive overheating, which can lead to the eventual failure of the equipment. Additionally, failure of equipment incurs considerable maintenance cost and manpower, and can also be catastrophic, causing injuries or even deaths. Therefore, recognizing equipment conditions as normal or defective is an essential step towards maintaining the reliability and stability of the system. The study introduces infrared thermography based condition monitoring of electrical equipment. Manual analysis of thermal images for detecting defects and classifying the status of equipment takes a lot of time and effort and can also lead to incorrect diagnoses. An intelligent system that can separate the equipment conditions automatically could help to overcome these problems. This paper discusses an intelligent classification system for the conditions of equipment using neural networks. Three sets of features, namely first-order histogram based statistical features, grey level co-occurrence matrix features and component based intensity features, are extracted by image analysis and used as input data for the neural networks. The multilayer perceptron networks are trained using four different training algorithms, namely resilient backpropagation, Bayesian regularization, Levenberg-Marquardt and scaled conjugate gradient. The experimental results show that the component based intensity features perform better than the other two sets of features. Finally, after selecting the best features, the multilayer perceptron network trained using the Levenberg-Marquardt algorithm achieved the best results in classifying the conditions of electrical equipment.
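
    A hedged sketch of the first feature set named above, first-order histogram statistics, computed from a synthetic thermal image region; the exact feature list and bin count used in the study may differ.

```python
# First-order histogram statistics from a (synthetic) thermal image region.
# The feature list and bin count are illustrative assumptions.
import numpy as np
from scipy import stats

def first_order_statistics(region):
    """Mean, standard deviation, skewness, kurtosis and entropy of pixel intensities."""
    pixels = np.asarray(region, dtype=float).ravel()
    hist, _ = np.histogram(pixels, bins=64)
    p = hist / hist.sum()
    p = p[p > 0]
    entropy = -np.sum(p * np.log2(p))
    return {
        "mean": pixels.mean(),
        "std": pixels.std(),
        "skewness": stats.skew(pixels),
        "kurtosis": stats.kurtosis(pixels),
        "entropy": entropy,
    }

rng = np.random.default_rng(6)
hotspot = rng.normal(loc=60.0, scale=5.0, size=(32, 32))   # "defective" region, in deg C
print(first_order_statistics(hotspot))
```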

  18. Text feature extraction based on deep learning: a review.

    Science.gov (United States)

    Liang, Hong; Sun, Xiao; Sun, Yunlei; Gao, Yuan

    2017-01-01

    Selection of text feature items is a basic and important task for text mining and information retrieval. Traditional methods of feature extraction require handcrafted features. Hand-designing an effective feature is a lengthy process, but for new applications deep learning makes it possible to acquire effective feature representations from training data. As a new feature extraction method, deep learning has made achievements in text mining. The major difference between deep learning and conventional methods is that deep learning automatically learns features from big data, instead of adopting handcrafted features, which mainly depend on the prior knowledge of designers and can hardly take advantage of big data. Deep learning can automatically learn feature representations from big data, using models with millions of parameters. This review first outlines the common methods used in text feature extraction, then surveys frequently used deep learning methods in text feature extraction and their applications, and finally forecasts the application of deep learning in feature extraction.

  19. Localization of neural efficiency of the mathematically gifted brain through a feature subset selection method.

    Science.gov (United States)

    Zhang, Li; Gan, John Q; Wang, Haixian

    2015-10-01

    Based on the neural efficiency hypothesis and task-induced EEG gamma-band response (GBR), this study investigated the brain regions where neural resource could be most efficiently recruited by the math-gifted adolescents in response to varying cognitive demands. In this experiment, various GBR-based mental states were generated with three factors (level of mathematical ability, task complexity, and short-term learning) modulating the level of neural activation. A feature subset selection method based on the sequential forward floating search algorithm was used to identify an "optimal" combination of EEG channel locations, where the corresponding GBR feature subset could obtain the highest accuracy in discriminating pairwise mental states influenced by each experiment factor. The integrative results from multi-factor selections suggest that the right-lateral fronto-parietal system is highly involved in neural efficiency of the math-gifted brain, primarily including the bilateral superior frontal, right inferior frontal, right-lateral central and right temporal regions. By means of the localization method based on single-trial classification of mental states, new GBR features and EEG channel-based brain regions related to mathematical giftedness were identified, which could be useful for the brain function improvement of children/adolescents in mathematical learning through brain-computer interface systems.

  20. Biosensor method and system based on feature vector extraction

    Science.gov (United States)

    Greenbaum, Elias [Knoxville, TN]; Rodriguez, Jr., Miguel; Qi, Hairong [Knoxville, TN]; Wang, Xiaoling [San Jose, CA]

    2012-04-17

    A method of biosensor-based detection of toxins comprises the steps of providing at least one time-dependent control signal generated by a biosensor in a gas or liquid medium, and obtaining a time-dependent biosensor signal from the biosensor in the gas or liquid medium to be monitored or analyzed for the presence of one or more toxins selected from chemical, biological or radiological agents. The time-dependent biosensor signal is processed to obtain a plurality of feature vectors using at least one of amplitude statistics and a time-frequency analysis. At least one parameter relating to toxicity of the gas or liquid medium is then determined from the feature vectors based on reference to the control signal.
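
    A hedged sketch of the two feature families named in the claim above, amplitude statistics and a time-frequency analysis, applied to a synthetic time-dependent signal; the sampling rate, band edges, and spectrogram settings are assumptions, and this is not the patented processing chain.

```python
# Amplitude statistics plus time-frequency (spectrogram band power) features from a
# synthetic time-dependent biosensor-like signal; settings are assumptions.
import numpy as np
from scipy import signal, stats

rng = np.random.default_rng(7)
fs = 100.0                                          # sampling rate in Hz (assumed)
t = np.arange(0, 60, 1 / fs)
x = np.sin(2 * np.pi * 0.5 * t) + 0.3 * rng.normal(size=t.size)   # toy biosensor trace

# Amplitude statistics
amplitude_features = np.array([x.mean(), x.std(), stats.skew(x), stats.kurtosis(x)])

# Time-frequency analysis: average spectral power in a few frequency bands
f, t_spec, Sxx = signal.spectrogram(x, fs=fs, nperseg=256)
bands = [(0.0, 1.0), (1.0, 5.0), (5.0, 20.0)]
band_power = np.array([Sxx[(f >= lo) & (f < hi)].mean() for lo, hi in bands])

feature_vector = np.concatenate([amplitude_features, band_power])
print("feature vector:", np.round(feature_vector, 4))
```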

  1. Top scoring pairs for feature selection in machine learning and applications to cancer outcome prediction

    Directory of Open Access Journals (Sweden)

    Zhu Qifu

    2011-09-01

    Full Text Available Abstract Background The widely used k top scoring pair (k-TSP) algorithm is a simple yet powerful parameter-free classifier. It owes its success in many cancer microarray datasets to an effective feature selection algorithm that is based on relative expression ordering of gene pairs. However, its general robustness does not extend to some difficult datasets, such as those involving cancer outcome prediction, which may be due to the relatively simple voting scheme used by the classifier. We believe that the performance can be enhanced by separating its effective feature selection component and combining it with a powerful classifier such as the support vector machine (SVM). More generally the top scoring pairs generated by the k-TSP ranking algorithm can be used as a dimensionally reduced subspace for other machine learning classifiers. Results We developed an approach integrating the k-TSP ranking algorithm (TSP) with other machine learning methods, allowing combination of the computationally efficient, multivariate feature ranking of k-TSP with multivariate classifiers such as SVM. We evaluated this hybrid scheme (k-TSP+SVM) in a range of simulated datasets with known data structures. As compared with other feature selection methods, such as a univariate method similar to Fisher's discriminant criterion (Fisher), or a recursive feature elimination embedded in SVM (RFE), TSP is increasingly more effective than the other two methods as the informative genes become progressively more correlated, which is demonstrated both in terms of the classification performance and the ability to recover true informative genes. We also applied this hybrid scheme to four cancer prognosis datasets, in which k-TSP+SVM outperforms the k-TSP classifier in all datasets, and achieves either comparable or superior performance to that using SVM alone. In concurrence with what is observed in simulation, TSP appears to be a better feature selector than Fisher and RFE in some of the
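
    A minimal sketch of the pair-ranking idea only: each gene pair (i, j) is scored by the between-class difference in the probability that gene i is expressed below gene j; the full k-TSP voting and the k-TSP+SVM hybrid are not reproduced, and the data are synthetic.

```python
# Top-scoring-pair ranking sketch: score each gene pair (i, j) by the difference
# between classes in P(X_i < X_j). Ranking component only; data are synthetic.
import itertools
import numpy as np

rng = np.random.default_rng(8)
n_samples, n_genes = 80, 30                     # keep n_genes small: pairs grow as O(p^2)
X = rng.normal(size=(n_samples, n_genes))
y = rng.integers(0, 2, size=n_samples)
X[y == 1, 0] += 1.5                             # make the pair (0, 1) informative

def pair_score(i, j):
    p0 = np.mean(X[y == 0, i] < X[y == 0, j])   # P(X_i < X_j | class 0)
    p1 = np.mean(X[y == 1, i] < X[y == 1, j])   # P(X_i < X_j | class 1)
    return abs(p0 - p1)

pairs = list(itertools.combinations(range(n_genes), 2))
scores = [pair_score(i, j) for i, j in pairs]
order = np.argsort(scores)[::-1]
for rank in range(3):
    i, j = pairs[order[rank]]
    print(f"rank {rank + 1}: genes ({i}, {j}), score {scores[order[rank]]:.3f}")
```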

  2. Annotation-based feature extraction from sets of SBML models.

    Science.gov (United States)

    Alm, Rebekka; Waltemath, Dagmar; Wolfien, Markus; Wolkenhauer, Olaf; Henkel, Ron

    2015-01-01

    Model repositories such as BioModels Database provide computational models of biological systems for the scientific community. These models contain rich semantic annotations that link model entities to concepts in well-established bio-ontologies such as Gene Ontology. Consequently, thematically similar models are likely to share similar annotations. Based on this assumption, we argue that semantic annotations are a suitable tool to characterize sets of models. These characteristics improve model classification, allow the identification of additional features for model retrieval tasks, and enable the comparison of sets of models. In this paper we discuss four methods for annotation-based feature extraction from model sets. We tested all methods on sets of models in SBML format which were composed from BioModels Database. To characterize each of these sets, we analyzed and extracted concepts from three frequently used ontologies, namely Gene Ontology, ChEBI and SBO. We find that three of the four methods are suitable to determine characteristic features for arbitrary sets of models: the selected features vary depending on the underlying model set, and they are also specific to the chosen model set. We show that the identified features map to concepts that are higher up in the hierarchy of the ontologies than the concepts used for model annotations. Our analysis also reveals that the information content of concepts in ontologies and their usage for model annotation do not correlate. Annotation-based feature extraction enables the comparison of model sets, as opposed to existing methods for model-to-keyword comparison, or model-to-model comparison.

  3. A scale space approach for unsupervised feature selection in mass spectra classification for ovarian cancer detection.

    Science.gov (United States)

    Ceccarelli, Michele; d'Acierno, Antonio; Facchiano, Angelo

    2009-10-15

    Mass spectrometry spectra, widely used in proteomics studies as a screening tool for protein profiling and to detect discriminatory signals, are high dimensional data. A large number of local maxima (a.k.a. peaks) have to be analyzed as part of computational pipelines aimed at the realization of efficient predictive and screening protocols. With data of these dimensions and sample sizes, the risk of over-fitting and selection bias is pervasive. Therefore the development of bio-informatics methods based on unsupervised feature extraction can lead to general tools which can be applied to several fields of predictive proteomics. We propose a method for feature selection and extraction grounded on the theory of multi-scale spaces for high resolution spectra derived from analysis of serum. Then we use support vector machines for classification. In particular we use a database containing 216 sample spectra divided into 115 cancer and 91 control samples. The overall accuracy averaged over a large cross validation study is 98.18%. The area under the ROC curve of the best selected model is 0.9962. We improved on previously known results on the problem with the same data, with the advantage that the proposed method has an unsupervised feature selection phase. All the developed code, as MATLAB scripts, can be downloaded from http://medeaserver.isa.cnr.it/dacierno/spectracode.htm
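
    A sketch loosely inspired by the scale-space idea above, not its implementation: the (synthetic) spectrum is smoothed with Gaussians of increasing width and only maxima that persist across scales are kept as peaks; the scales, tolerance, and peak widths are assumptions.

```python
# Scale-space peak detection sketch: smooth the spectrum with Gaussians of
# increasing width and keep maxima that persist across all scales.
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import argrelmax

rng = np.random.default_rng(9)
mz = np.arange(5000)
spectrum = rng.normal(scale=0.2, size=mz.size)
for center in (800, 2300, 4100):                          # three true peaks
    spectrum += 3.0 * np.exp(-0.5 * ((mz - center) / 6.0) ** 2)

scales = [2, 4, 8, 16]
maxima_per_scale = [set(argrelmax(gaussian_filter1d(spectrum, s), order=20)[0])
                    for s in scales]

# Keep candidates from the finest scale that survive (within a tolerance) at all scales.
tol = 10
persistent = [m for m in maxima_per_scale[0]
              if all(any(abs(m - q) <= tol for q in other) for other in maxima_per_scale[1:])]
print("persistent peaks near m/z:", sorted(persistent))
```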

  4. EMG feature assessment for myoelectric pattern recognition and channel selection: a study with incomplete spinal cord injury.

    Science.gov (United States)

    Liu, Jie; Li, Xiaoyan; Li, Guanglin; Zhou, Ping

    2014-07-01

    Myoelectric pattern recognition with a large number of electromyogram (EMG) channels provides an approach to assessing motor control information available from the recorded muscles. In order to develop a practical myoelectric control system, a feature dependent channel reduction method was developed in this study to determine a small number of EMG channels for myoelectric pattern recognition analysis. The method selects appropriate raw EMG features for classification of different movements, using the minimum Redundancy Maximum Relevance (mRMR) and the Markov random field (MRF) methods to rank a large number of EMG features. A k-nearest neighbor (KNN) classifier was used to evaluate the performance of the selected features in terms of classification accuracy. The method was tested using surface EMG signals from 57 channels recorded from forearm and hand muscles of individuals with incomplete spinal cord injury (SCI). Our results demonstrate that appropriate selection of a small number of raw EMG features from different recording channels resulted in classification accuracies similar to those achieved by using all the EMG channels or features. Compared with the conventional sequential forward selection (SFS) method, the feature dependent method does not require repeated classifier implementation. It can effectively reduce redundant information not only across different channels, but also across different features within the same channel. Such hybrid feature-channel selection from a large number of EMG recording channels can reduce computational cost for implementation of a myoelectric pattern recognition based control system. Copyright © 2014 IPEM. Published by Elsevier Ltd. All rights reserved.
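
    A simplified mRMR-style ranking sketch: relevance is the mutual information between each feature and the class label, and redundancy is approximated by absolute correlation with already selected features; the paper's exact mRMR and MRF formulations are not reproduced.

```python
# Simplified mRMR-style selection: relevance = mutual information with the label,
# redundancy approximated by absolute correlation with already selected features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=300, n_features=30, n_informative=6,
                           random_state=0)
relevance = mutual_info_classif(X, y, random_state=0)
corr = np.abs(np.corrcoef(X, rowvar=False))

n_select = 8
selected = [int(np.argmax(relevance))]
while len(selected) < n_select:
    candidates = [i for i in range(X.shape[1]) if i not in selected]
    # mRMR criterion: maximize relevance minus mean redundancy with selected features
    scores = [relevance[i] - corr[i, selected].mean() for i in candidates]
    selected.append(candidates[int(np.argmax(scores))])

print("selected feature indices:", selected)
```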

  5. Statistical feature extraction based iris recognition system

    Indian Academy of Sciences (India)

    Atul Bansal

    features obtained from one's face [1], finger [2], voice [3] and/or iris [4, 5]. Iris recognition systems are widely used in high security areas. A number of researchers have proposed various algorithms for feature extraction. Little work [6, 7], however, has been reported using statistical techniques directly on pixel values in order to ...

  6. A novel feature extraction approach for microarray data based on multi-algorithm fusion.

    Science.gov (United States)

    Jiang, Zhu; Xu, Rong

    2015-01-01

    Feature extraction is one of the most important and effective methods to reduce dimensionality in data mining, given the emergence of high-dimensional data such as microarray gene expression data. Feature extraction for gene selection mainly serves two purposes. One is to identify certain disease-related genes. The other is to find a compact set of discriminative genes to build a pattern classifier with reduced complexity and improved generalization capabilities. Depending on the purpose of gene selection, two types of feature extraction algorithms, namely ranking-based feature extraction and set-based feature extraction, are employed in microarray gene expression data analysis. In ranking-based feature extraction, features are evaluated on an individual basis, without considering inter-relationships between features in general, while set-based feature extraction evaluates features based on their role in a feature set by taking into account dependency between features. Like learning methods, feature extraction has a problem with its generalization ability, i.e., robustness. However, the issue of robustness is often overlooked in feature extraction. In order to improve the accuracy and robustness of feature extraction for microarray data, a novel approach based on multi-algorithm fusion is proposed. By fusing different types of feature extraction algorithms to select features from the sample set, the proposed approach is able to improve feature extraction performance. The new approach is tested on gene expression datasets including Colon cancer data, CNS data, DLBCL data, and Leukemia data. The testing results show that the performance of this algorithm is better than existing solutions.
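
    One simple way to realize the fusion idea described above is to average the ranks produced by several scoring algorithms; the sketch below fuses ANOVA F-scores, mutual information, and random forest importances on synthetic data, which is not the paper's own fusion rule.

```python
# Rank-fusion sketch: average the ranks from three feature scoring algorithms.
# This is one generic fusion rule, not the paper's specific method.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif, mutual_info_classif

X, y = make_classification(n_samples=200, n_features=50, n_informative=8,
                           random_state=0)

scores = {
    "anova_f": f_classif(X, y)[0],
    "mutual_info": mutual_info_classif(X, y, random_state=0),
    "rf_importance": RandomForestClassifier(n_estimators=200, random_state=0)
                     .fit(X, y).feature_importances_,
}

# Convert each score vector to ranks (rank 0 = best) and average across scorers.
ranks = np.mean([np.argsort(np.argsort(-s)) for s in scores.values()], axis=0)
fused_top10 = np.argsort(ranks)[:10]
print("top 10 features after rank fusion:", fused_top10)
```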

  7. UNLABELED SELECTED SAMPLES IN FEATURE EXTRACTION FOR CLASSIFICATION OF HYPERSPECTRAL IMAGES WITH LIMITED TRAINING SAMPLES

    Directory of Open Access Journals (Sweden)

    A. Kianisarkaleh

    2015-12-01

    Full Text Available Feature extraction plays a key role in hyperspectral image classification. By using unlabeled samples, which are often available in practically unlimited numbers, unsupervised and semisupervised feature extraction methods show better performance when only a limited number of training samples exists. This paper illustrates the importance of selecting appropriate unlabeled samples for use in feature extraction methods and proposes a new method for unlabeled sample selection using spectral and spatial information. The proposed method has four parts: PCA, prior classification, posterior classification and sample selection. As the hyperspectral image passes through these parts, the selected unlabeled samples can be used in arbitrary feature extraction methods. The effectiveness of the proposed unlabeled sample selection in unsupervised and semisupervised feature extraction is demonstrated using two real hyperspectral datasets. Results show that by selecting appropriate unlabeled samples, the proposed method can improve the performance of feature extraction methods and increase classification accuracy.

  8. Automatic selective feature retention in patient specific elastic surface registration

    CSIR Research Space (South Africa)

    Jansen van Rensburg, GJ

    2011-01-01

    Full Text Available . An intelligent mesh morphing strategy where dissimilar feature surfaces can be extracted automatically also greatly reduces the amount of user input required. REFERENCES [1] R. Bryan, P.S. Mohan, A. Hopkins, F. Galloway, M. Taylor and P. Nair, Statitical...

  9. IMAGE RETRIEVAL USING CONTENT BASED COLOR, SHAPE AND TEXTURE FEATURES

    OpenAIRE

    K. NARESH BABU; SAKE POTHALAIAH; Dr. K ASHOK BABU

    2010-01-01

    Content-based image retrieval (CBIR) is an important research area for managing large image databases and archives. Extraction of invariant features is the basis of CBIR. This paper focuses on the problem of texture, color and shape feature extraction. Using only one type of feature information to compare images can be less accurate than using more than one feature. Therefore, many image retrieval systems use several kinds of feature information, such as color, shape and other features. W...

  10. Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data

    Directory of Open Access Journals (Sweden)

    Harris Lyndsay N

    2006-04-01

    Full Text Available Abstract Background Like microarray-based investigations, high-throughput proteomics techniques require machine learning algorithms to identify biomarkers that are informative for biological classification problems. Feature selection and classification algorithms need to be robust to noise and outliers in the data. Results We developed a recursive support vector machine (R-SVM) algorithm to select important genes/biomarkers for the classification of noisy data. We compared its performance to a similar, state-of-the-art method (SVM recursive feature elimination or SVM-RFE), paying special attention to the ability to recover the true informative genes/biomarkers and the robustness to outliers in the data. Simulation experiments show that a 5%-20% improvement over SVM-RFE can be achieved with regard to these properties. The SVM-based methods are also compared with a conventional univariate method and their respective strengths and weaknesses are discussed. R-SVM was applied to two sets of SELDI-TOF-MS proteomics data, one from a human breast cancer study and the other from a study on rat liver cirrhosis. Important biomarkers found by the algorithm were validated by follow-up biological experiments. Conclusion The proposed R-SVM method is suitable for analyzing noisy high-throughput proteomics and microarray data and it outperforms SVM-RFE in the robustness to noise and in the ability to recover informative features. The multivariate SVM-based method outperforms the univariate method in the classification performance, but univariate methods can reveal more of the differentially expressed features especially when there are correlations between the features.

  11. Feature and Model Selection in Feedforward Neural Networks

    Science.gov (United States)

    1994-06-01

    Symbol definitions for the feedforward network weight-update rule (partially recovered): the output of middle node j; M, the number of feature inputs; w_ij, the weight from input node i to middle node j; theta_0, the input-layer bias term; the updated versus the old weight from input i to middle node j; eta, the step size; and delta_j = (d ...

  12. Space-based RF signal classification using adaptive wavelet features

    Energy Technology Data Exchange (ETDEWEB)

    Caffrey, M.; Briles, S.

    1995-04-01

    RF signals are dispersed in frequency as they propagate through the ionosphere. For wide-band signals, this results in nonlinearly-chirped-frequency, transient signals in the VHF portion of the spectrum. This ionospheric dispersion provides a means of discriminating wide-band transients from other signals (e.g., continuous-wave carriers, burst communications, chirped-radar signals, etc.). The transient nature of these dispersed signals makes them candidates for wavelet feature selection. Rather than choosing a wavelet ad hoc, we adaptively compute an optimal mother wavelet via a neural network. Gaussian-weighted, linear frequency modulated (GLFM) wavelets are linearly combined by the network to generate our application-specific mother wavelet, which is optimized for its capacity to select features that discriminate between the dispersed signals and clutter (e.g., multiple continuous-wave carriers), not for its ability to represent the dispersed signal. The resulting mother wavelet is then used to extract features for a neural network classifier. The performance of the adaptive wavelet classifier is then compared to that of an FFT-based neural network classifier.

  13. Effective automated feature construction and selection for classification of biological sequences.

    Directory of Open Access Journals (Sweden)

    Uday Kamath

    Full Text Available Many open problems in bioinformatics involve elucidating underlying functional signals in biological sequences. DNA sequences, in particular, are characterized by rich architectures in which functional signals are increasingly found to combine local and distal interactions at the nucleotide level. Problems of interest include detection of regulatory regions, splice sites, exons, hypersensitive sites, and more. These problems naturally lend themselves to formulation as classification problems in machine learning. When classification is based on features extracted from the sequences under investigation, success is critically dependent on the chosen set of features. We present an algorithmic framework (EFFECT) for automated detection of functional signals in biological sequences. We focus here on classification problems involving DNA sequences, which state-of-the-art work in machine learning shows to be challenging and to involve complex combinations of local and distal features. EFFECT uses a two-stage process to first construct a set of candidate sequence-based features and then select a most effective subset for the classification task at hand. Both stages make heavy use of evolutionary algorithms to efficiently guide the search towards informative features capable of discriminating between sequences that contain a particular functional signal and those that do not. To demonstrate its generality, EFFECT is applied to three separate problems of importance in DNA research: the recognition of hypersensitive sites, splice sites, and ALU sites. Comparisons with state-of-the-art algorithms show that the framework is both general and powerful. In addition, a detailed analysis of the constructed features shows that they contain valuable biological information about DNA architecture, allowing biologists and other researchers to directly inspect the features and potentially use the insights obtained to assist wet-laboratory studies on retainment or modification

  14. Advances in feature selection methods for hyperspectral image processing in food industry applications: a review.

    Science.gov (United States)

    Dai, Qiong; Cheng, Jun-Hu; Sun, Da-Wen; Zeng, Xin-An

    2015-01-01

    There is an increased interest in the applications of hyperspectral imaging (HSI) for assessing food quality, safety, and authenticity. HSI provides an abundance of spatial and spectral information from foods by combining both spectroscopy and imaging, resulting in hundreds of contiguous wavebands for each spatial position of food samples, a situation also known as the curse of dimensionality. It is desirable to employ feature selection algorithms to decrease the computation burden and increase prediction accuracy, which is especially relevant in the development of online applications. Recently, a variety of feature selection algorithms have been proposed that can be categorized into three groups based on their searching strategy, namely complete search, heuristic search and random search. This review mainly introduces the fundamentals of each algorithm, illustrates its applications in hyperspectral data analysis in the food field, and discusses the advantages and disadvantages of these algorithms. It is hoped that this review will provide a guideline for feature selection and data processing in the future development of hyperspectral imaging techniques for foods.

  15. The effect of feature selection methods on computer-aided detection of masses in mammograms.

    Science.gov (United States)

    Hupse, Rianne; Karssemeijer, Nico

    2010-05-21

    In computer-aided diagnosis (CAD) research, feature selection methods are often used to improve the generalization performance of classifiers and shorten computation times. In an application that detects malignant masses in mammograms, we investigated the effect of using a selection criterion that is similar to the final performance measure we are optimizing, namely the mean sensitivity of the system in a predefined range of the free-response receiver operating characteristic (FROC) curve. To obtain the generalization performance of the selected feature subsets, a cross-validation procedure was performed on a dataset containing 351 abnormal and 7879 normal regions, each region providing a set of 71 mass features. The same number of noise features, not containing any information, was added to investigate the ability of the feature selection algorithms to distinguish between useful and non-useful features. It was found that significantly higher performance was obtained using feature sets selected by the general test statistic Wilks' lambda than using feature sets selected by the more specific FROC measure. Feature selection leads to better performance when compared to a system in which all features are used.
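
    Ranking features by a univariate Wilks' lambda, as used for selection above, can be sketched as follows; the statistic is the within-class sum of squares divided by the total sum of squares, so smaller values indicate better separation. The synthetic data and the cut-off are illustrative assumptions.

```python
# A sketch of per-feature ranking with the univariate Wilks' lambda statistic
# (smaller values indicate better class separation); the data here are synthetic.
import numpy as np

def wilks_lambda(x, y):
    """Within-group sum of squares divided by total sum of squares for one feature."""
    ss_total = np.sum((x - x.mean()) ** 2)
    ss_within = sum(np.sum((x[y == c] - x[y == c].mean()) ** 2) for c in np.unique(y))
    return ss_within / ss_total

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=300)
X = rng.normal(size=(300, 71))
X[:, 0] += 2.0 * y            # make feature 0 informative

scores = np.array([wilks_lambda(X[:, j], y) for j in range(X.shape[1])])
print("best features (lowest lambda):", np.argsort(scores)[:5])
```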

  16. Optimal Feature Selection in High-Dimensional Discriminant Analysis.

    Science.gov (United States)

    Kolar, Mladen; Liu, Han

    2015-02-01

    We consider the high-dimensional discriminant analysis problem. For this problem, different methods have been proposed and justified by establishing exact convergence rates for the classification risk, as well as the ℓ2 convergence results to the discriminative rule. However, a sharp theoretical analysis of the variable selection performance of these procedures has not been established, even though model interpretation is of fundamental importance in scientific data analysis. This paper bridges the gap by providing sharp sufficient conditions for consistent variable selection using the sparse discriminant analysis (Mai et al., 2012). Through careful analysis, we establish rates of convergence that are significantly faster than the best known results and admit an optimal scaling of the sample size n, dimensionality p, and sparsity level s in the high-dimensional setting. Sufficient conditions are complemented by the necessary information-theoretic limits on the variable selection problem in the context of high-dimensional discriminant analysis. Exploiting a numerical equivalence result, our analysis also establishes the optimal results for the ROAD estimator (Fan et al., 2012) and the sparse optimal scaling estimator (Clemmensen et al., 2011). Furthermore, we analyze an exhaustive search procedure, whose performance serves as a benchmark, and show that it is variable selection consistent under weaker conditions. Extensive simulations demonstrating the sharpness of the bounds are also provided.

  17. Sequential and Mixed Genetic Algorithm and Learning Automata (SGALA, MGALA) for Feature Selection in QSAR.

    Science.gov (United States)

    MotieGhader, Habib; Gharaghani, Sajjad; Masoudi-Sobhanzadeh, Yosef; Masoudi-Nejad, Ali

    2017-01-01

    Feature selection is of great importance in Quantitative Structure-Activity Relationship (QSAR) analysis. This problem has been solved using meta-heuristic algorithms such as GA, PSO, ACO and so on. In this work, two novel hybrid meta-heuristic algorithms, Sequential GA and LA (SGALA) and Mixed GA and LA (MGALA), which are based on genetic algorithms and learning automata, are proposed for QSAR feature selection. The SGALA algorithm uses the advantages of the genetic algorithm and learning automata sequentially, while the MGALA algorithm uses them simultaneously. We applied our proposed algorithms to select the minimum possible number of features from three different datasets and observed that the MGALA and SGALA algorithms had the best outcome individually and on average compared to other feature selection algorithms. Through comparison of our proposed algorithms, we deduced that the rate of convergence to the optimal result of MGALA and SGALA was better than that of the GA, ACO, PSO and LA algorithms. Finally, the results of the GA, ACO, PSO, LA, SGALA, and MGALA algorithms were used as input to an LS-SVR model, and the results showed that the LS-SVR model had more predictive ability with the input from the SGALA and MGALA algorithms than with the input from all other mentioned algorithms. Therefore, the results corroborate that not only is the predictive efficiency of the proposed algorithms better, but their rate of convergence is also superior to all other mentioned algorithms.
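
    For illustration, a plain genetic-algorithm wrapper for feature-subset selection is sketched below. It is only a GA baseline on synthetic regression data, not the SGALA/MGALA hybrids or the LS-SVR model described above; the population size, rates and SVR fitness are assumptions.

```python
# A plain genetic-algorithm wrapper for feature-subset selection (baseline sketch
# only, not SGALA/MGALA); fitness, population size and rates are illustrative.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

X, y = make_regression(n_samples=80, n_features=40, n_informative=8, random_state=1)
rng = np.random.default_rng(1)
pop = rng.integers(0, 2, size=(20, X.shape[1]))        # binary chromosomes

def fitness(mask):
    if mask.sum() == 0:
        return -np.inf
    # reward cross-validated R^2 of an SVR trained on the selected subset
    return cross_val_score(SVR(), X[:, mask.astype(bool)], y, cv=3).mean()

for generation in range(15):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]        # truncation selection
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(0, 10, size=2)]
        cut = rng.integers(1, X.shape[1])               # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(X.shape[1]) < 0.02            # mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.vstack([parents, np.array(children)])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best))
```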

  18. Feature selection of seismic waveforms for long period event detection at Cotopaxi Volcano

    Science.gov (United States)

    Lara-Cueva, R. A.; Benítez, D. S.; Carrera, E. V.; Ruiz, M.; Rojo-Álvarez, J. L.

    2016-04-01

    Volcano Early Warning Systems (VEWS) have become a research topic aimed at protecting human lives and reducing material losses. In this setting, event detection criteria based on classification using machine learning techniques have proven useful, and a number of systems have been proposed in the literature. However, to the best of our knowledge, no comprehensive and principled study has been conducted to compare the influence of the many different sets of possible features that have been used as input spaces in previous works. We present an automatic recognition system for volcano seismicity that considers feature extraction, event classification, and subsequent event detection, in order to reduce the processing time as a first step towards a highly reliable automatic detection system operating in real time. We compiled and extracted a comprehensive set of temporal, moving average, spectral, and scale-domain features for separating long-period seismic events from background noise. We benchmarked two usual kinds of feature selection techniques, namely filter (mutual information and statistical dependence) and embedded (cross-validation and pruning), each of them using appropriate classification algorithms such as k-Nearest Neighbors (k-NN) and Decision Trees (DT). We applied this approach to the seismicity recorded at Cotopaxi Volcano in Ecuador during 2009 and 2010. The best results were obtained using a 15 s segmentation window, a feature matrix in the frequency domain, and a DT classifier, yielding 99% detection accuracy and sensitivity. Selected features and their interpretation were consistent among different input spaces, in simple terms of amplitude and spectral content. Our study provides the framework for an event detection system with high accuracy and reduced computational requirements.
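
    The filter-plus-classifier combination reported above (mutual information ranking followed by a decision tree) can be sketched as follows; the synthetic feature matrix and the number of retained features are assumptions standing in for the temporal/spectral features of the seismic records.

```python
# A sketch of a mutual-information filter followed by a decision-tree classifier;
# the synthetic data stand in for the temporal/spectral feature matrix of the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=60, n_informative=8, random_state=0)

mi = mutual_info_classif(X, y, random_state=0)          # filter: MI with the label
top = np.argsort(mi)[::-1][:10]                         # keep the 10 most informative

acc = cross_val_score(DecisionTreeClassifier(random_state=0), X[:, top], y, cv=5).mean()
print("selected features:", top, "cross-validated accuracy: %.3f" % acc)
```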

  19. Classification methodology and feature selection to assist fault location in power distribution systems

    Directory of Open Access Journals (Sweden)

    Juan José Mora Flórez

    2008-01-01

    Full Text Available A classification methodology based on Support Vector Machines (SVM) is proposed to locate the faulted zone in power distribution networks. The goal is to reduce the multiple-estimation problem inherent in those methods that use single-end measurements (in the substation) to estimate the fault location in radial systems. A selection of features or descriptors obtained from voltages and currents measured in the substation is analyzed and used as input to the SVM classifier. The performance of the fault locator with several combinations of these features has been evaluated according to its capability to discriminate between faults in different zones located at similar distances. An application example illustrates the precision in locating the faulted zone obtained with the proposed methodology in a simulated framework. The proposal provides appropriate information for the prevention and timely attention of faults, requires minimum investment and overcomes the multiple-estimation problem of the classic impedance-based methods.

  20. Feature selection and classification model construction on type 2 diabetic patients' data.

    Science.gov (United States)

    Huang, Yue; McCullagh, Paul; Black, Norman; Harper, Roy

    2007-11-01

    Diabetes affects between 2% and 4% of the global population (up to 10% in the over 65 age group), and its avoidance and effective treatment are undoubtedly crucial public health and health economics issues in the 21st century. The aim of this research was to identify significant factors influencing diabetes control, by applying feature selection to a working patient management system to assist with ranking, classification and knowledge discovery. The classification models can be used to determine individuals in the population with poor diabetes control status based on physiological and examination factors. The diabetic patients' information was collected by Ulster Community and Hospitals Trust (UCHT) from year 2000 to 2004 as part of clinical management. In order to discover key predictors and latent knowledge, data mining techniques were applied. To improve computational efficiency, a feature selection technique, feature selection via supervised model construction (FSSMC), an optimisation of ReliefF, was used to rank the important attributes affecting diabetic control. After selecting suitable features, three complementary classification techniques (Naïve Bayes, IB1 and C4.5) were applied to the data to predict how well the patients' condition was controlled. FSSMC identified patients' 'age', 'diagnosis duration', the need for 'insulin treatment', 'random blood glucose' measurement and 'diet treatment' as the most important factors influencing blood glucose control. Using the reduced features, a best predictive accuracy of 95% and a sensitivity of 98% were achieved. The influence of factors such as the 'type of care' delivered, the use of 'home monitoring', and the importance of 'smoking' on outcome can contribute to domain knowledge in diabetes control. In the care of patients with diabetes, the more important factors identified, namely patients' 'age', 'diagnosis duration' and 'family history', are beyond the control of physicians. Treatment methods such as 'insulin', 'diet
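
    FSSMC is described as an optimisation of ReliefF; the sketch below shows only the basic Relief weighting idea on synthetic data (reward features that separate a sample from its nearest miss, penalise those that differ from its nearest hit), not the FSSMC procedure itself.

```python
# A sketch of basic Relief feature weighting for a binary problem (not FSSMC);
# the synthetic data and number of iterations are illustrative.
import numpy as np

def relief(X, y, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-12)  # scale to [0, 1]
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        i = rng.integers(len(X))
        dists = np.abs(X - X[i]).sum(axis=1)
        dists[i] = np.inf                                # exclude the sample itself
        hit = np.argmin(np.where(y == y[i], dists, np.inf))   # nearest same-class sample
        miss = np.argmin(np.where(y != y[i], dists, np.inf))  # nearest other-class sample
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 20))
X[:, 3] += 1.5 * y                      # feature 3 carries the class signal
print("top-ranked features:", np.argsort(relief(X, y))[::-1][:5])
```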

  1. Burned Area Mapping Using Support Vector Machines and the FuzCoC Feature Selection Method on VHR IKONOS Imagery

    Directory of Open Access Journals (Sweden)

    Eleni Dragozi

    2014-12-01

    Full Text Available The ever increasing need for accurate burned area mapping has led to a number of studies that focus on improving mapping accuracy and effectiveness. In this work, we investigate the influence of derivative spectral and spatial features on accurately mapping recently burned areas using VHR IKONOS imagery. Our analysis considers both pixel- and object-based approaches, using two advanced image analysis techniques: (a) an efficient feature selection method based on the Fuzzy Complementary Criterion (FuzCoC) and (b) the Support Vector Machine (SVM) classifier. In both cases (pixel- and object-based), a number of higher-order spectral and spatial features were produced from the original image. The proposed methodology was tested in areas of Greece recently affected by severe forest fires, namely Parnitha and Rhodes. The extensive comparative analysis indicates that the SVM object-based scheme exhibits higher classification accuracy than the respective pixel-based one. Additionally, the accuracy increased with the addition of derivative features and the subsequent implementation of the FuzCoC feature selection (FS) method. Apart from the positive effect on classification accuracy, the application of the FuzCoC FS method significantly reduces the computational requirements and facilitates the manipulation of the large data volume. In both cases (pixel and object), the results confirmed that the use of an efficient feature selection method is a prerequisite step when extra information through higher-order features is added to the classification process of VHR imagery for burned area mapping.

  2. Evaluating EMG Feature and Classifier Selection for Application to Partial-Hand Prosthesis Control

    Directory of Open Access Journals (Sweden)

    Adenike A. Adewuyi

    2016-10-01

    Full Text Available Pattern recognition-based myoelectric control of upper limb prostheses has the potential to restore control of multiple degrees of freedom. Though this control method has been extensively studied in individuals with higher-level amputations, few studies have investigated its effectiveness for individuals with partial-hand amputations. Most partial-hand amputees retain a functional wrist, and the ability of pattern recognition-based methods to correctly classify hand motions from different wrist positions is not well studied. In this study, focusing on partial-hand amputees, we evaluate (1) the performance of non-linear and linear pattern recognition algorithms and (2) the performance of optimal EMG feature subsets for classification of four hand motion classes in different wrist positions for 16 non-amputees and 4 amputees. Our results show that linear discriminant analysis and linear and non-linear artificial neural networks perform significantly better than quadratic discriminant analysis for both non-amputees and partial-hand amputees. For amputees, including information from multiple wrist positions significantly decreased error (p<0.001), but no further significant decrease in error occurred when more than 4, 2, or 3 positions were included for the extrinsic (p=0.07), intrinsic (p=0.06), or combined extrinsic and intrinsic muscle EMG (p=0.08), respectively. Finally, we found that a feature set determined by selecting optimal features from each channel outperformed the commonly used time-domain (p<0.001) and time-domain/autoregressive feature sets (p<0.01). This method can be used as a screening filter to select the features from each channel that provide the best classification of hand postures across different wrist positions.
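
    The kind of time-domain EMG features referred to above (the commonly used TD set) can be sketched as below; the threshold, window length and synthetic signal are illustrative assumptions, and the per-channel optimal subsets in the study are selected on top of such features.

```python
# A sketch of common EMG time-domain features computed on one channel window:
# mean absolute value, waveform length, zero crossings and slope-sign changes.
import numpy as np

def td_features(window, thresh=0.01):
    mav = np.mean(np.abs(window))                                   # mean absolute value
    wl = np.sum(np.abs(np.diff(window)))                            # waveform length
    zc = np.sum((window[:-1] * window[1:] < 0) &
                (np.abs(np.diff(window)) > thresh))                 # zero crossings
    d = np.diff(window)
    ssc = np.sum((d[:-1] * d[1:] < 0) & (np.abs(d[:-1]) > thresh))  # slope-sign changes
    return np.array([mav, wl, zc, ssc])

rng = np.random.default_rng(0)
emg_window = rng.normal(scale=0.1, size=256)     # stand-in for one 256-sample window
print(td_features(emg_window))
```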

  3. Classifying Human Voices by Using Hybrid SFX Time-Series Preprocessing and Ensemble Feature Selection

    Science.gov (United States)

    Wong, Raymond

    2013-01-01

    Voice biometrics relies on a physiological characteristic: each individual person's voice is different. Owing to this uniqueness, voice classification has found useful applications in classifying speakers' gender, mother tongue or ethnicity (accent), emotional state, identity verification, verbal command control, and so forth. In this paper, we adopt a new preprocessing method named Statistical Feature Extraction (SFX) for extracting important features for training a classification model, based on piecewise transformation treating an audio waveform as a time series. Using SFX we can faithfully remodel the statistical characteristics of the time series; together with spectral analysis, a substantial number of features are extracted in combination. An ensemble is utilized to select only the influential features to be used in classification model induction. We focus on comparing the effects of various popular data mining algorithms on multiple datasets. Our experiment consists of classification tests over four typical categories of human voice data, namely Female and Male, Emotional Speech, Speaker Identification, and Language Recognition. The experiments yield encouraging results supporting the fact that heuristically choosing significant features from both the time and frequency domains indeed produces better performance in voice classification than traditional signal processing techniques alone, like wavelets and LPC-to-CC. PMID:24288684

  4. Palmprint Based Verification System Using SURF Features

    Science.gov (United States)

    Srinivas, Badrinath G.; Gupta, Phalguni

    This paper describes the design and development of a prototype of a robust biometric system for verification. The system uses features extracted from the human hand using the Speeded Up Robust Features (SURF) operator. The hand image is acquired using a low-cost scanner. The extracted palmprint region is robust to hand translation and rotation on the scanner. The system is tested on the IITK database of 200 images and the PolyU database of 7751 images. The system is found to be robust with respect to translation and rotation. It has a FAR of 0.02%, an FRR of 0.01% and an accuracy of 99.98%, and can be a suitable system for civilian applications and high-security environments.

  5. Feature-based automatic color calibration for networked camera system

    Science.gov (United States)

    Yamamoto, Shoji; Taki, Keisuke; Tsumura, Norimichi; Nakaguchi, Toshiya; Miyake, Yoichi

    2011-01-01

    In this paper, we have developed a feature-based automatic color calibration that uses area-based detection and an adaptive nonlinear regression method. Simple chartless color matching is achieved by exploiting the overlapping image areas between cameras. Accurate detection of common objects is achieved by area-based detection that combines MSER with SIFT. Adaptive color calibration using the colors of the detected objects is computed by a nonlinear regression method. This method can indicate the contribution of an object's color to the calibration, and automatic selection notification for the user is performed by this function. Experimental results show that the accuracy of the calibration improves gradually. It is clear that this method can support the practical use of multi-camera color calibration if enough samples are obtained.

  6. Analytical Features: A Knowledge-Based Approach to Audio Feature Generation

    Directory of Open Access Journals (Sweden)

    Pachet François

    2009-01-01

    Full Text Available We present a feature generation system designed to create audio features for supervised classification tasks. The main contribution to feature generation studies is the notion of analytical features (AFs), a construct designed to support the representation of knowledge about audio signal processing. We describe the most important aspects of AFs, in particular their dimensional type system, on which pattern-based random generators, heuristics, and rewriting rules are based. We show how AFs generalize or improve previous approaches used in feature generation. We report on several projects using AFs for difficult audio classification tasks, demonstrating their advantage over standard audio features. More generally, we propose analytical features as a paradigm to bring raw signals into the world of symbolic computation.

  7. Statistical feature extraction based iris recognition system

    Indian Academy of Sciences (India)

    Performance of the proposed iris recognition system (IRS) has been measured by recording the false acceptance rate (FAR) and false rejection rate (FRR) at different thresholds in the distance metric. System performance has been evaluated by computing statistical features along two directions, namely, the radial direction of ...

  8. Surface characterization based upon significant topographic features

    Energy Technology Data Exchange (ETDEWEB)

    Blanc, J; Grime, D; Blateyron, F, E-mail: fblateyron@digitalsurf.fr [Digital Surf, 16 rue Lavoisier, F-25000 Besancon (France)

    2011-08-19

    Watershed segmentation and Wolf pruning, as defined in ISO 25178-2, allow the detection of significant features on surfaces and their characterization in terms of dimension, area, volume, curvature, shape or morphology. These new tools provide a robust way to specify functional surfaces.

  9. Surface characterization based upon significant topographic features

    International Nuclear Information System (INIS)

    Blanc, J; Grime, D; Blateyron, F

    2011-01-01

    Watershed segmentation and Wolf pruning, as defined in ISO 25178-2, allow the detection of significant features on surfaces and their characterization in terms of dimension, area, volume, curvature, shape or morphology. These new tools provide a robust way to specify functional surfaces.

  10. Feature Selection Methods for Early Predictive Biomarker Discovery Using Untargeted Metabolomic Data.

    Science.gov (United States)

    Grissa, Dhouha; Pétéra, Mélanie; Brandolini, Marion; Napoli, Amedeo; Comte, Blandine; Pujos-Guillot, Estelle

    2016-01-01

    Untargeted metabolomics is a powerful phenotyping tool for better understanding biological mechanisms involved in human pathology development and identifying early predictive biomarkers. This approach, based on multiple analytical platforms, such as mass spectrometry (MS), chemometrics and bioinformatics, generates massive and complex data that need appropriate analyses to extract the biologically meaningful information. Despite the various tools available, it is still a challenge to handle such large and noisy datasets with a limited number of individuals without risking overfitting. Moreover, when the objective is focused on the identification of early predictive markers of clinical outcome, a few years before occurrence, it becomes essential to use appropriate algorithms and workflows to be able to discover subtle effects among this large amount of data. In this context, this work consists of studying a workflow describing the general feature selection process, using knowledge discovery and data mining methodologies to propose advanced solutions for predictive biomarker discovery. The strategy was focused on evaluating a combination of numeric-symbolic approaches for feature selection with the objective of obtaining the best combination of metabolites producing an effective and accurate predictive model. Relying first on numerical approaches, especially machine learning methods (SVM-RFE, RF, RF-RFE) and univariate statistical analyses (ANOVA), a comparative study was performed on an original metabolomic dataset and reduced subsets. As a resampling method, LOOCV was applied to minimize the risk of overfitting. The best k features obtained with different importance scores from the combination of these different approaches were compared and allowed determination of variable stability using Formal Concept Analysis. The results revealed the interest of RF-Gini combined with ANOVA for feature selection, as these two complementary methods allowed selecting the 48
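
    The retained RF-Gini-plus-ANOVA combination can be sketched as below: rank features by random-forest Gini importance and by univariate ANOVA F-scores, then keep those supported by both rankings. The synthetic data, the cut-off of 48 and the intersection rule are illustrative assumptions, not the paper's exact workflow.

```python
# A sketch combining random-forest Gini importance with univariate ANOVA F-scores;
# the intersection rule and cut-offs are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif

X, y = make_classification(n_samples=120, n_features=300, n_informative=15, random_state=0)

gini = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y).feature_importances_
f_scores, _ = f_classif(X, y)

top_rf = set(np.argsort(gini)[::-1][:48])
top_anova = set(np.argsort(f_scores)[::-1][:48])
selected = sorted(top_rf & top_anova)            # features supported by both rankings
print(len(selected), "features selected:", selected[:10])
```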

  11. HEART RATE VARIABILITY CLASSIFICATION USING SADE-ELM CLASSIFIER WITH BAT FEATURE SELECTION

    Directory of Open Access Journals (Sweden)

    R Kavitha

    2017-07-01

    Full Text Available The electrical activity of the human heart is measured by the vital biomedical signal called the ECG. This electrocardiogram is employed as a crucial source of diagnostic information on a patient's cardiopathy. Cardiac disease is monitored and diagnosed by documenting and processing the electrocardiogram (ECG) impulses. In recent years, much research has been done on developing enhanced methods to identify risk in a patient's condition by processing and analysing the ECG signal. This analysis of the signal helps to find cardiac abnormalities, arrhythmias, and many other heart problems. The ECG signal is processed to detect variability in heart rhythm; heart rate variability (HRV) is calculated from the time intervals between heart beats, i.e., the variation in the beat-to-beat interval, and is an essential aspect of diagnosing the properties of the heart. Recent developments enhance this potential with the aid of non-linear metrics combined with feature selection. In this paper, the fundamental elements are extracted from the ECG signal for the feature selection process, where the Bat algorithm is employed to select the best features, which are then presented to the classifier for accurate classification. The popular machine learning algorithm ELM is used for classification and is integrated with an evolutionary algorithm, yielding the Self-Adaptive Differential Evolution Extreme Learning Machine (SADE-ELM), to improve the reliability of classification. It is combined with an Effective Fuzzy Kohonen Clustering Network (EFKCN) to increase the accuracy of HRV classification. The experiments carried out show that the SADE-ELM method improves precision while concurrently optimizing computation time.
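
    The HRV features mentioned above are computed from beat-to-beat (RR) intervals; a sketch of standard time-domain measures is shown below. The synthetic RR series is an assumption, and the Bat-algorithm selection and SADE-ELM classification stages are not shown.

```python
# A sketch of standard time-domain HRV measures computed from RR intervals (ms).
import numpy as np

def hrv_time_domain(rr_ms):
    diff = np.diff(rr_ms)
    return {
        "mean_rr": np.mean(rr_ms),                      # average RR interval (ms)
        "sdnn": np.std(rr_ms, ddof=1),                  # overall variability
        "rmssd": np.sqrt(np.mean(diff ** 2)),           # short-term variability
        "pnn50": np.mean(np.abs(diff) > 50) * 100.0,    # % successive diffs > 50 ms
    }

rng = np.random.default_rng(0)
rr = 800 + rng.normal(scale=40, size=300)               # ~75 bpm with some variability
print(hrv_time_domain(rr))
```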

  12. Welding Diagnostics by Means of Particle Swarm Optimization and Feature Selection

    Directory of Open Access Journals (Sweden)

    J. Mirapeix

    2012-01-01

    Full Text Available In a previous contribution, a welding diagnostics approach based on plasma optical spectroscopy was presented. It consisted of the employment of optimization algorithms and synthetic spectra to obtain the participation profiles of the species present in the plasma. A modification of the model is discussed here: on the one hand, the controlled random search algorithm has been substituted by a particle swarm optimization implementation; on the other hand, a feature selection stage has been included to determine those spectral windows where the optimization process will take place. Both experimental and field tests are shown to illustrate the performance of the solution, which improves the results of the previous work.

  13. Using multiple classifiers for predicting the risk of endovascular aortic aneurysm repair re-intervention through hybrid feature selection.

    Science.gov (United States)

    Attallah, Omneya; Karthikesalingam, Alan; Holt, Peter Je; Thompson, Matthew M; Sayers, Rob; Bown, Matthew J; Choke, Eddie C; Ma, Xianghong

    2017-11-01

    Feature selection is essential in the medical domain; however, the process becomes complicated in the presence of censoring, which is the unique characteristic of survival analysis. Most survival feature selection methods are based on Cox's proportional hazards model, though machine learning classifiers are preferred. The latter are less employed in survival analysis because censoring prevents them from being applied directly to survival data. Among the few works that employed machine learning classifiers, the partial logistic artificial neural network with auto-relevance determination is a well-known method that deals with censoring and performs feature selection for survival data. However, it depends on data replication to handle censoring, which leads to unbalanced and biased prediction results, especially in highly censored data. Other methods cannot deal with high censoring. Therefore, in this article, a new hybrid feature selection method is proposed which presents a solution to high levels of censoring. It combines support vector machine, neural network, and K-nearest neighbor classifiers using simple majority voting and a new weighted majority voting method based on a survival metric to construct a multiple classifier system. The new hybrid feature selection process uses the multiple classifier system as a wrapper method and merges it with an iterated feature ranking filter method to further reduce features. Two endovascular aortic repair datasets containing 91% censored patients, collected from two centers, were used to construct a multicenter study to evaluate the performance of the proposed approach. The results showed the proposed technique outperformed individual classifiers and variable selection methods based on Cox's model, such as the Akaike and Bayesian information criteria and the least absolute shrinkage and selection operator, in p values of the log-rank test, sensitivity, and concordance index. This indicates that the proposed classifier is more powerful in correctly predicting the risk of
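
    The multiple-classifier idea (SVM, neural network and k-NN combined by weighted majority voting) can be sketched with scikit-learn's VotingClassifier; the weights below are arbitrary placeholders rather than the survival-metric-based weights of the paper, and censoring handling is not shown.

```python
# A sketch of weighted soft voting over SVM, neural network and k-NN classifiers;
# weights are placeholders, not the paper's survival-metric-based weights.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

ensemble = VotingClassifier(
    estimators=[("svm", SVC(probability=True, random_state=0)),
                ("nn", MLPClassifier(max_iter=2000, random_state=0)),
                ("knn", KNeighborsClassifier(n_neighbors=5))],
    voting="soft",
    weights=[2, 1, 1],             # placeholder weights
)
print("CV accuracy: %.3f" % cross_val_score(ensemble, X, y, cv=5).mean())
```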

  14. Channel Selection and Feature Projection for Cognitive Load Estimation Using Ambulatory EEG

    Directory of Open Access Journals (Sweden)

    Tian Lan

    2007-01-01

    Full Text Available We present an ambulatory cognitive state classification system to assess the subject's mental load based on EEG measurements. The ambulatory cognitive state estimator is utilized in the context of a real-time augmented cognition (AugCog) system that aims to enhance the cognitive performance of a human user through computer-mediated assistance based on assessments of cognitive states using physiological signals including, but not limited to, EEG. This paper focuses particularly on the offline channel selection and feature projection phases of the design and aims to present mutual-information-based techniques that use a simple sample estimator for this quantity. Analyses conducted on data collected from 3 subjects performing 2 tasks (n-back/Larson) at 2 difficulty levels (low/high) demonstrate that the proposed mutual-information-based dimensionality reduction scheme can achieve up to 94% cognitive load estimation accuracy.

  15. Grammar-based feature generation for time-series prediction

    CERN Document Server

    De Silva, Anthony Mihirana

    2015-01-01

    This book proposes a novel approach for time-series prediction using machine learning techniques with automatic feature generation. Application of machine learning techniques to predict time-series continues to attract considerable attention due to the difficulty of the prediction problems compounded by the non-linear and non-stationary nature of the real world time-series. The performance of machine learning techniques, among other things, depends on suitable engineering of features. This book proposes a systematic way for generating suitable features using context-free grammar. A number of feature selection criteria are investigated and a hybrid feature generation and selection algorithm using grammatical evolution is proposed. The book contains graphical illustrations to explain the feature generation process. The proposed approaches are demonstrated by predicting the closing price of major stock market indices, peak electricity load and net hourly foreign exchange client trade volume. The proposed method ...

  16. Learning image descriptors for matching based on Haar features

    Science.gov (United States)

    Chen, L.; Rottensteiner, F.; Heipke, C.

    2014-08-01

    This paper presents a new and fast binary descriptor for image matching learned from Haar features. The training uses AdaBoost; the weak learner is built on the response function of Haar features instead of histogram-type features. The weak classifiers are selected from a large weak feature pool. The selected features have different feature types, scales and positions within the patch, with corresponding threshold values for the weak classifiers. Besides, to cope with the fact that in real matching dissimilar matches are encountered much more often than similar matches, cascaded classifiers are trained so that the training algorithm sees a large number of dissimilar patch pairs. The final trained outputs are binary-valued vectors, namely descriptors, with corresponding weights and perceptron thresholds for a strong classifier at every stage. We present preliminary results which serve as a proof of concept of the work.

  17. Mining for diagnostic information in body surface potential maps: A comparison of feature selection techniques

    Directory of Open Access Journals (Sweden)

    McCullagh Paul J

    2005-09-01

    Full Text Available Abstract Background In body surface potential mapping, increased spatial sampling is used to allow more accurate detection of a cardiac abnormality. Although diagnostically superior to more conventional electrocardiographic techniques, the perceived complexity of the Body Surface Potential Map (BSPM) acquisition process has prohibited its acceptance in clinical practice. For this reason there is an interest in striking a compromise between the minimum number of electrocardiographic recording sites required to sample the maximum electrocardiographic information. Methods In the current study, several techniques widely used in the domains of data mining and knowledge discovery have been employed to mine for diagnostic information in 192-lead BSPMs. In particular, the Single Variable Classifier (SVC) based filter and Sequential Forward Selection (SFS) based wrapper approaches to feature selection have been implemented and evaluated. Using a set of recordings from 116 subjects, the diagnostic ability of subsets of 3, 6, 9, 12, 24 and 32 electrocardiographic recording sites has been evaluated based on their ability to correctly assess the presence or absence of Myocardial Infarction (MI). Results It was observed that the wrapper approach, using sequential forward selection and a 5-nearest-neighbour classifier, was capable of choosing a set of 24 recording sites that could correctly classify 82.8% of BSPMs. Although the filter method performed slightly less favourably, the performance was comparable, with a classification accuracy of 79.3%. In addition, experiments were conducted to show how (a) features chosen using the wrapper approach were specific to the classifier used in the selection model, and (b) lead subsets chosen were not necessarily unique. Conclusion It was concluded that both the filter and wrapper approaches adopted were suitable for guiding the choice of recording sites useful for determining the presence of MI. It should be noted however
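
    The wrapper configuration evaluated above, sequential forward selection with a 5-nearest-neighbour classifier, can be sketched as a greedy loop; the synthetic 192-feature data, the cross-validation scoring and the target subset size of 24 are illustrative assumptions.

```python
# A sketch of a sequential-forward-selection wrapper with a 5-NN classifier;
# data, scoring and subset size are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=116, n_features=192, n_informative=12, random_state=0)

selected, remaining = [], list(range(X.shape[1]))
knn = KNeighborsClassifier(n_neighbors=5)

while len(selected) < 24:                       # grow the subset one lead at a time
    scores = [cross_val_score(knn, X[:, selected + [f]], y, cv=5).mean()
              for f in remaining]
    best = remaining[int(np.argmax(scores))]    # keep the feature that helps most
    selected.append(best)
    remaining.remove(best)

print("selected recording sites:", selected)
```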

  18. Object learning improves feature extraction but does not improve feature selection.

    Directory of Open Access Journals (Sweden)

    Linus Holm

    Full Text Available A single glance at your crowded desk is enough to locate your favorite cup. But finding an unfamiliar object requires more effort. This superiority in recognition performance for learned objects has at least two possible sources. For familiar objects observers might: (1) select more informative image locations upon which to fixate their eyes, or (2) extract more information from a given eye fixation. To test these possibilities, we had observers localize fragmented objects embedded in dense displays of random contour fragments. Eight participants searched for objects in 600 images while their eye movements were recorded in three daily sessions. Performance improved as subjects trained with the objects: the number of fixations required to find an object decreased by 64% across the 3 sessions. An ideal observer model that included measures of fragment confusability was used to calculate the information available from a single fixation. Comparing human performance to the model suggested that across sessions information extraction at each eye fixation increased markedly, by an amount roughly equal to the extra information that would be extracted following a 100% increase in functional field of view. Selection of fixation locations, on the other hand, did not improve with practice.

  19. Manifold regularized multi-task feature selection for multi-modality classification in Alzheimer's disease.

    Science.gov (United States)

    Jie, Biao; Zhang, Daoqiang; Cheng, Bo; Shen, Dinggang

    2013-01-01

    Accurate diagnosis of Alzheimer's disease (AD), as well as its prodromal stage (i.e., mild cognitive impairment, MCI), is very important for possible delay and early treatment of the disease. Recently, multi-modality methods have been used for fusing information from multiple different and complementary imaging and non-imaging modalities. Although there are a number of existing multi-modality methods, few of them have addressed the problem of joint identification of disease-related brain regions from multi-modality data for classification. In this paper, we propose a manifold regularized multi-task learning framework to jointly select features from multi-modality data. Specifically, we formulate the multi-modality classification as a multi-task learning framework, where each task focuses on the classification based on each modality. In order to capture the intrinsic relatedness among multiple tasks (i.e., modalities), we adopt a group sparsity regularizer, which ensures that only a small number of features are selected jointly. In addition, we introduce a new manifold-based Laplacian regularization term to preserve the geometric distribution of the original data from each task, which can lead to the selection of more discriminative features. Furthermore, we extend our method to the semi-supervised setting, which is very important since the acquisition of a large set of labeled data (i.e., diagnosis of disease) is usually expensive and time-consuming, while the collection of unlabeled data is relatively much easier. To validate our method, we have performed extensive evaluations on the baseline magnetic resonance imaging (MRI) and fluorodeoxyglucose positron emission tomography (FDG-PET) data of the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Our experimental results demonstrate the effectiveness of the proposed method.

  20. Prediction of protein modification sites of pyrrolidone carboxylic acid using mRMR feature selection and analysis.

    Directory of Open Access Journals (Sweden)

    Lu-Lu Zheng

    Full Text Available Pyrrolidone carboxylic acid (PCA) is formed during a common post-translational modification (PTM) of extracellular and multi-pass membrane proteins. In this study, we developed a new predictor of PCA modification sites based on maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS). We incorporated 727 features belonging to 7 kinds of protein properties to predict the modification sites, including sequence conservation, residual disorder, amino acid factor, secondary structure and solvent accessibility, gain/loss of amino acid during evolution, propensity of amino acid to be conserved at protein-protein interfaces and the protein surface, and deviation of side chain carbon atom number. Among these 727 features, 244 features were selected by mRMR and IFS as the optimized features for the prediction, with which the prediction model achieved a maximum MCC of 0.7812. Feature analysis showed that all feature types contributed to the modification process. Further site-specific feature analysis showed that the features derived from the sites surrounding the PCA site contributed more to the determination of PCA sites than other sites. The detailed feature analysis in this paper might provide important clues for understanding the mechanism of PCA formation and guide relevant experimental validations.
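
    A simplified greedy mRMR step can be sketched as below: relevance is measured by mutual information with the label and redundancy by the mean absolute correlation with already-chosen features. Using correlation as the redundancy term is a simplification assumed here; the paper's predictor additionally applies incremental feature selection (IFS) on top of the mRMR ranking.

```python
# A simplified greedy mRMR sketch: relevance via mutual information with the label,
# redundancy approximated by mean absolute correlation with chosen features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=400, n_features=100, n_informative=10, random_state=0)

relevance = mutual_info_classif(X, y, random_state=0)
corr = np.abs(np.corrcoef(X, rowvar=False))

selected = [int(np.argmax(relevance))]
while len(selected) < 20:
    candidates = [f for f in range(X.shape[1]) if f not in selected]
    redundancy = corr[np.ix_(candidates, selected)].mean(axis=1)
    scores = relevance[candidates] - redundancy          # maximise relevance - redundancy
    selected.append(candidates[int(np.argmax(scores))])

print("mRMR-style ranking of the first 20 features:", selected)
```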

  1. Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics

    Science.gov (United States)

    Faye, Ibrahima; Samir, Brahim Belhaouari; Md Said, Abas

    2014-01-01

    Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics are to store and manage biological data and to develop and analyze computational tools that enhance their understanding. The size of the data accumulated under various sequencing projects is increasing exponentially, which presents difficulties for experimental methods. To reduce the gap between newly sequenced proteins and proteins with known functions, many computational techniques involving classification and clustering algorithms have been proposed in the past. The classification of protein sequences into existing superfamilies is helpful in predicting the structure and function of the large number of newly discovered proteins. Existing classification results are unsatisfactory due to the huge number of features obtained through various feature encoding methods. In this work, a statistical metric-based feature selection technique has been proposed in order to reduce the size of the extracted feature vector. The proposed method of protein classification shows significant improvement in terms of performance measures: accuracy, sensitivity, specificity, recall, F-measure, and so forth. PMID:25045727

  2. Features and characteristics of problem based learning

    OpenAIRE

    Eser Ceker; Fezile Ozdamli

    2016-01-01

    Throughout the years, there appears to be an increase in Problem Based Learning applications in education and in Problem Based Learning related research areas. The main aim of this research is to underline the fundamentals (basic elements) of Problem Based Learning and investigate the dimensions of research approaches to PBL oriented areas (with a look at the latest technology-supported tools of Problem Based Learning). This research showed that the most researched characteristics of PBL are: tea...

  3. A Hybrid Feature Subset Selection Algorithm for Analysis of High Correlation Proteomic Data

    Science.gov (United States)

    Kordy, Hussain Montazery; Baygi, Mohammad Hossein Miran; Moradi, Mohammad Hassan

    2012-01-01

    Pathological changes within an organ can be reflected as proteomic patterns in biological fluids such as plasma, serum, and urine. Surface-enhanced laser desorption and ionization time-of-flight mass spectrometry (SELDI-TOF MS) has been used to generate proteomic profiles from biological fluids. Mass spectrometry yields redundant, noisy data in which most data points are irrelevant features for differentiating between cancer and normal cases. In this paper, we have proposed a hybrid feature subset selection algorithm based on maximum discrimination and minimum correlation coupled with peak scoring criteria. Our algorithm has been applied to two independent SELDI-TOF MS datasets of ovarian cancer obtained from the NCI-FDA clinical proteomics databank. The proposed algorithm has been used to extract a set of proteins as potential biomarkers in each dataset. We applied linear discriminant analysis to identify the important biomarkers. The selected biomarkers have been able to successfully distinguish the ovarian cancer patients from the non-cancer control group with an accuracy of 100%, a sensitivity of 100%, and a specificity of 100% in the two datasets. The hybrid algorithm has the advantage of increasing the reproducibility of selected biomarkers and is able to find a small set of proteins with high discrimination power. PMID:23717808

  4. Features and Characteristics of Problem Based Learning

    Science.gov (United States)

    Ceker, Eser; Ozdamli, Fezile

    2016-01-01

    Throughout the years, there appears to be an increase in Problem Based Learning applications in education; and Problem Based Learning related research areas. The main aim of this research is to underline the fundamentals (basic elements) of Problem Based Learning, investigate the dimensions of research approached to PBL oriented areas (with a look…

  5. Multi Filtration Feature Selection (MFFS) to improve discriminatory ability in clinical data set

    Directory of Open Access Journals (Sweden)

    S. Sasikala

    2016-07-01

    Full Text Available Selection of optimal features is an important area of research in medical data mining systems. In this paper we introduce an efficient four-stage procedure, comprising feature extraction, feature subset selection, feature ranking and classification, called Multi-Filtration Feature Selection (MFFS), for an investigation into the improvement of detection accuracy and optimal feature subset selection. The proposed method adjusts a parameter named "variance coverage" and builds the model with the value at which maximum classification accuracy is obtained. This facilitates the selection of a compact set of superior features, remarkably at a very low cost. An extensive experimental comparison of the proposed method and other methods using four different classifiers (Naïve Bayes (NB), Support Vector Machine (SVM), multi-layer perceptron (MLP) and J48 decision tree) and 22 different medical data sets confirms that the proposed MFFS strategy yields promising results on feature selection and classification accuracy for the medical data mining field of research.
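
    The "variance coverage" idea can be sketched with a PCA step that keeps just enough components to explain a chosen fraction of the variance before classification; the 0.95 coverage value, the Naive Bayes classifier and the breast cancer dataset are illustrative assumptions, whereas MFFS tunes this parameter to the value giving maximum accuracy.

```python
# A sketch of variance-coverage-based dimensionality reduction before classification.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

model = make_pipeline(StandardScaler(),
                      PCA(n_components=0.95),   # keep components covering 95% variance
                      GaussianNB())
print("CV accuracy: %.3f" % cross_val_score(model, X, y, cv=5).mean())
```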

  6. Multi-task feature selection in microarray data by binary integer programming.

    Science.gov (United States)

    Lan, Liang; Vucetic, Slobodan

    2013-12-20

    A major challenge in microarray classification is that the number of features is typically orders of magnitude larger than the number of examples. In this paper, we propose a novel feature filter algorithm to select the feature subset with maximal discriminative power and minimal redundancy by solving a quadratic objective function with binary integer constraints. To improve the computational efficiency, the binary integer constraints are relaxed and a low-rank approximation to the quadratic term is applied. The proposed feature selection algorithm was extended to solve multi-task microarray classification problems. We compared the single-task version of the proposed feature selection algorithm with 9 existing feature selection methods on 4 benchmark microarray data sets. The empirical results show that the proposed method achieved the most accurate predictions overall. We also evaluated the multi-task version of the proposed algorithm on 8 multi-task microarray datasets. The multi-task feature selection algorithm resulted in significantly higher accuracy than when using the single-task feature selection methods.

  7. Efficient feature selection using a hybrid algorithm for the task of epileptic seizure detection

    Science.gov (United States)

    Lai, Kee Huong; Zainuddin, Zarita; Ong, Pauline

    2014-07-01

    Feature selection is a very important aspect of machine learning. It entails the search for an optimal subset from a very large data set with a high-dimensional feature space. Apart from eliminating redundant features and reducing computational cost, a good selection of features also leads to higher prediction and classification accuracy. In this paper, an efficient feature selection technique is introduced for the task of epileptic seizure detection. The raw data are electroencephalography (EEG) signals. Using the discrete wavelet transform, the biomedical signals were decomposed into several sets of wavelet coefficients. To reduce the dimension of these wavelet coefficients, a feature selection method that combines the strengths of both filter and wrapper methods is proposed. Principal component analysis (PCA) is used as part of the filter method. As for the wrapper method, the evolutionary harmony search (HS) algorithm is employed. This metaheuristic method aims at finding the best discriminating set of features from the original data. The obtained features were then used as input for an automated classifier, namely wavelet neural networks (WNNs). The WNN model was trained to perform a binary classification task, that is, to determine whether a given EEG signal was normal or epileptic. For comparison purposes, different sets of features were also used as input. Simulation results showed that the WNNs that used the features chosen by the hybrid algorithm achieved the highest overall classification accuracy.
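
    The first stage described above, wavelet decomposition of an EEG segment followed by simple sub-band statistics, can be sketched as below; the wavelet family ('db4'), decomposition level and synthetic signal are assumptions, and the PCA filter and harmony-search wrapper stages are not shown.

```python
# A sketch of DWT-based sub-band feature extraction from one EEG segment.
import numpy as np
import pywt

rng = np.random.default_rng(0)
eeg_segment = rng.normal(size=4096)                   # stand-in for one EEG epoch

coeffs = pywt.wavedec(eeg_segment, "db4", level=5)    # [cA5, cD5, cD4, cD3, cD2, cD1]

features = []
for band in coeffs:
    features.extend([np.mean(np.abs(band)),           # average magnitude
                     np.std(band),                    # spread
                     np.sum(band ** 2)])              # sub-band energy
print(len(features), "wavelet features:", np.round(features[:6], 3))
```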

  8. Near-duplicate Video Detection Algorithm Based on Global GSP Feature and Local ScSIFT Feature Fusion

    Science.gov (United States)

    Luan, Xidao; Xie, Yuxiang; He, Jingmeng; Zhang, Lili; Li, Chen; Zhang, Xin

    2018-01-01

    The main problems with near-duplicate video detection are high computational complexity and low efficiency. Near-duplicate video detection methods based on global features run fast but with low accuracy; on the contrary, methods based on local features are accurate, but the computation is large and time-consuming. Therefore, a near-duplicate video detection algorithm combining a global GSP feature and a local ScSIFT feature is proposed. Firstly, the video clips and the query clips are discretized into sets of key frame sequences, and the temporal information is recorded at the same time. Secondly, by filtering the video clips on the global Gaussian-Scale Pyramid feature, similar video clips are selected as candidates to assure high recall. Then, combined with the temporal features of the keyframes, the candidate videos are further examined using the local ScSIFT feature to obtain higher precision. Experimental results show that the proposed algorithm can improve the accuracy of near-duplicate video detection while guaranteeing the timeliness of the algorithm.

  9. Conjunctive input processing drives feature selectivity in hippocampal CA1 neurons.

    Science.gov (United States)

    Bittner, Katie C; Grienberger, Christine; Vaidya, Sachin P; Milstein, Aaron D; Macklin, John J; Suh, Junghyup; Tonegawa, Susumu; Magee, Jeffrey C

    2015-08-01

    Feature-selective firing allows networks to produce representations of the external and internal environments. Despite its importance, the mechanisms generating neuronal feature selectivity are incompletely understood. In many cortical microcircuits the integration of two functionally distinct inputs occurs nonlinearly through generation of active dendritic signals that drive burst firing and robust plasticity. To examine the role of this processing in feature selectivity, we recorded CA1 pyramidal neuron membrane potential and local field potential in mice running on a linear treadmill. We found that dendritic plateau potentials were produced by an interaction between properly timed input from entorhinal cortex and hippocampal CA3. These conjunctive signals positively modulated the firing of previously established place fields and rapidly induced new place field formation to produce feature selectivity in CA1 that is a function of both entorhinal cortex and CA3 input. Such selectivity could allow mixed network level representations that support context-dependent spatial maps.

  10. Cancer Feature Selection and Classification Using a Binary Quantum-Behaved Particle Swarm Optimization and Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Maolong Xi

    2016-01-01

    Full Text Available This paper focuses on feature gene selection for cancer classification, which employs an optimization algorithm to select a subset of the genes. We propose a binary quantum-behaved particle swarm optimization (BQPSO) for cancer feature gene selection, coupled with a support vector machine (SVM) for cancer classification. First, the proposed BQPSO algorithm is described, which is a discretized version of the original QPSO for binary 0-1 optimization problems. Then, we present the principle and procedure for cancer feature gene selection and cancer classification based on BQPSO and SVM with leave-one-out cross validation (LOOCV). Finally, the BQPSO coupled with SVM (BQPSO/SVM), binary PSO coupled with SVM (BPSO/SVM), and genetic algorithm coupled with SVM (GA/SVM) are tested for feature gene selection and cancer classification on five microarray data sets, namely, Leukemia, Prostate, Colon, Lung, and Lymphoma. The experimental results show that BQPSO/SVM has significant advantages in accuracy, robustness, and the number of feature genes selected compared with the other two algorithms.

  11. Ensemble based system for whole-slide prostate cancer probability mapping using color texture features.

    LENUS (Irish Health Repository)

    DiFranco, Matthew D

    2011-01-01

    We present a tile-based approach for producing clinically relevant probability maps of prostatic carcinoma in histological sections from radical prostatectomy. Our methodology incorporates ensemble learning for feature selection and classification on expert-annotated images. Random forest feature selection performed over varying training sets provides a subset of generalized CIEL*a*b* co-occurrence texture features, while sample selection strategies with minimal constraints reduce training data requirements to achieve reliable results. Ensembles of classifiers are built using expert-annotated tiles from training images, and scores for the probability of cancer presence are calculated from the responses of each classifier in the ensemble. Spatial filtering of tile-based texture features prior to classification results in increased heat-map coherence as well as AUC values of 95% using ensembles of either random forests or support vector machines. Our approach is designed for adaptation to different imaging modalities, image features, and histological decision domains.

  12. Superpixel-Based Feature for Aerial Image Scene Recognition

    Directory of Open Access Journals (Sweden)

    Hongguang Li

    2018-01-01

    Full Text Available Image scene recognition is a core technology for many aerial remote sensing applications. Different landforms are input as different scenes in aerial imaging, and all landform information is regarded as valuable for aerial image scene recognition. However, the conventional features of the Bag-of-Words model are designed using local points or other related information and thus are unable to fully describe landform areas. This limitation cannot be ignored when the aim is to ensure accurate aerial scene recognition. A novel superpixel-based feature is proposed in this study to characterize aerial image scenes. Then, based on the proposed feature, a scene recognition method of the Bag-of-Words model for aerial imaging is designed. The proposed superpixel-based feature utilizes landform information, spanning from the top-level task of superpixel extraction of landforms down to the bottom-level task of expressing feature vectors. This characterization technique comprises the following steps: simple linear iterative clustering based superpixel segmentation, adaptive filter bank construction, Lie group-based feature quantification, and visual saliency model-based feature weighting. Experiments on image scene recognition are carried out using real image data captured by an unmanned aerial vehicle (UAV). The recognition accuracy of the proposed superpixel-based feature is 95.1%, which is higher than those of scene recognition algorithms based on other local features.
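
    The sketch below illustrates only the superpixel-extraction stage of such a pipeline, assuming scikit-image's SLIC implementation as a stand-in for the simple linear iterative clustering step; the adaptive filter bank, Lie group quantification and saliency weighting are not reproduced, and the test image is a placeholder.

```python
# Minimal sketch of the superpixel-extraction stage with a crude per-region descriptor.
import numpy as np
from skimage import data, segmentation

image = data.astronaut()                       # placeholder stand-in for an aerial frame
labels = segmentation.slic(image, n_segments=200, compactness=10, start_label=1)

# one crude per-superpixel descriptor: the mean colour of each region
n_regions = labels.max()
descriptors = np.array([image[labels == i].mean(axis=0) for i in range(1, n_regions + 1)])
print(descriptors.shape)                       # (n_regions, 3)
```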

  13. Segmentation-Based PolSAR Image Classification Using Visual Features: RHLBP and Color Features

    Directory of Open Access Journals (Sweden)

    Jian Cheng

    2015-05-01

    Full Text Available A segmentation-based fully-polarimetric synthetic aperture radar (PolSAR) image classification method that incorporates texture features and color features is designed and implemented. This method is based on a framework that conjunctively uses statistical region merging (SRM) for segmentation and a support vector machine (SVM) for classification. In the segmentation step, we propose an improved local binary pattern (LBP) operator named the regional homogeneity local binary pattern (RHLBP) to guarantee regional homogeneity in PolSAR images. In the classification step, color features extracted from false color images are applied to improve the classification accuracy. The RHLBP operator and color features can provide discriminative information to separate pixels and regions that have similar polarimetric features but belong to different classes. Extensive experimental comparison results with conventional methods on L-band PolSAR data demonstrate the effectiveness of our proposed method for PolSAR image classification.
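
    As a rough illustration of the texture operator involved, the sketch below computes a standard uniform LBP histogram with scikit-image; the regional homogeneity variant (RHLBP) and the PolSAR-specific processing are not reproduced, and the grayscale test image is a placeholder.

```python
# Minimal sketch: standard uniform LBP texture histogram for one image channel.
import numpy as np
from skimage import data
from skimage.feature import local_binary_pattern

gray = data.camera()                                   # placeholder for one PolSAR channel
lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")

# histogram of LBP codes (0..9 for P=8 uniform patterns) as a texture feature vector
hist, _ = np.histogram(lbp, bins=np.arange(0, 11), density=True)
print(hist)
```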

  14. A Hierarchical Feature and Sample Selection Framework and Its Application for Alzheimer’s Disease Diagnosis

    Science.gov (United States)

    An, Le; Adeli, Ehsan; Liu, Mingxia; Zhang, Jun; Lee, Seong-Whan; Shen, Dinggang

    2017-03-01

    Classification is one of the most important tasks in machine learning. Due to feature redundancy or outliers in samples, using all available data for training a classifier may be suboptimal. For example, Alzheimer’s disease (AD) is correlated with certain brain regions or single nucleotide polymorphisms (SNPs), and identification of relevant features is critical for computer-aided diagnosis. Many existing methods first select features from structural magnetic resonance imaging (MRI) or SNPs and then use those features to build the classifier. However, in the presence of many redundant features, the most discriminative features are difficult to identify in a single step. Thus, we formulate a hierarchical feature and sample selection framework to gradually select informative features and discard ambiguous samples in multiple steps for improved classifier learning. To positively guide the data manifold preservation process, we utilize both labeled and unlabeled data during training, making our method semi-supervised. For validation, we conduct experiments on AD diagnosis by selecting mutually informative features from both MRI and SNP, and using the most discriminative samples for training. The superior classification results demonstrate the effectiveness of our approach, compared with competing methods.

  15. Classical Music Clustering Based on Acoustic Features

    OpenAIRE

    Wang, Xindi; Haque, Syed Arefinul

    2017-01-01

    In this paper we cluster 330 classical music pieces collected from the MusicNet database based on their musical note sequences. We use shingling and chord trajectory matrices to create a signature for each music piece and perform spectral clustering to find the clusters. Depending on the resolution, the output clusters distinctly indicate compositions from different classical music eras and different composing styles of the musicians.
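
    The sketch below illustrates only the clustering stage, assuming each piece has already been reduced to a fixed-length signature vector (random placeholders here); the shingling and chord-trajectory construction described in the abstract are not reproduced.

```python
# Minimal sketch: spectral clustering of precomputed piece signatures.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
signatures = rng.random((330, 64))            # one 64-d signature per piece (placeholder)

affinity = cosine_similarity(signatures)      # pairwise similarity between signatures
labels = SpectralClustering(n_clusters=4, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
print(np.bincount(labels))                    # number of pieces per cluster
```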

  16. Neural mechanisms of spatial- and feature-based attention: a quantitative analysis.

    Science.gov (United States)

    Stoppel, Christian Michael; Boehler, Carsten Nicolas; Sabelhaus, Clemens; Heinze, Hans-Jochen; Hopf, Jens Max; Schoenfeld, Mircea Ariel

    2007-11-21

    Attentional selection can be based on spatial locations, non-spatial stimulus features, or entire objects as integrated feature ensembles. Several studies reported attentional modulations in those regions that process the constituent features of the presented stimuli. Here we employed functional magnetic resonance imaging (fMRI) to directly compare the magnitude of space- and/or feature-based attentional modulations while subjects directed their attention to a particular color (red or green) of a transparent surface and at the same time to a spatial location (left or right visual field). The experimental design made it possible to disentangle and quantify the hemodynamic activity elicited by identical physical stimuli when attention was directed to spatial locations and/or stimulus features. The highest modulations were observed when the attentional selection was based on spatial location. Attended features also elicited a response increase relative to unattended features when their spatial location was attended. Importantly, at unattended locations, a response increase upon feature-based selection was observed in motion-sensitive but not in color-related areas. This suggests that compared to color, motion stimuli are more effective in capturing attention at unattended locations leading to a competitive advantage. These results support the idea of a high biological relevance of the feature motion in the visual world.

  17. Selecting Informative Features of the Helicopter and Aircraft Acoustic Signals in the Adaptive Autonomous Information Systems for Recognition

    Directory of Open Access Journals (Sweden)

    V. K. Hohlov

    2017-01-01

    Full Text Available The article develops the rationale for selecting informative features of helicopter and aircraft acoustic signals to solve the problem of their recognition, and shows that the most informative features are the counts of extrema in the energy spectra of the input signals, which represent non-centered random variables. The apparatus of multiple initial regression coefficients was selected as the mathematical research tool. The application of digital re-circulators with positive and negative feedback, which have comb-like frequency responses, solves the problem of selecting informative features. A distinguishing feature of this approach is that the comb frequency response can be tuned easily, simply by adjusting the delay of the digital signal in the feedback circuit. Adding an adaptation block to the informative-feature selection block makes the informative features used, the counts of local extrema of the spectral power density, invariant to the helicopter's airspeed. The paper gives reasons for the principle of adaptation and the structure of the adaptation block. The discriminator characteristics are formed from the cross-correlation statistics of the quadrature components of the acoustic signal realizations, obtained by the Hilbert transform. The paper proposes to perform signal recognition using a regression algorithm that can handle the non-centered informative features and use a priori information about the initial regression coefficients of signal and noise. Simulation in Matlab Simulink has shown that the selected informative features, under regression processing of signal realizations of 0.5 s duration, are well separable; based on a set of 100 acoustic signal realizations per class under full-scale conditions, a correct recognition probability of at least 0.975 was achieved. The considered principles of informative features selection and adaptation can

  18. Simultaneous Spectral-Spatial Feature Selection and Extraction for Hyperspectral Images.

    Science.gov (United States)

    Zhang, Lefei; Zhang, Qian; Du, Bo; Huang, Xin; Tang, Yuan Yan; Tao, Dacheng

    2018-01-01

    In hyperspectral remote sensing data mining, it is important to take into account both spectral and spatial information, such as the spectral signature, texture features, and morphological properties, to improve performance, e.g., image classification accuracy. From a feature representation point of view, a natural approach to handle this situation is to concatenate the spectral and spatial features into a single but high-dimensional vector and then apply a dimension reduction technique directly to that concatenated vector before feeding it into the subsequent classifier. However, multiple features from various domains have different physical meanings and statistical properties, so such concatenation does not efficiently exploit the complementary properties among different features, which should help boost feature discriminability. Furthermore, it is also difficult to interpret the transformed results of the concatenated vector. Consequently, finding a physically meaningful consensus low-dimensional representation of the original multiple features is still a challenging task. In order to address these issues, we propose a novel feature learning framework, i.e., the simultaneous spectral-spatial feature selection and extraction algorithm, for spectral-spatial feature representation and classification of hyperspectral images. Specifically, the proposed method learns a latent low-dimensional subspace by projecting the spectral-spatial features into a common feature space, where the complementary information is effectively exploited and, simultaneously, only the most significant original features are transformed. Encouraging experimental results on three publicly available hyperspectral remote sensing datasets confirm that our proposed method is effective and efficient.

  19. A study of metaheuristic algorithms for high dimensional feature selection on microarray data

    Science.gov (United States)

    Dankolo, Muhammad Nasiru; Radzi, Nor Haizan Mohamed; Sallehuddin, Roselina; Mustaffa, Noorfa Haszlinna

    2017-11-01

    Microarray systems enable experts to examine gene profiles at the molecular level using machine learning algorithms. They increase the potential for classification and diagnosis of many diseases at the gene expression level. However, numerous difficulties may affect the efficiency of machine learning algorithms, including the vast number of gene features in the original data, many of which may be unrelated to the intended analysis. Therefore, feature selection needs to be performed during data pre-processing. Many feature selection algorithms have been developed and applied to microarray data, including metaheuristic optimization algorithms. This paper discusses the application of metaheuristic algorithms for feature selection in microarray datasets. The study reveals that the algorithms yield interesting results with limited resources, thereby saving the computational expense of machine learning algorithms.

  20. Feature Selection Strategy for Classification of Single-Trial EEG Elicited by Motor Imagery

    DEFF Research Database (Denmark)

    Prasad, Swati; Tan, Zheng-Hua; Prasad, Ramjee

    2011-01-01

    Brain-Computer Interface (BCI) provides new means of communication for people with motor disabilities by utilizing electroencephalographic activity. Selection of features from Electroencephalogram (EEG) signals for classification plays a key part in the development of BCI systems. In this paper, we...... present a feature selection strategy consisting of channel selection by Fisher ratio analysis in the frequency domain and time segment selection by visual inspection in the time domain. The proposed strategy achieves an absolute improvement of 7.5% in the misclassification rate as compared with the baseline...
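
    As a rough illustration of the channel-selection step, the sketch below ranks EEG channels by a Fisher ratio computed on band power in the frequency domain; the array shapes, sampling rate and frequency band are illustrative assumptions rather than the paper's setup.

```python
# Minimal sketch: Fisher-ratio ranking of EEG channels using band power as the feature.
import numpy as np

rng = np.random.default_rng(0)
trials = rng.normal(size=(100, 22, 512))      # trials x channels x samples (placeholder)
labels = rng.integers(0, 2, size=100)         # two motor-imagery classes (placeholder)
fs = 256.0                                    # assumed sampling rate

freqs = np.fft.rfftfreq(trials.shape[-1], d=1.0 / fs)
band = (freqs >= 8) & (freqs <= 30)           # mu + beta band (assumption)
power = (np.abs(np.fft.rfft(trials, axis=-1)) ** 2)[:, :, band].mean(axis=-1)

def fisher_ratio(feature, y):
    """Between-class separation over within-class spread for one channel feature."""
    a, b = feature[y == 0], feature[y == 1]
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var() + 1e-12)

scores = np.array([fisher_ratio(power[:, ch], labels) for ch in range(power.shape[1])])
print("channels ranked by Fisher ratio:", np.argsort(scores)[::-1])
```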

  1. Heuristic algorithms for feature selection under Bayesian models with block-diagonal covariance structure.

    Science.gov (United States)

    Foroughi Pour, Ali; Dalton, Lori A

    2018-03-21

    Many bioinformatics studies aim to identify markers, or features, that can be used to discriminate between distinct groups. In problems where strong individual markers are not available, or where interactions between gene products are of primary interest, it may be necessary to consider combinations of features as a marker family. To this end, recent work proposes a hierarchical Bayesian framework for feature selection that places a prior on the set of features we wish to select and on the label-conditioned feature distribution. While an analytical posterior under Gaussian models with block covariance structures is available, the optimal feature selection algorithm for this model remains intractable since it requires evaluating the posterior over the space of all possible covariance block structures and feature-block assignments. To address this computational barrier, in prior work we proposed a simple suboptimal algorithm, 2MNC-Robust, with robust performance across the space of block structures. Here, we present three new heuristic feature selection algorithms. The proposed algorithms outperform 2MNC-Robust and many other popular feature selection algorithms on synthetic data. In addition, enrichment analysis on real breast cancer, colon cancer, and Leukemia data indicates they also output many of the genes and pathways linked to the cancers under study. Bayesian feature selection is a promising framework for small-sample high-dimensional data, in particular biomarker discovery applications. When applied to cancer data, these algorithms output many genes already shown to be involved in cancer as well as potentially new biomarkers. Furthermore, one of the proposed algorithms, SPM, outputs blocks of heavily correlated genes, particularly useful for studying gene interactions and gene networks.

  2. GENDER RECOGNITION BASED ON SIFT FEATURES

    OpenAIRE

    Sahar Yousefi; Morteza Zahedi

    2011-01-01

    This paper proposes a robust approach for face detection and gender classification in color images. Previous research on gender recognition assumes a computationally expensive and time-consuming alignment pre-processing step, in which face images are aligned so that facial landmarks such as the eyes, nose, lips and chin are placed at uniform locations in the image. In this paper, a novel technique based on mathematical analysis is presented in three stages that eliminates align...

  3. Online Learning of Hierarchical Pitman-Yor Process Mixture of Generalized Dirichlet Distributions With Feature Selection.

    Science.gov (United States)

    Fan, Wentao; Sallay, Hassen; Bouguila, Nizar

    2017-09-01

    In this paper, a novel statistical generative model based on hierarchical Pitman-Yor process and generalized Dirichlet distributions (GDs) is presented. The proposed model allows us to perform joint clustering and feature selection thanks to the interesting properties of the GD distribution. We develop an online variational inference algorithm, formulated in terms of the minimization of a Kullback-Leibler divergence, of our resulting model that tackles the problem of learning from high-dimensional examples. This variational Bayes formulation allows simultaneously estimating the parameters, determining the model's complexity, and selecting the appropriate relevant features for the clustering structure. Moreover, the proposed online learning algorithm allows data instances to be processed in a sequential manner, which is critical for large-scale and real-time applications. Experiments conducted using challenging applications, namely, scene recognition and video segmentation, where our approach is viewed as an unsupervised technique for visual learning in high-dimensional spaces, showed that the proposed approach is suitable and promising.

  4. Feature selection for Bayesian network classifiers using the MDL-FS score

    NARCIS (Netherlands)

    Drugan, Madalina M.; Wiering, Marco A.

    When constructing a Bayesian network classifier from data, the more or less redundant features included in a dataset may bias the classifier and as a consequence may result in a relatively poor classification accuracy. In this paper, we study the problem of selecting appropriate subsets of features

  5. Selecting Optimal Feature Set in High-Dimensional Data by Swarm Search

    Directory of Open Access Journals (Sweden)

    Simon Fong

    2013-01-01

    Full Text Available Selecting the right set of features from data of high dimensionality for inducing an accurate classification model is a tough computational challenge. It is almost an NP-hard problem, as the number of feature combinations escalates exponentially with the number of features. Unfortunately, in data mining, as well as in other engineering applications and bioinformatics, some data are described by a long array of features. Many feature subset selection algorithms have been proposed in the past, but not all of them are effective. Since brute-force, exhaustive evaluation of every possible combination of features is prohibitively slow, stochastic optimization may be a solution. In this paper, we propose a new feature selection scheme called Swarm Search to find an optimal feature set by using metaheuristics. The advantage of Swarm Search is its flexibility in integrating any classifier into its fitness function and plugging in any metaheuristic algorithm to facilitate heuristic search. Simulation experiments are carried out by testing Swarm Search over several high-dimensional datasets, with different classification algorithms and various metaheuristic algorithms. The comparative experiment results show that Swarm Search is able to attain relatively low error rates in classification without shrinking the size of the feature subset to its minimum.

  6. Automatic brain MR image denoising based on texture feature-based artificial neural networks.

    Science.gov (United States)

    Chang, Yu-Ning; Chang, Herng-Hua

    2015-01-01

    Noise is one of the main sources of quality deterioration not only for visual inspection but also in computerized processing in brain magnetic resonance (MR) image analysis such as tissue classification, segmentation and registration. Accordingly, noise removal in brain MR images is important for a wide variety of subsequent processing applications. However, most existing denoising algorithms require laborious tuning of parameters that are often sensitive to specific image features and textures. Automation of these parameters through artificial intelligence techniques will be highly beneficial. In the present study, an artificial neural network associated with image texture feature analysis is proposed to establish a predictable parameter model and automate the denoising procedure. In the proposed approach, a total of 83 image attributes were extracted based on four categories: 1) basic image statistics, 2) the gray-level co-occurrence matrix (GLCM), 3) the gray-level run-length matrix (GLRLM), and 4) Tamura texture features. To obtain the ranking of discrimination in these texture features, a paired-samples t-test was applied to each individual image feature computed in every image. Subsequently, the sequential forward selection (SFS) method was used to select the best texture features according to the ranking of discrimination. The selected optimal features were further incorporated into a back-propagation neural network to establish a predictable parameter model. A wide variety of MR images with various scenarios were adopted to evaluate the performance of the proposed framework. Experimental results indicated that this new automation system accurately predicted the bilateral filtering parameters and effectively removed the noise in a number of MR images. Compared to the manually tuned filtering process, our approach not only produced better denoised results but also saved significant processing time.
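
    The sketch below gives a rough flavour of the texture-feature and forward-selection stages, assuming scikit-image's graycomatrix/graycoprops for the GLCM attributes and scikit-learn's SequentialFeatureSelector with a plain linear regressor standing in for the t-test ranking and back-propagation network of the paper; the patches and target values are synthetic placeholders.

```python
# Minimal sketch: GLCM texture attributes per patch, then forward feature selection
# against a stand-in regression target (e.g. a filtering parameter).
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def glcm_features(patch):
    """A few GLCM texture attributes for one 8-bit image patch."""
    glcm = graycomatrix(patch, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.array([graycoprops(glcm, p)[0, 0] for p in props])

patches = rng.integers(0, 256, size=(60, 32, 32), dtype=np.uint8)  # synthetic patches
X = np.vstack([glcm_features(p) for p in patches])
y = rng.normal(size=60)                      # placeholder target (filter parameter)

sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=2,
                                direction="forward")
sfs.fit(X, y)
print("selected texture features:", sfs.get_support())
```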

  7. A Mean-Shift-Based Feature Descriptor for Wide Baseline Stereo Matching

    Directory of Open Access Journals (Sweden)

    Yiwen Dou

    2015-01-01

    Full Text Available We propose a novel Mean-Shift-based feature descriptor building approach for wide baseline stereo matching. Initially, the scale-invariant feature transform (SIFT) is used to extract relatively stable feature points. Each matching SIFT feature point requires a reasonable neighborhood range from which to choose its feature point set. Subsequently, to select repeatable and highly robust feature points, Mean-Shift controls the corresponding feature scale. Finally, our approach is applied to depth image acquisition in wide baseline settings, and the Graph Cut algorithm optimizes the disparity information. Compared with existing methods such as SIFT, speeded-up robust features (SURF), and normalized cross-correlation (NCC), the presented approach has the advantage of higher robustness and accuracy. Experimental results on low-resolution images with weak feature descriptions in wide baseline settings confirm the validity of our approach.

  8. Compound feature selection and parameter optimization of ELM for fault diagnosis of rolling element bearings.

    Science.gov (United States)

    Luo, Meng; Li, Chaoshun; Zhang, Xiaoyuan; Li, Ruhai; An, Xueli

    2016-11-01

    This paper proposes a hybrid system named HGSA-ELM for fault diagnosis of rolling element bearings, in which a real-valued gravitational search algorithm (RGSA) is employed to optimize the input weights and bias of the ELM, and the binary-valued GSA (BGSA) is used to select important features from a compound feature set. Three types of fault features, namely time- and frequency-domain features, energy features and singular value features, are extracted to compose the compound feature set by applying ensemble empirical mode decomposition (EEMD). For fault diagnosis of a typical rolling element bearing system with 56 working conditions, comparative experiments were designed to evaluate the proposed method. The results show that HGSA-ELM achieves significantly higher classification accuracy compared with its original version and with methods in the literature. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.

  9. A general procedure to generate models for urban environmental-noise pollution using feature selection and machine learning methods.

    Science.gov (United States)

    Torija, Antonio J; Ruiz, Diego P

    2015-02-01

    The prediction of environmental noise in urban environments requires the solution of a complex and non-linear problem, since there are complex relationships among the multitude of variables involved in the characterization and modelling of environmental noise and environmental-noise magnitudes. Moreover, the inclusion of the great spatial heterogeneity characteristic of urban environments seems to be essential in order to achieve an accurate environmental-noise prediction in cities. This problem is addressed in this paper, where a procedure based on feature-selection techniques and machine-learning regression methods is proposed and applied to this environmental problem. Three machine-learning regression methods, which are considered very robust in solving non-linear problems, are used to estimate the energy-equivalent sound-pressure level descriptor (LAeq). These three methods are: (i) multilayer perceptron (MLP), (ii) sequential minimal optimisation (SMO), and (iii) Gaussian processes for regression (GPR). In addition, because of the high number of input variables involved in environmental-noise modelling and estimation in urban environments, which make LAeq prediction models quite complex and costly in terms of time and resources for application to real situations, three different techniques are used to approach feature selection or data reduction. The feature-selection techniques used are: (i) correlation-based feature-subset selection (CFS), (ii) wrapper for feature-subset selection (WFS), and the data reduction technique is principal-component analysis (PCA). The subsequent analysis leads to a proposal of different schemes, depending on the needs regarding data collection and accuracy. The use of WFS as the feature-selection technique with the implementation of SMO or GPR as regression algorithm provides the best LAeq estimation (R(2)=0.94 and mean absolute error (MAE)=1.14-1.16 dB(A)). Copyright © 2014 Elsevier B.V. All rights reserved.

  10. Computer-aided diagnosis of pulmonary nodules using a two-step approach for feature selection and classifier ensemble construction.

    Science.gov (United States)

    Lee, Michael C; Boroczky, Lilla; Sungur-Stasik, Kivilcim; Cann, Aaron D; Borczuk, Alain C; Kawut, Steven M; Powell, Charles A

    2010-09-01

    Accurate classification methods are critical in computer-aided diagnosis (CADx) and other clinical decision support systems. Previous research has reported on methods for combining genetic algorithm (GA) feature selection with ensemble classifier systems in an effort to increase classification accuracy. In this study, we describe a CADx system for pulmonary nodules using a two-step supervised learning system combining a GA with the random subspace method (RSM), with the aim of exploring algorithm design parameters and demonstrating improved classification performance over either the GA or RSM-based ensembles alone. We used a retrospective database of 125 pulmonary nodules (63 benign; 62 malignant) with CT volumes and clinical history. A total of 216 features were derived from the segmented image data and clinical history. Ensemble classifiers using RSM or GA-based feature selection were constructed and tested via leave-one-out validation with feature selection and classifier training executed within each iteration. We further tested a two-step approach using a GA ensemble to first assess the relevance of the features, and then using this information to control feature selection during a subsequent RSM step. The base classification was performed using linear discriminant analysis (LDA). The RSM classifier alone achieved a maximum leave-one-out Az of 0.866 (95% confidence interval: 0.794-0.919) at a subset size of s=36 features. The GA ensemble yielded an Az of 0.851 (0.775-0.907). The proposed two-step algorithm produced a maximum Az value of 0.889 (0.823-0.936) when the GA ensemble was used to completely remove less relevant features from the second RSM step, with similar results obtained when the GA-LDA results were used to reduce but not eliminate the occurrence of certain features. After accounting for correlations in the data, the leave-one-out Az in the two-step method was significantly higher than in the RSM and the GA-LDA. We have developed a CADx system for
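
    As a rough illustration of the random subspace stage described above, the sketch below builds an ensemble of LDA classifiers on random feature subsets using scikit-learn's BaggingClassifier configured to resample features rather than samples; the GA relevance-weighting step of the two-step method is not reproduced, and the data are synthetic placeholders with the dimensions quoted in the abstract.

```python
# Minimal sketch: random subspace ensemble of LDA classifiers evaluated by LOOCV.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(125, 216))              # 125 nodules, 216 features (synthetic)
y = rng.integers(0, 2, size=125)             # benign vs malignant labels (synthetic)

# bootstrap=False with max_samples=1.0 keeps all samples; only features are subsampled
rsm = BaggingClassifier(LinearDiscriminantAnalysis(), n_estimators=50,
                        max_samples=1.0, bootstrap=False,
                        max_features=36, bootstrap_features=False, random_state=0)
acc = cross_val_score(rsm, X, y, cv=LeaveOneOut()).mean()
print("leave-one-out accuracy:", acc)
```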

  11. A comparison study on feature selection of DNA structural properties for promoter prediction

    OpenAIRE

    Gan, Yanglan; Guan, Jihong; Zhou, Shuigeng

    2012-01-01

    Background: Promoter prediction is an integral step for understanding gene regulation and annotating genomes. Traditional promoter analysis is mainly based on sequence compositional features. Recently, many kinds of structural features have been employed in promoter prediction. However, considering the high-dimensionality and overfitting problems, it is unfeasible to utilize all available features for promoter prediction. Thus it is necessary to choose some appropriate features for t...

  12. Selective attention increases both gain and feature selectivity of the human auditory cortex.

    Directory of Open Access Journals (Sweden)

    Jaakko Kauramäki

    2007-09-01

    Full Text Available An experienced car mechanic can often deduce what's wrong with a car by carefully listening to the sound of the ailing engine, despite the presence of multiple sources of noise. Indeed, the ability to select task-relevant sounds for awareness, whilst ignoring irrelevant ones, constitutes one of the most fundamental of human faculties, but the underlying neural mechanisms have remained elusive. While most of the literature explains the neural basis of selective attention by means of an increase in neural gain, a number of papers propose enhancement in neural selectivity as an alternative or a complementary mechanism. Here, to address the question of whether a pure gain increase alone can explain auditory selective attention in humans, we quantified auditory cortex frequency selectivity in 20 healthy subjects by masking 1000-Hz tones with a continuous noise masker with parametrically varying frequency notches around the tone frequency (i.e., a notched-noise masker). The task of the subjects was, in different conditions, to selectively attend to either occasionally occurring slight increments in tone frequency (1020 Hz), tones of slightly longer duration, or to ignore the sounds. In line with previous studies, in the ignore condition, the global field power (GFP) of event-related brain responses at 100 ms from the stimulus onset to the 1000-Hz tones was suppressed as a function of the narrowing of the notch width. During the selective attention conditions, the suppressant effect of the noise notch width on GFP was decreased, but as a function significantly different from the multiplicative one expected on the basis of a simple gain model of selective attention. Our results suggest that auditory selective attention in humans cannot be explained by a gain model, where only the neural activity level is increased, but rather that selective attention additionally enhances auditory cortex frequency selectivity.

  13. Moment feature based fast feature extraction algorithm for moving object detection using aerial images.

    Directory of Open Access Journals (Sweden)

    A F M Saifuddin Saif

    Full Text Available Fast and computationally less complex feature extraction for moving object detection using aerial images from unmanned aerial vehicles (UAVs remains as an elusive goal in the field of computer vision research. The types of features used in current studies concerning moving object detection are typically chosen based on improving detection rate rather than on providing fast and computationally less complex feature extraction methods. Because moving object detection using aerial images from UAVs involves motion as seen from a certain altitude, effective and fast feature extraction is a vital issue for optimum detection performance. This research proposes a two-layer bucket approach based on a new feature extraction algorithm referred to as the moment-based feature extraction algorithm (MFEA. Because a moment represents the coherent intensity of pixels and motion estimation is a motion pixel intensity measurement, this research used this relation to develop the proposed algorithm. The experimental results reveal the successful performance of the proposed MFEA algorithm and the proposed methodology.

  14. Moment feature based fast feature extraction algorithm for moving object detection using aerial images.

    Science.gov (United States)

    Saif, A F M Saifuddin; Prabuwono, Anton Satria; Mahayuddin, Zainal Rasyid

    2015-01-01

    Fast and computationally less complex feature extraction for moving object detection using aerial images from unmanned aerial vehicles (UAVs) remains as an elusive goal in the field of computer vision research. The types of features used in current studies concerning moving object detection are typically chosen based on improving detection rate rather than on providing fast and computationally less complex feature extraction methods. Because moving object detection using aerial images from UAVs involves motion as seen from a certain altitude, effective and fast feature extraction is a vital issue for optimum detection performance. This research proposes a two-layer bucket approach based on a new feature extraction algorithm referred to as the moment-based feature extraction algorithm (MFEA). Because a moment represents the coherent intensity of pixels and motion estimation is a motion pixel intensity measurement, this research used this relation to develop the proposed algorithm. The experimental results reveal the successful performance of the proposed MFEA algorithm and the proposed methodology.

  15. Neighbors Based Discriminative Feature Difference Learning for Kinship Verification

    DEFF Research Database (Denmark)

    Duan, Xiaodong; Tan, Zheng-Hua

    2015-01-01

    In this paper, we present a discriminative feature difference learning method for facial image based kinship verification. To transform feature difference of an image pair to be discriminative for kinship verification, a linear transformation matrix for feature difference between an image pair...... databases show that the proposed method combined with a SVM classification method outperforms or is comparable to state-of-the-art kinship verification methods. © Springer International Publishing AG, Part of Springer Science+Business Media...

  16. SAR Target Recognition with Feature Fusion Based on Stacked Autoencoder

    Directory of Open Access Journals (Sweden)

    Kang Miao

    2017-04-01

    Full Text Available A feature fusion algorithm based on a Stacked AutoEncoder (SAE) for Synthetic Aperture Radar (SAR) imagery is proposed in this paper. Firstly, 25 baseline features and Three-Patch Local Binary Pattern (TPLBP) features are extracted. Then, the features are combined in series and fed into the SAE network, which is trained by a greedy layer-wise method. Finally, a softmax classifier is employed to fine-tune the SAE network for better fusion performance. Additionally, the Gabor texture features of SAR images are extracted, and fusion experiments between the different features are carried out. The results show that the baseline features and TPLBP features have low redundancy and high complementarity, which makes the fused feature more discriminative. Compared with SAR target recognition algorithms based on an SAE or a CNN (Convolutional Neural Network), the proposed method simplifies the network structure and increases the recognition accuracy and efficiency. Recognition of 10 classes of SAR targets from the MSTAR dataset achieved a classification accuracy of up to 95.88%, which verifies the effectiveness of the presented algorithm.

  17. Comparison of Naive Bayes and Decision Tree on Feature Selection Using Genetic Algorithm for Classification Problem

    Science.gov (United States)

    Rahmadani, S.; Dongoran, A.; Zarlis, M.; Zakarias

    2018-03-01

    This paper discusses the problem of feature selection using genetic algorithms (GA) on a dataset for classification problems. The classification models used are the decision tree (DT) and Naive Bayes. We discuss how the Naive Bayes and decision tree models handle the classification problem when the dataset's features are selected using a GA, and we then compare the performance of both models to determine whether there is an increase in accuracy. The results show an increase in accuracy when feature selection is performed with the GA. The proposed models are referred to as GADT (GA-Decision Tree) and GANB (GA-Naive Bayes). The datasets tested in this paper are taken from the UCI Machine Learning Repository.
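
    The sketch below is a minimal, illustrative GA wrapper comparing a decision tree and Naive Bayes on GA-selected features, assuming a plain generational GA with tournament selection, single-point crossover and bit-flip mutation; the dataset and GA settings are placeholders, not those of the paper.

```python
# Minimal sketch: GA-driven feature selection evaluated with a DT and a NB classifier.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_feat = X.shape[1]

def fitness(mask, clf):
    """5-fold CV accuracy of the classifier on the selected feature subset."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=5).mean()

def run_ga(clf, pop_size=12, n_gen=10, p_mut=0.05):
    pop = rng.integers(0, 2, size=(pop_size, n_feat))
    for _ in range(n_gen):
        fit = np.array([fitness(ind, clf) for ind in pop])
        # tournament selection of parents
        parents = pop[[max(rng.integers(0, pop_size, 2), key=lambda i: fit[i])
                       for _ in range(pop_size)]]
        # single-point crossover between consecutive parent pairs
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            cut = rng.integers(1, n_feat)
            children[i, cut:], children[i + 1, cut:] = (parents[i + 1, cut:],
                                                        parents[i, cut:])
        # bit-flip mutation
        flips = rng.random(children.shape) < p_mut
        pop = np.where(flips, 1 - children, children)
    fit = np.array([fitness(ind, clf) for ind in pop])
    best = pop[fit.argmax()]
    return best, fit.max()

for name, clf in [("GADT", DecisionTreeClassifier(random_state=0)), ("GANB", GaussianNB())]:
    mask, score = run_ga(clf)
    print(name, "features kept:", int(mask.sum()), "CV accuracy:", round(score, 3))
```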

  18. Assessment of metal ion concentration in water with structured feature selection.

    Science.gov (United States)

    Naula, Pekka; Airola, Antti; Pihlasalo, Sari; Montoya Perez, Ileana; Salakoski, Tapio; Pahikkala, Tapio

    2017-10-01

    We propose a cost-effective system for the determination of metal ion concentration in water, addressing a central issue in water resources management. The system combines novel luminometric label array technology with a machine learning algorithm that selects a minimal number of array reagents (modulators) and liquid sample dilutions that enable accurate quantification. The algorithm is able to identify the optimal modulators and sample dilutions, leading to cost reductions since less manual labour and fewer resources are needed. Inferring the ion detector involves a unique type of structured feature selection problem, which we formalize in this paper. We propose a novel Cartesian greedy forward feature selection algorithm for solving the problem. The novel algorithm was evaluated in the concentration assessment of five metal ions, and its performance was compared to two known feature selection approaches. The results demonstrate that the proposed system can assist in lowering costs with minimal loss in accuracy. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. Features of selection of children for occupations by artistic gymnastics in modern Kurdistan

    OpenAIRE

    Abdulvahid Dlshad Nihad

    2015-01-01

    Purpose: to study the organizational and pedagogical conditions existing in the republic Kurdistan for selecting children for artistic gymnastics training. Material and Methods: a survey of 24 artistic gymnastics coaches and physical culture experts of the republic Kurdistan was carried out. The general questions of selection and the methodical features of selecting children for artistic gymnastics training in Kurdistan were studied. Results: the survey revealed the absence of...

  20. Genetic feature selection to optimally detect P300 in brain computer interfaces.

    Science.gov (United States)

    Atum, Yanina; Gareis, Iván; Gentiletti, Gerardo; Acevedo, Rubén; Rufiner, Leonardo

    2010-01-01

    A Brain-Computer Interface is a system that provides artificial communication between the human brain and the external world. The paradigm based on event-related evoked potentials is used in this work. Our main goal was to efficiently solve a binary classification problem: the presence or absence of P300 in the recordings. Genetic algorithms and support vector machines were used in a wrapper configuration for feature selection and classification. The original input patterns were provided by two channels (Oz and Fz) of resampled EEG recordings and wavelet coefficients. To evaluate the performance of the system, accuracy, sensitivity and specificity were calculated. The wrapped wavelet patterns show better performance than the temporal ones. The results were similar for patterns from channels Oz and Fz, together or separately.

  1. Fast Feature Selection in a GPU Cluster Using the Delta Test

    Directory of Open Access Journals (Sweden)

    Alberto Guillén

    2014-02-01

    Full Text Available Feature or variable selection still remains an unsolved problem, due to the infeasibility of evaluating the entire solution space. Several algorithms based on heuristics have been proposed so far with successful results. However, these algorithms were not designed for very large datasets, whose memory and time demands can make their execution impossible. This paper presents an implementation of a genetic algorithm that has been parallelized using the classical island approach, while also using graphics processing units to speed up the computation of the fitness function. Special attention has been paid to the population evaluation, as well as to the migration operator of the parallel genetic algorithm (GA), which is not usually considered very significant although, as the experiments show, it is crucial for obtaining robust results.

  2. Infrared vehicle recognition using unsupervised feature learning based on K-feature

    Science.gov (United States)

    Lin, Jin; Tan, Yihua; Xia, Haijiao; Tian, Jinwen

    2018-02-01

    Subject to the complex battlefield environment, it is difficult to establish a complete knowledge base in practical applications of vehicle recognition algorithms. Infrared vehicle recognition is always difficult and challenging, and it plays an important role in remote sensing. In this paper we propose a new unsupervised feature learning method based on the K-feature to recognize vehicles in infrared images. First, we use a saliency-based target detection algorithm to detect candidates in the initial image. Then, unsupervised feature learning based on the K-feature, which is generated by a k-means clustering algorithm that extracts features by learning a visual dictionary from a large number of unlabeled samples, is applied to suppress false alarms and improve accuracy. Finally, the vehicle recognition result is obtained after some post-processing. A large number of experiments demonstrate that the proposed method has satisfactory recognition effectiveness and robustness for vehicle recognition in infrared images under complex backgrounds, and that it also improves reliability.
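
    As a rough illustration of the dictionary-learning idea behind the K-feature, the sketch below clusters unlabeled image patches with k-means and encodes new patches by their distances to the cluster centres (a triangle-style encoding chosen here for illustration); the patch size, dictionary size and data are placeholders.

```python
# Minimal sketch: k-means visual dictionary learned from unlabeled patches, then
# distance-based encoding of new patches.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
patches = rng.random((5000, 8 * 8))           # unlabeled 8x8 patches (placeholder)
kmeans = KMeans(n_clusters=64, n_init=10, random_state=0).fit(patches)

def encode(patch_batch):
    """Triangle-style encoding: how much closer than average each atom is."""
    d = kmeans.transform(patch_batch)          # distances to the 64 dictionary atoms
    return np.maximum(0.0, d.mean(axis=1, keepdims=True) - d)

codes = encode(rng.random((10, 8 * 8)))
print(codes.shape)                             # (10, 64)
```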

  3. YamiPred: A novel evolutionary method for predicting pre-miRNAs and selecting relevant features

    KAUST Repository

    Kleftogiannis, Dimitrios A.

    2015-01-23

    MicroRNAs (miRNAs) are small non-coding RNAs, which play a significant role in gene regulation. Predicting miRNA genes is a challenging bioinformatics problem and existing experimental and computational methods fail to deal with it effectively. We developed YamiPred, an embedded classification method that combines the efficiency and robustness of Support Vector Machines (SVM) with Genetic Algorithms (GA) for feature selection and parameters optimization. YamiPred was tested in a new and realistic human dataset and was compared with state-of-the-art computational intelligence approaches and the prevalent SVM-based tools for miRNA prediction. Experimental results indicate that YamiPred outperforms existing approaches in terms of accuracy and of geometric mean of sensitivity and specificity. The embedded feature selection component selects a compact feature subset that contributes to the performance optimization. Further experimentation with this minimal feature subset has achieved very high classification performance and revealed the minimum number of samples required for developing a robust predictor. YamiPred also confirmed the important role of commonly used features such as entropy and enthalpy, and uncovered the significance of newly introduced features, such as %A-U aggregate nucleotide frequency and positional entropy. The best model trained on human data has successfully predicted pre-miRNAs to other organisms including the category of viruses.

  4. Recurrence predictive models for patients with hepatocellular carcinoma after radiofrequency ablation using support vector machines with feature selection methods.

    Science.gov (United States)

    Liang, Ja-Der; Ping, Xiao-Ou; Tseng, Yi-Ju; Huang, Guan-Tarn; Lai, Feipei; Yang, Pei-Ming

    2014-12-01

    Recurrence of hepatocellular carcinoma (HCC) is an important issue despite effective treatments with tumor eradication. Identification of patients who are at high risk for recurrence may provide more efficacious screening and detection of tumor recurrence. The aim of this study was to develop recurrence predictive models for HCC patients who received radiofrequency ablation (RFA) treatment. From January 2007 to December 2009, 83 newly diagnosed HCC patients receiving RFA as their first treatment were enrolled. Five feature selection methods, including the genetic algorithm (GA), the simulated annealing (SA) algorithm, random forests (RF) and hybrid methods (GA+RF and SA+RF), were utilized for selecting an important subset of features from a total of 16 clinical features. These feature selection methods were combined with a support vector machine (SVM) to develop predictive models with better performance. Five-fold cross-validation was used to train and test the SVM models. The developed SVM-based predictive models with hybrid feature selection methods and 5-fold cross-validation achieved average sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and area under the ROC curve of 67%, 86%, 82%, 69%, 90%, and 0.69, respectively. The SVM-derived predictive models can suggest patients at high risk of recurrence, who should be closely followed up after complete RFA treatment. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  5. Edge and line feature extraction based on covariance models

    NARCIS (Netherlands)

    van der Heijden, Ferdinand

    Image segmentation based on contour extraction usually involves three stages of image operations: feature extraction, edge detection and edge linking. This paper is devoted to the first stage: a method to design feature extractors used to detect edges from noisy and/or blurred images.

  6. Effect of Feature Dimensionality on Object-based Land Cover ...

    African Journals Online (AJOL)

    Geographic object-based image analysis (GEOBIA) allows the easy integration of such additional features into the classification process. This paper compares the performance of three supervised classifiers in a GEOBIA environment as an increasing number of object features are included as classification input.

  7. Feature selection and classification of MAQC-II breast cancer and multiple myeloma microarray gene expression data.

    Directory of Open Access Journals (Sweden)

    Qingzhong Liu

    Full Text Available Microarray data has a high dimension of variables but available datasets usually have only a small number of samples, thereby making the study of such datasets interesting and challenging. In the task of analyzing microarray data for the purpose of, e.g., predicting gene-disease association, feature selection is very important because it provides a way to handle the high dimensionality by exploiting information redundancy induced by associations among genetic markers. Judicious feature selection in microarray data analysis can result in significant reduction of cost while maintaining or improving the classification or prediction accuracy of learning machines that are employed to sort out the datasets. In this paper, we propose a gene selection method called Recursive Feature Addition (RFA, which combines supervised learning and statistical similarity measures. We compare our method with the following gene selection methods: Support Vector Machine Recursive Feature Elimination (SVMRFE, Leave-One-Out Calculation Sequential Forward Selection (LOOCSFS, Gradient based Leave-one-out Gene Selection (GLGS. To evaluate the performance of these gene selection methods, we employ several popular learning classifiers on the MicroArray Quality Control phase II on predictive modeling (MAQC-II breast cancer dataset and the MAQC-II multiple myeloma dataset. Experimental results show that gene selection is strictly paired with learning classifier. Overall, our approach outperforms other compared methods. The biological functional analysis based on the MAQC-II breast cancer dataset convinced us to apply our method for phenotype prediction. Additionally, learning classifiers also play important roles in the classification of microarray data and our experimental results indicate that the Nearest Mean Scale Classifier (NMSC is a good choice due to its prediction reliability and its stability across the three performance measurements: Testing accuracy, MCC values, and

  8. WORD BASED TAMIL SPEECH RECOGNITION USING TEMPORAL FEATURE BASED SEGMENTATION

    Directory of Open Access Journals (Sweden)

    A. Akila

    2015-05-01

    Full Text Available Speech recognition systems require segmentation of the speech waveform into fundamental acoustic units. Segmentation is the process of decomposing the speech signal into smaller units. Speech segmentation can be done using wavelets, fuzzy methods, artificial neural networks or hidden Markov models. Speech segmentation breaks a continuous stream of sound into basic units such as words, phonemes or syllables that can be recognized. Segmentation can also be used to distinguish different types of audio signals within large amounts of audio data, often referred to as audio classification. Speech segmentation algorithms can be divided into two categories, blind segmentation and aided segmentation, based on whether the algorithm uses prior knowledge of the data to process the speech. The major issues with connected speech recognition algorithms are that the vocabulary grows with the variation in word combinations in the connected speech, and that the complexity of finding the best match for a given test pattern increases accordingly. To overcome these issues, the connected speech has to be segmented into words using attributes of the speech. A methodology using the temporal feature short-term energy was proposed and compared with an existing algorithm, the Dynamic Thresholding segmentation algorithm, which uses the spectrogram image of the connected speech for segmentation.
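
    The sketch below illustrates short-term-energy segmentation in its simplest form, assuming a fixed energy threshold separates speech frames from near-silence; the frame length, threshold and synthetic signal are placeholders rather than the paper's Tamil corpus.

```python
# Minimal sketch: frame-wise short-term energy with a fixed threshold to find
# candidate word boundaries in a synthetic "speech" signal.
import numpy as np

fs = 16000
rng = np.random.default_rng(0)
# synthetic "two words": bursts of noise separated by near-silence
signal = np.concatenate([0.5 * rng.normal(size=fs // 2), 0.01 * rng.normal(size=fs // 4),
                         0.5 * rng.normal(size=fs // 2)])

frame_len = int(0.02 * fs)                            # 20 ms frames
n_frames = len(signal) // frame_len
frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
energy = (frames ** 2).sum(axis=1)                    # short-term energy per frame

speech = energy > 0.1 * energy.max()                  # simple fixed threshold
boundaries = np.flatnonzero(np.diff(speech.astype(int))) * frame_len
print("candidate word boundaries (samples):", boundaries)
```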

  9. New Hybrid Features Selection Method: A Case Study on Websites Phishing

    Directory of Open Access Journals (Sweden)

    Khairan D. Rajab

    2017-01-01

    Full Text Available Phishing is one of the serious web threats that involves mimicking authenticated websites to deceive users in order to obtain their financial information. Phishing has caused financial damage to different online stakeholders. The damage is massive, in the magnitude of hundreds of millions; hence it is essential to minimize this risk. Classifying websites into "phishy" and legitimate types is a primary task in data mining that security experts and decision makers hope to improve, particularly with respect to the detection rate and the reliability of the results. One way to ensure the reliability of the results and to enhance performance is to identify a set of related features early on, so that the data dimensionality is reduced and irrelevant features are discarded. To increase the reliability of preprocessing, this article proposes a new feature selection method that combines the scores of multiple known methods to minimize discrepancies in feature selection results. The proposed method has been applied to the problem of website phishing classification to show its pros and cons in identifying relevant features. Results against a security dataset reveal that the proposed preprocessing method was able to derive new feature datasets which, when mined, generate highly competitive classifiers with respect to detection rate compared to results obtained from other feature selection methods.
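
    As a rough illustration of combining the scores of multiple known methods, the sketch below averages the feature ranks produced by chi-square, ANOVA F and mutual information filters; the constituent scorers, the aggregation rule and the dataset are illustrative assumptions, not the paper's exact combination.

```python
# Minimal sketch: combine several filter-method scores by average rank and keep the top-k.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)
X_pos = MinMaxScaler().fit_transform(X)          # chi2 requires non-negative inputs

scores = [chi2(X_pos, y)[0], f_classif(X, y)[0], mutual_info_classif(X, y, random_state=0)]
ranks = np.array([np.argsort(np.argsort(-s)) for s in scores])   # rank 0 = best feature
combined = ranks.mean(axis=0)                    # average rank across the three filters

top_k = np.argsort(combined)[:10]
print("features kept by the combined ranking:", top_k)
```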

  10. A DYNAMIC FEATURE SELECTION METHOD FOR DOCUMENT RANKING WITH RELEVANCE FEEDBACK APPROACH

    Directory of Open Access Journals (Sweden)

    K. Latha

    2010-07-01

    Full Text Available Ranking search results is essential for information retrieval and Web search. Search engines need to not only return highly relevant results, but also be fast to satisfy users. As a result, not all available features can be used for ranking, and in fact only a small percentage of these features can be used. Thus, it is crucial to have a feature selection mechanism that can find a subset of features that both meets latency requirements and achieves high relevance. In this paper we describe a 0/1 knapsack procedure for automatically selecting features to use within a Generalization model for Document Ranking. We propose an approach for relevance feedback using the Expectation Maximization method and evaluate the algorithm on the TREC collection for describing classes of feedback textual information retrieval features. Experimental results, evaluated on the standard TREC-9 part of the OHSUMED collection, show that our feature selection algorithm produces models that are either significantly more effective than, or equally effective as, models such as the Markov Random Field model, the correlation coefficient and the count difference method.
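
    The sketch below illustrates the 0/1 knapsack idea in isolation: each feature carries a relevance gain and a latency cost, and dynamic programming picks the subset with the highest total gain under a latency budget. The gains, costs and budget are placeholder values.

```python
# Minimal sketch: 0/1 knapsack feature selection under a latency budget.
def knapsack_select(gains, costs, budget):
    n = len(gains)
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]                       # skip feature i-1
            if costs[i - 1] <= b:                              # or take it if it fits
                cand = best[i - 1][b - costs[i - 1]] + gains[i - 1]
                best[i][b] = max(best[i][b], cand)
    # backtrack to recover the chosen feature indices
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    return sorted(chosen), best[n][budget]

gains = [0.30, 0.25, 0.20, 0.10, 0.05]     # e.g. per-feature relevance scores (placeholder)
costs = [5, 4, 3, 2, 1]                    # e.g. per-feature latency in ms (placeholder)
print(knapsack_select(gains, costs, budget=7))
```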

  11. Comparative Study on Feature Selection and Fusion Schemes for Emotion Recognition from Speech

    Directory of Open Access Journals (Sweden)

    Santiago Planet

    2012-09-01

    Full Text Available The automatic analysis of speech to detect affective states may improve the way users interact with electronic devices. However, analysis at the acoustic level alone may not be enough to determine the emotion of a user in a realistic scenario. In this paper we analyzed the spontaneous speech recordings of the FAU Aibo Corpus at the acoustic and linguistic levels to extract two sets of features. The acoustic set was reduced by a greedy procedure selecting the most relevant features to optimize the learning stage. We compared two versions of this greedy selection algorithm by performing the search for relevant features forwards and backwards. We experimented with three classification approaches: Naïve Bayes, a support vector machine and a logistic model tree, and two fusion schemes: decision-level fusion, merging the hard decisions of the acoustic and linguistic classifiers by means of a decision tree; and feature-level fusion, concatenating both sets of features before the learning stage. Despite the low performance of the linguistic data on its own, a dramatic improvement was obtained after its combination with the acoustic information, surpassing the results achieved by the acoustic modality alone. The results of the classifiers using the parameters merged at feature level outperformed the decision-level fusion scheme, despite the simplicity of the latter. Moreover, the extremely reduced set of acoustic features obtained by the greedy forward search selection algorithm improved the results provided by the full set.

  12. Opinion mining feature-level using Naive Bayes and feature extraction based analysis dependencies

    Science.gov (United States)

    Sanda, Regi; Baizal, Z. K. Abdurahman; Nhita, Fhira

    2015-12-01

    The development of the internet and related technology has had a major impact and has given rise to a new kind of business called e-commerce. Many e-commerce sites provide convenient transactions, and consumers can also provide reviews or opinions on the products they purchase. These opinions can be used by both consumers and producers: consumers learn the advantages and disadvantages of particular features of a product, and producers can analyse their own strengths and weaknesses as well as those of competitors' products. With so many opinions, a method is needed so that readers can grasp the point of the opinions as a whole. The idea comes from review summarization, which summarizes the overall opinion based on the sentiment and features it contains. In this study, the domain of focus is digital cameras. The research consists of four steps: 1) giving the system the knowledge to recognize the semantic orientation of an opinion, 2) identifying the features of the product, 3) identifying whether an opinion is positive or negative, and 4) summarizing the result. The methods discussed include Naïve Bayes for sentiment classification, a feature extraction algorithm based on dependency analysis, which is one of the tools in Natural Language Processing (NLP), and a knowledge-based dictionary, which is useful for handling implicit features. The end result of the research is a summary of consumer reviews organized by feature and sentiment. With the proposed method, the accuracy of sentiment classification is 81.2% for positive test data and 80.2% for negative test data, and the accuracy of feature extraction reaches 90.3%.
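
    As a rough illustration of the sentiment-classification step only, the sketch below trains a bag-of-words multinomial Naive Bayes on a tiny set of invented camera reviews; the dependency-based feature extraction and summarisation stages of the paper are not reproduced.

```python
# Minimal sketch: bag-of-words Naive Bayes sentiment classifier on toy review data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

reviews = ["battery life is excellent", "the lens is sharp and bright",
           "autofocus is slow and noisy", "battery drains far too quickly"]
labels = ["positive", "positive", "negative", "negative"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(reviews, labels)
print(model.predict(["the battery is excellent but the autofocus is slow"]))
```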

  13. Feature Selection and Parameters Optimization of SVM Using Particle Swarm Optimization for Fault Classification in Power Distribution Systems

    OpenAIRE

    Cho, Ming-Yuan; Hoang, Thi Thom

    2017-01-01

    Fast and accurate fault classification is essential to power system operations. In this paper, in order to classify electrical faults in radial distribution systems, a particle swarm optimization (PSO) based support vector machine (SVM) classifier has been proposed. The proposed PSO based SVM classifier is able to select appropriate input features and optimize SVM parameters to increase classification accuracy. Further, a time-domain reflectometry (TDR) method with a pseudorandom binary seque...

  14. An Effective Combined Feature For Web Based Image Retrieval

    Directory of Open Access Journals (Sweden)

    H.M.R.B Herath

    2015-08-01

    Full Text Available Abstract Technology advances, as well as the emergence of large-scale multimedia applications and the revolution of the World Wide Web, have changed the world into a digital age. Anybody can use their mobile phone to take a photo at any time, anywhere, and upload that image to ever-growing image databases. Development of effective techniques for visual and multimedia retrieval systems is one of the most challenging and important directions of future research. This paper proposes an effective combined feature for web-based image retrieval. Frequently used colour and texture features are explored in order to develop a combined feature for this purpose. Three widely used colour features (Colour Moments, Colour Coherence Vector and Colour Correlogram) and three texture features (Grey Level Co-occurrence Matrix, Tamura features and Gabor filter) were analyzed for their performance. Precision and Recall were used to evaluate the performance of each of these techniques. By comparing precision and recall values, the methods that performed best were taken and combined to form a hybrid feature. The developed combined feature was evaluated by building a web-based CBIR system. A web crawler was used to crawl through web sites; the images found on those sites were downloaded, and the combined feature representation technique was used to extract image features. The test results indicated that this web system can be used to index web images with the combined feature representation schema and to find similar images. Random image retrievals using the web system show that the combined feature can be used to retrieve images belonging to the general image domain. Retrieval accuracy is notably high for natural images like outdoor scenes, images of flowers, etc. Images which have a similar colour and texture distribution were also retrieved as similar even though they belonged to different semantic categories. This can be ideal for an artist who wants

  15. A selective overview of feature screening for ultrahigh-dimensional data.

    Science.gov (United States)

    JingYuan, Liu; Wei, Zhong; RunZe, Li

    2015-10-01

    High-dimensional data have frequently been collected in many scientific areas including genomewide association study, biomedical imaging, tomography, tumor classifications, and finance. Analysis of high-dimensional data poses many challenges for statisticians. Feature selection and variable selection are fundamental for high-dimensional data analysis. The sparsity principle, which assumes that only a small number of predictors contribute to the response, is frequently adopted and deemed useful in the analysis of high-dimensional data. Following this general principle, a large number of variable selection approaches via penalized least squares or likelihood have been developed in the recent literature to estimate a sparse model and select significant variables simultaneously. While the penalized variable selection methods have been successfully applied in many high-dimensional analyses, modern applications in areas such as genomics and proteomics push the dimensionality of data to an even larger scale, where the dimension of data may grow exponentially with the sample size. This has been called ultrahigh-dimensional data in the literature. This work aims to present a selective overview of feature screening procedures for ultrahigh-dimensional data. We focus on insights into how to construct marginal utilities for feature screening on specific models and motivation for the need of model-free feature screening procedures.
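
The simplest marginal utility mentioned above is the absolute marginal correlation between each predictor and the response, as in sure independence screening; a minimal sketch (with a simulated p >> n example) follows. The cut-off `d` is an assumption, often taken on the order of n/log(n).

```python
# Minimal marginal screening sketch (SIS-style): rank predictors by the absolute
# marginal correlation with the response and keep the top d of them.
import numpy as np

def marginal_screen(X, y, d):
    """Return indices of the d predictors with the largest |corr(X_j, y)|."""
    Xc = (X - X.mean(0)) / X.std(0)
    yc = (y - y.mean()) / y.std()
    utility = np.abs(Xc.T @ yc) / len(y)       # marginal correlations
    return np.argsort(utility)[::-1][:d]

# Example: p >> n with only the first three predictors truly active.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5000))
y = 2 * X[:, 0] - 3 * X[:, 1] + X[:, 2] + rng.standard_normal(100)
print(marginal_screen(X, y, d=10))
```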

  16. Recursive Cluster Elimination (RCE) for classification and feature selection from gene expression data

    Directory of Open Access Journals (Sweden)

    Showe Louise C

    2007-05-01

    Full Text Available Abstract Background Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. The selection of those genes that are important for distinguishing the different sample classes being compared, poses a challenging problem in high dimensional data analysis. We describe a new procedure for selecting significant genes as recursive cluster elimination (RCE) rather than recursive feature elimination (RFE). We have tested this algorithm on six datasets and compared its performance with that of two related classification procedures with RFE. Results We have developed a novel method for selecting significant genes in comparative gene expression studies. This method, which we refer to as SVM-RCE, combines K-means, a clustering method, to identify correlated gene clusters, and Support Vector Machines (SVMs), a supervised machine learning classification method, to identify and score (rank) those gene clusters for the purpose of classification. K-means is used initially to group genes into clusters. Recursive cluster elimination (RCE) is then applied to iteratively remove those clusters of genes that contribute the least to the classification performance. SVM-RCE identifies the clusters of correlated genes that are most significantly differentially expressed between the sample classes. Utilization of gene clusters, rather than individual genes, enhances the supervised classification accuracy of the same data as compared to the accuracy when either SVM or Penalized Discriminant Analysis (PDA) with recursive feature elimination (SVM-RFE and PDA-RFE) are used to remove genes based on their individual discriminant weights. Conclusion SVM-RCE provides improved classification accuracy with complex microarray data sets when it is compared to the classification accuracy of the same datasets using either SVM-RFE or PDA-RFE. SVM-RCE identifies clusters of correlated genes that when considered together
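
A rough sketch of the SVM-RCE idea (not the authors' exact implementation) is shown below: features are grouped with K-means, each cluster is scored by the cross-validated accuracy of a linear SVM trained on that cluster alone, and the worst-scoring clusters are removed iteratively. Cluster counts and the elimination rate are illustrative choices.

```python
# Rough sketch of recursive cluster elimination: cluster features (genes) with
# K-means, score each cluster by SVM cross-validation, and repeatedly drop the
# worst clusters until a small number of clusters remains.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def svm_rce(X, y, n_clusters=30, drop_per_round=5, min_clusters=5, cv=3):
    feat_idx = np.arange(X.shape[1])
    while n_clusters > min_clusters:
        labels = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=0).fit_predict(X[:, feat_idx].T)
        scores = []
        for c in range(n_clusters):
            cols = feat_idx[labels == c]
            acc = cross_val_score(SVC(kernel="linear"), X[:, cols], y, cv=cv).mean()
            scores.append((acc, c))
        scores.sort()                              # worst clusters first
        drop = {c for _, c in scores[:drop_per_round]}
        feat_idx = feat_idx[~np.isin(labels, list(drop))]
        n_clusters -= drop_per_round
    return feat_idx                                # surviving (correlated) genes
```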

  17. ARABIC TEXT CLASSIFICATION USING NEW STEMMER FOR FEATURE SELECTION AND DECISION TREES

    Directory of Open Access Journals (Sweden)

    SAID BAHASSINE

    2017-06-01

    Full Text Available Text classification is the process of assignment of unclassified text to appropriate classes based on their content. The most prevalent representation for text classification is the bag-of-words vector. In this representation, the words that appear in documents often have multiple morphological structures and grammatical forms. In most cases, these morphological variants of words belong to the same category. In the first part of this paper, a new stemming algorithm was developed in which each term of a given document is represented by its root. In the second part, a comparative study is conducted of the impact of two stemming algorithms, namely Khoja's stemmer and our new stemmer (referred to hereafter as origin-stemmer), on Arabic text classification. This investigation was carried out using chi-square as a feature selection method to reduce the dimensionality of the feature space, together with a decision tree classifier. In order to evaluate the performance of the classifier, this study used a corpus that consists of 5070 documents independently classified into six categories: sport, entertainment, business, Middle East, switch and world, on the WEKA toolkit. The recall, f-measure and precision measures are used to compare the performance of the obtained models. The experimental results show that text classification using the root stemmer outperforms classification using Khoja's stemmer. The f-measure was 92.9% in the sport category and 89.1% in the business category.
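
The classification side of such a pipeline (after stemming) can be sketched with a bag-of-words representation, chi-square feature selection and a decision tree; the toy documents and the value of k below are placeholders.

```python
# Sketch: bag-of-words + chi-square feature selection + decision tree.
# The "documents" below stand in for stemmed text; k is a placeholder.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

docs = ["goal match team win league", "market stock price company profit",
        "team coach season goal win", "company shares market investor"]
labels = ["sport", "business", "sport", "business"]

model = make_pipeline(CountVectorizer(),
                      SelectKBest(chi2, k=5),          # keep the 5 best terms by chi-square
                      DecisionTreeClassifier(random_state=0))
model.fit(docs, labels)
print(model.predict(["the team scored a late goal"]))
```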

  18. Predicting bacteriophage proteins located in host cell with feature selection technique.

    Science.gov (United States)

    Ding, Hui; Liang, Zhi-Yong; Guo, Feng-Biao; Huang, Jian; Chen, Wei; Lin, Hao

    2016-04-01

    A bacteriophage is a virus that can infect a bacterium. The fate of an infected bacterium is determined by the bacteriophage proteins located in the host cell. Thus, reliably identifying bacteriophage proteins located in the host cell is extremely important to understand their functions and discover potential anti-bacterial drugs. Thus, in this paper, a computational method was developed to recognize bacteriophage proteins located in host cells based only on their amino acid sequences. The analysis of variance (ANOVA) combined with incremental feature selection (IFS) was proposed to optimize the feature set. Using a jackknife cross-validation, our method can discriminate between bacteriophage proteins located in a host cell and the bacteriophage proteins not located in a host cell with a maximum overall accuracy of 84.2%, and can further classify bacteriophage proteins located in host cell cytoplasm and in host cell membranes with a maximum overall accuracy of 92.4%. To enhance the value of the practical applications of the method, we built a web server called PHPred (〈http://lin.uestc.edu.cn/server/PHPred〉). We believe that the PHPred will become a powerful tool to study bacteriophage proteins located in host cells and to guide related drug discovery. Copyright © 2016 Elsevier Ltd. All rights reserved.
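
The ANOVA-plus-IFS procedure can be sketched as follows: features are ranked by their ANOVA F-score and added one at a time, and the prefix with the best cross-validated accuracy is kept. The SVM classifier and the feature encoding of sequences are assumptions; the original work uses jackknife cross-validation.

```python
# Sketch of ANOVA ranking combined with incremental feature selection (IFS):
# rank features by F-score, add them one by one, keep the best-scoring prefix.
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def anova_ifs(X, y, cv=5):
    order = np.argsort(f_classif(X, y)[0])[::-1]       # rank by ANOVA F-score
    best_k, best_acc = 1, 0.0
    for k in range(1, X.shape[1] + 1):                 # incremental feature selection
        acc = cross_val_score(SVC(), X[:, order[:k]], y, cv=cv).mean()
        if acc > best_acc:
            best_k, best_acc = k, acc
    return order[:best_k], best_acc
```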

  19. Feature-based Ontology Mapping from an Information Receivers’ Viewpoint

    DEFF Research Database (Denmark)

    Glückstad, Fumiko Kano; Mørup, Morten

    2012-01-01

    This paper compares four algorithms for computing feature-based similarities between concepts respectively possessing a distinctive set of features. The eventual purpose of comparing these feature-based similarity algorithms is to identify a candidate term in a Target Language (TL) that can ... optimally convey the original meaning of a culturally-specific Source Language (SL) concept to a TL audience by aligning two culturally-dependent domain-specific ontologies. The results indicate that the Bayesian Model of Generalization [1] performs best, not only for identifying candidate translation terms...

  20. A Multiple Core Execution for Multiobjective Binary Particle Swarm Optimization Feature Selection Method with the Kernel P System Framework

    Directory of Open Access Journals (Sweden)

    Naeimeh Elkhani

    2017-01-01

    Full Text Available Membrane computing is a theoretical model of computation inspired by the structure and functioning of cells. Membrane computing models naturally have a parallel structure, and this holds for all variants of membrane computing, such as the kernel P system. Most simulations of membrane computing have been done in a serial way on a machine with a central processing unit (CPU). This has neglected the advantage of parallelism in membrane computing. This paper uses the multiple-core processing tools in MATLAB as a parallel tool to implement the proposed feature selection method, based on a kernel P system with multiobjective binary particle swarm optimization, to identify marker genes for cancer classification. Through this implementation, the proposed feature selection model involves all the features of a P system, including communication rules, division rules, parallelism, and nondeterminism.

  1. Pattern Recognition Methods and Features Selection for Speech Emotion Recognition System.

    Science.gov (United States)

    Partila, Pavol; Voznak, Miroslav; Tovarek, Jaromir

    2015-01-01

    The impact of the classification method and feature selection on speech emotion recognition accuracy is discussed in this paper. Selecting the correct parameters in combination with the classifier is an important part of reducing the complexity of system computing. This step is especially necessary for systems that will be deployed in real-time applications. The reason for developing and improving speech emotion recognition systems is their wide applicability in today's automatic voice-controlled systems. The Berlin database of emotional recordings was used in this experiment. The classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture models is measured with respect to the selection of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and group of features for stress detection in human speech. The research contribution lies in the design of a speech emotion recognition system that is both accurate and efficient.

  2. Pattern Recognition Methods and Features Selection for Speech Emotion Recognition System

    Directory of Open Access Journals (Sweden)

    Pavol Partila

    2015-01-01

    Full Text Available The impact of the classification method and feature selection on speech emotion recognition accuracy is discussed in this paper. Selecting the correct parameters in combination with the classifier is an important part of reducing the complexity of system computing. This step is especially necessary for systems that will be deployed in real-time applications. The reason for developing and improving speech emotion recognition systems is their wide applicability in today's automatic voice-controlled systems. The Berlin database of emotional recordings was used in this experiment. The classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture models is measured with respect to the selection of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and group of features for stress detection in human speech. The research contribution lies in the design of a speech emotion recognition system that is both accurate and efficient.

  3. Pattern Recognition by Dynamic Feature Analysis Based on PCA

    Directory of Open Access Journals (Sweden)

    Juliana Valencia-Aguirre

    2009-06-01

    Full Text Available Usually, in pattern recognition problems, we represent observations by means of measures on appropriate variables of the data set; these measures can be categorized as static and dynamic features. Static features are not always an accurate representation of the data. In this sense, many phenomena are better modeled by the dynamic changes of their measures. The advantage of using an extended form (dynamic features) is the inclusion of new information that allows us to get a better representation of the object. Nevertheless, it is sometimes difficult to deal with dynamic features in a classification stage, because the associated computational cost can often be higher than when dealing with static features. For analyzing such representations, we use Principal Component Analysis (PCA), arranging the dynamic data in such a way that we can consider variations related to the intrinsic dynamics of the observations. Therefore, the method makes it possible to evaluate the dynamic information of the observations in a lower-dimensional feature space without decreasing classification accuracy. The algorithms were tested on real data to distinguish pathological speech from normal voices, also using PCA for dynamic feature selection.
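
The arrangement-plus-projection step can be sketched as below: each observation's dynamic features (one short time series per variable) are flattened into a single vector and projected with PCA. The array shapes and the retained-variance threshold are assumptions.

```python
# Sketch of the dimensionality-reduction step: flatten dynamic features into one
# vector per observation and project them with PCA before classification.
import numpy as np
from sklearn.decomposition import PCA

# dynamic_data: (n_observations, n_variables, n_time_frames), simulated here.
rng = np.random.default_rng(0)
dynamic_data = rng.standard_normal((120, 12, 50))

X = dynamic_data.reshape(len(dynamic_data), -1)   # arrange dynamic measures as one vector
pca = PCA(n_components=0.95)                      # keep 95% of the variance
X_reduced = pca.fit_transform(X)
print(X.shape, "->", X_reduced.shape)
```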

  4. Functional connectivity supporting the selective maintenance of feature-location binding in visual working memory

    Directory of Open Access Journals (Sweden)

    Sachiko eTakahama

    2014-06-01

    Full Text Available Information on an object’s features bound to its location is very important for maintaining object representations in visual working memory. Interactions with dynamic multi-dimensional objects in an external environment require complex cognitive control, including the selective maintenance of feature-location binding. Here, we used event-related functional magnetic resonance imaging to investigate brain activity and functional connectivity related to the maintenance of complex feature-location binding. Participants were required to detect task-relevant changes in feature-location binding between objects defined by color, orientation, and location. We compared a complex binding task requiring complex feature-location binding (color-orientation-location with a simple binding task in which simple feature-location binding, such as color-location, was task-relevant and the other feature was task-irrelevant. Univariate analyses showed that the dorsolateral prefrontal cortex (DLPFC, hippocampus, and frontoparietal network were activated during the maintenance of complex feature-location binding. Functional connectivity analyses indicated cooperation between the inferior precentral sulcus (infPreCS, DLPFC, and hippocampus during the maintenance of complex feature-location binding. In contrast, the connectivity for the spatial updating of simple feature-location binding determined by reanalyzing the data from Takahama et al. (2010 demonstrated that the superior parietal lobule (SPL cooperated with the DLPFC and hippocampus. These results suggest that the connectivity for complex feature-location binding does not simply reflect general memory load and that the DLPFC and hippocampus flexibly modulate the dorsal frontoparietal network, depending on the task requirements, with the infPreCS involved in the maintenance of complex feature-location binding and the SPL involved in the spatial updating of simple feature-location binding.

  5. Automatic Feature Selection and Improved Classification in SICADA Counterfeit Electronics Detection

    Science.gov (United States)

    2017-03-20

    Electronic Parts Can Be Found on Internet Purchasing Platforms. GAO-12-375 (2012). 3. Cobb, W., Lapse, E., Baldwin, R., Temple, M., and Kim, Y. Intrinsic... This analysis approach is a machine learning process that involves four main parts: data collection, feature generation, feature selection, and... this variation maximization occurs indiscriminately, and can separate data within the same class. Further, projection techniques in general

  6. A Transform-Based Feature Extraction Approach for Motor Imagery Tasks Classification.

    Science.gov (United States)

    Baali, Hamza; Khorshidtalab, Aida; Mesbah, Mostefa; Salami, Momoh J E

    2015-01-01

    In this paper, we present a new motor imagery classification method in the context of electroencephalography (EEG)-based brain-computer interface (BCI). This method uses a signal-dependent orthogonal transform, referred to as linear prediction singular value decomposition (LP-SVD), for feature extraction. The transform defines the mapping as the left singular vectors of the LP coefficient filter impulse response matrix. Using a logistic tree-based model classifier, the extracted features are classified into one of four motor imagery movements. The proposed approach was first benchmarked against two related state-of-the-art feature extraction approaches, namely, discrete cosine transform (DCT) and adaptive autoregressive (AAR)-based methods. By achieving an accuracy of 67.35%, the LP-SVD approach outperformed the other approaches by large margins (25% compared with DCT and 6% compared with AAR-based methods). To further improve the discriminatory capability of the extracted features and reduce the computational complexity, we enlarged the extracted feature subset by incorporating two extra features, namely, the Q- and Hotelling's T² statistics of the transformed EEG, and introduced a new EEG channel selection method. The performance of the EEG classification based on the expanded feature set and channel selection method was compared with that of a number of the state-of-the-art classification methods previously reported with the BCI IIIa competition data set. Our method came second with an average accuracy of 81.38%.

  7. Feature Selection Software to Improve Accuracy and Reduce Cost in Automated Recognition Systems

    Czech Academy of Sciences Publication Activity Database

    Somol, Petr

    2011-01-01

    Roč. 2011, č. 84 (2011), s. 54-54 ISSN 0926-4981 R&D Projects: GA MŠk 1M0572 Grant - others:GA MŠk(CZ) 2C06019 Institutional research plan: CEZ:AV0Z10750506 Keywords : feature selection * software library * machine learning Subject RIV: BD - Theory of Information http://library.utia.cas.cz/separaty/2011/RO/somol-feature selection software to improve accuracy and reduce cost in automated recognition systems.pdf

  8. Understanding the selection of core head design features to match precisely challenging well applications

    Energy Technology Data Exchange (ETDEWEB)

    Zambrana, Roberto; Sousa, J. Tadeu V. de; Antunes, Ricardo [Halliburton Servicos Ltda., Rio de Janeiro, RJ (Brazil)

    2008-07-01

    Reliable rock mechanical information is very important for optimum reservoir development. This information can help specialists to accurately estimate reserves, reservoir compaction, sand production, stress field orientation, etc. In all cases, the solutions to problems involving rock mechanics lead to significant cost savings. Consequently, it is important that the decisions be based on the most accurate information possible. For describing rock mechanics, cores represent the major source of data and therefore should be of good quality. However, there are several well conditions that cause coring and core recovery to be difficult, for example: unconsolidated formations; laminated and fractured rocks; critical mud losses, etc. The problem becomes even worse in high-inclination wells with long horizontal sections. In such situations, the optimum selection of core heads becomes critical. This paper will discuss the most important design features that enable core heads to be matched precisely to various challenging applications. Case histories will be used to illustrate the superior performance of selected core heads. They include coring in horizontal wells and in harsh well conditions with critical mud losses. (author)

  9. Examining applying high performance genetic data feature selection and classification algorithms for colon cancer diagnosis.

    Science.gov (United States)

    Al-Rajab, Murad; Lu, Joan; Xu, Qiang

    2017-07-01

    This paper examines the accuracy and efficiency (time complexity) of high performance genetic data feature selection and classification algorithms for colon cancer diagnosis. The need for this research derives from the urgent and increasing need for accurate and efficient algorithms. Colon cancer is a leading cause of death worldwide, hence it is vitally important for the cancer tissues to be expertly identified and classified in a rapid and timely manner, to assure both a fast detection of the disease and to expedite the drug discovery process. In this research, a three-phase approach was proposed and implemented: Phases One and Two examined the feature selection algorithms and classification algorithms employed separately, and Phase Three examined the performance of the combination of these. It was found from Phase One that the Particle Swarm Optimization (PSO) algorithm performed best on the colon dataset for feature selection (29 genes selected) and from Phase Two that the Support Vector Machine (SVM) algorithm outperformed the other classifiers, with an accuracy of almost 86%. It was also found from Phase Three that the combined use of PSO and SVM surpassed other algorithms in accuracy and performance, and was faster in terms of time analysis (94%). It is concluded that applying feature selection algorithms prior to classification algorithms results in better accuracy than when the latter are applied alone. This conclusion is important and significant to industry and society. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. Research on Forest Flame Recognition Algorithm Based on Image Feature

    Science.gov (United States)

    Wang, Z.; Liu, P.; Cui, T.

    2017-09-01

    In recent years, fire recognition based on image features has become a hotspot in fire monitoring. However, due to the complexity of the forest environment, the accuracy of forest fire recognition based on image features is low. Based on this, this paper proposes a feature extraction algorithm based on the YCrCb color space and K-means clustering. Firstly, the paper prepares and analyzes the color characteristics of a large number of forest fire image samples. Using the K-means clustering algorithm, the forest flame model is obtained by comparing the two commonly used color spaces, and the suspected flame area is discriminated and extracted. The experimental results show that the extraction accuracy of the flame area based on the YCrCb color model is higher than that of the HSI color model, which can be applied to forest fire identification in different scenes and is feasible in practice.

  11. RESEARCH ON FOREST FLAME RECOGNITION ALGORITHM BASED ON IMAGE FEATURE

    Directory of Open Access Journals (Sweden)

    Z. Wang

    2017-09-01

    Full Text Available In recent years, fire recognition based on image features has become a hotspot in fire monitoring. However, due to the complexity of the forest environment, the accuracy of forest fire recognition based on image features is low. Based on this, this paper proposes a feature extraction algorithm based on the YCrCb color space and K-means clustering. Firstly, the paper prepares and analyzes the color characteristics of a large number of forest fire image samples. Using the K-means clustering algorithm, the forest flame model is obtained by comparing the two commonly used color spaces, and the suspected flame area is discriminated and extracted. The experimental results show that the extraction accuracy of the flame area based on the YCrCb color model is higher than that of the HSI color model, which can be applied to forest fire identification in different scenes and is feasible in practice.
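
A hedged sketch of such a colour-space-plus-clustering step is shown below: pixels are converted to YCrCb and clustered with K-means, and the cluster with the highest mean Cr is kept as the suspected flame area. The file name, cluster count and the highest-Cr criterion are assumptions, not the papers' exact rule.

```python
# Illustrative flame-region extraction: convert to YCrCb, cluster the pixels with
# K-means, then keep the cluster whose mean Cr component is highest (flame-like).
import cv2
import numpy as np
from sklearn.cluster import KMeans

img = cv2.imread("forest_frame.jpg")                       # BGR image (placeholder path)
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)

pixels = ycrcb.reshape(-1, 3).astype(np.float32)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(pixels)

# Pick the cluster with the highest mean Cr (index 1 in YCrCb) as the suspected flame area.
mean_cr = [pixels[labels == k, 1].mean() for k in range(4)]
flame_mask = (labels == int(np.argmax(mean_cr))).reshape(ycrcb.shape[:2]).astype(np.uint8) * 255
cv2.imwrite("flame_mask.png", flame_mask)
```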

  12. Collaborative Filtering Fusing Label Features Based on SDAE

    DEFF Research Database (Denmark)

    Huo, Huan; Liu, Xiufeng; Zheng, Deyuan

    2017-01-01

    Collaborative filtering (CF) is successfully applied to recommendation systems by digging out the latent features of users and items. However, conventional CF-based models usually suffer from the sparsity of rating matrices, which degrades the model's recommendation performance. To address this sparsity problem, auxiliary information such as labels is utilized. Another approach to recommendation systems is the content-based model, which cannot be directly integrated with CF-based models due to its inherent characteristics. Considering that deep learning algorithms are capable of extracting deep latent features, this paper applies a Stacked Denoising Auto Encoder (SDAE) to the content-based model and proposes the DLCF (Deep Learning for Collaborative Filtering) algorithm by combining it with a CF-based model that fuses label features. Experiments on real-world data sets show that DLCF can largely overcome the sparsity problem...

  13. SVM-based glioma grading. Optimization by feature reduction analysis

    International Nuclear Information System (INIS)

    Zoellner, Frank G.; Schad, Lothar R.; Emblem, Kyrre E.; Harvard Medical School, Boston, MA; Oslo Univ. Hospital

    2012-01-01

    We investigated the predictive power of feature reduction analysis approaches in support vector machine (SVM)-based classification of glioma grade. In 101 untreated glioma patients, three analytic approaches were evaluated to derive an optimal reduction in features: (i) Pearson's correlation coefficients (PCC), (ii) principal component analysis (PCA) and (iii) independent component analysis (ICA). Tumor grading was performed using a previously reported SVM approach including whole-tumor cerebral blood volume (CBV) histograms and patient age. Best classification accuracy was found using PCA at 85% (sensitivity = 89%, specificity = 84%) when reducing the feature vector from 101 (100-bins rCBV histogram + age) to 3 principal components. In comparison, classification accuracy by PCC was 82% (89%, 77%, 2 dimensions) and 79% by ICA (87%, 75%, 9 dimensions). For improved speed (up to 30%) and simplicity, feature reduction by all three methods provided similar classification accuracy to literature values (≈87%) while reducing the number of features by up to 98%. (orig.)

  14. Feature-based tolerancing for intelligent inspection process definition

    International Nuclear Information System (INIS)

    Brown, C.W.

    1993-07-01

    This paper describes a feature-based tolerancing capability that complements a geometric solid model with an explicit representation of conventional and geometric tolerances. This capability is focused on supporting an intelligent inspection process definition system. The feature-based tolerance model's benefits include advancing complete product definition initiatives (e.g., STEP -- Standard for the Exchange of Product model data), supplying computer-integrated manufacturing applications (e.g., generative process planning and automated part programming) with product definition information, and assisting in the solution of measurement performance issues. A feature-based tolerance information model was developed based upon the notion of a feature's toleranceable aspects and describes an object-oriented scheme for representing and relating tolerance features, tolerances, and datum reference frames. For easy incorporation, the tolerance feature entities are interconnected with STEP solid model entities. This schema will explicitly represent the tolerance specification for mechanical products, support advanced dimensional measurement applications, and assist in tolerance-related methods divergence issues

  15. Random forests for feature selection in QSPR Models - an application for predicting standard enthalpy of formation of hydrocarbons

    Science.gov (United States)

    2013-01-01

    Background One of the main topics in the development of quantitative structure-property relationship (QSPR) predictive models is the identification of the subset of variables that represent the structure of a molecule and which are predictors for a given property. There are several automated feature selection methods, ranging from backward, forward or stepwise procedures, to further elaborated methodologies such as evolutionary programming. The problem lies in selecting the minimum subset of descriptors that can predict a certain property with a good performance, computationally efficient and in a more robust way, since the presence of irrelevant or redundant features can cause poor generalization capacity. In this paper an alternative selection method, based on Random Forests to determine the variable importance is proposed in the context of QSPR regression problems, with an application to a manually curated dataset for predicting standard enthalpy of formation. The subsequent predictive models are trained with support vector machines introducing the variables sequentially from a ranked list based on the variable importance. Results The model generalizes well even with a high dimensional dataset and in the presence of highly correlated variables. The feature selection step was shown to yield lower prediction errors with RMSE values 23% lower than without feature selection, albeit using only 6% of the total number of variables (89 from the original 1485). The proposed approach further compared favourably with other feature selection methods and dimension reduction of the feature space. The predictive model was selected using a 10-fold cross validation procedure and, after selection, it was validated with an independent set to assess its performance when applied to new data and the results were similar to the ones obtained for the training set, supporting the robustness of the proposed approach. Conclusions The proposed methodology seemingly improves the prediction
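
The described workflow can be sketched as follows: descriptors are ranked by random-forest variable importance, and SVM regressors are then trained on growing prefixes of that ranking, keeping the subset size with the lowest cross-validated RMSE. The simulated data and the candidate subset sizes are placeholders.

```python
# Sketch: random-forest importance ranking followed by SVM models trained on
# growing prefixes of the ranked descriptor list; the data set is simulated.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

X, y = make_regression(n_samples=200, n_features=100, n_informative=10, random_state=0)

rank = np.argsort(RandomForestRegressor(n_estimators=300, random_state=0)
                  .fit(X, y).feature_importances_)[::-1]

best = (np.inf, None)
for k in (5, 10, 20, 40, 80):
    rmse = -cross_val_score(SVR(), X[:, rank[:k]], y, cv=5,
                            scoring="neg_root_mean_squared_error").mean()
    best = min(best, (rmse, k))
print("best subset size:", best[1], "CV RMSE: %.2f" % best[0])
```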

  16. How can selection of biologically inspired features improve the performance of a robust object recognition model?

    Directory of Open Access Journals (Sweden)

    Masoud Ghodrati

    Full Text Available Humans can effectively and swiftly recognize objects in complex natural scenes. This outstanding ability has motivated many computational object recognition models. Most of these models try to emulate the behavior of this remarkable system. The human visual system hierarchically recognizes objects in several processing stages. Along these stages, a set of features with increasing complexity is extracted by different parts of the visual system. Elementary features like bars and edges are processed in the earlier levels of the visual pathway, and as one moves up this pathway, more complex features are detected. An important question in the field of visual processing is which features of an object are selected and represented by the visual cortex. To address this issue, we extended a hierarchical model, which is motivated by biology, for different object recognition tasks. In this model, a set of object parts, named patches, is extracted in the intermediate stages. These object parts are used in the training procedure of the model and have an important role in object recognition. These patches are selected indiscriminately from different positions of an image, and this can lead to the extraction of non-discriminating patches, which eventually may reduce the performance. In the proposed model we used an evolutionary algorithm approach to select a set of informative patches. Our reported results indicate that these patches are more informative than the usual random patches. We demonstrate the strength of the proposed model on a range of object recognition tasks. The proposed model outperforms the original model in diverse object recognition tasks. It can be seen from the experiments that the selected features are generally particular parts of the target images. Our results suggest that selected features which are parts of target objects provide an efficient set for robust object recognition.

  17. 3-D FEATURE-BASED MATCHING BY RSTG APPROACH

    Directory of Open Access Journals (Sweden)

    J.-J. Jaw

    2012-07-01

    Full Text Available 3-D feature matching is the essential kernel in a fully automated feature-based LiDAR point cloud registration. After the feature acquisition procedures, establishing correspondences between features in different data frames is the problem that must be solved. The objective addressed in this paper is the development of an approach, coined RSTG, to retrieve corresponding counterparts of unsorted multiple 3-D features extracted from sets of LiDAR point clouds. RSTG stands for the four major processes, "Rotation alignment", "Scale estimation", "Translation alignment" and "Geometric check", strategically formulated to find the matching solution with high efficiency and to accomplish the 3-D similarity transformation among all sets. The workable types of features for RSTG comprise points, lines, planes and clustered point groups. Each type of feature can be employed exclusively or combined with others, if sufficiently supplied, throughout the matching scheme. The paper gives a detailed description of the matching methodology and discusses the matching effects based on a statistical assessment, which revealed that the RSTG approach reached an average matching success rate of up to 93% with around 6.6% statistical type 1 error. Notably, statistical type 2 error, the critical indicator of matching reliability, was kept at 0% throughout all the experiments.
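
Once correspondences are established, the final 3-D similarity transformation (scale, rotation, translation) between matched point features can be estimated with the standard SVD-based (Umeyama-style) closed form sketched below; this is a generic solution under the stated model, not necessarily RSTG's exact procedure.

```python
# Estimate the 3-D similarity transform (scale s, rotation R, translation t)
# between two matched Nx3 point sets, dst[i] ≈ s * R @ src[i] + t.
import numpy as np

def similarity_transform(src, dst):
    mu_s, mu_d = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(dst_c.T @ src_c / len(src))   # cross-covariance
    D = np.eye(3)
    D[2, 2] = np.sign(np.linalg.det(U @ Vt))               # guard against reflections
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / src_c.var(0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# Example: recover a known scale-plus-translation on synthetic points.
rng = np.random.default_rng(0)
src = rng.standard_normal((50, 3))
dst = 2.0 * src + np.array([1.0, -2.0, 0.5])
s, R, t = similarity_transform(src, dst)
print(round(s, 3), t.round(3))
```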

  18. Adaptive Feature Based Control of Compact Disk Players

    DEFF Research Database (Denmark)

    Odgaard, Peter Fogh; Stoustrup, Jakob; Vidal, Enrique Sanchez

    2005-01-01

    Many have experienced the problem that their Compact Disc players have difficulties playing Compact Discs with surface faults like scratches and fingerprints. The cause of this is the two servo control loops used to keep the Optical Pick-up Unit focused and radially on the information track. A feature based control scheme for CD-players playing CDs with surface faults is derived and described. This feature based control scheme uses a precomputed base to remove the surface fault influence from the position measurements. In this paper an adaptive version of the feature based control scheme is proposed and described. This adaptive scheme can ... The result is that the adaptive scheme clearly adapts better to the given faults compared with the non-adaptive version of the feature based control scheme.

  19. A hybrid feature selection approach for the early diagnosis of Alzheimer’s disease

    Science.gov (United States)

    Gallego-Jutglà, Esteve; Solé-Casals, Jordi; Vialatte, François-Benoît; Elgendi, Mohamed; Cichocki, Andrzej; Dauwels, Justin

    2015-02-01

    Objective. Recently, significant advances have been made in the early diagnosis of Alzheimer's disease (AD) from electroencephalography (EEG). However, choosing suitable measures is a challenging task. Among other measures, frequency relative power (RP) and loss of complexity have been used with promising results. In the present study we investigate the early diagnosis of AD using synchrony measures and frequency RP on EEG signals, examining the changes found in different frequency ranges. Approach. We first explore the use of a single feature for computing the classification rate (CR), looking for the best frequency range. Then, we present a multiple feature classification system that outperforms all previous results using a feature selection strategy. These two approaches are tested in two different databases, one containing mild cognitive impairment (MCI) and healthy subjects (patients age: 71.9 ± 10.2, healthy subjects age: 71.7 ± 8.3), and the other containing Mild AD and healthy subjects (patients age: 77.6 ± 10.0, healthy subjects age: 69.4 ± 11.5). Main results. Using a single feature to compute CRs we achieve a performance of 78.33% for the MCI data set and of 97.56% for Mild AD. Results are clearly improved using the multiple feature classification, where a CR of 95% is found for the MCI data set using 11 features, and 100% for the Mild AD data set using four features. Significance. The new feature selection method described in this work may be a reliable tool that could help to design a realistic system that does not require prior knowledge of a patient's status. With that aim, we explore the standardization of features for MCI and Mild AD data sets with promising results.

  20. Computational Intelligence Modeling of the Macromolecules Release from PLGA Microspheres—Focus on Feature Selection

    Science.gov (United States)

    Zawbaa, Hossam M.; Szlȩk, Jakub; Grosan, Crina; Jachowicz, Renata; Mendyk, Aleksander

    2016-01-01

    Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, morphology of microspheres, release conditions, etc. Selecting a subset of relevant properties for PLGA is a challenging machine learning task as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem with the aim of minimizing the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms: antlion optimization, binary version of antlion optimization, grey wolf optimization, and social spider optimization are used to select the optimal feature set for predicting the dissolution profile of PLGA. Besides these, LASSO algorithm is also used for comparisons. Selection of crucial variables is performed under the assumption that both predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling for which various tools are chosen, such as Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression tree, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with the results reported by Szlȩk. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, and the number of selected input features is smaller, nine versus eleven. PMID:27315205

  1. Computational Intelligence Modeling of the Macromolecules Release from PLGA Microspheres-Focus on Feature Selection.

    Directory of Open Access Journals (Sweden)

    Hossam M Zawbaa

    Full Text Available Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, morphology of microspheres, release conditions, etc. Selecting a subset of relevant properties for PLGA is a challenging machine learning task as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem with the aim of minimizing the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms: antlion optimization, binary version of antlion optimization, grey wolf optimization, and social spider optimization are used to select the optimal feature set for predicting the dissolution profile of PLGA. Besides these, LASSO algorithm is also used for comparisons. Selection of crucial variables is performed under the assumption that both predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling for which various tools are chosen, such as Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression tree, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with the results reported by Szlȩk. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, and the number of selected input features is smaller, nine versus eleven.

  2. Computational Intelligence Modeling of the Macromolecules Release from PLGA Microspheres-Focus on Feature Selection.

    Science.gov (United States)

    Zawbaa, Hossam M; Szlȩk, Jakub; Grosan, Crina; Jachowicz, Renata; Mendyk, Aleksander

    2016-01-01

    Poly-lactide-co-glycolide (PLGA) is a copolymer of lactic and glycolic acid. Drug release from PLGA microspheres depends not only on polymer properties but also on drug type, particle size, morphology of microspheres, release conditions, etc. Selecting a subset of relevant properties for PLGA is a challenging machine learning task as there are over three hundred features to consider. In this work, we formulate the selection of critical attributes for PLGA as a multiobjective optimization problem with the aim of minimizing the error of predicting the dissolution profile while reducing the number of attributes selected. Four bio-inspired optimization algorithms: antlion optimization, binary version of antlion optimization, grey wolf optimization, and social spider optimization are used to select the optimal feature set for predicting the dissolution profile of PLGA. Besides these, LASSO algorithm is also used for comparisons. Selection of crucial variables is performed under the assumption that both predictability and model simplicity are of equal importance to the final result. During the feature selection process, a set of input variables is employed to find minimum generalization error across different predictive models and their settings/architectures. The methodology is evaluated using predictive modeling for which various tools are chosen, such as Cubist, random forests, artificial neural networks (monotonic MLP, deep learning MLP), multivariate adaptive regression splines, classification and regression tree, and hybrid systems of fuzzy logic and evolutionary computations (fugeR). The experimental results are compared with the results reported by Szlȩk. We obtain a normalized root mean square error (NRMSE) of 15.97% versus 15.4%, and the number of selected input features is smaller, nine versus eleven.

  3. Privacy preserving data publishing of categorical data through k-anonymity and feature selection.

    Science.gov (United States)

    Aristodimou, Aristos; Antoniades, Athos; Pattichis, Constantinos S

    2016-03-01

    In healthcare, there is a vast amount of patients' data, which can lead to important discoveries if combined. Due to legal and ethical issues, such data cannot be shared and hence such information is underused. A new area of research has emerged, called privacy preserving data publishing (PPDP), which aims at sharing data in a way that privacy is preserved while the information lost is kept at a minimum. In this Letter, a new anonymisation algorithm for PPDP is proposed, which is based on k-anonymity through pattern-based multidimensional suppression (kPB-MS). The algorithm uses feature selection for reducing the data dimensionality and then combines attribute and record suppression for obtaining k-anonymity. Five datasets from different areas of life sciences [RETINOPATHY, Single Photon Emission Computed Tomography imaging, gene sequencing and drug discovery (two datasets)] were anonymised with kPB-MS. The produced anonymised datasets were evaluated using four different classifiers and in 74% of the test cases, they produced similar or better accuracies than using the full datasets.
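
As a toy illustration of the k-anonymity goal itself (not of the kPB-MS algorithm), record suppression can be expressed as dropping every row whose quasi-identifier combination occurs fewer than k times; the column names below are made up.

```python
# Toy k-anonymity check by record suppression: keep only rows whose
# quasi-identifier combination appears at least k times.
import pandas as pd

def suppress_to_k_anonymity(df, quasi_identifiers, k=3):
    counts = df.groupby(quasi_identifiers)[quasi_identifiers[0]].transform("size")
    return df[counts >= k]

df = pd.DataFrame({"age_band": ["30-39", "30-39", "30-39", "40-49"],
                   "zip3": ["101", "101", "101", "202"],
                   "diagnosis": ["A", "B", "A", "C"]})
print(suppress_to_k_anonymity(df, ["age_band", "zip3"], k=3))   # last row is suppressed
```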

  4. Robust Facial Feature Tracking Using Shape-Constrained Multiresolution-Selected Linear Predictors.

    Science.gov (United States)

    Ong, Eng-Jon; Bowden, Richard

    2011-09-01

    This paper proposes a learned data-driven approach for accurate, real-time tracking of facial features using only intensity information. The task of automatic facial feature tracking is nontrivial since the face is a highly deformable object with large textural variations and motion in certain regions. Existing works attempt to address these problems by either limiting themselves to tracking feature points with strong and unique visual cues (e.g., mouth and eye corners) or by incorporating a priori information that needs to be manually designed (e.g., selecting points for a shape model). The framework proposed here largely avoids the need for such restrictions by automatically identifying the optimal visual support required for tracking a single facial feature point. This automatic identification of the visual context required for tracking allows the proposed method to potentially track any point on the face. Tracking is achieved via linear predictors which provide a fast and effective method for mapping pixel intensities into tracked feature position displacements. Building upon the simplicity and strengths of linear predictors, a more robust biased linear predictor is introduced. Multiple linear predictors are then grouped into a rigid flock to further increase robustness. To improve tracking accuracy, a novel probabilistic selection method is used to identify relevant visual areas for tracking a feature point. These selected flocks are then combined into a hierarchical multiresolution LP model. Finally, we also exploit a simple shape constraint for correcting the occasional tracking failure of a minority of feature points. Experimental results show that this method performs more robustly and accurately than AAMs, with minimal training examples on example sequences that range from SD quality to Youtube quality. Additionally, an analysis of the visual support consistency across different subjects is also provided.

  5. Feature selection for speech emotion recognition in Spanish and Basque: on the use of machine learning to improve human-computer interaction.

    Science.gov (United States)

    Arruti, Andoni; Cearreta, Idoia; Alvarez, Aitor; Lazkano, Elena; Sierra, Basilio

    2014-01-01

    Study of emotions in human-computer interaction is a growing research area. This paper shows an attempt to select the most significant features for emotion recognition in spoken Basque and Spanish Languages using different methods for feature selection. RekEmozio database was used as the experimental data set. Several Machine Learning paradigms were used for the emotion classification task. Experiments were executed in three phases, using different sets of features as classification variables in each phase. Moreover, feature subset selection was applied at each phase in order to seek for the most relevant feature subset. The three phases approach was selected to check the validity of the proposed approach. Achieved results show that an instance-based learning algorithm using feature subset selection techniques based on evolutionary algorithms is the best Machine Learning paradigm in automatic emotion recognition, with all different feature sets, obtaining a mean of 80.05% emotion recognition rate in Basque and a 74.82% in Spanish. In order to check the goodness of the proposed process, a greedy searching approach (FSS-Forward) has been applied and a comparison between them is provided. Based on achieved results, a set of most relevant non-speaker dependent features is proposed for both languages and new perspectives are suggested.

  6. Feature selection and definition for contours classification of thermograms in breast cancer detection

    Science.gov (United States)

    Jagodziński, Dariusz; Matysiewicz, Mateusz; Neumann, Łukasz; Nowak, Robert M.; Okuniewski, Rafał; Oleszkiewicz, Witold; Cichosz, Paweł

    2016-09-01

    This contribution introduces a method for detecting cancer pathologies in breast skin temperature distribution images. The use of thermosensitive foils applied to the breast skin allows thermograms to be created, which display the amount of infrared energy emitted by the breast cells. Significant foci of hyperthermia or inflammation are typical of cancer cells. These foci can be recognized on thermograms as contours, i.e., areas of higher temperature. Every contour can be converted to a feature set that describes it, using the raw, central, Hu, outline, Fourier and colour moments computed from the image pixels. This paper also defines a new way of describing a set of contours through their neighbourhood relations. Moreover, the contribution introduces a way of ranking and selecting the most relevant features. The authors used a neural network with Gevrey's concept and recursive feature elimination to estimate feature importance.

  7. Feature-based attentional modulation of orientation perception in somatosensation

    Directory of Open Access Journals (Sweden)

    Meike Annika Schweisfurth

    2014-07-01

    Full Text Available In a reaction time study of human tactile orientation detection the effects of spatial attention and feature-based attention were investigated. Subjects had to give speeded responses to target orientations (parallel and orthogonal to the finger axis in a random stream of oblique tactile distractor orientations presented to their index and ring fingers. Before each block of trials, subjects received a tactile cue at one finger. By manipulating the validity of this cue with respect to its location and orientation (feature, we provided an incentive to subjects to attend spatially to the cued location and only there to the cued orientation. Subjects showed quicker responses to parallel compared to orthogonal targets, pointing to an orientation anisotropy in sensory processing. Also, faster reaction times were observed in location-matched trials, i.e. when targets appeared on the cued finger, representing a perceptual benefit of spatial attention. Most importantly, reaction times were shorter to orientations matching the cue, both at the cued and at the uncued location, documenting a global enhancement of tactile sensation by feature-based attention. This is the first report of a perceptual benefit of feature-based attention outside the spatial focus of attention in somatosensory perception. The similarity to effects of feature-based attention in visual perception supports the notion of matching attentional mechanisms across sensory domains.

  8. A biometric identification system based on eigenpalm and eigenfinger features.

    Science.gov (United States)

    Ribaric, Slobodan; Fratric, Ivan

    2005-11-01

    This paper presents a multimodal biometric identification system based on the features of the human hand. We describe a new biometric approach to personal identification using eigenfinger and eigenpalm features, with fusion applied at the matching-score level. The identification process can be divided into the following phases: capturing the image; preprocessing; extracting and normalizing the palm and strip-like finger subimages; extracting the eigenpalm and eigenfinger features based on the K-L transform; matching and fusion; and, finally, a decision based on the (k, l)-NN classifier and thresholding. The system was tested on a database of 237 people (1,820 hand images). The experimental results showed the effectiveness of the system in terms of the recognition rate (100 percent), the equal error rate (EER = 0.58 percent), and the total error rate (TER = 0.72 percent).

  9. Feature-based component model for design of embedded systems

    Science.gov (United States)

    Zha, Xuan Fang; Sriram, Ram D.

    2004-11-01

    An embedded system is a hybrid of hardware and software, which combines software's flexibility and hardware real-time performance. Embedded systems can be considered as assemblies of hardware and software components. An Open Embedded System Model (OESM) is currently being developed at NIST to provide a standard representation and exchange protocol for embedded systems and system-level design, simulation, and testing information. This paper proposes an approach to representing an embedded system feature-based model in OESM, i.e., Open Embedded System Feature Model (OESFM), addressing models of embedded system artifacts, embedded system components, embedded system features, and embedded system configuration/assembly. The approach provides an object-oriented UML (Unified Modeling Language) representation for the embedded system feature model and defines an extension to the NIST Core Product Model. The model provides a feature-based component framework allowing the designer to develop a virtual embedded system prototype through assembling virtual components. The framework not only provides a formal precise model of the embedded system prototype but also offers the possibility of designing variation of prototypes whose members are derived by changing certain virtual components with different features. A case study example is discussed to illustrate the embedded system model.

  10. A flowchart for selecting an ointment base.

    Science.gov (United States)

    Conway, Jeannine M; Brown, Michael C

    2014-02-12

    OBJECTIVES. To improve students' skills in selecting appropriate ointment bases through the development and implementation of a flowchart. A flowchart was designed to help students select the appropriate base for an ointment. Students used the flowchart throughout the semester in both dry and wet laboratory activities. At the end of the semester, students completed a dry laboratory practical that required them to select an appropriate ointment base and levigating agent. Student performance data from the year prior to implementation was compared to data for 2 years after implementation. Calculation, procedure, and labeling errors also were compared. Prior to implementation of the flowchart, 51 of 101 students selected the correct base. After implementation, 169 of 212 students selected the correct base (p < …). Using a flowchart to select an ointment base improved student performance when used in the context of a dry laboratory assignment.

  11. Unsupervised Feature Learning With Winner-Takes-All Based STDP

    Directory of Open Access Journals (Sweden)

    Paul Ferré

    2018-04-01

    Full Text Available We present a novel strategy for unsupervised feature learning in image applications inspired by the Spike-Timing-Dependent-Plasticity (STDP) biological learning rule. We show equivalence between rank order coding Leaky-Integrate-and-Fire neurons and ReLU artificial neurons when applied to non-temporal data. We apply this to images using rank-order coding, which allows us to perform a full network simulation with a single feed-forward pass using GPU hardware. Next we introduce a binary STDP learning rule compatible with training on batches of images. Two mechanisms to stabilize the training are also presented: a Winner-Takes-All (WTA) framework which selects the most relevant patches to learn from along the spatial dimensions, and a simple feature-wise normalization as a homeostatic process. This learning process allows us to train multi-layer architectures of convolutional sparse features. We apply our method to extract features from the MNIST, ETH80, CIFAR-10, and STL-10 datasets and show that these features are relevant for classification. We finally compare these results with several other state of the art unsupervised learning methods.

  12. Automatic seamless image mosaic method based on SIFT features

    Science.gov (United States)

    Liu, Meiying; Wen, Desheng

    2017-02-01

    An automatic seamless image mosaic method based on SIFT features is proposed. First, the scale-invariant feature transform (SIFT) is used for feature extraction and matching, achieving sub-pixel precision in feature extraction. The transformation matrix H is then estimated with an improved PROSAC algorithm; compared with RANSAC, the computational efficiency is higher and more inliers are retained. The matrix H is further refined with the Levenberg-Marquardt (LM) algorithm, and the mosaic is finally completed with a smoothing algorithm. The method runs automatically and avoids the drawbacks of traditional image mosaic methods under varying scale and illumination conditions. Experimental results show that the mosaic quality is good and the algorithm is very stable, which makes it valuable in practice.
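
    A minimal OpenCV sketch of this pipeline (SIFT matching, robust homography estimation, warping with crude blending) might look as follows. It uses RANSAC as the robust estimator because the paper's improved PROSAC variant is not part of stock OpenCV, and the file names, ratio threshold, and canvas size are placeholders.

```python
import cv2
import numpy as np

img1 = cv2.imread("left.jpg")    # placeholder file names
img2 = cv2.imread("right.jpg")

sift = cv2.SIFT_create()
k1, d1 = sift.detectAndCompute(cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY), None)
k2, d2 = sift.detectAndCompute(cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY), None)

# Match descriptors and keep matches passing Lowe's ratio test.
matches = cv2.BFMatcher().knnMatch(d2, d1, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

src = np.float32([k2[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([k1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

# Robustly estimate the transformation matrix H (OpenCV refines inliers with LM internally).
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Warp the second image into the first image's frame and blend the overlap crudely.
h, w = img1.shape[:2]
canvas = cv2.warpPerspective(img2, H, (w * 2, h))
canvas[:h, :w] = np.where(canvas[:h, :w] > 0,
                          (0.5 * canvas[:h, :w] + 0.5 * img1).astype(np.uint8),
                          img1)
cv2.imwrite("mosaic.jpg", canvas)
```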

  13. Feature extraction for deep neural networks based on decision boundaries

    Science.gov (United States)

    Woo, Seongyoun; Lee, Chulhee

    2017-05-01

    Feature extraction is a process used to reduce data dimensions using various transforms while preserving the discriminant characteristics of the original data. Feature extraction has been an important issue in pattern recognition since it can reduce the computational complexity and provide a simplified classifier. In particular, linear feature extraction has been widely used. This method applies a linear transform to the original data to reduce the data dimensions. The decision boundary feature extraction method (DBFE) retains only informative directions for discriminating among the classes. DBFE has been applied to various parametric and non-parametric classifiers, which include the Gaussian maximum likelihood classifier (GML), the k-nearest neighbor classifier, support vector machines (SVM) and neural networks. In this paper, we apply DBFE to deep neural networks. This algorithm is based on the nonparametric version of DBFE, which was developed for neural networks. Experimental results with the UCI database show improved classification accuracy with reduced dimensionality.

  14. Eye movement identification based on accumulated time feature

    Science.gov (United States)

    Guo, Baobao; Wu, Qiang; Sun, Jiande; Yan, Hua

    2017-06-01

    Eye movement is a new kind of biometric feature, and it has many advantages compared with other modalities such as fingerprint, face, and iris. It is not only a static characteristic but also a combination of brain activity and muscle behavior, which makes it effective for preventing spoofing attacks. In addition, eye movements can be combined with face, iris, and other features recorded from the face region in multimodal systems. In this paper, we conduct an exploratory study on eye movement identification based on the eye movement datasets provided by Komogortsev et al. in 2011, using different classification methods. The durations of saccades and fixations are extracted from the eye movement data as features. Furthermore, a performance analysis was conducted on different classification methods, such as BP, RBF, Elman, and SVM, in order to provide a reference for future research in this field.
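
    As a rough sketch of how such duration features can be extracted, the code below applies a simple velocity-threshold (I-VT) rule to label gaze samples as saccade or fixation and accumulates segment durations. The sampling rate and velocity threshold are illustrative assumptions rather than values from the cited datasets.

```python
import numpy as np


def saccade_fixation_durations(x, y, fs=1000.0, vel_threshold=100.0):
    """Split a gaze trace into saccade/fixation segments by a velocity threshold (I-VT).

    x, y: gaze angles in degrees; fs: sampling rate in Hz;
    vel_threshold: deg/s separating saccades from fixations (assumed value).
    Returns lists of segment durations in seconds for each class.
    """
    vel = np.hypot(np.diff(x), np.diff(y)) * fs           # angular velocity per sample
    is_saccade = vel > vel_threshold
    durations = {"saccade": [], "fixation": []}
    start = 0
    for i in range(1, len(is_saccade)):
        if is_saccade[i] != is_saccade[i - 1]:            # segment boundary
            label = "saccade" if is_saccade[start] else "fixation"
            durations[label].append((i - start) / fs)
            start = i
    label = "saccade" if is_saccade[start] else "fixation"
    durations[label].append((len(is_saccade) - start) / fs)
    return durations["saccade"], durations["fixation"]


# Example: durations from a synthetic 1-second trace.
t = np.linspace(0, 1, 1000)
sac, fix = saccade_fixation_durations(np.cumsum(np.random.randn(1000)) * 0.01, t * 0.0)
print(len(sac), len(fix))
```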

  15. Missile placement analysis based on improved SURF feature matching algorithm

    Science.gov (United States)

    Yang, Kaida; Zhao, Wenjie; Li, Dejun; Gong, Xiran; Sheng, Qian

    2015-03-01

    Precise battle damage assessment from video images for analyzing missile placement is a new area of study. This article proposes an improved speeded-up robust features algorithm, named restricted SURF (RSURF), which combines the combat application of TV-command-guided missiles with the characteristics of video imagery. Its restrictions are reflected in two aspects: the first restricts the extraction area of the feature points, and the second restricts the number of feature points. The process of missile placement analysis based on video images was designed, and a video splicing process with random sample consensus purification was implemented. The RSURF algorithm is shown to have good real-time performance while guaranteeing accuracy.
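
    The two restrictions can be illustrated with OpenCV as below: a binary mask limits the extraction area, and the detected keypoints are capped to the N strongest by response. ORB is used as a stand-in detector because SURF requires an opencv-contrib build with non-free modules enabled; the file name, region, and cap value are arbitrary assumptions.

```python
import cv2
import numpy as np

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)    # placeholder video frame

# Restriction 1: only extract features inside a region of interest.
mask = np.zeros(img.shape, dtype=np.uint8)
mask[100:400, 150:500] = 255                           # arbitrary ROI around the expected impact area

detector = cv2.ORB_create(nfeatures=5000)              # stand-in for SURF (needs opencv-contrib non-free)
keypoints, descriptors = detector.detectAndCompute(img, mask)

# Restriction 2: keep only the N strongest feature points.
N = 200
order = sorted(range(len(keypoints)), key=lambda i: keypoints[i].response, reverse=True)[:N]
keypoints = [keypoints[i] for i in order]
descriptors = descriptors[order]

print(f"kept {len(keypoints)} restricted keypoints")
```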

  16. Optimum location of external markers using feature selection algorithms for real-time tumor tracking in external-beam radiotherapy: a virtual phantom study.

    Science.gov (United States)

    Nankali, Saber; Torshabi, Ahmad Esmaili; Miandoab, Payam Samadi; Baghizadeh, Amin

    2016-01-08

    In external-beam radiotherapy, the use of external markers is one of the most reliable tools for predicting tumor position in clinical applications. The main challenge in this approach is tracking tumor motion with the highest accuracy, which depends heavily on the location of the external markers; this issue is the objective of this study. Four commercially available feature selection algorithms, namely 1) Correlation-based Feature Selection, 2) Classifier, 3) Principal Components, and 4) Relief, were proposed to find the optimum location of external markers, in combination with two searching procedures, "Genetic" and "Ranker". The performance of these algorithms was evaluated using the four-dimensional extended cardiac-torso anthropomorphic phantom. Six tumors in the lung, three tumors in the liver, and 49 points on the thorax surface were taken into account to simulate internal and external motions, respectively. The root mean square error of an adaptive neuro-fuzzy inference system (ANFIS) as prediction model was considered the metric for quantitatively evaluating the performance of the proposed feature selection algorithms. To do this, the thorax surface region was divided into nine smaller segments, and the predefined tumor motion was predicted by ANFIS using the external motion data of the given markers at each small segment separately. Our comparative results showed that all feature selection algorithms can reasonably select specific external markers from those segments where the root mean square error of the ANFIS model is minimum. Moreover, the performance accuracy of the proposed feature selection algorithms was compared separately. For this, each tumor motion was predicted using the motion data of the external markers selected by each feature selection algorithm. A Duncan statistical test, followed by an F-test, on the final results reflected that all proposed feature selection algorithms have the same performance accuracy for lung tumors. But for liver tumors, a correlation-based feature selection algorithm, in

  17. Features of selection of children for occupations by artistic gymnastics in modern Kurdistan

    Directory of Open Access Journals (Sweden)

    Abdulvahid Dlshad Nihad

    2015-10-01

    Full Text Available Purpose: to study the organizational and pedagogical conditions of selecting children for artistic gymnastics training that exist in the republic of Kurdistan. Material and Methods: a survey of 24 artistic gymnastics coaches and physical culture experts of the republic of Kurdistan was carried out. The general questions of selection and the methodical features of selecting children for artistic gymnastics training in Kurdistan were studied. Results: the survey revealed the absence of generally approved tests and of scientific recommendations concerning their use, and a dependence of the quality of selection on the experience of the coach. Conclusions: experts in the field of physical culture and sport consider the existing system of selecting children for artistic gymnastics in Kurdistan inefficient; gymnastics coaches consider it necessary to test children's level of development of flexibility, dexterity, the ability to produce dynamic force, and the ability to maintain dynamic balance.

  18. Feature-size dependent selective edge enhancement of x-ray images

    International Nuclear Information System (INIS)

    Herman, S.

    1988-01-01

    Morphological filters are nonlinear signal transformations that operate on a picture directly in the space domain. Such filters are based on the theory of mathematical morphology previously formulated. The filter presented here features a "mask" operator (called a "structuring element" in some of the literature) which is a function of the two spatial coordinates x and y. The two basic mathematical operations are called "masked erosion" and "masked dilation". In the case of masked erosion, the mask is passed over the input image in a raster pattern. At each position of the mask, the pixel values under the mask are multiplied by the mask pixel values. Then the output pixel value, located at the center position of the mask, is set equal to the minimum of the product of the mask and input values. Similarly, for masked dilation, the output pixel value is the maximum of the product of the input and the mask pixel values. The two basic processes of dilation and erosion can be used to construct the next level of operations: the "positive sieve" (also called "opening") and the "negative sieve" ("closing"). The positive sieve modifies the peaks in the image, whereas the negative sieve works on image valleys. The positive sieve is implemented by passing the output of the masked erosion step through the masked dilation function. The negative sieve reverses this procedure, using a dilation followed by an erosion. Each such sifting operator is characterized by a "hole size". It will be shown that the choice of hole size selects the range of pixel detail sizes which are to be enhanced. The shape of the mask governs the shape of the enhancement. Finally, positive sifting is used to enhance positive-going (peak) features, whereas negative sifting enhances the negative-going (valley) landmarks.
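
    A compact NumPy sketch of the operations described above is given below: masked erosion and dilation as the min/max of the mask-weighted neighbourhood, combined into a positive sieve (opening). The image and mask are toy examples, and the product-based formulation follows the description in this abstract rather than a standard library routine.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def masked_erosion(img, mask):
    """Minimum of (neighbourhood * mask) at each pixel ('valid' region only)."""
    windows = sliding_window_view(img, mask.shape)      # shape: (H-h+1, W-w+1, h, w)
    return (windows * mask).min(axis=(-2, -1))


def masked_dilation(img, mask):
    """Maximum of (neighbourhood * mask) at each pixel ('valid' region only)."""
    windows = sliding_window_view(img, mask.shape)
    return (windows * mask).max(axis=(-2, -1))


def positive_sieve(img, mask):
    """Opening: masked erosion followed by masked dilation; flattens peaks smaller than the mask."""
    return masked_dilation(masked_erosion(img, mask), mask)


rng = np.random.default_rng(1)
image = rng.random((64, 64))
mask = np.ones((5, 5))                                   # the "hole size" is set by the mask extent

opened = positive_sieve(image, mask)
enhanced = image[4:-4, 4:-4] - opened                    # peak features smaller than the mask stand out
print(enhanced.shape)
```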

  19. Statistical analysis for validating ACO-KNN algorithm as feature selection in sentiment analysis

    Science.gov (United States)

    Ahmad, Siti Rohaidah; Yusop, Nurhafizah Moziyana Mohd; Bakar, Azuraliza Abu; Yaakub, Mohd Ridzwan

    2017-10-01

    This research paper aims to propose a hybrid of the ant colony optimization (ACO) and k-nearest neighbor (KNN) algorithms as a feature selection method for selecting relevant features from customer review datasets. Information gain (IG), the genetic algorithm (GA), and rough set attribute reduction (RSAR) were used as baseline algorithms in a performance comparison with the proposed algorithm. This paper also discusses the significance test, which was used to evaluate the performance differences between the ACO-KNN, IG-GA, and IG-RSAR algorithms. This study evaluated the performance of the ACO-KNN algorithm using precision, recall, and F-score, which were validated using parametric statistical significance tests. The evaluation process statistically proved that the ACO-KNN algorithm achieves a significant improvement over the baseline algorithms. In addition, the experimental results showed that ACO-KNN can be used as a feature selection technique in sentiment analysis to obtain a high-quality, optimal feature subset that can represent the actual data in customer review data.

  20. Pattern classification using an olfactory model with PCA feature selection in electronic noses: study and application.

    Science.gov (United States)

    Fu, Jun; Huang, Canqin; Xing, Jianguo; Zheng, Junbao

    2012-01-01

    Biologically-inspired models and algorithms are considered as promising sensor array signal processing methods for electronic noses. Feature selection is one of the most important issues for developing robust pattern recognition models in machine learning. This paper describes an investigation into the classification performance of a bionic olfactory model with increasing dimensions of the input feature vector (outer factor) as well as of its parallel channels (inner factor). The principal component analysis technique was applied for feature selection and dimension reduction. Two data sets, of three classes of wine derived from different cultivars and of five classes of green tea derived from five different provinces of China, were used for the experiments. In the former case the results showed that the average correct classification rate increased as more principal components were put into the feature vector. In the latter case the results showed that sufficient parallel channels should be reserved in the model to avoid pattern space crowding. We concluded that 6~8 channels of the model, with principal component feature vector values of at least 90% cumulative variance, are adequate for a classification task of 3~5 pattern classes, considering the trade-off between time consumption and classification rate.
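
    In scikit-learn, retaining principal components up to a cumulative variance target, as done here, can be sketched as follows; the sensor-array data are simulated, since the wine and tea datasets used in the study are not publicly bundled.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 16))            # simulated e-nose responses: 60 samples x 16 sensors

# Keep as many principal components as needed to reach 90% cumulative variance.
pca = PCA(n_components=0.90)
features = pca.fit_transform(StandardScaler().fit_transform(X))

print(features.shape[1], "components,",
      f"{pca.explained_variance_ratio_.sum():.1%} cumulative variance")
```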

  1. Ant-cuckoo colony optimization for feature selection in digital mammogram.

    Science.gov (United States)

    Jona, J B; Nagaveni, N

    2014-01-15

    Digital mammography is the only effective screening method for detecting breast cancer. Gray Level Co-occurrence Matrix (GLCM) textural features are extracted from the mammogram. Not all of these features are essential for classifying the mammogram; therefore, identifying the relevant features is the aim of this work. Feature selection improves the classification rate and accuracy of any classifier. In this study, a new hybrid metaheuristic named Ant-Cuckoo Colony Optimization, a hybrid of Ant Colony Optimization (ACO) and Cuckoo Search (CS), is proposed for feature selection in digital mammograms. ACO is a good metaheuristic optimization technique, but its drawback is that the ants walk through paths where the pheromone density is high, which makes the whole process slow; hence CS is employed to carry out the local search of ACO. A Support Vector Machine (SVM) classifier with a Radial Basis Function (RBF) kernel is used along with the ACO to distinguish normal mammograms from abnormal ones. Experiments are conducted on the mini-MIAS database. The performance of the new hybrid algorithm is compared with the ACO and PSO algorithms. The results show that the hybrid Ant-Cuckoo Colony Optimization algorithm is more accurate than the other techniques.
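
    Extracting GLCM texture features of the kind used here is straightforward with scikit-image; the sketch below computes a few standard Haralick-style properties on a synthetic patch standing in for a mammogram region of interest (in scikit-image versions before 0.19 the functions are named greycomatrix/greycoprops).

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(128, 128), dtype=np.uint8)   # stand-in for a mammogram ROI

glcm = graycomatrix(patch,
                    distances=[1, 2],
                    angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                    levels=256, symmetric=True, normed=True)

# A small GLCM feature vector; a selection step (e.g. the ACO-CS hybrid) would then prune it.
props = ["contrast", "dissimilarity", "homogeneity", "energy", "correlation", "ASM"]
features = np.hstack([graycoprops(glcm, p).ravel() for p in props])
print(features.shape)
```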

  2. Pattern Classification Using an Olfactory Model with PCA Feature Selection in Electronic Noses: Study and Application

    Directory of Open Access Journals (Sweden)

    Junbao Zheng

    2012-03-01

    Full Text Available Biologically-inspired models and algorithms are considered as promising sensor array signal processing methods for electronic noses. Feature selection is one of the most important issues for developing robust pattern recognition models in machine learning. This paper describes an investigation into the classification performance of a bionic olfactory model with increasing dimensions of the input feature vector (outer factor) as well as of its parallel channels (inner factor). The principal component analysis technique was applied for feature selection and dimension reduction. Two data sets, of three classes of wine derived from different cultivars and of five classes of green tea derived from five different provinces of China, were used for the experiments. In the former case the results showed that the average correct classification rate increased as more principal components were put into the feature vector. In the latter case the results showed that sufficient parallel channels should be reserved in the model to avoid pattern space crowding. We concluded that 6~8 channels of the model, with principal component feature vector values of at least 90% cumulative variance, are adequate for a classification task of 3~5 pattern classes, considering the trade-off between time consumption and classification rate.

  3. A novel relational regularization feature selection method for joint regression and classification in AD diagnosis.

    Science.gov (United States)

    Zhu, Xiaofeng; Suk, Heung-Il; Wang, Li; Lee, Seong-Whan; Shen, Dinggang

    2017-05-01

    In this paper, we focus on joint regression and classification for Alzheimer's disease diagnosis and propose a new feature selection method by embedding the relational information inherent in the observations into a sparse multi-task learning framework. Specifically, the relational information includes three kinds of relationships (feature-feature, response-response, and sample-sample relations), preserving the similarity among the features, the response variables, and the samples, respectively. To conduct feature selection, we first formulate the objective function by imposing these three relational characteristics along with an ℓ2,1-norm regularization term, and then propose a computationally efficient algorithm to optimize the proposed objective function. With the dimension-reduced data, we train two support vector regression models to predict the clinical scores of ADAS-Cog and MMSE, respectively, and a support vector classification model to determine the clinical label. We conducted extensive experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset to validate the effectiveness of the proposed method. Our experimental results showed the efficacy of the proposed method in enhancing the performance of both clinical score prediction and disease status identification, compared to state-of-the-art methods. Copyright © 2015 Elsevier B.V. All rights reserved.
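
    A bare-bones sketch of ℓ2,1-regularized multi-task feature selection, without the three relational regularizers that are specific to this paper, can be written as a proximal gradient loop over the weight matrix rows; the data, step size, and λ below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))                    # 100 subjects x 50 features
W_true = np.zeros((50, 3))
W_true[:5] = rng.normal(size=(5, 3))              # only the first 5 features are informative
Y = X @ W_true + 0.1 * rng.normal(size=(100, 3))  # 3 joint targets (e.g. two scores + label)

lam = 5.0
W = np.zeros((50, 3))
step = 1.0 / np.linalg.norm(X.T @ X, 2)           # 1 / Lipschitz constant of the smooth part

for _ in range(500):
    G = X.T @ (X @ W - Y)                         # gradient of 0.5 * ||XW - Y||_F^2
    V = W - step * G
    # Row-wise soft thresholding: the proximal operator of lam * ||W||_{2,1}.
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    W = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12)) * V

selected = np.flatnonzero(np.linalg.norm(W, axis=1) > 1e-6)
print("selected features:", selected)
```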

  4. Selection of clinical features for pattern recognition applied to gait analysis.

    Science.gov (United States)

    Altilio, Rosa; Paoloni, Marco; Panella, Massimo

    2017-04-01

    This paper deals with the opportunity of extracting useful information from medical data retrieved directly from a stereophotogrammetric system applied to gait analysis. A feature selection method that exhaustively evaluates all possible combinations of the gait parameters is presented, in order to find the best subset able to discriminate between diseased and healthy subjects. This procedure is used to estimate the performance of widely used classification algorithms, whose behaviour has been ascertained in many real-world problems with respect to well-known classification benchmarks, in terms of both the number of selected features and the classification accuracy. Precisely, support vector machine, naive Bayes, and K nearest neighbor classifiers can obtain the lowest classification error, with an accuracy greater than 97%. For the considered classification problem, the whole set of features proves to be redundant and can be significantly pruned. Groups of only 3 or 5 features are able to preserve high accuracy when the aim is to check the anomaly of a gait. The step length and the swing speed are the most informative features for gait analysis, but cadence and stride may also add useful information for movement evaluation.
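
    Exhaustively scoring every combination of a small set of gait parameters, as described, can be sketched with scikit-learn as follows; the gait features here are random stand-ins for the stereophotogrammetric parameters, and KNN is used as one representative classifier.

```python
import itertools

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
feature_names = ["step_length", "swing_speed", "cadence", "stride", "stance_time", "step_width"]
X = rng.normal(size=(80, len(feature_names)))        # stand-in gait parameters
y = rng.integers(0, 2, size=80)                      # healthy vs. diseased labels

best = (0.0, ())
for k in range(1, len(feature_names) + 1):
    for subset in itertools.combinations(range(len(feature_names)), k):
        acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                              X[:, subset], y, cv=5).mean()
        if acc > best[0]:
            best = (acc, tuple(feature_names[i] for i in subset))

print(f"best subset {best[1]} with accuracy {best[0]:.2f}")
```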

  5. Fault Features Extraction and Identification based Rolling Bearing Fault Diagnosis

    International Nuclear Information System (INIS)

    Qin, B; Sun, G D; Zhang, L Y; Wang, J G; Hu, J

    2017-01-01

    For fault classification models based on the extreme learning machine (ELM), the diagnosis accuracy and stability for rolling bearings are greatly influenced by a critical parameter: the number of nodes in the hidden layer of the ELM. An adaptive adjustment strategy based on variational mode decomposition, permutation entropy, and the kernel extreme learning machine is proposed to determine this tunable parameter. First, the vibration signals are measured and decomposed into different fault feature modes by variational mode decomposition. The fault features of each mode are then formed into a high-dimensional feature vector set based on permutation entropy. Second, the ELM output function is expressed by the inner product of a Gaussian kernel function to adaptively determine the number of hidden layer nodes. Finally, the high-dimensional feature vector set is used as the input to establish the kernel ELM rolling bearing fault classification model, and the classification and identification of different fault states of rolling bearings are carried out. In comparison with fault classification methods based on the support vector machine and the basic ELM, the experimental results show that the proposed method has higher classification accuracy and better generalization ability. (paper)
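
    Permutation entropy, which maps each decomposed mode to a scalar feature here, can be computed in a few lines of NumPy; the embedding order and delay below are common illustrative choices rather than the values tuned in the paper.

```python
from math import factorial

import numpy as np


def permutation_entropy(signal, order=3, delay=1):
    """Normalized permutation entropy of a 1-D signal (0 = fully regular, 1 = fully random)."""
    n = len(signal) - (order - 1) * delay
    # Ordinal pattern of each embedded vector, encoded by the argsort indices.
    patterns = np.array([tuple(np.argsort(signal[i:i + order * delay:delay]))
                         for i in range(n)])
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p)) / np.log2(factorial(order))


# Example: a clean sine (low entropy) vs. white noise (entropy close to 1).
t = np.linspace(0, 10, 2000)
print(permutation_entropy(np.sin(2 * np.pi * t)),
      permutation_entropy(np.random.randn(2000)))
```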

  6. Supervised feature selection for linear and non-linear regression of L⁎a⁎b⁎ color from multispectral images of meat

    DEFF Research Database (Denmark)

    Sharifzadeh, Sara; Clemmensen, Line Katrine Harder; Borggaard, Claus

    2014-01-01

    Multispectral images of meat samples (430–970 nm) were used for training and testing of the L⁎a⁎b⁎ prediction models. Finding a sparse solution, or the use of a minimum number of bands, is of particular interest to make an industrial vision set-up simpler and cost effective. In this paper, a wide range of linear, non-linear, kernel-based regression and sparse regression methods are compared. In order to improve the prediction results of these models, we propose a supervised feature selection strategy, which is compared with principal component analysis (PCA) as a pre-processing step. The results showed that the proposed feature selection method outperforms PCA for both linear and non-linear methods. The highest performance was obtained by linear ridge regression applied on the features selected by the proposed Elastic net (EN)-based feature selection strategy. All the best models use a reduced number of features.
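
    A minimal scikit-learn sketch of this kind of elastic-net-based band selection followed by ridge regression is shown below; the spectral data are simulated, and the regularization settings are illustrative rather than the values tuned in the paper.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 80))                 # simulated spectra: 120 samples x 80 bands
y = X[:, [5, 20, 55]] @ np.array([1.5, -2.0, 1.0]) + 0.1 * rng.normal(size=120)  # e.g. L* values

X = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: elastic net as the band selector -- keep bands with non-zero coefficients.
selector = ElasticNet(alpha=0.05, l1_ratio=0.7).fit(X_tr, y_tr)
bands = np.flatnonzero(selector.coef_)

# Step 2: ridge regression on the selected bands only.
ridge = Ridge(alpha=1.0).fit(X_tr[:, bands], y_tr)
print(f"{len(bands)} bands selected, R^2 on test set: {ridge.score(X_te[:, bands], y_te):.2f}")
```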

  7. Enhancement web proxy cache performance using Wrapper Feature Selection methods with NB and J48

    Science.gov (United States)

    Mahmoud Al-Qudah, Dua'a.; Funke Olanrewaju, Rashidah; Wong Azman, Amelia

    2017-11-01

    Web proxy caching reduces response time by storing copies of pages between the client and server sides. If requested pages are cached in the proxy, there is no need to access the server. Due to the limited size and high cost of cache compared to other storage, a cache replacement algorithm is used to determine which page to evict when the cache is full. However, conventional replacement algorithms such as Least Recently Used (LRU), First In First Out (FIFO), Least Frequently Used (LFU), and randomized policies may discard important pages just before they are used. Furthermore, a conventional algorithm cannot be well optimized, since intelligently evicting a page requires some decision before replacement. Hence, most researchers propose integrating intelligent classifiers with the replacement algorithm to improve its performance. This research proposes using automated wrapper feature selection methods to choose the best subset of features that are relevant and that influence the classifiers' prediction accuracy. The results show that the wrapper feature selection methods, namely Best First (BFS), Incremental Wrapper Subset Selection (IWSS) embedded NB, and particle swarm optimization (PSO), reduce the number of features and have a good impact on reducing computation time. Using PSO enhances NB classifier accuracy by 1.1%, 0.43%, and 0.22% over using NB with all features, using BFS, and using IWSS-embedded NB, respectively. PSO raises J48 accuracy by 0.03%, 1.91%, and 0.04% over using the J48 classifier with all features, using IWSS-embedded NB, and using BFS, respectively. Using IWSS-embedded NB speeds up the NB and J48 classifiers much more than BFS and PSO do, reducing the computation time of NB by 0.1383 and that of J48 by 2.998.
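
    The wrapper idea of searching for a feature subset with the classifier in the loop can be sketched in scikit-learn as below; SequentialFeatureSelector plays the role of the wrapper search (BFS, IWSS, and PSO are not available off the shelf in scikit-learn), and the proxy-log features and labels are simulated placeholders.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
feature_names = np.array(["recency", "frequency", "size", "retrieval_cost",
                          "type", "popularity", "time_of_day", "depth"])
X = rng.normal(size=(500, len(feature_names)))      # simulated per-object proxy-log features
y = (X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=500) > 0).astype(int)  # re-reference label

# Wrapper search: greedily add the feature that most improves NB cross-validated accuracy.
nb = GaussianNB()
sfs = SequentialFeatureSelector(nb, n_features_to_select=3, direction="forward", cv=5)
sfs.fit(X, y)

selected = feature_names[sfs.get_support()]
acc = cross_val_score(nb, X[:, sfs.get_support()], y, cv=5).mean()
print("selected:", list(selected), f"accuracy with NB: {acc:.3f}")
```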

  8. A Distributed Feature-based Environment for Collaborative Design

    Directory of Open Access Journals (Sweden)

    Wei-Dong Li

    2003-02-01

    Full Text Available This paper presents a client/server design environment based on 3D feature-based modelling and Java technologies that enables design information to be shared efficiently among the members of a design team. In this environment, design tasks and clients are organised through working sessions generated and maintained by a collaborative server. The information from an individual design client during a design process is updated and broadcast to the other clients in the same session through an event-driven, call-back mechanism. Downstream manufacturing analysis modules can be wrapped as agents and plugged into the open environment to support the design activities. At the server side, a feature-feature relationship is established and maintained to filter the modified information of a working part, so as to facilitate efficient information updates during the design process.

  9. Electricity market price spike analysis by a hybrid data model and feature selection technique

    International Nuclear Information System (INIS)

    Amjady, Nima; Keynia, Farshid

    2010-01-01

    In a competitive electricity market, energy price forecasting is an important activity for both suppliers and consumers. For this reason, many techniques have been proposed to predict electricity market prices in recent years. However, the electricity price is a complex, volatile signal with many spikes. Most electricity price forecasting techniques focus on normal price prediction, while price spike forecasting is a different and more complex prediction process. Price spike forecasting has two main aspects: prediction of price spike occurrence and of its value. In this paper, a novel technique for price spike occurrence prediction is presented, composed of a new hybrid data model, a novel feature selection technique, and an efficient forecast engine. The hybrid data model includes both wavelet and time domain variables as well as calendar indicators, comprising a large candidate input set. The set is refined by the proposed feature selection technique, which evaluates both the relevancy and the redundancy of the candidate inputs. The forecast engine is a probabilistic neural network, which is fed by the candidate inputs selected by the feature selection technique and predicts price spike occurrence. The efficiency of the whole proposed method for price spike occurrence forecasting is evaluated by means of real data from the Queensland and PJM electricity markets. (author)
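
    The relevancy/redundancy idea can be illustrated with a simple mutual-information filter in the spirit of mRMR; this is not the paper's exact criterion, and the candidate inputs and spike labels are simulated for the sketch.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))                  # candidate inputs (wavelet, time-domain, calendar)
y = (X[:, 0] + X[:, 3] > 1.0).astype(int)       # simulated price-spike occurrence labels

relevance = mutual_info_classif(X, y, random_state=0)

selected = [int(np.argmax(relevance))]
remaining = [j for j in range(X.shape[1]) if j not in selected]

# Greedily add the candidate maximizing (relevancy - average redundancy with selected inputs).
for _ in range(4):
    scores = []
    for j in remaining:
        redundancy = np.mean([mutual_info_regression(X[:, [j]], X[:, s], random_state=0)[0]
                              for s in selected])
        scores.append(relevance[j] - redundancy)
    best = remaining[int(np.argmax(scores))]
    selected.append(best)
    remaining.remove(best)

print("selected candidate inputs:", selected)
```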

  10. Identification of Biomarkers for Esophageal Squamous Cell Carcinoma Using Feature Selection and Decision Tree Methods

    Directory of Open Access Journals (Sweden)

    Chun-Wei Tung

    2013-01-01

    Full Text Available Esophageal squamous cell cancer (ESCC) is one of the most common fatal human cancers. The identification of biomarkers for early detection could be a promising strategy to decrease mortality. Previous studies utilized microarray techniques to identify more than one hundred genes; however, it is desirable to identify a small set of biomarkers for clinical use. This study proposes a sequential forward feature selection algorithm to design decision tree models for discriminating ESCC from normal tissues. Two potential biomarkers, RUVBL1 and CNIH, were identified and validated based on two publicly available microarray datasets. To test the discrimination ability of the two biomarkers, 17 pairs of expression profiles of ESCC and normal tissues from Taiwanese male patients were measured using microarray techniques. The classification accuracies of the two biomarkers in all three datasets were higher than 90%. Interpretable decision tree models were constructed to analyze the expression patterns of the two biomarkers. RUVBL1 was consistently overexpressed in all three datasets, although we found inconsistent CNIH expression, possibly affected by the diverse major risk factors for ESCC across different areas.

  11. An expert botanical feature extraction technique based on phenetic features for identifying plant species.

    Directory of Open Access Journals (Sweden)

    Hoshang Kolivand

    Full Text Available In this paper, we present a new method to recognise the leaf type and identify plant species using phenetic parts of the leaf: lobe, apex, and base detection. Most of the research in this area focuses on popular features such as shape, colour, vein, and texture, which consume large amounts of computational processing and are not efficient, especially on the Acer database with its highly complex leaf structures. This paper is focused on phen